[jira] [Commented] (HIVE-26579) Prepare for Hadoop and Zookeeper switching to Reload4j

2022-10-04 Thread Attila Doroszlai (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612855#comment-17612855
 ] 

Attila Doroszlai commented on HIVE-26579:
-

Thanks [~ayushtkn] for reviewing and committing it.

> Prepare for Hadoop and Zookeeper switching to Reload4j
> --
>
> Key: HIVE-26579
> URL: https://issues.apache.org/jira/browse/HIVE-26579
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hadoop moved from Log4j1 to Reload4j (HADOOP-18088). The goal of this task is 
> to prepare Hive for that change:
>  * The Hive build fails with the current {{useStrictFiltering=true}} setting in some 
> assemblies, due to the excluded dependency (log4j) not actually being present.
>  * Exclude {{ch.qos.reload4j:\*}} in addition to the current {{log4j:\*}} to 
> avoid polluting the assemblies and shaded jars.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26579) Prepare for Hadoop and Zookeeper switching to Reload4j

2022-10-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612853#comment-17612853
 ] 

Ayush Saxena commented on HIVE-26579:
-

Committed to master.

Thanks [~adoroszlai] for the contribution!

> Prepare for Hadoop and Zookeeper switching to Reload4j
> --
>
> Key: HIVE-26579
> URL: https://issues.apache.org/jira/browse/HIVE-26579
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hadoop moved from Log4j1 to Reload4j (HADOOP-18088). The goal of this task is 
> to prepare Hive for that change:
>  * The Hive build fails with the current {{useStrictFiltering=true}} setting in some 
> assemblies, due to the excluded dependency (log4j) not actually being present.
>  * Exclude {{ch.qos.reload4j:\*}} in addition to the current {{log4j:\*}} to 
> avoid polluting the assemblies and shaded jars.





[jira] [Updated] (HIVE-26579) Prepare for Hadoop and Zookeeper switching to Reload4j

2022-10-04 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-26579:

Fix Version/s: 4.0.0-alpha-2
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Prepare for Hadoop and Zookeeper switching to Reload4j
> --
>
> Key: HIVE-26579
> URL: https://issues.apache.org/jira/browse/HIVE-26579
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hadoop moved from Log4j1 to Reload4j (HADOOP-18088). The goal of this task is 
> to prepare Hive for that change:
>  * The Hive build fails with the current {{useStrictFiltering=true}} setting in some 
> assemblies, due to the excluded dependency (log4j) not actually being present.
>  * Exclude {{ch.qos.reload4j:\*}} in addition to the current {{log4j:\*}} to 
> avoid polluting the assemblies and shaded jars.





[jira] [Updated] (HIVE-26555) Read-only mode for Hive database

2022-10-04 Thread Teddy Choi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-26555:
--
Fix Version/s: 4.0.0-alpha-2

> Read-only mode for Hive database
> 
>
> Key: HIVE-26555
> URL: https://issues.apache.org/jira/browse/HIVE-26555
> Project: Hive
>  Issue Type: New Feature
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> h1. Purpose
> In failover/fail-back scenarios, one Hive database needs to be read-only while 
> the other remains writable, to keep a single source of truth.
> h1. Design
> The EnforceReadOnlyDatabaseHook class implements the ExecuteWithHookContext 
> interface. hive.exec.pre.hooks needs to include the class name to create an 
> instance. The "readonly" database property can be configured to turn it on 
> and off.
> Allowed operation prefixes:
>  * EXPLAIN
>  * USE(or SWITCHDATABASE)
>  * REPLDUMP
>  * REPLSTATUS
>  * EXPORT
>  * KILL_QUERY
>  * DESC
>  * SHOW
> h1. Tests
>  * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT
>  * read_only_delete.q
>  * read_only_insert.q
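The design described above could be exercised roughly as follows. This is a hedged sketch based only on this description: the fully qualified hook class name, the database name, and the exact property values are assumptions, not verified against the patch.

{code:sql}
-- Register the pre-execution hook (package name assumed here).
SET hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.EnforceReadOnlyDatabaseHook;

-- Turn read-only mode on for a database via the "readonly" property.
ALTER DATABASE replica_db SET DBPROPERTIES ('readonly' = 'true');

-- Operations with allowed prefixes (USE, SHOW, DESC, EXPLAIN, ...) still work:
USE replica_db;
SHOW TABLES;
EXPLAIN SELECT * FROM t;

-- Writes such as INSERT or DELETE would now be rejected by the hook.
-- Turn the mode off again:
ALTER DATABASE replica_db SET DBPROPERTIES ('readonly' = 'false');
{code}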





[jira] [Commented] (HIVE-26584) compressed_skip_header_footer_aggr.q is flaky

2022-10-04 Thread John Sherman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612693#comment-17612693
 ] 

John Sherman commented on HIVE-26584:
-

Thanks [~ayushtkn] and [~zabetak] for helping to make the patch better.
I don't see any indication that the tests require external tables. But I agree 
that I could be overlooking a detail, and I don't functionally need to change 
the test so significantly (even if I think the newer version is cleaner). I 
went with simply running rmr on the created directories at the end to clean 
them up.

I did not add DROP IF EXISTS or rmr(s) before the creation, since I find that 
practice typically hides problems and sometimes causes hidden dependencies 
between tests.

As for clearTablesCreatedDuringTests: it doesn't clean these files up because 
it only cleans up tables under the configured warehouse directory. This test 
case manually creates a location outside the warehouse directory, so the files 
never get cleaned up. I could modify clearTablesCreatedDuringTests to clean up 
all directories mentioned in CREATE EXTERNAL TABLE location clauses, but that 
could be risky: a misconfigured LOCATION clause could lead to user files being 
accidentally removed, and I am not sure I could add all the checks needed to 
prevent that.

In the future, I think tests like this should load their data via LOAD DATA 
where possible, rather than use a custom LOCATION clause, so the data gets 
cleaned up normally.
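As a rough illustration of that suggestion (the table name, file path, and table properties here are hypothetical, not taken from the actual qfile), a qtest could stage its data like this instead of pointing at an external directory:

{code:sql}
-- Managed table under the warehouse directory, so
-- clearTablesCreatedDuringTests removes its files automatically.
CREATE TABLE skip_header_test (id int, dt date)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  TBLPROPERTIES ('skip.header.line.count' = '1');

-- Copy the data file into the table rather than using a LOCATION clause.
LOAD DATA LOCAL INPATH '../../data/files/header_footer.csv'
  INTO TABLE skip_header_test;
{code}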

> compressed_skip_header_footer_aggr.q is flaky
> -
>
> Key: HIVE-26584
> URL: https://issues.apache.org/jira/browse/HIVE-26584
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In one of my PRs, compressed_skip_header_footer_aggr.q was failing with an 
> unexpected diff, such as:
> {code:java}
>  TestMiniLlapLocalCliDriver.testCliDriver:62 Client Execution succeeded but 
> contained differences (error code = 1) after executing 
> compressed_skip_header_footer_aggr.q
> 69,71c69,70
> < 1 2019-12-31
> < 2 2018-12-31
> < 3 2017-12-31
> ---
> > 2 2019-12-31
> > 3 2019-12-31
> 89d87
> < NULL  NULL
> 91c89
> < 2 2018-12-31
> ---
> > 2 2019-12-31
> 100c98
> < 1
> ---
> > 2
> 109c107
> < 1 2019-12-31
> ---
> > 2 2019-12-31
> 127,128c125,126
> < 1 2019-12-31
> < 3 2017-12-31
> ---
> > 2 2019-12-31
> > 3 2019-12-31
> 146a145
> > 2 2019-12-31
> 155c154
> < 1
> ---
> > 2 {code}
> Investigating it, the test did not seem to fail when executed locally. Since I 
> suspected test interference, I searched for the table names/directories used 
> and discovered empty_skip_header_footer_aggr.q, which uses the same table 
> names AND external directories.





[jira] [Commented] (HIVE-26591) libthrift 0.14.0 onwards doesn't work with Hive (All versions)

2022-10-04 Thread Pratik Malani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612672#comment-17612672
 ] 

Pratik Malani commented on HIVE-26591:
--

Hi [~zabetak],
Thanks for your response.

I think they have resolved it in the latest version; I can't see any reference 
to the mentioned attributes in the code:

[https://jar-download.com/artifacts/org.apache.hive/hive-service/4.0.0-alpha-1/source-code/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java]

But Spark 3.3.0 is not compatible with the latest Hive version mentioned. Any 
idea when we will have a stable release of Hive 4?

> libthrift 0.14.0 onwards doesn't work with Hive (All versions)
> ---
>
> Key: HIVE-26591
> URL: https://issues.apache.org/jira/browse/HIVE-26591
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 1.2.2, 2.3.7, 2.3.9
>Reporter: Pratik Malani
>Assignee: Navis Ryu
>Priority: Critical
> Fix For: 3.1.3, 4.0.0
>
> Attachments: image-2022-10-03-19-51-20-052.png, 
> image-2022-10-03-19-55-16-030.png
>
>
> libthrift:0.13.0 is affected with 
> [CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949]
> Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4.
> When we upgrade to the libthrift 0.14.0 (or later) jar, the exception below 
> is thrown while starting the Spark Thrift Server.
> {noformat}
> org.apache.hive.service.ServiceException: Failed to Start HiveServer2
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:79)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.thrift.server.TThreadPoolServer$Args.requestTimeout(I)Lorg/apache/thrift/server/TThreadPoolServer$Args;
>         at 
> org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.initializeServer(ThriftBinaryCLIService.java:101)
>         at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:176)
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:69)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$

[jira] [Commented] (HIVE-26591) libthrift 0.14.0 onwards doesn't work with Hive (All versions)

2022-10-04 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612669#comment-17612669
 ] 

Stamatis Zampetakis commented on HIVE-26591:


The latest Hive release is 4.0.0-alpha-1. Does the problem exist there as well?

> libthrift 0.14.0 onwards doesn't work with Hive (All versions)
> ---
>
> Key: HIVE-26591
> URL: https://issues.apache.org/jira/browse/HIVE-26591
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 1.2.2, 2.3.7, 2.3.9
>Reporter: Pratik Malani
>Assignee: Navis Ryu
>Priority: Critical
> Fix For: 3.1.3, 4.0.0
>
> Attachments: image-2022-10-03-19-51-20-052.png, 
> image-2022-10-03-19-55-16-030.png
>
>
> libthrift:0.13.0 is affected with 
> [CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949]
> Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4.
> When we upgrade to the libthrift 0.14.0 (or later) jar, the exception below 
> is thrown while starting the Spark Thrift Server.
> {noformat}
> org.apache.hive.service.ServiceException: Failed to Start HiveServer2
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:79)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.thrift.server.TThreadPoolServer$Args.requestTimeout(I)Lorg/apache/thrift/server/TThreadPoolServer$Args;
>         at 
> org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.initializeServer(ThriftBinaryCLIService.java:101)
>         at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:176)
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:69)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at org.apach

[jira] [Commented] (HIVE-26582) Cartesian join fails if the query has an empty table when cartesian product edge is used

2022-10-04 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612658#comment-17612658
 ] 

Stamatis Zampetakis commented on HIVE-26582:


[~kkasa] Checking if the stats are up to date is not a responsibility of the 
rule but of the metadata provider. If the stats cannot be trusted, the 
metadata provider should return an appropriate value (null or something 
different, according to the API). There are already various rules that base 
their decisions on metadata, both in Hive and Calcite (e.g., 
https://github.com/apache/hive/blob/95d088a752f1c04f33c9c56556f0f939a1c9ea42/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java#L176),
 so adding more similar rules should be fine.

> Cartesian join fails if the query has an empty table when cartesian product 
> edge is used
> 
>
> Key: HIVE-26582
> URL: https://issues.apache.org/jira/browse/HIVE-26582
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Reporter: Sourabh Badhya
>Priority: Major
>
> The following example fails when "hive.tez.cartesian-product.enabled" is true 
> - 
> Test command - 
> {code:java}
> mvn test -Dtest=TestMiniLlapCliDriver -Dqfile=file.q 
> -Dtest.output.overwrite=true {code}
> Query - file.q
> {code:java}
> set hive.tez.cartesian-product.enabled=true;
> create table c (a1 int) stored as orc;
> create table tmp1 (a int) stored as orc;
> create table tmp2 (a int) stored as orc;
> insert into table c values (3);
> insert into table tmp1 values (3);
> with
> first as (
> select a1 from c where a1 = 3
> ),
> second as (
> select a from tmp1
> union all
> select a from tmp2
> )
> select a from second cross join first; {code}
> The following stack trace is seen - 
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Number of items is 0. Should 
> be positive
>         at 
> org.apache.tez.common.Preconditions.checkArgument(Preconditions.java:38)
>         at org.apache.tez.runtime.library.utils.Grouper.init(Grouper.java:41)
>         at 
> org.apache.tez.runtime.library.cartesianproduct.FairCartesianProductEdgeManager.initialize(FairCartesianProductEdgeManager.java:66)
>         at 
> org.apache.tez.runtime.library.cartesianproduct.CartesianProductEdgeManager.initialize(CartesianProductEdgeManager.java:51)
>         at org.apache.tez.dag.app.dag.impl.Edge.initialize(Edge.java:213)
>         ... 22 more{code}
> The following error is seen because one of the tables (tmp2 in this case) has 
> 0 rows in it. 
> The query works fine when the config hive.tez.cartesian-product.enabled is 
> set to false.





[jira] [Commented] (HIVE-26584) compressed_skip_header_footer_aggr.q is flaky

2022-10-04 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612615#comment-17612615
 ] 

Stamatis Zampetakis commented on HIVE-26584:


Many thanks [~jfs] for the detailed analysis, and [~ayushtkn] for keeping an 
eye on this! 

Before checking the changes in the PR in detail I would like to point out the 
[QTestUtil#clearTablesCreatedDuringTests| 
https://github.com/apache/hive/blob/95d088a752f1c04f33c9c56556f0f939a1c9ea42/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L342]
 method. The method tries to clear any kind of side effects coming from table 
creation and there is code trying to address [directories coming from external 
tables|https://github.com/apache/hive/blob/95d088a752f1c04f33c9c56556f0f939a1c9ea42/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L395].
 I am wondering if this is an appropriate place to address the flakiness 
observed here. 

> compressed_skip_header_footer_aggr.q is flaky
> -
>
> Key: HIVE-26584
> URL: https://issues.apache.org/jira/browse/HIVE-26584
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In one of my PRs, compressed_skip_header_footer_aggr.q was failing with an 
> unexpected diff, such as:
> {code:java}
>  TestMiniLlapLocalCliDriver.testCliDriver:62 Client Execution succeeded but 
> contained differences (error code = 1) after executing 
> compressed_skip_header_footer_aggr.q
> 69,71c69,70
> < 1 2019-12-31
> < 2 2018-12-31
> < 3 2017-12-31
> ---
> > 2 2019-12-31
> > 3 2019-12-31
> 89d87
> < NULL  NULL
> 91c89
> < 2 2018-12-31
> ---
> > 2 2019-12-31
> 100c98
> < 1
> ---
> > 2
> 109c107
> < 1 2019-12-31
> ---
> > 2 2019-12-31
> 127,128c125,126
> < 1 2019-12-31
> < 3 2017-12-31
> ---
> > 2 2019-12-31
> > 3 2019-12-31
> 146a145
> > 2 2019-12-31
> 155c154
> < 1
> ---
> > 2 {code}
> Investigating it, the test did not seem to fail when executed locally. Since I 
> suspected test interference, I searched for the table names/directories used 
> and discovered empty_skip_header_footer_aggr.q, which uses the same table 
> names AND external directories.





[jira] [Commented] (HIVE-26581) Test failing on aarch64

2022-10-04 Thread odidev (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612609#comment-17612609
 ] 

odidev commented on HIVE-26581:
---

[~zabetak] Thanks for the reply. I am facing the same issue on the amd64 
platform as well.

> Test failing on aarch64
> ---
>
> Key: HIVE-26581
> URL: https://issues.apache.org/jira/browse/HIVE-26581
> Project: Hive
>  Issue Type: Bug
>Reporter: odidev
>Priority: Major
>
> Hi Team, 
> I tried to build and test the Apache Hive repository on an aarch64 machine, 
> but when I run *mvn clean install* it gives me the following error:
> {code:java}
> [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.265 
> s <<< FAILURE! - in 
> org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator
> [ERROR] 
> org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator.testFinishableStateUpdateFailure
>   Time elapsed: 2.206 s  <<< ERROR!
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SignableVertexSpec$Builder.setUser(LlapDaemonProtocolProtos.java:5513)
> at 
> org.apache.hadoop.hive.llap.tez.Converters.constructSignableVertexSpec(Converters.java:135)
> at 
> org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.constructSubmitWorkRequest(LlapTaskCommunicator.java:912)
> at 
> org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.registerRunningTaskAttempt(LlapTaskCommunicator.java:512)
> at 
> org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator$LlapTaskCommunicatorWrapperForTest.registerRunningTaskAttemptWithSourceVertex(TestLlapTaskCommunicator.java:335)
> at 
> org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator.testFinishableStateUpdateFailure(TestLlapTaskCommunicator.java:141)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:750)
> [INFO]
> [INFO] Results:
> [INFO]
> [ERROR] Errors:
> [ERROR]   TestLlapTaskCommunicator.testFinishableStateUpdateFailure:141 ? 
> NullPointer
> [INFO]
> [ERROR] Tests run: 53, Failures: 0, Errors: 1, Skipped: 2
> {code}
> When I ran *mvn clean install -DskipTests* the installation succeeded, but 
> when I then ran *mvn test* it gave me the above-mentioned error. The error is 
> the same on the amd64 platform. 
> Can anyone suggest any pointers on the above error?





[jira] [Commented] (HIVE-26582) Cartesian join fails if the query has an empty table when cartesian product edge is used

2022-10-04 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612573#comment-17612573
 ] 

Krisztian Kasa commented on HIVE-26582:
---

[~zabetak] no worries.
In theory, using {{RelMdMaxRowCount}} to decide whether to prune parts of the 
plan can work. IIUC this would rely on the basic stats of the underlying 
tables. However, in my experience stats are not 100% accurate, because there 
are scenarios where they are not updated when a statement finishes. For 
example, multiple statements inserting into a table in parallel, or an 
external table whose data is updated by a 3rd-party tool without updating the 
stats. So the rule may have to check some preconditions, like the basic stats 
being up to date.

https://github.com/apache/hive/blob/95d088a752f1c04f33c9c56556f0f939a1c9ea42/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L2013-L2018
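The stale-stats scenario described above can be sketched in HiveQL. The table name and location are hypothetical; the point is only that basic stats on an external table can lag behind the data until explicitly recomputed:

{code:sql}
CREATE EXTERNAL TABLE ext_t (a int) STORED AS ORC LOCATION '/data/ext_t';

-- Suppose a 3rd-party tool now replaces the files under /data/ext_t;
-- Hive's basic stats (numRows, totalSize, ...) are not updated by that.
DESCRIBE FORMATTED ext_t;   -- may still report the old numRows

-- Recomputing the basic stats brings them back in sync with the data.
ANALYZE TABLE ext_t COMPUTE STATISTICS;
{code}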

> Cartesian join fails if the query has an empty table when cartesian product 
> edge is used
> 
>
> Key: HIVE-26582
> URL: https://issues.apache.org/jira/browse/HIVE-26582
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Reporter: Sourabh Badhya
>Priority: Major
>
> The following example fails when "hive.tez.cartesian-product.enabled" is true 
> - 
> Test command - 
> {code:java}
> mvn test -Dtest=TestMiniLlapCliDriver -Dqfile=file.q 
> -Dtest.output.overwrite=true {code}
> Query - file.q
> {code:java}
> set hive.tez.cartesian-product.enabled=true;
> create table c (a1 int) stored as orc;
> create table tmp1 (a int) stored as orc;
> create table tmp2 (a int) stored as orc;
> insert into table c values (3);
> insert into table tmp1 values (3);
> with
> first as (
> select a1 from c where a1 = 3
> ),
> second as (
> select a from tmp1
> union all
> select a from tmp2
> )
> select a from second cross join first; {code}
> The following stack trace is seen - 
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Number of items is 0. Should 
> be positive
>         at 
> org.apache.tez.common.Preconditions.checkArgument(Preconditions.java:38)
>         at org.apache.tez.runtime.library.utils.Grouper.init(Grouper.java:41)
>         at 
> org.apache.tez.runtime.library.cartesianproduct.FairCartesianProductEdgeManager.initialize(FairCartesianProductEdgeManager.java:66)
>         at 
> org.apache.tez.runtime.library.cartesianproduct.CartesianProductEdgeManager.initialize(CartesianProductEdgeManager.java:51)
>         at org.apache.tez.dag.app.dag.impl.Edge.initialize(Edge.java:213)
>         ... 22 more{code}
> The following error is seen because one of the tables (tmp2 in this case) has 
> 0 rows in it. 
> The query works fine when the config hive.tez.cartesian-product.enabled is 
> set to false.





[jira] [Commented] (HIVE-26582) Cartesian join fails if the query has an empty table when cartesian product edge is used

2022-10-04 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612566#comment-17612566
 ] 

Stamatis Zampetakis commented on HIVE-26582:


[~kkasa] my bad, I didn't pay enough attention to the input query; sorry about 
that.

I am thinking that maybe we could use the {{RelMdMaxRowCount}} metadata to do 
pruning similar to what we do with {{PruneEmptyRules}} when the returned value 
is zero. This could be an improvement to {{PruneEmptyRules}} or a new rule 
altogether. This may not be a solution to this ticket, but it may be an 
optimization worth adding. WDYT?

> Cartesian join fails if the query has an empty table when cartesian product 
> edge is used
> 
>
> Key: HIVE-26582
> URL: https://issues.apache.org/jira/browse/HIVE-26582
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Reporter: Sourabh Badhya
>Priority: Major
>
> The following example fails when "hive.tez.cartesian-product.enabled" is true 
> - 
> Test command - 
> {code:java}
> mvn test -Dtest=TestMiniLlapCliDriver -Dqfile=file.q 
> -Dtest.output.overwrite=true {code}
> Query - file.q
> {code:java}
> set hive.tez.cartesian-product.enabled=true;
> create table c (a1 int) stored as orc;
> create table tmp1 (a int) stored as orc;
> create table tmp2 (a int) stored as orc;
> insert into table c values (3);
> insert into table tmp1 values (3);
> with
> first as (
> select a1 from c where a1 = 3
> ),
> second as (
> select a from tmp1
> union all
> select a from tmp2
> )
> select a from second cross join first; {code}
> The following stack trace is seen - 
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Number of items is 0. Should 
> be positive
>         at 
> org.apache.tez.common.Preconditions.checkArgument(Preconditions.java:38)
>         at org.apache.tez.runtime.library.utils.Grouper.init(Grouper.java:41)
>         at 
> org.apache.tez.runtime.library.cartesianproduct.FairCartesianProductEdgeManager.initialize(FairCartesianProductEdgeManager.java:66)
>         at 
> org.apache.tez.runtime.library.cartesianproduct.CartesianProductEdgeManager.initialize(CartesianProductEdgeManager.java:51)
>         at org.apache.tez.dag.app.dag.impl.Edge.initialize(Edge.java:213)
>         ... 22 more{code}
> This error occurs because one of the tables ({{tmp2}} in this case) has 0 
> rows in it.
> The query works fine when the config {{hive.tez.cartesian-product.enabled}} 
> is set to false.
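The failure mode can be reproduced in isolation with a minimal stand-alone sketch (this is not the actual Tez {{Grouper}} code; the guard below merely mirrors the precondition named in the stack trace):

```java
public class GrouperSketch {

    static void checkArgument(boolean ok, String message) {
        if (!ok) throw new IllegalArgumentException(message);
    }

    // Mirrors the guard that fires in Grouper.init when a source vertex
    // (here: the empty table tmp2) contributes zero items to the
    // cartesian product edge.
    static void init(int numItems, int numGroups) {
        checkArgument(numItems > 0,
            "Number of items is " + numItems + ". Should be positive");
        checkArgument(numGroups > 0,
            "Number of groups is " + numGroups + ". Should be positive");
        // ... grouping logic would follow here ...
    }

    public static void main(String[] args) {
        init(4, 2); // non-empty inputs: fine
        try {
            init(0, 2); // zero rows scanned from tmp2
        } catch (IllegalArgumentException e) {
            // prints "Number of items is 0. Should be positive"
            System.out.println(e.getMessage());
        }
    }
}
```

In other words, the edge manager is handed the per-vertex item counts as-is, so an empty source aborts edge initialization before any grouping happens.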



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26008) Dynamic partition pruning not sending right partitions with subqueries

2022-10-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-26008:
---

Assignee: (was: László Bodor)

> Dynamic partition pruning not sending right partitions with subqueries
> --
>
> Key: HIVE-26008
> URL: https://issues.apache.org/jira/browse/HIVE-26008
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
> Attachments: HIVE_26008_1_DPP_path.svg, HIVE_26008_2_DPP_paths.svg, 
> Screenshot 2022-03-08 at 5.04.02 AM.png
>
>
> DPP does not work correctly when subqueries are involved. Here is an example 
> query (q83). 
> Note that "date_dim" is itself filtered by another subquery. Because of this, 
> the DPP operator ends up sending the entire "date_dim" table to the fact 
> tables, so far more data is scanned from the fact tables and query runtime 
> increases.
> For context, on a very small cluster, this query ran for 265 seconds, while 
> the rewritten query finished in 11 seconds. The fact table scan was 10 MB vs 
> 10 GB.
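Conceptually, the pruning that breaks down here can be sketched as follows (plain Java, not Hive code; the partition layout and method names are made up for illustration): with working DPP only the fact partitions whose keys survive the dimension-side filter are read, while a DPP event carrying the whole dimension table effectively selects every partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DppSketch {

    // Fact table modeled as partition key (d_date_sk) -> rows in that partition.
    static final Map<Integer, List<String>> FACT_PARTITIONS = Map.of(
            1, List.of("r1", "r2"),
            2, List.of("r3"),
            3, List.of("r4", "r5"));

    // Scan only the partitions whose key is in the pruning set.
    static List<String> scanFact(Set<Integer> partitionKeys) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : FACT_PARTITIONS.entrySet())
            if (partitionKeys.contains(e.getKey()))
                rows.addAll(e.getValue());
        return rows;
    }

    public static void main(String[] args) {
        // Working DPP: the date_dim subquery narrowed the keys to {2}.
        System.out.println(scanFact(Set.of(2)).size());                // prints 1
        // Broken DPP: all of date_dim's keys arrive, every partition is read.
        System.out.println(scanFact(FACT_PARTITIONS.keySet()).size()); // prints 5
    }
}
```

The 10 MB vs 10 GB scan difference reported above is this same effect at table scale: the size of the pruning key set, not the join itself, decides how much of the fact table is read.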
> {noformat}
> HiveJoin(condition=[=($2, $5)], joinType=[inner])
> HiveJoin(condition=[=($0, $3)], joinType=[inner])
>   HiveProject(cr_item_sk=[$1], cr_return_quantity=[$16], 
> cr_returned_date_sk=[$26])
> HiveFilter(condition=[AND(IS NOT NULL($26), IS NOT 
> NULL($1))])
>   HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, 
> catalog_returns]], table:alias=[catalog_returns])
>   HiveProject(i_item_sk=[$0], i_item_id=[$1])
> HiveFilter(condition=[AND(IS NOT NULL($1), IS NOT 
> NULL($0))])
>   HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, 
> item]], table:alias=[item])
> HiveProject(d_date_sk=[$0], d_date=[$2])
>   HiveFilter(condition=[AND(IS NOT NULL($2), IS NOT 
> NULL($0))])
> HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, 
> date_dim]], table:alias=[date_dim])
>   HiveProject(d_date=[$0])
> HiveSemiJoin(condition=[=($1, $2)], joinType=[semi])
>   HiveProject(d_date=[$2], d_week_seq=[$4])
> HiveFilter(condition=[AND(IS NOT NULL($4), IS NOT 
> NULL($2))])
>   HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, 
> date_dim]], table:alias=[date_dim])
>   HiveProject(d_week_seq=[$4])
> HiveFilter(condition=[AND(IN($2, 1998-01-02:DATE, 
> 1998-10-15:DATE, 1998-11-10:DATE), IS NOT NULL($4))])
>   HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, 
> date_dim]], table:alias=[date_dim])
> {noformat}
> *Original Query & Plan:*
> {noformat}
> explain cbo with sr_items as
> (select i_item_id item_id,
> sum(sr_return_quantity) sr_item_qty
> from store_returns,
> item,
> date_dim
> where sr_item_sk = i_item_sk
> and   d_date in
> (select d_date
> from date_dim
> where d_week_seq in
> (select d_week_seq
> from date_dim
> where d_date in ('1998-01-02','1998-10-15','1998-11-10')))
> and   sr_returned_date_sk   = d_date_sk
> group by i_item_id),
> cr_items as
> (select i_item_id item_id,
> sum(cr_return_quantity) cr_item_qty
> from catalog_returns,
> item,
> date_dim
> where cr_item_sk = i_item_sk
> and   d_date in
> (select d_date
> from date_dim
> where d_week_seq in
> (select d_week_seq
> from date_dim
> where d_date in ('1998-01-02','1998-10-15','1998-11-10')))
> and   cr_returned_date_sk   = d_date_sk
> group by i_item_id),
> wr_items as
> (select i_item_id item_id,
> sum(wr_return_quantity) wr_item_qty
> from web_returns,
> item,
> date_dim
> where wr_item_sk = i_item_sk
> and   d_date in
> (select d_date
> from date_dim
> where d_week_seq in
> (select d_week_seq
> from date_dim
> where d_date in ('1998-01-02','1998-10-15','1998-11-10')))
> and   wr_returned_date_sk   = d_date_sk
> group by i_item_id)
> select  sr_items.item_id
> ,sr_item_qty
> ,sr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 sr_dev
> ,cr_item_qty
> ,cr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 cr_dev
> ,wr_item_qty
> ,wr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 wr_dev
> ,(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 average
> from sr_items
> ,cr_items
> ,wr_items
> where sr_items.item_id=cr_items.item_id
> and sr_items.item_id=wr_items.item_id
> order by sr_items.item_id
> ,sr_item_qty
> limit 100
> INFO  : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20220307055109_88ad0cbd-bd40-45bc-92ae-ab15fa6b1da4); 
> Time taken: 0.973 seconds
> INFO  : OK
> Explain
> CBO PLAN:
> HiveSortLimit(sort0=[$0], sort1=[$1

[jira] [Updated] (HIVE-26543) Improve TxnHandler, TxnUtils, CompactionTxnHandler logging

2022-10-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26543:

Fix Version/s: 4.0.0-alpha-2

> Improve TxnHandler, TxnUtils, CompactionTxnHandler logging
> --
>
> Key: HIVE-26543
> URL: https://issues.apache.org/jira/browse/HIVE-26543
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> TxnHandler has some bad logging, like:
> {code}
> LOG.debug("Going to execute query<" + txnsQuery + ">");
> {code}
> https://github.com/apache/hive/blob/8e39937bdb577bc135579d7d34b46ba2d788ca53/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L533
> this performs a pretty unnecessary string concatenation in production, where 
> the log level is usually INFO; let's use parameterized log messages instead
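The SLF4J fix is the parameterized form, {{LOG.debug("Going to execute query <{}>", txnsQuery)}}, which defers rendering the argument until DEBUG is actually enabled. A self-contained sketch of the difference (no SLF4J on the classpath here; the two {{debug...}} helpers and the placeholder query text are stand-ins):

```java
public class LazyLoggingSketch {

    static int renderCalls = 0;

    // Query-like value whose rendering cost we can observe via toString().
    static final Object txnsQuery = new Object() {
        @Override public String toString() {
            renderCalls++;
            return "select txn_id from ..."; // placeholder query text
        }
    };

    static final boolean DEBUG_ENABLED = false; // production is usually INFO

    // Eager style: the caller has already paid for the concatenation.
    static void debugEager(String prebuiltMessage) {
        if (DEBUG_ENABLED) System.out.println(prebuiltMessage);
    }

    // Parameterized, SLF4J-like style: the argument is only rendered
    // when the level is enabled.
    static void debugLazy(String template, Object arg) {
        if (DEBUG_ENABLED) System.out.println(template.replace("{}", String.valueOf(arg)));
    }

    public static void main(String[] args) {
        debugEager("Going to execute query<" + txnsQuery + ">"); // toString() runs regardless
        int eager = renderCalls;
        debugLazy("Going to execute query <{}>", txnsQuery);     // toString() skipped
        int lazy = renderCalls - eager;
        System.out.println(eager + " vs " + lazy); // prints "1 vs 0"
    }
}
```

With the eager form the concatenation (and the argument's {{toString()}}) happens at every call site even when DEBUG is disabled; the parameterized form skips it entirely.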





[jira] [Commented] (HIVE-26543) Improve TxnHandler, TxnUtils, CompactionTxnHandler logging

2022-10-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-26543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612530#comment-17612530
 ] 

László Bodor commented on HIVE-26543:
-

merged to master, thanks [~zabetak], [~achennagiri] for the review!

> Improve TxnHandler, TxnUtils, CompactionTxnHandler logging
> --
>
> Key: HIVE-26543
> URL: https://issues.apache.org/jira/browse/HIVE-26543
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> TxnHandler has some bad logging, like:
> {code}
> LOG.debug("Going to execute query<" + txnsQuery + ">");
> {code}
> https://github.com/apache/hive/blob/8e39937bdb577bc135579d7d34b46ba2d788ca53/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L533
> this performs a pretty unnecessary string concatenation in production, where 
> the log level is usually INFO; let's use parameterized log messages instead





[jira] [Resolved] (HIVE-26543) Improve TxnHandler, TxnUtils, CompactionTxnHandler logging

2022-10-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-26543.
-
Resolution: Fixed

> Improve TxnHandler, TxnUtils, CompactionTxnHandler logging
> --
>
> Key: HIVE-26543
> URL: https://issues.apache.org/jira/browse/HIVE-26543
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> TxnHandler has some bad logging, like:
> {code}
> LOG.debug("Going to execute query<" + txnsQuery + ">");
> {code}
> https://github.com/apache/hive/blob/8e39937bdb577bc135579d7d34b46ba2d788ca53/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L533
> this performs a pretty unnecessary string concatenation in production, where 
> the log level is usually INFO; let's use parameterized log messages instead


