[jira] [Commented] (HIVE-26579) Prepare for Hadoop and Zookeeper switching to Reload4j
[ https://issues.apache.org/jira/browse/HIVE-26579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612855#comment-17612855 ] Attila Doroszlai commented on HIVE-26579: - Thanks [~ayushtkn] for reviewing and committing it. > Prepare for Hadoop and Zookeeper switching to Reload4j > -- > > Key: HIVE-26579 > URL: https://issues.apache.org/jira/browse/HIVE-26579 > Project: Hive > Issue Type: Task > Components: Build Infrastructure >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 50m > Remaining Estimate: 0h > > Hadoop moved from Log4j1 to Reload4j (HADOOP-18088). The goal of this task is > to prepare Hive for that change: > * Hive build fails with current {{useStrictFiltering=true}} setting in some > assemblies, due to excluded dependency (log4j) not really being present. > * Exclude {{ch.qos.reload4j:\*}} in addition to current {{log4j:\*}} to > avoid polluting the assemblies and shaded jars. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26579) Prepare for Hadoop and Zookeeper switching to Reload4j
[ https://issues.apache.org/jira/browse/HIVE-26579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612853#comment-17612853 ] Ayush Saxena commented on HIVE-26579: - Committed to master. Thanx [~adoroszlai] for the contribution!!! > Prepare for Hadoop and Zookeeper switching to Reload4j > -- > > Key: HIVE-26579 > URL: https://issues.apache.org/jira/browse/HIVE-26579 > Project: Hive > Issue Type: Task > Components: Build Infrastructure >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Hadoop moved from Log4j1 to Reload4j (HADOOP-18088). The goal of this task is > to prepare Hive for that change: > * Hive build fails with current {{useStrictFiltering=true}} setting in some > assemblies, due to excluded dependency (log4j) not really being present. > * Exclude {{ch.qos.reload4j:\*}} in addition to current {{log4j:\*}} to > avoid polluting the assemblies and shaded jars. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26579) Prepare for Hadoop and Zookeeper switching to Reload4j
[ https://issues.apache.org/jira/browse/HIVE-26579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-26579: Fix Version/s: 4.0.0-alpha-2 Resolution: Fixed Status: Resolved (was: Patch Available) > Prepare for Hadoop and Zookeeper switching to Reload4j > -- > > Key: HIVE-26579 > URL: https://issues.apache.org/jira/browse/HIVE-26579 > Project: Hive > Issue Type: Task > Components: Build Infrastructure >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 50m > Remaining Estimate: 0h > > Hadoop moved from Log4j1 to Reload4j (HADOOP-18088). The goal of this task is > to prepare Hive for that change: > * Hive build fails with current {{useStrictFiltering=true}} setting in some > assemblies, due to excluded dependency (log4j) not really being present. > * Exclude {{ch.qos.reload4j:\*}} in addition to current {{log4j:\*}} to > avoid polluting the assemblies and shaded jars. -- This message was sent by Atlassian Jira (v8.20.10#820010)
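For context, the kind of change HIVE-26579 describes can be sketched as a Maven Assembly Plugin dependency-set exclusion. This is an illustrative fragment only, not the actual descriptor from Hive's source tree; element names follow the assembly descriptor format, and the surrounding file is assumed:

```xml
<!-- Hypothetical assembly-descriptor fragment illustrating HIVE-26579:
     exclude the reload4j coordinates alongside the existing log4j ones
     so neither ends up in the assemblies or shaded jars. -->
<dependencySet>
  <excludes>
    <exclude>log4j:*</exclude>
    <exclude>ch.qos.reload4j:*</exclude>
  </excludes>
</dependencySet>
```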
[jira] [Updated] (HIVE-26555) Read-only mode for Hive database
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Fix Version/s: 4.0.0-alpha-2 > Read-only mode for Hive database > > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 40m > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > the other one remains writable, to keep a single source of truth. > h1. Design > The EnforceReadOnlyDatabaseHook class implements the ExecuteWithHookContext > interface. hive.exec.pre.hooks needs to include the class name so an > instance is created. The "readonly" database property can be configured to turn it on > and off. > Allowed operation prefixes > * EXPLAIN > * USE (or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC > * SHOW > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
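Based purely on the description above, wiring up the read-only mode might look like the following HiveQL sketch. The property names (hive.exec.pre.hooks, the "readonly" database property) come from the issue text, but the hook's fully qualified package and the database name are assumptions:

```sql
-- Sketch only: the package of EnforceReadOnlyDatabaseHook is assumed,
-- not confirmed by the issue text.
SET hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.EnforceReadOnlyDatabaseHook;

-- Turn read-only mode on for one database via the "readonly" property.
ALTER DATABASE standby_db SET DBPROPERTIES ('readonly'='true');

USE standby_db;        -- allowed: USE is on the allowed-prefix list
SHOW TABLES;           -- allowed: SHOW is on the allowed-prefix list
-- INSERT/DELETE statements against standby_db would now be rejected
-- by the pre-execution hook.

-- Turn it back off after fail-back.
ALTER DATABASE standby_db SET DBPROPERTIES ('readonly'='false');
```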
[jira] [Commented] (HIVE-26584) compressed_skip_header_footer_aggr.q is flaky
[ https://issues.apache.org/jira/browse/HIVE-26584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612693#comment-17612693 ] John Sherman commented on HIVE-26584: - Thanks [~ayushtkn] and [~zabetak] for helping to make the patch better. I don't see any indication that the tests require external tables. But I agree that I could be overlooking a detail and I don't functionally need to change the test so significantly (even if I think the newer version is cleaner). I went with just rmr-ing the created directories at the end. I did not add DROP IF EXISTS or rmr(s) before the creation since I find that practice typically hides problems and sometimes causes hidden dependencies between tests. As for clearTablesCreatedDuringTests - it doesn't clean these files up because it only cleans up tables under the configured warehouse directory. This test case manually creates a location not under the warehouse directory, so its files never get cleaned up. I could modify clearTablesCreatedDuringTests to clean up all directories mentioned in CREATE EXTERNAL TABLE location clauses, but that could be risky since it could lead to user files being accidentally removed with a misconfigured location clause and I am not sure I would be able to add all the possible checks to prevent that. In the future I think tests like this should load the data via LOAD DATA if possible and not use a custom LOCATION clause so it gets cleaned up normally. > compressed_skip_header_footer_aggr.q is flaky > - > > Key: HIVE-26584 > URL: https://issues.apache.org/jira/browse/HIVE-26584 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0-alpha-2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > One of my PRs compressed_skip_header_footer_aggr.q was failing with > unexpected diff. 
Such as: > {code:java} > TestMiniLlapLocalCliDriver.testCliDriver:62 Client Execution succeeded but > contained differences (error code = 1) after executing > compressed_skip_header_footer_aggr.q > 69,71c69,70 > < 1 2019-12-31 > < 2 2018-12-31 > < 3 2017-12-31 > --- > > 2 2019-12-31 > > 3 2019-12-31 > 89d87 > < NULL NULL > 91c89 > < 2 2018-12-31 > --- > > 2 2019-12-31 > 100c98 > < 1 > --- > > 2 > 109c107 > < 1 2019-12-31 > --- > > 2 2019-12-31 > 127,128c125,126 > < 1 2019-12-31 > < 3 2017-12-31 > --- > > 2 2019-12-31 > > 3 2019-12-31 > 146a145 > > 2 2019-12-31 > 155c154 > < 1 > --- > > 2 {code} > Investigating it, it did not seem to fail when executed locally. Since I > suspected test interference I searched for the tablenames/directories used > and discovered empty_skip_header_footer_aggr.q which uses the same table > names AND external directories. -- This message was sent by Atlassian Jira (v8.20.10#820010)
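The cleanup approach described in this thread (removing the manually created external-table directories at the end of the q file) could look roughly like the sketch below. The table and directory names are illustrative, not the ones the real test uses, and `${system:test.tmp.dir}` is assumed to be the substitution variable available in the q-test environment:

```sql
-- Hypothetical tail of a .q test file: remove the manually created
-- external LOCATION directories, since clearTablesCreatedDuringTests
-- only cleans up paths under the configured warehouse directory.
dfs -rmr ${system:test.tmp.dir}/testcase1;
dfs -rmr ${system:test.tmp.dir}/testcase2;
```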
[jira] [Commented] (HIVE-26591) libthrift 0.14.0 onwards doesn't work with Hive (All versions)
[ https://issues.apache.org/jira/browse/HIVE-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612672#comment-17612672 ] Pratik Malani commented on HIVE-26591: -- Hi [~zabetak] Thanks for your response. I think it has been resolved in the latest version; I can't see any reference to the mentioned attributes in the code. [https://jar-download.com/artifacts/org.apache.hive/hive-service/4.0.0-alpha-1/source-code/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java] However, Spark 3.3.0 is not compatible with that Hive version. Any idea when a stable Hive 4 release will be available? > libthrift 0.14.0 onwards doesn't work with Hive (All versions) > --- > > Key: HIVE-26591 > URL: https://issues.apache.org/jira/browse/HIVE-26591 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Affects Versions: 1.2.2, 2.3.7, 2.3.9 >Reporter: Pratik Malani >Assignee: Navis Ryu >Priority: Critical > Fix For: 3.1.3, 4.0.0 > > Attachments: image-2022-10-03-19-51-20-052.png, > image-2022-10-03-19-55-16-030.png > > > libthrift:0.13.0 is affected with > [CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949] > Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4. > When we upgrade to libthrift 0.14.0 or a later jar, the exception below > is thrown while starting the Spark Thriftserver. 
> {noformat} > org.apache.hive.service.ServiceException: Failed to Start HiveServer2 > at > org.apache.hive.service.CompositeService.start(CompositeService.java:79) > at > org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.NoSuchMethodError: > org.apache.thrift.server.TThreadPoolServer$Args.requestTimeout(I)Lorg/apache/thrift/server/TThreadPoolServer$Args; > at > org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.initializeServer(ThriftBinaryCLIService.java:101) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:176) > at > 
org.apache.hive.service.CompositeService.start(CompositeService.java:69) > at > org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$
[jira] [Commented] (HIVE-26591) libthrift 0.14.0 onwards doesn't work with Hive (All versions)
[ https://issues.apache.org/jira/browse/HIVE-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612669#comment-17612669 ] Stamatis Zampetakis commented on HIVE-26591: The latest Hive release is 4.0.0-alpha-1. Does the problem exist there as well? > libthrift 0.14.0 onwards doesn't works with Hive (All versions) > --- > > Key: HIVE-26591 > URL: https://issues.apache.org/jira/browse/HIVE-26591 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Affects Versions: 1.2.2, 2.3.7, 2.3.9 >Reporter: Pratik Malani >Assignee: Navis Ryu >Priority: Critical > Fix For: 3.1.3, 4.0.0 > > Attachments: image-2022-10-03-19-51-20-052.png, > image-2022-10-03-19-55-16-030.png > > > libthrift:0.13.0 is affected with > [CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949] > Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4. > When we do an upgrade to use libthrift:0.14.0 and above jar, below exception > is thrown while starting the Spark Thriftserver. 
> {noformat} > org.apache.hive.service.ServiceException: Failed to Start HiveServer2 > at > org.apache.hive.service.CompositeService.start(CompositeService.java:79) > at > org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.NoSuchMethodError: > org.apache.thrift.server.TThreadPoolServer$Args.requestTimeout(I)Lorg/apache/thrift/server/TThreadPoolServer$Args; > at > org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.initializeServer(ThriftBinaryCLIService.java:101) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:176) > at > 
org.apache.hive.service.CompositeService.start(CompositeService.java:69) > at > org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) > at org.apach
[jira] [Commented] (HIVE-26582) Cartesian join fails if the query has an empty table when cartesian product edge is used
[ https://issues.apache.org/jira/browse/HIVE-26582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612658#comment-17612658 ] Stamatis Zampetakis commented on HIVE-26582: [~kkasa] Checking if the stats are up to date is not a responsibility of the rule but the metadata provider. If the stats cannot be trusted then the metadata provider should return an appropriate value (null or something different according to the API). There are already various rules which base their decision on the metadata both in Hive and Calcite (e.g., https://github.com/apache/hive/blob/95d088a752f1c04f33c9c56556f0f939a1c9ea42/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java#L176) so adding more similar rules should be fine. > Cartesian join fails if the query has an empty table when cartesian product > edge is used > > > Key: HIVE-26582 > URL: https://issues.apache.org/jira/browse/HIVE-26582 > Project: Hive > Issue Type: Bug > Components: Hive, Tez >Reporter: Sourabh Badhya >Priority: Major > > The following example fails when "hive.tez.cartesian-product.enabled" is true > - > Test command - > {code:java} > mvn test -Dtest=TestMiniLlapCliDriver -Dqfile=file.q > -Dtest.output.overwrite=true {code} > Query - file.q > {code:java} > set hive.tez.cartesian-product.enabled=true; > create table c (a1 int) stored as orc; > create table tmp1 (a int) stored as orc; > create table tmp2 (a int) stored as orc; > insert into table c values (3); > insert into table tmp1 values (3); > with > first as ( > select a1 from c where a1 = 3 > ), > second as ( > select a from tmp1 > union all > select a from tmp2 > ) > select a from second cross join first; {code} > The following stack trace is seen - > {code:java} > Caused by: java.lang.IllegalArgumentException: Number of items is 0. 
Should > be positive > at > org.apache.tez.common.Preconditions.checkArgument(Preconditions.java:38) > at org.apache.tez.runtime.library.utils.Grouper.init(Grouper.java:41) > at > org.apache.tez.runtime.library.cartesianproduct.FairCartesianProductEdgeManager.initialize(FairCartesianProductEdgeManager.java:66) > at > org.apache.tez.runtime.library.cartesianproduct.CartesianProductEdgeManager.initialize(CartesianProductEdgeManager.java:51) > at org.apache.tez.dag.app.dag.impl.Edge.initialize(Edge.java:213) > ... 22 more{code} > The following error is seen because one of the tables (tmp2 in this case) has > 0 rows in it. > The query works fine when the config hive.tez.cartesian-product.enabled is > set to false. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26584) compressed_skip_header_footer_aggr.q is flaky
[ https://issues.apache.org/jira/browse/HIVE-26584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612615#comment-17612615 ] Stamatis Zampetakis commented on HIVE-26584: Many thanks for the detailed analysis [~jfs] and [~ayushtkn] for keeping an eye on this! Before checking the changes in the PR in detail I would like to point out the [QTestUtil#clearTablesCreatedDuringTests| https://github.com/apache/hive/blob/95d088a752f1c04f33c9c56556f0f939a1c9ea42/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L342] method. The method tries to clear any kind of side effects coming from table creation and there is code trying to address [directories coming from external tables|https://github.com/apache/hive/blob/95d088a752f1c04f33c9c56556f0f939a1c9ea42/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L395]. I am wondering if this is an appropriate place to address the flakiness observed here. > compressed_skip_header_footer_aggr.q is flaky > - > > Key: HIVE-26584 > URL: https://issues.apache.org/jira/browse/HIVE-26584 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0-alpha-2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > One of my PRs compressed_skip_header_footer_aggr.q was failing with > unexpected diff. 
Such as: > {code:java} > TestMiniLlapLocalCliDriver.testCliDriver:62 Client Execution succeeded but > contained differences (error code = 1) after executing > compressed_skip_header_footer_aggr.q > 69,71c69,70 > < 1 2019-12-31 > < 2 2018-12-31 > < 3 2017-12-31 > --- > > 2 2019-12-31 > > 3 2019-12-31 > 89d87 > < NULL NULL > 91c89 > < 2 2018-12-31 > --- > > 2 2019-12-31 > 100c98 > < 1 > --- > > 2 > 109c107 > < 1 2019-12-31 > --- > > 2 2019-12-31 > 127,128c125,126 > < 1 2019-12-31 > < 3 2017-12-31 > --- > > 2 2019-12-31 > > 3 2019-12-31 > 146a145 > > 2 2019-12-31 > 155c154 > < 1 > --- > > 2 {code} > Investigating it, it did not seem to fail when executed locally. Since I > suspected test interference I searched for the tablenames/directories used > and discovered empty_skip_header_footer_aggr.q which uses the same table > names AND external directories. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26581) Test failing on aarch64
[ https://issues.apache.org/jira/browse/HIVE-26581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612609#comment-17612609 ] odidev commented on HIVE-26581: --- [~zabetak] Thanks for the reply. I am facing the same issue in the amd64 platform also. > Test failing on aarch64 > --- > > Key: HIVE-26581 > URL: https://issues.apache.org/jira/browse/HIVE-26581 > Project: Hive > Issue Type: Bug >Reporter: odidev >Priority: Major > > Hi Team, > I tried to build and test the Apache hive repository on an aarch64 machine > but when I run *mvn clean install* it is giving me the following error: > {code:java} > [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.265 > s <<< FAILURE! - in > org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator > [ERROR] > org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator.testFinishableStateUpdateFailure > Time elapsed: 2.206 s <<< ERROR! > java.lang.NullPointerException > at > org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SignableVertexSpec$Builder.setUser(LlapDaemonProtocolProtos.java:5513) > at > org.apache.hadoop.hive.llap.tez.Converters.constructSignableVertexSpec(Converters.java:135) > at > org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.constructSubmitWorkRequest(LlapTaskCommunicator.java:912) > at > org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.registerRunningTaskAttempt(LlapTaskCommunicator.java:512) > at > org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator$LlapTaskCommunicatorWrapperForTest.registerRunningTaskAttemptWithSourceVertex(TestLlapTaskCommunicator.java:335) > at > org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator.testFinishableStateUpdateFailure(TestLlapTaskCommunicator.java:141) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:750) > [INFO] > [INFO] Results: > [INFO] > [ERROR] Errors: > [ERROR] TestLlapTaskCommunicator.testFinishableStateUpdateFailure:141 ? > NullPointer > [INFO] > [ERROR] Tests run: 53, Failures: 0, Errors: 1, Skipped: 2 > {code} > When I tried to run *mvn clean install -DskipTests* the installation was > successful but for testing when I ran *mvn test* it is giving me the > above-mentioned error. The error is the same on the amd64 platform also. > Can anyone suggest any pointers on the above error? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26582) Cartesian join fails if the query has an empty table when cartesian product edge is used
[ https://issues.apache.org/jira/browse/HIVE-26582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612573#comment-17612573 ] Krisztian Kasa commented on HIVE-26582: --- [~zabetak] no worries. In theory using {{RelMdMaxRowCount}} to make the decision whether to prune parts of the plan can work. IIUC this would rely on basic stats of the underlying tables. However, in my experience stats are not 100% accurate, because there are scenarios in which they are not updated when a statement finishes: for example, when multiple statements insert into a table in parallel, or when the table is external and some 3rd-party tool updates the data but not the stats. So the rule may have to check some preconditions, like whether basic stats are up to date. https://github.com/apache/hive/blob/95d088a752f1c04f33c9c56556f0f939a1c9ea42/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L2013-L2018 > Cartesian join fails if the query has an empty table when cartesian product > edge is used > > > Key: HIVE-26582 > URL: https://issues.apache.org/jira/browse/HIVE-26582 > Project: Hive > Issue Type: Bug > Components: Hive, Tez >Reporter: Sourabh Badhya >Priority: Major > > The following example fails when "hive.tez.cartesian-product.enabled" is true > - > Test command - > {code:java} > mvn test -Dtest=TestMiniLlapCliDriver -Dqfile=file.q > -Dtest.output.overwrite=true {code} > Query - file.q > {code:java} > set hive.tez.cartesian-product.enabled=true; > create table c (a1 int) stored as orc; > create table tmp1 (a int) stored as orc; > create table tmp2 (a int) stored as orc; > insert into table c values (3); > insert into table tmp1 values (3); > with > first as ( > select a1 from c where a1 = 3 > ), > second as ( > select a from tmp1 > union all > select a from tmp2 > ) > select a from second cross join first; {code} > The following stack trace is seen - > {code:java} > Caused by: java.lang.IllegalArgumentException: Number of items is 0. 
Should > be positive > at > org.apache.tez.common.Preconditions.checkArgument(Preconditions.java:38) > at org.apache.tez.runtime.library.utils.Grouper.init(Grouper.java:41) > at > org.apache.tez.runtime.library.cartesianproduct.FairCartesianProductEdgeManager.initialize(FairCartesianProductEdgeManager.java:66) > at > org.apache.tez.runtime.library.cartesianproduct.CartesianProductEdgeManager.initialize(CartesianProductEdgeManager.java:51) > at org.apache.tez.dag.app.dag.impl.Edge.initialize(Edge.java:213) > ... 22 more{code} > The following error is seen because one of the tables (tmp2 in this case) has > 0 rows in it. > The query works fine when the config hive.tez.cartesian-product.enabled is > set to false. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26582) Cartesian join fails if the query has an empty table when cartesian product edge is used
[ https://issues.apache.org/jira/browse/HIVE-26582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612566#comment-17612566 ] Stamatis Zampetakis commented on HIVE-26582: [~kkasa] my bad I didn't pay enough attention to the input query sorry about that. I am thinking that maybe we could use the {{RelMdMaxRowCount}} metadata to do similar pruning with what we do with {{PruneEmptyRules}} when the return value is zero. This could be an improvement to {{PruneEmptyRules}} or a new rule altogether. This may not be a solution to this ticket but if it may be an optimization worth adding, WDYT? > Cartesian join fails if the query has an empty table when cartesian product > edge is used > > > Key: HIVE-26582 > URL: https://issues.apache.org/jira/browse/HIVE-26582 > Project: Hive > Issue Type: Bug > Components: Hive, Tez >Reporter: Sourabh Badhya >Priority: Major > > The following example fails when "hive.tez.cartesian-product.enabled" is true > - > Test command - > {code:java} > mvn test -Dtest=TestMiniLlapCliDriver -Dqfile=file.q > -Dtest.output.overwrite=true {code} > Query - file.q > {code:java} > set hive.tez.cartesian-product.enabled=true; > create table c (a1 int) stored as orc; > create table tmp1 (a int) stored as orc; > create table tmp2 (a int) stored as orc; > insert into table c values (3); > insert into table tmp1 values (3); > with > first as ( > select a1 from c where a1 = 3 > ), > second as ( > select a from tmp1 > union all > select a from tmp2 > ) > select a from second cross join first; {code} > The following stack trace is seen - > {code:java} > Caused by: java.lang.IllegalArgumentException: Number of items is 0. 
Should > be positive > at > org.apache.tez.common.Preconditions.checkArgument(Preconditions.java:38) > at org.apache.tez.runtime.library.utils.Grouper.init(Grouper.java:41) > at > org.apache.tez.runtime.library.cartesianproduct.FairCartesianProductEdgeManager.initialize(FairCartesianProductEdgeManager.java:66) > at > org.apache.tez.runtime.library.cartesianproduct.CartesianProductEdgeManager.initialize(CartesianProductEdgeManager.java:51) > at org.apache.tez.dag.app.dag.impl.Edge.initialize(Edge.java:213) > ... 22 more{code} > The following error is seen because one of the tables (tmp2 in this case) has > 0 rows in it. > The query works fine when the config hive.tez.cartesian-product.enabled is > set to false. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26008) Dynamic partition pruning not sending right partitions with subqueries
[ https://issues.apache.org/jira/browse/HIVE-26008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reassigned HIVE-26008: --- Assignee: (was: László Bodor) > Dynamic partition pruning not sending right partitions with subqueries > -- > > Key: HIVE-26008 > URL: https://issues.apache.org/jira/browse/HIVE-26008 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Rajesh Balamohan >Priority: Major > Labels: performance > Attachments: HIVE_26008_1_DPP_path.svg, HIVE_26008_2_DPP_paths.svg, > Screenshot 2022-03-08 at 5.04.02 AM.png > > > DPP isn't working fine when there are subqueries involved. Here is an example > query (q83). > Note that "date_dim" has another query involved. Due to this, DPP operator > ends up sending entire "date_dim" to the fact tables. > Because of this, data scanned for fact tables are way higher and query > runtime is increased. > For context, on a very small cluster, this query ran for 265 seconds and with > the rewritten query it finished in 11 seconds!. Fact table scan was 10MB vs > 10 GB. 
> {noformat} > HiveJoin(condition=[=($2, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $3)], joinType=[inner]) > HiveProject(cr_item_sk=[$1], cr_return_quantity=[$16], > cr_returned_date_sk=[$26]) > HiveFilter(condition=[AND(IS NOT NULL($26), IS NOT > NULL($1))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > catalog_returns]], table:alias=[catalog_returns]) > HiveProject(i_item_sk=[$0], i_item_id=[$1]) > HiveFilter(condition=[AND(IS NOT NULL($1), IS NOT > NULL($0))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > item]], table:alias=[item]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[AND(IS NOT NULL($2), IS NOT > NULL($0))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > date_dim]], table:alias=[date_dim]) > HiveProject(d_date=[$0]) > HiveSemiJoin(condition=[=($1, $2)], joinType=[semi]) > HiveProject(d_date=[$2], d_week_seq=[$4]) > HiveFilter(condition=[AND(IS NOT NULL($4), IS NOT > NULL($2))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > date_dim]], table:alias=[date_dim]) > HiveProject(d_week_seq=[$4]) > HiveFilter(condition=[AND(IN($2, 1998-01-02:DATE, > 1998-10-15:DATE, 1998-11-10:DATE), IS NOT NULL($4))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > date_dim]], table:alias=[date_dim]) > {noformat} > *Original Query & Plan:* > {noformat} > explain cbo with sr_items as > (select i_item_id item_id, > sum(sr_return_quantity) sr_item_qty > from store_returns, > item, > date_dim > where sr_item_sk = i_item_sk > and d_date in > (select d_date > from date_dim > where d_week_seq in > (select d_week_seq > from date_dim > where d_date in ('1998-01-02','1998-10-15','1998-11-10'))) > and sr_returned_date_sk = d_date_sk > group by i_item_id), > cr_items as > (select i_item_id item_id, > sum(cr_return_quantity) cr_item_qty > from catalog_returns, > item, > date_dim > where cr_item_sk = i_item_sk > and d_date in > (select d_date > from date_dim > where d_week_seq in > (select d_week_seq > from date_dim > 
where d_date in ('1998-01-02','1998-10-15','1998-11-10'))) > and cr_returned_date_sk = d_date_sk > group by i_item_id), > wr_items as > (select i_item_id item_id, > sum(wr_return_quantity) wr_item_qty > from web_returns, > item, > date_dim > where wr_item_sk = i_item_sk > and d_date in > (select d_date > from date_dim > where d_week_seq in > (select d_week_seq > from date_dim > where d_date in ('1998-01-02','1998-10-15','1998-11-10'))) > and wr_returned_date_sk = d_date_sk > group by i_item_id) > select sr_items.item_id > ,sr_item_qty > ,sr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 sr_dev > ,cr_item_qty > ,cr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 cr_dev > ,wr_item_qty > ,wr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 wr_dev > ,(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 average > from sr_items > ,cr_items > ,wr_items > where sr_items.item_id=cr_items.item_id > and sr_items.item_id=wr_items.item_id > order by sr_items.item_id > ,sr_item_qty > limit 100 > INFO : Starting task [Stage-3:EXPLAIN] in serial mode > INFO : Completed executing > command(queryId=hive_20220307055109_88ad0cbd-bd40-45bc-92ae-ab15fa6b1da4); > Time taken: 0.973 seconds > INFO : OK > Explain > CBO PLAN: > HiveSortLimit(sort0=[$0], sort1=[$1
[jira] [Updated] (HIVE-26543) Improve TxnHandler, TxnUtils, CompactionTxnHandler logging
[ https://issues.apache.org/jira/browse/HIVE-26543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-26543: Fix Version/s: 4.0.0-alpha-2 > Improve TxnHandler, TxnUtils, CompactionTxnHandler logging > -- > > Key: HIVE-26543 > URL: https://issues.apache.org/jira/browse/HIVE-26543 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > TxnHandler has some bad logging, like: > {code} > LOG.debug("Going to execute query<" + txnsQuery + ">"); > {code} > https://github.com/apache/hive/blob/8e39937bdb577bc135579d7d34b46ba2d788ca53/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L533 > this involves an unnecessary string concatenation in production, where we usually run at INFO level; let's use string formats (parameterized logging) instead -- This message was sent by Atlassian Jira (v8.20.10#820010)
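The fix the issue asks for is parameterized logging: with SLF4J-style `{}` placeholders, the message is only assembled when the level is enabled, whereas `"query<" + txnsQuery + ">"` pays for concatenation on every call. The `MiniLogger` class below is an illustrative stand-in for an SLF4J logger, written only to make the cost difference concrete; it is not Hive or SLF4J code.

```java
// Illustrative sketch: why parameterized logging avoids the cost that
// plain string concatenation always pays. MiniLogger mimics the SLF4J
// pattern of substituting {} placeholders only when the level is enabled.
final class MiniLogger {
    private final boolean debugEnabled;
    int messagesFormatted = 0; // instrumentation for this sketch only

    MiniLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    // Parameterized style: when debug is off we return before touching
    // the argument, so no string is ever built.
    void debug(String format, Object arg) {
        if (!debugEnabled) {
            return; // no concatenation, no formatting
        }
        messagesFormatted++;
        String rendered = format.replace("{}", String.valueOf(arg));
        System.out.println(rendered);
    }
}
```

So `LOG.debug("Going to execute query <" + txnsQuery + ">")` builds the full message even at INFO level, while `LOG.debug("Going to execute query <{}>", txnsQuery)` skips all of that work unless debug logging is actually on.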
[jira] [Commented] (HIVE-26543) Improve TxnHandler, TxnUtils, CompactionTxnHandler logging
[ https://issues.apache.org/jira/browse/HIVE-26543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612530#comment-17612530 ] László Bodor commented on HIVE-26543: - merged to master, thanks [~zabetak], [~achennagiri] for the review!
[jira] [Resolved] (HIVE-26543) Improve TxnHandler, TxnUtils, CompactionTxnHandler logging
[ https://issues.apache.org/jira/browse/HIVE-26543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor resolved HIVE-26543. - Resolution: Fixed