[jira] [Comment Edited] (HIVE-25952) Drop HiveRelMdPredicates::getPredicates(Project...) to use that of RelMdPredicates
[ https://issues.apache.org/jira/browse/HIVE-25952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868103#comment-17868103 ] Alessandro Solimando edited comment on HIVE-25952 at 7/23/24 3:44 PM:
--
It's been a long time, but IIRC I marked HIVE-25966 as blocking this ticket due to [HiveRelMdPredicates.java#L160|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L160]. It seems that the opposite should be true: we can't fix HIVE-25966 (and adopt Calcite's machinery) without making them agree on what a constant is (the RexCall case you mentioned, with the imprecision you correctly spotted). Most probably the test results are not available anymore, but the divergence between Hive and Calcite on what counts as a constant was causing some issues that had to be fixed.
EDIT: when the run of your PR is over, we will probably see what issues blocked me back then when working on it. I don't remember how big these changes were, but they were certainly there.

> Drop HiveRelMdPredicates::getPredicates(Project...) to use that of RelMdPredicates
> --
>
> Key: HIVE-25952
> URL: https://issues.apache.org/jira/browse/HIVE-25952
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Affects Versions: 4.0.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> There are some differences in this method between Hive and Calcite; the idea of this ticket is to unify the two methods, and then drop the override in HiveRelMdPredicates in favour of the method of RelMdPredicates.
> After applying HIVE-25966, the only difference is in the test for constant expressions, which can be summarized as follows:
> ||Expression Type||Is Constant for Hive?||Is Constant for Calcite?||
> |InputRef|False|False|
> |Call|True if the function is deterministic (arguments are not checked), false otherwise|True if the function is deterministic and all operands are constants, false otherwise|
> |CorrelatedVariable|False|False|
> |LocalRef|False|False|
> |Over|False|False|
> |DynamicParameter|False|True|
> |RangeRef|False|False|
> |FieldAccess|False|Given expr.field, true if expr is constant, false otherwise|

-- This message was sent by Atlassian Jira (v8.20.10#820010)
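The divergence in the Call and DynamicParameter rows of the table above can be sketched as follows. This is an illustrative model of the two policies, not the actual HiveRelMdPredicates/Calcite code; the dict-based expression encoding is an assumption for the example (FieldAccess is omitted for brevity).

```python
# Illustrative model of the two "is constant" policies from the table above;
# not the actual Hive/Calcite implementations. Expressions are plain dicts.

def is_constant_hive(expr: dict) -> bool:
    """Hive: a call is constant iff its function is deterministic;
    the operands are not inspected."""
    kind = expr["kind"]
    if kind == "literal":
        return True
    if kind == "call":
        return expr["deterministic"]
    # input refs, dynamic parameters, OVER, field access, ...
    return False

def is_constant_calcite(expr: dict) -> bool:
    """Calcite: a call is constant iff it is deterministic AND all of its
    operands are constant; dynamic parameters count as constants."""
    kind = expr["kind"]
    if kind in ("literal", "dynamic_param"):
        return True
    if kind == "call":
        return expr["deterministic"] and all(
            is_constant_calcite(op) for op in expr["operands"])
    return False  # input refs, OVER, ... (FieldAccess not modelled here)

# A deterministic call over a column reference, e.g. UPPER(col):
call = {"kind": "call", "deterministic": True,
        "operands": [{"kind": "input_ref"}]}
print(is_constant_hive(call))     # True: operands are ignored
print(is_constant_calcite(call))  # False: the input ref is not constant
```

The imprecision mentioned in the comment is exactly the first case: Hive would pull `UPPER(col)` up as a "constant" predicate even though its operand is a column.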
[jira] [Commented] (HIVE-28264) OOM/slow compilation when query contains SELECT clauses with nested expressions
[ https://issues.apache.org/jira/browse/HIVE-28264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847000#comment-17847000 ] Alessandro Solimando commented on HIVE-28264: - I guess the problem applies to the respective Calcite rules from which the Hive ones were derived, do you know if that has been addressed there? > OOM/slow compilation when query contains SELECT clauses with nested > expressions > --- > > Key: HIVE-28264 > URL: https://issues.apache.org/jira/browse/HIVE-28264 > Project: Hive > Issue Type: Bug > Components: CBO, HiveServer2 >Affects Versions: 4.0.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > {code:sql} > CREATE TABLE t0 (`title` string); > SELECT x10 from > (SELECT concat_ws('L10',x9, x9, x9, x9) as x10 from > (SELECT concat_ws('L9',x8, x8, x8, x8) as x9 from > (SELECT concat_ws('L8',x7, x7, x7, x7) as x8 from > (SELECT concat_ws('L7',x6, x6, x6, x6) as x7 from > (SELECT concat_ws('L6',x5, x5, x5, x5) as x6 from > (SELECT concat_ws('L5',x4, x4, x4, x4) as x5 from > (SELECT concat_ws('L4',x3, x3, x3, x3) as x4 from > (SELECT concat_ws('L3',x2, x2, x2, x2) as x3 > from > (SELECT concat_ws('L2',x1, x1, x1, x1) as > x2 from > (SELECT concat_ws('L1',x0, x0, x0, > x0) as x1 from > (SELECT concat_ws('L0',title, > title, title, title) as x0 from t0) t1) t2) t3) t4) t5) t6) t7) t8) t9) t10) t > WHERE x10 = 'Something'; > {code} > The query above fails with OOM when run with the TestMiniLlapLocalCliDriver > and the default max heap size configuration effective for tests (-Xmx2048m). 
> {noformat} > java.lang.OutOfMemoryError: Java heap space > at java.util.Arrays.copyOf(Arrays.java:3332) > at > java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) > at > java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) > at java.lang.StringBuilder.append(StringBuilder.java:136) > at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:152) > at org.apache.calcite.rex.RexCall.toString(RexCall.java:165) > at org.apache.calcite.rex.RexCall.appendOperands(RexCall.java:105) > at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:151) > at org.apache.calcite.rex.RexCall.toString(RexCall.java:165) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:90) > at > org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144) > at > org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246) > at > org.apache.calcite.rel.externalize.RelWriterImpl.explainInputs(RelWriterImpl.java:122) > at > org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:116) > at > org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144) > at > org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246) > at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2308) > at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2292) > at > org.apache.hadoop.hive.ql.optimizer.calcite.RuleEventLogger.ruleProductionSucceeded(RuleEventLogger.java:73) > at > org.apache.calcite.plan.MulticastRelOptListener.ruleProductionSucceeded(MulticastRelOptListener.java:68) > at > org.apache.calcite.plan.AbstractRelOptPlanner.notifyTransformation(AbstractRelOptPlanner.java:370) > at > org.apache.calcite.plan.hep.HepPlanner.applyTransformationResults(HepPlanner.java:702) > at 
org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:545) > at > org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:271) > at > org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at > org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2452) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2411) > {noformat} -- This
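The stack trace shows the OOM happening while the planner stringifies the plan (RexCall.computeDigest via RuleEventLogger) after a rule fires. A rough back-of-the-envelope sketch, not Hive code, of why the textual digest explodes once every alias is inlined: each `concat_ws` level references the previous expression four times, so the digest grows by roughly a factor of 4 per nesting level.

```python
# Rough sketch (not Hive/Calcite code): lower bound on the textual digest
# size of the nested concat_ws expression after inlining all aliases.
# The `base` length of the innermost expression is an assumed approximation.

def digest_length(levels: int, base: int = 40) -> int:
    """Digest size after `levels` of 4-way nesting, ignoring separators."""
    length = base
    for _ in range(levels):
        length = 4 * length  # four inlined copies of the previous level
    return length

for n in (3, 6, 10):
    print(n, digest_length(n))
# At 10 levels the digest alone is ~40 * 4^10 bytes (tens of MB), and the
# planner builds such strings repeatedly while logging rule applications,
# which is consistent with the OOM inside RexCall.computeDigest above.
```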
[jira] [Commented] (HIVE-26313) Aggregate all column statistics into a single field in metastore
[ https://issues.apache.org/jira/browse/HIVE-26313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702796#comment-17702796 ] Alessandro Solimando commented on HIVE-26313: - [~zabetak] there is a draft branch available [here|https://github.com/asolimando/hive/tree/master-HIVE-26313-statistics_blob]; it's the one [~veghlaci05] is referring to. The code is based on the _Jackson_ and _Immutables_ libraries and has the building blocks to serialize statistics to _json_ and deserialize them. What I was aiming for was to keep both the individual columns and the blob, test that everything was working end-to-end (comparing both), then remove and clean up the individual-columns version once I was happy with the result. That's the reason why you see many unnecessary serializations and deserializations of the json blob. At the same time, the idea was to also simplify the subclasses of _ColumnStatsMerger_ and push more complexity into each class (in line with HIVE-27000, but pushed even further). I am not currently working on it, so if you are interested feel free to pick this up and use the branch if it's useful. 
> Aggregate all column statistics into a single field in metastore > > > Key: HIVE-26313 > URL: https://issues.apache.org/jira/browse/HIVE-26313 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Priority: Major > Labels: backward-incompatible > > At the moment, column statistics tables in the metastore schema look like > this (it's similar for _PART_COL_STATS_): > {noformat} > CREATE TABLE "APP"."TAB_COL_STATS"( > "CAT_NAME" VARCHAR(256) NOT NULL, > "DB_NAME" VARCHAR(128) NOT NULL, > "TABLE_NAME" VARCHAR(256) NOT NULL, > "COLUMN_NAME" VARCHAR(767) NOT NULL, > "COLUMN_TYPE" VARCHAR(128) NOT NULL, > "LONG_LOW_VALUE" BIGINT, > "LONG_HIGH_VALUE" BIGINT, > "DOUBLE_LOW_VALUE" DOUBLE, > "DOUBLE_HIGH_VALUE" DOUBLE, > "BIG_DECIMAL_LOW_VALUE" VARCHAR(4000), > "BIG_DECIMAL_HIGH_VALUE" VARCHAR(4000), > "NUM_DISTINCTS" BIGINT, > "NUM_NULLS" BIGINT NOT NULL, > "AVG_COL_LEN" DOUBLE, > "MAX_COL_LEN" BIGINT, > "NUM_TRUES" BIGINT, > "NUM_FALSES" BIGINT, > "LAST_ANALYZED" BIGINT, > "CS_ID" BIGINT NOT NULL, > "TBL_ID" BIGINT NOT NULL, > "BIT_VECTOR" BLOB, > "ENGINE" VARCHAR(128) NOT NULL > ); > {noformat} > The idea is to have a single blob named _STATISTICS_ to replace them, as > follows: > {noformat} > CREATE TABLE "APP"."TAB_COL_STATS"( > "CAT_NAME" VARCHAR(256) NOT NULL, > "DB_NAME" VARCHAR(128) NOT NULL, > "TABLE_NAME" VARCHAR(256) NOT NULL, > "COLUMN_NAME" VARCHAR(767) NOT NULL, > "COLUMN_TYPE" VARCHAR(128) NOT NULL, > "STATISTICS" BLOB, > "LAST_ANALYZED" BIGINT, > "CS_ID" BIGINT NOT NULL, > "TBL_ID" BIGINT NOT NULL, > "ENGINE" VARCHAR(128) NOT NULL > ); > {noformat} > The _STATISTICS_ column could be the serialization of a Json-encoded string, > which will be consumed in a "schema-on-read" fashion. 
> At first, at least the removed column statistics will be encoded in the _STATISTICS_ column, but since each "consumer" will read the portion of the schema it is interested in, multiple engines (see the _ENGINE_ column) can read and write statistics as they see fit.
> Another advantage is that, if we plan to add more statistics in the future, we won't need to change the thrift interface for the metastore again.
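The "schema-on-read" idea above can be sketched as follows. The field names are illustrative assumptions for the example, not the actual Hive metastore encoding (the draft branch uses Jackson on the Java side; plain `json` is used here for a self-contained sketch).

```python
import json

# Illustrative sketch of the proposed STATISTICS blob; field names are
# assumptions, not the actual metastore encoding.
stats = {
    "numNulls": 0,
    "numDistincts": 1200,
    "longLowValue": 1,
    "longHighValue": 100000,
}
blob = json.dumps(stats).encode("utf-8")  # bytes stored in the BLOB column

# A consumer deserializes and reads only the fields it understands
# ("schema-on-read"); unknown or absent fields are simply skipped, which is
# what lets new statistics be added without a thrift/schema change.
decoded = json.loads(blob)
print(decoded["numDistincts"])             # 1200
print(decoded.get("histogram", "absent"))  # absent
```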
[jira] [Updated] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled
[ https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-27065: Affects Version/s: 4.0.0-alpha-2 > Exception in partition column statistics update with SQL Server db when > histogram statistics is not enabled > --- > > Key: HIVE-27065 > URL: https://issues.apache.org/jira/browse/HIVE-27065 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0-alpha-2 >Reporter: Venugopal Reddy K >Assignee: Venugopal Reddy K >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > *[Description]* > java.sql.BatchUpdateException thrown from insertIntoPartColStatTable() with > SQL Server db when histogram statistics is not enabled. > *java.sql.BatchUpdateException: Implicit conversion from data type varchar to > varbinary(max) is not allowed. Use the CONVERT function to run this query.* > > *[Steps to reproduce]* > Create stage table, load data into stage table, create partition table and > load data into the table from the stage table. 
> {code:java} > 0: jdbc:hive2://localhost:1> create database mydb; > 0: jdbc:hive2://localhost:1> use mydb; > > 0: jdbc:hive2://localhost:1> create table stage(sr int, st string, name > string) row format delimited fields terminated by '\t' stored as textfile; > > 0: jdbc:hive2://localhost:1> load data local inpath 'partdata' into table > stage; > > 0: jdbc:hive2://localhost:1> create table dynpart(num int, name string) > partitioned by (category string) row format delimited fields terminated by > '\t' stored as textfile; > > 0: jdbc:hive2://localhost:1> insert into dynpart select * from stage; > {code} > > *[Exception Stack]* > {code:java} > 2023-02-10T05:16:42,921 ERROR [HiveServer2-Background-Pool: Thread-112] > metastore.DirectSqlUpdateStat: Unable to update Column stats for dynpart > java.sql.BatchUpdateException: Implicit conversion from data type varchar to > varbinary(max) is not allowed. Use the CONVERT function to run this query. > at > com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2303) > ~[mssql-jdbc-6.2.1.jre8.jar:?] 
> at > org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.insertIntoPartColStatTable(DirectSqlUpdateStat.java:281) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.updatePartitionColumnStatistics(DirectSqlUpdateStat.java:612) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:3063) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9943) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_292] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_292] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_292] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292] > at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at com.sun.proxy.$Proxy29.updatePartitionColumnStatisticsInBatch(Unknown > Source) ~[?:?] 
> at > org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7068) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBatch(HMSHandler.java:7121) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9247) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_292] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >
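The failure mode in the stack above, SQL Server rejecting a string parameter bound against a VARBINARY(MAX) column (an assumption: the binary histogram/bit-vector field, which is empty when histogram statistics are not enabled), can be modelled in miniature. This is an illustrative model of the type mismatch, not the actual DirectSqlUpdateStat fix:

```python
# Illustrative model (not the actual fix): SQL Server refuses to implicitly
# convert a varchar parameter into a VARBINARY(MAX) column, so binary
# columns must be bound with bytes (or SQL NULL), never with a string.

def bind(value, column_type: str):
    """Mimics the server-side check behind the BatchUpdateException."""
    if column_type == "varbinary" and isinstance(value, str):
        raise TypeError(
            "Implicit conversion from data type varchar to varbinary(max) "
            "is not allowed.")
    return value

# Failing path: an absent histogram bound as a string parameter.
try:
    bind("", "varbinary")
except TypeError as e:
    print("rejected:", e)

# Working paths: bind bytes, or SQL NULL (None), for the binary column.
print(bind(b"\x00\x01", "varbinary"))
print(bind(None, "varbinary"))
```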
[jira] [Updated] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled
[ https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-27065: Component/s: Metastore, Statistics
[jira] [Commented] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled
[ https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688632#comment-17688632 ] Alessandro Solimando commented on HIVE-27065: - Fixed via [8592eb0|https://github.com/apache/hive/commit/8592eb0c6466234b9162b27c26bc2b13030cf71f], thanks [~VenuReddy] for your patch!
[jira] [Resolved] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled
[ https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando resolved HIVE-27065. - Fix Version/s: 4.0.0 Resolution: Fixed
[jira] [Assigned] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled
[ https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-27065: --- Assignee: Venugopal Reddy K > Exception in partition column statistics update with SQL Server db when > histogram statistics is not enabled > --- > > Key: HIVE-27065 > URL: https://issues.apache.org/jira/browse/HIVE-27065 > Project: Hive > Issue Type: Bug >Reporter: Venugopal Reddy K >Assignee: Venugopal Reddy K >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > *[Description]* > java.sql.BatchUpdateException thrown from insertIntoPartColStatTable() with > SQL Server db when histogram statistics is not enabled. > *java.sql.BatchUpdateException: Implicit conversion from data type varchar to > varbinary(max) is not allowed. Use the CONVERT function to run this query.* > > *[Steps to reproduce]* > Create stage table, load data into stage table, create partition table and > load data into the table from the stage table. > {code:java} > 0: jdbc:hive2://localhost:1> create database mydb; > 0: jdbc:hive2://localhost:1> use mydb; > > 0: jdbc:hive2://localhost:1> create table stage(sr int, st string, name > string) row format delimited fields terminated by '\t' stored as textfile; > > 0: jdbc:hive2://localhost:1> load data local inpath 'partdata' into table > stage; > > 0: jdbc:hive2://localhost:1> create table dynpart(num int, name string) > partitioned by (category string) row format delimited fields terminated by > '\t' stored as textfile; > > 0: jdbc:hive2://localhost:1> insert into dynpart select * from stage; > {code} > > *[Exception Stack]* > {code:java} > 2023-02-10T05:16:42,921 ERROR [HiveServer2-Background-Pool: Thread-112] > metastore.DirectSqlUpdateStat: Unable to update Column stats for dynpart > java.sql.BatchUpdateException: Implicit conversion from data type varchar to > varbinary(max) is not allowed. Use the CONVERT function to run this query. 
> at > com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2303) > ~[mssql-jdbc-6.2.1.jre8.jar:?] > at > org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.insertIntoPartColStatTable(DirectSqlUpdateStat.java:281) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.updatePartitionColumnStatistics(DirectSqlUpdateStat.java:612) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:3063) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9943) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_292] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_292] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_292] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292] > at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at com.sun.proxy.$Proxy29.updatePartitionColumnStatisticsInBatch(Unknown > Source) ~[?:?] 
> at > org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7068) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBatch(HMSHandler.java:7121) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9247) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_292] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_292] > at >
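The failure mode behind this stack trace — a Java String bound to a varbinary(max) column, which SQL Server refuses to convert implicitly — can be sketched with a toy parameter binder. This is illustrative Python with hypothetical names, not Hive's or the JDBC driver's actual code: the point is only that a strict binder rejects character data for a binary column, while raw bytes (or NULL, as when histogram statistics are disabled) are accepted.

```python
# Illustrative sketch (hypothetical): a strict binder, like SQL Server's,
# rejects str values bound to a varbinary(max) column; the caller must bind
# bytes (or use an explicit CONVERT in the SQL) instead of relying on an
# implicit varchar -> varbinary cast.
def bind_param(column_type: str, value):
    if column_type == "varbinary(max)":
        if isinstance(value, str):
            raise TypeError(
                "Implicit conversion from data type varchar to varbinary(max) "
                "is not allowed. Use the CONVERT function to run this query.")
        if value is not None and not isinstance(value, (bytes, bytearray)):
            raise TypeError("varbinary(max) expects bytes or NULL")
    return value

# Failing case: character data where the engine expects binary data.
try:
    bind_param("varbinary(max)", "histogram-blob")
except TypeError as e:
    print("rejected:", e)

# Working cases: raw bytes, or NULL when no histogram blob is present.
bind_param("varbinary(max)", b"histogram-blob")
bind_param("varbinary(max)", None)
```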
[jira] [Commented] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled
[ https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686915#comment-17686915 ] Alessandro Solimando commented on HIVE-27065: - Thanks [~VenuReddy] for reporting and opening the PR, I will take a closer look at it during the weekend. I will leave some preliminary comments to kick off the discussion! > Exception in partition column statistics update with SQL Server db when > histogram statistics is not enabled > --- > > Key: HIVE-27065 > URL: https://issues.apache.org/jira/browse/HIVE-27065 > Project: Hive > Issue Type: Bug >Reporter: Venugopal Reddy K >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > *[Description]* > java.sql.BatchUpdateException thrown from insertIntoPartColStatTable() with > SQL Server db when histogram statistics is not enabled. > *java.sql.BatchUpdateException: Implicit conversion from data type varchar to > varbinary(max) is not allowed. Use the CONVERT function to run this query.* > > *[Steps to reproduce]* > Create stage table, load data into stage table, create partition table and > load data into the table from the stage table. 
> {code:java} > 0: jdbc:hive2://localhost:1> create database mydb; > 0: jdbc:hive2://localhost:1> use mydb; > > 0: jdbc:hive2://localhost:1> create table stage(sr int, st string, name > string) row format delimited fields terminated by '\t' stored as textfile; > > 0: jdbc:hive2://localhost:1> load data local inpath 'partdata' into table > stage; > > 0: jdbc:hive2://localhost:1> create table dynpart(num int, name string) > partitioned by (category string) row format delimited fields terminated by > '\t' stored as textfile; > > 0: jdbc:hive2://localhost:1> insert into dynpart select * from stage; > {code} > > *[Exception Stack]* > {code:java} > 2023-02-10T05:16:42,921 ERROR [HiveServer2-Background-Pool: Thread-112] > metastore.DirectSqlUpdateStat: Unable to update Column stats for dynpart > java.sql.BatchUpdateException: Implicit conversion from data type varchar to > varbinary(max) is not allowed. Use the CONVERT function to run this query. > at > com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2303) > ~[mssql-jdbc-6.2.1.jre8.jar:?] 
> at > org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.insertIntoPartColStatTable(DirectSqlUpdateStat.java:281) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.updatePartitionColumnStatistics(DirectSqlUpdateStat.java:612) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:3063) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9943) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_292] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_292] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_292] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292] > at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at com.sun.proxy.$Proxy29.updatePartitionColumnStatisticsInBatch(Unknown > Source) ~[?:?] 
> at > org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7068) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBatch(HMSHandler.java:7121) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9247) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_292] > at >
[jira] [Commented] (HIVE-27055) hive-exec typos part 3
[ https://issues.apache.org/jira/browse/HIVE-27055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685182#comment-17685182 ] Alessandro Solimando commented on HIVE-27055: - I have changed the "Affects Version/s" field value to 4.0.0-alpha-2 because 4.0.0 is not out yet AFAIK. > hive-exec typos part 3 > -- > > Key: HIVE-27055 > URL: https://issues.apache.org/jira/browse/HIVE-27055 > Project: Hive > Issue Type: Improvement > Components: Query Planning, Query Processor >Affects Versions: 4.0.0-alpha-2 >Reporter: Michal Lorek >Assignee: Michal Lorek >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > multiple typos and grammar errors in hive-exec module code and comments -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27055) hive-exec typos part 3
[ https://issues.apache.org/jira/browse/HIVE-27055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-27055: Affects Version/s: 4.0.0-alpha-2 (was: 4.0.0) > hive-exec typos part 3 > -- > > Key: HIVE-27055 > URL: https://issues.apache.org/jira/browse/HIVE-27055 > Project: Hive > Issue Type: Improvement > Components: Query Planning, Query Processor >Affects Versions: 4.0.0-alpha-2 >Reporter: Michal Lorek >Assignee: Michal Lorek >Priority: Trivial > > multiple typos and grammar errors in hive-exec module code and comments -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HIVE-27000) Improve the modularity of the *ColumnStatsMerger classes
[ https://issues.apache.org/jira/browse/HIVE-27000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-27000 started by Alessandro Solimando. --- > Improve the modularity of the *ColumnStatsMerger classes > > > Key: HIVE-27000 > URL: https://issues.apache.org/jira/browse/HIVE-27000 > Project: Hive > Issue Type: Improvement > Components: Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > *ColumnStatsMerger classes contain a lot of duplicate code that is not > specific to the data type and could therefore be lifted to a common > parent class. > This phenomenon is bound to become even worse if we keep further enriching > our supported set of statistics, as we did in the context of HIVE-26221. > The current ticket aims at improving the modularity and code reuse of the > *ColumnStatsMerger classes, while extending unit-test coverage to all > classes and more use cases. -- This message was sent by Atlassian Jira (v8.20.10#820010)
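The refactoring described in the ticket — lifting type-agnostic merge logic into a common parent class — can be sketched as follows. This is an illustrative Python stand-in with hypothetical field names, not Hive's actual *ColumnStatsMerger hierarchy: the shared parent handles the parts every type merges the same way (null counts, NDV estimates), and subclasses keep only the type-specific range handling.

```python
# Illustrative sketch (hypothetical names, not Hive's actual classes):
# common merge logic lives in the parent; subclasses override only the
# type-specific part.
class ColumnStatsMerger:
    def merge(self, agg, new):
        # Type-independent part of the merge, shared by all column types.
        agg["numNulls"] += new["numNulls"]
        agg["numDVs"] = max(agg["numDVs"], new["numDVs"])
        self.merge_range(agg, new)

    def merge_range(self, agg, new):
        raise NotImplementedError  # type-specific: min/max, histograms, ...

class LongColumnStatsMerger(ColumnStatsMerger):
    def merge_range(self, agg, new):
        agg["lowValue"] = min(agg["lowValue"], new["lowValue"])
        agg["highValue"] = max(agg["highValue"], new["highValue"])

m = LongColumnStatsMerger()
agg = {"numNulls": 1, "numDVs": 10, "lowValue": 0, "highValue": 50}
m.merge(agg, {"numNulls": 2, "numDVs": 12, "lowValue": -5, "highValue": 40})
```

With this shape, adding a new statistic (as HIVE-26221 did with histograms) touches the parent once instead of every per-type merger.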
[jira] [Assigned] (HIVE-27000) Improve the modularity of the *ColumnStatsMerger classes
[ https://issues.apache.org/jira/browse/HIVE-27000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-27000: --- > Improve the modularity of the *ColumnStatsMerger classes > > > Key: HIVE-27000 > URL: https://issues.apache.org/jira/browse/HIVE-27000 > Project: Hive > Issue Type: Improvement > Components: Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > *ColumnStatsMerger classes contain a lot of duplicate code that is not > specific to the data type and could therefore be lifted to a common > parent class. > This phenomenon is bound to become even worse if we keep further enriching > our supported set of statistics, as we did in the context of HIVE-26221. > The current ticket aims at improving the modularity and code reuse of the > *ColumnStatsMerger classes, while extending unit-test coverage to all > classes and more use cases. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-26297) Refactoring ColumnStatsAggregator classes to reduce warnings
[ https://issues.apache.org/jira/browse/HIVE-26297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando resolved HIVE-26297. - Resolution: Abandoned > Refactoring ColumnStatsAggregator classes to reduce warnings > > > Key: HIVE-26297 > URL: https://issues.apache.org/jira/browse/HIVE-26297 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Minor > > The interest of reducing warnings is to be able to focus on the important > ones. > Some of the bugs fixed while writing unit-tests were highlighted as warnings > (potential NPEs and rounding issues), but it was hard to see them among the > many other (less severe) warnings. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646588#comment-17646588 ] Alessandro Solimando commented on HIVE-26221: - Thank you [~dengzh] and [~amansinha100] for the great review! I would also like to acknowledge the work of Ryan Johnson for all the benchmarking he did, which shaped the final decision of directly using the CDF function without needing an intermediate (binned) histogram representation, and [~kgyrtkirk] and [~amansinha100] for their input in the design phase of the proposal. > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 11.5h > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. > Hive's current selectivity estimation for range predicates is based on a > hard-coded value of 1/3 (see > [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]). > The current proposal aims at integrating histograms as an additional column > statistic, stored in the Hive metastore at the table (or partition) level. 
> The main requirements for histogram integration are the following: > * efficiency: the approach must scale and support billions of rows > * merge-ability: partition-level histograms have to be merged to form > table-level histograms > * explicit and configurable trade-off between memory footprint and accuracy > Hive already integrates [KLL data > sketches|https://datasketches.apache.org/docs/KLL/KLLSketch.html] UDAF. > Datasketches are small, stateful programs that process massive data-streams > and can provide approximate answers, with mathematical guarantees, to > computationally difficult queries orders-of-magnitude faster than > traditional, exact methods. > We propose to use KLL, and more specifically the cumulative distribution > function (CDF), as the underlying data structure for our histogram statistics. > The current proposal targets numeric data types (float, integer and numeric > families) and temporal data types (date and timestamp). -- This message was sent by Atlassian Jira (v8.20.10#820010)
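The requirements above — a CDF queried directly for range-predicate selectivity, and partition-level structures that merge into table-level ones — can be sketched as follows. This is an illustrative Python stand-in: where the proposal uses an approximate KLL sketch, the toy class below keeps an exact sorted sample, so its CDF is what `getCDF` on a KLL sketch would approximate with bounded error.

```python
import bisect

# Illustrative sketch (simplified stand-in for a KLL sketch): an exact
# sorted sample whose empirical CDF drives range-selectivity estimates.
class CdfHistogram:
    def __init__(self, values):
        self.values = sorted(values)

    def cdf(self, x):
        # Fraction of values <= x (what a KLL sketch's CDF approximates).
        return bisect.bisect_right(self.values, x) / len(self.values)

    def range_selectivity(self, lo, hi):
        # Estimated selectivity of the predicate: lo < col <= hi.
        return self.cdf(hi) - self.cdf(lo)

    def merge(self, other):
        # Partition-level histograms merge into a table-level histogram.
        return CdfHistogram(self.values + other.values)

# Two partitions merged into a table-level histogram.
p1 = CdfHistogram([1, 2, 3, 4, 5])
p2 = CdfHistogram([6, 7, 8, 9, 10])
table = p1.merge(p2)
sel = table.range_selectivity(0, 5)  # data-driven, vs the hard-coded 1/3
```

A real KLL sketch gives the same interface at a small, configurable memory footprint regardless of row count, which is what makes the approach scale to billions of rows.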
[jira] [Resolved] (HIVE-26820) Disable hybridgrace_hashjoin_2.q flaky test
[ https://issues.apache.org/jira/browse/HIVE-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando resolved HIVE-26820. - Fix Version/s: 4.0.0 Resolution: Fixed > Disable hybridgrace_hashjoin_2.q flaky test > --- > > Key: HIVE-26820 > URL: https://issues.apache.org/jira/browse/HIVE-26820 > Project: Hive > Issue Type: Test > Components: Test >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Had this test failing many times in the last months, let's disable it for the > moment: > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26820) Disable hybridgrace_hashjoin_2.q flaky test
[ https://issues.apache.org/jira/browse/HIVE-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17645311#comment-17645311 ] Alessandro Solimando commented on HIVE-26820: - Thanks [~zabetak] for the review and merge, I have filed and linked HIVE-26828 to this ticket, which can now be closed! > Disable hybridgrace_hashjoin_2.q flaky test > --- > > Key: HIVE-26820 > URL: https://issues.apache.org/jira/browse/HIVE-26820 > Project: Hive > Issue Type: Test > Components: Test >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Had this test failing many times in the last months, let's disable it for the > moment: > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26828) Fix OOM for hybridgrace_hashjoin_2.q
[ https://issues.apache.org/jira/browse/HIVE-26828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26828: Description: _hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM transiently (from [flaky_test output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/], in case it disappears): {quote}< Status: Failed < Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml < A masked pattern was here < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded < Serialization trace: < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator) < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) < aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) < A masked pattern was here < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded < A masked pattern was here < ] < [Masked Vertex killed due to OTHER_VERTEX_FAILURE] < [Masked Vertex killed due to OTHER_VERTEX_FAILURE] < [Masked Vertex killed due to OTHER_VERTEX_FAILURE] < [Masked Vertex killed due to OTHER_VERTEX_FAILURE] < [Masked Vertex killed due to OTHER_VERTEX_FAILURE] < DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5 < FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml < A masked pattern was here < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded < Serialization trace: < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator) < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) < aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) < A masked pattern was here < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded < A masked pattern was here < ][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 killedVertices:5 < PREHOOK: query: SELECT COUNT( * ) < FROM src1 x < JOIN srcpart z1 ON (x.key = z1.key) < JOIN src y1 ON (x.key = y1.key) < JOIN srcpart z2 ON (x.value = z2.value) < JOIN src y2 ON (x.value = y2.value) < WHERE z1.key < '' AND z2.key < 'zz' < AND y1.value < '' AND y2.value < 'zz' < PREHOOK: type: QUERY < PREHOOK: Input: default@src < PREHOOK: Input: default@src1 < PREHOOK: Input: default@srcpart < PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11 < PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12 < PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=11 < PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=12 < PREHOOK: Output: hdfs://### HDFS PATH ### {quote} The aim of this ticket is to investigate the issue, fix it and re-enable the test. The problem seems to lie in the deserialization of the computed tez dag plan. was: _hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM transiently (from [flaky_test output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/], in case it disappears): {code:java} < Status: Failed< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex {code} {code:java} The aim of this ticket is to investigate the issue, fix it and re-enable the test.{code} > Fix OOM for hybridgrace_hashjoin_2.q > > > Key: HIVE-26828 > URL: https://issues.apache.org/jira/browse/HIVE-26828 > Project: Hive > Issue Type: Bug >
[jira] [Updated] (HIVE-26828) Fix OOM for hybridgrace_hashjoin_2.q
[ https://issues.apache.org/jira/browse/HIVE-26828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26828: Description: _hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM transiently (from [flaky_test output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/], in case it disappears): {code:java} < Status: Failed< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml< A masked pattern was here < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded< Serialization trace:< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)< A masked pattern was here < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded< A masked pattern was here < ]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5< FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml< A masked pattern was here < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded< Serialization trace:< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)< A masked pattern was here < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded< A masked pattern was here < ][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 killedVertices:5< PREHOOK: query: SELECT COUNT(*)< FROM src1 x< JOIN srcpart z1 ON (x.key = z1.key)< JOIN src y1 ON (x.key = y1.key)< JOIN srcpart z2 ON (x.value = z2.value)< JOIN src y2 ON (x.value = y2.value)< WHERE z1.key < '' AND z2.key < 'zz'< AND y1.value < '' AND y2.value < 'zz'< PREHOOK: type: QUERY< PREHOOK: Input: default@src< PREHOOK: Input: default@src1< PREHOOK: Input: default@srcpart< PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11< PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12< PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=11< PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=12< PREHOOK: Output: hdfs://### HDFS PATH ###{code} The aim of this ticket is to investigate the issue, fix it and re-enable the test. was: _hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM transiently (from [flaky_test output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/], in case it disappears): {noformat} property: qfile used as override with val: hybridgrace_hashjoin_2.qproperty: run_disabled used as override with val: falseSetting hive-site: file:/home/jenkins/agent/workspace/hive-flaky-check/data/conf/tez//hive-site.xmlInitializing the schema to: 4.0.0Metastore connection URL: jdbc:derby:memory:junit_metastore_db;create=trueMetastore connection Driver : org.apache.derby.jdbc.EmbeddedDriverMetastore connection User: APPMetastore connection Password: mineStarting metastore schema initialization to 4.0.0Initialization script
[jira] [Updated] (HIVE-26828) Fix OOM for hybridgrace_hashjoin_2.q
[ https://issues.apache.org/jira/browse/HIVE-26828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26828: Description: _hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM transiently (from [flaky_test output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/], in case it disappears): {code:java} < Status: Failed< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex {code} {code:java} The aim of this ticket is to investigate the issue, fix it and re-enable the test.{code} was: _hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM transiently (from [flaky_test output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/], in case it disappears): {code:java} < Status: Failed< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml< A masked pattern was here < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded< Serialization trace:< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)< A masked pattern was here < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded< A masked pattern was here < ]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< [Masked 
Vertex killed due to OTHER_VERTEX_FAILURE]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5< FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml< A masked pattern was here < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded< Serialization trace:< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)< A masked pattern was here < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded< A masked pattern was here < ][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 killedVertices:5< PREHOOK: query: SELECT COUNT(*)< FROM src1 x< JOIN srcpart z1 ON (x.key = z1.key)< JOIN src y1 ON (x.key = y1.key)< JOIN srcpart z2 ON (x.value = z2.value)< JOIN src y2 ON (x.value = y2.value)< WHERE z1.key < '' AND z2.key < 'zz'< AND y1.value < '' AND y2.value < 'zz'< PREHOOK: type: QUERY< PREHOOK: Input: default@src< PREHOOK: Input: default@src1< PREHOOK: Input: default@srcpart< PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11< PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12< PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=11< PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=12< PREHOOK: Output: hdfs://### HDFS PATH ###{code} The aim of this ticket is to investigate the issue, fix it and re-enable the test. > Fix OOM for hybridgrace_hashjoin_2.q > > > Key: HIVE-26828 > URL: https://issues.apache.org/jira/browse/HIVE-26828 > Project: Hive > Issue Type: Bug > Components: Test, Tez >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro
[jira] [Commented] (HIVE-26820) Disable hybridgrace_hashjoin_2.q flaky test
[ https://issues.apache.org/jira/browse/HIVE-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17645232#comment-17645232 ] Alessandro Solimando commented on HIVE-26820: - Yes, I haven't had time to track down which commit broke it, but I have had CI failing from time to time due to OOM in this test for quite a while now (a few months), though it's not that frequent. As I don't have much time to dive deep into this, the best course of action for me at the moment is to disable the test and create a ticket to investigate it and re-enable it later. > Disable hybridgrace_hashjoin_2.q flaky test > --- > > Key: HIVE-26820 > URL: https://issues.apache.org/jira/browse/HIVE-26820 > Project: Hive > Issue Type: Test > Components: Test >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Had this test failing many times in the last months, let's disable it for the > moment: > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26820) Disable hybridgrace_hashjoin_2.q flaky test
[ https://issues.apache.org/jira/browse/HIVE-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26820: Summary: Disable hybridgrace_hashjoin_2.q flaky test (was: Disable hybridgrace_hashjoin_2 flaky qtest) > Disable hybridgrace_hashjoin_2.q flaky test > --- > > Key: HIVE-26820 > URL: https://issues.apache.org/jira/browse/HIVE-26820 > Project: Hive > Issue Type: Test > Components: Test >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > Had this test failing many times in the last months, let's disable it for the > moment: > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26820) Disable hybridgrace_hashjoin_2 flaky qtest
[ https://issues.apache.org/jira/browse/HIVE-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26820: --- > Disable hybridgrace_hashjoin_2 flaky qtest > -- > > Key: HIVE-26820 > URL: https://issues.apache.org/jira/browse/HIVE-26820 > Project: Hive > Issue Type: Test > Components: Test >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > Had this test failing many times in the last months, let's disable it for the > moment: > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-26812) hive-it-util module misses a dependency on hive-jdbc
[ https://issues.apache.org/jira/browse/HIVE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644328#comment-17644328 ] Alessandro Solimando edited comment on HIVE-26812 at 12/7/22 1:10 PM: -- [~zabetak], thanks for your input. I agree with the analysis: we can fix it here as per the PR, but we should open another ticket to track the issue in the beeline module. EDIT: as for why it works when compiling from the main directory with -Pitests, given your findings, I am wondering whether the dependency-reduced.xml from the hive-beeline module, with the needed dependencies, is the one being used? was (Author: asolimando): [~zabetak], thanks for your input, I agree on the analysis, we can fix it here as per the PR, but we should open another ticket to track the issue in the beeline module. > hive-it-util module misses a dependency on hive-jdbc > > > Key: HIVE-26812 > URL: https://issues.apache.org/jira/browse/HIVE-26812 > Project: Hive > Issue Type: Bug > Components: Tests >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Building from $hive/itests fails as follows: > {noformat} > [INFO] Hive Integration - Testing Utilities ... FAILURE [ 6.492 > s] > ... > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 56.499 s > [INFO] Finished at: 2022-12-06T19:24:16+01:00 > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile > (default-compile) on project hive-it-util: Compilation failure > [ERROR] > /Users/asolimando/git/hive/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java:[51,28] > cannot find symbol > [ERROR] symbol: class Utils > [ERROR] location: package org.apache.hive.jdbc > [ERROR] > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. 
> [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :hive-it-util{noformat} > Surprisingly, building from the top directory with -Pitests does not fail. > There is a missing dependency on the hive-jdbc module, when adding that, the > error gets fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
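For context, the fix described above amounts to declaring the missing module dependency in the itests/util POM so that org.apache.hive.jdbc.Utils resolves at compile time. A minimal sketch follows; the artifact coordinates come from the error message, but the version property is an assumption:

```xml
<!-- Hypothetical sketch for itests/util/pom.xml: declare the missing
     hive-jdbc dependency. The version property is an assumption. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${project.version}</version>
</dependency>
```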
[jira] [Commented] (HIVE-26812) hive-it-util module misses a dependency on hive-jdbc
[ https://issues.apache.org/jira/browse/HIVE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644328#comment-17644328 ] Alessandro Solimando commented on HIVE-26812: - [~zabetak], thanks for your input, I agree on the analysis, we can fix it here as per the PR, but we should open another ticket to track the issue in the beeline module. > hive-it-util module misses a dependency on hive-jdbc > > > Key: HIVE-26812 > URL: https://issues.apache.org/jira/browse/HIVE-26812 > Project: Hive > Issue Type: Bug > Components: Tests >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Building from $hive/itests fails as follows: > {noformat} > [INFO] Hive Integration - Testing Utilities ... FAILURE [ 6.492 > s] > ... > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 56.499 s > [INFO] Finished at: 2022-12-06T19:24:16+01:00 > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile > (default-compile) on project hive-it-util: Compilation failure > [ERROR] > /Users/asolimando/git/hive/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java:[51,28] > cannot find symbol > [ERROR] symbol: class Utils > [ERROR] location: package org.apache.hive.jdbc > [ERROR] > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
> [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :hive-it-util{noformat} > Surprisingly, building from the top directory with -Pitests does not fail. > There is a missing dependency on the hive-jdbc module, when adding that, the > error gets fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26812) hive-it-util module misses a dependency on hive-jdbc
[ https://issues.apache.org/jira/browse/HIVE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643999#comment-17643999 ] Alessandro Solimando commented on HIVE-26812: - [~zabetak], since we had a look together at this already, it should be trivial to review for you, if you have time :) > hive-it-util module misses a dependency on hive-jdbc > > > Key: HIVE-26812 > URL: https://issues.apache.org/jira/browse/HIVE-26812 > Project: Hive > Issue Type: Bug > Components: Tests >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Building from $hive/itests fails as follows: > {noformat} > [INFO] Hive Integration - Testing Utilities ... FAILURE [ 6.492 > s] > ... > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 56.499 s > [INFO] Finished at: 2022-12-06T19:24:16+01:00 > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile > (default-compile) on project hive-it-util: Compilation failure > [ERROR] > /Users/asolimando/git/hive/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java:[51,28] > cannot find symbol > [ERROR] symbol: class Utils > [ERROR] location: package org.apache.hive.jdbc > [ERROR] > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :hive-it-util{noformat} > Surprisingly, building from the top directory with -Pitests does not fail. 
> There is a missing dependency on the hive-jdbc module, when adding that, the > error gets fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HIVE-26812) hive-it-util module misses a dependency on hive-jdbc
[ https://issues.apache.org/jira/browse/HIVE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26812 started by Alessandro Solimando. --- > hive-it-util module misses a dependency on hive-jdbc > > > Key: HIVE-26812 > URL: https://issues.apache.org/jira/browse/HIVE-26812 > Project: Hive > Issue Type: Bug > Components: Tests >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > Building from $hive/itests fails as follows: > {noformat} > [INFO] Hive Integration - Testing Utilities ... FAILURE [ 6.492 > s] > ... > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 56.499 s > [INFO] Finished at: 2022-12-06T19:24:16+01:00 > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile > (default-compile) on project hive-it-util: Compilation failure > [ERROR] > /Users/asolimando/git/hive/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java:[51,28] > cannot find symbol > [ERROR] symbol: class Utils > [ERROR] location: package org.apache.hive.jdbc > [ERROR] > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :hive-it-util{noformat} > Surprisingly, building from the top directory with -Pitests does not fail. > There is a missing dependency on the hive-jdbc module, when adding that, the > error gets fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26812) hive-it-util module misses a dependency on hive-jdbc
[ https://issues.apache.org/jira/browse/HIVE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26812: --- > hive-it-util module misses a dependency on hive-jdbc > > > Key: HIVE-26812 > URL: https://issues.apache.org/jira/browse/HIVE-26812 > Project: Hive > Issue Type: Bug > Components: Tests >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > Building from $hive/itests fails as follows: > {noformat} > [INFO] Hive Integration - Testing Utilities ... FAILURE [ 6.492 > s] > ... > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 56.499 s > [INFO] Finished at: 2022-12-06T19:24:16+01:00 > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile > (default-compile) on project hive-it-util: Compilation failure > [ERROR] > /Users/asolimando/git/hive/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java:[51,28] > cannot find symbol > [ERROR] symbol: class Utils > [ERROR] location: package org.apache.hive.jdbc > [ERROR] > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :hive-it-util{noformat} > Surprisingly, building from the top directory with -Pitests does not fail. > There is a missing dependency on the hive-jdbc module, when adding that, the > error gets fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643751#comment-17643751 ] Alessandro Solimando commented on HIVE-26806: - [~zabetak], I have deleted the green runs, but the first time I re-ran, the timeout occurred again. I haven't seen a timeout from that run onward, so it has probably worked; by coincidence, that first re-run likely got another unfortunate random split. To sum up, deleting past green runs seems to work, so there is no need to close and re-open the PR. Thanks! > Precommit tests in CI are timing out after HIVE-26796 > - > > Key: HIVE-26806 > URL: https://issues.apache.org/jira/browse/HIVE-26806 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ > {noformat} > ancelling nested steps due to timeout > 15:22:08 Sending interrupt signal to process > 15:22:08 Killing processes > 15:22:09 kill finished with exit code 0 > 15:22:19 Terminated > 15:22:19 script returned exit code 143 > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > 15:22:19 Deleting 1 temporary files > [Pipeline] // configFileProvider > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (PostProcess) > [Pipeline] sh > [Pipeline] sh > [Pipeline] sh > [Pipeline] junit > 15:22:25 Recording test results > 15:22:32 [Checks API] No suitable checks publisher found. 
> [Pipeline] } > [Pipeline] // stage > [Pipeline] } > [Pipeline] // container > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // timeout > [Pipeline] } > [Pipeline] // podTemplate > [Pipeline] } > 15:22:32 Failed in branch split-01 > [Pipeline] // parallel > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (Archive) > [Pipeline] podTemplate > [Pipeline] { > [Pipeline] timeout > 15:22:33 Timeout set to expire in 6 hr 0 min > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26810) Replace HiveFilterSetOpTransposeRule onMatch method with Calcite's built-in implementation
[ https://issues.apache.org/jira/browse/HIVE-26810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26810: --- > Replace HiveFilterSetOpTransposeRule onMatch method with Calcite's built-in > implementation > -- > > Key: HIVE-26810 > URL: https://issues.apache.org/jira/browse/HIVE-26810 > Project: Hive > Issue Type: Task > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > After HIVE-26762, the _onMatch_ method is now the same as in the Calcite > implementation; we can drop Hive's override in order to avoid the risk of > them drifting apart again. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643470#comment-17643470 ] Alessandro Solimando edited comment on HIVE-26806 at 12/5/22 5:19 PM: -- It looks like deleting all past green runs did not fix it for [https://github.com/apache/hive/pull/3137]. That's a big deal, since the PR is huge and review is in progress; I don't think I can close and re-open it. Is there a way to tweak the timeout for that PR alone, [~zabetak]? EDIT: there is: I am using "Replay" in Jenkins, which lets me change the Jenkinsfile for the given run without any change in Git; hopefully that will do the trick. was (Author: asolimando): It looks that deleting all green past runs did not fix for [https://github.com/apache/hive/pull/3137]. That's a big deal since the PR is huge and review is in progress, I don't think I can close and re-open it. Is there a way to tweak timeout for that PR alone [~zabetak]? > Precommit tests in CI are timing out after HIVE-26796 > - > > Key: HIVE-26806 > URL: https://issues.apache.org/jira/browse/HIVE-26806 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ > {noformat} > ancelling nested steps due to timeout > 15:22:08 Sending interrupt signal to process > 15:22:08 Killing processes > 15:22:09 kill finished with exit code 0 > 15:22:19 Terminated > 15:22:19 script returned exit code 143 > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > 15:22:19 Deleting 1 temporary files > [Pipeline] // configFileProvider > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (PostProcess) > [Pipeline] sh > [Pipeline] sh > [Pipeline] sh > [Pipeline] junit > 15:22:25 Recording test results > 15:22:32 [Checks API] No suitable checks publisher found. 
> [Pipeline] } > [Pipeline] // stage > [Pipeline] } > [Pipeline] // container > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // timeout > [Pipeline] } > [Pipeline] // podTemplate > [Pipeline] } > 15:22:32 Failed in branch split-01 > [Pipeline] // parallel > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (Archive) > [Pipeline] podTemplate > [Pipeline] { > [Pipeline] timeout > 15:22:33 Timeout set to expire in 6 hr 0 min > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643470#comment-17643470 ] Alessandro Solimando commented on HIVE-26806: - It looks like deleting all past green runs did not fix it for [https://github.com/apache/hive/pull/3137]. That's a big deal, since the PR is huge and review is in progress; I don't think I can close and re-open it. Is there a way to tweak the timeout for that PR alone, [~zabetak]? > Precommit tests in CI are timing out after HIVE-26796 > - > > Key: HIVE-26806 > URL: https://issues.apache.org/jira/browse/HIVE-26806 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ > {noformat} > ancelling nested steps due to timeout > 15:22:08 Sending interrupt signal to process > 15:22:08 Killing processes > 15:22:09 kill finished with exit code 0 > 15:22:19 Terminated > 15:22:19 script returned exit code 143 > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > 15:22:19 Deleting 1 temporary files > [Pipeline] // configFileProvider > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (PostProcess) > [Pipeline] sh > [Pipeline] sh > [Pipeline] sh > [Pipeline] junit > 15:22:25 Recording test results > 15:22:32 [Checks API] No suitable checks publisher found. > [Pipeline] } > [Pipeline] // stage > [Pipeline] } > [Pipeline] // container > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // timeout > [Pipeline] } > [Pipeline] // podTemplate > [Pipeline] } > 15:22:32 Failed in branch split-01 > [Pipeline] // parallel > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (Archive) > [Pipeline] podTemplate > [Pipeline] { > [Pipeline] timeout > 15:22:33 Timeout set to expire in 6 hr 0 min > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643222#comment-17643222 ] Alessandro Solimando commented on HIVE-26806: - Thanks [~zabetak]. As you say, the issue now affects only existing PRs; I am trying option 2 to see if it works, otherwise I will go for option 1. I will keep you posted here. Setting aside the old affected PRs, I am OK with reducing the timeout to its previous value, since it now works. > Precommit tests in CI are timing out after HIVE-26796 > - > > Key: HIVE-26806 > URL: https://issues.apache.org/jira/browse/HIVE-26806 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ > {noformat} > ancelling nested steps due to timeout > 15:22:08 Sending interrupt signal to process > 15:22:08 Killing processes > 15:22:09 kill finished with exit code 0 > 15:22:19 Terminated > 15:22:19 script returned exit code 143 > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > 15:22:19 Deleting 1 temporary files > [Pipeline] // configFileProvider > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (PostProcess) > [Pipeline] sh > [Pipeline] sh > [Pipeline] sh > [Pipeline] junit > 15:22:25 Recording test results > 15:22:32 [Checks API] No suitable checks publisher found. > [Pipeline] } > [Pipeline] // stage > [Pipeline] } > [Pipeline] // container > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // timeout > [Pipeline] } > [Pipeline] // podTemplate > [Pipeline] } > 15:22:32 Failed in branch split-01 > [Pipeline] // parallel > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (Archive) > [Pipeline] podTemplate > [Pipeline] { > [Pipeline] timeout > 15:22:33 Timeout set to expire in 6 hr 0 min > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642558#comment-17642558 ] Alessandro Solimando commented on HIVE-26806: - In case you have an existing open PR suffering from this and you don't want to rebase: if you have permission to run Jenkins jobs, you can just change the default split value to 22 and re-run. HTH > Precommit tests in CI are timing out after HIVE-26796 > - > > Key: HIVE-26806 > URL: https://issues.apache.org/jira/browse/HIVE-26806 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ > {noformat} > ancelling nested steps due to timeout > 15:22:08 Sending interrupt signal to process > 15:22:08 Killing processes > 15:22:09 kill finished with exit code 0 > 15:22:19 Terminated > 15:22:19 script returned exit code 143 > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > 15:22:19 Deleting 1 temporary files > [Pipeline] // configFileProvider > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (PostProcess) > [Pipeline] sh > [Pipeline] sh > [Pipeline] sh > [Pipeline] junit > 15:22:25 Recording test results > 15:22:32 [Checks API] No suitable checks publisher found. > [Pipeline] } > [Pipeline] // stage > [Pipeline] } > [Pipeline] // container > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // timeout > [Pipeline] } > [Pipeline] // podTemplate > [Pipeline] } > 15:22:32 Failed in branch split-01 > [Pipeline] // parallel > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (Archive) > [Pipeline] podTemplate > [Pipeline] { > [Pipeline] timeout > 15:22:33 Timeout set to expire in 6 hr 0 min > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26762 started by Alessandro Solimando. --- > Remove operand pruning in HiveFilterSetOpTransposeRule > -- > > Key: HIVE-26762 > URL: https://issues.apache.org/jira/browse/HIVE-26762 > Project: Hive > Issue Type: Task > Components: CBO, Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if > the newly pushed filter simplifies to FALSE (due to the predicates holding on > the input). > If this is true and there is more than one UNION ALL operand, it gets pruned. > After HIVE-26524 ("Use Calcite to remove sections of a query plan known never > produces rows"), this is possibly redundant and we could drop this feature > and let the other rules take care of the pruning. > In such a case, it might even be possible to drop the Hive-specific rule and > rely on the Calcite one (the difference is just the operand pruning at the > moment of writing), similarly to what HIVE-26642 did for > HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended > to tackle this in a separate ticket after verifying that it is feasible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
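To illustrate the pruning the ticket describes, here is a hypothetical query (not taken from the Hive test suite): pushing the outer filter past the UNION ALL produces a contradiction on the second branch, which simplifies to FALSE, so that operand never produces rows and can be pruned.

```sql
-- Hypothetical example: c is the constant 1 on the first branch and 2 on the second.
SELECT * FROM (
  SELECT id, 1 AS c FROM t1
  UNION ALL
  SELECT id, 2 AS c FROM t2
) u
WHERE u.c = 1;
-- Pushing "c = 1" into the second branch yields "2 = 1", which simplifies
-- to FALSE, so the t2 operand of the UNION ALL can be pruned.
```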
[jira] [Assigned] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26762: --- Assignee: Alessandro Solimando > Remove operand pruning in HiveFilterSetOpTransposeRule > -- > > Key: HIVE-26762 > URL: https://issues.apache.org/jira/browse/HIVE-26762 > Project: Hive > Issue Type: Task > Components: CBO, Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if > the newly pushed filter simplifies to FALSE (due to the predicates holding on > the input). > If this is true and there is more than one UNION ALL operand, it gets pruned. > After HIVE-26524 ("Use Calcite to remove sections of a query plan known never > produces rows"), this is possibly redundant and we could drop this feature > and let the other rules take care of the pruning. > In such a case, it might even be possible to drop the Hive-specific rule and > rely on the Calcite one (the difference is just the operand pruning at the > moment of writing), similarly to what HIVE-26642 did for > HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended > to tackle this in a separate ticket after verifying that it is feasible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26762: Issue Type: Task (was: Bug) > Remove operand pruning in HiveFilterSetOpTransposeRule > -- > > Key: HIVE-26762 > URL: https://issues.apache.org/jira/browse/HIVE-26762 > Project: Hive > Issue Type: Task > Components: CBO, Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if > the newly pushed filter simplifies to FALSE (due to the predicates holding on > the input). > If this is true and there is more than one UNION ALL operand, it gets pruned. > After HIVE-26524 ("Use Calcite to remove sections of a query plan known never > produces rows"), this is possibly redundant and we could drop this feature > and let the other rules take care of the pruning. > In such a case, it might even be possible to drop the Hive-specific rule and > rely on the Calcite one (the difference is just the operand pruning at the > moment of writing), similarly to what HIVE-26642 did for > HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended > to tackle this in a separate ticket after verifying that it is feasible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26692) Check for the expected thrift version before compiling
[ https://issues.apache.org/jira/browse/HIVE-26692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641464#comment-17641464 ] Alessandro Solimando commented on HIVE-26692: - [~ayushtkn] I managed to find a bit of time to work on this, would you mind checking the PR if you have some spare cycles? > Check for the expected thrift version before compiling > -- > > Key: HIVE-26692 > URL: https://issues.apache.org/jira/browse/HIVE-26692 > Project: Hive > Issue Type: Task > Components: Thrift API >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > At the moment we don't check for the thrift version before launching thrift, > the error messages are often cryptic upon mismatches. > An explicit check with a clear error message would be nice, like what parquet > does: > [https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml#L247-L268] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
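The kind of pre-build guard this ticket asks for can be sketched as a small shell check that fails fast with a clear message on a version mismatch. This is a hypothetical illustration, not the PR's actual implementation: the expected version (0.14.1) and the "Thrift version X.Y.Z" output format are assumptions, and the exact flag may be -version or --version depending on the thrift release.

```shell
#!/bin/sh
# Hypothetical sketch: fail fast with a clear message when the installed
# thrift compiler does not match the version the build expects.
EXPECTED="0.14.1"  # assumed expected version, for illustration only

check_thrift_version() {
  # $1 is the raw compiler banner, e.g. "Thrift version 0.14.1"
  actual="${1##* }"   # keep the last whitespace-separated token
  if [ "$actual" = "$EXPECTED" ]; then
    echo "thrift version OK ($actual)"
  else
    echo "ERROR: expected thrift $EXPECTED but found $actual" >&2
    return 1
  fi
}

# In a real build this would be: check_thrift_version "$(thrift --version)"
check_thrift_version "Thrift version 0.14.1"
```

In Maven, the same idea would typically be wired into the build via a plugin execution bound to an early phase, as the parquet POM linked in the description does.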
[jira] [Work started] (HIVE-26692) Check for the expected thrift version before compiling
[ https://issues.apache.org/jira/browse/HIVE-26692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26692 started by Alessandro Solimando. --- > Check for the expected thrift version before compiling > -- > > Key: HIVE-26692 > URL: https://issues.apache.org/jira/browse/HIVE-26692 > Project: Hive > Issue Type: Task > Components: Thrift API >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > At the moment we don't check for the thrift version before launching thrift, > so the error messages are often cryptic upon mismatches. > An explicit check with a clear error message would be nice, like what parquet > does: > [https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml#L247-L268] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26692) Check for the expected thrift version before compiling
[ https://issues.apache.org/jira/browse/HIVE-26692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26692: --- Assignee: Alessandro Solimando > Check for the expected thrift version before compiling > -- > > Key: HIVE-26692 > URL: https://issues.apache.org/jira/browse/HIVE-26692 > Project: Hive > Issue Type: Task > Components: Thrift API >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > At the moment we don't check for the thrift version before launching thrift, > so the error messages are often cryptic upon mismatches. > An explicit check with a clear error message would be nice, like what parquet > does: > [https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml#L247-L268] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26683) Sum over window produces 0 when row contains null
[ https://issues.apache.org/jira/browse/HIVE-26683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637130#comment-17637130 ] Alessandro Solimando commented on HIVE-26683: - +1 from me; it's always unfortunate to make breaking changes, but in this case the current behaviour seems inconsistent and broken (it is surprising to see 0, and I agree it should be a NULL), so we should fix it IMO. > Sum over window produces 0 when row contains null > - > > Key: HIVE-26683 > URL: https://issues.apache.org/jira/browse/HIVE-26683 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Ran the following sql: > > {code:java} > create table sum_window_test_small (id int, tinyint_col tinyint); > insert into sum_window_test_small values (5,5), (10, NULL), (11,1); > select id, > tinyint_col, > sum(tinyint_col) over (order by id nulls last rows between 1 following and 1 > following) > from sum_window_test_small order by id; > select id, > tinyint_col, > sum(tinyint_col) over (order by id nulls last rows between current row and 1 > following) > from sum_window_test_small order by id; > {code} > The result is > {code:java} > +-+--+---+ > | id | tinyint_col | sum_window_0 | > +-+--+---+ > | 5 | 5 | 0 | > | 10 | NULL | 1 | > | 11 | 1 | NULL | > +-+--+---+{code} > The first row should have the sum as NULL > -- This message was sent by Atlassian Jira (v8.20.10#820010)
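The reported behaviour is the classic symptom of an aggregator whose accumulator starts at 0 and cannot distinguish "no non-NULL input seen in the frame" from "the sum happens to be 0". A minimal sketch of the NULL-aware behaviour SQL requires (illustrative class and method names, not Hive's actual window-function code):

```java
// Illustrative NULL-aware SUM accumulator: SQL's SUM over a frame containing
// only NULLs (or no rows at all) must yield NULL, never 0.
class NullAwareSum {
    private long sum = 0;
    // Distinguishes "empty/all-NULL frame" from "sum happens to be 0".
    private boolean sawNonNull = false;

    void add(Long value) {
        if (value != null) {
            sum += value;
            sawNonNull = true;
        }
    }

    // Returns SQL NULL (Java null) when no non-NULL value was accumulated,
    // which is what the first row of the reported query should produce.
    Long result() {
        return sawNonNull ? sum : null;
    }
}
```

For the first row of the reproducer, the frame "1 following and 1 following" contains only the NULL from the id=10 row, so a correct implementation returns NULL rather than the observed 0.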
[jira] [Resolved] (HIVE-26243) Add vectorized implementation of the 'ds_kll_sketch' UDAF
[ https://issues.apache.org/jira/browse/HIVE-26243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando resolved HIVE-26243. - Fix Version/s: 4.0.0 Resolution: Fixed Fixed via [{{ad19ec3}}|https://github.com/apache/hive/commit/ad19ec3022a35bee4d618bd8992d9ce0f67be5b7], thanks to [~dkuzmenko] and [~kgyrtkirk] for their reviews > Add vectorized implementation of the 'ds_kll_sketch' UDAF > - > > Key: HIVE-26243 > URL: https://issues.apache.org/jira/browse/HIVE-26243 > Project: Hive > Issue Type: Improvement > Components: UDF, Vectorization >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > _ds_kll_sketch_ UDAF does not have a vectorized implementation at the moment; > the present ticket aims at bridging this gap. > This is particularly important because vectorization has an "all or nothing" > approach, so if this function is used alongside vectorized functions, > they won't be able to benefit from vectorized execution. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26762: Description: HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if the newly pushed filter simplifies to FALSE (due to the predicates holding on the input). If this is true and there is more than one UNION ALL operand, it gets pruned. After HIVE-26524 ("Use Calcite to remove sections of a query plan known never produces rows"), this is possibly redundant and we could drop this feature and let the other rules take care of the pruning. In such a case, it might even be possible to drop the Hive-specific rule and rely on the Calcite one (the difference is just the operand pruning at the moment of writing), similarly to what HIVE-26642 did for HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended to tackle this in a separate ticket after verifying that it is feasible. was: HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if the newly pushed filter simplifies to FALSE (possibly due to the predicates holding on the input). If this is true and there is more than one UNION ALL operand, it gets pruned. After HIVE-26524 ("Use Calcite to remove sections of a query plan known never produces rows"), this is possibly redundant and we could drop this feature and let the other rules take care of the pruning. In such a case, it might even be possible to drop the Hive-specific rule and rely on the Calcite one (the difference is just the operand pruning at the moment of writing), similarly to what HIVE-26642 did for HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended to tackle this in a separate ticket after verifying that it is feasible. 
> Remove operand pruning in HiveFilterSetOpTransposeRule > -- > > Key: HIVE-26762 > URL: https://issues.apache.org/jira/browse/HIVE-26762 > Project: Hive > Issue Type: Bug > Components: CBO, Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Priority: Major > > HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if > the newly pushed filter simplifies to FALSE (due to the predicates holding on > the input). > If this is true and there is more than one UNION ALL operand, it gets pruned. > After HIVE-26524 ("Use Calcite to remove sections of a query plan known never > produces rows"), this is possibly redundant and we could drop this feature > and let the other rules take care of the pruning. > In such a case, it might even be possible to drop the Hive-specific rule and > rely on the Calcite one (the difference is just the operand pruning at the > moment of writing), similarly to what HIVE-26642 did for > HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended > to tackle this in a separate ticket after verifying that it is feasible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26762: Description: HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if the newly pushed filter simplifies to FALSE (possibly due to the predicates holding on the input). If this is true and there is more than one UNION ALL operand, it gets pruned. After HIVE-26524 ("Use Calcite to remove sections of a query plan known never produces rows"), this is possibly redundant and we could drop this feature and let the other rules take care of the pruning. In such a case, it might even be possible to drop the Hive-specific rule and rely on the Calcite one (the difference is just the operand pruning at the moment of writing), similarly to what HIVE-26642 did for HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended to tackle this in a separate ticket after verifying that it is feasible. was: HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if the newly pushed filter simplifies to FALSE (possibly due to the predicates holding on the input). If this is true and there is more than one UNION ALL operand, it gets pruned. After HIVE-26524 ("Use Calcite to remove sections of a query plan known never produces rows"), this is possibly redundant and we could drop this feature and let the other rules take care of the pruning. In such a case, it's even possible to drop the Hive-specific rule and rely on the Calcite one (the difference is just the operand pruning at the moment of writing), similarly to what HIVE-26642 did for HiveReduceExpressionRule. 
> Remove operand pruning in HiveFilterSetOpTransposeRule > -- > > Key: HIVE-26762 > URL: https://issues.apache.org/jira/browse/HIVE-26762 > Project: Hive > Issue Type: Bug > Components: CBO, Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Priority: Major > > HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if > the newly pushed filter simplifies to FALSE (possibly due to the predicates > holding on the input). > If this is true and there is more than one UNION ALL operand, it gets pruned. > After HIVE-26524 ("Use Calcite to remove sections of a query plan known never > produces rows"), this is possibly redundant and we could drop this feature > and let the other rules take care of the pruning. > In such a case, it might even be possible to drop the Hive-specific rule and > rely on the Calcite one (the difference is just the operand pruning at the > moment of writing), similarly to what HIVE-26642 did for > HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended > to tackle this in a separate ticket after verifying that it is feasible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work stopped] (HIVE-26733) Not safe to use '=' for predicates on constant expressions that might be NULL
[ https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26733 stopped by Alessandro Solimando. --- > Not safe to use '=' for predicates on constant expressions that might be NULL > - > > Key: HIVE-26733 > URL: https://issues.apache.org/jira/browse/HIVE-26733 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > HiveRelMdPredicates was forked from Calcite's RelMdPredicates a long time ago. > Hive's version lacks this commit > [https://github.com/apache/calcite/commit/8281668f] which introduced the use > of "IS NOT DISTINCT FROM" in place of "EQUAL" when a constant expression can > be NULL. > There is no Calcite ticket for this change, so I am briefly explaining the > issue here. > Consider the following input as an argument of the > HiveRelMdPredicates::pullUpPredicates(Project) method: > {code:java} > SELECT char_length(NULL) FROM t{code} > The method currently infers the predicate (=($0, CHAR_LENGTH(null:NULL))) > which translates to "=(NULL, NULL)", which in turn simplifies to FALSE under > the unknownAsFalse semantics. > The change will make this method return "IS NOT DISTINCT FROM($0, > CHAR_LENGTH(null:NULL))", which translates to IS NOT DISTINCT FROM(NULL, > NULL), which is TRUE. > For reference, we have the truth table below (from [1]): > ||{{A}}||{{B}}||{{A = B}}||{{A IS NOT DISTINCT FROM B}}|| > |{{0}}|{{0}}|_true_|_true_| > |{{0}}|{{1}}|_false_|_false_| > |{{0}}|{{null}}|_*unknown*_|_*false*_| > |{{null}}|{{null}}|_*unknown*_|_*true*_| > [1] https://modern-sql.com/feature/is-distinct-from -- This message was sent by Atlassian Jira (v8.20.10#820010)
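The truth table above can be encoded directly in three-valued logic, with SQL UNKNOWN modeled as a null Boolean. This is a sketch for illustration only; Calcite represents these operators as RexCalls, not as boxed Booleans:

```java
// Three-valued comparison sketch: a null Boolean stands for SQL UNKNOWN.
class NullComparisons {
    // SQL '=' : UNKNOWN when either operand is NULL.
    static Boolean equalsSql(Integer a, Integer b) {
        if (a == null || b == null) {
            return null; // UNKNOWN
        }
        return a.equals(b);
    }

    // IS NOT DISTINCT FROM: two-valued and NULL-safe; NULL matches only NULL.
    static boolean isNotDistinctFrom(Integer a, Integer b) {
        if (a == null || b == null) {
            return a == null && b == null;
        }
        return a.equals(b);
    }
}
```

Note how the last two rows of the truth table fall out of the code: `equalsSql` returns UNKNOWN for any NULL operand, while `isNotDistinctFrom` returns a definite true only when both sides are NULL.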
[jira] [Resolved] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands
[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando resolved HIVE-26722. - Resolution: Fixed > HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands > -- > > Key: HIVE-26722 > URL: https://issues.apache.org/jira/browse/HIVE-26722 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > h1. Reproducer > Consider the following query: > {code:java} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > CREATE EXTERNAL TABLE t (a string, b string); > INSERT INTO t VALUES ('1000', 'b1'); > INSERT INTO t VALUES ('2000', 'b2'); > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000; > EXPLAIN CBO > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000; {code} > The expected result is: > {code:java} > 1000 b1 > 1000 NULL{code} > An example of a correct plan is as follows: > {noformat} > CBO PLAN: > HiveUnion(all=[true]) > HiveProject(a=[$0], b=[$1]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]) > HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET > "UTF-16LE"]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} > > Consider now a scenario where expression reduction in projections is disabled > by setting the following property: > {noformat} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > {noformat} > In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, > and we get the following (invalid) result: > {code:java} > 1000 b1{code} > produced by 
the following invalid plan: > {code:java} > CBO PLAN: > HiveProject(a=[$0], b=[$1]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]) {code} > h1. Problem Analysis > At > [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] > the _RelMetadataQuery::getPulledUpPredicates_ method infers the following > predicate due to the CAST(NULL) in the projection: > {code:java} > (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} > When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is > inferred. > In > [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], > the rule checks if the conjunction of the predicate coming from the filter > (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or > not, under the _UnknownAsFalse_ semantics. > To summarize, the following expression is simplified under the > _UnknownAsFalse_ semantics: > {code:java} > AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), > =(CAST($0):DOUBLE, 1000)) > {code} > Under such semantics, (=($1, CAST(null:NULL):...) evaluates to > {_}FALSE{_}, because no value is equal to NULL (even NULL itself), so AND(FALSE, > =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and > the UNION ALL operand is pruned. 
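The pruning mechanics just described can be reproduced with a tiny three-valued AND plus the unknownAsFalse coercion. UNKNOWN is modeled as a null Boolean; this is an illustration of the logic only, not Calcite's RexSimplify code:

```java
// Three-valued logic sketch: SQL UNKNOWN is modeled as a null Boolean.
class UnknownAsFalseDemo {
    // SQL AND: a definite FALSE dominates, then UNKNOWN, then TRUE.
    static Boolean and(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
            return false;
        }
        if (a == null || b == null) {
            return null; // UNKNOWN
        }
        return true;
    }

    // unknownAsFalse coercion: only a definite TRUE survives.
    static boolean unknownAsFalse(Boolean v) {
        return Boolean.TRUE.equals(v);
    }
}
```

In the ticket's terms: =($1, NULL) is at best UNKNOWN, so the whole conjunction is at best UNKNOWN; coercing it with unknownAsFalse makes the rule conclude FALSE and prune the operand, even though the filter predicate on its own could be TRUE for some rows.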
> Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the > issue, due to the _IS_NULL($1)_ inferred predicate, see > [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] > for understanding how the NULL literal is treated differently during > predicate inference. > _HiveRelMdPredicates_ should not use equality ('=') for nullable constant > expressions, but rather IS NOT DISTINCT FROM, as detailed in HIVE-26733. > Nonetheless, the way simplification is done here is not correct either: inferred > predicates should be used as "context" rather than being used in a > conjunctive expression. This usage does not conform with any of the similar > uses of simplification with inferred predicates (see the bottom of the > "Solution" section for examples and details). > h1. Solution > In
[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands
[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26722: Description: h1. Reproducer Consider the following query: {code:java} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); CREATE EXTERNAL TABLE t (a string, b string); INSERT INTO t VALUES ('1000', 'b1'); INSERT INTO t VALUES ('2000', 'b2'); SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000; EXPLAIN CBO SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000; {code} The expected result is: {code:java} 1000 b1 1000 NULL{code} An example of a correct plan is as follows: {noformat} CBO PLAN: HiveUnion(all=[true]) HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} Consider now a scenario where expression reduction in projections is disabled by setting the following property: {noformat} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); {noformat} In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, and we get the following (invalid) result: {code:java} 1000 b1{code} produced by the following invalid plan: {code:java} CBO PLAN: HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) {code} h1. 
Problem Analysis At [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] the _RelMetadataQuery::getPulledUpPredicates_ method infers the following predicate due to the CAST(NULL) in the projection: {code:java} (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is inferred. In [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], the rule checks if the conjunction of the predicate coming from the filter (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or not, under the _UnknownAsFalse_ semantics. To summarize, the following expression is simplified under the _UnknownAsFalse_ semantics: {code:java} AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), =(CAST($0):DOUBLE, 1000)) {code} Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, because no value is equal to NULL (even NULL itself), so AND(FALSE, =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the UNION ALL operand is pruned. Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, due to the _IS_NULL($1)_ inferred predicate, see [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] for understanding how the NULL literal is treated differently during predicate inference. 
_HiveRelMdPredicates_ should not use equality ('=') for nullable constant expressions, but rather IS NOT DISTINCT FROM, as detailed in HIVE-26733. Nonetheless, the way simplification is done here is not correct either: inferred predicates should be used as "context" rather than being used in a conjunctive expression. This usage does not conform with any of the similar uses of simplification with inferred predicates (see the bottom of the "Solution" section for examples and details). h1. Solution In order to correctly simplify a predicate and test if it's always false or not, we should build RexSimplify with _predicates_ as the list of predicates known to hold in the context. In this way, the different semantics are correctly taken into account. The code at [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] should be replaced by the following: {code:java} final RexExecutor executor = Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR); final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor); final RexNode x =
[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands
[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26722: Description: h1. Reproducer Consider the following query: {code:java} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); CREATE EXTERNAL TABLE t (a string, b string); INSERT INTO t VALUES ('1000', 'b1'); INSERT INTO t VALUES ('2000', 'b2'); SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000; EXPLAIN CBO SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000; {code} The expected result is: {code:java} 1000 b1 1000 NULL{code} An example of a correct plan is as follows: {noformat} CBO PLAN: HiveUnion(all=[true]) HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} Consider now a scenario where expression reduction in projections is disabled by setting the following property: {noformat} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); {noformat} In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, and we get the following (invalid) result: {code:java} 1000 b1{code} produced by the following invalid plan: {code:java} CBO PLAN: HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) {code} h1. 
Problem Analysis At [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] the _RelMetadataQuery::getPulledUpPredicates_ method infers the following predicate due to the CAST(NULL) in the projection: {code:java} (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is inferred. In [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], the rule checks if the conjunction of the predicate coming from the filter (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or not, under the _UnknownAsFalse_ semantics. To summarize, the following expression is simplified under the _UnknownAsFalse_ semantics: {code:java} AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), =(CAST($0):DOUBLE, 1000)) {code} Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, because no value is equal to NULL (even NULL itself), so AND(FALSE, =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the UNION ALL operand is pruned. Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, due to the _IS_NULL($1)_ inferred predicate, see [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] for understanding how the NULL literal is treated differently during predicate inference. 
The problem lies in the fact that, depending on the input _RelNode_ that we infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case. h1. Solution In order to correctly simplify a predicate and test if it's always false or not, we should build RexSimplify with _predicates_ as the list of predicates known to hold in the context. In this way, the different semantics are correctly taken into account. The code at [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] should be replaced by the following: {code:java} final RexExecutor executor = Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR); final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor); final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code} This is in line with other uses of simplification, like in Calcite:
[jira] [Updated] (HIVE-26733) Not safe to use '=' for predicates on constant expressions that might be NULL
[ https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26733: Description: HiveRelMdPredicates was forked from Calcite's RelMdPredicates a long time ago. Hive's version lacks this commit [https://github.com/apache/calcite/commit/8281668f] which introduced the use of "IS NOT DISTINCT FROM" in place of "EQUAL" when a constant expression can be NULL. There is no Calcite ticket for this change, so I am briefly explaining the issue here. Consider the following input as an argument of the HiveRelMdPredicates::pullUpPredicates(Project) method: {code:java} SELECT char_length(NULL) FROM t{code} The method currently infers the predicate (=($0, CHAR_LENGTH(null:NULL))) which translates to "=(NULL, NULL)", which in turn simplifies to FALSE under the unknownAsFalse semantics. The change will make this method return "IS NOT DISTINCT FROM($0, CHAR_LENGTH(null:NULL))", which translates to IS NOT DISTINCT FROM(NULL, NULL), which is TRUE. For reference, we have the truth table below (from [1]): ||{{A}}||{{B}}||{{A = B}}||{{A IS NOT DISTINCT FROM B}}|| |{{0}}|{{0}}|_true_|_true_| |{{0}}|{{1}}|_false_|_false_| |{{0}}|{{null}}|_*unknown*_|_*false*_| |{{null}}|{{null}}|_*unknown*_|_*true*_| [1] https://modern-sql.com/feature/is-distinct-from was: Given a _CAST(NULL as $type)_ as i-th project expression, the method returns _(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a _NULL_ literal project expression. This is because _RexLiteral::isNullLiteral_ is used [here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153], while in similar cases, it's often convenient to use {_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}. 
> Not safe to use '=' for predicates on constant expressions that might be NULL > - > > Key: HIVE-26733 > URL: https://issues.apache.org/jira/browse/HIVE-26733 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > HiveRelMdPredicates was forked from Calcite's RelMdPredicates a long time ago. > Hive's version lacks this commit > [https://github.com/apache/calcite/commit/8281668f] which introduced the use > of "IS NOT DISTINCT FROM" in place of "EQUAL" when a constant expression can > be NULL. > There is no Calcite ticket for this change, so I am briefly explaining the > issue here. > Consider the following input as an argument of the > HiveRelMdPredicates::pullUpPredicates(Project) method: > {code:java} > SELECT char_length(NULL) FROM t{code} > The method currently infers the predicate (=($0, CHAR_LENGTH(null:NULL))) > which translates to "=(NULL, NULL)", which in turn simplifies to FALSE under > the unknownAsFalse semantics. > The change will make this method return "IS NOT DISTINCT FROM($0, > CHAR_LENGTH(null:NULL))", which translates to IS NOT DISTINCT FROM(NULL, > NULL), which is TRUE. > For reference, we have the truth table below (from [1]): > ||{{A}}||{{B}}||{{A = B}}||{{A IS NOT DISTINCT FROM B}}|| > |{{0}}|{{0}}|_true_|_true_| > |{{0}}|{{1}}|_false_|_false_| > |{{0}}|{{null}}|_*unknown*_|_*false*_| > |{{null}}|{{null}}|_*unknown*_|_*true*_| > [1] https://modern-sql.com/feature/is-distinct-from -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26733) Not safe to use '=' for predicates on constant expressions that might be NULL
[ https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26733: Summary: Not safe to use '=' for predicates on constant expressions that might be NULL (was: HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for CAST(NULL)) > Not safe to use '=' for predicates on constant expressions that might be NULL > - > > Key: HIVE-26733 > URL: https://issues.apache.org/jira/browse/HIVE-26733 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Given a _CAST(NULL as $type)_ as i-th project expression, the method returns > _(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a > _NULL_ literal project expression. > This is because _RexLiteral::isNullLiteral_ is used > [here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153], > while in similar cases, it's often convenient to use > {_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26733) HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for CAST(NULL)
[ https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26733: Description: Given a _CAST(NULL as $type)_ as i-th project expression, the method returns _(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a _NULL_ literal project expression. This is because _RexLiteral::isNullLiteral_ is used [here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153], while in similar cases, it's often convenient to use {_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}. was: Given a _CAST(NULL as $type)_ as i-th project expression, the method returns _(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a _NULL_ literal project expression. This is because _RexLiteral::isNullLiteral_ is used [here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153], while in similar places, it's often convenient to use {_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}. > HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for > CAST(NULL) > --- > > Key: HIVE-26733 > URL: https://issues.apache.org/jira/browse/HIVE-26733 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > Given a _CAST(NULL as $type)_ as i-th project expression, the method returns > _(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a > _NULL_ literal project expression. 
> This is because _RexLiteral::isNullLiteral_ is used > [here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153], > while in similar cases, it's often convenient to use > {_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
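The distinction between the two checks can be sketched with a toy expression model. The classes below (`Node`, `Literal`, `Cast`) are hypothetical stand-ins for Calcite's `RexNode`, `RexLiteral`, and CAST calls; the real methods being contrasted are `RexLiteral::isNullLiteral` and `RexUtil::isNullLiteral(RexNode, boolean allowCast)`.

```java
// Toy expression tree illustrating the allowCast distinction
// (hypothetical classes; not the actual Calcite types).
public class NullLiteralCheck {

    abstract static class Node {}

    static class Literal extends Node {
        final Object value; // null models the SQL NULL literal
        Literal(Object value) { this.value = value; }
    }

    static class Cast extends Node {
        final Node operand;
        Cast(Node operand) { this.operand = operand; }
    }

    // Analogous to RexLiteral::isNullLiteral: only a bare NULL literal matches.
    static boolean isNullLiteral(Node e) {
        return e instanceof Literal && ((Literal) e).value == null;
    }

    // Analogous to RexUtil::isNullLiteral(node, allowCast): with allowCast,
    // a NULL wrapped in (possibly nested) casts also matches.
    static boolean isNullLiteral(Node e, boolean allowCast) {
        while (allowCast && e instanceof Cast) {
            e = ((Cast) e).operand;
        }
        return isNullLiteral(e);
    }
}
```

With the cast-aware check, a CAST(NULL AS string) project expression would be recognized as a NULL constant and IS_NULL($i) inferred, as for a bare NULL literal.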
[jira] [Work started] (HIVE-26733) HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for CAST(NULL)
[ https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26733 started by Alessandro Solimando. --- > HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for > CAST(NULL) > --- > > Key: HIVE-26733 > URL: https://issues.apache.org/jira/browse/HIVE-26733 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > Given a _CAST(NULL as $type)_ as i-th project expression, the method returns > _(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a > _NULL_ literal project expression. > This is because _RexLiteral::isNullLiteral_ is used > [here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153], > while in similar places, it's often convenient to use > {_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26733) HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for CAST(NULL)
[ https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26733: --- > HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for > CAST(NULL) > --- > > Key: HIVE-26733 > URL: https://issues.apache.org/jira/browse/HIVE-26733 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > Given a _CAST(NULL as $type)_ as i-th project expression, the method returns > _(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a > _NULL_ literal project expression. > This is because _RexLiteral::isNullLiteral_ is used > [here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153], > while in similar places, it's often convenient to use > {_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands
[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26722: Summary: HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands (was: HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands) > HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands > -- > > Key: HIVE-26722 > URL: https://issues.apache.org/jira/browse/HIVE-26722 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > h1. Reproducer > Consider the following query: > {code:java} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > CREATE EXTERNAL TABLE t (a string, b string); > INSERT INTO t VALUES ('1000', 'b1'); > INSERT INTO t VALUES ('2000', 'b2'); > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000;EXPLAIN CBO > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000; {code} > > The expected result is: > {code:java} > 1000 b1 > 1000 NULL{code} > An example of correct plan is as follows: > {noformat} > CBO PLAN: > HiveUnion(all=[true]) > HiveProject(a=[$0], b=[$1]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]) > HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET > "UTF-16LE"]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} > > Consider now a scenario where expression reduction in projections is disabled > by setting the following property{_}:{_} > {noformat} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > {noformat} > In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, > and we get the following (invalid) 
result: > {code:java} > 1000 b1{code} > produced by the following invalid plan: > {code:java} > CBO PLAN: > HiveProject(a=[$0], b=[$1]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]) {code} > h1. Problem Analysis > At > [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] > the _RelMetadataQuery::getPulledUpPredicates_ method infers the following > predicate due to the CAST(NULL) in the projection: > {code:java} > (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} > When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is > inferred. > In > [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], > the rule checks whether the conjunction of the predicate coming from the filter > (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or > not, under the _UnknownAsFalse_ semantics. > To summarize, the following expression is simplified under the > _UnknownAsFalse_ semantics: > {code:java} > AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), > =(CAST($0):DOUBLE, 1000)) > {code} > Under such semantics, (=($1, CAST(null:NULL):...) evaluates to > {_}FALSE{_}, because no value is equal to NULL (not even NULL itself), so AND(FALSE, > =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and > the UNION ALL operand is pruned. 
> Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the > issue, due to the _IS_NULL($1)_ inferred predicate, see > [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] > for understanding how the NULL literal is treated differently during > predicate inference. > The problem lies in the fact that, depending on the input _RelNode_ that we > infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, > but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this > case. > h1. Solution > In order to correctly simplify a predicate and test if it's always false or > not, we should build RexSimplify with _predicates_ as the list of predicates > known to hold in the context. In this way, the different semantics are correctly taken into account.
[jira] [Work started] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands
[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26722 started by Alessandro Solimando. --- > HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands > - > > Key: HIVE-26722 > URL: https://issues.apache.org/jira/browse/HIVE-26722 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > h1. Reproducer > Consider the following query: > {code:java} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > CREATE EXTERNAL TABLE t (a string, b string); > INSERT INTO t VALUES ('1000', 'b1'); > INSERT INTO t VALUES ('2000', 'b2'); > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000;EXPLAIN CBO > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000; {code} > > The expected result is: > {code:java} > 1000 b1 > 1000 NULL{code} > An example of correct plan is as follows: > {noformat} > CBO PLAN: > HiveUnion(all=[true]) > HiveProject(a=[$0], b=[$1]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]) > HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET > "UTF-16LE"]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} > > Consider now a scenario where expression reduction in projections is disabled > by setting the following property{_}:{_} > {noformat} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > {noformat} > In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, > and we get the following (invalid) result: > {code:java} > 1000 b1{code} > produced by the following invalid plan: > {code:java} > CBO PLAN: > HiveProject(a=[$0], b=[$1]) > 
HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]) {code} > h1. Problem Analysis > At > [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] > the _RelMetadataQuery::getPulledUpPredicates_ method infers the following > predicate due to the CAST(NULL) in the projection: > {code:java} > (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} > When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is > inferred. > In > [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], > the rule checks whether the conjunction of the predicate coming from the filter > (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or > not, under the _UnknownAsFalse_ semantics. > To summarize, the following expression is simplified under the > _UnknownAsFalse_ semantics: > {code:java} > AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), > =(CAST($0):DOUBLE, 1000)) > {code} > Under such semantics, (=($1, CAST(null:NULL):...) evaluates to > {_}FALSE{_}, because no value is equal to NULL (not even NULL itself), so AND(FALSE, > =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and > the UNION ALL operand is pruned. 
> Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the > issue, due to the _IS_NULL($1)_ inferred predicate, see > [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] > for understanding how the NULL literal is treated differently during > predicate inference. > The problem lies in the fact that, depending on the input _RelNode_ that we > infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, > but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this > case. > h1. Solution > In order to correctly simplify a predicate and test if it's always false or > not, we should build RexSimplify with _predicates_ as the list of predicates > known to hold in the context. In this way, the different semantics are > correctly taken into account. > The code at > [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] > should be replaced by the following: > {code:java} > final RexExecutor executor = Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR); > final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor); > final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code}
[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands
[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26722: Description: h1. Reproducer Consider the following query: {code:java} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); CREATE EXTERNAL TABLE t (a string, b string); INSERT INTO t VALUES ('1000', 'b1'); INSERT INTO t VALUES ('2000', 'b2'); SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000;EXPLAIN CBO SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000; {code} The expected result is: {code:java} 1000 b1 1000 NULL{code} An example of correct plan is as follows: {noformat} CBO PLAN: HiveUnion(all=[true]) HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} Consider now a scenario where expression reduction in projections is disabled by setting the following property{_}:{_} {noformat} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); {noformat} In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, and we get the following (invalid) result: {code:java} 1000 b1{code} produced by the following invalid plan: {code:java} CBO PLAN: HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) {code} h1. 
Problem Analysis At [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] the _RelMetadataQuery::getPulledUpPredicates_ method infers the following predicate due to the CAST(NULL) in the projection: {code:java} (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is inferred. In [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], the rule checks whether the conjunction of the predicate coming from the filter (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or not, under the _UnknownAsFalse_ semantics. To summarize, the following expression is simplified under the _UnknownAsFalse_ semantics: {code:java} AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), =(CAST($0):DOUBLE, 1000)) {code} Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, because no value is equal to NULL (not even NULL itself), so AND(FALSE, =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the UNION ALL operand is pruned. Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, due to the _IS_NULL($1)_ inferred predicate, see [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] for understanding how the NULL literal is treated differently during predicate inference. 
The problem lies in the fact that, depending on the input _RelNode_ that we infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case. h1. Solution In order to correctly simplify a predicate and test if it's always false or not, we should build RexSimplify with _predicates_ as the list of predicates known to hold in the context. In this way, the different semantics are correctly taken into account. The code at [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] should be replaced by the following: {code:java} final RexExecutor executor = Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR); final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor); final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code} was: Consider the following query: {code:java} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); CREATE EXTERNAL TABLE t (a string, b string); INSERT INTO t VALUES ('1000', 'b1');
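The core of the unsoundness described above can be sketched with a three-valued AND in plain Java. This is illustrative only, not the rule's actual code: `Boolean null` models UNKNOWN, and `prunable` is a hypothetical helper. The conjunction of the inferred predicate for the CAST(NULL) branch (UNKNOWN) and a matching filter condition (TRUE) is UNKNOWN, not FALSE, so the branch is not provably empty and must not be pruned.

```java
// Three-valued logic sketch: Boolean null models SQL UNKNOWN.
public class TriBool {

    // SQL AND: FALSE dominates; otherwise any UNKNOWN operand yields UNKNOWN.
    public static Boolean and(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
            return false;
        }
        if (a == null || b == null) {
            return null;
        }
        return true;
    }

    // A UNION ALL branch may only be pruned when the conjunction of the
    // inferred predicate and the filter condition is provably FALSE;
    // UNKNOWN is not enough, since predicates pulled up from a Project
    // carry UnknownAsUnknown semantics.
    public static boolean prunable(Boolean inferred, Boolean filter) {
        return Boolean.FALSE.equals(and(inferred, filter));
    }
}
```

Coercing the UNKNOWN conjunction to FALSE, as the buggy check effectively does, is what prunes the CAST(NULL) branch and loses the (1000, NULL) row.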
[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands
[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26722: Description: Consider the following query: {code:java} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); CREATE EXTERNAL TABLE t (a string, b string); INSERT INTO t VALUES ('1000', 'b1'); INSERT INTO t VALUES ('2000', 'b2'); SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000;EXPLAIN CBO SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000; {code} The expected result is: {code:java} 1000 b1 1000 NULL{code} An example of correct plan is as follows: {noformat} CBO PLAN: HiveUnion(all=[true]) HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} Consider now a scenario where expression reduction in projections is disabled by setting the following property{_}:{_} {noformat} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); {noformat} In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, and we get the following (invalid) result: {code:java} 1000 b1{code} produced by the following invalid plan: {code:java} CBO PLAN: HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) {code} h3. 
Problem Analysis At [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] the _RelMetadataQuery::getPulledUpPredicates_ method infers the following predicate due to the CAST(NULL) in the projection: {code:java} (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is inferred. In [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], the rule checks whether the conjunction of the predicate coming from the filter (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or not, under the _UnknownAsFalse_ semantics. To summarize, the following expression is simplified under the _UnknownAsFalse_ semantics: {code:java} AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), =(CAST($0):DOUBLE, 1000)) {code} Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, because no value is equal to NULL (not even NULL itself), so AND(FALSE, =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the UNION ALL operand is pruned. Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, due to the _IS_NULL($1)_ inferred predicate, see [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] for understanding how the NULL literal is treated differently during predicate inference. 
The problem lies in the fact that, depending on the input _RelNode_ that we infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case. Solution: in order to correctly simplify a predicate and test if it's always false or not, we should build RexSimplify with _predicates_ as the list of predicates known to hold in the context. In this way, the different semantics are correctly taken into account. The code at [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] should be replaced by the following: {code:java} final RexExecutor executor = Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR); final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor); final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code} was: Consider the following query: {code:java} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); CREATE EXTERNAL TABLE t (a string, b string); INSERT INTO t VALUES ('1000', 'b1'); INSERT INTO t VALUES
[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands
[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26722: Description: Consider the following query: {code:java} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); CREATE EXTERNAL TABLE t (a string, b string); INSERT INTO t VALUES ('1000', 'b1'); INSERT INTO t VALUES ('2000', 'b2'); SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000;EXPLAIN CBO SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000; {code} The expected result is: {code:java} 1000 b1 1000 NULL{code} An example of correct plan is as follows: {noformat} CBO PLAN: HiveUnion(all=[true]) HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} Consider now a scenario where expression reduction in projections is disabled by setting the following property{_}:{_} {noformat} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); {noformat} In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, and we get the following (invalid) result: {code:java} 1000 b1{code} produced by the following invalid plan: CBO PLAN: HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) Problem analysis: At [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] the _RelMetadataQuery::getPulledUpPredicates_ method infers the following predicate due to the CAST(NULL) in the 
projection: {code:java} (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is inferred. In [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], the rule checks whether the conjunction of the predicate coming from the filter (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or not, under the _UnknownAsFalse_ semantics. To summarize, the following expression is simplified under the _UnknownAsFalse_ semantics: {code:java} AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), =(CAST($0):DOUBLE, 1000)) {code} Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, because no value is equal to NULL (not even NULL itself), so AND(FALSE, =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the UNION ALL operand is pruned. Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, due to the _IS_NULL($1)_ inferred predicate, see [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] for understanding how the NULL literal is treated differently during predicate inference. The problem lies in the fact that, depending on the input _RelNode_ that we infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case. Solution: in order to correctly simplify a predicate and test if it's always false or not, we should build RexSimplify with _predicates_ as the list of predicates known to hold in the context. 
In this way, the different semantics are correctly taken into account. The code at [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] should be replaced by the following: {code:java} final RexExecutor executor = Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR); final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor); final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code} was: Consider the following query: {code:java} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); CREATE EXTERNAL TABLE t (a string, b string); INSERT INTO t VALUES ('1000', 'b1'); INSERT INTO t VALUES
[jira] [Assigned] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands
[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26722: --- > HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands > - > > Key: HIVE-26722 > URL: https://issues.apache.org/jira/browse/HIVE-26722 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > Consider the following query: > > {code:java} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > CREATE EXTERNAL TABLE t (a string, b string); > INSERT INTO t VALUES ('1000', 'b1'); > INSERT INTO t VALUES ('2000', 'b2'); > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000;EXPLAIN CBO > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000; {code} > > > The expected result is: > > {code:java} > 1000 b1 > 1000 NULL{code} > > An example of correct plan is as follows: > > {noformat} > CBO PLAN: > HiveUnion(all=[true]) > HiveProject(a=[$0], b=[$1]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]) > HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET > "UTF-16LE"]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} > > > Consider now a scenario where expression reduction in projections is disabled > by setting the following property{_}:{_} > {noformat} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > {noformat} > In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, > and we get the following (invalid) result: > 1000 b1 > -- This message was sent by Atlassian Jira (v8.20.10#820010)
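The semantics the reproducer expects can be mimicked with a small in-memory simulation in plain Java (illustrative only, not Hive code; `UnionAllFilter` is a made-up class, and column `a` is parsed as a double to mirror Hive's implicit cast in `a = 1000`). Both the (1000, b1) row and the (1000, NULL) row from the CAST(NULL) branch must survive the filter:

```java
import java.util.ArrayList;
import java.util.List;

public class UnionAllFilter {
    // A row is {a, b}; b may be null, mirroring CAST(NULL AS string).
    public static List<String[]> query(List<String[]> t) {
        List<String[]> union = new ArrayList<>();
        // First UNION ALL branch: SELECT a, b FROM t
        union.addAll(t);
        // Second branch: SELECT a, CAST(NULL AS string) FROM t
        for (String[] r : t) {
            union.add(new String[] { r[0], null });
        }
        // WHERE a = 1000: the string column is compared numerically,
        // as Hive casts it to DOUBLE.
        List<String[]> out = new ArrayList<>();
        for (String[] r : union) {
            if (Double.parseDouble(r[0]) == 1000d) {
                out.add(r);
            }
        }
        return out;
    }
}
```

With the table contents from the reproducer, the simulation returns two rows, matching the expected "1000 b1 / 1000 NULL" result rather than the single row produced by the invalid pruned plan.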
[jira] [Commented] (HIVE-26678) In the filter criteria associated with multiple tables, the filter result of the subquery by not in or in is incorrect.
[ https://issues.apache.org/jira/browse/HIVE-26678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629757#comment-17629757 ] Alessandro Solimando commented on HIVE-26678: - The non-CBO codepath has a lot of flaws; it should probably be discontinued at this point, given that CBO support is mature and dates back a while. Any specific reason why you are running without CBO in the first place? > In the filter criteria associated with multiple tables, the filter result of > the subquery by not in or in is incorrect. > --- > > Key: HIVE-26678 > URL: https://issues.apache.org/jira/browse/HIVE-26678 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 3.1.0 >Reporter: lotan >Priority: Major > > create testtable as follow: > create table test101 (id string,id2 string); > create table test102 (id string,id2 string); > create table test103 (id string,id2 string); > create table test104 (id string,id2 string); > when cbo is false,run the following SQL statement: > explain select count(1) from test101 t1 > left join test102 t2 on t1.id=t2.id > left join test103 t3 on t1.id=t3.id2 > where t1.id in (select s.id from test104 s) > and t3.id2='123'; > you will see: > The filter criteria in the right table are lost. 
> The execution plan is as follows: > +-+ > | Explain > | > +-+ > | STAGE DEPENDENCIES: > | > | Stage-9 is a root stage > | > | Stage-3 depends on stages: Stage-9 > | > | Stage-0 depends on stages: Stage-3 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-9 > | > | Map Reduce Local Work > | > | Alias -> Map Local Tables: > | > | sq_1:s > | > | Fetch Operator > | > | limit: -1 > | > | t2 > | > | Fetch Operator > | > | limit: -1 > | > | t3 > | > | Fetch Operator > | > | limit: -1 > | > | Alias -> Map Local Operator Tree: > | > | sq_1:s > | > | TableScan > | > | alias: s > | > | Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL > Column stats: NONE | > | Filter Operator > | > | predicate: id is not null (type: boolean) > | > | Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL > Column stats: NONE | > | Select Operator > | > | expressions: id (type: string) > | > | outputColumnNames: _col0
[jira] [Comment Edited] (HIVE-26691) Generate thrift files by default at compilation time
[ https://issues.apache.org/jira/browse/HIVE-26691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627739#comment-17627739 ] Alessandro Solimando edited comment on HIVE-26691 at 11/2/22 2:26 PM: -- +1 for mentioning in the release notes and in the wiki. For upstream vs downstream, how do developers manage at the moment? On MacOS with brew I have multiple versions installed, and I switch with _brew unlink_ + {_}brew link{_}. This does not seem a problem to me. For the frequency of the update, I don't think it really matters, the same line of reasoning could apply for the JVM, mvn, etc., developers are required to have a proper setup for compiling, adding or removing thrift to that does not make a significant difference. For protobuf I guess the situation is similar, but I have never had to deal with that, I guess it can be addressed in a separate ticket. was (Author: asolimando): +1 for mentioning in the release notes and in the wiki. For upstream vs downstream, how do developers manage at the moment? On MacOS with brew I have multiple versions installed, and I switch with _brew unlink_ + {_}brew link{_}. This does not seem a problem to me. For the frequency of the update, I don't think it really matters, the same line of reasoning could apply for the JVM, mvn, etc., developers are required to have a proper setup for compiling, adding or removing thrift to that does not make a significant difference. For protobuf I guess the situation it's similar, but I have never had to deal with that. 
> Generate thrift files by default at compilation time > > > Key: HIVE-26691 > URL: https://issues.apache.org/jira/browse/HIVE-26691 > Project: Hive > Issue Type: Task > Components: Thrift API >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Priority: Major > > Currently Hive does not generate thrift files within the main compilation > task ({_}mvn clean install -DskipTests{_}), but it uses a separate profile > ({_}mvn clean install -Pthriftif -DskipTests -Dthrift.home=$thrift_path{_}), > and thrift-generated files are generally committed in VCS. > Other Apache projects like Parquet > ([https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml)] > use a different approach, building all thrift files by default in the main > compilation task. > In general, generated files should not be part of our VCS, only the "source" > file should be (.thrift files here). > Including generated files in VCS is not only problematic because they are > verbose and clog PR diffs, but they also generate a lot of conflicts (even > when the changes over the thrift file can be merged automatically). > The ticket proposes to move the thrift files generation at compile time, > remove the thrift-generated files from VCS, and add them to the "ignore" list. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26691) Generate thrift files by default at compilation time
[ https://issues.apache.org/jira/browse/HIVE-26691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627739#comment-17627739 ] Alessandro Solimando commented on HIVE-26691: - +1 for mentioning in the release notes and in the wiki. For upstream vs downstream, how do developers manage at the moment? On MacOS with brew I have multiple versions installed, and I switch with _brew unlink_ + {_}brew link{_}. This does not seem a problem to me. For the frequency of the update, I don't think it really matters, the same line of reasoning could apply for the JVM, mvn, etc., developers are required to have a proper setup for compiling, adding or removing thrift to that does not make a significant difference. For protobuf I guess the situation it's similar, but I have never had to deal with that. > Generate thrift files by default at compilation time > > > Key: HIVE-26691 > URL: https://issues.apache.org/jira/browse/HIVE-26691 > Project: Hive > Issue Type: Task > Components: Thrift API >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Priority: Major > > Currently Hive does not generate thrift files within the main compilation > task ({_}mvn clean install -DskipTests{_}), but it uses a separate profile > ({_}mvn clean install -Pthriftif -DskipTests -Dthrift.home=$thrift_path{_}), > and thrift-generated files are generally committed in VCS. > Other Apache projects like Parquet > ([https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml)] > use a different approach, building all thrift files by default in the main > compilation task. > In general, generated files should not be part of our VCS, only the "source" > file should be (.thrift files here). > Including generated files in VCS is not only problematic because they are > verbose and clog PR diffs, but they also generate a lot of conflicts (even > when the changes over the thrift file can be merged automatically). 
> The ticket proposes to move thrift file generation to compile time, > remove the thrift-generated files from VCS, and add them to the "ignore" list. -- This message was sent by Atlassian Jira (v8.20.10#820010)
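For reference, a minimal sketch of what binding thrift generation to the main compilation could look like in a module pom. This is an illustration only: the plugin choice (exec-maven-plugin), the {{${thrift.home}}} property, and the paths are assumptions, not the actual Hive or Parquet setup.

```xml
<!-- Sketch: run the thrift compiler during generate-sources instead of a
     separate -Pthriftif profile. All paths below are illustrative. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>generate-thrift-sources</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>exec</goal>
      </goals>
      <configuration>
        <executable>${thrift.home}/bin/thrift</executable>
        <arguments>
          <argument>--gen</argument>
          <argument>java</argument>
          <argument>-o</argument>
          <argument>${project.build.directory}/generated-sources/thrift</argument>
          <argument>${basedir}/src/main/thrift/hive_metastore.thrift</argument>
        </arguments>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The generated directory would then be registered as a source root (e.g. via build-helper-maven-plugin's add-source goal) and listed in .gitignore, matching the ticket's proposal to keep generated files out of VCS.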
[jira] [Updated] (HIVE-26572) Support constant expressions in vectorization
[ https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26572: Description: At the moment, we cannot vectorize aggregate expression having constant parameters in addition to the aggregation column (it's forbidden [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]). One compelling example of how this could help is [PR 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + _compute_bit_vector_fm_ when HLL implementation has been added, while _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used. Another example is {_}VectorUDAFBloomFilterMerge{_}, receiving an extra constant parameter controlling the number of threads for merging tasks. At the moment this parameter is "injected" when trying to find an appropriate constructor (see [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]). This ad-hoc approach is not scalable and would make the code hard to read and maintain if more UDAFs require constant parameters. In addition, we are probably missing vectorization opportunities if no such ad-hoc treatment is added but an appropriate UDAF constructor is available or could be easily added (data sketches UDAF, although not yet vectorized, are a good target). was: At the moment, we cannot vectorize aggregate expression having constant parameters in addition to the aggregation column (it's forbidden [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]). 
One compelling example of how this could help is [PR 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + _compute_bit_vector_fm_ when HLL implementation has been added, while _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used. Another example is _VectorUDAFBloomFilterMerge_, receiving an extra constant parameter controlling the number of threads for merging tasks. At the moment this parameter is "injected" when trying to find an appropriate constructor (see [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]). This ad-hoc approach is not scalable and would make the code hard to read and maintain if more UDAF requires constant parameters. In addition, we are probably missing vectorization opportunities if no such ad-hoc treatment is added but an appropriate UDAF constructor is available or could be easily added (data sketches UDAF, although not yet vectorized, are a good target). > Support constant expressions in vectorization > - > > Key: HIVE-26572 > URL: https://issues.apache.org/jira/browse/HIVE-26572 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > At the moment, we cannot vectorize aggregate expression having constant > parameters in addition to the aggregation column (it's forbidden > [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]). 
> One compelling example of how this could help is [PR > 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where > _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + > _compute_bit_vector_fm_ when HLL implementation has been added, while > _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used. > Another example is {_}VectorUDAFBloomFilterMerge{_}, receiving an extra > constant parameter controlling the number of threads for merging tasks. At > the moment this parameter is "injected" when trying to find an appropriate > constructor (see > [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]). > This ad-hoc approach is not scalable and would make the code hard to read and > maintain if more UDAFs require constant parameters. > In addition, we are
[jira] [Resolved] (HIVE-26572) Support constant expressions in vectorization
[ https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando resolved HIVE-26572. - Resolution: Fixed Fixed via [f7517bc|https://github.com/apache/hive/commit/f7517bcacad3e33c213fa3cfa8670dac1c25ee92], thanks [~dkuzmenko] and [~teddy.choi] for your reviews! > Support constant expressions in vectorization > - > > Key: HIVE-26572 > URL: https://issues.apache.org/jira/browse/HIVE-26572 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > At the moment, we cannot vectorize aggregate expression having constant > parameters in addition to the aggregation column (it's forbidden > [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]). > One compelling example of how this could help is [PR > 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where > _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + > _compute_bit_vector_fm_ when HLL implementation has been added, while > _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used. > Another example is _VectorUDAFBloomFilterMerge_, receiving an extra constant > parameter controlling the number of threads for merging tasks. At the moment > this parameter is "injected" when trying to find an appropriate constructor > (see > [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]). > This ad-hoc approach is not scalable and would make the code hard to read and > maintain if more UDAF requires constant parameters. 
> In addition, we are probably missing vectorization opportunities if no such > ad-hoc treatment is added but an appropriate UDAF constructor is available or > could be easily added (data sketches UDAF, although not yet vectorized, are a > good target). -- This message was sent by Atlassian Jira (v8.20.10#820010)
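The ad-hoc constant injection criticized in the description can be contrasted with a small, self-contained sketch of the generic alternative. This is hypothetical code, not Hive's actual VectorGroupByOperator logic: it only illustrates selecting an evaluator constructor by the arity of column-plus-constant arguments instead of hard-coding the injection per UDAF; all class and field names are made up for the example.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch of generic constructor matching for vectorized UDAF
// evaluators that take constant parameters in addition to the aggregation
// column (e.g. compute_bit_vector($col, 'HLL')).
public class ConstantParamMatch {

  // Stand-in evaluator with a (column, constant) constructor.
  public static class BitVectorEvaluator {
    public final String column;
    public final String algorithm; // constant parameter: "HLL" or "FM"

    public BitVectorEvaluator(String column, String algorithm) {
      this.column = column;
      this.algorithm = algorithm;
    }
  }

  // Pick the first public constructor whose parameter count matches the
  // combined column + constant argument list and instantiate it.
  public static Object instantiate(Class<?> clazz, Object[] args) {
    for (Constructor<?> c : clazz.getConstructors()) {
      if (c.getParameterCount() == args.length) {
        try {
          return c.newInstance(args);
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException(e);
        }
      }
    }
    throw new IllegalArgumentException("no constructor with arity " + args.length);
  }

  public static void main(String[] args) {
    BitVectorEvaluator e = (BitVectorEvaluator)
        instantiate(BitVectorEvaluator.class, new Object[] {"col0", "HLL"});
    System.out.println(e.column + " / " + e.algorithm); // prints "col0 / HLL"
  }
}
```

With such a lookup, adding a new UDAF with constant parameters would only require providing a matching constructor, rather than extending a hard-coded injection site.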
[jira] [Commented] (HIVE-26662) FAILED: SemanticException [Error 10072]: Database does not exist: spark_global_temp_views
[ https://issues.apache.org/jira/browse/HIVE-26662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17622975#comment-17622975 ] Alessandro Solimando commented on HIVE-26662: - This is open source Hive, for vendor specific setups like yours you should get in touch with the vendor support > FAILED: SemanticException [Error 10072]: Database does not exist: > spark_global_temp_views > - > > Key: HIVE-26662 > URL: https://issues.apache.org/jira/browse/HIVE-26662 > Project: Hive > Issue Type: Bug >Reporter: Mahmood Abu Awwad >Priority: Blocker > > while running our batches using Apache Spark with Hive on EMR cluster, as > we're using AWS glue as a MetaStore, it seems there is an issue occurs, which > is > {code:java} > EntityNotFoundException ,Database global_temp not found {code} > {code:java} > 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: ql.Driver (:()) - Completed compiling > command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); > Time taken: 0.02 seconds > 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: reexec.ReExecDriver (:()) - Execution #1 of query > 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a > lock manager > 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: ql.Driver (:()) - Executing > command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736): > show views > 2022-10-09T10:36:31,263 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: ql.Driver (:()) - Starting task [Stage-0:DDL] in serial mode > 2022-10-09T10:36:32,270 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: ql.Driver (:()) - Completed executing > command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); > Time taken: 1.008 seconds > 2022-10-09T10:36:32,270 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: 
ql.Driver (:()) - OK > 2022-10-09T10:36:32,270 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a > lock manager > 2022-10-09T10:36:32,271 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: exec.ListSinkOperator (:()) - RECORDS_OUT_INTERMEDIATE:0, > RECORDS_OUT_OPERATOR_LIST_SINK_0:0, > 2022-10-09T10:36:32,271 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: CliDriver (:()) - Time taken: 1.028 seconds > 2022-10-09T10:36:32,271 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: conf.HiveConf (HiveConf.java:getLogIdVar(5104)) - Using the > default value passed in for log id: 573c4ce0-f73c-439b-829d-1f0b25db45ec > 2022-10-09T10:36:32,272 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: session.SessionState (SessionState.java:resetThreadName(452)) - > Resetting thread name to main > 2022-10-09T10:36:46,512 INFO [main([])]: conf.HiveConf > (HiveConf.java:getLogIdVar(5104)) - Using the default value passed in for log > id: 573c4ce0-f73c-439b-829d-1f0b25db45ec > 2022-10-09T10:36:46,513 INFO [main([])]: session.SessionState > (SessionState.java:updateThreadName(441)) - Updating thread name to > 573c4ce0-f73c-439b-829d-1f0b25db45ec main > 2022-10-09T10:36:46,515 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: ql.Driver (:()) - Compiling > command(queryId=hadoop_20221009103646_f390a868-07d7-49f1-b620-70d40e5e2cff): > use global_temp > 2022-10-09T10:36:46,530 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a > lock manager > 2022-10-09T10:36:46,666 ERROR [573c4ce0-f73c-439b-829d-1f0b25db45ec > main([])]: ql.Driver (:()) - FAILED: SemanticException [Error 10072]: > Database does not exist: global_temp > org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: > global_temp > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:2171) > at > 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeSwitchDatabase(DDLSemanticAnalyzer.java:1413) > at > org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:516) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826) > at > org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773) > at > org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768) > at >
[jira] [Commented] (HIVE-26655) TPC-DS query 17 returns wrong results
[ https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17622974#comment-17622974 ] Alessandro Solimando commented on HIVE-26655: - No statistics means no CBO; there are many queries failing on the non-CBO path. CBO has been around for a long time now, and with HIVE-25880 it is now possible to disable specific rules in case of issues, so there is no need to turn off CBO entirely in the presence of bugs. All things considered, I guess it's time to consider removing the non-CBO path and stop supporting it, rather than trying to fix it. > TPC-DS query 17 returns wrong results > - > > Key: HIVE-26655 > URL: https://issues.apache.org/jira/browse/HIVE-26655 > Project: Hive > Issue Type: Sub-task >Reporter: Sungwoo Park >Priority: Major > > When tested with 100GB ORC tables, the number of rows returned by query 17 is > not stable. It returns fewer rows than the correct result (55 rows). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
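As a concrete illustration of the per-rule exclusion mechanism from HIVE-25880 mentioned in the comment above, the HIVE-26722 repro elsewhere in this archive disables a single rule via a regex over rule descriptions, rather than disabling CBO wholesale:

{code:sql}
-- Exclude only ReduceExpressionsRule(Project) from CBO planning,
-- instead of turning CBO off entirely (hive.cbo.enable=false).
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
{code}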
[jira] [Work started] (HIVE-26652) HiveSortPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields
[ https://issues.apache.org/jira/browse/HIVE-26652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26652 started by Alessandro Solimando. --- > HiveSortPullUpConstantsRule produces an invalid plan when pulling up > constants for nullable fields > -- > > Key: HIVE-26652 > URL: https://issues.apache.org/jira/browse/HIVE-26652 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0, 4.0.0-alpha-2 > > > The rule pulls up constants without checking/adjusting nullability to match > that of the field type. > Here is the stack-trace when a nullable type is involved: > {code:java} > java.lang.AssertionError: type mismatch: > ref: > JavaType(class java.lang.Integer) > input: > JavaType(int) NOT NULL at > org.apache.calcite.util.Litmus$1.fail(Litmus.java:31) > at org.apache.calcite.plan.RelOptUtil.eq(RelOptUtil.java:2167) > at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:125) > at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:57) > at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112) > at org.apache.calcite.rel.core.Project.isValid(Project.java:215) > at org.apache.calcite.rel.core.Project.(Project.java:94) > at org.apache.calcite.rel.core.Project.(Project.java:100) > at > org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.(HiveProject.java:58) > at > org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.copy(HiveProject.java:106) > at org.apache.calcite.rel.core.Project.copy(Project.java:126) > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveSortPullUpConstantsRule$HiveSortPullUpConstantsRuleBase.onMatch(HiveSortPullUpConstantsRule.java:195) > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) > at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) > at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.test(TestHiveSortExchangePullUpConstantsRule.java:104) > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.testNullableFields(TestHiveSortExchangePullUpConstantsRule.java:156) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.mockito.internal.runners.DefaultInternalRunner$1$1.evaluate(DefaultInternalRunner.java:54) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at > org.mockito.internal.runners.DefaultInternalRunner$1.run(DefaultInternalRunner.java:99) > at >
[jira] [Assigned] (HIVE-26652) HiveSortPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields
[ https://issues.apache.org/jira/browse/HIVE-26652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26652: --- > HiveSortPullUpConstantsRule produces an invalid plan when pulling up > constants for nullable fields > -- > > Key: HIVE-26652 > URL: https://issues.apache.org/jira/browse/HIVE-26652 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > The rule pulls up constants without checking/adjusting nullability to match > that of the field type. > Here is the stack-trace when a nullable type is involved: > {code:java} > java.lang.AssertionError: type mismatch: > ref: > JavaType(class java.lang.Integer) > input: > JavaType(int) NOT NULL at > org.apache.calcite.util.Litmus$1.fail(Litmus.java:31) > at org.apache.calcite.plan.RelOptUtil.eq(RelOptUtil.java:2167) > at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:125) > at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:57) > at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112) > at org.apache.calcite.rel.core.Project.isValid(Project.java:215) > at org.apache.calcite.rel.core.Project.(Project.java:94) > at org.apache.calcite.rel.core.Project.(Project.java:100) > at > org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.(HiveProject.java:58) > at > org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.copy(HiveProject.java:106) > at org.apache.calcite.rel.core.Project.copy(Project.java:126) > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveSortPullUpConstantsRule$HiveSortPullUpConstantsRuleBase.onMatch(HiveSortPullUpConstantsRule.java:195) > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) > at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) > at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.test(TestHiveSortExchangePullUpConstantsRule.java:104) > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.testNullableFields(TestHiveSortExchangePullUpConstantsRule.java:156) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.mockito.internal.runners.DefaultInternalRunner$1$1.evaluate(DefaultInternalRunner.java:54) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at > org.mockito.internal.runners.DefaultInternalRunner$1.run(DefaultInternalRunner.java:99) > at > org.mockito.internal.runners.DefaultInternalRunner.run(DefaultInternalRunner.java:105)
[jira] [Updated] (HIVE-26652) HiveSortPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields
[ https://issues.apache.org/jira/browse/HIVE-26652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26652: Fix Version/s: 4.0.0 > HiveSortPullUpConstantsRule produces an invalid plan when pulling up > constants for nullable fields > -- > > Key: HIVE-26652 > URL: https://issues.apache.org/jira/browse/HIVE-26652 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0, 4.0.0-alpha-2 > > > The rule pulls up constants without checking/adjusting nullability to match > that of the field type. > Here is the stack-trace when a nullable type is involved: > {code:java} > java.lang.AssertionError: type mismatch: > ref: > JavaType(class java.lang.Integer) > input: > JavaType(int) NOT NULL at > org.apache.calcite.util.Litmus$1.fail(Litmus.java:31) > at org.apache.calcite.plan.RelOptUtil.eq(RelOptUtil.java:2167) > at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:125) > at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:57) > at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112) > at org.apache.calcite.rel.core.Project.isValid(Project.java:215) > at org.apache.calcite.rel.core.Project.(Project.java:94) > at org.apache.calcite.rel.core.Project.(Project.java:100) > at > org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.(HiveProject.java:58) > at > org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.copy(HiveProject.java:106) > at org.apache.calcite.rel.core.Project.copy(Project.java:126) > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveSortPullUpConstantsRule$HiveSortPullUpConstantsRuleBase.onMatch(HiveSortPullUpConstantsRule.java:195) > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) > at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) > at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.test(TestHiveSortExchangePullUpConstantsRule.java:104) > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.testNullableFields(TestHiveSortExchangePullUpConstantsRule.java:156) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.mockito.internal.runners.DefaultInternalRunner$1$1.evaluate(DefaultInternalRunner.java:54) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at > org.mockito.internal.runners.DefaultInternalRunner$1.run(DefaultInternalRunner.java:99) > at >
[jira] [Comment Edited] (HIVE-26643) HiveUnionPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields
[ https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17618914#comment-17618914 ] Alessandro Solimando edited comment on HIVE-26643 at 10/18/22 9:41 AM: --- A similar issue was fixed for _AggregateProjectPullUpConstantsRule_ in CALCITE-2179, see [https://github.com/apache/calcite/commit/aa25dcbe565196fb6b78149042ee817427ed4f68#diff-ff4ebbdcaabdec1969e88cbeb4fa7519f5f867d9abdce2a333e1ebc8fc549a47R172-R176] was (Author: asolimando): A similar issue was fixed in this Calcite ticket for {_}AggregateProjectPullUpConstantsRule{_}, see [https://github.com/apache/calcite/commit/aa25dcbe565196fb6b78149042ee817427ed4f68#diff-ff4ebbdcaabdec1969e88cbeb4fa7519f5f867d9abdce2a333e1ebc8fc549a47R172-R176] from CALCITE-2179 > HiveUnionPullUpConstantsRule produces an invalid plan when pulling up > constants for nullable fields > --- > > Key: HIVE-26643 > URL: https://issues.apache.org/jira/browse/HIVE-26643 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The rule pulls up constants without checking/adjusting nullability to match > that of the field type. 
> Here is the stack-trace when a nullable type is involved: > {code:java} > java.lang.AssertionError: Cannot add expression of different type to set: > set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) > NOT NULL f2) NOT NULL > expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT > NULL f2) NOT NULL > set is > rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true) > expression is HiveProject(f1=[1], f2=[$0]) > HiveUnion(all=[true]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > {code} > The solution is to check nullability and add a cast when the field is > nullable, since the constant's type is not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
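The fix described in the ticket (cast the pulled-up constant when the target field is nullable, since literals are NOT NULL) would, in the actual rule, go through Calcite's RexBuilder.makeCast with matchNullability enabled. The snippet below is only a self-contained model of that decision, with plain strings standing in for RelDataType, so it runs without Calcite on the classpath; the class and method names are illustrative, not Hive's.

```java
// Minimal sketch, assuming the constant literal is always NOT NULL:
// a cast is only needed when the field it replaces is declared nullable,
// otherwise the projected row type diverges from the expected row type
// (the "type mismatch" assertion seen in the stack trace above).
class PullUpCastSketch {

    static boolean needsCast(boolean fieldIsNullable, boolean constantIsNullable) {
        return fieldIsNullable && !constantIsNullable;
    }

    /** Renders the projected expression, wrapping it in CAST when required. */
    static String project(String constant, String fieldType, boolean fieldIsNullable) {
        if (needsCast(fieldIsNullable, /* constantIsNullable= */ false)) {
            return "CAST(" + constant + " AS " + fieldType + ")"; // restores nullability
        }
        return constant;
    }

    public static void main(String[] args) {
        // f1 is JavaType(class java.lang.Integer): nullable, so literal 1 must be cast.
        System.out.println(project("1", "INTEGER", true));   // prints CAST(1 AS INTEGER)
        // f2 is JavaType(int) NOT NULL: the literal can be projected as-is.
        System.out.println(project("1", "INTEGER", false));  // prints 1
    }
}
```

In Calcite itself the equivalent call is rexBuilder.makeCast(fieldType, constant, true), which is exactly what the CALCITE-2179 fix referenced in the comments does for AggregateProjectPullUpConstantsRule.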
[jira] [Updated] (HIVE-26643) HiveUnionPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields
[ https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26643: Description: The rule pulls up constants without checking/adjusting nullability to match that of the field type. Here is the stack-trace when a nullable type is involved: {code:java} java.lang.AssertionError: Cannot add expression of different type to set: set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) NOT NULL f2) NOT NULL expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT NULL f2) NOT NULL set is rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true) expression is HiveProject(f1=[1], f2=[$0]) HiveUnion(all=[true]) HiveProject(f2=[$1]) HiveProject(f1=[$0], f2=[$1]) HiveFilter(condition=[=($0, 1)]) LogicalTableScan(table=[[]]) HiveProject(f2=[$1]) HiveProject(f1=[$0], f2=[$1]) HiveFilter(condition=[=($0, 1)]) LogicalTableScan(table=[[]]) {code} The solution is to check nullability and add a cast when the field is nullable, since the constant's type is not. was: The rule does pull up constants without checking/adjusting nullability to match that of the field type. 
Here is the stack-trace when a nullable type is involved: {code:java} java.lang.AssertionError: Cannot add expression of different type to set: set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) NOT NULL f2) NOT NULL expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT NULL f2) NOT NULL set is rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true) expression is HiveProject(f1=[1], f2=[$0]) HiveUnion(all=[true]) HiveProject(f2=[$1]) HiveProject(f1=[$0], f2=[$1]) HiveFilter(condition=[=($0, 1)]) LogicalTableScan(table=[[]]) HiveProject(f2=[$1]) HiveProject(f1=[$0], f2=[$1]) HiveFilter(condition=[=($0, 1)]) LogicalTableScan(table=[[]]) {code} The solution is to check nullability and add a cast when the field is nullable, since the constant's type is not. > HiveUnionPullUpConstantsRule produces an invalid plan when pulling up > constants for nullable fields > --- > > Key: HIVE-26643 > URL: https://issues.apache.org/jira/browse/HIVE-26643 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The rule pulls up constants without checking/adjusting nullability to match > that of the field type. 
> Here is the stack-trace when a nullable type is involved: > {code:java} > java.lang.AssertionError: Cannot add expression of different type to set: > set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) > NOT NULL f2) NOT NULL > expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT > NULL f2) NOT NULL > set is > rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true) > expression is HiveProject(f1=[1], f2=[$0]) > HiveUnion(all=[true]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > {code} > The solution is to check nullability and add a cast when the field is > nullable, since the constant's type is not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-26643) HiveUnionPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields
[ https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17618914#comment-17618914 ] Alessandro Solimando edited comment on HIVE-26643 at 10/17/22 1:52 PM: --- A similar issue was fixed in this Calcite ticket for {_}AggregateProjectPullUpConstantsRule{_}, see [https://github.com/apache/calcite/commit/aa25dcbe565196fb6b78149042ee817427ed4f68#diff-ff4ebbdcaabdec1969e88cbeb4fa7519f5f867d9abdce2a333e1ebc8fc549a47R172-R176] from CALCITE-2179 was (Author: asolimando): A similar issue was fixed in this Calcite ticket for _AggregateProjectPullUpConstantsRule_, see https://github.com/apache/calcite/commit/aa25dcbe565196fb6b78149042ee817427ed4f68#diff-ff4ebbdcaabdec1969e88cbeb4fa7519f5f867d9abdce2a333e1ebc8fc549a47R172-R176 > HiveUnionPullUpConstantsRule produces an invalid plan when pulling up > constants for nullable fields > --- > > Key: HIVE-26643 > URL: https://issues.apache.org/jira/browse/HIVE-26643 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > The rule does pull up constants without checking/adjusting nullability to > match that of the field type. 
> Here is the stack-trace when a nullable type is involved: > {code:java} > java.lang.AssertionError: Cannot add expression of different type to set: > set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) > NOT NULL f2) NOT NULL > expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT > NULL f2) NOT NULL > set is > rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true) > expression is HiveProject(f1=[1], f2=[$0]) > HiveUnion(all=[true]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > {code} > The solution is to check nullability and add a cast when the field is > nullable, since the constant's type is not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26643) HiveUnionPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields
[ https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17618914#comment-17618914 ] Alessandro Solimando commented on HIVE-26643: - A similar issue was fixed in this Calcite ticket for _AggregateProjectPullUpConstantsRule_, see https://github.com/apache/calcite/commit/aa25dcbe565196fb6b78149042ee817427ed4f68#diff-ff4ebbdcaabdec1969e88cbeb4fa7519f5f867d9abdce2a333e1ebc8fc549a47R172-R176 > HiveUnionPullUpConstantsRule produces an invalid plan when pulling up > constants for nullable fields > --- > > Key: HIVE-26643 > URL: https://issues.apache.org/jira/browse/HIVE-26643 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > The rule does pull up constants without checking/adjusting nullability to > match that of the field type. > Here is the stack-trace when a nullable type is involved: > {code:java} > java.lang.AssertionError: Cannot add expression of different type to set: > set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) > NOT NULL f2) NOT NULL > expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT > NULL f2) NOT NULL > set is > rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true) > expression is HiveProject(f1=[1], f2=[$0]) > HiveUnion(all=[true]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > {code} > The solution is to check nullability and add a cast when the field is > nullable, since the constant's type is not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26643) HiveUnionPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields
[ https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26643: Summary: HiveUnionPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields (was: HiveUnionPullUpConstantsRule fails when pulling up constants for nullable fields) > HiveUnionPullUpConstantsRule produces an invalid plan when pulling up > constants for nullable fields > --- > > Key: HIVE-26643 > URL: https://issues.apache.org/jira/browse/HIVE-26643 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > The rule does pull up constants without checking/adjusting nullability to > match that of the field type. > Here is the stack-trace when a nullable type is involved: > {code:java} > java.lang.AssertionError: Cannot add expression of different type to set: > set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) > NOT NULL f2) NOT NULL > expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT > NULL f2) NOT NULL > set is > rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true) > expression is HiveProject(f1=[1], f2=[$0]) > HiveUnion(all=[true]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > {code} > The solution is to check nullability and add a cast when the field is > nullable, since the constant's type is not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26643) HiveUnionPullUpConstantsRule fails when pulling up constants over nullable fields
[ https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26643: --- > HiveUnionPullUpConstantsRule fails when pulling up constants over nullable > fields > - > > Key: HIVE-26643 > URL: https://issues.apache.org/jira/browse/HIVE-26643 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > The rule does pull up constants without checking/adjusting nullability to > match that of the field type. > Here is the stack-trace when a nullable type is involved: > {code:java} > java.lang.AssertionError: Cannot add expression of different type to set: > set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) > NOT NULL f2) NOT NULL > expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT > NULL f2) NOT NULL > set is > rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true) > expression is HiveProject(f1=[1], f2=[$0]) > HiveUnion(all=[true]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > {code} > The solution is to check nullability and add a cast when the field is > nullable, since the constant's type is not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26643) HiveUnionPullUpConstantsRule fails when pulling up constants for nullable fields
[ https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26643: Summary: HiveUnionPullUpConstantsRule fails when pulling up constants for nullable fields (was: HiveUnionPullUpConstantsRule fails when pulling up constants over nullable fields) > HiveUnionPullUpConstantsRule fails when pulling up constants for nullable > fields > > > Key: HIVE-26643 > URL: https://issues.apache.org/jira/browse/HIVE-26643 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > The rule does pull up constants without checking/adjusting nullability to > match that of the field type. > Here is the stack-trace when a nullable type is involved: > {code:java} > java.lang.AssertionError: Cannot add expression of different type to set: > set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) > NOT NULL f2) NOT NULL > expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT > NULL f2) NOT NULL > set is > rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true) > expression is HiveProject(f1=[1], f2=[$0]) > HiveUnion(all=[true]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > {code} > The solution is to check nullability and add a cast when the field is > nullable, since the constant's type is not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HIVE-26643) HiveUnionPullUpConstantsRule fails when pulling up constants over nullable fields
[ https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26643 started by Alessandro Solimando. --- > HiveUnionPullUpConstantsRule fails when pulling up constants over nullable > fields > - > > Key: HIVE-26643 > URL: https://issues.apache.org/jira/browse/HIVE-26643 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > The rule does pull up constants without checking/adjusting nullability to > match that of the field type. > Here is the stack-trace when a nullable type is involved: > {code:java} > java.lang.AssertionError: Cannot add expression of different type to set: > set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) > NOT NULL f2) NOT NULL > expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT > NULL f2) NOT NULL > set is > rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true) > expression is HiveProject(f1=[1], f2=[$0]) > HiveUnion(all=[true]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > HiveProject(f2=[$1]) > HiveProject(f1=[$0], f2=[$1]) > HiveFilter(condition=[=($0, 1)]) > LogicalTableScan(table=[[]]) > {code} > The solution is to check nullability and add a cast when the field is > nullable, since the constant's type is not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26619) Sonar analysis is not run for the master branch
[ https://issues.apache.org/jira/browse/HIVE-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26619: Summary: Sonar analysis is not run for the master branch (was: Sonar analysis not run on the master branch) > Sonar analysis is not run for the master branch > --- > > Key: HIVE-26619 > URL: https://issues.apache.org/jira/browse/HIVE-26619 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > The analysis for the master branch was using the wrong variable > (_CHANGE_BRANCH_) instead of the branch-name variable (_BRANCH_NAME_). > For an overview of git-related environment variables available in Jenkins, > you can refer to [https://ci.eclipse.org/webtools/env-vars.html/]. > With [~zabetak] we noticed some spurious files in the Sonar analysis for > PRs; as per this Sonar support thread, it might be linked to the stale > analysis of the target branch (master for us): > [https://community.sonarsource.com/t/unrelated-files-scanned-in-sonarcloud-pr-check/47138/14] -- This message was sent by Atlassian Jira (v8.20.10#820010)
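The distinction behind the fix: in Jenkins multibranch pipelines, _CHANGE_BRANCH_ (like _CHANGE_ID_) is only defined for pull-request builds, while _BRANCH_NAME_ is defined for both PR and branch builds, so reading _CHANGE_BRANCH_ on master yields nothing. A hypothetical Jenkinsfile fragment illustrating the two cases follows; the sonar-scanner invocation and property names are standard SonarQube parameters, but this is not Hive's actual pipeline code.

```groovy
// Hypothetical Jenkinsfile fragment -- not Hive's actual pipeline.
// CHANGE_BRANCH/CHANGE_ID exist only for PR builds; BRANCH_NAME exists
// for branch builds (e.g. master) as well.
stage('Sonar') {
  steps {
    script {
      if (env.CHANGE_ID) {
        // PR build: analyze the PR against its target branch.
        sh "sonar-scanner -Dsonar.pullrequest.key=${env.CHANGE_ID} " +
           "-Dsonar.pullrequest.branch=${env.CHANGE_BRANCH}"
      } else {
        // Branch build: BRANCH_NAME is the right variable here.
        sh "sonar-scanner -Dsonar.branch.name=${env.BRANCH_NAME}"
      }
    }
  }
}
```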
[jira] [Work started] (HIVE-26619) Sonar analysis not run on the master branch
[ https://issues.apache.org/jira/browse/HIVE-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26619 started by Alessandro Solimando. --- > Sonar analysis not run on the master branch > --- > > Key: HIVE-26619 > URL: https://issues.apache.org/jira/browse/HIVE-26619 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > The analysis for the master branch was using the wrong variable > (_CHANGE_BRANCH_) instead of the branch-name variable (_BRANCH_NAME_). > For an overview of git-related environment variables available in Jenkins, > you can refer to [https://ci.eclipse.org/webtools/env-vars.html/]. > With [~zabetak] we noticed some spurious files in the Sonar analysis for > PRs; as per this Sonar support thread, it might be linked to the stale > analysis of the target branch (master for us): > [https://community.sonarsource.com/t/unrelated-files-scanned-in-sonarcloud-pr-check/47138/14] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26619) Sonar analysis not run on the master branch
[ https://issues.apache.org/jira/browse/HIVE-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26619: --- > Sonar analysis not run on the master branch > --- > > Key: HIVE-26619 > URL: https://issues.apache.org/jira/browse/HIVE-26619 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > The analysis for the master branch was using the wrong variable > (_CHANGE_BRANCH_) instead of the branch-name variable (_BRANCH_NAME_). > For an overview of git-related environment variables available in Jenkins, > you can refer to [https://ci.eclipse.org/webtools/env-vars.html/]. > With [~zabetak] we noticed some spurious files in the Sonar analysis for > PRs; as per this Sonar support thread, it might be linked to the stale > analysis of the target branch (master for us): > [https://community.sonarsource.com/t/unrelated-files-scanned-in-sonarcloud-pr-check/47138/14] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26572) Support constant expressions in vectorization
[ https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26572: Labels: pull-request-available (was: ) > Support constant expressions in vectorization > - > > Key: HIVE-26572 > URL: https://issues.apache.org/jira/browse/HIVE-26572 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > > At the moment, we cannot vectorize aggregate expressions having constant > parameters in addition to the aggregation column (it's forbidden > [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]). > One compelling example of how this could help is [PR > 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where > _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + > _compute_bit_vector_fm_ when the HLL implementation was added, while > _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used. > Another example is _VectorUDAFBloomFilterMerge_, receiving an extra constant > parameter controlling the number of threads for merging tasks. At the moment > this parameter is "injected" when trying to find an appropriate constructor > (see > [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]). > This ad-hoc approach is not scalable and would make the code hard to read and > maintain if more UDAFs require constant parameters. 
> In addition, we are probably missing vectorization opportunities if no such > ad-hoc treatment is added but an appropriate UDAF constructor is available or > could be easily added (data sketches UDAF, although not yet vectorized, are a > good target). -- This message was sent by Atlassian Jira (v8.20.10#820010)
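The constructor "injection" criticized above can be sketched as follows. This is a simplified, self-contained model of the pattern in VectorGroupByOperator.java#L1224-L1244, not the actual Hive code: the evaluator class, method names, and the (String)/(String, int) signatures are illustrative stand-ins.

```java
import java.lang.reflect.Constructor;

// Sketch of the ad-hoc lookup: try the constructor matching the declared
// argument types; on failure, append an extra constant int parameter
// (e.g. a thread count) and retry. This per-UDAF special casing is what
// generic constant-parameter support would remove.
class CtorLookupSketch {

    /** Stand-in for a vectorized UDAF evaluator taking an extra constant argument. */
    static class MergeEvaluator {
        final int numThreads;
        MergeEvaluator(String column, int numThreads) {
            this.numThreads = numThreads;
        }
    }

    static Object instantiate(Class<?> cls, String column, int injectedThreads) {
        try {
            try {
                // First attempt: only the aggregation column.
                Constructor<?> c = cls.getDeclaredConstructor(String.class);
                return c.newInstance(column);
            } catch (NoSuchMethodException e) {
                // Ad-hoc injection: append the constant parameter and retry.
                Constructor<?> c = cls.getDeclaredConstructor(String.class, int.class);
                return c.newInstance(column, injectedThreads);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MergeEvaluator ev = (MergeEvaluator) instantiate(MergeEvaluator.class, "col0", 4);
        System.out.println(ev.numThreads); // prints 4
    }
}
```

With first-class constant expressions in vectorization, the constant would arrive as a regular argument instead of being smuggled in through this fallback.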
[jira] [Updated] (HIVE-26572) Support constant expressions in vectorization
[ https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26572: Description: At the moment, we cannot vectorize aggregate expression having constant parameters in addition to the aggregation column (it's forbidden [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]). One compelling example of how this could help is [PR 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + _compute_bit_vector_fm_ when HLL implementation has been added, while _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used. Another example is _VectorUDAFBloomFilterMerge_, receiving an extra constant parameter controlling the number of threads for merging tasks. At the moment this parameter is "injected" when trying to find an appropriate constructor (see [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]). This ad-hoc approach is not scalable and would make the code hard to read and maintain if more UDAF requires constant parameters. In addition, we are probably missing vectorization opportunities if no such ad-hoc treatment is added but an appropriate UDAF constructor is available or could be easily added (data sketches UDAF, although not yet vectorized, are a good target). was: At the moment, we cannot vectorize aggregate expression having constant parameters in addition to the aggregation column (it's forbidden [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]). 
One compelling example of how this could help is [PR 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + _compute_bit_vector_fm_ when HLL implementation has been added, while _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used. > Support constant expressions in vectorization > - > > Key: HIVE-26572 > URL: https://issues.apache.org/jira/browse/HIVE-26572 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > At the moment, we cannot vectorize aggregate expression having constant > parameters in addition to the aggregation column (it's forbidden > [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]). > One compelling example of how this could help is [PR > 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where > _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + > _compute_bit_vector_fm_ when HLL implementation has been added, while > _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used. > Another example is _VectorUDAFBloomFilterMerge_, receiving an extra constant > parameter controlling the number of threads for merging tasks. At the moment > this parameter is "injected" when trying to find an appropriate constructor > (see > [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]). > This ad-hoc approach is not scalable and would make the code hard to read and > maintain if more UDAF requires constant parameters. 
> In addition, we are probably missing vectorization opportunities if no such > ad-hoc treatment is added but an appropriate UDAF constructor is available or > could be easily added (data sketches UDAF, although not yet vectorized, are a > good target). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26572) Support constant expressions in vectorization
[ https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26572: --- Assignee: Alessandro Solimando > Support constant expressions in vectorization > - > > Key: HIVE-26572 > URL: https://issues.apache.org/jira/browse/HIVE-26572 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > At the moment, we cannot vectorize aggregate expression having constant > parameters in addition to the aggregation column (it's forbidden > [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]). > One compelling example of how this could help is [PR > 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where > _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + > _compute_bit_vector_fm_ when HLL implementation has been added, while > _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HIVE-26572) Support constant expressions in vectorization
[ https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26572 started by Alessandro Solimando. --- > Support constant expressions in vectorization > - > > Key: HIVE-26572 > URL: https://issues.apache.org/jira/browse/HIVE-26572 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > At the moment, we cannot vectorize aggregate expression having constant > parameters in addition to the aggregation column (it's forbidden > [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]). > One compelling example of how this could help is [PR > 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where > _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + > _compute_bit_vector_fm_ when HLL implementation has been added, while > _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26572) Support constant expressions in vectorization
[ https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26572: Summary: Support constant expressions in vectorization (was: Support constant expressions in vectorized expressions) > Support constant expressions in vectorization > - > > Key: HIVE-26572 > URL: https://issues.apache.org/jira/browse/HIVE-26572 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Priority: Major > > At the moment, we cannot vectorize aggregate expression having constant > parameters in addition to the aggregation column (it's forbidden > [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]). > One compelling example of how this could help is [PR > 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where > _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + > _compute_bit_vector_fm_ when HLL implementation has been added, while > _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26221 started by Alessandro Solimando. --- > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. > Hive's current selectivity estimation for range predicates is based on a > hard-coded value of 1/3 (see > [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]). > The current proposal aims at integrating histograms as an additional column > statistic, stored in the Hive metastore at the table (or partition) level. > The main requirements for histogram integration are the following: > * efficiency: the approach must scale and support billions of rows > * merge-ability: partition-level histograms have to be merged to form > table-level histograms > * explicit and configurable trade-off between memory footprint and accuracy > Hive already integrates the [KLL data > sketches|https://datasketches.apache.org/docs/KLL/KLLSketch.html] UDAF. > Data sketches are small, stateful programs that process massive data-streams > and can provide approximate answers, with mathematical guarantees, to > computationally difficult queries orders-of-magnitude faster than > traditional, exact methods. 
> We propose to use KLL, and more specifically the cumulative distribution > function (CDF), as the underlying data structure for our histogram statistics. > The current proposal targets numeric data types (float, integer and numeric > families) and temporal data types (date and timestamp). -- This message was sent by Atlassian Jira (v8.20.10#820010)
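To illustrate the requirements above (merge-ability, CDF-driven buckets), here is a minimal Python sketch. It is an assumption-laden stand-in: `NaiveSketch` keeps exact sorted values instead of a real KLL sketch, but the interplay of merging partition-level state, querying the CDF, and cutting equi-height bucket boundaries is the same idea:

```python
# Illustrative sketch (assumption: not Hive/DataSketches code). A KLL sketch
# exposes quantile/CDF queries and supports merging; this naive exact
# stand-in shows how partition-level sketches merge into a table-level one
# and how quantiles of the CDF yield an equi-height histogram.
from bisect import bisect_right

class NaiveSketch:
    def __init__(self, values=()):
        self.values = sorted(values)

    def merge(self, other):
        # merge-ability: combine partition-level state into table-level state
        return NaiveSketch(self.values + other.values)

    def cdf(self, x):
        # fraction of values <= x
        return bisect_right(self.values, x) / len(self.values)

    def quantile(self, q):
        idx = min(int(q * len(self.values)), len(self.values) - 1)
        return self.values[idx]

    def equi_height_histogram(self, buckets):
        # bucket boundaries at evenly spaced quantiles: each bucket holds
        # roughly the same number of rows, which copes well with skew
        return [self.quantile(i / buckets) for i in range(1, buckets + 1)]

part1 = NaiveSketch([1, 2, 2, 3, 100])
part2 = NaiveSketch([4, 5, 100, 100, 100])
table = part1.merge(part2)
```

A real KLL sketch replaces the sorted list with a compact, bounded-memory summary, which is where the configurable memory/accuracy trade-off comes from.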
[jira] [Resolved] (HIVE-26548) Hive Data load without closing the session
[ https://issues.apache.org/jira/browse/HIVE-26548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando resolved HIVE-26548. - Resolution: Invalid > Hive Data load without closing the session > -- > > Key: HIVE-26548 > URL: https://issues.apache.org/jira/browse/HIVE-26548 > Project: Hive > Issue Type: New Feature > Environment: Test >Reporter: Ashok kumar >Priority: Major > > Hi, I am new to Hive and I want to understand the best way to load the > data below. > > I am receiving data for 50 countries, in a separate db for each country; each db > has 250 tables, and the dbs become available on different dates with a suffix. My > reporting team needs consolidated db data for their analysis. So I have > implemented a loop using a shell script to load data from each table and > insert it into the target table. With this approach Hive creates and closes a > session for each table, hence it takes days to complete the > process. Can anyone suggest the best way to implement this using shell and Hive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26548) Hive Data load without closing the session
[ https://issues.apache.org/jira/browse/HIVE-26548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607040#comment-17607040 ] Alessandro Solimando commented on HIVE-26548: - Jira tickets are for discussing bugs/features etc., not for user use cases. You can try reaching out to the Hive user mailing list instead by sending an email to "u...@hive.apache.org" (you need to register first in order to see the replies). > Hive Data load without closing the session > -- > > Key: HIVE-26548 > URL: https://issues.apache.org/jira/browse/HIVE-26548 > Project: Hive > Issue Type: New Feature > Environment: Test >Reporter: Ashok kumar >Priority: Major > > Hi i am new to hive and i want understand what is the best way to load the > below data. > > I am receving 50 countries data in a separate db for each country, each db > has 250 tables also the dbs will be avaible in different dates with suffix,my > reporting team needs consolidated db data for their analysis. So i have > implemented, a loop using shell script to load data from each table and > insert it into targrt table , with this approach hive is creating a seasion > and closing a session for each table ,hence it is taking days to complete the > process. Can any one help the best way to implement it using shell and hive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
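For reference, the usual way to avoid the per-table session overhead described in the question is to batch all statements into one script and execute it in a single client session (e.g. `beeline -f load_all.hql`), instead of invoking the client once per table. A hedged sketch that generates such a script (the database and table names are hypothetical, chosen only to mirror the per-country layout in the question):

```python
# Hedged sketch (hypothetical db/table names): emit one HQL script containing
# every INSERT so a single beeline session runs all of them, rather than
# opening and closing a session per table.

def build_load_script(countries, tables):
    statements = []
    for country in countries:
        for table in tables:
            statements.append(
                f"INSERT INTO TABLE consolidated.{table} "
                f"SELECT * FROM {country}_db.{table};"
            )
    return "\n".join(statements)

# 50 country dbs x 250 tables would produce 12500 statements in one script;
# two of each here for illustration.
script = build_load_script(["fr", "de"], ["orders", "customers"])
```

Writing `script` to a file and running it once with `beeline -f` keeps a single session open for the whole load.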
[jira] [Comment Edited] (HIVE-25848) Empty result for structs in point lookup optimization with vectorization on
[ https://issues.apache.org/jira/browse/HIVE-25848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603546#comment-17603546 ] Alessandro Solimando edited comment on HIVE-25848 at 9/13/22 12:25 PM: --- I gave a quick look at the transformation that _HivePointLookup_ is doing and there is nothing wrong there, I think you are right [~ghanko] that the problem lies in the vectorization handling of _IN_ clauses involving _struct_ (turning off CBO or that specific rule simply prevents the appearance of such clauses, but they are not the root cause of the issue). was (Author: asolimando): I gave a quick look at the transformation that _HivePointLookup_ is doing and there is nothing wrong there, I think you are right [~ghanko] that the problem lies in the vectorization handling of _IN_ clauses involving _struct._ > Empty result for structs in point lookup optimization with vectorization on > --- > > Key: HIVE-25848 > URL: https://issues.apache.org/jira/browse/HIVE-25848 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Repro steps: > {code:java} > set hive.fetch.task.conversion=none; > create table test (a string) partitioned by (y string, m string); > insert into test values ('aa', 2022, 1); > select * from test where (y=year(date_sub(current_date,4)) and > m=month(date_sub(current_date,4))) or (y=year(date_sub(current_date,10)) and > m=month(date_sub(current_date,10)) ); > --gives empty result{code} > Turning either of the feature below off yields to good result (1 row > expected): > {code:java} > set hive.optimize.point.lookup=false; > set hive.cbo.enable=false; > set hive.vectorized.execution.enabled=false; > {code} > Expected good result is: > {code} > +-+-+-+ > | test.a | test.y | test.m | > +-+-+-+ > | aa | 2022 | 1 | > +-+-+-+ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-25848) Empty result for structs in point lookup optimization with vectorization on
[ https://issues.apache.org/jira/browse/HIVE-25848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603546#comment-17603546 ] Alessandro Solimando commented on HIVE-25848: - I gave a quick look at the transformation that _HivePointLookup_ is doing and there is nothing wrong there, I think you are right [~ghanko] that the problem lies in the vectorization handling of _IN_ clauses involving _struct._ > Empty result for structs in point lookup optimization with vectorization on > --- > > Key: HIVE-25848 > URL: https://issues.apache.org/jira/browse/HIVE-25848 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Repro steps: > {code:java} > set hive.fetch.task.conversion=none; > create table test (a string) partitioned by (y string, m string); > insert into test values ('aa', 2022, 1); > select * from test where (y=year(date_sub(current_date,4)) and > m=month(date_sub(current_date,4))) or (y=year(date_sub(current_date,10)) and > m=month(date_sub(current_date,10)) ); > --gives empty result{code} > Turning either of the feature below off yields to good result (1 row > expected): > {code:java} > set hive.optimize.point.lookup=false; > set hive.cbo.enable=false; > set hive.vectorized.execution.enabled=false; > {code} > Expected good result is: > {code} > +-+-+-+ > | test.a | test.y | test.m | > +-+-+-+ > | aa | 2022 | 1 | > +-+-+-+ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
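To make the discussion concrete: the point-lookup rewrite turns an OR of equality conjunctions into a single IN over tuples ("structs"), and the two forms must stay semantically equivalent. The comment above places the bug in the vectorized evaluation of the rewritten form, not in the rewrite itself. A simplified Python model of the equivalence (an illustration, not Hive internals):

```python
# Simplified model (assumption: not Hive code) of the point-lookup rewrite.
# The original predicate is an OR of per-column equality conjunctions; the
# rewritten predicate is a tuple membership test over the same value pairs.

def or_of_conjunctions(row, cases):
    # original form: (y=a AND m=b) OR (y=c AND m=d) OR ...
    return any(row["y"] == y and row["m"] == m for (y, m) in cases)

def tuple_in(row, cases):
    # rewritten form: (y, m) IN ((a, b), (c, d), ...)
    return (row["y"], row["m"]) in set(cases)

matching = {"y": "2022", "m": "1"}
non_matching = {"y": "2021", "m": "1"}
cases = [("2022", "1"), ("2021", "12")]
```

The repro in the ticket is exactly a case where the rewritten struct IN, under vectorization, fails to return the row that the original OR form matches.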
[jira] [Commented] (HIVE-26400) Provide docker images for Hive
[ https://issues.apache.org/jira/browse/HIVE-26400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580687#comment-17580687 ] Alessandro Solimando commented on HIVE-26400: - [~dengzh], thanks for tackling this issue; improving the developer experience in Hive is very much needed. I too had problems with hive-dev-box at the beginning; as [~zabetak] said, it's very rich in features but the documentation could be improved and/or updated. My feeling is that there is too much overlap to just start from scratch once again (it would be the third project in this space, as already mentioned). Let's also keep in mind that hive-dev-box is used to run tests in CI; I feel that trying to integrate it into this repository and improving it would be the best investment for the community. In the process we could add or remove features as we see fit, but most importantly we must improve the documentation so that any newcomer can set it up easily without having to ask for help, as is the case now. WDYT? > Provide docker images for Hive > -- > > Key: HIVE-26400 > URL: https://issues.apache.org/jira/browse/HIVE-26400 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > Make Apache Hive be able to run inside docker container in pseudo-distributed > mode, with MySQL/Derby as its back database, provide the following: > * Quick-start/Debugging/Prepare a test env for Hive; > * Tools to build target image with specified version of Hive and its > dependencies; > * Images can be used as the basis for the Kubernetes operator. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs
[ https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578335#comment-17578335 ] Alessandro Solimando commented on HIVE-26196: - It's working for me, you must be indeed missing permissions but I am not an admin on Jenkins, I can't help with that > Integrate Sonar analysis for the master branch and PRs > -- > > Key: HIVE-26196 > URL: https://issues.apache.org/jira/browse/HIVE-26196 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > The aim of the ticket is to integrate SonarCloud analysis for the master > branch and PRs. > The ticket does not cover test coverage at the moment (it can be added in > follow-up tickets, if there is enough interest). > From preliminary tests, the analysis step requires 30 additional minutes for > the pipeline, but this step is run in parallel with the test run, so the > total end-to-end run-time is not affected. > The idea for this first integration is to track code quality metrics over new > commits in the master branch and for PRs, without any quality gate rules > (i.e., the analysis will never fail, independently of the values of the > quality metrics). > An example of analysis is available in the ASF Sonar account for Hive: [PR > analysis|https://sonarcloud.io/summary/new_code?id=apache_hive=3254] > After integrating the changes, PRs will also be decorated with a link to the > analysis to be able to better evaluate any pain points of the contribution at > an earlier stage, making the life of the reviewers a bit easier. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs
[ https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577820#comment-17577820 ] Alessandro Solimando commented on HIVE-26196: - Sure, on Jenkins side the token is handled in the credentials section: [http://ci.hive.apache.org/credentials/] On the SonarCloud side the token can be generated here: [https://sonarcloud.io/account/security] > Integrate Sonar analysis for the master branch and PRs > -- > > Key: HIVE-26196 > URL: https://issues.apache.org/jira/browse/HIVE-26196 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > The aim of the ticket is to integrate SonarCloud analysis for the master > branch and PRs. > The ticket does not cover test coverage at the moment (it can be added in > follow-up tickets, if there is enough interest). > From preliminary tests, the analysis step requires 30 additional minutes for > the pipeline, but this step is run in parallel with the test run, so the > total end-to-end run-time is not affected. > The idea for this first integration is to track code quality metrics over new > commits in the master branch and for PRs, without any quality gate rules > (i.e., the analysis will never fail, independently of the values of the > quality metrics). > An example of analysis is available in the ASF Sonar account for Hive: [PR > analysis|https://sonarcloud.io/summary/new_code?id=apache_hive=3254] > After integrating the changes, PRs will also be decorated with a link to the > analysis to be able to better evaluate any pain points of the contribution at > an earlier stage, making the life of the reviewers a bit easier. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs
[ https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577476#comment-17577476 ] Alessandro Solimando edited comment on HIVE-26196 at 8/9/22 4:00 PM: - I think you mean something like this [~zabetak]: [https://docs.sonarqube.org/latest/analysis/scan/sonarscanner-for-maven/] ? Basically there are information that need to be added at the pom level (maven sonar plugin and optionally some parameters which can also be passed to mvn via command line), the rest goes to the mvn command invocation as described in the guide. For what concerns jacoco the story is a bit more complicated and for projects with many submodules like Hive it's not straightforward how to set it up, will cover that in a follow-up ticket. EDIT: some more info on the SonarCloud part: * [https://sonarcloud.io/project/roles?id=apache_hive] <-- user permissions can be set here (it will be important to add at least all active committers here to mark false positives for reviews) * [https://sonarcloud.io/project/quality_gate?id=apache_hive] <-- quality gate can be set here among existing quality gates, if creating a new one is needed you need to contact infra (see INFRA-23557) * I had to add a sonar token to hive ci This is all it took to get it up and running as it is now was (Author: asolimando): I think you mean something like this [~zabetak]: [https://docs.sonarqube.org/latest/analysis/scan/sonarscanner-for-maven/] ? Basically there are information that need to be added at the pom level (maven sonar plugin and optionally some parameters which can also be passed to mvn via command line), the rest goes to the mvn command invocation as described in the guide. For what concerns jacoco the story is a bit more complicated and for projects with many submodules like Hive it's not straightforward how to set it up, will cover that in a follow-up ticket. 
> Integrate Sonar analysis for the master branch and PRs > -- > > Key: HIVE-26196 > URL: https://issues.apache.org/jira/browse/HIVE-26196 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > The aim of the ticket is to integrate SonarCloud analysis for the master > branch and PRs. > The ticket does not cover test coverage at the moment (it can be added in > follow-up tickets, if there is enough interest). > From preliminary tests, the analysis step requires 30 additional minutes for > the pipeline, but this step is run in parallel with the test run, so the > total end-to-end run-time is not affected. > The idea for this first integration is to track code quality metrics over new > commits in the master branch and for PRs, without any quality gate rules > (i.e., the analysis will never fail, independently of the values of the > quality metrics). > An example of analysis is available in the ASF Sonar account for Hive: [PR > analysis|https://sonarcloud.io/summary/new_code?id=apache_hive=3254] > After integrating the changes, PRs will also be decorated with a link to the > analysis to be able to better evaluate any pain points of the contribution at > an earlier stage, making the life of the reviewers a bit easier. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs
[ https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577476#comment-17577476 ] Alessandro Solimando commented on HIVE-26196: - I think you mean something like this [~zabetak]: [https://docs.sonarqube.org/latest/analysis/scan/sonarscanner-for-maven/] ? Basically there are information that need to be added at the pom level (maven sonar plugin and optionally some parameters which can also be passed to mvn via command line), the rest goes to the mvn command invocation as described in the guide. For what concerns jacoco the story is a bit more complicated and for projects with many submodules like Hive it's not straightforward how to set it up, will cover that in a follow-up ticket. > Integrate Sonar analysis for the master branch and PRs > -- > > Key: HIVE-26196 > URL: https://issues.apache.org/jira/browse/HIVE-26196 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > The aim of the ticket is to integrate SonarCloud analysis for the master > branch and PRs. > The ticket does not cover test coverage at the moment (it can be added in > follow-up tickets, if there is enough interest). > From preliminary tests, the analysis step requires 30 additional minutes for > the pipeline, but this step is run in parallel with the test run, so the > total end-to-end run-time is not affected. > The idea for this first integration is to track code quality metrics over new > commits in the master branch and for PRs, without any quality gate rules > (i.e., the analysis will never fail, independently of the values of the > quality metrics). 
> An example of analysis is available in the ASF Sonar account for Hive: [PR > analysis|https://sonarcloud.io/summary/new_code?id=apache_hive=3254] > After integrating the changes, PRs will also be decorated with a link to the > analysis to be able to better evaluate any pain points of the contribution at > an earlier stage, making the life of the reviewers a bit easier. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-25909) Add test for 'hive.default.nulls.last' property for windows with ordering
[ https://issues.apache.org/jira/browse/HIVE-25909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567331#comment-17567331 ] Alessandro Solimando commented on HIVE-25909: - For the record, I could not find what the SQL standard dictates for the default of ORDER BY in windowing functions regarding NULLs, so I have compared against the major RDBMSs and found that Hive is aligned with all of them (apart from MySQL); my findings are as follows:
{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username, action ORDER BY action DESC, amount DESC) FROM event;

Oracle 11g R2:
john  buy   (null)  1
john  buy   39      2
john  buy   25      3
john  sell  20      1
john  sell  3       2

MySQL 8.0:
john  buy   39      1
john  buy   25      2
john  buy   null    3
john  sell  20      1
john  sell  3       2

Postgres 13:
john  sell  20      1
john  sell  3       2
john  buy   null    1
john  buy   39      2
john  buy   25      3

Hive:
john  sell  20      1
john  sell  3       2
john  buy   NULL    1
john  buy   39      2
john  buy   25      3
{noformat}
{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username, action ORDER BY action DESC, amount DESC NULLS LAST) FROM event;

Oracle 11g R2:
john  buy   39      1
john  buy   25      2
john  buy   (null)  3
john  sell  20      1
john  sell  3       2

MySQL 8.0: it does not support "NULLS LAST" syntax

Postgres 13:
john  sell  20      1
john  sell  3       2
john  buy   39      1
john  buy   25      2
john  buy   null    3

Hive:
john  sell  20      1
john  sell  3       2
john  buy   39      1
john  buy   25      2
john  buy   NULL    3
{noformat}
{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username, action ORDER BY action DESC, amount DESC NULLS FIRST) FROM event;

Oracle 11g R2:
john  buy   (null)  1
john  buy   39      2
john  buy   25      3
john  sell  20      1
john  sell  3       2

MySQL 8.0: it does not support "NULLS FIRST" syntax

Postgres 13:
john  sell  20      1
john  sell  3       2
john  buy   null    1
john  buy   39      2
john  buy   25      3

Hive:
john  sell  20      1
john  sell  3       2
john  buy   NULL    1
john  buy   39      2
john  buy   25      3
{noformat}
> Add test for 'hive.default.nulls.last' property for windows with ordering > - > > Key: HIVE-25909 > URL: https://issues.apache.org/jira/browse/HIVE-25909 > Project: Hive > Issue Type: Test > Components: CBO >Affects Versions: 4.0.0 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0, 4.0.0-alpha-1 > > Time Spent: 1h > Remaining Estimate: 0h > > Add a test around the "hive.default.nulls.last" configuration property and its > interaction with order by clauses within windows. > The property is known to behave as follows: > > ||hive.default.nulls.last||ASC||DESC|| > |true|NULLS LAST|NULLS FIRST| > |false|NULLS FIRST|NULLS LAST| > > The test can be based along the lines of the following examples: > {noformat} > -- hive.default.nulls.last is true by default, it sets NULLS_FIRST for DESC > set hive.default.nulls.last; > OUT: > hive.default.nulls.last=true > SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC) > FROM test1; > OUT: > John Doe 1990-05-10 00:00:00 2022-01-10 00:00:00 1 > John Doe 1990-05-10 00:00:00 2021-12-10 00:00:00 2 > John Doe 1990-05-10 00:00:00 2021-11-10 00:00:00 3 > John Doe 1990-05-10 00:00:00 2021-10-10 00:00:00 4 > John Doe 1990-05-10 00:00:00 2021-09-10 00:00:00 5 > John Doe 1987-05-10 00:00:00 NULL 1 > John Doe 1987-05-10 00:00:00 2022-01-10 00:00:00 2 > John Doe 1987-05-10 00:00:00 2021-12-10 00:00:00 3 > John Doe 1987-05-10 00:00:00 2021-11-10 00:00:00 4 > John Doe 1987-05-10 00:00:00 2021-10-10 00:00:00 5 > -- we set hive.default.nulls.last=false, it sets NULLS_LAST for DESC > set hive.default.nulls.last=false; > SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC) > FROM test1; > OUT: > John Doe 1990-05-10 00:00:00 2022-01-10 00:00:00 1 > John Doe 1990-05-10 00:00:00 2021-12-10 00:00:00 2 > John Doe 1990-05-10 00:00:00 2021-11-10 00:00:00 3 > John Doe 1990-05-10 00:00:00 2021-10-10 00:00:00 4 > John Doe 1990-05-10 00:00:00 2021-09-10
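The NULLS FIRST/NULLS LAST semantics compared above can be modeled compactly. This is an illustrative Python sketch (SQL NULL modeled as `None`), not Hive code:

```python
# Illustrative model of ORDER BY ... DESC with explicit null placement;
# SQL NULL is modeled as Python None.

def sort_desc(values, nulls_last):
    non_null = sorted((v for v in values if v is not None), reverse=True)
    nulls = [None] * sum(v is None for v in values)
    return non_null + nulls if nulls_last else nulls + non_null

amounts = [39, None, 25]
# Oracle/Postgres default for DESC (nulls first), matching Hive with
# hive.default.nulls.last=true:
oracle_like = sort_desc(amounts, nulls_last=False)
# MySQL default for DESC (nulls last):
mysql_like = sort_desc(amounts, nulls_last=True)
```

The two calls reproduce the divergence the comment observed: only the placement of the NULL row differs between the engines.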
[jira] [Commented] (HIVE-26383) OOM during join query
[ https://issues.apache.org/jira/browse/HIVE-26383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564956#comment-17564956 ] Alessandro Solimando commented on HIVE-26383: - [~pkumarsinha], does it reproduce if you trim the table/query further? > OOM during join query > - > > Key: HIVE-26383 > URL: https://issues.apache.org/jira/browse/HIVE-26383 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Priority: Major > > {code:java} > [ERROR] > org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[innerjoin_cal_with_insert] > Time elapsed: 100.73 s <<< ERROR! > java.lang.OutOfMemoryError: GC overhead limit exceeded > at java.util.HashMap.newTreeNode(HashMap.java:1784) > at java.util.HashMap$TreeNode.putTreeVal(HashMap.java:2029) > at java.util.HashMap.putVal(HashMap.java:639) > at java.util.HashMap.put(HashMap.java:613) > at java.util.HashSet.add(HashSet.java:220) > at > org.apache.hadoop.hive.ql.optimizer.calcite.stats.EstimateUniqueKeys.getUniqueKeys(EstimateUniqueKeys.java:229) > at > org.apache.hadoop.hive.ql.optimizer.calcite.stats.EstimateUniqueKeys.getUniqueKeys(EstimateUniqueKeys.java:304) > at > org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.isKey(HiveRelMdRowCount.java:501) > at > org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.analyzeJoinForPKFK(HiveRelMdRowCount.java:302) > at > org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:102) > at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) > at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) > at > org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:212) > at > org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1882) > at > org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1756) > at > 
org.apache.calcite.rel.rules.LoptOptimizeJoinRule.addToTop(LoptOptimizeJoinRule.java:1233) > at > org.apache.calcite.rel.rules.LoptOptimizeJoinRule.addFactorToTree(LoptOptimizeJoinRule.java:927) > at > org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createOrdering(LoptOptimizeJoinRule.java:728) > at > org.apache.calcite.rel.rules.LoptOptimizeJoinRule.findBestOrderings(LoptOptimizeJoinRule.java:459) > at > org.apache.calcite.rel.rules.LoptOptimizeJoinRule.onMatch(LoptOptimizeJoinRule.java:128) > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) > at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) > at > org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at > org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2468) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2427) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyJoinOrderingTransform(CalcitePlanner.java:2193) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1750) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1605) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-20628) Parsing error when using a complex map data type under dynamic column masking
[ https://issues.apache.org/jira/browse/HIVE-20628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando resolved HIVE-20628. - Resolution: Invalid This is neither a bug nor a regression, since complex data types have never been supported by Hive. > Parsing error when using a complex map data type under dynamic column masking > - > > Key: HIVE-20628 > URL: https://issues.apache.org/jira/browse/HIVE-20628 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2, Parser, Security >Affects Versions: 2.1.0 > Environment: The error can be simulated using the HDP 2.6.4 sandbox >Reporter: Darryl Dutton >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When trying to use the map complex data type as part of a dynamic column mask, > Hive throws a parsing error as it is expecting a primitive type (see trace > pasted below). The use case is trying to apply masking to elements within a > map type by applying a custom Hive UDF (to apply the mask) using Ranger. > Expect Hive to support complex data types for masking in addition to the > primitive types. The exception occurs when Hive needs to evaluate the UDF or > apply a standard mask (pass-through works as expected). You can recreate the > problem by creating a simple table with a map data type column, then applying > the masking to that column through a Ranger resource-based policy and a > custom function (you can use a standard Hive UDF str_to_map('F4','') to > simulate returning a map). 
> CREATE TABLE `mask_test`( > `key` string, > `value` map) > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > > INSERT INTO TABLE mask_test > SELECT 'AAA' as key, > map('F1','2022','F2','','F3','333') as value > FROM (select 1 ) as temp; > > > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException: > line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type > specification > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218) > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146) > ... 15 more > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize > input near 'map' '<' 'string' in primitive type specification > at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214) > at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171) > at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26378) Improve error message for masking over complex data types
[ https://issues.apache.org/jira/browse/HIVE-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26378: Description: The current error when applying column masking over (unsupported) complex data types could be improved and be more explicit. Currently, the thrown error is as follows: {noformat} Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException: line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type specification at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146) ... 
15 more Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type specification at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368) {noformat} > Improve error message for masking over complex data types > - > > Key: HIVE-26378 > URL: https://issues.apache.org/jira/browse/HIVE-26378 > Project: Hive > Issue Type: Improvement > Components: HiveServer2, Security >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > The current error when applying column masking over (unsupported) complex > data types could be improved and be more explicit. > Currently, the thrown error is as follows: > {noformat} > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException: > line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type > specification > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218) > at > 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146) > ... 15 more > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize > input near 'map' '<' 'string' in primitive type specification > at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214) > at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171) > at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)