[jira] [Comment Edited] (HIVE-25952) Drop HiveRelMdPredicates::getPredicates(Project...) to use that of RelMdPredicates

2024-07-23 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868103#comment-17868103
 ] 

Alessandro Solimando edited comment on HIVE-25952 at 7/23/24 3:44 PM:
--

It's been a long time but IIRC, I marked HIVE-25966 as blocking this ticket due 
to 
[HiveRelMdPredicates.java#L160|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L160].

It seems that the opposite should be true: we can't fix HIVE-25966 (and adopt 
Calcite's machinery) without making them agree on what a constant is (the 
RexCall case you mentioned, with the imprecision you correctly spotted).

Most probably the test results are not available anymore, but the divergence 
between Hive and Calcite on what counts as a constant was causing some issues 
that had to be fixed.

EDIT: once the run for your PR is over, we will probably see which issues 
blocked me back then when working on it. I don't remember how big those changes 
were, but they were certainly there.


was (Author: asolimando):
It's been a long time but IIRC, I marked HIVE-25966 as blocking this ticket due 
to 
[HiveRelMdPredicates.java#L160|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L160].

It seems that the opposite should be true: we can't fix HIVE-25966 (and adopt 
Calcite's machinery) without making them agree on what a constant is (the 
RexCall case you mentioned, with the imprecision you correctly spotted).

Most probably the test results are not available anymore, but the divergence 
between Hive and Calcite on what counts as a constant was causing some issues 
that had to be fixed.

> Drop HiveRelMdPredicates::getPredicates(Project...) to use that of 
> RelMdPredicates
> --
>
> Key: HIVE-25952
> URL: https://issues.apache.org/jira/browse/HIVE-25952
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> There are some differences in this method between Hive and Calcite; the idea 
> of this ticket is to unify the two methods and then drop the override in 
> HiveRelMdPredicates in favour of the method of RelMdPredicates.
> After applying HIVE-25966, the only difference is in the test for constant 
> expressions, which can be summarized as follows:
> ||Expression Type||Is Constant for Hive?||Is Constant for Calcite?||
> |InputRef|False|False|
> |Call|True if function is deterministic (arguments are not checked), false 
> otherwise|True if function is deterministic and all operands are constants, 
> false otherwise|
> |CorrelatedVariable|False|False|
> |LocalRef|False|False|
> |Over|False|False|
> |DynamicParameter|False|True|
> |RangeRef|False|False|
> |FieldAccess|False|Given expr.field, true if expr is constant, false 
> otherwise|
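
For illustration, a minimal sketch of the two notions of "constant" summarized in the table above. The class names come from Calcite's RexNode hierarchy; the helper itself is hypothetical and is not the actual HiveRelMdPredicates/RexUtil code (field-access handling is omitted for brevity):

{code:java}
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexDynamicParam;
import org.apache.calcite.rex.RexLiteral;
import org.apache.calcite.rex.RexNode;

// Hypothetical helper contrasting the two behaviours described in the table.
final class ConstantCheckSketch {

  // Hive-style: a call is "constant" as soon as its operator is deterministic,
  // without inspecting the operands.
  static boolean isConstantHiveStyle(RexNode node) {
    if (node instanceof RexCall) {
      return ((RexCall) node).getOperator().isDeterministic();
    }
    // input refs, dynamic parameters, correlated variables, OVER, ... are not constant
    return node instanceof RexLiteral;
  }

  // Calcite-style: deterministic operator AND all operands constant;
  // a dynamic parameter also counts as constant.
  static boolean isConstantCalciteStyle(RexNode node) {
    if (node instanceof RexLiteral || node instanceof RexDynamicParam) {
      return true;
    }
    if (node instanceof RexCall) {
      RexCall call = (RexCall) node;
      return call.getOperator().isDeterministic()
          && call.getOperands().stream().allMatch(ConstantCheckSketch::isConstantCalciteStyle);
    }
    return false; // RexInputRef, RexCorrelVariable, RexOver, RexRangeRef, ...
  }
}
{code}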



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-25952) Drop HiveRelMdPredicates::getPredicates(Project...) to use that of RelMdPredicates

2024-07-23 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868103#comment-17868103
 ] 

Alessandro Solimando commented on HIVE-25952:
-

It's been a long time but IIRC, I marked HIVE-25966 as blocking this ticket due 
to 
[HiveRelMdPredicates.java#L160|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L160].

It seems that the opposite should be true: we can't fix HIVE-25966 (and adopt 
Calcite's machinery) without making them agree on what a constant is (the 
RexCall case you mentioned, with the imprecision you correctly spotted).

Most probably the test results are not available anymore, but the divergence 
between Hive and Calcite on what counts as a constant was causing some issues 
that had to be fixed.

> Drop HiveRelMdPredicates::getPredicates(Project...) to use that of 
> RelMdPredicates
> --
>
> Key: HIVE-25952
> URL: https://issues.apache.org/jira/browse/HIVE-25952
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> There are some differences in this method between Hive and Calcite; the idea 
> of this ticket is to unify the two methods and then drop the override in 
> HiveRelMdPredicates in favour of the method of RelMdPredicates.
> After applying HIVE-25966, the only difference is in the test for constant 
> expressions, which can be summarized as follows:
> ||Expression Type||Is Constant for Hive?||Is Constant for Calcite?||
> |InputRef|False|False|
> |Call|True if function is deterministic (arguments are not checked), false 
> otherwise|True if function is deterministic and all operands are constants, 
> false otherwise|
> |CorrelatedVariable|False|False|
> |LocalRef|False|False|
> |Over|False|False|
> |DynamicParameter|False|True|
> |RangeRef|False|False|
> |FieldAccess|False|Given expr.field, true if expr is constant, false 
> otherwise|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28264) OOM/slow compilation when query contains SELECT clauses with nested expressions

2024-05-16 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847000#comment-17847000
 ] 

Alessandro Solimando commented on HIVE-28264:
-

I guess the problem applies to the respective Calcite rules from which the Hive 
ones were derived; do you know if that has been addressed there?

> OOM/slow compilation when query contains SELECT clauses with nested 
> expressions
> ---
>
> Key: HIVE-28264
> URL: https://issues.apache.org/jira/browse/HIVE-28264
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, HiveServer2
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> {code:sql}
> CREATE TABLE t0 (`title` string);
> SELECT x10 from
> (SELECT concat_ws('L10',x9, x9, x9, x9) as x10 from
> (SELECT concat_ws('L9',x8, x8, x8, x8) as x9 from
> (SELECT concat_ws('L8',x7, x7, x7, x7) as x8 from
> (SELECT concat_ws('L7',x6, x6, x6, x6) as x7 from
> (SELECT concat_ws('L6',x5, x5, x5, x5) as x6 from
> (SELECT concat_ws('L5',x4, x4, x4, x4) as x5 from
> (SELECT concat_ws('L4',x3, x3, x3, x3) as x4 from
> (SELECT concat_ws('L3',x2, x2, x2, x2) as x3 
> from
> (SELECT concat_ws('L2',x1, x1, x1, x1) as 
> x2 from
> (SELECT concat_ws('L1',x0, x0, x0, 
> x0) as x1 from
> (SELECT concat_ws('L0',title, 
> title, title, title) as x0 from t0) t1) t2) t3) t4) t5) t6) t7) t8) t9) t10) t
> WHERE x10 = 'Something';
> {code}
> The query above fails with OOM when run with the TestMiniLlapLocalCliDriver 
> and the default max heap size configuration effective for tests (-Xmx2048m).
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at java.util.Arrays.copyOf(Arrays.java:3332)
>   at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
>   at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
>   at java.lang.StringBuilder.append(StringBuilder.java:136)
>   at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:152)
>   at org.apache.calcite.rex.RexCall.toString(RexCall.java:165)
>   at org.apache.calcite.rex.RexCall.appendOperands(RexCall.java:105)
>   at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:151)
>   at org.apache.calcite.rex.RexCall.toString(RexCall.java:165)
>   at java.lang.String.valueOf(String.java:2994)
>   at java.lang.StringBuilder.append(StringBuilder.java:131)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:90)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144)
>   at 
> org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explainInputs(RelWriterImpl.java:122)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:116)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144)
>   at 
> org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
>   at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2308)
>   at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2292)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RuleEventLogger.ruleProductionSucceeded(RuleEventLogger.java:73)
>   at 
> org.apache.calcite.plan.MulticastRelOptListener.ruleProductionSucceeded(MulticastRelOptListener.java:68)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.notifyTransformation(AbstractRelOptPlanner.java:370)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyTransformationResults(HepPlanner.java:702)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:545)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:271)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2452)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2411)
> {noformat}
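
For context, a rough sketch of why the digest computation blows up (assuming the projections end up merged so that each alias is inlined, which the huge RexCall in the stack trace suggests): every level references the previous alias four times, so after the 11 concat_ws levels the fully inlined expression contains on the order of 4^11 ≈ 4.2 million references to `title`, and RexCall#toString materializes a string of that magnitude. The numbers below are only a back-of-the-envelope illustration, not Calcite internals:

{code:java}
// Estimate of the inlined expression size for the query above (L0..L10).
public class DigestGrowthSketch {
  public static void main(String[] args) {
    long leafRefs = 1;
    for (int level = 0; level <= 10; level++) {
      leafRefs *= 4; // each concat_ws repeats the inner alias 4 times
      System.out.printf("after L%d: %,d references to 'title'%n", level, leafRefs);
    }
    // after L10: 4,194,304 references -> the digest string is of that order of magnitude
  }
}
{code}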



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (HIVE-26313) Aggregate all column statistics into a single field in metastore

2023-03-20 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702796#comment-17702796
 ] 

Alessandro Solimando commented on HIVE-26313:
-

[~zabetak] there is a draft branch available 
[here|https://github.com/asolimando/hive/tree/master-HIVE-26313-statistics_blob]; 
it's the one [~veghlaci05] is referring to.

The code is based on the _Jackson_ and _Immutables_ libraries and has the 
building blocks to serialize statistics to _JSON_ and deserialize them.

What I was aiming for was to keep both the individual columns and the blob, 
test that everything worked end-to-end (comparing both), and then remove and 
clean up the individual-columns version once I was happy with the result.

That's the reason why you see many unnecessary serializations and 
deserializations of the JSON blob.

At the same time, the idea was to also simplify the subclasses of 
_ColumnStatsMerger_ and push more complexity into each class (in line with 
HIVE-27000, but pushed even further).

I am not currently working on it, so if you are interested feel free to pick 
this up and use the branch if it's useful.
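
For reference, a minimal sketch of the Jackson-based serialization approach described above; the POJO, its field names, and the class names are purely illustrative assumptions, not the actual classes from the draft branch (which uses Immutables-generated types):

{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical stats payload stored in the STATISTICS column.
class ColumnStatsBlob {
  public Long numNulls;
  public Long numDVs;
  public Double avgColLen;
  public Long maxColLen;
}

public class StatsBlobSerDe {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Serialize the stats object to the JSON string stored in the STATISTICS column,
  // e.g. {"numNulls":3,"numDVs":42,"avgColLen":7.5,"maxColLen":128}
  static String toJson(ColumnStatsBlob stats) throws Exception {
    return MAPPER.writeValueAsString(stats);
  }

  // "Schema-on-read": each consumer deserializes only the fields it knows about;
  // unknown fields can be ignored via Jackson configuration.
  static ColumnStatsBlob fromJson(String json) throws Exception {
    return MAPPER.readValue(json, ColumnStatsBlob.class);
  }
}
{code}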

> Aggregate all column statistics into a single field in metastore
> 
>
> Key: HIVE-26313
> URL: https://issues.apache.org/jira/browse/HIVE-26313
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore, Statistics
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Priority: Major
>  Labels: backward-incompatible
>
> At the moment, column statistics tables in the metastore schema look like 
> this (it's similar for _PART_COL_STATS_):
> {noformat}
> CREATE TABLE "APP"."TAB_COL_STATS"(
> "CAT_NAME" VARCHAR(256) NOT NULL,
> "DB_NAME" VARCHAR(128) NOT NULL,
> "TABLE_NAME" VARCHAR(256) NOT NULL,
> "COLUMN_NAME" VARCHAR(767) NOT NULL,
> "COLUMN_TYPE" VARCHAR(128) NOT NULL,
> "LONG_LOW_VALUE" BIGINT,
> "LONG_HIGH_VALUE" BIGINT,
> "DOUBLE_LOW_VALUE" DOUBLE,
> "DOUBLE_HIGH_VALUE" DOUBLE,
> "BIG_DECIMAL_LOW_VALUE" VARCHAR(4000),
> "BIG_DECIMAL_HIGH_VALUE" VARCHAR(4000),
> "NUM_DISTINCTS" BIGINT,
> "NUM_NULLS" BIGINT NOT NULL,
> "AVG_COL_LEN" DOUBLE,
> "MAX_COL_LEN" BIGINT,
> "NUM_TRUES" BIGINT,
> "NUM_FALSES" BIGINT,
> "LAST_ANALYZED" BIGINT,
> "CS_ID" BIGINT NOT NULL,
> "TBL_ID" BIGINT NOT NULL,
> "BIT_VECTOR" BLOB,
> "ENGINE" VARCHAR(128) NOT NULL
> );
> {noformat}
> The idea is to have a single blob named _STATISTICS_ to replace them, as 
> follows:
> {noformat}
> CREATE TABLE "APP"."TAB_COL_STATS"(
> "CAT_NAME" VARCHAR(256) NOT NULL,
> "DB_NAME" VARCHAR(128) NOT NULL,
> "TABLE_NAME" VARCHAR(256) NOT NULL,
> "COLUMN_NAME" VARCHAR(767) NOT NULL,
> "COLUMN_TYPE" VARCHAR(128) NOT NULL,
> "STATISTICS" BLOB,
> "LAST_ANALYZED" BIGINT,
> "CS_ID" BIGINT NOT NULL,
> "TBL_ID" BIGINT NOT NULL,
> "ENGINE" VARCHAR(128) NOT NULL
> );
> {noformat}
> The _STATISTICS_ column could be the serialization of a JSON-encoded string, 
> which will be consumed in a "schema-on-read" fashion.
> At first, at least the removed column statistics will be encoded in the 
> _STATISTICS_ column, but since each "consumer" will read the portion of the 
> schema it is interested in, multiple engines (see the _ENGINE_ column) can 
> read and write statistics as they deem fit.
> Another advantage is that, if we plan to add more statistics in the future, 
> we won't need to change the Thrift interface for the metastore again.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled

2023-02-14 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-27065:

Affects Version/s: 4.0.0-alpha-2

> Exception in partition column statistics update with SQL Server db when 
> histogram statistics is not enabled
> ---
>
> Key: HIVE-27065
> URL: https://issues.apache.org/jira/browse/HIVE-27065
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-alpha-2
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *[Description]* 
> java.sql.BatchUpdateException thrown from insertIntoPartColStatTable() with 
> SQL Server db when histogram statistics is not enabled.
> *java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.*
>  
> *[Steps to reproduce]* 
> Create stage table, load data into stage table, create partition table and 
> load data into the table from the stage table.
> {code:java}
> 0: jdbc:hive2://localhost:1> create database mydb;
> 0: jdbc:hive2://localhost:1> use mydb;
>  
> 0: jdbc:hive2://localhost:1> create table stage(sr int, st string, name 
> string) row format delimited fields terminated by '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> load data local inpath 'partdata' into table 
> stage;
>  
> 0: jdbc:hive2://localhost:1> create table dynpart(num int, name string) 
> partitioned by (category string) row format delimited fields terminated by 
> '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> insert into dynpart select * from stage; 
> {code}
>  
> *[Exception Stack]*
> {code:java}
> 2023-02-10T05:16:42,921 ERROR [HiveServer2-Background-Pool: Thread-112] 
> metastore.DirectSqlUpdateStat: Unable to update Column stats for  dynpart
> java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.
>     at 
> com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2303)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.insertIntoPartColStatTable(DirectSqlUpdateStat.java:281)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.updatePartitionColumnStatistics(DirectSqlUpdateStat.java:612)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:3063)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9943)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
>     at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at com.sun.proxy.$Proxy29.updatePartitionColumnStatisticsInBatch(Unknown 
> Source) ~[?:?]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7068)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBatch(HMSHandler.java:7121)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9247)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> 

[jira] [Updated] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled

2023-02-14 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-27065:

Component/s: Metastore
 Statistics

> Exception in partition column statistics update with SQL Server db when 
> histogram statistics is not enabled
> ---
>
> Key: HIVE-27065
> URL: https://issues.apache.org/jira/browse/HIVE-27065
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Statistics
>Affects Versions: 4.0.0-alpha-2
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *[Description]* 
> java.sql.BatchUpdateException thrown from insertIntoPartColStatTable() with 
> SQL Server db when histogram statistics is not enabled.
> *java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.*
>  
> *[Steps to reproduce]* 
> Create stage table, load data into stage table, create partition table and 
> load data into the table from the stage table.
> {code:java}
> 0: jdbc:hive2://localhost:1> create database mydb;
> 0: jdbc:hive2://localhost:1> use mydb;
>  
> 0: jdbc:hive2://localhost:1> create table stage(sr int, st string, name 
> string) row format delimited fields terminated by '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> load data local inpath 'partdata' into table 
> stage;
>  
> 0: jdbc:hive2://localhost:1> create table dynpart(num int, name string) 
> partitioned by (category string) row format delimited fields terminated by 
> '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> insert into dynpart select * from stage; 
> {code}
>  
> *[Exception Stack]*
> {code:java}
> 2023-02-10T05:16:42,921 ERROR [HiveServer2-Background-Pool: Thread-112] 
> metastore.DirectSqlUpdateStat: Unable to update Column stats for  dynpart
> java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.
>     at 
> com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2303)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.insertIntoPartColStatTable(DirectSqlUpdateStat.java:281)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.updatePartitionColumnStatistics(DirectSqlUpdateStat.java:612)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:3063)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9943)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
>     at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at com.sun.proxy.$Proxy29.updatePartitionColumnStatisticsInBatch(Unknown 
> Source) ~[?:?]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7068)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBatch(HMSHandler.java:7121)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9247)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> 

[jira] [Commented] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled

2023-02-14 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688632#comment-17688632
 ] 

Alessandro Solimando commented on HIVE-27065:
-

Fixed via 
[8592eb0|https://github.com/apache/hive/commit/8592eb0c6466234b9162b27c26bc2b13030cf71f]; 
thanks [~VenuReddy] for your patch!
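
For context, a minimal JDBC sketch (not the actual patch) of the class of problem behind the error above: binding the value as a String makes SQL Server attempt an implicit varchar-to-varbinary conversion, which it rejects, while binding bytes or a typed NULL does not. The table and column names are hypothetical:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Types;

public class VarbinaryBindSketch {
  static void insertHistogram(Connection conn, long csId, byte[] histogram) throws Exception {
    // Hypothetical statement; assume HISTOGRAM is a varbinary(max)-like column.
    String sql = "INSERT INTO PART_COL_STATS (CS_ID, HISTOGRAM) VALUES (?, ?)";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setLong(1, csId);
      if (histogram == null) {
        // Binding via setString would make SQL Server try an implicit
        // varchar -> varbinary conversion and fail; use a typed NULL instead.
        ps.setNull(2, Types.VARBINARY);
      } else {
        ps.setBytes(2, histogram);
      }
      ps.addBatch();
      ps.executeBatch();
    }
  }
}
{code}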

> Exception in partition column statistics update with SQL Server db when 
> histogram statistics is not enabled
> ---
>
> Key: HIVE-27065
> URL: https://issues.apache.org/jira/browse/HIVE-27065
> Project: Hive
>  Issue Type: Bug
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *[Description]* 
> java.sql.BatchUpdateException thrown from insertIntoPartColStatTable() with 
> SQL Server db when histogram statistics is not enabled.
> *java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.*
>  
> *[Steps to reproduce]* 
> Create stage table, load data into stage table, create partition table and 
> load data into the table from the stage table.
> {code:java}
> 0: jdbc:hive2://localhost:1> create database mydb;
> 0: jdbc:hive2://localhost:1> use mydb;
>  
> 0: jdbc:hive2://localhost:1> create table stage(sr int, st string, name 
> string) row format delimited fields terminated by '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> load data local inpath 'partdata' into table 
> stage;
>  
> 0: jdbc:hive2://localhost:1> create table dynpart(num int, name string) 
> partitioned by (category string) row format delimited fields terminated by 
> '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> insert into dynpart select * from stage; 
> {code}
>  
> *[Exception Stack]*
> {code:java}
> 2023-02-10T05:16:42,921 ERROR [HiveServer2-Background-Pool: Thread-112] 
> metastore.DirectSqlUpdateStat: Unable to update Column stats for  dynpart
> java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.
>     at 
> com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2303)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.insertIntoPartColStatTable(DirectSqlUpdateStat.java:281)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.updatePartitionColumnStatistics(DirectSqlUpdateStat.java:612)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:3063)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9943)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
>     at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at com.sun.proxy.$Proxy29.updatePartitionColumnStatisticsInBatch(Unknown 
> Source) ~[?:?]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7068)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBatch(HMSHandler.java:7121)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9247)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> 

[jira] [Resolved] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled

2023-02-14 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando resolved HIVE-27065.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Exception in partition column statistics update with SQL Server db when 
> histogram statistics is not enabled
> ---
>
> Key: HIVE-27065
> URL: https://issues.apache.org/jira/browse/HIVE-27065
> Project: Hive
>  Issue Type: Bug
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *[Description]* 
> java.sql.BatchUpdateException thrown from insertIntoPartColStatTable() with 
> SQL Server db when histogram statistics is not enabled.
> *java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.*
>  
> *[Steps to reproduce]* 
> Create stage table, load data into stage table, create partition table and 
> load data into the table from the stage table.
> {code:java}
> 0: jdbc:hive2://localhost:1> create database mydb;
> 0: jdbc:hive2://localhost:1> use mydb;
>  
> 0: jdbc:hive2://localhost:1> create table stage(sr int, st string, name 
> string) row format delimited fields terminated by '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> load data local inpath 'partdata' into table 
> stage;
>  
> 0: jdbc:hive2://localhost:1> create table dynpart(num int, name string) 
> partitioned by (category string) row format delimited fields terminated by 
> '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> insert into dynpart select * from stage; 
> {code}
>  
> *[Exception Stack]*
> {code:java}
> 2023-02-10T05:16:42,921 ERROR [HiveServer2-Background-Pool: Thread-112] 
> metastore.DirectSqlUpdateStat: Unable to update Column stats for  dynpart
> java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.
>     at 
> com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2303)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.insertIntoPartColStatTable(DirectSqlUpdateStat.java:281)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.updatePartitionColumnStatistics(DirectSqlUpdateStat.java:612)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:3063)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9943)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
>     at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at com.sun.proxy.$Proxy29.updatePartitionColumnStatisticsInBatch(Unknown 
> Source) ~[?:?]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7068)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBatch(HMSHandler.java:7121)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9247)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
>     

[jira] [Assigned] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled

2023-02-14 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-27065:
---

Assignee: Venugopal Reddy K

> Exception in partition column statistics update with SQL Server db when 
> histogram statistics is not enabled
> ---
>
> Key: HIVE-27065
> URL: https://issues.apache.org/jira/browse/HIVE-27065
> Project: Hive
>  Issue Type: Bug
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *[Description]* 
> java.sql.BatchUpdateException thrown from insertIntoPartColStatTable() with 
> SQL Server db when histogram statistics is not enabled.
> *java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.*
>  
> *[Steps to reproduce]* 
> Create stage table, load data into stage table, create partition table and 
> load data into the table from the stage table.
> {code:java}
> 0: jdbc:hive2://localhost:1> create database mydb;
> 0: jdbc:hive2://localhost:1> use mydb;
>  
> 0: jdbc:hive2://localhost:1> create table stage(sr int, st string, name 
> string) row format delimited fields terminated by '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> load data local inpath 'partdata' into table 
> stage;
>  
> 0: jdbc:hive2://localhost:1> create table dynpart(num int, name string) 
> partitioned by (category string) row format delimited fields terminated by 
> '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> insert into dynpart select * from stage; 
> {code}
>  
> *[Exception Stack]*
> {code:java}
> 2023-02-10T05:16:42,921 ERROR [HiveServer2-Background-Pool: Thread-112] 
> metastore.DirectSqlUpdateStat: Unable to update Column stats for  dynpart
> java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.
>     at 
> com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2303)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.insertIntoPartColStatTable(DirectSqlUpdateStat.java:281)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.updatePartitionColumnStatistics(DirectSqlUpdateStat.java:612)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:3063)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9943)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
>     at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at com.sun.proxy.$Proxy29.updatePartitionColumnStatisticsInBatch(Unknown 
> Source) ~[?:?]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7068)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBatch(HMSHandler.java:7121)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9247)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
>     at 
> 

[jira] [Commented] (HIVE-27065) Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled

2023-02-09 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686915#comment-17686915
 ] 

Alessandro Solimando commented on HIVE-27065:
-

Thanks [~VenuReddy] for reporting this and opening the PR; I will take a closer 
look at it during the weekend.

I will leave some preliminary comments to kick off the discussion!

> Exception in partition column statistics update with SQL Server db when 
> histogram statistics is not enabled
> ---
>
> Key: HIVE-27065
> URL: https://issues.apache.org/jira/browse/HIVE-27065
> Project: Hive
>  Issue Type: Bug
>Reporter: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *[Description]* 
> java.sql.BatchUpdateException thrown from insertIntoPartColStatTable() with 
> SQL Server db when histogram statistics is not enabled.
> *java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.*
>  
> *[Steps to reproduce]* 
> Create stage table, load data into stage table, create partition table and 
> load data into the table from the stage table.
> {code:java}
> 0: jdbc:hive2://localhost:1> create database mydb;
> 0: jdbc:hive2://localhost:1> use mydb;
>  
> 0: jdbc:hive2://localhost:1> create table stage(sr int, st string, name 
> string) row format delimited fields terminated by '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> load data local inpath 'partdata' into table 
> stage;
>  
> 0: jdbc:hive2://localhost:1> create table dynpart(num int, name string) 
> partitioned by (category string) row format delimited fields terminated by 
> '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:1> insert into dynpart select * from stage; 
> {code}
>  
> *[Exception Stack]*
> {code:java}
> 2023-02-10T05:16:42,921 ERROR [HiveServer2-Background-Pool: Thread-112] 
> metastore.DirectSqlUpdateStat: Unable to update Column stats for  dynpart
> java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.
>     at 
> com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2303)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.insertIntoPartColStatTable(DirectSqlUpdateStat.java:281)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.updatePartitionColumnStatistics(DirectSqlUpdateStat.java:612)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:3063)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9943)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
>     at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at com.sun.proxy.$Proxy29.updatePartitionColumnStatisticsInBatch(Unknown 
> Source) ~[?:?]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7068)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBatch(HMSHandler.java:7121)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9247)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> 

[jira] [Commented] (HIVE-27055) hive-exec typos part 3

2023-02-07 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685182#comment-17685182
 ] 

Alessandro Solimando commented on HIVE-27055:
-

I have changed the "Affects Version/s" field value to 4.0.0-alpha-2 because 
4.0.0 is not out yet, AFAIK.

> hive-exec typos part 3
> --
>
> Key: HIVE-27055
> URL: https://issues.apache.org/jira/browse/HIVE-27055
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning, Query Processor
>Affects Versions: 4.0.0-alpha-2
>Reporter: Michal Lorek
>Assignee: Michal Lorek
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> multiple typos and grammar errors in hive-exec module code and comments



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27055) hive-exec typos part 3

2023-02-07 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-27055:

Affects Version/s: 4.0.0-alpha-2
   (was: 4.0.0)

> hive-exec typos part 3
> --
>
> Key: HIVE-27055
> URL: https://issues.apache.org/jira/browse/HIVE-27055
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning, Query Processor
>Affects Versions: 4.0.0-alpha-2
>Reporter: Michal Lorek
>Assignee: Michal Lorek
>Priority: Trivial
>
> multiple typos and grammar errors in hive-exec module code and comments



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-27000) Improve the modularity of the *ColumnStatsMerger classes

2023-01-30 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27000 started by Alessandro Solimando.
---
> Improve the modularity of the *ColumnStatsMerger classes
> 
>
> Key: HIVE-27000
> URL: https://issues.apache.org/jira/browse/HIVE-27000
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> *ColumnStatsMerger classes contain a lot of duplicate code which is not 
> specific to the data type and could therefore be lifted to a common parent 
> class.
> This phenomenon is bound to become even worse if we keep further enriching 
> our supported set of statistics, as we did in the context of HIVE-26221.
> The current ticket aims at improving the modularity and code reuse of the 
> *ColumnStatsMerger classes, while improving unit-test coverage to cover all 
> classes and support more use cases.
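
For illustration, a minimal sketch of the kind of lifting described above; the class and method names are hypothetical and do not reflect the actual Hive *ColumnStatsMerger hierarchy:

{code:java}
// Hypothetical shape: common, type-agnostic merging lives in the parent,
// only the type-specific low/high handling stays in the subclasses.
abstract class BaseStatsMergerSketch<T> {

  static final class Stats<T> {
    long numNulls;
    long numDVs;
    T lowValue;
    T highValue;
  }

  final void merge(Stats<T> aggregate, Stats<T> incoming) {
    // type-agnostic part, previously duplicated in every subclass
    aggregate.numNulls += incoming.numNulls;
    aggregate.numDVs = Math.max(aggregate.numDVs, incoming.numDVs);
    // type-specific part delegated to the subclass
    aggregate.lowValue = mergeLow(aggregate.lowValue, incoming.lowValue);
    aggregate.highValue = mergeHigh(aggregate.highValue, incoming.highValue);
  }

  protected abstract T mergeLow(T a, T b);
  protected abstract T mergeHigh(T a, T b);
}

final class LongStatsMergerSketch extends BaseStatsMergerSketch<Long> {
  @Override
  protected Long mergeLow(Long a, Long b) {
    if (a == null) return b;
    if (b == null) return a;
    return Math.min(a, b);
  }

  @Override
  protected Long mergeHigh(Long a, Long b) {
    if (a == null) return b;
    if (b == null) return a;
    return Math.max(a, b);
  }
}
{code}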



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27000) Improve the modularity of the *ColumnStatsMerger classes

2023-01-30 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-27000:
---


> Improve the modularity of the *ColumnStatsMerger classes
> 
>
> Key: HIVE-27000
> URL: https://issues.apache.org/jira/browse/HIVE-27000
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> *ColumnStatsMerger classes contain a lot of duplicate code which is not 
> specific to the data type and could therefore be lifted to a common parent 
> class.
> This phenomenon is bound to become even worse if we keep further enriching 
> our supported set of statistics, as we did in the context of HIVE-26221.
> The current ticket aims at improving the modularity and code reuse of the 
> *ColumnStatsMerger classes, while improving unit-test coverage to cover all 
> classes and support more use cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26297) Refactoring ColumnStatsAggregator classes to reduce warnings

2022-12-15 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando resolved HIVE-26297.
-
Resolution: Abandoned

> Refactoring ColumnStatsAggregator classes to reduce warnings
> 
>
> Key: HIVE-26297
> URL: https://issues.apache.org/jira/browse/HIVE-26297
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore, Statistics
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Minor
>
> The point of reducing warnings is to be able to focus on the important 
> ones.
> Some of the bugs fixed while writing unit-tests were highlighted as warnings 
> (potential NPEs and rounding issues), but it was hard to see them among the 
> many other (less severe) warnings.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26221) Add histogram-based column statistics

2022-12-13 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646588#comment-17646588
 ] 

Alessandro Solimando commented on HIVE-26221:
-

Thank you [~dengzh] and [~amansinha100] for the great review! I would also like 
to acknowledge the work of Ryan Johnson for all the benchmarking he did, which 
shaped the final decision to use the CDF function directly, without an 
intermediate (binned) histogram representation, and [~kgyrtkirk] and 
[~amansinha100] for their input in the design phase of the proposal.

> Add histogram-based column statistics
> -
>
> Key: HIVE-26221
> URL: https://issues.apache.org/jira/browse/HIVE-26221
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Metastore, Statistics
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> Hive does not support histogram statistics, which are particularly useful for 
> skewed data (which is very common in practice) and range predicates.
> Hive's current selectivity estimation for range predicates is based on a 
> hard-coded value of 1/3 (see 
> [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]).
> The current proposal aims at integrating histogram as an additional column 
> statistics, stored into the Hive metastore at the table (or partition) level.
> The main requirements for histogram integration are the following:
>  * efficiency: the approach must scale and support billions of rows
>  * merge-ability: partition-level histograms have to be merged to form 
> table-level histograms
>  * explicit and configurable trade-off between memory footprint and accuracy
> Hive already integrates [KLL data 
> sketches|https://datasketches.apache.org/docs/KLL/KLLSketch.html] UDAF. 
> Datasketches are small, stateful programs that process massive data-streams 
> and can provide approximate answers, with mathematical guarantees, to 
> computationally difficult queries orders-of-magnitude faster than 
> traditional, exact methods.
> We propose to use KLL, and more specifically the cumulative distribution 
> function (CDF), as the underlying data structure for our histogram statistics.
> The current proposal targets numeric data types (float, integer and numeric 
> families) and temporal data types (date and timestamp).
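
For illustration, a minimal sketch of the proposed mechanics using the Apache DataSketches KLL library (datasketches-java). The method names reflect recent library versions and may differ across releases, so treat this as an assumption rather than the actual Hive integration:

{code:java}
import org.apache.datasketches.kll.KllFloatsSketch;

public class HistogramStatsSketch {
  public static void main(String[] args) {
    // One sketch per partition: small, mergeable summaries of the column values.
    KllFloatsSketch part1 = KllFloatsSketch.newHeapInstance();
    KllFloatsSketch part2 = KllFloatsSketch.newHeapInstance();
    for (float v = 0; v < 1_000; v++) { part1.update(v); }
    for (float v = 500; v < 2_000; v++) { part2.update(v); }

    // Merge-ability: partition-level sketches combine into a table-level sketch.
    KllFloatsSketch table = KllFloatsSketch.newHeapInstance();
    table.merge(part1);
    table.merge(part2);

    // Selectivity of a range predicate, e.g. col < 750, estimated from the CDF
    // instead of the hard-coded 1/3.
    double selectivity = table.getRank(750f);
    System.out.println("estimated selectivity of col < 750: " + selectivity);
  }
}
{code}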



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26820) Disable hybridgrace_hashjoin_2.q flaky test

2022-12-09 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando resolved HIVE-26820.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Disable hybridgrace_hashjoin_2.q flaky test
> ---
>
> Key: HIVE-26820
> URL: https://issues.apache.org/jira/browse/HIVE-26820
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This test has failed many times in the last months; let's disable it for the 
> moment:
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26820) Disable hybridgrace_hashjoin_2.q flaky test

2022-12-09 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17645311#comment-17645311
 ] 

Alessandro Solimando commented on HIVE-26820:
-

Thanks [~zabetak] for the review and merge. I have filed and linked HIVE-26828 
to this ticket, which can now be closed!

> Disable hybridgrace_hashjoin_2.q flaky test
> ---
>
> Key: HIVE-26820
> URL: https://issues.apache.org/jira/browse/HIVE-26820
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This test has failed many times in the last months; let's disable it for the 
> moment:
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26828) Fix OOM for hybridgrace_hashjoin_2.q

2022-12-09 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26828:

Description: 
_hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
transiently (from [flaky_test 
output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
 in case it disappears):
{quote}< Status: Failed
< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex 
vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: 
z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: 
Failed to load plan: 
hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
<  A masked pattern was here 
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
<  A masked pattern was here 
< Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
<  A masked pattern was here 
< ]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
< FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 2, 
vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, 
vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: 
hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
<  A masked pattern was here 
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
<  A masked pattern was here 
< Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
<  A masked pattern was here 
< ][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due 
to OTHER_VERTEX_FAILURE][Masked Vertex killed due to 
OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked 
Vertex killed due to OTHER_VERTEX_FAILURE]DAG did not succeed due to 
VERTEX_FAILURE. failedVertices:1 killedVertices:5
< PREHOOK: query: SELECT COUNT( * )
< FROM src1 x
< JOIN srcpart z1 ON (x.key = z1.key)
< JOIN src y1 ON (x.key = y1.key)
< JOIN srcpart z2 ON (x.value = z2.value)
< JOIN src y2 ON (x.value = y2.value)
< WHERE z1.key < '' AND z2.key < 'zz'
< AND y1.value < '' AND y2.value < 'zz'
< PREHOOK: type: QUERY
< PREHOOK: Input: default@src
< PREHOOK: Input: default@src1
< PREHOOK: Input: default@srcpart
< PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11
< PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12
< PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=11
< PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=12
< PREHOOK: Output: hdfs://### HDFS PATH ###
{quote}
The aim of this ticket is to investigate the issue, fix it, and re-enable the 
test.

The problem seems to lie in the deserialization (via Kryo) of the computed Tez 
DAG plan (map.xml).
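
For illustration only, here is a minimal, self-contained sketch of the kind of 
Kryo round-trip the stack trace points at (hypothetical {{Op}} and 
{{PlanKryoSketch}} classes, not Hive's actual plan serialization code): an 
operator graph linked through child lists is written out and read back, and it 
is the read-back step that hits the GC overhead limit in the flaky run.
{code:java}
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator tree: each node keeps a list of
// children, mirroring the childOperators fields named in the serialization trace.
class Op {
  String name;
  List<Op> childOperators = new ArrayList<>();
}

public class PlanKryoSketch {
  public static void main(String[] args) {
    // Build a small operator chain; the real map.xml plan is a far larger
    // object graph, which is where the memory pressure comes from.
    Op root = new Op();
    root.name = "TS";
    Op current = root;
    for (int i = 0; i < 100; i++) {
      Op child = new Op();
      child.name = "FIL_" + i;
      current.childOperators.add(child);
      current = child;
    }

    Kryo kryo = new Kryo();
    kryo.setRegistrationRequired(false);

    // Serialize the plan (the counterpart of writing map.xml)...
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (Output out = new Output(bytes)) {
      kryo.writeObject(out, root);
    }

    // ...and deserialize it again; this load step is the one failing with
    // "GC overhead limit exceeded" in the flaky run.
    try (Input in = new Input(new ByteArrayInputStream(bytes.toByteArray()))) {
      Op plan = kryo.readObject(in, Op.class);
      int depth = 0;
      for (Op op = plan; !op.childOperators.isEmpty(); op = op.childOperators.get(0)) {
        depth++;
      }
      System.out.println("deserialized an operator chain of depth " + depth);
    }
  }
}
{code}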

  was:
_hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
transiently (from [flaky_test 
output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
 in case it disappears):
{code:java}
< Status: Failed< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, 
diagnostics=[Vertex {code}
{code:java}




The aim of this ticket is to investigate the issue, fix it and re-enable the 
test.{code}


> Fix OOM for hybridgrace_hashjoin_2.q
> 
>
> Key: HIVE-26828
> URL: https://issues.apache.org/jira/browse/HIVE-26828
> Project: Hive
>  Issue Type: Bug
>  

[jira] [Updated] (HIVE-26828) Fix OOM for hybridgrace_hashjoin_2.q

2022-12-09 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26828:

Description: 
_hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
transiently (from [flaky_test 
output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
 in case it disappears):
{code:java}
< Status: Failed< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, 
diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, 
vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: 
hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml<
  A masked pattern was here < Caused by: 
org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded< Serialization trace:< 
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)< 
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)< aliasToWork 
(org.apache.hadoop.hive.ql.plan.MapWork)<  A masked pattern was here < 
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded<  A 
masked pattern was here < ]< [Masked Vertex killed due to 
OTHER_VERTEX_FAILURE]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< 
[Masked Vertex killed due to OTHER_VERTEX_FAILURE]< [Masked Vertex killed due 
to OTHER_VERTEX_FAILURE]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< 
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5< 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 2, 
vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, 
vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: 
hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml<
  A masked pattern was here < Caused by: 
org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded< Serialization trace:< 
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)< 
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)< aliasToWork 
(org.apache.hadoop.hive.ql.plan.MapWork)<  A masked pattern was here < 
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded<  A 
masked pattern was here < ][Masked Vertex killed due to 
OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked 
Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to 
OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE]DAG did 
not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5< PREHOOK: 
query: SELECT COUNT(*)< FROM src1 x< JOIN srcpart z1 ON (x.key = z1.key)< JOIN 
src y1 ON (x.key = y1.key)< JOIN srcpart z2 ON (x.value = z2.value)< JOIN 
src y2 ON (x.value = y2.value)< WHERE z1.key < '' AND z2.key < 
'zz'<  AND y1.value < '' AND y2.value < 'zz'< PREHOOK: 
type: QUERY< PREHOOK: Input: default@src< PREHOOK: Input: default@src1< 
PREHOOK: Input: default@srcpart< PREHOOK: Input: 
default@srcpart@ds=2008-04-08/hr=11< PREHOOK: Input: 
default@srcpart@ds=2008-04-08/hr=12< PREHOOK: Input: 
default@srcpart@ds=2008-04-09/hr=11< PREHOOK: Input: 
default@srcpart@ds=2008-04-09/hr=12< PREHOOK: Output: hdfs://### HDFS PATH 
###{code}
The aim of this ticket is to investigate the issue, fix it and re-enable the 
test.

  was:
_hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
transiently (from [flaky_test 
output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
 in case it disappears):
{noformat}
property: qfile used as override with val: hybridgrace_hashjoin_2.q
property: run_disabled used as override with val: false
Setting hive-site: file:/home/jenkins/agent/workspace/hive-flaky-check/data/conf/tez//hive-site.xml
Initializing the schema to: 4.0.0
Metastore connection URL:        jdbc:derby:memory:junit_metastore_db;create=true
Metastore connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Metastore connection Password:   mine
Starting metastore schema initialization to 4.0.0
Initialization script 

[jira] [Updated] (HIVE-26828) Fix OOM for hybridgrace_hashjoin_2.q

2022-12-09 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26828:

Description: 
_hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
transiently (from [flaky_test 
output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
 in case it disappears):
{code:java}
< Status: Failed< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, 
diagnostics=[Vertex {code}
{code:java}




The aim of this ticket is to investigate the issue, fix it and re-enable the 
test.{code}

  was:
_hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
transiently (from [flaky_test 
output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
 in case it disappears):
{code:java}
< Status: Failed< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, 
diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, 
vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: 
hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml<
  A masked pattern was here < Caused by: 
org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded< Serialization trace:< 
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)< 
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)< aliasToWork 
(org.apache.hadoop.hive.ql.plan.MapWork)<  A masked pattern was here < 
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded<  A 
masked pattern was here < ]< [Masked Vertex killed due to 
OTHER_VERTEX_FAILURE]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< 
[Masked Vertex killed due to OTHER_VERTEX_FAILURE]< [Masked Vertex killed due 
to OTHER_VERTEX_FAILURE]< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]< 
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5< 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 2, 
vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, 
vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: 
hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml<
  A masked pattern was here < Caused by: 
org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded< Serialization trace:< 
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)< 
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)< aliasToWork 
(org.apache.hadoop.hive.ql.plan.MapWork)<  A masked pattern was here < 
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded<  A 
masked pattern was here < ][Masked Vertex killed due to 
OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked 
Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to 
OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE]DAG did 
not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5< PREHOOK: 
query: SELECT COUNT(*)< FROM src1 x< JOIN srcpart z1 ON (x.key = z1.key)< JOIN 
src y1 ON (x.key = y1.key)< JOIN srcpart z2 ON (x.value = z2.value)< JOIN 
src y2 ON (x.value = y2.value)< WHERE z1.key < '' AND z2.key < 
'zz'<  AND y1.value < '' AND y2.value < 'zz'< PREHOOK: 
type: QUERY< PREHOOK: Input: default@src< PREHOOK: Input: default@src1< 
PREHOOK: Input: default@srcpart< PREHOOK: Input: 
default@srcpart@ds=2008-04-08/hr=11< PREHOOK: Input: 
default@srcpart@ds=2008-04-08/hr=12< PREHOOK: Input: 
default@srcpart@ds=2008-04-09/hr=11< PREHOOK: Input: 
default@srcpart@ds=2008-04-09/hr=12< PREHOOK: Output: hdfs://### HDFS PATH 
###{code}
The aim of this ticket is to investigate the issue, fix it and re-enable the 
test.


> Fix OOM for hybridgrace_hashjoin_2.q
> 
>
> Key: HIVE-26828
> URL: https://issues.apache.org/jira/browse/HIVE-26828
> Project: Hive
>  Issue Type: Bug
>  Components: Test, Tez
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro 

[jira] [Commented] (HIVE-26820) Disable hybridgrace_hashjoin_2.q flaky test

2022-12-09 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17645232#comment-17645232
 ] 

Alessandro Solimando commented on HIVE-26820:
-

Yes, I haven't had time to track down which commit broke it, but CI has been 
failing from time to time with OOM in this test for quite a while now (a few 
months), though it's not that frequent.

As I don't have much time to dive deep into this, the best course of action for 
me at the moment is to disable the test and create a ticket to investigate it 
and re-enable it later.

> Disable hybridgrace_hashjoin_2.q flaky test
> ---
>
> Key: HIVE-26820
> URL: https://issues.apache.org/jira/browse/HIVE-26820
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Had this test failing many times in the last months, let's disable it for the 
> moment:
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26820) Disable hybridgrace_hashjoin_2.q flaky test

2022-12-08 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26820:

Summary: Disable hybridgrace_hashjoin_2.q flaky test  (was: Disable 
hybridgrace_hashjoin_2 flaky qtest)

> Disable hybridgrace_hashjoin_2.q flaky test
> ---
>
> Key: HIVE-26820
> URL: https://issues.apache.org/jira/browse/HIVE-26820
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> Had this test failing many times in the last months, let's disable it for the 
> moment:
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26820) Disable hybridgrace_hashjoin_2 flaky qtest

2022-12-08 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26820:
---


> Disable hybridgrace_hashjoin_2 flaky qtest
> --
>
> Key: HIVE-26820
> URL: https://issues.apache.org/jira/browse/HIVE-26820
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> Had this test failing many times in the last months, let's disable it for the 
> moment:
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-26812) hive-it-util module misses a dependency on hive-jdbc

2022-12-07 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644328#comment-17644328
 ] 

Alessandro Solimando edited comment on HIVE-26812 at 12/7/22 1:10 PM:
--

[~zabetak], thanks for your input, I agree with the analysis: we can fix it here 
as per the PR, but we should open another ticket to track the issue in the 
beeline module.

EDIT: as for why it works when compiling from the main directory with -Pitests, 
given your findings I am wondering whether the dependency-reduced POM from the 
hive-beeline module, which carries the needed dependencies, is the one being 
used?


was (Author: asolimando):
[~zabetak], thanks for your input, I agree on the analysis, we can fix it here 
as per the PR, but we should open another ticket to track the issue in the 
beeline module.

 

> hive-it-util module misses a dependency on hive-jdbc
> 
>
> Key: HIVE-26812
> URL: https://issues.apache.org/jira/browse/HIVE-26812
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Building from $hive/itests fails as follows:
> {noformat}
> [INFO] Hive Integration - Testing Utilities ... FAILURE [  6.492 
> s]
> ...
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  56.499 s
> [INFO] Finished at: 2022-12-06T19:24:16+01:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile 
> (default-compile) on project hive-it-util: Compilation failure
> [ERROR] 
> /Users/asolimando/git/hive/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java:[51,28]
>  cannot find symbol
> [ERROR]   symbol:   class Utils
> [ERROR]   location: package org.apache.hive.jdbc
> [ERROR]
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-util{noformat}
> Surprisingly, building from the top directory with -Pitests does not fail.
> There is a missing dependency on the hive-jdbc module, when adding that, the 
> error gets fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26812) hive-it-util module misses a dependency on hive-jdbc

2022-12-07 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644328#comment-17644328
 ] 

Alessandro Solimando commented on HIVE-26812:
-

[~zabetak], thanks for your input, I agree with the analysis: we can fix it here 
as per the PR, but we should open another ticket to track the issue in the 
beeline module.

 

> hive-it-util module misses a dependency on hive-jdbc
> 
>
> Key: HIVE-26812
> URL: https://issues.apache.org/jira/browse/HIVE-26812
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Building from $hive/itests fails as follows:
> {noformat}
> [INFO] Hive Integration - Testing Utilities ... FAILURE [  6.492 
> s]
> ...
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  56.499 s
> [INFO] Finished at: 2022-12-06T19:24:16+01:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile 
> (default-compile) on project hive-it-util: Compilation failure
> [ERROR] 
> /Users/asolimando/git/hive/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java:[51,28]
>  cannot find symbol
> [ERROR]   symbol:   class Utils
> [ERROR]   location: package org.apache.hive.jdbc
> [ERROR]
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-util{noformat}
> Surprisingly, building from the top directory with -Pitests does not fail.
> There is a missing dependency on the hive-jdbc module, when adding that, the 
> error gets fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26812) hive-it-util module misses a dependency on hive-jdbc

2022-12-06 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643999#comment-17643999
 ] 

Alessandro Solimando commented on HIVE-26812:
-

[~zabetak], since we already had a look at this together, it should be trivial 
for you to review, if you have time :)

> hive-it-util module misses a dependency on hive-jdbc
> 
>
> Key: HIVE-26812
> URL: https://issues.apache.org/jira/browse/HIVE-26812
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Building from $hive/itests fails as follows:
> {noformat}
> [INFO] Hive Integration - Testing Utilities ... FAILURE [  6.492 
> s]
> ...
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  56.499 s
> [INFO] Finished at: 2022-12-06T19:24:16+01:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile 
> (default-compile) on project hive-it-util: Compilation failure
> [ERROR] 
> /Users/asolimando/git/hive/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java:[51,28]
>  cannot find symbol
> [ERROR]   symbol:   class Utils
> [ERROR]   location: package org.apache.hive.jdbc
> [ERROR]
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-util{noformat}
> Surprisingly, building from the top directory with -Pitests does not fail.
> There is a missing dependency on the hive-jdbc module, when adding that, the 
> error gets fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26812) hive-it-util module misses a dependency on hive-jdbc

2022-12-06 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26812 started by Alessandro Solimando.
---
> hive-it-util module misses a dependency on hive-jdbc
> 
>
> Key: HIVE-26812
> URL: https://issues.apache.org/jira/browse/HIVE-26812
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> Building from $hive/itests fails as follows:
> {noformat}
> [INFO] Hive Integration - Testing Utilities ... FAILURE [  6.492 
> s]
> ...
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  56.499 s
> [INFO] Finished at: 2022-12-06T19:24:16+01:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile 
> (default-compile) on project hive-it-util: Compilation failure
> [ERROR] 
> /Users/asolimando/git/hive/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java:[51,28]
>  cannot find symbol
> [ERROR]   symbol:   class Utils
> [ERROR]   location: package org.apache.hive.jdbc
> [ERROR]
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-util{noformat}
> Surprisingly, building from the top directory with -Pitests does not fail.
> There is a missing dependency on the hive-jdbc module, when adding that, the 
> error gets fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26812) hive-it-util module misses a dependency on hive-jdbc

2022-12-06 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26812:
---


> hive-it-util module misses a dependency on hive-jdbc
> 
>
> Key: HIVE-26812
> URL: https://issues.apache.org/jira/browse/HIVE-26812
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> Building from $hive/itests fails as follows:
> {noformat}
> [INFO] Hive Integration - Testing Utilities ... FAILURE [  6.492 
> s]
> ...
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  56.499 s
> [INFO] Finished at: 2022-12-06T19:24:16+01:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile 
> (default-compile) on project hive-it-util: Compilation failure
> [ERROR] 
> /Users/asolimando/git/hive/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java:[51,28]
>  cannot find symbol
> [ERROR]   symbol:   class Utils
> [ERROR]   location: package org.apache.hive.jdbc
> [ERROR]
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-util{noformat}
> Surprisingly, building from the top directory with -Pitests does not fail.
> There is a missing dependency on the hive-jdbc module, when adding that, the 
> error gets fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796

2022-12-06 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643751#comment-17643751
 ] 

Alessandro Solimando commented on HIVE-26806:
-

[~zabetak], I deleted the green runs, but the first time I re-ran, the timeout 
occurred again.

I haven't seen a timeout from that run onward, so it has probably worked; the 
new random split was just unfortunate by coincidence.

To sum up, deleting past green runs seems to work, so there is no need to close 
and re-open the PR.

Thanks!

> Precommit tests in CI are timing out after HIVE-26796
> -
>
> Key: HIVE-26806
> URL: https://issues.apache.org/jira/browse/HIVE-26806
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/1506/
> {noformat}
> ancelling nested steps due to timeout
> 15:22:08  Sending interrupt signal to process
> 15:22:08  Killing processes
> 15:22:09  kill finished with exit code 0
> 15:22:19  Terminated
> 15:22:19  script returned exit code 143
> [Pipeline] }
> [Pipeline] // withEnv
> [Pipeline] }
> 15:22:19  Deleting 1 temporary files
> [Pipeline] // configFileProvider
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (PostProcess)
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] junit
> 15:22:25  Recording test results
> 15:22:32  [Checks API] No suitable checks publisher found.
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] }
> [Pipeline] // container
> [Pipeline] }
> [Pipeline] // node
> [Pipeline] }
> [Pipeline] // timeout
> [Pipeline] }
> [Pipeline] // podTemplate
> [Pipeline] }
> 15:22:32  Failed in branch split-01
> [Pipeline] // parallel
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (Archive)
> [Pipeline] podTemplate
> [Pipeline] {
> [Pipeline] timeout
> 15:22:33  Timeout set to expire in 6 hr 0 min
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26810) Replace HiveFilterSetOpTransposeRule onMatch method with Calcite's built-in implementation

2022-12-05 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26810:
---


> Replace HiveFilterSetOpTransposeRule onMatch method with Calcite's built-in 
> implementation
> --
>
> Key: HIVE-26810
> URL: https://issues.apache.org/jira/browse/HIVE-26810
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> After HIVE-26762, the _onMatch_ method is now the same as in the Calcite 
> implementation, we can drop the Hive's override in order to avoid the risk of 
> them drifting away again.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796

2022-12-05 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643470#comment-17643470
 ] 

Alessandro Solimando edited comment on HIVE-26806 at 12/5/22 5:19 PM:
--

It looks like deleting all past green runs did not fix it for 
[https://github.com/apache/hive/pull/3137].

That's a big deal since the PR is huge and review is in progress; I don't think 
I can close and re-open it.

Is there a way to tweak the timeout for that PR alone, [~zabetak]?

EDIT: there is. I am using "Replay" in Jenkins, so I can change the Jenkinsfile 
for the given run without any change in Git; hopefully that will do the trick.


was (Author: asolimando):
It looks that deleting all green past runs did not fix for 
[https://github.com/apache/hive/pull/3137].

That's a big deal since the PR is huge and review is in progress, I don't think 
I can close and re-open it.

Is there a way to tweak timeout for that PR alone [~zabetak]?

> Precommit tests in CI are timing out after HIVE-26796
> -
>
> Key: HIVE-26806
> URL: https://issues.apache.org/jira/browse/HIVE-26806
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/1506/
> {noformat}
> ancelling nested steps due to timeout
> 15:22:08  Sending interrupt signal to process
> 15:22:08  Killing processes
> 15:22:09  kill finished with exit code 0
> 15:22:19  Terminated
> 15:22:19  script returned exit code 143
> [Pipeline] }
> [Pipeline] // withEnv
> [Pipeline] }
> 15:22:19  Deleting 1 temporary files
> [Pipeline] // configFileProvider
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (PostProcess)
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] junit
> 15:22:25  Recording test results
> 15:22:32  [Checks API] No suitable checks publisher found.
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] }
> [Pipeline] // container
> [Pipeline] }
> [Pipeline] // node
> [Pipeline] }
> [Pipeline] // timeout
> [Pipeline] }
> [Pipeline] // podTemplate
> [Pipeline] }
> 15:22:32  Failed in branch split-01
> [Pipeline] // parallel
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (Archive)
> [Pipeline] podTemplate
> [Pipeline] {
> [Pipeline] timeout
> 15:22:33  Timeout set to expire in 6 hr 0 min
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796

2022-12-05 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643470#comment-17643470
 ] 

Alessandro Solimando commented on HIVE-26806:
-

It looks like deleting all past green runs did not fix it for 
[https://github.com/apache/hive/pull/3137].

That's a big deal since the PR is huge and review is in progress; I don't think 
I can close and re-open it.

Is there a way to tweak the timeout for that PR alone, [~zabetak]?

> Precommit tests in CI are timing out after HIVE-26796
> -
>
> Key: HIVE-26806
> URL: https://issues.apache.org/jira/browse/HIVE-26806
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/1506/
> {noformat}
> ancelling nested steps due to timeout
> 15:22:08  Sending interrupt signal to process
> 15:22:08  Killing processes
> 15:22:09  kill finished with exit code 0
> 15:22:19  Terminated
> 15:22:19  script returned exit code 143
> [Pipeline] }
> [Pipeline] // withEnv
> [Pipeline] }
> 15:22:19  Deleting 1 temporary files
> [Pipeline] // configFileProvider
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (PostProcess)
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] junit
> 15:22:25  Recording test results
> 15:22:32  [Checks API] No suitable checks publisher found.
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] }
> [Pipeline] // container
> [Pipeline] }
> [Pipeline] // node
> [Pipeline] }
> [Pipeline] // timeout
> [Pipeline] }
> [Pipeline] // podTemplate
> [Pipeline] }
> 15:22:32  Failed in branch split-01
> [Pipeline] // parallel
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (Archive)
> [Pipeline] podTemplate
> [Pipeline] {
> [Pipeline] timeout
> 15:22:33  Timeout set to expire in 6 hr 0 min
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796

2022-12-05 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643222#comment-17643222
 ] 

Alessandro Solimando commented on HIVE-26806:
-

Thanks [~zabetak], as you say the issue now affects only existing PRs. I am 
trying option 2 to see if it works, otherwise I will go for option 1; I will 
keep you posted here.

Setting aside the old affected PRs, I am OK with reducing the timeout to the 
previous value, since it now works.

> Precommit tests in CI are timing out after HIVE-26796
> -
>
> Key: HIVE-26806
> URL: https://issues.apache.org/jira/browse/HIVE-26806
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/1506/
> {noformat}
> ancelling nested steps due to timeout
> 15:22:08  Sending interrupt signal to process
> 15:22:08  Killing processes
> 15:22:09  kill finished with exit code 0
> 15:22:19  Terminated
> 15:22:19  script returned exit code 143
> [Pipeline] }
> [Pipeline] // withEnv
> [Pipeline] }
> 15:22:19  Deleting 1 temporary files
> [Pipeline] // configFileProvider
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (PostProcess)
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] junit
> 15:22:25  Recording test results
> 15:22:32  [Checks API] No suitable checks publisher found.
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] }
> [Pipeline] // container
> [Pipeline] }
> [Pipeline] // node
> [Pipeline] }
> [Pipeline] // timeout
> [Pipeline] }
> [Pipeline] // podTemplate
> [Pipeline] }
> 15:22:32  Failed in branch split-01
> [Pipeline] // parallel
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (Archive)
> [Pipeline] podTemplate
> [Pipeline] {
> [Pipeline] timeout
> 15:22:33  Timeout set to expire in 6 hr 0 min
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796

2022-12-02 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642558#comment-17642558
 ] 

Alessandro Solimando commented on HIVE-26806:
-

In case you have an existing open PR suffering from this and you don't want to 
rebase: if you have permission to run Jenkins jobs, you can just change the 
default split value to 22 and re-run. HTH

> Precommit tests in CI are timing out after HIVE-26796
> -
>
> Key: HIVE-26806
> URL: https://issues.apache.org/jira/browse/HIVE-26806
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/1506/
> {noformat}
> ancelling nested steps due to timeout
> 15:22:08  Sending interrupt signal to process
> 15:22:08  Killing processes
> 15:22:09  kill finished with exit code 0
> 15:22:19  Terminated
> 15:22:19  script returned exit code 143
> [Pipeline] }
> [Pipeline] // withEnv
> [Pipeline] }
> 15:22:19  Deleting 1 temporary files
> [Pipeline] // configFileProvider
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (PostProcess)
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] junit
> 15:22:25  Recording test results
> 15:22:32  [Checks API] No suitable checks publisher found.
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] }
> [Pipeline] // container
> [Pipeline] }
> [Pipeline] // node
> [Pipeline] }
> [Pipeline] // timeout
> [Pipeline] }
> [Pipeline] // podTemplate
> [Pipeline] }
> 15:22:32  Failed in branch split-01
> [Pipeline] // parallel
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (Archive)
> [Pipeline] podTemplate
> [Pipeline] {
> [Pipeline] timeout
> 15:22:33  Timeout set to expire in 6 hr 0 min
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule

2022-12-01 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26762 started by Alessandro Solimando.
---
> Remove operand pruning in HiveFilterSetOpTransposeRule
> --
>
> Key: HIVE-26762
> URL: https://issues.apache.org/jira/browse/HIVE-26762
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if 
> the newly pushed filter simplifies to FALSE (due to the predicates holding on 
> the input).
> If this is true and there is more than one UNION ALL operand, it gets pruned.
> After HIVE-26524 ("Use Calcite to remove sections of a query plan known never 
> produces rows"), this is possibly redundant and we could drop this feature 
> and let the other rules take care of the pruning.
> In such a case, it might be even possible to drop the Hive specific rule and 
> relies on the Calcite one (the difference is just the operand pruning at the 
> moment of writing), similarly to what HIVE-26642 did for 
> HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended 
> to tackle this in a separate ticket after verifying that is feasible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule

2022-12-01 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26762:
---

Assignee: Alessandro Solimando

> Remove operand pruning in HiveFilterSetOpTransposeRule
> --
>
> Key: HIVE-26762
> URL: https://issues.apache.org/jira/browse/HIVE-26762
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if 
> the newly pushed filter simplifies to FALSE (due to the predicates holding on 
> the input).
> If this is true and there is more than one UNION ALL operand, it gets pruned.
> After HIVE-26524 ("Use Calcite to remove sections of a query plan known never 
> produces rows"), this is possibly redundant and we could drop this feature 
> and let the other rules take care of the pruning.
> In such a case, it might be even possible to drop the Hive specific rule and 
> relies on the Calcite one (the difference is just the operand pruning at the 
> moment of writing), similarly to what HIVE-26642 did for 
> HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended 
> to tackle this in a separate ticket after verifying that is feasible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule

2022-12-01 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26762:

Issue Type: Task  (was: Bug)

> Remove operand pruning in HiveFilterSetOpTransposeRule
> --
>
> Key: HIVE-26762
> URL: https://issues.apache.org/jira/browse/HIVE-26762
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if 
> the newly pushed filter simplifies to FALSE (due to the predicates holding on 
> the input).
> If this is true and there is more than one UNION ALL operand, it gets pruned.
> After HIVE-26524 ("Use Calcite to remove sections of a query plan known never 
> produces rows"), this is possibly redundant and we could drop this feature 
> and let the other rules take care of the pruning.
> In such a case, it might be even possible to drop the Hive specific rule and 
> relies on the Calcite one (the difference is just the operand pruning at the 
> moment of writing), similarly to what HIVE-26642 did for 
> HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended 
> to tackle this in a separate ticket after verifying that is feasible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26692) Check for the expected thrift version before compiling

2022-11-30 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641464#comment-17641464
 ] 

Alessandro Solimando commented on HIVE-26692:
-

[~ayushtkn] I managed to find a bit of time to work on this, would you mind 
checking the PR if you have some spare cycles?

> Check for the expected thrift version before compiling
> --
>
> Key: HIVE-26692
> URL: https://issues.apache.org/jira/browse/HIVE-26692
> Project: Hive
>  Issue Type: Task
>  Components: Thrift API
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At the moment we don't check for the thrift version before launching thrift, 
> the error messages are often cryptic upon mismatches.
> An explicit check with a clear error message would be nice, like what parquet 
> does: 
> [https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml#L247-L268]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26692) Check for the expected thrift version before compiling

2022-11-30 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26692 started by Alessandro Solimando.
---
> Check for the expected thrift version before compiling
> --
>
> Key: HIVE-26692
> URL: https://issues.apache.org/jira/browse/HIVE-26692
> Project: Hive
>  Issue Type: Task
>  Components: Thrift API
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> At the moment we don't check for the thrift version before launching thrift, 
> the error messages are often cryptic upon mismatches.
> An explicit check with a clear error message would be nice, like what parquet 
> does: 
> [https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml#L247-L268]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26692) Check for the expected thrift version before compiling

2022-11-30 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26692:
---

Assignee: Alessandro Solimando

> Check for the expected thrift version before compiling
> --
>
> Key: HIVE-26692
> URL: https://issues.apache.org/jira/browse/HIVE-26692
> Project: Hive
>  Issue Type: Task
>  Components: Thrift API
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> At the moment we don't check for the thrift version before launching thrift, 
> the error messages are often cryptic upon mismatches.
> An explicit check with a clear error message would be nice, like what parquet 
> does: 
> [https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml#L247-L268]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26683) Sum over window produces 0 when row contains null

2022-11-22 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637130#comment-17637130
 ] 

Alessandro Solimando commented on HIVE-26683:
-

+1 from me. It's always unfortunate to make breaking changes, but in this case 
the current behaviour seems inconsistent and broken (it is surprising to see 0, 
I agree it should be NULL), so we should fix it IMO.
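
For reference, the semantics being agreed on here can be illustrated with a 
tiny, self-contained sketch (hypothetical {{WindowSumSketch}} class, not Hive's 
actual UDAF code): a windowed SUM should stay NULL until at least one non-NULL 
value has entered the frame, rather than starting from 0.
{code:java}
import java.util.Arrays;
import java.util.List;

public class WindowSumSketch {

  // SUM over a window frame: NULL when the frame is empty or all-NULL, never 0.
  static Long sumFrame(List<Long> frame) {
    boolean sawValue = false;
    long sum = 0L;
    for (Long v : frame) {
      if (v != null) {
        sawValue = true;
        sum += v;
      }
    }
    return sawValue ? Long.valueOf(sum) : null;
  }

  public static void main(String[] args) {
    // Mirrors the frame "rows between 1 following and 1 following" for id=5 in
    // the reproducer below, where the only row in the frame has a NULL value.
    System.out.println(sumFrame(Arrays.asList((Long) null)));     // null, not 0
    System.out.println(sumFrame(Arrays.asList(5L, (Long) null))); // 5
  }
}
{code}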

> Sum over window produces 0 when row contains null
> -
>
> Key: HIVE-26683
> URL: https://issues.apache.org/jira/browse/HIVE-26683
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Ran the following sql:
>  
> {code:java}
> create table sum_window_test_small (id int, tinyint_col tinyint);
> insert into sum_window_test_small values (5,5), (10, NULL), (11,1);
> select id,
> tinyint_col,
> sum(tinyint_col) over (order by id nulls last rows between 1 following and 1 
> following)
> from sum_window_test_small order by id;
> select id,
> tinyint_col,
> sum(tinyint_col) over (order by id nulls last rows between current row and 1 
> following)
> from sum_window_test_small order by id;
> {code}
> The result is
> {code:java}
> +-+--+---+
> | id  | tinyint_col  | sum_window_0  |
> +-+--+---+
> | 5   | 5            | 0             |
> | 10  | NULL         | 1             |
> | 11  | 1            | NULL          |
> +-+--+---+{code}
> The first row should have the sum as NULL
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26243) Add vectorized implementation of the 'ds_kll_sketch' UDAF

2022-11-21 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando resolved HIVE-26243.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed via 
[{{ad19ec3}}|https://github.com/apache/hive/commit/ad19ec3022a35bee4d618bd8992d9ce0f67be5b7],
 thanks to [~dkuzmenko] and [~kgyrtkirk] for their reviews

> Add vectorized implementation of the 'ds_kll_sketch' UDAF
> -
>
> Key: HIVE-26243
> URL: https://issues.apache.org/jira/browse/HIVE-26243
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF, Vectorization
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> _ds_kll_sketch_ UDAF does not have a vectorized implementation at the moment, 
> the present ticket aims at bridging this gap.
> This is particularly important because vectorization has an "all or nothing" 
> approach, so if this function is used at the side of vectorized functions, 
> they won't be able to benefit from vectorized execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule

2022-11-18 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26762:

Description: 
HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if the 
newly pushed filter simplifies to FALSE (due to the predicates holding on the 
input).

If this is true and there is more than one UNION ALL operand, it gets pruned.

After HIVE-26524 ("Use Calcite to remove sections of a query plan known never 
produces rows"), this is possibly redundant and we could drop this feature and 
let the other rules take care of the pruning.

In such a case, it might even be possible to drop the Hive-specific rule and 
rely on the Calcite one (the difference is just the operand pruning at the 
moment of writing), similarly to what HIVE-26642 did for 
HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended 
to tackle this in a separate ticket after verifying that it is feasible.
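
To make the mechanism concrete, here is a minimal Calcite-level sketch 
(hypothetical {{FilterPruneSketch}} class, plain Calcite APIs rather than the 
rule's actual code) of the check described above: given a predicate already 
known to hold on a UNION ALL operand, the newly pushed filter is expected to 
simplify to FALSE under the unknown-as-false semantics, which is what triggers 
the operand pruning.
{code:java}
import com.google.common.collect.ImmutableList;
import java.math.BigDecimal;
import org.apache.calcite.plan.RelOptPredicateList;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeSystem;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexSimplify;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.type.SqlTypeFactoryImpl;
import org.apache.calcite.sql.type.SqlTypeName;

public class FilterPruneSketch {
  public static void main(String[] args) {
    SqlTypeFactoryImpl typeFactory = new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);
    RexBuilder rexBuilder = new RexBuilder(typeFactory);
    RelDataType intType = typeFactory.createSqlType(SqlTypeName.INTEGER);
    RexNode col = rexBuilder.makeInputRef(intType, 0);

    // Predicate pulled up from the operand (e.g. it only produces rows with $0 = 1000).
    RexNode holdsOnInput = rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,
        col, rexBuilder.makeExactLiteral(BigDecimal.valueOf(1000)));
    RelOptPredicateList predicates =
        RelOptPredicateList.of(rexBuilder, ImmutableList.of(holdsOnInput));

    // Filter condition being pushed into that operand: $0 = 2000.
    RexNode pushed = rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,
        col, rexBuilder.makeExactLiteral(BigDecimal.valueOf(2000)));

    RexSimplify simplify = new RexSimplify(rexBuilder, predicates, RexUtil.EXECUTOR);
    // Expected to print FALSE: the pushed condition contradicts the pulled-up
    // predicate, so the rule can prune this UNION ALL operand.
    System.out.println(simplify.simplifyUnknownAsFalse(pushed));
  }
}
{code}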

  was:
HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if the 
newly pushed filter simplifies to FALSE (possibly due to the predicates holding 
on the input).

If this is true and there is more than one UNION ALL operand, it gets pruned.

After HIVE-26524 ("Use Calcite to remove sections of a query plan known never 
produces rows"), this is possibly redundant and we could drop this feature and 
let the other rules take care of the pruning.

In such a case, it might be even possible to drop the Hive specific rule and 
relies on the Calcite one (the difference is just the operand pruning at the 
moment of writing), similarly to what HIVE-26642 did for 
HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended 
to tackle this in a separate ticket after verifying that is feasible.


> Remove operand pruning in HiveFilterSetOpTransposeRule
> --
>
> Key: HIVE-26762
> URL: https://issues.apache.org/jira/browse/HIVE-26762
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Priority: Major
>
> HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if 
> the newly pushed filter simplifies to FALSE (due to the predicates holding on 
> the input).
> If this is true and there is more than one UNION ALL operand, it gets pruned.
> After HIVE-26524 ("Use Calcite to remove sections of a query plan known never 
> produces rows"), this is possibly redundant and we could drop this feature 
> and let the other rules take care of the pruning.
> In such a case, it might be even possible to drop the Hive specific rule and 
> relies on the Calcite one (the difference is just the operand pruning at the 
> moment of writing), similarly to what HIVE-26642 did for 
> HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended 
> to tackle this in a separate ticket after verifying that is feasible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule

2022-11-18 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26762:

Description: 
HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if the 
newly pushed filter simplifies to FALSE (possibly due to the predicates holding 
on the input).

If this is true and there is more than one UNION ALL operand, it gets pruned.

After HIVE-26524 ("Use Calcite to remove sections of a query plan known never 
produces rows"), this is possibly redundant and we could drop this feature and 
let the other rules take care of the pruning.

In such a case, it might be even possible to drop the Hive specific rule and 
relies on the Calcite one (the difference is just the operand pruning at the 
moment of writing), similarly to what HIVE-26642 did for 
HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended 
to tackle this in a separate ticket after verifying that is feasible.

  was:
HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if the 
newly pushed filter simplifies to FALSE (possibly due to the predicates holding 
on the input).

If this is true and there is more than one UNION ALL operand, it gets pruned.

After HIVE-26524 ("Use Calcite to remove sections of a query plan known never 
produces rows"), this is possibly redundant and we could drop this feature and 
let the other rules take care of the pruning.

In such a case, it's even possible to drop the Hive specific rule and relies on 
the Calcite one (the difference is just the operand pruning at the moment of 
writing), similarly to what HIVE-26642 did for HiveReduceExpressionRule.


> Remove operand pruning in HiveFilterSetOpTransposeRule
> --
>
> Key: HIVE-26762
> URL: https://issues.apache.org/jira/browse/HIVE-26762
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Priority: Major
>
> HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if 
> the newly pushed filter simplifies to FALSE (possibly due to the predicates 
> holding on the input).
> If this is true and there is more than one UNION ALL operand, it gets pruned.
> After HIVE-26524 ("Use Calcite to remove sections of a query plan known never 
> produces rows"), this is possibly redundant and we could drop this feature 
> and let the other rules take care of the pruning.
> In such a case, it might be even possible to drop the Hive specific rule and 
> relies on the Calcite one (the difference is just the operand pruning at the 
> moment of writing), similarly to what HIVE-26642 did for 
> HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended 
> to tackle this in a separate ticket after verifying that is feasible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work stopped] (HIVE-26733) Not safe to use '=' for predicates on constant expressions that might be NULL

2022-11-18 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26733 stopped by Alessandro Solimando.
---
> Not safe to use '=' for predicates on constant expressions that might be NULL
> -
>
> Key: HIVE-26733
> URL: https://issues.apache.org/jira/browse/HIVE-26733
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-1
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> HiveRelMdPredicates was forked from Calcite's RelMdPredicates long time ago.
> Hive's version lacks this commit 
> [https://github.com/apache/calcite/commit/8281668f] which introduced the use 
> of "IS NOT DISTINCT FROM" in place of "EQUAL" when a constant expression can 
> be NULL.
> There is no Calcite ticket for this change, so I am briefly explaining the 
> issue here.
> Consider the following input as argument of 
> HiveRelMdPredicates::pullUpPredicates(Project) method:
> {code:java}
> SELECT char_length(NULL) FROM t{code}
> The method currently infers the predicate (=($0, CHAR_LENGTH(null:NULL))) 
> which translates to "=(NULL, NULL)", which turns simplifies to FALSE under 
> the unknownAsFalse semantics.
> The change will make this methods return "IS NOT DISTINCT FROM($0, 
> CHAR_LENGTH(null:NULL))", which translates to IS NOT DISTINCT FROM(NULL, 
> NULL), which is TRUE.
> For reference, we have the truth table below (from [1]):
> ||{{A}}||{{B}}||{{A = B}}||{{A IS NOT DISTINCT FROM B}}||
> |{{0}}|{{0}}|_true_|_true_|
> |{{0}}|{{1}}|_false_|_false_|
> |{{0}}|{{null}}|_*unknown*_|_*false*_|
> |{{null}}|{{null}}|_*unknown*_|_*true*_|
> [1] https://modern-sql.com/feature/is-distinct-from



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands

2022-11-17 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando resolved HIVE-26722.
-
Resolution: Fixed

> HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands
> --
>
> Key: HIVE-26722
> URL: https://issues.apache.org/jira/browse/HIVE-26722
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-1
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> h1. Reproducer
> Consider the following query:
> {code:java}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> CREATE EXTERNAL TABLE t (a string, b string);
> INSERT INTO t VALUES ('1000', 'b1');
> INSERT INTO t VALUES ('2000', 'b2');
> SELECT * FROM (
>   SELECT
>    a,
>    b
>   FROM t
>    UNION ALL
>   SELECT
>    a,
>    CAST(NULL AS string)
>    FROM t) AS t2
> WHERE a = 1000;EXPLAIN CBO
> SELECT * FROM (
>   SELECT
>    a,
>    b
>   FROM t
>    UNION ALL
>   SELECT
>    a,
>    CAST(NULL AS string)
>    FROM t) AS t2
> WHERE a = 1000; {code}
> The expected result is:
> {code:java}
> 1000    b1
> 1000    NULL{code}
> An example of correct plan is as follows:
> {noformat}
> CBO PLAN:
> HiveUnion(all=[true])
>   HiveProject(a=[$0], b=[$1])
>     HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>       HiveTableScan(table=[[default, t]], table:alias=[t])
>   HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE"])
>     HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>       HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
>  
> Consider now a scenario where expression reduction in projections is disabled 
> by setting the following property{_}:{_}
> {noformat}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> {noformat}
> In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
> and we get the following (invalid) result:
> {code:java}
> 1000    b1{code}
> produced by the following invalid plan:
> {code:java}
> CBO PLAN:
> HiveProject(a=[$0], b=[$1])
>   HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>     HiveTableScan(table=[[default, t]], table:alias=[t]) {code}
> h1. Problem Analysis
> At 
> [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112]
>  the _RelMetadataQuery::getPulledUpPredicates_ method infers the following 
> predicate due to the CAST(NULL) in the projection:
> {code:java}
> (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
> When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is 
> inferred.
> In 
> [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122],
>  the rule checks if the conjunction of the predicate coming from the filter 
> (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or 
> not, under the _UnknownAsFalse_ semantics.
> To summarize, the following expression is simplified under the 
> _UnknownAsFalse_ semantics:
> {code:java}
> AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), 
> =(CAST($0):DOUBLE, 1000))
> {code}
> Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, 
> because no value is equal to NULL (not even NULL itself); hence AND(FALSE, 
> =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and 
> the UNION ALL operand is pruned.
> Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the 
> issue, due to the _IS_NULL($1)_ inferred predicate, see 
> [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156]
>  for understanding how the NULL literal is treated differently during 
> predicate inference.
> _HiveRelMdPredicates_ should not use equality ('=') for nullable constant 
> expressions, but rather IS NOT DISTINCT FROM, as detailed in HIVE-26733. 
> Nonetheless, the way simplification is done here is not correct either: 
> inferred predicates should be used as "context", rather than being used in a 
> conjunctive expression; this usage does not conform with any of the similar 
> uses of simplification with inferred predicates (see the bottom of the 
> "Solution" section for examples and details).
> h1. Solution
> In 

[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands

2022-11-16 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26722:

Description: 
h1. Reproducer

Consider the following query:
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);

CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES ('2000', 'b2');

SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000;EXPLAIN CBO
SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000; {code}
The expected result is:
{code:java}
1000    b1
1000    NULL{code}
An example of correct plan is as follows:
{noformat}
CBO PLAN:
HiveUnion(all=[true])
  HiveProject(a=[$0], b=[$1])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t])
  HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE"])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
 

Consider now a scenario where expression reduction in projections is disabled 
by setting the following property{_}:{_}
{noformat}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
{noformat}
In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
and we get the following (invalid) result:
{code:java}
1000    b1{code}
produced by the following invalid plan:
{code:java}
CBO PLAN:
HiveProject(a=[$0], b=[$1])
  HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
    HiveTableScan(table=[[default, t]], table:alias=[t]) {code}
h1. Problem Analysis

At 
[HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112]
 the _RelMetadataQuery::getPulledUpPredicates_ method infers the following 
predicate due to the CAST(NULL) in the projection:
{code:java}
(=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is 
inferred.

In 
[HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122],
 the rule checks if the conjunction of the predicate coming from the filter 
(here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or 
not, under the _UnknownAsFalse_ semantics.

To summarize, the following expression is simplified under the _UnknownAsFalse_ 
semantics:
{code:java}
AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), 
=(CAST($0):DOUBLE, 1000))
{code}
Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, 
because no value is equal to NULL (not even NULL itself); hence AND(FALSE, 
=(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the 
UNION ALL operand is pruned.

Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, 
due to the _IS_NULL($1)_ inferred predicate, see 
[HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156]
 for understanding how the NULL literal is treated differently during predicate 
inference.

_HiveRelMdPredicates_ should not use equality ('=') for nullable constant 
expressions, but rather IS NOT DISTINCT FROM, as detailed in HIVE-26733. 
Nonetheless, the way simplification is done here is not correct either: inferred 
predicates should be used as "context", rather than being used in a conjunctive 
expression; this usage does not conform with any of the similar uses of 
simplification with inferred predicates (see the bottom of the "Solution" 
section for examples and details).
h1. Solution

In order to correctly simplify a predicate and test if it's always false or 
not, we should build RexSimplify with _predicates_ as the list of predicates 
known to hold in the context. In this way, the different semantics are 
correctly taken into account.

The code at 
[HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121]
 should be replaced by the following:
{code:java}
final RexExecutor executor =
Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR);
final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor);
final RexNode x = 

[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands

2022-11-16 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26722:

Description: 
h1. Reproducer

Consider the following query:
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);

CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES ('2000', 'b2');

SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000;EXPLAIN CBO
SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000; {code}
 

The expected result is:
{code:java}
1000    b1
1000    NULL{code}
An example of correct plan is as follows:
{noformat}
CBO PLAN:
HiveUnion(all=[true])
  HiveProject(a=[$0], b=[$1])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t])
  HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE"])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
 

Consider now a scenario where expression reduction in projections is disabled 
by setting the following property{_}:{_}
{noformat}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
{noformat}
In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
and we get the following (invalid) result:
{code:java}
1000    b1{code}
produced by the following invalid plan:
{code:java}
CBO PLAN:
HiveProject(a=[$0], b=[$1])
  HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
    HiveTableScan(table=[[default, t]], table:alias=[t]) {code}
h1. Problem Analysis

At 
[HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112]
 the _RelMetadataQuery::getPulledUpPredicates_ method infers the following 
predicate due to the CAST(NULL) in the projection:
{code:java}
(=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is 
inferred.

In 
[HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122],
 the rule checks if the conjunction of the predicate coming from the filter 
(here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or 
not, under the _UnknownAsFalse_ semantics.

To summarize, the following expression is simplified under the _UnknownAsFalse_ 
semantics:
{code:java}
AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), 
=(CAST($0):DOUBLE, 1000))
{code}
Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, 
because no value is equal to NULL (not even NULL itself); hence AND(FALSE, 
=(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the 
UNION ALL operand is pruned.

Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, 
due to the _IS_NULL($1)_ inferred predicate, see 
[HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156]
 for understanding how the NULL literal is treated differently during predicate 
inference.

The problem lies in the fact that, depending on the input _RelNode_ that we 
infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, 
but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case.
h1. Solution

In order to correctly simplify a predicate and test if it's always false or 
not, we should build RexSimplify with _predicates_ as the list of predicates 
known to hold in the context. In this way, the different semantics are 
correctly taken into account.

The code at 
[HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121]
 should be replaced by the following:
{code:java}
final RexExecutor executor =
    Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR);
final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor);
final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code}
This is in line with other uses of simplification, like in Calcite:


[jira] [Updated] (HIVE-26733) Not safe to use '=' for predicates on constant expressions that might be NULL

2022-11-15 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26733:

Description: 
HiveRelMdPredicates was forked from Calcite's RelMdPredicates a long time ago.

Hive's version lacks this commit 
[https://github.com/apache/calcite/commit/8281668f] which introduced the use of 
"IS NOT DISTINCT FROM" in place of "EQUAL" when a constant expression can be 
NULL.

There is no Calcite ticket for this change, so I am briefly explaining the 
issue here.

Consider the following input as argument of 
HiveRelMdPredicates::pullUpPredicates(Project) method:
{code:java}
SELECT char_length(NULL) FROM t{code}
The method currently infers the predicate (=($0, CHAR_LENGTH(null:NULL))), which 
translates to "=(NULL, NULL)", which in turn simplifies to FALSE under the 
unknownAsFalse semantics.

The change will make this method return "IS NOT DISTINCT FROM($0, 
CHAR_LENGTH(null:NULL))", which translates to IS NOT DISTINCT FROM(NULL, NULL), 
which is TRUE.

For reference, we have the truth table below (from [1]):
||{{A}}||{{B}}||{{A = B}}||{{A IS NOT DISTINCT FROM B}}||
|{{0}}|{{0}}|_true_|_true_|
|{{0}}|{{1}}|_false_|_false_|
|{{0}}|{{null}}|_*unknown*_|_*false*_|
|{{null}}|{{null}}|_*unknown*_|_*true*_|

[1] https://modern-sql.com/feature/is-distinct-from
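
As a hedged sketch of the idea (not the actual Calcite patch; rexBuilder, fieldRef and constExpr are placeholder parameters), the pulled-up predicate can be built with the NULL-safe operator only when the constant expression is nullable:

{code:java}
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.sql.SqlOperator;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;

final class NullSafePredicateSketch {
  private NullSafePredicateSketch() {
  }

  /** Builds "fieldRef <op> constExpr", where <op> is NULL-safe iff constExpr may be NULL. */
  static RexNode pulledUpPredicate(RexBuilder rexBuilder, RexNode fieldRef, RexNode constExpr) {
    final SqlOperator op = constExpr.getType().isNullable()
        ? SqlStdOperatorTable.IS_NOT_DISTINCT_FROM  // NULL IS NOT DISTINCT FROM NULL is TRUE
        : SqlStdOperatorTable.EQUALS;               // '=' is only safe for non-nullable constants
    return rexBuilder.makeCall(op, fieldRef, constExpr);
  }
}
{code}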

  was:
Given a _CAST(NULL as $type)_ as i-th project expression, the method returns 
_(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a 
_NULL_ literal project expression.

This is because _RexLiteral::isNullLiteral_ is used 
[here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153],
 while in similar cases, it's often convenient to use 
{_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}.


> Not safe to use '=' for predicates on constant expressions that might be NULL
> -
>
> Key: HIVE-26733
> URL: https://issues.apache.org/jira/browse/HIVE-26733
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-1
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HiveRelMdPredicates was forked from Calcite's RelMdPredicates a long time ago.
> Hive's version lacks this commit 
> [https://github.com/apache/calcite/commit/8281668f] which introduced the use 
> of "IS NOT DISTINCT FROM" in place of "EQUAL" when a constant expression can 
> be NULL.
> There is no Calcite ticket for this change, so I am briefly explaining the 
> issue here.
> Consider the following input as argument of 
> HiveRelMdPredicates::pullUpPredicates(Project) method:
> {code:java}
> SELECT char_length(NULL) FROM t{code}
> The method currently infers the predicate (=($0, CHAR_LENGTH(null:NULL))), 
> which translates to "=(NULL, NULL)", which in turn simplifies to FALSE under 
> the unknownAsFalse semantics.
> The change will make this method return "IS NOT DISTINCT FROM($0, 
> CHAR_LENGTH(null:NULL))", which translates to IS NOT DISTINCT FROM(NULL, 
> NULL), which is TRUE.
> For reference, we have the truth table below (from [1]):
> ||{{A}}||{{B}}||{{A = B}}||{{A IS NOT DISTINCT FROM B}}||
> |{{0}}|{{0}}|_true_|_true_|
> |{{0}}|{{1}}|_false_|_false_|
> |{{0}}|{{null}}|_*unknown*_|_*false*_|
> |{{null}}|{{null}}|_*unknown*_|_*true*_|
> [1] https://modern-sql.com/feature/is-distinct-from



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26733) Not safe to use '=' for predicates on constant expressions that might be NULL

2022-11-15 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26733:

Summary: Not safe to use '=' for predicates on constant expressions that 
might be NULL  (was: HiveRelMdPredicates::getPredicate(Project) should return 
IS_NULL for CAST(NULL))

> Not safe to use '=' for predicates on constant expressions that might be NULL
> -
>
> Key: HIVE-26733
> URL: https://issues.apache.org/jira/browse/HIVE-26733
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-1
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Given a _CAST(NULL as $type)_ as i-th project expression, the method returns 
> _(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a 
> _NULL_ literal project expression.
> This is because _RexLiteral::isNullLiteral_ is used 
> [here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153],
>  while in similar cases, it's often convenient to use 
> {_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26733) HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for CAST(NULL)

2022-11-13 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26733:

Description: 
Given a _CAST(NULL as $type)_ as i-th project expression, the method returns 
_(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a 
_NULL_ literal project expression.

This is because _RexLiteral::isNullLiteral_ is used 
[here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153],
 while in similar cases, it's often convenient to use 
{_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}.

  was:
Given a _CAST(NULL as $type)_ as i-th project expression, the method returns 
_(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a 
_NULL_ literal project expression.

This is because _RexLiteral::isNullLiteral_ is used 
[here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153],
 while in similar places, it's often convenient to use 
{_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}.


> HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for 
> CAST(NULL)
> ---
>
> Key: HIVE-26733
> URL: https://issues.apache.org/jira/browse/HIVE-26733
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-1
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> Given a _CAST(NULL as $type)_ as i-th project expression, the method returns 
> _(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a 
> _NULL_ literal project expression.
> This is because _RexLiteral::isNullLiteral_ is used 
> [here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153],
>  while in similar cases, it's often convenient to use 
> {_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26733) HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for CAST(NULL)

2022-11-13 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26733 started by Alessandro Solimando.
---
> HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for 
> CAST(NULL)
> ---
>
> Key: HIVE-26733
> URL: https://issues.apache.org/jira/browse/HIVE-26733
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-1
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> Given a _CAST(NULL as $type)_ as i-th project expression, the method returns 
> _(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a 
> _NULL_ literal project expression.
> This is because _RexLiteral::isNullLiteral_ is used 
> [here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153],
>  while in similar places, it's often convenient to use 
> {_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26733) HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for CAST(NULL)

2022-11-13 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26733:
---


> HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for 
> CAST(NULL)
> ---
>
> Key: HIVE-26733
> URL: https://issues.apache.org/jira/browse/HIVE-26733
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-1
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> Given a _CAST(NULL as $type)_ as i-th project expression, the method returns 
> _(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a 
> _NULL_ literal project expression.
> This is because _RexLiteral::isNullLiteral_ is used 
> [here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153],
>  while in similar places, it's often convenient to use 
> {_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands

2022-11-10 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26722:

Summary: HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands 
 (was: HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands)

> HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands
> --
>
> Key: HIVE-26722
> URL: https://issues.apache.org/jira/browse/HIVE-26722
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-1
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> h1. Reproducer
> Consider the following query:
> {code:java}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> CREATE EXTERNAL TABLE t (a string, b string);
> INSERT INTO t VALUES ('1000', 'b1');
> INSERT INTO t VALUES ('2000', 'b2');
> SELECT * FROM (
>   SELECT
>    a,
>    b
>   FROM t
>    UNION ALL
>   SELECT
>    a,
>    CAST(NULL AS string)
>    FROM t) AS t2
> WHERE a = 1000;EXPLAIN CBO
> SELECT * FROM (
>   SELECT
>    a,
>    b
>   FROM t
>    UNION ALL
>   SELECT
>    a,
>    CAST(NULL AS string)
>    FROM t) AS t2
> WHERE a = 1000; {code}
>  
> The expected result is:
> {code:java}
> 1000    b1
> 1000    NULL{code}
> An example of correct plan is as follows:
> {noformat}
> CBO PLAN:
> HiveUnion(all=[true])
>   HiveProject(a=[$0], b=[$1])
>     HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>       HiveTableScan(table=[[default, t]], table:alias=[t])
>   HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE"])
>     HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>       HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
>  
> Consider now a scenario where expression reduction in projections is disabled 
> by setting the following property{_}:{_}
> {noformat}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> {noformat}
> In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
> and we get the following (invalid) result:
> {code:java}
> 1000    b1{code}
> produced by the following invalid plan:
> {code:java}
> CBO PLAN:
> HiveProject(a=[$0], b=[$1])
>   HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>     HiveTableScan(table=[[default, t]], table:alias=[t]) {code}
> h1. Problem Analysis
> At 
> [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112]
>  the _RelMetadataQuery::getPulledUpPredicates_ method infers the following 
> predicate due to the CAST(NULL) in the projection:
> {code:java}
> (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
> When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is 
> inferred.
> In 
> [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122],
>  the rule checks if the conjunction of the predicate coming from the filter 
> (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or 
> not, under the _UnknownAsFalse_ semantics.
> To summarize, the following expression is simplified under the 
> _UnknownAsFalse_ semantics:
> {code:java}
> AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), 
> =(CAST($0):DOUBLE, 1000))
> {code}
> Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, 
> because no value is equal to NULL (not even NULL itself); hence AND(FALSE, 
> =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and 
> the UNION ALL operand is pruned.
> Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the 
> issue, due to the _IS_NULL($1)_ inferred predicate, see 
> [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156]
>  for understanding how the NULL literal is treated differently during 
> predicate inference.
> The problem lies in the fact that, depending on the input _RelNode_ that we 
> infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, 
> but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this 
> case.
> h1. Solution
> In order to correctly simplify a predicate and test if it's always false or 
> not, we should build RexSimplify with _predicates_ as the list of predicates 
> known to hold in the context. In this way, the different semantics are 

[jira] [Work started] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands

2022-11-10 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26722 started by Alessandro Solimando.
---
> HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands
> -
>
> Key: HIVE-26722
> URL: https://issues.apache.org/jira/browse/HIVE-26722
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-1
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> h1. Reproducer
> Consider the following query:
> {code:java}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> CREATE EXTERNAL TABLE t (a string, b string);
> INSERT INTO t VALUES ('1000', 'b1');
> INSERT INTO t VALUES ('2000', 'b2');
> SELECT * FROM (
>   SELECT
>    a,
>    b
>   FROM t
>    UNION ALL
>   SELECT
>    a,
>    CAST(NULL AS string)
>    FROM t) AS t2
> WHERE a = 1000;EXPLAIN CBO
> SELECT * FROM (
>   SELECT
>    a,
>    b
>   FROM t
>    UNION ALL
>   SELECT
>    a,
>    CAST(NULL AS string)
>    FROM t) AS t2
> WHERE a = 1000; {code}
>  
> The expected result is:
> {code:java}
> 1000    b1
> 1000    NULL{code}
> An example of correct plan is as follows:
> {noformat}
> CBO PLAN:
> HiveUnion(all=[true])
>   HiveProject(a=[$0], b=[$1])
>     HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>       HiveTableScan(table=[[default, t]], table:alias=[t])
>   HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE"])
>     HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>       HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
>  
> Consider now a scenario where expression reduction in projections is disabled 
> by setting the following property{_}:{_}
> {noformat}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> {noformat}
> In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
> and we get the following (invalid) result:
> {code:java}
> 1000    b1{code}
> produced by the following invalid plan:
> {code:java}
> CBO PLAN:
> HiveProject(a=[$0], b=[$1])
>   HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>     HiveTableScan(table=[[default, t]], table:alias=[t]) {code}
> h1. Problem Analysis
> At 
> [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112]
>  the _RelMetadataQuery::getPulledUpPredicates_ method infers the following 
> predicate due to the CAST(NULL) in the projection:
> {code:java}
> (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
> When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is 
> inferred.
> In 
> [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122],
>  the rule checks if the conjunction of the predicate coming from the filter 
> (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or 
> not, under the _UnknownAsFalse_ semantics.
> To summarize, the following expression is simplified under the 
> _UnknownAsFalse_ semantics:
> {code:java}
> AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), 
> =(CAST($0):DOUBLE, 1000))
> {code}
> Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, 
> because no value is equal to NULL (not even NULL itself); hence AND(FALSE, 
> =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and 
> the UNION ALL operand is pruned.
> Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the 
> issue, due to the _IS_NULL($1)_ inferred predicate, see 
> [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156]
>  for understanding how the NULL literal is treated differently during 
> predicate inference.
> The problem lies in the fact that, depending on the input _RelNode_ that we 
> infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, 
> but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this 
> case.
> h1. Solution
> In order to correctly simplify a predicate and test if it's always false or 
> not, we should build RexSimplify with _predicates_ as the list of predicates 
> known to hold in the context. In this way, the different semantics are 
> correctly taken into account.
> The code at 
> 

[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands

2022-11-10 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26722:

Description: 
h1. Reproducer

Consider the following query:
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);

CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES ('2000', 'b2');

SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000;EXPLAIN CBO
SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000; {code}
 

The expected result is:
{code:java}
1000    b1
1000    NULL{code}
An example of correct plan is as follows:
{noformat}
CBO PLAN:
HiveUnion(all=[true])
  HiveProject(a=[$0], b=[$1])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t])
  HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE"])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
 

Consider now a scenario where expression reduction in projections is disabled 
by setting the following property{_}:{_}
{noformat}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
{noformat}
In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
and we get the following (invalid) result:
{code:java}
1000    b1{code}
produced by the following invalid plan:
{code:java}
CBO PLAN:
HiveProject(a=[$0], b=[$1])
  HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
    HiveTableScan(table=[[default, t]], table:alias=[t]) {code}
h1. Problem Analysis

At 
[HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112]
 the _RelMetadataQuery::getPulledUpPredicates_ method infers the following 
predicate due to the CAST(NULL) in the projection:
{code:java}
(=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is 
inferred.

In 
[HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122],
 the rule checks if the conjunction of the predicate coming from the filter 
(here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or 
not, under the _UnknownAsFalse_ semantics.

To summarize, the following expression is simplified under the _UnknownAsFalse_ 
semantics:
{code:java}
AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), 
=(CAST($0):DOUBLE, 1000))
{code}
Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, 
because no value is equal to NULL (not even NULL itself); hence AND(FALSE, 
=(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the 
UNION ALL operand is pruned.

Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, 
due to the _IS_NULL($1)_ inferred predicate, see 
[HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156]
 for understanding how the NULL literal is treated differently during predicate 
inference.

The problem lies in the fact that, depending on the input _RelNode_ that we 
infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, 
but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case.
h1. Solution

In order to correctly simplify a predicate and test if it's always false or 
not, we should build RexSimplify with _predicates_ as the list of predicates 
known to hold in the context. In this way, the different semantics are 
correctly taken into account.

The code at 
[HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121]
 should be replaced by the following:
{code:java}
final RexExecutor executor =
    Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR);
final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor);
final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code}
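
To make the contrast explicit, here is a hedged side-by-side sketch (not the rule's actual before/after code; all parameters stand for the values already in scope in the rule):

{code:java}
import org.apache.calcite.plan.RelOptPredicateList;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexExecutor;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexSimplify;
import org.apache.calcite.rex.RexUnknownAs;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;

final class ContextVsConjunctionSketch {
  private ContextVsConjunctionSketch() {
  }

  /**
   * Problematic pattern: AND the inferred predicate with the pushed condition and
   * simplify the whole conjunction under UnknownAsFalse; the UNKNOWN coming from
   * "=($1, NULL)" collapses the conjunction to FALSE and the operand gets pruned.
   */
  static RexNode simplifyAsConjunction(RexBuilder rexBuilder, RexExecutor executor,
      RexNode inferredPredicate, RexNode newCondition) {
    final RexSimplify simplify =
        new RexSimplify(rexBuilder, RelOptPredicateList.EMPTY, executor);
    final RexNode conjunction =
        rexBuilder.makeCall(SqlStdOperatorTable.AND, inferredPredicate, newCondition);
    return simplify.simplifyUnknownAs(conjunction, RexUnknownAs.FALSE);
  }

  /**
   * Proposed pattern: the inferred predicates are the simplifier's context, and only
   * the pushed condition is simplified under UnknownAsFalse.
   */
  static RexNode simplifyWithContext(RexBuilder rexBuilder, RexExecutor executor,
      RelOptPredicateList predicates, RexNode newCondition) {
    final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor);
    return simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);
  }
}
{code}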

  was:
Consider the following query:
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);

CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');

[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands

2022-11-10 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26722:

Description: 
Consider the following query:
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);

CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES ('2000', 'b2');

SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000;EXPLAIN CBO
SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000; {code}
 

The expected result is:
{code:java}
1000    b1
1000    NULL{code}
An example of correct plan is as follows:
{noformat}
CBO PLAN:
HiveUnion(all=[true])
  HiveProject(a=[$0], b=[$1])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t])
  HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE"])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
 

Consider now a scenario where expression reduction in projections is disabled 
by setting the following property{_}:{_}
{noformat}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
{noformat}
In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
and we get the following (invalid) result:
{code:java}
1000    b1{code}
produced by the following invalid plan:
{code:java}
CBO PLAN:
HiveProject(a=[$0], b=[$1])
  HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
    HiveTableScan(table=[[default, t]], table:alias=[t]) {code}
h3. Problem Analysis

At 
[HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112]
 the _RelMetadataQuery::getPulledUpPredicates_ method infers the following 
predicate due to the CAST(NULL) in the projection:
{code:java}
(=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is 
inferred.

In 
[HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122],
 the rule checks if the conjunction of the predicate coming from the filter 
(here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or 
not, under the _UnknownAsFalse_ semantics.

To summarize, the following expression is simplified under the _UnknownAsFalse_ 
semantics:
{code:java}
AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), 
=(CAST($0):DOUBLE, 1000))
{code}
Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, 
because no value is equal to NULL (not even NULL itself); hence AND(FALSE, 
=(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the 
UNION ALL operand is pruned.

Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, 
due to the _IS_NULL($1)_ inferred predicate, see 
[HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156]
 for understanding how the NULL literal is treated differently during predicate 
inference.

The problem lies in the fact that, depending on the input _RelNode_ that we 
infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, 
but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case.

Solution: in order to correctly simplify a predicate and test if it's always 
false or not, we should build RexSimplify with _predicates_ as the list of 
predicates known to hold in the context. In this way, the different semantics 
are correctly taken into account.

The code at 
[HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121]
 should be replaced by the following:
{code:java}
final RexExecutor executor =
    Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR);
final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor);
final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code}

  was:
Consider the following query:
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);

CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES 

[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands

2022-11-10 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26722:

Description: 
Consider the following query:
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);

CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES ('2000', 'b2');

SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000;EXPLAIN CBO
SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000; {code}
 

The expected result is:
{code:java}
1000    b1
1000    NULL{code}
An example of correct plan is as follows:
{noformat}
CBO PLAN:
HiveUnion(all=[true])
  HiveProject(a=[$0], b=[$1])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t])
  HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE"])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
 

Consider now a scenario where expression reduction in projections is disabled 
by setting the following property{_}:{_}
{noformat}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
{noformat}
In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
and we get the following (invalid) result:
{code:java}
1000    b1{code}
 

produced by the following invalid plan:
CBO PLAN:
HiveProject(a=[$0], b=[$1])
  HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
    HiveTableScan(table=[[default, t]], table:alias=[t])
 

Problem analysis:

At 
[HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112]
 the _RelMetadataQuery::getPulledUpPredicates_ method infers the following 
predicate due to the CAST(NULL) in the projection:

 
{code:java}
(=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
 

When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is 
inferred.

In 
[HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122],
 the rule checks if the conjunction of the predicate coming from the filter 
(here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or 
not, under the _UnknownAsFalse_ semantics.

To summarize, the following expression is simplified under the _UnknownAsFalse_ 
semantics:

 
{code:java}
AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), 
=(CAST($0):DOUBLE, 1000))
{code}
Under such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, 
because no value is equal to NULL (not even NULL itself); hence AND(FALSE, 
=(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the 
UNION ALL operand is pruned.

Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, 
due to the _IS_NULL($1)_ inferred predicate, see 
[HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156]
 for understanding how the NULL literal is treated differently during predicate 
inference.

The problem lies in the fact that, depending on the input _RelNode_ that we 
infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, 
but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case.

Solution: in order to correctly simplify a predicate and test if it's always 
false or not, we should build RexSimplify with _predicates_ as the list of 
predicates known to hold in the context. In this way, the different semantics 
are correctly taken into account.

The code at 
[HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121]
 should be replaced by the following:

 
{code:java}
final RexExecutor executor =
    Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR);
final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor);
final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code}
 

  was:
Consider the following query:

 
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);

CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES 

[jira] [Assigned] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands

2022-11-10 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26722:
---


> HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands
> -
>
> Key: HIVE-26722
> URL: https://issues.apache.org/jira/browse/HIVE-26722
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-1
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> Consider the following query:
>  
> {code:java}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> CREATE EXTERNAL TABLE t (a string, b string);
> INSERT INTO t VALUES ('1000', 'b1');
> INSERT INTO t VALUES ('2000', 'b2');
> SELECT * FROM (
>   SELECT
>    a,
>    b
>   FROM t
>    UNION ALL
>   SELECT
>    a,
>    CAST(NULL AS string)
>    FROM t) AS t2
> WHERE a = 1000;EXPLAIN CBO
> SELECT * FROM (
>   SELECT
>    a,
>    b
>   FROM t
>    UNION ALL
>   SELECT
>    a,
>    CAST(NULL AS string)
>    FROM t) AS t2
> WHERE a = 1000; {code}
>  
>  
> The expected result is:
>  
> {code:java}
> 1000    b1
> 1000    NULL{code}
>  
> An example of correct plan is as follows:
>  
> {noformat}
> CBO PLAN:
> HiveUnion(all=[true])
>   HiveProject(a=[$0], b=[$1])
>     HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>       HiveTableScan(table=[[default, t]], table:alias=[t])
>   HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE"])
>     HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>       HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
>  
>  
> Consider now a scenario where expression reduction in projections is disabled 
> by setting the following property{_}:{_}
> {noformat}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> {noformat}
> In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
> and we get the following (invalid) result:
> 1000    b1
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26678) In the filter criteria associated with multiple tables, the filter result of the subquery by not in or in is incorrect.

2022-11-07 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629757#comment-17629757
 ] 

Alessandro Solimando commented on HIVE-26678:
-

The non-CBO codepath has a lot of flaws; it should probably be discontinued at 
this point, given that CBO support is mature and has been around for a while.

Any specific reason why you are running without CBO in the first place?

> In the filter criteria associated with multiple tables, the filter result of 
> the subquery by not in or in is incorrect.
> ---
>
> Key: HIVE-26678
> URL: https://issues.apache.org/jira/browse/HIVE-26678
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0
>Reporter: lotan
>Priority: Major
>
> create the test tables as follows:
> create table test101 (id string,id2 string);
> create table test102 (id string,id2 string);
> create table test103 (id string,id2 string);
> create table test104 (id string,id2 string);
> when cbo is false, run the following SQL statement:
> explain select count(1) from test101 t1 
> left join test102 t2 on t1.id=t2.id
> left join test103 t3 on t1.id=t3.id2
> where t1.id in (select s.id from test104 s)
> and t3.id2='123';
> you will see:
> The filter criteria in the right table are lost.
> The execution plan is as follows:
> | Explain |
> | STAGE DEPENDENCIES: |
> |   Stage-9 is a root stage |
> |   Stage-3 depends on stages: Stage-9 |
> |   Stage-0 depends on stages: Stage-3 |
> | |
> | STAGE PLANS: |
> |   Stage: Stage-9 |
> |     Map Reduce Local Work |
> |       Alias -> Map Local Tables: |
> |         sq_1:s |
> |           Fetch Operator |
> |             limit: -1 |
> |         t2 |
> |           Fetch Operator |
> |             limit: -1 |
> |         t3 |
> |           Fetch Operator |
> |             limit: -1 |
> |       Alias -> Map Local Operator Tree: |
> |         sq_1:s |
> |           TableScan |
> |             alias: s |
> |             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE |
> |             Filter Operator |
> |               predicate: id is not null (type: boolean) |
> |               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE |
> |               Select Operator |
> |                 expressions: id (type: string) |
> |                 outputColumnNames: _col0

[jira] [Comment Edited] (HIVE-26691) Generate thrift files by default at compilation time

2022-11-02 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627739#comment-17627739
 ] 

Alessandro Solimando edited comment on HIVE-26691 at 11/2/22 2:26 PM:
--

+1 for mentioning in the release notes and in the wiki.

For upstream vs downstream, how do developers manage at the moment? On macOS 
with brew I have multiple versions installed, and I switch between them with _brew unlink_ + 
{_}brew link{_}. This does not seem like a problem to me.

For the frequency of updates, I don't think it really matters: the same line 
of reasoning applies to the JVM, mvn, etc. Developers are required to 
have a proper setup for compiling, and adding or removing thrift from that setup does not 
make a significant difference.

For protobuf I guess the situation is similar, but I have never had to deal 
with that; it can be addressed in a separate ticket.


was (Author: asolimando):
+1 for mentioning in the release notes and in the wiki.

For upstream vs downstream, how do developers manage at the moment? On MacOS 
with brew I have multiple versions installed, and I switch with _brew unlink_ + 
{_}brew link{_}. This does not seem a problem to me.

For the frequency of the update, I don't think it really matters, the same line 
of reasoning could apply for the JVM, mvn, etc., developers are required to 
have a proper setup for compiling, adding or removing thrift to that does not 
make a significant difference.

For protobuf I guess the situation it's similar, but I have never had to deal 
with that.

> Generate thrift files by default at compilation time
> 
>
> Key: HIVE-26691
> URL: https://issues.apache.org/jira/browse/HIVE-26691
> Project: Hive
>  Issue Type: Task
>  Components: Thrift API
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Priority: Major
>
> Currently Hive does not generate thrift files within the main compilation 
> task ({_}mvn clean install -DskipTests{_}), but it uses a separate profile 
> ({_}mvn clean install -Pthriftif -DskipTests -Dthrift.home=$thrift_path{_}), 
> and thrift-generated files are generally committed to VCS.
> Other Apache projects like Parquet 
> ([https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml]) 
> use a different approach, building all thrift files by default in the main 
> compilation task.
> In general, generated files should not be part of our VCS; only the "source" 
> files should be (.thrift files here).
> Including generated files in VCS is not only problematic because they are 
> verbose and clog PR diffs, but also because they generate a lot of conflicts (even 
> when the changes to the thrift file can be merged automatically).
> The ticket proposes to move thrift file generation to compile time, 
> remove the thrift-generated files from VCS, and add them to the "ignore" list.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26691) Generate thrift files by default at compilation time

2022-11-02 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627739#comment-17627739
 ] 

Alessandro Solimando commented on HIVE-26691:
-

+1 for mentioning in the release notes and in the wiki.

For upstream vs downstream, how do developers manage at the moment? On macOS 
with brew I have multiple versions installed, and I switch between them with _brew unlink_ + 
{_}brew link{_}. This does not seem like a problem to me.

For the frequency of updates, I don't think it really matters: the same line 
of reasoning applies to the JVM, mvn, etc. Developers are required to 
have a proper setup for compiling, and adding or removing thrift from that setup does not 
make a significant difference.

For protobuf I guess the situation is similar, but I have never had to deal 
with that.

> Generate thrift files by default at compilation time
> 
>
> Key: HIVE-26691
> URL: https://issues.apache.org/jira/browse/HIVE-26691
> Project: Hive
>  Issue Type: Task
>  Components: Thrift API
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Priority: Major
>
> Currently Hive does not generate thrift files within the main compilation 
> task ({_}mvn clean install -DskipTests{_}), but it uses a separate profile 
> ({_}mvn clean install -Pthriftif -DskipTests -Dthrift.home=$thrift_path{_}), 
> and thrift-generated files are generally committed to VCS.
> Other Apache projects like Parquet 
> ([https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml]) 
> use a different approach, building all thrift files by default in the main 
> compilation task.
> In general, generated files should not be part of our VCS; only the "source" 
> files should be (.thrift files here).
> Including generated files in VCS is not only problematic because they are 
> verbose and clog PR diffs, but also because they generate a lot of conflicts (even 
> when the changes to the thrift file can be merged automatically).
> The ticket proposes to move thrift file generation to compile time, 
> remove the thrift-generated files from VCS, and add them to the "ignore" list.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26572) Support constant expressions in vectorization

2022-10-25 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26572:

Description: 
At the moment, we cannot vectorize aggregate expressions having constant 
parameters in addition to the aggregation column (it's forbidden 
[here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).

One compelling example of how this could help is [PR 
1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
_compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
_compute_bit_vector_fm_ when the HLL implementation was added, while 
_compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.

Another example is {_}VectorUDAFBloomFilterMerge{_}, receiving an extra 
constant parameter controlling the number of threads for merging tasks. At the 
moment this parameter is "injected" when trying to find an appropriate 
constructor (see 
[VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]).

This ad-hoc approach is not scalable and would make the code hard to read and 
maintain if more UDAFs require constant parameters.

In addition, we are probably missing vectorization opportunities if no such 
ad-hoc treatment is added but an appropriate UDAF constructor is available or 
could be easily added (data sketches UDAF, although not yet vectorized, are a 
good target).
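
For illustration, a minimal sketch of the intended usage (the constant-parameter form of _compute_bit_vector_ below is the proposed, hypothetical signature, not an existing one):
{code:sql}
-- today: one dedicated UDAF per sketch type
SELECT compute_bit_vector_hll(col) FROM t;

-- proposed (hypothetical): a single UDAF taking the sketch type as a constant parameter,
-- which the vectorizer would need to accept alongside the aggregation column
SELECT compute_bit_vector(col, 'HLL') FROM t;
SELECT compute_bit_vector(col, 'FM') FROM t;
{code}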

  was:
At the moment, we cannot vectorize aggregate expression having constant 
parameters in addition to the aggregation column (it's forbidden 
[here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).

One compelling example of how this could help is [PR 
1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
_compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
_compute_bit_vector_fm_ when HLL implementation has been added, while 
_compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.

Another example is _VectorUDAFBloomFilterMerge_, receiving an extra constant 
parameter controlling the number of threads for merging tasks. At the moment 
this parameter is "injected" when trying to find an appropriate constructor 
(see 
[VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]).

This ad-hoc approach is not scalable and would make the code hard to read and 
maintain if more UDAF requires constant parameters.

In addition, we are probably missing vectorization opportunities if no such 
ad-hoc treatment is added but an appropriate UDAF constructor is available or 
could be easily added (data sketches UDAF, although not yet vectorized, are a 
good target).


> Support constant expressions in vectorization
> -
>
> Key: HIVE-26572
> URL: https://issues.apache.org/jira/browse/HIVE-26572
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> At the moment, we cannot vectorize aggregate expression having constant 
> parameters in addition to the aggregation column (it's forbidden 
> [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).
> One compelling example of how this could help is [PR 
> 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
> _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
> _compute_bit_vector_fm_ when HLL implementation has been added, while 
> _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.
> Another example is {_}VectorUDAFBloomFilterMerge{_}, receiving an extra 
> constant parameter controlling the number of threads for merging tasks. At 
> the moment this parameter is "injected" when trying to find an appropriate 
> constructor (see 
> [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]).
> This ad-hoc approach is not scalable and would make the code hard to read and 
> maintain if more UDAFs require constant parameters.
> In addition, we are 

[jira] [Resolved] (HIVE-26572) Support constant expressions in vectorization

2022-10-25 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando resolved HIVE-26572.
-
Resolution: Fixed

Fixed via 
[f7517bc|https://github.com/apache/hive/commit/f7517bcacad3e33c213fa3cfa8670dac1c25ee92],
 thanks [~dkuzmenko] and [~teddy.choi] for your reviews!

> Support constant expressions in vectorization
> -
>
> Key: HIVE-26572
> URL: https://issues.apache.org/jira/browse/HIVE-26572
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> At the moment, we cannot vectorize aggregate expression having constant 
> parameters in addition to the aggregation column (it's forbidden 
> [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).
> One compelling example of how this could help is [PR 
> 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
> _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
> _compute_bit_vector_fm_ when HLL implementation has been added, while 
> _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.
> Another example is _VectorUDAFBloomFilterMerge_, receiving an extra constant 
> parameter controlling the number of threads for merging tasks. At the moment 
> this parameter is "injected" when trying to find an appropriate constructor 
> (see 
> [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]).
> This ad-hoc approach is not scalable and would make the code hard to read and 
> maintain if more UDAF requires constant parameters.
> In addition, we are probably missing vectorization opportunities if no such 
> ad-hoc treatment is added but an appropriate UDAF constructor is available or 
> could be easily added (data sketches UDAF, although not yet vectorized, are a 
> good target).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26662) FAILED: SemanticException [Error 10072]: Database does not exist: spark_global_temp_views

2022-10-24 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17622975#comment-17622975
 ] 

Alessandro Solimando commented on HIVE-26662:
-

This is open-source Hive; for vendor-specific setups like yours, you should get 
in touch with the vendor's support.

> FAILED: SemanticException [Error 10072]: Database does not exist: 
> spark_global_temp_views
> -
>
> Key: HIVE-26662
> URL: https://issues.apache.org/jira/browse/HIVE-26662
> Project: Hive
>  Issue Type: Bug
>Reporter: Mahmood Abu Awwad
>Priority: Blocker
>
> While running our batches using Apache Spark with Hive on an EMR cluster 
> (we're using AWS Glue as a metastore), it seems an issue occurs, which 
> is 
> {code:java}
> EntityNotFoundException ,Database global_temp not found {code}
> {code:java}
> 2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: ql.Driver (:()) - Completed compiling 
> command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); 
> Time taken: 0.02 seconds
> 2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: reexec.ReExecDriver (:()) - Execution #1 of query
> 2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a 
> lock manager
> 2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: ql.Driver (:()) - Executing 
> command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736): 
> show views
> 2022-10-09T10:36:31,263 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: ql.Driver (:()) - Starting task [Stage-0:DDL] in serial mode
> 2022-10-09T10:36:32,270 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: ql.Driver (:()) - Completed executing 
> command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); 
> Time taken: 1.008 seconds
> 2022-10-09T10:36:32,270 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: ql.Driver (:()) - OK
> 2022-10-09T10:36:32,270 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a 
> lock manager
> 2022-10-09T10:36:32,271 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: exec.ListSinkOperator (:()) - RECORDS_OUT_INTERMEDIATE:0, 
> RECORDS_OUT_OPERATOR_LIST_SINK_0:0,
> 2022-10-09T10:36:32,271 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: CliDriver (:()) - Time taken: 1.028 seconds
> 2022-10-09T10:36:32,271 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: conf.HiveConf (HiveConf.java:getLogIdVar(5104)) - Using the 
> default value passed in for log id: 573c4ce0-f73c-439b-829d-1f0b25db45ec
> 2022-10-09T10:36:32,272 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: session.SessionState (SessionState.java:resetThreadName(452)) - 
> Resetting thread name to  main
> 2022-10-09T10:36:46,512 INFO  [main([])]: conf.HiveConf 
> (HiveConf.java:getLogIdVar(5104)) - Using the default value passed in for log 
> id: 573c4ce0-f73c-439b-829d-1f0b25db45ec
> 2022-10-09T10:36:46,513 INFO  [main([])]: session.SessionState 
> (SessionState.java:updateThreadName(441)) - Updating thread name to 
> 573c4ce0-f73c-439b-829d-1f0b25db45ec main
> 2022-10-09T10:36:46,515 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: ql.Driver (:()) - Compiling 
> command(queryId=hadoop_20221009103646_f390a868-07d7-49f1-b620-70d40e5e2cff): 
> use global_temp
> 2022-10-09T10:36:46,530 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a 
> lock manager
> 2022-10-09T10:36:46,666 ERROR [573c4ce0-f73c-439b-829d-1f0b25db45ec 
> main([])]: ql.Driver (:()) - FAILED: SemanticException [Error 10072]: 
> Database does not exist: global_temp
> org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: 
> global_temp
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:2171)
> at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeSwitchDatabase(DDLSemanticAnalyzer.java:1413)
> at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:516)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
> at 
> 

[jira] [Commented] (HIVE-26655) TPC-DS query 17 returns wrong results

2022-10-24 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17622974#comment-17622974
 ] 

Alessandro Solimando commented on HIVE-26655:
-

No statistics means no CBO, and there are many queries failing on the non-CBO path.

CBO has been around for a long time now, and with HIVE-25880 it is now possible to 
disable specific rules in case of issues, so there is no need to turn off CBO 
entirely in the presence of bugs.

All this considered, I guess it's time to consider removing the non-CBO path 
and to stop supporting it, rather than trying to fix it.
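
As a sketch of what HIVE-25880 enables (the property key below is written from memory and should be double-checked against HiveConf/HIVE-25880 before relying on it):
{code:sql}
-- hypothetical session-level example: exclude a single problematic CBO rule by name
-- (a regex over rule names) instead of disabling CBO altogether
SET hive.cbo.rule.exclusion.regex=HiveSortPullUpConstantsRule;
{code}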

 

> TPC-DS query 17 returns wrong results
> -
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Priority: Major
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26652) HiveSortPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields

2022-10-19 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26652 started by Alessandro Solimando.
---
> HiveSortPullUpConstantsRule produces an invalid plan when pulling up 
> constants for nullable fields
> --
>
> Key: HIVE-26652
> URL: https://issues.apache.org/jira/browse/HIVE-26652
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
> Fix For: 4.0.0, 4.0.0-alpha-2
>
>
> The rule pulls up constants without checking/adjusting nullability to match 
> that of the field type.
> Here is the stack-trace when a nullable type is involved:
> {code:java}
> java.lang.AssertionError: type mismatch:
> ref:
> JavaType(class java.lang.Integer)
> input:
> JavaType(int) NOT NULL    at 
> org.apache.calcite.util.Litmus$1.fail(Litmus.java:31)
>     at org.apache.calcite.plan.RelOptUtil.eq(RelOptUtil.java:2167)
>     at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:125)
>     at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:57)
>     at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112)
>     at org.apache.calcite.rel.core.Project.isValid(Project.java:215)
>     at org.apache.calcite.rel.core.Project.<init>(Project.java:94)
>     at org.apache.calcite.rel.core.Project.<init>(Project.java:100)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.<init>(HiveProject.java:58)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.copy(HiveProject.java:106)
>     at org.apache.calcite.rel.core.Project.copy(Project.java:126)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveSortPullUpConstantsRule$HiveSortPullUpConstantsRuleBase.onMatch(HiveSortPullUpConstantsRule.java:195)
>     at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
>     at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
>     at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>     at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
>     at 
> org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
>     at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
>     at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.test(TestHiveSortExchangePullUpConstantsRule.java:104)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.testNullableFields(TestHiveSortExchangePullUpConstantsRule.java:156)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.mockito.internal.runners.DefaultInternalRunner$1$1.evaluate(DefaultInternalRunner.java:54)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at 
> org.mockito.internal.runners.DefaultInternalRunner$1.run(DefaultInternalRunner.java:99)
>     at 
> 

[jira] [Assigned] (HIVE-26652) HiveSortPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields

2022-10-19 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26652:
---


> HiveSortPullUpConstantsRule produces an invalid plan when pulling up 
> constants for nullable fields
> --
>
> Key: HIVE-26652
> URL: https://issues.apache.org/jira/browse/HIVE-26652
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
> Fix For: 4.0.0-alpha-2
>
>
> The rule pulls up constants without checking/adjusting nullability to match 
> that of the field type.
> Here is the stack-trace when a nullable type is involved:
> {code:java}
> java.lang.AssertionError: type mismatch:
> ref:
> JavaType(class java.lang.Integer)
> input:
> JavaType(int) NOT NULL    at 
> org.apache.calcite.util.Litmus$1.fail(Litmus.java:31)
>     at org.apache.calcite.plan.RelOptUtil.eq(RelOptUtil.java:2167)
>     at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:125)
>     at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:57)
>     at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112)
>     at org.apache.calcite.rel.core.Project.isValid(Project.java:215)
>     at org.apache.calcite.rel.core.Project.<init>(Project.java:94)
>     at org.apache.calcite.rel.core.Project.<init>(Project.java:100)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.<init>(HiveProject.java:58)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.copy(HiveProject.java:106)
>     at org.apache.calcite.rel.core.Project.copy(Project.java:126)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveSortPullUpConstantsRule$HiveSortPullUpConstantsRuleBase.onMatch(HiveSortPullUpConstantsRule.java:195)
>     at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
>     at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
>     at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>     at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
>     at 
> org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
>     at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
>     at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.test(TestHiveSortExchangePullUpConstantsRule.java:104)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.testNullableFields(TestHiveSortExchangePullUpConstantsRule.java:156)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.mockito.internal.runners.DefaultInternalRunner$1$1.evaluate(DefaultInternalRunner.java:54)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at 
> org.mockito.internal.runners.DefaultInternalRunner$1.run(DefaultInternalRunner.java:99)
>     at 
> org.mockito.internal.runners.DefaultInternalRunner.run(DefaultInternalRunner.java:105)

[jira] [Updated] (HIVE-26652) HiveSortPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields

2022-10-19 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26652:

Fix Version/s: 4.0.0

> HiveSortPullUpConstantsRule produces an invalid plan when pulling up 
> constants for nullable fields
> --
>
> Key: HIVE-26652
> URL: https://issues.apache.org/jira/browse/HIVE-26652
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
> Fix For: 4.0.0, 4.0.0-alpha-2
>
>
> The rule pulls up constants without checking/adjusting nullability to match 
> that of the field type.
> Here is the stack-trace when a nullable type is involved:
> {code:java}
> java.lang.AssertionError: type mismatch:
> ref:
> JavaType(class java.lang.Integer)
> input:
> JavaType(int) NOT NULL    at 
> org.apache.calcite.util.Litmus$1.fail(Litmus.java:31)
>     at org.apache.calcite.plan.RelOptUtil.eq(RelOptUtil.java:2167)
>     at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:125)
>     at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:57)
>     at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112)
>     at org.apache.calcite.rel.core.Project.isValid(Project.java:215)
>     at org.apache.calcite.rel.core.Project.<init>(Project.java:94)
>     at org.apache.calcite.rel.core.Project.<init>(Project.java:100)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.<init>(HiveProject.java:58)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.copy(HiveProject.java:106)
>     at org.apache.calcite.rel.core.Project.copy(Project.java:126)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveSortPullUpConstantsRule$HiveSortPullUpConstantsRuleBase.onMatch(HiveSortPullUpConstantsRule.java:195)
>     at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
>     at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
>     at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>     at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
>     at 
> org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
>     at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
>     at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.test(TestHiveSortExchangePullUpConstantsRule.java:104)
>     at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.testNullableFields(TestHiveSortExchangePullUpConstantsRule.java:156)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.mockito.internal.runners.DefaultInternalRunner$1$1.evaluate(DefaultInternalRunner.java:54)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at 
> org.mockito.internal.runners.DefaultInternalRunner$1.run(DefaultInternalRunner.java:99)
>     at 
> 

[jira] [Comment Edited] (HIVE-26643) HiveUnionPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields

2022-10-18 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17618914#comment-17618914
 ] 

Alessandro Solimando edited comment on HIVE-26643 at 10/18/22 9:41 AM:
---

A similar issue was fixed for _AggregateProjectPullUpConstantsRule_ in 
CALCITE-2179, see 
[https://github.com/apache/calcite/commit/aa25dcbe565196fb6b78149042ee817427ed4f68#diff-ff4ebbdcaabdec1969e88cbeb4fa7519f5f867d9abdce2a333e1ebc8fc549a47R172-R176]
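
In essence, the fix boils down to something like the following (a minimal sketch of the CALCITE-2179 approach adapted to our pull-up rules; class and method names are illustrative, not the actual rule code):
{code:java}
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;

/** Sketch: align the nullability of a pulled-up constant with the original field type. */
final class PullUpConstantNullability {
  private PullUpConstantNullability() {}

  static RexNode alignNullability(RexBuilder rexBuilder, RexNode constant, RelDataType fieldType) {
    // Literals are typed NOT NULL; if the original field is nullable, wrap the constant
    // in a cast so the rewritten expression keeps the input row type (and passes RexChecker).
    if (fieldType.isNullable() && !constant.getType().isNullable()) {
      return rexBuilder.makeCast(fieldType, constant, true); // matchNullability = true
    }
    return constant;
  }
}
{code}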


was (Author: asolimando):
A similar issue was fixed in this Calcite ticket for 
{_}AggregateProjectPullUpConstantsRule{_}, see 
[https://github.com/apache/calcite/commit/aa25dcbe565196fb6b78149042ee817427ed4f68#diff-ff4ebbdcaabdec1969e88cbeb4fa7519f5f867d9abdce2a333e1ebc8fc549a47R172-R176]
 from CALCITE-2179

> HiveUnionPullUpConstantsRule produces an invalid plan when pulling up 
> constants for nullable fields
> ---
>
> Key: HIVE-26643
> URL: https://issues.apache.org/jira/browse/HIVE-26643
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The rule pulls up constants without checking/adjusting nullability to match 
> that of the field type.
> Here is the stack-trace when a nullable type is involved:
> {code:java}
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) 
> NOT NULL f2) NOT NULL
> expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT 
> NULL f2) NOT NULL
> set is 
> rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true)
> expression is HiveProject(f1=[1], f2=[$0])
>   HiveUnion(all=[true])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> {code}
> The solution is to check nullability and add a cast when the field is 
> nullable, since the constant's type is not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26643) HiveUnionPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields

2022-10-18 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26643:

Description: 
The rule pulls up constants without checking/adjusting nullability to match 
that of the field type.

Here is the stack-trace when a nullable type is involved:
{code:java}
java.lang.AssertionError: Cannot add expression of different type to set:
set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) NOT 
NULL f2) NOT NULL
expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT NULL 
f2) NOT NULL
set is 
rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true)
expression is HiveProject(f1=[1], f2=[$0])
  HiveUnion(all=[true])
HiveProject(f2=[$1])
  HiveProject(f1=[$0], f2=[$1])
HiveFilter(condition=[=($0, 1)])
  LogicalTableScan(table=[[]])
HiveProject(f2=[$1])
  HiveProject(f1=[$0], f2=[$1])
HiveFilter(condition=[=($0, 1)])
  LogicalTableScan(table=[[]])
{code}
The solution is to check nullability and add a cast when the field is nullable, 
since the constant's type is not.

  was:
The rule does pull up constants without checking/adjusting nullability to match 
that of the field type. 

Here is the stack-trace when a nullable type is involved:
{code:java}
java.lang.AssertionError: Cannot add expression of different type to set:
set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) NOT 
NULL f2) NOT NULL
expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT NULL 
f2) NOT NULL
set is 
rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true)
expression is HiveProject(f1=[1], f2=[$0])
  HiveUnion(all=[true])
HiveProject(f2=[$1])
  HiveProject(f1=[$0], f2=[$1])
HiveFilter(condition=[=($0, 1)])
  LogicalTableScan(table=[[]])
HiveProject(f2=[$1])
  HiveProject(f1=[$0], f2=[$1])
HiveFilter(condition=[=($0, 1)])
  LogicalTableScan(table=[[]])
{code}

The solution is to check nullability and add a cast when the field is nullable, 
since the constant's type is not.


> HiveUnionPullUpConstantsRule produces an invalid plan when pulling up 
> constants for nullable fields
> ---
>
> Key: HIVE-26643
> URL: https://issues.apache.org/jira/browse/HIVE-26643
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The rule pulls up constants without checking/adjusting nullability to match 
> that of the field type.
> Here is the stack-trace when a nullable type is involved:
> {code:java}
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) 
> NOT NULL f2) NOT NULL
> expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT 
> NULL f2) NOT NULL
> set is 
> rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true)
> expression is HiveProject(f1=[1], f2=[$0])
>   HiveUnion(all=[true])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> {code}
> The solution is to check nullability and add a cast when the field is 
> nullable, since the constant's type is not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-26643) HiveUnionPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields

2022-10-17 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17618914#comment-17618914
 ] 

Alessandro Solimando edited comment on HIVE-26643 at 10/17/22 1:52 PM:
---

A similar issue was fixed in this Calcite ticket for 
{_}AggregateProjectPullUpConstantsRule{_}, see 
[https://github.com/apache/calcite/commit/aa25dcbe565196fb6b78149042ee817427ed4f68#diff-ff4ebbdcaabdec1969e88cbeb4fa7519f5f867d9abdce2a333e1ebc8fc549a47R172-R176]
 from CALCITE-2179


was (Author: asolimando):
A similar issue was fixed in this Calcite ticket for 
_AggregateProjectPullUpConstantsRule_, see 
https://github.com/apache/calcite/commit/aa25dcbe565196fb6b78149042ee817427ed4f68#diff-ff4ebbdcaabdec1969e88cbeb4fa7519f5f867d9abdce2a333e1ebc8fc549a47R172-R176

> HiveUnionPullUpConstantsRule produces an invalid plan when pulling up 
> constants for nullable fields
> ---
>
> Key: HIVE-26643
> URL: https://issues.apache.org/jira/browse/HIVE-26643
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The rule does pull up constants without checking/adjusting nullability to 
> match that of the field type. 
> Here is the stack-trace when a nullable type is involved:
> {code:java}
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) 
> NOT NULL f2) NOT NULL
> expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT 
> NULL f2) NOT NULL
> set is 
> rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true)
> expression is HiveProject(f1=[1], f2=[$0])
>   HiveUnion(all=[true])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> {code}
> The solution is to check nullability and add a cast when the field is 
> nullable, since the constant's type is not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26643) HiveUnionPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields

2022-10-17 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17618914#comment-17618914
 ] 

Alessandro Solimando commented on HIVE-26643:
-

A similar issue was fixed in this Calcite ticket for 
_AggregateProjectPullUpConstantsRule_, see 
https://github.com/apache/calcite/commit/aa25dcbe565196fb6b78149042ee817427ed4f68#diff-ff4ebbdcaabdec1969e88cbeb4fa7519f5f867d9abdce2a333e1ebc8fc549a47R172-R176

> HiveUnionPullUpConstantsRule produces an invalid plan when pulling up 
> constants for nullable fields
> ---
>
> Key: HIVE-26643
> URL: https://issues.apache.org/jira/browse/HIVE-26643
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The rule does pull up constants without checking/adjusting nullability to 
> match that of the field type. 
> Here is the stack-trace when a nullable type is involved:
> {code:java}
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) 
> NOT NULL f2) NOT NULL
> expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT 
> NULL f2) NOT NULL
> set is 
> rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true)
> expression is HiveProject(f1=[1], f2=[$0])
>   HiveUnion(all=[true])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> {code}
> The solution is to check nullability and add a cast when the field is 
> nullable, since the constant's type is not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26643) HiveUnionPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields

2022-10-17 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26643:

Summary: HiveUnionPullUpConstantsRule produces an invalid plan when pulling 
up constants for nullable fields  (was: HiveUnionPullUpConstantsRule fails when 
pulling up constants for nullable fields)

> HiveUnionPullUpConstantsRule produces an invalid plan when pulling up 
> constants for nullable fields
> ---
>
> Key: HIVE-26643
> URL: https://issues.apache.org/jira/browse/HIVE-26643
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The rule does pull up constants without checking/adjusting nullability to 
> match that of the field type. 
> Here is the stack-trace when a nullable type is involved:
> {code:java}
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) 
> NOT NULL f2) NOT NULL
> expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT 
> NULL f2) NOT NULL
> set is 
> rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true)
> expression is HiveProject(f1=[1], f2=[$0])
>   HiveUnion(all=[true])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> {code}
> The solution is to check nullability and add a cast when the field is 
> nullable, since the constant's type is not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26643) HiveUnionPullUpConstantsRule fails when pulling up constants over nullable fields

2022-10-17 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26643:
---


> HiveUnionPullUpConstantsRule fails when pulling up constants over nullable 
> fields
> -
>
> Key: HIVE-26643
> URL: https://issues.apache.org/jira/browse/HIVE-26643
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The rule does pull up constants without checking/adjusting nullability to 
> match that of the field type. 
> Here is the stack-trace when a nullable type is involved:
> {code:java}
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) 
> NOT NULL f2) NOT NULL
> expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT 
> NULL f2) NOT NULL
> set is 
> rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true)
> expression is HiveProject(f1=[1], f2=[$0])
>   HiveUnion(all=[true])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> {code}
> The solution is to check nullability and add a cast when the field is 
> nullable, since the constant's type is not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26643) HiveUnionPullUpConstantsRule fails when pulling up constants for nullable fields

2022-10-17 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26643:

Summary: HiveUnionPullUpConstantsRule fails when pulling up constants for 
nullable fields  (was: HiveUnionPullUpConstantsRule fails when pulling up 
constants over nullable fields)

> HiveUnionPullUpConstantsRule fails when pulling up constants for nullable 
> fields
> 
>
> Key: HIVE-26643
> URL: https://issues.apache.org/jira/browse/HIVE-26643
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The rule does pull up constants without checking/adjusting nullability to 
> match that of the field type. 
> Here is the stack-trace when a nullable type is involved:
> {code:java}
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) 
> NOT NULL f2) NOT NULL
> expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT 
> NULL f2) NOT NULL
> set is 
> rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true)
> expression is HiveProject(f1=[1], f2=[$0])
>   HiveUnion(all=[true])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> {code}
> The solution is to check nullability and add a cast when the field is 
> nullable, since the constant's type is not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26643) HiveUnionPullUpConstantsRule fails when pulling up constants over nullable fields

2022-10-17 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26643 started by Alessandro Solimando.
---
> HiveUnionPullUpConstantsRule fails when pulling up constants over nullable 
> fields
> -
>
> Key: HIVE-26643
> URL: https://issues.apache.org/jira/browse/HIVE-26643
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The rule does pull up constants without checking/adjusting nullability to 
> match that of the field type. 
> Here is the stack-trace when a nullable type is involved:
> {code:java}
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) 
> NOT NULL f2) NOT NULL
> expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT 
> NULL f2) NOT NULL
> set is 
> rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true)
> expression is HiveProject(f1=[1], f2=[$0])
>   HiveUnion(all=[true])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> HiveProject(f2=[$1])
>   HiveProject(f1=[$0], f2=[$1])
> HiveFilter(condition=[=($0, 1)])
>   LogicalTableScan(table=[[]])
> {code}
> The solution is to check nullability and add a cast when the field is 
> nullable, since the constant's type is not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26619) Sonar analysis is not run for the master branch

2022-10-11 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26619:

Summary: Sonar analysis is not run for the master branch  (was: Sonar 
analysis not run on the master branch)

> Sonar analysis is not run for the master branch
> ---
>
> Key: HIVE-26619
> URL: https://issues.apache.org/jira/browse/HIVE-26619
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The analysis for the master branch was using the wrong variable name 
> (_CHANGE_BRANCH_) instead of the one holding the branch name (_BRANCH_NAME_).
> For an overview of the git-related environment variables available in Jenkins, 
> you can refer to [https://ci.eclipse.org/webtools/env-vars.html/].
> With [~zabetak] we have noticed some spurious files in the Sonar analysis for 
> PRs; as per this Sonar support thread, it might be linked to the stale 
> analysis of the target branch (master for us): 
> [https://community.sonarsource.com/t/unrelated-files-scanned-in-sonarcloud-pr-check/47138/14]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26619) Sonar analysis not run on the master branch

2022-10-11 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26619 started by Alessandro Solimando.
---
> Sonar analysis not run on the master branch
> ---
>
> Key: HIVE-26619
> URL: https://issues.apache.org/jira/browse/HIVE-26619
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The analysis for the master branch was using the wrong variable name 
> (_CHANGE_BRANCH_) instead of the branch name (_BRANCH_NAME_).
> For an overview of git-related environment variables available in Jenkins, 
> you can refer to [https://ci.eclipse.org/webtools/env-vars.html/].
> With [~zabetak] we have noticed some spurious files in Sonar analysis for 
> PRs, as per this sonar support thread it might be linked to the stale 
> analysis of the target branch (master for us): 
> [https://community.sonarsource.com/t/unrelated-files-scanned-in-sonarcloud-pr-check/47138/14]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26619) Sonar analysis not run on the master branch

2022-10-11 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26619:
---


> Sonar analysis not run on the master branch
> ---
>
> Key: HIVE-26619
> URL: https://issues.apache.org/jira/browse/HIVE-26619
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The analysis for the master branch was using the wrong variable name 
> (_CHANGE_BRANCH_) instead of the branch name (_BRANCH_NAME_).
> For an overview of git-related environment variables available in Jenkins, 
> you can refer to [https://ci.eclipse.org/webtools/env-vars.html/].
> With [~zabetak] we have noticed some spurious files in Sonar analysis for 
> PRs, as per this sonar support thread it might be linked to the stale 
> analysis of the target branch (master for us): 
> [https://community.sonarsource.com/t/unrelated-files-scanned-in-sonarcloud-pr-check/47138/14]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26572) Support constant expressions in vectorization

2022-10-05 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26572:

Labels: pull-request-available  (was: )

> Support constant expressions in vectorization
> -
>
> Key: HIVE-26572
> URL: https://issues.apache.org/jira/browse/HIVE-26572
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>
> At the moment, we cannot vectorize aggregate expression having constant 
> parameters in addition to the aggregation column (it's forbidden 
> [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).
> One compelling example of how this could help is [PR 
> 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
> _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
> _compute_bit_vector_fm_ when the HLL implementation was added, whereas a single 
> _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.
> Another example is _VectorUDAFBloomFilterMerge_, receiving an extra constant 
> parameter controlling the number of threads for merging tasks. At the moment 
> this parameter is "injected" when trying to find an appropriate constructor 
> (see 
> [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]).
> This ad-hoc approach is not scalable and would make the code hard to read and 
> maintain if more UDAFs require constant parameters.
> In addition, we are probably missing vectorization opportunities if no such 
> ad-hoc treatment is added but an appropriate UDAF constructor is available or 
> could be easily added (data sketches UDAFs, although not yet vectorized, are a 
> good target).
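
To make the shape of the change concrete, a minimal SQL sketch (table and column names are hypothetical; the single-UDAF call in the second query is the style this ticket argues for, not syntax that is guaranteed to exist today):
{code:sql}
-- Today: one UDAF per algorithm, so no constant parameter is needed
SELECT compute_bit_vector_hll(user_id) FROM events;

-- With HIVE-26572: a single UDAF taking the algorithm as a constant argument;
-- it is exactly this extra constant argument that currently prevents the
-- aggregation from being vectorized
SELECT compute_bit_vector(user_id, 'HLL') FROM events;
{code}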



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26572) Support constant expressions in vectorization

2022-10-05 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26572:

Description: 
At the moment, we cannot vectorize aggregate expressions having constant 
parameters in addition to the aggregation column (it's forbidden 
[here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).

One compelling example of how this could help is [PR 
1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
_compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
_compute_bit_vector_fm_ when the HLL implementation was added, whereas a single 
_compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.

Another example is _VectorUDAFBloomFilterMerge_, receiving an extra constant 
parameter controlling the number of threads for merging tasks. At the moment 
this parameter is "injected" when trying to find an appropriate constructor 
(see 
[VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]).

This ad-hoc approach is not scalable and would make the code hard to read and 
maintain if more UDAFs require constant parameters.

In addition, we are probably missing vectorization opportunities if no such 
ad-hoc treatment is added but an appropriate UDAF constructor is available or 
could be easily added (data sketches UDAFs, although not yet vectorized, are a 
good target).

  was:
At the moment, we cannot vectorize aggregate expressions having constant 
parameters in addition to the aggregation column (it's forbidden 
[here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).

One compelling example of how this could help is [PR 
1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
_compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
_compute_bit_vector_fm_ when the HLL implementation was added, whereas a single 
_compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.


> Support constant expressions in vectorization
> -
>
> Key: HIVE-26572
> URL: https://issues.apache.org/jira/browse/HIVE-26572
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> At the moment, we cannot vectorize aggregate expressions having constant 
> parameters in addition to the aggregation column (it's forbidden 
> [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).
> One compelling example of how this could help is [PR 
> 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
> _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
> _compute_bit_vector_fm_ when the HLL implementation was added, whereas a single 
> _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.
> Another example is _VectorUDAFBloomFilterMerge_, receiving an extra constant 
> parameter controlling the number of threads for merging tasks. At the moment 
> this parameter is "injected" when trying to find an appropriate constructor 
> (see 
> [VectorGroupByOperator.java#L1224-L1244|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1224-L1244]).
> This ad-hoc approach is not scalable and would make the code hard to read and 
> maintain if more UDAFs require constant parameters.
> In addition, we are probably missing vectorization opportunities if no such 
> ad-hoc treatment is added but an appropriate UDAF constructor is available or 
> could be easily added (data sketches UDAFs, although not yet vectorized, are a 
> good target).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26572) Support constant expressions in vectorization

2022-10-01 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26572:
---

Assignee: Alessandro Solimando

> Support constant expressions in vectorization
> -
>
> Key: HIVE-26572
> URL: https://issues.apache.org/jira/browse/HIVE-26572
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> At the moment, we cannot vectorize aggregate expressions having constant 
> parameters in addition to the aggregation column (it's forbidden 
> [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).
> One compelling example of how this could help is [PR 
> 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
> _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
> _compute_bit_vector_fm_ when the HLL implementation was added, whereas a single 
> _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26572) Support constant expressions in vectorization

2022-10-01 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26572 started by Alessandro Solimando.
---
> Support constant expressions in vectorization
> -
>
> Key: HIVE-26572
> URL: https://issues.apache.org/jira/browse/HIVE-26572
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> At the moment, we cannot vectorize aggregate expressions having constant 
> parameters in addition to the aggregation column (it's forbidden 
> [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).
> One compelling example of how this could help is [PR 
> 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
> _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
> _compute_bit_vector_fm_ when the HLL implementation was added, whereas a single 
> _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26572) Support constant expressions in vectorization

2022-09-29 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26572:

Summary: Support constant expressions in vectorization  (was: Support 
constant expressions in vectorized expressions)

> Support constant expressions in vectorization
> -
>
> Key: HIVE-26572
> URL: https://issues.apache.org/jira/browse/HIVE-26572
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Priority: Major
>
> At the moment, we cannot vectorize aggregate expressions having constant 
> parameters in addition to the aggregation column (it's forbidden 
> [here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).
> One compelling example of how this could help is [PR 
> 1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
> _compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
> _compute_bit_vector_fm_ when the HLL implementation was added, whereas a single 
> _compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26221) Add histogram-based column statistics

2022-09-28 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26221 started by Alessandro Solimando.
---
> Add histogram-based column statistics
> -
>
> Key: HIVE-26221
> URL: https://issues.apache.org/jira/browse/HIVE-26221
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Metastore, Statistics
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hive does not support histogram statistics, which are particularly useful for 
> skewed data (very common in practice) and for range predicates.
> Hive's current selectivity estimation for range predicates is based on a 
> hard-coded value of 1/3 (see 
> [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]).
> The current proposal aims at integrating histograms as an additional column 
> statistic, stored in the Hive metastore at the table (or partition) level.
> The main requirements for histogram integration are the following:
>  * efficiency: the approach must scale and support billions of rows
>  * merge-ability: partition-level histograms have to be merged to form 
> table-level histograms
>  * explicit and configurable trade-off between memory footprint and accuracy
> Hive already integrates [KLL data 
> sketches|https://datasketches.apache.org/docs/KLL/KLLSketch.html] UDAF. 
> Datasketches are small, stateful programs that process massive data-streams 
> and can provide approximate answers, with mathematical guarantees, to 
> computationally difficult queries orders-of-magnitude faster than 
> traditional, exact methods.
> We propose to use KLL, and more specifically the cumulative distribution 
> function (CDF), as the underlying data structure for our histogram statistics.
> The current proposal targets numeric data types (float, integer and numeric 
> families) and temporal data types (date and timestamp).
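
As a rough sketch of how the new statistic would surface (assuming it piggybacks on the existing column-statistics flow; table and column names are hypothetical and the exact syntax/output may differ):
{code:sql}
-- Collect column statistics as usual; with the proposal, a KLL-based
-- histogram would be computed and stored alongside the existing stats
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;

-- Column statistics (and, with the proposal, the histogram) can be inspected per column
DESCRIBE FORMATTED sales price;
{code}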



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26548) Hive Data load without closing the session

2022-09-20 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando resolved HIVE-26548.
-
Resolution: Invalid

> Hive Data load without closing the session
> --
>
> Key: HIVE-26548
> URL: https://issues.apache.org/jira/browse/HIVE-26548
> Project: Hive
>  Issue Type: New Feature
> Environment: Test
>Reporter: Ashok kumar
>Priority: Major
>
> Hi, I am new to Hive and I want to understand the best way to load the 
> data described below.
>  
> I am receiving data for 50 countries, with a separate db for each country. Each db 
> has 250 tables, and the dbs become available on different dates with a suffix. My 
> reporting team needs the data consolidated into a single db for their analysis. So I have 
> implemented a loop in a shell script to load data from each table and 
> insert it into the target table. With this approach Hive creates and closes a 
> session for each table, hence it is taking days to complete the 
> process. Can anyone suggest the best way to implement this using shell and Hive?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26548) Hive Data load without closing the session

2022-09-20 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607040#comment-17607040
 ] 

Alessandro Solimando commented on HIVE-26548:
-

Jira tickets are for discussing bugs/features etc., not for user support questions. You 
can try reaching out to the Hive user mailing list instead by sending an email 
to "u...@hive.apache.org" (you need to register first in order to see the 
replies).

> Hive Data load without closing the session
> --
>
> Key: HIVE-26548
> URL: https://issues.apache.org/jira/browse/HIVE-26548
> Project: Hive
>  Issue Type: New Feature
> Environment: Test
>Reporter: Ashok kumar
>Priority: Major
>
> Hi, I am new to Hive and I want to understand the best way to load the 
> data described below.
>  
> I am receiving data for 50 countries, with a separate db for each country. Each db 
> has 250 tables, and the dbs become available on different dates with a suffix. My 
> reporting team needs the data consolidated into a single db for their analysis. So I have 
> implemented a loop in a shell script to load data from each table and 
> insert it into the target table. With this approach Hive creates and closes a 
> session for each table, hence it is taking days to complete the 
> process. Can anyone suggest the best way to implement this using shell and Hive?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-25848) Empty result for structs in point lookup optimization with vectorization on

2022-09-13 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603546#comment-17603546
 ] 

Alessandro Solimando edited comment on HIVE-25848 at 9/13/22 12:25 PM:
---

I took a quick look at the transformation that _HivePointLookup_ is doing and 
there is nothing wrong there; I think you are right, [~ghanko], that the problem 
lies in the vectorized handling of _IN_ clauses involving _struct_ (turning 
off CBO or that specific rule simply prevents such clauses from appearing, 
but they are not the root cause of the issue).
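
For context, a hand-written sketch of the kind of rewrite the point-lookup rule performs on the repro predicate (constants inlined for readability; the real query uses year()/month() expressions):
{code:sql}
-- Before the rewrite: an OR over conjunctions of equality predicates
SELECT * FROM test WHERE (y = '2022' AND m = '1') OR (y = '2021' AND m = '12');

-- After the rewrite: a single IN over struct values; it is this struct-based
-- IN that the vectorized execution path then has to evaluate correctly
SELECT * FROM test WHERE struct(y, m) IN (struct('2022', '1'), struct('2021', '12'));
{code}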


was (Author: asolimando):
I took a quick look at the transformation that _HivePointLookup_ is doing and 
there is nothing wrong there; I think you are right, [~ghanko], that the problem 
lies in the vectorized handling of _IN_ clauses involving _struct._

> Empty result for structs in point lookup optimization with vectorization on
> ---
>
> Key: HIVE-25848
> URL: https://issues.apache.org/jira/browse/HIVE-25848
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Repro steps:
> {code:java}
> set hive.fetch.task.conversion=none;
> create table test (a string) partitioned by (y string, m string);
> insert into test values ('aa', 2022, 1);
> select * from test where (y=year(date_sub(current_date,4)) and 
> m=month(date_sub(current_date,4))) or (y=year(date_sub(current_date,10)) and 
> m=month(date_sub(current_date,10)) );
> --gives empty result{code}
> Turning any one of the features below off yields the correct result (1 row 
> expected):
> {code:java}
> set hive.optimize.point.lookup=false;
> set hive.cbo.enable=false;
> set hive.vectorized.execution.enabled=false;
> {code}
> Expected good result is:
> {code}
> +---------+---------+---------+
> | test.a  | test.y  | test.m  |
> +---------+---------+---------+
> | aa      | 2022    | 1       |
> +---------+---------+---------+ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-25848) Empty result for structs in point lookup optimization with vectorization on

2022-09-13 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603546#comment-17603546
 ] 

Alessandro Solimando commented on HIVE-25848:
-

I took a quick look at the transformation that _HivePointLookup_ is doing and 
there is nothing wrong there; I think you are right, [~ghanko], that the problem 
lies in the vectorized handling of _IN_ clauses involving _struct._

> Empty result for structs in point lookup optimization with vectorization on
> ---
>
> Key: HIVE-25848
> URL: https://issues.apache.org/jira/browse/HIVE-25848
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Repro steps:
> {code:java}
> set hive.fetch.task.conversion=none;
> create table test (a string) partitioned by (y string, m string);
> insert into test values ('aa', 2022, 1);
> select * from test where (y=year(date_sub(current_date,4)) and 
> m=month(date_sub(current_date,4))) or (y=year(date_sub(current_date,10)) and 
> m=month(date_sub(current_date,10)) );
> --gives empty result{code}
> Turning any one of the features below off yields the correct result (1 row 
> expected):
> {code:java}
> set hive.optimize.point.lookup=false;
> set hive.cbo.enable=false;
> set hive.vectorized.execution.enabled=false;
> {code}
> Expected good result is:
> {code}
> +---------+---------+---------+
> | test.a  | test.y  | test.m  |
> +---------+---------+---------+
> | aa      | 2022    | 1       |
> +---------+---------+---------+ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26400) Provide docker images for Hive

2022-08-17 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580687#comment-17580687
 ] 

Alessandro Solimando commented on HIVE-26400:
-

[~dengzh], thanks for tackling this issue; improving the developer experience 
in Hive is very much needed.

I also had problems with hive-dev-box at the beginning; as [~zabetak] 
said, it's very rich in features, but its documentation could be improved and/or 
updated.

My feeling is that there is too much overlap to just start from scratch once 
again (it would be the third project in this space, as already mentioned).

Let's also keep in mind that hive-dev-box is used to run tests in CI. I feel 
that integrating it into this repository and improving it would be the 
best investment for the community.

In the process we could add or remove features as we see fit, but most 
importantly we must improve the documentation so that any newcomer can set it 
up easily without having to ask for help, as is the case now.

WDYT?

> Provide docker images for Hive
> --
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Make Apache Hive able to run inside a docker container in pseudo-distributed 
> mode, with MySQL/Derby as its backing database, and provide the following:
>  * Quick-start/Debugging/Prepare a test env for Hive;
>  * Tools to build target image with specified version of Hive and its 
> dependencies;
>  * Images can be used as the basis for the Kubernetes operator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs

2022-08-11 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578335#comment-17578335
 ] 

Alessandro Solimando commented on HIVE-26196:
-

It's working for me, so you must indeed be missing permissions. I am not an 
admin on Jenkins though, so I can't help with that.

> Integrate Sonar analysis for the master branch and PRs
> --
>
> Key: HIVE-26196
> URL: https://issues.apache.org/jira/browse/HIVE-26196
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The aim of the ticket is to integrate SonarCloud analysis for the master 
> branch and PRs.
> The ticket does not cover test coverage at the moment (it can be added in 
> follow-up tickets, if there is enough interest).
> From preliminary tests, the analysis step requires 30 additional minutes for 
> the pipeline, but this step is run in parallel with the test run, so the 
> total end-to-end run-time is not affected.
> The idea for this first integration is to track code quality metrics over new 
> commits in the master branch and for PRs, without any quality gate rules 
> (i.e., the analysis will never fail, independently of the values of the 
> quality metrics).
> An example of analysis is available in the ASF Sonar account for Hive: [PR 
> analysis|https://sonarcloud.io/summary/new_code?id=apache_hive=3254]
> After integrating the changes, PRs will also be decorated with a link to the 
> analysis to be able to better evaluate any pain points of the contribution at 
> an earlier stage, making the life of the reviewers a bit easier.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs

2022-08-10 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577820#comment-17577820
 ] 

Alessandro Solimando commented on HIVE-26196:
-

Sure, on the Jenkins side the token is handled in the credentials section: 
[http://ci.hive.apache.org/credentials/] 

On the SonarCloud side the token can be generated here: 
[https://sonarcloud.io/account/security]

> Integrate Sonar analysis for the master branch and PRs
> --
>
> Key: HIVE-26196
> URL: https://issues.apache.org/jira/browse/HIVE-26196
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The aim of the ticket is to integrate SonarCloud analysis for the master 
> branch and PRs.
> The ticket does not cover test coverage at the moment (it can be added in 
> follow-up tickets, if there is enough interest).
> From preliminary tests, the analysis step requires 30 additional minutes for 
> the pipeline, but this step is run in parallel with the test run, so the 
> total end-to-end run-time is not affected.
> The idea for this first integration is to track code quality metrics over new 
> commits in the master branch and for PRs, without any quality gate rules 
> (i.e., the analysis will never fail, independently of the values of the 
> quality metrics).
> An example of analysis is available in the ASF Sonar account for Hive: [PR 
> analysis|https://sonarcloud.io/summary/new_code?id=apache_hive=3254]
> After integrating the changes, PRs will also be decorated with a link to the 
> analysis to be able to better evaluate any pain points of the contribution at 
> an earlier stage, making the life of the reviewers a bit easier.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs

2022-08-09 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577476#comment-17577476
 ] 

Alessandro Solimando edited comment on HIVE-26196 at 8/9/22 4:00 PM:
-

I think you mean something like this, [~zabetak]: 
[https://docs.sonarqube.org/latest/analysis/scan/sonarscanner-for-maven/]? 
Basically, there is some information that needs to be added at the pom level (the maven 
sonar plugin and, optionally, some parameters which can also be passed to mvn via the 
command line); the rest goes into the mvn command invocation as described in the 
guide.

As for jacoco, the story is a bit more complicated: for projects 
with many submodules like Hive it's not straightforward to set it up, so I will 
cover that in a follow-up ticket.

EDIT: some more info on the SonarCloud part:
 * [https://sonarcloud.io/project/roles?id=apache_hive] <-- user permissions 
can be set here (it will be important to add at least all active committers 
here so they can mark false positives during reviews)
 * [https://sonarcloud.io/project/quality_gate?id=apache_hive] <-- the quality gate 
can be chosen here among the existing quality gates; if a new one is needed, 
you need to contact infra (see INFRA-23557)
 * I had to add a Sonar token to the Hive CI

This is all it took to get it up and running as it is now.


was (Author: asolimando):
I think you mean something like this, [~zabetak]: 
[https://docs.sonarqube.org/latest/analysis/scan/sonarscanner-for-maven/]? 
Basically, there is some information that needs to be added at the pom level (the maven 
sonar plugin and, optionally, some parameters which can also be passed to mvn via the 
command line); the rest goes into the mvn command invocation as described in the 
guide.

As for jacoco, the story is a bit more complicated: for projects 
with many submodules like Hive it's not straightforward to set it up, so I will 
cover that in a follow-up ticket.

> Integrate Sonar analysis for the master branch and PRs
> --
>
> Key: HIVE-26196
> URL: https://issues.apache.org/jira/browse/HIVE-26196
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The aim of the ticket is to integrate SonarCloud analysis for the master 
> branch and PRs.
> The ticket does not cover test coverage at the moment (it can be added in 
> follow-up tickets, if there is enough interest).
> From preliminary tests, the analysis step requires 30 additional minutes for 
> the pipeline, but this step is run in parallel with the test run, so the 
> total end-to-end run-time is not affected.
> The idea for this first integration is to track code quality metrics over new 
> commits in the master branch and for PRs, without any quality gate rules 
> (i.e., the analysis will never fail, independently of the values of the 
> quality metrics).
> An example of analysis is available in the ASF Sonar account for Hive: [PR 
> analysis|https://sonarcloud.io/summary/new_code?id=apache_hive=3254]
> After integrating the changes, PRs will also be decorated with a link to the 
> analysis to be able to better evaluate any pain points of the contribution at 
> an earlier stage, making the life of the reviewers a bit easier.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs

2022-08-09 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577476#comment-17577476
 ] 

Alessandro Solimando commented on HIVE-26196:
-

I think you mean something like this, [~zabetak]: 
[https://docs.sonarqube.org/latest/analysis/scan/sonarscanner-for-maven/]? 
Basically, there is some information that needs to be added at the pom level (the maven 
sonar plugin and, optionally, some parameters which can also be passed to mvn via the 
command line); the rest goes into the mvn command invocation as described in the 
guide.

As for jacoco, the story is a bit more complicated: for projects 
with many submodules like Hive it's not straightforward to set it up, so I will 
cover that in a follow-up ticket.

> Integrate Sonar analysis for the master branch and PRs
> --
>
> Key: HIVE-26196
> URL: https://issues.apache.org/jira/browse/HIVE-26196
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The aim of the ticket is to integrate SonarCloud analysis for the master 
> branch and PRs.
> The ticket does not cover test coverage at the moment (it can be added in 
> follow-up tickets, if there is enough interest).
> From preliminary tests, the analysis step requires 30 additional minutes for 
> the pipeline, but this step is run in parallel with the test run, so the 
> total end-to-end run-time is not affected.
> The idea for this first integration is to track code quality metrics over new 
> commits in the master branch and for PRs, without any quality gate rules 
> (i.e., the analysis will never fail, independently of the values of the 
> quality metrics).
> An example of analysis is available in the ASF Sonar account for Hive: [PR 
> analysis|https://sonarcloud.io/summary/new_code?id=apache_hive=3254]
> After integrating the changes, PRs will also be decorated with a link to the 
> analysis to be able to better evaluate any pain points of the contribution at 
> an earlier stage, making the life of the reviewers a bit easier.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-25909) Add test for 'hive.default.nulls.last' property for windows with ordering

2022-07-15 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567331#comment-17567331
 ] 

Alessandro Solimando commented on HIVE-25909:
-

For the record, I could not find what the SQL standard dictates for the default 
NULL ordering of ORDER BY in windowing functions, so I have compared against 
the major RDBMSs and found that Hive is aligned with all of them (apart 
from MySQL); my findings are as follows:

{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username, 
action ORDER BY action DESC, amount DESC)
FROM event;

Oracle 11g R2:
john    buy     (null)  1
john    buy     39      2
john    buy     25      3
john    sell    20      1
john    sell    3       2

MySQL 8.0:
john    buy     39      1
john    buy     25      2
john    buy     null    3
john    sell    20      1
john    sell    3       2

Postgres 13:
john    sell    20      1
john    sell    3       2
john    buy     null    1
john    buy     39      2
john    buy     25      3

Hive:
john    sell    20      1
john    sell    3       2
john    buy     NULL    1
john    buy     39      2
john    buy     25      3
{noformat}

{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username, 
action ORDER BY action DESC, amount DESC NULLS LAST)
FROM event;

Oracle 11g R2:
john    buy     39      1
john    buy     25      2
john    buy     (null)  3
john    sell    20      1
john    sell    3       2

MySQL 8.0: it does not support "NULLS LAST" syntax

Postgres 13:
john    sell    20      1
john    sell    3       2
john    buy     39      1
john    buy     25      2
john    buy     null    3

Hive:
john    sell    20      1
john    sell    3       2
john    buy     39      1
john    buy     25      2
john    buy     NULL    3
{noformat}

{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username, 
action ORDER BY action DESC, amount DESC NULLS FIRST)
FROM event;

Oracle 11g R2:
john    buy     (null)  1
john    buy     39      2
john    buy     25      3
john    sell    20      1
john    sell    3       2

MySQL 8.0: it does not support "NULLS FIRST" syntax

Postgres 13:
john    sell    20      1
john    sell    3       2
john    buy     null    1
john    buy     39      2
john    buy     25      3

Hive:
john    sell    20      1
john    sell    3       2
john    buy     NULL    1
john    buy     39      2
john    buy     25      3
{noformat}


> Add test for 'hive.default.nulls.last' property for windows with ordering
> -
>
> Key: HIVE-25909
> URL: https://issues.apache.org/jira/browse/HIVE-25909
> Project: Hive
>  Issue Type: Test
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0, 4.0.0-alpha-1
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add a test around "hive.default.nulls.last" configuration property and its 
> interaction with order by clauses within windows.
> The property is known to behave as follows:
>  
> ||hive.default.nulls.last||ASC||DESC||
> |true|NULL LAST|NULL FIRST|
> |false|NULL FIRST|NULL LAST|
>  
> The test can be based along the lines of the following examples:
> {noformat}
> -- hive.default.nulls.last is true by default, it sets NULLS_FIRST for DESC
> set hive.default.nulls.last;
> OUT:
> hive.default.nulls.last=true
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
> FROM test1;
> OUT:
> John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
> John Doe        1987-05-10 00:00:00     NULL    1
> John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     2
> John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     3
> John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     4
> John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     5
> -- we set hive.default.nulls.last=false, it sets NULLS_LAST for DESC
> set hive.default.nulls.last=false;
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
> FROM test1;
> OUT:
> John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1990-05-10 00:00:00     2021-09-10 

[jira] [Commented] (HIVE-26383) OOM during join query

2022-07-11 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564956#comment-17564956
 ] 

Alessandro Solimando commented on HIVE-26383:
-

[~pkumarsinha], does it reproduce if you trim the table/query further?

> OOM during join query
> -
>
> Key: HIVE-26383
> URL: https://issues.apache.org/jira/browse/HIVE-26383
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Priority: Major
>
> {code:java}
> [ERROR] 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[innerjoin_cal_with_insert]
>   Time elapsed: 100.73 s  <<< ERROR!
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at java.util.HashMap.newTreeNode(HashMap.java:1784)
>   at java.util.HashMap$TreeNode.putTreeVal(HashMap.java:2029)
>   at java.util.HashMap.putVal(HashMap.java:639)
>   at java.util.HashMap.put(HashMap.java:613)
>   at java.util.HashSet.add(HashSet.java:220)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.EstimateUniqueKeys.getUniqueKeys(EstimateUniqueKeys.java:229)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.EstimateUniqueKeys.getUniqueKeys(EstimateUniqueKeys.java:304)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.isKey(HiveRelMdRowCount.java:501)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.analyzeJoinForPKFK(HiveRelMdRowCount.java:302)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:102)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
>   at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:212)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1882)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1756)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.addToTop(LoptOptimizeJoinRule.java:1233)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.addFactorToTree(LoptOptimizeJoinRule.java:927)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createOrdering(LoptOptimizeJoinRule.java:728)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.findBestOrderings(LoptOptimizeJoinRule.java:459)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.onMatch(LoptOptimizeJoinRule.java:128)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2468)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2427)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyJoinOrderingTransform(CalcitePlanner.java:2193)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1750)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1605)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-20628) Parsing error when using a complex map data type under dynamic column masking

2022-07-07 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando resolved HIVE-20628.
-
Resolution: Invalid

This is neither a bug nor a regression, since masking over complex data types has never 
been supported by Hive.

> Parsing error when using a complex map data type under dynamic column masking
> -
>
> Key: HIVE-20628
> URL: https://issues.apache.org/jira/browse/HIVE-20628
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, Parser, Security
>Affects Versions: 2.1.0
> Environment: The error can be simulated using HDP 2.6.4 sandbox
>Reporter: Darryl Dutton
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When trying to use the map complex data type as part of a dynamic column mask, 
> Hive throws a parsing error as it is expecting a primitive type (see the trace 
> pasted below). The use case is applying masking to elements within a 
> map type via a custom Hive UDF (to apply the mask) using Ranger. 
> The expectation is for Hive to support complex data types for masking in addition to the 
> primitive types. The exception occurs when Hive needs to evaluate the UDF or 
> apply a standard mask (pass-through works as expected). You can recreate the 
> problem by creating a simple table with a map data type column, then applying 
> the masking to that column through a Ranger resource-based policy and a 
> custom function (you can use the standard Hive UDF str_to_map('F4','') to 
> simulate returning a map). 
> CREATE  TABLE `mask_test`(
>  `key` string, 
>  `value` map<string,string>)
> STORED AS INPUTFORMAT 
>  'org.apache.hadoop.mapred.TextInputFormat'
>  
> INSERT INTO TABLE mask_test
> SELECT 'AAA' as key, 
> map('F1','2022','F2','','F3','333') as value
> FROM (select 1 ) as temp;
>  
>  
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException:
>  line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type 
> specification
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
>  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
>  at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
>  ... 15 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize 
> input near 'map' '<' 'string' in primitive type specification
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)
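
For illustration, a hand-written sketch of why the parse fails (this assumes the masking rewrite wraps the mask expression in a CAST back to the column's type, and uses a hypothetical mask UDF name; it is not the literal rewritten query):
{code:sql}
-- With a primitive column the CAST parses fine, but a complex type such as
-- map<string,string> does not match the "primitive type specification"
-- the grammar expects at this position, hence the ParseException above:
SELECT `key`, CAST(my_mask_udf(`value`) AS map<string,string>) AS `value`
FROM mask_test;
{code}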



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26378) Improve error message for masking over complex data types

2022-07-07 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26378:

Description: 
The current error message when applying column masking over (unsupported) complex data 
types could be improved and made more explicit.

Currently, the thrown error is as follows:
{noformat}
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException:
 line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type 
specification
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
... 15 more
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize input 
near 'map' '<' 'string' in primitive type specification
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)
{noformat}


> Improve error message for masking over complex data types
> -
>
> Key: HIVE-26378
> URL: https://issues.apache.org/jira/browse/HIVE-26378
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Security
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The current error message when applying column masking over (unsupported) complex 
> data types could be improved and made more explicit.
> Currently, the thrown error is as follows:
> {noformat}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException:
>  line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type 
> specification
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
> ... 15 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize 
> input near 'map' '<' 'string' in primitive type specification
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

