[jira] [Created] (HIVE-27159) Filters are not pushed down for decimal format in Parquet
Rajesh Balamohan created HIVE-27159:
---------------------------------------
             Summary: Filters are not pushed down for decimal format in Parquet
                 Key: HIVE-27159
                 URL: https://issues.apache.org/jira/browse/HIVE-27159
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan

Decimal filters are not created and pushed down in Parquet readers. As a result, an exception is thrown at runtime, far more rows are processed than necessary, and query latency increases. E.g. Q13:

{noformat}
Parquet: (Map 1)
INFO  : Task Execution Summary
INFO  : -----------------------------------------------------------------------------------------
INFO  :   VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
INFO  : -----------------------------------------------------------------------------------------
INFO  :      Map 1       31254.00             0            0    549,181,950             133
INFO  :      Map 3           0.00             0            0         73,049             365
INFO  :      Map 4        2027.00             0            0      6,000,000       1,689,919
INFO  :      Map 5           0.00             0            0          7,200           1,440
INFO  :      Map 6         517.00             0            0      1,920,800         493,920
INFO  :      Map 7           0.00             0            0          1,002           1,002
INFO  :  Reducer 2       18716.00             0            0            133               0
INFO  : -----------------------------------------------------------------------------------------

ORC:
INFO  : Task Execution Summary
INFO  : -----------------------------------------------------------------------------------------
INFO  :   VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
INFO  : -----------------------------------------------------------------------------------------
INFO  :      Map 1        6556.00             0            0    267,146,063             152
INFO  :      Map 3           0.00             0            0         10,000             365
INFO  :      Map 4        2014.00             0            0      6,000,000       1,689,919
INFO  :      Map 5           0.00             0            0          7,200           1,440
INFO  :      Map 6         504.00             0            0      1,920,800         493,920
INFO  :  Reducer 2        3159.00             0            0            152               0
INFO  : -----------------------------------------------------------------------------------------
{noformat}

{noformat}
Map 1
    Map Operator Tree:
        TableScan
          alias: store_sales
          filterExpr: (ss_hdemo_sk is not null and ss_addr_sk is not null and ss_cdemo_sk is not null and ss_store_sk is not null and ((ss_sales_price >= 100) or (ss_sales_price <= 150) or (ss_sales_price >= 50) or (ss_sales_price <= 100) or (ss_sales_price >= 150) or (ss_sales_price <= 200)) and ((ss_net_profit >= 100) or (ss_net_profit <= 200) or (ss_net_profit >= 150) or (ss_net_profit <= 300) or (ss_net_profit >= 50) or (ss_net_profit <= 250))) (type: boolean)
          probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_112_container, bigKeyColName:ss_hdemo_sk, smallTablePos:1, keyRatio:5.042575832290721E-6
          Statistics: Num rows: 2750380056 Data size: 1321831086472 Basic stats: COMPLETE Column stats: COMPLETE
          Filter Operator
            predicate: (ss_hdemo_sk is not null and ss_addr_sk is not null and ss_cdemo_sk is not null and ss_store_sk is not null and ((ss_sales_price >= 100) or (ss_sales_price <= 150) or (ss_sales_price >= 50) or (ss_sales_price <= 100) or (ss_sales_price >= 150) or (ss_sales_price <= 200)) and ((ss_net_profit >= 100) or (ss_net_profit <= 200) or (ss_net_profit >= 150) or (ss_net_profit <= 300) or (ss_net_profit >= 50) or (ss_net_profit <= 250))) (type: boolean)
            Statistics: Num rows: 2500252205 Data size: 1201619783884 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              expressions: ss_cdemo_sk (type: bigint), ss_hdemo_sk (type: bigint), ss_addr_sk (type: bigint), ss_store_sk (type: bigint), ss_quantity (type: int), ss_ext_sales_price (type: decimal(7,2)), ss_ext_wholesale_cost (type: decimal(7,2)), ss_sold_date_sk (type: bigint), ss_net_profit BETWEEN 100 AND 200 (type: boolean), ss_net_profit BETWEEN 150 AND 300 (type: boolean), ss_net_profit BETWEEN 50 AND 250 (type: boolean), ss_sales_price BETWEEN 100 AND 150 (type: boolean), ss_sales_price BETWEEN 50 AND 100 (type: boolean), ss_sales_price BETWEEN 150 AND 200 (type: boolean)
{noformat}
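Background on why building these filters takes extra work: Parquet stores a decimal(7,2) column as its unscaled integer value, so a pushdown filter such as {{ss_sales_price >= 100}} has to rescale the literal to the column's scale before any comparison is possible. A minimal, self-contained sketch of that conversion (class and method names here are illustrative, not Hive's actual reader code):

```java
import java.math.BigDecimal;

// Hypothetical sketch: rescale a filter literal to the column's declared scale
// so it can be compared against the unscaled values Parquet stores for DECIMAL
// columns. If a reader cannot build such a literal, no filter predicate is
// created and every row group must be scanned.
public class DecimalFilterLiteral {

    // Unscaled representation of the literal at the given scale,
    // e.g. 100 at scale 2 (decimal(7,2)) becomes 10000.
    static long unscaledAt(BigDecimal literal, int scale) {
        return literal.setScale(scale).unscaledValue().longValueExact();
    }

    public static void main(String[] args) {
        // ss_sales_price >= 100 turns into "unscaled value >= 10000"
        System.out.println(unscaledAt(new BigDecimal("100"), 2));    // 10000
        System.out.println(unscaledAt(new BigDecimal("150.25"), 2)); // 15025
    }
}
```

With the rescaled literal in hand, the reader can evaluate the predicate against row-group statistics and skip row groups entirely, which is what the ORC path above is already doing.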
[jira] [Created] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
Simhadri Govindappa created HIVE-27158:
--------------------------------------
             Summary: Store hive columns stats in puffin files for iceberg tables
                 Key: HIVE-27158
                 URL: https://issues.apache.org/jira/browse/HIVE-27158
             Project: Hive
          Issue Type: Improvement
            Reporter: Simhadri Govindappa
            Assignee: Simhadri Govindappa

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (HIVE-27157) AssertionError when inferring return type for unix_timestamp function
Stamatis Zampetakis created HIVE-27157:
--------------------------------------
             Summary: AssertionError when inferring return type for unix_timestamp function
                 Key: HIVE-27157
                 URL: https://issues.apache.org/jira/browse/HIVE-27157
             Project: Hive
          Issue Type: Bug
          Components: CBO
    Affects Versions: 4.0.0-alpha-2
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis

Any attempt to derive the return data type for the {{unix_timestamp}} function results in the following assertion error:

{noformat}
java.lang.AssertionError: typeName.allowsPrecScale(true, false): BIGINT
	at org.apache.calcite.sql.type.BasicSqlType.checkPrecScale(BasicSqlType.java:65)
	at org.apache.calcite.sql.type.BasicSqlType.<init>(BasicSqlType.java:81)
	at org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:67)
	at org.apache.calcite.sql.fun.SqlAbstractTimeFunction.inferReturnType(SqlAbstractTimeFunction.java:78)
	at org.apache.calcite.rex.RexBuilder.deriveReturnType(RexBuilder.java:278)
{noformat}

due to a faulty implementation of type inference in the respective operators:
* [https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java]
* [https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java]

Although at this stage in master it is not possible to reproduce the problem with an actual SQL query, the buggy implementation must be fixed, since slight changes in the code/CBO rules may lead to code paths that rely on {{SqlOperator.inferReturnType}}. Note that in older versions of Hive it is possible to hit the AssertionError in various ways.
For example in Hive 3.1.3 (and older), the error may come from [HiveRelDecorrelator|https://github.com/apache/hive/blob/4df4d75bf1e16fe0af75aad0b4179c34c07fc975/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L1933] in the presence of sub-queries.
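The assertion fires because {{SqlAbstractTimeFunction.inferReturnType}} constructs the return type *with* a precision, while BIGINT is a basic type that accepts none. The sketch below models the failure and the obvious direction of a fix; it deliberately mimics Calcite's allowsPrecScale check in a self-contained class instead of depending on Calcite, so all names here are illustrative:

```java
// Self-contained model (not Calcite itself) of HIVE-27157: creating a
// precision-bearing BIGINT trips the same kind of check as
// BasicSqlType.checkPrecScale, so return-type inference for these operators
// must produce a plain BIGINT instead.
public class ReturnTypeSketch {

    enum TypeName {
        BIGINT(false), VARCHAR(true);
        final boolean allowsPrecision;
        TypeName(boolean allowsPrecision) { this.allowsPrecision = allowsPrecision; }
    }

    // Mirrors a factory method that rejects a precision on basic types.
    static String createSqlType(TypeName name, int precision) {
        if (!name.allowsPrecision) {
            throw new AssertionError("typeName.allowsPrecScale(true, false): " + name);
        }
        return name + "(" + precision + ")";
    }

    // Faulty inference: asks for BIGINT with a precision -> AssertionError.
    static String inferReturnTypeBuggy() {
        return createSqlType(TypeName.BIGINT, 0);
    }

    // Fixed inference: plain BIGINT, no precision involved.
    static String inferReturnTypeFixed() {
        return TypeName.BIGINT.toString();
    }

    public static void main(String[] args) {
        System.out.println(inferReturnTypeFixed()); // BIGINT
    }
}
```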
[jira] [Created] (HIVE-27156) Wrong results when CAST timestamp literal with timezone to TIMESTAMP
Stamatis Zampetakis created HIVE-27156:
--------------------------------------
             Summary: Wrong results when CAST timestamp literal with timezone to TIMESTAMP
                 Key: HIVE-27156
                 URL: https://issues.apache.org/jira/browse/HIVE-27156
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 4.0.0-alpha-2
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis

Casting a timestamp literal with an invalid timezone to the TIMESTAMP datatype results in a timestamp with the time part truncated to midnight (00:00:00).

*Case I*
{code:sql}
select cast('2020-06-28 22:17:33.123456 Europe/Amsterd' as timestamp);
{code}
+Actual+
|2020-06-28 00:00:00|
+Expected+
|NULL/ERROR/2020-06-28 22:17:33.123456|

*Case II*
{code:sql}
select cast('2020-06-28 22:17:33.123456 Invalid/Zone' as timestamp);
{code}
+Actual+
|2020-06-28 00:00:00|
+Expected+
|NULL/ERROR/2020-06-28 22:17:33.123456|

The existing documentation does not cover what the output should be in the cases above:
* https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-TimestampstimestampTimestamps
* https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types

*Case III*
Another subtle but important case is the following, where the timestamp literal has a valid timezone but we are attempting a cast to a datatype that does not store the timezone.
{code:sql}
select cast('2020-06-28 22:17:33.123456 Europe/Amsterdam' as timestamp);
{code}
+Actual+
|2020-06-28 22:17:33.123456|

The correctness of the last result is debatable, since one would expect a NULL or an ERROR.
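For comparison, {{java.time}} rejects an unknown region id outright instead of silently falling back to midnight, which is the kind of behaviour the +Expected+ rows above suggest. A small self-contained sketch, assuming a hypothetical validity-check helper:

```java
import java.time.DateTimeException;
import java.time.ZoneId;

// Hypothetical sketch: distinguish a real timezone id from a bogus one the way
// java.time does, rather than truncating the time part to 00:00:00.
public class InvalidZoneCheck {

    // True iff the id names a timezone known to the JDK's tz database.
    static boolean isValidZone(String id) {
        try {
            ZoneId.of(id);
            return true;
        } catch (DateTimeException e) { // ZoneRulesException for unknown regions
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidZone("Europe/Amsterdam")); // true
        System.out.println(isValidZone("Europe/Amsterd"));   // false: unknown region
        System.out.println(isValidZone("Invalid/Zone"));     // false
    }
}
```

Under this behaviour, Cases I and II would surface an error (or NULL) at cast time instead of producing 2020-06-28 00:00:00.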
[jira] [Created] (HIVE-27155) Iceberg: Vectorize virtual columns
Denys Kuzmenko created HIVE-27155:
---------------------------------
             Summary: Iceberg: Vectorize virtual columns
                 Key: HIVE-27155
                 URL: https://issues.apache.org/jira/browse/HIVE-27155
             Project: Hive
          Issue Type: Task
            Reporter: Denys Kuzmenko

Vectorization gets disabled at runtime with the following reason:
{code}
Select expression for SELECT operator: Virtual column PARTITION__SPEC__ID is not supported
{code}