[jira] [Created] (HIVE-25358) Remove reviewer pattern
Jesus Camacho Rodriguez created HIVE-25358: -- Summary: Remove reviewer pattern Key: HIVE-25358 URL: https://issues.apache.org/jira/browse/HIVE-25358 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25105) Support Parquet as MV storage format
Jesus Camacho Rodriguez created HIVE-25105: -- Summary: Support Parquet as MV storage format Key: HIVE-25105 URL: https://issues.apache.org/jira/browse/HIVE-25105 Project: Hive Issue Type: Improvement Components: Materialized views Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Currently, the supported storage formats do not include Parquet: {code} ... HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", "ORC", new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"), ... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24966) RuntimeException in CBO if HMS stats are modified externally
Jesus Camacho Rodriguez created HIVE-24966: -- Summary: RuntimeException in CBO if HMS stats are modified externally Key: HIVE-24966 URL: https://issues.apache.org/jira/browse/HIVE-24966 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez While we want to expose this case so the user can take action, currently we throw a RuntimeException. Rather than failing the query, it may be better to show this information to the user and suggest recomputing stats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24685) Remove HiveSubQRemoveRelBuilder
Jesus Camacho Rodriguez created HIVE-24685: -- Summary: Remove HiveSubQRemoveRelBuilder Key: HIVE-24685 URL: https://issues.apache.org/jira/browse/HIVE-24685 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez The class seems to be a close clone of {{RelBuilder}} created due to some bugs existing in original implementation. Those issues seem to be fixed now and we should be able to get rid of the copy. In the worst case scenario, if we need to keep it for the time being, we could try to make it extend {{RelBuilder}} and override only necessary methods. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24527) Allow triggering materialized view rewriting for external tables
Jesus Camacho Rodriguez created HIVE-24527: -- Summary: Allow triggering materialized view rewriting for external tables Key: HIVE-24527 URL: https://issues.apache.org/jira/browse/HIVE-24527 Project: Hive Issue Type: Sub-task Components: Materialized views Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Although we will not be able to check data staleness, this can be useful for debugging purposes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24453) DirectSQL error when parsing create_time value for database
Jesus Camacho Rodriguez created HIVE-24453: -- Summary: DirectSQL error when parsing create_time value for database Key: HIVE-24453 URL: https://issues.apache.org/jira/browse/HIVE-24453 Project: Hive Issue Type: Bug Components: Metastore Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez HIVE-21077 introduced a {{create_time}} field for the {{DBS}} table in HMS. Although the value for that field is always set after that patch, the value could be null if the database was created before the feature went in. DirectSQL should check for a null value before parsing the integer; otherwise we hit an exception and fall back to the ORM path: {noformat} 2020-11-28 09:06:05,414 WARN org.apache.hadoop.hive.metastore.ObjectStore: [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this is not an error): null at org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420) at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
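The null check described in HIVE-24453 could look roughly like the sketch below. It is not the actual {{MetastoreDirectSqlUtils}} code: the class {{DirectSqlIntSketch}}, the method {{extractSqlIntOrDefault}} and the choice of a caller-supplied default are all hypothetical and only illustrate guarding against a null column value before converting it to an integer.

```java
final class DirectSqlIntSketch {

    /**
     * Hypothetical helper: converts a raw column value to an int, returning
     * defaultValue when the value is null (for example, a DBS row created
     * before the create_time field existed).
     */
    static int extractSqlIntOrDefault(Object field, int defaultValue) {
        if (field == null) {
            // Nothing to parse: use the fallback instead of failing.
            return defaultValue;
        }
        if (field instanceof Number) {
            return ((Number) field).intValue();
        }
        throw new IllegalArgumentException(
            "Expected a numeric value but got " + field.getClass().getName());
    }
}
```

With a guard of this kind, a null {{create_time}} would no longer force the direct SQL path to be abandoned; which default is appropriate is a design decision left open here.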
[jira] [Created] (HIVE-24387) Metastore access through JDBC handler does not use correct database accessor
Jesus Camacho Rodriguez created HIVE-24387: -- Summary: Metastore access through JDBC handler does not use correct database accessor Key: HIVE-24387 URL: https://issues.apache.org/jira/browse/HIVE-24387 Project: Hive Issue Type: Bug Components: JDBC storage handler Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez There are some differences in the SQL syntax generated by the database accessor for each RDBMS. For the metastore, we always end up with the default accessor, which leads to errors, e.g., when a limit query is executed for a Postgres-backed metastore. {code} Error: java.io.IOException: java.io.IOException: org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error while trying to get column names: ERROR: syntax error at or near "{" Position: 200 (state=,code=0) SELECT "TBL_COLUMN_GRANT_ID", "COLUMN_NAME", "CREATE_TIME", "GRANT_OPTION", "GRANTOR", "GRANTOR_TYPE", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", "TBL_COL_PRIV", "TBL_ID", "AUTHORIZER" FROM "TBL_COL_PRIVS" {LIMIT 1} {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24325) Cardinality preserving join optimization may fail when column is a constant
Jesus Camacho Rodriguez created HIVE-24325: -- Summary: Cardinality preserving join optimization may fail when column is a constant Key: HIVE-24325 URL: https://issues.apache.org/jira/browse/HIVE-24325 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez More info to come. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24232) Incorrect translation of rollup expression from Calcite
Jesus Camacho Rodriguez created HIVE-24232: -- Summary: Incorrect translation of rollup expression from Calcite Key: HIVE-24232 URL: https://issues.apache.org/jira/browse/HIVE-24232 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez In Calcite, the columns in the group set are not necessarily in the same order as the rollup. For instance, this is the Calcite representation of a rollup for a given query: {code} HiveAggregate(group=[{1, 6, 7}], groups=[[{1, 6, 7}, {1, 7}, {1}, {}]], agg#0=[sum($12)], agg#1=[count($12)], agg#2=[sum($4)], agg#3=[count($4)], agg#4=[sum($15)], agg#5=[count($15)]) {code} When we generate the Hive plan from the Calcite operator, we incorrectly assume that the two orders match. -- This message was sent by Atlassian Jira (v8.3.4#803005)
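To illustrate the ordering point in HIVE-24232, the sketch below recovers the rollup column order from the {{groups}} list rather than from the sorted group set. It is a standalone illustration using plain Java collections, not Calcite's or Hive's classes; {{RollupOrderSketch}} and {{rollupOrder}} are hypothetical names, and the input is assumed to list the grouping sets from the full set down to the empty set, as in the plan above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class RollupOrderSketch {

    /**
     * Returns the rollup columns, outermost first, given grouping sets ordered
     * from the full set down to the empty set. Each step must drop exactly
     * one column; otherwise the input is not a rollup.
     */
    static List<Integer> rollupOrder(List<Set<Integer>> groups) {
        List<Integer> droppedInOrder = new ArrayList<>();
        for (int i = 0; i + 1 < groups.size(); i++) {
            Set<Integer> current = groups.get(i);
            Set<Integer> next = groups.get(i + 1);
            Set<Integer> dropped = new TreeSet<>(current);
            dropped.removeAll(next);
            if (!current.containsAll(next) || dropped.size() != 1) {
                throw new IllegalArgumentException("Not a rollup: " + groups);
            }
            droppedInOrder.add(dropped.iterator().next());
        }
        // The column dropped first is the innermost rollup column.
        Collections.reverse(droppedInOrder);
        return droppedInOrder;
    }
}
```

For the grouping sets {{[{1, 6, 7}, {1, 7}, {1}, {}]}} this yields the order 1, 7, 6, which differs from the sorted group set {{{1, 6, 7}}}.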
[jira] [Created] (HIVE-24202) Clean up local HS2 HMS cache code (II)
Jesus Camacho Rodriguez created HIVE-24202: -- Summary: Clean up local HS2 HMS cache code (II) Key: HIVE-24202 URL: https://issues.apache.org/jira/browse/HIVE-24202 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Follow-up for HIVE-24183 (split into different JIRAs). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24183) Clean up local HS2 HMS cache code
Jesus Camacho Rodriguez created HIVE-24183: -- Summary: Clean up local HS2 HMS cache code Key: HIVE-24183 URL: https://issues.apache.org/jira/browse/HIVE-24183 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Follow-up for HIVE-24176. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24176) Create query-level cache for HMS requests and extend existing local HS2 HMS cache
Jesus Camacho Rodriguez created HIVE-24176: -- Summary: Create query-level cache for HMS requests and extend existing local HS2 HMS cache Key: HIVE-24176 URL: https://issues.apache.org/jira/browse/HIVE-24176 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez This issue creates a query-level cache for HMS requests. The lifecycle of that cache is associated to the lifecycle of the query. This basically means that each unique request to certain HMS APIs should only be served once from HMS, while follow-up repetitive calls will be retrieved from cache. The initial implementation includes caching for 19 APIs. This issue also extends existing local HS2 HMS cache implementation introduced in HIVE-23949 to support other requests (getTableColumnStatistics, getPartitionsByNames). In fact, implementation relies on some of the logic introduced in that JIRA since there are some commonalities. -- This message was sent by Atlassian Jira (v8.3.4#803005)
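The idea of a cache whose lifecycle is tied to a single query can be sketched as follows. This is a minimal illustration under stated assumptions, not the HIVE-24176 implementation: {{QueryLevelCacheSketch}}, its loader function and the load counter are hypothetical, and the sketch is not thread-safe. One instance would be created per query and discarded when the query finishes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class QueryLevelCacheSketch<K, V> {

    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // stands in for the remote HMS call
    private int loads = 0;

    QueryLevelCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Serves the request from the cache, calling the loader only on a miss. */
    V get(K request) {
        if (!cache.containsKey(request)) {
            cache.put(request, loader.apply(request));
            loads++;
        }
        return cache.get(request);
    }

    /** Number of requests that actually reached the loader. */
    int loadCount() {
        return loads;
    }
}
```

Repeating the same request within one query hits the loader once; a new query starts with an empty cache.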
[jira] [Created] (HIVE-24157) Strict mode to fail on CAST timestamp <-> numeral
Jesus Camacho Rodriguez created HIVE-24157: -- Summary: Strict mode to fail on CAST timestamp <-> numeral Key: HIVE-24157 URL: https://issues.apache.org/jira/browse/HIVE-24157 Project: Hive Issue Type: Improvement Components: SQL Reporter: Jesus Camacho Rodriguez There is some interest in enforcing that CAST numeral <-> timestamp is disallowed to avoid confusion among users, e.g., SQL standard does not allow numeral <-> timestamp casting, timestamp type is timezone agnostic, etc. We should introduce a strict config for timestamp (similar to others before): If the config is true, we shall fail while compiling the query with a meaningful message. To provide similar behavior, Hive has multiple functions that provide clearer semantics for numeral to timestamp conversion (and vice versa): https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses
Jesus Camacho Rodriguez created HIVE-24154: -- Summary: Missing simplification opportunity with IN and EQUALS clauses Key: HIVE-24154 URL: https://issues.apache.org/jira/browse/HIVE-24154 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez For instance, in perf driver CBO query 74, there are several filters that could be simplified further: {code} HiveFilter(condition=[AND(=($1, 1999), IN($1, 1998, 1999))]) {code} This may lead to incorrect estimates and unnecessary execution time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
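The simplification that HIVE-24154 asks for can be sketched on a toy representation. The code below is not Calcite's {{RexSimplify}} nor any Hive rule; {{InEqualsSimplifySketch}} and its string-based output are hypothetical and only show the reasoning: when the equality constant is one of the IN values, the IN clause is redundant, and when it is not, no row can pass the filter.

```java
import java.util.List;

final class InEqualsSimplifySketch {

    /**
     * Simplifies AND(=(column, eqValue), IN(column, inValues...)) as used in a
     * filter condition, returning the result in the plan's textual notation.
     */
    static String simplify(String column, int eqValue, List<Integer> inValues) {
        if (inValues.contains(eqValue)) {
            // The equality already implies membership in the IN list.
            return "=(" + column + ", " + eqValue + ")";
        }
        // The two conditions can never both be true. Under three-valued logic
        // the original evaluates to NULL for a NULL column, but a filter
        // rejects the row either way.
        return "false";
    }
}
```

Applied to the filter above, {{AND(=($1, 1999), IN($1, 1998, 1999))}} reduces to {{=($1, 1999)}}.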
[jira] [Created] (HIVE-24147) Table column names are not extracted correctly in Hive JDBC storage handler
Jesus Camacho Rodriguez created HIVE-24147: -- Summary: Table column names are not extracted correctly in Hive JDBC storage handler Key: HIVE-24147 URL: https://issues.apache.org/jira/browse/HIVE-24147 Project: Hive Issue Type: Bug Components: JDBC storage handler Affects Versions: 4.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez It seems the {{ResultSetMetaData}} extracted from the query to retrieve the table column names contains these columns as fully qualified names instead of possibly using the {{getTableName}} method. This ends up throwing the storage handler off and leading to exceptions, both in the CBO path and the non-CBO path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24144) getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
Jesus Camacho Rodriguez created HIVE-24144: -- Summary: getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value Key: HIVE-24144 URL: https://issues.apache.org/jira/browse/HIVE-24144 Project: Hive Issue Type: Bug Components: JDBC Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez {code} public String getIdentifierQuoteString() throws SQLException { return " "; } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan
Jesus Camacho Rodriguez created HIVE-24143: -- Summary: Include convention in JDBC converter operator in Calcite plan Key: HIVE-24143 URL: https://issues.apache.org/jira/browse/HIVE-24143 Project: Hive Issue Type: Improvement Components: CBO Affects Versions: 4.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Among others, it will be useful to debug the dialect being chosen for query generation. For instance: {code} HiveProject(jdbc_type_conversion_table1.ikey=[$0], jdbc_type_conversion_table1.bkey=[$1], jdbc_type_conversion_table1.fkey=[$2], jdbc_type_conversion_table1.dkey=[$3], jdbc_type_conversion_table1.chkey=[$4], jdbc_type_conversion_table1.dekey=[$5], jdbc_type_conversion_table1.dtkey=[$6], jdbc_type_conversion_table1.tkey=[$7]) HiveProject(ikey=[$0], bkey=[$1], fkey=[$2], dkey=[$3], chkey=[$4], dekey=[$5], dtkey=[$6], tkey=[$7]) ->HiveJdbcConverter(convention=[JDBC.DERBY]) JdbcHiveTableScan(table=[[default, jdbc_type_conversion_table1]], table:alias=[jdbc_type_conversion_table1]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24092) Implement additional JDBC methods required by JDBC storage handler
Jesus Camacho Rodriguez created HIVE-24092: -- Summary: Implement additional JDBC methods required by JDBC storage handler Key: HIVE-24092 URL: https://issues.apache.org/jira/browse/HIVE-24092 Project: Hive Issue Type: Bug Components: JDBC storage handler Reporter: Jesus Camacho Rodriguez Calcite may rely on the following JDBC methods to generate SQL queries for Hive JDBC storage handler, which in the case of Hive itself, return a {{Method not supported}} exception. We should implement such methods: {code} nullsAreSortedAtEnd nullsAreSortedAtStart nullsAreSortedLow nullsAreSortedHigh storesLowerCaseIdentifiers storesLowerCaseQuotedIdentifiers storesMixedCaseIdentifiers storesMixedCaseQuotedIdentifiers storesUpperCaseIdentifiers storesUpperCaseQuotedIdentifiers supportsMixedCaseIdentifiers supportsMixedCaseQuotedIdentifiers {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24074) Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x
Jesus Camacho Rodriguez created HIVE-24074: -- Summary: Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x Key: HIVE-24074 URL: https://issues.apache.org/jira/browse/HIVE-24074 Project: Hive Issue Type: Bug Components: Avro, Parquet Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez The timezone conversion for Parquet and Avro uses new {{java.time.*}} classes, which can lead to incorrect values returned for certain dates in certain timezones if timestamp was computed and converted based on {{java.sql.*}} classes. For instance, the offset used for Singapore timezone in 1900-01-01T00:00:00.000 is UTC+8, while the correct offset for that date should be UTC+6:55:25. Some additional information can be found here: https://stackoverflow.com/a/52152315 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24073) Execution exception in sort-merge semijoin
Jesus Camacho Rodriguez created HIVE-24073: -- Summary: Execution exception in sort-merge semijoin Key: HIVE-24073 URL: https://issues.apache.org/jira/browse/HIVE-24073 Project: Hive Issue Type: Bug Components: Operators Reporter: Jesus Camacho Rodriguez Assignee: mahesh kumar behera Working on HIVE-24001, we trigger an additional SJ conversion that leads to this exception at execution time: {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1] at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1] at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060) ... 
22 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1] at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887) at org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003) at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020) ... 23 more {code} To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in the last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been merged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24041) Extend semijoin conversion rules
Jesus Camacho Rodriguez created HIVE-24041: -- Summary: Extend semijoin conversion rules Key: HIVE-24041 URL: https://issues.apache.org/jira/browse/HIVE-24041 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez This patch fixes a couple of limitations that can be seen in {{cbo_query95.q}}, in particular: - It adds a rule to trigger semijoin conversion when there is an aggregate on top of the join that prunes all columns from the left side, and the aggregate operator is on the left input of the join. - It extends existing semijoin conversion rules to prune the unused columns from its left input, which leads to additional conversion opportunities. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24012) Support for rewriting with materialized views containing grouping sets
Jesus Camacho Rodriguez created HIVE-24012: -- Summary: Support for rewriting with materialized views containing grouping sets Key: HIVE-24012 URL: https://issues.apache.org/jira/browse/HIVE-24012 Project: Hive Issue Type: Sub-task Components: Materialized views Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Rewriting is not triggered for materialized views containing grouping sets. This issue implements an extension from Hive side to trigger additional rewritings for materialized views containing grouping sets. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23973) Use SQL constraints to improve join reordering algorithm (III)
Jesus Camacho Rodriguez created HIVE-23973: -- Summary: Use SQL constraints to improve join reordering algorithm (III) Key: HIVE-23973 URL: https://issues.apache.org/jira/browse/HIVE-23973 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23869) Move alter statements in parser to new file
Jesus Camacho Rodriguez created HIVE-23869: -- Summary: Move alter statements in parser to new file Key: HIVE-23869 URL: https://issues.apache.org/jira/browse/HIVE-23869 Project: Hive Issue Type: Improvement Components: Parser Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez We are hitting the HiveParser 'code too large' problem. HIVE-23857 introduced an ad hoc script to solve this problem. Instead, we can split HiveParser.g into smaller files. For instance, we can group all alter statements into their own .g file. This patch also fixes an ambiguity warning that was thrown related to LIKE ALL/ANY clauses. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23558) Remove compute_stats UDAF
Jesus Camacho Rodriguez created HIVE-23558: -- Summary: Remove compute_stats UDAF Key: HIVE-23558 URL: https://issues.apache.org/jira/browse/HIVE-23558 Project: Hive Issue Type: Improvement Components: Statistics Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez HIVE-23530 replaces its usage completely. This issue is to remove it from Hive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23530) Use SQL functions instead of compute_stats UDAF to compute column statistics
Jesus Camacho Rodriguez created HIVE-23530: -- Summary: Use SQL functions instead of compute_stats UDAF to compute column statistics Key: HIVE-23530 URL: https://issues.apache.org/jira/browse/HIVE-23530 Project: Hive Issue Type: Improvement Components: Statistics Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Currently we compute column statistics by relying on the {{compute_stats}} UDAF. For instance, for a given table {{tbl}}, the query to compute statistics for columns is translated internally into: {code} SELECT compute_stats(c1), compute_stats(c2), ... FROM tbl; {code} {{compute_stats}} produces data for the stats available for each column type, e.g., struct<"max":long,"min":long,"countnulls":long,...>. This issue is to produce a query that relies purely on SQL functions instead: {code} SELECT max(c1), min(c1), count(case when c1 is null then 1 else null end), ... FROM tbl; {code} This will allow us to deprecate the {{compute_stats}} UDAF since it mostly duplicates functionality found in those other functions. Additionally, many of those functions already provide a vectorized implementation so the approach could potentially improve the performance of column stats collection. -- This message was sent by Atlassian Jira (v8.3.4#803005)
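The rewrite described in HIVE-23530 amounts to expanding each column into a handful of standard aggregates. The sketch below generates such a query string; it is a hypothetical illustration ({{ColumnStatsQuerySketch}} and {{buildStatsQuery}} are invented names) that emits only the three aggregates spelled out in the issue and performs no identifier quoting or per-type handling.

```java
import java.util.List;
import java.util.StringJoiner;

final class ColumnStatsQuerySketch {

    /**
     * Builds a stats query that uses plain SQL aggregates instead of the
     * compute_stats UDAF. Only max, min and the null count are emitted.
     */
    static String buildStatsQuery(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        StringJoiner select = new StringJoiner(", ", "SELECT ", " FROM " + table);
        for (String c : columns) {
            select.add("max(" + c + ")");
            select.add("min(" + c + ")");
            select.add("count(case when " + c + " is null then 1 else null end)");
        }
        return select.toString();
    }
}
```

Because every expression is an ordinary aggregate, the generated query can benefit from whatever vectorized implementations those functions already have.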
[jira] [Created] (HIVE-23389) FilterMergeRule can lead to AssertionError
Jesus Camacho Rodriguez created HIVE-23389: -- Summary: FilterMergeRule can lead to AssertionError Key: HIVE-23389 URL: https://issues.apache.org/jira/browse/HIVE-23389 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez I have not been able to reproduce it in latest master, but this could potentially happen since Filter creation has a check on whether the expression is flat ([here|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/Filter.java#L74]) and Filter merge does not flatten an expression when it is created. {noformat} java.lang.AssertionError: AND(=($3, 100), OR(OR(null, IS NOT NULL(CAST(100):INTEGER)), =(CAST(100):INTEGER, CAST(200):INTEGER))) at org.apache.calcite.rel.core.Filter.<init>(Filter.java:74) at org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter.<init>(HiveFilter.java:39) at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories$HiveFilterFactoryImpl.createFilter(HiveRelFactories.java:126) at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelBuilder.filter(HiveRelBuilder.java:99) at org.apache.calcite.tools.RelBuilder.filter(RelBuilder.java:1055) at org.apache.calcite.rel.rules.FilterMergeRule.onMatch(FilterMergeRule.java:81) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23365) Put RS deduplication optimization under cost based decision
Jesus Camacho Rodriguez created HIVE-23365: -- Summary: Put RS deduplication optimization under cost based decision Key: HIVE-23365 URL: https://issues.apache.org/jira/browse/HIVE-23365 Project: Hive Issue Type: Improvement Components: Physical Optimizer Reporter: Jesus Camacho Rodriguez Currently, RS deduplication is always executed whenever it is semantically correct. However, it could be beneficial to leave both RS operators in the plan, e.g., if the NDV of the second RS is very low. Thus, we would like this decision to be cost-based. We could use a simple heuristic that would work fine for most of the cases without introducing regressions for existing cases, e.g., if the NDV for the partition column is less than the estimated parallelism in the second RS, do not execute deduplication. -- This message was sent by Atlassian Jira (v8.3.4#803005)
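The heuristic proposed in HIVE-23365 fits in a single comparison. The sketch below is only an illustration of that rule; {{RsDedupHeuristicSketch}}, {{shouldDeduplicate}} and both parameter names are hypothetical, and how the NDV and parallelism estimates are obtained is outside its scope.

```java
final class RsDedupHeuristicSketch {

    /**
     * Returns true when RS deduplication should be applied. Deduplication is
     * skipped when the partition column has fewer distinct values than the
     * estimated parallelism of the second RS, since merging the two RS
     * operators would then leave some reducers without work.
     */
    static boolean shouldDeduplicate(long partitionColumnNdv, int estimatedParallelism) {
        return partitionColumnNdv >= estimatedParallelism;
    }
}
```

For example, with an NDV of 3 and an estimated parallelism of 10 the two RS operators would be kept, while an NDV of 1000 would still allow deduplication.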
[jira] [Created] (HIVE-23302) Create HiveJdbcDatabaseAccessor for JDBC storage handler
Jesus Camacho Rodriguez created HIVE-23302: -- Summary: Create HiveJdbcDatabaseAccessor for JDBC storage handler Key: HIVE-23302 URL: https://issues.apache.org/jira/browse/HIVE-23302 Project: Hive Issue Type: Bug Components: StorageHandler Reporter: Jesus Camacho Rodriguez The {{JdbcDatabaseAccessor}} associated with the storage handler makes some SQL calls to the RDBMS through the JDBC connection. There is a {{GenericJdbcDatabaseAccessor}} with a generic implementation that the storage handler uses if there is no specific implementation for a certain RDBMS. Currently, Hive uses the {{GenericJdbcDatabaseAccessor}}. Afaik the only generic query that will not work is splitting the query based on offset and limit, since the syntax for that query is different than the one accepted by Hive. We should create a {{HiveJdbcDatabaseAccessor}} to override that query and possibly fix any other existing incompatibilities. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23298) Disable RS deduplication step in Optimizer if it is run in TezCompiler
Jesus Camacho Rodriguez created HIVE-23298: -- Summary: Disable RS deduplication step in Optimizer if it is run in TezCompiler Key: HIVE-23298 URL: https://issues.apache.org/jira/browse/HIVE-23298 Project: Hive Issue Type: Improvement Components: Physical Optimizer Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez HIVE-20703 introduced an additional RS deduplication step in TezCompiler. We could possibly try to disable the one that runs in {{Optimizer}} if we are using Tez so we do not run the optimization twice. This issue is to explore that possibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23291) Add Hive to DatabaseType in JDBC storage handler
Jesus Camacho Rodriguez created HIVE-23291: -- Summary: Add Hive to DatabaseType in JDBC storage handler Key: HIVE-23291 URL: https://issues.apache.org/jira/browse/HIVE-23291 Project: Hive Issue Type: Improvement Components: StorageHandler Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Inception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23275) Represent UNBOUNDED in window functions in CBO correctly
Jesus Camacho Rodriguez created HIVE-23275: -- Summary: Represent UNBOUNDED in window functions in CBO correctly Key: HIVE-23275 URL: https://issues.apache.org/jira/browse/HIVE-23275 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Currently we use a bounded representation with bound set to Integer.MAX_VALUE, which works correctly since that is the Hive implementation. However, Calcite has a specific boundary class {{RexWindowBoundUnbounded}} that we should be using instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23229) CAST to string on column instead of simplification over literal column
Jesus Camacho Rodriguez created HIVE-23229: -- Summary: CAST to string on column instead of simplification over literal column Key: HIVE-23229 URL: https://issues.apache.org/jira/browse/HIVE-23229 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez After HIVE-23100 went in, we end up for one of the queries with CAST over a column instead of applying CAST on literal and comparing in CHAR, which can be seen in ql/src/test/results/clientpositive/in_typecheck_char.q.out . {code} filterExpr: (((s = 'a') and (t = 'a ')) or (null and (t = 'bb'))) is null (type: boolean) {code} was replaced by: {code} filterExpr: (((CAST( s AS STRING) = 'a') and (CAST( t AS STRING) = 'a')) or (null and (CAST( t AS STRING) = 'bb'))) is null (type: boolean) {code} Probably this is as a result of the changes introduced in HIVE-23100 wrt IN handling. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23228) Missed optimization opportunity with equals and not equals
Jesus Camacho Rodriguez created HIVE-23228: -- Summary: Missed optimization opportunity with equals and not equals Key: HIVE-23228 URL: https://issues.apache.org/jira/browse/HIVE-23228 Project: Hive Issue Type: Improvement Components: CBO Affects Versions: 4.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez After HIVE-23100 went in, there was a missed opportunity on the simplification of an AND predicate containing equals and not equals clause, which can be seen in ql/src/test/results/clientpositive/pcs.q.out . {code} filterExpr: ((key = 3) or (ds = '2000-04-08') or key is not null) and (key = 2)) or ((ds <> '2000-04-08') and (key = 3))) and ((key + 5) > 0))) (type: boolean) {code} was replaced by: {code} filterExpr: ((key = 3) or (ds = '2000-04-08') or key is not null) and (key = 2)) or ((ds <> '2000-04-08') and (key <> 2) and (key = 3))) and ((key + 5) > 0))) (type: boolean) {code} Note the additional {{key <> 2}} in predicate below. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23227) Refactor RexConverter and move some of its functionality into HiveFunctionHelper
Jesus Camacho Rodriguez created HIVE-23227: -- Summary: Refactor RexConverter and move some of its functionality into HiveFunctionHelper Key: HIVE-23227 URL: https://issues.apache.org/jira/browse/HIVE-23227 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez After HIVE-23100, {{HiveFunctionHelper}} makes a few calls to methods that are in {{RexConverter}}. Those methods do not need to be there anymore but were not moved as part of that patch to avoid further changes in it. This issue is to tackle that refactoring. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23226) Implement Calcite rule to transform CASE into COALESCE when possible
Jesus Camacho Rodriguez created HIVE-23226: -- Summary: Implement Calcite rule to transform CASE into COALESCE when possible Key: HIVE-23226 URL: https://issues.apache.org/jira/browse/HIVE-23226 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Currently, it is done in {{TypeCheckProcFactory}} when we create a Hive expression after Calcite optimization. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23225) Simplify ExprFactory, ExprNodeDescExprFactory and RexNodeExprFactory
Jesus Camacho Rodriguez created HIVE-23225: -- Summary: Simplify ExprFactory, ExprNodeDescExprFactory and RexNodeExprFactory Key: HIVE-23225 URL: https://issues.apache.org/jira/browse/HIVE-23225 Project: Hive Issue Type: Improvement Reporter: Jesus Camacho Rodriguez The new {{ExprFactory}} was created based on existing calls from {{TypeCheckProcFactory}}. Now that we have the {{ExprNodeDesc}} and {{RexNode}} implementations, it seems we could do some work consolidating those methods, simplifying the super/subclasses, etc. For instance, the handling of literal values seems quite convoluted (handled by many different methods) and could possibly be abstracted in a different way. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23224) Literals in CBO plan could show less information
Jesus Camacho Rodriguez created HIVE-23224: -- Summary: Literals in CBO plan could show less information Key: HIVE-23224 URL: https://issues.apache.org/jira/browse/HIVE-23224 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Currently, they are very verbose. For all char and varchar literals it will show the encoding, though it is always the same. For varchar literals, it prints type and length, which seems unnecessary. For instance: {code} HiveFilter(condition=[AND(IN($10, _UTF-16LE'wallpaper':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'parenting':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'musical':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'womens':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'birdal':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'pants':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), IN($12, _UTF-16LE'Home':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'Books':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'Electronics':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'Shoes':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'Jewelry':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'Men':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), OR(AND(IN($12, _UTF-16LE'Home':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'Books':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'Electronics':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), IN($10, _UTF-16LE'wallpaper':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'parenting':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'musical':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), AND(IN($12, _UTF-16LE'Shoes':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'Jewelry':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'Men':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), IN($10,
_UTF-16LE'womens':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'birdal':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'pants':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23223) Unnecessary CAST to decimal around CASE statement
Jesus Camacho Rodriguez created HIVE-23223: -- Summary: Unnecessary CAST to decimal around CASE statement Key: HIVE-23223 URL: https://issues.apache.org/jira/browse/HIVE-23223 Project: Hive Issue Type: Improvement Components: CBO Affects Versions: 4.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez After HIVE-23100 went in, there was a missed opportunity on the simplification of a CAST statement on top of a CASE clause, which can be seen in ql/src/test/results/clientpositive/vector_case_when_2.q.out . {code} expressions: q548284 (type: int), CASE WHEN ((q548284 = 4)) THEN (0.8) WHEN ((q548284 = 5)) THEN (1) ELSE (8) END (type: decimal(2,1)) {code} was replaced by: {code} expressions: q548284 (type: int), CAST( CASE WHEN ((q548284 = 4)) THEN (0.8) WHEN ((q548284 = 5)) THEN (1) ELSE (8) END AS decimal(11,1)) (type: decimal(11,1)) {code} The type of the CASE expression could be inferred and enforced without the CAST. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23222) Missed opportunity in IN merge
Jesus Camacho Rodriguez created HIVE-23222: -- Summary: Missed opportunity in IN merge Key: HIVE-23222 URL: https://issues.apache.org/jira/browse/HIVE-23222 Project: Hive Issue Type: Improvement Components: CBO Affects Versions: 4.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez After HIVE-23100 went in, there was a missed opportunity merging IN clauses, which can be seen in ql/src/test/results/clientpositive/llap/vector_between_in.q.out . {code} filterExpr: (cdecimal1) IN (2365.8945945946, 881.0135135135, -3367.6517567568) (type: boolean) {code} was replaced by: {code} filterExpr: ((cdecimal1) IN (2365.8945945946, -3367.6517567568) or (cdecimal1) IN (881.0135135135)) (type: boolean) {code} The problem seems to be that with decimal type, we are considering values with different precision/scale as a different type, thus we do not merge them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
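The suspected cause above can be illustrated with a minimal sketch. This is not Hive's code: the class and method names are hypothetical, and it only assumes that each literal is typed by its own digits. Bucketing the three literals from the example by their exact decimal(precision, scale) type splits them into two groups, which matches the two IN clauses in the plan, while widening to one common type first keeps them together.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration; none of these names exist in Hive.
class DecimalInMergeSketch {

    // Buckets literals by their exact decimal(precision, scale) type.
    static Map<String, List<BigDecimal>> groupByExactType(List<BigDecimal> literals) {
        return literals.stream().collect(Collectors.groupingBy(
            d -> "decimal(" + d.precision() + "," + d.scale() + ")",
            LinkedHashMap::new,
            Collectors.toList()));
    }

    // Widens all literals to one common type (max integer digits + max scale)
    // before bucketing, so they always land in a single group.
    static Map<String, List<BigDecimal>> groupByCommonType(List<BigDecimal> literals) {
        int maxScale = literals.stream().mapToInt(BigDecimal::scale).max().orElse(0);
        int maxIntDigits = literals.stream()
            .mapToInt(d -> d.precision() - d.scale()).max().orElse(0);
        String common = "decimal(" + (maxIntDigits + maxScale) + "," + maxScale + ")";
        Map<String, List<BigDecimal>> result = new LinkedHashMap<>();
        result.put(common, literals);
        return result;
    }

    public static void main(String[] args) {
        List<BigDecimal> literals = Arrays.asList(
            new BigDecimal("2365.8945945946"),
            new BigDecimal("881.0135135135"),
            new BigDecimal("-3367.6517567568"));
        System.out.println(groupByExactType(literals));
        System.out.println(groupByCommonType(literals));
    }
}
```

Whether the fix should widen the literals or compare type families instead is a design choice the issue leaves open.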
[jira] [Created] (HIVE-23162) Remove swapping logic to merge joins in AST converter
Jesus Camacho Rodriguez created HIVE-23162: -- Summary: Remove swapping logic to merge joins in AST converter Key: HIVE-23162 URL: https://issues.apache.org/jira/browse/HIVE-23162 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez In ASTConverter, there is some logic to invert join inputs so the logic to merge joins in SemanticAnalyzer kicks in. https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L407 There is a bug because inputs are swapped but the schema is not. However, it turns out that logic is not needed now that merging is off by default. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23100) Create RexNode factory and use it in CalcitePlanner
Jesus Camacho Rodriguez created HIVE-23100: -- Summary: Create RexNode factory and use it in CalcitePlanner Key: HIVE-23100 URL: https://issues.apache.org/jira/browse/HIVE-23100 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Follow-up of HIVE-22746. This will allow us to generate the RexNode directly from the AST nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23011) Shared work optimizer should check residual predicates when comparing joins
Jesus Camacho Rodriguez created HIVE-23011: -- Summary: Shared work optimizer should check residual predicates when comparing joins Key: HIVE-23011 URL: https://issues.apache.org/jira/browse/HIVE-23011 Project: Hive Issue Type: Bug Components: Physical Optimizer Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23009) SEL operator created by DynamicPartitionPruningOptimization does not populate colExprMap
Jesus Camacho Rodriguez created HIVE-23009: -- Summary: SEL operator created by DynamicPartitionPruningOptimization does not populate colExprMap Key: HIVE-23009 URL: https://issues.apache.org/jira/browse/HIVE-23009 Project: Hive Issue Type: Bug Components: Physical Optimizer, Statistics Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez This can lead to incorrect column stats propagation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22996) BasicStats parsing should check proactively for null or empty string
Jesus Camacho Rodriguez created HIVE-22996: -- Summary: BasicStats parsing should check proactively for null or empty string Key: HIVE-22996 URL: https://issues.apache.org/jira/browse/HIVE-22996 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Rather than throwing an Exception for control flow, which will create unnecessary overhead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
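A minimal sketch of the proactive check, assuming the stat arrives as a string that may be missing. The class name, method name, and default-value parameter are hypothetical and are not taken from BasicStats.

```java
// Hypothetical helper; not the actual BasicStats code.
class BasicStatsParseSketch {

    // Returns the parsed value, or defaultValue when the raw string is null or empty.
    // The cheap up-front check avoids constructing a NumberFormatException
    // just to signal "no value".
    static long parseLongStat(String raw, long defaultValue) {
        if (raw == null || raw.isEmpty()) {
            return defaultValue;
        }
        return Long.parseLong(raw);
    }

    public static void main(String[] args) {
        System.out.println(parseLongStat(null, -1L));
        System.out.println(parseLongStat("", -1L));
        System.out.println(parseLongStat("1024", -1L));
    }
}
```

A malformed, non-empty value still throws, which keeps genuinely unexpected input visible.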
[jira] [Created] (HIVE-22978) Fix decimal precision and scale inference for aggregate rewriting in Calcite
Jesus Camacho Rodriguez created HIVE-22978: -- Summary: Fix decimal precision and scale inference for aggregate rewriting in Calcite Key: HIVE-22978 URL: https://issues.apache.org/jira/browse/HIVE-22978 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Calcite rules can do rewritings of aggregate functions, e.g., {{avg}} into {{sum/count}}. When type of {{avg}} is decimal, inference of intermediate precision and scale for the division is not done correctly. The reason is that we miss support for some types in method {{getDefaultPrecision}} in {{HiveTypeSystemImpl}}. Additionally, {{deriveSumType}} should be overridden in {{HiveTypeSystemImpl}} to abide by the Hive semantics for sum aggregate type inference. -- This message was sent by Atlassian Jira (v8.3.4#803005)
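A rough sketch of what a Hive-flavored sum type derivation could look like. It assumes that Hive types the sum of a decimal(p, s) column as decimal(min(38, p + 10), s); that rule is stated here as an assumption and is worth checking against Hive's sum UDAF before relying on it. The class and method below are hypothetical stand-ins, not the real {{deriveSumType}} signature.

```java
// Hypothetical stand-in for the type-system override; not HiveTypeSystemImpl.
class DecimalSumTypeSketch {

    static final int MAX_PRECISION = 38; // assumed maximum decimal precision

    // Returns {precision, scale} for sum over decimal(precision, scale),
    // assuming the rule decimal(min(38, p + 10), s).
    static int[] deriveSumType(int precision, int scale) {
        return new int[] { Math.min(MAX_PRECISION, precision + 10), scale };
    }

    public static void main(String[] args) {
        int[] t = deriveSumType(10, 2);
        System.out.println("decimal(" + t[0] + "," + t[1] + ")");
    }
}
```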
[jira] [Created] (HIVE-22962) Reuse HiveRelFieldTrimmer instance across queries
Jesus Camacho Rodriguez created HIVE-22962: -- Summary: Reuse HiveRelFieldTrimmer instance across queries Key: HIVE-22962 URL: https://issues.apache.org/jira/browse/HIVE-22962 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Currently we create multiple {{HiveRelFieldTrimmer}} instances per query. {{HiveRelFieldTrimmer}} uses a method dispatcher that has a built-in caching mechanism: given a certain object, it stores the method that was called for the object class. However, by instantiating the trimmer multiple times per query and across queries, we create a new dispatcher with each instantiation, thus effectively removing the caching mechanism that is built within the dispatcher. This issue is to reuse the same {{HiveRelFieldTrimmer}} instance within a single query and across queries. -- This message was sent by Atlassian Jira (v8.3.4#803005)
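The effect can be sketched with a toy class-keyed cache. This is not Calcite's dispatcher; the class below is hypothetical and only counts how often the (normally expensive) method resolution runs when a fresh instance is created per query versus when one instance is shared.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy dispatcher; it only mimics "cache the resolved method per class".
class CachingDispatcherSketch {

    private final Map<Class<?>, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger misses = new AtomicInteger();

    // Resolves a handler name for the node's class, caching the result per class.
    String resolve(Object node) {
        return cache.computeIfAbsent(node.getClass(), c -> {
            misses.incrementAndGet(); // stands in for a reflective method lookup
            return "trimFields(" + c.getSimpleName() + ")";
        });
    }

    int misses() {
        return misses.get();
    }

    // One new dispatcher per query: the cache is always cold.
    static int missesWithFreshInstances(int queries) {
        int total = 0;
        for (int i = 0; i < queries; i++) {
            CachingDispatcherSketch d = new CachingDispatcherSketch();
            d.resolve("node");
            total += d.misses();
        }
        return total;
    }

    // One dispatcher shared by all queries: only the first lookup misses.
    static int missesWithSharedInstance(int queries) {
        CachingDispatcherSketch d = new CachingDispatcherSketch();
        for (int i = 0; i < queries; i++) {
            d.resolve("node");
        }
        return d.misses();
    }

    public static void main(String[] args) {
        System.out.println(missesWithFreshInstances(5));
        System.out.println(missesWithSharedInstance(5));
    }
}
```

Sharing an instance across queries also means the shared object must be safe for concurrent use, which is why the sketch uses a concurrent map.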
[jira] [Created] (HIVE-22953) Update Apache Arrow and flatbuffer versions
Jesus Camacho Rodriguez created HIVE-22953: -- Summary: Update Apache Arrow and flatbuffer versions Key: HIVE-22953 URL: https://issues.apache.org/jira/browse/HIVE-22953 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez HIVE-22827 updated flatbuffer version to 1.6.0.1. Current Arrow version consumed by Hive uses 1.2.0 (com.vlkan:flatbuffers version). This issue is to update Arrow to at least 0.15.1 and flatbuffers to 1.11.0 (from official flatbuffers release, same version used by Arrow). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22923) Extract cumulative cost metadata from HiveRelMdDistinctRowCount metadata provider
Jesus Camacho Rodriguez created HIVE-22923: -- Summary: Extract cumulative cost metadata from HiveRelMdDistinctRowCount metadata provider Key: HIVE-22923 URL: https://issues.apache.org/jira/browse/HIVE-22923 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez It should not be contained there. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22921) materialized_view_partitioned_3.q relies on hive.optimize.sort.dynamic.partition property
Jesus Camacho Rodriguez created HIVE-22921: -- Summary: materialized_view_partitioned_3.q relies on hive.optimize.sort.dynamic.partition property Key: HIVE-22921 URL: https://issues.apache.org/jira/browse/HIVE-22921 Project: Hive Issue Type: Test Reporter: Jesus Camacho Rodriguez Assignee: Vineet Garg {{hive.optimize.sort.dynamic.partition}} was deprecated in favor of {{hive.optimize.sort.dynamic.partition.threshold}} in HIVE-20703. {{materialized_view_partitioned_3.q}} specifically tests SortedDynPartitionOptimizer for MVs. We need to update the q test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22854) hive-service should not depend on hive-exec
Jesus Camacho Rodriguez created HIVE-22854: -- Summary: hive-service should not depend on hive-exec Key: HIVE-22854 URL: https://issues.apache.org/jira/browse/HIVE-22854 Project: Hive Issue Type: Improvement Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez It does not need to depend on hive-exec since it does not use it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22842) Timestamp/date vectors in Arrow serializer should use correct calendar for value representation
Jesus Camacho Rodriguez created HIVE-22842: -- Summary: Timestamp/date vectors in Arrow serializer should use correct calendar for value representation Key: HIVE-22842 URL: https://issues.apache.org/jira/browse/HIVE-22842 Project: Hive Issue Type: Improvement Reporter: Jesus Camacho Rodriguez Assignee: Shubham Chaurasia -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22827) Update Flatbuffer version
Jesus Camacho Rodriguez created HIVE-22827: -- Summary: Update Flatbuffer version Key: HIVE-22827 URL: https://issues.apache.org/jira/browse/HIVE-22827 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Hive currently uses Flatbuffer 1.2.0. Other Apache projects use a more up-to-date version, e.g. 1.6.0.1. Upgrade to that version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22795) Create new parser and udf module from ql
Jesus Camacho Rodriguez created HIVE-22795: -- Summary: Create new parser and udf module from ql Key: HIVE-22795 URL: https://issues.apache.org/jira/browse/HIVE-22795 Project: Hive Issue Type: Improvement Components: Build Infrastructure Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez ql is a huge module. I propose to start splitting it by creating new modules `parser` and `udf` to encapsulate some classes related to SQL parsing and UDF declaration, respectively. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22785) Update/delete/merge statements not optimized through CBO
Jesus Camacho Rodriguez created HIVE-22785: -- Summary: Update/delete/merge statements not optimized through CBO Key: HIVE-22785 URL: https://issues.apache.org/jira/browse/HIVE-22785 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Currently, CBO is bypassed for update/delete/merge statements. To support optimizing these statements through CBO, we need to complete three main tasks: 1) support for sort in CBO, 2) support for SORT in AST converter, and 3) {{RewriteSemanticAnalyzer}} should extend {{CalcitePlanner}} instead of {{SemanticAnalyzer}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22746) Make TypeCheckProcFactory generic
Jesus Camacho Rodriguez created HIVE-22746: -- Summary: Make TypeCheckProcFactory generic Key: HIVE-22746 URL: https://issues.apache.org/jira/browse/HIVE-22746 Project: Hive Issue Type: Improvement Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez {{TypeCheckProcFactory}} is responsible for processing AST nodes and generating ExprNode objects from them. When we generate the expressions for Calcite planning, we go through an {{AST node -> ExprNode -> RexNode}} transformation. We would like to avoid the overhead of going through the ExprNode, and thus generate the RexNode directly from the AST. To do that, the first step is to make {{TypeCheckProcFactory}} generic, so it can receive an expression factory and create expressions in different realms. For the time being, the only factory implementation is the ExprNode factory. Thus, this patch focuses mainly on refactoring {{TypeCheckProcFactory}} without breaking anything that is already working. In a follow-up patch, we will create a {{RexNode}} factory and use it when we parse the query in CalcitePlanner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22728) Limit the scope of uniqueness of constraint name to database
Jesus Camacho Rodriguez created HIVE-22728: -- Summary: Limit the scope of uniqueness of constraint name to database Key: HIVE-22728 URL: https://issues.apache.org/jira/browse/HIVE-22728 Project: Hive Issue Type: Wish Reporter: Jesus Camacho Rodriguez Currently, constraint names are globally unique across all databases (assumption is that this may have been done by design). Nevertheless, though the behavior seems to be implementation specific, it would be interesting to limit the scope to uniqueness per database. Currently we do not store database information with the constraints. To change the scope to one db, we would need to store the DB_ID in the KEY_CONSTRAINTS table in metastore when we create a constraint and add the DB_ID to the PRIMARY KEY of that table. Some minor changes to the error messages would be needed too, since otherwise it would be difficult to identify the correct violation in queries that span across multiple databases. Additionally, the SQL scripts will need to be updated to populate the DB_ID when we upgrade to a new version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22589) Add storage support for ProlepticCalendar
Jesus Camacho Rodriguez created HIVE-22589: -- Summary: Add storage support for ProlepticCalendar Key: HIVE-22589 URL: https://issues.apache.org/jira/browse/HIVE-22589 Project: Hive Issue Type: Bug Components: storage-api Reporter: Owen O'Malley Assignee: László Bodor Fix For: 4.0.0, 3.1.3, storage-2.7.1 Hive recently moved its processing to the proleptic calendar, which has created some issues for users who have dates before 1580 AD. I'd propose extending the column vectors for times & dates to encode which calendar they are using. * create DateColumnVector that extends LongColumnVector * add a method to change calendars to both DateColumnVector and TimestampColumnVector. {code} /** * Change the calendar to or from proleptic. If the new and old values of the flag are the * same, nothing is done. * useProleptic - set the flag for the proleptic calendar * updateData - change the data to match the new value of the flag. */ void changeCalendar(boolean useProleptic, boolean updateData); /** * Detect whether this data is using the proleptic calendar. */ boolean usingProlepticCalendar(); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22549) RS deduplication should not merge final aggregation without keys
Jesus Camacho Rodriguez created HIVE-22549: -- Summary: RS deduplication should not merge final aggregation without keys Key: HIVE-22549 URL: https://issues.apache.org/jira/browse/HIVE-22549 Project: Hive Issue Type: Bug Components: Physical Optimizer Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez This may lead to performance degradation. For instance, this can happen for the following query: {code} set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; EXPLAIN CREATE TABLE x STORED AS ORC TBLPROPERTIES('transactional'='true') AS SELECT * FROM SRC x CLUSTER BY x.key; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22538) RS deduplication does not always enforce hive.optimize.reducededuplication.min.reducer
Jesus Camacho Rodriguez created HIVE-22538: -- Summary: RS deduplication does not always enforce hive.optimize.reducededuplication.min.reducer Key: HIVE-22538 URL: https://issues.apache.org/jira/browse/HIVE-22538 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez For transactional tables, that property might be overridden to 1, which can lead to merging final aggregation into a single stage (hence leading to performance degradation). For instance, when autogather column stats is enabled, this can happen for the following query: {code} set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; EXPLAIN CREATE TABLE x STORED AS ORC TBLPROPERTIES('transactional'='true') AS SELECT * FROM SRC x CLUSTER BY x.key; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22532) PTFPPD may push limit incorrectly through Rank/DenseRank function
Jesus Camacho Rodriguez created HIVE-22532: -- Summary: PTFPPD may push limit incorrectly through Rank/DenseRank function Key: HIVE-22532 URL: https://issues.apache.org/jira/browse/HIVE-22532 Project: Hive Issue Type: Bug Components: Physical Optimizer Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22486) Send only accessed columns for masking policies request
Jesus Camacho Rodriguez created HIVE-22486: -- Summary: Send only accessed columns for masking policies request Key: HIVE-22486 URL: https://issues.apache.org/jira/browse/HIVE-22486 Project: Hive Issue Type: Improvement Components: CBO Affects Versions: 4.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Currently, we send all columns for masking request, even if they are not accessed by the given query. We could send only those columns for which the masking policy will be necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22480) IndexOutOfBounds exception while reading ORC files written with empty positions list in first row index entry
Jesus Camacho Rodriguez created HIVE-22480: -- Summary: IndexOutOfBounds exception while reading ORC files written with empty positions list in first row index entry Key: HIVE-22480 URL: https://issues.apache.org/jira/browse/HIVE-22480 Project: Hive Issue Type: Bug Components: ORC Affects Versions: 2.3.6, 1.2.2 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.3.0, 1.2.3, 2.4.0, 2.3.7 Although this should not happen, we may end up with an empty positions list in the first row index entry due to some bug (see ORC-569). Since positions in the first row index are always zero, it would be good if the reader could still read these files instead of failing. The error stack looks like this: {code} ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1566395485735_11359_2_00, diagnostics=[Task failed, taskId=task_1566395485735_11359_2_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1566395485735_11359_2_00_00_0:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:377) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0 at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:157) at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:83) at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:694) at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:653) at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145) at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:525) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188) ...
14 more Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0 at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:380) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) ... 25 more Caused by: java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:4456) at org.apache.orc.OrcProto$RowIndexEntry.getPositions(OrcProto.java:6867) at org.apache.orc.impl.RecordReaderUtils.addRgFilteredStreamToRanges(RecordReaderUtils.java:257) at org.apache.orc.impl.RecordReaderImpl.planReadPartialDataStreams(RecordReaderImpl.java:942) at org.apache.orc.impl.RecordReaderImpl.readPartialDataStreams(RecordReaderImpl.java:979) at org.apache.orc.impl.R
[jira] [Created] (HIVE-22430) Avoid creation of additional RS for limit if it is equal to zero
Jesus Camacho Rodriguez created HIVE-22430: -- Summary: Avoid creation of additional RS for limit if it is equal to zero Key: HIVE-22430 URL: https://issues.apache.org/jira/browse/HIVE-22430 Project: Hive Issue Type: Improvement Components: Query Planning Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22396) CMV creating a Full ACID partitioned table fails because of no writeId
Jesus Camacho Rodriguez created HIVE-22396: -- Summary: CMV creating a Full ACID partitioned table fails because of no writeId Key: HIVE-22396 URL: https://issues.apache.org/jira/browse/HIVE-22396 Project: Hive Issue Type: Sub-task Components: HiveServer2, repl Affects Versions: 4.0.0 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat create table t1(a int, b int); insert into t1 values (1, 2), (3, 4); create table t6_part partitioned by (a) stored as orc tblproperties ("transactional"="true") as select * from t1; ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. MoveTask : Write id is not set in the config by open txn task for migration Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. MoveTask : Write id is not set in the config by open txn task for migration (state=08S01,code=1) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22341) abortTxns statements should be executed in a single transaction
Jesus Camacho Rodriguez created HIVE-22341: -- Summary: abortTxns statements should be executed in a single transaction Key: HIVE-22341 URL: https://issues.apache.org/jira/browse/HIVE-22341 Project: Hive Issue Type: Bug Components: Metastore Reporter: Jesus Camacho Rodriguez Logic in `abortTxns` should be executed in a single transaction, rather than multiple ones. Otherwise, if you restart HMS between txn abort and the lock deletion, we end up with orphaned locks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22339) Change default time for MVs refresh in registry
Jesus Camacho Rodriguez created HIVE-22339: -- Summary: Change default time for MVs refresh in registry Key: HIVE-22339 URL: https://issues.apache.org/jira/browse/HIVE-22339 Project: Hive Issue Type: Improvement Components: Materialized views Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Default was set to 60secs in HIVE-21344. It seems it may be too aggressive; suggestion is to change default to 1500secs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22314) Disable count distinct rewrite in Hive optimizer if it is already rewritten by Calcite
Jesus Camacho Rodriguez created HIVE-22314: -- Summary: Disable count distinct rewrite in Hive optimizer if it is already rewritten by Calcite Key: HIVE-22314 URL: https://issues.apache.org/jira/browse/HIVE-22314 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22311) Propagate min/max column values from statistics to the optimizer for timestamp type
Jesus Camacho Rodriguez created HIVE-22311: -- Summary: Propagate min/max column values from statistics to the optimizer for timestamp type Key: HIVE-22311 URL: https://issues.apache.org/jira/browse/HIVE-22311 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Currently stats annotation does not consider timestamp type e.g. for estimates with range predicates. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22310) Factor out common code from *ColumnStatsAggregator
Jesus Camacho Rodriguez created HIVE-22310: -- Summary: Factor out common code from *ColumnStatsAggregator Key: HIVE-22310 URL: https://issues.apache.org/jira/browse/HIVE-22310 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jesus Camacho Rodriguez There are different column stats aggregator instances for each different type, e.g., {{DateColumnStatsAggregator}}, {{LongColumnStatsAggregator}}, {{DoubleColumnStatsAggregator}}, etc. Much of the logic in those classes seems to be common or could be generalized and reused; we should move it into the {{ColumnStatsAggregator}} parent class or a utility class. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22309) Use finer granularity for different types in column stats
Jesus Camacho Rodriguez created HIVE-22309: -- Summary: Use finer granularity for different types in column stats Key: HIVE-22309 URL: https://issues.apache.org/jira/browse/HIVE-22309 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jesus Camacho Rodriguez For instance, for {{timestamp}} type we are throwing away precision since we store min/max in seconds since epoch (no millis nor nanos). This would include some changes in metastore tables that store column statistics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22248) Min value for column in stats is not set correctly for some data types
Jesus Camacho Rodriguez created HIVE-22248: -- Summary: Min value for column in stats is not set correctly for some data types Key: HIVE-22248 URL: https://issues.apache.org/jira/browse/HIVE-22248 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jesus Camacho Rodriguez Assignee: Miklos Gergely I am not sure whether the problem is in printing the value or in the value stored in the metastore itself, but for some types (e.g. tinyint, smallint, int, bigint, double or float), the min value does not seem to be set correctly (set to 0). https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/alter_table_update_status.q.out#L342 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22241) Implement UDF to convert a date/timestamp from Gregorian-Julian hybrid calendar to proleptic Gregorian calendar
Jesus Camacho Rodriguez created HIVE-22241: -- Summary: Implement UDF to convert a date/timestamp from Gregorian-Julian hybrid calendar to proleptic Gregorian calendar Key: HIVE-22241 URL: https://issues.apache.org/jira/browse/HIVE-22241 Project: Hive Issue Type: Improvement Components: UDF Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez UDF that converts a date/timestamp from *Gregorian-Julian hybrid* calendar, i.e., calendar that supports both the Julian and Gregorian calendar systems with the support of a single discontinuity, which corresponds by default to the Gregorian date when the Gregorian calendar was instituted, to *proleptic Gregorian calendar* (ISO 8601 standard), which is produced by extending the Gregorian calendar backward to dates preceding its official introduction in 1582. -- This message was sent by Atlassian Jira (v8.3.4#803005)
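The conversion described in HIVE-22241 can be sketched roughly as follows. This is not the UDF itself, only a minimal illustration under some assumptions: the cutover is taken to be the default one (Julian 1582-10-04 is followed by Gregorian 1582-10-15), the function names are hypothetical, and the range is limited to what Python's `datetime.date` can represent.

```python
from datetime import date

# Assumed default cutover: first day on which the hybrid calendar is Gregorian.
GREGORIAN_CUTOVER = (1582, 10, 15)


def julian_to_jdn(year, month, day):
    """Julian Day Number of a date expressed in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def hybrid_to_proleptic(year, month, day):
    """Reinterpret a hybrid-calendar date as a proleptic Gregorian date."""
    if (year, month, day) >= GREGORIAN_CUTOVER:
        # On or after the cutover both calendars agree, nothing to shift.
        return date(year, month, day)
    # Before the cutover the hybrid calendar follows Julian rules.
    # date.fromordinal counts proleptic Gregorian days starting at
    # 0001-01-01 (ordinal 1), whose Julian Day Number is 1721426.
    return date.fromordinal(julian_to_jdn(year, month, day) - 1721425)
```

For example, `hybrid_to_proleptic(1582, 10, 4)` yields 1582-10-14 in the proleptic Gregorian calendar, while dates on or after 1582-10-15 come back unchanged.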
[jira] [Created] (HIVE-22239) Scale data size using column value ranges
Jesus Camacho Rodriguez created HIVE-22239: -- Summary: Scale data size using column value ranges Key: HIVE-22239 URL: https://issues.apache.org/jira/browse/HIVE-22239 Project: Hive Issue Type: Improvement Components: Physical Optimizer Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Currently, min/max values for columns are only used to determine whether a certain range filter falls out of range and thus filters all rows or none at all. If it does not, we just use a heuristic that the condition will filter 1/3 of the input rows. Instead of using that heuristic, we can use another one that assumes that data will be uniformly distributed across that range, and calculate the selectivity for the condition accordingly. This patch also includes the propagation of min/max column values from statistics to the optimizer for timestamp type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
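To make the uniform-distribution heuristic from HIVE-22239 concrete, here is a small sketch. It is not the code in the patch; the function name and signature are made up for illustration, and it treats the column as continuous, so open and closed bounds are not distinguished.

```python
def range_selectivity(col_min, col_max, low=None, high=None):
    """Estimated fraction of rows with low <= col <= high.

    Assumes values are uniformly distributed between col_min and col_max.
    A bound left as None means the filter is open on that side.
    """
    if col_max <= col_min:
        # Degenerate range: the column holds a single value.
        inside = (low is None or col_min >= low) and (high is None or col_min <= high)
        return 1.0 if inside else 0.0
    # Clamp the filter range to the range covered by the statistics.
    lo = col_min if low is None else max(low, col_min)
    hi = col_max if high is None else min(high, col_max)
    if hi < lo:
        return 0.0  # filter falls completely out of range
    return (hi - lo) / (col_max - col_min)
```

With min/max of 0 and 100, a filter `col <= 25` would be estimated at 0.25 of the input rows rather than the fixed 1/3.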
[jira] [Created] (HIVE-22232) NPE when hive.order.columnalignment is set to false
Jesus Camacho Rodriguez created HIVE-22232: -- Summary: NPE when hive.order.columnalignment is set to false Key: HIVE-22232 URL: https://issues.apache.org/jira/browse/HIVE-22232 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez When {{hive.order.columnalignment}} is disabled and the plan contains an Aggregate operator, we hit a NPE. {code} java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:163) at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:111) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1555) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:483) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12630) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:357) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1385) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1332) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1327) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:124) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:217) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242) ... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22219) Bringing a node manager down blocks restart of LLAP service
Jesus Camacho Rodriguez created HIVE-22219: -- Summary: Bringing a node manager down blocks restart of LLAP service Key: HIVE-22219 URL: https://issues.apache.org/jira/browse/HIVE-22219 Project: Hive Issue Type: Bug Components: llap Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez For a YARN service, when the number of running instances != the number of desired instances, the service state may be STARTED or FLEX (instead of STABLE). On the Hive LLAP side, there is a config to control the threshold of the service health check. The Hive LLAP code does not check for these states, which can result in the service not coming up even if the threshold is met. https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusServiceDriver.java#L382 -- This message was sent by Atlassian Jira (v8.3.4#803005)
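The kind of check HIVE-22219 asks for might look like the sketch below. Everything here is hypothetical: the function name, the idea that the threshold is a ratio of running to desired instances, and the exact set of accepted states are assumptions for illustration, not the logic in `LlapStatusServiceDriver`.

```python
# Assumption: states in which the service may still be usable.
ACCEPTABLE_STATES = {"STABLE", "STARTED", "FLEX"}


def is_service_up(state, running, desired, threshold):
    """Return True if enough instances are running in an acceptable state.

    threshold is assumed to be the minimum running/desired ratio.
    """
    if state not in ACCEPTABLE_STATES or desired <= 0:
        return False
    return running / desired >= threshold
```

With this shape, a service in FLEX state with 8 of 10 instances running would pass a 0.8 threshold instead of being rejected only because it is not STABLE.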
[jira] [Created] (HIVE-22209) Creating a materialized view with no tables should be handled more gracefully
Jesus Camacho Rodriguez created HIVE-22209: -- Summary: Creating a materialized view with no tables should be handled more gracefully Key: HIVE-22209 URL: https://issues.apache.org/jira/browse/HIVE-22209 Project: Hive Issue Type: Bug Components: Materialized views Reporter: Jesus Camacho Rodriguez Assignee: John Sherman Currently, materialized views without a table reference are not supported. However, instead of printing a clear message about it, when a materialized view is created without a table reference, we fail with an unclear message. {code} > create materialized view mv_test1 as select 5; (...) ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request failed : INSERT INTO MV_TABLES_USED (MV_CREATION_METADATA_ID,TBL_ID) VALUES (?,?) ) INFO : Completed executing command(queryId=hive_20190916203511_b609cccf-f5e3-45dd-abfd-6e869d94e39a); Time taken: 10.469 seconds Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request failed : INSERT INTO MV_TABLES_USED (MV_CREATION_METADATA_ID,TBL_ID) VALUES (?,?) ) (state=08S01,code=1) {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (HIVE-22204) Beeline option to show/not show execution report
Jesus Camacho Rodriguez created HIVE-22204: -- Summary: Beeline option to show/not show execution report Key: HIVE-22204 URL: https://issues.apache.org/jira/browse/HIVE-22204 Project: Hive Issue Type: Improvement Components: Beeline Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Currently, {{--silent=true}} will also remove the short report about execution (includes number of rows returned by a query and execution time). It would be interesting to control whether we want to show that report even if {{--silent=true}}, e.g., using an option {{--report=true}}. Default (existing) behavior should not change. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (HIVE-22200) Hash collision may cause column resolution to fail
Jesus Camacho Rodriguez created HIVE-22200: -- Summary: Hash collision may cause column resolution to fail Key: HIVE-22200 URL: https://issues.apache.org/jira/browse/HIVE-22200 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez {{ExprNodeDescUtils.getExprNodeColumnDesc}} extracts the {{ExprNodeColumnDesc}} (column descriptors) from an expression. In fact, it creates a map from hash to the object itself. If the same hash value is generated for two different objects, this will result in a clash in the map and some expressions not being part of its values. -- This message was sent by Atlassian Jira (v8.3.2#803003)
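The failure mode in HIVE-22200 is easy to reproduce in isolation. The sketch below is not Hive code: `Col` is a hypothetical stand-in for a column descriptor whose hash is forced to collide, and the two helper functions are made up to contrast a map keyed by hash value with a map keyed by the object itself.

```python
class Col:
    """Hypothetical column descriptor with a controllable hash value."""

    def __init__(self, name, forced_hash):
        self.name = name
        self._forced_hash = forced_hash

    def __hash__(self):
        return self._forced_hash

    def __eq__(self, other):
        return isinstance(other, Col) and self.name == other.name


def collect_by_hash(cols):
    """Keyed by hash value: a colliding column overwrites the earlier one."""
    result = {}
    for col in cols:
        result[hash(col)] = col
    return list(result.values())


def collect_by_object(cols):
    """Keyed by the object: collisions are resolved through equality."""
    result = {}
    for col in cols:
        result.setdefault(col, col)
    return list(result.values())
```

Given `Col("a", 42)` and `Col("b", 42)`, `collect_by_hash` keeps only one of the two columns, whereas `collect_by_object` keeps both.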
[jira] [Created] (HIVE-22170) from_unixtime and unix_timestamp should use user session time zone
Jesus Camacho Rodriguez created HIVE-22170: -- Summary: from_unixtime and unix_timestamp should use user session time zone Key: HIVE-22170 URL: https://issues.apache.org/jira/browse/HIVE-22170 Project: Hive Issue Type: Bug Affects Versions: 3.1.2, 3.1.1, 3.1.0, 4.0.0, 3.2.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez According to documentation, that is the expected behavior (since session time zone was not present, system time zone was being used previously). This was incorrectly changed by HIVE-12192 / HIVE-20007. This JIRA should fix this issue. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (HIVE-22075) Fix HIVE-14200 properly
Jesus Camacho Rodriguez created HIVE-22075: -- Summary: Fix HIVE-14200 properly Key: HIVE-22075 URL: https://issues.apache.org/jira/browse/HIVE-22075 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Gopal V -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22072) Altering table to make a column change does not update constraints references
Jesus Camacho Rodriguez created HIVE-22072: -- Summary: Altering table to make a column change does not update constraints references Key: HIVE-22072 URL: https://issues.apache.org/jira/browse/HIVE-22072 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez The constraint will still point to old column descriptor incorrectly. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22066) Upgrade Apache parent POM to version 21
Jesus Camacho Rodriguez created HIVE-22066: -- Summary: Upgrade Apache parent POM to version 21 Key: HIVE-22066 URL: https://issues.apache.org/jira/browse/HIVE-22066 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22057) Early bailout in SharedWorkOptimizer if all tables are referenced only once
Jesus Camacho Rodriguez created HIVE-22057: -- Summary: Early bailout in SharedWorkOptimizer if all tables are referenced only once Key: HIVE-22057 URL: https://issues.apache.org/jira/browse/HIVE-22057 Project: Hive Issue Type: Improvement Components: Physical Optimizer Reporter: Jesus Camacho Rodriguez In that case, there is no room for optimization, so we should bail out immediately and not do any extra work. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22046) Differentiate among column stats computed by different engines
Jesus Camacho Rodriguez created HIVE-22046: -- Summary: Differentiate among column stats computed by different engines Key: HIVE-22046 URL: https://issues.apache.org/jira/browse/HIVE-22046 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez The goal is to prevent different engines, e.g., Hive and Impala, from stepping on each other when they compute column stats. In the longer term, we may introduce a common representation for the column statistics stored by different engines. For this issue, we will add a new column 'engine' to the TAB_COL_STATS HMS table (unpartitioned tables) and to the PART_COL_STATS HMS table (partitioned tables). This will prevent conflicts in column-level stats. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22042) Set hive.exec.dynamic.partition.mode=nonstrict by default
Jesus Camacho Rodriguez created HIVE-22042: -- Summary: Set hive.exec.dynamic.partition.mode=nonstrict by default Key: HIVE-22042 URL: https://issues.apache.org/jira/browse/HIVE-22042 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22007) Do not push not supported types to specific JDBC sources from Calcite
Jesus Camacho Rodriguez created HIVE-22007: -- Summary: Do not push not supported types to specific JDBC sources from Calcite Key: HIVE-22007 URL: https://issues.apache.org/jira/browse/HIVE-22007 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 4.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez We should not push a project expression if it uses a type that a specific dialect does not support, e.g., boolean in Oracle. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22003) Shared work optimizer may leave semijoin branches in plan that are not used
Jesus Camacho Rodriguez created HIVE-22003: -- Summary: Shared work optimizer may leave semijoin branches in plan that are not used Key: HIVE-22003 URL: https://issues.apache.org/jira/browse/HIVE-22003 Project: Hive Issue Type: Bug Components: Physical Optimizer Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez This may happen only when the TS are the only operators that are shared. Repro attached in q file. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-21976) Offset should be null instead of zero in Calcite HiveSortLimit
Jesus Camacho Rodriguez created HIVE-21976: -- Summary: Offset should be null instead of zero in Calcite HiveSortLimit Key: HIVE-21976 URL: https://issues.apache.org/jira/browse/HIVE-21976 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 4.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Calcite expects a value equal to or greater than 1. Otherwise, it may generate SQL from a plan incorrectly ({{offset 0}}). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21953) Enable CLUSTERED ON/DISTRIBUTED ON+SORTED ON in incremental rebuild of materialized views
Jesus Camacho Rodriguez created HIVE-21953: -- Summary: Enable CLUSTERED ON/DISTRIBUTED ON+SORTED ON in incremental rebuild of materialized views Key: HIVE-21953 URL: https://issues.apache.org/jira/browse/HIVE-21953 Project: Hive Issue Type: Bug Components: Materialized views Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Follow-up of HIVE-18842. For insert and for the insert branch in merge, we can introduce a RS to enforce these properties, as we do when we create the materialized view or execute a full rebuild. This will make the delta files created for the insert obey the same organization. If the increments are large enough, this may improve query execution performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21946) Consider data distribution of a materialized view in transparent rewriting
Jesus Camacho Rodriguez created HIVE-21946: -- Summary: Consider data distribution of a materialized view in transparent rewriting Key: HIVE-21946 URL: https://issues.apache.org/jira/browse/HIVE-21946 Project: Hive Issue Type: Bug Components: Materialized views Affects Versions: 4.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Currently, we do consider partitioning of the original table, but we do not take into account data organization (DISTRIBUTE/SORT/CLUSTER) in the optimizer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21945) Enable sorted dynamic partitioning optimization for materialized views with custom data organization
Jesus Camacho Rodriguez created HIVE-21945: -- Summary: Enable sorted dynamic partitioning optimization for materialized views with custom data organization Key: HIVE-21945 URL: https://issues.apache.org/jira/browse/HIVE-21945 Project: Hive Issue Type: Bug Components: Materialized views Affects Versions: 4.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez After implementing HIVE-18842, we need to extend the optimizer to work with partitioned materialized views that are created with custom data organization, i.e., using CLUSTERED, DISTRIBUTED, or SORTED. Currently, optimization bails out when the materialized view is partitioned and either CLUSTERED, DISTRIBUTED, or SORTED. In particular, we will need to combine the RS operator introduced by the translation of these clauses with the new RS needed to distribute and sort the data based on the dynamic partition values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21928) Fix for statistics annotation in nested AND expressions
Jesus Camacho Rodriguez created HIVE-21928: -- Summary: Fix for statistics annotation in nested AND expressions Key: HIVE-21928 URL: https://issues.apache.org/jira/browse/HIVE-21928 Project: Hive Issue Type: Bug Components: Physical Optimizer Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Discovered while working on HIVE-21867. Having predicates with nested AND expressions may result in different stats, even if predicates are basically similar (from stats estimation standpoint). For instance, stats for {{AND(x=5, true, true)}} are different from {{x=5}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
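One way to see why {{AND(x=5, true, true)}} and {{x=5}} should be treated alike is to normalize the predicate before estimating stats. The sketch below only illustrates that idea and is not necessarily how the fix for HIVE-21928 works; the tuple-based predicate representation and the function name are hypothetical.

```python
def simplify_and(pred):
    """Flatten nested ANDs and drop literal True operands.

    A predicate is either a leaf (any value, e.g. the string "x=5" or the
    literal True) or a tuple ("AND", operand, operand, ...).
    """
    if not (isinstance(pred, tuple) and pred and pred[0] == "AND"):
        return pred
    operands = []
    for child in pred[1:]:
        child = simplify_and(child)
        if child is True:
            continue  # a true operand does not change the conjunction
        if isinstance(child, tuple) and child[0] == "AND":
            operands.extend(child[1:])  # pull nested AND operands up
        else:
            operands.append(child)
    if not operands:
        return True
    if len(operands) == 1:
        return operands[0]
    return ("AND", *operands)
```

After this step, `("AND", "x=5", True, True)` reduces to `"x=5"`, so both forms would reach the estimator as the same expression.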
[jira] [Created] (HIVE-21872) Bucketed tables that load data from data/files/auto_sortmerge_join should be tagged as 'bucketing_version'='1'
Jesus Camacho Rodriguez created HIVE-21872: -- Summary: Bucketed tables that load data from data/files/auto_sortmerge_join should be tagged as 'bucketing_version'='1' Key: HIVE-21872 URL: https://issues.apache.org/jira/browse/HIVE-21872 Project: Hive Issue Type: Bug Components: Test Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez It is incorrect to use version 2, since the data files were created with the old hash function. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21871) Multi-statement transactions in direct SQL
Jesus Camacho Rodriguez created HIVE-21871: -- Summary: Multi-statement transactions in direct SQL Key: HIVE-21871 URL: https://issues.apache.org/jira/browse/HIVE-21871 Project: Hive Issue Type: Bug Components: Metastore, Standalone Metastore Reporter: Jesus Camacho Rodriguez To access metastore, we may bypass the JDO layer and query the metastore RDBMS directly (we refer to this as direct SQL path). There are some methods in Hive metastore that may issue multiple queries against RDBMS to build the return objects (e.g. {{get_partitions_by_names}}). Currently going through direct SQL may issue each query to the RDBMS in a different transaction (while afaik going through JDO will create a single transaction to retrieve and compose such objects). This may lead to failures while running some operations concurrently, e.g., in the example above, if a partition is being dropped and partitions are being retrieved using direct SQL path. A solution would be to execute all statements needed to retrieve the results for such a function within a single transaction when we use direct SQL path. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
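The solution proposed in HIVE-21871 amounts to wrapping all statements that build one result in a single transaction. The sketch below shows the pattern with Python's `sqlite3` module against a tiny in-memory database. The `parts` and `part_params` tables and both function names are simplified, hypothetical stand-ins, not the real metastore schema or API, and the isolation actually obtained depends on the RDBMS in use.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with one partition and one parameter."""
    # isolation_level=None: the module issues no implicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE parts (part_id INTEGER PRIMARY KEY, part_name TEXT)")
    conn.execute("CREATE TABLE part_params (part_id INTEGER, param_key TEXT, param_value TEXT)")
    conn.execute("INSERT INTO parts VALUES (1, 'ds=2019-01-01')")
    conn.execute("INSERT INTO part_params VALUES (1, 'numRows', '10')")
    return conn


def fetch_partitions(conn, names):
    """Build partition objects from several queries inside one transaction."""
    marks = ",".join("?" * len(names))
    conn.execute("BEGIN")
    try:
        parts = conn.execute(
            f"SELECT part_id, part_name FROM parts WHERE part_name IN ({marks})",
            list(names),
        ).fetchall()
        result = []
        for part_id, part_name in parts:
            # Second query per partition, still in the same transaction.
            params = conn.execute(
                "SELECT param_key, param_value FROM part_params WHERE part_id = ?",
                (part_id,),
            ).fetchall()
            result.append({"name": part_name, "params": dict(params)})
        conn.execute("COMMIT")
        return result
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Because the partition lookup and the per-partition parameter lookups share one transaction, a concurrent drop cannot land between them on a database that provides snapshot or stronger isolation.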
[jira] [Created] (HIVE-21867) Sort semijoin conditions to accelerate query processing
Jesus Camacho Rodriguez created HIVE-21867: -- Summary: Sort semijoin conditions to accelerate query processing Key: HIVE-21867 URL: https://issues.apache.org/jira/browse/HIVE-21867 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez We follow an approach similar to http://db.cs.berkeley.edu/jmh/miscpapers/sigmod93.pdf . To reorder predicates in AND conditions, we could rank each of the elements in the clauses in increasing order based on the following formula: {code} rank = (selectivity - 1) / cost per tuple {code} Similarly, for OR conditions: {code} rank = (-selectivity) / cost per tuple {code} Selectivity can be computed with FilterSelectivityEstimator. For cost per tuple, we will need to come up with some heuristic based on how expensive the evaluation of the functions contained in that predicate is. Custom UDFs could be annotated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21857) Sort conditions in a filter predicate to accelerate query processing
Jesus Camacho Rodriguez created HIVE-21857: -- Summary: Sort conditions in a filter predicate to accelerate query processing Key: HIVE-21857 URL: https://issues.apache.org/jira/browse/HIVE-21857 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez To reorder predicates in AND conditions, we could rank each of the elements in the clauses in increasing order based on the following formula: {code} rank = (selectivity - 1) / cost per tuple {code} Similarly, for OR conditions: {code} rank = (-selectivity) / cost per tuple {code} Selectivity can be computed with FilterSelectivityEstimator. For cost per tuple, we will need to come up with some heuristic based on how expensive the evaluation of the functions contained in that predicate is. Custom UDFs could be annotated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
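A small sketch of the rank-based ordering from the formulas in HIVE-21857 follows. The function names and the `(name, selectivity, cost_per_tuple)` tuple layout are made up for illustration, and the selectivity and cost values in the example are invented inputs, not output of FilterSelectivityEstimator.

```python
def rank_and(selectivity, cost_per_tuple):
    """Rank for operands of an AND: (selectivity - 1) / cost per tuple."""
    return (selectivity - 1.0) / cost_per_tuple


def rank_or(selectivity, cost_per_tuple):
    """Rank for operands of an OR: (-selectivity) / cost per tuple."""
    return -selectivity / cost_per_tuple


def order_predicates(preds, connective="AND"):
    """Sort (name, selectivity, cost_per_tuple) tuples by increasing rank."""
    rank = rank_and if connective == "AND" else rank_or
    return sorted(preds, key=lambda p: rank(p[1], p[2]))
```

For the operands `("a = 1", 0.1, 1.0)`, `("udf(b)", 0.1, 10.0)` and `("c > 0", 0.9, 1.0)`, the AND ordering puts the cheap, highly selective `a = 1` first and the expensive `udf(b)` last, while the OR ordering starts with `c > 0`, the cheap operand most likely to be true.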
[jira] [Created] (HIVE-21834) Avoid unnecessary calls to simplify filter conditions
Jesus Camacho Rodriguez created HIVE-21834: -- Summary: Avoid unnecessary calls to simplify filter conditions Key: HIVE-21834 URL: https://issues.apache.org/jira/browse/HIVE-21834 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Every time we create a filter, we try to simplify its condition. However, we already have a rule that simplifies the expressions, and it is within the same loop as most of the rules that end up creating new filters. Hence, it seems we should be able to remove some of the calls that simplify those conditions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21827) Multiple calls in Semantic
Jesus Camacho Rodriguez created HIVE-21827: -- Summary: Multiple calls in Semantic Key: HIVE-21827 URL: https://issues.apache.org/jira/browse/HIVE-21827 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21794) Add materialized view parameters to sqlStdAuthSafeVarNameRegexes
Jesus Camacho Rodriguez created HIVE-21794: -- Summary: Add materialized view parameters to sqlStdAuthSafeVarNameRegexes Key: HIVE-21794 URL: https://issues.apache.org/jira/browse/HIVE-21794 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-21794.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)