[jira] [Created] (HIVE-26314) Support alter function in Hive DDL
Wechar created HIVE-26314:

Summary: Support alter function in Hive DDL
Key: HIVE-26314
URL: https://issues.apache.org/jira/browse/HIVE-26314
Project: Hive
Issue Type: Task
Components: Hive
Affects Versions: 4.0.0-alpha-1
Reporter: Wechar
Assignee: Wechar
Fix For: 4.0.0-alpha-2

Hive SQL does not support {{*ALTER FUNCTION*}} yet. We can refer to the {{*CREATE [OR REPLACE] FUNCTION*}} syntax of [Spark|https://spark.apache.org/docs/3.1.2/sql-ref-syntax-ddl-create-function.html] to implement it.

--
This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26313) Aggregate all column statistics into a single field
Alessandro Solimando created HIVE-26313:

Summary: Aggregate all column statistics into a single field
Key: HIVE-26313
URL: https://issues.apache.org/jira/browse/HIVE-26313
Project: Hive
Issue Type: Improvement
Components: Standalone Metastore, Statistics
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando

At the moment, column statistics tables in the metastore schema look like this (it's similar for _PART_COL_STATS_):

{noformat}
CREATE TABLE "APP"."TAB_COL_STATS"(
    "CAT_NAME" VARCHAR(256) NOT NULL,
    "DB_NAME" VARCHAR(128) NOT NULL,
    "TABLE_NAME" VARCHAR(256) NOT NULL,
    "COLUMN_NAME" VARCHAR(767) NOT NULL,
    "COLUMN_TYPE" VARCHAR(128) NOT NULL,
    "LONG_LOW_VALUE" BIGINT,
    "LONG_HIGH_VALUE" BIGINT,
    "DOUBLE_LOW_VALUE" DOUBLE,
    "DOUBLE_HIGH_VALUE" DOUBLE,
    "BIG_DECIMAL_LOW_VALUE" VARCHAR(4000),
    "BIG_DECIMAL_HIGH_VALUE" VARCHAR(4000),
    "NUM_DISTINCTS" BIGINT,
    "NUM_NULLS" BIGINT NOT NULL,
    "AVG_COL_LEN" DOUBLE,
    "MAX_COL_LEN" BIGINT,
    "NUM_TRUES" BIGINT,
    "NUM_FALSES" BIGINT,
    "LAST_ANALYZED" BIGINT,
    "CS_ID" BIGINT NOT NULL,
    "TBL_ID" BIGINT NOT NULL,
    "BIT_VECTOR" BLOB,
    "ENGINE" VARCHAR(128) NOT NULL
);
{noformat}

The idea is to have a single blob named _STATISTICS_ to replace them, as follows:

{noformat}
CREATE TABLE "APP"."TAB_COL_STATS"(
    "CAT_NAME" VARCHAR(256) NOT NULL,
    "DB_NAME" VARCHAR(128) NOT NULL,
    "TABLE_NAME" VARCHAR(256) NOT NULL,
    "COLUMN_NAME" VARCHAR(767) NOT NULL,
    "COLUMN_TYPE" VARCHAR(128) NOT NULL,
    "STATISTICS" BLOB,
    "LAST_ANALYZED" BIGINT,
    "CS_ID" BIGINT NOT NULL,
    "TBL_ID" BIGINT NOT NULL,
    "ENGINE" VARCHAR(128) NOT NULL
);
{noformat}

The _STATISTICS_ column could hold a serialized JSON-encoded string, which will be consumed in a "schema-on-read" fashion.

At first, at least the removed column statistics will be encoded in the _STATISTICS_ column. Since each "consumer" reads only the portion of the schema it is interested in, multiple engines (see the _ENGINE_ column) can read and write statistics as they see fit.
Another advantage is that, if we plan to add more statistics in the future, we won't need to change the thrift interface for the metastore again.
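The "schema-on-read" idea above can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the JSON field names (`numNulls`, `numDistincts`, `engineXHistogramBuckets`), the `StatsBlobReader` class, and the regex-based lookup, which only handles flat objects with integer values. The issue does not specify a payload layout, and real code would use a proper JSON parser.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for a JSON payload stored in the STATISTICS column.
final class StatsBlobReader {
    // Hypothetical payload: two generic statistics plus one engine-specific field.
    static final String SAMPLE =
        "{\"numNulls\": 3, \"numDistincts\": 42, \"engineXHistogramBuckets\": 16}";

    private StatsBlobReader() {}

    /**
     * Returns the integer value of {@code field} in a flat JSON object,
     * or an empty OptionalLong when the field is absent. Fields the caller
     * does not ask for are simply ignored (schema-on-read).
     */
    static OptionalLong readLong(String json, String field) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*(-?\\d+)");
        Matcher m = p.matcher(json);
        return m.find()
            ? OptionalLong.of(Long.parseLong(m.group(1)))
            : OptionalLong.empty();
    }
}
```

A consumer that only knows about `numNulls` reads that one field and is unaffected by `engineXHistogramBuckets`; a field that was never written, such as `maxColLen`, comes back empty rather than failing.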
[jira] [Created] (HIVE-26312) Use default digest normalization strategy in CBO
Stamatis Zampetakis created HIVE-26312:

Summary: Use default digest normalization strategy in CBO
Key: HIVE-26312
URL: https://issues.apache.org/jira/browse/HIVE-26312
Project: Hive
Issue Type: Task
Components: CBO
Affects Versions: 4.0.0-alpha-1
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis

CALCITE-2450 introduced a way to improve planning time by normalizing some query expressions (RexNodes). The behavior can be enabled/disabled via the following system property:
calcite.enable.rexnode.digest.normalize

HIVE-23456 attempted to disable the normalization explicitly to avoid rendering the HiveFilterSortPredicates rule useless. However, the [way the normalization is disabled now|https://github.com/apache/hive/blob/f29cb2245c97102975ea0dd73783049eaa0947a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L549] depends on the order in which classes are loaded. If for some reason CalciteSystemProperty is loaded before the respective line in Hive.java is reached, setting the property has no effect.

After HIVE-26238 the behavior of the rule no longer depends on the value of the property, so nothing holds us back from enabling the normalization.

At the moment there is no strong reason to enable or disable the normalization explicitly, so it is better to rely on the default value provided by Calcite and avoid running with a different normalization strategy when the class loading order changes.
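The class-loading dependency described above can be reproduced in isolation. The following is a minimal Java sketch, not the Calcite or Hive code: the property name `demo.digest.normalize`, its default of `true`, and the classes `LoadOrderDemo`, `EarlyLoaded`, and `LateLoaded` are all hypothetical stand-ins. It assumes only that the flag is read once, in a static initializer, which is the pattern that makes a late `System.setProperty` call ineffective.

```java
// Demonstrates that a system property cached in a static initializer
// ignores any value set after the holder class has been initialized.
final class LoadOrderDemo {
    // Hypothetical property name, used only for this illustration.
    static final String KEY = "demo.digest.normalize";

    private LoadOrderDemo() {}

    // Reads the property; the default of "true" is an assumption of this sketch.
    static boolean read() {
        return Boolean.parseBoolean(System.getProperty(KEY, "true"));
    }

    // Each holder caches the flag once, when the class is first initialized.
    static final class EarlyLoaded {
        static final boolean ENABLED = read();
    }

    static final class LateLoaded {
        static final boolean ENABLED = read();
    }

    /** The holder is initialized first; the later setProperty call is ignored. */
    static boolean loadThenSet() {
        System.clearProperty(KEY);
        boolean first = EarlyLoaded.ENABLED; // triggers initialization, caches the default
        System.setProperty(KEY, "false");    // too late: the value is already cached
        return EarlyLoaded.ENABLED;
    }

    /** The property is set first, so the holder picks it up when initialized. */
    static boolean setThenLoad() {
        System.setProperty(KEY, "false");
        return LateLoaded.ENABLED;           // initialized here, sees "false"
    }
}
```

In this sketch `loadThenSet()` still reports the default even though the property was set to `false`, while `setThenLoad()` honors it. Relying on the library default, as the issue proposes, removes this ordering sensitivity.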
[jira] [Created] (HIVE-26311) Incorrect content of array when IN operator is in the filter
Gabor Kaszab created HIVE-26311:

Summary: Incorrect content of array when IN operator is in the filter
Key: HIVE-26311
URL: https://issues.apache.org/jira/browse/HIVE-26311
Project: Hive
Issue Type: Bug
Reporter: Gabor Kaszab

select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 2 = 1 and id = 5
{code:java}
+-----+---------------+---------------------------------------+
| id  | arr1          | arr2                                  |
+-----+---------------+---------------------------------------+
| 5   | [10,null,12]  | ["ten","eleven","twelve","thirteen"]  |
+-----+---------------+---------------------------------------+
{code}

select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 2 = 1 *and id in (select id from functional_parquet.alltypestiny)* and id = 5;
{code:java}
+-----+-------------+---------------------------------------+
| id  | arr1        | arr2                                  |
+-----+-------------+---------------------------------------+
| 5   | [10,10,12]  | ["ten","eleven","twelve","thirteen"]  |
+-----+-------------+---------------------------------------+
{code}

Note that the first (and correct) example returns 10, null and 12 as the items of the array, while the second query for some reason shows 10 instead of the null value. The only difference between the two examples is that the second has an extra filter (which in fact doesn't filter out anything, as functional_parquet.alltypestiny's ID contains numbers from zero to ten).
[jira] [Created] (HIVE-26310) Remove unused junit runners from test-utils module
Stamatis Zampetakis created HIVE-26310:

Summary: Remove unused junit runners from test-utils module
Key: HIVE-26310
URL: https://issues.apache.org/jira/browse/HIVE-26310
Project: Hive
Issue Type: Task
Components: Testing Infrastructure
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis

The two classes under https://github.com/apache/hive/tree/master/testutils/src/java/org/apache/hive/testutils/junit/runners namely:
* [ConcurrentTestRunner|https://github.com/apache/hive/blob/fe0f1a648b14cdf27edcf7a5d323cbd060104ebf/testutils/src/java/org/apache/hive/testutils/junit/runners/ConcurrentTestRunner.java]
* [ConcurrentScheduler|https://github.com/apache/hive/blob/fe0f1a648b14cdf27edcf7a5d323cbd060104ebf/testutils/src/java/org/apache/hive/testutils/junit/runners/model/ConcurrentScheduler.java]

were introduced a long time ago by HIVE-2935 to somewhat parallelize execution for {{TestBeeLineDriver}}. However, since HIVE-1 (resolved 6 years ago) they have not been used by anyone, and they are unlikely to be used again since much more modern alternatives exist.