[jira] [Created] (HIVE-26314) Support alter function in Hive DDL

2022-06-10 Thread Wechar (Jira)
Wechar created HIVE-26314:
-

 Summary: Support alter function in Hive DDL
 Key: HIVE-26314
 URL: https://issues.apache.org/jira/browse/HIVE-26314
 Project: Hive
  Issue Type: Task
  Components: Hive
Affects Versions: 4.0.0-alpha-1
Reporter: Wechar
Assignee: Wechar
 Fix For: 4.0.0-alpha-2


Hive SQL does not yet support {{*ALTER FUNCTION*}}. We can refer to the 
{{*CREATE [OR REPLACE] FUNCTION*}} syntax of 
[Spark|https://spark.apache.org/docs/3.1.2/sql-ref-syntax-ddl-create-function.html]
 to implement it.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HIVE-26313) Aggregate all column statistics into a single field

2022-06-10 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26313:
---

 Summary: Aggregate all column statistics into a single field
 Key: HIVE-26313
 URL: https://issues.apache.org/jira/browse/HIVE-26313
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore, Statistics
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


At the moment, column statistics tables in the metastore schema look like this 
(it's similar for _PART_COL_STATS_):

{noformat}
CREATE TABLE "APP"."TAB_COL_STATS"(
"CAT_NAME" VARCHAR(256) NOT NULL,
"DB_NAME" VARCHAR(128) NOT NULL,
"TABLE_NAME" VARCHAR(256) NOT NULL,
"COLUMN_NAME" VARCHAR(767) NOT NULL,
"COLUMN_TYPE" VARCHAR(128) NOT NULL,
"LONG_LOW_VALUE" BIGINT,
"LONG_HIGH_VALUE" BIGINT,
"DOUBLE_LOW_VALUE" DOUBLE,
"DOUBLE_HIGH_VALUE" DOUBLE,
"BIG_DECIMAL_LOW_VALUE" VARCHAR(4000),
"BIG_DECIMAL_HIGH_VALUE" VARCHAR(4000),
"NUM_DISTINCTS" BIGINT,
"NUM_NULLS" BIGINT NOT NULL,
"AVG_COL_LEN" DOUBLE,
"MAX_COL_LEN" BIGINT,
"NUM_TRUES" BIGINT,
"NUM_FALSES" BIGINT,
"LAST_ANALYZED" BIGINT,
"CS_ID" BIGINT NOT NULL,
"TBL_ID" BIGINT NOT NULL,
"BIT_VECTOR" BLOB,
"ENGINE" VARCHAR(128) NOT NULL
);
{noformat}

The idea is to have a single blob named _STATISTICS_ to replace them, as 
follows:

{noformat}
CREATE TABLE "APP"."TAB_COL_STATS"(
"CAT_NAME" VARCHAR(256) NOT NULL,
"DB_NAME" VARCHAR(128) NOT NULL,
"TABLE_NAME" VARCHAR(256) NOT NULL,
"COLUMN_NAME" VARCHAR(767) NOT NULL,
"COLUMN_TYPE" VARCHAR(128) NOT NULL,
"STATISTICS" BLOB,
"LAST_ANALYZED" BIGINT,
"CS_ID" BIGINT NOT NULL,
"TBL_ID" BIGINT NOT NULL,
"ENGINE" VARCHAR(128) NOT NULL
);
{noformat}

The _STATISTICS_ column could hold a serialized JSON-encoded string, 
which will be consumed in a "schema-on-read" fashion.

Initially, at least the statistics from the removed columns will be encoded in the 
_STATISTICS_ column, but since each "consumer" reads only the portion of the 
schema it is interested in, multiple engines (see the _ENGINE_ column) can 
read and write statistics as they see fit.

Another advantage is that, if we plan to add more statistics in the future, we 
won't need to change the thrift interface for the metastore again.
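
A minimal sketch of the "schema-on-read" idea, assuming a JSON payload whose field names simply mirror the removed columns in lower case. The field names, the helper functions and the extra histogram field are hypothetical illustrations, not part of any existing metastore API:

```python
import json


def encode_statistics(stats):
    """Serialize a statistics dict into the bytes stored in the STATISTICS blob."""
    return json.dumps(stats, sort_keys=True).encode("utf-8")


def read_fields(blob, wanted):
    """Decode the blob and return only the fields this consumer cares about.

    Fields that are absent come back as None instead of raising, so a
    consumer keeps working when other engines write a different subset.
    """
    decoded = json.loads(blob.decode("utf-8"))
    return {name: decoded.get(name) for name in wanted}


# Hypothetical payload: the first four keys mirror removed columns, the last
# one stands for a statistic added later by some engine.
blob = encode_statistics({
    "long_low_value": 1,
    "long_high_value": 100,
    "num_distincts": 42,
    "num_nulls": 0,
    "engine_specific_histogram": [10, 20, 12],
})

# A consumer interested only in the value range ignores everything else.
range_stats = read_fields(blob, ["long_low_value", "long_high_value"])

# A consumer asking for a statistic nobody wrote gets None back.
missing = read_fields(blob, ["avg_col_len"])
```

In this sketch, adding a new statistic only means writing one more key into the payload; readers that do not ask for it are unaffected.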





[jira] [Created] (HIVE-26312) Use default digest normalization strategy in CBO

2022-06-10 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-26312:
--

 Summary: Use default digest normalization strategy in CBO
 Key: HIVE-26312
 URL: https://issues.apache.org/jira/browse/HIVE-26312
 Project: Hive
  Issue Type: Task
  Components: CBO
Affects Versions: 4.0.0-alpha-1
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


CALCITE-2450 introduced a way to improve planning time by normalizing some 
query expressions (RexNodes). The behavior can be enabled/disabled via the 
following system property: calcite.enable.rexnode.digest.normalize

There was an attempt to disable the normalization explicitly in HIVE-23456 to 
avoid rendering the HiveFilterSortPredicates rule useless. However, the [way the 
normalization is disabled 
now|https://github.com/apache/hive/blob/f29cb2245c97102975ea0dd73783049eaa0947a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L549]
 depends on the way classes are loaded. If for some reason 
CalciteSystemProperty is loaded before the respective line in Hive.java is 
reached, setting the property will not have any effect.

After HIVE-26238 the behavior of the rule no longer depends on the value of the 
property, so there is nothing holding us back from enabling the normalization.

At the moment there is no strong reason to enable or disable the normalization 
explicitly, so it is better to rely on the default value provided by Calcite to 
avoid running with a different normalization strategy when the class loading 
order changes.
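
The ordering problem described above can be illustrated with a language-neutral analogy. The sketch below is Python, not the actual Calcite or Hive code: the class {{CachedProperty}} and the setting name {{demo.digest.normalize}} are hypothetical, and an environment variable stands in for the Java system property. It only shows why a value that is read once and then cached ignores any later attempt to change it:

```python
import os

PROP = "demo.digest.normalize"  # hypothetical setting name, for illustration only


class CachedProperty:
    """Reads a setting once, on first access, and caches it afterwards.

    This mimics a constant that is initialized when its class is loaded.
    """

    def __init__(self, key, default):
        self.key = key
        self.default = default
        self._value = None
        self._loaded = False

    def value(self):
        if not self._loaded:
            self._value = os.environ.get(self.key, self.default)
            self._loaded = True
        return self._value


def run(touch_before_set):
    """Return the value seen after trying to disable the setting."""
    os.environ.pop(PROP, None)           # start from the default
    prop = CachedProperty(PROP, "true")
    if touch_before_set:
        prop.value()                     # something reads the setting early
    os.environ[PROP] = "false"           # the later attempt to disable it
    return prop.value()


late_read = run(touch_before_set=False)   # "false": the override is seen
early_read = run(touch_before_set=True)   # "true": the override is ignored
```

The two calls differ only in whether the setting was read before the override, which is the same dependence on load order that the issue wants to avoid by relying on the default.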





[jira] [Created] (HIVE-26311) Incorrect content of array when IN operator is in the filter

2022-06-10 Thread Gabor Kaszab (Jira)
Gabor Kaszab created HIVE-26311:
---

 Summary: Incorrect content of array when IN operator is in the filter
 Key: HIVE-26311
 URL: https://issues.apache.org/jira/browse/HIVE-26311
 Project: Hive
  Issue Type: Bug
Reporter: Gabor Kaszab


select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 2 
= 1 and id = 5
{code:java}
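+-----+---------------+---------------------------------------+
| id  |     arr1      |                 arr2                  |
+-----+---------------+---------------------------------------+
| 5   | [10,null,12]  | ["ten","eleven","twelve","thirteen"]  |
+-----+---------------+---------------------------------------+{code}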
select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 2 
= 1 *and id in (select id from functional_parquet.alltypestiny)* and id = 5;
{code:java}
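+-----+-------------+---------------------------------------+
| id  |    arr1     |                 arr2                  |
+-----+-------------+---------------------------------------+
| 5   | [10,10,12]  | ["ten","eleven","twelve","thirteen"]  |
+-----+-------------+---------------------------------------+ {code}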
Note that the first (and correct) example returns 10, null and 12 as the items of 
the array, while the second query for some reason shows 10 instead of the null 
value. The only difference between the two examples is that in the second I added 
an extra filter (which in fact doesn't filter out anything, as 
functional_parquet.alltypestiny's ID contains numbers from zero to ten).

 





[jira] [Created] (HIVE-26310) Remove unused junit runners from test-utils module

2022-06-10 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-26310:
--

 Summary: Remove unused junit runners from test-utils module
 Key: HIVE-26310
 URL: https://issues.apache.org/jira/browse/HIVE-26310
 Project: Hive
  Issue Type: Task
  Components: Testing Infrastructure
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


The two classes under 
https://github.com/apache/hive/tree/master/testutils/src/java/org/apache/hive/testutils/junit/runners
 namely:
* [ConcurrentTestRunner|https://github.com/apache/hive/blob/fe0f1a648b14cdf27edcf7a5d323cbd060104ebf/testutils/src/java/org/apache/hive/testutils/junit/runners/ConcurrentTestRunner.java]
* [ConcurrentScheduler|https://github.com/apache/hive/blob/fe0f1a648b14cdf27edcf7a5d323cbd060104ebf/testutils/src/java/org/apache/hive/testutils/junit/runners/model/ConcurrentScheduler.java]

were introduced a long time ago by HIVE-2935 to somewhat parallelize 
execution for {{TestBeeLineDriver}}.

However, since HIVE-1 (resolved 6 years ago) they are not used by anyone 
and are unlikely to be used again in the future, since there are much more 
modern alternatives.


