[jira] [Updated] (HIVE-27847) Prevent query Failures on Numeric <-> Timestamp

2024-06-04 Thread Shohei Okumiya (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shohei Okumiya updated HIVE-27847:
--
Status: Patch Available  (was: In Progress)

>  Prevent query Failures on Numeric <-> Timestamp
> 
>
> Key: HIVE-27847
> URL: https://issues.apache.org/jira/browse/HIVE-27847
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2, 4.0.0-alpha-1
> Environment: master
> 4.0.0-alpha-1
>Reporter: Basapuram Kumar
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Attachments: HIVE-27847.patch
>
>
> On the master and 4.0.0-alpha-1 branches, a numeric-to-timestamp conversion 
> fails with the error 
> "{color:#de350b}org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited 
> (hive.strict.timestamp.conversion){color}".
>  
> *Repro steps.*
>  # Sample data
> {noformat}
> $ hdfs dfs -cat /tmp/tc/t.csv
> 1653209895687,2022-05-22T15:58:15.931+07:00
> 1653209938316,2022-05-22T15:58:58.490+07:00
> 1653209962021,2022-05-22T15:59:22.191+07:00
> 1653210021993,2022-05-22T16:00:22.174+07:00
> 1653209890524,2022-05-22T15:58:10.724+07:00
> 1653210095382,2022-05-22T16:01:35.775+07:00
> 1653210044308,2022-05-22T16:00:44.683+07:00
> 1653210098546,2022-05-22T16:01:38.886+07:00
> 1653210012220,2022-05-22T16:00:12.394+07:00
> 165321376,2022-05-22T16:00:00.622+07:00{noformat}
>  # Create an external table over the above data
> {noformat}
> create external table   test_ts_conv(begin string, ts string) row format 
> delimited fields terminated by ',' stored as TEXTFILE LOCATION '/tmp/tc/';
> desc   test_ts_conv;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | begin     | string     |          |
> | ts        | string     |          |
> +-----------+------------+----------+{noformat}
>  #  Create table with CTAS
> {noformat}
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set hive.strict.timestamp.conversion;
> +------------------------------------------+
> |                    set                   |
> +------------------------------------------+
> | hive.strict.timestamp.conversion=true    |
> +------------------------------------------+
> set to false
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set hive.strict.timestamp.conversion=false;
> +------------------------------------------+
> |                    set                   |
> +------------------------------------------+
> | hive.strict.timestamp.conversion=false   |
> +------------------------------------------+
> #Query:
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> 
> CREATE TABLE t_date 
> AS 
> select
>   CAST( CAST( `begin` AS BIGINT) / 1000  AS TIMESTAMP ) `begin`, 
>   CAST( 
> DATE_FORMAT(CAST(regexp_replace(`ts`,'(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})\\+(\\d{2}):(\\d{2})','$1-$2-$3
>  $4:$5:$6.$7') AS TIMESTAMP ),'MMdd') as BIGINT ) `par_key`
> FROM    test_ts_conv;{noformat}
> Error:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited (hive.strict.timestamp.conversion)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFTimestamp.initialize(GenericUDFTimestamp.java:91)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149)
>     at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:184)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
>     at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:74)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:508)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:314)
>     ... 17 more {code}
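
A minimal sketch of the failing cast and of a string-based rewrite that avoids it (illustrative only, reusing a literal from the sample data above; this is not the change proposed in the attached patch):

{code:sql}
-- Fails when hive.strict.timestamp.conversion=true (the default):
SELECT CAST(CAST('1653209895687' AS BIGINT) / 1000 AS TIMESTAMP);

-- Possible rewrite: FROM_UNIXTIME returns a string, and string -> timestamp
-- casts are not affected by the strict numeric check (second precision only).
SELECT CAST(FROM_UNIXTIME(CAST(CAST('1653209895687' AS BIGINT) / 1000 AS BIGINT)) AS TIMESTAMP);
{code}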



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27847) Prevent query Failures on Numeric <-> Timestamp

2024-06-04 Thread Shohei Okumiya (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852234#comment-17852234
 ] 

Shohei Okumiya commented on HIVE-27847:
---

I took it over based on the discussion in #4851, and this is the new PR.

https://github.com/apache/hive/pull/5278

>  Prevent query Failures on Numeric <-> Timestamp
> 
>
> Key: HIVE-27847
> URL: https://issues.apache.org/jira/browse/HIVE-27847
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
> Environment: master
> 4.0.0-alpha-1
>Reporter: Basapuram Kumar
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Attachments: HIVE-27847.patch
>
>
> On the master and 4.0.0-alpha-1 branches, a numeric-to-timestamp conversion 
> fails with the error 
> "{color:#de350b}org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited 
> (hive.strict.timestamp.conversion){color}".
>  
> *Repro steps.*
>  # Sample data
> {noformat}
> $ hdfs dfs -cat /tmp/tc/t.csv
> 1653209895687,2022-05-22T15:58:15.931+07:00
> 1653209938316,2022-05-22T15:58:58.490+07:00
> 1653209962021,2022-05-22T15:59:22.191+07:00
> 1653210021993,2022-05-22T16:00:22.174+07:00
> 1653209890524,2022-05-22T15:58:10.724+07:00
> 1653210095382,2022-05-22T16:01:35.775+07:00
> 1653210044308,2022-05-22T16:00:44.683+07:00
> 1653210098546,2022-05-22T16:01:38.886+07:00
> 1653210012220,2022-05-22T16:00:12.394+07:00
> 165321376,2022-05-22T16:00:00.622+07:00{noformat}
>  # Create an external table over the above data
> {noformat}
> create external table   test_ts_conv(begin string, ts string) row format 
> delimited fields terminated by ',' stored as TEXTFILE LOCATION '/tmp/tc/';
> desc   test_ts_conv;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | begin     | string     |          |
> | ts        | string     |          |
> +-----------+------------+----------+{noformat}
>  #  Create table with CTAS
> {noformat}
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set hive.strict.timestamp.conversion;
> +------------------------------------------+
> |                    set                   |
> +------------------------------------------+
> | hive.strict.timestamp.conversion=true    |
> +------------------------------------------+
> set to false
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set hive.strict.timestamp.conversion=false;
> +------------------------------------------+
> |                    set                   |
> +------------------------------------------+
> | hive.strict.timestamp.conversion=false   |
> +------------------------------------------+
> #Query:
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> 
> CREATE TABLE t_date 
> AS 
> select
>   CAST( CAST( `begin` AS BIGINT) / 1000  AS TIMESTAMP ) `begin`, 
>   CAST( 
> DATE_FORMAT(CAST(regexp_replace(`ts`,'(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})\\+(\\d{2}):(\\d{2})','$1-$2-$3
>  $4:$5:$6.$7') AS TIMESTAMP ),'MMdd') as BIGINT ) `par_key`
> FROM    test_ts_conv;{noformat}
> Error:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited (hive.strict.timestamp.conversion)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFTimestamp.initialize(GenericUDFTimestamp.java:91)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149)
>     at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:184)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
>     at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:74)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:508)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:314)
>     ... 17 more {code}



--
This message was sent by Atlassian Jira

[jira] [Assigned] (HIVE-27847) Prevent query Failures on Numeric <-> Timestamp

2024-06-04 Thread Shohei Okumiya (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shohei Okumiya reassigned HIVE-27847:
-

Assignee: Shohei Okumiya  (was: Basapuram Kumar)

>  Prevent query Failures on Numeric <-> Timestamp
> 
>
> Key: HIVE-27847
> URL: https://issues.apache.org/jira/browse/HIVE-27847
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
> Environment: master
> 4.0.0-alpha-1
>Reporter: Basapuram Kumar
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Attachments: HIVE-27847.patch
>
>
> On the master and 4.0.0-alpha-1 branches, a numeric-to-timestamp conversion 
> fails with the error 
> "{color:#de350b}org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited 
> (hive.strict.timestamp.conversion){color}".
>  
> *Repro steps.*
>  # Sample data
> {noformat}
> $ hdfs dfs -cat /tmp/tc/t.csv
> 1653209895687,2022-05-22T15:58:15.931+07:00
> 1653209938316,2022-05-22T15:58:58.490+07:00
> 1653209962021,2022-05-22T15:59:22.191+07:00
> 1653210021993,2022-05-22T16:00:22.174+07:00
> 1653209890524,2022-05-22T15:58:10.724+07:00
> 1653210095382,2022-05-22T16:01:35.775+07:00
> 1653210044308,2022-05-22T16:00:44.683+07:00
> 1653210098546,2022-05-22T16:01:38.886+07:00
> 1653210012220,2022-05-22T16:00:12.394+07:00
> 165321376,2022-05-22T16:00:00.622+07:00{noformat}
>  # Create an external table over the above data
> {noformat}
> create external table   test_ts_conv(begin string, ts string) row format 
> delimited fields terminated by ',' stored as TEXTFILE LOCATION '/tmp/tc/';
> desc   test_ts_conv;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | begin     | string     |          |
> | ts        | string     |          |
> +-----------+------------+----------+{noformat}
>  #  Create table with CTAS
> {noformat}
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set hive.strict.timestamp.conversion;
> +------------------------------------------+
> |                    set                   |
> +------------------------------------------+
> | hive.strict.timestamp.conversion=true    |
> +------------------------------------------+
> set to false
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set hive.strict.timestamp.conversion=false;
> +------------------------------------------+
> |                    set                   |
> +------------------------------------------+
> | hive.strict.timestamp.conversion=false   |
> +------------------------------------------+
> #Query:
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> 
> CREATE TABLE t_date 
> AS 
> select
>   CAST( CAST( `begin` AS BIGINT) / 1000  AS TIMESTAMP ) `begin`, 
>   CAST( 
> DATE_FORMAT(CAST(regexp_replace(`ts`,'(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})\\+(\\d{2}):(\\d{2})','$1-$2-$3
>  $4:$5:$6.$7') AS TIMESTAMP ),'MMdd') as BIGINT ) `par_key`
> FROM    test_ts_conv;{noformat}
> Error:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited (hive.strict.timestamp.conversion)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFTimestamp.initialize(GenericUDFTimestamp.java:91)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149)
>     at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:184)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
>     at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:74)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:508)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:314)
>     ... 17 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28294) drop database cascade operation can skip client side filtering while fetching tables in db

2024-06-04 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala updated HIVE-28294:
-
Description: 
The drop database cascade operation fetches all tables in the DB and, while 
doing so, performs client-side filtering on them. We can skip the client-side 
filtering because we authorize the tables in the DB for the drop operation anyway.

Also, the functions in the database need to be included in authorization before 
dropping the database.

  was:Drop database cascade operation fetches all tables in the DB, while doing 
so we perform client-side filtering on the tables. We can avoid client-side 
filtering as we anyway authorize on the tables in the DB for the drop operation.


> drop database cascade operation can skip client side filtering while fetching 
> tables in db
> --
>
> Key: HIVE-28294
> URL: https://issues.apache.org/jira/browse/HIVE-28294
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>
> The drop database cascade operation fetches all tables in the DB and, while 
> doing so, performs client-side filtering on them. We can skip the client-side 
> filtering because we authorize the tables in the DB for the drop operation 
> anyway.
> Also, the functions in the database need to be included in authorization 
> before dropping the database.
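
For reference, a minimal sketch of the operation being optimized (hypothetical database, table, and function names):

{code:sql}
-- DROP DATABASE ... CASCADE has to enumerate every table (and, per this
-- ticket, every function) in the database and authorize them before the drop.
CREATE DATABASE demo_db;
CREATE TABLE demo_db.t1 (id INT);
CREATE FUNCTION demo_db.my_upper AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper';
DROP DATABASE demo_db CASCADE;
{code}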



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28301) Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3

2024-06-04 Thread Sercan Tekin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sercan Tekin updated HIVE-28301:

Description: 
*Create a table and insert data into it:*
{code:java}
CREATE TABLE tbl_1 (col_1 STRING);
INSERT INTO tbl_1 VALUES ('G'),('G'),('not G'),('G'),('G'),('G');
{code}

*Submit the below query:*
{code:java}
SELECT DISTINCT (
CASE
WHEN col_1 = "G" THEN "Value_1"
WHEN substr(LPAD(col_1,3,"0") ,1,1) = "G" THEN "Value_2"
ELSE "Value_3"
END) AS G
FROM tbl_1;
{code}

*Actual result:*
{code:java}
3alue_1
Value_1
Value_3
nValue_
{code}

*Expected result (This is what Hive-2.3 returns):*
{code:java}
Value_1
Value_3
{code}

*Workaround:*
Either disabling vectorization;
{code:java}
SET hive.vectorized.execution.enabled=false;
{code}

Or reverting https://issues.apache.org/jira/browse/HIVE-16731.
CC: [~teddy.choi] and [~mmccline] as HIVE-16731 was reported and fixed by you 
guys.

  was:
*Create a table and insert data into it:*
{code:java}
CREATE TABLE tbl_1 (col_1 STRING);
INSERT INTO tbl_1 VALUES ('G'),('G'),('not G'),('G'),('G'),('G');
{code}

*Submit the below query:*
{code:java}
SELECT DISTINCT (
CASE
WHEN col_1 = "G" THEN "Value_1"
WHEN substr(LPAD(col_1,3,"0") ,1,1) = "G" THEN "Value_2"
ELSE "Value_3"
END) AS G
FROM tbl_1;
{code}

*Actual result:*
{code:java}
3alue_1
Value_1
Value_3
nValue_
{code}

*Expected result (This is what Hive-2.3 returns):*
{code:java}
Value_1
Value_3
{code}

*Workaround:*
Either disabling vectorization;
{code:java}
SET hive.vectorized.execution.enabled=false;
{code}

Or reverting https://issues.apache.org/jira/browse/HIVE-16731.


> Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3
> ---
>
> Key: HIVE-28301
> URL: https://issues.apache.org/jira/browse/HIVE-28301
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.1.3
>Reporter: Sercan Tekin
>Priority: Critical
>
> *Create a table and insert data into it:*
> {code:java}
> CREATE TABLE tbl_1 (col_1 STRING);
> INSERT INTO tbl_1 VALUES ('G'),('G'),('not G'),('G'),('G'),('G');
> {code}
> *Submit the below query:*
> {code:java}
> SELECT DISTINCT (
> CASE
> WHEN col_1 = "G" THEN "Value_1"
> WHEN substr(LPAD(col_1,3,"0") ,1,1) = "G" THEN "Value_2"
> ELSE "Value_3"
> END) AS G
> FROM tbl_1;
> {code}
> *Actual result:*
> {code:java}
> 3alue_1
> Value_1
> Value_3
> nValue_
> {code}
> *Expected result (This is what Hive-2.3 returns):*
> {code:java}
> Value_1
> Value_3
> {code}
> *Workaround:*
> Either disabling vectorization;
> {code:java}
> SET hive.vectorized.execution.enabled=false;
> {code}
> Or reverting https://issues.apache.org/jira/browse/HIVE-16731.
> CC: [~teddy.choi] and [~mmccline] as HIVE-16731 was reported and fixed by you 
> guys.
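
As a quick sanity check, the session-level workaround can be verified before re-running the query. A sketch, assuming EXPLAIN VECTORIZATION is available (Hive 2.3+); exact output varies by version:

{code:sql}
-- Apply the workaround from the description, then confirm the plan
-- is no longer vectorized.
SET hive.vectorized.execution.enabled=false;
EXPLAIN VECTORIZATION ONLY SUMMARY
SELECT DISTINCT (
CASE
WHEN col_1 = "G" THEN "Value_1"
WHEN substr(LPAD(col_1,3,"0") ,1,1) = "G" THEN "Value_2"
ELSE "Value_3"
END) AS G
FROM tbl_1;
{code}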



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28301) Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3

2024-06-04 Thread Sercan Tekin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sercan Tekin updated HIVE-28301:

Description: 
*Create a table and insert data into it:*
{code:java}
CREATE TABLE tbl_1 (col_1 STRING);
INSERT INTO tbl_1 VALUES ('G'),('G'),('not G'),('G'),('G'),('G');
{code}

*Submit the below query:*
{code:java}
SELECT DISTINCT (
CASE
WHEN col_1 = "G" THEN "Value_1"
WHEN substr(LPAD(col_1,3,"0") ,1,1) = "G" THEN "Value_2"
ELSE "Value_3"
END) AS G
FROM tbl_1;
{code}

*Actual result:*
{code:java}
3alue_1
Value_1
Value_3
nValue_
{code}

*Expected result (This is what Hive-2.3 returns):*
{code:java}
Value_1
Value_3
{code}

*Workaround:*
Either disabling vectorization;
{code:java}
SET hive.vectorized.execution.enabled=false;
{code}

Or reverting https://issues.apache.org/jira/browse/HIVE-16731.

  was:
*STEPS TO REPRODUCE:*

Create a table and insert data into it:
{code:java}
CREATE TABLE tbl_1 (col_1 STRING);
INSERT INTO tbl_1 VALUES ('G'),('G'),('not G'),('G'),('G'),('G');
{code}

Submit the below query:
{code:java}
SELECT DISTINCT (
CASE
WHEN col_1 = "G" THEN "Value_1"
WHEN substr(LPAD(col_1,3,"0") ,1,1) = "G" THEN "Value_2"
ELSE "Value_3"
END) AS G
FROM tbl_1;
{code}

Actual result:
{code:java}
3alue_1
Value_1
Value_3
nValue_
{code}

Expected result (This is what Hive-2.3 returns):
{code:java}
Value_1
Value_3
{code}

Workaround:
Either disabling 
{code:java}
SET hive.vectorized.execution.enabled=false;
{code}

Or reverting https://issues.apache.org/jira/browse/HIVE-16731.


> Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3
> ---
>
> Key: HIVE-28301
> URL: https://issues.apache.org/jira/browse/HIVE-28301
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.1.3
>Reporter: Sercan Tekin
>Priority: Critical
>
> *Create a table and insert data into it:*
> {code:java}
> CREATE TABLE tbl_1 (col_1 STRING);
> INSERT INTO tbl_1 VALUES ('G'),('G'),('not G'),('G'),('G'),('G');
> {code}
> *Submit the below query:*
> {code:java}
> SELECT DISTINCT (
> CASE
> WHEN col_1 = "G" THEN "Value_1"
> WHEN substr(LPAD(col_1,3,"0") ,1,1) = "G" THEN "Value_2"
> ELSE "Value_3"
> END) AS G
> FROM tbl_1;
> {code}
> *Actual result:*
> {code:java}
> 3alue_1
> Value_1
> Value_3
> nValue_
> {code}
> *Expected result (This is what Hive-2.3 returns):*
> {code:java}
> Value_1
> Value_3
> {code}
> *Workaround:*
> Either disabling vectorization;
> {code:java}
> SET hive.vectorized.execution.enabled=false;
> {code}
> Or reverting https://issues.apache.org/jira/browse/HIVE-16731.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28301) Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3

2024-06-04 Thread Sercan Tekin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sercan Tekin updated HIVE-28301:

Description: 
*STEPS TO REPRODUCE:*

Create a table and insert data into it:
{code:java}
CREATE TABLE tbl_1 (col_1 STRING);
INSERT INTO tbl_1 VALUES ('G'),('G'),('not G'),('G'),('G'),('G');
{code}

Submit the below query:
{code:java}
SELECT DISTINCT (
CASE
WHEN col_1 = "G" THEN "Value_1"
WHEN substr(LPAD(col_1,3,"0") ,1,1) = "G" THEN "Value_2"
ELSE "Value_3"
END) AS G
FROM tbl_1;
{code}

Actual result:
{code:java}
3alue_1
Value_1
Value_3
nValue_
{code}

Expected result (This is what Hive-2.3 returns):
{code:java}
Value_1
Value_3
{code}

Workaround:
Either disabling 
{code:java}
SET hive.vectorized.execution.enabled=false;
{code}

Or reverting https://issues.apache.org/jira/browse/HIVE-16731.

  was:
*STEPS TO REPRODUCE:*

Create a table and insert data into it:
{code:java}
CREATE TABLE tbl_1 (col_1 STRING);
INSERT INTO tbl_1 VALUES ('G'),('G'),('not G'),('G'),('G'),('G');
{code}

Submit the below query:
{code:java}
SELECT DISTINCT (
CASE
WHEN col_1 = "G" THEN "Value_1"
WHEN substr(LPAD(col_1,3,"0") ,1,1) = "G" THEN "Value_2"
ELSE "Value_3"
END) AS G
FROM tbl_1;
{code}

Actual result:
{code:java}
3alue_1
Value_1
Value_3
nValue_
{code}

Expected result (This is what Hive-2.3 returns):
{code:java}
Value_1
Value_3
{code}

Workaround:
Either:
{code:java}
SET hive.vectorized.execution.enabled=false;
{code}

Or reverting https://issues.apache.org/jira/browse/HIVE-16731.


> Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3
> ---
>
> Key: HIVE-28301
> URL: https://issues.apache.org/jira/browse/HIVE-28301
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.1.3
>Reporter: Sercan Tekin
>Priority: Critical
>
> *STEPS TO REPRODUCE:*
> Create a table and insert data into it:
> {code:java}
> CREATE TABLE tbl_1 (col_1 STRING);
> INSERT INTO tbl_1 VALUES ('G'),('G'),('not G'),('G'),('G'),('G');
> {code}
> Submit the below query:
> {code:java}
> SELECT DISTINCT (
> CASE
> WHEN col_1 = "G" THEN "Value_1"
> WHEN substr(LPAD(col_1,3,"0") ,1,1) = "G" THEN "Value_2"
> ELSE "Value_3"
> END) AS G
> FROM tbl_1;
> {code}
> Actual result:
> {code:java}
> 3alue_1
> Value_1
> Value_3
> nValue_
> {code}
> Expected result (This is what Hive-2.3 returns):
> {code:java}
> Value_1
> Value_3
> {code}
> Workaround:
> Either disabling 
> {code:java}
> SET hive.vectorized.execution.enabled=false;
> {code}
> Or reverting https://issues.apache.org/jira/browse/HIVE-16731.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28301) Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3

2024-06-04 Thread Sercan Tekin (Jira)
Sercan Tekin created HIVE-28301:
---

 Summary: Vectorization: CASE WHEN Returns Wrong Result in 
Hive-3.1.3
 Key: HIVE-28301
 URL: https://issues.apache.org/jira/browse/HIVE-28301
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 3.1.3
Reporter: Sercan Tekin


*STEPS TO REPRODUCE:*

Create a table and insert data into it:
{code:java}
CREATE TABLE tbl_1 (col_1 STRING);
INSERT INTO tbl_1 VALUES ('G'),('G'),('not G'),('G'),('G'),('G');
{code}

Submit the below query:
{code:java}
SELECT DISTINCT (
CASE
WHEN col_1 = "G" THEN "Value_1"
WHEN substr(LPAD(col_1,3,"0") ,1,1) = "G" THEN "Value_2"
ELSE "Value_3"
END) AS G
FROM tbl_1;
{code}

Actual result:
{code:java}
3alue_1
Value_1
Value_3
nValue_
{code}

Expected result (This is what Hive-2.3 returns):
{code:java}
Value_1
Value_3
{code}

Workaround:
Either:
{code:java}
SET hive.vectorized.execution.enabled=false;
{code}

Or reverting https://issues.apache.org/jira/browse/HIVE-16731.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26893) Extend batch partition APIs to ignore partition schemas

2024-06-04 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-26893.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

[~hemanth619] Closing this jira as the fix has been merged

> Extend batch partition APIs to ignore partition schemas
> ---
>
> Key: HIVE-26893
> URL: https://issues.apache.org/jira/browse/HIVE-26893
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Quanlong Huang
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> There are several HMS APIs that return a list of partitions, e.g. 
> get_partitions_ps(), get_partitions_by_names(), add_partitions_req() with 
> needResult=true, etc. Each partition instance will have a unique list of 
> FieldSchemas as the partition schema:
> {code:java}
> org.apache.hadoop.hive.metastore.api.Partition
> -> org.apache.hadoop.hive.metastore.api.StorageDescriptor
>     -> cols: list<FieldSchema> {code}
> This could occupy a large memory footprint for wide tables (e.g. with 2k 
> cols). See the heap histogram in IMPALA-11812 as an example.
> Some engines, like Impala, don't actually use or respect the partition-level 
> schema, so it is a waste of network/serde resources to transmit it. It would 
> be nice if these APIs provided an optional boolean flag for ignoring partition 
> schemas, so that HMS clients (e.g. Impala) don't need to clear them later (to 
> save memory).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28244) Add SBOM for storage-api and standalone-metastore modules

2024-06-04 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852076#comment-17852076
 ] 

Denys Kuzmenko commented on HIVE-28244:
---

Merged to master
[~Aggarwal_Raghav], thanks for the patch!

> Add SBOM for storage-api and standalone-metastore modules
> -
>
> Key: HIVE-28244
> URL: https://issues.apache.org/jira/browse/HIVE-28244
> Project: Hive
>  Issue Type: Improvement
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> The -Pdist profile doesn't work for creating an SBOM for storage-api/pom.xml 
> and standalone-metastore/pom.xml.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28244) Add SBOM for storage-api and standalone-metastore modules

2024-06-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28244.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Add SBOM for storage-api and standalone-metastore modules
> -
>
> Key: HIVE-28244
> URL: https://issues.apache.org/jira/browse/HIVE-28244
> Project: Hive
>  Issue Type: Improvement
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> The -Pdist profile doesn't work for creating an SBOM for storage-api/pom.xml 
> and standalone-metastore/pom.xml.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28238) Open Hive transaction only for ACID resources

2024-06-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28238.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Open Hive transaction only for ACID resources
> -
>
> Key: HIVE-28238
> URL: https://issues.apache.org/jira/browse/HIVE-28238
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28238) Open Hive transaction only for ACID resources

2024-06-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-28238:
-

Assignee: Denys Kuzmenko

> Open Hive transaction only for ACID resources
> -
>
> Key: HIVE-28238
> URL: https://issues.apache.org/jira/browse/HIVE-28238
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28238) Open Hive transaction only for ACID resources

2024-06-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28238:
--
Summary: Open Hive transaction only for ACID resources  (was: Open Hive 
ACID txn only for transactional resources)

> Open Hive transaction only for ACID resources
> -
>
> Key: HIVE-28238
> URL: https://issues.apache.org/jira/browse/HIVE-28238
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28238) Open Hive transaction only for ACID resources

2024-06-04 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852033#comment-17852033
 ] 

Denys Kuzmenko commented on HIVE-28238:
---

Merged to master
Thanks [~kkasa] for the review!

> Open Hive transaction only for ACID resources
> -
>
> Key: HIVE-28238
> URL: https://issues.apache.org/jira/browse/HIVE-28238
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28300) ALTER TABLE CONCATENATE on a List Bucketing Table fails when using Tez.

2024-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28300:
--
Labels: pull-request-available  (was: )

> ALTER TABLE CONCATENATE on a List Bucketing Table fails when using Tez.
> ---
>
> Key: HIVE-28300
> URL: https://issues.apache.org/jira/browse/HIVE-28300
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
>
> Running list_bucket_dml_8.q using TestMiniLlapLocalCliDriver fails with the 
> following error message:
> {code:java}
> org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException: Vertex failed, 
> vertexName=File Merge, vertexId=vertex_1717492217780_0001_4_00, 
> diagnostics=[Task failed, taskId=task_1717492217780_0001_4_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Node: ### : Error while 
> running task ( failure ) : 
> attempt_1717492217780_0001_4_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
>  NOT EQUAL TO 
> file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/key=484/value=val_484
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> Multiple partitions for one merge mapper: 
> file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
>  NOT EQUAL TO 
> file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/key=484/value=val_484
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:220)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:153)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 16 more
> {code}
> This is a Hive-on-Tez problem that happens when Hive handles the ALTER TABLE 
> CONCATENATE command on a list-bucketing table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28300) ALTER TABLE CONCATENATE on a List Bucketing Table fails when using Tez.

2024-06-04 Thread Seonggon Namgung (Jira)
Seonggon Namgung created HIVE-28300:
---

 Summary: ALTER TABLE CONCATENATE on a List Bucketing Table fails 
when using Tez.
 Key: HIVE-28300
 URL: https://issues.apache.org/jira/browse/HIVE-28300
 Project: Hive
  Issue Type: Bug
Reporter: Seonggon Namgung
Assignee: Seonggon Namgung


Running list_bucket_dml_8.q using TestMiniLlapLocalCliDriver fails with the 
following error message:
{code:java}
org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException: Vertex failed, 
vertexName=File Merge, vertexId=vertex_1717492217780_0001_4_00, 
diagnostics=[Task failed, taskId=task_1717492217780_0001_4_00_00, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Node: ### : Error while running 
task ( failure ) : 
attempt_1717492217780_0001_4_00_00_0:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: Multiple partitions for one merge mapper: 
file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
 NOT EQUAL TO 
file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/key=484/value=val_484
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
at 
org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple 
partitions for one merge mapper: 
file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
 NOT EQUAL TO 
file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/key=484/value=val_484
at 
org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:220)
at 
org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:153)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
... 16 more

{code}
This is a Hive-on-Tez problem that happens when Hive handles the ALTER TABLE 
CONCATENATE command on a list-bucketing table.
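
A minimal sketch of the failing step (table and partition values taken from the error message above; the full setup is in list_bucket_dml_8.q):

{code:sql}
-- Concatenating a list-bucketing (skewed, STORED AS DIRECTORIES) partition;
-- on Tez this hits the "Multiple partitions for one merge mapper" failure.
ALTER TABLE list_bucketing_dynamic_part_n2
  PARTITION (ds='2008-04-08', hr='b1')
  CONCATENATE;
{code}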



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly

2024-06-04 Thread Sungwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851965#comment-17851965
 ] 

Sungwoo Park commented on HIVE-24207:
-

Seonggon created HIVE-28281 to report the problem in case 1.

For case 2, it's hard to reproduce the problem, but the bug seems obvious 
because two speculative task attempts are not supposed to update a common 
counter for the same LimitOperator.

> LimitOperator can leverage ObjectCache to bail out quickly
> --
>
> Key: HIVE-24207
> URL: https://issues.apache.org/jira/browse/HIVE-24207
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {noformat}
> select  ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in 
> (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk 
> limit 100;
>  select distinct ss_sold_date_sk from store_sales, date_dim where 
> date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = 
> date_dim.d_date_sk limit 100;
>  {noformat}
> Queries like the above generate a large number of map tasks. Currently they 
> don't bail out after generating enough data.
> It would be good to make use of the ObjectCache and retain the number of 
> records generated, so that LimitOperator/VectorLimitOperator can bail out in 
> later tasks during the operator's init phase itself.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28091) Remove invalid long datatype in ColumnStatsUpdateTask

2024-06-04 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang resolved HIVE-28091.

Fix Version/s: 4.1.0
   Resolution: Fixed

Merged into master branch.

Thanks [~dkuzmenko] for the review!!!

> Remove invalid long datatype in ColumnStatsUpdateTask
> -
>
> Key: HIVE-28091
> URL: https://issues.apache.org/jira/browse/HIVE-28091
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java#L104]
> {code:java}
>     if (columnType.equalsIgnoreCase("long") || 
> columnType.equalsIgnoreCase("tinyint")
>         || columnType.equalsIgnoreCase("smallint") || 
> columnType.equalsIgnoreCase("int")
>         || columnType.equalsIgnoreCase("bigint")) {
>       LongColumnStatsDataInspector longStats = new 
> LongColumnStatsDataInspector(); {code}
> IMO, Hive columns do not support a long data type. We should remove the 
> incorrect data type from ColumnStatsUpdateTask.
>  
> In addition, the column-stats-related code blocks should be consistent with 
> the code in StatObjectConverter.java, which also does not handle a long type.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java#L378]
> {code:java}
>     } else if (colType.equals("bigint") || colType.equals("int") ||
>         colType.equals("smallint") || colType.equals("tinyint")) { {code}
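
For context, a small illustration (hypothetical table name): Hive DDL's integral column types are tinyint/smallint/int/bigint, and "long" is not among them.

{code:sql}
-- BIGINT is Hive's 64-bit integral column type; "LONG" is not a valid column type.
CREATE TABLE stats_demo (id BIGINT);
-- Column statistics for the BIGINT column go through the integral-stats path.
ANALYZE TABLE stats_demo COMPUTE STATISTICS FOR COLUMNS;
{code}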



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28254) CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results

2024-06-04 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851927#comment-17851927
 ] 

Krisztian Kasa commented on HIVE-28254:
---

Merged to master. Thanks [~okumin] for the fix.

> CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results
> ---
>
> Key: HIVE-28254
> URL: https://issues.apache.org/jira/browse/HIVE-28254
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
>
> CBO return path can build incorrect GroupByOperator when multiple 
> aggregations with DISTINCT are involved.
> This is an example.
> {code:java}
> CREATE TABLE test (col1 INT, col2 INT);
> INSERT INTO test VALUES (1, 100), (2, 200), (2, 200), (3, 300);
> set hive.cbo.returnpath.hiveop=true;
> set hive.map.aggr=false;
> SELECT
>   SUM(DISTINCT col1),
>   COUNT(DISTINCT col1),
>   SUM(DISTINCT col2),
>   SUM(col2)
> FROM test;{code}
> The last column should be 800, but the SUM refers to col1, so the actual 
> result is 8.
> {code:java}
> +------+------+------+------+
> | _c0  | _c1  | _c2  | _c3  |
> +------+------+------+------+
> | 6    | 3    | 600  | 8    |
> +------+------+------+------+ {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28254) CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results

2024-06-04 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-28254:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results
> ---
>
> Key: HIVE-28254
> URL: https://issues.apache.org/jira/browse/HIVE-28254
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> CBO return path can build incorrect GroupByOperator when multiple 
> aggregations with DISTINCT are involved.
> This is an example.
> {code:java}
> CREATE TABLE test (col1 INT, col2 INT);
> INSERT INTO test VALUES (1, 100), (2, 200), (2, 200), (3, 300);
> set hive.cbo.returnpath.hiveop=true;
> set hive.map.aggr=false;
> SELECT
>   SUM(DISTINCT col1),
>   COUNT(DISTINCT col1),
>   SUM(DISTINCT col2),
>   SUM(col2)
> FROM test;{code}
> The last column should be 800, but the SUM refers to col1, so the actual 
> result is 8.
> {code:java}
> +------+------+------+------+
> | _c0  | _c1  | _c2  | _c3  |
> +------+------+------+------+
> | 6    | 3    | 600  | 8    |
> +------+------+------+------+ {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28299) Iceberg: Optimize show partitions through column projection

2024-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28299:
--
Labels: pull-request-available  (was: )

> Iceberg: Optimize show partitions through column projection
> ---
>
> Key: HIVE-28299
> URL: https://issues.apache.org/jira/browse/HIVE-28299
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Minor
>  Labels: pull-request-available
>
> In the current *show partitions* implementation, we fetch the data of all 
> columns, but in fact we only need two columns, *partition* & {*}spec_id{*}.
> We can fetch just those two columns through column projection, which can 
> improve performance on large partitioned Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28299) Iceberg: Optimize show partitions through column projection

2024-06-04 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang reassigned HIVE-28299:
--

Assignee: Butao Zhang

> Iceberg: Optimize show partitions through column projection
> ---
>
> Key: HIVE-28299
> URL: https://issues.apache.org/jira/browse/HIVE-28299
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Minor
>
> In the current *show partitions* implementation, we fetch the data of all 
> columns, but in fact we only need two columns, *partition* & {*}spec_id{*}.
> We can fetch just those two columns through column projection, which can 
> improve performance on large partitioned Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28299) Iceberg: Optimize show partitions through column projection

2024-06-04 Thread Butao Zhang (Jira)
Butao Zhang created HIVE-28299:
--

 Summary: Iceberg: Optimize show partitions through column 
projection
 Key: HIVE-28299
 URL: https://issues.apache.org/jira/browse/HIVE-28299
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: Butao Zhang


In the current *show partitions* implementation, we fetch the data of all 
columns, but in fact we only need two columns, *partition* & {*}spec_id{*}.

We can fetch just those two columns through column projection, which can 
improve performance on large partitioned Iceberg tables.
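
A rough sketch of the idea in query form, assuming Hive exposes Iceberg's partitions metadata table as db.tbl.partitions (hypothetical table name):

{code:sql}
-- SHOW PARTITIONS only needs these two columns from the Iceberg partitions
-- metadata table, so projecting just them avoids fetching the full rows.
SELECT `partition`, spec_id
FROM db.ice_tbl.partitions;
{code}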



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28298) TestQueryShutdownHooks to run on Tez

2024-06-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-28298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-28298:

Description: 
This test got stuck while running in the scope of HIVE-27972; this needs to be checked.

jstack showed stacks like:
{code}
"HiveServer2-Background-Pool: Thread-155" #155 prio=5 os_prio=31 
tid=0x7fa59f5e4800 nid=0x8027 waiting for monitor entry [0x700010ace000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hive.ql.exec.tez.TezTask$SyncDagClient.tryKillDAG(TezTask.java:870)
- waiting to lock <0x0007be5c98f8> (a 
org.apache.tez.dag.api.client.DAGClientImplLocal)
at 
org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor.monitorExecution(TezJobMonitor.java:278)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:271)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354)
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327)
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244)
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)




"f7b8e319-d882-4734-b036-1bb2da4b2783 main" #1 prio=5 os_prio=31 
tid=0x7fa60e80b800 nid=0x1b03 waiting on condition [0x7e32a000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007be3d0578> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at 
org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:982)
at 
org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
at org.apache.tez.client.LocalClient$2.apply(LocalClient.java:441)
at org.apache.tez.client.LocalClient$2.apply(LocalClient.java:437)
at 
org.apache.tez.dag.api.client.DAGClientImplLocal.getDAGStatusInternal(DAGClientImplLocal.java:53)
at 
org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatus(DAGClientImpl.java:232)
at 
org.apache.tez.dag.api.client.DAGClientImpl._waitForCompletionWithStatusUpdates(DAGClientImpl.java:583)
at 
org.apache.tez.dag.api.client.DAGClientImpl.waitForCompletion(DAGClientImpl.java:375)
at 
org.apache.hadoop.hive.ql.exec.tez.TezTask$SyncDagClient.waitForCompletion(TezTask.java:877)
- locked <0x0007be5c98f8> (a 
org.apache.tez.dag.api.client.DAGClientImplLocal)
at 
org.apache.hadoop.hive.ql.exec.tez.TezTask.closeDagClientOnCancellation(TezTask.java:432)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.shutdown(TezTask.java:805)
at org.apache.hadoop.hive.ql.TaskQueue.shutdown(TaskQueue.java:138)
- locked <0x0007be2a20b8> (a org.apache.hadoop.hive.ql.TaskQueue)
at org.apache.hadoop.hive.ql.Driver.releaseTaskQueue(Driver.java:802)
at org.apache.hadoop.hive.ql.Driver.close(Driver.java:779)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.close(ReExecDriver.java:268)
at 
org.apache.hive.service.cli.operation.SQLOperation.cleanup(SQLOperation.java:409)
- locked <0x0007be5994c0> (a 

[jira] [Created] (HIVE-28298) TestQueryShutdownHooks to run on Tez

2024-06-04 Thread Jira
László Bodor created HIVE-28298:
---

 Summary: TestQueryShutdownHooks to run on Tez
 Key: HIVE-28298
 URL: https://issues.apache.org/jira/browse/HIVE-28298
 Project: Hive
  Issue Type: Sub-task
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.20.10#820010)