[jira] [Updated] (HIVE-27847) Prevent query Failures on Numeric <-> Timestamp
[ https://issues.apache.org/jira/browse/HIVE-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shohei Okumiya updated HIVE-27847:
----------------------------------
    Status: Patch Available  (was: In Progress)

> Prevent query Failures on Numeric <-> Timestamp
> -----------------------------------------------
>
>                 Key: HIVE-27847
>                 URL: https://issues.apache.org/jira/browse/HIVE-27847
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 4.0.0-alpha-2, 4.0.0-alpha-1
>         Environment: master
>                      4.0.0-alpha-1
>            Reporter: Basapuram Kumar
>            Assignee: Shohei Okumiya
>            Priority: Major
>              Labels: hive-4.0.1-must, pull-request-available
>         Attachments: HIVE-27847.patch
>
> On the master and 4.0.0-alpha-1 branches, a numeric-to-timestamp conversion fails with the error
> "{color:#de350b}org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting NUMERIC types to TIMESTAMP is prohibited (hive.strict.timestamp.conversion){color}".
>
> *Repro steps:*
> # Sample data
> {noformat}
> $ hdfs dfs -cat /tmp/tc/t.csv
> 1653209895687,2022-05-22T15:58:15.931+07:00
> 1653209938316,2022-05-22T15:58:58.490+07:00
> 1653209962021,2022-05-22T15:59:22.191+07:00
> 1653210021993,2022-05-22T16:00:22.174+07:00
> 1653209890524,2022-05-22T15:58:10.724+07:00
> 1653210095382,2022-05-22T16:01:35.775+07:00
> 1653210044308,2022-05-22T16:00:44.683+07:00
> 1653210098546,2022-05-22T16:01:38.886+07:00
> 1653210012220,2022-05-22T16:00:12.394+07:00
> 165321376,2022-05-22T16:00:00.622+07:00
> {noformat}
> # Table over the above data [1]
> {noformat}
> create external table test_ts_conv(begin string, ts string)
>   row format delimited fields terminated by ','
>   stored as TEXTFILE LOCATION '/tmp/tc/';
> desc test_ts_conv;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | begin     | string     |          |
> | ts        | string     |          |
> +-----------+------------+----------+
> {noformat}
> # Create a table with CTAS
> {noformat}
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set hive.strict.timestamp.conversion;
> +----------------------------------------+
> |                  set                   |
> +----------------------------------------+
> | hive.strict.timestamp.conversion=true  |
> +----------------------------------------+
> -- set to false
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set hive.strict.timestamp.conversion=false;
> +-----------------------------------------+
> |                   set                   |
> +-----------------------------------------+
> | hive.strict.timestamp.conversion=false  |
> +-----------------------------------------+
> -- Query:
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de>
> CREATE TABLE t_date
> AS
> select
>   CAST( CAST( `begin` AS BIGINT) / 1000 AS TIMESTAMP ) `begin`,
>   CAST( DATE_FORMAT(CAST(regexp_replace(`ts`,'(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})\\+(\\d{2}):(\\d{2})','$1-$2-$3 $4:$5:$6.$7') AS TIMESTAMP ),'MMdd') as BIGINT ) `par_key`
> FROM test_ts_conv;
> {noformat}
> Error:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting NUMERIC types to TIMESTAMP is prohibited (hive.strict.timestamp.conversion)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDFTimestamp.initialize(GenericUDFTimestamp.java:91)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:184)
>     at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
>     at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:74)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:508)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:314)
>     ... 17 more
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
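For context, the intent of the failing expression, CAST(CAST(`begin` AS BIGINT) / 1000 AS TIMESTAMP), can be sketched outside Hive. The Python snippet below is an illustration only: Hive's CAST interprets the numeric value in the session time zone, while UTC is assumed here, and integer divmod is used to avoid float rounding of the milliseconds.

```python
from datetime import datetime, timezone

def millis_to_timestamp(millis: int) -> datetime:
    # Mirrors CAST(CAST(`begin` AS BIGINT) / 1000 AS TIMESTAMP),
    # assuming a UTC session time zone. divmod keeps the millisecond
    # part exact instead of dividing by 1000.0.
    sec, ms = divmod(millis, 1000)
    return datetime.fromtimestamp(sec, tz=timezone.utc).replace(microsecond=ms * 1000)

# First row of /tmp/tc/t.csv
print(millis_to_timestamp(1653209895687).isoformat())
# 2022-05-22T08:58:15.687000+00:00 (i.e. 15:58:15 at +07:00)
```

This only models the conversion the query asks for; with hive.strict.timestamp.conversion=true, Hive rejects the cast at plan time as shown in the stack trace above.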
[jira] [Commented] (HIVE-27847) Prevent query Failures on Numeric <-> Timestamp
[ https://issues.apache.org/jira/browse/HIVE-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852234#comment-17852234 ]

Shohei Okumiya commented on HIVE-27847:
---------------------------------------
I took it over based on the discussion in #4851, and this is the new PR: https://github.com/apache/hive/pull/5278

> Prevent query Failures on Numeric <-> Timestamp
> -----------------------------------------------
>
>                 Key: HIVE-27847
>                 URL: https://issues.apache.org/jira/browse/HIVE-27847
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>         Environment: master
>                      4.0.0-alpha-1
>            Reporter: Basapuram Kumar
>            Assignee: Shohei Okumiya
>            Priority: Major
>              Labels: hive-4.0.1-must, pull-request-available
>         Attachments: HIVE-27847.patch

--
This message was sent by Atlassian Jira
[jira] [Assigned] (HIVE-27847) Prevent query Failures on Numeric <-> Timestamp
[ https://issues.apache.org/jira/browse/HIVE-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shohei Okumiya reassigned HIVE-27847:
-------------------------------------
    Assignee: Shohei Okumiya  (was: Basapuram Kumar)

> Prevent query Failures on Numeric <-> Timestamp
> -----------------------------------------------
>
>                 Key: HIVE-27847
>                 URL: https://issues.apache.org/jira/browse/HIVE-27847
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>         Environment: master
>                      4.0.0-alpha-1
>            Reporter: Basapuram Kumar
>            Assignee: Shohei Okumiya
>            Priority: Major
>              Labels: hive-4.0.1-must, pull-request-available
>         Attachments: HIVE-27847.patch

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HIVE-28294) drop database cascade operation can skip client side filtering while fetching tables in db
[ https://issues.apache.org/jira/browse/HIVE-28294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sai Hemanth Gantasala updated HIVE-28294:
-----------------------------------------
    Description: 
Drop database cascade fetches all tables in the DB and performs client-side filtering on them. That filtering can be skipped, because the drop operation already authorizes every table in the DB. Also, the functions in the database need to be included in authorization before the database is dropped.

  was: Drop database cascade operation fetches all tables in the DB, while doing so we perform client-side filtering on the tables. We can avoid client-side filtering as we anyway authorize on the tables in the DB for the drop operation.

> drop database cascade operation can skip client side filtering while fetching tables in db
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-28294
>                 URL: https://issues.apache.org/jira/browse/HIVE-28294
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Sai Hemanth Gantasala
>            Assignee: Sai Hemanth Gantasala
>            Priority: Major
>              Labels: pull-request-available

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HIVE-28301) Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3
[ https://issues.apache.org/jira/browse/HIVE-28301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sercan Tekin updated HIVE-28301:
--------------------------------
    Description: 
*Create a table and insert data into it:*
{code:java}
CREATE TABLE tbl_1 (col_1 STRING);
INSERT INTO tbl_1 VALUES ('G'),('G'),('not G'),('G'),('G'),('G');
{code}
*Submit the below query:*
{code:java}
SELECT DISTINCT (
  CASE
    WHEN col_1 = "G" THEN "Value_1"
    WHEN substr(LPAD(col_1,3,"0"),1,1) = "G" THEN "Value_2"
    ELSE "Value_3"
  END) AS G
FROM tbl_1;
{code}
*Actual result:*
{code:java}
3alue_1
Value_1
Value_3
nValue_
{code}
*Expected result (this is what Hive-2.3 returns):*
{code:java}
Value_1
Value_3
{code}
*Workaround:*
Either disable vectorization:
{code:java}
SET hive.vectorized.execution.enabled=false;
{code}
or revert https://issues.apache.org/jira/browse/HIVE-16731.

CC: [~teddy.choi] and [~mmccline], as HIVE-16731 was reported and fixed by you.

> Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3
> -----------------------------------------------------------
>
>                 Key: HIVE-28301
>                 URL: https://issues.apache.org/jira/browse/HIVE-28301
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 3.1.3
>            Reporter: Sercan Tekin
>            Priority: Critical

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HIVE-28301) Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3
[ https://issues.apache.org/jira/browse/HIVE-28301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sercan Tekin updated HIVE-28301:
--------------------------------
    Description: revised the wording of the repro write-up; the steps, actual/expected results, and workaround are unchanged from the current text above.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HIVE-28301) Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3
[ https://issues.apache.org/jira/browse/HIVE-28301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sercan Tekin updated HIVE-28301:
--------------------------------
    Description: minor wording fix in the workaround section; the repro steps and results are unchanged from the current text above.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (HIVE-28301) Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3
Sercan Tekin created HIVE-28301:
-----------------------------------
             Summary: Vectorization: CASE WHEN Returns Wrong Result in Hive-3.1.3
                 Key: HIVE-28301
                 URL: https://issues.apache.org/jira/browse/HIVE-28301
             Project: Hive
          Issue Type: Bug
          Components: Vectorization
    Affects Versions: 3.1.3
            Reporter: Sercan Tekin

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Resolved] (HIVE-26893) Extend batch partition APIs to ignore partition schemas
[ https://issues.apache.org/jira/browse/HIVE-26893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naveen Gangam resolved HIVE-26893.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

[~hemanth619] Closing this jira as the fix has been merged.

> Extend batch partition APIs to ignore partition schemas
> --------------------------------------------------------
>
>                 Key: HIVE-26893
>                 URL: https://issues.apache.org/jira/browse/HIVE-26893
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>            Reporter: Quanlong Huang
>            Assignee: Sai Hemanth Gantasala
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> Several HMS APIs return a list of partitions, e.g. get_partitions_ps(), get_partitions_by_names(), add_partitions_req() with needResult=true, etc. Each partition instance carries its own list of FieldSchemas as the partition schema:
> {code:java}
> org.apache.hadoop.hive.metastore.api.Partition
>   -> org.apache.hadoop.hive.metastore.api.StorageDescriptor
>     -> cols: list {code}
> This can occupy a large memory footprint for wide tables (e.g. with 2k cols). See the heap histogram in IMPALA-11812 as an example. Some engines, such as Impala, don't actually use or respect the partition-level schema, so transmitting it wastes network and serde resources. It would be nice if these APIs provided an optional boolean flag for ignoring partition schemas, so HMS clients (e.g. Impala) don't need to clear them later to save memory.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
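To make the footprint concrete: the duplication described above scales as partitions × columns, since every Partition's StorageDescriptor repeats the same column list. The sketch below uses hypothetical partition counts (only the 2k-column figure comes from the issue) and counts objects, not bytes, since per-FieldSchema overhead varies.

```python
def fieldschema_count(num_partitions: int, num_cols: int,
                      ignore_schema: bool = False) -> int:
    # Each Partition.sd.cols carries its own FieldSchema list unless the
    # proposed boolean flag tells HMS to omit partition-level schemas.
    return 0 if ignore_schema else num_partitions * num_cols

# A wide table (2k cols, as in the description) with a hypothetical
# 10k partitions fetched in one batch call:
print(fieldschema_count(10_000, 2_000))        # 20000000 duplicated FieldSchemas
print(fieldschema_count(10_000, 2_000, True))  # 0
```

Even at a few hundred bytes per FieldSchema, tens of millions of duplicated objects per response explain the heap histograms referenced in IMPALA-11812.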
[jira] [Commented] (HIVE-28244) Add SBOM for storage-api and standalone-metastore modules
[ https://issues.apache.org/jira/browse/HIVE-28244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852076#comment-17852076 ] Denys Kuzmenko commented on HIVE-28244: --- Merged to master [~Aggarwal_Raghav], thanks for the patch! > Add SBOM for storage-api and standalone-metastore modules > - > > Key: HIVE-28244 > URL: https://issues.apache.org/jira/browse/HIVE-28244 > Project: Hive > Issue Type: Improvement >Reporter: Raghav Aggarwal >Assignee: Raghav Aggarwal >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > -Pdist profile doesn't work for storage-api/pom.xml and > standalone-metastore/pom.xml for creating SBOM. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28244) Add SBOM for storage-api and standalone-metastore modules
[ https://issues.apache.org/jira/browse/HIVE-28244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko resolved HIVE-28244. --- Fix Version/s: 4.1.0 Resolution: Fixed > Add SBOM for storage-api and standalone-metastore modules > - > > Key: HIVE-28244 > URL: https://issues.apache.org/jira/browse/HIVE-28244 > Project: Hive > Issue Type: Improvement >Reporter: Raghav Aggarwal >Assignee: Raghav Aggarwal >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > -Pdist profile doesn't work for storage-api/pom.xml and > standalone-metastore/pom.xml for creating SBOM. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28238) Open Hive transaction only for ACID resources
[ https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko resolved HIVE-28238. --- Fix Version/s: 4.1.0 Resolution: Fixed > Open Hive transaction only for ACID resources > - > > Key: HIVE-28238 > URL: https://issues.apache.org/jira/browse/HIVE-28238 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-28238) Open Hive transaction only for ACID resources
[ https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko reassigned HIVE-28238: - Assignee: Denys Kuzmenko > Open Hive transaction only for ACID resources > - > > Key: HIVE-28238 > URL: https://issues.apache.org/jira/browse/HIVE-28238 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28238) Open Hive transaction only for ACID resources
[ https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-28238: -- Summary: Open Hive transaction only for ACID resources (was: Open Hive ACID txn only for transactional resources) > Open Hive transaction only for ACID resources > - > > Key: HIVE-28238 > URL: https://issues.apache.org/jira/browse/HIVE-28238 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28238) Open Hive transaction only for ACID resources
[ https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852033#comment-17852033 ] Denys Kuzmenko commented on HIVE-28238: --- Merged to master Thanks [~kkasa] for the review! > Open Hive transaction only for ACID resources > - > > Key: HIVE-28238 > URL: https://issues.apache.org/jira/browse/HIVE-28238 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28300) ALTER TABLE CONCATENATE on a List Bucketing Table fails when using Tez.
[ https://issues.apache.org/jira/browse/HIVE-28300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28300: -- Labels: pull-request-available (was: ) > ALTER TABLE CONCATENATE on a List Bucketing Table fails when using Tez. > --- > > Key: HIVE-28300 > URL: https://issues.apache.org/jira/browse/HIVE-28300 > Project: Hive > Issue Type: Bug >Reporter: Seonggon Namgung >Assignee: Seonggon Namgung >Priority: Major > Labels: pull-request-available > > Running list_bucket_dml_8.q using TestMiniLlapLocalCliDriver fails with the > following error message: > {code:java} > org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException: Vertex failed, > vertexName=File Merge, vertexId=vertex_1717492217780_0001_4_00, > diagnostics=[Task failed, taskId=task_1717492217780_0001_4_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Node: ### : Error while > running task ( failure ) : > attempt_1717492217780_0001_4_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: Multiple partitions for one merge mapper: > file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME > NOT EQUAL TO > file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/key=484/value=val_484 > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > Multiple partitions for one merge mapper: > file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME > NOT EQUAL TO > file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/key=484/value=val_484 > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:220) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:153) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) > ... 
16 more > {code} > This is a Hive-Tez problem that occurs when Hive handles the ALTER TABLE > CONCATENATE command on a List Bucketing table. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28300) ALTER TABLE CONCATENATE on a List Bucketing Table fails when using Tez.
Seonggon Namgung created HIVE-28300: --- Summary: ALTER TABLE CONCATENATE on a List Bucketing Table fails when using Tez. Key: HIVE-28300 URL: https://issues.apache.org/jira/browse/HIVE-28300 Project: Hive Issue Type: Bug Reporter: Seonggon Namgung Assignee: Seonggon Namgung Running list_bucket_dml_8.q using TestMiniLlapLocalCliDriver fails with the following error message: {code:java} org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException: Vertex failed, vertexName=File Merge, vertexId=vertex_1717492217780_0001_4_00, diagnostics=[Task failed, taskId=task_1717492217780_0001_4_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Node: ### : Error while running task ( failure ) : attempt_1717492217780_0001_4_00_00_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple partitions for one merge mapper: file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME NOT EQUAL TO file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/key=484/value=val_484 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple partitions for one merge mapper: file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME NOT EQUAL TO file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/key=484/value=val_484 at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:220) at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:153) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) ... 16 more {code} This is a Hive-Tez problem that occurs when Hive handles the ALTER TABLE CONCATENATE command on a List Bucketing table. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly
[ https://issues.apache.org/jira/browse/HIVE-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851965#comment-17851965 ] Sungwoo Park commented on HIVE-24207: - Seonggon created HIVE-28281 to report the problem in case 1. For case 2, it's hard to reproduce the problem, but the bug seems obvious because two speculative task attempts are not supposed to update a common counter for the same LimitOperator. > LimitOperator can leverage ObjectCache to bail out quickly > -- > > Key: HIVE-24207 > URL: https://issues.apache.org/jira/browse/HIVE-24207 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-1 > > Time Spent: 1.5h > Remaining Estimate: 0h > > {noformat} > select ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in > (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk > limit 100; > select distinct ss_sold_date_sk from store_sales, date_dim where > date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = > date_dim.d_date_sk limit 100; > {noformat} > Queries like the above generate a large number of map tasks. Currently they > don't bail out after generating enough data. > It would be good to make use of ObjectCache & retain the number of records > generated. LimitOperator/VectorLimitOperator can then bail out for the later > tasks in the operator's init phase itself. > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58 -- This message was sent by Atlassian Jira (v8.20.10#820010)
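The bail-out idea above can be illustrated with a minimal, self-contained sketch (hypothetical class names; the real implementation uses Hive's per-query ObjectCache rather than a static map): tasks for the same LIMIT operator share one record counter, and each later task consults it during initialization so it can finish immediately once earlier tasks have produced enough rows.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for Hive's ObjectCache: a counter shared by all
// task attempts of the same query/operator.
class SharedLimitCache {
    private static final ConcurrentHashMap<String, AtomicLong> COUNTERS = new ConcurrentHashMap<>();

    static AtomicLong counterFor(String queryAndOperatorId) {
        return COUNTERS.computeIfAbsent(queryAndOperatorId, k -> new AtomicLong());
    }
}

class LimitTask {
    private final AtomicLong produced;
    private final long limit;
    private final boolean bailedOut;

    LimitTask(String opId, long limit) {
        this.produced = SharedLimitCache.counterFor(opId);
        this.limit = limit;
        // Init-phase check: if earlier tasks already produced enough rows,
        // this task can finish immediately instead of scanning its split.
        this.bailedOut = produced.get() >= limit;
    }

    boolean bailedOut() { return bailedOut; }

    // Returns true while the emitted row is still within the limit.
    boolean processRow() {
        return produced.incrementAndGet() <= limit;
    }
}
```

Note the caveat from the comment above: with speculative execution, two attempts of the same task would both bump the shared counter, so a real implementation must not let duplicate attempts double-count rows.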
[jira] [Resolved] (HIVE-28091) Remove invalid long datatype in ColumnStatsUpdateTask
[ https://issues.apache.org/jira/browse/HIVE-28091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Butao Zhang resolved HIVE-28091. Fix Version/s: 4.1.0 Resolution: Fixed Merged into master branch. Thanks [~dkuzmenko] for the review!!! > Remove invalid long datatype in ColumnStatsUpdateTask > - > > Key: HIVE-28091 > URL: https://issues.apache.org/jira/browse/HIVE-28091 > Project: Hive > Issue Type: Improvement >Reporter: Butao Zhang >Assignee: Butao Zhang >Priority: Trivial > Labels: pull-request-available > Fix For: 4.1.0 > > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java#L104] > {code:java} > if (columnType.equalsIgnoreCase("long") || > columnType.equalsIgnoreCase("tinyint") > || columnType.equalsIgnoreCase("smallint") || > columnType.equalsIgnoreCase("int") > || columnType.equalsIgnoreCase("bigint")) { > LongColumnStatsDataInspector longStats = new > LongColumnStatsDataInspector(); {code} > IMO, Hive columns do not support a long data type. We should remove the > incorrect data type from ColumnStatsUpdateTask. > > In addition, the column-stats-related code should be consistent with the > code in StatObjectConverter.java, which also does not have a long type. > [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java#L378] > {code:java} > } else if (colType.equals("bigint") || colType.equals("int") || > colType.equals("smallint") || colType.equals("tinyint")) { {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
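The proposed cleanup can be sketched as follows (a hypothetical helper, not the actual patch): the type check mirrors the one in StatObjectConverter and accepts only the integer-family types Hive actually exposes, with no "long" branch.

```java
import java.util.Arrays;
import java.util.List;

class ColumnTypeCheck {
    // Integer-family types Hive supports; "long" is intentionally absent,
    // matching the check in StatObjectConverter.
    private static final List<String> INTEGRAL_TYPES =
        Arrays.asList("tinyint", "smallint", "int", "bigint");

    // True if the column type should use long-column statistics
    // (LongColumnStatsDataInspector in ColumnStatsUpdateTask).
    static boolean usesLongColumnStats(String columnType) {
        return INTEGRAL_TYPES.contains(columnType.toLowerCase());
    }
}
```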
[jira] [Commented] (HIVE-28254) CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results
[ https://issues.apache.org/jira/browse/HIVE-28254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851927#comment-17851927 ] Krisztian Kasa commented on HIVE-28254: --- Merged to master. Thanks [~okumin] for the fix. > CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results > --- > > Key: HIVE-28254 > URL: https://issues.apache.org/jira/browse/HIVE-28254 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 4.0.0 >Reporter: Shohei Okumiya >Assignee: Shohei Okumiya >Priority: Major > Labels: hive-4.0.1-must, pull-request-available > > CBO return path can build incorrect GroupByOperator when multiple > aggregations with DISTINCT are involved. > This is an example. > {code:java} > CREATE TABLE test (col1 INT, col2 INT); > INSERT INTO test VALUES (1, 100), (2, 200), (2, 200), (3, 300); > set hive.cbo.returnpath.hiveop=true; > set hive.map.aggr=false; > SELECT > SUM(DISTINCT col1), > COUNT(DISTINCT col1), > SUM(DISTINCT col2), > SUM(col2) > FROM test;{code} > The last column should be 800. But the SUM refers to col1 and the actual > result is 8. > {code:java} > +--+--+--+--+ > | _c0 | _c1 | _c2 | _c3 | > +--+--+--+--+ > | 6 | 3 | 600 | 8 | > +--+--+--+--+ {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28254) CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results
[ https://issues.apache.org/jira/browse/HIVE-28254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-28254: -- Fix Version/s: 4.1.0 Resolution: Fixed Status: Resolved (was: Patch Available) > CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results > --- > > Key: HIVE-28254 > URL: https://issues.apache.org/jira/browse/HIVE-28254 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 4.0.0 >Reporter: Shohei Okumiya >Assignee: Shohei Okumiya >Priority: Major > Labels: hive-4.0.1-must, pull-request-available > Fix For: 4.1.0 > > > CBO return path can build incorrect GroupByOperator when multiple > aggregations with DISTINCT are involved. > This is an example. > {code:java} > CREATE TABLE test (col1 INT, col2 INT); > INSERT INTO test VALUES (1, 100), (2, 200), (2, 200), (3, 300); > set hive.cbo.returnpath.hiveop=true; > set hive.map.aggr=false; > SELECT > SUM(DISTINCT col1), > COUNT(DISTINCT col1), > SUM(DISTINCT col2), > SUM(col2) > FROM test;{code} > The last column should be 800. But the SUM refers to col1 and the actual > result is 8. > {code:java} > +--+--+--+--+ > | _c0 | _c1 | _c2 | _c3 | > +--+--+--+--+ > | 6 | 3 | 600 | 8 | > +--+--+--+--+ {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
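The expected results for the reproduction query can be checked outside Hive with a small sketch over the same sample rows: each DISTINCT aggregation deduplicates only its own argument, so the plain SUM(col2) must still see every row and return 800, not 8.

```java
import java.util.Arrays;

class DistinctAggCheck {
    // Rows from the reproduction: (col1, col2).
    static final int[][] ROWS = {{1, 100}, {2, 200}, {2, 200}, {3, 300}};

    static long sumDistinctCol1() {   // SUM(DISTINCT col1): 1 + 2 + 3
        return Arrays.stream(ROWS).mapToInt(r -> r[0]).distinct().sum();
    }

    static long countDistinctCol1() { // COUNT(DISTINCT col1): {1, 2, 3}
        return Arrays.stream(ROWS).mapToInt(r -> r[0]).distinct().count();
    }

    static long sumDistinctCol2() {   // SUM(DISTINCT col2): 100 + 200 + 300
        return Arrays.stream(ROWS).mapToInt(r -> r[1]).distinct().sum();
    }

    static long sumCol2() {           // SUM(col2): 100 + 200 + 200 + 300
        return Arrays.stream(ROWS).mapToInt(r -> r[1]).sum();
    }
}
```

This reproduces the correct row (6, 3, 600, 800); the buggy return-path plan instead wires the last SUM to col1's deduplicated values, yielding 8.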
[jira] [Updated] (HIVE-28299) Iceberg: Optimize show partitions through column projection
[ https://issues.apache.org/jira/browse/HIVE-28299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28299: -- Labels: pull-request-available (was: ) > Iceberg: Optimize show partitions through column projection > --- > > Key: HIVE-28299 > URL: https://issues.apache.org/jira/browse/HIVE-28299 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: Butao Zhang >Assignee: Butao Zhang >Priority: Minor > Labels: pull-request-available > > In the current *show partitions* implementation, we fetch all column data, > but in fact we only need two columns: *partition* & {*}spec_id{*}. > We can fetch just those two columns through column projection, which can > improve performance for large Iceberg partitioned tables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-28299) Iceberg: Optimize show partitions through column projection
[ https://issues.apache.org/jira/browse/HIVE-28299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Butao Zhang reassigned HIVE-28299: -- Assignee: Butao Zhang > Iceberg: Optimize show partitions through column projection > --- > > Key: HIVE-28299 > URL: https://issues.apache.org/jira/browse/HIVE-28299 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: Butao Zhang >Assignee: Butao Zhang >Priority: Minor > > In the current *show partitions* implementation, we fetch all column data, > but in fact we only need two columns: *partition* & {*}spec_id{*}. > We can fetch just those two columns through column projection, which can > improve performance for large Iceberg partitioned tables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28299) Iceberg: Optimize show partitions through column projection
Butao Zhang created HIVE-28299: -- Summary: Iceberg: Optimize show partitions through column projection Key: HIVE-28299 URL: https://issues.apache.org/jira/browse/HIVE-28299 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: Butao Zhang In the current *show partitions* implementation, we fetch all column data, but in fact we only need two columns: *partition* & {*}spec_id{*}. We can fetch just those two columns through column projection, which can improve performance for large Iceberg partitioned tables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
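The optimization can be illustrated with a self-contained sketch (hypothetical classes; the actual change would push the projection down via Iceberg's scan API rather than project in memory): instead of materializing every field of each partition entry, SHOW PARTITIONS only needs the partition value and its spec_id.

```java
import java.util.List;
import java.util.stream.Collectors;

class PartitionsProjectionSketch {
    // A partitions-metadata row; the real table also carries stats such as
    // record counts and file counts, which SHOW PARTITIONS does not display.
    static class PartitionRow {
        final String partition;
        final int specId;
        final long recordCount;
        PartitionRow(String partition, int specId, long recordCount) {
            this.partition = partition;
            this.specId = specId;
            this.recordCount = recordCount;
        }
    }

    // Project only the two columns SHOW PARTITIONS needs.
    static List<String> showPartitions(List<PartitionRow> rows) {
        return rows.stream()
                   .map(r -> r.partition + " (spec_id=" + r.specId + ")")
                   .collect(Collectors.toList());
    }
}
```

Pushing the projection into the scan (rather than projecting after the fetch, as this sketch does) is what avoids reading the unneeded columns in the first place, which matters for tables with many partitions.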
[jira] [Updated] (HIVE-28298) TestQueryShutdownHooks to run on Tez
[ https://issues.apache.org/jira/browse/HIVE-28298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-28298: Description: This test got stuck while running in the scope of HIVE-27972 and needs investigation; jstack showed stacks like: {code} "HiveServer2-Background-Pool: Thread-155" #155 prio=5 os_prio=31 tid=0x7fa59f5e4800 nid=0x8027 waiting for monitor entry [0x700010ace000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hive.ql.exec.tez.TezTask$SyncDagClient.tryKillDAG(TezTask.java:870) - waiting to lock <0x0007be5c98f8> (a org.apache.tez.dag.api.client.DAGClientImplLocal) at org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor.monitorExecution(TezJobMonitor.java:278) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:271) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236) at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "f7b8e319-d882-4734-b036-1bb2da4b2783 main" #1 prio=5 os_prio=31 tid=0x7fa60e80b800 nid=0x1b03 waiting on condition [0x7e32a000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0007be3d0578> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:982) at org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73) at org.apache.tez.client.LocalClient$2.apply(LocalClient.java:441) at org.apache.tez.client.LocalClient$2.apply(LocalClient.java:437) at org.apache.tez.dag.api.client.DAGClientImplLocal.getDAGStatusInternal(DAGClientImplLocal.java:53) at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatus(DAGClientImpl.java:232) at org.apache.tez.dag.api.client.DAGClientImpl._waitForCompletionWithStatusUpdates(DAGClientImpl.java:583) at org.apache.tez.dag.api.client.DAGClientImpl.waitForCompletion(DAGClientImpl.java:375) at org.apache.hadoop.hive.ql.exec.tez.TezTask$SyncDagClient.waitForCompletion(TezTask.java:877) - locked <0x0007be5c98f8> (a org.apache.tez.dag.api.client.DAGClientImplLocal) at org.apache.hadoop.hive.ql.exec.tez.TezTask.closeDagClientOnCancellation(TezTask.java:432) at 
org.apache.hadoop.hive.ql.exec.tez.TezTask.shutdown(TezTask.java:805) at org.apache.hadoop.hive.ql.TaskQueue.shutdown(TaskQueue.java:138) - locked <0x0007be2a20b8> (a org.apache.hadoop.hive.ql.TaskQueue) at org.apache.hadoop.hive.ql.Driver.releaseTaskQueue(Driver.java:802) at org.apache.hadoop.hive.ql.Driver.close(Driver.java:779) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.close(ReExecDriver.java:268) at org.apache.hive.service.cli.operation.SQLOperation.cleanup(SQLOperation.java:409) - locked <0x0007be5994c0> (a
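The two stacks above show a classic monitor-contention pattern: one thread holds the SyncDagClient monitor inside a long blocking waitForCompletion, while another thread is blocked trying to enter tryKillDAG, which is synchronized on the same monitor. A minimal self-contained sketch of the shape of this interaction (hypothetical class, not Hive code); here wait() releases the monitor while sleeping, which is what keeps this sketch from hanging, whereas the real code blocks inside a nested call without releasing the lock:

```java
// Sketch of the lock pattern in the stacks above (hypothetical class).
class SyncDagClientSketch {
    private boolean done = false;

    // Like SyncDagClient.waitForCompletion: entered with the monitor held.
    // wait() releases the monitor while parked, so tryKillDAG can run;
    // a nested blocking call that holds the monitor would starve it instead.
    synchronized void waitForCompletion() {
        while (!done) {
            try {
                wait(50); // stand-in for polling DAG status
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Like SyncDagClient.tryKillDAG: needs the same monitor to proceed.
    synchronized void tryKillDAG() {
        done = true;
        notifyAll();
    }
}
```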
[jira] [Created] (HIVE-28298) TestQueryShutdownHooks to run on Tez
László Bodor created HIVE-28298: --- Summary: TestQueryShutdownHooks to run on Tez Key: HIVE-28298 URL: https://issues.apache.org/jira/browse/HIVE-28298 Project: Hive Issue Type: Sub-task Reporter: László Bodor -- This message was sent by Atlassian Jira (v8.20.10#820010)