[jira] [Updated] (HIVE-28012) Invalid reference to the newly added column

2024-01-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28012:
---
Description: 
Steps to repro:
{code:java}
--! qt:dataset:src
--! qt:dataset:part
set hive.stats.autogather=true;
set hive.stats.column.autogather=true;
set 
metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer;
set hive.metastore.client.capabilities=HIVEFULLACIDWRITE,HIVEFULLACIDREAD;
set hive.create.as.external.legacy=true;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

CREATE TABLE rename_partition_table0 (key STRING, value STRING) PARTITIONED BY 
(part STRING) STORED AS ORC;
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '1') SELECT * 
FROM src where rand(1) < 0.5;
ALTER TABLE rename_partition_table0 ADD COLUMNS (new_col INT);
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '2') SELECT 
src.*, 1 FROM src;
{code}
Setting hive.metastore.client.cache.v2.enabled=false can act as a workaround.

  was:
Steps to repro:

 
{code:java}
--! qt:dataset:src
--! qt:dataset:part
set hive.stats.autogather=true;
set hive.stats.column.autogather=true;
set 
metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer;
set hive.metastore.client.capabilities=HIVEFULLACIDWRITE,HIVEFULLACIDREAD;
set hive.create.as.external.legacy=true;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

CREATE TABLE rename_partition_table0 (key STRING, value STRING) PARTITIONED BY 
(part STRING) STORED AS ORC;
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '1') SELECT * 
FROM src where rand(1) < 0.5;
ALTER TABLE rename_partition_table0 ADD COLUMNS (new_col INT);
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '2') SELECT 
src.*, 1 FROM src;
{code}
 

 


> Invalid reference to the newly added column
> ---
>
> Key: HIVE-28012
> URL: https://issues.apache.org/jira/browse/HIVE-28012
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Priority: Major
>
> Steps to repro:
> {code:java}
> --! qt:dataset:src
> --! qt:dataset:part
> set hive.stats.autogather=true;
> set hive.stats.column.autogather=true;
> set 
> metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer;
> set hive.metastore.client.capabilities=HIVEFULLACIDWRITE,HIVEFULLACIDREAD;
> set hive.create.as.external.legacy=true;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> CREATE TABLE rename_partition_table0 (key STRING, value STRING) PARTITIONED 
> BY (part STRING) STORED AS ORC;
> INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '1') SELECT 
> * FROM src where rand(1) < 0.5;
> ALTER TABLE rename_partition_table0 ADD COLUMNS (new_col INT);
> INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '2') SELECT 
> src.*, 1 FROM src;
> {code}
> Setting hive.metastore.client.cache.v2.enabled=false can act as a workaround.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28012) Invalid reference to the newly added column

2024-01-18 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-28012:
--

 Summary: Invalid reference to the newly added column
 Key: HIVE-28012
 URL: https://issues.apache.org/jira/browse/HIVE-28012
 Project: Hive
  Issue Type: Bug
Reporter: Zhihua Deng


Steps to repro:

 
{code:java}
--! qt:dataset:src
--! qt:dataset:part
set hive.stats.autogather=true;
set hive.stats.column.autogather=true;
set 
metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer;
set hive.metastore.client.capabilities=HIVEFULLACIDWRITE,HIVEFULLACIDREAD;
set hive.create.as.external.legacy=true;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

CREATE TABLE rename_partition_table0 (key STRING, value STRING) PARTITIONED BY 
(part STRING) STORED AS ORC;
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '1') SELECT * 
FROM src where rand(1) < 0.5;
ALTER TABLE rename_partition_table0 ADD COLUMNS (new_col INT);
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '2') SELECT 
src.*, 1 FROM src;
{code}
 

 





[jira] [Assigned] (HIVE-28011) Update the table info in PART_COL_STATS directly in case of table rename

2024-01-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-28011:
--

Assignee: Zhihua Deng

> Update the table info in PART_COL_STATS directly in case of table rename
> 
>
> Key: HIVE-28011
> URL: https://issues.apache.org/jira/browse/HIVE-28011
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>
> Following the discussion on 
> [https://github.com/apache/hive/pull/4995#issuecomment-1899477224], there is 
> still some room for performance tuning in the table-rename case: we do not 
> need to fetch all the column statistics from the database and then update 
> them in batch.





[jira] [Created] (HIVE-28011) Update the table info in PART_COL_STATS directly in case of table rename

2024-01-18 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-28011:
--

 Summary: Update the table info in PART_COL_STATS directly in case 
of table rename
 Key: HIVE-28011
 URL: https://issues.apache.org/jira/browse/HIVE-28011
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Reporter: Zhihua Deng


Following the discussion on 
[https://github.com/apache/hive/pull/4995#issuecomment-1899477224], there is 
still some room for performance tuning in the table-rename case: we do not need 
to fetch all the column statistics from the database and then update them in 
batch.





[jira] [Comment Edited] (HIVE-28010) Using apache fury instead of kyro/protubuf

2024-01-18 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808418#comment-17808418
 ] 

yongzhi.shao edited comment on HIVE-28010 at 1/19/24 1:43 AM:
--

[~dkuzmenko]  &  [~zhangbutao] :

Hi. Are you interested in this?


was (Author: lisoda):
[~dkuzmenko] :

Hi. Are you interested in this?

> Using apache fury instead of kyro/protubuf
> --
>
> Key: HIVE-28010
> URL: https://issues.apache.org/jira/browse/HIVE-28010
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Priority: Minor
>
> Apache Fury is a new serialisation framework that can significantly improve 
> serialisation/deserialisation performance compared to Kryo and Protobuf. Do 
> we need Fury in Hive?





[jira] [Comment Edited] (HIVE-28010) Using apache fury instead of kyro/protubuf

2024-01-18 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808418#comment-17808418
 ] 

yongzhi.shao edited comment on HIVE-28010 at 1/19/24 1:37 AM:
--

[~dkuzmenko] :

Hi. Are you interested in this?


was (Author: lisoda):
[~dkuzmenko] :

Are you interested in this?

> Using apache fury instead of kyro/protubuf
> --
>
> Key: HIVE-28010
> URL: https://issues.apache.org/jira/browse/HIVE-28010
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Priority: Minor
>
> Apache Fury is a new serialisation framework that can significantly improve 
> serialisation/deserialisation performance compared to Kryo and Protobuf. Do 
> we need Fury in Hive?





[jira] [Commented] (HIVE-28010) Using apache fury instead of kyro/protubuf

2024-01-18 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808418#comment-17808418
 ] 

yongzhi.shao commented on HIVE-28010:
-

[~dkuzmenko] :

Are you interested in this?

> Using apache fury instead of kyro/protubuf
> --
>
> Key: HIVE-28010
> URL: https://issues.apache.org/jira/browse/HIVE-28010
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Priority: Minor
>
> Apache Fury is a new serialisation framework that can significantly improve 
> serialisation/deserialisation performance compared to Kryo and Protobuf. Do 
> we need Fury in Hive?





[jira] [Updated] (HIVE-28010) Using apache fury instead of kyro/protubuf

2024-01-18 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-28010:

Summary: Using apache fury instead of kyro/protubuf  (was: using apache 
fury instead of kro/protobuf)

> Using apache fury instead of kyro/protubuf
> --
>
> Key: HIVE-28010
> URL: https://issues.apache.org/jira/browse/HIVE-28010
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Priority: Minor
>
> Apache Fury is a new serialisation framework that can significantly improve 
> serialisation/deserialisation performance compared to Kryo and Protobuf. Do 
> we need Fury in Hive?





[jira] [Updated] (HIVE-28010) using apache fury instead of kro/protobuf

2024-01-18 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-28010:

Summary: using apache fury instead of kro/protobuf  (was: Use apache fury 
instead of kyro/protubuf)

> using apache fury instead of kro/protobuf
> -
>
> Key: HIVE-28010
> URL: https://issues.apache.org/jira/browse/HIVE-28010
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Priority: Minor
>
> Apache Fury is a new serialisation framework that can significantly improve 
> serialisation/deserialisation performance compared to Kryo and Protobuf. Do 
> we need Fury in Hive?





[jira] [Created] (HIVE-28010) Use apache fury instead of kyro/protubuf

2024-01-18 Thread yongzhi.shao (Jira)
yongzhi.shao created HIVE-28010:
---

 Summary: Use apache fury instead of kyro/protubuf
 Key: HIVE-28010
 URL: https://issues.apache.org/jira/browse/HIVE-28010
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 4.0.0
Reporter: yongzhi.shao


Apache Fury is a new serialisation framework that can significantly improve 
serialisation/deserialisation performance compared to Kryo and Protobuf. Do we 
need Fury in Hive?
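For context, here is a minimal sketch of what Fury usage looks like on the Java side. It assumes the Apache Fury dependency on the classpath, and the class/builder names are taken from Fury's Java documentation; it is an illustration of the API shape, not a proposal for where Hive would plug it in:

{code:java}
// Illustrative only: a basic Fury serialize/deserialize round trip.
import org.apache.fury.Fury;
import org.apache.fury.config.Language;

public class FuryDemo {
  public static void main(String[] args) {
    Fury fury = Fury.builder()
        .withLanguage(Language.JAVA)
        // Convenient for a demo; registering classes up front is the safer setup.
        .requireClassRegistration(false)
        .build();
    byte[] bytes = fury.serialize("hello");          // serialize an object to bytes
    String back = (String) fury.deserialize(bytes);  // and deserialize it back
  }
}
{code}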





[jira] [Updated] (HIVE-27960) Invalid function error when using custom udaf

2024-01-18 Thread gaoxiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoxiong updated HIVE-27960:

Fix Version/s: (was: 4.0.0-beta-1)

> Invalid function error when using custom udaf
> -
>
> Key: HIVE-27960
> URL: https://issues.apache.org/jira/browse/HIVE-27960
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.9, 4.0.0-beta-1
> Environment: Aliyun emr hive 2.3.9
>Reporter: gaoxiong
>Assignee: gaoxiong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.9
>
>
> When a permanent UDAF is used with an over() clause, Hive throws an invalid 
> function error.
>  
> -In HIVE-12719, this issue was fixed for Hive 3, but the fix does not work 
> in Hive 2.- This issue also reproduces on master.
>  
> In Hive 2, the FunctionInfo should be fetched from the FunctionRegistry 
> before fetching the WindowFunctionInfo, as in Hive 3, because that lookup 
> registers the window function in the session; Hive can then resolve the 
> WindowFunctionInfo correctly.
>  
>  Error details 
> Register a permanent UDAF:
> {code:java}
> create function row_number2 as 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber'; {code}
> Execute a query in a new CLI session:
> {code:java}
> select row_number2() over();{code}
> Below is the error log:
> {code:java}
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies. Underlying error: Invalid function row_number2
> 2023-12-06T10:17:30,348 ERROR [0b7764ce-cde3-49c5-9d32-f96d61b20773 main] 
> ql.Driver: FAILED: SemanticException Failed to breakup Windowing 
> invocations into Groups. At least 1 group must only depend on input 
> columns. Also check for circular dependencies. Underlying error: Invalid 
> function row_number2
> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to breakup 
> Windowing invocations into Groups. At least 1 group must only depend on 
> input columns. Also check for circular dependencies. Underlying error: 
> Invalid function row_number2
>     at org.apache.hadoop.hive.ql.parse.WindowingComponentizer.next(WindowingComponentizer.java:97)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genWindowingPlan(SemanticAnalyzer.java:13270)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9685)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9644)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10549)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10427)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:11125)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:481)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> {code}
>  
>  
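The lookup order described above can be sketched as follows. The method names are assumptions based on Hive's FunctionRegistry API, and this is an illustration of the described fix, not the actual patch:

{code:java}
// Hypothetical sketch: resolve the FunctionInfo first, which registers a
// permanent UDAF into the current session, and only then ask for the
// window-function metadata.
FunctionInfo fnInfo = FunctionRegistry.getFunctionInfo(functionName);
if (fnInfo != null) {
  // The session now knows the function, so the window lookup can succeed.
  WindowFunctionInfo winInfo = FunctionRegistry.getWindowFunctionInfo(functionName);
}
{code}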





[jira] [Updated] (HIVE-27960) Invalid function error when using custom udaf

2024-01-18 Thread gaoxiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoxiong updated HIVE-27960:

Fix Version/s: 4.0.0-beta-1

> Invalid function error when using custom udaf
> -
>
> Key: HIVE-27960
> URL: https://issues.apache.org/jira/browse/HIVE-27960
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.9, 4.0.0-beta-1
> Environment: Aliyun emr hive 2.3.9
>Reporter: gaoxiong
>Assignee: gaoxiong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.9, 4.0.0-beta-1
>
>
> When a permanent UDAF is used with an over() clause, Hive throws an invalid 
> function error.
>  
> -In HIVE-12719, this issue was fixed for Hive 3, but the fix does not work 
> in Hive 2.- This issue also reproduces on master.
>  
> In Hive 2, the FunctionInfo should be fetched from the FunctionRegistry 
> before fetching the WindowFunctionInfo, as in Hive 3, because that lookup 
> registers the window function in the session; Hive can then resolve the 
> WindowFunctionInfo correctly.
>  
>  Error details 
> Register a permanent UDAF:
> {code:java}
> create function row_number2 as 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber'; {code}
> Execute a query in a new CLI session:
> {code:java}
> select row_number2() over();{code}
> Below is the error log:
> {code:java}
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies. Underlying error: Invalid function row_number2
> 2023-12-06T10:17:30,348 ERROR [0b7764ce-cde3-49c5-9d32-f96d61b20773 main] 
> ql.Driver: FAILED: SemanticException Failed to breakup Windowing 
> invocations into Groups. At least 1 group must only depend on input 
> columns. Also check for circular dependencies. Underlying error: Invalid 
> function row_number2
> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to breakup 
> Windowing invocations into Groups. At least 1 group must only depend on 
> input columns. Also check for circular dependencies. Underlying error: 
> Invalid function row_number2
>     at org.apache.hadoop.hive.ql.parse.WindowingComponentizer.next(WindowingComponentizer.java:97)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genWindowingPlan(SemanticAnalyzer.java:13270)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9685)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9644)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10549)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10427)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:11125)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:481)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> {code}
>  
>  





[jira] [Updated] (HIVE-27960) Invalid function error when using custom udaf

2024-01-18 Thread gaoxiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoxiong updated HIVE-27960:

Affects Version/s: 4.0.0-beta-1

> Invalid function error when using custom udaf
> -
>
> Key: HIVE-27960
> URL: https://issues.apache.org/jira/browse/HIVE-27960
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.9, 4.0.0-beta-1
> Environment: Aliyun emr hive 2.3.9
>Reporter: gaoxiong
>Assignee: gaoxiong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.9
>
>
> When a permanent UDAF is used with an over() clause, Hive throws an invalid 
> function error.
>  
> -In HIVE-12719, this issue was fixed for Hive 3, but the fix does not work 
> in Hive 2.- This issue also reproduces on master.
>  
> In Hive 2, the FunctionInfo should be fetched from the FunctionRegistry 
> before fetching the WindowFunctionInfo, as in Hive 3, because that lookup 
> registers the window function in the session; Hive can then resolve the 
> WindowFunctionInfo correctly.
>  
>  Error details 
> Register a permanent UDAF:
> {code:java}
> create function row_number2 as 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber'; {code}
> Execute a query in a new CLI session:
> {code:java}
> select row_number2() over();{code}
> Below is the error log:
> {code:java}
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies. Underlying error: Invalid function row_number2
> 2023-12-06T10:17:30,348 ERROR [0b7764ce-cde3-49c5-9d32-f96d61b20773 main] 
> ql.Driver: FAILED: SemanticException Failed to breakup Windowing 
> invocations into Groups. At least 1 group must only depend on input 
> columns. Also check for circular dependencies. Underlying error: Invalid 
> function row_number2
> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to breakup 
> Windowing invocations into Groups. At least 1 group must only depend on 
> input columns. Also check for circular dependencies. Underlying error: 
> Invalid function row_number2
>     at org.apache.hadoop.hive.ql.parse.WindowingComponentizer.next(WindowingComponentizer.java:97)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genWindowingPlan(SemanticAnalyzer.java:13270)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9685)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9644)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10549)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10427)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:11125)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:481)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> {code}
>  
>  





[jira] [Assigned] (HIVE-27104) Upgrade Bouncy Castle to 1.68 due to high CVEs

2024-01-18 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh reassigned HIVE-27104:
---

Assignee: Indhumathi Muthumurugesh

> Upgrade Bouncy Castle to 1.68 due to high CVEs
> --
>
> Key: HIVE-27104
> URL: https://issues.apache.org/jira/browse/HIVE-27104
> Project: Hive
>  Issue Type: Task
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-27996) Revert HIVE-27406 & HIVE-27481

2024-01-18 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-27996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808115#comment-17808115
 ] 

László Végh commented on HIVE-27996:


[~aturoczy] Currently yes, please see my comment at HIVE-28004.

The plan is to fix the issues introduced by HIVE-27406 and HIVE-27481.

This task provides the possibility of rolling back if the fixes take too much 
time and it becomes urgent to make Hive work again.

> Revert HIVE-27406 & HIVE-27481
> --
>
> Key: HIVE-27996
> URL: https://issues.apache.org/jira/browse/HIVE-27996
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0-beta-1
>Reporter: László Végh
>Assignee: Laszlo Vegh
>Priority: Blocker
>  Labels: pull-request-available
>
> Revert HIVE-27406 & HIVE-27481
>  
> The introduced changes were causing DB incompatibility issues.
> {code}
> create table if not exists tab_acid (a int) partitioned by (p string) stored 
> as orc TBLPROPERTIES ('transactional'='true');
> insert into tab_acid values(1,'foo'),(3,'bar');
> Caused by: MetaException(message:The update count was rejected in at least 
> one of the result array. Rolling back.)
>   at 
> org.apache.hadoop.hive.metastore.txn.jdbc.MultiDataSourceJdbcResource.execute(MultiDataSourceJdbcResource.java:217)
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.addDynamicPartitions(TxnHandler.java:876)
> {code}





[jira] [Updated] (HIVE-28009) Shared work optimizer ignores schema merge setting in case of virtual column difference

2024-01-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28009:
--
Labels: pull-request-available  (was: )

> Shared work optimizer ignores schema merge setting in case of virtual column 
> difference
> ---
>
> Key: HIVE-28009
> URL: https://issues.apache.org/jira/browse/HIVE-28009
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> set hive.optimize.shared.work.merge.ts.schema=false;
> create table t1(a int);
> explain
> WITH t AS (
>   select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a from (
> select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a, row_number() 
> OVER (partition by INPUT__FILE__NAME) rn from t1
> where a = 1
>   ) q
>   where rn=1
> )
> select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a from t1 where NOT (a 
> = 1) AND INPUT__FILE__NAME IN (select INPUT__FILE__NAME from t)
> union all
> select * from t
> {code}
> Before SharedWorkOptimizer:
> {code:java}
> TS[0]-FIL[32]-SEL[2]-RS[14]-MERGEJOIN[42]-SEL[17]-UNION[27]-FS[29]
> TS[3]-FIL[34]-RS[5]-SEL[6]-PTF[7]-FIL[33]-SEL[8]-GBY[13]-RS[15]-MERGEJOIN[42]
> TS[18]-FIL[36]-RS[20]-SEL[21]-PTF[22]-FIL[35]-SEL[23]-UNION[27]
> {code}
> After SharedWorkOptimizer:
> {code:java}
> TS[0]-FIL[32]-SEL[2]-RS[14]-MERGEJOIN[42]-SEL[17]-UNION[27]-FS[29]
>  -FIL[34]-RS[5]-SEL[6]-PTF[7]-FIL[33]-SEL[8]-GBY[13]-RS[15]-MERGEJOIN[42]
> TS[18]-FIL[36]-RS[20]-SEL[21]-PTF[22]-FIL[35]-SEL[23]-UNION[27]
> {code}
> TS[3] and TS[18] are merged even though their schemas do not match and 
> {{hive.optimize.shared.work.merge.ts.schema}} was turned off in the test:
> {code:java}
> TS[3]: 0 = FILENAME
> TS[18]: 0 = BLOCKOFFSET,  FILENAME
> {code}





[jira] [Commented] (HIVE-28004) DELETE on ACID table failed with NoClassDefFoundError: com/sun/tools/javac/util/List

2024-01-18 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-28004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808111#comment-17808111
 ] 

László Végh commented on HIVE-28004:


[~aturoczy]

Yes, this is a blocker. Some issues were left uncovered by the unit/integration 
tests, and they cause very basic SQL statements like insert/drop to fail. 
Unfortunately, these issues did not show up with a Derby HMS, so I did some 
testing against a PostgreSQL HMS backend, and the plan is to also run the 
qtests against a Postgres HMS. The PR I mentioned above collects all the fixes 
so they can be merged together. Until then, ACID is broken.

> DELETE on ACID table failed with NoClassDefFoundError: 
> com/sun/tools/javac/util/List
> 
>
> Key: HIVE-28004
> URL: https://issues.apache.org/jira/browse/HIVE-28004
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0-beta-1
>Reporter: Butao Zhang
>Assignee: László Végh
>Priority: Blocker
>
> I am not sure whether this is a bug or a usage question.
> Tested on the Hive master branch:
>  
> {code:java}
> set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.support.concurrency = true;
> create table testacid4(id int) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> delete from testacid4 where id=110;
> {code}
>  
> *beeline console shows error:*
> {code:java}
> 0: jdbc:hive2://127.0.0.1:1/default> delete from testacid4 where id=110;
> INFO  : Compiling 
> command(queryId=hive_20240116180628_ec5ac4d8-473b-4b42-b0dd-eecebec71268): 
> delete from testacid4 where id=110
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:row__id, 
> type:struct, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20240116180628_ec5ac4d8-473b-4b42-b0dd-eecebec71268); 
> Time taken: 3.554 seconds
> INFO  : Operation QUERY obtained 1 locks
> ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.lockmgr.LockException(org.apache.thrift.TApplicationException:
>  Internal error processing get_latest_txnid_in_conflict)
> org.apache.hadoop.hive.ql.lockmgr.LockException: 
> org.apache.thrift.TApplicationException: Internal error processing 
> get_latest_txnid_in_conflict
>         at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.getLatestTxnIdInConflict(DbTxnManager.java:1055)
>         at 
> org.apache.hadoop.hive.ql.DriverTxnHandler.isValidTxnListState(DriverTxnHandler.java:435)
>         at org.apache.hadoop.hive.ql.Driver.validateTxnList(Driver.java:250)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:199)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.thrift.TApplicationException: Internal error processing 
> get_latest_txnid_in_conflict
>         at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
>         at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_latest_txnid_in_conflict(ThriftHiveMetastore.java:6404)
>         at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_latest_txnid_in_conflict(ThriftHiveMetastore.java:6391)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getLatestTxnIdInConflict(HiveMetaStoreClient.java:4421)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.De

[jira] [Created] (HIVE-28009) Shared work optimizer ignores schema merge setting in case of virtual column difference

2024-01-18 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-28009:
-

 Summary: Shared work optimizer ignores schema merge setting in 
case of virtual column difference
 Key: HIVE-28009
 URL: https://issues.apache.org/jira/browse/HIVE-28009
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 4.0.0-beta-1, 4.0.0
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code:java}
set hive.optimize.shared.work.merge.ts.schema=false;

create table t1(a int);

explain
WITH t AS (
  select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a from (
select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a, row_number() OVER 
(partition by INPUT__FILE__NAME) rn from t1
where a = 1
  ) q
  where rn=1
)
select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a from t1 where NOT (a = 
1) AND INPUT__FILE__NAME IN (select INPUT__FILE__NAME from t)
union all
select * from t
{code}
Before SharedWorkOptimizer:
{code:java}
TS[0]-FIL[32]-SEL[2]-RS[14]-MERGEJOIN[42]-SEL[17]-UNION[27]-FS[29]
TS[3]-FIL[34]-RS[5]-SEL[6]-PTF[7]-FIL[33]-SEL[8]-GBY[13]-RS[15]-MERGEJOIN[42]
TS[18]-FIL[36]-RS[20]-SEL[21]-PTF[22]-FIL[35]-SEL[23]-UNION[27]
{code}
After SharedWorkOptimizer:
{code:java}
TS[0]-FIL[32]-SEL[2]-RS[14]-MERGEJOIN[42]-SEL[17]-UNION[27]-FS[29]
 -FIL[34]-RS[5]-SEL[6]-PTF[7]-FIL[33]-SEL[8]-GBY[13]-RS[15]-MERGEJOIN[42]
TS[18]-FIL[36]-RS[20]-SEL[21]-PTF[22]-FIL[35]-SEL[23]-UNION[27]
{code}
TS[3] and TS[18] are merged, but their schemas don't match and 
{{hive.optimize.shared.work.merge.ts.schema}} was turned off in the test:
{code:java}
TS[3]: 0 = FILENAME
TS[18]: 0 = BLOCKOFFSET,  FILENAME
{code}
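A minimal sketch of the guard the ticket implies (hypothetical names, not Hive's actual {{SharedWorkOptimizer}} code): two table scans should only be merged when their projected schemas match, unless the schema-merge rewrite is explicitly enabled.

```java
import java.util.List;

public class SharedWorkMergeGuard {
    // Hypothetical check, assuming schemas are modeled as ordered column
    // lists: identical schemas can always share a scan; differing schemas
    // (e.g. a virtual-column difference) may only be merged when the
    // hive.optimize.shared.work.merge.ts.schema rewrite is enabled.
    static boolean canMerge(List<String> schemaA, List<String> schemaB,
                            boolean mergeTsSchema) {
        return schemaA.equals(schemaB) || mergeTsSchema;
    }

    public static void main(String[] args) {
        // TS[3] projects FILENAME; TS[18] projects BLOCKOFFSET, FILENAME.
        List<String> ts3 = List.of("FILENAME");
        List<String> ts18 = List.of("BLOCKOFFSET", "FILENAME");
        // With the setting off (as in the test), the merge must be rejected.
        System.out.println(canMerge(ts3, ts18, false)); // false
        System.out.println(canMerge(ts3, ts18, true));  // true
    }
}
```

The bug report amounts to this predicate being bypassed when the only schema difference is a virtual column.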





[jira] [Work started] (HIVE-28004) DELETE on ACID table failed with NoClassDefFoundError: com/sun/tools/javac/util/List

2024-01-18 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-28004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-28004 started by László Végh.
--
[jira] [Updated] (HIVE-27741) Invalid timezone value in to_utc_timestamp() is treated as UTC which can lead to data consistency issues

2024-01-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27741:
--
Labels: pull-request-available  (was: )

> Invalid timezone value in to_utc_timestamp() is treated as UTC which can lead 
> to data consistency issues
> 
>
> Key: HIVE-27741
> URL: https://issues.apache.org/jira/browse/HIVE-27741
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0-beta-1
>Reporter: Janos Kovacs
>Assignee: Zoltán Rátkai
>Priority: Major
>  Labels: pull-request-available
>
> When the timezone specified in the *to_utc_timestamp()* function is not 
> valid, it is still treated as UTC instead of throwing an error. If the user 
> accidentally made a typo - e.g. America/Los*t*_Angeles - 
> the query runs successfully, returning an incorrectly converted value, which 
> can lead to data consistency issues. 
> Repro code:
> {noformat}
> docker rm -f hive4
> export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
> export HS2_ENV_TZ="Europe/Budapest"
> export HS2_USER_TZ=${HS2_ENV_TZ}
> export HIVE_LOCAL_TZ="America/Los_Angeles"
> export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ 
> -Dhive.local.time.zone=$HIVE_LOCAL_TZ"
> export HS2_OPTS="$HS2_OPTS  
> -Dhive.server2.tez.initialize.default.sessions=false"
> docker run -d -p 1:1 -p 10001:10001 -p 10002:10002 --env 
> TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env 
> SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
> docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:1/' -e "
> SELECT '\${env:TZ}' as \`env:TZ\`,
>'\${system:user.timezone}' as \`system:user.timezone\`,
>'\${hiveconf:hive.local.time.zone}' as 
> \`hiveconf:hive.local.time.zone\`;
> DROP TABLE IF EXISTS timestamptest;
> CREATE TABLE timestamptest (
>   ts timestamp,
>   tz timestamp with local time zone
> ) STORED AS TEXTFILE;
> INSERT INTO timestamptest select TIMESTAMP'2016-01-03 
> 12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';
> SELECT
>   tz                                          as orig,
>   to_utc_timestamp(tz, 'America/Los_Angeles') as utc_correct_tz,
>   to_utc_timestamp(tz, 'Europe/HereIsATypo')  as utc_incorrect_tz,
>   to_utc_timestamp(tz, 'LOCAL')               as utc_local_aslo_incorrect_tz,
>   to_utc_timestamp(tz, 'UTC')                 as utc_tz
> FROM timestamptest;
> "
> {noformat}
>  
> The results are:
> {noformat}
> +------------------+-----------------------+--------------------------------+
> |      env:tz      | system:user.timezone  | hiveconf:hive.local.time.zone  |
> +------------------+-----------------------+--------------------------------+
> | Europe/Budapest  | Europe/Budapest       | America/Los_Angeles            |
> +------------------+-----------------------+--------------------------------+
>
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> |                    orig                    |     utc_correct_tz     |    utc_incorrect_tz    | utc_local_aslo_incorrect_tz  |         utc_tz         |
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> | 2016-01-03 12:26:34.0 America/Los_Angeles  | 2016-01-03 20:26:34.0  | 2016-01-03 12:26:34.0  | 2016-01-03 12:26:34.0        | 2016-01-03 12:26:34.0  |
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> {noformat}
> Note:
>  * the invalid timezone - utc_incorrect_tz - is treated as UTC
>  * also note that LOCAL is also treated as UTC, when in fact it should be 
> treated as the system's timezone; but since LOCAL is also an invalid timezone 
> value in hive4, it becomes UTC just like any other invalid and/or mistyped 
> timezone value (see HIVE-27742)
>  
> Hive should throw an Exception in that case to let the user know that the 
> provided timezone is wrong - at least this should be configurable, e.g. via 
> something like {*}hive.strict.time.zone.check{*}.
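The silent-UTC fallback described above matches the behavior of the legacy {{java.util.TimeZone}} API, which is a plausible (assumed, not confirmed) root cause; {{java.time.ZoneId}} already provides the strict behavior the ticket asks for:

```java
import java.time.DateTimeException;
import java.time.ZoneId;
import java.util.TimeZone;

public class TzValidation {
    // Legacy API: an unknown ID silently falls back to "GMT" (UTC offset 0),
    // mirroring the behavior reported in the ticket.
    static String legacyZone(String id) {
        return TimeZone.getTimeZone(id).getID();
    }

    // Strict API: ZoneId.of rejects unknown IDs with an exception - the
    // behavior the ticket proposes Hive should (at least optionally) expose.
    static String strictZone(String id) {
        try {
            return ZoneId.of(id).getId();
        } catch (DateTimeException e) {
            throw new IllegalArgumentException("Invalid time zone: " + id, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(legacyZone("Europe/HereIsATypo")); // GMT
        System.out.println(strictZone("America/Los_Angeles")); // America/Los_Angeles
    }
}
```

This is only an illustration of the two JDK behaviors, not Hive's actual UDF code.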





[jira] [Commented] (HIVE-27775) DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift

2024-01-18 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808086#comment-17808086
 ] 

Stamatis Zampetakis commented on HIVE-27775:


[~dengzh] [~wechar] Have we confirmed that this bug is something that can 
affect production? It seems that direct SQL returns the correct result, so my 
question is actually the following: is it possible to disable direct SQL for 
this code path and hit the JDO bug?

> DirectSQL and JDO results are different when fetching partitions by timestamp 
> in DST shift
> --
>
> Key: HIVE-27775
> URL: https://issues.apache.org/jira/browse/HIVE-27775
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Zhihua Deng
>Priority: Critical
>  Labels: pull-request-available
>
> DirectSQL and JDO results are different when fetching partitions by timestamp 
> in DST shift.
> {code:sql}
> --! qt:timezone:Europe/Paris
> CREATE EXTERNAL TABLE payments (card string) PARTITIONED BY(txn_datetime 
> TIMESTAMP) STORED AS ORC;
> INSERT into payments VALUES('---', '2023-03-26 02:30:00');
> SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00';
> {code}
> The '2023-03-26 02:30:00' is a timestamp that in Europe/Paris timezone falls 
> exactly in the middle of the DST shift. In this particular timezone this date 
> time never really exists since we are jumping directly from 02:00:00 to 
> 03:00:00. However, the TIMESTAMP data type in Hive is timezone agnostic 
> (https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types) 
> so it is a perfectly valid timestamp that can be inserted in a table and we 
> must be able to recover it back.
> For the SELECT query above, partition pruning kicks in and calls the 
> ObjectStore#getPartitionsByExpr method in order to fetch the respective 
> partitions matching the timestamp from HMS.
> The tests however reveal that DirectSQL and JDO paths are not returning the 
> same results leading to an exception when VerifyingObjectStore is used. 
> According to the error below DirectSQL is able to recover one partition from 
> HMS (expected) while JDO/ORM returns empty (not expected).
> {noformat}
> 2023-10-06T03:51:19,406 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
> metastore.VerifyingObjectStore: Lists are not the same size: SQL 1, ORM 0
> 2023-10-06T03:51:19,409 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
> metastore.RetryingHMSHandler: MetaException(message:Lists are not the same 
> size: SQL 1, ORM 0)
>   at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.verifyLists(VerifyingObjectStore.java:148)
>   at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:88)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>   at com.sun.proxy.$Proxy57.getPartitionsByExpr(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7330)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:98)
>   at 
> org.apache.hadoop.hive.metastore.AbstractHMSHandlerProxy.invoke(AbstractHMSHandlerProxy.java:82)
>   at com.sun.proxy.$Proxy59.get_partitions_spec_by_expr(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsSpecByExprInternal(HiveMetaStoreClient.java:2472)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientWithLocalCache.getPartitionsSpecByExprInternal(HiveMetaStoreClientWithLocalCache.java:396)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getPartitionsSpecByExprInternal(SessionHiveMetaStoreClient.java:2279)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsSpecByExpr(HiveMetaStoreClient.java:2484)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.listPartitionsSpecByExpr(SessionHiveMetaStoreClient.java:1346)
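The DST-gap resolution that plausibly trips up the zone-sensitive JDO path can be reproduced with plain {{java.time}} (an illustration of the non-round-tripping conversion, not Hive's actual code):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstGap {
    // Resolve a wall-clock timestamp in a zone. For a timestamp that falls
    // inside a DST gap, java.time shifts it forward by the gap length, so
    // the original wall-clock value is not preserved.
    static ZonedDateTime resolve(LocalDateTime wallClock, String zone) {
        return wallClock.atZone(ZoneId.of(zone));
    }

    public static void main(String[] args) {
        // 2023-03-26 02:30 does not exist in Europe/Paris (clocks jump
        // from 02:00 directly to 03:00), so the resolved time is 03:30.
        LocalDateTime gap = LocalDateTime.of(2023, 3, 26, 2, 30);
        System.out.println(resolve(gap, "Europe/Paris"));
    }
}
```

If any layer on the JDO path performs such a zone-sensitive round trip while the timezone-agnostic DirectSQL path compares the raw value, the two can disagree exactly as the {{VerifyingObjectStore}} error shows.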

[jira] [Resolved] (HIVE-27749) SchemaTool initSchema fails on Mariadb 10.2

2024-01-18 Thread Sourabh Badhya (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Badhya resolved HIVE-27749.
---
Resolution: Fixed

Merged the addendum PR to master.
Thanks [~aturoczy] and [~dkuzmenko] for the reviews.

> SchemaTool initSchema fails on Mariadb 10.2
> ---
>
> Key: HIVE-27749
> URL: https://issues.apache.org/jira/browse/HIVE-27749
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-alpha-2, 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Sourabh Badhya
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: mariadb-metastore-schema-tests.patch
>
>
> Schema initialization for 4.0.0-beta-1 fails when run on Mariadb 10.2.
> The problem is reproducible on current 
> (e5a7ce2f091da1f8a324da6e489cda59b9e4bfc6) master by applying the 
> [^mariadb-metastore-schema-tests.patch] and then running:
> {noformat}
> mvn test -Dtest=TestMariadb#install -Dtest.groups=""
> {noformat}
> The error is shown below:
> {noformat}
> 315/409  ALTER TABLE `NOTIFICATION_SEQUENCE` MODIFY COLUMN `NNI_ID` 
> BIGINT(20) GENERATED ALWAYS AS (1) STORED NOT NULL;
> Error: (conn=11) You have an error in your SQL syntax; check the manual that 
> corresponds to your MariaDB server version for the right syntax to use near 
> 'NOT NULL' at line 1 (state=42000,code=1064)
> Aborting command set because "force" is false and command failed: "ALTER 
> TABLE `NOTIFICATION_SEQUENCE` MODIFY COLUMN `NNI_ID` BIGINT(20) GENERATED 
> ALWAYS AS (1) STORED NOT NULL;"
> [ERROR] 2023-09-27 21:36:30.317 [main] MetastoreSchemaTool - Schema 
> initialization FAILED! Metastore state would be inconsistent!
> Schema initialization FAILED! Metastore state would be inconsistent!
> [ERROR] 2023-09-27 21:36:30.317 [main] MetastoreSchemaTool - Underlying 
> cause: java.io.IOException : Schema script failed, errorcode OTHER
> Underlying cause: java.io.IOException : Schema script failed, errorcode OTHER
> org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization 
> FAILED! Metastore state would be inconsistent!
> at 
> org.apache.hadoop.hive.metastore.tools.schematool.SchemaToolTaskInit.execute(SchemaToolTaskInit.java:66)
> at 
> org.apache.hadoop.hive.metastore.tools.schematool.MetastoreSchemaTool.run(MetastoreSchemaTool.java:480)
> at 
> org.apache.hadoop.hive.metastore.tools.schematool.MetastoreSchemaTool.run(MetastoreSchemaTool.java:425)
> at 
> org.apache.hadoop.hive.metastore.dbinstall.rules.DatabaseRule.installLatest(DatabaseRule.java:269)
> at 
> org.apache.hadoop.hive.metastore.dbinstall.DbInstallBase.install(DbInstallBase.java:34)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provi