[jira] [Created] (HIVE-28399) Improve the fetch size in HiveConnection

2024-07-23 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-28399:
--

 Summary: Improve the fetch size in HiveConnection
 Key: HIVE-28399
 URL: https://issues.apache.org/jira/browse/HIVE-28399
 Project: Hive
  Issue Type: Improvement
Reporter: Zhihua Deng
Assignee: Zhihua Deng


If the 4.x Hive JDBC client connects to an older HS2 or another Thrift 
implementation, it might throw an IllegalStateException: 
[https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L1253-L1258],
 because the remote might not have set the property 
hive.server2.thrift.resultset.default.fetch.size in the response to the 
OpenSession request. It is also confusing what the connection's real fetch 
size is: we have both initFetchSize and defaultFetchSize in HiveConnection, 
and HiveStatement checks initFetchSize, defaultFetchSize and 
HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.defaultIntVal to obtain the 
real fetch size. We can merge these into a single value in HiveConnection, so 
every statement created from the connection uses this fetch size.
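One possible shape for that unification is the following hypothetical sketch; the class, method, and field names here are assumptions for illustration, not the actual HiveConnection code.

```java
// Hypothetical sketch of resolving a single fetch size at connection time,
// so every statement created from the connection can reuse it.
// Names and the fallback value are assumptions, not the actual Hive code.
public class FetchSizeSketch {
    // Client-side fallback used when neither the URL nor the server
    // supplies a value (stands in for the ConfVars default).
    static final int CLIENT_DEFAULT_FETCH_SIZE = 1000;

    /**
     * @param initFetchSize   fetch size from the JDBC URL (0 = unset)
     * @param serverFetchSize value the server sent back in the OpenSession
     *                        response, or null if the remote (e.g. an older
     *                        HS2) did not set the property
     */
    static int resolveFetchSize(int initFetchSize, Integer serverFetchSize) {
        if (initFetchSize > 0) {
            return initFetchSize;
        }
        // Fall back instead of throwing IllegalStateException when the
        // remote did not return the property.
        if (serverFetchSize != null && serverFetchSize > 0) {
            return serverFetchSize;
        }
        return CLIENT_DEFAULT_FETCH_SIZE;
    }
}
```

With this, HiveStatement would no longer need to consult three different sources itself.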
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28399) Improve the fetch size in HiveConnection

2024-07-23 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28399:
---
Description: 
If the 4.x Hive JDBC client connects to an older HS2 or another Thrift 
implementation, it might throw an IllegalStateException: 
[https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L1253-L1258],
 because the remote might not have set the property 
hive.server2.thrift.resultset.default.fetch.size in the response to the 
OpenSession request. It is also confusing what the connection's real fetch 
size is: we have both initFetchSize and defaultFetchSize in HiveConnection, 
and HiveStatement checks initFetchSize, defaultFetchSize and 
HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.defaultIntVal to obtain the 
real fetch size. We can merge these into a single value in HiveConnection, so 
every statement created from the connection uses this new fetch size.
 

  was:
If the 4.x Hive Jdbc client connects to an older HS2 or other thrift 
implementations, it might throw the IllegalStateException: 
[https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L1253-L1258],
 as the remote might haven't set the property:  
hive.server2.thrift.resultset.default.fetch.size back to the response of 
OpenSession request. It also introduces the confusing on what the real fetch 
size the connection is, as we have both initFetchSize and defaultFetchSize in 
this HiveConnection, the HiveStatement checks the initFetchSize, 
defaultFetchSize and 
HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.defaultIntVal to obtain the 
real fetch size, we can make them one in HiveConnection, so every statement 
created from the connection uses this fetch size.
 


> Improve the fetch size in HiveConnection
> 
>
> Key: HIVE-28399
> URL: https://issues.apache.org/jira/browse/HIVE-28399
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>
> If the 4.x Hive JDBC client connects to an older HS2 or another Thrift 
> implementation, it might throw an IllegalStateException: 
> [https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L1253-L1258],
>  because the remote might not have set the property 
> hive.server2.thrift.resultset.default.fetch.size in the response to the 
> OpenSession request. It is also confusing what the connection's real fetch 
> size is: we have both initFetchSize and defaultFetchSize in HiveConnection, 
> and HiveStatement checks initFetchSize, defaultFetchSize and 
> HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.defaultIntVal to obtain the 
> real fetch size. We can merge these into a single value in HiveConnection, so 
> every statement created from the connection uses this new fetch size.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28360) Upgrade jersey to version 1.19.4,

2024-07-23 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28360:
---
Labels: hive-4.0.1-must  (was: )

> Upgrade jersey to version 1.19.4,
> -
>
> Key: HIVE-28360
> URL: https://issues.apache.org/jira/browse/HIVE-28360
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: lvyankui
>Assignee: lvyankui
>Priority: Major
>  Labels: hive-4.0.1-must
> Attachments: HIVE-28360.patch
>
>
> Hive version: 3.1.3
> Hadoop version: 3.3.6
> After upgrading to Hadoop 3.3.6, the Hive WebHCat server fails to start 
> because of inconsistent versions of the Jersey JAR package. Hive HCat lacks 
> the jersey-server-1.19 jar.
>  
> After upgrading to Hadoop 3.3.5+, Hadoop updates jersey to version 
> 1.19.4, which is inconsistent with the jersey version 
> in the Hive WebHCat server. As a result, the startup fails. To resolve this, 
> manually download a package and place it in 
> /usr/lib/hive-hcatalog/share/webhcat/svr/lib/
> Therefore, when packaging Hive, we need to specify the version of Jersey in 
> the Hive POM file to match the version of Jersey in Hadoop  to avoid version 
> conflicts.
>  
> Here is the error log 
> INFO  | 18 Jul 2024 14:37:13,237 | org.eclipse.jetty.server.Server | 
> jetty-9.4.53.v20231009; built: 2023-10-09T12:29:09.265Z; git: 
> 27bde00a0b95a1d5bbee0eae7984f891d2d0f8c9; jvm 1.8.0_412-b08
> WARN  | 18 Jul 2024 14:37:13,326 | 
> org.eclipse.jetty.server.handler.ContextHandler.ROOT | unavailable
> com.sun.jersey.api.container.ContainerException: No WebApplication provider 
> is present
>         at 
> com.sun.jersey.spi.container.WebApplicationFactory.createWebApplication(WebApplicationFactory.java:69)
>  ~[jersey-server-1.19.4.jar:1.19.4]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.create(ServletContainer.java:412)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.create(ServletContainer.java:327)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:603) 
> ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:207) 
> ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:394)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:577)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at javax.servlet.GenericServlet.init(GenericServlet.java:244) 
> ~[javax.servlet-api-3.1.0.jar:3.1.0]
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28350) Drop remote database succeeds but fails while deleting data under

2024-07-23 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28350.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fix has been merged to master. Thank you [~hemanth619] for the PR, and 
[~zhangbutao] for the review!

> Drop remote database succeeds but fails while deleting data under
> -
>
> Key: HIVE-28350
> URL: https://issues.apache.org/jira/browse/HIVE-28350
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> The drop remote database operation succeeds but fails towards the end while 
> clearing data under the database's location because, while fetching the 
> database object via JDO, we don't seem to set the 'locationUri' field.
> {code:java}
> > drop database pg_hive_tests;
> INFO  : Compiling 
> command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86): 
> drop database pg_hive_tests
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86); 
> Time taken: 0.115 seconds
> INFO  : Executing 
> command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86): 
> drop database pg_hive_tests
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:java.lang.IllegalArgumentException: Can not create a 
> Path from a null string)
>     at org.apache.hadoop.hive.ql.metadata.Hive.dropDatabase(Hive.java:716) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:51)
>  ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:356) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:329) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:813) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:550) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:544) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_232]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_232]
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>  ~[hadoop-common-3.1.1.7.2.18.0-641.jar:?]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at 
> 
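The root cause above (building a Path from a null locationUri) could be guarded as in this hypothetical sketch; the helper name and parameters are assumptions, not the actual Hive code.

```java
// Hypothetical sketch: decide whether database data can be deleted before
// constructing a Path, so a missing locationUri does not blow up with
// "Can not create a Path from a null string". Names are assumptions.
public class DropDbSketch {
    static boolean shouldDeleteData(String locationUri, boolean deleteData) {
        // Guard against a database object fetched via JDO that has no
        // location set; skip the data-deletion step instead of failing.
        return deleteData && locationUri != null && !locationUri.isEmpty();
    }
}
```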

[jira] [Resolved] (HIVE-28315) Missing classes while using hive jdbc standalone jar

2024-07-19 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28315.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fix has been pushed to master. Thank you [~dkuzmenko] for the review!

> Missing classes while using hive jdbc standalone jar
> 
>
> Key: HIVE-28315
> URL: https://issues.apache.org/jira/browse/HIVE-28315
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28372) No need to update partitions stats when renaming table

2024-07-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28372:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> No need to update partitions stats when renaming table
> --
>
> Key: HIVE-28372
> URL: https://issues.apache.org/jira/browse/HIVE-28372
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> After HIVE-27725, we no longer need to update partition stats when renaming a 
> table, as the table name & db name are not part of the partition stats.
> This change can speed up the rename operation for a partitioned table when 
> many partition stats are stored in HMS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28372) No need to update partitions stats when renaming table

2024-07-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28372.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fix has been merged. Thank you for the PR [~zhangbutao]!

> No need to update partitions stats when renaming table
> --
>
> Key: HIVE-28372
> URL: https://issues.apache.org/jira/browse/HIVE-28372
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> After HIVE-27725, we no longer need to update partition stats when renaming a 
> table, as the table name & db name are not part of the partition stats.
> This change can speed up the rename operation for a partitioned table when 
> many partition stats are stored in HMS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28360) Upgrade jersey to version 1.19.4,

2024-07-18 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866961#comment-17866961
 ] 

Zhihua Deng commented on HIVE-28360:


Sorry, I just noticed there is already a PR here...

> Upgrade jersey to version 1.19.4,
> -
>
> Key: HIVE-28360
> URL: https://issues.apache.org/jira/browse/HIVE-28360
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: lvyankui
>Assignee: lvyankui
>Priority: Major
> Attachments: HIVE-28360.patch
>
>
> Hive version: 3.1.3
> Hadoop version: 3.3.5
> After upgrading to Hadoop 3.3.5, the Hive WebHCat server fails to start 
> because of inconsistent versions of the Jersey JAR package. Hive HCat lacks 
> the jersey-server-1.19 jar.
>  
> After upgrading to Hadoop 3.3.5+, Hadoop updates jersey to version 
> 1.19.4, which is inconsistent with the jersey version 
> in the Hive WebHCat server. As a result, the startup fails. To resolve this, 
> manually download a package and place it in 
> /usr/lib/hive-hcatalog/share/webhcat/svr/lib/
> Therefore, when packaging Hive, we need to specify the version of Jersey in 
> the Hive POM file to match the version of Jersey in Hadoop  to avoid version 
> conflicts.
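A minimal sketch of that idea in the Hive POM follows; the `jersey.version` property name and the pinned artifact list are assumptions, not the actual Hive build files.

```xml
<!-- Sketch: pin Jersey to the version Hadoop 3.3.5+ ships so WebHCat and
     Hadoop agree. Property name and artifact list are assumptions. -->
<properties>
  <jersey.version>1.19.4</jersey.version>
</properties>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.sun.jersey</groupId>
      <artifactId>jersey-server</artifactId>
      <version>${jersey.version}</version>
    </dependency>
    <dependency>
      <groupId>com.sun.jersey</groupId>
      <artifactId>jersey-servlet</artifactId>
      <version>${jersey.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```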



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26537) Deprecate older APIs in the HMS

2024-07-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-26537:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Deprecate older APIs in the HMS
> ---
>
> Key: HIVE-26537
> URL: https://issues.apache.org/jira/browse/HIVE-26537
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> This Jira is to track the clean-up work (deprecating older APIs and pointing 
> the HMS client to the newer APIs) in the Hive metastore server.
> More details will be added here soon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26018) The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR

2024-07-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-26018:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR
> ---
>
> Key: HIVE-26018
> URL: https://issues.apache.org/jira/browse/HIVE-26018
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0, 4.0.0
>Reporter: GuangMing Lu
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and 
> the result is not correct. For example:
> CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
> CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;
> insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
> insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');
> SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE  
> T2_n1x b (b.key);
> Hive on Tez result: wrong
> |a.key  |b.key  |
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> |NULL   |ddd    |
> Hive on MR result: right
> |a.key  |b.key  |
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);
> Hive on Tez result: wrong
> |a.key  |b.key  |
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> |NULL   |ddd    |
> Hive on MR result: right
> |a.key  |b.key  |
> |aaa    |aaa    |
> |ccc    |ccc    |
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28352) Schematool fails to upgradeSchema on dbType=hive

2024-07-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28352:
---
Labels: hive-4.0.1-merged hive-4.0.1-must  (was: hive-4.0.1-must)

> Schematool fails to upgradeSchema on dbType=hive
> 
>
> Key: HIVE-28352
> URL: https://issues.apache.org/jira/browse/HIVE-28352
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must
> Fix For: 4.1.0
>
>
> Schematool tries to refer to incorrect file names.
> {code:java}
> $ schematool -metaDbType derby -dbType hive -initSchemaTo 3.0.0 -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> $ schematool -metaDbType derby -dbType hive -upgradeSchema -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> ...
> Completed upgrade-3.0.0-to-3.1.0.hive.sql
> Upgrade script upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Upgrade 
> FAILED! Metastore state would be inconsistent !!
> Upgrade FAILED! Metastore state would be inconsistent !!
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: 
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Use 
> --verbose for detailed stacktrace.
> Use --verbose for detailed stacktrace.
> 2024-06-27T01:41:46,573 ERROR [main] schematool.MetastoreSchemaTool: *** 
> schemaTool failed ***
> *** schemaTool failed *** {code}
> The expected file name is upgrade-3.1.0-to-4.0.0-alpha-1.hive.sql; the tool 
> looks for the name with the ".hive" suffix doubled.
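The failure in the log comes from the dbType suffix appearing twice in the script name (upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql). A hypothetical sketch of the intended name construction follows; the class and method names are assumptions, not the actual MetastoreSchemaTool code.

```java
// Hypothetical sketch: build the upgrade script name with the dbType
// suffix appended exactly once. Names are assumptions, not the actual
// schematool code.
public class UpgradeScriptName {
    static String scriptName(String fromVersion, String toVersion, String dbType) {
        return "upgrade-" + fromVersion + "-to-" + toVersion + "." + dbType + ".sql";
    }
}
```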



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28338) Client connection count is not correct in HiveMetaStore#close

2024-07-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28338:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Client connection count is not correct in HiveMetaStore#close
> -
>
> Key: HIVE-28338
> URL: https://issues.apache.org/jira/browse/HIVE-28338
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-24349 introduced a bug in {{HiveMetaStoreClient}} for embedded 
> metastore, where the log would print negative connection counts.
> *Root Cause*
> The connection count is only tracked for a remote metastore; we do not need 
> to decrease it when the transport is null.
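The fix described above can be sketched as follows; the class and field names are hypothetical stand-ins, not the actual HiveMetaStoreClient code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: only track the connection count when a real
// (remote) transport was opened, so an embedded client never drives the
// counter negative on close(). Names are assumptions.
public class ConnCountSketch {
    static final AtomicInteger CONN_COUNT = new AtomicInteger();

    Object transport; // stays null for an embedded metastore

    void open(boolean remote) {
        if (remote) {
            transport = new Object();
            CONN_COUNT.incrementAndGet();
        }
    }

    void close() {
        // Guard: an embedded client never incremented the counter, so
        // decrementing here would log negative connection counts.
        if (transport != null) {
            CONN_COUNT.decrementAndGet();
            transport = null;
        }
    }
}
```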



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28369) LLAP proactive eviction fails with NullPointerException

2024-07-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28369:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> LLAP proactive eviction fails with NullPointerException
> ---
>
> Key: HIVE-28369
> URL: https://issues.apache.org/jira/browse/HIVE-28369
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> When hive.llap.io.encode.enabled is false, LLAP proactive eviction fails with 
> NullPointerException as follows:
> {code:java}
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.evictEntity(LlapIoImpl.java:313)
>  ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.evictEntity(LlapProtocolServerImpl.java:365)
>  ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapManagementProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:33214)
>  ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
>  ~[hadoop-common-3.3.6.jar:?]
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
>  ~[hadoop-common-3.3.6.jar:?]
> ...{code}
>  
> In fact, the 3 caches used by LlapIoImpl.evictEntity() may be null or may 
> throw UnsupportedOperationException, so we should check whether it is safe 
> to call markBuffersForProactiveEviction() on each of them.
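A null-safe version of that eviction loop might look like this sketch; the cache interface and method name are assumptions modeled on the stack trace above, not the actual LlapIoImpl code.

```java
// Hypothetical sketch: skip caches that are absent (e.g. when
// hive.llap.io.encode.enabled is false) or that do not support proactive
// eviction, instead of dereferencing them and throwing NPE.
public class EvictSketch {
    interface Cache {
        long markBuffersForProactiveEviction(String entity);
    }

    static long evictEntity(String entity, Cache... caches) {
        long evicted = 0;
        for (Cache c : caches) {
            if (c == null) {
                continue; // cache not initialized in this configuration
            }
            try {
                evicted += c.markBuffersForProactiveEviction(entity);
            } catch (UnsupportedOperationException e) {
                // this cache implementation cannot proactively evict
            }
        }
        return evicted;
    }
}
```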



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28368) Iceberg: Unable to read PARTITIONS Metadata table

2024-07-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28368:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Iceberg: Unable to read PARTITIONS Metadata table
> -
>
> Key: HIVE-28368
> URL: https://issues.apache.org/jira/browse/HIVE-28368
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> Fails with
> {noformat}
> Caused by: java.lang.ClassCastException: java.time.LocalDateTime cannot be 
> cast to java.time.OffsetDateTime
>   at 
> org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampWithZoneObjectInspectorHive3.getPrimitiveJavaObject(IcebergTimestampWithZoneObjectInspectorHive3.java:60)
>   at 
> org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampWithZoneObjectInspectorHive3.getPrimitiveWritableObject(IcebergTimestampWithZoneObjectInspectorHive3.java:67)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:313)
>  
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
>  
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28308) The module `org.apache.hive:hive-jdbc:4.0.0` unintuitively removed all dependencies under the Compile Scope

2024-07-17 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866925#comment-17866925
 ] 

Zhihua Deng commented on HIVE-28308:


I checked org.apache.hive:hive-jdbc versions 3.1.3 and 2.3.3; they don't add 
any extra dependencies, while the {{standalone}} jar has many transitive 
dependencies. I wonder whether the dependencies shown on the Maven page come 
from the standalone hive-jdbc; that would not be expected. If you use the slim 
hive-jdbc, you need to be careful about which dependencies to import yourself.

> The module `org.apache.hive:hive-jdbc:4.0.0` unintuitively removed all 
> dependencies under the Compile Scope
> ---
>
> Key: HIVE-28308
> URL: https://issues.apache.org/jira/browse/HIVE-28308
> Project: Hive
>  Issue Type: Bug
>Reporter: Qiheng He
>Priority: Major
>
> - The module *org.apache.hive:hive-jdbc:4.0.0* unintuitively removed all 
> dependencies under the Compile Scope.
>  - Comparing the dependencies listed at 
> [https://central.sonatype.com/artifact/org.apache.hive/hive-jdbc/4.0.0/dependencies]
>  with those at 
> [https://central.sonatype.com/artifact/org.apache.hive/hive-jdbc/3.1.3/dependencies]
>  , it becomes apparent that *org.apache.hive:hive-jdbc:4.0.0* includes only 
> test dependencies. This results in the need to manually add additional 
> dependencies when utilizing the HiveServer2 JDBC Driver.
>  - 
> {code:xml}
> <dependency>
>   <groupId>org.apache.hive</groupId>
>   <artifactId>hive-jdbc</artifactId>
>   <version>4.0.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.hive</groupId>
>   <artifactId>hive-service</artifactId>
>   <version>4.0.0</version>
>   <exclusions>
>     <exclusion>
>       <groupId>org.apache.logging.log4j</groupId>
>       <artifactId>log4j-slf4j-impl</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.slf4j</groupId>
>       <artifactId>slf4j-reload4j</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.logging.log4j</groupId>
>       <artifactId>log4j-web</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.logging.log4j</groupId>
>       <artifactId>log4j-core</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.logging.log4j</groupId>
>       <artifactId>log4j-api</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.logging.log4j</groupId>
>       <artifactId>log4j-1.2-api</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-client-runtime</artifactId>
>   <version>3.3.6</version>
> </dependency>
> {code}
> - Earlier investigations come from 
> https://github.com/apache/shardingSphere/pull/31526 and 
> https://github.com/dbeaver/dbeaver/issues/22777 . I personally think this is 
> not reasonable behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28360) Upgrade jersey to version 1.19.4,

2024-07-17 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866914#comment-17866914
 ] 

Zhihua Deng commented on HIVE-28360:


Also, if possible, please post the startup failure message to the Jira to help 
reviewers better understand what the problem is.

> Upgrade jersey to version 1.19.4,
> -
>
> Key: HIVE-28360
> URL: https://issues.apache.org/jira/browse/HIVE-28360
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: lvyankui
>Assignee: lvyankui
>Priority: Major
> Attachments: HIVE-28360.patch
>
>
> Hive version: 3.1.3
> Hadoop version: 3.3.5
> After upgrading to Hadoop 3.3.5, the Hive WebHCat server fails to start 
> because of inconsistent versions of the Jersey JAR package. Hive HCat lacks 
> the jersey-server-1.19 jar.
>  
> After upgrading to Hadoop 3.3.5+, Hadoop updates jersey to version 
> 1.19.4, which is inconsistent with the jersey version 
> in the Hive WebHCat server. As a result, the startup fails. To resolve this, 
> manually download a package and place it in 
> /usr/lib/hive-hcatalog/share/webhcat/svr/lib/
> Therefore, when packaging Hive, we need to specify the version of Jersey in 
> the Hive POM file to match the version of Jersey in Hadoop  to avoid version 
> conflicts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28360) Upgrade jersey to version 1.19.4,

2024-07-17 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866913#comment-17866913
 ] 

Zhihua Deng commented on HIVE-28360:


Hi [~lvyankui], could you please open a PR for the fix so it can be reviewed 
and tested? 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-SubmittingaPR

> Upgrade jersey to version 1.19.4,
> -
>
> Key: HIVE-28360
> URL: https://issues.apache.org/jira/browse/HIVE-28360
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: lvyankui
>Assignee: lvyankui
>Priority: Major
> Attachments: HIVE-28360.patch
>
>
> Hive version: 3.1.3
> Hadoop version: 3.3.5
> After upgrading to Hadoop 3.3.5, the Hive WebHCat server fails to start 
> because of inconsistent versions of the Jersey JAR package. Hive HCat lacks 
> the jersey-server-1.19 jar.
>  
> After upgrading to Hadoop 3.3.5+, Hadoop updates jersey to version 
> 1.19.4, which is inconsistent with the jersey version 
> in the Hive WebHCat server. As a result, the startup fails. To resolve this, 
> manually download a package and place it in 
> /usr/lib/hive-hcatalog/share/webhcat/svr/lib/
> Therefore, when packaging Hive, we need to specify the version of Jersey in 
> the Hive POM file to match the version of Jersey in Hadoop  to avoid version 
> conflicts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28338) Client connection count is not correct in HiveMetaStore#close

2024-07-17 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28338:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Client connection count is not correct in HiveMetaStore#close
> -
>
> Key: HIVE-28338
> URL: https://issues.apache.org/jira/browse/HIVE-28338
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-24349 introduced a bug in {{HiveMetaStoreClient}} for embedded 
> metastore, where the log would print negative connection counts.
> *Root Cause*
> The connection count is only used for the remote metastore; we do not need to 
> decrease it when the transport is null.
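The guarded decrement described above can be sketched as follows. This is a minimal illustration, not Hive's actual code: the names (ConnCountSketch, Client, connCount, transport) are hypothetical, and a null transport stands in for the embedded-metastore case.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ConnCountSketch {
    static final AtomicInteger connCount = new AtomicInteger();

    static final class Client {
        final Object transport; // null for an embedded metastore, non-null for remote

        Client(Object transport) {
            this.transport = transport;
            if (transport != null) {
                connCount.incrementAndGet(); // only remote connections are counted
            }
        }

        void close() {
            // The fix direction: decrement only when a remote transport was
            // actually opened, so an embedded client can never drive the
            // count negative, even on repeated close() calls.
            if (transport != null) {
                connCount.decrementAndGet();
            }
        }
    }

    public static void main(String[] args) {
        Client embedded = new Client(null);
        embedded.close();
        embedded.close(); // double close of an embedded client: count stays 0
        Client remote = new Client(new Object());
        remote.close();
        System.out.println(connCount.get()); // prints 0
    }
}
```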





[jira] [Resolved] (HIVE-26537) Deprecate older APIs in the HMS

2024-07-17 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-26537.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fix has been merged into master! Thank you for the PR [~hemanth619]!

> Deprecate older APIs in the HMS
> ---
>
> Key: HIVE-26537
> URL: https://issues.apache.org/jira/browse/HIVE-26537
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> This Jira is to track the clean-up(deprecate older APIs and point the HMS 
> client to the newer APIs) work in the hive metastore server.
> More details will be added here soon.





[jira] [Resolved] (HIVE-28316) The documentation provides an ambiguous explanation regarding the mutually exclusive nature of `STORED BY` and `STORED AS`

2024-07-10 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28316.

Fix Version/s: Not Applicable
   Resolution: Fixed

Updated the document. Thank you for the issue report, [~linghengqian]!

> The documentation provides an ambiguous explanation regarding the mutually 
> exclusive nature of `STORED BY` and `STORED AS`
> --
>
> Key: HIVE-28316
> URL: https://issues.apache.org/jira/browse/HIVE-28316
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Qiheng He
>Assignee: Zhihua Deng
>Priority: Major
> Fix For: Not Applicable
>
>
> - The documentation provides an ambiguous explanation regarding the mutually 
> exclusive nature of {*}STORED BY{*} and {*}STORED AS{*}.
> - As mentioned on 
> https://cwiki.apache.org/confluence/display/Hive/StorageHandlers , when the 
> {*}CREATE TABLE{*} statement specifies {*}STORED BY{*}, it should not also 
> specify {*}STORED AS{*}. The content in question is as follows.
> {code:bash}
> When STORED BY is specified, then row_format (DELIMITED or SERDE) and STORED 
> AS cannot be specified. Optional SERDEPROPERTIES can be specified as part of 
> the STORED BY clause and will be passed to the serde provided by the storage 
> handler.
> See CREATE TABLE and Row Format, Storage Format, and SerDe for more 
> information.
> Example:
> CREATE TABLE hbase_table_1(key int, value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = "cf:string",
> "hbase.table.name" = "hbase_table_0"
> );
> {code}
> - This is similarly reflected in the documentation at 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL , where 
> {*}|{*} separates {*}STORED BY{*} from {*}STORED AS{*}, indicating their 
> distinct usage and mutual exclusivity.
> {code:bash}
> [
>[ROW FORMAT row_format] 
>[STORED AS file_format]
>  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  
> -- (Note: Available in Hive 0.6.0 and later)
> ]
> {code}
> - However, this contradicts the information provided in the Hive-Iceberg 
> Integration documentation at 
> https://cwiki.apache.org/confluence/display/Hive/Hive-Iceberg+Integration , 
> which explicitly gives examples demonstrating that {*}STORED BY{*} can 
> coexist with {*}STORED AS{*}. This creates an ambiguous interpretation. 
> {code:bash}
> The iceberg table currently supports three file formats: PARQUET, ORC & AVRO. 
> The default file format is Parquet. The file format can be explicitily 
> provided by using STORED AS  while creating the table
> Example-1:
> CREATE TABLE ORC_TABLE (ID INT) STORED BY ICEBERG STORED AS ORC;
> {code}
> - Further early discussions on this topic can be found at 
> https://github.com/apache/shardingsphere/pull/31526 .





[jira] [Commented] (HIVE-28338) Client connection count is not correct in HiveMetaStore#close

2024-06-28 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860919#comment-17860919
 ] 

Zhihua Deng commented on HIVE-28338:


Fix has been merged. Thank you for the contribution, [~wechar]!

> Client connection count is not correct in HiveMetaStore#close
> -
>
> Key: HIVE-28338
> URL: https://issues.apache.org/jira/browse/HIVE-28338
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-24349 introduced a bug in {{HiveMetaStoreClient}} for embedded 
> metastore, where the log would print negative connection counts.
> *Root Cause*
> The connection count is only used for the remote metastore; we do not need to 
> decrease it when the transport is null.





[jira] [Resolved] (HIVE-28338) Client connection count is not correct in HiveMetaStore#close

2024-06-28 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28338.

Fix Version/s: 4.1.0
   Resolution: Fixed

> Client connection count is not correct in HiveMetaStore#close
> -
>
> Key: HIVE-28338
> URL: https://issues.apache.org/jira/browse/HIVE-28338
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-24349 introduced a bug in {{HiveMetaStoreClient}} for embedded 
> metastore, where the log would print negative connection counts.
> *Root Cause*
> The connection count is only used for the remote metastore; we do not need to 
> decrease it when the transport is null.





[jira] [Commented] (HIVE-28352) Schematool fails to upgradeSchema on dbType=hive

2024-06-27 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860366#comment-17860366
 ] 

Zhihua Deng commented on HIVE-28352:


Thank you [~okumin] for the issue and PR, marked this Jira with 4.0.1-must.

> Schematool fails to upgradeSchema on dbType=hive
> 
>
> Key: HIVE-28352
> URL: https://issues.apache.org/jira/browse/HIVE-28352
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must
>
> Schematool tries to refer to incorrect file names.
> {code:java}
> $ schematool -metaDbType derby -dbType hive -initSchemaTo 3.0.0 -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> $ schematool -metaDbType derby -dbType hive -upgradeSchema -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> ...
> Completed upgrade-3.0.0-to-3.1.0.hive.sql
> Upgrade script upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Upgrade 
> FAILED! Metastore state would be inconsistent !!
> Upgrade FAILED! Metastore state would be inconsistent !!
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: 
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Use 
> --verbose for detailed stacktrace.
> Use --verbose for detailed stacktrace.
> 2024-06-27T01:41:46,573 ERROR [main] schematool.MetastoreSchemaTool: *** 
> schemaTool failed ***
> *** schemaTool failed *** {code}
>  
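The doubled {{.hive.hive.sql}} in the log above suggests the upgrade-script name is assembled from a version step that already carries the db-type suffix. A hypothetical sketch of the naming logic and a guard against the duplication (names are illustrative, not Hive's actual code):

```java
public class UpgradeScriptName {
    // Builds "upgrade-<from>-to-<to>.<dbType>.sql", appending the db-type
    // suffix exactly once even if the target version string already ends
    // with it (which is what yields names like "...hive.hive.sql").
    static String scriptFor(String fromVersion, String toVersion, String dbType) {
        String base = "upgrade-" + fromVersion + "-to-" + toVersion;
        if (base.endsWith("." + dbType)) {
            return base + ".sql";
        }
        return base + "." + dbType + ".sql";
    }

    public static void main(String[] args) {
        System.out.println(scriptFor("3.1.0", "4.0.0-alpha-1", "hive"));
        // prints upgrade-3.1.0-to-4.0.0-alpha-1.hive.sql
    }
}
```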





[jira] [Updated] (HIVE-28352) Schematool fails to upgradeSchema on dbType=hive

2024-06-27 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28352:
---
Labels: hive-4.0.1-must  (was: )

> Schematool fails to upgradeSchema on dbType=hive
> 
>
> Key: HIVE-28352
> URL: https://issues.apache.org/jira/browse/HIVE-28352
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must
>
> Schematool tries to refer to incorrect file names.
> {code:java}
> $ schematool -metaDbType derby -dbType hive -initSchemaTo 3.0.0 -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> $ schematool -metaDbType derby -dbType hive -upgradeSchema -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> ...
> Completed upgrade-3.0.0-to-3.1.0.hive.sql
> Upgrade script upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Upgrade 
> FAILED! Metastore state would be inconsistent !!
> Upgrade FAILED! Metastore state would be inconsistent !!
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: 
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Use 
> --verbose for detailed stacktrace.
> Use --verbose for detailed stacktrace.
> 2024-06-27T01:41:46,573 ERROR [main] schematool.MetastoreSchemaTool: *** 
> schemaTool failed ***
> *** schemaTool failed *** {code}
>  





[jira] [Resolved] (HIVE-28205) Implement direct sql for get_partitions_ps_with_auth api

2024-06-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28205.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fix has been merged. Thank you [~wechar] for the PR, and [~zhangbutao] for the 
review!

> Implement direct sql for get_partitions_ps_with_auth api
> 
>
> Key: HIVE-28205
> URL: https://issues.apache.org/jira/browse/HIVE-28205
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Commented] (HIVE-28317) Support HiveServer2 JDBC Driver's `initFile` parameter to directly read SQL files on the classpath

2024-06-15 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855271#comment-17855271
 ] 

Zhihua Deng commented on HIVE-28317:


Do other databases support this use case? Does {*}classpath{*} refer to the 
current directory where the process is started?

> Support HiveServer2 JDBC Driver's `initFile` parameter to directly read SQL 
> files on the classpath
> --
>
> Key: HIVE-28317
> URL: https://issues.apache.org/jira/browse/HIVE-28317
> Project: Hive
>  Issue Type: Wish
>Reporter: Qiheng He
>Priority: Major
>
> - Support HiveServer2 JDBC Driver's {*}initFile{*} parameter to directly read 
> SQL files on the classpath.
> - In https://issues.apache.org/jira/browse/HIVE-5867 , the {*}initFile{*} 
> parameter of HiveServer2 JDBC Driver is supported to read SQL files on 
> absolute paths. However, this jdbcUrl parameter does not natively support 
> paths on the {*}classpath{*}. This necessitates executing additional steps if 
> one needs to read a file located on the {*}classpath{*}.
> {code:java}
> import com.zaxxer.hikari.HikariConfig;
> import com.zaxxer.hikari.HikariDataSource;
> import org.awaitility.Awaitility;
> import org.junit.jupiter.api.Test;
> import java.nio.file.Paths;
> import java.time.Duration;
> public class ExampleTest {
> @Test
> void test() {
> String absolutePath = 
> Paths.get("src/test/resources/test-sql/test-databases-hive.sql")
> .toAbsolutePath().normalize().toString();
> HikariConfig config = new HikariConfig();
> config.setDriverClassName("org.apache.hive.jdbc.HiveDriver");
> config.setJdbcUrl("jdbc:hive2://localhost:1;initFile=" + 
> absolutePath);
> try (HikariDataSource hikariDataSource = new 
> HikariDataSource(config)) {
> 
> Awaitility.await().atMost(Duration.ofMinutes(1L)).ignoreExceptions().until(() 
> -> {
> hikariDataSource.getConnection().close();
> return true;
> });
> }
> }
> }
> {code}
> - It would greatly facilitate integration testing scenarios, such as those 
> with {*}testcontainers-java{*}, if the {*}initFile{*} parameter could 
> recognize classpath files by identifying a {*}classpath{*} prefix. For 
> instance, a JDBC URL like 
> {*}jdbc:hive2://localhost:1;initFile=classpath:test-sql/test-databases-hive.sql{*}
>  would become much more useful.
> - Further early discussions on this topic can be found at 
> https://github.com/apache/shardingsphere/pull/31526 .
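The proposed {*}classpath:{*} prefix could be resolved roughly as sketched below. This is an editor's illustration under stated assumptions, not the driver's actual code: the class and method names are hypothetical, and classpath lookup goes through the context class loader while plain paths fall back to the filesystem.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class InitFileResolver {
    static final String CLASSPATH_PREFIX = "classpath:";

    static boolean isClasspath(String initFile) {
        return initFile.startsWith(CLASSPATH_PREFIX);
    }

    static String resourceName(String initFile) {
        return initFile.substring(CLASSPATH_PREFIX.length());
    }

    // Resolve the init file either from the classpath or from the filesystem.
    static InputStream open(String initFile) throws IOException {
        if (isClasspath(initFile)) {
            InputStream in = Thread.currentThread().getContextClassLoader()
                    .getResourceAsStream(resourceName(initFile));
            if (in == null) {
                throw new FileNotFoundException(
                        resourceName(initFile) + " not found on classpath");
            }
            return in;
        }
        return Files.newInputStream(Paths.get(initFile));
    }

    public static void main(String[] args) {
        String url = "classpath:test-sql/test-databases-hive.sql";
        System.out.println(isClasspath(url) + " " + resourceName(url));
        // prints: true test-sql/test-databases-hive.sql
    }
}
```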





[jira] [Commented] (HIVE-28316) The documentation provides an ambiguous explanation regarding the mutually exclusive nature of `STORED BY` and `STORED AS`

2024-06-15 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855270#comment-17855270
 ] 

Zhihua Deng commented on HIVE-28316:


The document is somewhat outdated; let me fix it. Thank you for pointing it 
out, [~linghengqian]!

> The documentation provides an ambiguous explanation regarding the mutually 
> exclusive nature of `STORED BY` and `STORED AS`
> --
>
> Key: HIVE-28316
> URL: https://issues.apache.org/jira/browse/HIVE-28316
> Project: Hive
>  Issue Type: Bug
>Reporter: Qiheng He
>Priority: Major
>
> - The documentation provides an ambiguous explanation regarding the mutually 
> exclusive nature of {*}STORED BY{*} and {*}STORED AS{*}.
> - As mentioned on 
> https://cwiki.apache.org/confluence/display/Hive/StorageHandlers , when the 
> {*}CREATE TABLE{*} statement specifies {*}STORED BY{*}, it should not also 
> specify {*}STORED AS{*}. The content in question is as follows.
> {code:bash}
> When STORED BY is specified, then row_format (DELIMITED or SERDE) and STORED 
> AS cannot be specified. Optional SERDEPROPERTIES can be specified as part of 
> the STORED BY clause and will be passed to the serde provided by the storage 
> handler.
> See CREATE TABLE and Row Format, Storage Format, and SerDe for more 
> information.
> Example:
> CREATE TABLE hbase_table_1(key int, value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = "cf:string",
> "hbase.table.name" = "hbase_table_0"
> );
> {code}
> - This is similarly reflected in the documentation at 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL , where 
> {*}|{*} separates {*}STORED BY{*} from {*}STORED AS{*}, indicating their 
> distinct usage and mutual exclusivity.
> {code:bash}
> [
>[ROW FORMAT row_format] 
>[STORED AS file_format]
>  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  
> -- (Note: Available in Hive 0.6.0 and later)
> ]
> {code}
> - However, this contradicts the information provided in the Hive-Iceberg 
> Integration documentation at 
> https://cwiki.apache.org/confluence/display/Hive/Hive-Iceberg+Integration , 
> which explicitly gives examples demonstrating that {*}STORED BY{*} can 
> coexist with {*}STORED AS{*}. This creates an ambiguous interpretation. 
> {code:bash}
> The iceberg table currently supports three file formats: PARQUET, ORC & AVRO. 
> The default file format is Parquet. The file format can be explicitily 
> provided by using STORED AS  while creating the table
> Example-1:
> CREATE TABLE ORC_TABLE (ID INT) STORED BY ICEBERG STORED AS ORC;
> {code}
> - Further early discussions on this topic can be found at 
> https://github.com/apache/shardingsphere/pull/31526 .





[jira] [Assigned] (HIVE-28316) The documentation provides an ambiguous explanation regarding the mutually exclusive nature of `STORED BY` and `STORED AS`

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-28316:
--

Assignee: Zhihua Deng

> The documentation provides an ambiguous explanation regarding the mutually 
> exclusive nature of `STORED BY` and `STORED AS`
> --
>
> Key: HIVE-28316
> URL: https://issues.apache.org/jira/browse/HIVE-28316
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Qiheng He
>Assignee: Zhihua Deng
>Priority: Major
>
> - The documentation provides an ambiguous explanation regarding the mutually 
> exclusive nature of {*}STORED BY{*} and {*}STORED AS{*}.
> - As mentioned on 
> https://cwiki.apache.org/confluence/display/Hive/StorageHandlers , when the 
> {*}CREATE TABLE{*} statement specifies {*}STORED BY{*}, it should not also 
> specify {*}STORED AS{*}. The content in question is as follows.
> {code:bash}
> When STORED BY is specified, then row_format (DELIMITED or SERDE) and STORED 
> AS cannot be specified. Optional SERDEPROPERTIES can be specified as part of 
> the STORED BY clause and will be passed to the serde provided by the storage 
> handler.
> See CREATE TABLE and Row Format, Storage Format, and SerDe for more 
> information.
> Example:
> CREATE TABLE hbase_table_1(key int, value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = "cf:string",
> "hbase.table.name" = "hbase_table_0"
> );
> {code}
> - This is similarly reflected in the documentation at 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL , where 
> {*}|{*} separates {*}STORED BY{*} from {*}STORED AS{*}, indicating their 
> distinct usage and mutual exclusivity.
> {code:bash}
> [
>[ROW FORMAT row_format] 
>[STORED AS file_format]
>  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  
> -- (Note: Available in Hive 0.6.0 and later)
> ]
> {code}
> - However, this contradicts the information provided in the Hive-Iceberg 
> Integration documentation at 
> https://cwiki.apache.org/confluence/display/Hive/Hive-Iceberg+Integration , 
> which explicitly gives examples demonstrating that {*}STORED BY{*} can 
> coexist with {*}STORED AS{*}. This creates an ambiguous interpretation. 
> {code:bash}
> The iceberg table currently supports three file formats: PARQUET, ORC & AVRO. 
> The default file format is Parquet. The file format can be explicitily 
> provided by using STORED AS  while creating the table
> Example-1:
> CREATE TABLE ORC_TABLE (ID INT) STORED BY ICEBERG STORED AS ORC;
> {code}
> - Further early discussions on this topic can be found at 
> https://github.com/apache/shardingsphere/pull/31526 .





[jira] [Updated] (HIVE-28316) The documentation provides an ambiguous explanation regarding the mutually exclusive nature of `STORED BY` and `STORED AS`

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28316:
---
Component/s: Documentation

> The documentation provides an ambiguous explanation regarding the mutually 
> exclusive nature of `STORED BY` and `STORED AS`
> --
>
> Key: HIVE-28316
> URL: https://issues.apache.org/jira/browse/HIVE-28316
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Qiheng He
>Priority: Major
>
> - The documentation provides an ambiguous explanation regarding the mutually 
> exclusive nature of {*}STORED BY{*} and {*}STORED AS{*}.
> - As mentioned on 
> https://cwiki.apache.org/confluence/display/Hive/StorageHandlers , when the 
> {*}CREATE TABLE{*} statement specifies {*}STORED BY{*}, it should not also 
> specify {*}STORED AS{*}. The content in question is as follows.
> {code:bash}
> When STORED BY is specified, then row_format (DELIMITED or SERDE) and STORED 
> AS cannot be specified. Optional SERDEPROPERTIES can be specified as part of 
> the STORED BY clause and will be passed to the serde provided by the storage 
> handler.
> See CREATE TABLE and Row Format, Storage Format, and SerDe for more 
> information.
> Example:
> CREATE TABLE hbase_table_1(key int, value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = "cf:string",
> "hbase.table.name" = "hbase_table_0"
> );
> {code}
> - This is similarly reflected in the documentation at 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL , where 
> {*}|{*} separates {*}STORED BY{*} from {*}STORED AS{*}, indicating their 
> distinct usage and mutual exclusivity.
> {code:bash}
> [
>[ROW FORMAT row_format] 
>[STORED AS file_format]
>  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  
> -- (Note: Available in Hive 0.6.0 and later)
> ]
> {code}
> - However, this contradicts the information provided in the Hive-Iceberg 
> Integration documentation at 
> https://cwiki.apache.org/confluence/display/Hive/Hive-Iceberg+Integration , 
> which explicitly gives examples demonstrating that {*}STORED BY{*} can 
> coexist with {*}STORED AS{*}. This creates an ambiguous interpretation. 
> {code:bash}
> The iceberg table currently supports three file formats: PARQUET, ORC & AVRO. 
> The default file format is Parquet. The file format can be explicitily 
> provided by using STORED AS  while creating the table
> Example-1:
> CREATE TABLE ORC_TABLE (ID INT) STORED BY ICEBERG STORED AS ORC;
> {code}
> - Further early discussions on this topic can be found at 
> https://github.com/apache/shardingsphere/pull/31526 .





[jira] [Commented] (HIVE-28308) The module `org.apache.hive:hive-jdbc:4.0.0` unintuitively removed all dependencies under the Compile Scope

2024-06-15 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855269#comment-17855269
 ] 

Zhihua Deng commented on HIVE-28308:


[~linghengqian] Thank you for the input.

Regarding "Furthermore, the Uber JAR can complicate building GraalVM Native 
Images, necessitating additional GraalVM Reachability Metadata due to its 
inclusive nature."

Could you please give more details on why the uber jar could cause trouble for 
the build? Is it because of its large size? Apart from class conflicts, is 
there any other reason why org.apache.hive:hive-jdbc:4.0.0 is preferable 
to the standalone jar?

Thanks,

Zhihua

> The module `org.apache.hive:hive-jdbc:4.0.0` unintuitively removed all 
> dependencies under the Compile Scope
> ---
>
> Key: HIVE-28308
> URL: https://issues.apache.org/jira/browse/HIVE-28308
> Project: Hive
>  Issue Type: Bug
>Reporter: Qiheng He
>Priority: Major
>
> - The module *org.apache.hive:hive-jdbc:4.0.0* unintuitively removed all 
> dependencies under the Compile Scope.
>  - Comparing the dependencies listed at 
> [https://central.sonatype.com/artifact/org.apache.hive/hive-jdbc/4.0.0/dependencies]
>  with those at 
> [https://central.sonatype.com/artifact/org.apache.hive/hive-jdbc/3.1.3/dependencies]
>  , it becomes apparent that *org.apache.hive:hive-jdbc:4.0.0* includes only 
> test dependencies. This results in the need to manually add additional 
> dependencies when utilizing the HiveServer2 JDBC Driver.
>  - 
> {code:xml}
> <dependency>
>   <groupId>org.apache.hive</groupId>
>   <artifactId>hive-jdbc</artifactId>
>   <version>4.0.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.hive</groupId>
>   <artifactId>hive-service</artifactId>
>   <version>4.0.0</version>
>   <exclusions>
>     <exclusion>
>       <groupId>org.apache.logging.log4j</groupId>
>       <artifactId>log4j-slf4j-impl</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.slf4j</groupId>
>       <artifactId>slf4j-reload4j</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.logging.log4j</groupId>
>       <artifactId>log4j-web</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.logging.log4j</groupId>
>       <artifactId>log4j-core</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.logging.log4j</groupId>
>       <artifactId>log4j-api</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.logging.log4j</groupId>
>       <artifactId>log4j-1.2-api</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-client-runtime</artifactId>
>   <version>3.3.6</version>
> </dependency>
> {code}
>  - More early surveys come from 
> https://github.com/apache/shardingSphere/pull/31526 and 
> https://github.com/dbeaver/dbeaver/issues/22777 . I personally think this is 
> not a reasonable phenomenon.





[jira] [Updated] (HIVE-28278) Iceberg: Stats: IllegalStateException Invalid file: file length 0

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28278:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Iceberg: Stats: IllegalStateException Invalid file: file length 0
> -
>
> Key: HIVE-28278
> URL: https://issues.apache.org/jira/browse/HIVE-28278
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> Bug fix: this can happen when the stats file was already created but the stats 
> object has not yet been written, and someone tries to read it.
> Why are the changes needed?
> {code}
> ERROR : FAILED: IllegalStateException Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> java.lang.IllegalStateException: Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> {code}





[jira] [Assigned] (HIVE-28315) Missing classes while using hive jdbc standalone jar

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-28315:
--

Assignee: Zhihua Deng

> Missing classes while using hive jdbc standalone jar
> 
>
> Key: HIVE-28315
> URL: https://issues.apache.org/jira/browse/HIVE-28315
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-must
>






[jira] [Updated] (HIVE-28270) Fix missing partition paths bug on drop_database

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28270:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Fix missing partition paths  bug on drop_database
> -
>
> Key: HIVE-28270
> URL: https://issues.apache.org/jira/browse/HIVE-28270
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> In {{HMSHandler#drop_database_core}}, it needs to collect all partition paths 
> that are not in a subdirectory of the table path, but currently it only 
> fetches the last batch of paths.





[jira] [Updated] (HIVE-28271) DirectSql fails for AlterPartitions

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28271:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> DirectSql fails for AlterPartitions
> ---
>
> Key: HIVE-28271
> URL: https://issues.apache.org/jira/browse/HIVE-28271
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> It fails at three places (it misses databases which use CLOB, and Boolean 
> type conversion checks are missing):
> *First:*
> {noformat}
> 2024-05-21T08:50:16,570  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.lang.ClassCastException: org.apache.derby.impl.jdbc.EmbedClob cannot be 
> cast to java.lang.String at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.getParams(DirectSqlUpdatePart.java:748)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateParamTableInBatch(DirectSqlUpdatePart.java:715)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:636)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);
> {noformat}
> *Second:*
> {noformat}
> 2024-05-21T09:14:36,808  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.lang.ClassCastException: org.apache.derby.impl.jdbc.EmbedClob cannot be 
> cast to java.lang.String at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateCDInBatch(DirectSqlUpdatePart.java:1228)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateStorageDescriptorInBatch(DirectSqlUpdatePart.java:888)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:638)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);{noformat}
> *Third: Missing Boolean check type*
> {noformat}
> 2024-05-21T09:35:44,063  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.sql.BatchUpdateException: A truncation error was encountered trying to 
> shrink CHAR 'false' to length 1. at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.lambda$updateSDInBatch$16(DirectSqlUpdatePart.java:926)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateWithStatement(DirectSqlUpdatePart.java:656)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateSDInBatch(DirectSqlUpdatePart.java:926)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateStorageDescriptorInBatch(DirectSqlUpdatePart.java:900)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:638)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);
> {noformat}
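The third trace's truncation error (shrinking CHAR 'false' to length 1) is what happens when a boolean is serialized via its toString() into a CHAR(1) column. A sketch of a single-character encoding; the 'Y'/'N' convention and helper names are assumptions for illustration, not Hive's actual scheme:

```java
public class BooleanFlagDemo {
    // Boolean.toString() produces "false" (5 chars), which cannot fit a
    // CHAR(1) column; a single-character flag does.
    static String toDbFlag(boolean value) {
        return value ? "Y" : "N";
    }

    static boolean fromDbFlag(String flag) {
        return "Y".equals(flag);
    }

    public static void main(String[] args) {
        System.out.println(Boolean.toString(false).length()); // 5: the truncation culprit
        System.out.println(toDbFlag(false));                  // N: fits CHAR(1)
        System.out.println(fromDbFlag(toDbFlag(true)));       // true
    }
}
```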



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28260) CreateTableEvent wrongly skips authorizing DFS_URI for managed table

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28260:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> CreateTableEvent wrongly skips authorizing DFS_URI for managed table 
> -
>
> Key: HIVE-28260
> URL: https://issues.apache.org/jira/browse/HIVE-28260
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-27525 relaxed permissions for external tables, but an incorrect 
> managed-table check relaxed them for managed tables as well.





[jira] [Updated] (HIVE-28211) Restore hive-exec-core jar

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28211:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Restore hive-exec-core jar
> --
>
> Key: HIVE-28211
> URL: https://issues.apache.org/jira/browse/HIVE-28211
> Project: Hive
>  Issue Type: Task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> The hive-exec-core jar is used by Spark, Oozie, Hudi and many other projects. 
> Removal of the hive-exec-core jar has caused the following issues.
> Spark : [https://lists.apache.org/list?d...@hive.apache.org:lte=1M:joda]
> Oozie: [https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg]
> Hudi: [apache/hudi#8147|https://github.com/apache/hudi/issues/8147]
> Until we shade & relocate dependencies in hive-exec, we should restore 
> the hive-exec-core jar.





[jira] [Updated] (HIVE-28254) CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28254:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results
> ---
>
> Key: HIVE-28254
> URL: https://issues.apache.org/jira/browse/HIVE-28254
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> CBO return path can build incorrect GroupByOperator when multiple 
> aggregations with DISTINCT are involved.
> This is an example.
> {code:java}
> CREATE TABLE test (col1 INT, col2 INT);
> INSERT INTO test VALUES (1, 100), (2, 200), (2, 200), (3, 300);
> set hive.cbo.returnpath.hiveop=true;
> set hive.map.aggr=false;
> SELECT
>   SUM(DISTINCT col1),
>   COUNT(DISTINCT col1),
>   SUM(DISTINCT col2),
>   SUM(col2)
> FROM test;{code}
> The last column should be 800, but the SUM actually refers to col1, so the 
> result is 8.
> {code:java}
> +--+--+--+--+
> | _c0  | _c1  | _c2  | _c3  |
> +--+--+--+--+
> | 6    | 3    | 600  | 8    |
> +--+--+--+--+ {code}
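Working the aggregates out by hand over the four sample rows confirms both the expected value of the last column (800) and the value produced when the SUM mistakenly reads col1 (8). A standalone sketch, not Hive code:

```java
import java.util.List;

public class DistinctAggDemo {
    // Sample rows from the report: (col1, col2)
    static final List<int[]> ROWS = List.of(
        new int[]{1, 100}, new int[]{2, 200}, new int[]{2, 200}, new int[]{3, 300});

    static long sumDistinctCol(int idx) {
        return ROWS.stream().mapToInt(r -> r[idx]).distinct().sum();
    }

    static long countDistinctCol(int idx) {
        return ROWS.stream().mapToInt(r -> r[idx]).distinct().count();
    }

    static long sumCol(int idx) {
        return ROWS.stream().mapToInt(r -> r[idx]).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumDistinctCol(0));   // 6   = SUM(DISTINCT col1)
        System.out.println(countDistinctCol(0)); // 3   = COUNT(DISTINCT col1)
        System.out.println(sumDistinctCol(1));   // 600 = SUM(DISTINCT col2)
        System.out.println(sumCol(1));           // 800 = SUM(col2), the expected _c3
        System.out.println(sumCol(0));           // 8   = what the buggy plan computes
    }
}
```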
>  





[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28202:
---
Labels: hive-4.0.1-merged hive-4.0.1-must performance 
pull-request-available  (was: hive-4.0.1-must performance 
pull-request-available)

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, performance, 
> pull-request-available
> Fix For: 4.1.0
>
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for the struct 
> type and now includes the struct's subtypes. This caused an issue in Hive: the 
> root struct index is always "included", so the size is estimated for the 
> complete schema rather than just the selected columns, leading to incorrect 
> split estimations.
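A toy model of the estimation change (the schema, sizes, and helper names are all illustrative, not ORC's API): once an included struct pulls in its subtypes, an always-included root inflates the estimate from the selected column to the whole schema.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ProjectedSizeDemo {
    // Toy schema: column id -> child ids. Id 0 is the root struct.
    static final Map<Integer, List<Integer>> CHILDREN = Map.of(
        0, List.of(1, 2, 3), 1, List.of(), 2, List.of(), 3, List.of());
    // Raw data size of each leaf column.
    static final Map<Integer, Long> LEAF_SIZE = Map.of(1, 100L, 2, 200L, 3, 400L);

    // Old behavior: count exactly the included leaf columns.
    static long sizeOfIncludedLeaves(Set<Integer> included) {
        return included.stream().filter(LEAF_SIZE::containsKey)
                .mapToLong(LEAF_SIZE::get).sum();
    }

    // New behavior: an included id contributes its whole subtree, so the
    // always-included root struct contributes the full schema.
    static long sizeWithSubtypes(Set<Integer> included) {
        long total = 0;
        for (int id : included) total += subtreeSize(id);
        return total;
    }

    static long subtreeSize(int id) {
        long total = LEAF_SIZE.getOrDefault(id, 0L);
        for (int child : CHILDREN.getOrDefault(id, List.of())) total += subtreeSize(child);
        return total;
    }

    public static void main(String[] args) {
        Set<Integer> included = Set.of(0, 1); // root struct + one selected column
        System.out.println(sizeOfIncludedLeaves(included)); // 100: just the column
        System.out.println(sizeWithSubtypes(included));     // 800: full schema (700) + column again
    }
}
```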





[jira] [Updated] (HIVE-28190) Materialized view rebuild lock heart-beating is broken

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28190:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Materialized view rebuild lock heart-beating is broken
> --
>
> Key: HIVE-28190
> URL: https://issues.apache.org/jira/browse/HIVE-28190
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Zsolt Miskolczi
>Assignee: Zsolt Miskolczi
>Priority: Critical
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> It fails with the following error: 
> {code:java}
> org.springframework.dao.InvalidDataAccessApiUsageException: SQL [UPDATE 
> "MATERIALIZATION_REBUILD_LOCKS" SET "MRL_LAST_HEARTBEAT" = 1712571919559 
> WHERE "MRL_TXN_ID" = 2297 AND "MRL_DB_NAME" = ? AND "MRL_TBL_NAME" = ?]: 
> given 2 parameters but expected 0 {code}
> We didn't spot it earlier because when the heartbeat of a materialized view 
> fails with an error, it doesn't affect the rebuild query run, so it can only 
> be spotted by actively watching the logs.
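The "given 2 parameters but expected 0" error indicates a statement that inlines some values (the heartbeat timestamp and txn id) while still carrying two `?` placeholders for the bound ones. A sketch contrasting the mixed form with a fully parameterized one; this is illustrative only, not the actual Hive fix:

```java
public class HeartbeatSqlDemo {
    // Count JDBC '?' placeholders (naive: ignores quoted literals).
    static int countPlaceholders(String sql) {
        int n = 0;
        for (char c : sql.toCharArray()) if (c == '?') n++;
        return n;
    }

    public static void main(String[] args) {
        // Shape of the failing statement: two values inlined, two bound.
        String mixed = "UPDATE \"MATERIALIZATION_REBUILD_LOCKS\" "
            + "SET \"MRL_LAST_HEARTBEAT\" = 1712571919559 "
            + "WHERE \"MRL_TXN_ID\" = 2297 AND \"MRL_DB_NAME\" = ? AND \"MRL_TBL_NAME\" = ?";
        // Fully parameterized alternative: every runtime value is bound.
        String bound = "UPDATE \"MATERIALIZATION_REBUILD_LOCKS\" "
            + "SET \"MRL_LAST_HEARTBEAT\" = ? "
            + "WHERE \"MRL_TXN_ID\" = ? AND \"MRL_DB_NAME\" = ? AND \"MRL_TBL_NAME\" = ?";
        System.out.println(countPlaceholders(mixed)); // 2
        System.out.println(countPlaceholders(bound)); // 4
    }
}
```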





[jira] [Updated] (HIVE-28166) Iceberg: Truncate on branch operates on the main table

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28166:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Iceberg: Truncate on branch operates on the main table
> --
>
> Key: HIVE-28166
> URL: https://issues.apache.org/jira/browse/HIVE-28166
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> A truncate operation on an Iceberg table branch operates on the main table 
> instead of the specified branch.





[jira] [Updated] (HIVE-28158) Add ASF license header in non-java files

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28158:
---
Labels: hive-4.0.1-merged hive-4.0.1-must  (was: hive-4.0.1-must)

> Add ASF license header in non-java files
> 
>
> Key: HIVE-28158
> URL: https://issues.apache.org/jira/browse/HIVE-28158
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must
> Fix For: 4.1.0
>
>
> According to the [ASF policy|https://www.apache.org/legal/src-headers.html] 
> all source files should contain an ASF header. Currently there are a lot of 
> source files that do not contain the ASF header. The files can be broken into 
> the following categories:
> *Must have:*
>  * Python files (.py)
>  * Bash/Shell script files (.sh)
>  * Javascript files (.js)
> *Should have:*
>  * Maven files (pom.xml)
>  * GitHub workflows and Docker files (.yml)
> *Good to have:*
>  * Hive/Tez/Yarn and other configuration files (.xml)
>  * Log4J property files (.properties)
>  * Markdown files (.md)
> *Could have but OK if they don't:*
>  * Data files for tests (data/files/**)
>  * Generated code files (src/gen)
>  * QTest input/output files (.q, .q.out)
>  * IntelliJ files (.idea)
>  * Other txt and data files
> The changes here aim to address the first three categories (must, should, 
> good) and add the missing header when possible.
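A quick way to find files missing the header is to scan the first lines of each file for the standard ASF marker. A sketch; the 20-line window and the marker string are assumptions, not part of the ticket:

```java
import java.util.List;

public class LicenseHeaderCheck {
    // First line of the standard ASF source header.
    static final String MARKER = "Licensed to the Apache Software Foundation";

    // True if the file's leading lines contain the ASF marker.
    static boolean hasAsfHeader(List<String> lines) {
        return lines.stream().limit(20).anyMatch(l -> l.contains(MARKER));
    }

    public static void main(String[] args) {
        System.out.println(hasAsfHeader(List.of(
            "#!/bin/sh",
            "# Licensed to the Apache Software Foundation (ASF) under one"))); // true
        System.out.println(hasAsfHeader(List.of("echo no header here")));      // false
    }
}
```

In practice this check would run over `Files.walk` of the repository, filtered by the extensions listed above (.py, .sh, .js, pom.xml, .yml).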





[jira] [Updated] (HIVE-28098) Fails to copy empty column statistics of materialized CTE

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28098:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Fails to copy empty column statistics of materialized CTE
> -
>
> Key: HIVE-28098
> URL: https://issues.apache.org/jira/browse/HIVE-28098
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-28080 introduced the optimization of materialized CTEs, but it turned 
> out that it failed when statistics were empty.
> This query reproduces the issue.
> {code:java}
> set hive.stats.autogather=false;
> CREATE TABLE src_no_stats AS SELECT '123' as key, 'val123' as value UNION ALL 
> SELECT '9' as key, 'val9' as value;
> set hive.optimize.cte.materialize.threshold=2;
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> EXPLAIN WITH materialized_cte1 AS (
>   SELECT * FROM src_no_stats
> ),
> materialized_cte2 AS (
>   SELECT a.key
>   FROM materialized_cte1 a
>   JOIN materialized_cte1 b ON (a.key = b.key)
> )
> SELECT a.key
> FROM materialized_cte2 a
> JOIN materialized_cte2 b ON (a.key = b.key); {code}
> It throws an error.
> {code:java}
> Error: Error while compiling statement: FAILED: IllegalStateException The 
> size of col stats must be equal to that of schema. Stats = [], Schema = [key] 
> (state=42000,code=4) {code}
> Attaching a debugger, FSO of materialized_cte2 has empty stats as 
> JoinOperator loses stats.





[jira] [Updated] (HIVE-28134) Improve SecureCmdDoAs

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28134:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Improve SecureCmdDoAs
> -
>
> Key: HIVE-28134
> URL: https://issues.apache.org/jira/browse/HIVE-28134
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> Improve the SecureCmdDoAs code





[jira] [Updated] (HIVE-28161) Incorrect Copyright years in META-INF/NOTICE files

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28161:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Incorrect Copyright years in META-INF/NOTICE files
> --
>
> Key: HIVE-28161
> URL: https://issues.apache.org/jira/browse/HIVE-28161
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> The generated META-INF/NOTICE file which resides inside each jar produced by 
> Hive has incorrect copyright years.
> Inside all jars the NOTICE file has the following incorrect content:
> {noformat}
> Copyright 2020 The Apache Software Foundation
> {noformat}
> The Copyright statement should include the timespan from the inception of the 
> project to now.
> {noformat}
> Copyright 2008-2024 The Apache Software Foundation
> {noformat}
> The problem can be easily seen by inspecting the jar content after building 
> the module or checking the previously published jars in Maven central.
> {noformat}
> mvn clean install -DskipTests -pl common/
> jar xf common/target/hive-common-4.1.0-SNAPSHOT.jar META-INF
> cat META-INF/NOTICE 
> Hive Common
> Copyright 2020 The Apache Software Foundation
> This product includes software developed at
> The Apache Software Foundation (http://www.apache.org/).
> {noformat}





[jira] [Updated] (HIVE-27847) Prevent query Failures on Numeric <-> Timestamp

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-27847:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

>  Prevent query Failures on Numeric <-> Timestamp
> 
>
> Key: HIVE-27847
> URL: https://issues.apache.org/jira/browse/HIVE-27847
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
> Environment: master
> 4.0.0-alpha-1
>Reporter: Basapuram Kumar
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
> Attachments: HIVE-27847.patch
>
>
> In the master/4.0.0-alpha-1 branches, the Numeric to Timestamp conversion 
> fails with the error 
> "{color:#de350b}org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited 
> (hive.strict.timestamp.conversion){color}".
>  
> *Repro steps.*
>  # Sample data
> {noformat}
> $ hdfs dfs -cat /tmp/tc/t.csv
> 1653209895687,2022-05-22T15:58:15.931+07:00
> 1653209938316,2022-05-22T15:58:58.490+07:00
> 1653209962021,2022-05-22T15:59:22.191+07:00
> 1653210021993,2022-05-22T16:00:22.174+07:00
> 1653209890524,2022-05-22T15:58:10.724+07:00
> 1653210095382,2022-05-22T16:01:35.775+07:00
> 1653210044308,2022-05-22T16:00:44.683+07:00
> 1653210098546,2022-05-22T16:01:38.886+07:00
> 1653210012220,2022-05-22T16:00:12.394+07:00
> 165321376,2022-05-22T16:00:00.622+07:00{noformat}
>  # table with above data [1]
> {noformat}
> create external table   test_ts_conv(begin string, ts string) row format 
> delimited fields terminated by ',' stored as TEXTFILE LOCATION '/tmp/tc/';
> desc   test_ts_conv;
> | col_name  | data_type  | comment  |
> +---++--+
> | begin     | string     |          |
> | ts        | string     |          |
> +---++--+{noformat}
>  #  Create table with CTAS
> {noformat}
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set 
> hive.strict.timestamp.conversion;
> +-+
> |                   set                   |
> +-+
> | hive.strict.timestamp.conversion=true  |
> +-+
> set to false
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set 
> hive.strict.timestamp.conversion=false;
> +-+
> |                   set                   |
> +-+
> | hive.strict.timestamp.conversion=false  |
> +-+
> #Query:
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> 
> CREATE TABLE t_date 
> AS 
> select
>   CAST( CAST( `begin` AS BIGINT) / 1000  AS TIMESTAMP ) `begin`, 
>   CAST( 
> DATE_FORMAT(CAST(regexp_replace(`ts`,'(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})\\+(\\d{2}):(\\d{2})','$1-$2-$3
>  $4:$5:$6.$7') AS TIMESTAMP ),'MMdd') as BIGINT ) `par_key`
> FROM    test_ts_conv;{noformat}
> Error:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited (hive.strict.timestamp.conversion)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFTimestamp.initialize(GenericUDFTimestamp.java:91)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149)
>     at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:184)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
>     at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:74)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:508)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:314)
>     ... 17 more {code}
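For reference, the epoch-millisecond conversion the CTAS performs with CAST(CAST(`begin` AS BIGINT) / 1000 AS TIMESTAMP) can be reproduced in plain Java. A sketch that consumes the milliseconds directly and pins the zone to UTC for determinism; Hive's own timestamp semantics depend on the session, so treat this as an approximation:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class EpochToTimestampDemo {
    // Rough equivalent of CAST(CAST(begin AS BIGINT) / 1000 AS TIMESTAMP),
    // interpreting the instant in UTC.
    static LocalDateTime toTimestamp(long epochMillis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        // First value from the sample data: 2022-05-22T15:58:15+07:00 == 08:58:15 UTC.
        System.out.println(toTimestamp(1653209895687L)); // 2022-05-22T08:58:15.687
    }
}
```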




[jira] [Updated] (HIVE-28082) HiveAggregateReduceFunctionsRule could generate an inconsistent result

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28082:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> HiveAggregateReduceFunctionsRule could generate an inconsistent result
> --
>
> Key: HIVE-28082
> URL: https://issues.apache.org/jira/browse/HIVE-28082
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> HiveAggregateReduceFunctionsRule translates AVG, STDDEV_POP, STDDEV_SAMP, 
> VAR_POP, and VAR_SAMP. Those UDFs accept string types and try to decode them 
> as floating point values. It is possible that undecodable values exist.
> We found that it could cause inconsistent behaviors with or without CBO.
> {code:java}
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> SELECT AVG('text');
> ...
> +--+
> | _c0  |
> +--+
> | 0.0  |
> +--+
> 1 row selected (18.229 seconds)
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> set hive.cbo.enable=false;
> No rows affected (0.013 seconds)
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> SELECT AVG('text');
> ...
> +---+
> |  _c0  |
> +---+
> | NULL  |
> +---+ {code}
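One simplified model of the divergence (a modeled assumption, not Hive's actual code paths): if the rewritten SUM/COUNT plan coerces an undecodable string to 0.0 while plain AVG treats it as NULL, the two results above fall out directly.

```java
import java.util.Arrays;
import java.util.List;

public class AvgRewriteDemo {
    static Double parseOrNull(String s) {
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null; // undecodable string, e.g. "text"
        }
    }

    // Direct AVG: NULL inputs are skipped; all-NULL input yields NULL.
    static Double avg(List<String> vals) {
        double sum = 0;
        long n = 0;
        for (String v : vals) {
            Double d = parseOrNull(v);
            if (d != null) { sum += d; n++; }
        }
        return n == 0 ? null : sum / n;
    }

    // Modeled rewrite: undecodable input coerced to 0.0 and still counted.
    static Double avgRewritten(List<String> vals) {
        double sum = 0;
        long n = 0;
        for (String v : vals) {
            Double d = parseOrNull(v);
            sum += (d == null ? 0.0 : d);
            n++;
        }
        return n == 0 ? null : sum / n;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("text");
        System.out.println(avg(input));          // null (CBO disabled)
        System.out.println(avgRewritten(input)); // 0.0  (CBO rewrite)
    }
}
```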





[jira] [Updated] (HIVE-28204) Remove some HMS obsolete scripts

2024-06-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28204:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Remove some HMS obsolete scripts
> 
>
> Key: HIVE-28204
> URL: https://issues.apache.org/jira/browse/HIVE-28204
> Project: Hive
>  Issue Type: Task
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> As Hive 1.x has reached end of life, the scripts for HMS metadata need to 
> be removed from the repository and the packaged tarball; however, it's better 
> to keep the scripts used for upgrading from 1.x and for the tests.





[jira] [Resolved] (HIVE-28204) Remove some HMS obsolete scripts

2024-06-10 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28204.

Fix Version/s: 4.1.0
   Resolution: Fixed

> Remove some HMS obsolete scripts
> 
>
> Key: HIVE-28204
> URL: https://issues.apache.org/jira/browse/HIVE-28204
> Project: Hive
>  Issue Type: Task
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> As Hive 1.x has reached end of life, the scripts for HMS metadata need to 
> be removed from the repository and the packaged tarball; however, it's better 
> to keep the scripts used for upgrading from 1.x and for the tests.





[jira] [Commented] (HIVE-28308) The module `org.apache.hive:hive-jdbc:4.0.0` unintuitively removed all dependencies under the Compile Scope

2024-06-10 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853584#comment-17853584
 ] 

Zhihua Deng commented on HIVE-28308:


I think org.apache.hive:hive-jdbc:4.0.0 behaves as expected. In your case, 
please try
{code:xml}
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <classifier>standalone</classifier>
    <version>4.0.0</version>
</dependency>
{code}
With the standalone jar there is no need to add extra dependencies, though I'm 
not sure it works well with the embedded HS2.
However, some classes are missing when using this standalone jar; I filed 
HIVE-28315 to address the problem.

> The module `org.apache.hive:hive-jdbc:4.0.0` unintuitively removed all 
> dependencies under the Compile Scope
> ---
>
> Key: HIVE-28308
> URL: https://issues.apache.org/jira/browse/HIVE-28308
> Project: Hive
>  Issue Type: Bug
>Reporter: Qiheng He
>Priority: Major
>
> - The module *org.apache.hive:hive-jdbc:4.0.0* unintuitively removed all 
> dependencies under the Compile Scope.
>  - Comparing the dependencies listed at 
> [https://central.sonatype.com/artifact/org.apache.hive/hive-jdbc/4.0.0/dependencies]
>  with those at 
> [https://central.sonatype.com/artifact/org.apache.hive/hive-jdbc/3.1.3/dependencies]
>  , it becomes apparent that *org.apache.hive:hive-jdbc:4.0.0* includes only 
> test dependencies. This results in the need to manually add additional 
> dependencies when utilizing the HiveServer2 JDBC Driver.
>  - 
> {code:xml}
> <dependency>
>     <groupId>org.apache.hive</groupId>
>     <artifactId>hive-jdbc</artifactId>
>     <version>4.0.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.hive</groupId>
>     <artifactId>hive-service</artifactId>
>     <version>4.0.0</version>
>     <exclusions>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-slf4j-impl</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.slf4j</groupId>
>             <artifactId>slf4j-reload4j</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-web</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-core</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-api</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-1.2-api</artifactId>
>         </exclusion>
>     </exclusions>
> </dependency>
> <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client-runtime</artifactId>
>     <version>3.3.6</version>
> </dependency>
> {code}
>  - Earlier investigations come from 
> https://github.com/apache/shardingSphere/pull/31526 and 
> https://github.com/dbeaver/dbeaver/issues/22777 . I personally don't think 
> this is reasonable behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28315) Missing classes while using hive jdbc standalone jar

2024-06-10 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28315:
---
Labels: hive-4.0.1-must  (was: )

> Missing classes while using hive jdbc standalone jar
> 
>
> Key: HIVE-28315
> URL: https://issues.apache.org/jira/browse/HIVE-28315
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-must
>






[jira] [Created] (HIVE-28315) Missing classes while using hive jdbc standalone jar

2024-06-10 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-28315:
--

 Summary: Missing classes while using hive jdbc standalone jar
 Key: HIVE-28315
 URL: https://issues.apache.org/jira/browse/HIVE-28315
 Project: Hive
  Issue Type: Improvement
Reporter: Zhihua Deng
Assignee: Zhihua Deng








[jira] [Assigned] (HIVE-28315) Missing classes while using hive jdbc standalone jar

2024-06-10 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-28315:
--

Assignee: (was: Zhihua Deng)

> Missing classes while using hive jdbc standalone jar
> 
>
> Key: HIVE-28315
> URL: https://issues.apache.org/jira/browse/HIVE-28315
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>






[jira] [Updated] (HIVE-28308) The module `org.apache.hive:hive-jdbc:4.0.0` unintuitively removed all dependencies under the Compile Scope

2024-06-06 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28308:
---
Labels:   (was: hive-4.0.1-must)

> The module `org.apache.hive:hive-jdbc:4.0.0` unintuitively removed all 
> dependencies under the Compile Scope
> ---
>
> Key: HIVE-28308
> URL: https://issues.apache.org/jira/browse/HIVE-28308
> Project: Hive
>  Issue Type: Bug
>Reporter: Qiheng He
>Priority: Major
>
> - The module *org.apache.hive:hive-jdbc:4.0.0* unintuitively removed all 
> dependencies under the Compile Scope.
>  - Comparing the dependencies listed at 
> [https://central.sonatype.com/artifact/org.apache.hive/hive-jdbc/4.0.0/dependencies]
>  with those at 
> [https://central.sonatype.com/artifact/org.apache.hive/hive-jdbc/3.1.3/dependencies]
>  , it becomes apparent that *org.apache.hive:hive-jdbc:4.0.0* includes only 
> test dependencies. This results in the need to manually add additional 
> dependencies when utilizing the HiveServer2 JDBC Driver.
>  - 
> {code:xml}
> <dependency>
>     <groupId>org.apache.hive</groupId>
>     <artifactId>hive-jdbc</artifactId>
>     <version>4.0.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.hive</groupId>
>     <artifactId>hive-service</artifactId>
>     <version>4.0.0</version>
>     <exclusions>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-slf4j-impl</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.slf4j</groupId>
>             <artifactId>slf4j-reload4j</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-web</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-core</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-api</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-1.2-api</artifactId>
>         </exclusion>
>     </exclusions>
> </dependency>
> <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client-runtime</artifactId>
>     <version>3.3.6</version>
> </dependency>
> {code}
>  - Earlier investigations come from 
> https://github.com/apache/shardingSphere/pull/31526 and 
> https://github.com/dbeaver/dbeaver/issues/22777 . I personally don't think 
> this is reasonable behavior.





[jira] [Updated] (HIVE-28308) The module `org.apache.hive:hive-jdbc:4.0.0` unintuitively removed all dependencies under the Compile Scope

2024-06-06 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28308:
---
Labels: hive-4.0.1-must  (was: )

> The module `org.apache.hive:hive-jdbc:4.0.0` unintuitively removed all 
> dependencies under the Compile Scope
> ---
>
> Key: HIVE-28308
> URL: https://issues.apache.org/jira/browse/HIVE-28308
> Project: Hive
>  Issue Type: Bug
>Reporter: Qiheng He
>Priority: Major
>  Labels: hive-4.0.1-must
>
> - The module *org.apache.hive:hive-jdbc:4.0.0* unintuitively removed all 
> dependencies under the Compile Scope.
>  - Comparing the dependencies listed at 
> [https://central.sonatype.com/artifact/org.apache.hive/hive-jdbc/4.0.0/dependencies]
>  with those at 
> [https://central.sonatype.com/artifact/org.apache.hive/hive-jdbc/3.1.3/dependencies]
>  , it becomes apparent that *org.apache.hive:hive-jdbc:4.0.0* includes only 
> test dependencies. This results in the need to manually add additional 
> dependencies when utilizing the HiveServer2 JDBC Driver.
>  - 
> {code:xml}
> <dependency>
>     <groupId>org.apache.hive</groupId>
>     <artifactId>hive-jdbc</artifactId>
>     <version>4.0.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.hive</groupId>
>     <artifactId>hive-service</artifactId>
>     <version>4.0.0</version>
>     <exclusions>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-slf4j-impl</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.slf4j</groupId>
>             <artifactId>slf4j-reload4j</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-web</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-core</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-api</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-1.2-api</artifactId>
>         </exclusion>
>     </exclusions>
> </dependency>
> <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client-runtime</artifactId>
>     <version>3.3.6</version>
> </dependency>
> {code}
>  - Earlier investigations come from 
> https://github.com/apache/shardingSphere/pull/31526 and 
> https://github.com/dbeaver/dbeaver/issues/22777 . I personally don't think 
> this is reasonable behavior.





[jira] [Assigned] (HIVE-26538) MetastoreDefaultTransformer should revise the location when it's empty

2024-06-06 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-26538:
--

Assignee: (was: Zhihua Deng)

> MetastoreDefaultTransformer should revise the location when it's empty
> --
>
> Key: HIVE-26538
> URL: https://issues.apache.org/jira/browse/HIVE-26538
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The table's location is treated as null when it's empty; this takes place 
> in places such as:
> [https://github.com/apache/hive/blob/82f319773cb2361a98963e861fb903ab8eecd9c4/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L2367]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java#L729]
> MetastoreDefaultTransformer should revise the empty location when 
> altering/creating tables.
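The proposed revision amounts to normalizing blank locations before default-location logic runs. A sketch with a hypothetical helper and default path, not the transformer's actual API:

```java
public class LocationNormalizer {
    // Hypothetical helper: treat null and blank locations uniformly so the
    // transformer can substitute a default warehouse path.
    static String reviseLocation(String location, String defaultLocation) {
        return (location == null || location.trim().isEmpty()) ? defaultLocation : location;
    }

    public static void main(String[] args) {
        System.out.println(reviseLocation("", "/warehouse/db/tbl"));             // /warehouse/db/tbl
        System.out.println(reviseLocation(null, "/warehouse/db/tbl"));           // /warehouse/db/tbl
        System.out.println(reviseLocation("/custom/path", "/warehouse/db/tbl")); // /custom/path
    }
}
```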





[jira] [Resolved] (HIVE-28286) Add filtering support for get_table_metas API in Hive metastore

2024-06-06 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28286.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fix has been merged! Thank you Naveen for the PR and everyone involved in the 
review!

> Add filtering support for get_table_metas API in Hive metastore
> ---
>
> Key: HIVE-28286
> URL: https://issues.apache.org/jira/browse/HIVE-28286
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Hive Metastore supports filtering objects through the plugin authorizer 
> for some APIs like getTables(), getDatabases(), getDataConnectors(), etc. 
> However, the same should be done for the get_table_metas() API call.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27847) Prevent query Failures on Numeric <-> Timestamp

2024-06-05 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-27847:
---
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Fix has been merged. Thank you [~basapuram.kumar] for the issue report and the 
original work, and [~okumin] for the review and contribution!

>  Prevent query Failures on Numeric <-> Timestamp
> 
>
> Key: HIVE-27847
> URL: https://issues.apache.org/jira/browse/HIVE-27847
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
> Environment: master
> 4.0.0-alpha-1
>Reporter: Basapuram Kumar
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
> Attachments: HIVE-27847.patch
>
>
> In the master/4.0.0-alpha-1 branches, the Numeric to Timestamp conversion 
> fails with the error 
> "{color:#de350b}org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited 
> (hive.strict.timestamp.conversion){color}".
>  
> *Repro steps.*
>  # Sample data
> {noformat}
> $ hdfs dfs -cat /tmp/tc/t.csv
> 1653209895687,2022-05-22T15:58:15.931+07:00
> 1653209938316,2022-05-22T15:58:58.490+07:00
> 1653209962021,2022-05-22T15:59:22.191+07:00
> 1653210021993,2022-05-22T16:00:22.174+07:00
> 1653209890524,2022-05-22T15:58:10.724+07:00
> 1653210095382,2022-05-22T16:01:35.775+07:00
> 1653210044308,2022-05-22T16:00:44.683+07:00
> 1653210098546,2022-05-22T16:01:38.886+07:00
> 1653210012220,2022-05-22T16:00:12.394+07:00
> 165321376,2022-05-22T16:00:00.622+07:00{noformat}
>  # table with above data [1]
> {noformat}
> create external table   test_ts_conv(begin string, ts string) row format 
> delimited fields terminated by ',' stored as TEXTFILE LOCATION '/tmp/tc/';
> desc   test_ts_conv;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | begin     | string     |          |
> | ts        | string     |          |
> +-----------+------------+----------+{noformat}
>  #  Create table with CTAS
> {noformat}
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set 
> hive.strict.timestamp.conversion;
> +-----------------------------------------+
> |                   set                   |
> +-----------------------------------------+
> | hive.strict.timestamp.conversion=true  |
> +-----------------------------------------+
> set to false
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set 
> hive.strict.timestamp.conversion=false;
> +-----------------------------------------+
> |                   set                   |
> +-----------------------------------------+
> | hive.strict.timestamp.conversion=false  |
> +-----------------------------------------+
> #Query:
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> 
> CREATE TABLE t_date 
> AS 
> select
>   CAST( CAST( `begin` AS BIGINT) / 1000  AS TIMESTAMP ) `begin`, 
>   CAST( 
> DATE_FORMAT(CAST(regexp_replace(`ts`,'(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})\\+(\\d{2}):(\\d{2})','$1-$2-$3
>  $4:$5:$6.$7') AS TIMESTAMP ),'MMdd') as BIGINT ) `par_key`
> FROM    test_ts_conv;{noformat}
> Error:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited (hive.strict.timestamp.conversion)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFTimestamp.initialize(GenericUDFTimestamp.java:91)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149)
>     at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:184)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
>     at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:74)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:508)
>     at 
> 

[jira] [Comment Edited] (HIVE-28293) TestGracefulStopHS2 runs too long

2024-06-03 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851550#comment-17851550
 ] 

Zhihua Deng edited comment on HIVE-28293 at 6/3/24 8:18 AM:


Sounds like the query keeps running after cancellation until it completes 
(Tez) in this case. Or could we remove this line?

[https://github.com/apache/hive/blob/8c90ec0ce576d6319470f7dc4dd27daebb654dec/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java#L420]


was (Author: dengzh):
Sounds like the query is still running on cancellation until it completes(Tez) 
in this case, or could we remove this line?

[https://github.com/apache/hive/blob/8c90ec0ce576d6319470f7dc4dd27daebb654dec/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java#L420]

 

 

> TestGracefulStopHS2 runs too long
> -
>
> Key: HIVE-28293
> URL: https://issues.apache.org/jira/browse/HIVE-28293
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: newbie
>
> it tests whether long queries finish successfully in the event of a 
> graceful shutdown; however, 10 minutes seems too long: 
> https://github.com/apache/hive/blob/8c90ec0ce576d6319470f7dc4dd27daebb654dec/itests/hive-unit/src/test/java/org/apache/hive/service/server/TestGracefulStopHS2.java#L85
> I'm pretty sure we can validate this within a few minutes
> {code}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hive.service.server.TestGracefulStopHS2
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 1,225.833 s - in org.apache.hive.service.server.TestGracefulStopHS2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28293) TestGracefulStopHS2 runs too long

2024-06-03 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851550#comment-17851550
 ] 

Zhihua Deng commented on HIVE-28293:


Sounds like the query keeps running after cancellation until it completes 
(Tez) in this case. Or could we remove this line?

[https://github.com/apache/hive/blob/8c90ec0ce576d6319470f7dc4dd27daebb654dec/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java#L420]

 

 

> TestGracefulStopHS2 runs too long
> -
>
> Key: HIVE-28293
> URL: https://issues.apache.org/jira/browse/HIVE-28293
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: newbie
>
> it tests whether long queries finish successfully in the event of a 
> graceful shutdown; however, 10 minutes seems too long: 
> https://github.com/apache/hive/blob/8c90ec0ce576d6319470f7dc4dd27daebb654dec/itests/hive-unit/src/test/java/org/apache/hive/service/server/TestGracefulStopHS2.java#L85
> I'm pretty sure we can validate this within a few minutes
> {code}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hive.service.server.TestGracefulStopHS2
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 1,225.833 s - in org.apache.hive.service.server.TestGracefulStopHS2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28293) TestGracefulStopHS2 runs too long

2024-06-02 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851517#comment-17851517
 ] 

Zhihua Deng commented on HIVE-28293:


The second query with 60ms is supposed to time out and fail, as the HS2 has 
been shut down in 60s:

https://github.com/apache/hive/blob/8c90ec0ce576d6319470f7dc4dd27daebb654dec/itests/hive-unit/src/test/java/org/apache/hive/service/server/TestGracefulStopHS2.java#L55

It shouldn't have been running for 10min when the shutdown is triggered.

Not sure why the test spends so much time; when I test on my local laptop, it 
finishes within one minute.

 

> TestGracefulStopHS2 runs too long
> -
>
> Key: HIVE-28293
> URL: https://issues.apache.org/jira/browse/HIVE-28293
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: newbie
>
> it tests whether long queries finish successfully in the event of a 
> graceful shutdown; however, 10 minutes seems too long: 
> https://github.com/apache/hive/blob/8c90ec0ce576d6319470f7dc4dd27daebb654dec/itests/hive-unit/src/test/java/org/apache/hive/service/server/TestGracefulStopHS2.java#L85
> I'm pretty sure we can validate this within a few minutes
> {code}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hive.service.server.TestGracefulStopHS2
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 1,225.833 s - in org.apache.hive.service.server.TestGracefulStopHS2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-28293) TestGracefulStopHS2 runs too long

2024-06-02 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851517#comment-17851517
 ] 

Zhihua Deng edited comment on HIVE-28293 at 6/3/24 4:42 AM:


The second query with 60ms is supposed to time out and fail, as the HS2 has 
been shut down in 60s:

[https://github.com/apache/hive/blob/8c90ec0ce576d6319470f7dc4dd27daebb654dec/itests/hive-unit/src/test/java/org/apache/hive/service/server/TestGracefulStopHS2.java#L55]

It shouldn't have been running for 10min when the shutdown is triggered.

Not sure why the test spends so much time; when I test on my local laptop, it 
finishes within one minute.


was (Author: dengzh):
The second query with 60ms is supposed to timeout and failed as the HS2 has 
been shutdown in 60s,

https://github.com/apache/hive/blob/8c90ec0ce576d6319470f7dc4dd27daebb654dec/itests/hive-unit/src/test/java/org/apache/hive/service/server/TestGracefulStopHS2.java#L55

it shouldn't have been running for 10min when the shutdown is triggered.

Not sure why the test spends so much time, when I test on my local laptop, it 
finishes within one minutes.

 

> TestGracefulStopHS2 runs too long
> -
>
> Key: HIVE-28293
> URL: https://issues.apache.org/jira/browse/HIVE-28293
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: newbie
>
> it tests whether long queries finish successfully in the event of a 
> graceful shutdown; however, 10 minutes seems too long: 
> https://github.com/apache/hive/blob/8c90ec0ce576d6319470f7dc4dd27daebb654dec/itests/hive-unit/src/test/java/org/apache/hive/service/server/TestGracefulStopHS2.java#L85
> I'm pretty sure we can validate this within a few minutes
> {code}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hive.service.server.TestGracefulStopHS2
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 1,225.833 s - in org.apache.hive.service.server.TestGracefulStopHS2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28293) TestGracefulStopHS2 runs too long

2024-06-02 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-28293:
--

Assignee: Zhihua Deng

> TestGracefulStopHS2 runs too long
> -
>
> Key: HIVE-28293
> URL: https://issues.apache.org/jira/browse/HIVE-28293
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: newbie
>
> it tests whether long queries finish successfully in the event of a 
> graceful shutdown; however, 10 minutes seems too long: 
> https://github.com/apache/hive/blob/8c90ec0ce576d6319470f7dc4dd27daebb654dec/itests/hive-unit/src/test/java/org/apache/hive/service/server/TestGracefulStopHS2.java#L85
> I'm pretty sure we can validate this within a few minutes
> {code}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hive.service.server.TestGracefulStopHS2
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 1,225.833 s - in org.apache.hive.service.server.TestGracefulStopHS2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28263) Metastore scripts : Update query getting stuck when sub-query of in-clause is returning empty results

2024-05-31 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28263.

Fix Version/s: Not Applicable
   Resolution: Not A Bug

I tested the query against the target DB; changing the left join to an inner 
join solves the problem, which is not the case upstream. Closing the Jira.

> Metastore scripts : Update query getting stuck when sub-query of in-clause is 
> returning empty results
> -
>
> Key: HIVE-28263
> URL: https://issues.apache.org/jira/browse/HIVE-28263
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Taraka Rama Rao Lethavadla
>Priority: Major
> Fix For: Not Applicable
>
>
> As part of fix HIVE-27457
> below query is added to 
> [upgrade-4.0.0-alpha-2-to-4.0.0-beta-1.mysql.sql|https://github.com/apache/hive/blob/0e84fe2000c026afd0a49f4e7c7dd5f54fe7b1ec/standalone-metastore/metastore-server/src/main/sql/mysql/upgrade-4.0.0-alpha-2-to-4.0.0-beta-1.mysql.sql#L43]
> {noformat}
> UPDATE SERDES
> SET SERDES.SLIB = "org.apache.hadoop.hive.kudu.KuduSerDe"
> WHERE SERDE_ID IN (
> SELECT SDS.SERDE_ID
> FROM TBLS
> INNER JOIN SDS ON TBLS.SD_ID = SDS.SD_ID
> WHERE TBLS.TBL_ID IN (SELECT TBL_ID FROM TABLE_PARAMS WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%')
> );{noformat}
> This query hangs in MySQL when the sub-query returns empty results
>  
>  
> {noformat}
> MariaDB [test]> SELECT TBL_ID FROM table_params WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%';
> Empty set (0.33 sec)
> MariaDB [test]> SELECT sds.SERDE_ID FROM tbls LEFT JOIN sds ON tbls.SD_ID = 
> sds.SD_ID WHERE tbls.TBL_ID IN (SELECT TBL_ID FROM table_params WHERE 
> PARAM_VALUE LIKE '%KuduStorageHandler%');
> Empty set (0.44 sec)
> {noformat}
> And the query kept on running for more than 20 minutes
> {noformat}
> MariaDB [test]> UPDATE serdes SET serdes.SLIB = 
> "org.apache.hadoop.hive.kudu.KuduSerDe" WHERE SERDE_ID IN ( SELECT 
> sds.SERDE_ID FROM tbls LEFT JOIN sds ON tbls.SD_ID = sds.SD_ID WHERE 
> tbls.TBL_ID IN (SELECT TBL_ID FROM table_params WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%'));
> ^CCtrl-C -- query killed. Continuing normally.
> ERROR 1317 (70100): Query execution was interrupted{noformat}
> The explain extended looks like
> {noformat}
> MariaDB [test]> explain extended UPDATE serdes SET serdes.SLIB = 
> "org.apache.hadoop.hive.kudu.KuduSerDe" WHERE SERDE_ID IN ( SELECT 
> sds.SERDE_ID FROM tbls LEFT JOIN sds ON tbls.SD_ID = sds.SD_ID WHERE 
> tbls.TBL_ID IN (SELECT TBL_ID FROM table_params WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%'));
> +------+--------------------+--------------+--------+---------------------------+--------------+---------+-----------------+--------+----------+-------------+
> | id   | select_type        | table        | type   | possible_keys             | key          | key_len | ref             | rows   | filtered | Extra       |
> +------+--------------------+--------------+--------+---------------------------+--------------+---------+-----------------+--------+----------+-------------+
> |    1 | PRIMARY            | serdes       | index  | NULL                      | PRIMARY      | 8       | NULL            | 401267 |   100.00 | Using where |
> |    2 | DEPENDENT SUBQUERY | tbls         | index  | PRIMARY,TBLS_N50,TBLS_N49 | TBLS_N50     | 9       | NULL            |  50921 |   100.00 | Using index |
> |    2 | DEPENDENT SUBQUERY |              | eq_ref | distinct_key              | distinct_key | 8       | func            |      1 |   100.00 |             |
> |    2 | DEPENDENT SUBQUERY | sds          | eq_ref | PRIMARY                   | PRIMARY      | 8       | test.tbls.SD_ID |      1 |   100.00 | Using where |
> |    3 | MATERIALIZED       | table_params | ALL    | PRIMARY,TABLE_PARAMS_N49  | NULL         | NULL    | NULL            | 356593 |   100.00 | Using where |
> +------+--------------------+--------------+--------+---------------------------+--------------+---------+-----------------+--------+----------+-------------+
> 5 rows in set (0.00 sec){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26220) Shade & relocate dependencies in hive-exec to avoid conflicting with downstream projects

2024-05-19 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-26220:
---
Labels: pull-request-available  (was: hive-4.0.1-must 
pull-request-available)

> Shade & relocate dependencies in hive-exec to avoid conflicting with 
> downstream projects
> 
>
> Key: HIVE-26220
> URL: https://issues.apache.org/jira/browse/HIVE-26220
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 4.0.0-alpha-1
>Reporter: Chao Sun
>Priority: Blocker
>  Labels: pull-request-available
>
> Currently, projects like Spark, Trino/Presto, Iceberg, etc. depend on 
> {{hive-exec:core}}, which was removed in HIVE-25531. These projects 
> use {{hive-exec:core}} because it gives them the flexibility to exclude, shade 
> & relocate the dependencies in {{hive-exec}} that conflict with the ones they 
> bring in themselves. However, with {{hive-exec}} this is no longer 
> possible, since it is a fat jar that shades those dependencies but does not 
> relocate many of them.
> In order for the downstream projects to consume {{hive-exec}}, we will need 
> to make sure all the dependencies in {{hive-exec}} are properly shaded and 
> relocated, so they won't cause conflicts with those from downstream.
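A hedged sketch of what such relocation could look like with the maven-shade-plugin; the relocated packages below are illustrative examples only, not the actual relocation list Hive would ship:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Illustrative only: move bundled copies of common libraries under a
           Hive-private package prefix so they cannot clash with the
           consumer's own copies of the same libraries -->
      <relocation>
        <pattern>com.google.protobuf</pattern>
        <shadedPattern>org.apache.hive.com.google.protobuf</shadedPattern>
      </relocation>
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```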



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-28261) Update Hive version in Docker README

2024-05-15 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846550#comment-17846550
 ] 

Zhihua Deng edited comment on HIVE-28261 at 5/15/24 8:49 AM:
-

Thank you for the PR Mert! Feel free to change the assignee to you if the 
account is ready.


was (Author: dengzh):
Thank you for the PR Mert!. Feel free to change the assignee to you if the 
account is ready.

> Update Hive version in Docker README
> 
>
> Key: HIVE-28261
> URL: https://issues.apache.org/jira/browse/HIVE-28261
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
> Fix For: 4.1.0
>
>
> Add the updates on the page to the readme as quickstart shows.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28261) Update Hive version in Docker README

2024-05-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28261.

Fix Version/s: 4.1.0
   Resolution: Fixed

Thank you for the PR, Mert! Feel free to change the assignee to you if the 
account is ready.

> Update Hive version in Docker README
> 
>
> Key: HIVE-28261
> URL: https://issues.apache.org/jira/browse/HIVE-28261
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
> Fix For: 4.1.0
>
>
> Add the updates on the page to the readme as quickstart shows.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28261) Update Hive version in Docker README

2024-05-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28261:
---
Description: Add the updates on the page to the readme as quickstart shows.

> Update Hive version in Docker README
> 
>
> Key: HIVE-28261
> URL: https://issues.apache.org/jira/browse/HIVE-28261
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>
> Add the updates on the page to the readme as quickstart shows.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28261) Update Hive version in Docker README

2024-05-15 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-28261:
--

 Summary: Update Hive version in Docker README
 Key: HIVE-28261
 URL: https://issues.apache.org/jira/browse/HIVE-28261
 Project: Hive
  Issue Type: Improvement
Reporter: Zhihua Deng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28098) Fails to copy empty column statistics of materialized CTE

2024-05-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28098:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Fails to copy empty column statistics of materialized CTE
> -
>
> Key: HIVE-28098
> URL: https://issues.apache.org/jira/browse/HIVE-28098
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-28080 introduced the optimization of materialized CTEs, but it turned 
> out that it failed when statistics were empty.
> This query reproduces the issue.
> {code:java}
> set hive.stats.autogather=false;
> CREATE TABLE src_no_stats AS SELECT '123' as key, 'val123' as value UNION ALL 
> SELECT '9' as key, 'val9' as value;
> set hive.optimize.cte.materialize.threshold=2;
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> EXPLAIN WITH materialized_cte1 AS (
>   SELECT * FROM src_no_stats
> ),
> materialized_cte2 AS (
>   SELECT a.key
>   FROM materialized_cte1 a
>   JOIN materialized_cte1 b ON (a.key = b.key)
> )
> SELECT a.key
> FROM materialized_cte2 a
> JOIN materialized_cte2 b ON (a.key = b.key); {code}
> It throws an error.
> {code:java}
> Error: Error while compiling statement: FAILED: IllegalStateException The 
> size of col stats must be equal to that of schema. Stats = [], Schema = [key] 
> (state=42000,code=4) {code}
> Attaching a debugger shows that the FSO of materialized_cte2 has empty stats, 
> as JoinOperator loses stats.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27847) Prevent query Failures on Numeric <-> Timestamp

2024-05-15 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-27847:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

>  Prevent query Failures on Numeric <-> Timestamp
> 
>
> Key: HIVE-27847
> URL: https://issues.apache.org/jira/browse/HIVE-27847
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
> Environment: master
> 4.0.0-alpha-1
>Reporter: Basapuram Kumar
>Assignee: Basapuram Kumar
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Attachments: HIVE-27847.patch
>
>
> In the master/4.0.0-alpha-1 branches, the Numeric to Timestamp conversion 
> fails with the error 
> "{color:#de350b}org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited 
> (hive.strict.timestamp.conversion){color}".
>  
> *Repro steps.*
>  # Sample data
> {noformat}
> $ hdfs dfs -cat /tmp/tc/t.csv
> 1653209895687,2022-05-22T15:58:15.931+07:00
> 1653209938316,2022-05-22T15:58:58.490+07:00
> 1653209962021,2022-05-22T15:59:22.191+07:00
> 1653210021993,2022-05-22T16:00:22.174+07:00
> 1653209890524,2022-05-22T15:58:10.724+07:00
> 1653210095382,2022-05-22T16:01:35.775+07:00
> 1653210044308,2022-05-22T16:00:44.683+07:00
> 1653210098546,2022-05-22T16:01:38.886+07:00
> 1653210012220,2022-05-22T16:00:12.394+07:00
> 165321376,2022-05-22T16:00:00.622+07:00{noformat}
>  # table with above data [1]
> {noformat}
> create external table   test_ts_conv(begin string, ts string) row format 
> delimited fields terminated by ',' stored as TEXTFILE LOCATION '/tmp/tc/';
> desc   test_ts_conv;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | begin     | string     |          |
> | ts        | string     |          |
> +-----------+------------+----------+{noformat}
>  #  Create table with CTAS
> {noformat}
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set 
> hive.strict.timestamp.conversion;
> +-----------------------------------------+
> |                   set                   |
> +-----------------------------------------+
> | hive.strict.timestamp.conversion=true  |
> +-----------------------------------------+
> set to false
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> set 
> hive.strict.timestamp.conversion=false;
> +-----------------------------------------+
> |                   set                   |
> +-----------------------------------------+
> | hive.strict.timestamp.conversion=false  |
> +-----------------------------------------+
> #Query:
> 0: jdbc:hive2://char1000.sre.iti.acceldata.de> 
> CREATE TABLE t_date 
> AS 
> select
>   CAST( CAST( `begin` AS BIGINT) / 1000  AS TIMESTAMP ) `begin`, 
>   CAST( 
> DATE_FORMAT(CAST(regexp_replace(`ts`,'(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})\\+(\\d{2}):(\\d{2})','$1-$2-$3
>  $4:$5:$6.$7') AS TIMESTAMP ),'MMdd') as BIGINT ) `par_key`
> FROM    test_ts_conv;{noformat}
> Error:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting 
> NUMERIC types to TIMESTAMP is prohibited (hive.strict.timestamp.conversion)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFTimestamp.initialize(GenericUDFTimestamp.java:91)
>     at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149)
>     at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:184)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
>     at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:74)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>     at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:508)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:314)
>     ... 17 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28200) Improve get_partitions_by_filter/expr when partition limit enabled

2024-05-07 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28200.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fix has been pushed to master. Thank you for the PR, [~wechar]!

> Improve get_partitions_by_filter/expr when partition limit enabled
> --
>
> Key: HIVE-28200
> URL: https://issues.apache.org/jira/browse/HIVE-28200
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> When {{hive.metastore.limit.partition.request}} is configured, HMS gets 
> the matching partition count before getting the real partition objects. The 
> count could be a slow query if the input filter or expr is too complex.
> In this case, such a slow filter will be executed both when counting partition 
> numbers and when fetching real partition objects, which harms performance and 
> the backend DBMS.
> We can improve this by first getting the matched partition names, then 
> checking the limit against the number of partition names, and finally getting 
> the partitions by those names.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28161) Incorrect Copyright years in META-INF/NOTICE files

2024-05-01 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842613#comment-17842613
 ] 

Zhihua Deng commented on HIVE-28161:


Fix has been merged. Thank you for the PR [~zabetak]!

> Incorrect Copyright years in META-INF/NOTICE files
> --
>
> Key: HIVE-28161
> URL: https://issues.apache.org/jira/browse/HIVE-28161
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> The generated META-INF/NOTICE file which resides inside each jar produced by 
> Hive has incorrect copyright years.
> Inside all jars the NOTICE file has the following incorrect content:
> {noformat}
> Copyright 2020 The Apache Software Foundation
> {noformat}
> The Copyright statement should include the timespan from the inception of the 
> project to now.
> {noformat}
> Copyright 2008-2024 The Apache Software Foundation
> {noformat}
> The problem can be easily seen by inspecting the jar content after building 
> the module or checking the previously published jars in Maven central.
> {noformat}
> mvn clean install -DskipTests -pl common/
> jar xf common/target/hive-common-4.1.0-SNAPSHOT.jar META-INF
> cat META-INF/NOTICE 
> Hive Common
> Copyright 2020 The Apache Software Foundation
> This product includes software developed at
> The Apache Software Foundation (http://www.apache.org/).
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28161) Incorrect Copyright years in META-INF/NOTICE files

2024-05-01 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28161.

Fix Version/s: 4.1.0
   Resolution: Fixed

> Incorrect Copyright years in META-INF/NOTICE files
> --
>
> Key: HIVE-28161
> URL: https://issues.apache.org/jira/browse/HIVE-28161
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> The generated META-INF/NOTICE file, which resides inside each jar produced by 
> Hive, has incorrect copyright years.
> Inside all jars the NOTICE file has the following incorrect content:
> {noformat}
> Copyright 2020 The Apache Software Foundation
> {noformat}
> The Copyright statement should include the timespan from the inception of the 
> project to now.
> {noformat}
> Copyright 2008-2024 The Apache Software Foundation
> {noformat}
> The problem can be easily seen by inspecting the jar content after building 
> the module or checking the previously published jars in Maven central.
> {noformat}
> mvn clean install -DskipTests -pl common/
> jar xf common/target/hive-common-4.1.0-SNAPSHOT.jar META-INF
> cat META-INF/NOTICE 
> Hive Common
> Copyright 2020 The Apache Software Foundation
> This product includes software developed at
> The Apache Software Foundation (http://www.apache.org/).
> {noformat}





[jira] [Updated] (HIVE-28206) Preparing for 4.0.1 development

2024-04-26 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28206:
---
Labels:   (was: 4.)

> Preparing for 4.0.1 development
> ---
>
> Key: HIVE-28206
> URL: https://issues.apache.org/jira/browse/HIVE-28206
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Zhihua Deng
>Priority: Major
>






[jira] [Updated] (HIVE-28206) Preparing for 4.0.1 development

2024-04-26 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28206:
---
Labels: 4.  (was: )

> Preparing for 4.0.1 development
> ---
>
> Key: HIVE-28206
> URL: https://issues.apache.org/jira/browse/HIVE-28206
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: 4.
>






[jira] [Assigned] (HIVE-28206) Preparing for 4.0.1 development

2024-04-26 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-28206:
--

Assignee: Zhihua Deng

> Preparing for 4.0.1 development
> ---
>
> Key: HIVE-28206
> URL: https://issues.apache.org/jira/browse/HIVE-28206
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Zhihua Deng
>Priority: Major
>






[jira] [Updated] (HIVE-28158) Add ASF license header in non-java files

2024-04-26 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28158:
---
Labels: hive-4.0.1-must  (was: )

> Add ASF license header in non-java files
> 
>
> Key: HIVE-28158
> URL: https://issues.apache.org/jira/browse/HIVE-28158
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: hive-4.0.1-must
> Fix For: 4.1.0
>
>
> According to the [ASF policy|https://www.apache.org/legal/src-headers.html] 
> all source files should contain an ASF header. Currently there are a lot of 
> source files that do not contain the ASF header. The files can be broken into 
> the following categories:
> *Must have:*
>  * Python files (.py)
>  * Bash/Shell script files (.sh)
>  * Javascript files (.js)
> *Should have:*
>  * Maven files (pom.xml)
>  * GitHub workflows and Docker files (.yml)
> *Good to have:*
>  * Hive/Tez/Yarn and other configuration files (.xml)
>  * Log4J property files (.properties)
>  * Markdown files (.md)
> *Could have but OK if they don't:*
>  * Data files for tests (data/files/**)
>  * Generated code files (src/gen)
>  * QTest input/output files (.q, .q.out)
>  * IntelliJ files (.idea)
>  * Other txt and data files
> The changes here aim to address the first three categories (must, should, 
> good) and add the missing header when possible.





[jira] [Updated] (HIVE-28211) Restore hive-exec-core jar

2024-04-26 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28211:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Restore hive-exec-core jar
> --
>
> Key: HIVE-28211
> URL: https://issues.apache.org/jira/browse/HIVE-28211
> Project: Hive
>  Issue Type: Task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
>
> The hive-exec-core jar is used by Spark, Oozie, Hudi and many other projects. 
> Removal of the hive-exec-core jar has caused the following issues.
> Spark : [https://lists.apache.org/list?d...@hive.apache.org:lte=1M:joda]
> Oozie: [https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg]
> Hudi: [apache/hudi#8147|https://github.com/apache/hudi/issues/8147]
> Until we shade and relocate dependencies in hive-exec, we should restore 
> the hive-exec-core jar.





[jira] [Updated] (HIVE-28161) Incorrect Copyright years in META-INF/NOTICE files

2024-04-26 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28161:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Incorrect Copyright years in META-INF/NOTICE files
> --
>
> Key: HIVE-28161
> URL: https://issues.apache.org/jira/browse/HIVE-28161
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
>
> The generated META-INF/NOTICE file, which resides inside each jar produced by 
> Hive, has incorrect copyright years.
> Inside all jars the NOTICE file has the following incorrect content:
> {noformat}
> Copyright 2020 The Apache Software Foundation
> {noformat}
> The Copyright statement should include the timespan from the inception of the 
> project to now.
> {noformat}
> Copyright 2008-2024 The Apache Software Foundation
> {noformat}
> The problem can be easily seen by inspecting the jar content after building 
> the module or checking the previously published jars in Maven central.
> {noformat}
> mvn clean install -DskipTests -pl common/
> jar xf common/target/hive-common-4.1.0-SNAPSHOT.jar META-INF
> cat META-INF/NOTICE 
> Hive Common
> Copyright 2020 The Apache Software Foundation
> This product includes software developed at
> The Apache Software Foundation (http://www.apache.org/).
> {noformat}





[jira] [Updated] (HIVE-27494) Deduplicate the task result that generated by more branches in union all

2024-04-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-27494:
---
Description: 
HIVE-23891 adds the ability to deduplicate the task results under the 
directory,

//_tmp.-ext-1//HIVE_UNION_SUBDIR_1,

but it turns out not to take the same action for the output directory of the 
same query:

//_tmp.-ext-1//HIVE_UNION_SUBDIR_2.

So users may still hit the same data duplication problem upon multiple Tez 
task attempts.

  was:
HIVE-23891 adds the ability to deduplicate the task result that under the 
directory,

//_tmp.-ext-1//HIVE_UNION_SUBDIR_1,

but turns out to ignore taking the same action to the directory for the same 
query:

//_tmp.-ext-1//HIVE_UNION_SUBDIR_2.

So user may still have the same data duplication problem in multiple tez task 
attempts.


> Deduplicate the task result that generated by more branches in union all
> 
>
> Key: HIVE-27494
> URL: https://issues.apache.org/jira/browse/HIVE-27494
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-beta-1
>
> Attachments: ddl.q, explain.output
>
>
> HIVE-23891 adds the ability to deduplicate the task results under the 
> directory,
> //_tmp.-ext-1//HIVE_UNION_SUBDIR_1,
> but it turns out not to take the same action for the output directory of 
> the same query:
> //_tmp.-ext-1//HIVE_UNION_SUBDIR_2.
> So users may still hit the same data duplication problem upon multiple Tez 
> task attempts.





[jira] [Updated] (HIVE-28204) Remove some HMS obsolete scripts

2024-04-17 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28204:
---
Description: As Hive 1.x has reached end of life, the scripts for HMS 
metadata need to be removed from the repository and the packaged tarball; 
however, it is better to keep the scripts needed for upgrading from 1.x and 
their tests.  (was: As the Hive 1.x has reached end of life, the scripts for 
HMS metadata need to be removed from the repository and the packaged tarball, 
however it's better to keep the script for the Hive to upgrade from 1.x.)

> Remove some HMS obsolete scripts
> 
>
> Key: HIVE-28204
> URL: https://issues.apache.org/jira/browse/HIVE-28204
> Project: Hive
>  Issue Type: Task
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
>
> As Hive 1.x has reached end of life, the scripts for HMS metadata need to 
> be removed from the repository and the packaged tarball; however, it is 
> better to keep the scripts needed for upgrading from 1.x and their tests.





[jira] [Updated] (HIVE-28204) Remove some HMS obsolete scripts

2024-04-17 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28204:
---
Summary: Remove some HMS obsolete scripts  (was: Remove the HMS 1.x init 
script)

> Remove some HMS obsolete scripts
> 
>
> Key: HIVE-28204
> URL: https://issues.apache.org/jira/browse/HIVE-28204
> Project: Hive
>  Issue Type: Task
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
>
> As Hive 1.x has reached end of life, the scripts for HMS metadata need to 
> be removed from the repository and the packaged tarball; however, it is 
> better to keep the scripts needed for upgrading from 1.x.





[jira] [Created] (HIVE-28204) Remove the HMS 1.x init script

2024-04-17 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-28204:
--

 Summary: Remove the HMS 1.x init script
 Key: HIVE-28204
 URL: https://issues.apache.org/jira/browse/HIVE-28204
 Project: Hive
  Issue Type: Task
Reporter: Zhihua Deng
Assignee: Zhihua Deng


As Hive 1.x has reached end of life, the scripts for HMS metadata need to 
be removed from the repository and the packaged tarball; however, it is 
better to keep the scripts needed for upgrading from 1.x.





[jira] [Commented] (HIVE-28182) Logs page doesn't load in Hive UI

2024-04-16 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837932#comment-17837932
 ] 

Zhihua Deng commented on HIVE-28182:


It works for me as well...

> Logs page doesn't load in Hive UI
> -
>
> Key: HIVE-28182
> URL: https://issues.apache.org/jira/browse/HIVE-28182
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Dmitriy Fingerman
>Priority: Major
> Attachments: Screen Shot 2024-04-05 at 5.16.26 PM.png, 
> image-2024-04-06-02-51-24-363.png
>
>






[jira] [Commented] (HIVE-28198) Trino table is recognized as EXTERNAL_TABLE regardless of external_location parameter

2024-04-16 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837931#comment-17837931
 ] 

Zhihua Deng commented on HIVE-28198:


In Hive 4.0, Hive ACID tables are treated as managed tables by default. A 
legacy managed table is translated to an external table with the parameters 
TRANSLATED_TO_EXTERNAL=true and external.table.purge=true; a legacy external 
table works the same as before.
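The translation rule described in the comment above can be sketched as a small, hypothetical Java snippet. The class and method names here are mine for illustration, not Hive's actual HiveMetaStore translator code; only the parameter keys (TRANSLATED_TO_EXTERNAL, external.table.purge) come from the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of Hive 4's table-type translation: a legacy non-ACID
// MANAGED_TABLE is stored as an external table with translation markers,
// which is why clients like Trino see EXTERNAL_TABLE in the TBLS.TBL_TYPE
// column. Method/class names are assumptions for this sketch.
public class TableTranslationSketch {
    static Map<String, String> translateLegacyManaged(boolean isAcid) {
        Map<String, String> params = new HashMap<>();
        if (!isAcid) {
            // Non-transactional managed tables become external, but purge
            // semantics are preserved so DROP TABLE still deletes the data.
            params.put("TRANSLATED_TO_EXTERNAL", "true");
            params.put("external.table.purge", "true");
        }
        // ACID (transactional) tables stay managed: no translation markers.
        return params;
    }

    public static void main(String[] args) {
        System.out.println(translateLegacyManaged(false));
        System.out.println(translateLegacyManaged(true));
    }
}
```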

> Trino table is recognized as EXTERNAL_TABLE regardless of external_location 
> parameter
> -
>
> Key: HIVE-28198
> URL: https://issues.apache.org/jira/browse/HIVE-28198
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Mladjan Gadzic
>Priority: Major
>
> {code:java}
> trino > create table hive.default.test_table(id int);{code}
> {code:java}
> trino> delete from hive.default.test_table;
> Query 20240402_103228_00042_hm8m3, FAILED, 1 node Splits: 1 total, 0 done 
> (0.00%) 0.08 [0 rows, 0B] [0 rows/s, 0B/s]
> Query 20240402_103228_00042_hm8m3 failed: Cannot delete from non-managed Hive 
>  table{code}
> This behavior is tested and works as expected in Hive 3. Table type is stored 
> in HMS DB in {{TBLS}} table {{TBL_TYPE}} field. For Hive 3 value is 
> MANAGED_TABLE and EXTERNAL_TABLE for Hive 4.
>  





[jira] [Resolved] (HIVE-28199) Docker quickstart does not work for Hive 3.1.3 on Mac M2

2024-04-16 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28199.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fix has been merged. Thank you for the PR [~ryandgoldenberg]!

> Docker quickstart does not work for Hive 3.1.3 on Mac M2
> 
>
> Key: HIVE-28199
> URL: https://issues.apache.org/jira/browse/HIVE-28199
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryan Goldenberg
>Assignee: Ryan Goldenberg
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Quickstart: 
> [https://hive.apache.org/developement/quickstart/#--hiveserver2-metastore]
> On Mac M2, {{docker-compose up}} for {{HIVE_VERSION=3.1.3}} gives the 
> following errors
>  * {{/home/hive/.beeline}} directory issue
> {quote}metastore | *** schemaTool failed ***
> metastore | [
> metastore | WARN] Failed to create directory:
> metastore | /home/hive/.beeline
> metastore | No such file or directory
> {quote} * Underscore in network name, from {{/tmp/hive/hive.log}} on 
> {{{}hiveserver2{}}}:
> {quote}2024-04-02T16:26:24,867 ERROR [main] utils.MetaStoreUtils: Got 
> exception: java.net.URISyntaxException Illegal character in hostname at index 
> 25: thrift://metastore.docker_default:9083
> java.net.URISyntaxException: Illegal character in hostname at index 25: 
> thrift://metastore.docker_default:9083
> {quote}





[jira] [Resolved] (HIVE-27746) Hive Metastore should send single AlterPartitionEvent with list of partitions

2024-03-11 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-27746.

Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thank you [~hemanth619], [~jfs] and [~henrib] for the review!

A property, metastore.alterPartitions.notification.v2.enabled, is introduced 
to ensure backward compatibility: when it is set to false, downstream 
notification consumers can still process the ALTER_PARTITION event without 
changes.
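The bulk-event change can be illustrated with a hedged sketch. The names here (AlterPartitionEvent, toEvents, v2Enabled) are made up for illustration and do not reflect the actual HMS listener API; only the idea (one event carrying the full partition list versus one event per partition) comes from the report.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the batching idea: a bulk alter_partitions call produces one
// event with the full partition list (v2) instead of one event per partition
// (legacy). Names are assumptions for this sketch, not Hive's real classes.
public class BatchedEventSketch {
    record AlterPartitionEvent(String table, List<String> partitions) {}

    static List<AlterPartitionEvent> toEvents(String table, List<String> parts,
                                              boolean v2Enabled) {
        if (v2Enabled) {
            // v2: a single event carrying every altered partition
            return List.of(new AlterPartitionEvent(table, parts));
        }
        // legacy: one event per partition, as older consumers expect
        List<AlterPartitionEvent> out = new ArrayList<>();
        for (String p : parts) {
            out.add(new AlterPartitionEvent(table, List.of(p)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = List.of("p1", "p2", "p3");
        System.out.println(toEvents("payments", parts, true).size());  // 1
        System.out.println(toEvents("payments", parts, false).size()); // 3
    }
}
```

Processing one event per HMS call is cheaper for consumers than processing each partition individually, which is the efficiency argument made in the issue.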

> Hive Metastore should send single AlterPartitionEvent with list of partitions
> -
>
> Key: HIVE-27746
> URL: https://issues.apache.org/jira/browse/HIVE-27746
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Naveen Gangam
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> In HIVE-3938, work was done to send single AddPartitionEvent for APIs that 
> add partitions in bulk. Similarly, we have alter_partitions APIs that alter 
> partitions in bulk via a single HMS call. For such events, we should also 
> send a single AlterPartitionEvent with a list of partitions in it.
> This would be way more efficient than having to send and process them 
> individually.
> This fix will be incompatible with the older clients that expect single 
> partition.





[jira] [Assigned] (HIVE-27805) Hive server2 connections limits bug

2024-02-28 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-27805:
--

Assignee: Zhihua Deng

> Hive server2 connections limits bug
> ---
>
> Key: HIVE-27805
> URL: https://issues.apache.org/jira/browse/HIVE-27805
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2, 3.1.3
> Environment: image: apache/hive:3.1.3
> jdbc driver: hive-jdbc:2.1.0
>Reporter: Xiwei Wang
>Assignee: Zhihua Deng
>Priority: Major
>
> When I use JDBC and specify a non-existent database to connect to a 
> HiveServer2 configured with hive.server2.limit.connections.per.user=10, a 
> session initialization error occurs 
> (org.apache.hive.service.cli.HiveSQLException: Failed to open new 
> session: Database not_exists_db does not exist); and even a normal connection 
> will report an error after the number of attempts exceeds the configured 
> maximum (org.apache.hive.service.cli.HiveSQLException: Connection limit 
> per user reached (user: aeolus limit: 10))
>  
> I found that inside the method 
> org.apache.hive.service.cli.session.SessionManager#createSession, 
> if session initialization fails, the connection count increased by 
> incrementConnections is never released; once the number of failures 
> exceeds the maximum number of connections configured by the user via 
> hive.server2.limit.connections.per.user, HiveServer2 will not accept 
> any further connections due to the limit.
>  
> {code:java}
> 2023-10-17T12:14:54,313  WARN [HiveServer2-Handler-Pool: Thread-3329] 
> thrift.ThriftCLIService: Error opening session:
> org.apache.hive.service.cli.HiveSQLException: Failed to open new session: 
> Database not_exists_db does not exist
>     at 
> org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:434)
>  ~[hive-service-3.1.3.jar:3.1.3]
>     at 
> org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:373)
>  ~[hive-service-3.1.3.jar:3.1.3]
>     at 
> org.apache.hive.service.cli.CLIService.openSession(CLIService.java:187) 
> ~[hive-service-3.1.3.jar:3.1.3]
>     at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:475)
>  ~[hive-service-3.1.3.jar:3.1.3]
>     at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:322)
>  ~[hive-service-3.1.3.jar:3.1.3]
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1497)
>  ~[hive-exec-3.1.3.jar:3.1.3]
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1482)
>  ~[hive-exec-3.1.3.jar:3.1.3]
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-3.1.3.jar:3.1.3]
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-3.1.3.jar:3.1.3]
>     at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>  ~[hive-service-3.1.3.jar:3.1.3]
>     at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-3.1.3.jar:3.1.3]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_342]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_342]
>     at java.lang.Thread.run(Thread.java:750) [?:1.8.0_342]
> Caused by: org.apache.hive.service.cli.HiveSQLException: Database dw_aeolus 
> does not exist
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.configureSession(HiveSessionImpl.java:294)
>  ~[hive-service-3.1.3.jar:3.1.3]
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.open(HiveSessionImpl.java:199)
>  ~[hive-service-3.1.3.jar:3.1.3]
>     at 
> org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:425)
>  ~[hive-service-3.1.3.jar:3.1.3]
>     ... 13 more
> 2023-10-17T12:14:54,972  INFO [HiveServer2-Handler-Pool: Thread-3330] 
> thrift.ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8
> 2023-10-17T12:14:54,973 ERROR [HiveServer2-Handler-Pool: Thread-3330] 
> service.CompositeService: Connection limit per user reached (user: aeolus 
> limit: 10)
> 2023-10-17T12:14:54,973  WARN [HiveServer2-Handler-Pool: Thread-3330] 
> thrift.ThriftCLIService: Error opening session:
> org.apache.hive.service.cli.HiveSQLException: Connection limit per user 
> reached (user: aeolus limit: 10)
>     at 
> org.apache.hive.service.cli.session.SessionManager.incrementConnections(SessionManager.java:476)
>  ~[hive-service-3.1.3.jar:3.1.3]
>     at 
> 
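The leak described in this report can be reproduced and fixed in a self-contained toy. This is a hedged sketch, not the actual Hive patch: the method names (incrementConnections, decrementConnections, createSession) mirror SessionManager, but the bodies are mine; the key point is releasing the counter when session initialization throws.

```java
// Toy reproduction of the HIVE-27805 leak: failed session initializations
// must release the per-user connection counter, or failed attempts
// permanently consume connection slots.
public class SessionManagerSketch {
    private int openConnections = 0;
    private final int limit = 10;

    synchronized void incrementConnections() {
        if (openConnections >= limit) {
            throw new IllegalStateException("Connection limit per user reached");
        }
        openConnections++;
    }

    synchronized void decrementConnections() {
        openConnections--;
    }

    void createSession(boolean initFails) {
        incrementConnections();
        try {
            if (initFails) {
                // e.g. "Database not_exists_db does not exist"
                throw new RuntimeException("Failed to open new session");
            }
            // ... normal session setup would go here ...
        } catch (RuntimeException e) {
            // Without this release, each failed attempt leaks one slot and
            // eventually no connection is accepted for the user.
            decrementConnections();
            throw e;
        }
    }

    synchronized int current() {
        return openConnections;
    }
}
```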

[jira] [Resolved] (HIVE-27692) Explore removing the always task from embedded HMS

2024-02-23 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-27692.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been merged. Thank you Naveen for the review!

> Explore removing the always task from embedded HMS
> --
>
> Key: HIVE-27692
> URL: https://issues.apache.org/jira/browse/HIVE-27692
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The always tasks now run in the leader HMS, and the properties configuring 
> the leader should belong only to HMS; other engines such as Spark/Impala 
> don't need to know them. In most cases an engine only cares about the 
> properties for connecting to HMS, e.g. hive.metastore.uris.
> Every time a new app uses an embedded Metastore, it starts the HMS always 
> tasks by default. Imagine we have hundreds of apps: hundreds of copies of 
> the same tasks would be running, putting an extra burden on the underlying 
> databases, such as flooding queries and exhausting connection limits.
> I think we can remove the always tasks from the embedded Metastore; they 
> will be taken care of by the standalone Metastore, as a standalone 
> Metastore should be present in a production environment.





[jira] [Comment Edited] (HIVE-26435) Add method for collecting HMS meta summary

2024-02-21 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819470#comment-17819470
 ] 

Zhihua Deng edited comment on HIVE-26435 at 2/22/24 4:55 AM:
-

Fix has been merged into master. Thank you [~danielzhu] and [~ruyi.zheng] for 
the work! 


was (Author: dengzh):
Fix has been merged into master, Thank you [~danielzhu] for the work! 

> Add method for collecting HMS meta summary
> --
>
> Key: HIVE-26435
> URL: https://issues.apache.org/jira/browse/HIVE-26435
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ruyi Zheng
>Assignee: Hongdan Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive Metastore currently lacks visibility into its metadata. This work 
> includes enhancing the Hive Metatool to include an option (JSON, CONSOLE) to 
> print a summary (catalog name, database name, table name, partition column 
> count, number of rows, table type, file type, compression type, total data 
> size, etc.). 





[jira] [Resolved] (HIVE-26435) Add method for collecting HMS meta summary

2024-02-21 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-26435.

Fix Version/s: 4.0.0
 Assignee: Hongdan Zhu  (was: Naveen Gangam)
   Resolution: Fixed

Fix has been merged into master. Thank you [~danielzhu] for the work! 

> Add method for collecting HMS meta summary
> --
>
> Key: HIVE-26435
> URL: https://issues.apache.org/jira/browse/HIVE-26435
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ruyi Zheng
>Assignee: Hongdan Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive Metastore currently lacks visibility into its metadata. This work 
> includes enhancing the Hive Metatool to include an option (JSON, CONSOLE) to 
> print a summary (catalog name, database name, table name, partition column 
> count, number of rows, table type, file type, compression type, total data 
> size, etc.). 





[jira] [Updated] (HIVE-26435) Add method for collecting HMS meta summary

2024-02-21 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-26435:
---
Summary: Add method for collecting HMS meta summary  (was: HMS Summary)

> Add method for collecting HMS meta summary
> --
>
> Key: HIVE-26435
> URL: https://issues.apache.org/jira/browse/HIVE-26435
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ruyi Zheng
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive Metastore currently lacks visibility into its metadata. This work 
> includes enhancing the Hive Metatool to include an option (JSON, CONSOLE) to 
> print a summary (catalog name, database name, table name, partition column 
> count, number of rows, table type, file type, compression type, total data 
> size, etc.). 





[jira] [Resolved] (HIVE-27778) Alter table command gives error after computer stats is run with Impala

2024-02-21 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-27778.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been merged into master. Thank you [~dkuzmenko] and [~zhangbutao] for 
the review!

> Alter table command gives error after computer stats is run with Impala
> ---
>
> Key: HIVE-27778
> URL: https://issues.apache.org/jira/browse/HIVE-27778
> Project: Hive
>  Issue Type: Bug
>Reporter: Kokila N
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> A Hive partitioned table's column stats are stored at the partition level, 
> in PART_COL_STATS in sys.
> Running a "column stats " query in Impala on a Hive partitioned 
> table generates column stats at the table level, stored in TAB_COL_STATS.
> So, executing "Alter rename  to " after Impala compute 
> stats throws an error.
> {code:java}
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.parqtest without 
> providing the transactional write state for verification (new write ID 6, 
> valid write IDs null; current state null; new state {} {code}
> The column stats generated from Impala need to be deleted for the alter 
> command to work.





[jira] [Assigned] (HIVE-27778) Alter table command gives error after computer stats is run with Impala

2024-01-25 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-27778:
--

Assignee: Zhihua Deng

> Alter table command gives error after computer stats is run with Impala
> ---
>
> Key: HIVE-27778
> URL: https://issues.apache.org/jira/browse/HIVE-27778
> Project: Hive
>  Issue Type: Bug
>Reporter: Kokila N
>Assignee: Zhihua Deng
>Priority: Major
>
> A Hive partitioned table's column stats are stored at the partition level, 
> in PART_COL_STATS in sys.
> Running a "column stats " query in Impala on a Hive partitioned 
> table generates column stats at the table level, stored in TAB_COL_STATS.
> So, executing "Alter rename  to " after Impala compute 
> stats throws an error.
> {code:java}
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.parqtest without 
> providing the transactional write state for verification (new write ID 6, 
> valid write IDs null; current state null; new state {} {code}
> The column stats generated from Impala need to be deleted for the alter 
> command to work.





[jira] [Commented] (HIVE-27775) DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift

2024-01-19 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808855#comment-17808855
 ] 

Zhihua Deng commented on HIVE-27775:


Not sure how many users could use timestamp as the type for the partition 
column; this looks like a new feature introduced in Hive 4.0.

From the Jira description, the partition predicate in the query

 
{code:java}
SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00'; {code}
returns an empty result via direct SQL, while JDO returns the 
partition ('2023-03-26 02:30:00'). Given that the TIMESTAMP data type in Hive 
is timezone agnostic, I assume the result from JDO is correct.

Besides this difference, when I tested a date/timestamp partition column 
against Postgres and MySQL, there are some problems:

Postgres (direct SQL):

operator does not exist: timestamp without time zone = character varying

MySQL (direct SQL):

You have an error in your SQL syntax; check the manual that corresponds to your 
MySQL server version for the right syntax to use near 'TIMESTAMP) else null 
end) as TIMESTAMP) = '2023-03-26 03:30:00'))' at line 1 
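For reference, the DST-gap behavior discussed in this thread can be demonstrated with java.time alone, independent of Hive. This is a standalone sketch: it only shows why the wall-clock time '2023-03-26 02:30:00' is tricky in Europe/Paris, not how HMS partition pruning handles it.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// '2023-03-26 02:30:00' falls inside the spring-forward gap in Europe/Paris:
// clocks jump from 02:00 directly to 03:00, so that wall-clock time never
// occurs. Any timezone-aware resolution must pick a substitute instant,
// which is one reason a timezone-agnostic TIMESTAMP can round-trip through
// different storage paths inconsistently.
public class DstGapDemo {
    static ZonedDateTime resolveGapTime() {
        LocalDateTime local = LocalDateTime.of(2023, 3, 26, 2, 30);
        // java.time resolves a gap time by shifting it forward by the gap
        // length (one hour here), yielding 03:30+02:00.
        return local.atZone(ZoneId.of("Europe/Paris"));
    }

    public static void main(String[] args) {
        System.out.println(resolveGapTime()); // 2023-03-26T03:30+02:00[Europe/Paris]
    }
}
```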

 

> DirectSQL and JDO results are different when fetching partitions by timestamp 
> in DST shift
> --
>
> Key: HIVE-27775
> URL: https://issues.apache.org/jira/browse/HIVE-27775
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Zhihua Deng
>Priority: Critical
>  Labels: pull-request-available
>
> DirectSQL and JDO results are different when fetching partitions by timestamp 
> in DST shift.
> {code:sql}
> --! qt:timezone:Europe/Paris
> CREATE EXTERNAL TABLE payments (card string) PARTITIONED BY(txn_datetime 
> TIMESTAMP) STORED AS ORC;
> INSERT into payments VALUES('---', '2023-03-26 02:30:00');
> SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00';
> {code}
> The '2023-03-26 02:30:00' is a timestamp that in Europe/Paris timezone falls 
> exactly in the middle of the DST shift. In this particular timezone this date 
> time never really exists since we are jumping directly from 02:00:00 to 
> 03:00:00. However, the TIMESTAMP data type in Hive is timezone agnostic 
> (https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types) 
> so it is a perfectly valid timestamp that can be inserted in a table and we 
> must be able to recover it back.
> For the SELECT query above, partition pruning kicks in and calls the 
> ObjectStore#getPartitionsByExpr method in order to fetch the respective 
> partitions matching the timestamp from HMS.
> The tests however reveal that DirectSQL and JDO paths are not returning the 
> same results leading to an exception when VerifyingObjectStore is used. 
> According to the error below DirectSQL is able to recover one partition from 
> HMS (expected) while JDO/ORM returns empty (not expected).
> {noformat}
> 2023-10-06T03:51:19,406 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
> metastore.VerifyingObjectStore: Lists are not the same size: SQL 1, ORM 0
> 2023-10-06T03:51:19,409 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
> metastore.RetryingHMSHandler: MetaException(message:Lists are not the same 
> size: SQL 1, ORM 0)
>   at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.verifyLists(VerifyingObjectStore.java:148)
>   at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:88)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>   at com.sun.proxy.$Proxy57.getPartitionsByExpr(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7330)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:98)
>   at 
> org.apache.hadoop.hive.metastore.AbstractHMSHandlerProxy.invoke(AbstractHMSHandlerProxy.java:82)
>   at com.sun.proxy.$Proxy59.get_partitions_spec_by_expr(Unknown Source)
>   at 
> 

[jira] [Comment Edited] (HIVE-27775) DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift

2024-01-19 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808855#comment-17808855
 ] 

Zhihua Deng edited comment on HIVE-27775 at 1/20/24 2:27 AM:
-

Not sure how many users could use timestamp as the type for the partition 
column; this looks like a new feature introduced in Hive 4.0.

From the Jira description, the partition predicate in the query
{code:java}
SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00'; {code}
returns an empty result on the direct SQL path, while the JDO path returns the 
partition ('2023-03-26 02:30:00'). Given that the TIMESTAMP data type in Hive 
is timezone agnostic, I assume the result from JDO is correct.

Besides this difference, when I tested date/timestamp partition columns 
against Postgres and MySQL, there were some problems:

Postgres(direct sql):

operator does not exist: timestamp without time zone = character varying

MySQL(direct sql):

You have an error in your SQL syntax; check the manual that corresponds to your 
MySQL server version for the right syntax to use near 'TIMESTAMP) else null 
end) as TIMESTAMP) = '2023-03-26 03:30:00'))' at line 1 


was (Author: dengzh):
Not sure how many users could use timestamp as the type for the partition 
column; this looks like a new feature introduced in Hive 4.0.

From the Jira description, the partition predicate in the query

 
{code:java}
SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00'; {code}
returns an empty result on the direct SQL path, while the JDO path returns the 
partition ('2023-03-26 02:30:00'). Given that the TIMESTAMP data type in Hive 
is timezone agnostic, I assume the result from JDO is correct.

Besides this difference, when I tested date/timestamp partition columns 
against Postgres and MySQL, there were some problems:

Postgres(direct sql):

operator does not exist: timestamp without time zone = character varying

MySQL(direct sql):

You have an error in your SQL syntax; check the manual that corresponds to your 
MySQL server version for the right syntax to use near 'TIMESTAMP) else null 
end) as TIMESTAMP) = '2023-03-26 03:30:00'))' at line 1 

 

> DirectSQL and JDO results are different when fetching partitions by timestamp 
> in DST shift
> --
>
> Key: HIVE-27775
> URL: https://issues.apache.org/jira/browse/HIVE-27775
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Zhihua Deng
>Priority: Critical
>  Labels: pull-request-available
>
> DirectSQL and JDO results are different when fetching partitions by timestamp 
> in DST shift.
> {code:sql}
> --! qt:timezone:Europe/Paris
> CREATE EXTERNAL TABLE payments (card string) PARTITIONED BY(txn_datetime 
> TIMESTAMP) STORED AS ORC;
> INSERT into payments VALUES('---', '2023-03-26 02:30:00');
> SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00';
> {code}
> The '2023-03-26 02:30:00' is a timestamp that in Europe/Paris timezone falls 
> exactly in the middle of the DST shift. In this particular timezone this date 
> time never really exists since we are jumping directly from 02:00:00 to 
> 03:00:00. However, the TIMESTAMP data type in Hive is timezone agnostic 
> (https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types) 
> so it is a perfectly valid timestamp that can be inserted in a table and we 
> must be able to recover it back.
> For the SELECT query above, partition pruning kicks in and calls the 
> ObjectStore#getPartitionsByExpr method in order to fetch the respective 
> partitions matching the timestamp from HMS.
> The tests however reveal that DirectSQL and JDO paths are not returning the 
> same results leading to an exception when VerifyingObjectStore is used. 
> According to the error below DirectSQL is able to recover one partition from 
> HMS (expected) while JDO/ORM returns empty (not expected).
> {noformat}
> 2023-10-06T03:51:19,406 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
> metastore.VerifyingObjectStore: Lists are not the same size: SQL 1, ORM 0
> 2023-10-06T03:51:19,409 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
> metastore.RetryingHMSHandler: MetaException(message:Lists are not the same 
> size: SQL 1, ORM 0)
>   at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.verifyLists(VerifyingObjectStore.java:148)
>   at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:88)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> 

[jira] [Resolved] (HIVE-27994) Optimize renaming the partitioned table

2024-01-19 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-27994.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been merged. Thank you [~zhangbutao] and [~hemanth619] for the review!

> Optimize renaming the partitioned table
> ---
>
> Key: HIVE-27994
> URL: https://issues.apache.org/jira/browse/HIVE-27994
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> In case of table rename, every row in PART_COL_STATS associated with the 
> table is fetched into memory, deleted, and re-inserted with the new 
> db/table name; this could take hours if the table has thousands of column 
> statistics in PART_COL_STATS.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28012) Invalid reference to the newly added column

2024-01-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28012:
---
Description: 
Steps to repro:
{code:java}
--! qt:dataset:src
--! qt:dataset:part
set hive.stats.autogather=true;
set hive.stats.column.autogather=true;
set 
metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer;
set hive.metastore.client.capabilities=HIVEFULLACIDWRITE,HIVEFULLACIDREAD;
set hive.create.as.external.legacy=true;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

CREATE TABLE rename_partition_table0 (key STRING, value STRING) PARTITIONED BY 
(part STRING) STORED AS ORC;
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '1') SELECT * 
FROM src where rand(1) < 0.5;
ALTER TABLE rename_partition_table0 ADD COLUMNS (new_col INT);
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '2') SELECT 
src.*, 1 FROM src;
{code}
Set hive.metastore.client.cache.v2.enabled=false can act as a workaround.

  was:
Steps to repro:

 
{code:java}
--! qt:dataset:src
--! qt:dataset:part
set hive.stats.autogather=true;
set hive.stats.column.autogather=true;
set 
metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer;
set hive.metastore.client.capabilities=HIVEFULLACIDWRITE,HIVEFULLACIDREAD;
set hive.create.as.external.legacy=true;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

CREATE TABLE rename_partition_table0 (key STRING, value STRING) PARTITIONED BY 
(part STRING) STORED AS ORC;
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '1') SELECT * 
FROM src where rand(1) < 0.5;
ALTER TABLE rename_partition_table0 ADD COLUMNS (new_col INT);
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '2') SELECT 
src.*, 1 FROM src;
{code}
 

 


> Invalid reference to the newly added column
> ---
>
> Key: HIVE-28012
> URL: https://issues.apache.org/jira/browse/HIVE-28012
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Priority: Major
>
> Steps to repro:
> {code:java}
> --! qt:dataset:src
> --! qt:dataset:part
> set hive.stats.autogather=true;
> set hive.stats.column.autogather=true;
> set 
> metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer;
> set hive.metastore.client.capabilities=HIVEFULLACIDWRITE,HIVEFULLACIDREAD;
> set hive.create.as.external.legacy=true;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> CREATE TABLE rename_partition_table0 (key STRING, value STRING) PARTITIONED 
> BY (part STRING) STORED AS ORC;
> INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '1') SELECT 
> * FROM src where rand(1) < 0.5;
> ALTER TABLE rename_partition_table0 ADD COLUMNS (new_col INT);
> INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '2') SELECT 
> src.*, 1 FROM src;
> {code}
> Set hive.metastore.client.cache.v2.enabled=false can act as a workaround.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28012) Invalid reference to the newly added column

2024-01-18 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-28012:
--

 Summary: Invalid reference to the newly added column
 Key: HIVE-28012
 URL: https://issues.apache.org/jira/browse/HIVE-28012
 Project: Hive
  Issue Type: Bug
Reporter: Zhihua Deng


Steps to repro:

 
{code:java}
--! qt:dataset:src
--! qt:dataset:part
set hive.stats.autogather=true;
set hive.stats.column.autogather=true;
set 
metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer;
set hive.metastore.client.capabilities=HIVEFULLACIDWRITE,HIVEFULLACIDREAD;
set hive.create.as.external.legacy=true;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

CREATE TABLE rename_partition_table0 (key STRING, value STRING) PARTITIONED BY 
(part STRING) STORED AS ORC;
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '1') SELECT * 
FROM src where rand(1) < 0.5;
ALTER TABLE rename_partition_table0 ADD COLUMNS (new_col INT);
INSERT OVERWRITE TABLE rename_partition_table0 PARTITION (part = '2') SELECT 
src.*, 1 FROM src;
{code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28011) Update the table info in PART_COL_STATS directly in case of table rename

2024-01-18 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-28011:
--

Assignee: Zhihua Deng

> Update the table info in PART_COL_STATS directly in case of table rename
> 
>
> Key: HIVE-28011
> URL: https://issues.apache.org/jira/browse/HIVE-28011
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>
> Following the discussion on 
> [https://github.com/apache/hive/pull/4995#issuecomment-1899477224], there is 
> still some room for performance tuning in case of table rename: we don't 
> need to fetch all the column statistics from the database and then update 
> them in batch.
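
A hedged sketch of the in-place update this describes (not the merged patch; 
the column names are assumed from the HMS schema, where PART_COL_STATS 
denormalizes the catalog, database, and table names):

{code:sql}
-- Rename in place instead of fetch / delete / re-insert:
UPDATE PART_COL_STATS
   SET DB_NAME = 'new_db', TABLE_NAME = 'new_tbl'
 WHERE CAT_NAME = 'hive' AND DB_NAME = 'old_db' AND TABLE_NAME = 'old_tbl';
{code}

A single UPDATE avoids materializing thousands of statistics rows in the 
metastore's memory during the rename.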



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


  1   2   3   4   5   6   7   8   9   >