[jira] [Updated] (HIVE-26265) REPL DUMP should filter out OpenXacts and unneeded CommitXact/Abort.

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26265:
--
Labels: pull-request-available  (was: )

> REPL DUMP should filter out OpenXacts and unneeded CommitXact/Abort.
> 
>
> Key: HIVE-26265
> URL: https://issues.apache.org/jira/browse/HIVE-26265
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: francis pang
>Assignee: francis pang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> REPL DUMP is replicating all OpenXacts, even when they come from other, 
> non-replicated databases. This wastes space in the dump and ends up opening 
> unneeded transactions during REPL LOAD.
>  
> Add a config property for replication that filters out OpenXact events during 
> REPL DUMP. During REPL LOAD, the txns can be implicitly opened when the 
> ALLOC_WRITE_ID is processed. For CommitTxn and AbortTxn, dump only if a WRITE 
> ID was allocated.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26265) REPL DUMP should filter out OpenXacts and unneeded CommitXact/Abort.

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26265?focusedWorklogId=780992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780992
 ]

ASF GitHub Bot logged work on HIVE-26265:
-

Author: ASF GitHub Bot
Created on: 14/Jun/22 05:34
Start Date: 14/Jun/22 05:34
Worklog Time Spent: 10m 
  Work Description: cmunkey opened a new pull request, #3365:
URL: https://github.com/apache/hive/pull/3365

   Purpose: Currently, all Txn events (OpenTxn, CommitTxn, and RollbackTxn) are 
included in the REPL DUMP, even when the transaction does not involve the 
database being dumped (replicated). These events are unnecessary; they take up 
excessive space in the dump and add work when they are replayed during REPL 
LOAD.
   
   Solution proposed: To avoid this unnecessary space and work, add the 
hive.repl.filter.transactions configuration property. When set to "true", 
extra Txn events are filtered out as follows: CommitTxn and RollbackTxn are 
included in the REPL DUMP only if the referenced transaction had a 
corresponding ALLOCATE_WRITE_ID event that was dumped. OpenTxn is never 
dumped; the transaction is implicitly opened when REPL LOAD processes the 
ALLOC_WRITE_ID event, since ALLOC_WRITE_ID contains the open transaction ids. 
The default setting is "false".
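   
   As a rough illustration of the filtering rule above, here is a minimal 
sketch against a simplified event model (the event-type strings, the TxnEvent 
record, and the txnsWithDumpedWriteId set are illustrative placeholders, not 
the actual patch):
{code:java}
import java.util.Set;

/** Sketch only: illustrates the dump-side rule, not Hive's real event API. */
class TxnEventFilterSketch {
  record TxnEvent(String type, long txnId) {}

  /** Decide whether a transaction event belongs in the dump. */
  static boolean shouldDump(TxnEvent e, Set<Long> txnsWithDumpedWriteId) {
    switch (e.type()) {
      case "OPEN_TXN":
        // Never dumped: REPL LOAD re-opens the txn when it processes
        // ALLOC_WRITE_ID, which carries the open transaction ids.
        return false;
      case "COMMIT_TXN":
      case "ABORT_TXN":
        // Dumped only if a corresponding ALLOC_WRITE_ID was dumped.
        return txnsWithDumpedWriteId.contains(e.txnId());
      default:
        return true;
    }
  }
}
{code}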
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 780992)
Remaining Estimate: 0h
Time Spent: 10m




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26323) Expose real exception information to the client in JdbcSerDe.java

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26323?focusedWorklogId=780974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780974
 ]

ASF GitHub Bot logged work on HIVE-26323:
-

Author: ASF GitHub Bot
Created on: 14/Jun/22 03:17
Start Date: 14/Jun/22 03:17
Worklog Time Spent: 10m 
  Work Description: zhangbutao opened a new pull request, #3364:
URL: https://github.com/apache/hive/pull/3364

   
   
   
   ### What changes were proposed in this pull request?
   
   Throw the real exception message to the client in the _initialize_ method 
of JdbcSerDe.java.
   
   ### Why are the changes needed?
   
   Display a friendly exception message to the client user.
   Please see the details in: https://issues.apache.org/jira/browse/HIVE-26323
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. 
   
   ### How was this patch tested?
   
   Local test
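   
   A minimal sketch of the kind of change involved, based on the catch clause 
quoted in the issue (an illustration, not necessarily the exact merged patch):
{code:java}
// Sketch only: propagate the underlying cause's message instead of a fixed
// string, so the client can tell a bad password from an unreachable host.
} catch (Exception e) {
  throw new SerDeException(
      "Caught exception while initializing the SqlSerDe: " + e.getMessage(), e);
}
{code}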




Issue Time Tracking
---

Worklog Id: (was: 780974)
Remaining Estimate: 0h
Time Spent: 10m

> Expose real exception information to the client in JdbcSerDe.java
> -
>
> Key: HIVE-26323
> URL: https://issues.apache.org/jira/browse/HIVE-26323
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Method *_initialize_* in JdbcSerDe.java always returns the same exception 
> message to the client, no matter what problem occurs.
> {code:java}
>     } catch (Exception e) {
>       throw new SerDeException("Caught exception while initializing the 
> SqlSerDe", e);
>     } {code}
> We should expose the real exception message to the client.
> This is a regression from HIVE-24560.
>  
> Steps to repro:
> 1. Create a JDBC table using an incorrect MySQL password or an incorrect 
> MySQL host:
> {code:java}
> CREATE EXTERNAL TABLE jdbc_testtbl
> (
>   id bigint
> )
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (
> "hive.sql.database.type" = "MYSQL",
> "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",
> "hive.sql.jdbc.url" = "jdbc:mysql://localhost:3306/testdb",
> "hive.sql.dbcp.username" = "root",
> "hive.sql.dbcp.password" = "password",
> "hive.sql.table" = "mysqltbl",
> "hive.sql.dbcp.maxActive" = "1"
> ); {code}
> 2. The Beeline client always displays the same exception message, whether the 
> MySQL password or the MySQL host is incorrect:
> {code:java}
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Caught 
> exception while initializing the SqlSerDe)
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1343) 
> ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1348) 
> ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:141)
>  ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:99)
>  ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
> ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) 
> ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) 
> ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) 
> ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) 
> ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) 
> ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:343) 
> ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
> ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>   

[jira] [Updated] (HIVE-26323) Expose real exception information to the client in JdbcSerDe.java

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26323:
--
Labels: pull-request-available  (was: )


[jira] [Assigned] (HIVE-26323) Expose real exception information to the client in JdbcSerDe.java

2022-06-13 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-26323:
-

Assignee: zhangbutao



[jira] [Work logged] (HIVE-26269) Class cast exception when vectorization is enabled for certain case when cases

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26269?focusedWorklogId=780970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780970
 ]

ASF GitHub Bot logged work on HIVE-26269:
-

Author: ASF GitHub Bot
Created on: 14/Jun/22 02:47
Start Date: 14/Jun/22 02:47
Worklog Time Spent: 10m 
  Work Description: ramesh0201 commented on code in PR #3329:
URL: https://github.com/apache/hive/pull/3329#discussion_r896323787


##
ql/src/test/queries/clientpositive/vector_case_when_3.q:
##
@@ -0,0 +1,7 @@
+set hive.explain.user=false;
+set hive.fetch.task.conversion=none;
+set hive.vectorized.execution.enabled=true;
+create external table test_decimal(rattag string, newclt_all decimal(15,2)) stored as orc;
+insert into test_decimal values('a', '10.20');
+select sum(case when rattag='a' then newclt_all*0.3 else newclt_all end) from test_decimal;
+select sum(case when rattag='Y' then newclt_all*0.3 else newclt_all end) from test_decimal;

Review Comment:
   Yes, that will be useful to see what is happening. I added it.





Issue Time Tracking
---

Worklog Id: (was: 780970)
Time Spent: 40m  (was: 0.5h)

> Class cast exception when vectorization is enabled for certain case when cases
> --
>
> Key: HIVE-26269
> URL: https://issues.apache.org/jira/browse/HIVE-26269
> Project: Hive
>  Issue Type: Bug
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Class cast exception when vectorization is enabled for certain case when cases



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26184) COLLECT_SET with GROUP BY is very slow when some keys are highly skewed

2022-06-13 Thread okumin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553854#comment-17553854
 ] 

okumin commented on HIVE-26184:
---

Thanks a lot!

> COLLECT_SET with GROUP BY is very slow when some keys are highly skewed
> ---
>
> Key: HIVE-26184
> URL: https://issues.apache.org/jira/browse/HIVE-26184
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.8, 3.1.3
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I observed some reducers spending 98% of their CPU time invoking 
> `java.util.HashMap#clear`.
> Looking at the details, I found COLLECT_SET reuses a LinkedHashSet, and its 
> `clear` can be quite heavy when a relation has a small number of highly 
> skewed keys.
>  
> To reproduce the issue, first, we will create rows with a skewed key.
> {code:java}
> INSERT INTO test_collect_set
> SELECT '----' AS key, CAST(UUID() AS VARCHAR) 
> AS value
> FROM table_with_many_rows
> LIMIT 10;{code}
> Then, we will create many non-skewed rows.
> {code:java}
> INSERT INTO test_collect_set
> SELECT UUID() AS key, UUID() AS value
> FROM table_with_many_rows
> LIMIT 500;{code}
> We can observe the issue when we aggregate values by `key`.
> {code:java}
> SELECT key, COLLECT_SET(value) FROM group_by_skew GROUP BY key{code}
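
A sketch of the reuse problem described above and one way around it 
(illustrative only, not the merged fix): `HashMap#clear` walks the whole 
backing table, so once a skewed key has inflated the table, clearing it for 
every subsequent small key stays expensive.
{code:java}
import java.util.LinkedHashSet;
import java.util.Set;

/** Sketch only: replace an oversized buffer instead of clearing it. */
class CollectSetBufferSketch {
  // Threshold is an assumption for illustration, not taken from the patch.
  private static final int REUSE_THRESHOLD = 1024;

  private Set<Object> buffer = new LinkedHashSet<>();

  /** Reset the per-key buffer between GROUP BY keys. */
  Set<Object> resetBuffer() {
    if (buffer.size() > REUSE_THRESHOLD) {
      buffer = new LinkedHashSet<>(); // drop the inflated backing table
    } else {
      buffer.clear();                 // cheap while the table is small
    }
    return buffer;
  }
}
{code}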



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-24375) Cannot drop parttions when using metastore standalone

2022-06-13 Thread Wang Jiangkun (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553853#comment-17553853
 ] 

Wang Jiangkun commented on HIVE-24375:
--

I have this problem too. Can anyone solve it? My current workaround is to 
package hive-exec into the standalone metastore and set 
hive.metastore.expression.proxy=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore
in the standalone metastore configuration to avoid the problem.

[ashutoshc|https://github.com/ashutoshc]
[sershe-apache|https://github.com/sershe-apache]
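
For reference, a sketch of that workaround as it would appear in 
metastore-site.xml (hive-exec must also be on the standalone metastore's 
classpath for the class to resolve):
{code:xml}
<!-- Workaround sketch from the comment above, not an official fix. -->
<property>
  <name>hive.metastore.expression.proxy</name>
  <value>org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore</value>
</property>
{code}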

> Cannot drop parttions when using metastore standalone
> -
>
> Key: HIVE-24375
> URL: https://issues.apache.org/jira/browse/HIVE-24375
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.2
>Reporter: Alex Simenduev
>Priority: Major
>
> I can successfully connect to metastore using beeline client.
> {code:java}
>  beeline --fastConnect=true -u jdbc:hive2://
> {code}
> Then I run a partition drop command for a table
> {code:java}
> ALTER TABLE work_v3 DROP PARTITION (user=992, `date`=20200806);{code}
> I get an error that the partition is not found, but the partition does exist 
> according to SHOW PARTITIONS.
> Here is the full stack trace.
>  
> {code}
> 0: jdbc:hive2://> alter table work_v3 drop partition (user=992, 
> `date`=20200806);0: jdbc:hive2://> alter table work_v3 drop partition 
> (user=992, `date`=20200806);20/11/12 09:55:00 [main]: INFO conf.HiveConf: 
> Using the default value passed in for log id: 
> 366b6abb-87d4-4758-9cfb-fa36903f69de20/11/12 09:55:00 [main]: INFO 
> session.SessionState: Updating thread name to 
> 366b6abb-87d4-4758-9cfb-fa36903f69de main20/11/12 09:55:00 
> [366b6abb-87d4-4758-9cfb-fa36903f69de main]: INFO operation.OperationManager: 
> Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, 
> getHandleIdentifier()=897f778a-37c6-444d-b244-3f701268eec7]20/11/12 09:55:00 
> [366b6abb-87d4-4758-9cfb-fa36903f69de main]: INFO ql.Driver: Compiling 
> command(queryId=root_20201112095500_e4ca0f5b-0b00-43ae-9efb-2d0a24ceed15): 
> alter table work_v3 drop partition (user=992, `date`=20200806)20/11/12 
> 09:55:00 [366b6abb-87d4-4758-9cfb-fa36903f69de main]: INFO ql.Driver: 
> Concurrency mode is disabled, not creating a lock managerFAILED: 
> SemanticException [Error 10006]: Partition not found ((user = 992) and (date 
> = 20200806))20/11/12 09:55:00 [366b6abb-87d4-4758-9cfb-fa36903f69de main]: 
> ERROR ql.Driver: FAILED: SemanticException [Error 10006]: Partition not found 
> ((user = 992) and (date = 
> 20200806))org.apache.hadoop.hive.ql.parse.SemanticException: Partition not 
> found ((user = 992) and (date = 20200806)) at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addTableDropPartsOutputs(DDLSemanticAnalyzer.java:4028)
>  at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableDropParts(DDLSemanticAnalyzer.java:3376)
>  at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:327)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659) at 
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826) at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773) at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:197)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
>  at org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) 
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
>  at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
>  at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312)
>  at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1585)
>  at com.sun.proxy.$Proxy28.ExecuteStatement(Unknown Source) at 
> org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:323) 
> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:265) 

[jira] [Updated] (HIVE-26322) Upgrade gson to 2.9.0 due to CVE

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26322:
--
Labels: pull-request-available  (was: )

> Upgrade gson to 2.9.0 due to CVE
> 
>
> Key: HIVE-26322
> URL: https://issues.apache.org/jira/browse/HIVE-26322
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26322) Upgrade gson to 2.9.0 due to CVE

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26322?focusedWorklogId=780961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780961
 ]

ASF GitHub Bot logged work on HIVE-26322:
-

Author: ASF GitHub Bot
Created on: 14/Jun/22 01:33
Start Date: 14/Jun/22 01:33
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request, #3363:
URL: https://github.com/apache/hive/pull/3363
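
   A dependency bump like this is typically a one-line version change in the 
project pom; a sketch of the intended end state (the exact location of the 
version in Hive's poms is an assumption, not taken from this PR):
{code:xml}
<!-- Sketch: gson upgraded to 2.9.0 to pick up the CVE fix. -->
<dependency>
  <groupId>com.google.code.gson</groupId>
  <artifactId>gson</artifactId>
  <version>2.9.0</version>
</dependency>
{code}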

   
   




Issue Time Tracking
---

Worklog Id: (was: 780961)
Remaining Estimate: 0h
Time Spent: 10m





--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26142) Extend the hidden conf list with webui keystore pwd

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26142?focusedWorklogId=780954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780954
 ]

ASF GitHub Bot logged work on HIVE-26142:
-

Author: ASF GitHub Bot
Created on: 14/Jun/22 00:23
Start Date: 14/Jun/22 00:23
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #3212:
URL: https://github.com/apache/hive/pull/3212#issuecomment-1154576740

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 780954)
Time Spent: 20m  (was: 10m)

>  Extend the hidden conf list with webui keystore pwd
> 
>
> Key: HIVE-26142
> URL: https://issues.apache.org/jira/browse/HIVE-26142
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Janos Kovacs
>Assignee: Janos Kovacs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The SSL keystore configuration is separated for HS2 itself and the WebUI. The 
> hidden configuration list only contains server2.keystore.password but should 
> also contain server2.webui.keystore.password.
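
A sketch of the intended end state, using the standard property names 
(presumably the actual patch extends the default value of 
hive.conf.hidden.list inside HiveConf; the value below is abbreviated to the 
two keys discussed):
{code:xml}
<property>
  <name>hive.conf.hidden.list</name>
  <value>hive.server2.keystore.password,hive.server2.webui.keystore.password</value>
</property>
{code}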



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25787) Prevent duplicate paths in the fileList while adding an entry to NotifcationLog

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25787?focusedWorklogId=780955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780955
 ]

ASF GitHub Bot logged work on HIVE-25787:
-

Author: ASF GitHub Bot
Created on: 14/Jun/22 00:23
Start Date: 14/Jun/22 00:23
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #3170:
URL: https://github.com/apache/hive/pull/3170#issuecomment-1154576834

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 780955)
Time Spent: 40m  (was: 0.5h)

> Prevent duplicate paths in the fileList while adding an entry to 
> NotifcationLog
> ---
>
> Key: HIVE-25787
> URL: https://issues.apache.org/jira/browse/HIVE-25787
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As of now, while adding entries to the notification log, retries can cause 
> the same path to be added to a notification log entry more than once, which 
> leads to copy failures during replication.
> Avoid having the same path more than once for a single transaction.
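
A minimal sketch of the dedupe idea (illustrative; the real change lives in 
the replication file-list handling):
{code:java}
import java.util.LinkedHashSet;
import java.util.Set;

/** Sketch only: keep insertion order but reject duplicate paths, so a
 *  retried add of the same path within a transaction is a no-op. */
class DedupedFileListSketch {
  private final Set<String> entries = new LinkedHashSet<>();

  /** Returns true only the first time a path is added. */
  boolean add(String path) {
    return entries.add(path); // Set.add returns false for duplicates
  }
}
{code}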



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26320) Incorrect case evaluation for Parquet based table

2022-06-13 Thread Chiran Ravani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chiran Ravani updated HIVE-26320:
-
Issue Type: Bug  (was: Improvement)

> Incorrect case evaluation for Parquet based table
> -
>
> Key: HIVE-26320
> URL: https://issues.apache.org/jira/browse/HIVE-26320
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Planning
>Affects Versions: 4.0.0-alpha-1
>Reporter: Chiran Ravani
>Priority: Major
>
> A query involving a case statement with two or more conditions produces an 
> incorrect result for tables in Parquet format. The problem is not observed 
> with ORC or TextFile.
> *Steps to reproduce*:
> {code:java}
> create external table case_test_parquet(kob varchar(2),enhanced_type_code 
> int) stored as parquet;
> insert into case_test_parquet values('BB',18),('BC',18),('AB',18);
> select case when (
>(kob='BB' and enhanced_type_code='18')
>or (kob='BC' and enhanced_type_code='18')
>  )
> then 1
> else 0
> end as logic_check
> from case_test_parquet;
> {code}
> Result:
> {code}
> 0
> 0
> 0
> {code}
> Expected result:
> {code}
> 1
> 1
> 0
> {code}
> The problem does not appear when setting hive.optimize.point.lookup=false.
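
Until the bug is fixed, the workaround mentioned above can be applied per 
session:
{code:java}
-- Workaround from the report above: disable point-lookup rewriting,
-- then re-run the query; it should return 1, 1, 0 as expected.
set hive.optimize.point.lookup=false;
select case when (
   (kob='BB' and enhanced_type_code='18')
   or (kob='BC' and enhanced_type_code='18')
 )
then 1
else 0
end as logic_check
from case_test_parquet;
{code}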



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26321) Upgrade commons-io to 2.11.0

2022-06-13 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-26321:



> Upgrade commons-io to 2.11.0
> 
>
> Key: HIVE-26321
> URL: https://issues.apache.org/jira/browse/HIVE-26321
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>
> Upgrade commons-io to 2.11.0



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25879?focusedWorklogId=780869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780869
 ]

ASF GitHub Bot logged work on HIVE-25879:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 15:30
Start Date: 13/Jun/22 15:30
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on code in PR #3348:
URL: https://github.com/apache/hive/pull/3348#discussion_r895859650


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -316,17 +316,20 @@ private boolean ensureDbInit() {
   }
 
   private boolean runTestQuery() {
+    boolean doTrace = LOG.isDebugEnabled();
     Transaction tx = pm.currentTransaction();
     boolean doCommit = false;
     if (!tx.isActive()) {
       tx.begin();
       doCommit = true;
     }
     // Run a self-test query. If it doesn't work, we will self-disable. What a PITA...
-    String selfTestQuery = "select \"DB_ID\" from " + DBS + "";
+    String selfTestQuery = "select \"DB_ID\" from " + DBS + " WHERE \"DB_ID\"=1";

Review Comment:
   `where 1=0` sounds good to me...or `DB_ID=-1`; but this will be ok as well





Issue Time Tracking
---

Worklog Id: (was: 780869)
Time Spent: 1.5h  (was: 1h 20m)

> MetaStoreDirectSql test query should not query the whole DBS table
> --
>
> Key: HIVE-25879
> URL: https://issues.apache.org/jira/browse/HIVE-25879
> Project: Hive
>  Issue Type: Bug
>Reporter: Miklos Szurap
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The runTestQuery() method in 
> org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java uses the test query
> {code:java}
> select "DB_ID" from "DBS"{code}
> to determine whether direct SQL can be used.
> With larger deployments having many (10k+) Hive databases, it would be more 
> efficient to query a small table instead; for example, the "VERSION" table 
> should always contain a single row.
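
For reference, the variants under discussion (sketch; the "VER_ID" column 
name for the VERSION table is an assumption, not quoted from the PR):
{code:java}
-- Current self-test query: scans every row of DBS (slow with 10k+ databases).
select "DB_ID" from "DBS";

-- The PR: constrain the scan to a single key.
select "DB_ID" from "DBS" where "DB_ID" = 1;

-- Reviewer alternatives: a provably empty scan ...
select "DB_ID" from "DBS" where 1 = 0;

-- ... or a query against the single-row VERSION table.
select "VER_ID" from "VERSION";
{code}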



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26184) COLLECT_SET with GROUP BY is very slow when some keys are highly skewed

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26184?focusedWorklogId=780865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780865
 ]

ASF GitHub Bot logged work on HIVE-26184:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 15:26
Start Date: 13/Jun/22 15:26
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged PR #3253:
URL: https://github.com/apache/hive/pull/3253




Issue Time Tracking
---

Worklog Id: (was: 780865)
Time Spent: 1.5h  (was: 1h 20m)

> COLLECT_SET with GROUP BY is very slow when some keys are highly skewed
> ---
>
> Key: HIVE-26184
> URL: https://issues.apache.org/jira/browse/HIVE-26184
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.8, 3.1.3
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I observed some reducers spend 98% of CPU time in invoking 
> `java.util.HashMap#clear`.
> Looking at the details, I found COLLECT_SET reuses a LinkedHashSet and its 
> `clear` can be quite heavy when a relation has a small number of highly 
> skewed keys.
>  
> To reproduce the issue, first, we will create rows with a skewed key.
> {code:java}
> INSERT INTO test_collect_set
> SELECT '----' AS key, CAST(UUID() AS VARCHAR) 
> AS value
> FROM table_with_many_rows
> LIMIT 10;{code}
> Then, we will create many non-skewed rows.
> {code:java}
> INSERT INTO test_collect_set
> SELECT UUID() AS key, UUID() AS value
> FROM table_with_many_rows
> LIMIT 500;{code}
> We can observe the issue when we aggregate values by `key`.
> {code:java}
> SELECT key, COLLECT_SET(value) FROM test_collect_set GROUP BY key{code}
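A minimal standalone sketch (not the Hive UDAF code itself) of why the reuse is 
pathological: `LinkedHashSet#clear` walks the whole backing bucket array, and 
that array never shrinks after a skewed key has inflated it, so every later 
tiny group still pays for the largest group seen so far.
{code:java}
import java.util.LinkedHashSet;
import java.util.Set;

public class ClearCostDemo {
  public static void main(String[] args) {
    Set<Integer> buffer = new LinkedHashSet<>();
    // One highly skewed key inflates the backing table to ~16M buckets.
    for (int i = 0; i < 10_000_000; i++) {
      buffer.add(i);
    }
    buffer.clear(); // size is now 0, but the bucket array keeps its capacity

    long start = System.nanoTime();
    buffer.add(42);
    buffer.clear(); // still walks the full bucket array for a single element
    System.out.printf("clear() after a skewed key: %.1f ms%n",
        (System.nanoTime() - start) / 1e6);

    // One possible mitigation: allocate a fresh, small set per group instead
    // of reusing (and repeatedly clearing) the inflated one.
    buffer = new LinkedHashSet<>();
  }
}
{code}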



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26184) COLLECT_SET with GROUP BY is very slow when some keys are highly skewed

2022-06-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-26184.
-
Fix Version/s: 4.0.0-alpha-2
   Resolution: Fixed

merged into master. Thank you [~okumin] !

> COLLECT_SET with GROUP BY is very slow when some keys are highly skewed
> ---
>
> Key: HIVE-26184
> URL: https://issues.apache.org/jira/browse/HIVE-26184
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.8, 3.1.3
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I observed some reducers spend 98% of CPU time in invoking 
> `java.util.HashMap#clear`.
> Looking at the details, I found COLLECT_SET reuses a LinkedHashSet and its 
> `clear` can be quite heavy when a relation has a small number of highly 
> skewed keys.
>  
> To reproduce the issue, first, we will create rows with a skewed key.
> {code:java}
> INSERT INTO test_collect_set
> SELECT '----' AS key, CAST(UUID() AS VARCHAR) 
> AS value
> FROM table_with_many_rows
> LIMIT 10;{code}
> Then, we will create many non-skewed rows.
> {code:java}
> INSERT INTO test_collect_set
> SELECT UUID() AS key, UUID() AS value
> FROM table_with_many_rows
> LIMIT 500;{code}
> We can observe the issue when we aggregate values by `key`.
> {code:java}
> SELECT key, COLLECT_SET(value) FROM test_collect_set GROUP BY key{code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26307) Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26307?focusedWorklogId=780861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780861
 ]

ASF GitHub Bot logged work on HIVE-26307:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 15:20
Start Date: 13/Jun/22 15:20
Worklog Time Spent: 10m 
  Work Description: pvary merged PR #3354:
URL: https://github.com/apache/hive/pull/3354




Issue Time Tracking
---

Worklog Id: (was: 780861)
Time Spent: 40m  (was: 0.5h)

> Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads
> -
>
> Key: HIVE-26307
> URL: https://issues.apache.org/jira/browse/HIVE-26307
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With vectorized Iceberg reads we are creating {{HadoopInputFile}} objects 
> just to store the location of the files. If we can avoid this, then we can 
> improve the performance, since the {{path.getFileSystem(conf)}} calls can 
> become costly, especially for S3
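As an illustration of the idea (a hypothetical sketch, not the merged patch): an 
InputFile that only carries its location never triggers 
{{path.getFileSystem(conf)}}, and any call that would actually need the 
filesystem fails loudly.
{code:java}
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.SeekableInputStream;

// Carries the file location only; no FileSystem object is ever initialized.
class LocationOnlyInputFile implements InputFile {
  private final String location;

  LocationOnlyInputFile(String location) {
    this.location = location;
  }

  @Override
  public String location() {
    return location;
  }

  @Override
  public long getLength() {
    throw new UnsupportedOperationException("length requires FS access");
  }

  @Override
  public SeekableInputStream newStream() {
    throw new UnsupportedOperationException("reads must go through a real FileIO");
  }

  @Override
  public boolean exists() {
    throw new UnsupportedOperationException("existence check requires FS access");
  }
}
{code}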



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26307) Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads

2022-06-13 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26307.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~szita]!

> Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads
> -
>
> Key: HIVE-26307
> URL: https://issues.apache.org/jira/browse/HIVE-26307
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With vectorized Iceberg reads we are creating {{HadoopInputFile}} objects 
> just to store the location of the files. If we can avoid this, then we can 
> improve the performance, since the {{path.getFileSystem(conf)}} calls can 
> become costly, especially for S3



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26307) Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26307?focusedWorklogId=780857&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780857
 ]

ASF GitHub Bot logged work on HIVE-26307:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 15:07
Start Date: 13/Jun/22 15:07
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3354:
URL: https://github.com/apache/hive/pull/3354#discussion_r895836024


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -381,100 +400,67 @@ private CloseableIterable newAvroIterable(
   Avro.ReadBuilder avroReadBuilder = Avro.read(inputFile)
   .project(readSchema)
   .split(task.start(), task.length());
+

Review Comment:
   The first check in `openVectorized` is this:
   ```
 
 Preconditions.checkArgument(!task.file().format().equals(FileFormat.AVRO),
     "Vectorized execution is not yet supported for Iceberg avro tables. " +
     "Please turn off vectorization and retry the query.");
   ```





Issue Time Tracking
---

Worklog Id: (was: 780857)
Time Spent: 0.5h  (was: 20m)

> Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads
> -
>
> Key: HIVE-26307
> URL: https://issues.apache.org/jira/browse/HIVE-26307
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With vectorized Iceberg reads we are creating {{HadoopInputFile}} objects 
> just to store the location of the files. If we can avoid this, then we can 
> improve the performance, since the {{path.getFileSystem(conf)}} calls can 
> become costly, especially for S3



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25733) Add check-spelling CI action

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25733?focusedWorklogId=780855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780855
 ]

ASF GitHub Bot logged work on HIVE-25733:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 15:06
Start Date: 13/Jun/22 15:06
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on PR #2809:
URL: https://github.com/apache/hive/pull/2809#issuecomment-1154041239

   only the mysql/metastore tests have failed; for those to pass we would have 
to merge the master branch and run the tests once more... this PR had a clean 
run before - just wanted to be sure that it's still good.




Issue Time Tracking
---

Worklog Id: (was: 780855)
Time Spent: 1h 20m  (was: 1h 10m)

> Add check-spelling CI action
> 
>
> Key: HIVE-25733
> URL: https://issues.apache.org/jira/browse/HIVE-25733
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Josh Soref
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Add CI to catch spelling errors. See [https://www.check-spelling.dev/] for 
> information.
> Initially this will only check the {{serde}} directory, but the intention is 
> to expand its coverage as spelling errors in other directories are fixed.
> Note that for this to work the action should be made a required check, 
> otherwise when a typo is added forks from that commit will get complaints.
> If a typo is intentional, the action will provide information about how to 
> add it to {{expect.txt}} such that it will be accepted as an expected item 
> (i.e. not a typo).
> To skip a file/directory entirely, add a matching entry to 
> {{{}excludes.txt{}}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25733) Add check-spelling CI action

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25733?focusedWorklogId=780854&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780854
 ]

ASF GitHub Bot logged work on HIVE-25733:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 15:05
Start Date: 13/Jun/22 15:05
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged PR #2809:
URL: https://github.com/apache/hive/pull/2809




Issue Time Tracking
---

Worklog Id: (was: 780854)
Time Spent: 1h 10m  (was: 1h)

> Add check-spelling CI action
> 
>
> Key: HIVE-25733
> URL: https://issues.apache.org/jira/browse/HIVE-25733
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Josh Soref
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add CI to catch spelling errors. See [https://www.check-spelling.dev/] for 
> information.
> Initially this will only check the {{serde}} directory, but the intention is 
> to expand its coverage as spelling errors in other directories are fixed.
> Note that for this to work the action should be made a required check, 
> otherwise when a typo is added forks from that commit will get complaints.
> If a typo is intentional, the action will provide information about how to 
> add it to {{expect.txt}} such that it will be accepted as an expected item 
> (i.e. not a typo).
> To skip a file/directory entirely, add a matching entry to 
> {{{}excludes.txt{}}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25733) Add check-spelling CI action

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25733?focusedWorklogId=780853&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780853
 ]

ASF GitHub Bot logged work on HIVE-25733:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 15:04
Start Date: 13/Jun/22 15:04
Worklog Time Spent: 10m 
  Work Description: jsoref commented on PR #2809:
URL: https://github.com/apache/hive/pull/2809#issuecomment-1154038518

   Sigh




Issue Time Tracking
---

Worklog Id: (was: 780853)
Time Spent: 1h  (was: 50m)

> Add check-spelling CI action
> 
>
> Key: HIVE-25733
> URL: https://issues.apache.org/jira/browse/HIVE-25733
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Josh Soref
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add CI to catch spelling errors. See [https://www.check-spelling.dev/] for 
> information.
> Initially this will only check the {{serde}} directory, but the intention is 
> to expand its coverage as spelling errors in other directories are fixed.
> Note that for this to work the action should be made a required check, 
> otherwise when a typo is added forks from that commit will get complaints.
> If a typo is intentional, the action will provide information about how to 
> add it to {{expect.txt}} such that it will be accepted as an expected item 
> (i.e. not a typo).
> To skip a file/directory entirely, add a matching entry to 
> {{{}excludes.txt{}}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26320) Incorrect case evaluation for Parquet based table

2022-06-13 Thread Chiran Ravani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chiran Ravani updated HIVE-26320:
-
Summary: Incorrect case evaluation for Parquet based table  (was: Incorrect 
case evaluation for Parquet based table.)

> Incorrect case evaluation for Parquet based table
> -
>
> Key: HIVE-26320
> URL: https://issues.apache.org/jira/browse/HIVE-26320
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Query Planning
>Affects Versions: 4.0.0-alpha-1
>Reporter: Chiran Ravani
>Priority: Major
>
> A query involving a case statement with two or more conditions leads to an 
> incorrect result for tables with parquet format. The problem is not observed 
> with ORC or TextFile.
> *Steps to reproduce*:
> {code:java}
> create external table case_test_parquet(kob varchar(2),enhanced_type_code 
> int) stored as parquet;
> insert into case_test_parquet values('BB',18),('BC',18),('AB',18);
> select case when (
>(kob='BB' and enhanced_type_code='18')
>or (kob='BC' and enhanced_type_code='18')
>  )
> then 1
> else 0
> end as logic_check
> from case_test_parquet;
> {code}
> Result:
> {code}
> 0
> 0
> 0
> {code}
> Expected result:
> {code}
> 1
> 1
> 0
> {code}
> The problem does not appear when setting hive.optimize.point.lookup=false.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.3

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=780830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780830
 ]

ASF GitHub Bot logged work on HIVE-24484:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 13:51
Start Date: 13/Jun/22 13:51
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on code in PR #3279:
URL: https://github.com/apache/hive/pull/3279#discussion_r895742321


##
ql/src/test/results/clientpositive/llap/acid_table_directories_test.q.out:
##
@@ -163,13 +170,6 @@ POSTHOOK: Input: default@acidparttbl@p=200
 ### ACID DELTA DIR ###
 ### ACID DELTA DIR ###
 ### ACID DELTA DIR ###
- A masked pattern was here 

Review Comment:
   [HADOOP-12502](https://issues.apache.org/jira/browse/HADOOP-12502) is the 
culprit: earlier the listing was sorted, now it isn't, so the `ls -R` output 
will change intermittently. We can't keep this check in the test, hence it is 
disabled.
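   If the listing output ever had to stay in a test, one option (a sketch, 
assuming determinism is all that matters and not the on-disk order) would be to 
sort explicitly before printing:

   ```java
   import java.io.IOException;
   import java.util.Arrays;
   import java.util.Comparator;
   import org.apache.hadoop.fs.FileStatus;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   class SortedListing {
     // HADOOP-12502 removed the sorted-listing guarantee; sorting by path
     // restores a deterministic order for test output.
     static FileStatus[] list(FileSystem fs, Path dir) throws IOException {
       FileStatus[] statuses = fs.listStatus(dir);
       Arrays.sort(statuses, Comparator.comparing(s -> s.getPath().toString()));
       return statuses;
     }
   }
   ```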





Issue Time Tracking
---

Worklog Id: (was: 780830)
Time Spent: 12h 23m  (was: 12h 13m)

> Upgrade Hadoop to 3.3.3
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12h 23m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26307) Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26307?focusedWorklogId=780822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780822
 ]

ASF GitHub Bot logged work on HIVE-26307:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 13:37
Start Date: 13/Jun/22 13:37
Worklog Time Spent: 10m 
  Work Description: szlta commented on code in PR #3354:
URL: https://github.com/apache/hive/pull/3354#discussion_r895723865


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -381,100 +400,67 @@ private CloseableIterable newAvroIterable(
   Avro.ReadBuilder avroReadBuilder = Avro.read(inputFile)
   .project(readSchema)
   .split(task.start(), task.length());
+

Review Comment:
   nit: Should we keep the original exception ("Vectorized execution is not yet 
supported for Iceberg avro...") here in case inMemoryDataModel == HIVE?
   In theory HiveIcebergStorageHandler should prevent such a combination; just 
wanted to know if this was a conscious decision.





Issue Time Tracking
---

Worklog Id: (was: 780822)
Time Spent: 20m  (was: 10m)

> Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads
> -
>
> Key: HIVE-26307
> URL: https://issues.apache.org/jira/browse/HIVE-26307
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With vectorized Iceberg reads we are creating {{HadoopInputFile}} objects 
> just to store the location of the files. If we can avoid this, then we can 
> improve the performance, since the {{path.getFileSystem(conf)}} calls can 
> become costly, especially for S3



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-06-13 Thread Chiran Ravani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553584#comment-17553584
 ] 

Chiran Ravani commented on HIVE-25980:
--

[~ste...@apache.org] Thank you for sharing the idea. I can open another Jira to 
track this improvement; the goal for this one was to reduce the unnecessary FS 
calls we were making as part of HiveMetaStoreChecker.checkTable.

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on 
> cloud storage such as S3; one of the cases where we found slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>   at 
> com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1331)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
>   at 
> 

[jira] [Commented] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-06-13 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553577#comment-17553577
 ] 

Steve Loughran commented on HIVE-25980:
---

ok. I'd still recommend the method {{listStatusIterator}} for paginated 
retrieval of wide listings from hdfs, s3a and abfs, especially if you can do 
any useful work during that iteration
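A minimal sketch of that pattern (illustrative; the class and the per-entry 
work are placeholders):
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

class PaginatedListing {
  // listStatusIterator fetches pages lazily on hdfs, s3a and abfs, so the
  // per-entry work starts before the full directory listing is retrieved.
  static void walk(FileSystem fs, Path dir) throws IOException {
    RemoteIterator<FileStatus> it = fs.listStatusIterator(dir);
    while (it.hasNext()) {
      FileStatus status = it.next();
      System.out.println(status.getPath()); // placeholder for useful work
    }
  }
}
{code}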

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on 
> cloud storage such as S3; one of the cases where we found slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>   at 
> com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1331)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
>   at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)

[jira] [Commented] (HIVE-26311) Incorrect content of array when IN operator is in the filter

2022-06-13 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553572#comment-17553572
 ] 

Peter Vary commented on HIVE-26311:
---

[~gaborkaszab]: Would HIVE-26250 help?

> Incorrect content of array when IN operator is in the filter
> 
>
> Key: HIVE-26311
> URL: https://issues.apache.org/jira/browse/HIVE-26311
> Project: Hive
>  Issue Type: Bug
>Reporter: Gabor Kaszab
>Priority: Major
>  Labels: correctness
> Attachments: arrays.parq
>
>
> select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 
> 2 = 1 and id = 5
> {code:java}
> +-----+---------------+---------------------------------------+
> | id  |     arr1      |                 arr2                  |
> +-----+---------------+---------------------------------------+
> | 5   | [10,null,12]  | ["ten","eleven","twelve","thirteen"]  |
> +-----+---------------+---------------------------------------+{code}
> select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 
> 2 = 1 *and id in (select id from functional_parquet.alltypestiny)* and id = 5;
> {code:java}
> +-----+-------------+---------------------------------------+
> | id  |    arr1     |                 arr2                  |
> +-----+-------------+---------------------------------------+
> | 5   | [10,10,12]  | ["ten","eleven","twelve","thirteen"]  |
> +-----+-------------+---------------------------------------+ {code}
> Note, the first (and correct) example returns 10, null and 12 as the items of 
> an array while the second query for some reason shows 10 instead of the null 
> value. The only difference between the 2 examples is that in the second I 
> added an extra filter (that in fact doesn't filter out anything as 
> functional_parquet.alltypestiny's ID contains numbers from zero to ten)
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-26319:
--
Component/s: File Formats

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-26319:
--
Fix Version/s: 4.0.0

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=780788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780788
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 11:46
Start Date: 13/Jun/22 11:46
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request, #3362:
URL: https://github.com/apache/hive/pull/3362

   ### What changes were proposed in this pull request?
   Rewrite update statements on iceberg tables to multi insert statements, 
similarly to the rewrite done for native acid tables.
   
   When generating the rewritten statement:
   * Get the virtual columns from the table's storage handler in case of non 
native acid tables.
   * Include the old values in the select clause of the delete branch of the 
multi insert statement.
   
   When executing the multi insert:
   * Two iceberg writers are used, which produce a data delta file and a delete 
delta file. The results of these writers should be merged into one 
`FilesForCommit` if both writers run in the same task.
   * In case of more complex statements (e.g. partitioned and/or bucketed) more 
than one Tez task produces commit info, so this patch enables storing all of 
them.
   * Every `FileSinkOperator` creates its own jobConf instance, because the 
iceberg write operation is stored in it and it differs in each instance.
   
   
   ### Why are the changes needed?
   See #2855
   + Preparation for iceberg Merge implementation.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests 
-Dtest=TestIcebergLlapLocalCliDriver -Dqfile=update_iceberg_partitioned_orc2.q 
-pl itests/qtest-iceberg -Piceberg -Pitests
   ```




Issue Time Tracking
---

Worklog Id: (was: 780788)
Remaining Estimate: 0h
Time Spent: 10m

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26319:
--
Labels: pull-request-available  (was: )

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-26319:
-


> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=780770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780770
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 09:56
Start Date: 13/Jun/22 09:56
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on code in PR #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r895538935


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
##
@@ -326,10 +356,26 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
   CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
   prFromMetastore.setPartitionName(getPartitionName(table, partition));
   prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
+  if (allPartDirs.contains(partPath)) {
 result.getCorrectPartitions().add(prFromMetastore);
+allPartDirs.remove(partPath);
+  } else {
+// There can be an edge case where the user defines a partition directory
+// outside of the table directory. To avoid eviction of such partitions,
+// we check the existence of partition paths that are not under the table
+// directory and add them to the result for getPartitionsNotOnFs.
+if (!partPath.toString().contains(tablePathStr)) {

Review Comment:
   can't we have a simple check for this? like
   
   ```java
   if (!fs.exists(partPath)) {
     result.getPartitionsNotOnFs().add(prFromMetastore);
   } else {
     result.getCorrectPartitions().add(prFromMetastore);
   }
   ```





Issue Time Tracking
---

Worklog Id: (was: 780770)
Time Spent: 8h 20m  (was: 8h 10m)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on 
> cloud storage such as S3; one of the cases where we found slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> 

[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=780769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780769
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 09:51
Start Date: 13/Jun/22 09:51
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on code in PR #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r895533968


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
##
@@ -326,10 +356,26 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
   CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
   prFromMetastore.setPartitionName(getPartitionName(table, partition));
   prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
+  if (allPartDirs.contains(partPath)) {

Review Comment:
   `allPartDirs.remove(partPath)` will return true if the element exists and 
false otherwise. We can have one statement instead of contains and then remove.
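   Combining the two suggestions from this review (a sketch using the names 
from the diff above, not the final patch):

   ```java
   // Set#remove reports whether the element was present, so the
   // contains-then-remove pair collapses into a single call.
   if (allPartDirs.remove(partPath)) {
     result.getCorrectPartitions().add(prFromMetastore);
   } else if (!fs.exists(partPath)) {
     result.getPartitionsNotOnFs().add(prFromMetastore);
   }
   ```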





Issue Time Tracking
---

Worklog Id: (was: 780769)
Time Spent: 8h 10m  (was: 8h)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on 
> cloud storage such as S3; one of the cases where we found slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>   at 
> 

[jira] [Updated] (HIVE-26298) Selecting complex types on migrated iceberg table does not work

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26298:
--
Labels: pull-request-available  (was: )

> Selecting complex types on migrated iceberg table does not work
> ---
>
> Key: HIVE-26298
> URL: https://issues.apache.org/jira/browse/HIVE-26298
> Project: Hive
>  Issue Type: Bug
>Reporter: Gergely Fürnstáhl
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Attachments: 1-a5d522f4-a065-44e6-983b-ba66596b4332.metadata.json
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I am working on implementing NameMapping in Impala (mainly replicating Hive's 
> functionality) and ran into the following issue:
> {code:java}
> CREATE TABLE array_demo
> (
>   int_primitive INT,
>   int_array ARRAY<INT>,
>   int_array_array ARRAY<ARRAY<INT>>,
>   int_to_array_array_Map MAP<INT, ARRAY<ARRAY<INT>>>
> )
> STORED AS ORC;
> INSERT INTO array_demo values (0, array(1), array(array(2), array(3,4)), 
> map(5,array(array(6),array(7,8))));
> select * from array_demo;
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | array_demo.int_primitive  | array_demo.int_array  | array_demo.int_array_array  | array_demo.int_to_array_array_map  |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | 0                         | [1]                   | [[2],[3,4]]                 | {5:[[6],[7,8]]}                    |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
>  {code}
> Converting to iceberg
>  
>  
> {code:java}
> ALTER TABLE array_demo SET TBLPROPERTIES 
> ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')
> select * from array_demo;
> INFO  : Compiling 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe):
>  select * from array_demo
> INFO  : No Stats for default@array_demo, Columns: int_primitive, int_array, 
> int_to_array_array_map, int_array_array
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:array_demo.int_primitive, type:int, 
> comment:null), FieldSchema(name:array_demo.int_array, type:array<int>, 
> comment:null), FieldSchema(name:array_demo.int_array_array, 
> type:array<array<int>>, comment:null), 
> FieldSchema(name:array_demo.int_to_array_array_map, 
> type:map<int,array<array<int>>>, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe);
>  Time taken: 0.036 seconds
> INFO  : Executing 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe):
>  select * from array_demo
> INFO  : Completed executing 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe);
>  Time taken: 0.0 seconds
> INFO  : OK
> Error: java.io.IOException: java.lang.IllegalArgumentException: Can not 
> promote MAP type to INTEGER (state=,code=0)
> select int_primitive from array_demo;
> +----------------+
> | int_primitive  |
> +----------------+
> | 0              |
> +----------------+
> 1 row selected (0.088 seconds)
>  {code}
> Removing schema.name-mapping.default solves it
> {code:java}
> ALTER TABLE array_demo UNSET TBLPROPERTIES ('schema.name-mapping.default');
> select * from array_demo;
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | array_demo.int_primitive  | array_demo.int_array  | array_demo.int_array_array  | array_demo.int_to_array_array_map  |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | 0                         | [1]                   | [[2],[3,4]]                 | {5:[[6],[7,8]]}                    |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
>  {code}
> Possible cause:
>  
> The name mapping generated and pushed into schema.name-mapping.default is 
> different from the name mapping in the schema in the metadata.json 
> (attached).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26298) Selecting complex types on migrated iceberg table does not work

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26298?focusedWorklogId=780766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780766
 ]

ASF GitHub Bot logged work on HIVE-26298:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 09:41
Start Date: 13/Jun/22 09:41
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request, #3361:
URL: https://github.com/apache/hive/pull/3361

   
   
   ### What changes were proposed in this pull request?
   Change iceberg field ID generation while converting hive schema to iceberg 
schema.
   
   
   
   ### Why are the changes needed?
   When converting hive schema to iceberg schema the field ID is generated as 
we traverse the schema tree in preorder. This can be an issue while converting 
multiple levels of complex types since iceberg expects that the IDs are 
generated using a custom tree traversal technique. 
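   For illustration, a sketch under the assumption that Iceberg's standard 
utilities are applicable here (not necessarily the approach taken in this 
patch): Iceberg ships its own ID-assignment traversal, which can be applied 
after the plain conversion so the IDs match what the library expects.

   ```java
   import org.apache.iceberg.Schema;
   import org.apache.iceberg.types.TypeUtil;

   class FreshIds {
     // Re-assign field IDs with Iceberg's own traversal instead of numbering
     // fields in plain preorder during the Hive-to-Iceberg conversion.
     static Schema withIcebergIds(Schema converted) {
       return TypeUtil.assignIncreasingFreshIds(converted);
     }
   }
   ```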
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Unit test
   
   




Issue Time Tracking
---

Worklog Id: (was: 780766)
Remaining Estimate: 0h
Time Spent: 10m

> Selecting complex types on migrated iceberg table does not work
> ---
>
> Key: HIVE-26298
> URL: https://issues.apache.org/jira/browse/HIVE-26298
> Project: Hive
>  Issue Type: Bug
>Reporter: Gergely Fürnstáhl
>Assignee: László Pintér
>Priority: Major
> Attachments: 1-a5d522f4-a065-44e6-983b-ba66596b4332.metadata.json
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I am working on implementing NameMapping in Impala (mainly replicating Hive's 
> functionality) and ran into the following issue:
> {code:java}
> CREATE TABLE array_demo
> (
>   int_primitive INT,
>   int_array ARRAY<INT>,
>   int_array_array ARRAY<ARRAY<INT>>,
>   int_to_array_array_Map MAP<INT, ARRAY<ARRAY<INT>>>
> )
> STORED AS ORC;
> INSERT INTO array_demo values (0, array(1), array(array(2), array(3,4)), 
> map(5,array(array(6),array(7,8))));
> select * from array_demo;
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | array_demo.int_primitive  | array_demo.int_array  | array_demo.int_array_array  | array_demo.int_to_array_array_map  |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | 0                         | [1]                   | [[2],[3,4]]                 | {5:[[6],[7,8]]}                    |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
>  {code}
> Converting to iceberg
>  
>  
> {code:java}
> ALTER TABLE array_demo SET TBLPROPERTIES 
> ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')
> select * from array_demo;
> INFO  : Compiling 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe):
>  select * from array_demo
> INFO  : No Stats for default@array_demo, Columns: int_primitive, int_array, 
> int_to_array_array_map, int_array_array
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:array_demo.int_primitive, type:int, 
> comment:null), FieldSchema(name:array_demo.int_array, type:array<int>, 
> comment:null), FieldSchema(name:array_demo.int_array_array, 
> type:array<array<int>>, comment:null), 
> FieldSchema(name:array_demo.int_to_array_array_map, 
> type:map<int,array<array<int>>>, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe);
>  Time taken: 0.036 seconds
> INFO  : Executing 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe):
>  select * from array_demo
> INFO  : Completed executing 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe);
>  Time taken: 0.0 seconds
> INFO  : OK
> Error: java.io.IOException: java.lang.IllegalArgumentException: Can not 
> promote MAP type to INTEGER (state=,code=0)
> select int_primitive from array_demo;
> +----------------+
> | int_primitive  |
> +----------------+
> | 0              |
> +----------------+
> 1 row selected (0.088 seconds)
>  {code}
> Removing schema.name-mapping.default solves it
> {code:java}
> ALTER TABLE array_demo UNSET TBLPROPERTIES ('schema.name-mapping.default');
> select * from array_demo;
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | array_demo.int_primitive  | array_demo.int_array  | array_demo.int_array_array  | array_demo.int_to_array_array_map  |
> 

[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=780762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780762
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 09:20
Start Date: 13/Jun/22 09:20
Worklog Time Spent: 10m 
  Work Description: pvary commented on PR #3053:
URL: https://github.com/apache/hive/pull/3053#issuecomment-1153681364

   @cravani: Only a small nit, then we are ready.
   Thanks!




Issue Time Tracking
---

Worklog Id: (was: 780762)
Time Spent: 8h  (was: 7h 50m)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on 
> cloud storage such as S3; one of the cases where we found slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>   at 
> com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1331)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
>   at 
> 

[jira] [Assigned] (HIVE-26318) Select on migrated iceberg table fails with NPE

2022-06-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-26318:



> Select on migrated iceberg table fails with NPE
> ---
>
> Key: HIVE-26318
> URL: https://issues.apache.org/jira/browse/HIVE-26318
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> Enable vectorization:
> {code:sql}
> set hive.vectorized.execution.enabled=true;
> {code}
> Create a Hive table with the following schema (the complex-type parameters below are reconstructed from the INSERT statement; the archive stripped the angle-bracketed type arguments):
> {code:sql}
> CREATE EXTERNAL TABLE tbl_complex (
> a int, 
> arrayofprimitives array<string>, 
> arrayofarrays array<array<string>>,
> arrayofmaps array<map<string,string>>,
> arrayofstructs array<struct<something:string, someone:string, somewhere:string>>,
> mapofprimitives map<string,string>,
> mapofarrays map<string,array<string>>,
> mapofmaps map<string,map<string,string>>,
> mapofstructs map<string,struct<something:string, someone:string, somewhere:string>>,
> structofprimitives struct<something:string, somewhere:string>, 
> structofarrays struct<names:array<string>, birthdays:array<string>>, 
> structofmaps struct<map1:map<string,string>, map2:map<string,string>>
> ) STORED AS PARQUET {code}
> Insert some data:
> {code:sql}
> INSERT INTO tbl_complex VALUES (
> 1, 
> array('a','b','c'), 
> array(array('a'), array('b', 'c')), 
> array(map('a','b'), map('e','f')), 
> array(named_struct('something', 'a', 'someone', 'b', 'somewhere', 
> 'c'), 
> named_struct('something', 'e', 'someone', 'f', 'somewhere', 'g')), 
> map('a', 'b'), 
> map('a', array('b','c')), 
> map('a', map('b','c')), 
> map('a', named_struct('something', 'b', 'someone', 'c', 'somewhere', 
> 'd')), 
> named_struct('something', 'a', 'somewhere', 'b'), 
> named_struct('names', array('a', 'b'), 'birthdays', array('c', 'd', 
> 'e')), 
> named_struct('map1', map('a', 'b'), 'map2', map('c', 'd')) 
>  )
> {code}
> Migrate the table to iceberg:
> {code:sql}
> ALTER TABLE tbl_complex SET TBLPROPERTIES 
> ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');
> {code}
> Run a simple query: 
> {code:sql}
> SELECT * FROM tbl_complex ORDER BY a;
> {code}
> It will fail with:
> {code:txt}
> TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1655110825475_0001_3_00_00_1:java.lang.RuntimeException: 
> java.lang.RuntimeException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:200)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:139)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:105)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:164)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:83)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:706)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:665)
>   at 
> 
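A plausible mitigation sketch until the fix lands (an assumption based on the repro above, not confirmed in the ticket): since the NPE only appears with vectorization enabled, rerunning in row mode should avoid the failing read path.

{code:sql}
-- Assumption: the NPE is specific to the vectorized read path for the
-- migrated Parquet files; row-mode execution reads them the old way.
set hive.vectorized.execution.enabled=false;
SELECT * FROM tbl_complex ORDER BY a;
{code}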

[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=780758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780758
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 09:10
Start Date: 13/Jun/22 09:10
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r895497824


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
##
@@ -309,7 +309,37 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckResult result)
   return;
 }
 
-    Set<Path> partPaths = new HashSet<>();
+    // now check the table folder and see if we find anything
+    // that isn't in the metastore
+    Set<Path> allPartDirs = new HashSet<>();
+    List<FieldSchema> partColumns = table.getPartitionKeys();
+    checkPartitionDirs(tablePath, allPartDirs, 
Collections.unmodifiableList(getPartColNames(table)));
+    String tablePathStr = tablePath.toString();
+    int tablePathLength = tablePathStr.length();
+
+    if (filterExp != null) {
+      PartitionExpressionProxy expressionProxy = createExpressionProxy(conf);
+      List<String> partitions = new ArrayList<>();
+      Set<Path> partDirs = new HashSet<>();
+      boolean tablePathStrEndsWith = tablePathStr.endsWith("/");
+      allPartDirs.stream().forEach(path -> {
+        if (tablePathStrEndsWith) {

Review Comment:
   We can move this check out of the loop:
   ```
   int tablePathStrLen = tablePathStr.endsWith("/") ? tablePathStr.length() : tablePathStr.length() + 1;
   allPartDirs.stream().forEach(path -> partitions.add(path.toString().substring(tablePathStrLen)));
   ```
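
In isolation, the suggested refactor behaves like the following minimal, runnable sketch (plain Strings stand in for org.apache.hadoop.fs.Path; variable names follow the snippet above):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HoistDemo {
  public static void main(String[] args) {
    String tablePathStr = "s3a://bucket/warehouse/tbl";
    List<String> allPartDirs = Arrays.asList(
        "s3a://bucket/warehouse/tbl/part=1",
        "s3a://bucket/warehouse/tbl/part=2");

    // The endsWith("/") check runs once, outside the loop: the separator is
    // folded into the substring offset instead of branching on every path.
    int tablePathStrLen = tablePathStr.endsWith("/")
        ? tablePathStr.length() : tablePathStr.length() + 1;

    List<String> partitions = new ArrayList<>();
    allPartDirs.forEach(path -> partitions.add(path.substring(tablePathStrLen)));

    System.out.println(partitions); // prints [part=1, part=2]
  }
}
```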





Issue Time Tracking
---

Worklog Id: (was: 780758)
Time Spent: 7h 50m  (was: 7h 40m)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE on a table with many partitions can be slow on cloud 
> storage such as S3; one case where we observed the slowness was in 
> HiveMetaStoreChecker.checkTable.
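
For context, this code path is typically driven by a repair statement such as the one below (a generic sketch; the table name is hypothetical):

{code:sql}
-- MSCK walks the table location, compares the partition directories it
-- finds against the metastore, and adds/drops the differences.
MSCK REPAIR TABLE web_logs SYNC PARTITIONS;
{code}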
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> 

[jira] [Assigned] (HIVE-26317) Select on iceberg table stored as parquet and vectorization enabled fails with Runtime Exception

2022-06-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-26317:



> Select on iceberg table stored as parquet and vectorization enabled fails 
> with Runtime Exception
> 
>
> Key: HIVE-26317
> URL: https://issues.apache.org/jira/browse/HIVE-26317
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> Create an Iceberg table with the following schema:
> {code:sql}
> CREATE EXTERNAL TABLE tbl_complex (a int, arrayofarrays 
> array<array<string>>) STORED BY ICEBERG STORED AS PARQUET
> {code}
> Insert some data into it:
> {code:sql}
> INSERT INTO tbl_complex VALUES (1, array(array('a'), array('b', 'c')))
> {code}
> Turn on vectorization and run a simple query:
> {code:sql}
> set hive.vectorized.execution.enabled=true;
> SELECT * FROM tbl_complex;
> {code}
> The query will fail with
> {code:java}
> Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, 
> Vertex vertex_1655109552551_0001_2_00 [Map 1] killed/failed due 
> to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. 
> failedVertices:1 killedVertices:0
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:367)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:246)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:270)
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:281)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:545)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:513)
>   at 
> org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:271)
>   at 
> org.apache.iceberg.mr.hive.TestHiveShell.executeStatement(TestHiveShell.java:142)
>   ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1655109552551_0001_2_00, diagnostics=[Task 
> failed, taskId=task_1655109552551_0001_2_00_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1655109552551_0001_2_00_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.lang.RuntimeException: Unsupported type used in list:array<string>
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.RuntimeException: Unsupported type used in 
> list:array<string>
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:89)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 16 more
> Caused by: java.io.IOException: java.lang.RuntimeException: Unsupported type 
> used in list:array<string>
>   at 
> 
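
A possible workaround sketch (an assumption based on the error above, not verified in the ticket): the exception comes from the vectorized Parquet list reader, so storing the same schema as ORC, or disabling vectorization as noted for HIVE-26318, may sidestep it.

{code:sql}
-- Assumption: the vectorized ORC reader accepts nested list element types
-- that the vectorized Parquet reader rejects.
CREATE EXTERNAL TABLE tbl_complex_orc (a int, arrayofarrays array<array<string>>)
STORED BY ICEBERG STORED AS ORC;
{code}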

[jira] [Assigned] (HIVE-26316) Handle dangling open txns on both src & tgt in unplanned failover.

2022-06-13 Thread Haymant Mangla (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haymant Mangla reassigned HIVE-26316:
-


> Handle dangling open txns on both src & tgt in unplanned failover.
> --
>
> Key: HIVE-26316
> URL: https://issues.apache.org/jira/browse/HIVE-26316
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26165) Remove READ locks for ACID tables

2022-06-13 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-26165.
---
  Assignee: Denys Kuzmenko
Resolution: Fixed

> Remove READ locks for ACID tables
> -
>
> Key: HIVE-26165
> URL: https://issues.apache.org/jira/browse/HIVE-26165
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since operations that required an EXCLUSIVE lock were rewritten to be 
> non-blocking, READ locks are no longer needed. That should improve ACID 
> concurrency.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26165) Remove READ locks for ACID tables

2022-06-13 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553411#comment-17553411
 ] 

Denys Kuzmenko commented on HIVE-26165:
---

Merged to master.
[~klcopp], thank you for the review!

> Remove READ locks for ACID tables
> -
>
> Key: HIVE-26165
> URL: https://issues.apache.org/jira/browse/HIVE-26165
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since operations that required an EXCLUSIVE lock were rewritten to be 
> non-blocking, READ locks are no longer needed. That should improve ACID 
> concurrency.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26165) Remove READ locks for ACID tables

2022-06-13 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26165:
--
Summary: Remove READ locks for ACID tables  (was: Remove READ locks for 
ACID tables with SoftDelete enabled)

> Remove READ locks for ACID tables
> -
>
> Key: HIVE-26165
> URL: https://issues.apache.org/jira/browse/HIVE-26165
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since operations that required an EXCLUSIVE lock were rewritten to be 
> non-blocking, READ locks are no longer needed. That should improve ACID 
> concurrency.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26165) Remove READ locks for ACID tables with SoftDelete enabled

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26165?focusedWorklogId=780723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780723
 ]

ASF GitHub Bot logged work on HIVE-26165:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 07:14
Start Date: 13/Jun/22 07:14
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged PR #3235:
URL: https://github.com/apache/hive/pull/3235




Issue Time Tracking
---

Worklog Id: (was: 780723)
Time Spent: 0.5h  (was: 20m)

> Remove READ locks for ACID tables with SoftDelete enabled
> -
>
> Key: HIVE-26165
> URL: https://issues.apache.org/jira/browse/HIVE-26165
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since operations that required an EXCLUSIVE lock were rewritten to be 
> non-blocking, READ locks are no longer needed. That should improve ACID 
> concurrency.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)