[jira] [Commented] (HIVE-27375) SharedWorkOptimizer assigns a common cache key to MapJoin operators that should not share MapJoin tables

2023-06-13 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731930#comment-17731930
 ] 

Denys Kuzmenko commented on HIVE-27375:
---

Merged to master.
Thanks for the contribution, [~seonggon], and thanks to [~akshatm] and
[~rkirtir] for the review!

> SharedWorkOptimizer assigns a common cache key to MapJoin operators that 
> should not share MapJoin tables
> 
>
>        Key: HIVE-27375
>        URL: https://issues.apache.org/jira/browse/HIVE-27375
>    Project: Hive
> Issue Type: Bug
>   Reporter: Sungwoo Park
>   Assignee: Seonggon Namgung
>   Priority: Major
>     Labels: pull-request-available
>
> When hive.optimize.shared.work.mapjoin.cache.reuse is set to true, 
> SharedWorkOptimizer sometimes assigns a common cache key to MapJoin operators 
> that should not share MapJoin tables. This bug occurs only for MapJoin 
> operators with 3 or more parent operators.
> Example:
> MAPJOIN[575] (RS_83, GBY_66, RS_85)
> MAPJOIN[585] (RS_212, RS_213, GBY_210)
> In this example, both MAPJOIN[575] and MAPJOIN[585] have three parent 
> operators. The current implementation assigns a common cache key to 
> MAPJOIN[575] and MAPJOIN[585] because RS_83 and RS_212 are equivalent.
> However, MAPJOIN[575] uses GBY_66 for its big table whereas MAPJOIN[585] uses 
> GBY_210 for its big table. As a result, the MapJoin table loaded by one 
> operator cannot be used by the other.
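
The mismatch described above can be modeled in a few lines. This is a minimal sketch in Python, not Hive's Java implementation: `MapJoin`, `naive_cache_key`, and `position_aware_cache_key` are hypothetical names, the equivalence of RS_83 and RS_212 is encoded as a shared signature string, and the "first parent only" key is an assumption made for illustration. It only shows why a key derived from one equivalent parent is not enough when the big-table position differs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MapJoin:
    """Toy model of a MapJoin operator (hypothetical, not a Hive class)."""
    name: str
    parent_signatures: Tuple[str, ...]  # one signature per parent, in order
    big_table_pos: int                  # index of the big-table parent


def naive_cache_key(mj: MapJoin) -> str:
    # Assumed faulty behavior: only the first parent is considered,
    # so operators with equivalent first parents get the same key.
    return mj.parent_signatures[0]


def position_aware_cache_key(mj: MapJoin) -> Tuple[int, Tuple[str, ...]]:
    # Includes the big-table position and every small-table parent signature.
    small = tuple(sig for i, sig in enumerate(mj.parent_signatures)
                  if i != mj.big_table_pos)
    return (mj.big_table_pos, small)


# RS_83 and RS_212 are equivalent, so both are given the signature "rs_a".
mj575 = MapJoin("MAPJOIN[575]", ("rs_a", "gby_66", "rs_85"), big_table_pos=1)
mj585 = MapJoin("MAPJOIN[585]", ("rs_a", "rs_213", "gby_210"), big_table_pos=2)

print(naive_cache_key(mj575) == naive_cache_key(mj585))
print(position_aware_cache_key(mj575) == position_aware_cache_key(mj585))
```

With these inputs the naive keys collide, while the position-aware keys differ because the two operators stream different parents as their big table.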



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27375) SharedWorkOptimizer assigns a common cache key to MapJoin operators that should not share MapJoin tables

2023-06-13 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-27375.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> SharedWorkOptimizer assigns a common cache key to MapJoin operators that 
> should not share MapJoin tables
> 
>
>        Key: HIVE-27375
>        URL: https://issues.apache.org/jira/browse/HIVE-27375
>    Project: Hive
> Issue Type: Bug
>   Reporter: Sungwoo Park
>   Assignee: Seonggon Namgung
>   Priority: Major
>     Labels: pull-request-available
>    Fix For: 4.0.0
>
>
> When hive.optimize.shared.work.mapjoin.cache.reuse is set to true, 
> SharedWorkOptimizer sometimes assigns a common cache key to MapJoin operators 
> that should not share MapJoin tables. This bug occurs only for MapJoin 
> operators with 3 or more parent operators.
> Example:
> MAPJOIN[575] (RS_83, GBY_66, RS_85)
> MAPJOIN[585] (RS_212, RS_213, GBY_210)
> In this example, both MAPJOIN[575] and MAPJOIN[585] have three parent 
> operators. The current implementation assigns a common cache key to 
> MAPJOIN[575] and MAPJOIN[585] because RS_83 and RS_212 are equivalent.
> However, MAPJOIN[575] uses GBY_66 for its big table whereas MAPJOIN[585] uses 
> GBY_210 for its big table. As a result, the MapJoin table loaded by one 
> operator cannot be used by the other.





[jira] [Created] (HIVE-27436) Alter table schema fails when partition spec contains columns not in the lower registry

2023-06-13 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-27436:
-

   Summary: Alter table schema fails when partition spec contains columns not in the lower registry
       Key: HIVE-27436
       URL: https://issues.apache.org/jira/browse/HIVE-27436
   Project: Hive
Issue Type: Task
  Reporter: Denys Kuzmenko


Steps to repro
{code}
CREATE EXTERNAL TABLE tbl_ice(ID INT, NAME STRING) 
PARTITIONED BY (DEPT STRING) 
STORED BY ICEBERG STORED AS PARQUET;

INSERT INTO TABLE tbl_ice
VALUES (1,'ONE','MATH'), (2, 'ONE','PHYSICS'), (3,'ONE','CHEMISTRY'), 
(4,'TWO','MATH'), (5, 'TWO','PHYSICS'), (6,'TWO','CHEMISTRY');

ALTER TABLE tbl_ice ADD COLUMNS (emp_count int);
{code}





[jira] [Updated] (HIVE-27436) Iceberg: Alter table schema fails when partition spec contains columns not in the lower registry

2023-06-13 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27436:
--
Summary: Iceberg: Alter table schema fails when partition spec contains 
columns not in the lower registry  (was: Alter table schema fails when 
partition spec contains columns not in the lower registry)

> Iceberg: Alter table schema fails when partition spec contains columns not in 
> the lower registry
> 
>
>        Key: HIVE-27436
>        URL: https://issues.apache.org/jira/browse/HIVE-27436
>    Project: Hive
> Issue Type: Task
>   Reporter: Denys Kuzmenko
>   Priority: Major
>
> Steps to repro
> {code}
> CREATE EXTERNAL TABLE tbl_ice(ID INT, NAME STRING) 
> PARTITIONED BY (DEPT STRING) 
> STORED BY ICEBERG STORED AS PARQUET;
> INSERT INTO TABLE tbl_ice
> VALUES (1,'ONE','MATH'), (2, 'ONE','PHYSICS'), (3,'ONE','CHEMISTRY'), 
> (4,'TWO','MATH'), (5, 'TWO','PHYSICS'), (6,'TWO','CHEMISTRY');
> ALTER TABLE tbl_ice ADD COLUMNS (emp_count int);
> {code}





[jira] [Assigned] (HIVE-27436) Iceberg: Alter table schema fails when partition spec contains columns not in the lower registry

2023-06-13 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-27436:
-

Assignee: Denys Kuzmenko

> Iceberg: Alter table schema fails when partition spec contains columns not in 
> the lower registry
> 
>
>        Key: HIVE-27436
>        URL: https://issues.apache.org/jira/browse/HIVE-27436
>    Project: Hive
> Issue Type: Task
>   Reporter: Denys Kuzmenko
>   Assignee: Denys Kuzmenko
>   Priority: Major
>
> Steps to repro
> {code}
> CREATE EXTERNAL TABLE tbl_ice(ID INT, NAME STRING) 
> PARTITIONED BY (DEPT STRING) 
> STORED BY ICEBERG STORED AS PARQUET;
> INSERT INTO TABLE tbl_ice
> VALUES (1,'ONE','MATH'), (2, 'ONE','PHYSICS'), (3,'ONE','CHEMISTRY'), 
> (4,'TWO','MATH'), (5, 'TWO','PHYSICS'), (6,'TWO','CHEMISTRY');
> ALTER TABLE tbl_ice ADD COLUMNS (emp_count int);
> {code}





[jira] [Updated] (HIVE-27436) Iceberg: Alter table schema fails when partition spec contains columns not in the lower registry

2023-06-13 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27436:
--
Status: Patch Available  (was: Open)

> Iceberg: Alter table schema fails when partition spec contains columns not in 
> the lower registry
> 
>
>        Key: HIVE-27436
>        URL: https://issues.apache.org/jira/browse/HIVE-27436
>    Project: Hive
> Issue Type: Task
>   Reporter: Denys Kuzmenko
>   Assignee: Denys Kuzmenko
>   Priority: Major
>
> Steps to repro
> {code}
> CREATE EXTERNAL TABLE tbl_ice(ID INT, NAME STRING) 
> PARTITIONED BY (DEPT STRING) 
> STORED BY ICEBERG STORED AS PARQUET;
> INSERT INTO TABLE tbl_ice
> VALUES (1,'ONE','MATH'), (2, 'ONE','PHYSICS'), (3,'ONE','CHEMISTRY'), 
> (4,'TWO','MATH'), (5, 'TWO','PHYSICS'), (6,'TWO','CHEMISTRY');
> ALTER TABLE tbl_ice ADD COLUMNS (emp_count int);
> {code}





[jira] [Commented] (HIVE-22590) Revert HIVE-17218 Canonical-ize hostnames for Hive metastore, and HS2 servers as it causes issues with SSL and LB

2023-06-13 Thread Nedzad Campara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732029#comment-17732029
 ] 

Nedzad Campara commented on HIVE-22590:
---

I would like to bump this, as this is a major issue that breaks SSL for
HiveServer2 behind an LB.

> Revert HIVE-17218 Canonical-ize hostnames for Hive metastore, and HS2 servers 
> as it causes issues with SSL and LB
> -
>
>        Key: HIVE-22590
>        URL: https://issues.apache.org/jira/browse/HIVE-22590
>    Project: Hive
> Issue Type: Bug
>   Reporter: Peter Vary
>   Priority: Major
>
> HIVE-17218 causes issues with Load balanced + SSL environments, like this:
> {code}
> java.net.SocketException: Socket is closed 
> at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1532)
> at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1553)
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
> at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
> at org.apache.thrift.transport.TSocket.close(TSocket.java:235)
> at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:318)
> at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
> at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
> at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
> at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
> at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:204)
> at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:169)
> at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
> at java.sql.DriverManager.getConnection(DriverManager.java:664)
> at java.sql.DriverManager.getConnection(DriverManager.java:208)
> at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:146)
> at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:211)
> at org.apache.hive.beeline.Commands.connect(Commands.java:1496)
> at org.apache.hive.beeline.Commands.connect(Commands.java:1391)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
> at org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1135)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1174)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1010)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:922)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Unknown HS2 problem when communicating with Thrift server. 
> Error: Could not open client transport with JDBC Uri: jdbc:hive2:// For privacy> GSS initiate failed 
> Also, could not send response: 
> org.apache.thrift.transport.TTransportException: 
> javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: 
> No subject alternative names matching IP address 52 
> {code}





[jira] [Updated] (HIVE-27434) Preparing for 4.0.0-beta-1 development

2023-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27434:
--
Labels: pull-request-available  (was: )

> Preparing for 4.0.0-beta-1 development
> --
>
>        Key: HIVE-27434
>        URL: https://issues.apache.org/jira/browse/HIVE-27434
>    Project: Hive
> Issue Type: Task
>   Reporter: Stamatis Zampetakis
>   Assignee: Stamatis Zampetakis
>   Priority: Major
>     Labels: pull-request-available
>






[jira] [Created] (HIVE-27437) Vectorization: VectorizedOrcRecordReader does not reset VectorizedRowBatch after processing

2023-06-13 Thread Alagappan Maruthappan (Jira)
Alagappan Maruthappan created HIVE-27437:


   Summary: Vectorization: VectorizedOrcRecordReader does not reset VectorizedRowBatch after processing
       Key: HIVE-27437
       URL: https://issues.apache.org/jira/browse/HIVE-27437
   Project: Hive
Issue Type: Task
  Reporter: Alagappan Maruthappan
  Assignee: Alagappan Maruthappan


There seems to be a memory leak in VectorizedOrcRecordReader. When
MapColumnVector or ListColumnVector is used and the VectorizedRowBatch is not
reset after every read, the vector keeps growing and a lot of time is spent
allocating memory.
By contrast, the reset does happen in VectorizedParquetRecordReader:

https://github.com/apache/hive/blob/f78ca5df80c0bcb566f0915cda65112268df492c/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java#L400
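
To illustrate the growth pattern described above, here is a minimal model in Python rather than Hive's Java classes. `ListVector` and `RowBatch` are hypothetical stand-ins for ListColumnVector and VectorizedRowBatch, and the read loop is an assumption about how one batch object gets reused across reads. It only shows that, without a reset between reads, the child buffer accumulates elements from earlier batches.

```python
from typing import List


class ListVector:
    """Hypothetical stand-in for a list-typed column vector."""

    def __init__(self) -> None:
        self.child: List[int] = []   # flattened elements of all rows
        self.child_count = 0         # number of valid elements in child

    def append_row(self, values: List[int]) -> None:
        # New elements are written after the elements already counted.
        self.child.extend(values)
        self.child_count += len(values)

    def reset(self) -> None:
        # Forget previous contents so the next batch starts from offset 0.
        self.child.clear()
        self.child_count = 0


class RowBatch:
    """Hypothetical stand-in for a reusable row batch with one list column."""

    def __init__(self) -> None:
        self.col = ListVector()

    def reset(self) -> None:
        self.col.reset()


def read_batches(batch: RowBatch, num_batches: int, reset_each_time: bool) -> int:
    """Reuse one batch for several reads and return the final child size."""
    for _ in range(num_batches):
        if reset_each_time:
            batch.reset()
        for row in range(4):               # 4 rows per batch, 3 elements each
            batch.col.append_row([row] * 3)
    return batch.col.child_count


print(read_batches(RowBatch(), 10, reset_each_time=False))
print(read_batches(RowBatch(), 10, reset_each_time=True))
```

In this model the child buffer grows with the number of batches read when the reset is skipped, and stays at the size of a single batch when the batch is reset before each read.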





[jira] [Updated] (HIVE-27437) Vectorization: VectorizedOrcRecordReader does not reset VectorizedRowBatch after processing

2023-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27437:
--
Labels: pull-request-available  (was: )

> Vectorization: VectorizedOrcRecordReader does not reset VectorizedRowBatch 
> after processing
> ---
>
>        Key: HIVE-27437
>        URL: https://issues.apache.org/jira/browse/HIVE-27437
>    Project: Hive
> Issue Type: Task
>   Reporter: Alagappan Maruthappan
>   Assignee: Alagappan Maruthappan
>   Priority: Major
>     Labels: pull-request-available
>
> There seems to be a memory leak in VectorizedOrcRecordReader. When
> MapColumnVector or ListColumnVector is used and the VectorizedRowBatch is not
> reset after every read, the vector keeps growing and a lot of time is spent
> allocating memory.
> By contrast, the reset does happen in VectorizedParquetRecordReader:
> https://github.com/apache/hive/blob/f78ca5df80c0bcb566f0915cda65112268df492c/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java#L400





[jira] [Created] (HIVE-27438) Audit leader election event failed in non-appendable filesystems

2023-06-13 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-27438:
--

   Summary: Audit leader election event failed in non-appendable filesystems
       Key: HIVE-27438
       URL: https://issues.apache.org/jira/browse/HIVE-27438
   Project: Hive
Issue Type: Bug
Components: Standalone Metastore
  Reporter: Zhihua Deng
  Assignee: Zhihua Deng


If the warehouse's underlying file system is S3, or another store that does
not support the append operation, then writing the leader audit info to the
remote file can fail. For example:

org.apache.hadoop.hive.metastore.HiveMetaStore: [Leader-Watcher-housekeeping1]: 
Error while writing the leader info, path: s3a://.../leader_housekeeping.json

java.lang.UnsupportedOperationException: Append is not supported by 
S3AFileSystem

As a result, the audit logs are missing and the user can no longer see the
history of leader changes.
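
One possible way to cope with a store that rejects appends is sketched below in Python. This is an illustration under stated assumptions, not the fix adopted for this issue and not Hive or Hadoop code: `ObjectStore`, `AppendNotSupported`, and `audit_leader` are hypothetical names, and the store is an in-memory stand-in for a file system that supports overwrite but not append.

```python
import json
from typing import Dict


class AppendNotSupported(Exception):
    """Raised by stores that cannot append (models the error shown above)."""


class ObjectStore:
    """Hypothetical in-memory store that supports overwrite but not append."""

    def __init__(self) -> None:
        self.objects: Dict[str, str] = {}

    def append(self, path: str, data: str) -> None:
        raise AppendNotSupported("Append is not supported")

    def read(self, path: str) -> str:
        return self.objects.get(path, "")

    def write(self, path: str, data: str) -> None:
        self.objects[path] = data


def audit_leader(store: ObjectStore, path: str, event: dict) -> None:
    """Record one leader event, falling back to read-and-rewrite."""
    line = json.dumps(event, sort_keys=True) + "\n"
    try:
        store.append(path, line)
    except AppendNotSupported:
        # Fallback: rewrite the whole object with the new line added.
        store.write(path, store.read(path) + line)


store = ObjectStore()
audit_leader(store, "leader_housekeeping.json", {"leader": "host-1"})
audit_leader(store, "leader_housekeeping.json", {"leader": "host-2"})
print(store.read("leader_housekeeping.json"), end="")
```

The read-and-rewrite fallback keeps the full history in one file, but it is not atomic and its cost grows with the size of the log, so writing one object per event may be a better fit for some stores.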





[jira] [Updated] (HIVE-27438) Audit leader election event failed in non-appendable filesystems

2023-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27438:
--
Labels: pull-request-available  (was: )

> Audit leader election event failed in non-appendable filesystems
> 
>
>        Key: HIVE-27438
>        URL: https://issues.apache.org/jira/browse/HIVE-27438
>    Project: Hive
> Issue Type: Bug
> Components: Standalone Metastore
>   Reporter: Zhihua Deng
>   Assignee: Zhihua Deng
>   Priority: Major
>     Labels: pull-request-available
>
> If the warehouse's underlying file system is S3, or another store that does
> not support the append operation, then writing the leader audit info to the
> remote file can fail. For example:
> org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [Leader-Watcher-housekeeping1]: Error while writing the leader info, path: 
> s3a://.../leader_housekeeping.json
> java.lang.UnsupportedOperationException: Append is not supported by 
> S3AFileSystem
> As a result, the audit logs are missing and the user can no longer see the
> history of leader changes.


