[jira] [Commented] (HIVE-27406) CompactionTxnHandler cleanup

2023-09-28 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770314#comment-17770314
 ] 

Stamatis Zampetakis commented on HIVE-27406:


The 4.0.0-beta-1 version is already out and this ticket is not part of it. I 
updated the Fix Version to point to the next release (possibly 4.0.0).

> CompactionTxnHandler cleanup
> 
>
> Key: HIVE-27406
> URL: https://issues.apache.org/jira/browse/HIVE-27406
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Tech Debt elimination. Cleanup and standardize CompactionTxnHandler.java 
> considering the following items, but not limited to them:
>  * Check for proper javadoc
>  * Remove unnecessary logging, adjust log level properly
>  * Consistent transaction handling:
>  ** No rollback for selects
>  ** Exception in case of update count mismatch (single row update returns 0 
> or >1 for updated rows)
>  ** Review proper usage of RetrySemantics.* annotations
>  * Common private template methods for 'infrastructure' code (same try-catch 
> logic, connection handling, etc.), possibly with spring-jdbc
>  * Replace inline hardcoded and assembled SQL statements with parameterized 
> constants.
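As a rough illustration of two of the items above (a parameterized SQL constant, and an exception when a single-row update touches 0 or more than 1 row), here is a minimal Python sketch using sqlite3. It is not taken from CompactionTxnHandler.java; the table name, column names, constant name, and exception class are all hypothetical.

```python
import sqlite3

# Parameterized SQL kept as a named constant instead of being assembled inline.
# Table and column names are hypothetical, chosen only for this example.
UPDATE_STATE_SQL = "UPDATE compaction_queue SET state = ? WHERE id = ?"


class UpdateCountMismatch(Exception):
    """Raised when a single-row update touches 0 or more than 1 row."""


def update_single_row(conn, sql, params):
    """Run a single-row update, verify the row count, then commit."""
    try:
        cur = conn.execute(sql, params)
        if cur.rowcount != 1:
            raise UpdateCountMismatch(
                f"expected 1 updated row, got {cur.rowcount}")
        conn.commit()
    except Exception:
        # Rollback is needed only on the write path; plain selects skip it.
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compaction_queue (id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("INSERT INTO compaction_queue VALUES (1, 'initiated')")
conn.commit()

update_single_row(conn, UPDATE_STATE_SQL, ("working", 1))
```

Calling update_single_row with an id that does not exist raises UpdateCountMismatch and leaves the table unchanged.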



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27406) CompactionTxnHandler cleanup

2023-09-28 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27406:
---
Fix Version/s: 4.0.0
   (was: 4.0.0-beta-1)



[jira] [Updated] (HIVE-27757) Upgrade hadoop to 3.3.6

2023-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27757:
--
Labels: pull-request-available  (was: )

> Upgrade hadoop to 3.3.6
> ---
>
> Key: HIVE-27757
> URL: https://issues.apache.org/jira/browse/HIVE-27757
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Hadoop 3.3.6 has been released and comes with many improvements and CVE fixes.





[jira] [Created] (HIVE-27757) Upgrade hadoop to 3.3.6

2023-09-28 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27757:
---

 Summary: Upgrade hadoop to 3.3.6
 Key: HIVE-27757
 URL: https://issues.apache.org/jira/browse/HIVE-27757
 Project: Hive
  Issue Type: Improvement
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Hadoop 3.3.6 has been released and comes with many improvements and CVE fixes.





[jira] [Resolved] (HIVE-27704) Remove PowerMock from jdbc-handler and upgrade mockito to 4.11

2023-09-28 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-27704.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Remove PowerMock from jdbc-handler and upgrade mockito to 4.11
> --
>
> Key: HIVE-27704
> URL: https://issues.apache.org/jira/browse/HIVE-27704
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Zsolt Miskolczi
>Assignee: KIRTI RUGE
>Priority: Major
>  Labels: newbie, pull-request-available, starter
> Fix For: 4.0.0
>
>






[jira] [Commented] (HIVE-27704) Remove PowerMock from jdbc-handler and upgrade mockito to 4.11

2023-09-28 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770302#comment-17770302
 ] 

Ayush Saxena commented on HIVE-27704:
-

Committed to master.

Thanks [~rkirtir] for the contribution and [~InvisibleProgrammer] for the review!

> Remove PowerMock from jdbc-handler and upgrade mockito to 4.11
> --
>
> Key: HIVE-27704
> URL: https://issues.apache.org/jira/browse/HIVE-27704
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Zsolt Miskolczi
>Assignee: KIRTI RUGE
>Priority: Major
>  Labels: newbie, pull-request-available, starter
>






[jira] [Assigned] (HIVE-27744) privileges check is skipped when using partly dynamic partition write.

2023-09-28 Thread shuaiqi.guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shuaiqi.guo reassigned HIVE-27744:
--

Assignee: shuaiqi.guo

> privileges check is skipped when using partly dynamic partition write.
> --
>
> Key: HIVE-27744
> URL: https://issues.apache.org/jira/browse/HIVE-27744
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: All Versions
>Reporter: shuaiqi.guo
>Assignee: shuaiqi.guo
>Priority: Blocker
> Fix For: 2.3.5
>
> Attachments: HIVE-27744.patch
>
>
> The privileges check is skipped when using a dynamic partition write with 
> only part of the partition specified, as in the following example:
> {code:java}
> insert overwrite table test_privilege partition (`date` = '2023-09-27', hour)
> ... {code}
> Hive executes it directly without checking write privileges.
>  
> The attached patch (HIVE-27744.patch) fixes this bug.





[jira] [Updated] (HIVE-27743) Semantic Search In Hive

2023-09-28 Thread Sreenath (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath updated HIVE-27743:

Description: 
_Semantic search is the technology that powers *vector databases*, and we can have the same 
power in Hive._
Semantic search is a way for computers to understand the meaning behind words 
and phrases when you're searching for something. Instead of just looking for 
exact matches of keywords, it tries to figure out what you're really asking and 
provides results that are more relevant and meaningful to your question. It's 
like having a search engine that can understand what you mean, not just what 
you say, making it easier to find the information you're looking for. This 
ticket is a wish to have Semantic search in Hive.

On the implementation side, semantic search uses an embedding model and any of 
the similarity distance functions.

My proposal is to implement functions for on-the-fly calculation of similarity 
distance between two values. Once we have them we could easily do semantic 
search as part of a where clause.
 * E.g. (using a cosine similarity function): “WHERE cos_dist(region, 'europe') > 
0.9”. It could return records with regions like Scandinavia, Nordic, Baltic, 
etc.
 * We could have functions that accept values as text or as vector embeddings.

  was:
Semantic search is a way for computers to understand the meaning behind words 
and phrases when you're searching for something. Instead of just looking for 
exact matches of keywords, it tries to figure out what you're really asking and 
provides results that are more relevant and meaningful to your question. It's 
like having a search engine that can understand what you mean, not just what 
you say, making it easier to find the information you're looking for. This 
ticket is a wish to have Semantic search in Hive.

On the implementation side, semantic search uses an embedding model and any of 
the similarity distance functions. 

My proposal is to implement functions for on-the-fly calculation of similarity 
distance between two values. Once we have them we could easily do semantic 
search as part of a where clause.
 * E.g. (using a cosine similarity function): “WHERE cos_dist(region, 'europe') > 
0.9”. It could return records with regions like Scandinavia, Nordic, Baltic, 
etc.
 * We could have functions that accept values as text or as vector embeddings.


> Semantic Search In Hive
> ---
>
> Key: HIVE-27743
> URL: https://issues.apache.org/jira/browse/HIVE-27743
> Project: Hive
>  Issue Type: Wish
> Environment: *  
>Reporter: Sreenath
>Priority: Major
>
> _Semantic search is the technology that powers *vector databases*, and we can have the 
> same power in Hive._
> Semantic search is a way for computers to understand the meaning behind words 
> and phrases when you're searching for something. Instead of just looking for 
> exact matches of keywords, it tries to figure out what you're really asking 
> and provides results that are more relevant and meaningful to your question. 
> It's like having a search engine that can understand what you mean, not just 
> what you say, making it easier to find the information you're looking for. 
> This ticket is a wish to have Semantic search in Hive.
> On the implementation side, semantic search uses an embedding model and any 
> of the similarity distance functions.
> My proposal is to implement functions for on-the-fly calculation of 
> similarity distance between two values. Once we have them we could easily do 
> semantic search as part of a where clause.
>  * E.g. (using a cosine similarity function): “WHERE cos_dist(region, 'europe') 
> > 0.9”. It could return records with regions like Scandinavia, Nordic, 
> Baltic, etc.
>  * We could have functions that accept values as text or as vector 
> embeddings.
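To make the proposal more concrete, here is a minimal Python sketch of a cosine-similarity filter over embeddings. It is only an illustration of the idea, not a proposed Hive implementation: the three-dimensional vectors are hand-written toy values (a real setup would obtain them from an embedding model), and the names EMBEDDINGS and semantic_filter are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


# Toy, hand-written "embeddings"; these values are assumptions for the demo.
EMBEDDINGS = {
    "europe": [0.9, 0.1, 0.0],
    "nordic": [0.8, 0.2, 0.1],
    "sahara": [0.0, 0.1, 0.9],
}


def semantic_filter(values, query, threshold):
    """Keep the values whose embedding is close enough to the query's."""
    q = EMBEDDINGS[query]
    return [v for v in values
            if cosine_similarity(EMBEDDINGS[v], q) > threshold]


# Rough analogue of: WHERE cos_dist(region, 'europe') > 0.9
matches = semantic_filter(["nordic", "sahara"], "europe", 0.9)
```

With these toy vectors only "nordic" passes the 0.9 threshold.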





[jira] [Resolved] (HIVE-27406) CompactionTxnHandler cleanup

2023-09-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Végh resolved HIVE-27406.

   Fix Version/s: 4.0.0-beta-1
Target Version/s: 4.0.0-beta-1  (was: 4.0.0)
  Resolution: Fixed

> CompactionTxnHandler cleanup
> 
>
> Key: HIVE-27406
> URL: https://issues.apache.org/jira/browse/HIVE-27406
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-beta-1
>
>
> Tech Debt elimination. Cleanup and standardize CompactionTxnHandler.java 
> considering the following items, but not limited to them:
>  * Check for proper javadoc
>  * Remove unnecessary logging, adjust log level properly
>  * Consistent transaction handling:
>  ** No rollback for selects
>  ** Exception in case of update count mismatch (single row update returns 0 
> or >1 for updated rows)
>  ** Review proper usage of RetrySemantics.* annotations
>  * Common private template methods for 'infrastructure' code (same try-catch 
> logic, connection handling, etc.), possibly with spring-jdbc
>  * Replace inline hardcoded and assembled SQL statements with parameterized 
> constants.





[jira] [Comment Edited] (HIVE-27755) Quote identifiers in SQL emitted by SchemaTool for MySQL

2023-09-28 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770042#comment-17770042
 ] 

Stamatis Zampetakis edited comment on HIVE-27755 at 9/28/23 12:50 PM:
--

For testing the changes, I enabled the general_log for MySQL 
(https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_general_log)
 and ran the following tests before and after the changes in PR#4757:

{noformat}
cd standalone-metastore/metastore-server
mvn test -Dtest=TestMysql#upgrade -Dtest.groups=""
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSchemaTables*Mysql* -Dtest.groups=""
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSequences*Mysql* -Dtest.groups=""
{noformat}

I monitored the general_log output generated by the aforementioned tests and I 
compared before and after files for each test verifying that table and column 
names are quoted as expected.

The before and after files from the general_log are attached in this JIRA.

HIVE-27747 is required in order to run TestSchemaToolForMetastore with MySQL as 
a backend. HIVE-27747 is not a prerequisite (but good to have) for merging this 
change.


was (Author: zabetak):
For testing the changes, I enabled the general_log for MySQL 
(https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_general_log)
 and ran the following tests before and after the changes in PR#4757:

{noformat}
cd standalone-metastore/metastore-server
mvn test -Dtest=TestMysql#upgrade -Dtest.groups=""
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSchemaTables*Mysql* -Dtest.groups="" (requires patch in #4754)
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSequences*Mysql* -Dtest.groups="" (requires patch in #4754)
{noformat}

I monitored the general_log output generated by the aforementioned tests and I 
compared before and after files for each test verifying that table and column 
names are quoted as expected.

The before and after files from the general_log are attached in this JIRA.

> Quote identifiers in SQL emitted by SchemaTool for MySQL
> 
>
> Key: HIVE-27755
> URL: https://issues.apache.org/jira/browse/HIVE-27755
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: TestMysql-upgrade-after.txt, 
> TestMysql-upgrade-before.txt, 
> TestSchemaToolForMetastore-validateSequences-after.txt, 
> TestSchemaToolForMetastore-validateSequences-before.txt, 
> TestSchemaToolForMetastore-validateTables-after.txt, 
> TestSchemaToolForMetastore-validateTables-before.txt
>
>
> Various SchemaTool options/tasks (e.g., "validate") generate and run SQL 
> statements on the underlying database. Depending on the database, identifiers 
> in the SQL statements may be quoted (see 
> [https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/schematool/HiveSchemaHelper.java#L173]).
> Currently, all identifiers are quoted when the database is Postgres, and this 
> ticket aims to do the same for MySQL/MariaDB.
> The main motivation behind this change is to avoid surprises and 
> query failures when/if the database decides to turn some of the 
> table/column names we are using internally into reserved keywords.
> As a concrete example, the Percona fork of MySQL recently turned 
> SEQUENCE_TABLE into a reserved keyword 
> ([https://docs.percona.com/percona-server/8.0/flexibility/sequence_table.html])
>  and this conflicts with our internal metastore table.
> The installation scripts do not fail since in that case SEQUENCE_TABLE is 
> quoted 
> ([https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0-beta-2.mysql.sql#L447])
>  but validation queries emitted by the SchemaTool will fail 
> ([https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/schematool/SchemaToolTaskValidate.java#L117])
>  if we don't use quoted identifiers.
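The effect of quoting can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in HiveSchemaHelper: the function names and the db_type strings are hypothetical, and the NEXT_VAL column is assumed for the example. MySQL/MariaDB quote identifiers with backticks and Postgres with double quotes; in both cases an embedded quote character is escaped by doubling it.

```python
def quote_identifier(name, db_type):
    """Quote a table or column name for the given database type."""
    if db_type in ("mysql", "mariadb"):
        # Backtick quoting; an embedded backtick is doubled.
        return "`" + name.replace("`", "``") + "`"
    if db_type == "postgres":
        # Double-quote quoting; an embedded double quote is doubled.
        return '"' + name.replace('"', '""') + '"'
    return name  # other databases are left unquoted in this sketch


def build_select(table, column, db_type):
    """Assemble a simple validation-style query with quoted identifiers."""
    return (f"SELECT {quote_identifier(column, db_type)} "
            f"FROM {quote_identifier(table, db_type)}")


# SEQUENCE_TABLE stays usable as a table name even where it is a reserved keyword.
mysql_sql = build_select("SEQUENCE_TABLE", "NEXT_VAL", "mysql")
postgres_sql = build_select("SEQUENCE_TABLE", "NEXT_VAL", "postgres")
```

Here mysql_sql uses backticks around both names, while postgres_sql uses double quotes.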





[jira] [Updated] (HIVE-27755) Quote identifiers in SQL emitted by SchemaTool for MySQL

2023-09-28 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27755:
---
Attachment: TestSchemaToolForMetastore-validateTables-before.txt
TestSchemaToolForMetastore-validateTables-after.txt
TestSchemaToolForMetastore-validateSequences-before.txt
TestSchemaToolForMetastore-validateSequences-after.txt
TestMysql-upgrade-before.txt
TestMysql-upgrade-after.txt



[jira] [Commented] (HIVE-27755) Quote identifiers in SQL emitted by SchemaTool for MySQL

2023-09-28 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770042#comment-17770042
 ] 

Stamatis Zampetakis commented on HIVE-27755:


For testing the changes, I enabled the general_log for MySQL 
(https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_general_log)
 and ran the following tests before and after the changes in PR#4757:

{noformat}
cd standalone-metastore/metastore-server
mvn test -Dtest=TestMysql#upgrade -Dtest.groups=""
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSchemaTables*Mysql* -Dtest.groups="" (requires patch in #4754)
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSequences*Mysql* -Dtest.groups="" (requires patch in #4754)
{noformat}

I monitored the general_log output generated by the aforementioned tests and I 
compared before and after files for each test verifying that table and column 
names are quoted as expected.

The before and after files from the general_log are attached in this JIRA.



[jira] [Updated] (HIVE-27755) Quote identifiers in SQL emitted by SchemaTool for MySQL

2023-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27755:
--
Labels: pull-request-available  (was: )

> Quote identifiers in SQL emitted by SchemaTool for MySQL
> 
>
> Key: HIVE-27755
> URL: https://issues.apache.org/jira/browse/HIVE-27755
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>
> Various SchemaTool options/tasks (e.g., "validate") generate and run SQL 
> statements on the underlying database. Depending on the database, identifiers 
> in the SQL statements may be quoted (see 
> [https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/schematool/HiveSchemaHelper.java#L173]).
> Currently, all identifiers are quoted when the database is Postgres, and this 
> ticket aims to do the same for MySQL/MariaDB.
> The main motivation behind this change is to avoid surprises and 
> query failures when/if the database decides to turn some of the 
> table/column names we are using internally into reserved keywords.
> As a concrete example, the Percona fork of MySQL recently turned 
> SEQUENCE_TABLE into a reserved keyword 
> ([https://docs.percona.com/percona-server/8.0/flexibility/sequence_table.html])
>  and this conflicts with our internal metastore table.
> The installation scripts do not fail since in that case SEQUENCE_TABLE is 
> quoted 
> ([https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0-beta-2.mysql.sql#L447])
>  but validation queries emitted by the SchemaTool will fail 
> ([https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/schematool/SchemaToolTaskValidate.java#L117])
>  if we don't use quoted identifiers.





[jira] [Commented] (HIVE-27754) Query Filter with OR condition updates every record in the table

2023-09-28 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770037#comment-17770037
 ] 

Krisztian Kasa commented on HIVE-27754:
---

{code}
set hive.cbo.fallback.strategy=NEVER;
{code}
This setting can be used to prevent such statements from running.
See also:
https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L687-L688

> Query Filter with OR condition updates every record in the table
> 
>
> Key: HIVE-27754
> URL: https://issues.apache.org/jira/browse/HIVE-27754
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
>  
> {noformat}
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' 
> ;{noformat}
>  After the above statement, all the records are updated. The condition 
> {{'Taylor'}} is a constant string, and it always evaluates to true 
> because it is a non-empty string. So, effectively, the {{UPDATE}} statement 
> updates all rows in {{customers_man}}.
> Repro:
> {noformat}
> create table customers_man (customer_id bigint, first_name string) 
> PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES 
> ('transactional'='true');
>  insert into customers_man values(1, "Joanna", "Pierce"),(1, "Sharon", 
> "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", 
> "Morrison") ,(2, "Jake", "Donnel") , (3, "Blake", "Burr"), (3, "Trudy", 
> "Johnson"), (3, "Trudy", "Henderson");
>  select * from customers_man;
>  +----------------------------+---------------------------+--------------------------+
>  | customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
>  +----------------------------+---------------------------+--------------------------+
>  | 3                          | Blake                     | Burr                     |
>  | 2                          | Jake                      | Donnel                   |
>  | 3                          | Trudy                     | Henderson                |
>  | 3                          | Trudy                     | Johnson                  |
>  | 2                          | Susan                     | Morrison                 |
>  | 1                          | Joanna                    | Pierce                   |
>  | 2                          | Joanna                    | Silver                   |
>  | 2                          | Bob                       | Silver                   |
>  | 1                          | Sharon                    | Taylor                   |
>  +----------------------------+---------------------------+--------------------------+
>  UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 
> last_name='Taylor' ;
>  select * from customers_man;
>  +----------------------------+---------------------------+--------------------------+
>  | customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
>  +----------------------------+---------------------------+--------------------------+
>  | 3                          | Blake                     | Burr                     |
>  | 2                          | Jake                      | Donnel                   |
>  | 3                          | Trudy                     | Henderson                |
>  | 3                          | Trudy                     | Johnson                  |
>  | 2                          | Susan                     | Morrison                 |
>  | 22                         | Joanna                    | Pierce                   |
>  | 2                          | Joanna                    | Silver                   |
>  | 2                          | Bob                       | Silver                   |
>  | 22                         | Sharon                    | Taylor                   |
>  +----------------------------+---------------------------+--------------------------+
>  UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 
> 'Taylor' ;
>  select * from customers_man;
>  +----------------------------+---------------------------+--------------------------+
>  | customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
>  +----------------------------+---------------------------+--------------------------+
>  | 22                         | Blake                     | Burr                     |
>  | 22                         | Jake                      | Donnel                   |
>  | 22                         | Trudy                     | Henderson                |

[jira] [Updated] (HIVE-27723) Prevent localizing the same original file more than once if symlinks are present

2023-09-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27723:

Summary: Prevent localizing the same original file more than once if 
symlinks are present  (was: Prevent localizing the same file more than once)

> Prevent localizing the same original file more than once if symlinks are 
> present
> 
>
> Key: HIVE-27723
> URL: https://issues.apache.org/jira/browse/HIVE-27723
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>
> We already calculate SHA hashes for the files to be localized. In some 
> setups the hive-exec jar is symlinked, so the same file gets 
> localized more than once.
> {code}
> [root@lbodor-hiveontez-4 ~]# sudo -u hive hdfs dfs -ls -R /tmp/hive/hive/_tez_session_dir
> drwx------   - hive supergroup  0 2023-09-20 12:13 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6
> drwx------   - hive supergroup  0 2023-09-20 12:19 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6/.tez
> drwx------   - hive supergroup  0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources/hive-exec.jar
> drwx------   - hive supergroup  0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1
> drwx------   - hive supergroup  0 2023-09-20 12:04 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1/.tez
> drwx------   - hive supergroup  0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources/hive-exec.jar
> drwx------   - hive supergroup  0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad
> drwx------   - hive supergroup  0 2023-09-20 13:13 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad/.tez
> drwx------   - hive supergroup  0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources/hive-exec.jar
> drwx------   - hive supergroup  0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57
> drwx------   - hive supergroup  0 2023-09-20 12:04 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57/.tez
> drwx------   - hive supergroup  0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources/hive-exec.jar
> {code}
> In the presence of a huge number of sessions, we cannot afford the overhead of 
> copying these files to HDFS and localizing them to all containers twice.
> The root cause can be solved by removing the symlinks to the same hive-exec jar. 
> However, as we're already calculating a SHA for the files, it's easy to take 
> care of the duplication in the localization codepath, and this also takes care of 
> any accidental duplication.
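
A minimal sketch of the deduplication idea described above, written in Python rather than Hive's Java. The function name `dedupe_by_sha256` and the in-memory byte strings are assumptions made for illustration; they are not taken from the Hive localization code, which operates on files in HDFS.

```python
import hashlib


def dedupe_by_sha256(resources):
    """Keep the first resource name seen for each distinct content digest.

    `resources` is an iterable of (name, content_bytes) pairs; this shape is
    a hypothetical stand-in for the session resources listed above.
    """
    first_name_by_digest = {}
    for name, content in resources:
        digest = hashlib.sha256(content).hexdigest()
        # setdefault leaves the existing entry alone when the digest repeats.
        first_name_by_digest.setdefault(digest, name)
    return list(first_name_by_digest.values())


# Two names for the same bytes (as with a symlinked jar) plus one other file.
resources = [
    ("hive-exec-3.1.3000.7.2.18.0-334.jar", b"same jar bytes"),
    ("hive-exec.jar", b"same jar bytes"),
    ("other.jar", b"different bytes"),
]
unique = dedupe_by_sha256(resources)
```

In this sample, `hive-exec.jar` is dropped because its bytes hash to the same digest as the versioned jar, so only one copy would need to be uploaded and localized.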





[jira] [Commented] (HIVE-27754) Query Filter with OR condition updates every record in the table

2023-09-28 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770032#comment-17770032
 ] 

Krisztian Kasa commented on HIVE-27754:
---

A simple repro with a query:
{code} 
create table t1 (a int);

insert into t1(a) values (1), (2), (NULL);

select * from t1 where 'anything';
{code}
returns
{code}
1
2
NULL
{code}

CBO is failing in this case. From hive.log
{code}
2023-09-28T05:14:55,578 ERROR [08def54d-804f-44fc-8452-c9873eb3a06e Listener at 0.0.0.0/36139] parse.CalcitePlanner: CBO failed, skipping CBO. 
org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Filter expression with non-boolean return type.
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3216) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3202) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3399) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3410) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5084) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1649) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1593) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1345) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13023) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:467) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) ~[hive

[jira] [Created] (HIVE-27755) Quote identifiers in SQL emitted by SchemaTool for MySQL

2023-09-28 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27755:
--

 Summary: Quote identifiers in SQL emitted by SchemaTool for MySQL
 Key: HIVE-27755
 URL: https://issues.apache.org/jira/browse/HIVE-27755
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Affects Versions: 4.0.0-beta-1
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Various SchemaTool options/tasks (e.g., "validate") generate and run SQL 
statements on the underlying database. Depending on the database, identifiers in 
the SQL statements may be quoted (see 
[https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/schematool/HiveSchemaHelper.java#L173]).

Currently, all identifiers are quoted when the database is Postgres, and this 
ticket aims to do the same for MySQL/MariaDB.

The main motivation behind this change is to avoid surprises and 
query failures when/if the database turns the names of some of the 
tables/columns we use internally into reserved keywords.

As a concrete example, the Percona fork of MySQL recently turned SEQUENCE_TABLE 
into a reserved keyword 
([https://docs.percona.com/percona-server/8.0/flexibility/sequence_table.html]) 
and this conflicts with our internal metastore table.

The installation scripts do not fail since in that case SEQUENCE_TABLE is 
quoted 
([https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0-beta-2.mysql.sql#L447])
 but validation queries emitted by the SchemaTool will fail 
([https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/schematool/SchemaToolTaskValidate.java#L117])
 if we don't use quoted identifiers.
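
A minimal sketch of per-database identifier quoting, in Python rather than the Java used by SchemaTool. The function `quote_identifier`, the database-type strings, and the sample `SELECT` are assumptions for illustration only; this is not the logic in `HiveSchemaHelper`, and the validation queries the tool actually emits may look different.

```python
def quote_identifier(name, db_type):
    """Quote a table or column name using the database's identifier syntax.

    MySQL/MariaDB use backticks by default; Postgres uses double quotes.
    An embedded quote character is escaped by doubling it.
    """
    if db_type in ("mysql", "mariadb"):
        return "`" + name.replace("`", "``") + "`"
    if db_type == "postgres":
        return '"' + name.replace('"', '""') + '"'
    # Other databases are left unquoted in this sketch.
    return name


# Hypothetical query shape; only the table name comes from the ticket.
mysql_query = "SELECT * FROM " + quote_identifier("SEQUENCE_TABLE", "mysql")
postgres_query = "SELECT * FROM " + quote_identifier("SEQUENCE_TABLE", "postgres")
```

With quoting in place, a name such as SEQUENCE_TABLE is parsed as an identifier even on a server that treats the bare word as a reserved keyword.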





[jira] [Assigned] (HIVE-27754) Query Filter with OR condition updates every record in the table

2023-09-28 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa reassigned HIVE-27754:
--

Assignee: Simhadri Govindappa

> Query Filter with OR condition updates every record in the table
> 
>
> Key: HIVE-27754
> URL: https://issues.apache.org/jira/browse/HIVE-27754
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
>  
> {noformat}
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' 
> ;{noformat}
>  After the above statement, all the records are updated. The condition 
> {{'Taylor'}} is a constant string, and it always evaluates to true 
> because it's a non-empty string. So, effectively, the {{UPDATE}} statement 
> updates all rows in {{customers_man}}.
> {{Repro: }}
> {noformat}
> create  table customers_man (customer_id bigint, first_name string) 
> PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES 
> ('transactional'='true');
>  insert into customers_man values(1, "Joanna", "Pierce"),(1, "Sharon", 
> "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", 
> "Morrison") ,(2, "Jake", "Donnel") , (3, "Blake", "Burr"), (3, "Trudy", 
> "Johnson"), (3, "Trudy", "Henderson");
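>  select * from customers_man;
>  +----------------------------+---------------------------+--------------------------+
>  | customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
>  +----------------------------+---------------------------+--------------------------+
>  | 3                          | Blake                     | Burr                     |
>  | 2                          | Jake                      | Donnel                   |
>  | 3                          | Trudy                     | Henderson                |
>  | 3                          | Trudy                     | Johnson                  |
>  | 2                          | Susan                     | Morrison                 |
>  | 1                          | Joanna                    | Pierce                   |
>  | 2                          | Joanna                    | Silver                   |
>  | 2                          | Bob                       | Silver                   |
>  | 1                          | Sharon                    | Taylor                   |
>  +----------------------------+---------------------------+--------------------------+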
>  UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR last_name='Taylor' ;
>  select * from customers_man;
>  +----------------------------+---------------------------+--------------------------+
>  | customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
>  +----------------------------+---------------------------+--------------------------+
>  | 3                          | Blake                     | Burr                     |
>  | 2                          | Jake                      | Donnel                   |
>  | 3                          | Trudy                     | Henderson                |
>  | 3                          | Trudy                     | Johnson                  |
>  | 2                          | Susan                     | Morrison                 |
>  | 22                         | Joanna                    | Pierce                   |
>  | 2                          | Joanna                    | Silver                   |
>  | 2                          | Bob                       | Silver                   |
>  | 22                         | Sharon                    | Taylor                   |
>  +----------------------------+---------------------------+--------------------------+
>   UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' ;
>   select * from customers_man;
>   +----------------------------+---------------------------+--------------------------+
>   | customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
>   +----------------------------+---------------------------+--------------------------+
>   | 22                         | Blake                     | Burr                     |
>   | 22                         | Jake                      | Donnel                   |
>   | 22                         | Trudy                     | Henderson                |
>   | 22                         | Trudy                     | Johnson                  |
>   | 22                         | Susan                     | Morrison                 |
>   | 22                         | Joanna                    | Pierce  

[jira] [Created] (HIVE-27754) Query Filter with OR condition updates every record in the table

2023-09-28 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27754:
--

 Summary: Query Filter with OR condition updates every record in 
the table
 Key: HIVE-27754
 URL: https://issues.apache.org/jira/browse/HIVE-27754
 Project: Hive
  Issue Type: Bug
Reporter: Simhadri Govindappa


 
{noformat}
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' 
;{noformat}
 After the above statement, all the records are updated. The condition 
{{'Taylor'}} is a constant string, and it always evaluates to true because 
it's a non-empty string. So, effectively, the {{UPDATE}} statement updates all 
rows in {{customers_man}}.
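
A toy model of the predicate described above, in Python. It assumes, as the description states, that a non-empty string used as a condition counts as true; the helper `string_as_condition` is hypothetical and is not Hive's actual type-coercion code.

```python
def string_as_condition(value):
    """Assumed rule from the description: a non-empty string is true."""
    return len(value) > 0


def reported_filter(last_name):
    # WHERE last_name='Pierce' OR 'Taylor'
    # is grouped as (last_name='Pierce') OR ('Taylor').
    return last_name == "Pierce" or string_as_condition("Taylor")


def intended_filter(last_name):
    # WHERE last_name='Pierce' OR last_name='Taylor'
    return last_name == "Pierce" or last_name == "Taylor"


last_names = ["Burr", "Donnel", "Pierce", "Silver", "Taylor"]
reported_matches = [n for n in last_names if reported_filter(n)]
intended_matches = [n for n in last_names if intended_filter(n)]
```

Under that assumption the right-hand side of the `OR` never depends on the row, so `reported_matches` contains every name while `intended_matches` contains only Pierce and Taylor.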
{{Repro: }}


{noformat}
create  table customers_man (customer_id bigint, first_name string) PARTITIONED 
BY (last_name string) STORED AS orc TBLPROPERTIES ('transactional'='true');

 insert into customers_man values(1, "Joanna", "Pierce"),(1, "Sharon", 
"Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", 
"Morrison") ,(2, "Jake", "Donnel") , (3, "Blake", "Burr"), (3, "Trudy", 
"Johnson"), (3, "Trudy", "Henderson");
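 select * from customers_man;
 +----------------------------+---------------------------+--------------------------+
 | customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
 +----------------------------+---------------------------+--------------------------+
 | 3                          | Blake                     | Burr                     |
 | 2                          | Jake                      | Donnel                   |
 | 3                          | Trudy                     | Henderson                |
 | 3                          | Trudy                     | Johnson                  |
 | 2                          | Susan                     | Morrison                 |
 | 1                          | Joanna                    | Pierce                   |
 | 2                          | Joanna                    | Silver                   |
 | 2                          | Bob                       | Silver                   |
 | 1                          | Sharon                    | Taylor                   |
 +----------------------------+---------------------------+--------------------------+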


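 UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR last_name='Taylor' ;
 select * from customers_man;
 +----------------------------+---------------------------+--------------------------+
 | customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
 +----------------------------+---------------------------+--------------------------+
 | 3                          | Blake                     | Burr                     |
 | 2                          | Jake                      | Donnel                   |
 | 3                          | Trudy                     | Henderson                |
 | 3                          | Trudy                     | Johnson                  |
 | 2                          | Susan                     | Morrison                 |
 | 22                         | Joanna                    | Pierce                   |
 | 2                          | Joanna                    | Silver                   |
 | 2                          | Bob                       | Silver                   |
 | 22                         | Sharon                    | Taylor                   |
 +----------------------------+---------------------------+--------------------------+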


  UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' ;
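  select * from customers_man;
  +----------------------------+---------------------------+--------------------------+
  | customers_man.customer_id  | customers_man.first_name  | customers_man.last_name  |
  +----------------------------+---------------------------+--------------------------+
  | 22                         | Blake                     | Burr                     |
  | 22                         | Jake                      | Donnel                   |
  | 22                         | Trudy                     | Henderson                |
  | 22                         | Trudy                     | Johnson                  |
  | 22                         | Susan                     | Morrison                 |
  | 22                         | Joanna                    | Pierce                   |
  | 22                         | Joanna                    | Silver                   |
  | 22                         | Bob                       | Silver                   |
  | 22                         | Sharon                    | Taylor                   |
  +----------------------------+---------------------------+--------------------------+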

--- simpler repro
UPDATE customers_man SET customer_id=23 WHERE true;
select * from customers_man; 

+

[jira] [Commented] (HIVE-27573) Backport of HIVE-21799: NullPointerException in DynamicPartitionPruningOptimization, when join key is on aggregation column

2023-09-28 Thread Sankar Hariappan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770015#comment-17770015
 ] 

Sankar Hariappan commented on HIVE-27573:
-

Thanks [~shefali636] for the contribution!

> Backport of HIVE-21799: NullPointerException in 
> DynamicPartitionPruningOptimization, when join key is on aggregation column
> ---
>
> Key: HIVE-27573
> URL: https://issues.apache.org/jira/browse/HIVE-27573
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.1.3
>Reporter: Shefali Singh
>Assignee: Shefali Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>






[jira] [Resolved] (HIVE-27573) Backport of HIVE-21799: NullPointerException in DynamicPartitionPruningOptimization, when join key is on aggregation column

2023-09-28 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-27573.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

> Backport of HIVE-21799: NullPointerException in 
> DynamicPartitionPruningOptimization, when join key is on aggregation column
> ---
>
> Key: HIVE-27573
> URL: https://issues.apache.org/jira/browse/HIVE-27573
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Shefali Singh
>Assignee: Shefali Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>






[jira] [Updated] (HIVE-27573) Backport of HIVE-21799: NullPointerException in DynamicPartitionPruningOptimization, when join key is on aggregation column

2023-09-28 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-27573:

Affects Version/s: 3.1.3
   (was: 3.2.0)

> Backport of HIVE-21799: NullPointerException in 
> DynamicPartitionPruningOptimization, when join key is on aggregation column
> ---
>
> Key: HIVE-27573
> URL: https://issues.apache.org/jira/browse/HIVE-27573
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.1.3
>Reporter: Shefali Singh
>Assignee: Shefali Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>






[jira] [Updated] (HIVE-27753) Mask Q file output to avoid flakyness

2023-09-28 Thread KIRTI RUGE (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KIRTI RUGE updated HIVE-27753:
--
Description: 
Mask the pattern below in q output files to avoid test flakiness

 

drwxr-xr-x - ### USER ### ### GROUP ### 0 ### HDFS DATE ### hdfs://### HDFS 
PATH ###

  was:Mask the pattern below in q output files to avoid test flakiness


> Mask Q file output to avoid flakyness 
> --
>
> Key: HIVE-27753
> URL: https://issues.apache.org/jira/browse/HIVE-27753
> Project: Hive
>  Issue Type: Improvement
>Reporter: KIRTI RUGE
>Assignee: KIRTI RUGE
>Priority: Major
>
> Mask the pattern below in q output files to avoid test flakiness
>  
> drwxr-xr-x - ### USER ### ### GROUP ### 0 ### HDFS DATE ### hdfs://### HDFS 
> PATH ###
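
A minimal sketch of how such a listing line could be masked, in Python. The regular expression, the function `mask_ls_line`, and the sample input are assumptions for illustration; the choice of which fields to mask is inferred from the pattern quoted above, and Hive's actual q-file output masking may be implemented differently.

```python
import re

# permissions, replication, user, group, size, date+time, hdfs:// path
LS_LINE = re.compile(
    r"^([d-][rwx-]{9})\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s+hdfs://\S+$"
)


def mask_ls_line(line):
    """Replace the user, group, date, and path of a listing line with
    fixed placeholders; return other lines unchanged."""
    match = LS_LINE.match(line.rstrip())
    if match is None:
        return line
    perms, replication, _user, _group, size, _date = match.groups()
    return (
        f"{perms} {replication} ### USER ### ### GROUP ### {size} "
        "### HDFS DATE ### hdfs://### HDFS PATH ###"
    )


# Hypothetical input line; the host and path are made up for the example.
sample = "drwxr-xr-x   - hive supergroup          0 2023-09-20 11:58 hdfs://localhost:8020/tmp/hive"
masked = mask_ls_line(sample)
```

Masking the fields that vary between runs (user, group, timestamp, path) leaves a stable line that can be compared against the expected q output.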





[jira] [Assigned] (HIVE-27753) Mask Q file output to avoid flakyness

2023-09-28 Thread KIRTI RUGE (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KIRTI RUGE reassigned HIVE-27753:
-

Assignee: KIRTI RUGE

> Mask Q file output to avoid flakyness 
> --
>
> Key: HIVE-27753
> URL: https://issues.apache.org/jira/browse/HIVE-27753
> Project: Hive
>  Issue Type: Improvement
>Reporter: KIRTI RUGE
>Assignee: KIRTI RUGE
>Priority: Major
>
> Mask the pattern below in q output files to avoid test flakiness





[jira] [Created] (HIVE-27753) Mask Q file output to avoid flakyness

2023-09-28 Thread KIRTI RUGE (Jira)
KIRTI RUGE created HIVE-27753:
-

 Summary: Mask Q file output to avoid flakyness 
 Key: HIVE-27753
 URL: https://issues.apache.org/jira/browse/HIVE-27753
 Project: Hive
  Issue Type: Improvement
Reporter: KIRTI RUGE


Mask the pattern below in q output files to avoid test flakiness





[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class

2023-09-28 Thread Akshat Mathur (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshat Mathur updated HIVE-27752:
-
Target Version/s: 4.0.0
  Status: Patch Available  (was: In Progress)

> Remove DagUtils duplicate class
> ---
>
> Key: HIVE-27752
> URL: https://issues.apache.org/jira/browse/HIVE-27752
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Akshat Mathur
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> remove this small orphaned stuff: 
> https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java
> and place method to 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java





[jira] [Work started] (HIVE-27752) Remove DagUtils duplicate class

2023-09-28 Thread Akshat Mathur (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27752 started by Akshat Mathur.

> Remove DagUtils duplicate class
> ---
>
> Key: HIVE-27752
> URL: https://issues.apache.org/jira/browse/HIVE-27752
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Akshat Mathur
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> remove this small orphaned stuff: 
> https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java
> and place method to 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java





[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class

2023-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27752:
--
Labels: newbie pull-request-available  (was: newbie)

> Remove DagUtils duplicate class
> ---
>
> Key: HIVE-27752
> URL: https://issues.apache.org/jira/browse/HIVE-27752
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Akshat Mathur
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> remove this small orphaned stuff: 
> https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java
> and place method to 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java





[jira] [Commented] (HIVE-27695) Intermittent OOM when running TestMiniTezCliDriver

2023-09-28 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769958#comment-17769958
 ] 

Stamatis Zampetakis commented on HIVE-27695:


More failed runs due to OOM in TezCliDriver:
* 
http://ci.hive.apache.org/job/hive-precommit/job/PR-4750/3/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/Testing___split_19___PostProcess___testCliDriver_flatten_union_subdir_/
* 
http://ci.hive.apache.org/job/hive-precommit/job/PR-4754/1/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/Testing___split_19___PostProcess___testCliDriver_tez_union_with_udf_/
 


> Intermittent OOM when running TestMiniTezCliDriver
> --
>
> Key: HIVE-27695
> URL: https://issues.apache.org/jira/browse/HIVE-27695
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Attachments: am_heap_dumps.tar.xz, leak_suspect_1.png
>
>
> Running all the tests under TestMiniTezCliDriver very frequently (but still 
> intermittently) leads to OutOfMemory errors.
> {noformat}
> cd itests/qtest && mvn test -Dtest=TestMiniTezCliDriver
> {noformat}
> I set {{-XX:+HeapDumpOnOutOfMemoryError}} and the respective heapdumps are 
> attached to this ticket.
> The OOM is thrown from the application master and a quick inspection of the 
> dumps shows that it comes mainly from the accumulation of Configuration 
> objects (~1MB each) by various classes.
> The max heap size for application master is pretty low (~100MB) so it is 
> quite easy to reach. The heap size is explicitly very low for testing 
> purposes but maybe we should re-evaluate the current configurations for the 
> tests.





[jira] [Updated] (HIVE-27751) Log Query Compilation summary in an accumulated way

2023-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27751:
--
Labels: pull-request-available  (was: )

> Log Query Compilation summary in an accumulated way
> ---
>
> Key: HIVE-27751
> URL: https://issues.apache.org/jira/browse/HIVE-27751
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> The query compilation summary is useful for collecting and reading all 
> compile-time measurements in a single place. It also helps when debugging 
> performance issues in the query compilation phase and when reporting and 
> comparing measurements across runs.





[jira] [Assigned] (HIVE-27752) Remove DagUtils duplicate class

2023-09-28 Thread Akshat Mathur (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshat Mathur reassigned HIVE-27752:


Assignee: Akshat Mathur

> Remove DagUtils duplicate class
> ---
>
> Key: HIVE-27752
> URL: https://issues.apache.org/jira/browse/HIVE-27752
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Akshat Mathur
>Priority: Minor
>  Labels: newbie
>
> remove this small orphaned stuff: 
> https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java
> and place method to 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java





[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class

2023-09-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27752:

Labels: newbie  (was: )

> Remove DagUtils duplicate class
> ---
>
> Key: HIVE-27752
> URL: https://issues.apache.org/jira/browse/HIVE-27752
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Priority: Major
>  Labels: newbie
>
> remove this small orphaned stuff: 
> https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java
> and place method to 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java





[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class

2023-09-28 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27752:

Priority: Minor  (was: Major)

> Remove DagUtils duplicate class
> ---
>
> Key: HIVE-27752
> URL: https://issues.apache.org/jira/browse/HIVE-27752
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Priority: Minor
>  Labels: newbie
>
> remove this small orphaned stuff: 
> https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java
> and place method to 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java





[jira] [Created] (HIVE-27752) Remove DagUtils duplicate

2023-09-28 Thread Jira
László Bodor created HIVE-27752:
---

 Summary: Remove DagUtils duplicate
 Key: HIVE-27752
 URL: https://issues.apache.org/jira/browse/HIVE-27752
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor








[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class

2023-09-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27752:

Summary: Remove DagUtils duplicate class  (was: Remove DagUtils duplicate)

> Remove DagUtils duplicate class
> ---
>
> Key: HIVE-27752
> URL: https://issues.apache.org/jira/browse/HIVE-27752
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Priority: Major
>






[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class

2023-09-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27752:

Description: 
remove this small orphaned stuff: 
https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java
and place method to 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java

> Remove DagUtils duplicate class
> ---
>
> Key: HIVE-27752
> URL: https://issues.apache.org/jira/browse/HIVE-27752
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Priority: Major
>
> remove this small orphaned stuff: 
> https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java
> and place method to 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java





[jira] [Created] (HIVE-27751) Log Query Compilation summary in an accumulated way

2023-09-28 Thread Ramesh Kumar Thangarajan (Jira)
Ramesh Kumar Thangarajan created HIVE-27751:
---

 Summary: Log Query Compilation summary in an accumulated way
 Key: HIVE-27751
 URL: https://issues.apache.org/jira/browse/HIVE-27751
 Project: Hive
  Issue Type: Task
  Components: Hive
Reporter: Ramesh Kumar Thangarajan
Assignee: Ramesh Kumar Thangarajan


The query compilation summary is useful for collecting and reading all 
compile-time measurements in a single place. It also helps when debugging 
performance issues in the query compilation phase and when reporting and 
comparing measurements across runs.


