[jira] [Commented] (HIVE-27406) CompactionTxnHandler cleanup
[ https://issues.apache.org/jira/browse/HIVE-27406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770314#comment-17770314 ]

Stamatis Zampetakis commented on HIVE-27406:
--------------------------------------------

The 4.0.0-beta-1 version is already out and this ticket is not part of it. I updated the Fix Version to indicate the next release (possibly 4.0.0).

> CompactionTxnHandler cleanup
> ----------------------------
>
> Key: HIVE-27406
> URL: https://issues.apache.org/jira/browse/HIVE-27406
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Reporter: László Végh
> Assignee: László Végh
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Tech Debt elimination. Clean up and standardize CompactionTxnHandler.java
> considering the following items, but not limited to them:
> * Check for proper javadoc
> * Remove unnecessary logging, adjust log level properly
> * Consistent transaction handling:
> ** No rollback for selects
> ** Exception in case of update count mismatch (single row update returns 0
> or >1 for updated rows)
> ** Review proper usage of RetrySemantics.* annotations
> * Common private template methods for 'infrastructure' code (same try-catch
> logic, connection handling, etc.), possibly with spring-jdbc
> * Replace inline hardcoded and assembled SQL statements with parameterized
> constants.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
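One item on the checklist above, raising an exception when a single-row update reports 0 or more than 1 updated rows, can be sketched as follows. This is an illustration in Python with sqlite3, not the CompactionTxnHandler code: the table `compaction_queue`, its columns, and the helper `update_single_row` are hypothetical names chosen for the example.

```python
import sqlite3


class UpdateCountMismatch(Exception):
    """Raised when a single-row update touches 0 or more than 1 rows."""


def update_single_row(conn, sql, params):
    # Parameterized SQL: values are bound, never concatenated into the statement.
    cur = conn.execute(sql, params)
    if cur.rowcount != 1:
        conn.rollback()  # undo the unexpected write before reporting it
        raise UpdateCountMismatch(f"expected 1 updated row, got {cur.rowcount}")
    conn.commit()


# Hypothetical statement kept as a constant instead of being assembled inline.
UPDATE_STATE = "UPDATE compaction_queue SET state = ? WHERE id = ?"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compaction_queue (id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO compaction_queue VALUES (?, ?)",
    [(1, "i"), (2, "i"), (2, "i")],  # id 2 is duplicated on purpose
)
conn.commit()

update_single_row(conn, UPDATE_STATE, ("w", 1))  # exactly one row: succeeds
```

Calling `update_single_row(conn, UPDATE_STATE, ("w", 2))` would raise `UpdateCountMismatch` and roll back, because two rows share that id; an id that does not exist fails the same way with a count of 0.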
[jira] [Updated] (HIVE-27406) CompactionTxnHandler cleanup
[ https://issues.apache.org/jira/browse/HIVE-27406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis updated HIVE-27406:
---------------------------------------
    Fix Version/s: 4.0.0 (was: 4.0.0-beta-1)
[jira] [Updated] (HIVE-27757) Upgrade hadoop to 3.3.6
[ https://issues.apache.org/jira/browse/HIVE-27757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-27757:
----------------------------------
    Labels: pull-request-available (was: )

> Upgrade hadoop to 3.3.6
> -----------------------
>
> Key: HIVE-27757
> URL: https://issues.apache.org/jira/browse/HIVE-27757
> Project: Hive
> Issue Type: Improvement
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Labels: pull-request-available
>
>
> Hadoop 3.3.6 has been released and comes with lots of improvements & CVE fixes.
[jira] [Created] (HIVE-27757) Upgrade hadoop to 3.3.6
Ayush Saxena created HIVE-27757:
--------------------------------

    Summary: Upgrade hadoop to 3.3.6
    Key: HIVE-27757
    URL: https://issues.apache.org/jira/browse/HIVE-27757
    Project: Hive
    Issue Type: Improvement
    Reporter: Ayush Saxena
    Assignee: Ayush Saxena

Hadoop 3.3.6 has been released and comes with lots of improvements & CVE fixes.
[jira] [Resolved] (HIVE-27704) Remove PowerMock from jdbc-handler and upgrade mockito to 4.11
[ https://issues.apache.org/jira/browse/HIVE-27704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena resolved HIVE-27704.
---------------------------------
    Fix Version/s: 4.0.0
    Resolution: Fixed

> Remove PowerMock from jdbc-handler and upgrade mockito to 4.11
> --------------------------------------------------------------
>
> Key: HIVE-27704
> URL: https://issues.apache.org/jira/browse/HIVE-27704
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Zsolt Miskolczi
> Assignee: KIRTI RUGE
> Priority: Major
> Labels: newbie, pull-request-available, starter
> Fix For: 4.0.0
[jira] [Commented] (HIVE-27704) Remove PowerMock from jdbc-handler and upgrade mockito to 4.11
[ https://issues.apache.org/jira/browse/HIVE-27704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770302#comment-17770302 ]

Ayush Saxena commented on HIVE-27704:
-------------------------------------

Committed to master.

Thanks [~rkirtir] for the contribution & [~InvisibleProgrammer] for the review!!!

> Remove PowerMock from jdbc-handler and upgrade mockito to 4.11
> --------------------------------------------------------------
>
> Key: HIVE-27704
> URL: https://issues.apache.org/jira/browse/HIVE-27704
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Zsolt Miskolczi
> Assignee: KIRTI RUGE
> Priority: Major
> Labels: newbie, pull-request-available, starter
[jira] [Assigned] (HIVE-27744) privileges check is skipped when using partly dynamic partition write.
[ https://issues.apache.org/jira/browse/HIVE-27744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shuaiqi.guo reassigned HIVE-27744:
----------------------------------
    Assignee: shuaiqi.guo

> privileges check is skipped when using partly dynamic partition write.
> ----------------------------------------------------------------------
>
> Key: HIVE-27744
> URL: https://issues.apache.org/jira/browse/HIVE-27744
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: All Versions
> Reporter: shuaiqi.guo
> Assignee: shuaiqi.guo
> Priority: Blocker
> Fix For: 2.3.5
>
> Attachments: HIVE-27744.patch
>
>
> The privileges check is skipped when a dynamic partition write specifies only
> part of the partition, as in the following example:
> {code:java}
> insert overwrite table test_privilege partition (`date` = '2023-09-27', hour)
> ... {code}
> Hive executes the statement directly without checking write privileges.
>
> Use the attached patch to fix this bug.
[jira] [Updated] (HIVE-27743) Semantic Search In Hive
[ https://issues.apache.org/jira/browse/HIVE-27743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreenath updated HIVE-27743:
----------------------------
    Description:
_Semantic search is the tech powering *vector databases*, and we can have the same power in Hive._

Semantic search is a way for computers to understand the meaning behind words and phrases when you're searching for something. Instead of just looking for exact matches of keywords, it tries to figure out what you're really asking and provides results that are more relevant and meaningful to your question. It's like having a search engine that can understand what you mean, not just what you say, making it easier to find the information you're looking for.

This ticket is a wish to have semantic search in Hive.

On the implementation side, semantic search uses an embedding model and any of the similarity distance functions.

My proposal is to implement functions for on-the-fly calculation of similarity distance between two values. Once we have them, we could easily do semantic search as part of a where clause.
 * E.g. (using a cosine similarity function): "WHERE cos_dist(region, 'europe') > 0.9". It could return records with regions like Scandinavia, Nordic, Baltic, etc.
 * We could have functions that accept values as text or as vector embeddings.

  was:
Semantic search is a way for computers to understand the meaning behind words and phrases when you're searching for something. Instead of just looking for exact matches of keywords, it tries to figure out what you're really asking and provides results that are more relevant and meaningful to your question. It's like having a search engine that can understand what you mean, not just what you say, making it easier to find the information you're looking for.

This ticket is a wish to have semantic search in Hive.

On the implementation side, semantic search uses an embedding model and any of the similarity distance functions.

My proposal is to implement functions for on-the-fly calculation of similarity distance between two values. Once we have them, we could easily do semantic search as part of a where clause.
 * E.g. (using a cosine similarity function): "WHERE cos_dist(region, 'europe') > 0.9". It could return records with regions like Scandinavia, Nordic, Baltic, etc.
 * We could have functions that accept values as text or as vector embeddings.

> Semantic Search In Hive
> -----------------------
>
> Key: HIVE-27743
> URL: https://issues.apache.org/jira/browse/HIVE-27743
> Project: Hive
> Issue Type: Wish
> Environment: *
> Reporter: Sreenath
> Priority: Major
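The where-clause filter proposed in this ticket can be sketched in Python, assuming the embeddings are already available as plain float vectors. The function name `cos_sim`, the toy 3-dimensional vectors, and the region names are illustrative assumptions; this is neither an existing Hive function nor the output of a real embedding model.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen in this sketch for zero vectors
    return dot / (norm_a * norm_b)


# Made-up embeddings; a real system would obtain these from an embedding model.
embeddings = {
    "europe": [0.9, 0.1, 0.0],
    "nordic": [0.8, 0.2, 0.1],
    "sahara": [0.0, 0.1, 0.9],
}

# Equivalent in spirit to: WHERE cos_dist(region, 'europe') > 0.9
query = embeddings["europe"]
matches = [name for name, vec in embeddings.items() if cos_sim(vec, query) > 0.9]
```

With these toy vectors, `matches` keeps "europe" and "nordic" and drops "sahara". A text-accepting variant would first have to map each string to a vector, which is where the embedding model comes in.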
[jira] [Resolved] (HIVE-27406) CompactionTxnHandler cleanup
[ https://issues.apache.org/jira/browse/HIVE-27406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Végh resolved HIVE-27406.
--------------------------------
    Fix Version/s: 4.0.0-beta-1
    Target Version/s: 4.0.0-beta-1 (was: 4.0.0)
    Resolution: Fixed
[jira] [Comment Edited] (HIVE-27755) Quote identifiers in SQL emitted by SchemaTool for MySQL
[ https://issues.apache.org/jira/browse/HIVE-27755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770042#comment-17770042 ]

Stamatis Zampetakis edited comment on HIVE-27755 at 9/28/23 12:50 PM:
----------------------------------------------------------------------

For testing the changes, I enabled the general_log for MySQL (https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_general_log) and ran the following tests before and after the changes in PR#4757:
{noformat}
cd standalone-metastore/metastore-server
mvn test -Dtest=TestMysql#upgrade -Dtest.groups=""
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSchemaTables*Mysql* -Dtest.groups=""
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSequences*Mysql* -Dtest.groups=""
{noformat}
I monitored the general_log output generated by these tests and compared the before and after files for each test, verifying that table and column names are quoted as expected. The before and after files from the general_log are attached to this JIRA.

HIVE-27747 is required in order to run TestSchemaToolForMetastore with MySQL as a backend. HIVE-27747 is not a prerequisite (but good to have) for merging this change.

was (Author: zabetak):
For testing the changes, I enabled the general_log for MySQL (https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_general_log) and ran the following tests before and after the changes in PR#4757:
{noformat}
cd standalone-metastore/metastore-server
mvn test -Dtest=TestMysql#upgrade -Dtest.groups=""
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSchemaTables*Mysql* -Dtest.groups="" (requires patch in #4754)
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSequences*Mysql* -Dtest.groups="" (requires patch in #4754)
{noformat}
I monitored the general_log output generated by these tests and compared the before and after files for each test, verifying that table and column names are quoted as expected. The before and after files from the general_log are attached to this JIRA.

> Quote identifiers in SQL emitted by SchemaTool for MySQL
> --------------------------------------------------------
>
> Key: HIVE-27755
> URL: https://issues.apache.org/jira/browse/HIVE-27755
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Affects Versions: 4.0.0-beta-1
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Attachments: TestMysql-upgrade-after.txt,
> TestMysql-upgrade-before.txt,
> TestSchemaToolForMetastore-validateSequences-after.txt,
> TestSchemaToolForMetastore-validateSequences-before.txt,
> TestSchemaToolForMetastore-validateTables-after.txt,
> TestSchemaToolForMetastore-validateTables-before.txt
>
>
> Various SchemaTool options/tasks (e.g., "validate") generate and run SQL
> statements on the underlying database. Depending on the database, identifiers
> in the SQL statements may be quoted (see
> [https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/schematool/HiveSchemaHelper.java#L173]).
> Currently, all identifiers are quoted when the database is Postgres, and this
> ticket aims to do the same for MySQL/MariaDB.
> The main motivation behind this change is to avoid surprises and query
> failures when/if the database decides to turn some of the tables/columns we
> are using internally into reserved keywords.
> As a concrete example, the Percona fork of MySQL recently turned
> SEQUENCE_TABLE into a reserved keyword
> ([https://docs.percona.com/percona-server/8.0/flexibility/sequence_table.html]),
> which conflicts with our internal metastore table.
> The installation scripts do not fail, since in that case SEQUENCE_TABLE is
> quoted
> ([https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0-beta-2.mysql.sql#L447]),
> but validation queries emitted by the SchemaTool will fail
> ([https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/schematool/SchemaToolTaskValidate.java#L117])
> if we don't use quoted identifiers.
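The per-database quoting this ticket describes can be sketched as below. It is an illustrative Python helper, not the HiveSchemaHelper code: the function name `quote_identifier`, the `db_type` strings, and the column name `NEXT_VAL` are assumptions made for the example. Backticks are MySQL's default identifier quote, and double quotes are the form Postgres uses.

```python
def quote_identifier(name, db_type):
    """Quote a table or column name for the given database type."""
    if db_type in ("mysql", "mariadb"):
        # Embedded backticks are escaped by doubling them.
        return "`" + name.replace("`", "``") + "`"
    if db_type == "postgres":
        # Embedded double quotes are escaped by doubling them.
        return '"' + name.replace('"', '""') + '"'
    return name  # other databases are left unquoted in this sketch


# Hypothetical validation query against the table named in the ticket.
query = "SELECT {col} FROM {tbl}".format(
    col=quote_identifier("NEXT_VAL", "mysql"),
    tbl=quote_identifier("SEQUENCE_TABLE", "mysql"),
)
```

Once quoted, SEQUENCE_TABLE is parsed as an identifier even on a server that treats the bare word as a reserved keyword, which is the failure mode the ticket wants to avoid.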
[jira] [Updated] (HIVE-27755) Quote identifiers in SQL emitted by SchemaTool for MySQL
[ https://issues.apache.org/jira/browse/HIVE-27755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis updated HIVE-27755:
---------------------------------------
    Attachment: TestSchemaToolForMetastore-validateTables-before.txt
                TestSchemaToolForMetastore-validateTables-after.txt
                TestSchemaToolForMetastore-validateSequences-before.txt
                TestSchemaToolForMetastore-validateSequences-after.txt
                TestMysql-upgrade-before.txt
                TestMysql-upgrade-after.txt
[jira] [Commented] (HIVE-27755) Quote identifiers in SQL emitted by SchemaTool for MySQL
[ https://issues.apache.org/jira/browse/HIVE-27755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770042#comment-17770042 ]

Stamatis Zampetakis commented on HIVE-27755:
--------------------------------------------

For testing the changes, I enabled the general_log for MySQL (https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_general_log) and ran the following tests before and after the changes in PR#4757:
{noformat}
cd standalone-metastore/metastore-server
mvn test -Dtest=TestMysql#upgrade -Dtest.groups=""
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSchemaTables*Mysql* -Dtest.groups="" (requires patch in #4754)
mvn test -Dtest=TestSchemaToolForMetastore#testValidateSequences*Mysql* -Dtest.groups="" (requires patch in #4754)
{noformat}
I monitored the general_log output generated by these tests and compared the before and after files for each test, verifying that table and column names are quoted as expected. The before and after files from the general_log are attached to this JIRA.
[jira] [Updated] (HIVE-27755) Quote identifiers in SQL emitted by SchemaTool for MySQL
[ https://issues.apache.org/jira/browse/HIVE-27755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-27755:
----------------------------------
    Labels: pull-request-available (was: )
[jira] [Commented] (HIVE-27754) Query Filter with OR condition updates every record in the table
[ https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770037#comment-17770037 ]

Krisztian Kasa commented on HIVE-27754:
---------------------------------------

{code}
set hive.cbo.fallback.strategy=NEVER;
{code}
can be used to prevent running these statements.

See also: https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L687-L688

> Query Filter with OR condition updates every record in the table
> ----------------------------------------------------------------
>
> Key: HIVE-27754
> URL: https://issues.apache.org/jira/browse/HIVE-27754
> Project: Hive
> Issue Type: Bug
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
>
>
> {noformat}
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor';
> {noformat}
> After the above statement, all the records are updated. The condition
> {{'Taylor'}} is a constant string, and it will always evaluate to true
> because it's a non-empty string. So, effectively, the {{UPDATE}} statement is
> updating all rows in {{customers_man}}.
>
> Repro:
> {noformat}
> create table customers_man (customer_id bigint, first_name string)
> PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES
> ('transactional'='true');
> insert into customers_man values (1, "Joanna", "Pierce"), (1, "Sharon",
> "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan",
> "Morrison"), (2, "Jake", "Donnel"), (3, "Blake", "Burr"), (3, "Trudy",
> "Johnson"), (3, "Trudy", "Henderson");
> select * from customers_man;
> +---------------------------+--------------------------+-------------------------+
> | customers_man.customer_id | customers_man.first_name | customers_man.last_name |
> +---------------------------+--------------------------+-------------------------+
> | 3                         | Blake                    | Burr                    |
> | 2                         | Jake                     | Donnel                  |
> | 3                         | Trudy                    | Henderson               |
> | 3                         | Trudy                    | Johnson                 |
> | 2                         | Susan                    | Morrison                |
> | 1                         | Joanna                   | Pierce                  |
> | 2                         | Joanna                   | Silver                  |
> | 2                         | Bob                      | Silver                  |
> | 1                         | Sharon                   | Taylor                  |
> +---------------------------+--------------------------+-------------------------+
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR
> last_name='Taylor';
> select * from customers_man;
> +---------------------------+--------------------------+-------------------------+
> | customers_man.customer_id | customers_man.first_name | customers_man.last_name |
> +---------------------------+--------------------------+-------------------------+
> | 3                         | Blake                    | Burr                    |
> | 2                         | Jake                     | Donnel                  |
> | 3                         | Trudy                    | Henderson               |
> | 3                         | Trudy                    | Johnson                 |
> | 2                         | Susan                    | Morrison                |
> | 22                        | Joanna                   | Pierce                  |
> | 2                         | Joanna                   | Silver                  |
> | 2                         | Bob                      | Silver                  |
> | 22                        | Sharon                   | Taylor                  |
> +---------------------------+--------------------------+-------------------------+
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR
> 'Taylor';
> select * from customers_man;
> +---------------------------+--------------------------+-------------------------+
> | customers_man.customer_id | customers_man.first_name | customers_man.last_name |
> +---------------------------+--------------------------+-------------------------+
> | 22                        | Blake                    | Burr                    |
> | 22                        | Jake                     | Donnel                  |
> | 22                        | Trudy                    | Henderson               |
[jira] [Updated] (HIVE-27723) Prevent localizing the same original file more than once if symlinks are present
[ https://issues.apache.org/jira/browse/HIVE-27723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-27723:
--------------------------------
    Summary: Prevent localizing the same original file more than once if symlinks are present (was: Prevent localizing the same file more than once)

> Prevent localizing the same original file more than once if symlinks are
> present
> ------------------------------------------------------------------------
>
> Key: HIVE-27723
> URL: https://issues.apache.org/jira/browse/HIVE-27723
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
>
>
> We already calculate SHA hashes for the files to be localized. There is a
> chance that, in some setups, the hive-exec jar is symlinked, so it gets
> localized more than once.
> {code}
> [root@lbodor-hiveontez-4 ~]# sudo -u hive hdfs dfs -ls -R /tmp/hive/hive/_tez_session_dir
> drwx------ - hive supergroup 0 2023-09-20 12:13 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6
> drwx------ - hive supergroup 0 2023-09-20 12:19 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6/.tez
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources/hive-exec.jar
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1
> drwx------ - hive supergroup 0 2023-09-20 12:04 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1/.tez
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources/hive-exec.jar
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad
> drwx------ - hive supergroup 0 2023-09-20 13:13 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad/.tez
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources/hive-exec.jar
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57
> drwx------ - hive supergroup 0 2023-09-20 12:04 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57/.tez
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources/hive-exec.jar
> {code}
> With a huge number of sessions, we cannot afford the overhead of copying
> these files to HDFS and localizing them to all containers twice.
> The root cause can be solved by removing the symlinks to the same hive-exec
> jar. However, as we're already calculating SHA hashes for the files, it's
> easy to handle the duplicates in the localization codepath, which also takes
> care of any accidental duplications.
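The deduplication idea in this ticket, reusing the content hash so that a jar and a symlink to it count as one file, can be sketched as follows. This is a standalone Python illustration, not Hive's localization code: the helper names `sha256_of` and `dedupe_by_content`, the choice of SHA-256, and the file names are assumptions.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large jars need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe_by_content(paths):
    """Keep the first path seen for each distinct content hash."""
    seen = {}
    for path in paths:
        seen.setdefault(sha256_of(path), path)
    return list(seen.values())


# Demo: a jar plus a symlink to it have identical content, so one survives.
workdir = tempfile.mkdtemp()
jar = os.path.join(workdir, "hive-exec-1.0.jar")  # hypothetical file name
with open(jar, "wb") as f:
    f.write(b"fake jar bytes")
link = os.path.join(workdir, "hive-exec.jar")
try:
    os.symlink(jar, link)
except (OSError, NotImplementedError):
    shutil.copyfile(jar, link)  # fallback where symlinks are unavailable

to_localize = dedupe_by_content([jar, link])
```

Because the check is on content rather than on path, it also catches accidental copies that are not symlinks at all, which matches the ticket's point about handling any duplication in the localization codepath.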
[jira] [Commented] (HIVE-27754) Query Filter with OR condition updates every record in the table
[ https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770032#comment-17770032 ]

Krisztian Kasa commented on HIVE-27754:
---
A simple repro with a query:
{code}
create table t1 (a int);
insert into t1(a) values (1), (2), (NULL);
select * from t1 where 'anything';
{code}
returns
{code}
1
2
NULL
{code}
CBO is failing in this case. From hive.log
{code}
2023-09-28T05:14:55,578 ERROR [08def54d-804f-44fc-8452-c9873eb3a06e Listener at 0.0.0.0/36139] parse.CalcitePlanner: CBO failed, skipping CBO.
org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Filter expression with non-boolean return type.
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3216) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3202) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3399) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3410) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5084) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1649) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1593) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1345) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13023) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:467) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) ~[hive
[jira] [Created] (HIVE-27755) Quote identifiers in SQL emitted by SchemaTool for MySQL
Stamatis Zampetakis created HIVE-27755:
--
Summary: Quote identifiers in SQL emitted by SchemaTool for MySQL
Key: HIVE-27755
URL: https://issues.apache.org/jira/browse/HIVE-27755
Project: Hive
Issue Type: Improvement
Components: Standalone Metastore
Affects Versions: 4.0.0-beta-1
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis

Various SchemaTool options/tasks (e.g., "validate") generate and run SQL statements on the underlying database. Depending on the database, identifiers in the SQL statements may be quoted (see [https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/schematool/HiveSchemaHelper.java#L173]). Currently, all identifiers are quoted when the database is Postgres, and this ticket aims to do the same for MySQL/MariaDB.

The main motivation behind this change is to avoid surprises and query failures when/if the database turns the names of tables/columns that we use internally into reserved keywords. As a concrete example, the Percona fork of MySQL recently turned SEQUENCE_TABLE into a reserved keyword ([https://docs.percona.com/percona-server/8.0/flexibility/sequence_table.html]), which conflicts with our internal metastore table. The installation scripts do not fail, since SEQUENCE_TABLE is quoted there ([https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0-beta-2.mysql.sql#L447]), but the validation queries emitted by the SchemaTool ([https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/schematool/SchemaToolTaskValidate.java#L117]) will fail if we don't use quoted identifiers.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
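To illustrate the per-database quoting that HIVE-27755 describes, here is a minimal Python sketch. It is not SchemaTool's code (that lives in the Java class HiveSchemaHelper linked in the ticket); the names `quote_identifier`, `validation_query` and the `db_type` strings are hypothetical, and the query text is made up for illustration. It assumes only what the MySQL and Postgres dialects define: MySQL/MariaDB quote identifiers with backticks, Postgres with double quotes, and an embedded quote character is escaped by doubling it.

```python
# Hypothetical helper, not part of Hive: quote an identifier per database dialect.
QUOTE_CHARS = {
    "postgres": '"',  # Postgres: "identifier"
    "mysql": "`",     # MySQL/MariaDB: `identifier`
}


def quote_identifier(name: str, db_type: str) -> str:
    """Return name quoted for db_type, or unchanged if no quoting rule is known."""
    quote = QUOTE_CHARS.get(db_type)
    if quote is None:
        return name
    # Escape an embedded quote character by doubling it.
    return quote + name.replace(quote, quote * 2) + quote


def validation_query(db_type: str) -> str:
    """Build an illustrative query against the metastore table named in the ticket."""
    table = quote_identifier("SEQUENCE_TABLE", db_type)
    return f"SELECT COUNT(*) FROM {table}"


print(validation_query("mysql"))
print(validation_query("postgres"))
```

With quoting in place, the table name is treated as an identifier even on a server where SEQUENCE_TABLE has become a reserved keyword, which is the failure mode the ticket wants to avoid.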
[jira] [Assigned] (HIVE-27754) Query Filter with OR condition updates every record in the table
[ https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simhadri Govindappa reassigned HIVE-27754:
--
Assignee: Simhadri Govindappa

> Query Filter with OR condition updates every record in the table
>
> Key: HIVE-27754
> URL: https://issues.apache.org/jira/browse/HIVE-27754
> Project: Hive
> Issue Type: Bug
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
>
> {noformat}
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor';
> {noformat}
> After the above statement, all the records are updated. The condition {{'Taylor'}} is a constant string, and it will always evaluate to true because it's a non-empty string. So, effectively, the {{UPDATE}} statement is updating all rows in {{customers_man}}.
>
> Repro:
> {noformat}
> create table customers_man (customer_id bigint, first_name string) PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES ('transactional'='true');
> insert into customers_man values(1, "Joanna", "Pierce"), (1, "Sharon", "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", "Morrison"), (2, "Jake", "Donnel"), (3, "Blake", "Burr"), (3, "Trudy", "Johnson"), (3, "Trudy", "Henderson");
> select * from customers_man;
> +---------------------------+--------------------------+-------------------------+
> | customers_man.customer_id | customers_man.first_name | customers_man.last_name |
> +---------------------------+--------------------------+-------------------------+
> | 3                         | Blake                    | Burr                    |
> | 2                         | Jake                     | Donnel                  |
> | 3                         | Trudy                    | Henderson               |
> | 3                         | Trudy                    | Johnson                 |
> | 2                         | Susan                    | Morrison                |
> | 1                         | Joanna                   | Pierce                  |
> | 2                         | Joanna                   | Silver                  |
> | 2                         | Bob                      | Silver                  |
> | 1                         | Sharon                   | Taylor                  |
> +---------------------------+--------------------------+-------------------------+
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR last_name='Taylor';
> select * from customers_man;
> +---------------------------+--------------------------+-------------------------+
> | customers_man.customer_id | customers_man.first_name | customers_man.last_name |
> +---------------------------+--------------------------+-------------------------+
> | 3                         | Blake                    | Burr                    |
> | 2                         | Jake                     | Donnel                  |
> | 3                         | Trudy                    | Henderson               |
> | 3                         | Trudy                    | Johnson                 |
> | 2                         | Susan                    | Morrison                |
> | 22                        | Joanna                   | Pierce                  |
> | 2                         | Joanna                   | Silver                  |
> | 2                         | Bob                      | Silver                  |
> | 22                        | Sharon                   | Taylor                  |
> +---------------------------+--------------------------+-------------------------+
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor';
> select * from customers_man;
> +---------------------------+--------------------------+-------------------------+
> | customers_man.customer_id | customers_man.first_name | customers_man.last_name |
> +---------------------------+--------------------------+-------------------------+
> | 22                        | Blake                    | Burr                    |
> | 22                        | Jake                     | Donnel                  |
> | 22                        | Trudy                    | Henderson               |
> | 22                        | Trudy                    | Johnson                 |
> | 22                        | Susan                    | Morrison                |
> | 22                        | Joanna                   | Pierce
[jira] [Created] (HIVE-27754) Query Filter with OR condition updates every record in the table
Simhadri Govindappa created HIVE-27754:
--
Summary: Query Filter with OR condition updates every record in the table
Key: HIVE-27754
URL: https://issues.apache.org/jira/browse/HIVE-27754
Project: Hive
Issue Type: Bug
Reporter: Simhadri Govindappa

{noformat}
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor';
{noformat}
After the above statement, all the records are updated. The condition {{'Taylor'}} is a constant string, and it will always evaluate to true because it's a non-empty string. So, effectively, the {{UPDATE}} statement is updating all rows in {{customers_man}}.

Repro:
{noformat}
create table customers_man (customer_id bigint, first_name string) PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES ('transactional'='true');
insert into customers_man values(1, "Joanna", "Pierce"), (1, "Sharon", "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", "Morrison"), (2, "Jake", "Donnel"), (3, "Blake", "Burr"), (3, "Trudy", "Johnson"), (3, "Trudy", "Henderson");
select * from customers_man;
+---------------------------+--------------------------+-------------------------+
| customers_man.customer_id | customers_man.first_name | customers_man.last_name |
+---------------------------+--------------------------+-------------------------+
| 3                         | Blake                    | Burr                    |
| 2                         | Jake                     | Donnel                  |
| 3                         | Trudy                    | Henderson               |
| 3                         | Trudy                    | Johnson                 |
| 2                         | Susan                    | Morrison                |
| 1                         | Joanna                   | Pierce                  |
| 2                         | Joanna                   | Silver                  |
| 2                         | Bob                      | Silver                  |
| 1                         | Sharon                   | Taylor                  |
+---------------------------+--------------------------+-------------------------+
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR last_name='Taylor';
select * from customers_man;
+---------------------------+--------------------------+-------------------------+
| customers_man.customer_id | customers_man.first_name | customers_man.last_name |
+---------------------------+--------------------------+-------------------------+
| 3                         | Blake                    | Burr                    |
| 2                         | Jake                     | Donnel                  |
| 3                         | Trudy                    | Henderson               |
| 3                         | Trudy                    | Johnson                 |
| 2                         | Susan                    | Morrison                |
| 22                        | Joanna                   | Pierce                  |
| 2                         | Joanna                   | Silver                  |
| 2                         | Bob                      | Silver                  |
| 22                        | Sharon                   | Taylor                  |
+---------------------------+--------------------------+-------------------------+
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor';
select * from customers_man;
+---------------------------+--------------------------+-------------------------+
| customers_man.customer_id | customers_man.first_name | customers_man.last_name |
+---------------------------+--------------------------+-------------------------+
| 22                        | Blake                    | Burr                    |
| 22                        | Jake                     | Donnel                  |
| 22                        | Trudy                    | Henderson               |
| 22                        | Trudy                    | Johnson                 |
| 22                        | Susan                    | Morrison                |
| 22                        | Joanna                   | Pierce                  |
| 22                        | Joanna                   | Silver                  |
| 22                        | Bob                      | Silver                  |
| 22                        | Sharon                   | Taylor                  |
+---------------------------+--------------------------+-------------------------+
--- simpler repro
UPDATE customers_man SET customer_id=23 WHERE true;
select * from customers_man;
+
[jira] [Commented] (HIVE-27573) Backport of HIVE-21799: NullPointerException in DynamicPartitionPruningOptimization, when join key is on aggregation column
[ https://issues.apache.org/jira/browse/HIVE-27573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770015#comment-17770015 ] Sankar Hariappan commented on HIVE-27573: - Thanks [~shefali636] for the contribution! > Backport of HIVE-21799: NullPointerException in > DynamicPartitionPruningOptimization, when join key is on aggregation column > --- > > Key: HIVE-27573 > URL: https://issues.apache.org/jira/browse/HIVE-27573 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.1.3 >Reporter: Shefali Singh >Assignee: Shefali Singh >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27573) Backport of HIVE-21799: NullPointerException in DynamicPartitionPruningOptimization, when join key is on aggregation column
[ https://issues.apache.org/jira/browse/HIVE-27573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-27573. - Fix Version/s: 3.2.0 Resolution: Fixed > Backport of HIVE-21799: NullPointerException in > DynamicPartitionPruningOptimization, when join key is on aggregation column > --- > > Key: HIVE-27573 > URL: https://issues.apache.org/jira/browse/HIVE-27573 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Shefali Singh >Assignee: Shefali Singh >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27573) Backport of HIVE-21799: NullPointerException in DynamicPartitionPruningOptimization, when join key is on aggregation column
[ https://issues.apache.org/jira/browse/HIVE-27573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-27573: Affects Version/s: 3.1.3 (was: 3.2.0) > Backport of HIVE-21799: NullPointerException in > DynamicPartitionPruningOptimization, when join key is on aggregation column > --- > > Key: HIVE-27573 > URL: https://issues.apache.org/jira/browse/HIVE-27573 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.1.3 >Reporter: Shefali Singh >Assignee: Shefali Singh >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27753) Mask Q file output to avoid flakyness
[ https://issues.apache.org/jira/browse/HIVE-27753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KIRTI RUGE updated HIVE-27753: -- Description: Mask below pattern in q output files to avoid flakyness of tests drwxr-xr-x - ### USER ### ### GROUP ### 0 ### HDFS DATE ### hdfs://### HDFS PATH ### was:Mask below pattern in q output files to avoid flakyness of tests > Mask Q file output to avoid flakyness > -- > > Key: HIVE-27753 > URL: https://issues.apache.org/jira/browse/HIVE-27753 > Project: Hive > Issue Type: Improvement >Reporter: KIRTI RUGE >Assignee: KIRTI RUGE >Priority: Major > > Mask below pattern in q output files to avoid flakyness of tests > > drwxr-xr-x - ### USER ### ### GROUP ### 0 ### HDFS DATE ### hdfs://### HDFS > PATH ### -- This message was sent by Atlassian Jira (v8.20.10#820010)
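One way to read the HIVE-27753 request is that any q output line matching the directory-listing pattern above should be replaced by a fixed placeholder, so that it can no longer differ between runs. The Python sketch below illustrates that reading only; it is not Hive's q-test masking code. The function name `mask_listing_lines`, the regular expression, and the placeholder text are assumptions made for this example.

```python
import re

# Assumed placeholder text for a masked line (an assumption, not taken from the ticket).
MASK = "#### A masked pattern was here ####"

# Matches a listing line whose user, group, date and path are already masked,
# e.g. "drwxr-xr-x - ### USER ### ### GROUP ### 0 ### HDFS DATE ### hdfs://### HDFS PATH ###".
LISTING_LINE = re.compile(
    r"^[d-][rwx-]{9}\s+-\s+### USER ###\s+### GROUP ###\s+\d+\s+"
    r"### HDFS DATE ###\s+hdfs://### HDFS PATH ###\s*$"
)


def mask_listing_lines(text: str) -> str:
    """Replace every line that matches LISTING_LINE with MASK; keep other lines."""
    return "\n".join(
        MASK if LISTING_LINE.match(line) else line
        for line in text.split("\n")
    )


sample = (
    "drwxr-xr-x - ### USER ### ### GROUP ### 0 "
    "### HDFS DATE ### hdfs://### HDFS PATH ###"
)
print(mask_listing_lines("select 1\n" + sample + "\nOK"))
```

Replacing the whole line, rather than individual fields, means remaining variation in permissions or size cannot make the expected output differ between environments.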
[jira] [Assigned] (HIVE-27753) Mask Q file output to avoid flakyness
[ https://issues.apache.org/jira/browse/HIVE-27753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KIRTI RUGE reassigned HIVE-27753: - Assignee: KIRTI RUGE > Mask Q file output to avoid flakyness > -- > > Key: HIVE-27753 > URL: https://issues.apache.org/jira/browse/HIVE-27753 > Project: Hive > Issue Type: Improvement >Reporter: KIRTI RUGE >Assignee: KIRTI RUGE >Priority: Major > > Mask below pattern in q output files to avoid flakyness of tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27753) Mask Q file output to avoid flakyness
KIRTI RUGE created HIVE-27753: - Summary: Mask Q file output to avoid flakyness Key: HIVE-27753 URL: https://issues.apache.org/jira/browse/HIVE-27753 Project: Hive Issue Type: Improvement Reporter: KIRTI RUGE Mask below pattern in q output files to avoid flakyness of tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class
[ https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshat Mathur updated HIVE-27752: - Target Version/s: 4.0.0 Status: Patch Available (was: In Progress) > Remove DagUtils duplicate class > --- > > Key: HIVE-27752 > URL: https://issues.apache.org/jira/browse/HIVE-27752 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: Akshat Mathur >Priority: Minor > Labels: newbie, pull-request-available > > remove this small orphaned stuff: > https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java > and place method to > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HIVE-27752) Remove DagUtils duplicate class
[ https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-27752 started by Akshat Mathur. > Remove DagUtils duplicate class > --- > > Key: HIVE-27752 > URL: https://issues.apache.org/jira/browse/HIVE-27752 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: Akshat Mathur >Priority: Minor > Labels: newbie, pull-request-available > > remove this small orphaned stuff: > https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java > and place method to > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class
[ https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27752: -- Labels: newbie pull-request-available (was: newbie) > Remove DagUtils duplicate class > --- > > Key: HIVE-27752 > URL: https://issues.apache.org/jira/browse/HIVE-27752 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: Akshat Mathur >Priority: Minor > Labels: newbie, pull-request-available > > remove this small orphaned stuff: > https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java > and place method to > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27695) Intermittent OOM when running TestMiniTezCliDriver
[ https://issues.apache.org/jira/browse/HIVE-27695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769958#comment-17769958 ] Stamatis Zampetakis commented on HIVE-27695: More failed runs due to OOM in TezCliDriver: * http://ci.hive.apache.org/job/hive-precommit/job/PR-4750/3/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/Testing___split_19___PostProcess___testCliDriver_flatten_union_subdir_/ * http://ci.hive.apache.org/job/hive-precommit/job/PR-4754/1/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/Testing___split_19___PostProcess___testCliDriver_tez_union_with_udf_/ > Intermittent OOM when running TestMiniTezCliDriver > -- > > Key: HIVE-27695 > URL: https://issues.apache.org/jira/browse/HIVE-27695 > Project: Hive > Issue Type: Bug > Components: Test >Affects Versions: 4.0.0-beta-1 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Attachments: am_heap_dumps.tar.xz, leak_suspect_1.png > > > Running all the tests under TestMiniTezCliDriver very frequently (but still > intermittently) leads to OutOfMemory errors. > {noformat} > cd itests/qtest && mvn test -Dtest=TestMiniTezCliDriver > {noformat} > I set {{-XX:+HeapDumpOnOutOfMemoryError}} and the respective heapdumps are > attached to this ticket. > The OOM is thrown from the application master and a quick inspection of the > dumps shows that it comes mainly from the accumulation of Configuration > objects (~1MB each) by various classes. > The max heap size for application master is pretty low (~100MB) so it is > quite easy to reach. The heap size is explicitly very low for testing > purposes but maybe we should re-evaluate the current configurations for the > tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27751) Log Query Compilation summary in an accumulated way
[ https://issues.apache.org/jira/browse/HIVE-27751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27751: -- Labels: pull-request-available (was: ) > Log Query Compilation summary in an accumulated way > --- > > Key: HIVE-27751 > URL: https://issues.apache.org/jira/browse/HIVE-27751 > Project: Hive > Issue Type: Task > Components: Hive >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > > Query Compilation summary is very useful for reading and collecting all the > measures of compile time in a single place. It is also useful in debugging a > performance issue in the query compilation phase and also to report and > compare with various runs -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27752) Remove DagUtils duplicate class
[ https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshat Mathur reassigned HIVE-27752: Assignee: Akshat Mathur > Remove DagUtils duplicate class > --- > > Key: HIVE-27752 > URL: https://issues.apache.org/jira/browse/HIVE-27752 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: Akshat Mathur >Priority: Minor > Labels: newbie > > remove this small orphaned stuff: > https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java > and place method to > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class
[ https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27752: Labels: newbie (was: ) > Remove DagUtils duplicate class > --- > > Key: HIVE-27752 > URL: https://issues.apache.org/jira/browse/HIVE-27752 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Priority: Major > Labels: newbie > > remove this small orphaned stuff: > https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java > and place method to > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class
[ https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-27752: Priority: Minor (was: Major) > Remove DagUtils duplicate class > --- > > Key: HIVE-27752 > URL: https://issues.apache.org/jira/browse/HIVE-27752 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Priority: Minor > Labels: newbie > > remove this small orphaned stuff: > https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java > and place method to > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27752) Remove DagUtils duplicate
László Bodor created HIVE-27752: --- Summary: Remove DagUtils duplicate Key: HIVE-27752 URL: https://issues.apache.org/jira/browse/HIVE-27752 Project: Hive Issue Type: Improvement Reporter: László Bodor -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class
[ https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27752: Summary: Remove DagUtils duplicate class (was: Remove DagUtils duplicate) > Remove DagUtils duplicate class > --- > > Key: HIVE-27752 > URL: https://issues.apache.org/jira/browse/HIVE-27752 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27752) Remove DagUtils duplicate class
[ https://issues.apache.org/jira/browse/HIVE-27752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27752: Description: remove this small orphaned stuff: https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java and place method to https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java > Remove DagUtils duplicate class > --- > > Key: HIVE-27752 > URL: https://issues.apache.org/jira/browse/HIVE-27752 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Priority: Major > > remove this small orphaned stuff: > https://github.com/apache/hive/blob/57c15936d7a69e215c986d62aa959e70cb352da4/ql/src/java/org/apache/hadoop/hive/ql/exec/DagUtils.java > and place method to > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27751) Log Query Compilation summary in an accumulated way
Ramesh Kumar Thangarajan created HIVE-27751:
---
Summary: Log Query Compilation summary in an accumulated way
Key: HIVE-27751
URL: https://issues.apache.org/jira/browse/HIVE-27751
Project: Hive
Issue Type: Task
Components: Hive
Reporter: Ramesh Kumar Thangarajan
Assignee: Ramesh Kumar Thangarajan

A query compilation summary collects all compile-time measurements in a single place, which makes them easy to read. It also helps when debugging a performance issue in the query compilation phase, and when reporting and comparing results across runs.

-- This message was sent by Atlassian Jira (v8.20.10#820010)