[jira] [Work logged] (HIVE-25404) Inserts inside merge statements are rewritten incorrectly for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-25404?focusedWorklogId=641526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641526 ] ASF GitHub Bot logged work on HIVE-25404: - Author: ASF GitHub Bot Created on: 25/Aug/21 06:46 Start Date: 25/Aug/21 06:46 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2568: URL: https://github.com/apache/hive/pull/2568#discussion_r695446760 ## File path: ql/src/test/queries/clientpositive/merge_partitioned_insert.q ## @@ -0,0 +1,19 @@ +--! qt:transactional + +drop table u; +drop table t; + +create table u(id integer); +insert into u values(3); + +create table t1(id integer, value string default 'def'); +insert into t1 values(1,'xx'); +insert into t1 (id) values(2); + +merge into t1 t using u on t.id=u.id when not matched then insert (id) values (u.id); + Review comment: makes sense... the original test was only checking for the compilation error. I don't think we have this covered that well, so I've added the `select` statements. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641526) Time Spent: 50m (was: 40m) > Inserts inside merge statements are rewritten incorrectly for partitioned > tables > > > Key: HIVE-25404 > URL: https://issues.apache.org/jira/browse/HIVE-25404 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > {code} > drop table u;drop table t; > create table t(value string default 'def') partitioned by (id integer); > create table u(id integer); > {code} > #1 id&value specified > rewritten > {code} > FROM > `default`.`t` > RIGHT OUTER JOIN > `default`.`u` > ON `t`.`id`=`u`.`id` > INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause > SELECT `u`.`id`,'x' >WHERE `t`.`id` IS NULL > {code} > #2 when values is not specified > {code} > merge into t using u on t.id=u.id when not matched then insert (id) values > (u.id); > {code} > rewritten query: > {code} > FROM > `default`.`t` > RIGHT OUTER JOIN > `default`.`u` > ON `t`.`id`=`u`.`id` > INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause > SELECT `u`.`id` >WHERE `t`.`id` IS NULL > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25404) Inserts inside merge statements are rewritten incorrectly for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-25404?focusedWorklogId=641524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641524 ] ASF GitHub Bot logged work on HIVE-25404: - Author: ASF GitHub Bot Created on: 25/Aug/21 06:28 Start Date: 25/Aug/21 06:28 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2568: URL: https://github.com/apache/hive/pull/2568#discussion_r695436813 ## File path: ql/src/test/queries/clientpositive/merge_partitioned_insert.q ## @@ -0,0 +1,19 @@ +--! qt:transactional + +drop table u; +drop table t; + +create table u(id integer); +insert into u values(3); + +create table t1(id integer, value string default 'def'); +insert into t1 values(1,'xx'); +insert into t1 (id) values(2); + +merge into t1 t using u on t.id=u.id when not matched then insert (id) values (u.id); Review comment: no; the issue was a compilation error - I didn't want to add a big explain which doesn't catch any part of the issue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 641524) Time Spent: 40m (was: 0.5h) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25404) Inserts inside merge statements are rewritten incorrectly for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-25404?focusedWorklogId=641523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641523 ] ASF GitHub Bot logged work on HIVE-25404: - Author: ASF GitHub Bot Created on: 25/Aug/21 06:27 Start Date: 25/Aug/21 06:27 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2568: URL: https://github.com/apache/hive/pull/2568#discussion_r695436190 ## File path: ql/src/test/queries/clientpositive/merge_partitioned_insert.q ## @@ -0,0 +1,19 @@ +--! qt:transactional + +drop table u; +drop table t; Review comment: my original test case only had one case; later I added a second. Removed these `drop` statements, as they are only needed from beeline. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. Issue Time Tracking --- Worklog Id: (was: 641523) Time Spent: 0.5h (was: 20m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25317) Relocate dependencies in shaded hive-exec module
[ https://issues.apache.org/jira/browse/HIVE-25317?focusedWorklogId=641468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641468 ] ASF GitHub Bot logged work on HIVE-25317: - Author: ASF GitHub Bot Created on: 25/Aug/21 02:53 Start Date: 25/Aug/21 02:53 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #2459: URL: https://github.com/apache/hive/pull/2459#discussion_r695352477 ## File path: llap-server/pom.xml ## @@ -38,6 +38,7 @@ org.apache.hive hive-exec ${project.version} + core Review comment: @sunchao do we need to have similar change on master first? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641468) Time Spent: 2.5h (was: 2h 20m) > Relocate dependencies in shaded hive-exec module > > > Key: HIVE-25317 > URL: https://issues.apache.org/jira/browse/HIVE-25317 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.3.8 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > When we want to use shaded version of hive-exec (i.e., w/o classifier), more > dependencies conflict with Spark. We need to relocate these dependencies too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
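The relocation discussed above is typically configured through the maven-shade-plugin. A hedged sketch of what one relocation entry looks like; the package shown (Guava) is illustrative only, not the actual list relocated by PR #2459:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Illustrative: move Guava under a Hive-private package so the
               shaded hive-exec cannot clash with Spark's own Guava. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```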
[jira] [Updated] (HIVE-25474) Concurrency add jars cause hiveserver2 sys cpu to high
[ https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guangbao zhao updated HIVE-25474: - Description: In the Linux environment, adding multiple jars concurrently through HiveCli or JDBC will increase the system cpu and even affect the service. Finally, we found that when the add jar is executed, the FileUtil chmod method is used to grant permissions to the downloaded jar file. The performance of this method is very low. So we use the setPosixFilePermissions method of the Files class to test. The performance is seventy to eighty times that of FileUtil (the same file is given permissions in multiple cycles, when it is cycled 1000 times), and as the number of cycles increases, the gap becomes larger and larger. But the file requires jdk7+, which is not friendly to windows. Therefore, if you use the setPosixFilePermissions method of the Files class to grant permissions to files in an operating system that conforms to the posix specification(tested on Mac and Linux), the performance will be improved. (was: In the Linux environment, adding multiple jars concurrently through HiveCli or JDBC will increase the system cpu and even affect the service. Finally, we found that when the add jar is executed, the FileUtil chmod method is used to grant permissions to the downloaded jar file. The performance of this method is very low. So we use the setPosixFilePermissions method of the Files class to test. The performance is seventy to eighty times that of FileUtil (the same file is given permissions in multiple cycles, when it is cycled 1000 times). But the file requires jdk7+, which is not friendly to windows. Therefore, if you use the setPosixFilePermissions method of the Files class to grant permissions to files in an operating system that conforms to the posix specification(tested on Mac and Linux), the performance will be improved.) 
> Concurrency add jars cause hiveserver2 sys cpu to high > -- > > Key: HIVE-25474 > URL: https://issues.apache.org/jira/browse/HIVE-25474 > Project: Hive > Issue Type: Improvement > Components: Hive, HiveServer2 >Affects Versions: 3.1.2 >Reporter: guangbao zhao >Assignee: guangbao zhao >Priority: Major > Attachments: HIVE-25474.jpg, HIVE-25474.patch, PermissionTest.java > > > On Linux, adding multiple jars concurrently through HiveCli or JDBC drives up > system CPU and can even affect the service. We found that when add jar executes, > the FileUtil chmod method is used to grant permissions to the downloaded jar > file, and this method performs very poorly. Testing the setPosixFilePermissions > method of the Files class instead showed it to be seventy to eighty times faster > than FileUtil (granting permissions to the same file in a loop of 1000 > iterations), and the gap widens as the iteration count grows. However, Files > requires JDK 7+ and is not Windows-friendly. Therefore, using > Files.setPosixFilePermissions to grant file permissions on POSIX-compliant > operating systems (tested on Mac and Linux) improves performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
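The approach described in the issue can be sketched in Java. This is a minimal illustration of java.nio's Files.setPosixFilePermissions; the class and method names here are hypothetical, not taken from the attached PermissionTest.java:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class PermissionSketch {

    // Grant rwxr-xr-x via java.nio instead of Hadoop's FileUtil.chmod.
    // POSIX-only: on Windows this throws UnsupportedOperationException.
    static void makeExecutable(Path file) throws IOException {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwxr-xr-x");
        Files.setPosixFilePermissions(file, perms);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("perm-sketch", ".jar");
        makeExecutable(tmp);
        // setPosixFilePermissions sets the permission bits exactly (the
        // process umask does not apply), so on Linux/macOS this prints
        // rwxr-xr-x.
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(tmp)));
        Files.delete(tmp);
    }
}
```

Unlike FileUtil.chmod, this stays inside the JVM (no shell fork and no per-call path parsing), which is consistent with the speedup the issue reports under concurrency.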
[jira] [Updated] (HIVE-25474) Concurrency add jars cause hiveserver2 sys cpu to high
[ https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guangbao zhao updated HIVE-25474: - Summary: Concurrency add jars cause hiveserver2 sys cpu to high (was: concurrency add jars cause hiveserver2 sys cpu to high) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high
[ https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guangbao zhao updated HIVE-25474: - Attachment: PermissionTest.java -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high
[ https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guangbao zhao updated HIVE-25474: - Attachment: HIVE-25474.jpg -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high
[ https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guangbao zhao updated HIVE-25474: - Attachment: HIVE-25474.patch Status: Patch Available (was: In Progress) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high
[ https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guangbao zhao updated HIVE-25474: - Attachment: (was: HIVE-25474.patch) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25476) Remove Unused Dependencies for JDBC Driver
[ https://issues.apache.org/jira/browse/HIVE-25476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25476: -- Labels: pull-request-available (was: ) > Remove Unused Dependencies for JDBC Driver > -- > > Key: HIVE-25476 > URL: https://issues.apache.org/jira/browse/HIVE-25476 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I am using JDBC driver in a project and was very surprised by the number of > dependencies it has. Remove some unnecessary dependencies to make it a > little easier to work with. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25476) Remove Unused Dependencies for JDBC Driver
[ https://issues.apache.org/jira/browse/HIVE-25476?focusedWorklogId=641437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641437 ] ASF GitHub Bot logged work on HIVE-25476: - Author: ASF GitHub Bot Created on: 25/Aug/21 01:22 Start Date: 25/Aug/21 01:22 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #2599: URL: https://github.com/apache/hive/pull/2599 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641437) Remaining Estimate: 0h Time Spent: 10m > Remove Unused Dependencies for JDBC Driver > -- > > Key: HIVE-25476 > URL: https://issues.apache.org/jira/browse/HIVE-25476 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > I am using JDBC driver in a project and was very surprised by the number of > dependencies it has. Remove some unnecessary dependencies to make it a > little easier to work with. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25477) Clean Up JDBC Code
[ https://issues.apache.org/jira/browse/HIVE-25477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25477: -- Labels: pull-request-available (was: ) > Clean Up JDBC Code > -- > > Key: HIVE-25477 > URL: https://issues.apache.org/jira/browse/HIVE-25477 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > * Remove unused imports > * Remove unused code -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25477) Clean Up JDBC Code
[ https://issues.apache.org/jira/browse/HIVE-25477?focusedWorklogId=641436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641436 ] ASF GitHub Bot logged work on HIVE-25477: - Author: ASF GitHub Bot Created on: 25/Aug/21 01:21 Start Date: 25/Aug/21 01:21 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #2600: URL: https://github.com/apache/hive/pull/2600#issuecomment-905094983 @nrg4878 Review please? :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641436) Remaining Estimate: 0h Time Spent: 10m > Clean Up JDBC Code > -- > > Key: HIVE-25477 > URL: https://issues.apache.org/jira/browse/HIVE-25477 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > * Remove unused imports > * Remove unused code -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23571) [CachedStore] Add ValidWriteIdList to SharedCache.TableWrapper
[ https://issues.apache.org/jira/browse/HIVE-23571?focusedWorklogId=641415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641415 ] ASF GitHub Bot logged work on HIVE-23571: - Author: ASF GitHub Bot Created on: 25/Aug/21 00:12 Start Date: 25/Aug/21 00:12 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #2128: URL: https://github.com/apache/hive/pull/2128 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641415) Time Spent: 4.5h (was: 4h 20m) > [CachedStore] Add ValidWriteIdList to SharedCache.TableWrapper > -- > > Key: HIVE-23571 > URL: https://issues.apache.org/jira/browse/HIVE-23571 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Add ValidWriteIdList to SharedCache.TableWrapper. This would be used in > deciding whether a given read request can be served from the cache or we have > to reload it from the backing database. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=641392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641392 ] ASF GitHub Bot logged work on HIVE-23688: - Author: ASF GitHub Bot Created on: 24/Aug/21 23:45 Start Date: 24/Aug/21 23:45 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #2479: URL: https://github.com/apache/hive/pull/2479#discussion_r695289418 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException { lcv.offsets[index] = elements.size(); -// Return directly if last value is null -if (definitionLevel < maxDefLevel) { - lcv.isNull[index] = true; - lcv.lengths[index] = 0; - // fetch the data from parquet data page for next call - fetchNextValue(category); - return; -} - do { // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element + if (definitionLevel < maxDefLevel) { +lcv.lengths[index] = 0; +lcv.isNull[index] = true; +lcv.noNulls = false; + } elements.add(lastValue); } while (fetchNextValue(category) && (repetitionLevel != 0)); -lcv.isNull[index] = false; lcv.lengths[index] = elements.size() - lcv.offsets[index]; Review comment: good catch, I'm removing the assignment in the loop, because this outer assignment is valid under all circumstances -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641392) Time Spent: 3h 10m (was: 3h) > Vectorization: IndexArrayOutOfBoundsException For map type column which > includes null value > --- > > Key: HIVE-23688 > URL: https://issues.apache.org/jira/browse/HIVE-23688 > Project: Hive > Issue Type: Bug > Components: Parquet, storage-api, Vectorization >Affects Versions: All Versions >Reporter: 范宜臻 >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Fix For: 3.0.0, 4.0.0 > > Attachments: HIVE-23688.patch > > Time Spent: 3h 10m > Remaining Estimate: 0h > > {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays > in MapColumnVector.values(BytesColumnVector) when values in map contain > {color:#de350b}null{color} > reproduce in master branch: > {code:java} > set hive.vectorized.execution.enabled=true; > CREATE TABLE parquet_map_type (id int,stringMap map) > stored as parquet; > insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', > 'bar'); > select id, stringMap['k1'] from parquet_map_type group by 1,2; > {code} > query explain: > {code:java} > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 vectorized > File Output Operator [FS_12] > Group By Operator [GBY_11] (rows=5 width=2) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] vectorized > SHUFFLE [RS_10] > PartitionCols:_col0, _col1 > Group By Operator [GBY_9] (rows=10 width=2) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_8] (rows=10 width=2) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=10 width=2) > > temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"] > {code} > runtime error: > {code:java} > Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, > diagnostics=[Task failed, 
taskId=task_1592040015150_0001_3_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:25
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=641387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641387 ] ASF GitHub Bot logged work on HIVE-23688: - Author: ASF GitHub Bot Created on: 24/Aug/21 23:40 Start Date: 24/Aug/21 23:40 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #2479: URL: https://github.com/apache/hive/pull/2479#discussion_r695287365 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException { lcv.offsets[index] = elements.size(); -// Return directly if last value is null -if (definitionLevel < maxDefLevel) { - lcv.isNull[index] = true; - lcv.lengths[index] = 0; - // fetch the data from parquet data page for next call - fetchNextValue(category); - return; -} - do { // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element + if (definitionLevel < maxDefLevel) { Review comment: the original intention here was to signal that there is a NULL value instead of a list, which happens when definitionLevel == 0; I'll change this part and add some comments -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641387) Time Spent: 3h (was: 2h 50m) > Vectorization: IndexArrayOutOfBoundsException For map type column which > includes null value > --- > > Key: HIVE-23688 > URL: https://issues.apache.org/jira/browse/HIVE-23688 > Project: Hive > Issue Type: Bug > Components: Parquet, storage-api, Vectorization >Affects Versions: All Versions >Reporter: 范宜臻 >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Fix For: 3.0.0, 4.0.0 > > Attachments: HIVE-23688.patch > > Time Spent: 3h > Remaining Estimate: 0h > > {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays > in MapColumnVector.values(BytesColumnVector) when values in map contain > {color:#de350b}null{color} > reproduce in master branch: > {code:java} > set hive.vectorized.execution.enabled=true; > CREATE TABLE parquet_map_type (id int,stringMap map) > stored as parquet; > insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', > 'bar'); > select id, stringMap['k1'] from parquet_map_type group by 1,2; > {code} > query explain: > {code:java} > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 vectorized > File Output Operator [FS_12] > Group By Operator [GBY_11] (rows=5 width=2) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] vectorized > SHUFFLE [RS_10] > PartitionCols:_col0, _col1 > Group By Operator [GBY_9] (rows=10 width=2) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_8] (rows=10 width=2) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=10 width=2) > > temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"] > {code} > runtime error: > {code:java} > Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, > diagnostics=[Task failed, 
taskId=task_1592040015150_0001_3_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.
[jira] [Updated] (HIVE-25479) Browser SSO auth may fail intermittently on chrome browser in virtual environments
[ https://issues.apache.org/jira/browse/HIVE-25479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25479: -- Labels: pull-request-available (was: ) > Browser SSO auth may fail intermittently on chrome browser in virtual > environments > -- > > Key: HIVE-25479 > URL: https://issues.apache.org/jira/browse/HIVE-25479 > Project: Hive > Issue Type: Bug > Components: JDBC >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When browser based SSO is enabled the Hive JDBC driver might miss the POST > requests coming from the browser which provide the one-time token issued by > HS2 after the SAML flow completes. The issue was observed mostly in virtual > environments on Windows. > The issue seems to be that when the driver binds to a port even though the > port is in LISTEN state, if the browser issues POST requests on the port > before it goes into ACCEPT state the result is non-deterministic. On native > OSes we observed that the connection is buffered and is received by the > driver when it begins accepting the connections. In case of VMs it is > observed that even though the connection is buffered and presented when the > port goes into ACCEPT mode, the payload of the request or the connection > itself is lost. This race condition causes the driver to wait for the browser > until it times out and the browser keeps waiting for a response from the > driver. -- This message was sent by Atlassian Jira (v8.3.4#803005)
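The race described above can be illustrated with a minimal, self-contained sketch using plain `java.net` (this is not the actual Hive JDBC driver code; the class name and token value are illustrative). On native OSes the listening socket's backlog buffers a connection that arrives between `bind()` and `accept()`, which is why the flow normally works; the report is that in VMs the buffered connection or its payload can be lost.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

public class SsoCallbackSketch {
    public static void main(String[] args) throws Exception {
        // Bind with an explicit backlog; the OS queues connections that
        // arrive before the server calls accept().
        ServerSocket server = new ServerSocket(0, 50);
        int port = server.getLocalPort();

        // Simulate the browser POSTing the one-time token *before* the
        // driver accepts. On a native OS the connection is buffered in
        // the backlog and survives until accept().
        Thread browser = new Thread(() -> {
            try (Socket s = new Socket("127.0.0.1", port)) {
                s.getOutputStream().write("token=abc123".getBytes());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        browser.start();
        browser.join();

        // Only now does the "driver" accept; the queued payload is read here.
        try (Socket conn = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(conn.getInputStream()))) {
            System.out.println(in.readLine());
        }
        server.close();
    }
}
```

Per the JIRA, on the affected VMs the `accept()` at the end can return a connection whose payload is gone; the fix presumably reorders things so the driver is actively accepting before the browser is launched.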
[jira] [Work logged] (HIVE-25479) Browser SSO auth may fail intermittently on chrome browser in virtual environments
[ https://issues.apache.org/jira/browse/HIVE-25479?focusedWorklogId=641335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641335 ] ASF GitHub Bot logged work on HIVE-25479: - Author: ASF GitHub Bot Created on: 24/Aug/21 21:07 Start Date: 24/Aug/21 21:07 Worklog Time Spent: 10m Work Description: vihangk1 opened a new pull request #2601: URL: https://github.com/apache/hive/pull/2601 ### What changes were proposed in this pull request? This patch fixes a race condition on the JDBC driver side when it brings up a browser to do a SAML based SSO authentication. The race condition occurs sometimes in virtual environment and has to do with the port state when the browser sends a POST request to the driver. More details are available in JIRA. ### Why are the changes needed? To fix the bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? The issue is not reproducible on my dev machine. However, I added some new tests to cover the new changes and additionally the patch was manually verified on Windows 10 VMs where the issue was reproducible. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641335) Remaining Estimate: 0h Time Spent: 10m > Browser SSO auth may fail intermittently on chrome browser in virtual > environments > -- > > Key: HIVE-25479 > URL: https://issues.apache.org/jira/browse/HIVE-25479 > Project: Hive > Issue Type: Bug > Components: JDBC >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When browser based SSO is enabled the Hive JDBC driver might miss the POST > requests coming from the browser which provide the one-time token issued by > HS2 after the SAML flow completes. The issue was observed mostly in virtual > environments on Windows. > The issue seems to be that when the driver binds to a port even though the > port is in LISTEN state, if the browser issues POST requests on the port > before it goes into ACCEPT state the result is non-deterministic. On native > OSes we observed that the connection is buffered and is received by the > driver when it begins accepting the connections. In case of VMs it is > observed that even though the connection is buffered and presented when the > port goes into ACCEPT mode, the payload of the request or the connection > itself is lost. This race condition causes the driver to wait for the browser > until it times out and the browser keeps waiting for a response from the > driver. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25317) Relocate dependencies in shaded hive-exec module
[ https://issues.apache.org/jira/browse/HIVE-25317?focusedWorklogId=641313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641313 ] ASF GitHub Bot logged work on HIVE-25317: - Author: ASF GitHub Bot Created on: 24/Aug/21 20:38 Start Date: 24/Aug/21 20:38 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #2459: URL: https://github.com/apache/hive/pull/2459#discussion_r695185072 ## File path: llap-server/pom.xml ## @@ -38,6 +38,7 @@ org.apache.hive hive-exec ${project.version} + core Review comment: As more dependencies are relocated here, some modules if they depends on non-core artifact, will cause class not found error... The motivation is because we want to use shaded version of hive-exec (i.e., w/o classifier) in Spark to make sure it doesn't conflict guava version there. But there are more dependencies conflict with Spark. We need to relocate these dependencies too.. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641313) Time Spent: 2h 20m (was: 2h 10m) > Relocate dependencies in shaded hive-exec module > > > Key: HIVE-25317 > URL: https://issues.apache.org/jira/browse/HIVE-25317 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.3.8 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > When we want to use shaded version of hive-exec (i.e., w/o classifier), more > dependencies conflict with Spark. We need to relocate these dependencies too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
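For readers unfamiliar with relocation, the shading discussed in this review follows the standard maven-shade-plugin pattern: conflicting packages are rewritten to a private prefix inside the shaded jar. The fragment below is an illustrative sketch only; the `shadedPattern` prefix is an assumption, not copied from hive-exec's actual pom.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Rewrite Guava's packages so the shaded hive-exec jar cannot
           clash with the Guava version already on Spark's classpath. -->
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

This is also why modules that depend on the non-core (shaded) artifact can hit `ClassNotFoundException` as more packages get relocated: class references compiled against the original package names no longer resolve inside the shaded jar.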
[jira] [Work logged] (HIVE-25408) AlterTableSetOwnerAnalyzer should send Hive Privilege Objects for Authorization.
[ https://issues.apache.org/jira/browse/HIVE-25408?focusedWorklogId=641280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641280 ] ASF GitHub Bot logged work on HIVE-25408: - Author: ASF GitHub Bot Created on: 24/Aug/21 19:35 Start Date: 24/Aug/21 19:35 Worklog Time Spent: 10m Work Description: yongzhi merged pull request #2560: URL: https://github.com/apache/hive/pull/2560 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641280) Time Spent: 40m (was: 0.5h) > AlterTableSetOwnerAnalyzer should send Hive Privilege Objects for > Authorization. > - > > Key: HIVE-25408 > URL: https://issues.apache.org/jira/browse/HIVE-25408 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Currently, Hive is sending an empty list in the Hive Privilege Objects for > authorization when a user does the following operation: alter table foo set > owner user user_name; > We should be sending the input/objects related to the table in Hive privilege > objects for authorization. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25479) Browser SSO auth may fail intermittently on chrome browser in virtual environments
[ https://issues.apache.org/jira/browse/HIVE-25479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar reassigned HIVE-25479: -- > Browser SSO auth may fail intermittently on chrome browser in virtual > environments > -- > > Key: HIVE-25479 > URL: https://issues.apache.org/jira/browse/HIVE-25479 > Project: Hive > Issue Type: Bug > Components: JDBC >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > > When browser based SSO is enabled the Hive JDBC driver might miss the POST > requests coming from the browser which provide the one-time token issued by > HS2 after the SAML flow completes. The issue was observed mostly in virtual > environments on Windows. > The issue seems to be that when the driver binds to a port even though the > port is in LISTEN state, if the browser issues POST requests on the port > before it goes into ACCEPT state the result is non-deterministic. On native > OSes we observed that the connection is buffered and is received by the > driver when it begins accepting the connections. In case of VMs it is > observed that even though the connection is buffered and presented when the > port goes into ACCEPT mode, the payload of the request or the connection > itself is lost. This race condition causes the driver to wait for the browser > until it times out and the browser keeps waiting for a response from the > driver. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25478) Temp file left over after ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS
[ https://issues.apache.org/jira/browse/HIVE-25478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline reassigned HIVE-25478: --- > Temp file left over after ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS > - > > Key: HIVE-25478 > URL: https://issues.apache.org/jira/browse/HIVE-25478 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Major > > The dot staging file (".hive-staging") file is not removed at the end of the > ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS operation as it is for say an > INSERT that does automatic statistics collection. I expected it would be > deleted after the Stats Work stage. > Any ideas where in the code to add automatic deletion (hook)? > hdfs dfs -ls /hive/warehouse/managed/table_orc > Found 2 items > drwxr-xr-x - hive supergroup 0 2021-08-24 17:19 > /hive/warehouse/managed/table_orc/.hive-staging_hive_2021-08-24_17-19-17_228_4856027533912221506-7 > drwxr-xr-x - hive supergroup 0 2021-08-24 07:17 > /hive/warehouse/managed/table_orc/delta_001_001_ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403937#comment-17403937 ] Panagiotis Garefalakis commented on HIVE-24316: --- Hey [~glapark] thanks for bringing this up -- taking a look at MemoryManagerImpl looks like checkMemory() is the new method that determines if the scale has changed and since ORC-361 removed getTotalMemoryPool() calls from multiple places we are losing the effect of controlling the memory pool. The intention behind LlapAwareMemoryManager was to have memory per executor instead of the entire heap since multiple writers are involved. An idea could be to restore getTotalMemoryPool calls where needed. > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 3.1.3 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25477) Clean Up JDBC Code
[ https://issues.apache.org/jira/browse/HIVE-25477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-25477: - > Clean Up JDBC Code > -- > > Key: HIVE-25477 > URL: https://issues.apache.org/jira/browse/HIVE-25477 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > * Remove unused imports > * Remove unused code -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25476) Remove Unused Dependencies for JDBC Driver
[ https://issues.apache.org/jira/browse/HIVE-25476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-25476: - > Remove Unused Dependencies for JDBC Driver > -- > > Key: HIVE-25476 > URL: https://issues.apache.org/jira/browse/HIVE-25476 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > I am using JDBC driver in a project and was very surprised by the number of > dependencies it has. Remove some unnecessary dependencies to make it a > little easier to work with. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403902#comment-17403902 ] Dongjoon Hyun commented on HIVE-24316: -- cc [~omalley] > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 3.1.3 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > This will bring eleven bug fixes. > * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403893#comment-17403893 ] Sungwoo commented on HIVE-24316: Hello, It seems that with ORC-361, the use of MemoryManagerImpl in LlapAwareMemoryManager is inconsistent. Before merging ORC-361, LlapAwareMemoryManager sets its own totalMemoryPool and MemoryManagerImpl accesses totalMemoryPool via getTotalMemoryPool(), so everything is fine. With ORC-361 merged, we have the following: 1. LlapAwareMemoryManager sets its own totalMemoryPool as a private field. 2. MemoryManagerImpl sets its own totalMemoryPool as a private field. 3. LlapAwareMemoryManager overrides getTotalMemoryPool() using its own totalMemoryPool. Now it is unclear whether or not getTotalMemoryPool() should be overridden. Here are my thoughts on ORC-361: 1. Is MemoryManagerImpl intended to coordinate all threads writing to ORC files inside a process (like LLAP Daemon)? Then is it necessary to create LlapAwareMemoryManager as a ThreadLocal object? Why not just call OrcFile.getStaticMemoryManager() to obtain the shared MemoryManagerImpl? 2. LlapAwareMemoryManager sets its own totalMemoryPool: {code:java} long memPerExecutor = LlapDaemonInfo.INSTANCE.getMemoryPerExecutor(); totalMemoryPool = (long) (memPerExecutor * maxLoad); {code} From my understanding, this has no effect because MemoryManagerImpl sets its own totalMemoryPool. Any comment would be appreciated. > Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 > - > > Key: HIVE-24316 > URL: https://issues.apache.org/jira/browse/HIVE-24316 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.1.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 3.1.3 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > This will bring eleven bug fixes. 
> * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702] > * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high
[ https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guangbao zhao updated HIVE-25474: - Summary: concurrency add jars cause hiveserver2 sys cpu to high (was: Improvement concurrency add jars cause hiveserver2 sys cpu to high) > concurrency add jars cause hiveserver2 sys cpu to high > -- > > Key: HIVE-25474 > URL: https://issues.apache.org/jira/browse/HIVE-25474 > Project: Hive > Issue Type: Improvement > Components: Hive, HiveServer2 >Affects Versions: 3.1.2 >Reporter: guangbao zhao >Assignee: guangbao zhao >Priority: Major > Attachments: HIVE-25474.patch > > > In the Linux environment, adding multiple jars concurrently through HiveCli > or JDBC will increase the system CPU usage and can even affect the service. > Finally, we found that when the add jar is executed, the FileUtil chmod > method is used to grant permissions to the downloaded jar file. The > performance of this method is very low. So we tested the > setPosixFilePermissions method of the Files class. Its performance is seventy > to eighty times that of FileUtil (the same file is given permissions in > multiple cycles, when it is cycled 1000 times). But the Files class requires > JDK 7+, and the call is not friendly to Windows. Therefore, if you use the > setPosixFilePermissions method of the Files class to grant permissions to > files in an operating system that conforms to the POSIX specification (tested > on Mac and Linux), the performance will be improved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
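A minimal sketch of the NIO call the reporter benchmarks, assuming a POSIX filesystem; the class and method names below are illustrative and not taken from the attached patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class PosixChmodSketch {
    // Grant "ugo+rx"-style permissions with a single NIO call, the
    // alternative the reporter measured against Hadoop's FileUtil.chmod.
    static void grantReadExecute(Path p) throws IOException {
        Set<PosixFilePermission> perms =
                PosixFilePermissions.fromString("rwxr-xr-x");
        Files.setPosixFilePermissions(p, perms);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the jar downloaded by "add jar".
        Path jar = Files.createTempFile("added-jar", ".jar");
        grantReadExecute(jar);
        System.out.println(PosixFilePermissions.toString(
                Files.getPosixFilePermissions(jar)));
        Files.delete(jar);
    }
}
```

As the description notes, this path only exists on filesystems that support POSIX permission views; on Windows `Files.setPosixFilePermissions` throws `UnsupportedOperationException`, so a fallback to the old code would still be needed there.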
[jira] [Updated] (HIVE-25474) Improvement concurrency add jars cause hiveserver2 sys cpu to high
[ https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guangbao zhao updated HIVE-25474: - Attachment: HIVE-25474.patch > Improvement concurrency add jars cause hiveserver2 sys cpu to high > -- > > Key: HIVE-25474 > URL: https://issues.apache.org/jira/browse/HIVE-25474 > Project: Hive > Issue Type: Improvement > Components: Hive, HiveServer2 >Affects Versions: 3.1.2 >Reporter: guangbao zhao >Assignee: guangbao zhao >Priority: Major > Attachments: HIVE-25474.patch > > > In the Linux environment, adding multiple jars concurrently through HiveCli > or JDBC will increase the system cpu and even affect the service. Finally, we > found that when the add jar is executed, the FileUtil chmod method is used to > grant permissions to the downloaded jar file. The performance of this method > is very low. So we use the setPosixFilePermissions method of the Files class > to test. The performance is seventy to eighty times that of FileUtil (the > same file is given permissions in multiple cycles, when it is cycled 1000 > times). But the file requires jdk7+, which is not friendly to windows. > Therefore, if you use the setPosixFilePermissions method of the Files class > to grant permissions to files in an operating system that conforms to the > posix specification(tested on Mac and Linux), the performance will be > improved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25474) Improvement concurrency add jars cause hiveserver2 sys cpu to high
[ https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guangbao zhao updated HIVE-25474: - Attachment: (was: HIVE-25474.patch) > Improvement concurrency add jars cause hiveserver2 sys cpu to high > -- > > Key: HIVE-25474 > URL: https://issues.apache.org/jira/browse/HIVE-25474 > Project: Hive > Issue Type: Improvement > Components: Hive, HiveServer2 >Affects Versions: 3.1.2 >Reporter: guangbao zhao >Assignee: guangbao zhao >Priority: Major > Attachments: HIVE-25474.patch > > > In the Linux environment, adding multiple jars concurrently through HiveCli > or JDBC will increase the system cpu and even affect the service. Finally, we > found that when the add jar is executed, the FileUtil chmod method is used to > grant permissions to the downloaded jar file. The performance of this method > is very low. So we use the setPosixFilePermissions method of the Files class > to test. The performance is seventy to eighty times that of FileUtil (the > same file is given permissions in multiple cycles, when it is cycled 1000 > times). But the file requires jdk7+, which is not friendly to windows. > Therefore, if you use the setPosixFilePermissions method of the Files class > to grant permissions to files in an operating system that conforms to the > posix specification(tested on Mac and Linux), the performance will be > improved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25474) Improvement concurrency add jars cause hiveserver2 sys cpu to high
[ https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guangbao zhao updated HIVE-25474: - Fix Version/s: (was: 3.1.2) Target Version/s: (was: 3.1.2) Description: In the Linux environment, adding multiple jars concurrently through HiveCli or JDBC will increase the system cpu and even affect the service. Finally, we found that when the add jar is executed, the FileUtil chmod method is used to grant permissions to the downloaded jar file. The performance of this method is very low. So we use the setPosixFilePermissions method of the Files class to test. The performance is seventy to eighty times that of FileUtil (the same file is given permissions in multiple cycles, when it is cycled 1000 times). But the file requires jdk7+, which is not friendly to windows. Therefore, if you use the setPosixFilePermissions method of the Files class to grant permissions to files in an operating system that conforms to the posix specification(tested on Mac and Linux), the performance will be improved. (was: In the Linux environment, when there are multiple concurrent add jars through HiveCli or JDBC, the system cpu will increase. The currently used FileUtil.chmod(dest, "ugo+rx", true); method is used for file authorization, However, in jdk7+, can use Files.setPosixFilePermissions(path, perms); for file authorization. The performance is seventy to eighty times that of the above. Why not apply this method?) 
Summary: Improvement concurrency add jars cause hiveserver2 sys cpu to high (was: concurrency add jars cause hiveserver2 sys cpu to high) > Improvement concurrency add jars cause hiveserver2 sys cpu to high > -- > > Key: HIVE-25474 > URL: https://issues.apache.org/jira/browse/HIVE-25474 > Project: Hive > Issue Type: Improvement > Components: Hive, HiveServer2 >Affects Versions: 3.1.2 >Reporter: guangbao zhao >Assignee: guangbao zhao >Priority: Major > Attachments: HIVE-25474.patch > > > In the Linux environment, adding multiple jars concurrently through HiveCli > or JDBC will increase the system cpu and even affect the service. Finally, we > found that when the add jar is executed, the FileUtil chmod method is used to > grant permissions to the downloaded jar file. The performance of this method > is very low. So we use the setPosixFilePermissions method of the Files class > to test. The performance is seventy to eighty times that of FileUtil (the > same file is given permissions in multiple cycles, when it is cycled 1000 > times). But the file requires jdk7+, which is not friendly to windows. > Therefore, if you use the setPosixFilePermissions method of the Files class > to grant permissions to files in an operating system that conforms to the > posix specification(tested on Mac and Linux), the performance will be > improved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25331) Create database query doesn't create MANAGEDLOCATION directory
[ https://issues.apache.org/jira/browse/HIVE-25331?focusedWorklogId=641095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641095 ] ASF GitHub Bot logged work on HIVE-25331: - Author: ASF GitHub Bot Created on: 24/Aug/21 13:30 Start Date: 24/Aug/21 13:30 Worklog Time Spent: 10m Work Description: ujc714 commented on a change in pull request #2478: URL: https://github.com/apache/hive/pull/2478#discussion_r694851572 ## File path: ql/src/test/results/clientpositive/llap/alter_change_db_location.q.out ## @@ -11,7 +11,7 @@ PREHOOK: Input: database:newdb POSTHOOK: query: describe database extended newDB POSTHOOK: type: DESCDATABASE POSTHOOK: Input: database:newdb -newdb location/in/testhive_test_user USER + A masked pattern was here Review comment: Actually the original output is like: newdblocation/in/test file:/home/robbie/hive/itests/qtest/target/localfs/warehouse/newdb.db hive_test_user USER The managedlocation is not empty. Because of the pattern "file:/", QOutProcessor masks the whole line. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641095) Time Spent: 1.5h (was: 1h 20m) > Create database query doesn't create MANAGEDLOCATION directory > -- > > Key: HIVE-25331 > URL: https://issues.apache.org/jira/browse/HIVE-25331 > Project: Hive > Issue Type: Bug >Reporter: Robbie Zhang >Assignee: Robbie Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > If we don't assign MANAGEDLOCATION in a "create database" query, the > MANAGEDLOCATION will be NULL so HMS doesn't create the directory. 
In this > case, a CTAS query immediately after the CREATE DATABASE query might fail in > MOVE task due to "destination's parent does not exist". I can use the > following script to reproduce this issue: > {code:java} > set hive.support.concurrency=true; > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > create database testdb location '/tmp/testdb.db'; > create table testdb.test as select 1; > {code} > If the staging directory is under the MANAGEDLOCATION directory, the CTAS > query is fine as the MANAGEDLOCATION directory is created while creating the > staging directory. Since we set LOCATION to a default directory when LOCATION > is not assigned in the CREATE DATABASE query, I believe it's worth to set > MANAGEDLOCATION to a default directory, too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25329) CTAS creates a managed table as non-ACID table
[ https://issues.apache.org/jira/browse/HIVE-25329?focusedWorklogId=641086&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641086 ] ASF GitHub Bot logged work on HIVE-25329: - Author: ASF GitHub Bot Created on: 24/Aug/21 13:20 Start Date: 24/Aug/21 13:20 Worklog Time Spent: 10m Work Description: ujc714 commented on a change in pull request #2477: URL: https://github.com/apache/hive/pull/2477#discussion_r694842778 ## File path: ql/src/test/queries/clientpositive/create_table.q ## @@ -0,0 +1,39 @@ +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; +set hive.create.as.external.legacy=true; + +-- When hive.create.as.external.legacy is true, the tables created with +-- 'managed' or 'transactional' are ACID tables but the tables create +-- without 'managed' and 'transactional' are non-ACID tables. +-- Note: managed non-ACID tables are allowed because tables are not +-- transformed when hive.in.test is true. + +-- Create tables with 'transactional'. These tables have table property +-- 'transactional'='true' +create transactional table test11 as select 1; +show create table test11; +describe formatted test11; + +create transactional table test12 as select 1; Review comment: I'll change the test cases then rebase and submit again :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641086) Time Spent: 1h 40m (was: 1.5h) > CTAS creates a managed table as non-ACID table > -- > > Key: HIVE-25329 > URL: https://issues.apache.org/jira/browse/HIVE-25329 > Project: Hive > Issue Type: Bug >Reporter: Robbie Zhang >Assignee: Robbie Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > According to HIVE-22158, MANAGED tables should be ACID tables only. When we > set hive.create.as.external.legacy to true, the query like 'create managed > table as select 1' creates a non-ACID table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25329) CTAS creates a managed table as non-ACID table
[ https://issues.apache.org/jira/browse/HIVE-25329?focusedWorklogId=641065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641065 ] ASF GitHub Bot logged work on HIVE-25329: - Author: ASF GitHub Bot Created on: 24/Aug/21 12:54 Start Date: 24/Aug/21 12:54 Worklog Time Spent: 10m Work Description: ujc714 commented on a change in pull request #2477: URL: https://github.com/apache/hive/pull/2477#discussion_r694821704 ## File path: iceberg/iceberg-handler/src/test/results/positive/truncate_force_iceberg_table.q.out ## @@ -85,7 +85,7 @@ Retention:0 A masked pattern was here Table Type:EXTERNAL_TABLE Table Parameters: - COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"} + COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\",\"value\":\"true\"}} Review comment: The iceberg tests failed after I rebased. I don't think this change is related to the code in SemanticAnalyzer.java. HIVE-25276 also changed these test files. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641065) Time Spent: 1.5h (was: 1h 20m) > CTAS creates a managed table as non-ACID table > -- > > Key: HIVE-25329 > URL: https://issues.apache.org/jira/browse/HIVE-25329 > Project: Hive > Issue Type: Bug >Reporter: Robbie Zhang >Assignee: Robbie Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > According to HIVE-22158, MANAGED tables should be ACID tables only. When we > set hive.create.as.external.legacy to true, the query like 'create managed > table as select 1' creates a non-ACID table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit
[ https://issues.apache.org/jira/browse/HIVE-25429?focusedWorklogId=641044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641044 ] ASF GitHub Bot logged work on HIVE-25429: - Author: ASF GitHub Bot Created on: 24/Aug/21 12:02 Start Date: 24/Aug/21 12:02 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2563: URL: https://github.com/apache/hive/pull/2563#discussion_r694770338 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetricsOnTez.java ## @@ -17,42 +17,59 @@ */ package org.apache.hadoop.hive.ql.txn.compactor; -import com.codahale.metrics.Gauge; import org.apache.commons.lang3.RandomStringUtils; import org.apache.hadoop.fs.Path; import org.apache.hadoop.hive.common.metrics.common.MetricsFactory; -import org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics; import org.apache.hadoop.hive.conf.HiveConf; import org.apache.hadoop.hive.metastore.api.CompactionType; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; import org.apache.hadoop.hive.metastore.metrics.MetricsConstants; import org.apache.hadoop.hive.ql.txn.compactor.metrics.DeltaFilesMetricReporter; +import org.apache.tez.dag.api.TezConfiguration; +import org.junit.After; import org.junit.Assert; import org.junit.Test; import java.text.MessageFormat; import java.util.HashMap; -import java.util.Map; import java.util.concurrent.TimeUnit; -import static org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.equivalent; -import static org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.gaugeToMap; +import static org.apache.hadoop.hive.ql.txn.compactor.TestDeltaFilesMetrics.gaugeToMap; +import static org.apache.hadoop.hive.ql.txn.compactor.TestDeltaFilesMetrics.verifyMetricsMatch; import static org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.executeStatementOnDriver; public class TestCompactionMetricsOnTez extends CompactorOnTezTest { - @Test - public 
void testDeltaFilesMetric() throws Exception { -MetricsFactory.close(); -HiveConf conf = driver.getConf(); + /** + * Use {@link CompactorOnTezTest#setupWithConf(org.apache.hadoop.hive.conf.HiveConf)} when HiveConf is + * configured to your liking. + */ + @Override + public void setup() { + } -HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED, true); -MetricsFactory.init(conf); + @After + public void tearDown() { +DeltaFilesMetricReporter.close(); + } + private void configureMetrics(HiveConf conf) { HiveConf.setIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD, 0); HiveConf.setIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD, 0); HiveConf.setTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, 1, TimeUnit.SECONDS); HiveConf.setTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_CHECK_THRESHOLD, 0, TimeUnit.SECONDS); HiveConf.setFloatVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_PCT_THRESHOLD, 0.7f); + } + + @Test + public void testDeltaFilesMetric() throws Exception { +HiveConf conf = new HiveConf(); +HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED, true); +configureMetrics(conf); +setupWithConf(conf); Review comment: We need to be able to set different (conflicting) configs before setupWithConf is called. setup() would need to be parametrized. Do you know how to do that? 
## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetricsOnTez.java ## @@ -105,12 +118,100 @@ public void testDeltaFilesMetric() throws Exception { executeStatementOnDriver("select avg(b) from " + tableName, driver); Thread.sleep(1000); -Assert.assertTrue( - equivalent( -new HashMap() {{ +verifyMetricsMatch(new HashMap() {{ put(tableName + Path.SEPARATOR + partitionToday, "1"); -}}, gaugeToMap(MetricsConstants.COMPACTION_NUM_SMALL_DELTAS))); +}}, gaugeToMap(MetricsConstants.COMPACTION_NUM_SMALL_DELTAS)); + } -DeltaFilesMetricReporter.close(); + /** + * Queries shouldn't fail, but metrics should be 0, if tez.counters.max limit is passed. + * @throws Exception + */ + @Test + public void testDeltaFilesMetricTezMaxCounters() throws Exception { +HiveConf conf = new HiveConf(); +conf.setInt(TezConfiguration.TEZ_COUNTERS_MAX, 50); +HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED, true); +configureMetrics(conf); +setupWithConf(conf); + +MetricsFactory.close(); +MetricsFactory.init(conf); +DeltaFilesMetricReporter.init(conf); + +String tableName = "test_metrics"; +CompactorOnTezTest.TestD
[jira] [Work started] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high
[ https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25474 started by guangbao zhao. > concurrency add jars cause hiveserver2 sys cpu to high > -- > > Key: HIVE-25474 > URL: https://issues.apache.org/jira/browse/HIVE-25474 > Project: Hive > Issue Type: Improvement > Components: Hive, HiveServer2 >Affects Versions: 3.1.2 >Reporter: guangbao zhao >Assignee: guangbao zhao >Priority: Major > Fix For: 3.1.2 > > Attachments: HIVE-25474.patch > > > On Linux, when multiple clients add jars concurrently through HiveCli or JDBC, system CPU usage rises. File permissions are currently set with FileUtil.chmod(dest, "ugo+rx", true);. However, on JDK 7+, Files.setPosixFilePermissions(path, perms); can be used instead, and it performs roughly seventy to eighty times better. Why not apply this method? -- This message was sent by Atlassian Jira (v8.3.4#803005)
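The JDK call proposed in the issue above can be sketched in isolation. This is a minimal, self-contained illustration, not the actual patch: the class name, the helper `addReadExecute`, and the concrete permission string are assumptions chosen to mirror the effect of `"ugo+rx"` via a single `java.nio.file` call instead of a forked `chmod` process per file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.HashSet;
import java.util.Set;

public class PosixPermDemo {
    // Illustrative stand-in for FileUtil.chmod(dest, "ugo+rx", true) on a single
    // file: add read+execute for user, group and other on top of the existing
    // permission bits, using one JDK call rather than spawning a chmod process.
    static void addReadExecute(Path p) throws IOException {
        Set<PosixFilePermission> perms = new HashSet<>(Files.getPosixFilePermissions(p));
        perms.addAll(PosixFilePermissions.fromString("r-xr-xr-x"));
        Files.setPosixFilePermissions(p, perms);
    }

    public static void main(String[] args) throws IOException {
        // createTempFile on POSIX creates the file as rw------- (600)
        Path tmp = Files.createTempFile("added-jar", ".jar");
        addReadExecute(tmp);
        // prints rwxr-xr-x on a typical POSIX filesystem
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(tmp)));
        Files.delete(tmp);
    }
}
```

Unlike the recursive `FileUtil.chmod`, this handles one path at a time; a caller distributing jars would still need to walk the directory tree itself.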
[jira] [Work logged] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit
[ https://issues.apache.org/jira/browse/HIVE-25429?focusedWorklogId=641022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641022 ] ASF GitHub Bot logged work on HIVE-25429: - Author: ASF GitHub Bot Created on: 24/Aug/21 10:05 Start Date: 24/Aug/21 10:05 Worklog Time Spent: 10m Work Description: lcspinter commented on a change in pull request #2563: URL: https://github.com/apache/hive/pull/2563#discussion_r694684449 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetricsOnTez.java ## @@ -105,12 +118,100 @@ public void testDeltaFilesMetric() throws Exception { executeStatementOnDriver("select avg(b) from " + tableName, driver); Thread.sleep(1000); -Assert.assertTrue( - equivalent( -new HashMap() {{ +verifyMetricsMatch(new HashMap() {{ put(tableName + Path.SEPARATOR + partitionToday, "1"); -}}, gaugeToMap(MetricsConstants.COMPACTION_NUM_SMALL_DELTAS))); +}}, gaugeToMap(MetricsConstants.COMPACTION_NUM_SMALL_DELTAS)); + } -DeltaFilesMetricReporter.close(); + /** + * Queries shouldn't fail, but metrics should be 0, if tez.counters.max limit is passed. 
+ * @throws Exception + */ + @Test + public void testDeltaFilesMetricTezMaxCounters() throws Exception { +HiveConf conf = new HiveConf(); +conf.setInt(TezConfiguration.TEZ_COUNTERS_MAX, 50); +HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED, true); +configureMetrics(conf); +setupWithConf(conf); + +MetricsFactory.close(); +MetricsFactory.init(conf); +DeltaFilesMetricReporter.init(conf); + +String tableName = "test_metrics"; +CompactorOnTezTest.TestDataProvider testDataProvider = new CompactorOnTezTest.TestDataProvider(); +testDataProvider.createFullAcidTable(tableName, true, false); +// Create 51 partitions +for (int i = 0; i < 51; i++) { + executeStatementOnDriver("insert into " + tableName + " values('1', " + i * i + ", '" + i + "')", driver); +} + +// Touch all partitions +executeStatementOnDriver("select avg(b) from " + tableName, driver); +Thread.sleep(1000); + +Assert.assertEquals(0, gaugeToMap(MetricsConstants.COMPACTION_NUM_DELTAS).size()); +Assert.assertEquals(0, gaugeToMap(MetricsConstants.COMPACTION_NUM_OBSOLETE_DELTAS).size()); +Assert.assertEquals(0, gaugeToMap(MetricsConstants.COMPACTION_NUM_SMALL_DELTAS).size()); + } + + /** + * Queries should succeed if additional acid metrics are disabled. 
+ * @throws Exception + */ + @Test + public void testDeltaFilesMetricWithMetricsDisabled() throws Exception { +HiveConf conf = new HiveConf(); +HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED, false); +MetastoreConf.setBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON, true); +configureMetrics(conf); +super.setupWithConf(conf); + +MetricsFactory.close(); +MetricsFactory.init(conf); + +String tableName = "test_metrics"; +CompactorOnTezTest.TestDataProvider testDataProvider = new CompactorOnTezTest.TestDataProvider(); +testDataProvider.createFullAcidTable(tableName, true, false); +testDataProvider.insertTestDataPartitioned(tableName); + +executeStatementOnDriver("select avg(b) from " + tableName, driver); + +try { + Assert.assertEquals(0, gaugeToMap(MetricsConstants.COMPACTION_NUM_DELTAS).size()); Review comment: Would it be possible to move this assertion to function level? `@Test(expected = javax.management.InstanceNotFoundException.class)` ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetricsOnTez.java ## @@ -17,42 +17,59 @@ */ package org.apache.hadoop.hive.ql.txn.compactor; -import com.codahale.metrics.Gauge; import org.apache.commons.lang3.RandomStringUtils; import org.apache.hadoop.fs.Path; import org.apache.hadoop.hive.common.metrics.common.MetricsFactory; -import org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics; import org.apache.hadoop.hive.conf.HiveConf; import org.apache.hadoop.hive.metastore.api.CompactionType; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; import org.apache.hadoop.hive.metastore.metrics.MetricsConstants; import org.apache.hadoop.hive.ql.txn.compactor.metrics.DeltaFilesMetricReporter; +import org.apache.tez.dag.api.TezConfiguration; +import org.junit.After; import org.junit.Assert; import org.junit.Test; import java.text.MessageFormat; import java.util.HashMap; -import java.util.Map; import 
java.util.concurrent.TimeUnit; -import static org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.equivalent; -import static org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.gaugeToMap; +import sta
[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables
[ https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641015 ] ASF GitHub Bot logged work on HIVE-25453: - Author: ASF GitHub Bot Created on: 24/Aug/21 09:28 Start Date: 24/Aug/21 09:28 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2586: URL: https://github.com/apache/hive/pull/2586#discussion_r694680303 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java ## @@ -247,6 +243,12 @@ public OrcEncodedDataReader(LowLevelCache lowLevelCache, BufferUsageManager buff this.jobConf = jobConf; // TODO: setFileMetadata could just create schema. Called in two places; clean up later. this.evolution = sef.createSchemaEvolution(fileMetadata.getSchema()); + +fileIncludes = includes.generateFileIncludes(fileSchema); Review comment: Ok.. it is covered by the tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641015) Time Spent: 2h (was: 1h 50m) > Add LLAP IO support for Iceberg ORC tables > -- > > Key: HIVE-25453 > URL: https://issues.apache.org/jira/browse/HIVE-25453 > Project: Hive > Issue Type: New Feature >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables
[ https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641014 ] ASF GitHub Bot logged work on HIVE-25453: - Author: ASF GitHub Bot Created on: 24/Aug/21 09:27 Start Date: 24/Aug/21 09:27 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2586: URL: https://github.com/apache/hive/pull/2586#discussion_r694679451 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java ## @@ -247,6 +243,12 @@ public OrcEncodedDataReader(LowLevelCache lowLevelCache, BufferUsageManager buff this.jobConf = jobConf; // TODO: setFileMetadata could just create schema. Called in two places; clean up later. this.evolution = sef.createSchemaEvolution(fileMetadata.getSchema()); + +fileIncludes = includes.generateFileIncludes(fileSchema); Review comment: What happens if there are multiple files with different schema? The test data is created this way: - Iceberg table created with 2 columns - Data inserted with 2 columns - Iceberg table schema modified - Data inserted with the modified schema ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641014) Time Spent: 1h 50m (was: 1h 40m) > Add LLAP IO support for Iceberg ORC tables > -- > > Key: HIVE-25453 > URL: https://issues.apache.org/jira/browse/HIVE-25453 > Project: Hive > Issue Type: New Feature >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables
[ https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641011 ] ASF GitHub Bot logged work on HIVE-25453: - Author: ASF GitHub Bot Created on: 24/Aug/21 09:20 Start Date: 24/Aug/21 09:20 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2586: URL: https://github.com/apache/hive/pull/2586#discussion_r694674515 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java ## @@ -2693,6 +2696,77 @@ public static TypeDescription getDesiredRowTypeDescr(Configuration conf, return result; } + /** + * Based on the file schema and the low level file includes provided in the SchemaEvolution instance, this method + * calculates which top level columns should be included i.e. if any of the nested columns inside complex types is + * required, then its relevant top level parent column will be considered as required (and thus the full subtree). + * Hive and LLAP currently only supports column pruning on the first level, thus we need to calculate this ourselves. + * @param evolution + * @return bool array of include values, where 0th element is root struct, and any Nth element is a first level + * column within that + */ + public static boolean[] firstLevelFileIncludes(SchemaEvolution evolution) { +// This is the leaf level type description include bool array +boolean[] lowLevelIncludes = evolution.getFileIncluded(); +Map idMap = new HashMap<>(); +Map parentIdMap = new HashMap<>(); +idToFieldSchemaMap(evolution.getFileSchema(), idMap, parentIdMap); + +// Root + N top level columns... 
+boolean[] result = new boolean[evolution.getFileSchema().getChildren().size() + 1]; + +Set requiredTopLevelSchemaIds = new HashSet<>(); +for (int i = 1; i < lowLevelIncludes.length; ++i) { + if (lowLevelIncludes[i]) { +int topLevelParentId = getTopLevelParentId(i, parentIdMap); +if (!requiredTopLevelSchemaIds.contains(topLevelParentId)) { + requiredTopLevelSchemaIds.add(topLevelParentId); +} + } +} + +List topLevelFields = evolution.getFileSchema().getChildren(); + +for (int typeDescriptionId : requiredTopLevelSchemaIds) { + result[IntStream.range(0, topLevelFields.size()).filter( + i -> typeDescriptionId == topLevelFields.get(i).getId()).findFirst().getAsInt() + 1] = true; +} + +return result; + } + + /** + * Recursively builds 2 maps: + * ID to type description + * child to parent type description Review comment: child to partent id? ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java ## @@ -2693,6 +2696,77 @@ public static TypeDescription getDesiredRowTypeDescr(Configuration conf, return result; } + /** + * Based on the file schema and the low level file includes provided in the SchemaEvolution instance, this method + * calculates which top level columns should be included i.e. if any of the nested columns inside complex types is + * required, then its relevant top level parent column will be considered as required (and thus the full subtree). + * Hive and LLAP currently only supports column pruning on the first level, thus we need to calculate this ourselves. 
+ * @param evolution + * @return bool array of include values, where 0th element is root struct, and any Nth element is a first level + * column within that + */ + public static boolean[] firstLevelFileIncludes(SchemaEvolution evolution) { +// This is the leaf level type description include bool array +boolean[] lowLevelIncludes = evolution.getFileIncluded(); +Map idMap = new HashMap<>(); +Map parentIdMap = new HashMap<>(); +idToFieldSchemaMap(evolution.getFileSchema(), idMap, parentIdMap); + +// Root + N top level columns... +boolean[] result = new boolean[evolution.getFileSchema().getChildren().size() + 1]; + +Set requiredTopLevelSchemaIds = new HashSet<>(); +for (int i = 1; i < lowLevelIncludes.length; ++i) { + if (lowLevelIncludes[i]) { +int topLevelParentId = getTopLevelParentId(i, parentIdMap); +if (!requiredTopLevelSchemaIds.contains(topLevelParentId)) { + requiredTopLevelSchemaIds.add(topLevelParentId); +} + } +} + +List topLevelFields = evolution.getFileSchema().getChildren(); + +for (int typeDescriptionId : requiredTopLevelSchemaIds) { + result[IntStream.range(0, topLevelFields.size()).filter( + i -> typeDescriptionId == topLevelFields.get(i).getId()).findFirst().getAsInt() + 1] = true; +} + +return result; + } + + /** + * Recursively builds 2 maps: + * ID to type description + * child to parent type description Review comment: chil
[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables
[ https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641010 ] ASF GitHub Bot logged work on HIVE-25453: - Author: ASF GitHub Bot Created on: 24/Aug/21 09:19 Start Date: 24/Aug/21 09:19 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2586: URL: https://github.com/apache/hive/pull/2586#discussion_r694673583 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java ## @@ -2693,6 +2696,77 @@ public static TypeDescription getDesiredRowTypeDescr(Configuration conf, return result; } + /** + * Based on the file schema and the low level file includes provided in the SchemaEvolution instance, this method + * calculates which top level columns should be included i.e. if any of the nested columns inside complex types is + * required, then its relevant top level parent column will be considered as required (and thus the full subtree). + * Hive and LLAP currently only supports column pruning on the first level, thus we need to calculate this ourselves. Review comment: So for ACID tables column pruning is not working, since the data is contained in the row struct? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641010) Time Spent: 1.5h (was: 1h 20m) > Add LLAP IO support for Iceberg ORC tables > -- > > Key: HIVE-25453 > URL: https://issues.apache.org/jira/browse/HIVE-25453 > Project: Hive > Issue Type: New Feature >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
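The `firstLevelFileIncludes` logic reviewed above — propagating a required nested column up to its top-level parent so the whole subtree is read — can be sketched independently of the ORC classes. Everything here is a simplified assumption: the maps stand in for `SchemaEvolution`/`TypeDescription`, and ids follow ORC-style pre-order numbering with 0 as the root struct.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class IncludeDemo {
    // For each included leaf id, climb the parent chain until the parent is the
    // root (id 0); the node reached is the first-level column, which is marked
    // required. Slot 0 of the result corresponds to the root struct itself.
    static boolean[] topLevelIncludes(boolean[] leafIncludes,
                                      Map<Integer, Integer> parentOf,
                                      List<Integer> topLevelIds) {
        boolean[] result = new boolean[topLevelIds.size() + 1];
        for (int id = 1; id < leafIncludes.length; id++) {
            if (!leafIncludes[id]) continue;
            int cur = id;
            while (parentOf.getOrDefault(cur, 0) != 0) {
                cur = parentOf.get(cur); // climb toward the first-level column
            }
            result[topLevelIds.indexOf(cur) + 1] = true;
        }
        return result;
    }

    public static void main(String[] args) {
        // struct<a:int, b:struct<c:int, d:int>> -> ids: root=0, a=1, b=2, c=3, d=4
        Map<Integer, Integer> parentOf = Map.of(1, 0, 2, 0, 3, 2, 4, 2);
        List<Integer> topLevelIds = List.of(1, 2);
        // only nested column d (id 4) is requested -> all of b must be included
        boolean[] includes = {false, false, false, false, true};
        // prints [false, false, true]
        System.out.println(Arrays.toString(topLevelIncludes(includes, parentOf, topLevelIds)));
    }
}
```

This mirrors the reviewed code's behavior of reading the full subtree of `b` even though only `b.d` is needed, since first-level pruning is the finest granularity supported.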
[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables
[ https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641009 ] ASF GitHub Bot logged work on HIVE-25453: - Author: ASF GitHub Bot Created on: 24/Aug/21 09:18 Start Date: 24/Aug/21 09:18 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2586: URL: https://github.com/apache/hive/pull/2586#discussion_r694672513 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java ## @@ -720,13 +750,54 @@ public SchemaEvolution createSchemaEvolution(TypeDescription fileSchema) { readerSchema, readerLogicalColumnIds); Reader.Options options = new Reader.Options(jobConf) .include(readerIncludes).includeAcidColumns(includeAcidColumns); - return new SchemaEvolution(fileSchema, readerSchema, options); + evolution = new SchemaEvolution(fileSchema, readerSchema, options); + + generateLogicalOrderedColumnIds(); + return evolution; +} + +/** + * LLAP IO always returns the column vectors in the order as they are seen in the file. + * To support logical column reordering, we need to do a matching between file and read schemas. + * (this only supports one level of schema reordering, not within complex types, also not supported for ORC ACID) + */ +private void generateLogicalOrderedColumnIds() { Review comment: Maybe some debug level logging? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641009) Time Spent: 1h 20m (was: 1h 10m) > Add LLAP IO support for Iceberg ORC tables > -- > > Key: HIVE-25453 > URL: https://issues.apache.org/jira/browse/HIVE-25453 > Project: Hive > Issue Type: New Feature >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables
[ https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641006 ] ASF GitHub Bot logged work on HIVE-25453: - Author: ASF GitHub Bot Created on: 24/Aug/21 09:16 Start Date: 24/Aug/21 09:16 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2586: URL: https://github.com/apache/hive/pull/2586#discussion_r694671199 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/ColumnVectorProducer.java ## @@ -48,14 +48,17 @@ boolean[] generateFileIncludes(TypeDescription fileSchema); List getPhysicalColumnIds(); List getReaderLogicalColumnIds(); + Review comment: nit: Why is this change? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641006) Time Spent: 1h (was: 50m) > Add LLAP IO support for Iceberg ORC tables > -- > > Key: HIVE-25453 > URL: https://issues.apache.org/jira/browse/HIVE-25453 > Project: Hive > Issue Type: New Feature >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables
[ https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641007 ] ASF GitHub Bot logged work on HIVE-25453: - Author: ASF GitHub Bot Created on: 24/Aug/21 09:16 Start Date: 24/Aug/21 09:16 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2586: URL: https://github.com/apache/hive/pull/2586#discussion_r694671519 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/ColumnVectorProducer.java ## @@ -48,14 +48,17 @@ boolean[] generateFileIncludes(TypeDescription fileSchema); List getPhysicalColumnIds(); List getReaderLogicalColumnIds(); + TypeDescription[] getBatchReaderTypes(TypeDescription fileSchema); + Review comment: nit: Why do we have this change? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641007) Time Spent: 1h 10m (was: 1h) > Add LLAP IO support for Iceberg ORC tables > -- > > Key: HIVE-25453 > URL: https://issues.apache.org/jira/browse/HIVE-25453 > Project: Hive > Issue Type: New Feature >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables
[ https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641004 ] ASF GitHub Bot logged work on HIVE-25453: - Author: ASF GitHub Bot Created on: 24/Aug/21 09:14 Start Date: 24/Aug/21 09:14 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2586: URL: https://github.com/apache/hive/pull/2586#discussion_r694669605 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java ## @@ -158,8 +167,11 @@ private LlapRecordReader(MapWork mapWork, JobConf job, FileSplit split, rbCtx = ctx != null ? ctx : LlapInputFormat.createFakeVrbCtx(mapWork); isAcidScan = AcidUtils.isFullAcidScan(jobConf); -TypeDescription schema = OrcInputFormat.getDesiredRowTypeDescr( -job, isAcidScan, Integer.MAX_VALUE); + +String icebergOrcSchema = job.get(ColumnProjectionUtils.ICEBERG_ORC_SCHEMA_STRING); Review comment: This is a little strange here. I would try to avoid using Iceberg specific stuff in LLAP packages. Maybe a job config containing the requested schema? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641004) Time Spent: 50m (was: 40m) > Add LLAP IO support for Iceberg ORC tables > -- > > Key: HIVE-25453 > URL: https://issues.apache.org/jira/browse/HIVE-25453 > Project: Hive > Issue Type: New Feature >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables
[ https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641003 ] ASF GitHub Bot logged work on HIVE-25453: - Author: ASF GitHub Bot Created on: 24/Aug/21 09:11 Start Date: 24/Aug/21 09:11 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2586: URL: https://github.com/apache/hive/pull/2586#discussion_r694667660 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java ## @@ -417,15 +419,21 @@ public OrcTail getOrcTailFromCache(Path path, Configuration jobConf, CacheTag ta } @Override - public RecordReader llapVectorizedOrcReaderForPath(Object fileKey, Path path, CacheTag tag, List tableIncludedCols, - JobConf conf, long offset, long length) throws IOException { + public RecordReader llapVectorizedOrcReaderForPath(Object fileKey, Path path, + CacheTag tag, List tableIncludedCols, JobConf conf, long offset, long length, Reporter reporter) + throws IOException { -OrcTail tail = getOrcTailFromCache(path, conf, tag, fileKey); +OrcTail tail = null; +if (tag != null) { Review comment: Why not put this inside the `getOrcTailFromCache`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 641003) Time Spent: 40m (was: 0.5h) > Add LLAP IO support for Iceberg ORC tables > -- > > Key: HIVE-25453 > URL: https://issues.apache.org/jira/browse/HIVE-25453 > Project: Hive > Issue Type: New Feature >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables
[ https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641002 ]

ASF GitHub Bot logged work on HIVE-25453:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Aug/21 09:06
            Start Date: 24/Aug/21 09:06
    Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694663951

##########
File path: itests/src/test/resources/testconfiguration.properties
##########
@@ -1251,4 +1251,11 @@ erasurecoding.only.query.files=\
 # tests that requires external database connection
 externalDB.llap.query.files=\
   dataconnector.q,\
-  dataconnector_mysql.q
\ No newline at end of file
+  dataconnector_mysql.q
+
+iceberg.llap.query.files=\

Review comment:
       We might want to use specific directories specified by the driver and then we do not have to use `testconfiguration.properties`.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 641002)
    Time Spent: 0.5h  (was: 20m)

> Add LLAP IO support for Iceberg ORC tables
> ------------------------------------------
>
>                 Key: HIVE-25453
>                 URL: https://issues.apache.org/jira/browse/HIVE-25453
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Comment Edited] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted
[ https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105218#comment-17105218 ]

Marc Demierre edited comment on HIVE-23437 at 8/24/21, 8:54 AM:

We tried a workaround on the client side to ensure the calls are not simultaneous by delaying them. It didn't solve the issue, only made it rarer.

We also observed a second instance of the problem, which is slightly different:
* T1:
** R1 creates the directory, then is paused/waiting
* T2:
** R2 arrives and does not create the directory, as it already exists
** R2 creates the partition (wins the race on the DB) and completes
* T3:
** R1 resumes, sees that its DB transaction failed, and deletes the folder

Relevant logs (R1 = thread 2558, R2 = thread 2556):

{code:java}
2020-05-11 20:00:00,944 INFO [pool-7-thread-2558]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:00,945 INFO [pool-7-thread-2558]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydomain.net ip=10.222.76.2 cmd=append_partition_by_name: db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO [pool-7-thread-2556]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(775)) - 2556: append_partition_by_name: db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO [pool-7-thread-2556]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydomain.net ip=10.222.76.2 cmd=append_partition_by_name: db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,481 INFO [pool-7-thread-2558]: common.FileUtils (FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,521 WARN [pool-7-thread-2556]: hive.log (MetaStoreUtils.java:updatePartitionStatsFast(352)) - Updating partition stats fast for: myproj_dev_debug_hive_4
2020-05-11 20:00:01,537 WARN [pool-7-thread-2556]: hive.log (MetaStoreUtils.java:updatePartitionStatsFast(355)) - Updated size to 0
2020-05-11 20:00:01,764 INFO [pool-7-thread-2558]: metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - deleting hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO [pool-7-thread-2558]: fs.TrashPolicyDefault (TrashPolicyDefault.java:moveToTrash(168)) - Moved: 'hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18' to trash at: hdfs://platform/user/kafka-dump/.Trash/Current/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO [pool-7-thread-2558]: metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved to trash: hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,788 ERROR [pool-7-thread-2558]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDODataStoreException: Insert of object "org.apache.hadoop.hive.metastore.model.MPartition@3254e57d" using statement "INSERT INTO "PARTITIONS" ("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique constraint "UNIQUEPARTITION"
2020-05-11 20:00:03,788 INFO [pool-7-thread-2558]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,788 INFO [pool-7-thread-2558]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydomain.net ip=10.222.76.2 cmd=append_partition_by_name: db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,869 ERROR [pool-7-thread-2558]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(203)) - AlreadyExistsException(message:Partition already exists:Partition(values:[ingestion, hourly, 2020-05-11, 18], dbName:myproj_dev_autodump, tableName:myproj_dev_debug_hive_4, createTime:0, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:age, type:i
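The interleaving described above can be replayed deterministically with a toy model. This is an illustration of the race only; the class, fields, and method are hypothetical stand-ins for the metastore's behaviour, not Hive code:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of append_partition_by_name as described in the report: mkdir is
// idempotent (it succeeds when the directory already exists) and a failed
// metastore insert triggers a directory delete. Replaying the R1/R2
// interleaving sequentially reproduces the broken end state: the partition
// row exists but its HDFS folder is gone.
class PartitionRaceSketch {
    final Set<String> hdfsDirs = new HashSet<>();        // stands in for HDFS
    final Set<String> partitionRows = new HashSet<>();   // stands in for the DB

    // One request's path through the handler, per the logs above.
    void appendPartition(String partName) {
        hdfsDirs.add(partName);                          // mkdir-if-absent
        boolean inserted = partitionRows.add(partName);  // unique-key insert
        if (!inserted) {
            hdfsDirs.remove(partName);                   // rollback deletes dir
        }
    }
}
```

Calling `appendPartition` twice for the same partition (the winner, then the loser replaying its failed insert) leaves `partitionRows` containing the partition while `hdfsDirs` does not, which is exactly the invalid state T4 describes.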
[jira] [Updated] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted
[ https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marc Demierre updated HIVE-23437:
---------------------------------
Description:

There seems to be a race condition in Hive Metastore when issuing several concurrent partition creation requests for the same new partition. In our case, this was triggered by the Kafka Connect Hive integration, which fires simultaneous partition creation requests from all its tasks when syncing to Hive.

We are running HDP 2.6.5, but a quick survey of the upstream code still shows the same in 3.1.2 (the latest Hive release).

Our investigation pointed to the following code (here in Hive 2.1.0, the base for HDP 2.6.5):
[https://github.com/apache/hive/blob/rel/release-2.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2127]

Same code in 3.1.2:
https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3202

The generic scenario is the following:
# T1 (time period 1):
** R1 (request 1) creates the HDFS dir
** R2 also tries creating the HDFS dir
** Both succeed, since creating an already-existing directory succeeds (R1/R2 may be swapped)
# T2:
** R1 creates the partition in the metastore DB, all OK
# T3:
** R2 tries to create the partition in the metastore DB, gets an exception from the DB because it already exists, and rolls back the transaction
** R2 thinks it created the directory (in fact both attempted it and we do not know which one actually created it), so it removes it
# T4: State is invalid:
## Partition exists
## HDFS folder does not exist
## Some Hive/Spark queries fail when trying to use the folder

Here are some logs of the issue happening on our cluster in a standalone metastore (R1 = thread 2303, R2 = thread 2302):

{code:none}
2020-05-11 13:43:46,379 INFO [pool-7-thread-2303]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(775)) - 2303: append_partition_by_name: db=myproj_autodump tbl=myproj_debug_hive_4 part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO [pool-7-thread-2302]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(775)) - 2302: append_partition_by_name: db=myproj_autodump tbl=myproj_debug_hive_4 part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO [pool-7-thread-2303]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydoman.net ip=10.222.76.1 cmd=append_partition_by_name: db=myproj_autodump tbl=myproj_debug_hive_4 part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO [pool-7-thread-2302]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydoman.net ip=10.222.76.1 cmd=append_partition_by_name: db=myproj_autodump tbl=myproj_debug_hive_4 part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,953 INFO [pool-7-thread-2302]: common.FileUtils (FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,957 INFO [pool-7-thread-2303]: common.FileUtils (FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,986 INFO [pool-7-thread-2302]: metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - deleting hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,992 INFO [pool-7-thread-2302]: fs.TrashPolicyDefault (TrashPolicyDefault.java:moveToTrash(168)) - Moved: 'hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11' to trash at: hdfs://platfrom/user/kafka-dump/.Trash/Current/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 INFO [pool-7-thread-2302]: metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved to trash: hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 ERROR [pool-7-thread-2302]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDODataStoreException: Insert of object "org.apache.hadoop.hive.metastore.model.MPartition@548a5b6c" using statement "INSERT INTO "PARTITIONS" ("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique constraint "UNIQUEPARTITION" Detail: Key ("PART_NAME", "TBL_ID")=(time=ingestion/bucket=hourly/date=2020-05-11/hour=11, 6015512) already exists. at org.datanucleu
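One possible remedy for the check-then-act pattern described above (sketched here as an assumption, not necessarily the fix that shipped for HIVE-23437) is to treat a duplicate-key failure as "a concurrent request won the race" and leave the directory alone, since it now backs the winner's partition. All names and types below are illustrative stand-ins, not Hive metastore classes:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a guarded variant of the handler: on a duplicate-key insert
// failure the directory is NOT deleted, because some concurrent request has
// successfully registered the partition and still needs that directory.
class GuardedPartitionCreator {
    final Set<String> hdfsDirs = new HashSet<>();       // stands in for HDFS
    final Set<String> partitionRows = new HashSet<>();  // stands in for the DB

    /** @return true if this call created the partition, false if it already existed. */
    boolean appendPartition(String partName) {
        hdfsDirs.add(partName);                         // idempotent mkdir
        boolean inserted = partitionRows.add(partName); // unique-key insert
        if (!inserted) {
            // Duplicate key: a concurrent request won. Leave the directory in
            // place. (Note that a cleanup guarded only by a "did I create the
            // dir?" flag would still be wrong here, since the loser may well
            // be the thread that created the directory, as in the second
            // observed instance of the problem.)
            return false;
        }
        return true;
    }
}
```

Replaying two concurrent requests through this variant leaves both the partition row and the HDFS directory intact, avoiding the invalid T4 state.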
[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables
[ https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=640982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640982 ]

ASF GitHub Bot logged work on HIVE-25453:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Aug/21 08:01
            Start Date: 24/Aug/21 08:01
    Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694599589

##########
File path: data/conf/iceberg/llap/hive-site.xml
##########
@@ -0,0 +1,394 @@
+
+
+
+
+
+
+
+hive.in.test
+true
+Internal marker for test. Used for masking env-dependent values
+
+
+
+hive.in.iceberg.test
+true
+
+
+
+
+
+
+
+
+
+
+hadoop.tmp.dir
+${test.tmp.dir}/hadoop-tmp
+A base for other temporary directories.
+
+
+
+hive.tez.container.size
+128
+
+
+
+
+hive.merge.tezfiles
+false
+Merge small files at the end of a Tez DAG
+
+
+
+hive.tez.input.format
+org.apache.hadoop.hive.ql.io.HiveInputFormat
+The default input format for tez. Tez groups splits in the AM.
+
+
+
+hive.exec.scratchdir
+${test.tmp.dir}/scratchdir
+Scratch space for Hive jobs
+
+
+
+datanucleus.schema.autoCreateAll
+true
+
+
+
+datanucleus.connectionPool.maxPoolSize
+4
+
+
+
+hive.exec.local.scratchdir
+${test.tmp.dir}/localscratchdir/
+Local scratch space for Hive jobs
+
+
+
+javax.jdo.option.ConnectionURL
+ jdbc:derby:memory:${test.tmp.dir}/junit_metastore_db;create=true
+
+
+
+hive.metastore.schema.verification
+false
+
+
+
+javax.jdo.option.ConnectionDriverName
+org.apache.derby.jdbc.EmbeddedDriver
+
+
+
+javax.jdo.option.ConnectionUserName
+APP
+
+
+
+javax.jdo.option.ConnectionPassword
+mine
+
+
+
+
+hive.metastore.warehouse.dir
+${test.warehouse.dir}
+
+
+
+
+hive.metastore.metadb.dir
+file://${test.tmp.dir}/metadb/
+
+Required by metastore server or if the uris argument below is not supplied
+
+
+
+
+test.log.dir
+${test.tmp.dir}/log/
+
+
+
+
+test.data.files
+${hive.root}/data/files
+
+
+
+
+test.data.scripts
+${hive.root}/data/scripts
+
+
+
+
+hive.jar.path
+ ${maven.local.repository}/org/apache/hive/hive-exec/${hive.version}/hive-exec-${hive.version}.jar
+
+
+
+
+hive.metastore.rawstore.impl
+org.apache.hadoop.hive.metastore.ObjectStore
+Name of the class that implements org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieval of raw metadata objects such as table, database
+
+
+
+hive.querylog.location
+${test.tmp.dir}/tmp
+Location of the structured hive logs
+
+
+
+hive.exec.pre.hooks
+org.apache.hadoop.hive.ql.hooks.PreExecutePrinter, org.apache.hadoop.hive.ql.hooks.EnforceReadOnlyTables
+Pre Execute Hook for Tests
+
+
+
+hive.exec.post.hooks
+org.apache.hadoop.hive.ql.hooks.PostExecutePrinter
+Post Execute Hook for Tests
+
+
+
+hive.support.concurrency
+false
+Whether hive supports concurrency or not. A zookeeper instance must be up and running for the default hive lock manager to support read-write locks.
+
+
+
+fs.pfile.impl
+org.apache.hadoop.fs.ProxyLocalFileSystem
+A proxy for local file system used for cross file system testing
+
+
+
+hive.exec.mode.local.auto
+false
+
+Let hive determine whether to run in local mode automatically
+Disabling this for tests so that minimr is not affected
+
+
+
+
+hive.auto.convert.join
+false
+Whether Hive enable the optimization about converting common join into mapjoin based on the input file size
+
+
+
+hive.ignore.mapjoin.hint
+true
+Whether Hive ignores the mapjoin hint
+
+
+
+io.sort.mb
+10
+
+
+
+hive.input.format
+org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
+The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombineHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombineHiveInputFormat, it can always b
[jira] [Work logged] (HIVE-25404) Inserts inside merge statements are rewritten incorrectly for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-25404?focusedWorklogId=640966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640966 ]

ASF GitHub Bot logged work on HIVE-25404:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Aug/21 07:00
            Start Date: 24/Aug/21 07:00
    Worklog Time Spent: 10m

Work Description: kasakrisz commented on a change in pull request #2568:
URL: https://github.com/apache/hive/pull/2568#discussion_r694551954

##########
File path: ql/src/test/queries/clientpositive/merge_partitioned_insert.q
##########
@@ -0,0 +1,19 @@
+--! qt:transactional
+
+drop table u;
+drop table t;

Review comment:
       I haven't found where table `t` is created, only `t1` and `t2`.

##########
File path: ql/src/test/queries/clientpositive/merge_partitioned_insert.q
##########
@@ -0,0 +1,19 @@
+--! qt:transactional
+
+drop table u;
+drop table t;
+
+create table u(id integer);
+insert into u values(3);
+
+create table t1(id integer, value string default 'def');
+insert into t1 values(1,'xx');
+insert into t1 (id) values(2);
+
+merge into t1 t using u on t.id=u.id when not matched then insert (id) values (u.id);
+

Review comment:
       Do we have tests that check the content of the target table after the merge, like `select * from t1`?

##########
File path: ql/src/test/queries/clientpositive/merge_partitioned_insert.q
##########
@@ -0,0 +1,19 @@
+--! qt:transactional
+
+drop table u;
+drop table t;
+
+create table u(id integer);
+insert into u values(3);
+
+create table t1(id integer, value string default 'def');
+insert into t1 values(1,'xx');
+insert into t1 (id) values(2);
+
+merge into t1 t using u on t.id=u.id when not matched then insert (id) values (u.id);

Review comment:
       Adding `explain merge into t1 t using u on t.id=u.id when not matched then insert (id) values (u.id);` would help check that the right plan is generated. Thoughts?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 640966)
    Time Spent: 20m  (was: 10m)

> Inserts inside merge statements are rewritten incorrectly for partitioned
> tables
> -------------------------------------------------------------------------
>
>                 Key: HIVE-25404
>                 URL: https://issues.apache.org/jira/browse/HIVE-25404
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> drop table u;drop table t;
> create table t(value string default 'def') partitioned by (id integer);
> create table u(id integer);
> {code}
> #1 id&value specified
> rewritten
> {code}
> FROM
>   `default`.`t`
> RIGHT OUTER JOIN
>   `default`.`u`
> ON `t`.`id`=`u`.`id`
> INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)    -- insert clause
>   SELECT `u`.`id`,'x'
>    WHERE `t`.`id` IS NULL
> {code}
> #2 when values is not specified
> {code}
> merge into t using u on t.id=u.id when not matched then insert (id) values (u.id);
> {code}
> rewritten query:
> {code}
> FROM
>   `default`.`t`
> RIGHT OUTER JOIN
>   `default`.`u`
> ON `t`.`id`=`u`.`id`
> INSERT INTO `default`.`t` (`id`) partition (`id`)    -- insert clause
>   SELECT `u`.`id`
>    WHERE `t`.`id` IS NULL
> {code}

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
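The defect the issue title describes appears in the rewritten statements quoted above: the partition column `id` shows up both in the explicit insert column list (e.g. `(`id`,`value`)`) and in the dynamic `partition (`id`)` clause. The following hypothetical helper (illustrative only, not Hive's actual rewrite or validation code) makes that suspect shape explicit:

```java
import java.util.List;

// Hypothetical check over a rewritten insert clause: a dynamic partition
// column should not also appear in the explicit insert column list. Both
// rewrites quoted in the issue description would be flagged by this check.
class MergeRewriteCheck {
    static boolean partitionColumnClashes(List<String> insertColumns, List<String> partitionColumns) {
        for (String p : partitionColumns) {
            if (insertColumns.contains(p)) {
                return true;  // e.g. (`id`,`value`) together with partition (`id`)
            }
        }
        return false;
    }
}
```

Under this reading, a corrected rewrite would keep `id` in only one of the two places; which place is correct depends on Hive's insert-clause semantics and is not spelled out in the report.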