[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431994#comment-15431994 ]

Saket Saurabh commented on HIVE-14199:
--

Thanks [~ekoifman] for reviewing it.

> Enable Bucket Pruning for ACID tables
> -------------------------------------
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Reporter: Saket Saurabh
> Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch,
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature
> introduced in HIVE-11525. This is because bucket pruning happens at the
> split-generation level, and for ACID tables the delta files were
> traditionally never split. Parallelism for ACID was therefore limited to
> the number of buckets: there were as many splits as buckets, and each
> worker processing one split inevitably read all the delta files for that
> bucket, even when the query needed only one of the buckets.
> However, HIVE-14035 now allows delta files to be split as well. This means
> that split generation now has enough information to determine which
> buckets to process for the delta files. This lets us prune unnecessary
> buckets for delta files, which should give a good performance gain for a
> large number of selective queries on ACID tables.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
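To make the idea in the description concrete, here is a minimal sketch of bucket pruning at split-generation time. It assumes each split (base or delta) already knows the bucket ID of the file it reads, and that the optimizer has computed the buckets the query predicate can touch as a `java.util.BitSet`. The `AcidSplit` class, the `pruneSplits` method and the directory names are hypothetical illustrations, not Hive's actual split-generation code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BucketPruningSketch {

    // Hypothetical stand-in for a split: the file it reads and that file's bucket ID
    // (-1 when the bucket could not be determined).
    static final class AcidSplit {
        final String path;
        final int bucketId;

        AcidSplit(String path, int bucketId) {
            this.path = path;
            this.bucketId = bucketId;
        }

        @Override
        public String toString() {
            return path;
        }
    }

    // Keeps only the splits whose bucket is in includedBuckets.
    // A null set means the predicate could not be mapped to buckets, so nothing is pruned.
    // Splits with an unknown bucket (-1) are always kept, to stay conservative.
    static List<AcidSplit> pruneSplits(List<AcidSplit> splits, BitSet includedBuckets) {
        if (includedBuckets == null) {
            return new ArrayList<>(splits);
        }
        List<AcidSplit> kept = new ArrayList<>();
        for (AcidSplit split : splits) {
            // Once delta files are split per bucket, delta splits can be
            // filtered exactly like base splits.
            if (split.bucketId < 0 || includedBuckets.get(split.bucketId)) {
                kept.add(split);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<AcidSplit> splits = new ArrayList<>();
        for (int b = 0; b < 4; b++) {
            splits.add(new AcidSplit(String.format("base_0000005/bucket_%05d", b), b));
            splits.add(new AcidSplit(String.format("delta_0000006_0000006/bucket_%05d", b), b));
        }
        // Suppose the predicate maps to bucket 2 only.
        BitSet included = new BitSet();
        included.set(2);
        // prints [base_0000005/bucket_00002, delta_0000006_0000006/bucket_00002]
        System.out.println(pruneSplits(splits, included));
    }
}
```

Keeping splits whose bucket is unknown is a deliberate choice in this sketch: reading extra data only costs time, while dropping a split the query needs would return wrong results.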
[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419857#comment-15419857 ]

Hive QA commented on HIVE-14199:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823552/HIVE-14199.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10471 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[tez_join_hash]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hive.hcatalog.listener.TestMsgBusConnection.testConnection
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/875/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/875/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-875/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12823552 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419717#comment-15419717 ]

Gopal V commented on HIVE-14199:

[~ekoifman]: The acid_bucket_pruning.q does not have vectorization enabled.
[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419715#comment-15419715 ]

Eugene Koifman commented on HIVE-14199:
---

[~mmccline], is there anything in the query plan that shows whether the plan is vectorized? In the attached patch, I don't see anything in the plan indicating that it has been vectorized.
[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415995#comment-15415995 ]

Eugene Koifman commented on HIVE-14199:
---

I think what [~gopalv] meant by legacy is "transactional_properties=legacy", though the issues with streaming/bucketing were fixed long ago in HIVE-11983.
[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368883#comment-15368883 ]

Saket Saurabh commented on HIVE-14199:
--

Thanks [~gopalv] for the comment. I have updated the patch with these changes. Currently, to disable the code path for legacy layouts, I do not match the bucketName against AcidUtils.LEGACY_BUCKET_DIGIT_PATTERN, so legacy layouts should simply be ignored by this code path.
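The exchange about bucket-digit patterns can be illustrated with a rough sketch of recovering a bucket ID from a file name while ignoring legacy layouts. It assumes ACID bucket files are named like `bucket_00002` (bucket ID in the trailing five digits) and legacy files like `000002_0`. The pattern below is re-created for illustration only; Hive's real constant is `AcidUtils.BUCKET_DIGIT_PATTERN`, whose exact definition should be checked in `AcidUtils`. The class name, the `parseBucketId` method and the convention of returning -1 for "no pruning" are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BucketIdParser {

    // Illustrative re-creation (assumption): an ACID bucket file name ends in a
    // five-digit bucket number, e.g. "bucket_00002". Hive's own constant is
    // AcidUtils.BUCKET_DIGIT_PATTERN; verify its definition before relying on this.
    static final Pattern BUCKET_DIGIT_PATTERN = Pattern.compile("[0-9]{5}$");

    // Prefix that ACID bucket files are assumed to start with.
    static final String BUCKET_PREFIX = "bucket_";

    /**
     * Returns the bucket ID encoded in an ACID bucket file name, or -1 when the
     * name is not in the ACID form (for example a legacy "000002_0" file).
     * In this sketch, callers treat -1 as "unknown bucket" and do not prune the file.
     */
    static int parseBucketId(String fileName) {
        if (!fileName.startsWith(BUCKET_PREFIX)) {
            return -1; // legacy layouts are deliberately not matched
        }
        Matcher m = BUCKET_DIGIT_PATTERN.matcher(fileName);
        if (!m.find()) {
            return -1;
        }
        return Integer.parseInt(m.group());
    }

    public static void main(String[] args) {
        System.out.println(parseBucketId("bucket_00002")); // 2
        System.out.println(parseBucketId("000002_0"));     // -1 (legacy name, not pruned)
        System.out.println(parseBucketId("bucket_00017")); // 17
    }
}
```

Reusing one shared pattern constant, as suggested in the review, keeps the pruning code and the rest of the ACID reader in agreement about what counts as a bucket file.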
[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368799#comment-15368799 ]

Gopal V commented on HIVE-14199:

[~saketj]: I recommend reusing the AcidUtils.BUCKET_DIGIT_PATTERN instead of a new regex for this case.