[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-22 Thread Saket Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431994#comment-15431994
 ] 

Saket Saurabh commented on HIVE-14199:
--

Thanks [~ekoifman] for reviewing it.

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, 
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419857#comment-15419857
 ] 

Hive QA commented on HIVE-14199:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823552/HIVE-14199.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10471 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[tez_join_hash]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hive.hcatalog.listener.TestMsgBusConnection.testConnection
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/875/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/875/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-875/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12823552 - PreCommit-HIVE-MASTER-Build

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, 
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419717#comment-15419717
 ] 

Gopal V commented on HIVE-14199:


[~ekoifman]: The acid_bucket_pruning.q does not have vectorization enabled.

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, 
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419715#comment-15419715
 ] 

Eugene Koifman commented on HIVE-14199:
---

[~mmccline], is there anything in the query plan that shows that the plan is 
vectorized or not?
In the attached patch, I don't see anything in the plan that indicates that 
it's been vectorized.

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, 
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415995#comment-15415995
 ] 

Eugene Koifman commented on HIVE-14199:
---

I think what [~gopalv] meant by legacy is "transactional_properties=legacy"  
though the issues with streaming/bucketing were fixed long ago in HIVE-11983

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-07-08 Thread Saket Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368883#comment-15368883
 ] 

Saket Saurabh commented on HIVE-14199:
--

Thanks [~gopalv] for the comment. I have updated the patch with these changes. 
Currently, to disable the codepath for legacy layouts, I do not consider the 
case of matching the bucketName against the 
AcidUtils.LEGACY_BUCKET_DIGIT_PATTERN. So, I am thinking these legacy layouts 
will be ignored then. 

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-07-08 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368799#comment-15368799
 ] 

Gopal V commented on HIVE-14199:


[~saketj]: I recommend reusing the AcidUtils.BUCKET_DIGIT_PATTERN instead of a 
new regex for this case.

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)