[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-28 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353289#comment-15353289
 ] 

Mohit Sabharwal commented on HIVE-13884:


LGTM. 

A unit test would be great in TestHiveMetaStore#testListPartitions - we can do 
that as a follow-up item.

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
> Issue Type: Improvement
> Reporter: Mohit Sabharwal
> Assignee: Sergio Peña
> Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, 
> HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, 
> HIVE-13884.6.patch, HIVE-13884.7.patch, HIVE-13884.8.patch
>
>
> Currently the PartitionPruner requests either all partitions or partitions 
> based on a filter expression. In either scenario, if the number of partitions 
> accessed is large, there can be significant memory pressure at the HMS server 
> end.
> We already have a config {{hive.limit.query.max.table.partition}} that 
> enforces limits on the number of partitions that may be scanned per operator. 
> But this check happens after the PartitionPruner has already fetched all 
> partitions.
> We should add an option at the PartitionPruner level to disallow queries that 
> attempt to access a number of partitions beyond a configurable limit.
> Note that {{hive.mapred.mode=strict}} disallows queries without a partition 
> filter in PartitionPruner, but this check accepts any query with a pruning 
> condition, even if the number of partitions fetched is large. In multi-tenant 
> environments, admins could use more control over the number of partitions 
> allowed, based on HMS memory capacity.
> One option is to have PartitionPruner first fetch the partition names 
> (instead of partition specs) and throw an exception if the number of 
> partitions exceeds the configured value. Otherwise, fetch the partition specs.
> It looks like the existing {{listPartitionNames}} call could be used if it 
> were extended to take partition filter expressions, as the 
> {{getPartitionsByExpr}} call does.
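
The names-first check proposed in the description could be sketched roughly as 
below. This is only an illustration, not the code from the attached patches: 
{{PartitionLimitCheck}}, {{checkLimit}}, and the exception type are 
hypothetical names rather than Hive APIs, and the sketch assumes that a 
negative limit means the check is disabled.

```java
import java.util.List;

/** Hypothetical sketch; the names here are illustrative, not Hive's actual API. */
final class PartitionLimitCheck {

    /** Thrown when a query would fetch more partitions than the configured limit. */
    static final class PartitionLimitExceededException extends RuntimeException {
        PartitionLimitExceededException(String message) {
            super(message);
        }
    }

    /**
     * Checks the (cheap) list of partition names against the limit before the
     * caller goes on to fetch the much larger partition specs.
     * Assumption: a negative limit means "no limit".
     */
    static void checkLimit(String tableName, List<String> partitionNames, int maxPartitions) {
        if (maxPartitions < 0) {
            return; // limit disabled
        }
        if (partitionNames.size() > maxPartitions) {
            throw new PartitionLimitExceededException(
                "Query on " + tableName + " would fetch " + partitionNames.size()
                    + " partitions, exceeding the limit of " + maxPartitions);
        }
    }
}
```

A caller would run {{checkLimit}} on the names returned for the pruning 
filter and only request the partition specs if no exception is thrown.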



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353057#comment-15353057
 ] 

Sergio Peña commented on HIVE-13884:


[~mohitsabharwal] [~szehon] The patch is ready. Could you let me know if there 
are other comments or if I can commit this?



[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352278#comment-15352278
 ] 

Hive QA commented on HIVE-13884:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12813827/HIVE-13884.8.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10273 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_nonvec_mapwork_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/280/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/280/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-280/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12813827 - PreCommit-HIVE-MASTER-Build



[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342416#comment-15342416
 ] 

Sergio Peña commented on HIVE-13884:


Thanks [~sershe].

[~mohitsabharwal] [~brocknoland] I ran a test with 10K partitions {{select * 
from table12 where dt < 1}} with the variable enabled and disabled. There's 
not much difference. I got a difference of 1 second, and I tested each case 5 
times, including without the patch applied. I think we are good to go with 
this.

I'll wait until HIVE-14055 is fixed, as I would need to change this patch as 
well.



[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340603#comment-15340603
 ] 

Sergey Shelukhin commented on HIVE-13884:

It would fall back to ORM in this case, assuming there was an ORM 
implementation in the original patch.
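
A rough sketch of that fallback, for the case where no SQL filter could be 
generated. The class and method names are hypothetical and do not correspond 
to the actual metastore code. Both counting paths are passed in as plain 
suppliers, and the sketch assumes the direct-SQL path reports "count not 
available" as an empty {{OptionalInt}} (or by throwing) rather than as 0.

```java
import java.util.OptionalInt;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of a direct-SQL count with an ORM fallback. */
final class PartitionCountFallback {

    /**
     * Returns the direct-SQL count when it is available; otherwise falls back
     * to the ORM count. An empty OptionalInt or a runtime exception from the
     * direct-SQL path both mean "count not available", which keeps a genuine
     * count of 0 distinguishable from a failed filter.
     */
    static int countPartitions(Supplier<OptionalInt> directSqlCount, IntSupplier ormCount) {
        OptionalInt direct;
        try {
            direct = directSqlCount.get();
        } catch (RuntimeException e) {
            direct = OptionalInt.empty(); // treat a direct-SQL failure as "unknown"
        }
        return direct.isPresent() ? direct.getAsInt() : ormCount.getAsInt();
    }
}
```

The returned count can then be compared against the configured limit, so a 
failed filter never silently counts as zero partitions.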



[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340594#comment-15340594
 ] 

Sergio Peña commented on HIVE-13884:


If {{MetastoreDirectSql.getNumPartitionsViaSqlFilter()}} returns an error or 
throws an exception whenever the internal 
{{PartitionFilterGenerator.generateSqlFilter}} fails, then how should we handle 
the partition limit request? There is no data to validate the limit against, 
and we cannot abort the query because of this.

[~sershe] [~mohitsabharwal] Any ideas on this? Should we fix 
{{generateSqlFilter}} to avoid returning NULL when the filter cannot be formed?



[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340585#comment-15340585
 ] 

Sergio Peña commented on HIVE-13884:


Sometimes {{MetastoreDirectSql.getNumPartitionsViaSqlFilter()}} returns 0 
when the query filter expression couldn't be created. That count makes the 
limit check pass incorrectly even when the number of partitions is too large, 
so the query goes on to fetch all partitions.

HIVE-14055 is required for this patch.



[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338103#comment-15338103
 ] 

Hive QA commented on HIVE-13884:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12811387/HIVE-13884.6.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10235 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repair
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_table_nonprintable
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testPartitionsCheck
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testTableCheck
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/167/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/167/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-167/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12811387 - PreCommit-HIVE-MASTER-Build



[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-17 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337115#comment-15337115
 ] 

Szehon Ho commented on HIVE-13884:

+1 from my side, pending one last comment on RB, and also the other reviews 
from Mohit.



[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335558#comment-15335558
 ] 

Hive QA commented on HIVE-13884:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12811139/HIVE-13884.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 27 failed/errored test(s), 10234 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repair
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_13
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_exists
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_orig_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_lvj_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin_3way
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_nonvec_mapwork_part_all_primitive
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_bmj_schema_evolution
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_decimal
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_when_case_null
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_table_nonprintable
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testPartitionsCheck
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testTableCheck
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/146/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/146/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-146/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 27 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12811139 - PreCommit-HIVE-MASTER-Build


[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332774#comment-15332774
 ] 

Hive QA commented on HIVE-13884:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12810920/HIVE-13884.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 217 failed/errored test(s), 10233 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_2_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucketpruning1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby_empty
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_exists
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_udf_udaf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_views
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_windowing
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_mat_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_mat_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_tmp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_whole_partition
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_4

[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-13 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328764#comment-15328764
 ] 

Mohit Sabharwal commented on HIVE-13884:


Since we are moving the functionality from the driver to HMS, should we 
deprecate {{hive.limit.query.max.table.partition}} and introduce a new config 
called {{hive.metastore.retrieve.max.partitions}}?

All metastore configs have the "hive.metastore" prefix.

Otherwise:
1) The change is backward incompatible for existing users who set this config 
at the HS2 level and would now have to set it at the HMS level to get the same 
functionality.
2) The name would be confusing.

We could do the following:
1) Mark {{hive.limit.query.max.table.partition}} as deprecated in HiveConf and 
suggest that users move to {{hive.metastore.retrieve.max.partitions}} at the 
HMS level.
2) Keep the current functionality associated with 
{{hive.limit.query.max.table.partition}} in PartitionPruner. It does do what 
its description promises, i.e. it fails the query before the execution stage 
if the number of partitions associated with any scan operator exceeds the 
configured value.
3) Add the new config {{hive.metastore.retrieve.max.partitions}} to configure 
the functionality in this patch.

Does that make sense?
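
The three steps proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a plain {{java.util.Properties}} instead of HiveConf, the class {{PartitionLimitConfig}} and its methods are hypothetical names, {{hive.metastore.retrieve.max.partitions}} is the name proposed in this comment rather than an existing setting, and -1 is assumed to mean "no limit".

```java
import java.util.Properties;

/** Hypothetical helper: reads the two partition limits independently. */
final class PartitionLimitConfig {
    // Existing setting named in this thread; enforced in PartitionPruner.
    static final String PRUNER_KEY = "hive.limit.query.max.table.partition";
    // Name proposed in this comment; assumed here, not an existing setting.
    static final String METASTORE_KEY = "hive.metastore.retrieve.max.partitions";
    // Assumption: -1 means "no limit".
    static final int NO_LIMIT = -1;

    private static int read(Properties conf, String key) {
        String raw = conf.getProperty(key);
        return raw == null ? NO_LIMIT : Integer.parseInt(raw.trim());
    }

    /** Limit for the new metastore-side check (step 3). */
    static int metastoreLimit(Properties conf) {
        return read(conf, METASTORE_KEY);
    }

    /** Limit for the existing PartitionPruner check (step 2), with a deprecation hint (step 1). */
    static int prunerLimit(Properties conf) {
        int limit = read(conf, PRUNER_KEY);
        if (limit != NO_LIMIT) {
            System.err.println("WARN: " + PRUNER_KEY + " is deprecated; consider "
                    + METASTORE_KEY + " at the HMS level");
        }
        return limit;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(PRUNER_KEY, "1000");
        System.out.println("pruner limit = " + prunerLimit(conf));
        System.out.println("metastore limit = " + metastoreLimit(conf));
    }
}
```

Keeping the two keys independent means an existing HS2-level setting keeps its current behavior, while the metastore-side check stays off until the new key is set.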

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
> Issue Type: Improvement
> Reporter: Mohit Sabharwal
> Assignee: Sergio Peña
> Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, 
> HIVE-13884.3.patch
>
>
> Currently the PartitionPruner requests either all partitions or partitions 
> based on a filter expression. In either scenario, if the number of partitions 
> accessed is large, there can be significant memory pressure at the HMS server 
> end.
> We already have a config {{hive.limit.query.max.table.partition}} that 
> enforces limits on the number of partitions that may be scanned per operator. 
> But this check happens after the PartitionPruner has already fetched all 
> partitions.
> We should add an option at the PartitionPruner level to disallow queries that 
> attempt to access a number of partitions beyond a configurable limit.
> Note that {{hive.mapred.mode=strict}} disallows queries without a partition 
> filter in PartitionPruner, but this check accepts any query with a pruning 
> condition, even if the number of partitions fetched is large. In multi-tenant 
> environments, admins could use more control over the number of partitions 
> allowed, based on HMS memory capacity.
> One option is to have PartitionPruner first fetch the partition names 
> (instead of partition specs) and throw an exception if the number of 
> partitions exceeds the configured value. Otherwise, fetch the partition specs.
> It looks like the existing {{listPartitionNames}} call could be used if 
> extended to take partition filter expressions like the 
> {{getPartitionsByExpr}} call does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328370#comment-15328370
 ] 

Hive QA commented on HIVE-13884:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809979/HIVE-13884.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 361 failed/errored test(s), 10226 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_tbl_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_const
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_part_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_partlvl
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_partlvl_dp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cp_sel
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_dependency
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_dependency2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_global_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_unused
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_rc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_merge_multi_expressions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_partition_metadataonly
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_subquery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_null_column
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_int_type_promotion
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_date
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_decode_name
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_special_char
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_timestamp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_timestamp2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_type_check
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_type_in_plan
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_varchar1

[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327997#comment-15327997
 ] 

Hive QA commented on HIVE-13884:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809966/HIVE-13884.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/115/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/115/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-115/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-service-rpc ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/service-rpc/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/service-rpc/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/service-rpc/target/tmp/conf
 [copy] Copying 15 files to 
/data/hive-ptest/working/apache-github-source-source/service-rpc/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-service-rpc ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-service-rpc ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-service-rpc ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/service-rpc/target/hive-service-rpc-2.2.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-service-rpc ---
[INFO] 
[INFO] --- maven-jar-plugin:2.2:test-jar (default) @ hive-service-rpc ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/service-rpc/target/hive-service-rpc-2.2.0-SNAPSHOT-tests.jar
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ 
hive-service-rpc ---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/service-rpc/target/hive-service-rpc-2.2.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-service-rpc/2.2.0-SNAPSHOT/hive-service-rpc-2.2.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/service-rpc/pom.xml to 
/data/hive-ptest/working/maven/org/apache/hive/hive-service-rpc/2.2.0-SNAPSHOT/hive-service-rpc-2.2.0-SNAPSHOT.pom
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/service-rpc/target/hive-service-rpc-2.2.0-SNAPSHOT-tests.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-service-rpc/2.2.0-SNAPSHOT/hive-service-rpc-2.2.0-SNAPSHOT-tests.jar
[INFO] 
[INFO] 
[INFO] Building Hive Serde 2.2.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-serde ---
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/serde/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/serde 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-serde ---
[INFO] 
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-serde 
---
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/serde/src/gen/protobuf/gen-java
 added.
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/serde/src/gen/thrift/gen-javabean
 added.
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-serde ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hive-serde ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/serde/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-serde ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-serde ---
[INFO] Compiling 414 source files to 
/data/hive-ptest/working/apache-github-source-source/serde/target/classes
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:
 Some input files use or override a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:
 Recompile 

[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327768#comment-15327768
 ] 

Sergio Peña commented on HIVE-13884:


[~brocknoland] What type of query would you prefer to test? This patch won't 
allow a query to fetch all 100K partitions if 
{{HiveConf.ConfVars.HIVELIMITTABLESCANPARTITION}} is set to a number > -1. If 
it is not set, no query is executed to request the number of partitions, so no 
overhead is added.
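
A minimal sketch of the behavior described above, assuming a hypothetical helper {{PartitionLimitCheck.checkLimit}} (not the code in the patch). The partition count is supplied lazily through an {{IntSupplier}}, so the count query only runs when the limit is enabled (a value > -1).

```java
import java.util.function.IntSupplier;

/** Hypothetical helper illustrating the "no overhead when unset" behavior. */
final class PartitionLimitCheck {
    /**
     * @param limit          configured maximum; values below 0 disable the check
     * @param partitionCount lazily evaluated count query (stand-in for a metastore call)
     */
    static void checkLimit(int limit, IntSupplier partitionCount) {
        if (limit < 0) {
            return; // limit not set: the count query is never executed
        }
        int count = partitionCount.getAsInt();
        if (count > limit) {
            throw new IllegalStateException(
                    "Query would fetch " + count + " partitions, limit is " + limit);
        }
    }

    public static void main(String[] args) {
        checkLimit(-1, () -> 100_000); // check disabled, supplier not called
        checkLimit(1_000, () -> 10);   // under the limit, passes
        try {
            checkLimit(1_000, () -> 100_000);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```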

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
> Issue Type: Improvement
> Reporter: Mohit Sabharwal
> Assignee: Sergio Peña
> Attachments: HIVE-13884.1.patch
>
>


[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-09 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323525#comment-15323525
 ] 

Brock Noland commented on HIVE-13884:
-

Can you test, on a MySQL-backed metastore with a table of 100K partitions, how 
much latency this adds on average?

Otherwise it's reasonable.

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
> Issue Type: Improvement
> Reporter: Mohit Sabharwal
> Assignee: Sergio Peña
> Attachments: HIVE-13884.1.patch
>
>


[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-09 Thread Reuben Kuhnert (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322686#comment-15322686
 ] 

Reuben Kuhnert commented on HIVE-13884:
---

This looks fine to me. A few minor nitpicks about code style / cleanup, but for 
the most part this is clear. Good to move the limit check to the metastore 
rather than downstream during semantic analysis.

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
> Issue Type: Improvement
> Reporter: Mohit Sabharwal
> Assignee: Sergio Peña
> Attachments: HIVE-13884.1.patch
>
>


[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322622#comment-15322622
 ] 

Sergio Peña commented on HIVE-13884:


[~szehon] Are you familiar with the metastore? Could you help me review this 
patch? Or do you know someone who is knowledgeable in this area?

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
> Issue Type: Improvement
> Reporter: Mohit Sabharwal
> Assignee: Sergio Peña
> Attachments: HIVE-13884.1.patch
>
>


[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319592#comment-15319592
 ] 

Sergio Peña commented on HIVE-13884:


[~hagleitn] [~selinazh] I saw you added the partition limit in HIVE-6492. This 
ticket extends that limit to the metastore to avoid OOM exceptions when 
fetching too many partitions. Could you help me review this patch? Or do you 
know anyone else who understands the metastore?

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
> Issue Type: Improvement
> Reporter: Mohit Sabharwal
> Assignee: Sergio Peña
> Attachments: HIVE-13884.1.patch
>
>


[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316165#comment-15316165
 ] 

Hive QA commented on HIVE-13884:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12808078/HIVE-13884.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10220 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table_use_metadata
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_partition_metadataonly
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_limit_partition
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_partition_metadataonly
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/13/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/13/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-13/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12808078 - PreCommit-HIVE-MASTER-Build

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
> Issue Type: Improvement
> Reporter: Mohit Sabharwal
> Assignee: Sergio Peña
> Attachments: HIVE-13884.1.patch
>
>


[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312960#comment-15312960
 ] 

Sergio Peña commented on HIVE-13884:


Agree with [~sershe]. We should validate this limit on the metastore side. 

Also, I think the call should fetch just the number of partitions, so that it 
returns a single row to validate, instead of returning all partition names and 
counting them in the code.
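
To illustrate the difference between the two approaches, here is a small in-memory sketch. The class {{PartitionCountSketch}} and both method names are hypothetical, and a plain {{List}} stands in for the metastore's backing database; the point is only that the count-only variant never builds the list of matching names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical in-memory illustration; a List stands in for the backing database. */
final class PartitionCountSketch {
    /** Materializes every matching partition name, then counts them. */
    static int countByListing(List<String> partitionNames, Predicate<String> filter) {
        List<String> matching = new ArrayList<>();
        for (String name : partitionNames) {
            if (filter.test(name)) {
                matching.add(name);
            }
        }
        return matching.size();
    }

    /** Keeps only a running counter, comparable to a query that returns one row. */
    static int countOnly(List<String> partitionNames, Predicate<String> filter) {
        int count = 0;
        for (String name : partitionNames) {
            if (filter.test(name)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ds=2016-05-31", "ds=2016-06-01", "ds=2016-06-02");
        Predicate<String> june = n -> n.startsWith("ds=2016-06");
        System.out.println(countByListing(names, june));
        System.out.println(countOnly(names, june));
    }
}
```

Both variants return the same number; only the first one has to hold (and, in a real deployment, transfer) every matching name.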

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
> Issue Type: Improvement
> Reporter: Mohit Sabharwal
> Assignee: Mohit Sabharwal
>





[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-05-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308319#comment-15308319
 ] 

Sergey Shelukhin commented on HIVE-13884:

Should the limit rather be passed to the metastore, to avoid two network 
roundtrips in the normal case?
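
To illustrate the single-roundtrip idea, here is a rough sketch in which the 
limit travels with the fetch request and the server side enforces it while 
collecting matches. {{LimitedPartitionFetch}} and its {{fetch}} method are 
hypothetical and do not correspond to any existing metastore call; partitions 
are modeled as plain name strings, and a negative limit meaning "no limit" is 
an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class LimitedPartitionFetch {
    private LimitedPartitionFetch() {}

    /**
     * Hypothetical server-side handler: applies the filter and enforces
     * the limit in the same call, so the client needs one request in the
     * normal case instead of "count first, then fetch".
     */
    static List<String> fetch(List<String> allPartitions,
                              Predicate<String> filter,
                              int maxPartitions) {
        List<String> matched = new ArrayList<>();
        for (String partition : allPartitions) {
            if (!filter.test(partition)) {
                continue;
            }
            // Stop as soon as one more match would exceed the limit, so
            // the server never holds more than maxPartitions results.
            if (maxPartitions >= 0 && matched.size() == maxPartitions) {
                throw new IllegalStateException(
                    "Query would fetch more than " + maxPartitions
                    + " partitions");
            }
            matched.add(partition);
        }
        return matched;
    }
}
```

With this shape, a query under the limit costs one call, and a query over the 
limit fails early without the client ever receiving the partition list.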

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
> Issue Type: Improvement
> Reporter: Mohit Sabharwal
> Assignee: Mohit Sabharwal
>


