[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353289#comment-15353289 ] Mohit Sabharwal commented on HIVE-13884: LGTM. A unit test would be great in TestHiveMetaStore#testListPartitions - we can do that as a follow-up item. > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, > HIVE-13884.6.patch, HIVE-13884.7.patch, HIVE-13884.8.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353057#comment-15353057 ] Sergio Peña commented on HIVE-13884: [~mohitsabharwal] [~szehon] The patch is ready, could you let me know if there are other comments or if I can commit this? > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, > HIVE-13884.6.patch, HIVE-13884.7.patch, HIVE-13884.8.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352278#comment-15352278 ] Hive QA commented on HIVE-13884: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12813827/HIVE-13884.8.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10273 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_nonvec_mapwork_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/280/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/280/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-280/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12813827 - PreCommit-HIVE-MASTER-Build > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, > HIVE-13884.6.patch, HIVE-13884.7.patch, HIVE-13884.8.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342416#comment-15342416 ] Sergio Peña commented on HIVE-13884: Thanks [~sershe]. [~mohitsabharwal] [~brocknoland] I run a test with 10K partitions {{select * from table12 where dt < 1}} with the variable enabled and disabled. There's not too much difference. I got a difference of 1 second, and I tested it 5 times each time, even without the patch applied. I think we are good to go for this. I'll wait until HIVE-14055 is fixed as I would need to change this patch as well. > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340603#comment-15340603 ] Sergey Shelukhin commented on HIVE-13884: - It would fall back to ORM in this case. Assuming there was ORM implementation in the original patch > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340594#comment-15340594 ] Sergio Peña commented on HIVE-13884: If {{MetastoreDirectSql.getNumPartitionsViaSqlFilter()}} returns an error or throws an exception whenever the internal {{PartitionFilterGenerator.generateSqlFilter}} fails, then how should we handle the partition limit request? There is no data to validate this, and we cannot abort the query because of this. [~sershe] [~mohitsabharwal] Any ideas on this? Should we fix the {[generateSqlFilter}} to avoid returning NULL when the filter cannot be formed? > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340585#comment-15340585 ] Sergio Peña commented on HIVE-13884: Sometimes, the {{MetastoreDirectSql.getNumPartitionsViaSqlFilter()}} returns 0 when the query filter expression couldn't be created. This number makes a false positive to the limit request if the number of partitions is too large, so causing the query to fetch all partitions. HIVE-14055 is required for this patch. > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338103#comment-15338103 ] Hive QA commented on HIVE-13884: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12811387/HIVE-13884.6.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10235 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repair org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_table_nonprintable org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testPartitionsCheck org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testTableCheck {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/167/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/167/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-167/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12811387 - PreCommit-HIVE-MASTER-Build > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337115#comment-15337115 ] Szehon Ho commented on HIVE-13884: -- +1 from my side, pending one last comment on RB, and also the other reviews from Mohit. > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335558#comment-15335558 ] Hive QA commented on HIVE-13884: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12811139/HIVE-13884.5.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 27 failed/errored test(s), 10234 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repair org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_13 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_4 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_exists org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_orig_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_lvj_mapjoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin_3way org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_nonvec_mapwork_part_all_primitive org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_bmj_schema_evolution org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_decimal org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_when_case_null org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_3 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_table_nonprintable org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testPartitionsCheck org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testTableCheck {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/146/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/146/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-146/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 27 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12811139 - PreCommit-HIVE-MASTER-Build > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetc
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332774#comment-15332774 ] Hive QA commented on HIVE-13884: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12810920/HIVE-13884.4.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 217 failed/errored test(s), 10233 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_2_orc org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_4 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_7 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucketpruning1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby_empty org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_limit org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_simple_select org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_exists org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_not_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_udf_udaf org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_union org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_views org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_windowing org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_count org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_mat_4 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_mat_5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_tmp_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_no_match org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_whole_partition org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_4 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.tes
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328764#comment-15328764 ] Mohit Sabharwal commented on HIVE-13884: Since we are moving the functionality from driver to HMS, should we deprecate {{hive.limit.query.max.table.partition}} and introduce a new config called {{hive.metastore.retrieve.max.partitions}} ? All metastore configs have "hive.metastore" prefix. Otherwise: 1) The change is backward incompatible for existing users that are setting this config at HS2 level and are now expected to set it at HMS level to get the same functionality. 2) Name would be confusing. We could do the following: 1) Mark {{hive.limit.query.max.table.partition}} as deprecated in HiveConf and suggest that user move to {{hive.metastore.retrieve.max.partitions}} at HMS level. 2) Do not remove current functionality associated with {{hive.limit.query.max.table.partition}} in PartitionPruner. It does do what the description promises - i.e. fail the query before execution stage if number of partitions associated with any scan operator exceed configured value. 3) Add new config {{hive.metastore.retrieve.max.partitions}} to configure functionality in this patch. Makes sense ? > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328370#comment-15328370 ] Hive QA commented on HIVE-13884: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809979/HIVE-13884.3.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 361 failed/errored test(s), 10226 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_tbl_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_const org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_part_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_partlvl org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_partlvl_dp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cp_sel org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_dependency org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_dependency2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_global_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_unused org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_rc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_merge_multi_expressions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_partition_metadataonly org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_subquery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_null_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_int_type_promotion org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_date org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_decode_name org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_special_char org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_timestamp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_timestamp2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_type_check org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_type_in_plan org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_va
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15327997#comment-15327997 ] Hive QA commented on HIVE-13884: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809966/HIVE-13884.2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/115/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/115/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-115/ Messages: {noformat} This message was trimmed, see log for full details [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-service-rpc --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/service-rpc/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/service-rpc/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/service-rpc/target/tmp/conf [copy] Copying 15 files to /data/hive-ptest/working/apache-github-source-source/service-rpc/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-service-rpc --- [INFO] No sources to compile [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-service-rpc --- [INFO] Tests are skipped. [INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-service-rpc --- [INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/service-rpc/target/hive-service-rpc-2.2.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-service-rpc --- [INFO] [INFO] --- maven-jar-plugin:2.2:test-jar (default) @ hive-service-rpc --- [INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/service-rpc/target/hive-service-rpc-2.2.0-SNAPSHOT-tests.jar [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-service-rpc --- [INFO] Installing /data/hive-ptest/working/apache-github-source-source/service-rpc/target/hive-service-rpc-2.2.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-service-rpc/2.2.0-SNAPSHOT/hive-service-rpc-2.2.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-github-source-source/service-rpc/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-service-rpc/2.2.0-SNAPSHOT/hive-service-rpc-2.2.0-SNAPSHOT.pom [INFO] Installing /data/hive-ptest/working/apache-github-source-source/service-rpc/target/hive-service-rpc-2.2.0-SNAPSHOT-tests.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-service-rpc/2.2.0-SNAPSHOT/hive-service-rpc-2.2.0-SNAPSHOT-tests.jar [INFO] [INFO] [INFO] Building Hive Serde 2.2.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-serde --- [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/serde/target [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/serde (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-serde --- [INFO] [INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-serde --- [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/serde/src/gen/protobuf/gen-java added. [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/serde/src/gen/thrift/gen-javabean added. [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-serde --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-serde --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-github-source-source/serde/src/main/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-serde --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-serde --- [INFO] Compiling 414 source files to /data/hive-ptest/working/apache-github-source-source/serde/target/classes [WARNING] /data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java: Some input files use or override a deprecated API. [WARNING] /data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.j
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15327768#comment-15327768 ] Sergio Peña commented on HIVE-13884: [~brocknoland] What type of query would you prefer to test? This patch won't allow a query to fetch all 100K partitions if {{HiveConf.ConfVars.HIVELIMITTABLESCANPARTITION}} is set to a number > -1. If it is not set, no query is executed to reques the # of partitions, so no overhead will be added to it. > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323525#comment-15323525 ] Brock Noland commented on HIVE-13884: - Can you test on a MySQL MS with a table with 100K partitions how much latency this adds on average? Otherwise it's reasonable. > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322686#comment-15322686 ] Reuben Kuhnert commented on HIVE-13884: --- This looks fine to me. A few minor nitpicks about code style / cleanup, but for the most part this is clear. Good to move the limit check to the metastore rather than downstream during semantic analysis. > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322622#comment-15322622 ] Sergio Peña commented on HIVE-13884: [~szehon] Do you know about metastore? Could you help me review this patch? Or do you know someone who has knowledge on this area? > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319592#comment-15319592 ] Sergio Peña commented on HIVE-13884: [~hagleitn] [~selinazh] I saw you added the partition limit on HIVE-6492. This ticket is extending that limit to the metastore to avoid OOM exceptions when fetching too many partitions. Could you help me reviewing this patch? Or do you know any other person who understand the metastore too? > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316165#comment-15316165 ] Hive QA commented on HIVE-13884: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12808078/HIVE-13884.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10220 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table_use_metadata org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_partition_metadataonly org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_limit_partition org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_partition_metadataonly {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/13/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/13/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-13/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12808078 - PreCommit-HIVE-MASTER-Build > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312960#comment-15312960 ] Sergio Peña commented on HIVE-13884: Agree with [~sershe]. We should validate this limit in the metastore side. Also, I think we should make the call to get just the # of partitions so that it returns 1 row to validate instead of returning all partitions names and count them in the code. > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Mohit Sabharwal > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308319#comment-15308319 ] Sergey Shelukhin commented on HIVE-13884: - Should the limit rather be passed to metastore to avoid 2 network roundtrips for normal cases? > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Mohit Sabharwal > > Currently the PartitionPruner requests either all partitions or partitions > based on filter expression. In either scenarios, if the number of partitions > accessed is large there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at PartitionPruner level to disallow queries that > attempt to access number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallow queries without a partition > filter in PartitionPruner, but this check accepts any query with a pruning > condition, even if partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)