[jira] [Commented] (HIVE-14018) Make IN clause row selectivity estimation customizable
[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337522#comment-15337522 ]

Lefty Leverenz commented on HIVE-14018:
---------------------------------------

Doc note: This adds *hive.stats.filter.in.factor* to HiveConf.java, so it will need to be documented for releases 2.1.1 and 2.2.0.

* [Configuration Properties -- Statistics | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Statistics]

Added TODOC2.1.1 and TODOC2.2 labels.

> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
>                 Key: HIVE-14018
>                 URL: https://issues.apache.org/jira/browse/HIVE-14018
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Minor
>              Labels: TODOC2.1.1, TODOC2.2
>             Fix For: 2.2.0, 2.1.1
>
>         Attachments: HIVE-14018.1.patch, HIVE-14018.patch
>
>
> After HIVE-13287 went in, we calculate IN clause estimates natively (instead
> of just dividing incoming number of rows by 2). However, as the distribution
> of values of the columns is considered uniform, we might end up heavily
> underestimating/overestimating the resulting number of rows.
> This issue is to add a factor that multiplies the IN clause estimation so we
> can alleviate this problem. The solution is not very elegant, but it is the
> best we can do until we have histograms to improve our estimate.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
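To illustrate the idea in the issue description, here is a minimal sketch of a uniform-distribution IN clause estimate scaled by a configurable factor. It is not taken from the patch: the class name, the method `estimateInRows`, its parameters, and the clamping of the scaled selectivity to [0, 1] are all assumptions made for illustration. Only the property name *hive.stats.filter.in.factor* comes from the thread.

```java
public class InClauseEstimate {

    /**
     * Hypothetical helper (not Hive's actual code): estimates how many rows
     * survive `col IN (v1, ..., vk)` when column values are assumed uniform.
     *
     * @param numRows        incoming row count
     * @param distinctValues number of distinct values (NDV) of the column
     * @param inListSize     number of values in the IN list
     * @param factor         multiplier, playing the role of hive.stats.filter.in.factor
     */
    static long estimateInRows(long numRows, long distinctValues,
                               int inListSize, double factor) {
        if (numRows <= 0 || distinctValues <= 0 || inListSize <= 0) {
            return 0;
        }
        // Uniform assumption: each distinct value matches numRows / NDV rows.
        double selectivity = Math.min(1.0, (double) inListSize / distinctValues);
        // Scale by the factor; clamping to [0, 1] is an assumption of this sketch.
        double scaled = Math.max(0.0, Math.min(1.0, selectivity * factor));
        return Math.round(numRows * scaled);
    }

    public static void main(String[] args) {
        // Same filter, three different factors.
        System.out.println(estimateInRows(1000, 100, 5, 1.0));
        System.out.println(estimateInRows(1000, 100, 5, 2.0));
        System.out.println(estimateInRows(1000, 100, 5, 0.5));
    }
}
```

With a skewed column, a factor above 1 would compensate for the uniform assumption underestimating popular values, and a factor below 1 for overestimating rare ones. In the sketch, a factor of 1.0 leaves the uniform estimate unchanged.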
[jira] [Commented] (HIVE-14018) Make IN clause row selectivity estimation customizable
[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335000#comment-15335000 ]

Hive QA commented on HIVE-14018:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12811059/HIVE-14018.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10233 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repair
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_table_nonprintable
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testPartitionsCheck
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testTableCheck
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/142/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/142/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-142/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12811059 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14018) Make IN clause row selectivity estimation customizable
[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334061#comment-15334061 ]

Hive QA commented on HIVE-14018:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12811059/HIVE-14018.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 10233 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_interval_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_varchar_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_char_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mr_diff_schema_alias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_reduce_groupby_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mr_diff_schema_alias
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_reduce_groupby_decimal
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/138/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/138/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-138/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 30 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12811059 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14018) Make IN clause row selectivity estimation customizable
[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332208#comment-15332208 ]

Ashutosh Chauhan commented on HIVE-14018:
-----------------------------------------

+1