[jira] [Commented] (HIVE-14018) Make IN clause row selectivity estimation customizable

2016-06-17 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337522#comment-15337522
 ] 

Lefty Leverenz commented on HIVE-14018:
---

Doc note:  This adds *hive.stats.filter.in.factor* to HiveConf.java, so it will 
need to be documented for releases 2.1.1 and 2.2.0.

* [Configuration Properties -- Statistics | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Statistics]

Added TODOC2.1.1 and TODOC2.2 labels.

> Make IN clause row selectivity estimation customizable
> --
>
> Key: HIVE-14018
> URL: https://issues.apache.org/jira/browse/HIVE-14018
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Minor
> Labels: TODOC2.1.1, TODOC2.2
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14018.1.patch, HIVE-14018.patch
>
>
> After HIVE-13287 went in, we calculate IN clause estimates natively (instead 
> of just dividing the incoming number of rows by 2). However, because the 
> values of each column are assumed to be uniformly distributed, we might end up 
> heavily underestimating or overestimating the resulting number of rows.
> This issue adds a factor that multiplies the IN clause estimate so that we 
> can alleviate this problem. The solution is not very elegant, but it is the 
> best we can do until we have histograms to improve our estimate.
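
For readers who want to see the idea in the description above, here is a minimal sketch in Python. It is not Hive's actual code. It assumes a uniform-distribution estimate of rows * (number of IN values / number of distinct values), multiplied by a tunable factor that plays the role of hive.stats.filter.in.factor. The function name, its parameters, and the clamping to the input row count are all assumptions made for illustration.

```python
def estimate_in_rows(num_rows, ndv, in_list_size, factor=1.0):
    """Hypothetical IN clause row estimate under a uniform-distribution assumption.

    num_rows     -- rows entering the filter
    ndv          -- number of distinct values in the filtered column
    in_list_size -- number of values listed in the IN clause
    factor       -- multiplier applied to the estimate (stands in for
                    hive.stats.filter.in.factor; 1.0 means "no adjustment")
    """
    if num_rows <= 0 or ndv <= 0:
        return 0
    # Uniform assumption: each distinct value accounts for an equal share of rows.
    selectivity = min(1.0, in_list_size / ndv)
    estimate = num_rows * selectivity * factor
    # Assumed here: the estimate never exceeds the input row count.
    return int(max(0, min(num_rows, round(estimate))))


if __name__ == "__main__":
    print(estimate_in_rows(1000, 100, 5))              # unadjusted estimate
    print(estimate_in_rows(1000, 100, 5, factor=2.0))  # skewed data: scale up
```

A factor above 1.0 raises the estimate when the listed values are known to be more frequent than the uniform assumption predicts, and a factor below 1.0 lowers it.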



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14018) Make IN clause row selectivity estimation customizable

2016-06-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335000#comment-15335000
 ] 

Hive QA commented on HIVE-14018:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12811059/HIVE-14018.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10233 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repair
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_table_nonprintable
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testPartitionsCheck
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testTableCheck
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/142/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/142/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-142/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12811059 - PreCommit-HIVE-MASTER-Build






[jira] [Commented] (HIVE-14018) Make IN clause row selectivity estimation customizable

2016-06-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334061#comment-15334061
 ] 

Hive QA commented on HIVE-14018:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12811059/HIVE-14018.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 10233 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_interval_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_varchar_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_char_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mr_diff_schema_alias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_reduce_groupby_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mr_diff_schema_alias
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_reduce_groupby_decimal
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/138/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/138/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-138/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 30 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12811059 - PreCommit-HIVE-MASTER-Build






[jira] [Commented] (HIVE-14018) Make IN clause row selectivity estimation customizable

2016-06-15 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332208#comment-15332208
 ] 

Ashutosh Chauhan commented on HIVE-14018:
-

+1



