[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673595#comment-13673595 ]

Ashutosh Chauhan commented on HIVE-4435:
----------------------------------------

Sorry for the delay. +1 Will commit if tests pass.
                
> Column stats: Distinct value estimator should use hash functions that are 
> pairwise independent
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4435
>                 URL: https://issues.apache.org/jira/browse/HIVE-4435
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 0.10.0
>            Reporter: Shreepadma Venugopalan
>            Assignee: Shreepadma Venugopalan
>         Attachments: chart_1(1).png, HIVE-4435.1.patch
>
>
> The current implementation of the Flajolet-Martin estimator for the number 
> of distinct values doesn't use pairwise-independent hash functions. This is 
> problematic because the input values are not uniformly distributed. On 
> large TPC-H data sets, this leads to a huge discrepancy between the 
> estimated and actual distinct counts for primary key columns, which are 
> typically monotonically increasing sequences.
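
To illustrate the idea in the description, here is a minimal sketch of a Flajolet-Martin style estimator whose hash functions are drawn from the family h(x) = ((a*x + b) mod p) mod 2^k, which is approximately pairwise independent. This is not the code in HIVE-4435.1.patch. The names (make_hash, fm_estimate), the prime, the bit width, and the number of hash functions are all assumptions chosen for illustration.

```python
import random

PRIME = (1 << 61) - 1  # assumed modulus: a Mersenne prime larger than any key used here
BITS = 32              # assumed width of the hashed value


def make_hash(rng):
    """Draw h(x) = ((a*x + b) mod PRIME) mod 2**BITS with random a and b."""
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)
    mask = (1 << BITS) - 1
    return lambda x: ((a * x + b) % PRIME) & mask


def trailing_zeros(v):
    """Index of the lowest set bit of v, or BITS if v is zero."""
    if v == 0:
        return BITS
    return (v & -v).bit_length() - 1


def lowest_unset_bit(bitmap):
    """Index of the lowest bit that is still 0 in the bitmap."""
    r = 0
    while bitmap & (1 << r):
        r += 1
    return r


def fm_estimate(values, num_hashes=64, seed=0):
    """Estimate the number of distinct values with num_hashes FM bitmaps."""
    rng = random.Random(seed)
    hashes = [make_hash(rng) for _ in range(num_hashes)]
    bitmaps = [0] * num_hashes
    for x in values:
        for i, h in enumerate(hashes):
            # Record the position of the lowest set bit of the hashed value.
            bitmaps[i] |= 1 << trailing_zeros(h(x))
    # Average the per-bitmap estimate of log2(n), then apply the
    # Flajolet-Martin correction constant.
    mean_r = sum(lowest_unset_bit(bm) for bm in bitmaps) / num_hashes
    return 2 ** mean_r / 0.77351


if __name__ == "__main__":
    # A monotonically increasing key column, as in the primary key case.
    n = 5000
    print("actual:", n, "estimated:", round(fm_estimate(range(1, n + 1))))
```

Because a and b are drawn at random for each hash function, sequential keys such as 1, 2, 3, ... are spread across the hash range instead of keeping their regular structure, which is the property the issue asks for.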

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
