[jira] [Commented] (HIVE-6540) Support Multi Column Stats

Alex Nastetsky (JIRA) Thu, 12 Jun 2014 10:21:46 -0700

    [ https://issues.apache.org/jira/browse/HIVE-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029443#comment-14029443 ]


Alex Nastetsky commented on HIVE-6540:
--------------------------------------

I hope this is included in the next version. It would halve the time needed 
to create and validate a data transformation, by combining table creation and 
statistics gathering on the new table into a single step.

> Support Multi Column Stats
> --------------------------
>
>                 Key: HIVE-6540
>                 URL: https://issues.apache.org/jira/browse/HIVE-6540
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Laljo John Pullokkaran
>            Assignee: Laljo John Pullokkaran
>
> For joins involving compound predicates, multi-column stats can be used to 
> compute the NDV (number of distinct values) accurately.
> The objective is to compute the NDV of more than one column, e.g. NDV(x,y,z).
> An inner join of R1 and R2 on R1.x=R2.x and R1.y=R2.y and R1.z=R2.z can use 
> max(NDV(R1.x, R1.y, R1.z), NDV(R2.x, R2.y, R2.z)) as the join NDV (and hence 
> to derive the join selectivity).
> http://www.oracle-base.com/articles/11g/statistics-collection-enhancements-11gr1.php#multi_column_statistics
> http://blogs.msdn.com/b/ianjo/archive/2005/11/10/491548.aspx
> http://developer.teradata.com/database/articles/removing-multi-column-statistics-a-process-for-identification-of-redundant-statist
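
To make the quoted idea concrete, here is a minimal sketch in Python, not Hive code. It assumes tables are plain lists of dicts, and it assumes the common textbook estimate |R1| * |R2| / join NDV for the join cardinality; the issue itself only states that max(NDV(R1...), NDV(R2...)) can serve as the join NDV. All function names are hypothetical.

```python
def ndv(rows, cols):
    """Number of distinct value combinations over the given columns."""
    return len({tuple(row[c] for c in cols) for row in rows})


def product_of_single_column_ndvs(rows, cols):
    """What an optimizer might assume without multi-column stats:
    columns are independent, so per-column NDVs are multiplied."""
    result = 1
    for c in cols:
        result *= ndv(rows, [c])
    return result


def join_ndv(r1, r2, cols):
    """Join NDV as described in the issue: max of the two multi-column NDVs."""
    return max(ndv(r1, cols), ndv(r2, cols))


def estimated_join_rows(r1, r2, cols):
    """Assumed textbook estimate: |R1| * |R2| / join NDV."""
    denom = join_ndv(r1, r2, cols)
    return len(r1) * len(r2) / denom if denom else 0.0


cols = ["x", "y", "z"]
# y and z are fully determined by x, so the columns are strongly correlated.
r1 = [{"x": i % 4, "y": (i % 4) * 2, "z": (i % 4) % 2} for i in range(8)]
r2 = [{"x": i, "y": i * 2, "z": i % 2} for i in range(2)]

multi = ndv(r1, cols)                              # true NDV of (x, y, z)
naive = product_of_single_column_ndvs(r1, cols)    # independence assumption
estimate = estimated_join_rows(r1, r2, cols)
```

On this toy data the multi-column NDV of R1 is 4, while multiplying the single-column NDVs gives 32, so an estimate built on the independence assumption would understate the join size by a wide margin for correlated columns.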



--
This message was sent by Atlassian JIRA
(v6.2#6252)
