Laljo John Pullokkaran created HIVE-6540:
--------------------------------------------
Summary: Support Multi Column Stats
Key: HIVE-6540
URL: https://issues.apache.org/jira/browse/HIVE-6540
Project: Hive
Issue Type: Improvement
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
For joins involving compound predicates, multi-column stats can be used to
compute the NDV (number of distinct values) more accurately.
The objective is to compute the NDV of more than one column, e.g. the NDV of
the column tuple (x, y, z).
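A minimal sketch of what multi-column NDV means, assuming exact in-memory counting over a small hypothetical table R1(x, y, z) (the helpers `column_ndv` and `multi_column_ndv` are illustrative, not Hive APIs). When columns are correlated, the product of per-column NDVs can greatly overstate the number of distinct (x, y, z) combinations. A real implementation would more likely use a probabilistic distinct-count sketch rather than exact sets; that choice is an assumption here.

```python
from typing import Sequence, Tuple


def column_ndv(rows: Sequence[Tuple], idx: int) -> int:
    """Exact number of distinct values in a single column."""
    return len({row[idx] for row in rows})


def multi_column_ndv(rows: Sequence[Tuple], idxs: Sequence[int]) -> int:
    """Exact number of distinct value combinations over the given columns."""
    return len({tuple(row[i] for i in idxs) for row in rows})


# Hypothetical table R1(x, y, z): y and z are fully determined by x.
r1 = [(1, 10, 100), (1, 10, 100), (2, 20, 200), (3, 30, 300), (3, 30, 300)]

# Estimate obtained by treating the columns as independent.
per_column_product = 1
for i in range(3):
    per_column_product *= column_ndv(r1, i)

print(per_column_product)                # 27
print(multi_column_ndv(r1, [0, 1, 2]))   # 3
```

Here each column has 3 distinct values, so the independence-based product is 27, while only 3 distinct (x, y, z) tuples actually exist.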
For example, the inner join R1 IJ R2 on R1.x=R2.x and R1.y=R2.y and R1.z=R2.z
can use max(NDV(R1.x, R1.y, R1.z), NDV(R2.x, R2.y, R2.z)) as the join NDV,
and hence derive the join selectivity from it.
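A minimal sketch of how the join NDV could feed a cardinality estimate, assuming the common equi-join estimate selectivity = 1 / max(NDV_left, NDV_right). For comparison it also shows the usual alternative of multiplying per-column selectivities under an independence assumption. All statistics below are hypothetical, and the function names are illustrative only.

```python
def join_selectivity(ndv_left: int, ndv_right: int) -> float:
    """Equi-join selectivity estimate: 1 / max(NDV of left, NDV of right)."""
    return 1.0 / max(ndv_left, ndv_right)


def estimate_join_rows(rows_left: int, rows_right: int,
                       ndv_left: int, ndv_right: int) -> float:
    """Estimated output rows = |left| * |right| * selectivity."""
    return rows_left * rows_right * join_selectivity(ndv_left, ndv_right)


# Hypothetical statistics for R1(x, y, z) and R2(x, y, z).
rows_r1, rows_r2 = 1000, 500
ndv_r1 = {"x": 100, "y": 100, "z": 100}
ndv_r2 = {"x": 50, "y": 50, "z": 50}
# Multi-column NDV of (x, y, z); columns assumed fully correlated here.
multi_ndv_r1, multi_ndv_r2 = 100, 50

# Independence assumption: multiply the per-column selectivities.
indep_sel = 1.0
for col in ("x", "y", "z"):
    indep_sel *= join_selectivity(ndv_r1[col], ndv_r2[col])
indep_rows = rows_r1 * rows_r2 * indep_sel

# Multi-column stats: a single selectivity for the whole compound predicate.
multi_rows = estimate_join_rows(rows_r1, rows_r2, multi_ndv_r1, multi_ndv_r2)

print(indep_rows)
print(multi_rows)
```

With these numbers the independence assumption estimates roughly 0.5 output rows, while the multi-column NDV gives roughly 5000; when the join columns are correlated, the multiplied estimate collapses toward zero.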
http://www.oracle-base.com/articles/11g/statistics-collection-enhancements-11gr1.php#multi_column_statistics
http://blogs.msdn.com/b/ianjo/archive/2005/11/10/491548.aspx
http://developer.teradata.com/database/articles/removing-multi-column-statistics-a-process-for-identification-of-redundant-statist