[ https://issues.apache.org/jira/browse/HIVE-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146586#comment-15146586 ]

Dhanasekar commented on HIVE-9689:
----------------------------------

I wanted to ask whether anyone is already working on this issue. I am a GSoC 2016
aspirant and would like to work on it this summer.

> Store histograms and distinct value estimator's bit vectors in metastore
> ------------------------------------------------------------------------
>
>                 Key: HIVE-9689
>                 URL: https://issues.apache.org/jira/browse/HIVE-9689
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Prasanth Jayachandran
>              Labels: gsoc, gsoc2015, hive, java
>
> Hive currently uses the PCSA (Probabilistic Counting with Stochastic Averaging)
> algorithm to estimate the number of distinct values (NDV). The metastore stores
> only the NDV computed by the UDF, not the underlying bit vectors, so it is
> impossible to estimate the overall NDV across all partitions (or a selected
> subset of them). Ideally we should store the bit vectors in the metastore and
> merge them on the server side. If space is a constraint, we could also replace
> PCSA with HyperLogLog. Hive also has a UDF for computing histograms; persisting
> those histograms in the metastore would let the Hive optimizer make better
> decisions and would help with ORDER BY, skew join, and COUNT DISTINCT +
> GROUP BY optimizations.
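
The reason stored bit vectors matter is that PCSA sketches built with the same hash function can be merged by a bucket-wise bitwise OR, and the merged sketch estimates the NDV of the union. A stored NDV number cannot be combined this way. The sketch below is a minimal illustration under stated assumptions: the class name `PcsaSketch`, the SplitMix64-style hash, and the choice of 64-bit bitmaps are all hypothetical and are not Hive's actual estimator code. (HyperLogLog merges analogously, taking the register-wise maximum instead of OR.)

```java
/**
 * Hypothetical PCSA sketch for illustration only; not Hive's implementation.
 * m bitmaps; each value sets one bit in one bitmap.
 */
final class PcsaSketch {
    private static final double PHI = 0.77351; // Flajolet-Martin correction constant
    private final long[] bitmaps;              // one 64-bit bitmap per bucket

    PcsaSketch(int numBitmaps) {
        this.bitmaps = new long[numBitmaps];
    }

    // Assumed hash: SplitMix64-style mixing so sequential inputs look random.
    private static long mix(long z) {
        z += 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    void add(long value) {
        long h = mix(value);
        int m = bitmaps.length;
        // Stochastic averaging: part of the hash picks the bucket.
        int bucket = (int) Long.remainderUnsigned(h, m);
        long rest = Long.divideUnsigned(h, m);
        // Index of the lowest 1 bit: geometrically distributed with p = 1/2.
        int r = rest == 0 ? 63 : Long.numberOfTrailingZeros(rest);
        bitmaps[bucket] |= 1L << r;
    }

    /** Server-side merge: bucket-wise OR gives the sketch of the union. */
    void merge(PcsaSketch other) {
        if (other.bitmaps.length != bitmaps.length) {
            throw new IllegalArgumentException("sketch sizes differ");
        }
        for (int i = 0; i < bitmaps.length; i++) {
            bitmaps[i] |= other.bitmaps[i];
        }
    }

    /** NDV estimate: (m / PHI) * 2^(mean index of the lowest 0 bit). */
    long estimate() {
        int m = bitmaps.length;
        double sumR = 0;
        for (long b : bitmaps) {
            sumR += Long.numberOfTrailingZeros(~b); // lowest unset bit
        }
        return Math.round(m / PHI * Math.pow(2, sumR / m));
    }
}
```

For example, if one partition holds values 0..59999 and another holds 40000..99999, each per-partition estimate is roughly 60,000, so adding the two stored NDVs would overcount to about 120,000. OR-ing the two sketches instead yields exactly the sketch of all 100,000 distinct values.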



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
