Alex Levenson created PARQUET-42:
------------------------------------

             Summary: Add HyperLogLog / CountMinSketch to parquet statistics
                 Key: PARQUET-42
                 URL: https://issues.apache.org/jira/browse/PARQUET-42
             Project: Parquet
          Issue Type: New Feature
          Components: parquet-mr
            Reporter: Alex Levenson
            Priority: Minor


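HyperLogLog (HLL) and Count-Min Sketch (CMS) statistics for row groups could
help with query planning (getting a sense of data skew) and with cheaply
computing approximate distinct counts. Both are mergeable, and the merge is
commutative and associative, so per-row-group sketches can be combined across
row groups. An exact distinct count cannot be combined this way, because the
same value may appear in more than one row group.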
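
To illustrate why these sketches combine across row groups, here is a minimal
Python sketch. It is not parquet-mr code and not a proposed format: the class
names, the BLAKE2b-based hash, and the default sizes (width, depth, p) are all
assumptions made for the example. It shows that a CMS merges by adding counters
element-wise and an HLL merges by taking the element-wise maximum of registers,
so merging two per-row-group sketches gives the same state as sketching all the
rows at once.

```python
import hashlib
import math


def _hash64(value, seed=0):
    """Deterministic 64-bit hash (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(
        repr(value).encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


class CountMinSketch:
    """Approximate per-value frequencies; estimates never undercount."""

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, value, count=1):
        for row in range(self.depth):
            self.table[row][_hash64(value, row) % self.width] += count

    def estimate(self, value):
        return min(
            self.table[row][_hash64(value, row) % self.width]
            for row in range(self.depth)
        )

    def merge(self, other):
        """Combine two sketches of equal shape by adding counters."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch shapes differ")
        merged = CountMinSketch(self.width, self.depth)
        merged.table = [
            [a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(self.table, other.table)
        ]
        return merged


class HyperLogLog:
    """Approximate distinct count using 2**p registers (assumes p >= 7)."""

    def __init__(self, p=10):
        self.p = p
        self.registers = [0] * (1 << p)

    def add(self, value):
        h = _hash64(value)
        bits = 64 - self.p
        index = h >> bits                    # top p bits pick a register
        rest = h & ((1 << bits) - 1)         # remaining bits give the rank
        rank = bits - rest.bit_length() + 1  # leading zeros in rest, plus one
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        m = len(self.registers)
        alpha = 0.7213 / (1 + 1.079 / m)     # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:         # small-range (linear counting) fix
            return m * math.log(m / zeros)
        return raw

    def merge(self, other):
        """Combine two sketches of equal p by taking the max per register."""
        if self.p != other.p:
            raise ValueError("precision differs")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged


if __name__ == "__main__":
    row_group_1 = ["a", "b", "a", "c"]
    row_group_2 = ["a", "d", "d"]
    hll_1, hll_2 = HyperLogLog(), HyperLogLog()
    for v in row_group_1:
        hll_1.add(v)
    for v in row_group_2:
        hll_2.add(v)
    # "a" appears in both row groups, so summing exact per-group distinct
    # counts (3 + 2) would overcount; the merged sketch does not.
    print(hll_1.merge(hll_2).estimate())
```

Because both merges are element-wise and order-independent, a reader could
combine row-group sketches in any order, for example to get file-level or
dataset-level estimates without rescanning the data.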



--
This message was sent by Atlassian JIRA
(v6.2#6252)
