Alex Levenson created PARQUET-42:
------------------------------------
Summary: Add HyperLogLog / CountMinSketch to parquet statistics
Key: PARQUET-42
URL: https://issues.apache.org/jira/browse/PARQUET-42
Project: Parquet
Issue Type: New Feature
Components: parquet-mr
Reporter: Alex Levenson
Priority: Minor
HyperLogLog (HLL) and Count-Min Sketch (CMS) statistics for row groups could
help with query planning (getting a sense of data skew) and with cheaply
estimating the number of distinct values. Both are mergeable: the merge
operation is commutative and associative, so sketches can be combined across
row groups (unlike an exact distinct count, for example).
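
To illustrate the merge property, here is a minimal sketch in Python. It is
not parquet-mr code and does not reflect any existing Parquet API: the class
names, the parameters (depth, width, p) and the choice of BLAKE2b as the hash
are all assumptions made for the example. It builds one sketch per
hypothetical row group and checks that merging them gives the same state as
sketching all the values at once.

```python
import hashlib
import math


def _hash64(value, seed=0):
    """Deterministic 64-bit hash of str(value); seed selects the hash function."""
    digest = hashlib.blake2b(
        str(value).encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


class CountMinSketch:
    """Illustrative Count-Min Sketch: approximate per-value frequencies."""

    def __init__(self, depth=4, width=256):
        self.depth, self.width = depth, width
        self.table = [[0] * width for _ in range(depth)]

    def add(self, value, count=1):
        for row in range(self.depth):
            self.table[row][_hash64(value, row) % self.width] += count

    def estimate(self, value):
        # Never undercounts; hash collisions can only inflate the result.
        return min(
            self.table[row][_hash64(value, row) % self.width]
            for row in range(self.depth)
        )

    def merge(self, other):
        # Merging is cell-wise addition, which is commutative and associative.
        assert (self.depth, self.width) == (other.depth, other.width)
        merged = CountMinSketch(self.depth, self.width)
        merged.table = [
            [a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(self.table, other.table)
        ]
        return merged


class HyperLogLog:
    """Illustrative HyperLogLog: approximate distinct count."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        h = _hash64(value)
        idx = h >> (64 - self.p)  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw

    def merge(self, other):
        # Merging is a register-wise max, which is commutative and associative.
        assert self.p == other.p
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged


def sketch(values):
    """Build both sketches over an iterable of column values."""
    cms, hll = CountMinSketch(), HyperLogLog()
    for v in values:
        cms.add(v)
        hll.add(v)
    return cms, hll


# Two hypothetical row groups of one column.
row_group_1 = ["a"] * 5 + ["b"]
row_group_2 = ["a"] * 3 + ["c", "d"]

cms_1, hll_1 = sketch(row_group_1)
cms_2, hll_2 = sketch(row_group_2)
cms_all, hll_all = sketch(row_group_1 + row_group_2)

merged_cms = cms_1.merge(cms_2)
merged_hll = hll_1.merge(hll_2)

print(merged_cms.table == cms_all.table)        # True
print(merged_hll.registers == hll_all.registers)  # True
```

Because merged state is identical to the state built over all values, a
reader could combine the per-row-group statistics in any order without
rescanning the data. An exact distinct count does not have this property: the
counts for two row groups cannot be added, since values may appear in both.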