[ https://issues.apache.org/jira/browse/KYLIN-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090861#comment-15090861 ]
liyang commented on KYLIN-1186:
-------------------------------

Just merged the "KYLIN-1186-1.x-staging.2.patch" into 1.x-staging. Overall it's a very good patch. Just a few minor comments.

- The "metadata" module need not depend on calcite. I dropped calcite from pom.xml and everything is just fine.
- Java files should use 4-space indentation. But that expectation is not mentioned anywhere in "How to contribute". I'll amend the doc.
- Disabling IIQueryTest is no problem for the moment. We have not fully decided how to leverage inverted-index yet.

Many thanks Yerui!!

Btw, how did you test the patch? I found no documentation about regression testing in the Developer Guide. Something to amend too.

> Support precise Count Distinct using bitmap
> -------------------------------------------
>
>                 Key: KYLIN-1186
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1186
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v1.1
>            Reporter: Yerui Sun
>            Assignee: Yerui Sun
>             Fix For: v2.0, v1.3
>
>         Attachments: KYLIN-1186-1.x-staging.2.patch, KYLIN-1186-1.x-staging.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For now, Kylin only supports non-precise count distinct via HyperLogLog.
> In our production scenario, there are strong requirements for precise count
> distinct, mainly for columns of type int or bigint, such as user-id,
> product-id, etc.
> Implementing precise count distinct for all types is difficult and
> inefficient. However, supporting only int or bigint makes this much easier.
> The values can be projected into a bitmap, which is easy to compress and
> store, and easy to count.
> I've created a POC based on RoaringBitmap, proving that this works. There's
> some more work to be done:
> * RoaringBitmap only supports int, so a solution is needed to support bigint;
> * Add a new measure and codec, like HyperLogLogPlusCounter, to make it easy
> to use;
> * Add the new measure to the web UI, and check whether the column type is int
> or bigint;

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
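
For readers unfamiliar with the bitmap idea in the quoted issue, here is a minimal sketch of how projecting int values into a bitmap gives an exact distinct count. It is not the code from the patch: the class name BitmapDistinctSketch is hypothetical, and java.util.BitSet is swapped in for RoaringBitmap so the example runs without extra dependencies. BitSet is uncompressed and rejects negative indexes, so the sketch assumes small, non-negative ids.

```java
import java.util.BitSet;

// Hypothetical illustration, not Kylin code.
// java.util.BitSet stands in for RoaringBitmap (assumption made for this sketch).
final class BitmapDistinctSketch {
    private final BitSet bits = new BitSet();

    // Record one column value; a duplicate sets the same bit again and changes nothing.
    void add(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch assumes non-negative ids: " + value);
        }
        bits.set(value);
    }

    // Fold another partial result into this one with a bitwise OR.
    void merge(BitmapDistinctSketch other) {
        bits.or(other.bits);
    }

    // Exact number of distinct values recorded so far.
    int count() {
        return bits.cardinality();
    }

    public static void main(String[] args) {
        BitmapDistinctSketch a = new BitmapDistinctSketch();
        for (int id : new int[] {7, 42, 7, 1000}) {
            a.add(id);
        }
        BitmapDistinctSketch b = new BitmapDistinctSketch();
        for (int id : new int[] {42, 43}) {
            b.add(id);
        }
        a.merge(b);
        System.out.println(a.count()); // prints 4
    }
}
```

Because merging is a plain OR, partial bitmaps built separately can be combined without losing exactness, which is the property a measure like this would presumably rely on. A compressed bitmap matters in practice because an uncompressed BitSet grows with the largest id it holds.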