[
https://issues.apache.org/jira/browse/KYLIN-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yerui Sun updated KYLIN-1186:
-----------------------------
Attachment: KYLIN-1186-1.x-staging.patch
Here's the patch for branch 1.x-staging. Sorry for the slow response.
Some details about this patch:
* Supports integer-family columns (tinyint, smallint, int, bigint), but casts
Long to Integer when storing values into the bitmap, since the bitmap does not
currently support long;
* Besides the query engine code, the query test cases are also updated. The
return type of count(distinct seller_id) is changed from hllc(10) to bitmap in
test_case_data/localmeta/cube_desc, and query_distinct now runs with
execAndCompQuery instead of batchExecuteQuery in KylinQueryTest.
* I'm not sure whether BitmapCounter works well with the II Cube logic, and
some interfaces, such as peekLength(), are left to be implemented later. Since
the bitmap is never fixed-length, some work may still be needed here.
[[email protected]] [~Shaofengshi], please review this patch. I'll attach
the patch for branch 2.x-staging later.
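
For illustration, the Long-to-Integer cast described in the first point could
look roughly like the sketch below. This is not the code from the patch:
IntBitmapCounter is a hypothetical name, and java.util.BitSet stands in for the
real bitmap so the example has no dependencies. The sketch assumes that
out-of-range and negative values should be rejected; the comment above does not
say how the patch handles them.

```java
import java.util.BitSet;

// Hypothetical stand-in for a bitmap-backed counter; not the patch's BitmapCounter.
final class IntBitmapCounter {
    private final BitSet bits = new BitSet();

    // Narrow a long column value to int before setting its bit.
    public void add(long value) {
        // Math.toIntExact throws ArithmeticException if value does not fit in an int.
        int v = Math.toIntExact(value);
        if (v < 0) {
            // BitSet only accepts non-negative indexes (assumption of this sketch).
            throw new IllegalArgumentException("negative value: " + value);
        }
        bits.set(v);
    }

    // Merging two partial results is a bitwise OR of the bitmaps.
    public void merge(IntBitmapCounter other) {
        bits.or(other.bits);
    }

    // The precise distinct count is the number of set bits.
    public int getCount() {
        return bits.cardinality();
    }

    public static void main(String[] args) {
        IntBitmapCounter counter = new IntBitmapCounter();
        counter.add(3L);
        counter.add(3L);   // duplicate, counted once
        counter.add(42L);
        System.out.println(counter.getCount());
    }
}
```

Using a checked narrowing instead of a plain (int) cast makes overflow visible
rather than silently mapping two different long values to the same bit.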
> Support precise Count Distinct using bitmap
> -------------------------------------------
>
> Key: KYLIN-1186
> URL: https://issues.apache.org/jira/browse/KYLIN-1186
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Affects Versions: v1.1
> Reporter: Yerui Sun
> Assignee: Yerui Sun
> Fix For: v2.0, v1.3
>
> Attachments: KYLIN-1186-1.x-staging.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> For now, Kylin only supports approximate count distinct using HyperLogLog.
> In our production scenarios, there is strong demand for precise count
> distinct, mainly on columns of type int or bigint, such as user-id,
> product-id, etc.
> Implementing precise count distinct for all types is difficult and
> inefficient. However, supporting only int or bigint makes this much easier.
> The values can be projected into a bitmap, which is easy to compress, store,
> and count.
> I've created a POC based on RoaringBitmap, which proves that this works.
> There is some more work to be done:
> * RoaringBitmap only supports int, so a solution is needed to support bigint;
> * Add a new measure and codec, like HyperLogLogPlusCounter, to make it easy
> to use;
> * Add the new measure to the web UI, and check whether the column type is
> int or bigint;
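
One possible way to handle the bigint item above is to split each 64-bit value
into a high part that selects a bucket and a low part that indexes a 32-bit
bitmap inside that bucket. The sketch below only illustrates the idea and is
not necessarily the approach taken for this issue: LongBitmapCounter is a
hypothetical name, java.util.BitSet stands in for RoaringBitmap, and the 31-bit
low part is an assumption chosen so that the index is always a non-negative
int.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: precise distinct counting of long values using int-indexed bitmaps.
final class LongBitmapCounter {
    private static final int LOW_BITS = 31;
    private static final long LOW_MASK = (1L << LOW_BITS) - 1;

    // One bitmap per distinct high part.
    private final Map<Long, BitSet> buckets = new HashMap<>();

    public void add(long value) {
        long high = value >>> LOW_BITS;      // upper 33 bits, unsigned shift
        int low = (int) (value & LOW_MASK);  // lower 31 bits, always non-negative
        buckets.computeIfAbsent(high, k -> new BitSet()).set(low);
    }

    // Each (high, low) pair maps back to exactly one long, so the sum of
    // per-bucket cardinalities is the precise distinct count.
    public long getCount() {
        long total = 0;
        for (BitSet bucket : buckets.values()) {
            total += bucket.cardinality();
        }
        return total;
    }

    public static void main(String[] args) {
        LongBitmapCounter counter = new LongBitmapCounter();
        counter.add(7L);
        counter.add((1L << 40) + 7L);  // same low bits, different bucket
        counter.add(7L);               // duplicate
        System.out.println(counter.getCount());
    }
}
```

The cost of this layout is one map entry per distinct high part, so it works
best when values cluster in a limited number of ranges.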