[ https://issues.apache.org/jira/browse/KYLIN-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nichunen resolved KYLIN-3487.
-----------------------------
    Resolution: Fixed

> Create a new measure for precise count distinct
> -----------------------------------------------
>
>                 Key: KYLIN-3487
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3487
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>            Priority: Major
>             Fix For: v3.1.0
>
>
> To compute the precise count distinct, we can use a bitmap together with a
> global dictionary. However, the global dictionary has a limitation: it maps
> values to ids of integer type, which means the number of ids must stay
> below about 2B. It is also like a Pixiu: it only ever grows and never
> shrinks.
> At eBay, there is a requirement to calculate the precise count distinct of
> sessions. The session cardinality is large and will keep growing over time,
> so the global dictionary will no longer be feasible once its cardinality
> exceeds the 2B upper bound. How can we deal with this?
> The good news is that a session never crosses days. Given this property, we
> do not need to merge bitmaps across days. To calculate the precise session
> cardinality, we can assign each day its own bitmap and directly sum the
> cardinalities given by each bitmap. No bitmap merge is needed.
> To use a bitmap for cardinality calculation, we need to map each raw value
> to an integer id, which is achieved by encoding the value with a dictionary.
> Previously, the global dictionary was used so that bitmaps from multiple
> segments could be merged. In this case, however, no bitmap merge is needed,
> so the global dictionary is not needed either.
> We also do not need to filter by or group by session, so there is no need to
> map from value to id or from id to value once the related bitmap has been
> constructed. Therefore, we do not need to store dictionaries for session;
> the bitmap alone is enough.
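
For readers of the archive, here is a rough sketch of the per-day idea described in the issue. It is not Kylin's implementation: the function names `build_day_bitmap` and `count_distinct` are made up for illustration, the session ids and dates are invented, and a plain Python set of ints stands in for a real compressed bitmap. Each day gets a throwaway local id mapping, so ids restart at 0 every day and the per-day counts are summed rather than OR-ed together.

```python
from typing import Dict, Iterable, Set


def build_day_bitmap(values: Iterable[str]) -> Set[int]:
    """Encode one day's values with a local, throwaway dictionary.

    Returns the set of assigned ids (a stand-in for a bitmap).
    The value-to-id mapping is discarded, since nothing needs to
    decode ids back to session values afterwards.
    """
    local_ids: Dict[str, int] = {}
    bitmap: Set[int] = set()
    for value in values:
        # First sighting of a value gets the next dense id: 0, 1, 2, ...
        bitmap.add(local_ids.setdefault(value, len(local_ids)))
    return bitmap


def count_distinct(bitmaps_by_day: Dict[str, Set[int]]) -> int:
    """Sum per-day cardinalities; valid only if no value spans two days."""
    return sum(len(bitmap) for bitmap in bitmaps_by_day.values())


# Hypothetical input: session ids grouped by day (a session never crosses days).
sessions_by_day = {
    "2018-08-01": ["s1", "s2", "s1"],
    "2018-08-02": ["s3", "s4", "s5", "s3"],
}
bitmaps_by_day = {day: build_day_bitmap(v) for day, v in sessions_by_day.items()}

print(count_distinct(bitmaps_by_day))  # 5
```

Because the ids are local to each day, OR-ing the two bitmaps in this example would give 3 rather than 5, which is why the bitmaps have to be kept apart and only their cardinalities combined.
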
> To deal with segment merge, since the bitmaps of the individual segments
> cannot be merged into one bitmap, we use a map to store multiple bitmaps.
> In the map, the key is the segment name and the value is the segment-level
> bitmap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)