[ https://issues.apache.org/jira/browse/KYLIN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641080#comment-16641080 ]
Shaofeng SHI commented on KYLIN-3491:
-------------------------------------

Hi Ruslan, this optimization applies only to cube building; it has no impact on query performance. For high-cardinality columns, query performance will not be good, because the bitmap is stored in a compressed format; please avoid heavy post-aggregation at query time. The main difficulty might be building the global dictionary if the data type is not integer.

> Improve the cube building process when using global dictionary
> --------------------------------------------------------------
>
>                 Key: KYLIN-3491
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3491
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>            Priority: Major
>             Fix For: v2.5.0
>
>         Attachments: APACHE-KYLIN-3491-with-fix.patch, APACHE-KYLIN-3491.patch
>
>
> In the current cubing process, if the global dictionary is very large, the
> raw data records are unsorted, so it is hard to encode raw values into ids
> for the bitmap input because the dictionary slices are swapped frequently.
> We need a refined process. The idea is as follows:
> # for each source data block, a mapper generates the distinct values and
> sorts them
> # encode the sorted distinct values and generate a shrunken dict for each
> source data block
> # when building the base cuboid, use the shrunken dict of each source data
> block for encoding
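
The three steps in the quoted issue description can be illustrated with a small Python sketch. This is not Kylin's implementation: the in-memory `global_dict`, the sample blocks, and the function names `distinct_sorted`, `shrink_dict` and `encode_block` are all hypothetical, chosen only to show how a per-block shrunken dictionary lets unsorted rows be encoded without repeatedly consulting the large global dictionary.

```python
from typing import Dict, Iterable, List


def distinct_sorted(block: Iterable[str]) -> List[str]:
    """Step 1: collect the distinct values of one source data block and sort them."""
    return sorted(set(block))


def shrink_dict(global_dict: Dict[str, int], sorted_values: List[str]) -> Dict[str, int]:
    """Step 2: look up each sorted distinct value once in the global dictionary.

    Assumption: a real global dictionary is split into value-ordered slices,
    so sorted lookups would visit the slices in order instead of swapping
    them back and forth. A plain dict stands in for it here.
    """
    return {value: global_dict[value] for value in sorted_values}


def encode_block(block: Iterable[str], shrunken: Dict[str, int]) -> List[int]:
    """Step 3: encode the raw, unsorted rows using only the small per-block dict."""
    return [shrunken[value] for value in block]


# Hypothetical stand-in for a large global dictionary (value -> integer id).
global_dict = {"alice": 0, "bob": 1, "carol": 2, "dave": 3, "erin": 4}

# Two hypothetical source data blocks with unsorted, repeated values.
blocks = [
    ["carol", "alice", "carol", "bob"],
    ["erin", "dave", "erin"],
]

encoded = []
for block in blocks:
    shrunken = shrink_dict(global_dict, distinct_sorted(block))
    encoded.append(encode_block(block, shrunken))

print(encoded)
```

Each block touches the global dictionary once per distinct value, in sorted order; the row-by-row encoding then reads only the shrunken dictionary, which holds just the values that actually occur in that block.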