[ https://issues.apache.org/jira/browse/KYLIN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592327#comment-16592327 ]
ASF subversion and git services commented on KYLIN-3491:
--------------------------------------------------------

Commit 637f45d8444c7b52713780c1701d33d6656fffc0 in kylin's branch refs/heads/master from Zhong
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=637f45d ]

KYLIN-3491 add a shrunken global dictionary step to improve the encoding process

Signed-off-by: shaofengshi <shaofeng...@apache.org>


> Improve the cube building process when using global dictionary
> --------------------------------------------------------------
>
>                 Key: KYLIN-3491
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3491
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>            Priority: Major
>             Fix For: v2.5.0
>
>         Attachments: APACHE-KYLIN-3491.patch
>
>
> In the current cubing process, the raw data records are unsorted, so when the
> global dictionary is very large it is hard to encode raw values into ids for
> the bitmap input, because the dictionary slices are swapped frequently. We
> need a refined process. The idea is as follows:
> # For each source data block, a mapper generates the distinct values and
> sorts them.
> # Encode the sorted distinct values and generate a shrunken dict for each
> source data block.
> # When building the base cuboid, use the shrunken dict of each source data
> block for encoding.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
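
The three steps in the issue description can be illustrated with a minimal sketch. This is not the code from the attached patch: the global dictionary is modelled as a plain in-memory Python dict rather than Kylin's sliced on-disk structure, and the names `build_global_dict`, `build_shrunken_dict`, and `encode_block` are hypothetical, chosen only for this example.

```python
from typing import Dict, Iterable, List


def build_global_dict(values: Iterable[str]) -> Dict[str, int]:
    """Hypothetical stand-in for the global dictionary: sorted value -> integer id."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def build_shrunken_dict(block: List[str], global_dict: Dict[str, int]) -> Dict[str, int]:
    """Steps 1 and 2: collect the block's distinct values, sort them, and
    look each one up once in the global dictionary."""
    distinct_sorted = sorted(set(block))
    # Sorted lookups would touch the slices of a real, sliced dictionary in order;
    # here the global dictionary is just an in-memory dict (an assumption).
    return {v: global_dict[v] for v in distinct_sorted}


def encode_block(block: List[str], shrunken: Dict[str, int]) -> List[int]:
    """Step 3: encode the unsorted records using only the small per-block dict."""
    return [shrunken[v] for v in block]


# Two made-up source data blocks with unsorted, repeating values.
blocks = [
    ["u9", "u2", "u9", "u5"],
    ["u1", "u5", "u1"],
]
global_dict = build_global_dict(v for b in blocks for v in b)
shrunken_dicts = [build_shrunken_dict(b, global_dict) for b in blocks]
encoded = [encode_block(b, d) for b, d in zip(blocks, shrunken_dicts)]
```

Each shrunken dict holds only the values that occur in its block, yet the ids it yields are the same ones the global dictionary would assign, so the encoded records stay consistent across blocks.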