[ https://issues.apache.org/jira/browse/KYLIN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950415#comment-15950415 ]
hongbin ma commented on KYLIN-2506:
-----------------------------------

Today we use trie dict as the default encoding for precise distinct count; however, global dictionary seems to be a better default choice. I do understand that model designers need trie dict in some cases (where the global dict may grow too large), but we can hide it in the advanced settings. There might be a little more work to keep backward compatibility, but I still think it's manageable.

> Refactor Global Dictionary
> --------------------------
>
>                 Key: KYLIN-2506
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2506
>             Project: Kylin
>          Issue Type: Improvement
>          Components: General
>    Affects Versions: v2.0.0
>            Reporter: kangkaisen
>            Assignee: kangkaisen
>             Fix For: v2.0.0
>
>
> The main points of this refactor:
> 1 Fix the bug that the RemoveListener of LoadingCache swallowed any
> exceptions when building the GlobalDict.
> 2 Fix the bug that the HDFS filename of DictSliceKey had illegal characters.
> 3 Fix the bug that the HDFS filename of DictSliceKey may be longer than 255.
> 4 Fix the bug that DictNode split failed if the value length was greater
> than 255 bytes.
> 5 Decouple the build and query of GlobalDict:
> Abstract the builder of AppendTrieDictionary to AppendTrieDictionaryBuilder;
> Add LoadingCache to AppendTrieDictionary and make AppendTrieDictionary
> read-only.
> 6 Remove the dependence on LoadingCache when building the GlobalDict.
> 7 Abstract the HDFS operations to GlobalDictStore.
> 8 Abstract the metadata of GlobalDict to GlobalDictMetadata.
> 9 Delete CachedTreeMap.
> 10 Remove support for multithreaded concurrent builds; I will add a
> distributed lock for GlobalDict later.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)