[ 
https://issues.apache.org/jira/browse/KYLIN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950415#comment-15950415
 ] 

hongbin ma commented on KYLIN-2506:
-----------------------------------

Today we use the trie dict as the default encoding for precise distinct count, 
but the global dictionary seems to be a better default choice. I understand 
that model designers need the trie dict in some cases (where the global dict 
may grow too large), but we can move that option into the advanced settings. 
Keeping backward compatibility might take a little more work, but I think it's 
manageable. 

> Refactor Global Dictionary
> --------------------------
>
>                 Key: KYLIN-2506
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2506
>             Project: Kylin
>          Issue Type: Improvement
>          Components: General
>    Affects Versions: v2.0.0
>            Reporter: kangkaisen
>            Assignee: kangkaisen
>             Fix For: v2.0.0
>
>
> The main points of this refactor:
> 1 Fix the bug that the RemovalListener of LoadingCache swallowed any 
> exceptions when building the GlobalDict.
> 2 Fix the bug that the HDFS filename of DictSliceKey contained illegal 
> characters.
> 3 Fix the bug that the HDFS filename of DictSliceKey could be longer than 255.
> 4 Fix the bug that the DictNode split failed if the value length was greater 
> than 255 bytes.
> 5 Decouple the build and query of GlobalDict: 
> extract the builder of AppendTrieDictionary into AppendTrieDictionaryBuilder; 
> add a LoadingCache to AppendTrieDictionary and make AppendTrieDictionary 
> read-only.
> 6 Remove the dependence on LoadingCache when building the GlobalDict.
> 7 Abstract the HDFS operations into GlobalDictStore.
> 8 Abstract the metadata of GlobalDict into GlobalDictMetadata.
> 9 Delete CachedTreeMap.
> 10 Remove support for multithreaded concurrent builds; I will add a 
> distributed lock for GlobalDict later.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
