[ 
https://issues.apache.org/jira/browse/KYLIN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533424#comment-16533424
 ] 

kangkaisen commented on KYLIN-3430:
-----------------------------------

Hi [~temple.zhou] [~Shaofengshi], the Global Dictionary cleanup is not an issue.

The Global Dictionary deletes expired version directories when it commits. If 
you think keeping three versions is redundant, you only need to set 
"kylin.dictionary.append-max-versions" to 1.
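
To illustrate the retention rule, here is a minimal Python sketch, not Kylin's 
actual Java code. It assumes version directories are named "version_<number>" 
and that a larger number means a newer version; the function name 
expired_versions is hypothetical.

```python
def expired_versions(version_dirs, max_versions):
    """Return the version directories that fall outside the newest max_versions.

    Assumption: each name looks like "version_<number>" and a larger
    number means a newer version. This is an illustration only.
    """
    if max_versions < 1:
        raise ValueError("max_versions must be at least 1")
    # Sort by the numeric suffix, oldest first.
    ordered = sorted(version_dirs, key=lambda name: int(name.rsplit("_", 1)[1]))
    # Everything except the newest max_versions entries is expired.
    return ordered[:-max_versions]
```

With three versions on disk and the limit set to 1, the two oldest directories 
would be reported as expired; with the limit set to 3, none would be.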

 

As for the lock issue, you can refer to 
https://issues.apache.org/jira/browse/KYLIN-2506

{quote}

So, up to now, how do we ensure the correctness of the global dict in a 
distributed env?

1. Distributed lock: it ensures that only one thread can write the global dict 
at a time.
2. MVCC: we write the global dict in the working dir and read the global dict 
from the versions dir.
3. Every time we read the global dict, we construct the 
AppendTrieDictionary from the metadata in the latestVersion dir.

Based on the above 3 points, we can ensure that global dict writes are 
sequential and reads are parallel in a distributed env.

{quote}
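
The three points quoted above can be sketched in a few lines of Python. This is 
only an illustration under stated assumptions, not Kylin's implementation: a 
local threading.Lock stands in for the distributed lock, a JSON file stands in 
for the AppendTrieDictionary metadata, and the class name GlobalDictStore, the 
file name dict.json, and the "version_<n>" directory layout are all 
hypothetical.

```python
import json
import os
import shutil
import threading


class GlobalDictStore:
    """Toy model: single writer, versioned snapshots, lock-free readers."""

    def __init__(self, base_dir):
        self.base_dir = base_dir
        self.working_dir = os.path.join(base_dir, "working")
        # Stand-in for the distributed lock (assumption: one process only).
        self._write_lock = threading.Lock()
        os.makedirs(self.working_dir, exist_ok=True)

    def _version_dirs(self):
        # Published snapshots, oldest first.
        names = [n for n in os.listdir(self.base_dir) if n.startswith("version_")]
        return sorted(names, key=lambda n: int(n.split("_")[1]))

    def read_latest(self):
        # Readers never touch the working dir; they rebuild the dictionary
        # from the newest published version every time.
        versions = self._version_dirs()
        if not versions:
            return {}
        path = os.path.join(self.base_dir, versions[-1], "dict.json")
        with open(path) as f:
            return json.load(f)

    def append(self, values):
        # Only one writer at a time may build and publish a new version.
        with self._write_lock:
            mapping = self.read_latest()
            for value in values:
                mapping.setdefault(value, len(mapping) + 1)
            with open(os.path.join(self.working_dir, "dict.json"), "w") as f:
                json.dump(mapping, f)
            versions = self._version_dirs()
            next_id = int(versions[-1].split("_")[1]) + 1 if versions else 1
            # Copy under a temporary name, then rename, so readers never
            # see a half-written version directory.
            tmp = os.path.join(self.base_dir, f".tmp_version_{next_id}")
            shutil.copytree(self.working_dir, tmp)
            os.rename(tmp, os.path.join(self.base_dir, f"version_{next_id}"))
```

Because readers only open directories that have already been renamed into 
place, they need no lock, which is what allows parallel reads alongside 
sequential writes.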

Thank you.

> Global Dictionary Cleanup
> -------------------------
>
>                 Key: KYLIN-3430
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3430
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Tools, Build and Test
>    Affects Versions: v2.1.0, v2.2.0, v2.3.0, v2.3.1, v2.4.0
>            Reporter: Temple Zhou
>            Assignee: Temple Zhou
>            Priority: Major
>         Attachments: KYLIN-3430.master.001.patch
>
>
> I had run "./bin/metastore.sh clean --delete true" to clean up my Kylin 
> metadata, but after that the Global Dictionary still exists in my HDFS and 
> the size of the directory "/kylin_metadata/resources/GlobalDict/dict" hasn't 
> shrunk.
>  
> BTW: I'm very sure that there are redundant Global Dictionaries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
