[ https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16152630#comment-16152630 ]
kangkaisen commented on KYLIN-2764:
-----------------------------------

I have rebased KYLIN-2622 and KYLIN-2764 on the master branch. Both issues concern the global dict, so I put the two commits on one branch, 2622-2764, and ran the integration tests (IT) for them together.

> Build the dict for UHC column with MR
> -------------------------------------
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Affects Versions: v2.0.0
> Reporter: kangkaisen
> Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 builds the dict for normal columns with MR, but the dict for a
> UHC (ultra-high-cardinality) column is still built in the JobServer. As in
> KYLIN-2217, we could also use MR to build the dict for UHC columns. This
> would relieve the memory pressure on the JobServer, improve its job
> concurrency, and speed up dict building when a cube has multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns"; the MR
> output is the UHC column dict. Because it is very hard to build a global dict
> with multiple reducers, I use one reducer per UHC column and allocate enough
> memory to that reducer. According to my test, 8 GB of memory is enough.
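
The one-reducer-per-UHC-column scheme described in the issue can be illustrated with a minimal sketch. It is plain Java with no Hadoop or Kylin dependencies, and it is not the code from the patch: the class `UhcDictSketch`, the methods `partitionFor` and `buildDict`, and the sample values are all hypothetical. The sorted-map dictionary is only a stand-in for Kylin's actual global dict structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: names are illustrative, not Kylin or Hadoop classes.
class UhcDictSketch {

    // One reducer per UHC column: the reducer index is simply the column's
    // position in the list of UHC columns (assumption for this sketch).
    static int partitionFor(int uhcColumnIndex, int numReducers) {
        if (uhcColumnIndex < 0 || uhcColumnIndex >= numReducers) {
            throw new IllegalArgumentException("expected one reducer per UHC column");
        }
        return uhcColumnIndex;
    }

    // Stand-in for the reducer: it sees every distinct value of a single
    // column, so it can assign consecutive IDs without any coordination.
    static Map<String, Integer> buildDict(Iterable<String> distinctValues) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String v : distinctValues) {
            sorted.add(v);
        }
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (String v : sorted) {
            dict.put(v, dict.size()); // next unused ID
        }
        return dict;
    }

    public static void main(String[] args) {
        // Made-up distinct values for two UHC columns, standing in for the
        // output of "Extract Fact Table Distinct Columns".
        List<List<String>> distinctValuesPerColumn = Arrays.asList(
                Arrays.asList("u3", "u1", "u2"),
                Arrays.asList("s9", "s7"));

        int numReducers = distinctValuesPerColumn.size();
        List<Map<String, Integer>> reducerOutputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducerOutputs.add(null);
        }
        for (int col = 0; col < numReducers; col++) {
            int reducer = partitionFor(col, numReducers);
            reducerOutputs.set(reducer, buildDict(distinctValuesPerColumn.get(col)));
        }
        System.out.println(reducerOutputs);
    }
}
```

Because each column's values all reach a single reducer, that reducer holds the whole dict for its column, which is why the issue allocates a large amount of memory to it.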