[ https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151478#comment-16151478 ]
kangkaisen commented on KYLIN-2764:
-----------------------------------

This is the commit: https://github.com/apache/kylin/commit/2607e18b5e17d2a68f4079a76b8c990f144cbbd6.

The core idea is simple, but there are four special points to note:
1. The FK column in the fact table could be a UHC column.
2. We cannot get the correct HDFS working dir from KylinConfig in MR.
3. One or all of the UHC columns may be NULL.
4. The Reducer's setup phase may time out because of copying and locking the global dict.

> Build the dict for UHC column with MR
> -------------------------------------
>
>                 Key: KYLIN-2764
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v2.0.0
>            Reporter: kangkaisen
>            Assignee: kangkaisen
>
> KYLIN-2217 built the dict for normal columns with MR, but UHC columns
> still build their dicts in the JobServer. As in KYLIN-2217, we could also
> use MR to build the dict for UHC columns, which would thoroughly relieve
> the memory pressure on the JobServer, improve its job concurrency, and
> speed up the procedure when there are multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns", and
> the MR output is the UHC column dict. Because it is very hard to build a
> global dict with multiple reducers, I use one reducer per UHC column and
> allocate enough memory to that reducer. According to my test, 8G of
> memory is enough.
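The one-reducer-per-UHC-column layout described in the issue can be illustrated with a small, self-contained simulation. This is a hedged sketch under stated assumptions, not Kylin's implementation: the `<columnIndex>\t<value>` key layout and the names `encodeKey`, `getPartition`, `shuffle` and `buildDict` are hypothetical, Java `null` stands in for a NULL cell, and a sorted map stands in for Kylin's real dictionary classes. It also shows one possible treatment of point 3: NULL values are never emitted, so an all-NULL column reaches its reducer as empty input and yields an empty dict.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not Kylin's code: route each UHC column's distinct
// values to its own reducer and build an order-preserving dict per column.
public class UhcDictSketch {

    // Hypothetical key layout emitted by the mapper: "<columnIndex>\t<value>".
    static String encodeKey(int columnIndex, String value) {
        return columnIndex + "\t" + value;
    }

    // Partition on the column index only, so all values of one UHC column
    // reach the same reducer; with numReducers == number of UHC columns,
    // each reducer owns exactly one column.
    static int getPartition(String key, int numReducers) {
        int columnIndex = Integer.parseInt(key.substring(0, key.indexOf('\t')));
        return columnIndex % numReducers;
    }

    // Simulates map + shuffle: NULL cells (Java null here) are never emitted,
    // and a TreeSet per reducer stands in for the sorted, de-duplicated input.
    static List<TreeSet<String>> shuffle(List<String[]> rows, int numUhcColumns) {
        List<TreeSet<String>> perReducer = new ArrayList<>();
        for (int i = 0; i < numUhcColumns; i++) {
            perReducer.add(new TreeSet<>());
        }
        for (String[] row : rows) {
            for (int col = 0; col < numUhcColumns; col++) {
                if (row[col] == null) {
                    continue; // skip NULLs; a column may be entirely NULL
                }
                String key = encodeKey(col, row[col]);
                String value = key.substring(key.indexOf('\t') + 1);
                perReducer.get(getPartition(key, numUhcColumns)).add(value);
            }
        }
        return perReducer;
    }

    // Reducer side: assign consecutive IDs in sorted order. An all-NULL column
    // gives empty input and therefore an empty dict instead of a failure.
    static Map<String, Integer> buildDict(TreeSet<String> sortedValues) {
        Map<String, Integer> dict = new TreeMap<>();
        int id = 0;
        for (String v : sortedValues) {
            dict.put(v, id++);
        }
        return dict;
    }

    public static void main(String[] args) {
        // Two UHC columns; the second one is entirely NULL.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"u3", null});
        rows.add(new String[] {"u1", null});
        rows.add(new String[] {"u3", null});
        List<TreeSet<String>> inputs = shuffle(rows, 2);
        for (int r = 0; r < inputs.size(); r++) {
            System.out.println("reducer " + r + ": " + buildDict(inputs.get(r)));
        }
    }
}
```

Running `main` prints `reducer 0: {u1=0, u3=1}` and `reducer 1: {}`. Partitioning on the column index rather than on the whole key is what guarantees that a single reducer sees every value of a column, which matters here because, as the issue notes, building one global dict across several reducers is hard.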