[ https://issues.apache.org/jira/browse/KYLIN-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624616#comment-15624616 ]
Shaofeng SHI commented on KYLIN-2135:
-------------------------------------

Hi Kaisen,

1. UHC (ultra-high-cardinality) columns are very common. Today Kylin has no marker for them; we can add one later (today a "shard by = true" column is UHC). By default we can start multiple (say 3 or 4) reducers for a UHC column, so the user doesn't need any extra configuration.
2. I don't know how big a global dictionary can be; I am just curious about that, and it is not required. If it is too big, loading it into the Mapper might not be a good idea.
4. For close(), you can catch the IOException and let the iterator continue to the end, since no extra handling can be done for the exception. I re-read the next() method; its functionality is okay, but it can be improved for better readability.

Thanks very much for your contribution!

> Enlarge FactDistinctColumns reducer number
> ------------------------------------------
>
>                 Key: KYLIN-2135
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2135
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v1.5.4.1
>            Reporter: kangkaisen
>            Assignee: kangkaisen
>         Attachments: KYLIN-2135.patch, new.png, old.png
>
>
> When the Hive table has billions of rows and uses a global dictionary for
> precise count distinct measures, the {{Extract Fact Table Distinct Columns}}
> job will run for a long time.
> So we could use more reducers to handle that one column.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
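
Point 1 of the comment (several reducers for one UHC column, one reducer for every other column) can be sketched roughly as below. This is only an illustration of the partitioning idea, not Kylin's actual partitioner: the function names, the `uhc_columns` set, and the default of 3 reducers per UHC column are all assumptions made for the example.

```python
import zlib


def build_reducer_ranges(num_columns, uhc_columns, reducers_per_uhc=3):
    """Assign each column a starting reducer id.

    Hypothetical layout: a normal column gets one reducer, a UHC column
    gets `reducers_per_uhc` consecutive reducers.
    Returns (list of start ids per column, total reducer count).
    """
    starts = []
    next_id = 0
    for col in range(num_columns):
        starts.append(next_id)
        next_id += reducers_per_uhc if col in uhc_columns else 1
    return starts, next_id


def pick_reducer(col, value, starts, uhc_columns, reducers_per_uhc=3):
    """Choose the reducer for one (column, value) pair."""
    if col not in uhc_columns:
        # Normal column: all of its values go to its single reducer.
        return starts[col]
    # UHC column: spread values over the column's reducers with a
    # deterministic hash, so equal values always meet in the same reducer.
    bucket = zlib.crc32(value.encode("utf-8")) % reducers_per_uhc
    return starts[col] + bucket


if __name__ == "__main__":
    uhc = {1}  # assume column 1 is the UHC column
    starts, total = build_reducer_ranges(4, uhc)
    print("reducers needed:", total)
    for v in ("user_1", "user_2", "user_3"):
        print(v, "->", pick_reducer(1, v, starts, uhc))
```

Because each reducer of a UHC column then sees only a disjoint subset of that column's distinct values, the per-reducer outputs would still have to be combined before a single dictionary can be built; how that is done is outside this sketch.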