[jira] [Commented] (KYLIN-2135) Enlarge FactDistinctColumns reducer number

Shaofeng SHI (JIRA) Tue, 01 Nov 2016 00:25:50 -0700

    [ 
https://issues.apache.org/jira/browse/KYLIN-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624616#comment-15624616
 ]


Shaofeng SHI commented on KYLIN-2135:
-------------------------------------

Hi Kaisen,

1. UHC (ultra-high-cardinality) columns are very common. Today Kylin has no 
marker for them, although we can add one later (for now, the column with 
"shard by = true" is the UHC one). By default we can start multiple reducers 
(say 3 or 4) for a UHC column, so the user needs no extra configuration.

2. I don't know how big a global dictionary can be. I am just curious about 
that; it is not something you are required to do. If the dictionary is too 
big, loading it into the Mapper might not be a good idea.

4. For close(), you can catch the IOException and let the iterator continue to 
the end, since nothing more can be done to handle the exception. I re-read the 
next() method: its functionality is okay, but it could be made more readable.
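
To illustrate the close() suggestion, here is a minimal sketch, not the code 
in the patch. The class ChainedSourceIterator, the Source interface and the 
listSource() helper are hypothetical names made up for this example. It only 
assumes an iterator that walks several closeable sources in turn: a failure in 
close() is recorded and iteration moves on to the next source.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical example class; none of these names come from Kylin.
class ChainedSourceIterator implements Iterator<String> {

    /** A source of values that should be closed once it is drained. */
    public interface Source extends Closeable {
        Iterator<String> values();
    }

    private final Iterator<? extends Source> sources;
    private Source current;
    private Iterator<String> currentValues;
    private int closeFailures = 0;

    ChainedSourceIterator(List<? extends Source> sources) {
        this.sources = sources.iterator();
    }

    @Override
    public boolean hasNext() {
        // Advance until a source with remaining values is found.
        while (currentValues == null || !currentValues.hasNext()) {
            closeCurrentQuietly();
            if (!sources.hasNext()) {
                return false;
            }
            current = sources.next();
            currentValues = current.values();
        }
        return true;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return currentValues.next();
    }

    /** Closes the drained source; an IOException does not stop iteration. */
    private void closeCurrentQuietly() {
        if (current == null) {
            return;
        }
        try {
            current.close();
        } catch (IOException e) {
            // Nothing more can be done here, so record it and carry on.
            closeFailures++;
            System.err.println("Ignoring close() failure: " + e.getMessage());
        } finally {
            current = null;
            currentValues = null;
        }
    }

    int getCloseFailures() {
        return closeFailures;
    }

    /** Helper for trying it out: an in-memory source whose close() may fail. */
    static Source listSource(final boolean failOnClose, final String... values) {
        return new Source() {
            @Override
            public Iterator<String> values() {
                return Arrays.asList(values).iterator();
            }

            @Override
            public void close() throws IOException {
                if (failOnClose) {
                    throw new IOException("simulated close failure");
                }
            }
        };
    }
}
```

Keeping the try/catch in one small private method also leaves next() short, 
which is the kind of readability improvement I had in mind.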

Thanks very much for your contribution! 

> Enlarge FactDistinctColumns reducer number
> ------------------------------------------
>
>                 Key: KYLIN-2135
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2135
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v1.5.4.1
>            Reporter: kangkaisen
>            Assignee: kangkaisen
>         Attachments: KYLIN-2135.patch, new.png, old.png
>
>
> When the Hive table has billions of rows and uses a global dictionary for 
> precise count distinct measures, the {{Extract Fact Table Distinct Columns}} 
> job will run for a long time.
> So we could use more reducers to deal with that one column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
