[ 
https://issues.apache.org/jira/browse/KYLIN-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624044#comment-15624044
 ] 

Shaofeng SHI commented on KYLIN-2135:
-------------------------------------

I have a couple of comments on the patch:

1. This function will improve performance whenever there is a UHC column, 
regardless of whether it uses a Global Dictionary or a normal Dictionary. It 
would be great if it could be made generic, for example by adding a method 
like isUHC(); a column that uses a Global Dict is one kind of UHC column, and 
more cases can be added there later.

2. For a Global Dictionary, do only the new values need to be built into the 
dict each time? If so, is it possible to exclude the existing values on the 
mapper side to reduce the IO?

3. The check "if (reducerIndex > 255)" can be moved to before the job is 
submitted to Hadoop (it is currently inside the mapper).

4. In the new DFSFileTableReader.java, the close() method should catch the 
exception and continue the loop. The next() method also seems incorrect, as 
each time it will read from the first element in "readerList" (I couldn't 
apply this patch, so I might be wrong; please double check).
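
To make comment 4 concrete, here is a minimal sketch of what I mean. It is not 
the code from the patch: RowReader, ListRowReader and ChainedRowReader are 
hypothetical names made up for this example, and only "readerList" comes from 
the patch. next() keeps an index so it moves on to the following reader instead 
of restarting from the first one, and close() keeps going when one reader fails.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a per-file reader; not Kylin's actual interface. */
interface RowReader extends Closeable {
    boolean next() throws IOException; // advance one row; false when exhausted
    String getRow();                   // valid only after next() returned true
}

/** Hypothetical in-memory reader, used only to exercise the sketch. */
class ListRowReader implements RowReader {
    private final Iterator<String> it;
    private final boolean failOnClose;
    private String current;
    boolean closed = false;

    ListRowReader(List<String> rows, boolean failOnClose) {
        this.it = rows.iterator();
        this.failOnClose = failOnClose;
    }

    @Override
    public boolean next() {
        if (!it.hasNext()) {
            return false;
        }
        current = it.next();
        return true;
    }

    @Override
    public String getRow() {
        return current;
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (failOnClose) {
            throw new IOException("simulated close failure");
        }
    }
}

/** Reads several readers one after another, as if they were a single reader. */
class ChainedRowReader implements RowReader {
    private final List<RowReader> readerList;
    private int currentIndex = 0; // reader currently being consumed

    ChainedRowReader(List<RowReader> readerList) {
        this.readerList = new ArrayList<>(readerList);
    }

    @Override
    public boolean next() throws IOException {
        // Advance past exhausted readers rather than always using element 0.
        while (currentIndex < readerList.size()) {
            if (readerList.get(currentIndex).next()) {
                return true;
            }
            currentIndex++;
        }
        return false;
    }

    @Override
    public String getRow() {
        return readerList.get(currentIndex).getRow();
    }

    @Override
    public void close() throws IOException {
        IOException first = null;
        for (RowReader reader : readerList) {
            try {
                reader.close();
            } catch (IOException e) {
                // Remember the failure but keep closing the remaining readers.
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Rethrowing the first exception after the loop is one choice among several; 
logging it and returning normally would also satisfy "catch and continue".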

> Enlarge FactDistinctColumns reducer number
> ------------------------------------------
>
>                 Key: KYLIN-2135
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2135
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v1.5.4.1
>            Reporter: kangkaisen
>            Assignee: kangkaisen
>         Attachments: KYLIN-2135.patch, new.png, old.png
>
>
> When the hive table has billions of rows and uses a global dictionary for 
> precise count distinct measures, the {{Extract Fact Table Distinct Columns}} 
> job will run for a long time.
> So we could use more reducers to deal with that one column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
