[ https://issues.apache.org/jira/browse/KYLIN-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289234#comment-16289234 ]

Dong Li edited comment on KYLIN-2866 at 12/13/17 1:11 PM:
----------------------------------------------------------

Hi [~yaho], in this patch I found several pieces of duplicated code between 
org.apache.kylin.engine.mr.steps.SaveStatisticsStep#doWork and 
org.apache.kylin.engine.mr.common.CubeStatsReader, which will make the code 
more complex. Could you refine it?

Also, org.apache.kylin.engine.mr.common.MapReduceUtil#getHLLShardBase 
seems to calculate the reducer number. What does "HLLShardBase" mean here?


was (Author: lidong_sjtu):
Hi [~yaho], in this patch I found several pieces of duplicated code between 
org.apache.kylin.engine.mr.steps.SaveStatisticsStep#doWork and 
org.apache.kylin.engine.mr.common.CubeStatsReader, which makes the code complex. 
Could you refine it?

Also, org.apache.kylin.engine.mr.common.MapReduceUtil#getHLLShardBase 
seems to calculate the reducer number. What does "HLLShardBase" mean here?

> Enlarge the reducer number for hyperloglog statistics calculation at step 
> FactDistinctColumnsJob
> ------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-2866
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2866
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>             Fix For: v2.3.0
>
>         Attachments: APACHE-KYLIN-2866.patch
>
>
> Currently only one reducer is assigned for HLL stats calculation, which may 
> become a bottleneck that slows down this step. Since the stats for different 
> cuboids do not influence each other, it's better to divide the cuboid set 
> into several subsets and assign a reducer to each subset.
> The strategy of this patch is to assign 100 cuboids to each subset. There 
> is also an upper limit on the number of reducers for HLL stats calculation, 
> currently 50.
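
The splitting strategy in the description can be sketched as follows. This is a minimal illustration, not the code from APACHE-KYLIN-2866.patch: the class HllReducerSketch, the methods reducerCount and partition, and the constants CUBOIDS_PER_REDUCER and MAX_HLL_REDUCERS are hypothetical names. Only the values 100 and 50 come from the description. The behaviour above the cap (letting each subset grow beyond 100 cuboids so that every cuboid still lands on one of the 50 reducers) is an assumption, since the description does not say what happens there.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting HLL stats work across reducers.
// Not Kylin's actual API; names are illustrative only.
public class HllReducerSketch {
    // Values from the KYLIN-2866 description; constant names are hypothetical.
    public static final int CUBOIDS_PER_REDUCER = 100;
    public static final int MAX_HLL_REDUCERS = 50;

    // reducers = ceil(cuboidCount / 100), clamped to the range [1, 50].
    public static int reducerCount(int cuboidCount) {
        int needed = (cuboidCount + CUBOIDS_PER_REDUCER - 1) / CUBOIDS_PER_REDUCER;
        return Math.max(1, Math.min(needed, MAX_HLL_REDUCERS));
    }

    // Cuboid stats are independent, so any partition is valid. Here consecutive
    // blocks of cuboids share a reducer. Assumption: once the cap is reached,
    // each block grows past 100 so that no more than 50 subsets are produced.
    public static List<List<Long>> partition(List<Long> cuboidIds) {
        int n = cuboidIds.size();
        int reducers = reducerCount(n);
        int evenShare = (n + reducers - 1) / reducers; // ceil(n / reducers)
        int perReducer = Math.max(CUBOIDS_PER_REDUCER, evenShare);
        List<List<Long>> subsets = new ArrayList<>();
        for (int start = 0; start < n; start += perReducer) {
            int end = Math.min(start + perReducer, n);
            subsets.add(new ArrayList<>(cuboidIds.subList(start, end)));
        }
        return subsets;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 250; i++) {
            ids.add(i);
        }
        List<List<Long>> subsets = partition(ids);
        // prints "3 reducers, subset sizes: 100, 100, 50"
        System.out.println(reducerCount(ids.size()) + " reducers, subset sizes: "
                + subsets.get(0).size() + ", " + subsets.get(1).size() + ", "
                + subsets.get(2).size());
    }
}
```

With 250 cuboids this yields 3 reducers handling 100, 100 and 50 cuboids. With 6,000 cuboids the cap applies, so it yields 50 reducers of 120 cuboids each. A real job might instead route cuboids with a hash-based partitioner rather than contiguous blocks. The reducer count would still follow the same ceiling-and-cap rule.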



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)