[ https://issues.apache.org/jira/browse/KYLIN-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289234#comment-16289234 ]
Dong Li edited comment on KYLIN-2866 at 12/13/17 1:11 PM:
----------------------------------------------------------

Hi [~yaho], in this patch I found some duplicated code between org.apache.kylin.engine.mr.steps.SaveStatisticsStep#doWork and org.apache.kylin.engine.mr.common.CubeStatsReader, which makes the code more complex. Could you refactor it?

Besides, org.apache.kylin.engine.mr.common.MapReduceUtil#getHLLShardBase seems to calculate the reducer number. What does "HLLShardBase" mean here?

was (Author: lidong_sjtu):
Hi [~yaho], in this patch, I found several duplicated code between org.apache.kylin.engine.mr.steps.SaveStatisticsStep#doWork and org.apache.kylin.engine.mr.common.CubeStatsReader, which make code complex. Could you make some refine?

Besides, in org.apache.kylin.engine.mr.common.MapReduceUtil#getHLLShardBase, which seems to calculate reducer number. What does the "HLLShardBase" mean here?

> Enlarge the reducer number for hyperloglog statistics calculation at step FactDistinctColumnsJob
> ------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-2866
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2866
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>             Fix For: v2.3.0
>
>         Attachments: APACHE-KYLIN-2866.patch
>
>
> Currently only one reducer is assigned for hll stats calculation, which may
> become the bottleneck that slows down this step. Since the stats for different
> cuboids do not influence each other, it's better to divide the cuboid set
> into several subsets and assign a reducer to each subset.
> The strategy of this patch is to assign 100 cuboids to each subset. There is
> also an upper limit on the number of reducers for hll stats calculation.
> Currently it's 50.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
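
For readers following the issue description, the stated strategy (100 cuboids per subset, capped at 50 reducers) can be sketched roughly as below. This is only an illustration of the arithmetic, not the code in APACHE-KYLIN-2866.patch: the class, constant, and method names are hypothetical, the modulo-based assignment of a cuboid to a reducer is an assumption, and nothing here is meant to explain what MapReduceUtil#getHLLShardBase actually does.

```java
public class HllReducerSketch {
    // Values taken from the issue description; the constant names are hypothetical.
    static final int CUBOIDS_PER_REDUCER = 100;
    static final int MAX_HLL_REDUCERS = 50;

    // Number of reducers for hll stats: one per 100 cuboids (rounded up), at most 50.
    // Falling back to a single reducer for a non-positive count is an assumption.
    static int hllReducerCount(int cuboidCount) {
        if (cuboidCount <= 0) {
            return 1;
        }
        int needed = (cuboidCount + CUBOIDS_PER_REDUCER - 1) / CUBOIDS_PER_REDUCER;
        return Math.min(needed, MAX_HLL_REDUCERS);
    }

    // Assumed assignment rule: spread cuboids over reducers by cuboid id modulo
    // the reducer count. The real patch may partition cuboids differently.
    static int reducerFor(long cuboidId, int reducerCount) {
        return (int) Math.floorMod(cuboidId, (long) reducerCount);
    }

    public static void main(String[] args) {
        int[] cuboidCounts = {1, 100, 101, 5000, 10000};
        for (int count : cuboidCounts) {
            System.out.println(count + " cuboids -> " + hllReducerCount(count) + " reducer(s)");
        }
    }
}
```

With these numbers the cap is reached at 4901 cuboids, so for larger cubes each reducer would handle more than 100 cuboids.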