Hi ShaoFeng Shi,

Thank you for the answer.
#
Step 1: Create Intermediate Flat Hive Table
Step 2: Redistribute intermediate table
#
Perhaps Kylin could add a random column to the flat table in Step 1 and, by default, use it as the shard key in Step 2. At the same time, Kylin should support a user-specified column as the shard key.
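As a rough illustration of the idea above (a simulation only, not Kylin or Hive code), the Python sketch below compares distributing rows by three low-cardinality dimension columns with distributing by a random shard column that was stored when the flat table was created. Everything in it is an assumption made for the example: the reducer count, the sample values of Field1 to Field3, the column name shard_id, and the stable_hash helper that stands in for the engine's partitioning hash. It also assumes that the concern with rand() is that a retried task may route a row differently, which a stored column would avoid.

```python
import random
import zlib
from collections import Counter

NUM_REDUCERS = 8  # hypothetical reducer count


def stable_hash(key):
    """Deterministic stand-in for the engine's hash (an assumption):
    ints map to themselves, anything else uses CRC32 of its repr."""
    if isinstance(key, int):
        return key
    return zlib.crc32(repr(key).encode("utf-8"))


def rows_per_reducer(rows, key_fn, num_reducers=NUM_REDUCERS):
    """Count rows per reducer when partitioned by hash(key) % num_reducers."""
    counts = Counter(stable_hash(key_fn(row)) % num_reducers for row in rows)
    return [counts.get(i, 0) for i in range(num_reducers)]


def skew(counts):
    """Largest reducer load divided by the mean load (1.0 means perfectly even)."""
    return max(counts) / (sum(counts) / len(counts))


def build_rows(n, seed=42):
    """Made-up flat table: skewed dimensions plus a stored random shard_id."""
    rng = random.Random(seed)
    return [
        {
            "field1": "CN" if rng.random() < 0.9 else "US",  # heavily skewed
            "field2": "web",                                 # constant
            "field3": rng.choice(["a", "b"]),                # two values only
            "shard_id": rng.randrange(1024),                 # written once in Step 1
        }
        for _ in range(n)
    ]


rows = build_rows(10_000)

# Step 2 today: DISTRIBUTE BY Field1, Field2, Field3
by_dims = rows_per_reducer(rows, lambda r: (r["field1"], r["field2"], r["field3"]))

# Step 2 as proposed: DISTRIBUTE BY the stored shard column
by_shard = rows_per_reducer(rows, lambda r: r["shard_id"])

# A "retry" reads the same stored values, so the routing does not change
by_shard_retry = rows_per_reducer(rows, lambda r: r["shard_id"])

print("skew by dimensions:", round(skew(by_dims), 2))
print("skew by shard_id:  ", round(skew(by_shard), 2))
print("stable on retry:   ", by_shard == by_shard_retry)
```

With only four distinct dimension combinations, at most four of the eight reducers receive any rows, while the stored shard column spreads rows almost evenly and gives the same routing every time it is read.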
Best Wishes.

> On Nov 2, 2018, at 2:06 PM, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>
> Hi ShaoFeng Shi,
>
> Kylin 2.5.1 will add some tips in the advanced step; I hope that helps.
>
> On Fri, Nov 2, 2018 at 2:05 PM, liuzhixin <liuz...@163.com> wrote:
>
>> Hi Chao Long:
>>
>> Thank you for the answer.
>> #
>> Maybe Kylin should provide a config option for every build step.
>>
>> Best wishes.
>>
>>> On Nov 2, 2018, at 1:38 PM, Chao Long <wayn...@qq.com> wrote:
>>>
>>> Hi zhixin,
>>> The data may become incorrect if "distribute by rand()" is used.
>>> https://issues.apache.org/jira/browse/KYLIN-3388
>>>
>>> ------------------ Original Message ------------------
>>> From: "liuzhixin"<liuz...@163.com>;
>>> Sent: Friday, Nov 2, 2018, 12:53 PM
>>> To: "dev"<dev@kylin.apache.org>;
>>> Cc: "ShaoFeng Shi"<shaofeng...@apache.org>;
>>> Subject: Re: Redistribute intermediate table default not by rand()
>>>
>>> Hi kylin team:
>>>
>>> Step: Redistribute intermediate table
>>> #
>>> By default, the first three dimension columns are used as the DISTRIBUTE BY keys; DISTRIBUTE BY RAND() is not used.
>>> If none of those dimension columns is a suitable key, this default strategy makes the data even more skewed.
>>>
>>> Best Regards!
>>>
>>>> On Nov 2, 2018, at 12:03 PM, liuzhixin <liuz...@163.com> wrote:
>>>>
>>>> Hi kylin team:
>>>>
>>>> Version: Kylin2.5-hadoop3.1 for hdp3.0
>>>> #
>>>> Step: Redistribute intermediate table
>>>> #
>>>> The DISTRIBUTE BY statement is:
>>>> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
>>>> #
>>>> Not DISTRIBUTE BY RAND()
>>>> #
>>>> Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I use DISTRIBUTE BY RAND() instead?
>>>>
>>>> Best wishes.
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋