Hi Zhixin,

Kylin 2.5.1 will add some tips to the advanced step; I hope that helps.

liuzhixin <liuz...@163.com> wrote on Fri, Nov 2, 2018 at 2:05 PM:

> Hi Chao Long:
>
> Thank you for the answer.
> #
> Maybe Kylin should provide a config for every build step.
>
> Best wishes.
>
> > On Nov 2, 2018, at 1:38 PM, Chao Long <wayn...@qq.com> wrote:
> >
> > Hi zhixin,
> > Data may become incorrect if "distribute by rand()" is used.
> > https://issues.apache.org/jira/browse/KYLIN-3388
> >
> > ------------------ Original Message ------------------
> > From: "liuzhixin"<liuz...@163.com>;
> > Sent: Friday, Nov 2, 2018, 12:53 PM
> > To: "dev"<dev@kylin.apache.org>;
> > Cc: "ShaoFeng Shi"<shaofeng...@apache.org>;
> > Subject: Re: Redistribute intermediate table default not by rand()
> >
> > Hi kylin team:
> >
> > Step: Redistribute intermediate table
> > #
> > By default, the first three dimension fields are used as the DISTRIBUTE BY keys; DISTRIBUTE BY RAND() is not used.
> > If there are no suitable dimension fields, this default strategy will make the data even more skewed.
> >
> > Best Regards!
> >
> >> On Nov 2, 2018, at 12:03 PM, liuzhixin <liuz...@163.com> wrote:
> >>
> >> Hi kylin team:
> >>
> >> Version: Kylin2.5-hadoop3.1 for hdp3.0
> >> #
> >> Step: Redistribute intermediate table
> >> #
> >> The DISTRIBUTE BY statement is:
> >> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
> >> #
> >> Not DISTRIBUTE BY RAND()
> >> #
> >> Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I DISTRIBUTE BY RAND() instead?
> >>
> >> Best wishes.

--
Best regards,

Shaofeng Shi 史少锋
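
As a rough illustration of the skew concern raised in this thread, the Python sketch below (not Kylin or Hive code) simulates routing rows to reducers by hashing a key. The row count, reducer count, and field cardinalities are made-up assumptions, and Hive's real hash function differs. It compares distributing by three hypothetical low-cardinality fields with a random key standing in for DISTRIBUTE BY RAND().

```python
import random
from collections import Counter


def partition_sizes(keys, num_reducers):
    """Count how many rows land on each reducer when routed by hash(key)."""
    counts = Counter(hash(key) % num_reducers for key in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


def skew_ratio(sizes):
    """Largest partition divided by the ideal (mean) partition size."""
    return max(sizes) / (sum(sizes) / len(sizes))


rng = random.Random(42)  # fixed seed so the run is repeatable
NUM_ROWS, NUM_REDUCERS = 100_000, 10  # assumed sizes, for illustration only

# Hypothetical low-cardinality dimensions: 2 x 2 x 1 = 4 distinct key tuples,
# so at most 4 of the 10 reducers can ever receive rows.
field_keys = [(rng.randrange(2), rng.randrange(2), 0) for _ in range(NUM_ROWS)]

# Stand-in for DISTRIBUTE BY RAND(): an independent random key per row.
rand_keys = [rng.randrange(10**9) for _ in range(NUM_ROWS)]

field_sizes = partition_sizes(field_keys, NUM_REDUCERS)
rand_sizes = partition_sizes(rand_keys, NUM_REDUCERS)

print("by fields:", field_sizes, "skew = %.2f" % skew_ratio(field_sizes))
print("by rand  :", rand_sizes, "skew = %.2f" % skew_ratio(rand_sizes))
```

The random key balances the reducers in this toy model, but as KYLIN-3388 (linked in the thread) points out, DISTRIBUTE BY RAND() may make the data incorrect, so the sketch only illustrates the trade-off and is not a recommendation.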