Hi ShaoFeng Shi,

OK, thank you for the answer.
#
Perhaps Kylin should provide tips or notes about the default shard behavior.

Best Wishes.

> On Nov 2, 2018, at 1:42 PM, ShaoFeng Shi <[email protected]> wrote:
>
> Please move the high-cardinality dimensions to the leading positions of
> the rowkey; that will make the data distribution more even.
>
> Chao Long <[email protected]> wrote on Fri, Nov 2, 2018 at 1:38 PM:
>
>> Hi zhixin,
>> Data may become incorrect if "distribute by rand()" is used.
>> https://issues.apache.org/jira/browse/KYLIN-3388
>>
>> ------------------ Original Message ------------------
>> From: "liuzhixin"<[email protected]>;
>> Sent: Friday, Nov 2, 2018, 12:53 PM
>> To: "dev"<[email protected]>;
>> Cc: "ShaoFeng Shi"<[email protected]>;
>> Subject: Re: Redistribute intermediate table default not by rand()
>>
>> Hi kylin team:
>>
>> Step: Redistribute intermediate table
>> #
>> By default, the first three dimension fields are used as the DISTRIBUTE BY
>> keys, rather than DISTRIBUTE BY RAND().
>> If there are no suitable dimension fields, this default strategy makes the
>> data distribution even more uneven.
>>
>> Best Regards!
>>
>>> On Nov 2, 2018, at 12:03 PM, liuzhixin <[email protected]> wrote:
>>>
>>> Hi kylin team:
>>>
>>> Version: Kylin2.5-hadoop3.1 for hdp3.0
>>> #
>>> Step: Redistribute intermediate table
>>> #
>>> The DISTRIBUTE BY statement is:
>>> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
>>> #
>>> Not DISTRIBUTE BY RAND()
>>> #
>>> Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I use
>>> DISTRIBUTE BY RAND() instead?
>>>
>>> Best wishes.
>>>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
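
To illustrate the advice above about leading the rowkey with high-cardinality dimensions, here is a minimal Python sketch that simulates how rows land on reducers when the DISTRIBUTE BY keys have few distinct values versus many. Everything in it is an assumption made for illustration: the column names (country, gender, user_id), the row count, the reducer count, and the use of MD5 as a stand-in for Hive's actual hash function. It does not reproduce Kylin's or Hive's real partitioning code.

```python
import hashlib
import random
from collections import Counter


def bucket(key, num_reducers):
    """Map a key to a reducer. MD5 is a stand-in for Hive's own hash function."""
    digest = hashlib.md5(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_reducers


def reducer_loads(rows, key_fn, num_reducers):
    """Count how many rows each reducer receives for a given DISTRIBUTE BY key."""
    counts = Counter(bucket(key_fn(row), num_reducers) for row in rows)
    return [counts.get(i, 0) for i in range(num_reducers)]


def skew(loads):
    """Busiest reducer divided by the mean load; 1.0 means perfectly even."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical rows: (country, gender, user_id). Only user_id is high cardinality.
rng = random.Random(42)
rows = [
    (rng.choice(["CN", "US", "DE"]), rng.choice(["M", "F"]), rng.randrange(1_000_000))
    for _ in range(100_000)
]

NUM_REDUCERS = 16

# Distribute by two low-cardinality fields: at most 3 * 2 = 6 distinct keys.
low_card = reducer_loads(rows, lambda r: (r[0], r[1]), NUM_REDUCERS)

# Distribute by the high-cardinality field instead.
high_card = reducer_loads(rows, lambda r: r[2], NUM_REDUCERS)

print("low-cardinality keys :", low_card, "skew = %.2f" % skew(low_card))
print("high-cardinality key :", high_card, "skew = %.2f" % skew(high_card))
```

With only six distinct key combinations, the low-cardinality case can occupy at most six of the sixteen reducers, so the rest stay idle, while the high-cardinality key spreads rows far more evenly. The sketch only models load balance; it says nothing about the correctness problem with `distribute by rand()` tracked in KYLIN-3388.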
