Hi Chao Long:

Thank you for the answer.
#
Maybe Kylin should provide a configuration option for every build step.
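
To illustrate the skew concern raised earlier in this thread, here is a minimal Python sketch. It is not Kylin or Hive code: the field values, the reducer count, and the CRC32 hash are all assumptions made up for illustration. It compares how rows spread over reducers when they are assigned by a hash of three low-cardinality fields versus by a random number.

```python
import random
import zlib
from collections import Counter

NUM_REDUCERS = 8    # assumed reducer count, for illustration only
NUM_ROWS = 10_000   # assumed row count, for illustration only


def bucket_by_fields(row, num_reducers):
    """Pick a reducer from a deterministic hash of the first three fields.

    Similar in spirit to DISTRIBUTE BY Field1, Field2, Field3, but this is
    NOT Hive's actual hash function.
    """
    key = "|".join(row[:3]).encode("utf-8")
    return zlib.crc32(key) % num_reducers


def simulate(seed=0):
    """Return (rows per reducer by fields, rows per reducer by random)."""
    rng = random.Random(seed)
    # Hypothetical low-cardinality data: only 2 distinct key combinations.
    rows = [(rng.choice(["A", "B"]), "X", "Y") for _ in range(NUM_ROWS)]
    by_fields = Counter(bucket_by_fields(r, NUM_REDUCERS) for r in rows)
    by_rand = Counter(rng.randrange(NUM_REDUCERS) for _ in rows)
    return by_fields, by_rand


if __name__ == "__main__":
    by_fields, by_rand = simulate()
    print("by fields:", sorted(by_fields.items()))
    print("by rand:  ", sorted(by_rand.items()))
```

With only two distinct key combinations, the keyed assignment can use at most two reducers, while the random assignment spreads rows over all of them. The sketch only shows load balance; it says nothing about the correctness concern noted in KYLIN-3388.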

Best wishes.

> On November 2, 2018, at 1:38 PM, Chao Long <[email protected]> wrote:
> 
> Hi zhixin,
> The data may become incorrect if "distribute by rand()" is used.
> https://issues.apache.org/jira/browse/KYLIN-3388
> 
> 
> 
> 
> ------------------ Original Message ------------------
> From: "liuzhixin"<[email protected]>;
> Sent: Friday, November 2, 2018, 12:53 PM
> To: "dev"<[email protected]>;
> Cc: "ShaoFeng Shi"<[email protected]>;
> Subject: Re: Redistribute intermediate table default not by rand()
> 
> 
> 
> Hi kylin team:
> 
> Step: Redistribute intermediate table
> #
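> By default, the first three dimension fields are chosen as the DISTRIBUTE BY keys, instead of DISTRIBUTE BY RAND().
> If there are no suitable dimension fields, this default strategy will make the data even more skewed.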
> 
> Best Regards!
> 
>> On November 2, 2018, at 12:03 PM, liuzhixin <[email protected]> wrote:
>> 
>> Hi kylin team:
>> 
>> Version: Kylin2.5-hadoop3.1 for hdp3.0
>> #
>> Step: Redistribute intermediate table
>> #
>> The DISTRIBUTE BY statement is:
>> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate 
>> DISTRIBUTE BY Field1, Field2, Field3;
>> #
>> Not DISTRIBUTE BY RAND()
>> #
>> Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I make it use
>> DISTRIBUTE BY RAND() instead?
>> 
>> Best wishes.

