Hi Chao Long:

Thank you for the answer.
#
Maybe Kylin should provide a configuration option for every build step.

Best wishes.

> On Nov 2, 2018, at 1:38 PM, Chao Long <wayn...@qq.com> wrote:
> 
> Hi zhixin,
> The data may become incorrect if "distribute by rand()" is used.
> https://issues.apache.org/jira/browse/KYLIN-3388
> 
> 
> 
> 
> ------------------ Original Message ------------------
> From: "liuzhixin"<liuz...@163.com>;
> Sent: Friday, November 2, 2018, 12:53 PM
> To: "dev"<dev@kylin.apache.org>;
> Cc: "ShaoFeng Shi"<shaofeng...@apache.org>;
> Subject: Re: Redistribute intermediate table default not by rand()
> 
> 
> 
> Hi kylin team:
> 
> Step: Redistribute intermediate table
> #
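> By default, the first three dimension columns are used as the DISTRIBUTE BY keys, rather than DISTRIBUTE BY RAND().
> If none of those dimension columns is a suitable key, this default strategy will make the data even more skewed.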
> 
> Best Regards!
> 
>> On Nov 2, 2018, at 12:03 PM, liuzhixin <liuz...@163.com> wrote:
>> 
>> Hi kylin team:
>> 
>> Version: Kylin2.5-hadoop3.1 for hdp3.0
>> #
>> Step: Redistribute intermediate table
>> #
>> The generated DISTRIBUTE BY statement is:
>> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate 
>> DISTRIBUTE BY Field1, Field2, Field3;
>> #
>> It does not use DISTRIBUTE BY RAND().
>> #
>> Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I make it use
>> DISTRIBUTE BY RAND() instead?
>> 
>> Best wishes.
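
The skew concern raised in this thread can be illustrated with a minimal sketch in Python. This is not Kylin or Hive code: the reducer count, the sample rows, and all function names below are hypothetical, and CRC32 stands in for whatever hash Hive actually applies to the DISTRIBUTE BY columns. It only compares how evenly rows land on reducers when they are assigned by hashing three columns versus assigned at random. It does not model the correctness problem with "distribute by rand()" that KYLIN-3388 describes.

```python
import random
import zlib
from collections import Counter

NUM_REDUCERS = 10  # hypothetical reducer count, for illustration only


def partition_by_keys(rows, key_indexes, num_reducers=NUM_REDUCERS):
    """Count rows per reducer when each row is routed by a hash of the
    chosen columns (a stand-in for DISTRIBUTE BY Field1, Field2, Field3)."""
    counts = Counter()
    for row in rows:
        key = "|".join(str(row[i]) for i in key_indexes).encode("utf-8")
        counts[zlib.crc32(key) % num_reducers] += 1
    return counts


def partition_randomly(rows, num_reducers=NUM_REDUCERS, seed=42):
    """Count rows per reducer when each row is routed at random
    (a stand-in for DISTRIBUTE BY RAND())."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in rows:
        counts[rng.randrange(num_reducers)] += 1
    return counts


def skew_ratio(counts, num_reducers=NUM_REDUCERS):
    """Largest reducer load divided by the perfectly even load."""
    total = sum(counts.values())
    return max(counts.values()) / (total / num_reducers)


def make_rows(n=100_000, seed=7):
    """Build sample rows where about 90% share the same first three columns."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        if rng.random() < 0.9:
            rows.append(("CN", "web", "2018-11-02", i))
        else:
            rows.append((rng.choice(["US", "DE", "JP"]),
                         rng.choice(["web", "app"]),
                         "2018-11-02", i))
    return rows


if __name__ == "__main__":
    rows = make_rows()
    print("keyed skew ratio: ", skew_ratio(partition_by_keys(rows, (0, 1, 2))))
    print("random skew ratio:", skew_ratio(partition_randomly(rows)))
```

A ratio near 1 means the reducers receive roughly equal shares. With the skewed sample rows, the keyed assignment sends most rows to a single reducer, while the random assignment stays close to even.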

