Re: Redistribute intermediate table default not by rand()

ShaoFeng Shi Thu, 01 Nov 2018 22:44:00 -0700

Please move the high-cardinality dimensions to the leading positions of the
rowkey; that will make the data distribution more even.
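
A toy sketch of why this helps, assuming (as described further down in this
thread) that the redistribute step uses the leading rowkey columns as the
DISTRIBUTE BY keys. This is not Kylin or Hive code: md5 only stands in for
whatever hash function Hive applies, and the column names, row counts, and
reducer count are hypothetical.

```python
import hashlib
import random
from collections import Counter


def partition_sizes(rows, key_columns, num_reducers):
    """Count rows per reducer when rows are hash-partitioned on key_columns."""
    counts = Counter()
    for row in rows:
        key = "|".join(str(row[c]) for c in key_columns).encode("utf-8")
        # md5 is a stand-in hash (an assumption), chosen only because it is stable.
        bucket = int.from_bytes(hashlib.md5(key).digest()[:8], "big") % num_reducers
        counts[bucket] += 1
    return [counts.get(i, 0) for i in range(num_reducers)]


def skew(sizes):
    """Largest partition divided by the perfectly even partition size."""
    return max(sizes) / (sum(sizes) / len(sizes))


rng = random.Random(42)
rows = [
    {
        "country": rng.choice(["CN", "US"]),   # low cardinality: 2 values
        "gender": rng.choice(["M", "F"]),      # low cardinality: 2 values
        "user_id": rng.randrange(1_000_000),   # high cardinality
    }
    for _ in range(100_000)
]

low = partition_sizes(rows, ["country", "gender"], 16)
high = partition_sizes(rows, ["user_id"], 16)
print("low-cardinality keys: skew = %.2f" % skew(low))
print("high-cardinality key: skew = %.2f" % skew(high))
```

With only four distinct key combinations, at most four of the 16 reducers
receive any rows, so the largest partition is at least four times the even
share. Keying on the high-cardinality column spreads the rows almost evenly
while staying deterministic, unlike rand().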


On Fri, Nov 2, 2018 at 1:38 PM, Chao Long <[email protected]> wrote:

> Hi zhixin,
>  The data may become incorrect if you use "distribute by rand()".
>  https://issues.apache.org/jira/browse/KYLIN-3388
>
>
>
>
> ------------------ Original Message ------------------
> From: "liuzhixin"<[email protected]>;
> Sent: Friday, November 2, 2018, 12:53 PM
> To: "dev"<[email protected]>;
> Cc: "ShaoFeng Shi"<[email protected]>;
> Subject: Re: Redistribute intermediate table default not by rand()
>
>
>
> Hi kylin team:
>
> Step: Redistribute intermediate table
> #
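> By default, the first three dimension fields are chosen as the DISTRIBUTE BY
> keys, rather than DISTRIBUTE BY RAND().
> If none of those dimension fields is suitable, this default strategy makes
> the data distribution even more uneven.
>
> Best regards!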
>
> > On Nov 2, 2018, at 12:03 PM, liuzhixin <[email protected]> wrote:
> >
> > Hi kylin team:
> >
> > Version: Kylin2.5-hadoop3.1 for hdp3.0
> > #
> > Step: Redistribute intermediate table
> > #
> > The DISTRIBUTE BY statement is:
> > INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
> > #
> > It does not use DISTRIBUTE BY RAND().
> > #
> > Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I make it use DISTRIBUTE BY RAND()?
> >
> > Best wishes.
> >



-- 
Best regards,

Shaofeng Shi 史少锋
