Re: Redistribute intermediate table default not by rand()

liuzhixin Thu, 01 Nov 2018 23:20:46 -0700

Hi ShaoFeng Shi,

Thank you for the answer.
#
Step1: Create Intermediate Flat Hive Table
Step2: Redistribute intermediate table
#
Perhaps Kylin could add a random-valued column in Step 1 and use it as the default shard column in Step 2.
Kylin should also support a custom, user-specified shard column.
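
To illustrate the suggestion, here is a minimal simulation sketch. It is not Kylin or Hive code. The row layout, the name shard_col, the reducer count, and the MD5-based partitioner are all assumptions made for illustration. It compares distributing by three low-cardinality dimensions with distributing by a random value that is stored with each row in Step 1. Because the value is stored rather than drawn at distribute time, recomputing the assignment gives the same result, which may avoid the retry problem described in KYLIN-3388.

```python
import hashlib
import random
from collections import Counter

NUM_REDUCERS = 16  # assumed reducer count, for illustration only


def reducer_for(key, num_reducers=NUM_REDUCERS):
    """Map a key to a reducer with a stable hash (a stand-in for the real partitioner)."""
    digest = hashlib.md5(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_reducers


def reducer_loads(rows, key_fn, num_reducers=NUM_REDUCERS):
    """Return how many rows each reducer receives when distributing by key_fn(row)."""
    counts = Counter(reducer_for(key_fn(row), num_reducers) for row in rows)
    return [counts.get(r, 0) for r in range(num_reducers)]


def skew(loads):
    """Busiest reducer's load divided by the mean load (1.0 means perfectly even)."""
    return max(loads) / (sum(loads) / len(loads))


def make_rows(n, seed=42):
    """Build hypothetical flat-table rows: three low-cardinality dimensions
    plus a random value stored with the row when the table is created."""
    rng = random.Random(seed)
    return [
        {
            "field1": rng.choice(["A", "B"]),
            "field2": rng.choice([0, 1]),
            "field3": rng.choice(["x", "y"]),
            "shard_col": rng.randrange(1_000_000),  # hypothetical shard column
        }
        for _ in range(n)
    ]


rows = make_rows(20_000)

# Distribute by the first three dimensions: only 8 distinct keys are possible,
# so at most 8 of the 16 reducers receive any rows.
by_dims = reducer_loads(rows, lambda r: (r["field1"], r["field2"], r["field3"]))

# Distribute by the stored random column: rows spread over the reducers.
by_shard = reducer_loads(rows, lambda r: r["shard_col"])

# The shard value is stored, so recomputing the assignment (as a retried task
# would) yields the same reducers, unlike drawing a fresh random number each time.
first_attempt = [reducer_for(r["shard_col"]) for r in rows]
retry_attempt = [reducer_for(r["shard_col"]) for r in rows]

print("skew by dimensions:", round(skew(by_dims), 2))
print("skew by shard_col: ", round(skew(by_shard), 2))
print("stable across retry:", first_attempt == retry_attempt)
```

The skew figure is the busiest reducer's load divided by the average load, so a value near 1.0 means an even spread.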


Best Wishes.

> On Nov 2, 2018, at 2:06 PM, ShaoFeng Shi <[email protected]> wrote:
> 
> Hi ShaoFeng Shi,
> 
> Kylin 2.5.1 will add some tips in the advanced step; I hope that helps.
> 
> On Fri, Nov 2, 2018, at 2:05 PM, liuzhixin <[email protected]> wrote:
> 
>> Hi Chao Long:
>> 
>> Thank you for the answer.
>> #
>> Maybe Kylin should provide a configuration option for every build step.
>> 
>> Best wishes.
>> 
>>> On Nov 2, 2018, at 1:38 PM, Chao Long <[email protected]> wrote:
>>> 
>>> Hi zhixin,
>>> The data may become incorrect if "distribute by rand()" is used. See:
>>> https://issues.apache.org/jira/browse/KYLIN-3388
>>> 
>>> 
>>> 
>>> 
>>> ------------------ Original Message ------------------
>>> From: "liuzhixin"<[email protected]>;
>>> Sent: Friday, Nov 2, 2018, 12:53 PM
>>> To: "dev"<[email protected]>;
>>> Cc: "ShaoFeng Shi"<[email protected]>;
>>> Subject: Re: Redistribute intermediate table default not by rand()
>>> 
>>> 
>>> 
>>> Hi kylin team:
>>> 
>>> Step: Redistribute intermediate table
>>> #
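>>> By default, the first three dimension columns are used as the DISTRIBUTE BY key, rather than DISTRIBUTE BY RAND().
>>> If none of those dimension columns is a suitable key, this default strategy makes the data even more skewed.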
>>> 
>>> Best Regards!
>>> 
>>>> On Nov 2, 2018, at 12:03 PM, liuzhixin <[email protected]> wrote:
>>>> 
>>>> Hi kylin team:
>>>> 
>>>> Version: Kylin2.5-hadoop3.1 for hdp3.0
>>>> #
>>>> Step: Redistribute intermediate table
>>>> #
>>>> The DISTRIBUTE BY statement is:
>>>> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
>>>> #
>>>> It does not use DISTRIBUTE BY RAND().
>>>> #
>>>> Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I use DISTRIBUTE BY RAND() instead?
>>>> 
>>>> Best wishes.
>> 
>> 
>> 
> 
> -- 
> Best regards,
> 
> Shaofeng Shi 史少锋
