Hi Chao Long,

Yes!
#
That is why I wrote "has provided" below:
> At the same time, Kylin should support the custom column for shard. (has
> provided)
#
But Kylin could insert one random column into the intermediate Hive table to shard on in the next step (as the default).

Best Wishes!

> On Nov 2, 2018, at 4:03 PM, Chao Long <wayn...@qq.com> wrote:
>
> Hi zhixin,
> As I remember, if you set a "shard by" column on the cube design page, Kylin will
> use this column as the condition of "distribute by", rather than the first
> three fields of the rowkey.
>
>
> ------------------ Original Message ------------------
> From: "liuzhixin"<liuz...@163.com>;
> Sent: Friday, Nov 2, 2018, 3:11 PM
> To: "dev"<dev@kylin.apache.org>;
> Cc: "Chao Long"<wayn...@qq.com>;
> Subject: Re: Redistribute intermediate table default not by rand()
>
> Hi Chao Long,
>
> Thank you for the answer.
> #
> Step1: Create Intermediate Flat Hive Table
> Step2: Redistribute intermediate table
> #
> Perhaps Kylin could insert one random column into the intermediate Hive table
> to shard on in the next step (as the default).
> At the same time, Kylin should support the custom column for shard. (has
> provided)
>
> Best Wishes.
>
>> On Nov 2, 2018, at 1:38 PM, Chao Long <wayn...@qq.com> wrote:
>>
>> Hi zhixin,
>> Data may become incorrect if you use "distribute by rand()".
>> https://issues.apache.org/jira/browse/KYLIN-3388
>>
>>
>> ------------------ Original Message ------------------
>> From: "liuzhixin"<liuz...@163.com>;
>> Sent: Friday, Nov 2, 2018, 12:53 PM
>> To: "dev"<dev@kylin.apache.org>;
>> Cc: "ShaoFeng Shi"<shaofeng...@apache.org>;
>> Subject: Re: Redistribute intermediate table default not by rand()
>>
>> Hi kylin team:
>>
>> Step: Redistribute intermediate table
>> #
>> By default, the first three dimension fields are used as the basis for DISTRIBUTE BY,
>> instead of DISTRIBUTE BY RAND().
>> If there is no suitable dimension field, this default strategy makes the data even
>> more unbalanced.
>>
>> Best Regards!
>>
>>> On Nov 2, 2018, at 12:03 PM, liuzhixin <liuz...@163.com> wrote:
>>>
>>> Hi kylin team:
>>>
>>> Version: Kylin2.5-hadoop3.1 for hdp3.0
>>> #
>>> Step: Redistribute intermediate table
>>> #
>>> The DISTRIBUTE BY statement is:
>>> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate
>>> DISTRIBUTE BY Field1, Field2, Field3;
>>> #
>>> Not DISTRIBUTE BY RAND()
>>> #
>>> Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I DISTRIBUTE BY
>>> RAND()?
>>>
>>> Best wishes.
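The trade-off discussed in this thread can be sketched with a small Python simulation. It is a toy model, not Kylin or Hive code: `partition` is a simplified stand-in for Hive's hash partitioning, the row layout and the `shard_key` column are hypothetical, and the retry scenario reflects one reading of the KYLIN-3388 concern (a map task re-run mid-shuffle re-evaluates RAND(), so reducers that fetched from different attempts see different row assignments). Under those assumptions it shows that DISTRIBUTE BY on skewed first-three fields piles rows onto one reducer, DISTRIBUTE BY RAND() balances the load but can duplicate or drop rows on retry, and distributing by a random value stored once in the flat table is both balanced and stable.

```python
import random
from collections import Counter

NUM_REDUCERS = 4


def partition(key, num_reducers=NUM_REDUCERS):
    """Simplified hash partitioner standing in for Hive's DISTRIBUTE BY.

    hash() is stable within one Python process, and hash(int) == int,
    so integer keys map to key % num_reducers.
    """
    return hash(key) % num_reducers


def make_rows(n, seed=0):
    """Hypothetical rows of (field1, field2, field3, row_id, shard_key)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        # Skewed dimensions: 90% of rows share the same first three values.
        if i % 10:
            dims = ("CN", "2018-11-02", "web")
        else:
            dims = ("US", f"2018-11-{i % 28 + 1:02d}", "app")
        # Hypothetical random column, computed once when the flat table is built.
        shard_key = rng.randrange(1_000_000)
        rows.append(dims + (i, shard_key))
    return rows


def reducer_load(rows, route):
    """Count how many rows each reducer receives under a routing function."""
    return Counter(route(row) for row in rows)


def shuffle_with_retry(rows, route, fetched_early=frozenset({0, 1})):
    """Simulate one map task that fails and is re-run mid-shuffle.

    Reducers in fetched_early already pulled attempt 1's output; the others
    pull attempt 2's output. A non-deterministic route makes the attempts disagree.
    """
    attempt1 = [(route(row), row) for row in rows]
    attempt2 = [(route(row), row) for row in rows]
    received = [row for r, row in attempt1 if r in fetched_early]
    received += [row for r, row in attempt2 if r not in fetched_early]
    return received


rng = random.Random(42)


def by_first_three(row):
    # DISTRIBUTE BY Field1, Field2, Field3
    return partition(row[:3])


def by_rand(row):
    # DISTRIBUTE BY RAND(): a fresh value on every task attempt
    return rng.randrange(NUM_REDUCERS)


def by_stored_key(row):
    # DISTRIBUTE BY a random column stored once in the flat table
    return partition(row[4])


rows = make_rows(1000)
print("first three fields:", sorted(reducer_load(rows, by_first_three).values()))
print("stored random key: ", sorted(reducer_load(rows, by_stored_key).values()))
print("rand() survives retry:", Counter(shuffle_with_retry(rows, by_rand)) == Counter(rows))
print("stored key survives retry:", Counter(shuffle_with_retry(rows, by_stored_key)) == Counter(rows))
```

Because the stored key is computed once when the flat table is built, every attempt of the redistribute map task routes a given row to the same reducer; the cost in this sketch is one extra column in the intermediate table.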