Hi Godfrey,

 @Qianfei (kimijqf) already implemented this "distribute by" feature 
nearly two years ago in an internal fork of Flink.
 Would you like to share the FLIP doc? We could give some feedback and work 
together to get this feature done.

Best
Yun Tang
________________________________
From: godfrey he <godfre...@gmail.com>
Sent: Tuesday, May 10, 2022 17:33
To: dev <dev@flink.apache.org>
Subject: Re: Re: 【Could we support distribute by For FlinkSql】

Hi, Ipengdream. I will drive this work.
We will support this functionality via hints,
because "distribute by" is not in the SQL standard.
It will, however, be supported in the Hive dialect.
I will post the FLIP doc soon.

Best,
Godfrey


On Mon, May 9, 2022 at 16:03, Jark Wu <imj...@gmail.com> wrote:

>
> We will start a FLIP discussion on the dev mailing list, so please keep an
> eye on the ML.
> I also see that you opened FLINK-27541; we will update it
> once we have an initial FLIP.
>
> Best,
> Jark
>
> On Mon, 9 May 2022 at 15:18, lpengdr...@163.com <lpengdr...@163.com> wrote:
>
> > Yeah!  That's great. Thank you!   Where can I get more information about
> > that?
> >
> >
> >
> > lpengdr...@163.com
> >
> > From: Jark Wu
> > Sent: 2022-05-09 14:12
> > To: dev
> > Cc: 贺小令
> > Subject: Re: Re: 【Could we support distribute by For FlinkSql】
> > I see what you want: something like DISTRIBUTE BY in Hive SQL.
> > The community is planning to support this feature but has not started yet.
> > @Godfrey will drive this work.
> >
> > Best,
> > Jark
> >
> > On Mon, 9 May 2022 at 13:45, lpengdr...@163.com <lpengdr...@163.com>
> > wrote:
> >
> > > Hi
> > >     Thanks for your reply.
> > >     What I want is not only for the hash lookup join; there are many
> > > operators that need a hash operation to solve the skew problem. Lookup join
> > > is just one special case.
> > >     So I hope there could be an operator that performs a shuffle. Maybe that is a way
> > > to solve these problems?
> > >
> > >
> > >
> > https://docs.google.com/document/d/1D7AX-_wttMNY53TxLQxiDaRyDVCeEZYCE8AwYflDXZM/edit?usp=sharing
> > >
> > >
> > >
> > >
> > >
> > > lpengdr...@163.com
> > >
> > > From: Jark Wu
> > > Sent: 2022-05-09 12:27
> > > To: dev
> > > Subject: Re: 【Could we support distribute by For FlinkSql】
> > > Hi,
> > >
> > > If you are looking for a hash lookup join, there is an in-progress
> > > FLIP-204[1] for it.
> > >
> > > Btw, I still can't see your picture. You can upload your picture to some
> > > image service and share a link here.
> > >
> > > Best,
> > > Jark
> > >
> > > [1]:
> > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> > >
> > > On Mon, 9 May 2022 at 11:22, lpengdr...@163.com <lpengdr...@163.com>
> > > wrote:
> > >
> > > > Sorry!
> > > > The picture that failed to display is in the attachment.
> > > >
> > > > ------------------------------
> > > > lpengdr...@163.com
> > > >
> > > >
> > > > *From:* lpengdr...@163.com
> > > > *Sent:* 2022-05-09 11:16
> > > > *To:* user-zh <user...@flink.apache.org>; dev <dev@flink.apache.org>
> > > > *Subject:* 【Could we support distribute by For FlinkSql】
> > > > Hello:
> > > >     Currently we can't add a shuffle operation in a SQL job.
> > > > For example, suppose I have a Kafka source (three partitions) with
> > > > parallelism three, followed by a lookup-join function. I want to
> > > > process the data distributed by id so that it is split evenly across
> > > > the three parallel instances (the source may be seriously skewed).
> > > > In the DataStream API I can do this with keyBy(), but unfortunately I
> > > > can do nothing when I use SQL.
> > > > Maybe we could support something like 'select id, f1,f2 from sourceTable
> > > > distribute by id', as in SparkSql.
> > > >
> > > > Then we could make the change shown in the picture in SQL mode.
> > > >
> > > >
> > > >
> > > > ------------------------------
> > > > lpengdr...@163.com
> > > >
> > > >
> > >
> >
