Re: Re: 【Could we support distribute by For FlinkSql】

Jark Wu Sun, 08 May 2022 23:12:28 -0700

I see what you want: something like DISTRIBUTE BY in Hive SQL.
The community plans to support this feature, but work has not started yet.
@Godfrey will drive this work.


Best,
Jark

On Mon, 9 May 2022 at 13:45, [email protected] <[email protected]> wrote:

> Hi,
>     Thanks for your reply.
>     What I want is not only for the hash lookup join. Many operators
> need a hash shuffle to solve the data skew problem; lookup join is
> just one special case.
>     So I hope there could be an operator that performs a shuffle. Maybe
> that is a way to solve these problems?
>
>
> https://docs.google.com/document/d/1D7AX-_wttMNY53TxLQxiDaRyDVCeEZYCE8AwYflDXZM/edit?usp=sharing
>
>
>
>
>
> [email protected]
>
> From: Jark Wu
> Sent: 2022-05-09 12:27
> To: dev
> Subject: Re: 【Could we support distribute by For FlinkSql】
> Hi,
>
> If you are looking for the hash lookup join, FLIP-204 [1] is in
> progress and addresses it.
>
> Btw, I still can't see your picture. You can upload your picture to some
> image service and share a link here.
>
> Best,
> Jark
>
> [1]:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
>
> On Mon, 9 May 2022 at 11:22, [email protected] <[email protected]>
> wrote:
>
> > Sorry!
> > The broken picture is included as an attachment.
> >
> > ------------------------------
> > [email protected]
> >
> >
> > *From:* [email protected]
> > *Sent:* 2022-05-09 11:16
> > *To:* user-zh <[email protected]>; dev <[email protected]>
> > *Subject:* 【Could we support distribute by For FlinkSql】
> > Hello,
> >     Currently we can't add a shuffle operation to a SQL job.
> > For example, I have a Kafka source (three partitions) with parallelism
> > three, followed by a lookup join function. I want to process the data
> > distributed by id so that it is split evenly across the three parallel
> > subtasks (the source may be seriously skewed).
> > In the DataStream API I can do this with keyBy(), but sadly I can do
> > nothing when I use SQL.
> > Maybe we could support something like 'select id, f1, f2 from sourceTable
> > distribute by id', as in Spark SQL.
> >
> > Then we could make the change shown in the picture in SQL mode.
> >
> >
> >
> > ------------------------------
> > [email protected]
> >
> >
>
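
To illustrate the request in the original message, here is a minimal, Flink-free sketch of what "distribute by id" would achieve. It simulates three skewed source partitions and redistributes records by a hash of id. Everything in it is an assumption made for illustration: the class name DistributeBySketch, the record counts, and the plain hash-modulo routing. Flink's keyBy() actually routes through key groups with a murmur hash, so this is a simplified stand-in and not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class DistributeBySketch {

    /** Picks a target subtask for an id: non-negative hash modulo parallelism. */
    public static int targetSubtask(long id, int parallelism) {
        return Math.floorMod(Long.hashCode(id), parallelism);
    }

    /** Counts how many records each subtask receives after redistributing by id. */
    public static int[] redistribute(List<List<Long>> sourcePartitions, int parallelism) {
        int[] counts = new int[parallelism];
        for (List<Long> partition : sourcePartitions) {
            for (long id : partition) {
                counts[targetSubtask(id, parallelism)]++;
            }
        }
        return counts;
    }

    /** Builds three hypothetical source partitions; partition 0 holds most records. */
    public static List<List<Long>> skewedSource() {
        List<List<Long>> partitions = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            partitions.add(new ArrayList<>());
        }
        for (long id = 0; id < 900; id++) {
            int p = id < 800 ? 0 : (id < 850 ? 1 : 2);
            partitions.get(p).add(id);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<List<Long>> source = skewedSource();
        for (int i = 0; i < source.size(); i++) {
            System.out.println("source partition " + i + ": " + source.get(i).size() + " records");
        }
        int[] counts = redistribute(source, 3);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("subtask " + i + " after shuffle: " + counts[i] + " records");
        }
    }
}
```

Note that a hash shuffle only evens out skew that comes from how records are spread over source partitions. If a single id dominates the stream, all of its records still land on one subtask.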
