I see what you want now: something like DISTRIBUTE BY in Hive SQL. The community plans to support this feature, but work has not started yet. @Godfrey will drive this work.
Best,
Jark

On Mon, 9 May 2022 at 13:45, lpengdr...@163.com <lpengdr...@163.com> wrote:

> Hi,
> Thanks for your reply.
> What I want is not only for the hash lookup join. Many operators need
> a hash operation to solve the skew problem; lookup join is just one
> special case.
> So I hope there is an operator that can perform a shuffle. Maybe that
> is a way to solve these problems?
>
> https://docs.google.com/document/d/1D7AX-_wttMNY53TxLQxiDaRyDVCeEZYCE8AwYflDXZM/edit?usp=sharing
>
> lpengdr...@163.com
>
> From: Jark Wu
> Sent: 2022-05-09 12:27
> To: dev
> Subject: Re: 【Could we support distribute by For FlinkSql】
> Hi,
>
> If you are looking for the hash lookup join, there is an in-progress
> FLIP-204[1] working on it.
>
> Btw, I still can't see your picture. You can upload your picture to some
> image service and share a link here.
>
> Best,
> Jark
>
> [1]:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
>
> On Mon, 9 May 2022 at 11:22, lpengdr...@163.com <lpengdr...@163.com>
> wrote:
>
> > Sorry!
> > The broken picture is the attachment.
> >
> > ------------------------------
> > lpengdr...@163.com
> >
> >
> > *From:* lpengdr...@163.com
> > *Sent:* 2022-05-09 11:16
> > *To:* user-zh <user...@flink.apache.org>; dev <dev@flink.apache.org>
> > *Subject:* 【Could we support distribute by For FlinkSql】
> > Hello:
> > Currently we can't add a shuffle operation in a SQL job.
> > For example, I have a Kafka source (three partitions) with
> > parallelism three, followed by a lookup join function. I want to process
> > the data distributed by id so that it is split evenly across the three
> > parallel subtasks (the source may be seriously skewed).
> > In the DataStream API I can do this with keyBy(), but sadly I can do
> > nothing when I use SQL.
> > Maybe we could do it like 'select id, f1,f2 from sourceTable distribute by
> > id', as we do in Spark SQL.
> >
> > So that we can make the change shown in the picture in SQL mode.
> >
> > ------------------------------
> > lpengdr...@163.com
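
For readers following the thread, here is a minimal Python sketch of the kind of hash shuffle the original request describes (what keyBy() does in the DataStream API, and what a 'distribute by id' clause would presumably do in SQL). It is not Flink code: `subtask_for` and `distribute_by` are hypothetical names, and the CRC32-modulo assignment is a simplified stand-in assumed for illustration, not Flink's actual key-group assignment.

```python
import zlib


def subtask_for(key: str, parallelism: int) -> int:
    """Pick a target subtask from a stable hash of the key.

    Assumption: CRC32 modulo parallelism, chosen only because it is
    deterministic across runs; Flink itself uses a different scheme.
    """
    return zlib.crc32(key.encode("utf-8")) % parallelism


def distribute_by(records, key_field, parallelism):
    """Route each record to a subtask chosen by hashing record[key_field]."""
    subtasks = [[] for _ in range(parallelism)]
    for record in records:
        target = subtask_for(str(record[key_field]), parallelism)
        subtasks[target].append(record)
    return subtasks


# Simulate a skewed source: every record arrives from a single partition.
records = [{"id": i, "f1": f"a{i}", "f2": f"b{i}"} for i in range(300)]

# Shuffle by id before the (hypothetical) lookup join, parallelism three.
subtasks = distribute_by(records, "id", 3)
sizes = [len(s) for s in subtasks]
print(sizes)
```

Records with the same id always land on the same subtask, so the split depends on the keys rather than on which source partition a record came from. A hash shuffle only spreads distinct keys; a single very hot id would still end up on one subtask.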