Hi Awake,

Thanks for your good point, updated.

Best,
Aitozi.

宇航 李 <liyuh...@yuewen.com> wrote on Wed, Jul 5, 2023 at 11:29:

> Hi Aitozi,
>
> I think it is necessary to add the following description to the FLIP to
> express the difference between the user-defined asynchronous table function
> and AsyncTableFunction:
>
> User-defined asynchronous table functions allow complex parameters (e.g.,
> Row type) to be passed to the function, which is important in RPC, rather
> than using 'join … on ...'.
>
> Thanks,
> Awake.
>
> On 2023/06/26 02:31:59 Aitozi wrote:
> > Hi Lincoln,
> > Thanks for your confirmation. I have updated the consensus in the FLIP doc.
> > If there are no other comments, I'd like to restart the vote process in [1]
> > today.
> >
> > [1] https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx
> >
> > Thanks,
> > Aitozi.
> >
> > Lincoln Lee <li...@gmail.com> wrote on Wed, Jun 21, 2023 at 22:29:
> >
> > > Hi Aitozi,
> > >
> > > Thanks for your updates!
> > >
> > > By the design of hints, the hints after the SELECT clause belong to the
> > > query hints category, and this new hint is also a kind of join hint[1].
> > > Join table function is one of the join types defined by Flink SQL joins[2],
> > > and all existing join hints[1] omit the 'join' keyword,
> > > so I would prefer 'ASYNC_TABLE_FUNC' (which actually stands for
> > > 'ASYNC_TABLE_FUNC_JOIN').
> > >
> > > Since a short Chinese holiday is coming, I suggest waiting for other
> > > people's responses before continuing to vote (next Monday?).
> > >
> > > Btw, I discussed pyflink support with @fudian offline. There should
> > > be no known issues, so you can create a subtask for pyflink support after
> > > the vote has passed.
> > >
> > > [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#join-hints
> > > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > > Aitozi <gj...@gmail.com> wrote on Sun, Jun 18, 2023 at 21:18:
> > >
> > > > Hi all,
> > > > Sorry for the late reply. I had a discussion with Lincoln offline,
> > > > mainly about the naming of the hints option. Thanks Lincoln for the
> > > > valuable suggestions.
> > > >
> > > > Let me answer the last email inline.
> > > >
> > > > >For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call
> > > > >as an example?
> > > >
> > > > Sure, I will give an example when adding the doc of async udtf and will
> > > > update the FLIP at the same time.
> > > >
> > > > >For the name of this query hint, 'LATERAL' (including its internal
> > > > >options) doesn't show any relevance to async, but I haven't thought of a
> > > > >suitable name at the moment,
> > > >
> > > > After some discussion with Lincoln, we prefer to choose one of
> > > > `ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`.
> > > > Besides, in my opinion the keyword `lateral` has a wider use scenario
> > > > than the table function join, but in this case we only want to configure
> > > > the async table function, so I lean a bit more towards `ASYNC_TABLE_FUNC`.
> > > > Looking forward to your input if you have
> > > > a better suggestion on the naming.
> > > >
> > > > For the usage of the hints config option, I have updated the
> > > > ConfigOption section; you can refer to the FLIP
> > > > for more details.
> > > >
> > > > >Also, the terms 'correlate join' and 'lateral join' are not the same as
> > > > >in the current joins page[1], so maybe it would be better if we unified
> > > > >them into 'join table function'
> > > >
> > > > Yes, we should unify them into 'join table function'; updated.
> > > >
> > > > Best,
> > > > Aitozi
> > > >
> > > > Lincoln Lee <li...@gmail.com> wrote on Thu, Jun 15, 2023 at 09:15:
> > > >
> > > > > Hi Aitozi,
> > > > >
> > > > > Thanks for your reply! Giving SQL users more flexibility to get
> > > > > asynchronous processing capabilities via lateral join table function:
> > > > > +1 for this.
> > > > >
> > > > > For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call
> > > > > as an example?
> > > > >
> > > > > For the name of this query hint, 'LATERAL' (including its internal
> > > > > options) doesn't show any relevance to async, but I haven't thought of a
> > > > > suitable name at the moment.
> > > > > Maybe we need to highlight the async keyword directly; we can also see
> > > > > if others have better candidates.
> > > > >
> > > > > The hint option "timeout = '180s'" should be "'timeout' = '180s'";
> > > > > it seems to be a typo in the flip. Also, use upper case for all keywords
> > > > > in SQL examples.
> > > > > Also, the terms 'correlate join' and 'lateral join' are not the same as
> > > > > in the current joins page[1], so maybe it would be better if we unified
> > > > > them into 'join table function'.
> > > > >
> > > > > [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#table-function
> > > > >
> > > > > Best,
> > > > > Lincoln Lee
> > > > >
> > > > > Aitozi <gj...@gmail.com> wrote on Wed, Jun 14, 2023 at 16:11:
> > > > >
> > > > > > Hi Lincoln
> > > > > >
> > > > > > Thank you very much for your valuable questions. I will try to answer
> > > > > > them inline.
> > > > > >
> > > > > > >Does the async udtf bring any additional benefits besides a
> > > > > > >lighter implementation?
> > > > > >
> > > > > > IMO, async udtf is more than a lighter implementation. It can act as a
> > > > > > general way for SQL users to use the async operator. They don't have
> > > > > > to bind the async function to a table (a LookupTable), they are not
> > > > > > forced to join on an equality join condition, and they can use it to
> > > > > > do more than enrich data.
> > > > > >
> > > > > > The async lookup join is more like a subset/specific usage of async
> > > > > > udtf. It is acceptable that the specific version has more opportunity
> > > > > > to be optimized (like push down). Async table function should be
> > > > > > categorized as a user-defined function.
> > > > > >
> > > > > > >Should users
> > > > > > >migrate to the lookup source when they encounter similar requirements
> > > > > > >or problems, or should we develop an additional set of similar
> > > > > > >mechanisms?
> > > > > >
> > > > > > As I clarified above, the lookup join is a specific usage of async
> > > > > > udtf, so it deserves more refined optimizations like caching / retry.
> > > > > > But these may not all be suitable for the async udtf. As a function,
> > > > > > it can be deterministic or non-deterministic, so caching is not
> > > > > > suitable, and we also do not have a common cache for UDFs now. So I
> > > > > > think optimizations like caching/retry should be handed over to the
> > > > > > function implementor.
> > > > > >
> > > > > > > the newly added query hint need a different name that
> > > > > > >can be easier related to the lateral operation as the current join
> > > > > > >hints[5] do.
> > > > > >
> > > > > > What about using LATERAL?
> > > > > >
> > > > > > as below
> > > > > >
> > > > > > SELECT /*+ LATERAL('output-mode' = 'ordered', 'capacity' = '200', timeout = '180s') */ a, c1, c2
> > > > > > FROM T1
> > > > > > LEFT JOIN lateral TABLE (async_split(b)) AS T(c1, c2) ON true
> > > > > >
> > > > > > >For the async func example, since the target scenario is an external
> > > > > > >io operation, it's better to add the `close` method to actively
> > > > > > >release resources as a good example for users
> > > > > >
> > > > > > Makes sense to me, will update the FLIP.
> > > > > >
> > > > > > Best,
> > > > > > Aitozi.
> > > > > >
> > > > > > Lincoln Lee <li...@gmail.com> wrote on Wed, Jun 14, 2023 at 14:24:
> > > > > >
> > > > > > > Hi Aitozi,
> > > > > > >
> > > > > > > Sorry for the late reply here! Supporting async udtf
> > > > > > > (`AsyncTableFunction`) directly in SQL seems like an attractive
> > > > > > > feature, but there are two issues that need to be addressed before
> > > > > > > we can be sure to add it:
> > > > > > > 1. As mentioned in the flip[1], the current lookup function can
> > > > > > > already implement the requirements, but it requires implementing an
> > > > > > > extra `LookupTableSource` and explicitly declaring the table schema
> > > > > > > (which can help implementers with the various push-down
> > > > > > > optimizations supported by the planner). Does the async udtf bring
> > > > > > > any additional benefits besides a lighter implementation?
> > > > > > > 2. FLIP-221[2] abstracts a reusable cache and metric infrastructure
> > > > > > > for lookup sources, which are important to improve performance and
> > > > > > > observability for high overhead external io scenarios. How do we
> > > > > > > integrate and reuse these capabilities after introducing async udtf?
> > > > > > > Should users
> > > > > > > migrate to the lookup source when they encounter similar
> > > > > > > requirements or problems, or should we develop an additional set of
> > > > > > > similar mechanisms? (A similar case: FLIP-234[3] introduced the
> > > > > > > retryable capability for lookup join.)
> > > > > > >
> > > > > > > For the flip itself,
> > > > > > > 1. Considering that 'options' is already used for the dynamic table
> > > > > > > options[4] in Flink, the newly added query hint needs a different
> > > > > > > name that can be more easily related to the lateral operation, as
> > > > > > > the current join hints[5] do.
> > > > > > > 2. For the async func example, since the target scenario is an
> > > > > > > external io operation, it's better to add the `close` method to
> > > > > > > actively release resources as a good example for users. Also, in
> > > > > > > terms of the determinism of a function, it is important to remind
> > > > > > > users that unless the behavior of the function is deterministic, it
> > > > > > > needs to be explicitly declared as non-deterministic.
> > > > > > >
> > > > > > > [1]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-313%3A+Add+support+of+User+Defined+AsyncTableFunction?src=contextnavpagetreemode
> > > > > > > [2]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric?src=contextnavpagetreemode
> > > > > > > [3]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
> > > > > > > [4].
> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL?src=contextnavpagetreemode
> > > > > > > [5]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job?src=contextnavpagetreemode
> > > > > > >
> > > > > > > Best,
> > > > > > > Lincoln Lee
> > > > > > >
> > > > > > > Aitozi <gj...@gmail.com> wrote on Tue, Jun 13, 2023 at 11:30:
> > > > > > >
> > > > > > > > Get your meaning now, thanks :)
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Aitozi.
> > > > > > > >
> > > > > > > > Feng Jin <ji...@gmail.com> wrote on Tue, Jun 13, 2023 at 11:16:
> > > > > > > >
> > > > > > > > > Hi Aitozi,
> > > > > > > > >
> > > > > > > > > Sorry for the confusing description.
> > > > > > > > >
> > > > > > > > > What I meant was that if we need to remind users about thread
> > > > > > > > > safety issues, we should introduce the new UDTF interface
> > > > > > > > > instead of executing the original UDTF asynchronously.
> > > > > > > > > Therefore, I agree with introducing the AsyncTableFunction.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Feng
> > > > > > > > >
> > > > > > > > > On Tue, Jun 13, 2023 at 10:42 AM Aitozi <gj...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Feng,
> > > > > > > > > > Thanks for your question. We do not provide a way to switch
> > > > > > > > > > the UDTF between sync and async modes,
> > > > > > > > > > so there should be no thread safety problem here.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Aitozi
> > > > > > > > > >
> > > > > > > > > > Feng Jin <ji...@gmail.com> wrote on Tue, Jun 13, 2023 at 10:31:
> > > > > > > > > >
> > > > > > > > > > > Hi Aitozi,
> > > > > > > > > > > We do need to remind users about thread safety issues.
> > > > > > > > > > > Thank you for your efforts on this FLIP. I have no
> > > > > > > > > > > further questions.
> > > > > > > > > > > Best,
> > > > > > > > > > > Feng
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jun 13, 2023 at 6:05 AM Jing Ge <j...@ververica.com.invalid> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Aitozi,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for taking care of that part. I have no other
> > > > > > > > > > > > concern.
> > > > > > > > > > > >
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > > Jing
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 12, 2023 at 5:38 PM Aitozi <gjying1...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > BTW, if there are no other blocking issues /
> > > > > > > > > > > > > comments, I would like to start a VOTE in another
> > > > > > > > > > > > > thread this Wednesday 6.14.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Aitozi.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Aitozi <gj...@gmail.com> wrote on Mon, Jun 12, 2023 at 23:34:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, Jing,
> > > > > > > > > > > > > > Thanks for your explanation. I get your point now.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For the performance part, I think it's a good idea
> > > > > > > > > > > > > > to run a case that returns a big table; the memory
> > > > > > > > > > > > > > consumption should be a point to take care of,
> > > > > > > > > > > > > > because in ordered mode the head element in the
> > > > > > > > > > > > > > buffer may affect the total memory consumption.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Aitozi.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jing Ge <ji...@ververica.com.invalid> wrote on Mon, Jun 12, 2023 at 20:28:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Aitozi,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Which key will be used for lookup is not an
> > > > > > > > > > > > > > > issue; only one row will be required for each
> > > > > > > > > > > > > > > key in order to enrich it. True, it depends on
> > > > > > > > > > > > > > > the implementation whether multiple rows or a
> > > > > > > > > > > > > > > single row for each key will be returned.
> > > > > > > > > > > > > > > However, for the lookup & enrichment scenario,
> > > > > > > > > > > > > > > one row/key is recommended; otherwise, like I
> > > > > > > > > > > > > > > mentioned previously, enrichment won't work.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I am a little bit concerned about returning a
> > > > > > > > > > > > > > > big table for each key, since it will make the
> > > > > > > > > > > > > > > async call take longer to return and need more
> > > > > > > > > > > > > > > memory. The performance tests should cover this
> > > > > > > > > > > > > > > scenario. This is not a blocking issue for this
> > > > > > > > > > > > > > > FLIP.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > Jing
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sat, Jun 10, 2023 at 4:11 AM Aitozi <gjying1...@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Jing,
> > > > > > > > > > > > > > > > I mean the join key is not necessary [message truncated...]