Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-07-06 Thread Aitozi
Hi guys,
Since there are no further comments, Kindly ping for the vote thread
[1] :D

Thanks,
Aitozi.

[1]: https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx

Aitozi  于2023年6月26日周一 10:31写道:

> Hi Lincoln,
> Thanks for your confirmation. I have updated the consensus to the FLIP
> doc.
> If there are no other comments, I'd like to restart the vote process in
> [1] today.
>
> https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx
>
> Thanks,
> Aitozi.
>
> Lincoln Lee  于2023年6月21日周三 22:29写道:
>
>> Hi Aitozi,
>>
>> Thanks for your updates!
>>
>> By the design of hints, the hints after select clause belong to the query
>> hints category, and this new hint is also a kind of join hints[1].
>> Join table function is one of the join type defined by flink sql joins[2],
>> all existing join hints[1] omit the 'join' keyword,
>> so I would prefer the 'ASYNC_TABLE_FUNC' (which is actually the one for
>> 'ASYNC_TABLE_FUNC_JOIN').
>>
>> Since a short Chinese holiday is coming, I suggest waiting for other
>> people's responses before continuing to vote (next monday?)
>>
>> Btw, I discussed with @fudian offline about pyflink support, there should
>> be no known issues, so you can create a subtask with pyflink support after
>> the vote passed.
>>
>> [1]
>>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#join-hints
>> [2]
>>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
>>
>> Best,
>> Lincoln Lee
>>
>>
>> Aitozi  于2023年6月18日周日 21:18写道:
>>
>> > Hi all,
>> > Sorry for the late reply, I have a discussion with Lincoln offline,
>> > mainly about
>> > the naming of the hints option. Thanks Lincoln for the valuable
>> > suggestions.
>> >
>> > Let me answer the last email inline.
>> >
>> > >For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC
>> call as
>> > an example?
>> >
>> > Sure, will give an example when adding the doc of async udtf and will
>> > update the FLIP simultaneously
>> >
>> > >For the name of this query hint, 'LATERAL' (include its internal
>> options)
>> > don't show any relevance to async, but I haven't thought of a suitable
>> name
>> > at the moment,
>> >
>> > After some discussion with Lincoln, We prefer to choose one of the
>> > `ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`.
>> > Besides, In my opinion the keyword `lateral`'s use scenario is wider
>> than
>> > the table function join, but in this case we only want to config
>> > the async table function, So I'm a bit more lean to the
>> `ASYNC_TABLE_FUNC`.
>> > Looking forward to some inputs if you guys have
>> > some better suggestion on the naming.
>> >
>> > For the usage of the hints config option, I have updated the section
>> > of ConfigOption, you can refer to the FLIP
>> > for more details.
>> >
>> > >Also, the terms 'correlate join' and 'lateral join' are not the same
>> as in
>> > the current joins page[1], so maybe it would be better if we unified
>> them
>> > into  'join table function'
>> >
>> > Yes, we should unified to the 'join table function', updated.
>> >
>> > Best,
>> > Aitozi
>> >
>> > Lincoln Lee  于2023年6月15日周四 09:15写道:
>> >
>> > > Hi Aitozi,
>> > >
>> > > Thanks for your reply!  Gives sql users more flexibility to get
>> > > asynchronous processing capabilities via lateral join table function
>> +1
>> > for
>> > > this
>> > >
>> > > For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC
>> call
>> > as
>> > > an example?
>> > >
>> > > For the name of this query hint, 'LATERAL' (include its internal
>> options)
>> > > don't show any relevance to async, but I haven't thought of a suitable
>> > name
>> > > at the moment,
>> > > maybe we need to highlight the async keyword directly, we can also
>> see if
>> > > others have better candidates
>> > >
>> > > For the hint option "timeout = '180s'" should be "'timeout' = '180s'",
>> > > seems a typo in the flip. And use upper case for all keywords in sql
>> > > examples.
>> > > Also, the terms 'correlate join' and 'lateral join' are not the same
>> as
>> > in
>> > > the current joins page[1], so maybe it would be better if we unified
>> them
>> > > into  'join table function'
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#table-function
>> > >
>> > > Best,
>> > > Lincoln Lee
>> > >
>> > >
>> > > Aitozi  于2023年6月14日周三 16:11写道:
>> > >
>> > > > Hi Lincoln
>> > > >
>> > > > Very thanks for your valuable question. I will try to answer
>> your
>> > > > questions inline.
>> > > >
>> > > > >Does the async udtf bring any additional benefits besides a
>> > > > lighter implementation?
>> > > >
>> > > > IMO, async udtf is more than a lighter implementation. It can act
>> as a
>> > > > general way for sql users to use the async operator. And they don't
>> > have
>> > > to
>> > > > bind the async function with a table (a LookupTable), and they are
>> not
>> > > > forced to join on an 

Re: Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-07-06 Thread Aitozi
Hi Awake,
Thanks for your good point, updated

Best,
Aitozi.

宇航 李  于2023年7月5日周三 11:29写道:

> Hi Aitozi,
>
> I think it is necessary to add the following description in FLIP to
> express the difference between user-defined asynchronous table function and
> AsyncTableFunction:
>
> User-defined asynchronous table functions allow complex parameters (e.g.,
> Row type) to be passed to function, which is important in RPC, rather than
> using ‘join … on ...'.
>
> Thanks,
> Awake.
>
>
> On 2023/06/26 02:31:59 Aitozi wrote:
> > Hi Lincoln,
> > Thanks for your confirmation. I have updated the consensus to the
> FLIP
> > doc.
> > If there are no other comments, I'd like to restart the vote process in
> [1]
> > today.
> >
> > https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx
> >
> > Thanks,
> > Aitozi.
> >
> > Lincoln Lee  于2023年6月21日周三 22:29写道:
> >
> > > Hi Aitozi,
> > >
> > > Thanks for your updates!
> > >
> > > By the design of hints, the hints after select clause belong to the
> query
> > > hints category, and this new hint is also a kind of join hints[1].
> > > Join table function is one of the join type defined by flink sql
> joins[2],
> > > all existing join hints[1] omit the 'join' keyword,
> > > so I would prefer the 'ASYNC_TABLE_FUNC' (which is actually the one for
> > > 'ASYNC_TABLE_FUNC_JOIN').
> > >
> > > Since a short Chinese holiday is coming, I suggest waiting for other
> > > people's responses before continuing to vote (next monday?)
> > >
> > > Btw, I discussed with @fudian offline about pyflink support, there
> should
> > > be no known issues, so you can create a subtask with pyflink support
> after
> > > the vote passed.
> > >
> > > [1]
> > >
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#join-hints
> > > [2]
> > >
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Aitozi  于2023年6月18日周日 21:18写道:
> > >
> > > > Hi all,
> > > > Sorry for the late reply, I have a discussion with Lincoln
> offline,
> > > > mainly about
> > > > the naming of the hints option. Thanks Lincoln for the valuable
> > > > suggestions.
> > > >
> > > > Let me answer the last email inline.
> > > >
> > > > >For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC
> call
> > > as
> > > > an example?
> > > >
> > > > Sure, will give an example when adding the doc of async udtf and will
> > > > update the FLIP simultaneously
> > > >
> > > > >For the name of this query hint, 'LATERAL' (include its internal
> > > options)
> > > > don't show any relevance to async, but I haven't thought of a
> suitable
> > > name
> > > > at the moment,
> > > >
> > > > After some discussion with Lincoln, We prefer to choose one of the
> > > > `ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`.
> > > > Besides, In my opinion the keyword `lateral`'s use scenario is wider
> than
> > > > the table function join, but in this case we only want to config
> > > > the async table function, So I'm a bit more lean to the
> > > `ASYNC_TABLE_FUNC`.
> > > > Looking forward to some inputs if you guys have
> > > > some better suggestion on the naming.
> > > >
> > > > For the usage of the hints config option, I have updated the section
> > > > of ConfigOption, you can refer to the FLIP
> > > > for more details.
> > > >
> > > > >Also, the terms 'correlate join' and 'lateral join' are not the
> same as
> > > in
> > > > the current joins page[1], so maybe it would be better if we unified
> them
> > > > into  'join table function'
> > > >
> > > > Yes, we should unified to the 'join table function', updated.
> > > >
> > > > Best,
> > > > Aitozi
> > > >
> > > > Lincoln Lee  于2023年6月15日周四 09:15写道:
> > > >
> > > > > Hi Aitozi,
> > > > >
> > > > > Thanks for your reply!  Gives sql users more flexibility to get
> > > > > asynchronous processing capabilities via lateral join table
> function +1
> > > > for
> > > > > this
> > > > >
> > > > > For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC
> call
> > > > as
> > > > > an example?
> > > > >
> > > > > For the name of this query hint, 'LATERAL' (include its internal
> > > options)
> > > > > don't show any relevance to async, but I haven't thought of a
> suitable
> > > > name
> > > > > at the moment,
> > > > > maybe we need to highlight the async keyword directly, we can also
> see
> > > if
> > > > > others have better candidates
> > > > >
> > > > > For the hint option "timeout = '180s'" should be "'timeout' =
> '180s'",
> > > > > seems a typo in the flip. And use upper case for all keywords in
> sql
> > > > > examples.
> > > > > Also, the terms 'correlate join' and 'lateral join' are not the
> same as
> > > > in
> > > > > the current joins page[1], so maybe it would be better if we
> unified
> > > them
> > > > > into  'join table function'
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> 

RE: Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-07-04 Thread 宇航 李
Hi Aitozi,

I think it is necessary to add the following description in FLIP to express the 
difference between user-defined asynchronous table function and 
AsyncTableFunction:

User-defined asynchronous table functions allow complex parameters (e.g., Row 
type) to be passed to function, which is important in RPC, rather than using 
‘join … on ...'. 

Thanks,
Awake.


On 2023/06/26 02:31:59 Aitozi wrote:
> Hi Lincoln,
> Thanks for your confirmation. I have updated the consensus to the FLIP
> doc.
> If there are no other comments, I'd like to restart the vote process in [1]
> today.
> 
> https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx
> 
> Thanks,
> Aitozi.
> 
> Lincoln Lee  于2023年6月21日周三 22:29写道:
> 
> > Hi Aitozi,
> >
> > Thanks for your updates!
> >
> > By the design of hints, the hints after select clause belong to the query
> > hints category, and this new hint is also a kind of join hints[1].
> > Join table function is one of the join type defined by flink sql joins[2],
> > all existing join hints[1] omit the 'join' keyword,
> > so I would prefer the 'ASYNC_TABLE_FUNC' (which is actually the one for
> > 'ASYNC_TABLE_FUNC_JOIN').
> >
> > Since a short Chinese holiday is coming, I suggest waiting for other
> > people's responses before continuing to vote (next monday?)
> >
> > Btw, I discussed with @fudian offline about pyflink support, there should
> > be no known issues, so you can create a subtask with pyflink support after
> > the vote passed.
> >
> > [1]
> >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#join-hints
> > [2]
> >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Aitozi  于2023年6月18日周日 21:18写道:
> >
> > > Hi all,
> > > Sorry for the late reply, I have a discussion with Lincoln offline,
> > > mainly about
> > > the naming of the hints option. Thanks Lincoln for the valuable
> > > suggestions.
> > >
> > > Let me answer the last email inline.
> > >
> > > >For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call
> > as
> > > an example?
> > >
> > > Sure, will give an example when adding the doc of async udtf and will
> > > update the FLIP simultaneously
> > >
> > > >For the name of this query hint, 'LATERAL' (include its internal
> > options)
> > > don't show any relevance to async, but I haven't thought of a suitable
> > name
> > > at the moment,
> > >
> > > After some discussion with Lincoln, We prefer to choose one of the
> > > `ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`.
> > > Besides, In my opinion the keyword `lateral`'s use scenario is wider than
> > > the table function join, but in this case we only want to config
> > > the async table function, So I'm a bit more lean to the
> > `ASYNC_TABLE_FUNC`.
> > > Looking forward to some inputs if you guys have
> > > some better suggestion on the naming.
> > >
> > > For the usage of the hints config option, I have updated the section
> > > of ConfigOption, you can refer to the FLIP
> > > for more details.
> > >
> > > >Also, the terms 'correlate join' and 'lateral join' are not the same as
> > in
> > > the current joins page[1], so maybe it would be better if we unified them
> > > into  'join table function'
> > >
> > > Yes, we should unified to the 'join table function', updated.
> > >
> > > Best,
> > > Aitozi
> > >
> > > Lincoln Lee  于2023年6月15日周四 09:15写道:
> > >
> > > > Hi Aitozi,
> > > >
> > > > Thanks for your reply!  Gives sql users more flexibility to get
> > > > asynchronous processing capabilities via lateral join table function +1
> > > for
> > > > this
> > > >
> > > > For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call
> > > as
> > > > an example?
> > > >
> > > > For the name of this query hint, 'LATERAL' (include its internal
> > options)
> > > > don't show any relevance to async, but I haven't thought of a suitable
> > > name
> > > > at the moment,
> > > > maybe we need to highlight the async keyword directly, we can also see
> > if
> > > > others have better candidates
> > > >
> > > > For the hint option "timeout = '180s'" should be "'timeout' = '180s'",
> > > > seems a typo in the flip. And use upper case for all keywords in sql
> > > > examples.
> > > > Also, the terms 'correlate join' and 'lateral join' are not the same as
> > > in
> > > > the current joins page[1], so maybe it would be better if we unified
> > them
> > > > into  'join table function'
> > > >
> > > > [1]
> > > >
> > > >
> > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#table-function
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Aitozi  于2023年6月14日周三 16:11写道:
> > > >
> > > > > Hi Lincoln
> > > > >
> > > > > Very thanks for your valuable question. I will try to answer your
> > > > > questions inline.
> > > > >
> > > > > >Does the async udtf bring any additional benefits besides a
> > > > > lighter implementation?

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-25 Thread Aitozi
Hi Lincoln,
Thanks for your confirmation. I have updated the consensus to the FLIP
doc.
If there are no other comments, I'd like to restart the vote process in [1]
today.

https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx

Thanks,
Aitozi.

Lincoln Lee  于2023年6月21日周三 22:29写道:

> Hi Aitozi,
>
> Thanks for your updates!
>
> By the design of hints, the hints after select clause belong to the query
> hints category, and this new hint is also a kind of join hints[1].
> Join table function is one of the join type defined by flink sql joins[2],
> all existing join hints[1] omit the 'join' keyword,
> so I would prefer the 'ASYNC_TABLE_FUNC' (which is actually the one for
> 'ASYNC_TABLE_FUNC_JOIN').
>
> Since a short Chinese holiday is coming, I suggest waiting for other
> people's responses before continuing to vote (next monday?)
>
> Btw, I discussed with @fudian offline about pyflink support, there should
> be no known issues, so you can create a subtask with pyflink support after
> the vote passed.
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#join-hints
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
>
> Best,
> Lincoln Lee
>
>
> Aitozi  于2023年6月18日周日 21:18写道:
>
> > Hi all,
> > Sorry for the late reply, I have a discussion with Lincoln offline,
> > mainly about
> > the naming of the hints option. Thanks Lincoln for the valuable
> > suggestions.
> >
> > Let me answer the last email inline.
> >
> > >For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call
> as
> > an example?
> >
> > Sure, will give an example when adding the doc of async udtf and will
> > update the FLIP simultaneously
> >
> > >For the name of this query hint, 'LATERAL' (include its internal
> options)
> > don't show any relevance to async, but I haven't thought of a suitable
> name
> > at the moment,
> >
> > After some discussion with Lincoln, We prefer to choose one of the
> > `ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`.
> > Besides, In my opinion the keyword `lateral`'s use scenario is wider than
> > the table function join, but in this case we only want to config
> > the async table function, So I'm a bit more lean to the
> `ASYNC_TABLE_FUNC`.
> > Looking forward to some inputs if you guys have
> > some better suggestion on the naming.
> >
> > For the usage of the hints config option, I have updated the section
> > of ConfigOption, you can refer to the FLIP
> > for more details.
> >
> > >Also, the terms 'correlate join' and 'lateral join' are not the same as
> in
> > the current joins page[1], so maybe it would be better if we unified them
> > into  'join table function'
> >
> > Yes, we should unified to the 'join table function', updated.
> >
> > Best,
> > Aitozi
> >
> > Lincoln Lee  于2023年6月15日周四 09:15写道:
> >
> > > Hi Aitozi,
> > >
> > > Thanks for your reply!  Gives sql users more flexibility to get
> > > asynchronous processing capabilities via lateral join table function +1
> > for
> > > this
> > >
> > > For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call
> > as
> > > an example?
> > >
> > > For the name of this query hint, 'LATERAL' (include its internal
> options)
> > > don't show any relevance to async, but I haven't thought of a suitable
> > name
> > > at the moment,
> > > maybe we need to highlight the async keyword directly, we can also see
> if
> > > others have better candidates
> > >
> > > For the hint option "timeout = '180s'" should be "'timeout' = '180s'",
> > > seems a typo in the flip. And use upper case for all keywords in sql
> > > examples.
> > > Also, the terms 'correlate join' and 'lateral join' are not the same as
> > in
> > > the current joins page[1], so maybe it would be better if we unified
> them
> > > into  'join table function'
> > >
> > > [1]
> > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#table-function
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Aitozi  于2023年6月14日周三 16:11写道:
> > >
> > > > Hi Lincoln
> > > >
> > > > Very thanks for your valuable question. I will try to answer your
> > > > questions inline.
> > > >
> > > > >Does the async udtf bring any additional benefits besides a
> > > > lighter implementation?
> > > >
> > > > IMO, async udtf is more than a lighter implementation. It can act as
> a
> > > > general way for sql users to use the async operator. And they don't
> > have
> > > to
> > > > bind the async function with a table (a LookupTable), and they are
> not
> > > > forced to join on an equality join condition, and they can use it to
> do
> > > > more than enrich data.
> > > >
> > > > The async lookup join is more like a subset/specific usage of async
> > udtf.
> > > > The specific version has more opportunity to be optimized (like push
> > > down)
> > > > is acceptable. Async table function should be categorized to
> > used-defined
> > > > function.
> > > >
> > > > 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-21 Thread Lincoln Lee
Hi Aitozi,

Thanks for your updates!

By the design of hints, the hints after select clause belong to the query
hints category, and this new hint is also a kind of join hints[1].
Join table function is one of the join type defined by flink sql joins[2],
all existing join hints[1] omit the 'join' keyword,
so I would prefer the 'ASYNC_TABLE_FUNC' (which is actually the one for
'ASYNC_TABLE_FUNC_JOIN').

Since a short Chinese holiday is coming, I suggest waiting for other
people's responses before continuing to vote (next monday?)

Btw, I discussed with @fudian offline about pyflink support, there should
be no known issues, so you can create a subtask with pyflink support after
the vote passed.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#join-hints
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/

Best,
Lincoln Lee


Aitozi  于2023年6月18日周日 21:18写道:

> Hi all,
> Sorry for the late reply, I have a discussion with Lincoln offline,
> mainly about
> the naming of the hints option. Thanks Lincoln for the valuable
> suggestions.
>
> Let me answer the last email inline.
>
> >For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call as
> an example?
>
> Sure, will give an example when adding the doc of async udtf and will
> update the FLIP simultaneously
>
> >For the name of this query hint, 'LATERAL' (include its internal options)
> don't show any relevance to async, but I haven't thought of a suitable name
> at the moment,
>
> After some discussion with Lincoln, We prefer to choose one of the
> `ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`.
> Besides, In my opinion the keyword `lateral`'s use scenario is wider than
> the table function join, but in this case we only want to config
> the async table function, So I'm a bit more lean to the `ASYNC_TABLE_FUNC`.
> Looking forward to some inputs if you guys have
> some better suggestion on the naming.
>
> For the usage of the hints config option, I have updated the section
> of ConfigOption, you can refer to the FLIP
> for more details.
>
> >Also, the terms 'correlate join' and 'lateral join' are not the same as in
> the current joins page[1], so maybe it would be better if we unified them
> into  'join table function'
>
> Yes, we should unified to the 'join table function', updated.
>
> Best,
> Aitozi
>
> Lincoln Lee  于2023年6月15日周四 09:15写道:
>
> > Hi Aitozi,
> >
> > Thanks for your reply!  Gives sql users more flexibility to get
> > asynchronous processing capabilities via lateral join table function +1
> for
> > this
> >
> > For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call
> as
> > an example?
> >
> > For the name of this query hint, 'LATERAL' (include its internal options)
> > don't show any relevance to async, but I haven't thought of a suitable
> name
> > at the moment,
> > maybe we need to highlight the async keyword directly, we can also see if
> > others have better candidates
> >
> > For the hint option "timeout = '180s'" should be "'timeout' = '180s'",
> > seems a typo in the flip. And use upper case for all keywords in sql
> > examples.
> > Also, the terms 'correlate join' and 'lateral join' are not the same as
> in
> > the current joins page[1], so maybe it would be better if we unified them
> > into  'join table function'
> >
> > [1]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#table-function
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Aitozi  于2023年6月14日周三 16:11写道:
> >
> > > Hi Lincoln
> > >
> > > Very thanks for your valuable question. I will try to answer your
> > > questions inline.
> > >
> > > >Does the async udtf bring any additional benefits besides a
> > > lighter implementation?
> > >
> > > IMO, async udtf is more than a lighter implementation. It can act as a
> > > general way for sql users to use the async operator. And they don't
> have
> > to
> > > bind the async function with a table (a LookupTable), and they are not
> > > forced to join on an equality join condition, and they can use it to do
> > > more than enrich data.
> > >
> > > The async lookup join is more like a subset/specific usage of async
> udtf.
> > > The specific version has more opportunity to be optimized (like push
> > down)
> > > is acceptable. Async table function should be categorized to
> used-defined
> > > function.
> > >
> > > >Should users
> > >
> > > migrate to the lookup source when they encounter similar requirements
> or
> > >
> > > problems, or should we develop an additional set of similar mechanisms?
> > >
> > > As I clarified above, the lookup join is a specific usage of async
> udtf.
> > So
> > > it deserves more refined optimization like caching / retryable. But it
> > may
> > > not all
> > >
> > > suitable for the async udtf. As function, it can be deterministic/or
> > > non-deterministic. So caching is not suitable, and we also do not have
> a
> > > common cache for the udf now. So I 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-18 Thread Aitozi
Hi all,
Sorry for the late reply, I have a discussion with Lincoln offline,
mainly about
the naming of the hints option. Thanks Lincoln for the valuable suggestions.

Let me answer the last email inline.

>For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call as
an example?

Sure, will give an example when adding the doc of async udtf and will
update the FLIP simultaneously

>For the name of this query hint, 'LATERAL' (include its internal options)
don't show any relevance to async, but I haven't thought of a suitable name
at the moment,

After some discussion with Lincoln, We prefer to choose one of the
`ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`.
Besides, In my opinion the keyword `lateral`'s use scenario is wider than
the table function join, but in this case we only want to config
the async table function, So I'm a bit more lean to the `ASYNC_TABLE_FUNC`.
Looking forward to some inputs if you guys have
some better suggestion on the naming.

For the usage of the hints config option, I have updated the section
of ConfigOption, you can refer to the FLIP
for more details.

>Also, the terms 'correlate join' and 'lateral join' are not the same as in
the current joins page[1], so maybe it would be better if we unified them
into  'join table function'

Yes, we should unified to the 'join table function', updated.

Best,
Aitozi

Lincoln Lee  于2023年6月15日周四 09:15写道:

> Hi Aitozi,
>
> Thanks for your reply!  Gives sql users more flexibility to get
> asynchronous processing capabilities via lateral join table function +1 for
> this
>
> For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call as
> an example?
>
> For the name of this query hint, 'LATERAL' (include its internal options)
> don't show any relevance to async, but I haven't thought of a suitable name
> at the moment,
> maybe we need to highlight the async keyword directly, we can also see if
> others have better candidates
>
> For the hint option "timeout = '180s'" should be "'timeout' = '180s'",
> seems a typo in the flip. And use upper case for all keywords in sql
> examples.
> Also, the terms 'correlate join' and 'lateral join' are not the same as in
> the current joins page[1], so maybe it would be better if we unified them
> into  'join table function'
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#table-function
>
> Best,
> Lincoln Lee
>
>
> Aitozi  于2023年6月14日周三 16:11写道:
>
> > Hi Lincoln
> >
> > Very thanks for your valuable question. I will try to answer your
> > questions inline.
> >
> > >Does the async udtf bring any additional benefits besides a
> > lighter implementation?
> >
> > IMO, async udtf is more than a lighter implementation. It can act as a
> > general way for sql users to use the async operator. And they don't have
> to
> > bind the async function with a table (a LookupTable), and they are not
> > forced to join on an equality join condition, and they can use it to do
> > more than enrich data.
> >
> > The async lookup join is more like a subset/specific usage of async udtf.
> > The specific version has more opportunity to be optimized (like push
> down)
> > is acceptable. Async table function should be categorized to used-defined
> > function.
> >
> > >Should users
> >
> > migrate to the lookup source when they encounter similar requirements or
> >
> > problems, or should we develop an additional set of similar mechanisms?
> >
> > As I clarified above, the lookup join is a specific usage of async udtf.
> So
> > it deserves more refined optimization like caching / retryable. But it
> may
> > not all
> >
> > suitable for the async udtf. As function, it can be deterministic/or
> > non-deterministic. So caching is not suitable, and we also do not have a
> > common cache for the udf now. So I think optimization like caching/retry
> > should be handed over to the function implementor.
> >
> > > the newly added query hint need a different name that
> > can be easier related to the lateral operation as the current join
> hints[5]
> > do.
> >
> >
> > What about using LATERAL?
> >
> > as below
> >
> > SELECT /*+ LATERAL('output-mode' = 'ordered', 'capacity' = '200',
> timeout =
> > '180s') */ a, c1, c2
> >
> > FROM T1
> >
> > LEFT JOIN lateral TABLE (async_split(b)) AS T(c1, c2) ON true
> >
> > >For the async func example, since the target scenario is an external io
> > operation, it's better to add the `close` method to actively release
> > resources as a good example for users
> >
> >
> > Make sense to me, will update the FLIP
> >
> > Best,
> >
> > Aitozi.
> >
> > Lincoln Lee  于2023年6月14日周三 14:24写道:
> >
> > > Hi Aitozi,
> > >
> > > Sorry for the lately reply here!  Supports async
> > udtf(`AsyncTableFunction`)
> > > directly in sql seems like an attractive feature, but there're two
> issues
> > > that need to be addressed before we can be sure to add it:
> > > 1. As mentioned in the flip[1], the current lookup function can already
> > > implement the 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-14 Thread Lincoln Lee
Hi Aitozi,

Thanks for your reply!  Gives sql users more flexibility to get
asynchronous processing capabilities via lateral join table function +1 for
this

For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call as
an example?

For the name of this query hint, 'LATERAL' (include its internal options)
don't show any relevance to async, but I haven't thought of a suitable name
at the moment,
maybe we need to highlight the async keyword directly, we can also see if
others have better candidates

For the hint option "timeout = '180s'" should be "'timeout' = '180s'",
seems a typo in the flip. And use upper case for all keywords in sql
examples.
Also, the terms 'correlate join' and 'lateral join' are not the same as in
the current joins page[1], so maybe it would be better if we unified them
into  'join table function'

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#table-function

Best,
Lincoln Lee


Aitozi  于2023年6月14日周三 16:11写道:

> Hi Lincoln
>
> Very thanks for your valuable question. I will try to answer your
> questions inline.
>
> >Does the async udtf bring any additional benefits besides a
> lighter implementation?
>
> IMO, async udtf is more than a lighter implementation. It can act as a
> general way for sql users to use the async operator. And they don't have to
> bind the async function with a table (a LookupTable), and they are not
> forced to join on an equality join condition, and they can use it to do
> more than enrich data.
>
> The async lookup join is more like a subset/specific usage of async udtf.
> The specific version has more opportunity to be optimized (like push down)
> is acceptable. Async table function should be categorized to used-defined
> function.
>
> >Should users
>
> migrate to the lookup source when they encounter similar requirements or
>
> problems, or should we develop an additional set of similar mechanisms?
>
> As I clarified above, the lookup join is a specific usage of async udtf. So
> it deserves more refined optimization like caching / retryable. But it may
> not all
>
> suitable for the async udtf. As function, it can be deterministic/or
> non-deterministic. So caching is not suitable, and we also do not have a
> common cache for the udf now. So I think optimization like caching/retry
> should be handed over to the function implementor.
>
> > the newly added query hint need a different name that
> can be easier related to the lateral operation as the current join hints[5]
> do.
>
>
> What about using LATERAL?
>
> as below
>
> SELECT /*+ LATERAL('output-mode' = 'ordered', 'capacity' = '200', timeout =
> '180s') */ a, c1, c2
>
> FROM T1
>
> LEFT JOIN lateral TABLE (async_split(b)) AS T(c1, c2) ON true
>
> >For the async func example, since the target scenario is an external io
> operation, it's better to add the `close` method to actively release
> resources as a good example for users
>
>
> Make sense to me, will update the FLIP
>
> Best,
>
> Aitozi.
>
> Lincoln Lee  于2023年6月14日周三 14:24写道:
>
> > Hi Aitozi,
> >
> > Sorry for the lately reply here!  Supports async
> udtf(`AsyncTableFunction`)
> > directly in sql seems like an attractive feature, but there're two issues
> > that need to be addressed before we can be sure to add it:
> > 1. As mentioned in the flip[1], the current lookup function can already
> > implement the requirements, but it requires implementing an extra
> > `LookupTableSource` and explicitly declaring the table schema (which can
> > help implementers the various push-down optimizations supported by the
> > planner). Does the async udtf bring any additional benefits besides a
> > lighter implementation?
> > 2. FLIP-221[2] abstracts a reusable cache and metric infrastructure for
> > lookup sources, which are important to improve performance and
> > observability for high overhead external io scenarios, how do we
> integrate
> > and reuse these capabilities after introducing async udtf? Should users
> > migrate to the lookup source when they encounter similar requirements or
> > problems, or should we develop an additional set of similar mechanisms?
> (a
> > similarly case:  FLIP-234[3] introduced the retryable capability for
> lookup
> > join)
> >
> > For the flip itself,
> > 1. Considering the 'options' is already used as the dynamic table
> > options[4] in flink, the newly added query hint need a different name
> that
> > can be easier related to the lateral operation as the current join
> hints[5]
> > do.
> > 2. For the async func example, since the target scenario is an external
> io
> > operation, it's better to add the `close` method to actively release
> > resources as a good example for users. Also in terms of the determinism
> of
> > a function, it is important to remind users that unless the behavior of
> the
> > function is deterministic, it needs to be explicitly declared as
> > non-deterministic.
> >
> > [1].
> >
> >
> 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-14 Thread Aitozi
Hi Lincoln

Very thanks for your valuable question. I will try to answer your
questions inline.

>Does the async udtf bring any additional benefits besides a
lighter implementation?

IMO, async udtf is more than a lighter implementation. It can act as a
general way for sql users to use the async operator. And they don't have to
bind the async function with a table (a LookupTable), and they are not
forced to join on an equality join condition, and they can use it to do
more than enrich data.

The async lookup join is more like a subset/specific usage of async udtf.
The specific version has more opportunity to be optimized (like push down)
is acceptable. Async table function should be categorized to used-defined
function.

>Should users

migrate to the lookup source when they encounter similar requirements or

problems, or should we develop an additional set of similar mechanisms?

As I clarified above, the lookup join is a specific usage of async udtf. So
it deserves more refined optimization like caching / retryable. But it may
not all

suitable for the async udtf. As function, it can be deterministic/or
non-deterministic. So caching is not suitable, and we also do not have a
common cache for the udf now. So I think optimization like caching/retry
should be handed over to the function implementor.

> the newly added query hint need a different name that
can be easier related to the lateral operation as the current join hints[5]
do.


What about using LATERAL?

as below

SELECT /*+ LATERAL('output-mode' = 'ordered', 'capacity' = '200', timeout =
'180s') */ a, c1, c2

FROM T1

LEFT JOIN lateral TABLE (async_split(b)) AS T(c1, c2) ON true

>For the async func example, since the target scenario is an external io
operation, it's better to add the `close` method to actively release
resources as a good example for users


Make sense to me, will update the FLIP

Best,

Aitozi.

Lincoln Lee  于2023年6月14日周三 14:24写道:

> Hi Aitozi,
>
> Sorry for the lately reply here!  Supports async udtf(`AsyncTableFunction`)
> directly in sql seems like an attractive feature, but there're two issues
> that need to be addressed before we can be sure to add it:
> 1. As mentioned in the flip[1], the current lookup function can already
> implement the requirements, but it requires implementing an extra
> `LookupTableSource` and explicitly declaring the table schema (which can
> help implementers the various push-down optimizations supported by the
> planner). Does the async udtf bring any additional benefits besides a
> lighter implementation?
> 2. FLIP-221[2] abstracts a reusable cache and metric infrastructure for
> lookup sources, which are important to improve performance and
> observability for high overhead external io scenarios, how do we integrate
> and reuse these capabilities after introducing async udtf? Should users
> migrate to the lookup source when they encounter similar requirements or
> problems, or should we develop an additional set of similar mechanisms? (a
> similarly case:  FLIP-234[3] introduced the retryable capability for lookup
> join)
>
> For the flip itself,
> 1. Considering the 'options' is already used as the dynamic table
> options[4] in flink, the newly added query hint need a different name that
> can be easier related to the lateral operation as the current join hints[5]
> do.
> 2. For the async func example, since the target scenario is an external io
> operation, it's better to add the `close` method to actively release
> resources as a good example for users. Also in terms of the determinism of
> a function, it is important to remind users that unless the behavior of the
> function is deterministic, it needs to be explicitly declared as
> non-deterministic.
>
> [1].
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-313%3A+Add+support+of+User+Defined+AsyncTableFunction?src=contextnavpagetreemode
> [2].
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric?src=contextnavpagetreemode
> [3].
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
> [4].
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL?src=contextnavpagetreemode
> [5].
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job?src=contextnavpagetreemode
>
> Best,
> Lincoln Lee
>
>
> Aitozi  于2023年6月13日周二 11:30写道:
>
> > Get your meaning now, thanks :)
> >
> > Best,
> > Aitozi.
> >
> > Feng Jin  于2023年6月13日周二 11:16写道:
> >
> > > Hi Aitozi,
> > >
> > > Sorry for the confusing description.
> > >
> > > What I meant was that if we need to remind users about tire safety
> > issues,
> > > we should introduce the new UDTF interface instead of executing the
> > > original UDTF asynchronously. Therefore, I agree with introducing the
> > > 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-14 Thread Lincoln Lee
Hi Aitozi,

Sorry for the lately reply here!  Supports async udtf(`AsyncTableFunction`)
directly in sql seems like an attractive feature, but there're two issues
that need to be addressed before we can be sure to add it:
1. As mentioned in the flip[1], the current lookup function can already
implement the requirements, but it requires implementing an extra
`LookupTableSource` and explicitly declaring the table schema (which can
help implementers the various push-down optimizations supported by the
planner). Does the async udtf bring any additional benefits besides a
lighter implementation?
2. FLIP-221[2] abstracts a reusable cache and metric infrastructure for
lookup sources, which are important to improve performance and
observability for high overhead external io scenarios, how do we integrate
and reuse these capabilities after introducing async udtf? Should users
migrate to the lookup source when they encounter similar requirements or
problems, or should we develop an additional set of similar mechanisms? (a
similarly case:  FLIP-234[3] introduced the retryable capability for lookup
join)

For the flip itself,
1. Considering the 'options' is already used as the dynamic table
options[4] in flink, the newly added query hint need a different name that
can be easier related to the lateral operation as the current join hints[5]
do.
2. For the async func example, since the target scenario is an external io
operation, it's better to add the `close` method to actively release
resources as a good example for users. Also in terms of the determinism of
a function, it is important to remind users that unless the behavior of the
function is deterministic, it needs to be explicitly declared as
non-deterministic.

[1].
https://cwiki.apache.org/confluence/display/FLINK/FLIP-313%3A+Add+support+of+User+Defined+AsyncTableFunction?src=contextnavpagetreemode
[2].
https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric?src=contextnavpagetreemode
[3].
https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
[4].
https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL?src=contextnavpagetreemode
[5].
https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job?src=contextnavpagetreemode

Best,
Lincoln Lee


Aitozi  于2023年6月13日周二 11:30写道:

> Get your meaning now, thanks :)
>
> Best,
> Aitozi.
>
> Feng Jin  于2023年6月13日周二 11:16写道:
>
> > Hi Aitozi,
> >
> > Sorry for the confusing description.
> >
> > What I meant was that if we need to remind users about tire safety
> issues,
> > we should introduce the new UDTF interface instead of executing the
> > original UDTF asynchronously. Therefore, I agree with introducing the
> > AsyncTableFunction.
> >
> > Best,
> > Feng
> >
> > On Tue, Jun 13, 2023 at 10:42 AM Aitozi  wrote:
> >
> > > Hi Feng,
> > > Thanks for your question. We do not provide a way to switch the
> UDTF
> > > between sync and async way,
> > > So there should be no thread safety problem here.
> > >
> > > Best,
> > > Aitozi
> > >
> > > Feng Jin  于2023年6月13日周二 10:31写道:
> > >
> > > > Hi Aitozi, We do need to remind users about thread safety issues.
> Thank
> > > you
> > > > for your efforts on this FLIP. I have no further questions.
> > > > Best, Feng
> > > >
> > > >
> > > > On Tue, Jun 13, 2023 at 6:05 AM Jing Ge 
> > > > wrote:
> > > >
> > > > > Hi Aitozi,
> > > > >
> > > > > Thanks for taking care of that part. I have no other concern.
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > >
> > > > > On Mon, Jun 12, 2023 at 5:38 PM Aitozi 
> wrote:
> > > > >
> > > > > > BTW, If there are no other more blocking issue / comments, I
> would
> > > like
> > > > > to
> > > > > > start a VOTE in another thread this wednesday 6.14
> > > > > >
> > > > > > Thanks,
> > > > > > Aitozi.
> > > > > >
> > > > > > Aitozi  于2023年6月12日周一 23:34写道:
> > > > > >
> > > > > > > Hi, Jing,
> > > > > > > Thanks for your explanation. I get your point now.
> > > > > > >
> > > > > > > For the performance part, I think it's a good idea to run with
> > > > > returning
> > > > > > a
> > > > > > > big table case, the memory consumption
> > > > > > > should be a point to be taken care about. Because in the
> ordered
> > > > mode,
> > > > > > the
> > > > > > > head element in buffer may affect the
> > > > > > > total memory consumption.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Aitozi.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Jing Ge  于2023年6月12日周一 20:28写道:
> > > > > > >
> > > > > > >> Hi Aitozi,
> > > > > > >>
> > > > > > >> Which key will be used for lookup is not an issue, only one
> row
> > > will
> > > > > be
> > > > > > >> required for each key in order to enrich it. True, it depends
> on
> > > the
> > > > > > >> implementation whether 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-12 Thread Aitozi
Get your meaning now, thanks :)

Best,
Aitozi.

Feng Jin  于2023年6月13日周二 11:16写道:

> Hi Aitozi,
>
> Sorry for the confusing description.
>
> What I meant was that if we need to remind users about tire safety issues,
> we should introduce the new UDTF interface instead of executing the
> original UDTF asynchronously. Therefore, I agree with introducing the
> AsyncTableFunction.
>
> Best,
> Feng
>
> On Tue, Jun 13, 2023 at 10:42 AM Aitozi  wrote:
>
> > Hi Feng,
> > Thanks for your question. We do not provide a way to switch the UDTF
> > between sync and async way,
> > So there should be no thread safety problem here.
> >
> > Best,
> > Aitozi
> >
> > Feng Jin  于2023年6月13日周二 10:31写道:
> >
> > > Hi Aitozi, We do need to remind users about thread safety issues. Thank
> > you
> > > for your efforts on this FLIP. I have no further questions.
> > > Best, Feng
> > >
> > >
> > > On Tue, Jun 13, 2023 at 6:05 AM Jing Ge 
> > > wrote:
> > >
> > > > Hi Aitozi,
> > > >
> > > > Thanks for taking care of that part. I have no other concern.
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > >
> > > > On Mon, Jun 12, 2023 at 5:38 PM Aitozi  wrote:
> > > >
> > > > > BTW, If there are no other more blocking issue / comments, I would
> > like
> > > > to
> > > > > start a VOTE in another thread this wednesday 6.14
> > > > >
> > > > > Thanks,
> > > > > Aitozi.
> > > > >
> > > > > Aitozi  于2023年6月12日周一 23:34写道:
> > > > >
> > > > > > Hi, Jing,
> > > > > > Thanks for your explanation. I get your point now.
> > > > > >
> > > > > > For the performance part, I think it's a good idea to run with
> > > > returning
> > > > > a
> > > > > > big table case, the memory consumption
> > > > > > should be a point to be taken care about. Because in the ordered
> > > mode,
> > > > > the
> > > > > > head element in buffer may affect the
> > > > > > total memory consumption.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Aitozi.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Jing Ge  于2023年6月12日周一 20:28写道:
> > > > > >
> > > > > >> Hi Aitozi,
> > > > > >>
> > > > > >> Which key will be used for lookup is not an issue, only one row
> > will
> > > > be
> > > > > >> required for each key in order to enrich it. True, it depends on
> > the
> > > > > >> implementation whether multiple rows or single row for each key
> > will
> > > > be
> > > > > >> returned. However, for the lookup & enrichment scenario, one
> > row/key
> > > > is
> > > > > >> recommended, otherwise, like I mentioned previously, enrichment
> > > won't
> > > > > >> work.
> > > > > >>
> > > > > >> I am a little bit concerned about returning a big table for each
> > > key,
> > > > > >> since
> > > > > >> it will take the async call longer to return and need more
> memory.
> > > The
> > > > > >> performance tests should cover this scenario. This is not a
> > blocking
> > > > > issue
> > > > > >> for this FLIP.
> > > > > >>
> > > > > >> Best regards,
> > > > > >> Jing
> > > > > >>
> > > > > >> On Sat, Jun 10, 2023 at 4:11 AM Aitozi 
> > > wrote:
> > > > > >>
> > > > > >> > Hi Jing,
> > > > > >> > I means the join key is not necessary to be the primary
> key
> > or
> > > > > >> unique
> > > > > >> > index of the database.
> > > > > >> > In this situation, we may queried out multi rows for one join
> > > key. I
> > > > > >> think
> > > > > >> > that's why the
> > > > > >> > LookupFunction#lookup will return a collection of RowData.
> > > > > >> >
> > > > > >> > BTW, I think the behavior of lookup join will not affect the
> > > > semantic
> > > > > of
> > > > > >> > the async udtf.
> > > > > >> > We use the Async TableFunction here and the table function can
> > > > collect
> > > > > >> > multiple rows.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Atiozi.
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > Jing Ge  于2023年6月10日周六 00:15写道:
> > > > > >> >
> > > > > >> > > Hi Aitozi,
> > > > > >> > >
> > > > > >> > > The keyRow used in this case contains all keys[1].
> > > > > >> > >
> > > > > >> > > Best regards,
> > > > > >> > > Jing
> > > > > >> > >
> > > > > >> > > [1]
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Fri, Jun 9, 2023 at 3:42 PM Aitozi  >
> > > > wrote:
> > > > > >> > >
> > > > > >> > > > Hi Jing,
> > > > > >> > > >
> > > > > >> > > >  The performance test is added to the FLIP.
> > > > > >> > > >
> > > > > >> > > >  As I know, The lookup join can return multi rows, it
> > > > depends
> > > > > on
> > > > > >> > > > whether  the join key
> > > > > >> > > > is the primary key of the external database or not. The
> > > `lookup`
> > > > > [1]
> > > > > >> > will
> > > > > >> > > > return a collection of
> > > > > >> > > > joined result, and each of them will be 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-12 Thread Feng Jin
Hi Aitozi,

Sorry for the confusing description.

What I meant was that if we need to remind users about tire safety issues,
we should introduce the new UDTF interface instead of executing the
original UDTF asynchronously. Therefore, I agree with introducing the
AsyncTableFunction.

Best,
Feng

On Tue, Jun 13, 2023 at 10:42 AM Aitozi  wrote:

> Hi Feng,
> Thanks for your question. We do not provide a way to switch the UDTF
> between sync and async way,
> So there should be no thread safety problem here.
>
> Best,
> Aitozi
>
> Feng Jin  于2023年6月13日周二 10:31写道:
>
> > Hi Aitozi, We do need to remind users about thread safety issues. Thank
> you
> > for your efforts on this FLIP. I have no further questions.
> > Best, Feng
> >
> >
> > On Tue, Jun 13, 2023 at 6:05 AM Jing Ge 
> > wrote:
> >
> > > Hi Aitozi,
> > >
> > > Thanks for taking care of that part. I have no other concern.
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Mon, Jun 12, 2023 at 5:38 PM Aitozi  wrote:
> > >
> > > > BTW, If there are no other more blocking issue / comments, I would
> like
> > > to
> > > > start a VOTE in another thread this wednesday 6.14
> > > >
> > > > Thanks,
> > > > Aitozi.
> > > >
> > > > Aitozi  于2023年6月12日周一 23:34写道:
> > > >
> > > > > Hi, Jing,
> > > > > Thanks for your explanation. I get your point now.
> > > > >
> > > > > For the performance part, I think it's a good idea to run with
> > > returning
> > > > a
> > > > > big table case, the memory consumption
> > > > > should be a point to be taken care about. Because in the ordered
> > mode,
> > > > the
> > > > > head element in buffer may affect the
> > > > > total memory consumption.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Aitozi.
> > > > >
> > > > >
> > > > >
> > > > > Jing Ge  于2023年6月12日周一 20:28写道:
> > > > >
> > > > >> Hi Aitozi,
> > > > >>
> > > > >> Which key will be used for lookup is not an issue, only one row
> will
> > > be
> > > > >> required for each key in order to enrich it. True, it depends on
> the
> > > > >> implementation whether multiple rows or single row for each key
> will
> > > be
> > > > >> returned. However, for the lookup & enrichment scenario, one
> row/key
> > > is
> > > > >> recommended, otherwise, like I mentioned previously, enrichment
> > won't
> > > > >> work.
> > > > >>
> > > > >> I am a little bit concerned about returning a big table for each
> > key,
> > > > >> since
> > > > >> it will take the async call longer to return and need more memory.
> > The
> > > > >> performance tests should cover this scenario. This is not a
> blocking
> > > > issue
> > > > >> for this FLIP.
> > > > >>
> > > > >> Best regards,
> > > > >> Jing
> > > > >>
> > > > >> On Sat, Jun 10, 2023 at 4:11 AM Aitozi 
> > wrote:
> > > > >>
> > > > >> > Hi Jing,
> > > > >> > I means the join key is not necessary to be the primary key
> or
> > > > >> unique
> > > > >> > index of the database.
> > > > >> > In this situation, we may queried out multi rows for one join
> > key. I
> > > > >> think
> > > > >> > that's why the
> > > > >> > LookupFunction#lookup will return a collection of RowData.
> > > > >> >
> > > > >> > BTW, I think the behavior of lookup join will not affect the
> > > semantic
> > > > of
> > > > >> > the async udtf.
> > > > >> > We use the Async TableFunction here and the table function can
> > > collect
> > > > >> > multiple rows.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Atiozi.
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > Jing Ge  于2023年6月10日周六 00:15写道:
> > > > >> >
> > > > >> > > Hi Aitozi,
> > > > >> > >
> > > > >> > > The keyRow used in this case contains all keys[1].
> > > > >> > >
> > > > >> > > Best regards,
> > > > >> > > Jing
> > > > >> > >
> > > > >> > > [1]
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > > > >> > >
> > > > >> > >
> > > > >> > > On Fri, Jun 9, 2023 at 3:42 PM Aitozi 
> > > wrote:
> > > > >> > >
> > > > >> > > > Hi Jing,
> > > > >> > > >
> > > > >> > > >  The performance test is added to the FLIP.
> > > > >> > > >
> > > > >> > > >  As I know, The lookup join can return multi rows, it
> > > depends
> > > > on
> > > > >> > > > whether  the join key
> > > > >> > > > is the primary key of the external database or not. The
> > `lookup`
> > > > [1]
> > > > >> > will
> > > > >> > > > return a collection of
> > > > >> > > > joined result, and each of them will be collected
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > [1]:
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Aitozi.
> 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-12 Thread Aitozi
Hi Feng,
Thanks for your question. We do not provide a way to switch the UDTF
between sync and async way,
So there should be no thread safety problem here.

Best,
Aitozi

Feng Jin  于2023年6月13日周二 10:31写道:

> Hi Aitozi, We do need to remind users about thread safety issues. Thank you
> for your efforts on this FLIP. I have no further questions.
> Best, Feng
>
>
> On Tue, Jun 13, 2023 at 6:05 AM Jing Ge 
> wrote:
>
> > Hi Aitozi,
> >
> > Thanks for taking care of that part. I have no other concern.
> >
> > Best regards,
> > Jing
> >
> >
> > On Mon, Jun 12, 2023 at 5:38 PM Aitozi  wrote:
> >
> > > BTW, If there are no other more blocking issue / comments, I would like
> > to
> > > start a VOTE in another thread this wednesday 6.14
> > >
> > > Thanks,
> > > Aitozi.
> > >
> > > Aitozi  于2023年6月12日周一 23:34写道:
> > >
> > > > Hi, Jing,
> > > > Thanks for your explanation. I get your point now.
> > > >
> > > > For the performance part, I think it's a good idea to run with
> > returning
> > > a
> > > > big table case, the memory consumption
> > > > should be a point to be taken care about. Because in the ordered
> mode,
> > > the
> > > > head element in buffer may affect the
> > > > total memory consumption.
> > > >
> > > >
> > > > Thanks,
> > > > Aitozi.
> > > >
> > > >
> > > >
> > > > Jing Ge  于2023年6月12日周一 20:28写道:
> > > >
> > > >> Hi Aitozi,
> > > >>
> > > >> Which key will be used for lookup is not an issue, only one row will
> > be
> > > >> required for each key in order to enrich it. True, it depends on the
> > > >> implementation whether multiple rows or single row for each key will
> > be
> > > >> returned. However, for the lookup & enrichment scenario, one row/key
> > is
> > > >> recommended, otherwise, like I mentioned previously, enrichment
> won't
> > > >> work.
> > > >>
> > > >> I am a little bit concerned about returning a big table for each
> key,
> > > >> since
> > > >> it will take the async call longer to return and need more memory.
> The
> > > >> performance tests should cover this scenario. This is not a blocking
> > > issue
> > > >> for this FLIP.
> > > >>
> > > >> Best regards,
> > > >> Jing
> > > >>
> > > >> On Sat, Jun 10, 2023 at 4:11 AM Aitozi 
> wrote:
> > > >>
> > > >> > Hi Jing,
> > > >> > I means the join key is not necessary to be the primary key or
> > > >> unique
> > > >> > index of the database.
> > > >> > In this situation, we may queried out multi rows for one join
> key. I
> > > >> think
> > > >> > that's why the
> > > >> > LookupFunction#lookup will return a collection of RowData.
> > > >> >
> > > >> > BTW, I think the behavior of lookup join will not affect the
> > semantic
> > > of
> > > >> > the async udtf.
> > > >> > We use the Async TableFunction here and the table function can
> > collect
> > > >> > multiple rows.
> > > >> >
> > > >> > Thanks,
> > > >> > Atiozi.
> > > >> >
> > > >> >
> > > >> >
> > > >> > Jing Ge  于2023年6月10日周六 00:15写道:
> > > >> >
> > > >> > > Hi Aitozi,
> > > >> > >
> > > >> > > The keyRow used in this case contains all keys[1].
> > > >> > >
> > > >> > > Best regards,
> > > >> > > Jing
> > > >> > >
> > > >> > > [1]
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > > >> > >
> > > >> > >
> > > >> > > On Fri, Jun 9, 2023 at 3:42 PM Aitozi 
> > wrote:
> > > >> > >
> > > >> > > > Hi Jing,
> > > >> > > >
> > > >> > > >  The performance test is added to the FLIP.
> > > >> > > >
> > > >> > > >  As I know, The lookup join can return multi rows, it
> > depends
> > > on
> > > >> > > > whether  the join key
> > > >> > > > is the primary key of the external database or not. The
> `lookup`
> > > [1]
> > > >> > will
> > > >> > > > return a collection of
> > > >> > > > joined result, and each of them will be collected
> > > >> > > >
> > > >> > > >
> > > >> > > > [1]:
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
> > > >> > > >
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Aitozi.
> > > >> > > >
> > > >> > > > Jing Ge  于2023年6月9日周五 17:05写道:
> > > >> > > >
> > > >> > > > > Hi Aitozi,
> > > >> > > > >
> > > >> > > > > Thanks for the feedback. Looking forward to the performance
> > > tests.
> > > >> > > > >
> > > >> > > > > Afaik, lookup returns one row for each key [1] [2].
> > > Conceptually,
> > > >> the
> > > >> > > > > lookup function is used to enrich column(s) from the
> dimension
> > > >> table.
> > > >> > > If,
> > > >> > > > > for the given key, there will be more than one row, there
> will
> > > be
> > > >> no
> > > >> > > way
> > > >> > > > to
> > > >> > > > > know which row will be used to enrich the key.
> > > >> > > > >
> > > >> > > > > 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-12 Thread Feng Jin
Hi Aitozi, We do need to remind users about thread safety issues. Thank you
for your efforts on this FLIP. I have no further questions.
Best, Feng


On Tue, Jun 13, 2023 at 6:05 AM Jing Ge  wrote:

> Hi Aitozi,
>
> Thanks for taking care of that part. I have no other concern.
>
> Best regards,
> Jing
>
>
> On Mon, Jun 12, 2023 at 5:38 PM Aitozi  wrote:
>
> > BTW, If there are no other more blocking issue / comments, I would like
> to
> > start a VOTE in another thread this wednesday 6.14
> >
> > Thanks,
> > Aitozi.
> >
> > Aitozi  于2023年6月12日周一 23:34写道:
> >
> > > Hi, Jing,
> > > Thanks for your explanation. I get your point now.
> > >
> > > For the performance part, I think it's a good idea to run with
> returning
> > a
> > > big table case, the memory consumption
> > > should be a point to be taken care about. Because in the ordered mode,
> > the
> > > head element in buffer may affect the
> > > total memory consumption.
> > >
> > >
> > > Thanks,
> > > Aitozi.
> > >
> > >
> > >
> > > Jing Ge  于2023年6月12日周一 20:28写道:
> > >
> > >> Hi Aitozi,
> > >>
> > >> Which key will be used for lookup is not an issue, only one row will
> be
> > >> required for each key in order to enrich it. True, it depends on the
> > >> implementation whether multiple rows or single row for each key will
> be
> > >> returned. However, for the lookup & enrichment scenario, one row/key
> is
> > >> recommended, otherwise, like I mentioned previously, enrichment won't
> > >> work.
> > >>
> > >> I am a little bit concerned about returning a big table for each key,
> > >> since
> > >> it will take the async call longer to return and need more memory. The
> > >> performance tests should cover this scenario. This is not a blocking
> > issue
> > >> for this FLIP.
> > >>
> > >> Best regards,
> > >> Jing
> > >>
> > >> On Sat, Jun 10, 2023 at 4:11 AM Aitozi  wrote:
> > >>
> > >> > Hi Jing,
> > >> > I means the join key is not necessary to be the primary key or
> > >> unique
> > >> > index of the database.
> > >> > In this situation, we may queried out multi rows for one join key. I
> > >> think
> > >> > that's why the
> > >> > LookupFunction#lookup will return a collection of RowData.
> > >> >
> > >> > BTW, I think the behavior of lookup join will not affect the
> semantic
> > of
> > >> > the async udtf.
> > >> > We use the Async TableFunction here and the table function can
> collect
> > >> > multiple rows.
> > >> >
> > >> > Thanks,
> > >> > Atiozi.
> > >> >
> > >> >
> > >> >
> > >> > Jing Ge  于2023年6月10日周六 00:15写道:
> > >> >
> > >> > > Hi Aitozi,
> > >> > >
> > >> > > The keyRow used in this case contains all keys[1].
> > >> > >
> > >> > > Best regards,
> > >> > > Jing
> > >> > >
> > >> > > [1]
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > >> > >
> > >> > >
> > >> > > On Fri, Jun 9, 2023 at 3:42 PM Aitozi 
> wrote:
> > >> > >
> > >> > > > Hi Jing,
> > >> > > >
> > >> > > >  The performance test is added to the FLIP.
> > >> > > >
> > >> > > >  As I know, The lookup join can return multi rows, it
> depends
> > on
> > >> > > > whether  the join key
> > >> > > > is the primary key of the external database or not. The `lookup`
> > [1]
> > >> > will
> > >> > > > return a collection of
> > >> > > > joined result, and each of them will be collected
> > >> > > >
> > >> > > >
> > >> > > > [1]:
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
> > >> > > >
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Aitozi.
> > >> > > >
> > >> > > > Jing Ge  于2023年6月9日周五 17:05写道:
> > >> > > >
> > >> > > > > Hi Aitozi,
> > >> > > > >
> > >> > > > > Thanks for the feedback. Looking forward to the performance
> > tests.
> > >> > > > >
> > >> > > > > Afaik, lookup returns one row for each key [1] [2].
> > Conceptually,
> > >> the
> > >> > > > > lookup function is used to enrich column(s) from the dimension
> > >> table.
> > >> > > If,
> > >> > > > > for the given key, there will be more than one row, there will
> > be
> > >> no
> > >> > > way
> > >> > > > to
> > >> > > > > know which row will be used to enrich the key.
> > >> > > > >
> > >> > > > > [1]
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > >> > > > > [2]
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196
> > >> > > > >
> > >> 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-12 Thread Jing Ge
Hi Aitozi,

Thanks for taking care of that part. I have no other concern.

Best regards,
Jing


On Mon, Jun 12, 2023 at 5:38 PM Aitozi  wrote:

> BTW, If there are no other more blocking issue / comments, I would like to
> start a VOTE in another thread this wednesday 6.14
>
> Thanks,
> Aitozi.
>
> Aitozi  于2023年6月12日周一 23:34写道:
>
> > Hi, Jing,
> > Thanks for your explanation. I get your point now.
> >
> > For the performance part, I think it's a good idea to run with returning
> a
> > big table case, the memory consumption
> > should be a point to be taken care about. Because in the ordered mode,
> the
> > head element in buffer may affect the
> > total memory consumption.
> >
> >
> > Thanks,
> > Aitozi.
> >
> >
> >
> > Jing Ge  于2023年6月12日周一 20:28写道:
> >
> >> Hi Aitozi,
> >>
> >> Which key will be used for lookup is not an issue, only one row will be
> >> required for each key in order to enrich it. True, it depends on the
> >> implementation whether multiple rows or single row for each key will be
> >> returned. However, for the lookup & enrichment scenario, one row/key is
> >> recommended, otherwise, like I mentioned previously, enrichment won't
> >> work.
> >>
> >> I am a little bit concerned about returning a big table for each key,
> >> since
> >> it will take the async call longer to return and need more memory. The
> >> performance tests should cover this scenario. This is not a blocking
> issue
> >> for this FLIP.
> >>
> >> Best regards,
> >> Jing
> >>
> >> On Sat, Jun 10, 2023 at 4:11 AM Aitozi  wrote:
> >>
> >> > Hi Jing,
> >> > I means the join key is not necessary to be the primary key or
> >> unique
> >> > index of the database.
> >> > In this situation, we may queried out multi rows for one join key. I
> >> think
> >> > that's why the
> >> > LookupFunction#lookup will return a collection of RowData.
> >> >
> >> > BTW, I think the behavior of lookup join will not affect the semantic
> of
> >> > the async udtf.
> >> > We use the Async TableFunction here and the table function can collect
> >> > multiple rows.
> >> >
> >> > Thanks,
> >> > Atiozi.
> >> >
> >> >
> >> >
> >> > Jing Ge  于2023年6月10日周六 00:15写道:
> >> >
> >> > > Hi Aitozi,
> >> > >
> >> > > The keyRow used in this case contains all keys[1].
> >> > >
> >> > > Best regards,
> >> > > Jing
> >> > >
> >> > > [1]
> >> > >
> >> > >
> >> >
> >>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> >> > >
> >> > >
> >> > > On Fri, Jun 9, 2023 at 3:42 PM Aitozi  wrote:
> >> > >
> >> > > > Hi Jing,
> >> > > >
> >> > > >  The performance test is added to the FLIP.
> >> > > >
> >> > > >  As I know, The lookup join can return multi rows, it depends
> on
> >> > > > whether  the join key
> >> > > > is the primary key of the external database or not. The `lookup`
> [1]
> >> > will
> >> > > > return a collection of
> >> > > > joined result, and each of them will be collected
> >> > > >
> >> > > >
> >> > > > [1]:
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
> >> > > >
> >> > > >
> >> > > > Thanks,
> >> > > > Aitozi.
> >> > > >
> >> > > > Jing Ge  于2023年6月9日周五 17:05写道:
> >> > > >
> >> > > > > Hi Aitozi,
> >> > > > >
> >> > > > > Thanks for the feedback. Looking forward to the performance
> tests.
> >> > > > >
> >> > > > > Afaik, lookup returns one row for each key [1] [2].
> Conceptually,
> >> the
> >> > > > > lookup function is used to enrich column(s) from the dimension
> >> table.
> >> > > If,
> >> > > > > for the given key, there will be more than one row, there will
> be
> >> no
> >> > > way
> >> > > > to
> >> > > > > know which row will be used to enrich the key.
> >> > > > >
> >> > > > > [1]
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> >> > > > > [2]
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196
> >> > > > >
> >> > > > > Best regards,
> >> > > > > Jing
> >> > > > >
> >> > > > > On Fri, Jun 9, 2023 at 5:18 AM Aitozi 
> >> wrote:
> >> > > > >
> >> > > > > > Hi Jing
> >> > > > > > Thanks for your good questions. I have updated the example
> >> to
> >> > the
> >> > > > > FLIP.
> >> > > > > >
> >> > > > > > > Only one row for each lookup
> >> > > > > > lookup can also return multi rows, based on the query result.
> >> [1]
> >> > > > > >
> >> > > > > > [1]:
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-12 Thread Aitozi
BTW, If there are no other more blocking issue / comments, I would like to
start a VOTE in another thread this wednesday 6.14

Thanks,
Aitozi.

Aitozi  于2023年6月12日周一 23:34写道:

> Hi, Jing,
> Thanks for your explanation. I get your point now.
>
> For the performance part, I think it's a good idea to run with returning a
> big table case, the memory consumption
> should be a point to be taken care about. Because in the ordered mode, the
> head element in buffer may affect the
> total memory consumption.
>
>
> Thanks,
> Aitozi.
>
>
>
> Jing Ge  于2023年6月12日周一 20:28写道:
>
>> Hi Aitozi,
>>
>> Which key will be used for lookup is not an issue, only one row will be
>> required for each key in order to enrich it. True, it depends on the
>> implementation whether multiple rows or single row for each key will be
>> returned. However, for the lookup & enrichment scenario, one row/key is
>> recommended, otherwise, like I mentioned previously, enrichment won't
>> work.
>>
>> I am a little bit concerned about returning a big table for each key,
>> since
>> it will take the async call longer to return and need more memory. The
>> performance tests should cover this scenario. This is not a blocking issue
>> for this FLIP.
>>
>> Best regards,
>> Jing
>>
>> On Sat, Jun 10, 2023 at 4:11 AM Aitozi  wrote:
>>
>> > Hi Jing,
>> > I means the join key is not necessary to be the primary key or
>> unique
>> > index of the database.
>> > In this situation, we may queried out multi rows for one join key. I
>> think
>> > that's why the
>> > LookupFunction#lookup will return a collection of RowData.
>> >
>> > BTW, I think the behavior of lookup join will not affect the semantic of
>> > the async udtf.
>> > We use the Async TableFunction here and the table function can collect
>> > multiple rows.
>> >
>> > Thanks,
>> > Atiozi.
>> >
>> >
>> >
>> > Jing Ge  于2023年6月10日周六 00:15写道:
>> >
>> > > Hi Aitozi,
>> > >
>> > > The keyRow used in this case contains all keys[1].
>> > >
>> > > Best regards,
>> > > Jing
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
>> > >
>> > >
>> > > On Fri, Jun 9, 2023 at 3:42 PM Aitozi  wrote:
>> > >
>> > > > Hi Jing,
>> > > >
>> > > >  The performance test is added to the FLIP.
>> > > >
>> > > >  As I know, The lookup join can return multi rows, it depends on
>> > > > whether  the join key
>> > > > is the primary key of the external database or not. The `lookup` [1]
>> > will
>> > > > return a collection of
>> > > > joined result, and each of them will be collected
>> > > >
>> > > >
>> > > > [1]:
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
>> > > >
>> > > >
>> > > > Thanks,
>> > > > Aitozi.
>> > > >
>> > > > Jing Ge  于2023年6月9日周五 17:05写道:
>> > > >
>> > > > > Hi Aitozi,
>> > > > >
>> > > > > Thanks for the feedback. Looking forward to the performance tests.
>> > > > >
>> > > > > Afaik, lookup returns one row for each key [1] [2]. Conceptually,
>> the
>> > > > > lookup function is used to enrich column(s) from the dimension
>> table.
>> > > If,
>> > > > > for the given key, there will be more than one row, there will be
>> no
>> > > way
>> > > > to
>> > > > > know which row will be used to enrich the key.
>> > > > >
>> > > > > [1]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
>> > > > > [2]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196
>> > > > >
>> > > > > Best regards,
>> > > > > Jing
>> > > > >
>> > > > > On Fri, Jun 9, 2023 at 5:18 AM Aitozi 
>> wrote:
>> > > > >
>> > > > > > Hi Jing
>> > > > > > Thanks for your good questions. I have updated the example
>> to
>> > the
>> > > > > FLIP.
>> > > > > >
>> > > > > > > Only one row for each lookup
>> > > > > > lookup can also return multi rows, based on the query result.
>> [1]
>> > > > > >
>> > > > > > [1]:
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
>> > > > > >
>> > > > > > > If we use async calls with lateral join, my gut feeling is
>> > > > > > that we might have many more async calls than lookup join. I am
>> not
>> > > > > really
>> > > > > > sure if we will be facing potential issues in this case or not.
>> > > > > >
>> > > > > > IMO, the work pattern is 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-12 Thread Aitozi
Hi, Jing,
Thanks for your explanation. I get your point now.

For the performance part, I think it's a good idea to run with returning a
big table case, the memory consumption
should be a point to be taken care about. Because in the ordered mode, the
head element in buffer may affect the
total memory consumption.


Thanks,
Aitozi.



Jing Ge  于2023年6月12日周一 20:28写道:

> Hi Aitozi,
>
> Which key will be used for lookup is not an issue, only one row will be
> required for each key in order to enrich it. True, it depends on the
> implementation whether multiple rows or single row for each key will be
> returned. However, for the lookup & enrichment scenario, one row/key is
> recommended, otherwise, like I mentioned previously, enrichment won't work.
>
> I am a little bit concerned about returning a big table for each key, since
> it will take the async call longer to return and need more memory. The
> performance tests should cover this scenario. This is not a blocking issue
> for this FLIP.
>
> Best regards,
> Jing
>
> On Sat, Jun 10, 2023 at 4:11 AM Aitozi  wrote:
>
> > Hi Jing,
> > I means the join key is not necessary to be the primary key or unique
> > index of the database.
> > In this situation, we may queried out multi rows for one join key. I
> think
> > that's why the
> > LookupFunction#lookup will return a collection of RowData.
> >
> > BTW, I think the behavior of lookup join will not affect the semantic of
> > the async udtf.
> > We use the Async TableFunction here and the table function can collect
> > multiple rows.
> >
> > Thanks,
> > Atiozi.
> >
> >
> >
> > Jing Ge  于2023年6月10日周六 00:15写道:
> >
> > > Hi Aitozi,
> > >
> > > The keyRow used in this case contains all keys[1].
> > >
> > > Best regards,
> > > Jing
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > >
> > >
> > > On Fri, Jun 9, 2023 at 3:42 PM Aitozi  wrote:
> > >
> > > > Hi Jing,
> > > >
> > > >  The performance test is added to the FLIP.
> > > >
> > > >  As I know, The lookup join can return multi rows, it depends on
> > > > whether  the join key
> > > > is the primary key of the external database or not. The `lookup` [1]
> > will
> > > > return a collection of
> > > > joined result, and each of them will be collected
> > > >
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
> > > >
> > > >
> > > > Thanks,
> > > > Aitozi.
> > > >
> > > > Jing Ge  于2023年6月9日周五 17:05写道:
> > > >
> > > > > Hi Aitozi,
> > > > >
> > > > > Thanks for the feedback. Looking forward to the performance tests.
> > > > >
> > > > > Afaik, lookup returns one row for each key [1] [2]. Conceptually,
> the
> > > > > lookup function is used to enrich column(s) from the dimension
> table.
> > > If,
> > > > > for the given key, there will be more than one row, there will be
> no
> > > way
> > > > to
> > > > > know which row will be used to enrich the key.
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > > On Fri, Jun 9, 2023 at 5:18 AM Aitozi 
> wrote:
> > > > >
> > > > > > Hi Jing
> > > > > > Thanks for your good questions. I have updated the example to
> > the
> > > > > FLIP.
> > > > > >
> > > > > > > Only one row for each lookup
> > > > > > lookup can also return multi rows, based on the query result. [1]
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
> > > > > >
> > > > > > > If we use async calls with lateral join, my gut feeling is
> > > > > > that we might have many more async calls than lookup join. I am
> not
> > > > > really
> > > > > > sure if we will be facing potential issues in this case or not.
> > > > > >
> > > > > > IMO, the work pattern is similar to the lookup function, for each
> > row
> > > > > from
> > > > > > the left table,
> > > > > > it will evaluate the eval method once, so the async call numbers
> > will
> > > > not
> > > > > > change.
> > > > > > and the maximum calls in flight is limited by the Async operators
> > > > buffer
> > > > > > capacity
> > > > > > which will be controlled by 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-12 Thread Jing Ge
Hi Aitozi,

Which key will be used for lookup is not an issue, only one row will be
required for each key in order to enrich it. True, it depends on the
implementation whether multiple rows or single row for each key will be
returned. However, for the lookup & enrichment scenario, one row/key is
recommended, otherwise, like I mentioned previously, enrichment won't work.

I am a little bit concerned about returning a big table for each key, since
it will take the async call longer to return and need more memory. The
performance tests should cover this scenario. This is not a blocking issue
for this FLIP.

Best regards,
Jing

On Sat, Jun 10, 2023 at 4:11 AM Aitozi  wrote:

> Hi Jing,
> I means the join key is not necessary to be the primary key or unique
> index of the database.
> In this situation, we may queried out multi rows for one join key. I think
> that's why the
> LookupFunction#lookup will return a collection of RowData.
>
> BTW, I think the behavior of lookup join will not affect the semantic of
> the async udtf.
> We use the Async TableFunction here and the table function can collect
> multiple rows.
>
> Thanks,
> Atiozi.
>
>
>
> Jing Ge  于2023年6月10日周六 00:15写道:
>
> > Hi Aitozi,
> >
> > The keyRow used in this case contains all keys[1].
> >
> > Best regards,
> > Jing
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> >
> >
> > On Fri, Jun 9, 2023 at 3:42 PM Aitozi  wrote:
> >
> > > Hi Jing,
> > >
> > >  The performance test is added to the FLIP.
> > >
> > >  As I know, The lookup join can return multi rows, it depends on
> > > whether  the join key
> > > is the primary key of the external database or not. The `lookup` [1]
> will
> > > return a collection of
> > > joined result, and each of them will be collected
> > >
> > >
> > > [1]:
> > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
> > >
> > >
> > > Thanks,
> > > Aitozi.
> > >
> > > Jing Ge  于2023年6月9日周五 17:05写道:
> > >
> > > > Hi Aitozi,
> > > >
> > > > Thanks for the feedback. Looking forward to the performance tests.
> > > >
> > > > Afaik, lookup returns one row for each key [1] [2]. Conceptually, the
> > > > lookup function is used to enrich column(s) from the dimension table.
> > If,
> > > > for the given key, there will be more than one row, there will be no
> > way
> > > to
> > > > know which row will be used to enrich the key.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > > > [2]
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Fri, Jun 9, 2023 at 5:18 AM Aitozi  wrote:
> > > >
> > > > > Hi Jing
> > > > > Thanks for your good questions. I have updated the example to
> the
> > > > FLIP.
> > > > >
> > > > > > Only one row for each lookup
> > > > > lookup can also return multi rows, based on the query result. [1]
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
> > > > >
> > > > > > If we use async calls with lateral join, my gut feeling is
> > > > > that we might have many more async calls than lookup join. I am not
> > > > really
> > > > > sure if we will be facing potential issues in this case or not.
> > > > >
> > > > > IMO, the work pattern is similar to the lookup function, for each
> row
> > > > from
> > > > > the left table,
> > > > > it will evaluate the eval method once, so the async call numbers
> will
> > > not
> > > > > change.
> > > > > and the maximum calls in flight is limited by the Async operators
> > > buffer
> > > > > capacity
> > > > > which will be controlled by the option.
> > > > >
> > > > > BTW, for the naming of these option, I updated the FLIP about this
> > you
> > > > can
> > > > > refer to
> > > > > the section of "ConfigOption" and "Rejected Alternatives"
> > > > >
> > > > > In the end, for the performance evaluation, I'd like to do some
> tests
> > > and
> > > > > will update it to the FLIP doc
> > > > >
> > > > > Thanks,
> > > > > Aitozi.
> > > > >
> > > > >
> > > > > Jing Ge  于2023年6月9日周五 07:23写道:
> > > > >
> > > > > > Hi Aitozi,
> > > > > >
> > > > > > Thanks for the clarification. The code example looks
> interesting. I
> > > > would
> > > > > > suggest adding them into the FLIP. The description with 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-09 Thread Aitozi
Hi Jing,
I means the join key is not necessary to be the primary key or unique
index of the database.
In this situation, we may queried out multi rows for one join key. I think
that's why the
LookupFunction#lookup will return a collection of RowData.

BTW, I think the behavior of lookup join will not affect the semantic of
the async udtf.
We use the Async TableFunction here and the table function can collect
multiple rows.

Thanks,
Atiozi.



Jing Ge  于2023年6月10日周六 00:15写道:

> Hi Aitozi,
>
> The keyRow used in this case contains all keys[1].
>
> Best regards,
> Jing
>
> [1]
>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
>
>
> On Fri, Jun 9, 2023 at 3:42 PM Aitozi  wrote:
>
> > Hi Jing,
> >
> >  The performance test is added to the FLIP.
> >
> >  As I know, The lookup join can return multi rows, it depends on
> > whether  the join key
> > is the primary key of the external database or not. The `lookup` [1] will
> > return a collection of
> > joined result, and each of them will be collected
> >
> >
> > [1]:
> >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
> >
> >
> > Thanks,
> > Aitozi.
> >
> > Jing Ge  于2023年6月9日周五 17:05写道:
> >
> > > Hi Aitozi,
> > >
> > > Thanks for the feedback. Looking forward to the performance tests.
> > >
> > > Afaik, lookup returns one row for each key [1] [2]. Conceptually, the
> > > lookup function is used to enrich column(s) from the dimension table.
> If,
> > > for the given key, there will be more than one row, there will be no
> way
> > to
> > > know which row will be used to enrich the key.
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > > [2]
> > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Fri, Jun 9, 2023 at 5:18 AM Aitozi  wrote:
> > >
> > > > Hi Jing
> > > > Thanks for your good questions. I have updated the example to the
> > > FLIP.
> > > >
> > > > > Only one row for each lookup
> > > > lookup can also return multi rows, based on the query result. [1]
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
> > > >
> > > > > If we use async calls with lateral join, my gut feeling is
> > > > that we might have many more async calls than lookup join. I am not
> > > really
> > > > sure if we will be facing potential issues in this case or not.
> > > >
> > > > IMO, the work pattern is similar to the lookup function, for each row
> > > from
> > > > the left table,
> > > > it will evaluate the eval method once, so the async call numbers will
> > not
> > > > change.
> > > > and the maximum calls in flight is limited by the Async operators
> > buffer
> > > > capacity
> > > > which will be controlled by the option.
> > > >
> > > > BTW, for the naming of these option, I updated the FLIP about this
> you
> > > can
> > > > refer to
> > > > the section of "ConfigOption" and "Rejected Alternatives"
> > > >
> > > > In the end, for the performance evaluation, I'd like to do some tests
> > and
> > > > will update it to the FLIP doc
> > > >
> > > > Thanks,
> > > > Aitozi.
> > > >
> > > >
> > > > Jing Ge  于2023年6月9日周五 07:23写道:
> > > >
> > > > > Hi Aitozi,
> > > > >
> > > > > Thanks for the clarification. The code example looks interesting. I
> > > would
> > > > > suggest adding them into the FLIP. The description with code
> examples
> > > > will
> > > > > help readers understand the motivation and how to use it. Afaiac,
> it
> > > is a
> > > > > valid feature for Flink users.
> > > > >
> > > > > As we knew, lookup join is based on temporal join, i.e. FOR
> > SYSTEM_TIME
> > > > AS
> > > > > OF which is also used in your code example. Temporal join performs
> > the
> > > > > lookup based on the processing time match. Only one row for each
> > > > > lookup(afaiu, I need to check the source code to double confirm)
> will
> > > > > return for further enrichment. One the other hand, lateral join
> will
> > > have
> > > > > sub-queries correlated with every individual value of the reference
> > > table
> > > > > from the preceding part of the query and each sub query will return
> > > > > multiple rows. If we use async calls with lateral join, my gut
> > feeling
> > > is
> > > > > that we might have many more async calls than lookup join. I am not
> > > > really
> > > > 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-09 Thread Jing Ge
Hi Aitozi,

The keyRow used in this case contains all keys[1].

Best regards,
Jing

[1]
https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49


On Fri, Jun 9, 2023 at 3:42 PM Aitozi  wrote:

> Hi Jing,
>
>  The performance test is added to the FLIP.
>
>  As I know, The lookup join can return multi rows, it depends on
> whether  the join key
> is the primary key of the external database or not. The `lookup` [1] will
> return a collection of
> joined result, and each of them will be collected
>
>
> [1]:
>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
>
>
> Thanks,
> Aitozi.
>
> Jing Ge  于2023年6月9日周五 17:05写道:
>
> > Hi Aitozi,
> >
> > Thanks for the feedback. Looking forward to the performance tests.
> >
> > Afaik, lookup returns one row for each key [1] [2]. Conceptually, the
> > lookup function is used to enrich column(s) from the dimension table. If,
> > for the given key, there will be more than one row, there will be no way
> to
> > know which row will be used to enrich the key.
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > [2]
> >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196
> >
> > Best regards,
> > Jing
> >
> > On Fri, Jun 9, 2023 at 5:18 AM Aitozi  wrote:
> >
> > > Hi Jing
> > > Thanks for your good questions. I have updated the example to the
> > FLIP.
> > >
> > > > Only one row for each lookup
> > > lookup can also return multi rows, based on the query result. [1]
> > >
> > > [1]:
> > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
> > >
> > > > If we use async calls with lateral join, my gut feeling is
> > > that we might have many more async calls than lookup join. I am not
> > really
> > > sure if we will be facing potential issues in this case or not.
> > >
> > > IMO, the work pattern is similar to the lookup function, for each row
> > from
> > > the left table,
> > > it will evaluate the eval method once, so the async call numbers will
> not
> > > change.
> > > and the maximum calls in flight is limited by the Async operators
> buffer
> > > capacity
> > > which will be controlled by the option.
> > >
> > > BTW, for the naming of these option, I updated the FLIP about this you
> > can
> > > refer to
> > > the section of "ConfigOption" and "Rejected Alternatives"
> > >
> > > In the end, for the performance evaluation, I'd like to do some tests
> and
> > > will update it to the FLIP doc
> > >
> > > Thanks,
> > > Aitozi.
> > >
> > >
> > > Jing Ge  于2023年6月9日周五 07:23写道:
> > >
> > > > Hi Aitozi,
> > > >
> > > > Thanks for the clarification. The code example looks interesting. I
> > would
> > > > suggest adding them into the FLIP. The description with code examples
> > > will
> > > > help readers understand the motivation and how to use it. Afaiac, it
> > is a
> > > > valid feature for Flink users.
> > > >
> > > > As we knew, lookup join is based on temporal join, i.e. FOR
> SYSTEM_TIME
> > > AS
> > > > OF which is also used in your code example. Temporal join performs
> the
> > > > lookup based on the processing time match. Only one row for each
> > > > lookup(afaiu, I need to check the source code to double confirm) will
> > > > return for further enrichment. One the other hand, lateral join will
> > have
> > > > sub-queries correlated with every individual value of the reference
> > table
> > > > from the preceding part of the query and each sub query will return
> > > > multiple rows. If we use async calls with lateral join, my gut
> feeling
> > is
> > > > that we might have many more async calls than lookup join. I am not
> > > really
> > > > sure if we will be facing potential issues in this case or not.
> > Possible
> > > > issues I can think of now e.g. too many PRC calls, too many async
> calls
> > > > processing, the sub query will return a table which might be (too)
> big,
> > > and
> > > > might cause performance issues. I would suggest preparing some use
> > cases
> > > > and running some performance tests to check it. These are my concerns
> > > about
> > > > using async calls with lateral join and I'd like to share with you,
> > happy
> > > > to discuss with you and hear different opinions, hopefully the
> > > > discussion could help me understand it more deeply. Please correct me
> > if
> > > I
> > > > am wrong.
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > >
> > > > 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-09 Thread Aitozi
Hi Jing,

 The performance test is added to the FLIP.

 As I know, The lookup join can return multi rows, it depends on
whether  the join key
is the primary key of the external database or not. The `lookup` [1] will
return a collection of
joined result, and each of them will be collected


[1]:
https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52


Thanks,
Aitozi.

Jing Ge  于2023年6月9日周五 17:05写道:

> Hi Aitozi,
>
> Thanks for the feedback. Looking forward to the performance tests.
>
> Afaik, lookup returns one row for each key [1] [2]. Conceptually, the
> lookup function is used to enrich column(s) from the dimension table. If,
> for the given key, there will be more than one row, there will be no way to
> know which row will be used to enrich the key.
>
> [1]
>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> [2]
>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196
>
> Best regards,
> Jing
>
> On Fri, Jun 9, 2023 at 5:18 AM Aitozi  wrote:
>
> > Hi Jing
> > Thanks for your good questions. I have updated the example to the
> FLIP.
> >
> > > Only one row for each lookup
> > lookup can also return multi rows, based on the query result. [1]
> >
> > [1]:
> >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
> >
> > > If we use async calls with lateral join, my gut feeling is
> > that we might have many more async calls than lookup join. I am not
> really
> > sure if we will be facing potential issues in this case or not.
> >
> > IMO, the work pattern is similar to the lookup function, for each row
> from
> > the left table,
> > it will evaluate the eval method once, so the async call numbers will not
> > change.
> > and the maximum calls in flight is limited by the Async operators buffer
> > capacity
> > which will be controlled by the option.
> >
> > BTW, for the naming of these option, I updated the FLIP about this you
> can
> > refer to
> > the section of "ConfigOption" and "Rejected Alternatives"
> >
> > In the end, for the performance evaluation, I'd like to do some tests and
> > will update it to the FLIP doc
> >
> > Thanks,
> > Aitozi.
> >
> >
> > Jing Ge  于2023年6月9日周五 07:23写道:
> >
> > > Hi Aitozi,
> > >
> > > Thanks for the clarification. The code example looks interesting. I
> would
> > > suggest adding them into the FLIP. The description with code examples
> > will
> > > help readers understand the motivation and how to use it. Afaiac, it
> is a
> > > valid feature for Flink users.
> > >
> > > As we knew, lookup join is based on temporal join, i.e. FOR SYSTEM_TIME
> > AS
> > > OF which is also used in your code example. Temporal join performs the
> > > lookup based on the processing time match. Only one row for each
> > > lookup(afaiu, I need to check the source code to double confirm) will
> > > return for further enrichment. One the other hand, lateral join will
> have
> > > sub-queries correlated with every individual value of the reference
> table
> > > from the preceding part of the query and each sub query will return
> > > multiple rows. If we use async calls with lateral join, my gut feeling
> is
> > > that we might have many more async calls than lookup join. I am not
> > really
> > > sure if we will be facing potential issues in this case or not.
> Possible
> > > issues I can think of now e.g. too many PRC calls, too many async calls
> > > processing, the sub query will return a table which might be (too) big,
> > and
> > > might cause performance issues. I would suggest preparing some use
> cases
> > > and running some performance tests to check it. These are my concerns
> > about
> > > using async calls with lateral join and I'd like to share with you,
> happy
> > > to discuss with you and hear different opinions, hopefully the
> > > discussion could help me understand it more deeply. Please correct me
> if
> > I
> > > am wrong.
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Thu, Jun 8, 2023 at 7:22 AM Aitozi  wrote:
> > >
> > > > Hi Mason,
> > > > Thanks for your input. I think if we support the user defined
> async
> > > > table function,
> > > > user will be able to use it to hold a batch data then handle it at
> one
> > > time
> > > > in the customized function.
> > > >
> > > > AsyncSink is meant for the sink operator. I have not figure out how
> to
> > > > integrate in this case.
> > > >
> > > > Thanks,
> > > > Atiozi.
> > > >
> > > >
> > > > Mason Chen  于2023年6月8日周四 12:40写道:
> > > >
> > > > > Hi Aitozi,
> > > > >
> > > > > I think it makes 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-09 Thread Jing Ge
Hi Aitozi,

Thanks for the feedback. Looking forward to the performance tests.

Afaik, lookup returns one row for each key [1] [2]. Conceptually, the
lookup function is used to enrich column(s) from the dimension table. If,
for the given key, there will be more than one row, there will be no way to
know which row will be used to enrich the key.

[1]
https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
[2]
https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196

Best regards,
Jing

On Fri, Jun 9, 2023 at 5:18 AM Aitozi  wrote:

> Hi Jing
> Thanks for your good questions. I have updated the example to the FLIP.
>
> > Only one row for each lookup
> lookup can also return multi rows, based on the query result. [1]
>
> [1]:
>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
>
> > If we use async calls with lateral join, my gut feeling is
> that we might have many more async calls than lookup join. I am not really
> sure if we will be facing potential issues in this case or not.
>
> IMO, the work pattern is similar to the lookup function, for each row from
> the left table,
> it will evaluate the eval method once, so the async call numbers will not
> change.
> and the maximum calls in flight is limited by the Async operators buffer
> capacity
> which will be controlled by the option.
>
> BTW, for the naming of these option, I updated the FLIP about this you can
> refer to
> the section of "ConfigOption" and "Rejected Alternatives"
>
> In the end, for the performance evaluation, I'd like to do some tests and
> will update it to the FLIP doc
>
> Thanks,
> Aitozi.
>
>
> Jing Ge  于2023年6月9日周五 07:23写道:
>
> > Hi Aitozi,
> >
> > Thanks for the clarification. The code example looks interesting. I would
> > suggest adding them into the FLIP. The description with code examples
> will
> > help readers understand the motivation and how to use it. Afaiac, it is a
> > valid feature for Flink users.
> >
> > As we knew, lookup join is based on temporal join, i.e. FOR SYSTEM_TIME
> AS
> > OF which is also used in your code example. Temporal join performs the
> > lookup based on the processing time match. Only one row for each
> > lookup(afaiu, I need to check the source code to double confirm) will
> > return for further enrichment. One the other hand, lateral join will have
> > sub-queries correlated with every individual value of the reference table
> > from the preceding part of the query and each sub query will return
> > multiple rows. If we use async calls with lateral join, my gut feeling is
> > that we might have many more async calls than lookup join. I am not
> really
> > sure if we will be facing potential issues in this case or not. Possible
> > issues I can think of now e.g. too many PRC calls, too many async calls
> > processing, the sub query will return a table which might be (too) big,
> and
> > might cause performance issues. I would suggest preparing some use cases
> > and running some performance tests to check it. These are my concerns
> about
> > using async calls with lateral join and I'd like to share with you, happy
> > to discuss with you and hear different opinions, hopefully the
> > discussion could help me understand it more deeply. Please correct me if
> I
> > am wrong.
> >
> > Best regards,
> > Jing
> >
> >
> > On Thu, Jun 8, 2023 at 7:22 AM Aitozi  wrote:
> >
> > > Hi Mason,
> > > Thanks for your input. I think if we support the user defined async
> > > table function,
> > > user will be able to use it to hold a batch data then handle it at one
> > time
> > > in the customized function.
> > >
> > > AsyncSink is meant for the sink operator. I have not figure out how to
> > > integrate in this case.
> > >
> > > Thanks,
> > > Atiozi.
> > >
> > >
> > > Mason Chen  于2023年6月8日周四 12:40写道:
> > >
> > > > Hi Aitozi,
> > > >
> > > > I think it makes sense to make it easier for SQL users to make RPCs.
> Do
> > > you
> > > > think your proposal can extend to the ability to batch data for the
> > RPC?
> > > > This is also another common strategy to increase throughput. Also,
> have
> > > you
> > > > considered solving this a bit differently by leveraging Flink's
> > > AsyncSink?
> > > >
> > > > Best,
> > > > Mason
> > > >
> > > > On Mon, Jun 5, 2023 at 1:50 AM Aitozi  wrote:
> > > >
> > > > > One more thing for discussion:
> > > > >
> > > > > In our internal implementation, we reuse the option
> > > > > `table.exec.async-lookup.buffer-capacity` and
> > > > > `table.exec.async-lookup.timeout` to config
> > > > > the async udtf. Do you think we should add two extra option to
> > > > distinguish
> > > > > from the lookup option 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-08 Thread Aitozi
Hi Feng,
Thanks for your good question, It's very attractive if we can support
run the original
UDTF asynchronously without introducing new UDTFs.

But I think it's not easy, because the original UDTFs are executed one
instance per parallelism
So there is no thread-safe problem to user. But for the asynchronously
version, usually means
the main process is running in a dedicated thread pool. So if we run the
originally UDTFs in
asynchronously way, it may bring thread-safe problem.


@Jing Ge  Also add the "Performance" section in FLIP.

Thanks,
Aitozi.

Feng Jin  于2023年6月9日周五 13:16写道:

> hi, Aitozi
>
> Thank you for your proposal.
>
> In our production environment, we often encounter efficiency issues with
> user-defined functions (UDFs), which can lead to slower processing speeds.
> I believe that this FLIP will make it easier for UDFs to be executed more
> efficiently.
>
>
> I have a small question:
>
> Is it possible for us to execute the original UDTF asynchronously without
> introducing new UDTFs?
> Of course, this is just my personal idea and I am not sure if it is a
> feasible solution.
>
>
> Best,
> Feng
>
>
>
> On Fri, Jun 9, 2023 at 11:18 AM Aitozi  wrote:
>
> > Hi Jing
> > Thanks for your good questions. I have updated the example to the
> FLIP.
> >
> > > Only one row for each lookup
> > lookup can also return multi rows, based on the query result. [1]
> >
> > [1]:
> >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
> >
> > > If we use async calls with lateral join, my gut feeling is
> > that we might have many more async calls than lookup join. I am not
> really
> > sure if we will be facing potential issues in this case or not.
> >
> > IMO, the work pattern is similar to the lookup function, for each row
> from
> > the left table,
> > it will evaluate the eval method once, so the async call numbers will not
> > change.
> > and the maximum calls in flight is limited by the Async operators buffer
> > capacity
> > which will be controlled by the option.
> >
> > BTW, for the naming of these option, I updated the FLIP about this you
> can
> > refer to
> > the section of "ConfigOption" and "Rejected Alternatives"
> >
> > In the end, for the performance evaluation, I'd like to do some tests and
> > will update it to the FLIP doc
> >
> > Thanks,
> > Aitozi.
> >
> >
> > Jing Ge  于2023年6月9日周五 07:23写道:
> >
> > > Hi Aitozi,
> > >
> > > Thanks for the clarification. The code example looks interesting. I
> would
> > > suggest adding them into the FLIP. The description with code examples
> > will
> > > help readers understand the motivation and how to use it. Afaiac, it
> is a
> > > valid feature for Flink users.
> > >
> > > As we knew, lookup join is based on temporal join, i.e. FOR SYSTEM_TIME
> > AS
> > > OF which is also used in your code example. Temporal join performs the
> > > lookup based on the processing time match. Only one row for each
> > > lookup(afaiu, I need to check the source code to double confirm) will
> > > return for further enrichment. One the other hand, lateral join will
> have
> > > sub-queries correlated with every individual value of the reference
> table
> > > from the preceding part of the query and each sub query will return
> > > multiple rows. If we use async calls with lateral join, my gut feeling
> is
> > > that we might have many more async calls than lookup join. I am not
> > really
> > > sure if we will be facing potential issues in this case or not.
> Possible
> > > issues I can think of now e.g. too many PRC calls, too many async calls
> > > processing, the sub query will return a table which might be (too) big,
> > and
> > > might cause performance issues. I would suggest preparing some use
> cases
> > > and running some performance tests to check it. These are my concerns
> > about
> > > using async calls with lateral join and I'd like to share with you,
> happy
> > > to discuss with you and hear different opinions, hopefully the
> > > discussion could help me understand it more deeply. Please correct me
> if
> > I
> > > am wrong.
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Thu, Jun 8, 2023 at 7:22 AM Aitozi  wrote:
> > >
> > > > Hi Mason,
> > > > Thanks for your input. I think if we support the user defined
> async
> > > > table function,
> > > > user will be able to use it to hold a batch data then handle it at
> one
> > > time
> > > > in the customized function.
> > > >
> > > > AsyncSink is meant for the sink operator. I have not figure out how
> to
> > > > integrate in this case.
> > > >
> > > > Thanks,
> > > > Atiozi.
> > > >
> > > >
> > > > Mason Chen  于2023年6月8日周四 12:40写道:
> > > >
> > > > > Hi Aitozi,
> > > > >
> > > > > I think it makes sense to make it easier for SQL users to make
> RPCs.
> > Do
> > > > you
> > > > > think your proposal can extend to the ability to batch data for the
> > > RPC?
> 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-08 Thread Feng Jin
hi, Aitozi

Thank you for your proposal.

In our production environment, we often encounter efficiency issues with
user-defined functions (UDFs), which can lead to slower processing speeds.
I believe that this FLIP will make it easier for UDFs to be executed more
efficiently.


I have a small question:

Is it possible for us to execute the original UDTF asynchronously without
introducing new UDTFs?
Of course, this is just my personal idea and I am not sure if it is a
feasible solution.


Best,
Feng



On Fri, Jun 9, 2023 at 11:18 AM Aitozi  wrote:

> Hi Jing
> Thanks for your good questions. I have updated the example to the FLIP.
>
> > Only one row for each lookup
> lookup can also return multi rows, based on the query result. [1]
>
> [1]:
>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
>
> > If we use async calls with lateral join, my gut feeling is
> that we might have many more async calls than lookup join. I am not really
> sure if we will be facing potential issues in this case or not.
>
> IMO, the work pattern is similar to the lookup function, for each row from
> the left table,
> it will evaluate the eval method once, so the async call numbers will not
> change.
> and the maximum calls in flight is limited by the Async operators buffer
> capacity
> which will be controlled by the option.
>
> BTW, for the naming of these option, I updated the FLIP about this you can
> refer to
> the section of "ConfigOption" and "Rejected Alternatives"
>
> In the end, for the performance evaluation, I'd like to do some tests and
> will update it to the FLIP doc
>
> Thanks,
> Aitozi.
>
>
> Jing Ge  于2023年6月9日周五 07:23写道:
>
> > Hi Aitozi,
> >
> > Thanks for the clarification. The code example looks interesting. I would
> > suggest adding them into the FLIP. The description with code examples
> will
> > help readers understand the motivation and how to use it. Afaiac, it is a
> > valid feature for Flink users.
> >
> > As we knew, lookup join is based on temporal join, i.e. FOR SYSTEM_TIME
> AS
> > OF which is also used in your code example. Temporal join performs the
> > lookup based on the processing time match. Only one row for each
> > lookup(afaiu, I need to check the source code to double confirm) will
> > return for further enrichment. One the other hand, lateral join will have
> > sub-queries correlated with every individual value of the reference table
> > from the preceding part of the query and each sub query will return
> > multiple rows. If we use async calls with lateral join, my gut feeling is
> > that we might have many more async calls than lookup join. I am not
> really
> > sure if we will be facing potential issues in this case or not. Possible
> > issues I can think of now e.g. too many PRC calls, too many async calls
> > processing, the sub query will return a table which might be (too) big,
> and
> > might cause performance issues. I would suggest preparing some use cases
> > and running some performance tests to check it. These are my concerns
> about
> > using async calls with lateral join and I'd like to share with you, happy
> > to discuss with you and hear different opinions, hopefully the
> > discussion could help me understand it more deeply. Please correct me if
> I
> > am wrong.
> >
> > Best regards,
> > Jing
> >
> >
> > On Thu, Jun 8, 2023 at 7:22 AM Aitozi  wrote:
> >
> > > Hi Mason,
> > > Thanks for your input. I think if we support the user defined async
> > > table function,
> > > user will be able to use it to hold a batch data then handle it at one
> > time
> > > in the customized function.
> > >
> > > AsyncSink is meant for the sink operator. I have not figure out how to
> > > integrate in this case.
> > >
> > > Thanks,
> > > Atiozi.
> > >
> > >
> > > Mason Chen  于2023年6月8日周四 12:40写道:
> > >
> > > > Hi Aitozi,
> > > >
> > > > I think it makes sense to make it easier for SQL users to make RPCs.
> Do
> > > you
> > > > think your proposal can extend to the ability to batch data for the
> > RPC?
> > > > This is also another common strategy to increase throughput. Also,
> have
> > > you
> > > > considered solving this a bit differently by leveraging Flink's
> > > AsyncSink?
> > > >
> > > > Best,
> > > > Mason
> > > >
> > > > On Mon, Jun 5, 2023 at 1:50 AM Aitozi  wrote:
> > > >
> > > > > One more thing for discussion:
> > > > >
> > > > > In our internal implementation, we reuse the option
> > > > > `table.exec.async-lookup.buffer-capacity` and
> > > > > `table.exec.async-lookup.timeout` to config
> > > > > the async udtf. Do you think we should add two extra option to
> > > > distinguish
> > > > > from the lookup option such as
> > > > >
> > > > > `table.exec.async-udtf.buffer-capacity`
> > > > > `table.exec.async-udtf.timeout`
> > > > >
> > > > >
> > > > > Best,
> > > > > Aitozi.
> > > > >
> > > > >
> > > > >
> > > > > Aitozi  于2023年6月5日周一 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-08 Thread Aitozi
Hi Jing
Thanks for your good questions. I have updated the example to the FLIP.

> Only one row for each lookup
lookup can also return multi rows, based on the query result. [1]

[1]:
https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56

> If we use async calls with lateral join, my gut feeling is
that we might have many more async calls than lookup join. I am not really
sure if we will be facing potential issues in this case or not.

IMO, the work pattern is similar to the lookup function, for each row from
the left table,
it will evaluate the eval method once, so the async call numbers will not
change.
and the maximum calls in flight is limited by the Async operators buffer
capacity
which will be controlled by the option.

BTW, for the naming of these option, I updated the FLIP about this you can
refer to
the section of "ConfigOption" and "Rejected Alternatives"

In the end, for the performance evaluation, I'd like to do some tests and
will update it to the FLIP doc

Thanks,
Aitozi.


Jing Ge  于2023年6月9日周五 07:23写道:

> Hi Aitozi,
>
> Thanks for the clarification. The code example looks interesting. I would
> suggest adding them into the FLIP. The description with code examples will
> help readers understand the motivation and how to use it. Afaiac, it is a
> valid feature for Flink users.
>
> As we knew, lookup join is based on temporal join, i.e. FOR SYSTEM_TIME AS
> OF which is also used in your code example. Temporal join performs the
> lookup based on the processing time match. Only one row for each
> lookup(afaiu, I need to check the source code to double confirm) will
> return for further enrichment. One the other hand, lateral join will have
> sub-queries correlated with every individual value of the reference table
> from the preceding part of the query and each sub query will return
> multiple rows. If we use async calls with lateral join, my gut feeling is
> that we might have many more async calls than lookup join. I am not really
> sure if we will be facing potential issues in this case or not. Possible
> issues I can think of now e.g. too many PRC calls, too many async calls
> processing, the sub query will return a table which might be (too) big, and
> might cause performance issues. I would suggest preparing some use cases
> and running some performance tests to check it. These are my concerns about
> using async calls with lateral join and I'd like to share with you, happy
> to discuss with you and hear different opinions, hopefully the
> discussion could help me understand it more deeply. Please correct me if I
> am wrong.
>
> Best regards,
> Jing
>
>
> On Thu, Jun 8, 2023 at 7:22 AM Aitozi  wrote:
>
> > Hi Mason,
> > Thanks for your input. I think if we support the user defined async
> > table function,
> > user will be able to use it to hold a batch data then handle it at one
> time
> > in the customized function.
> >
> > AsyncSink is meant for the sink operator. I have not figure out how to
> > integrate in this case.
> >
> > Thanks,
> > Atiozi.
> >
> >
> > Mason Chen  于2023年6月8日周四 12:40写道:
> >
> > > Hi Aitozi,
> > >
> > > I think it makes sense to make it easier for SQL users to make RPCs. Do
> > you
> > > think your proposal can extend to the ability to batch data for the
> RPC?
> > > This is also another common strategy to increase throughput. Also, have
> > you
> > > considered solving this a bit differently by leveraging Flink's
> > AsyncSink?
> > >
> > > Best,
> > > Mason
> > >
> > > On Mon, Jun 5, 2023 at 1:50 AM Aitozi  wrote:
> > >
> > > > One more thing for discussion:
> > > >
> > > > In our internal implementation, we reuse the option
> > > > `table.exec.async-lookup.buffer-capacity` and
> > > > `table.exec.async-lookup.timeout` to config
> > > > the async udtf. Do you think we should add two extra option to
> > > distinguish
> > > > from the lookup option such as
> > > >
> > > > `table.exec.async-udtf.buffer-capacity`
> > > > `table.exec.async-udtf.timeout`
> > > >
> > > >
> > > > Best,
> > > > Aitozi.
> > > >
> > > >
> > > >
> > > > Aitozi  于2023年6月5日周一 12:20写道:
> > > >
> > > > > Hi Jing,
> > > > >
> > > > > > what is the difference between the RPC call or query you
> > > mentioned
> > > > > and the lookup in a very
> > > > > general way
> > > > >
> > > > > I think the RPC call or query service is quite similar to the
> lookup
> > > > join.
> > > > > But lookup join should work
> > > > > with `LookupTableSource`.
> > > > >
> > > > > Let's see how we can perform an async RPC call with lookup join:
> > > > >
> > > > > (1) Implement an AsyncTableFunction with RPC call logic.
> > > > > (2) Implement a `LookupTableSource` connector run with the async
> udtf
> > > > > defined in (1).
> > > > > (3) Then define a DDL of this look up table in SQL
> > > > >
> > > > > CREATE TEMPORARY TABLE Customers (
> > > > >   id INT,
> > > > >   name STRING,

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-08 Thread Jing Ge
Hi Aitozi,

Thanks for the clarification. The code example looks interesting. I would
suggest adding them into the FLIP. The description with code examples will
help readers understand the motivation and how to use it. Afaiac, it is a
valid feature for Flink users.

As we knew, lookup join is based on temporal join, i.e. FOR SYSTEM_TIME AS
OF which is also used in your code example. Temporal join performs the
lookup based on the processing time match. Only one row for each
lookup(afaiu, I need to check the source code to double confirm) will
return for further enrichment. One the other hand, lateral join will have
sub-queries correlated with every individual value of the reference table
from the preceding part of the query and each sub query will return
multiple rows. If we use async calls with lateral join, my gut feeling is
that we might have many more async calls than lookup join. I am not really
sure if we will be facing potential issues in this case or not. Possible
issues I can think of now e.g. too many PRC calls, too many async calls
processing, the sub query will return a table which might be (too) big, and
might cause performance issues. I would suggest preparing some use cases
and running some performance tests to check it. These are my concerns about
using async calls with lateral join and I'd like to share with you, happy
to discuss with you and hear different opinions, hopefully the
discussion could help me understand it more deeply. Please correct me if I
am wrong.

Best regards,
Jing


On Thu, Jun 8, 2023 at 7:22 AM Aitozi  wrote:

> Hi Mason,
> Thanks for your input. I think if we support the user defined async
> table function,
> user will be able to use it to hold a batch data then handle it at one time
> in the customized function.
>
> AsyncSink is meant for the sink operator. I have not figure out how to
> integrate in this case.
>
> Thanks,
> Atiozi.
>
>
> Mason Chen  于2023年6月8日周四 12:40写道:
>
> > Hi Aitozi,
> >
> > I think it makes sense to make it easier for SQL users to make RPCs. Do
> you
> > think your proposal can extend to the ability to batch data for the RPC?
> > This is also another common strategy to increase throughput. Also, have
> you
> > considered solving this a bit differently by leveraging Flink's
> AsyncSink?
> >
> > Best,
> > Mason
> >
> > On Mon, Jun 5, 2023 at 1:50 AM Aitozi  wrote:
> >
> > > One more thing for discussion:
> > >
> > > In our internal implementation, we reuse the option
> > > `table.exec.async-lookup.buffer-capacity` and
> > > `table.exec.async-lookup.timeout` to config
> > > the async udtf. Do you think we should add two extra option to
> > distinguish
> > > from the lookup option such as
> > >
> > > `table.exec.async-udtf.buffer-capacity`
> > > `table.exec.async-udtf.timeout`
> > >
> > >
> > > Best,
> > > Aitozi.
> > >
> > >
> > >
> > > Aitozi  于2023年6月5日周一 12:20写道:
> > >
> > > > Hi Jing,
> > > >
> > > > > what is the difference between the RPC call or query you
> > mentioned
> > > > and the lookup in a very
> > > > general way
> > > >
> > > > I think the RPC call or query service is quite similar to the lookup
> > > join.
> > > > But lookup join should work
> > > > with `LookupTableSource`.
> > > >
> > > > Let's see how we can perform an async RPC call with lookup join:
> > > >
> > > > (1) Implement an AsyncTableFunction with RPC call logic.
> > > > (2) Implement a `LookupTableSource` connector run with the async udtf
> > > > defined in (1).
> > > > (3) Then define a DDL of this look up table in SQL
> > > >
> > > > CREATE TEMPORARY TABLE Customers (
> > > >   id INT,
> > > >   name STRING,
> > > >   country STRING,
> > > >   zip STRING
> > > > ) WITH (
> > > >   'connector' = 'custom'
> > > > );
> > > >
> > > > (4) Run with the query as below:
> > > >
> > > > SELECT o.order_id, o.total, c.country, c.zip
> > > > FROM Orders AS o
> > > >   JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
> > > > ON o.customer_id = c.id;
> > > >
> > > > This example is from doc
> > > > <
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/queries/joins/#lookup-join
> > > >.You
> > > > can image the look up process as an async RPC call process.
> > > >
> > > > Let's see how we can perform an async RPC call with lateral join:
> > > >
> > > > (1) Implement an AsyncTableFunction with RPC call logic.
> > > > (2) Run query with
> > > >
> > > > Create function f1 as '...' ;
> > > >
> > > > SELECT o.order_id, o.total, c.country, c.zip FROM Orders  lateral
> table
> > > > (f1(order_id)) as T(...);
> > > >
> > > > As you can see, the lateral join version is more simple and intuitive
> > to
> > > > users. Users do not have to wrap a
> > > > LookupTableSource for the purpose of using async udtf.
> > > >
> > > > In the end, We can also see the user defined async table function is
> an
> > > > enhancement of the current lateral table join
> > > > which only supports sync lateral join now.
> > > >
> > > > Best,
> > > > 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-07 Thread Aitozi
Hi Mason,
Thanks for your input. I think if we support the user defined async
table function,
user will be able to use it to hold a batch data then handle it at one time
in the customized function.

AsyncSink is meant for the sink operator. I have not figure out how to
integrate in this case.

Thanks,
Atiozi.


Mason Chen  于2023年6月8日周四 12:40写道:

> Hi Aitozi,
>
> I think it makes sense to make it easier for SQL users to make RPCs. Do you
> think your proposal can extend to the ability to batch data for the RPC?
> This is also another common strategy to increase throughput. Also, have you
> considered solving this a bit differently by leveraging Flink's AsyncSink?
>
> Best,
> Mason
>
> On Mon, Jun 5, 2023 at 1:50 AM Aitozi  wrote:
>
> > One more thing for discussion:
> >
> > In our internal implementation, we reuse the option
> > `table.exec.async-lookup.buffer-capacity` and
> > `table.exec.async-lookup.timeout` to config
> > the async udtf. Do you think we should add two extra option to
> distinguish
> > from the lookup option such as
> >
> > `table.exec.async-udtf.buffer-capacity`
> > `table.exec.async-udtf.timeout`
> >
> >
> > Best,
> > Aitozi.
> >
> >
> >
> > Aitozi  于2023年6月5日周一 12:20写道:
> >
> > > Hi Jing,
> > >
> > > > what is the difference between the RPC call or query you
> mentioned
> > > and the lookup in a very
> > > general way
> > >
> > > I think the RPC call or query service is quite similar to the lookup
> > join.
> > > But lookup join should work
> > > with `LookupTableSource`.
> > >
> > > Let's see how we can perform an async RPC call with lookup join:
> > >
> > > (1) Implement an AsyncTableFunction with RPC call logic.
> > > (2) Implement a `LookupTableSource` connector run with the async udtf
> > > defined in (1).
> > > (3) Then define a DDL of this look up table in SQL
> > >
> > > CREATE TEMPORARY TABLE Customers (
> > >   id INT,
> > >   name STRING,
> > >   country STRING,
> > >   zip STRING
> > > ) WITH (
> > >   'connector' = 'custom'
> > > );
> > >
> > > (4) Run with the query as below:
> > >
> > > SELECT o.order_id, o.total, c.country, c.zip
> > > FROM Orders AS o
> > >   JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
> > > ON o.customer_id = c.id;
> > >
> > > This example is from doc
> > > <
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/queries/joins/#lookup-join
> > >.You
> > > can image the look up process as an async RPC call process.
> > >
> > > Let's see how we can perform an async RPC call with lateral join:
> > >
> > > (1) Implement an AsyncTableFunction with RPC call logic.
> > > (2) Run query with
> > >
> > > Create function f1 as '...' ;
> > >
> > > SELECT o.order_id, o.total, c.country, c.zip FROM Orders  lateral table
> > > (f1(order_id)) as T(...);
> > >
> > > As you can see, the lateral join version is more simple and intuitive
> to
> > > users. Users do not have to wrap a
> > > LookupTableSource for the purpose of using async udtf.
> > >
> > > In the end, We can also see the user defined async table function is an
> > > enhancement of the current lateral table join
> > > which only supports sync lateral join now.
> > >
> > > Best,
> > > Aitozi.
> > >
> > >
> > > Jing Ge  于2023年6月2日周五 19:37写道:
> > >
> > >> Hi Aitozi,
> > >>
> > >> Thanks for the update. Just out of curiosity, what is the difference
> > >> between the RPC call or query you mentioned and the lookup in a very
> > >> general way? Since Lateral join is used in the FLIP. Is there any
> > special
> > >> thought for that? Sorry for asking so many questions. The FLIP
> contains
> > >> limited information to understand the motivation.
> > >>
> > >> Best regards,
> > >> Jing
> > >>
> > >> On Fri, Jun 2, 2023 at 3:48 AM Aitozi  wrote:
> > >>
> > >> > Hi Jing,
> > >> > I have updated the proposed changes to the FLIP. IMO, lookup has
> > its
> > >> > clear
> > >> > async call requirement is due to its IO heavy operator. In our
> usage,
> > >> sql
> > >> > users have
> > >> > logic to do some RPC call or query the third-party service which is
> > >> also IO
> > >> > intensive.
> > >> > In these case, we'd like to leverage the async function to improve
> the
> > >> > throughput.
> > >> >
> > >> > Thanks,
> > >> > Aitozi.
> > >> >
> > >> > Jing Ge  于2023年6月1日周四 22:55写道:
> > >> >
> > >> > > Hi Aitozi,
> > >> > >
> > >> > > Sorry for the late reply. Would you like to update the proposed
> > >> changes
> > >> > > with more details into the FLIP too?
> > >> > > I got your point. It looks like a rational idea. However, since
> > lookup
> > >> > has
> > >> > > its clear async call requirement, are there any real use cases
> that
> > >> > > need this change? This will help us understand the motivation.
> After
> > >> all,
> > >> > > lateral join and temporal lookup join[1] are quite different.
> > >> > >
> > >> > > Best regards,
> > >> > > Jing
> > >> > >
> > >> > >
> > >> > > [1]
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-07 Thread Mason Chen
Hi Aitozi,

I think it makes sense to make it easier for SQL users to make RPCs. Do you
think your proposal can extend to the ability to batch data for the RPC?
This is also another common strategy to increase throughput. Also, have you
considered solving this a bit differently by leveraging Flink's AsyncSink?

Best,
Mason

On Mon, Jun 5, 2023 at 1:50 AM Aitozi  wrote:

> One more thing for discussion:
>
> In our internal implementation, we reuse the option
> `table.exec.async-lookup.buffer-capacity` and
> `table.exec.async-lookup.timeout` to config
> the async udtf. Do you think we should add two extra option to distinguish
> from the lookup option such as
>
> `table.exec.async-udtf.buffer-capacity`
> `table.exec.async-udtf.timeout`
>
>
> Best,
> Aitozi.
>
>
>
> Aitozi  于2023年6月5日周一 12:20写道:
>
> > Hi Jing,
> >
> > > what is the difference between the RPC call or query you mentioned
> > and the lookup in a very
> > general way
> >
> > I think the RPC call or query service is quite similar to the lookup
> join.
> > But lookup join should work
> > with `LookupTableSource`.
> >
> > Let's see how we can perform an async RPC call with lookup join:
> >
> > (1) Implement an AsyncTableFunction with RPC call logic.
> > (2) Implement a `LookupTableSource` connector run with the async udtf
> > defined in (1).
> > (3) Then define a DDL of this look up table in SQL
> >
> > CREATE TEMPORARY TABLE Customers (
> >   id INT,
> >   name STRING,
> >   country STRING,
> >   zip STRING
> > ) WITH (
> >   'connector' = 'custom'
> > );
> >
> > (4) Run with the query as below:
> >
> > SELECT o.order_id, o.total, c.country, c.zip
> > FROM Orders AS o
> >   JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
> > ON o.customer_id = c.id;
> >
> > This example is from doc
> > <
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/queries/joins/#lookup-join
> >.You
> > can image the look up process as an async RPC call process.
> >
> > Let's see how we can perform an async RPC call with lateral join:
> >
> > (1) Implement an AsyncTableFunction with RPC call logic.
> > (2) Run query with
> >
> > Create function f1 as '...' ;
> >
> > SELECT o.order_id, o.total, c.country, c.zip FROM Orders  lateral table
> > (f1(order_id)) as T(...);
> >
> > As you can see, the lateral join version is more simple and intuitive to
> > users. Users do not have to wrap a
> > LookupTableSource for the purpose of using async udtf.
> >
> > In the end, We can also see the user defined async table function is an
> > enhancement of the current lateral table join
> > which only supports sync lateral join now.
> >
> > Best,
> > Aitozi.
> >
> >
> > Jing Ge  于2023年6月2日周五 19:37写道:
> >
> >> Hi Aitozi,
> >>
> >> Thanks for the update. Just out of curiosity, what is the difference
> >> between the RPC call or query you mentioned and the lookup in a very
> >> general way? Since Lateral join is used in the FLIP. Is there any
> special
> >> thought for that? Sorry for asking so many questions. The FLIP contains
> >> limited information to understand the motivation.
> >>
> >> Best regards,
> >> Jing
> >>
> >> On Fri, Jun 2, 2023 at 3:48 AM Aitozi  wrote:
> >>
> >> > Hi Jing,
> >> > I have updated the proposed changes to the FLIP. IMO, lookup has
> its
> >> > clear
> >> > async call requirement is due to its IO heavy operator. In our usage,
> >> sql
> >> > users have
> >> > logic to do some RPC call or query the third-party service which is
> >> also IO
> >> > intensive.
> >> > In these case, we'd like to leverage the async function to improve the
> >> > throughput.
> >> >
> >> > Thanks,
> >> > Aitozi.
> >> >
> >> > Jing Ge  于2023年6月1日周四 22:55写道:
> >> >
> >> > > Hi Aitozi,
> >> > >
> >> > > Sorry for the late reply. Would you like to update the proposed
> >> changes
> >> > > with more details into the FLIP too?
> >> > > I got your point. It looks like a rational idea. However, since
> lookup
> >> > has
> >> > > its clear async call requirement, are there any real use cases that
> >> > > need this change? This will help us understand the motivation. After
> >> all,
> >> > > lateral join and temporal lookup join[1] are quite different.
> >> > >
> >> > > Best regards,
> >> > > Jing
> >> > >
> >> > >
> >> > > [1]
> >> > >
> >> > >
> >> >
> >>
> https://github.com/apache/flink/blob/d90a72da2fd601ca4e2a46700e91ec5b348de2ad/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AsyncTableFunction.java#L54
> >> > >
> >> > > On Wed, May 31, 2023 at 8:53 AM Aitozi 
> wrote:
> >> > >
> >> > > > Hi Jing,
> >> > > > What do you think about it? Can we move forward this feature?
> >> > > >
> >> > > > Thanks,
> >> > > > Aitozi.
> >> > > >
> >> > > > Aitozi  于2023年5月29日周一 09:56写道:
> >> > > >
> >> > > > > Hi Jing,
> >> > > > > > "Do you mean to support the AyncTableFunction beyond the
> >> > > > > LookupTableSource?"
> >> > > > > Yes, I mean to support the AyncTableFunction beyond the
> >> > > > LookupTableSource.
> >> > 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-05 Thread Aitozi
One more thing for discussion:

In our internal implementation, we reuse the option
`table.exec.async-lookup.buffer-capacity` and
`table.exec.async-lookup.timeout` to config
the async udtf. Do you think we should add two extra option to distinguish
from the lookup option such as

`table.exec.async-udtf.buffer-capacity`
`table.exec.async-udtf.timeout`


Best,
Aitozi.



Aitozi  于2023年6月5日周一 12:20写道:

> Hi Jing,
>
> > what is the difference between the RPC call or query you mentioned
> and the lookup in a very
> general way
>
> I think the RPC call or query service is quite similar to the lookup join.
> But lookup join should work
> with `LookupTableSource`.
>
> Let's see how we can perform an async RPC call with lookup join:
>
> (1) Implement an AsyncTableFunction with RPC call logic.
> (2) Implement a `LookupTableSource` connector run with the async udtf
> defined in (1).
> (3) Then define a DDL of this look up table in SQL
>
> CREATE TEMPORARY TABLE Customers (
>   id INT,
>   name STRING,
>   country STRING,
>   zip STRING
> ) WITH (
>   'connector' = 'custom'
> );
>
> (4) Run with the query as below:
>
> SELECT o.order_id, o.total, c.country, c.zip
> FROM Orders AS o
>   JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
> ON o.customer_id = c.id;
>
> This example is from doc
> .You
> can image the look up process as an async RPC call process.
>
> Let's see how we can perform an async RPC call with lateral join:
>
> (1) Implement an AsyncTableFunction with RPC call logic.
> (2) Run query with
>
> Create function f1 as '...' ;
>
> SELECT o.order_id, o.total, c.country, c.zip FROM Orders  lateral table
> (f1(order_id)) as T(...);
>
> As you can see, the lateral join version is more simple and intuitive to
> users. Users do not have to wrap a
> LookupTableSource for the purpose of using async udtf.
>
> In the end, We can also see the user defined async table function is an
> enhancement of the current lateral table join
> which only supports sync lateral join now.
>
> Best,
> Aitozi.
>
>
> Jing Ge  于2023年6月2日周五 19:37写道:
>
>> Hi Aitozi,
>>
>> Thanks for the update. Just out of curiosity, what is the difference
>> between the RPC call or query you mentioned and the lookup in a very
>> general way? Since Lateral join is used in the FLIP. Is there any special
>> thought for that? Sorry for asking so many questions. The FLIP contains
>> limited information to understand the motivation.
>>
>> Best regards,
>> Jing
>>
>> On Fri, Jun 2, 2023 at 3:48 AM Aitozi  wrote:
>>
>> > Hi Jing,
>> > I have updated the proposed changes to the FLIP. IMO, lookup has its
>> > clear
>> > async call requirement is due to its IO heavy operator. In our usage,
>> sql
>> > users have
>> > logic to do some RPC call or query the third-party service which is
>> also IO
>> > intensive.
>> > In these case, we'd like to leverage the async function to improve the
>> > throughput.
>> >
>> > Thanks,
>> > Aitozi.
>> >
>> > Jing Ge  于2023年6月1日周四 22:55写道:
>> >
>> > > Hi Aitozi,
>> > >
>> > > Sorry for the late reply. Would you like to update the proposed
>> changes
>> > > with more details into the FLIP too?
>> > > I got your point. It looks like a rational idea. However, since lookup
>> > has
>> > > its clear async call requirement, are there any real use cases that
>> > > need this change? This will help us understand the motivation. After
>> all,
>> > > lateral join and temporal lookup join[1] are quite different.
>> > >
>> > > Best regards,
>> > > Jing
>> > >
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://github.com/apache/flink/blob/d90a72da2fd601ca4e2a46700e91ec5b348de2ad/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AsyncTableFunction.java#L54
>> > >
>> > > On Wed, May 31, 2023 at 8:53 AM Aitozi  wrote:
>> > >
>> > > > Hi Jing,
>> > > > What do you think about it? Can we move forward this feature?
>> > > >
>> > > > Thanks,
>> > > > Aitozi.
>> > > >
>> > > > Aitozi  于2023年5月29日周一 09:56写道:
>> > > >
>> > > > > Hi Jing,
>> > > > > > "Do you mean to support the AyncTableFunction beyond the
>> > > > > LookupTableSource?"
>> > > > > Yes, I mean to support the AyncTableFunction beyond the
>> > > > LookupTableSource.
>> > > > >
>> > > > > The "AsyncTableFunction" is the function with ability to be
>> executed
>> > > > async
>> > > > > (with AsyncWaitOperator).
>> > > > > The async lookup join is a one of usage of this. So, we don't
>> have to
>> > > > bind
>> > > > > the AyncTableFunction with LookupTableSource.
>> > > > > If User-defined AsyncTableFunction is supported, user can directly
>> > use
>> > > > > lateral table syntax to perform async operation.
>> > > > >
>> > > > > > "It would be better if you could elaborate the proposed changes
>> wrt
>> > > the
>> > > > > CorrelatedCodeGenerator with more details"
>> > > > >
>> > > > > In the proposal, we use lateral table syntax to 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-04 Thread Aitozi
Hi Jing,

> what is the difference between the RPC call or query you mentioned
and the lookup in a very
general way

I think the RPC call or query service is quite similar to the lookup join.
But lookup join should work
with `LookupTableSource`.

Let's see how we can perform an async RPC call with lookup join:

(1) Implement an AsyncTableFunction with RPC call logic.
(2) Implement a `LookupTableSource` connector run with the async udtf
defined in (1).
(3) Then define a DDL of this look up table in SQL

CREATE TEMPORARY TABLE Customers (
  id INT,
  name STRING,
  country STRING,
  zip STRING
) WITH (
  'connector' = 'custom'
);

(4) Run with the query as below:

SELECT o.order_id, o.total, c.country, c.zip
FROM Orders AS o
  JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;

This example is from doc
.You
can image the look up process as an async RPC call process.

Let's see how we can perform an async RPC call with lateral join:

(1) Implement an AsyncTableFunction with RPC call logic.
(2) Run query with

Create function f1 as '...' ;

SELECT o.order_id, o.total, c.country, c.zip FROM Orders  lateral table
(f1(order_id)) as T(...);

As you can see, the lateral join version is more simple and intuitive to
users. Users do not have to wrap a
LookupTableSource for the purpose of using async udtf.

In the end, We can also see the user defined async table function is an
enhancement of the current lateral table join
which only supports sync lateral join now.

Best,
Aitozi.


Jing Ge  于2023年6月2日周五 19:37写道:

> Hi Aitozi,
>
> Thanks for the update. Just out of curiosity, what is the difference
> between the RPC call or query you mentioned and the lookup in a very
> general way? Since Lateral join is used in the FLIP. Is there any special
> thought for that? Sorry for asking so many questions. The FLIP contains
> limited information to understand the motivation.
>
> Best regards,
> Jing
>
> On Fri, Jun 2, 2023 at 3:48 AM Aitozi  wrote:
>
> > Hi Jing,
> > I have updated the proposed changes to the FLIP. IMO, lookup has its
> > clear
> > async call requirement is due to its IO heavy operator. In our usage, sql
> > users have
> > logic to do some RPC call or query the third-party service which is also
> IO
> > intensive.
> > In these case, we'd like to leverage the async function to improve the
> > throughput.
> >
> > Thanks,
> > Aitozi.
> >
> > Jing Ge  于2023年6月1日周四 22:55写道:
> >
> > > Hi Aitozi,
> > >
> > > Sorry for the late reply. Would you like to update the proposed changes
> > > with more details into the FLIP too?
> > > I got your point. It looks like a rational idea. However, since lookup
> > has
> > > its clear async call requirement, are there any real use cases that
> > > need this change? This will help us understand the motivation. After
> all,
> > > lateral join and temporal lookup join[1] are quite different.
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/flink/blob/d90a72da2fd601ca4e2a46700e91ec5b348de2ad/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AsyncTableFunction.java#L54
> > >
> > > On Wed, May 31, 2023 at 8:53 AM Aitozi  wrote:
> > >
> > > > Hi Jing,
> > > > What do you think about it? Can we move forward this feature?
> > > >
> > > > Thanks,
> > > > Aitozi.
> > > >
> > > > Aitozi  于2023年5月29日周一 09:56写道:
> > > >
> > > > > Hi Jing,
> > > > > > "Do you mean to support the AyncTableFunction beyond the
> > > > > LookupTableSource?"
> > > > > Yes, I mean to support the AyncTableFunction beyond the
> > > > LookupTableSource.
> > > > >
> > > > > The "AsyncTableFunction" is the function with ability to be
> executed
> > > > async
> > > > > (with AsyncWaitOperator).
> > > > > The async lookup join is a one of usage of this. So, we don't have
> to
> > > > bind
> > > > > the AyncTableFunction with LookupTableSource.
> > > > > If User-defined AsyncTableFunction is supported, user can directly
> > use
> > > > > lateral table syntax to perform async operation.
> > > > >
> > > > > > "It would be better if you could elaborate the proposed changes
> wrt
> > > the
> > > > > CorrelatedCodeGenerator with more details"
> > > > >
> > > > > In the proposal, we use lateral table syntax to support the async
> > table
> > > > > function. So the planner will also treat this statement to a
> > > > > CommonExecCorrelate node. So the runtime code should be generated
> in
> > > > > CorrelatedCodeGenerator.
> > > > > In CorrelatedCodeGenerator, we will know the TableFunction's Kind
> of
> > > > > `FunctionKind.Table` or `FunctionKind.ASYNC_TABLE`
> > > > > For  `FunctionKind.ASYNC_TABLE` we can generate a AsyncWaitOperator
> > to
> > > > > execute the async table function.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Aitozi.
> > > > >
> > > > >
> > > > > Jing Ge  于2023年5月29日周一 03:22写道:

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-02 Thread Jing Ge
Hi Aitozi,

Thanks for the update. Just out of curiosity, what is the difference
between the RPC call or query you mentioned and the lookup in a very
general way? Since Lateral join is used in the FLIP. Is there any special
thought for that? Sorry for asking so many questions. The FLIP contains
limited information to understand the motivation.

Best regards,
Jing

On Fri, Jun 2, 2023 at 3:48 AM Aitozi  wrote:

> Hi Jing,
> I have updated the proposed changes to the FLIP. IMO, lookup has its
> clear
> async call requirement is due to its IO heavy operator. In our usage, sql
> users have
> logic to do some RPC call or query the third-party service which is also IO
> intensive.
> In these case, we'd like to leverage the async function to improve the
> throughput.
>
> Thanks,
> Aitozi.
>
> Jing Ge  于2023年6月1日周四 22:55写道:
>
> > Hi Aitozi,
> >
> > Sorry for the late reply. Would you like to update the proposed changes
> > with more details into the FLIP too?
> > I got your point. It looks like a rational idea. However, since lookup
> has
> > its clear async call requirement, are there any real use cases that
> > need this change? This will help us understand the motivation. After all,
> > lateral join and temporal lookup join[1] are quite different.
> >
> > Best regards,
> > Jing
> >
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/d90a72da2fd601ca4e2a46700e91ec5b348de2ad/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AsyncTableFunction.java#L54
> >
> > On Wed, May 31, 2023 at 8:53 AM Aitozi  wrote:
> >
> > > Hi Jing,
> > > What do you think about it? Can we move forward this feature?
> > >
> > > Thanks,
> > > Aitozi.
> > >
> > > Aitozi  于2023年5月29日周一 09:56写道:
> > >
> > > > Hi Jing,
> > > > > "Do you mean to support the AyncTableFunction beyond the
> > > > LookupTableSource?"
> > > > Yes, I mean to support the AyncTableFunction beyond the
> > > LookupTableSource.
> > > >
> > > > The "AsyncTableFunction" is the function with ability to be executed
> > > async
> > > > (with AsyncWaitOperator).
> > > > The async lookup join is a one of usage of this. So, we don't have to
> > > bind
> > > > the AyncTableFunction with LookupTableSource.
> > > > If User-defined AsyncTableFunction is supported, user can directly
> use
> > > > lateral table syntax to perform async operation.
> > > >
> > > > > "It would be better if you could elaborate the proposed changes wrt
> > the
> > > > CorrelatedCodeGenerator with more details"
> > > >
> > > > In the proposal, we use lateral table syntax to support the async
> table
> > > > function. So the planner will also treat this statement to a
> > > > CommonExecCorrelate node. So the runtime code should be generated in
> > > > CorrelatedCodeGenerator.
> > > > In CorrelatedCodeGenerator, we will know the TableFunction's Kind of
> > > > `FunctionKind.Table` or `FunctionKind.ASYNC_TABLE`
> > > > For  `FunctionKind.ASYNC_TABLE` we can generate a AsyncWaitOperator
> to
> > > > execute the async table function.
> > > >
> > > >
> > > > Thanks,
> > > > Aitozi.
> > > >
> > > >
> > > > Jing Ge  于2023年5月29日周一 03:22写道:
> > > >
> > > >> Hi Aitozi,
> > > >>
> > > >> Thanks for the clarification. The naming "Lookup" might suggest
> using
> > it
> > > >> for table look up. But conceptually what the eval() method will do
> is
> > to
> > > >> get a collection of results(Row, RowData) from the given keys. How
> it
> > > will
> > > >> be done depends on the implementation, i.e. you can implement your
> own
> > > >> Source[1][2]. The example in the FLIP should be able to be handled
> in
> > > this
> > > >> way.
> > > >>
> > > >> Do you mean to support the AyncTableFunction beyond the
> > > LookupTableSource?
> > > >> It would be better if you could elaborate the proposed changes wrt
> the
> > > >> CorrelatedCodeGenerator with more details. Thanks!
> > > >>
> > > >> Best regards,
> > > >> Jing
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L64
> > > >> [2]
> > > >>
> > > >>
> > >
> >
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/AsyncTableFunctionProvider.java#L49
> > > >>
> > > >> On Sat, May 27, 2023 at 9:48 AM Aitozi 
> wrote:
> > > >>
> > > >> > Hi Jing,
> > > >> > Thanks for your response. As stated in the FLIP, the purpose
> of
> > > this
> > > >> > FLIP is meant to support
> > > >> > user-defined async table function. As described in flink document
> > [1]
> > > >> >
> > > >> > Async table functions are special functions for table sources that
> > > >> perform
> > > >> > > a lookup.
> > > >> > >
> > > >> >
> > > >> > So end user can not directly define and use async table function
> > now.
> > > An
> > > >> > user case is reported in [2]
> > > 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-01 Thread Aitozi
Hi Jing,
I have updated the proposed changes to the FLIP. IMO, lookup has its
clear
async call requirement is due to its IO heavy operator. In our usage, sql
users have
logic to do some RPC call or query the third-party service which is also IO
intensive.
In these case, we'd like to leverage the async function to improve the
throughput.

Thanks,
Aitozi.

Jing Ge  于2023年6月1日周四 22:55写道:

> Hi Aitozi,
>
> Sorry for the late reply. Would you like to update the proposed changes
> with more details into the FLIP too?
> I got your point. It looks like a rational idea. However, since lookup has
> its clear async call requirement, are there any real use cases that
> need this change? This will help us understand the motivation. After all,
> lateral join and temporal lookup join[1] are quite different.
>
> Best regards,
> Jing
>
>
> [1]
>
> https://github.com/apache/flink/blob/d90a72da2fd601ca4e2a46700e91ec5b348de2ad/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AsyncTableFunction.java#L54
>
> On Wed, May 31, 2023 at 8:53 AM Aitozi  wrote:
>
> > Hi Jing,
> > What do you think about it? Can we move forward this feature?
> >
> > Thanks,
> > Aitozi.
> >
> > Aitozi  于2023年5月29日周一 09:56写道:
> >
> > > Hi Jing,
> > > > "Do you mean to support the AyncTableFunction beyond the
> > > LookupTableSource?"
> > > Yes, I mean to support the AyncTableFunction beyond the
> > LookupTableSource.
> > >
> > > The "AsyncTableFunction" is the function with ability to be executed
> > async
> > > (with AsyncWaitOperator).
> > > The async lookup join is a one of usage of this. So, we don't have to
> > bind
> > > the AyncTableFunction with LookupTableSource.
> > > If User-defined AsyncTableFunction is supported, user can directly use
> > > lateral table syntax to perform async operation.
> > >
> > > > "It would be better if you could elaborate the proposed changes wrt
> the
> > > CorrelatedCodeGenerator with more details"
> > >
> > > In the proposal, we use lateral table syntax to support the async table
> > > function. So the planner will also treat this statement to a
> > > CommonExecCorrelate node. So the runtime code should be generated in
> > > CorrelatedCodeGenerator.
> > > In CorrelatedCodeGenerator, we will know the TableFunction's Kind of
> > > `FunctionKind.Table` or `FunctionKind.ASYNC_TABLE`
> > > For  `FunctionKind.ASYNC_TABLE` we can generate a AsyncWaitOperator to
> > > execute the async table function.
> > >
> > >
> > > Thanks,
> > > Aitozi.
> > >
> > >
> > > Jing Ge  于2023年5月29日周一 03:22写道:
> > >
> > >> Hi Aitozi,
> > >>
> > >> Thanks for the clarification. The naming "Lookup" might suggest using
> it
> > >> for table look up. But conceptually what the eval() method will do is
> to
> > >> get a collection of results(Row, RowData) from the given keys. How it
> > will
> > >> be done depends on the implementation, i.e. you can implement your own
> > >> Source[1][2]. The example in the FLIP should be able to be handled in
> > this
> > >> way.
> > >>
> > >> Do you mean to support the AyncTableFunction beyond the
> > LookupTableSource?
> > >> It would be better if you could elaborate the proposed changes wrt the
> > >> CorrelatedCodeGenerator with more details. Thanks!
> > >>
> > >> Best regards,
> > >> Jing
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L64
> > >> [2]
> > >>
> > >>
> >
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/AsyncTableFunctionProvider.java#L49
> > >>
> > >> On Sat, May 27, 2023 at 9:48 AM Aitozi  wrote:
> > >>
> > >> > Hi Jing,
> > >> > Thanks for your response. As stated in the FLIP, the purpose of
> > this
> > >> > FLIP is meant to support
> > >> > user-defined async table function. As described in flink document
> [1]
> > >> >
> > >> > Async table functions are special functions for table sources that
> > >> perform
> > >> > > a lookup.
> > >> > >
> > >> >
> > >> > So end user can not directly define and use async table function
> now.
> > An
> > >> > user case is reported in [2]
> > >> >
> > >> > So, in conclusion, no new interface is introduced, but we extend the
> > >> > ability to support user-defined async table function.
> > >> >
> > >> > [1]:
> > >> >
> > >> >
> > >>
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/functions/udfs/
> > >> > [2]:
> https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
> > >> >
> > >> > Thanks.
> > >> > Aitozi.
> > >> >
> > >> >
> > >> > Jing Ge  于2023年5月27日周六 06:40写道:
> > >> >
> > >> > > Hi Aitozi,
> > >> > >
> > >> > > Thanks for your proposal. I am not quite sure if I understood your
> > >> > thoughts
> > >> > > correctly. You described a special case implementation of the
> > >> > > 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-01 Thread Jing Ge
Hi Aitozi,

Sorry for the late reply. Would you like to update the proposed changes
with more details into the FLIP too?
I got your point. It looks like a rational idea. However, since lookup has
its clear async call requirement, are there any real use cases that
need this change? This will help us understand the motivation. After all,
lateral join and temporal lookup join[1] are quite different.

Best regards,
Jing


[1]
https://github.com/apache/flink/blob/d90a72da2fd601ca4e2a46700e91ec5b348de2ad/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AsyncTableFunction.java#L54

On Wed, May 31, 2023 at 8:53 AM Aitozi  wrote:

> Hi Jing,
> What do you think about it? Can we move forward this feature?
>
> Thanks,
> Aitozi.
>
> Aitozi  于2023年5月29日周一 09:56写道:
>
> > Hi Jing,
> > > "Do you mean to support the AyncTableFunction beyond the
> > LookupTableSource?"
> > Yes, I mean to support the AyncTableFunction beyond the
> LookupTableSource.
> >
> > The "AsyncTableFunction" is the function with ability to be executed
> async
> > (with AsyncWaitOperator).
> > The async lookup join is a one of usage of this. So, we don't have to
> bind
> > the AyncTableFunction with LookupTableSource.
> > If User-defined AsyncTableFunction is supported, user can directly use
> > lateral table syntax to perform async operation.
> >
> > > "It would be better if you could elaborate the proposed changes wrt the
> > CorrelatedCodeGenerator with more details"
> >
> > In the proposal, we use lateral table syntax to support the async table
> > function. So the planner will also treat this statement to a
> > CommonExecCorrelate node. So the runtime code should be generated in
> > CorrelatedCodeGenerator.
> > In CorrelatedCodeGenerator, we will know the TableFunction's Kind of
> > `FunctionKind.Table` or `FunctionKind.ASYNC_TABLE`
> > For  `FunctionKind.ASYNC_TABLE` we can generate a AsyncWaitOperator to
> > execute the async table function.
> >
> >
> > Thanks,
> > Aitozi.
> >
> >
> > Jing Ge  于2023年5月29日周一 03:22写道:
> >
> >> Hi Aitozi,
> >>
> >> Thanks for the clarification. The naming "Lookup" might suggest using it
> >> for table look up. But conceptually what the eval() method will do is to
> >> get a collection of results(Row, RowData) from the given keys. How it
> will
> >> be done depends on the implementation, i.e. you can implement your own
> >> Source[1][2]. The example in the FLIP should be able to be handled in
> this
> >> way.
> >>
> >> Do you mean to support the AyncTableFunction beyond the
> LookupTableSource?
> >> It would be better if you could elaborate the proposed changes wrt the
> >> CorrelatedCodeGenerator with more details. Thanks!
> >>
> >> Best regards,
> >> Jing
> >>
> >> [1]
> >>
> >>
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L64
> >> [2]
> >>
> >>
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/AsyncTableFunctionProvider.java#L49
> >>
> >> On Sat, May 27, 2023 at 9:48 AM Aitozi  wrote:
> >>
> >> > Hi Jing,
> >> > Thanks for your response. As stated in the FLIP, the purpose of
> this
> >> > FLIP is meant to support
> >> > user-defined async table function. As described in flink document [1]
> >> >
> >> > Async table functions are special functions for table sources that
> >> perform
> >> > > a lookup.
> >> > >
> >> >
> >> > So end user can not directly define and use async table function now.
> An
> >> > user case is reported in [2]
> >> >
> >> > So, in conclusion, no new interface is introduced, but we extend the
> >> > ability to support user-defined async table function.
> >> >
> >> > [1]:
> >> >
> >> >
> >>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/functions/udfs/
> >> > [2]: https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
> >> >
> >> > Thanks.
> >> > Aitozi.
> >> >
> >> >
> >> > Jing Ge  于2023年5月27日周六 06:40写道:
> >> >
> >> > > Hi Aitozi,
> >> > >
> >> > > Thanks for your proposal. I am not quite sure if I understood your
> >> > thoughts
> >> > > correctly. You described a special case implementation of the
> >> > > AsyncTableFunction with on public API changes. Would you please
> >> elaborate
> >> > > your purpose of writing a FLIP according to the FLIP
> documentation[1]?
> >> > > Thanks!
> >> > >
> >> > > [1]
> >> > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >> > >
> >> > > Best regards,
> >> > > Jing
> >> > >
> >> > > On Wed, May 24, 2023 at 1:07 PM Aitozi 
> wrote:
> >> > >
> >> > > > May I ask for some feedback  :D
> >> > > >
> >> > > > Thanks,
> >> > > > Aitozi
> >> > > >
> >> > > > Aitozi  于2023年5月23日周二 19:14写道:
> >> > > > >
> >> > > > > Just catch an user case report from Giannis Polyzos for this
> 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-05-31 Thread Aitozi
Hi Jing,
What do you think about it? Can we move forward this feature?

Thanks,
Aitozi.

Aitozi  于2023年5月29日周一 09:56写道:

> Hi Jing,
> > "Do you mean to support the AyncTableFunction beyond the
> LookupTableSource?"
> Yes, I mean to support the AyncTableFunction beyond the LookupTableSource.
>
> The "AsyncTableFunction" is the function with ability to be executed async
> (with AsyncWaitOperator).
> The async lookup join is a one of usage of this. So, we don't have to bind
> the AyncTableFunction with LookupTableSource.
> If User-defined AsyncTableFunction is supported, user can directly use
> lateral table syntax to perform async operation.
>
> > "It would be better if you could elaborate the proposed changes wrt the
> CorrelatedCodeGenerator with more details"
>
> In the proposal, we use lateral table syntax to support the async table
> function. So the planner will also treat this statement to a
> CommonExecCorrelate node. So the runtime code should be generated in
> CorrelatedCodeGenerator.
> In CorrelatedCodeGenerator, we will know the TableFunction's Kind of
> `FunctionKind.Table` or `FunctionKind.ASYNC_TABLE`
> For  `FunctionKind.ASYNC_TABLE` we can generate a AsyncWaitOperator to
> execute the async table function.
>
>
> Thanks,
> Aitozi.
>
>
> Jing Ge  于2023年5月29日周一 03:22写道:
>
>> Hi Aitozi,
>>
>> Thanks for the clarification. The naming "Lookup" might suggest using it
>> for table look up. But conceptually what the eval() method will do is to
>> get a collection of results(Row, RowData) from the given keys. How it will
>> be done depends on the implementation, i.e. you can implement your own
>> Source[1][2]. The example in the FLIP should be able to be handled in this
>> way.
>>
>> Do you mean to support the AyncTableFunction beyond the LookupTableSource?
>> It would be better if you could elaborate the proposed changes wrt the
>> CorrelatedCodeGenerator with more details. Thanks!
>>
>> Best regards,
>> Jing
>>
>> [1]
>>
>> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L64
>> [2]
>>
>> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/AsyncTableFunctionProvider.java#L49
>>
>> On Sat, May 27, 2023 at 9:48 AM Aitozi  wrote:
>>
>> > Hi Jing,
>> > Thanks for your response. As stated in the FLIP, the purpose of this
>> > FLIP is meant to support
>> > user-defined async table function. As described in flink document [1]
>> >
>> > Async table functions are special functions for table sources that
>> perform
>> > > a lookup.
>> > >
>> >
>> > So end user can not directly define and use async table function now. An
>> > user case is reported in [2]
>> >
>> > So, in conclusion, no new interface is introduced, but we extend the
>> > ability to support user-defined async table function.
>> >
>> > [1]:
>> >
>> >
>> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/functions/udfs/
>> > [2]: https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
>> >
>> > Thanks.
>> > Aitozi.
>> >
>> >
>> > Jing Ge  于2023年5月27日周六 06:40写道:
>> >
>> > > Hi Aitozi,
>> > >
>> > > Thanks for your proposal. I am not quite sure if I understood your
>> > thoughts
>> > > correctly. You described a special case implementation of the
>> > > AsyncTableFunction with on public API changes. Would you please
>> elaborate
>> > > your purpose of writing a FLIP according to the FLIP documentation[1]?
>> > > Thanks!
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>> > >
>> > > Best regards,
>> > > Jing
>> > >
>> > > On Wed, May 24, 2023 at 1:07 PM Aitozi  wrote:
>> > >
>> > > > May I ask for some feedback  :D
>> > > >
>> > > > Thanks,
>> > > > Aitozi
>> > > >
>> > > > Aitozi  于2023年5月23日周二 19:14写道:
>> > > > >
>> > > > > Just catch an user case report from Giannis Polyzos for this
>> usage:
>> > > > >
>> > > > > https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
>> > > > >
>> > > > > Aitozi  于2023年5月23日周二 17:45写道:
>> > > > > >
>> > > > > > Hi guys,
>> > > > > > I want to bring up a discussion about adding support of User
>> > > > > > Defined AsyncTableFunction in Flink.
>> > > > > > Currently, async table function are special functions for table
>> > > source
>> > > > > > to perform
>> > > > > > async lookup. However, it's worth to support the user defined
>> async
>> > > > > > table function.
>> > > > > > Because, in this way, the end SQL user can leverage it to
>> perform
>> > the
>> > > > > > async operation
>> > > > > > which is useful to maximum the system throughput especially for
>> IO
>> > > > > > bottleneck case.
>> > > > > >
>> > > > > > You can find some more detail in [1].
>> > > > > >
>> > > > > > Looking forward to feedback
>> > > > > >
>> > > > > >
>> > > > 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-05-28 Thread Aitozi
Hi Jing,
> "Do you mean to support the AyncTableFunction beyond the
LookupTableSource?"
Yes, I mean to support the AyncTableFunction beyond the LookupTableSource.

The "AsyncTableFunction" is the function with ability to be executed async
(with AsyncWaitOperator).
The async lookup join is a one of usage of this. So, we don't have to bind
the AyncTableFunction with LookupTableSource.
If User-defined AsyncTableFunction is supported, user can directly use
lateral table syntax to perform async operation.

> "It would be better if you could elaborate the proposed changes wrt the
CorrelatedCodeGenerator with more details"

In the proposal, we use lateral table syntax to support the async table
function. So the planner will also treat this statement to a
CommonExecCorrelate node. So the runtime code should be generated in
CorrelatedCodeGenerator.
In CorrelatedCodeGenerator, we will know the TableFunction's Kind of
`FunctionKind.Table` or `FunctionKind.ASYNC_TABLE`
For  `FunctionKind.ASYNC_TABLE` we can generate a AsyncWaitOperator to
execute the async table function.


Thanks,
Aitozi.


Jing Ge  于2023年5月29日周一 03:22写道:

> Hi Aitozi,
>
> Thanks for the clarification. The naming "Lookup" might suggest using it
> for table look up. But conceptually what the eval() method will do is to
> get a collection of results(Row, RowData) from the given keys. How it will
> be done depends on the implementation, i.e. you can implement your own
> Source[1][2]. The example in the FLIP should be able to be handled in this
> way.
>
> Do you mean to support the AyncTableFunction beyond the LookupTableSource?
> It would be better if you could elaborate the proposed changes wrt the
> CorrelatedCodeGenerator with more details. Thanks!
>
> Best regards,
> Jing
>
> [1]
>
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L64
> [2]
>
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/AsyncTableFunctionProvider.java#L49
>
> On Sat, May 27, 2023 at 9:48 AM Aitozi  wrote:
>
> > Hi Jing,
> > Thanks for your response. As stated in the FLIP, the purpose of this
> > FLIP is meant to support
> > user-defined async table function. As described in flink document [1]
> >
> > Async table functions are special functions for table sources that
> perform
> > > a lookup.
> > >
> >
> > So end user can not directly define and use async table function now. An
> > user case is reported in [2]
> >
> > So, in conclusion, no new interface is introduced, but we extend the
> > ability to support user-defined async table function.
> >
> > [1]:
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/functions/udfs/
> > [2]: https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
> >
> > Thanks.
> > Aitozi.
> >
> >
> > Jing Ge  于2023年5月27日周六 06:40写道:
> >
> > > Hi Aitozi,
> > >
> > > Thanks for your proposal. I am not quite sure if I understood your
> > thoughts
> > > correctly. You described a special case implementation of the
> > > AsyncTableFunction with on public API changes. Would you please
> elaborate
> > > your purpose of writing a FLIP according to the FLIP documentation[1]?
> > > Thanks!
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Wed, May 24, 2023 at 1:07 PM Aitozi  wrote:
> > >
> > > > May I ask for some feedback  :D
> > > >
> > > > Thanks,
> > > > Aitozi
> > > >
> > > > Aitozi  于2023年5月23日周二 19:14写道:
> > > > >
> > > > > Just catch an user case report from Giannis Polyzos for this usage:
> > > > >
> > > > > https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
> > > > >
> > > > > Aitozi  于2023年5月23日周二 17:45写道:
> > > > > >
> > > > > > Hi guys,
> > > > > > I want to bring up a discussion about adding support of User
> > > > > > Defined AsyncTableFunction in Flink.
> > > > > > Currently, async table function are special functions for table
> > > source
> > > > > > to perform
> > > > > > async lookup. However, it's worth to support the user defined
> async
> > > > > > table function.
> > > > > > Because, in this way, the end SQL user can leverage it to perform
> > the
> > > > > > async operation
> > > > > > which is useful to maximum the system throughput especially for
> IO
> > > > > > bottleneck case.
> > > > > >
> > > > > > You can find some more detail in [1].
> > > > > >
> > > > > > Looking forward to feedback
> > > > > >
> > > > > >
> > > > > > [1]:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/%5BFLIP-313%5D+Add+support+of+User+Defined+AsyncTableFunction
> > > > > >
> > > > > > Thanks,
> > > > > > Aitozi.
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-05-28 Thread Jing Ge
Hi Aitozi,

Thanks for the clarification. The naming "Lookup" might suggest using it
for table look up. But conceptually what the eval() method will do is to
get a collection of results(Row, RowData) from the given keys. How it will
be done depends on the implementation, i.e. you can implement your own
Source[1][2]. The example in the FLIP should be able to be handled in this
way.

Do you mean to support the AyncTableFunction beyond the LookupTableSource?
It would be better if you could elaborate the proposed changes wrt the
CorrelatedCodeGenerator with more details. Thanks!

Best regards,
Jing

[1]
https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L64
[2]
https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/AsyncTableFunctionProvider.java#L49

On Sat, May 27, 2023 at 9:48 AM Aitozi  wrote:

> Hi Jing,
> Thanks for your response. As stated in the FLIP, the purpose of this
> FLIP is meant to support
> user-defined async table function. As described in flink document [1]
>
> Async table functions are special functions for table sources that perform
> > a lookup.
> >
>
> So end user can not directly define and use async table function now. An
> user case is reported in [2]
>
> So, in conclusion, no new interface is introduced, but we extend the
> ability to support user-defined async table function.
>
> [1]:
>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/functions/udfs/
> [2]: https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
>
> Thanks.
> Aitozi.
>
>
> Jing Ge  于2023年5月27日周六 06:40写道:
>
> > Hi Aitozi,
> >
> > Thanks for your proposal. I am not quite sure if I understood your
> thoughts
> > correctly. You described a special case implementation of the
> > AsyncTableFunction with on public API changes. Would you please elaborate
> > your purpose of writing a FLIP according to the FLIP documentation[1]?
> > Thanks!
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >
> > Best regards,
> > Jing
> >
> > On Wed, May 24, 2023 at 1:07 PM Aitozi  wrote:
> >
> > > May I ask for some feedback  :D
> > >
> > > Thanks,
> > > Aitozi
> > >
> > > Aitozi  于2023年5月23日周二 19:14写道:
> > > >
> > > > Just catch an user case report from Giannis Polyzos for this usage:
> > > >
> > > > https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
> > > >
> > > > Aitozi  于2023年5月23日周二 17:45写道:
> > > > >
> > > > > Hi guys,
> > > > > I want to bring up a discussion about adding support of User
> > > > > Defined AsyncTableFunction in Flink.
> > > > > Currently, async table function are special functions for table
> > source
> > > > > to perform
> > > > > async lookup. However, it's worth to support the user defined async
> > > > > table function.
> > > > > Because, in this way, the end SQL user can leverage it to perform
> the
> > > > > async operation
> > > > > which is useful to maximum the system throughput especially for IO
> > > > > bottleneck case.
> > > > >
> > > > > You can find some more detail in [1].
> > > > >
> > > > > Looking forward to feedback
> > > > >
> > > > >
> > > > > [1]:
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/%5BFLIP-313%5D+Add+support+of+User+Defined+AsyncTableFunction
> > > > >
> > > > > Thanks,
> > > > > Aitozi.
> > >
> >
>


Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-05-27 Thread Aitozi
Hi Jing,
Thanks for your response. As stated in the FLIP, the purpose of this
FLIP is meant to support
user-defined async table function. As described in flink document [1]

Async table functions are special functions for table sources that perform
> a lookup.
>

So end user can not directly define and use async table function now. An
user case is reported in [2]

So, in conclusion, no new interface is introduced, but we extend the
ability to support user-defined async table function.

[1]:
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/functions/udfs/
[2]: https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b

Thanks.
Aitozi.


Jing Ge  于2023年5月27日周六 06:40写道:

> Hi Aitozi,
>
> Thanks for your proposal. I am not quite sure if I understood your thoughts
> correctly. You described a special case implementation of the
> AsyncTableFunction with on public API changes. Would you please elaborate
> your purpose of writing a FLIP according to the FLIP documentation[1]?
> Thanks!
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>
> Best regards,
> Jing
>
> On Wed, May 24, 2023 at 1:07 PM Aitozi  wrote:
>
> > May I ask for some feedback  :D
> >
> > Thanks,
> > Aitozi
> >
> > Aitozi  于2023年5月23日周二 19:14写道:
> > >
> > > Just catch an user case report from Giannis Polyzos for this usage:
> > >
> > > https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
> > >
> > > Aitozi  于2023年5月23日周二 17:45写道:
> > > >
> > > > Hi guys,
> > > > I want to bring up a discussion about adding support of User
> > > > Defined AsyncTableFunction in Flink.
> > > > Currently, async table function are special functions for table
> source
> > > > to perform
> > > > async lookup. However, it's worth to support the user defined async
> > > > table function.
> > > > Because, in this way, the end SQL user can leverage it to perform the
> > > > async operation
> > > > which is useful to maximum the system throughput especially for IO
> > > > bottleneck case.
> > > >
> > > > You can find some more detail in [1].
> > > >
> > > > Looking forward to feedback
> > > >
> > > >
> > > > [1]:
> >
> https://cwiki.apache.org/confluence/display/FLINK/%5BFLIP-313%5D+Add+support+of+User+Defined+AsyncTableFunction
> > > >
> > > > Thanks,
> > > > Aitozi.
> >
>


Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-05-26 Thread Jing Ge
Hi Aitozi,

Thanks for your proposal. I am not quite sure if I understood your thoughts
correctly. You described a special case implementation of the
AsyncTableFunction with on public API changes. Would you please elaborate
your purpose of writing a FLIP according to the FLIP documentation[1]?
Thanks!

[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Best regards,
Jing

On Wed, May 24, 2023 at 1:07 PM Aitozi  wrote:

> May I ask for some feedback  :D
>
> Thanks,
> Aitozi
>
> Aitozi  于2023年5月23日周二 19:14写道:
> >
> > Just catch an user case report from Giannis Polyzos for this usage:
> >
> > https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
> >
> > Aitozi  于2023年5月23日周二 17:45写道:
> > >
> > > Hi guys,
> > > I want to bring up a discussion about adding support of User
> > > Defined AsyncTableFunction in Flink.
> > > Currently, async table function are special functions for table source
> > > to perform
> > > async lookup. However, it's worth to support the user defined async
> > > table function.
> > > Because, in this way, the end SQL user can leverage it to perform the
> > > async operation
> > > which is useful to maximum the system throughput especially for IO
> > > bottleneck case.
> > >
> > > You can find some more detail in [1].
> > >
> > > Looking forward to feedback
> > >
> > >
> > > [1]:
> https://cwiki.apache.org/confluence/display/FLINK/%5BFLIP-313%5D+Add+support+of+User+Defined+AsyncTableFunction
> > >
> > > Thanks,
> > > Aitozi.
>


Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-05-24 Thread Aitozi
May I ask for some feedback  :D

Thanks,
Aitozi

Aitozi  于2023年5月23日周二 19:14写道:
>
> Just catch an user case report from Giannis Polyzos for this usage:
>
> https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
>
> Aitozi  于2023年5月23日周二 17:45写道:
> >
> > Hi guys,
> > I want to bring up a discussion about adding support of User
> > Defined AsyncTableFunction in Flink.
> > Currently, async table function are special functions for table source
> > to perform
> > async lookup. However, it's worth to support the user defined async
> > table function.
> > Because, in this way, the end SQL user can leverage it to perform the
> > async operation
> > which is useful to maximum the system throughput especially for IO
> > bottleneck case.
> >
> > You can find some more detail in [1].
> >
> > Looking forward to feedback
> >
> >
> > [1]: 
> > https://cwiki.apache.org/confluence/display/FLINK/%5BFLIP-313%5D+Add+support+of+User+Defined+AsyncTableFunction
> >
> > Thanks,
> > Aitozi.


Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-05-23 Thread Aitozi
Just catch an user case report from Giannis Polyzos for this usage:

https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b

Aitozi  于2023年5月23日周二 17:45写道:
>
> Hi guys,
> I want to bring up a discussion about adding support of User
> Defined AsyncTableFunction in Flink.
> Currently, async table function are special functions for table source
> to perform
> async lookup. However, it's worth to support the user defined async
> table function.
> Because, in this way, the end SQL user can leverage it to perform the
> async operation
> which is useful to maximum the system throughput especially for IO
> bottleneck case.
>
> You can find some more detail in [1].
>
> Looking forward to feedback
>
>
> [1]: 
> https://cwiki.apache.org/confluence/display/FLINK/%5BFLIP-313%5D+Add+support+of+User+Defined+AsyncTableFunction
>
> Thanks,
> Aitozi.