Hi Awake,
    Thanks for the good point. I have updated the FLIP accordingly.

Best,
Aitozi.

宇航 李 <liyuh...@yuewen.com> wrote on Wed, Jul 5, 2023 at 11:29:

> Hi Aitozi,
>
> I think the FLIP should add the following description to make clear how a
> user-defined asynchronous table function differs from AsyncTableFunction:
>
> User-defined asynchronous table functions allow complex parameters (e.g.,
> Row type) to be passed to the function directly, which is important for RPC,
> rather than supplying them through 'join … on ...'.
>
> Thanks,
> Awake.
>
>
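The point about complex parameters can be sketched in plain Java. This is only an illustration under stated assumptions: `RpcRequestRow`, `AsyncRpcFunctionSketch`, and `callRemoteService` are hypothetical names, and no Flink classes are used. The `eval` shape (a future for the result followed by the arguments) is modelled loosely on the async function style discussed in this thread.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a Row-typed argument; not a Flink class.
final class RpcRequestRow {
    final String userId;
    final Map<String, String> attributes;

    RpcRequestRow(String userId, Map<String, String> attributes) {
        this.userId = userId;
        this.attributes = attributes;
    }
}

// Hypothetical function: the result is delivered through the future, not returned.
class AsyncRpcFunctionSketch {

    // The whole structured request is an ordinary argument; no join key is involved.
    void eval(CompletableFuture<Collection<String>> result, RpcRequestRow request) {
        CompletableFuture
                .supplyAsync(() -> callRemoteService(request))
                .whenComplete((rows, error) -> {
                    if (error != null) {
                        result.completeExceptionally(error);
                    } else {
                        result.complete(rows);
                    }
                });
    }

    // Placeholder for a real RPC client call (an assumption for illustration).
    private Collection<String> callRemoteService(RpcRequestRow request) {
        return Collections.singletonList(
                request.userId + " has " + request.attributes.size() + " attribute(s)");
    }
}
```

A caller would create a `CompletableFuture`, pass it to `eval` together with the request object, and read the rows once the future completes.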
> On 2023/06/26 02:31:59 Aitozi wrote:
> > Hi Lincoln,
> >     Thanks for your confirmation. I have updated the FLIP doc with the
> > consensus.
> > If there are no other comments, I'd like to restart the vote process in [1]
> > today.
> >
> > [1] https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx
> >
> > Thanks,
> > Aitozi.
> >
> > Lincoln Lee <li...@gmail.com> wrote on Wed, Jun 21, 2023 at 22:29:
> >
> > > Hi Aitozi,
> > >
> > > Thanks for your updates!
> > >
> > > By the design of hints, hints placed after the SELECT clause belong to the
> > > query hints category, and this new hint is also a kind of join hint[1].
> > > Join table function is one of the join types defined in Flink SQL joins[2],
> > > and all existing join hints[1] omit the 'join' keyword,
> > > so I would prefer 'ASYNC_TABLE_FUNC' (which is effectively short for
> > > 'ASYNC_TABLE_FUNC_JOIN').
> > >
> > > Since a short Chinese holiday is coming, I suggest waiting for other
> > > people's responses before continuing the vote (next Monday?).
> > >
> > > Btw, I discussed PyFlink support with @fudian offline. There should be no
> > > known issues, so you can create a subtask for PyFlink support after the
> > > vote has passed.
> > >
> > > [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#join-hints
> > > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Aitozi <gj...@gmail.com> wrote on Sun, Jun 18, 2023 at 21:18:
> > >
> > > > Hi all,
> > > >     Sorry for the late reply. I had an offline discussion with Lincoln,
> > > > mainly about the naming of the hint option. Thanks, Lincoln, for the
> > > > valuable suggestions.
> > > >
> > > > Let me answer the last email inline.
> > > >
> > > > >For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call
> > > > as an example?
> > > >
> > > > Sure, I will give an example when adding the doc for the async udtf and
> > > > will update the FLIP at the same time.
> > > >
> > > > >For the name of this query hint, 'LATERAL' (include its internal options)
> > > > don't show any relevance to async, but I haven't thought of a suitable name
> > > > at the moment,
> > > >
> > > > After some discussion with Lincoln, we prefer to choose between
> > > > `ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`.
> > > > Besides, in my opinion the keyword `lateral` covers more scenarios than
> > > > the table function join, but in this case we only want to configure
> > > > the async table function, so I lean a bit more toward `ASYNC_TABLE_FUNC`.
> > > > I look forward to your input if you have better suggestions for the
> > > > naming.
> > > >
> > > > For the usage of the hints config option, I have updated the section
> > > > of ConfigOption, you can refer to the FLIP
> > > > for more details.
> > > >
> > > > >Also, the terms 'correlate join' and 'lateral join' are not the same as in
> > > > the current joins page[1], so maybe it would be better if we unified them
> > > > into 'join table function'
> > > >
> > > > Yes, we should unify them into 'join table function'. Updated.
> > > >
> > > > Best,
> > > > Aitozi
> > > >
> > > > Lincoln Lee <li...@gmail.com> wrote on Thu, Jun 15, 2023 at 09:15:
> > > >
> > > > > Hi Aitozi,
> > > > >
> > > > > Thanks for your reply! Giving SQL users more flexibility to get
> > > > > asynchronous processing capabilities via lateral join table function:
> > > > > +1 for this.
> > > > >
> > > > > For `JavaAsyncTableFunc0` in the FLIP, can you use a scenario like an RPC
> > > > > call as an example?
> > > > >
> > > > > For the name of this query hint, 'LATERAL' (including its internal
> > > > > options) doesn't show any relevance to async, but I haven't thought of a
> > > > > suitable name at the moment.
> > > > > Maybe we need to highlight the async keyword directly; we can also see
> > > > > if others have better candidates.
> > > > >
> > > > > The hint option "timeout = '180s'" should be "'timeout' = '180s'";
> > > > > it seems to be a typo in the FLIP. Please also use upper case for all
> > > > > keywords in the SQL examples.
> > > > > Also, the terms 'correlate join' and 'lateral join' are not the same as
> > > > > in the current joins page[1], so maybe it would be better if we unified
> > > > > them into 'join table function'.
> > > > >
> > > > > [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#table-function
> > > > >
> > > > > Best,
> > > > > Lincoln Lee
> > > > >
> > > > >
> > > > > Aitozi <gj...@gmail.com> wrote on Wed, Jun 14, 2023 at 16:11:
> > > > >
> > > > > > Hi Lincoln
> > > > > >
> > > > > >     Many thanks for your valuable questions. I will try to answer them
> > > > > > inline.
> > > > > >
> > > > > > >Does the async udtf bring any additional benefits besides a
> > > > > > lighter implementation?
> > > > > >
> > > > > > IMO, the async udtf is more than a lighter implementation. It can act as
> > > > > > a general way for SQL users to use the async operator. They don't have to
> > > > > > bind the async function to a table (a LookupTable), they are not
> > > > > > forced to join on an equality join condition, and they can use it to do
> > > > > > more than enrich data.
> > > > > >
> > > > > > The async lookup join is more like a subset/specific usage of the async
> > > > > > udtf. It is acceptable for the specific version to have more
> > > > > > opportunities for optimization (like push down). The async table function
> > > > > > should be categorized as a user-defined function.
> > > > > >
> > > > > > >Should users migrate to the lookup source when they encounter similar
> > > > > > requirements or problems, or should we develop an additional set of
> > > > > > similar mechanisms?
> > > > > >
> > > > > > As I clarified above, the lookup join is a specific usage of the async
> > > > > > udtf, so it deserves more refined optimizations like caching / retrying.
> > > > > > But these may not all be suitable for the async udtf. As a function, it
> > > > > > can be deterministic or non-deterministic, so caching is not generally
> > > > > > suitable, and we also do not have a common cache for UDFs now. So I think
> > > > > > optimizations like caching/retry should be handed over to the function
> > > > > > implementor.
> > > > > >
> > > > > > > the newly added query hint need a different name that
> > > > > > can be easier related to the lateral operation as the current join
> > > > > > hints[5] do.
> > > > > >
> > > > > > What about using LATERAL, as below?
> > > > > >
> > > > > > SELECT /*+ LATERAL('output-mode' = 'ordered', 'capacity' = '200', timeout = '180s') */ a, c1, c2
> > > > > > FROM T1
> > > > > > LEFT JOIN lateral TABLE (async_split(b)) AS T(c1, c2) ON true
> > > > > >
> > > > > > >For the async func example, since the target scenario is an external io
> > > > > > operation, it's better to add the `close` method to actively release
> > > > > > resources as a good example for users
> > > > > >
> > > > > > Makes sense to me, I will update the FLIP.
> > > > > >
> > > > > > Best,
> > > > > > Aitozi.
> > > > > >
> > > > > > Lincoln Lee <li...@gmail.com> wrote on Wed, Jun 14, 2023 at 14:24:
> > > > > >
> > > > > > > Hi Aitozi,
> > > > > > >
> > > > > > > Sorry for the late reply here! Supporting async udtf
> > > > > > > (`AsyncTableFunction`) directly in SQL seems like an attractive feature,
> > > > > > > but there are two issues that need to be addressed before we can be
> > > > > > > sure to add it:
> > > > > > > 1. As mentioned in the FLIP[1], the current lookup function can already
> > > > > > > implement the requirements, but it requires implementing an extra
> > > > > > > `LookupTableSource` and explicitly declaring the table schema (which can
> > > > > > > help implementers use the various push-down optimizations supported by
> > > > > > > the planner). Does the async udtf bring any additional benefits besides
> > > > > > > a lighter implementation?
> > > > > > > 2. FLIP-221[2] abstracts a reusable cache and metric infrastructure for
> > > > > > > lookup sources, which are important for improving performance and
> > > > > > > observability in high-overhead external IO scenarios. How do we
> > > > > > > integrate and reuse these capabilities after introducing the async udtf?
> > > > > > > Should users migrate to the lookup source when they encounter similar
> > > > > > > requirements or problems, or should we develop an additional set of
> > > > > > > similar mechanisms? (A similar case: FLIP-234[3] introduced the
> > > > > > > retryable capability for lookup join.)
> > > > > > >
> > > > > > > For the FLIP itself:
> > > > > > > 1. Considering that 'options' is already used for the dynamic table
> > > > > > > options[4] in Flink, the newly added query hint needs a different name
> > > > > > > that can be more easily related to the lateral operation, as the current
> > > > > > > join hints[5] do.
> > > > > > > 2. For the async func example, since the target scenario is an external
> > > > > > > IO operation, it's better to add the `close` method to actively release
> > > > > > > resources as a good example for users. Also, in terms of the determinism
> > > > > > > of a function, it is important to remind users that unless the behavior
> > > > > > > of the function is deterministic, it needs to be explicitly declared as
> > > > > > > non-deterministic.
> > > > > > >
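The two recommendations above (release resources in `close`, and declare non-determinism explicitly) can be sketched as follows. This is a minimal plain-Java illustration, not the FLIP's example: `ClosableAsyncFunctionSketch` and `isReleased` are hypothetical names, the executor merely stands in for an external client, and the `open`/`eval`/`close`/`isDeterministic` method names are assumed to mirror the usual user-defined function lifecycle.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; it does not extend any Flink base class.
class ClosableAsyncFunctionSketch implements AutoCloseable {
    private ExecutorService executor; // stands in for an RPC or database client

    // Acquire the external resource once, before any call to eval.
    void open() {
        executor = Executors.newFixedThreadPool(2);
    }

    void eval(CompletableFuture<Collection<String>> result, String key) {
        executor.execute(() -> {
            try {
                // The answer depends on the call time, so repeated calls differ.
                result.complete(Collections.singletonList(key + "@" + System.nanoTime()));
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        });
    }

    // Declared non-deterministic because the output is not a pure function of the input.
    boolean isDeterministic() {
        return false;
    }

    // Actively release the external resource when the function is disposed.
    @Override
    public void close() {
        if (executor != null) {
            executor.shutdownNow();
        }
    }

    // Helper for checking that close() really released the resource.
    boolean isReleased() {
        return executor == null || executor.isShutdown();
    }
}
```

Without the `close` method the worker threads would outlive the function instance, which is the kind of leak the review comment is warning about.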
> > > > > > > [1]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-313%3A+Add+support+of+User+Defined+AsyncTableFunction?src=contextnavpagetreemode
> > > > > > > [2]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric?src=contextnavpagetreemode
> > > > > > > [3]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
> > > > > > > [4]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL?src=contextnavpagetreemode
> > > > > > > [5]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job?src=contextnavpagetreemode
> > > > > > >
> > > > > > > Best,
> > > > > > > Lincoln Lee
> > > > > > >
> > > > > > >
> > > > > > > Aitozi <gj...@gmail.com> wrote on Tue, Jun 13, 2023 at 11:30:
> > > > > > >
> > > > > > > > I get your meaning now, thanks :)
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Aitozi.
> > > > > > > >
> > > > > > > > Feng Jin <ji...@gmail.com> wrote on Tue, Jun 13, 2023 at 11:16:
> > > > > > > >
> > > > > > > > > Hi Aitozi,
> > > > > > > > >
> > > > > > > > > Sorry for the confusing description.
> > > > > > > > >
> > > > > > > > > What I meant was that if we need to remind users about thread safety
> > > > > > > > > issues, we should introduce the new UDTF interface instead of executing
> > > > > > > > > the original UDTF asynchronously. Therefore, I agree with introducing
> > > > > > > > > the AsyncTableFunction.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Feng
> > > > > > > > >
> > > > > > > > > On Tue, Jun 13, 2023 at 10:42 AM Aitozi <gj...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Feng,
> > > > > > > > > >     Thanks for your question. We do not provide a way to switch a
> > > > > > > > > > UDTF between sync and async execution,
> > > > > > > > > > so there should be no thread safety problem here.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Aitozi
> > > > > > > > > >
> > > > > > > > > > Feng Jin <ji...@gmail.com> wrote on Tue, Jun 13, 2023 at 10:31:
> > > > > > > > > >
> > > > > > > > > > > Hi Aitozi, We do need to remind users about thread safety issues.
> > > > > > > > > > > Thank you for your efforts on this FLIP. I have no further questions.
> > > > > > > > > > > Best, Feng
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jun 13, 2023 at 6:05 AM Jing Ge <j...@ververica.com.invalid> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Aitozi,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for taking care of that part. I have no other concerns.
> > > > > > > > > > > >
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > > Jing
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 12, 2023 at 5:38 PM Aitozi <gjying1...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > BTW, if there are no other blocking issues / comments, I would like
> > > > > > > > > > > > > to start a VOTE in another thread this Wednesday, 6.14.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Aitozi.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Aitozi <gj...@gmail.com> wrote on Mon, Jun 12, 2023 at 23:34:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, Jing,
> > > > > > > > > > > > > >     Thanks for your explanation. I get your point now.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For the performance part, I think it's a good idea to run a case that
> > > > > > > > > > > > > > returns a big table. Memory consumption should be a point to take care
> > > > > > > > > > > > > > of, because in the ordered mode the head element in the buffer may
> > > > > > > > > > > > > > affect the total memory consumption.
> > > > > > > > > > > > > >
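The memory concern in ordered mode can be illustrated with a small sketch. It is not the Flink operator's implementation: `OrderedBufferSketch` and its methods are hypothetical, and the only assumption carried over from the thread is that ordered mode emits results in input order. Under that assumption, a slow head element keeps every later result buffered even if those results have already completed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical ordered buffer: results leave in the order the requests entered.
class OrderedBufferSketch<T> {
    private final Deque<CompletableFuture<T>> inFlight = new ArrayDeque<>();

    // Register a new in-flight request and hand back the future that completes it.
    CompletableFuture<T> add() {
        CompletableFuture<T> future = new CompletableFuture<>();
        inFlight.addLast(future);
        return future;
    }

    // Emit only the completed prefix; an unfinished head blocks everything behind it.
    // join() rethrows if a request failed, which this sketch does not handle.
    List<T> drainCompletedPrefix() {
        List<T> emitted = new ArrayList<>();
        while (!inFlight.isEmpty() && inFlight.peekFirst().isDone()) {
            emitted.add(inFlight.pollFirst().join());
        }
        return emitted;
    }

    // Number of results (finished or not) still held in memory.
    int buffered() {
        return inFlight.size();
    }
}
```

If each buffered result is a large collection of rows, the amount held back behind a slow head grows with both the configured capacity and the size of each result, which is why a big-table case is worth covering in the performance tests.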
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Aitozi.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jing Ge <ji...@ververica.com.invalid> wrote on Mon, Jun 12, 2023 at 20:28:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> Hi Aitozi,
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Which key will be used for lookup is not an issue; only one row will be
> > > > > > > > > > > > > > >> required for each key in order to enrich it. True, it depends on the
> > > > > > > > > > > > > > >> implementation whether multiple rows or a single row will be returned
> > > > > > > > > > > > > > >> for each key. However, for the lookup & enrichment scenario, one row
> > > > > > > > > > > > > > >> per key is recommended; otherwise, like I mentioned previously,
> > > > > > > > > > > > > > >> enrichment won't work.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> I am a little bit concerned about returning a big table for each key,
> > > > > > > > > > > > > > >> since it will make the async call take longer to return and need more
> > > > > > > > > > > > > > >> memory. The performance tests should cover this scenario. This is not
> > > > > > > > > > > > > > >> a blocking issue for this FLIP.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Best regards,
> > > > > > > > > > > > > >> Jing
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> On Sat, Jun 10, 2023 at 4:11 AM Aitozi <gjying1...@gmail.com> wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> > Hi Jing,
> > > > > > > > > > > > > >> >     I means the join key is not necessary
> [message truncated...]
