Hi all,

Besides all the good questions raised above, we all seem to agree on an
MVP for Flink 1.10: "to support users to create and persist a java
class-based udf that's already in classpath (no extra resource loading),
and use it later in queries".

IIUC, to achieve that in 1.10, the following are currently the core
issues/blockers we should figure out and solve as our **highest
priority**:

- what's the syntax to distinguish function language (java, scala, python,
etc)? we only need to implement the java one in 1.10 but have to settle
on the long-term solution
- how to persist function language in backend catalog? as a k-v pair in
properties map, or a dedicated field?
- do we really need to allow users to set a properties map for udf? what
needs to be stored there? what is it used for?
- should a catalog impl be able to decide whether it can take a properties
map (if we decide to have one), and which udf languages it can persist?
   - E.g. Hive metastore, which backs Flink's HiveCatalog, cannot take a
properties map and is only able to persist java udf [1], unless we do
something hacky to it
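
To make the second and fourth questions a bit more concrete, here is a rough
Java sketch of the two storage options and of a capability check on the
catalog side. Everything in it (FunctionLanguage, FunctionWithField,
FunctionWithProperties, the "function.language" key, FunctionPersistenceSupport,
JavaOnlyCatalog) is a hypothetical name made up for illustration, not existing
Flink or Hive API, and JavaOnlyCatalog only models the Hive metastore limits
described above.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

public class FunctionLanguageSketch {

    // Hypothetical set of UDF languages; only JAVA would be implemented in 1.10.
    enum FunctionLanguage { JAVA, SCALA, PYTHON }

    // Option A: the language is a dedicated, typed field.
    static final class FunctionWithField {
        final String className;
        final FunctionLanguage language;

        FunctionWithField(String className, FunctionLanguage language) {
            this.className = className;
            this.language = language;
        }
    }

    // Option B: the language is one k-v pair in a generic properties map.
    static final class FunctionWithProperties {
        static final String LANGUAGE_KEY = "function.language"; // hypothetical key

        final String className;
        final Map<String, String> properties;

        FunctionWithProperties(String className, Map<String, String> properties) {
            this.className = className;
            this.properties = properties;
        }

        // Falls back to JAVA when the key is absent.
        FunctionLanguage language() {
            String raw = properties.getOrDefault(LANGUAGE_KEY, "java");
            return FunctionLanguage.valueOf(raw.toUpperCase(Locale.ROOT));
        }
    }

    // Hypothetical hook that lets a catalog impl declare what it can persist.
    interface FunctionPersistenceSupport {
        boolean supportsProperties();
        boolean supportsLanguage(FunctionLanguage language);
    }

    // Models a backend that takes no properties map and persists java udfs only.
    static final class JavaOnlyCatalog implements FunctionPersistenceSupport {
        @Override public boolean supportsProperties() { return false; }
        @Override public boolean supportsLanguage(FunctionLanguage language) {
            return language == FunctionLanguage.JAVA;
        }
    }

    public static void main(String[] args) {
        FunctionWithField a =
                new FunctionWithField("com.example.MyUdf", FunctionLanguage.JAVA);
        FunctionWithProperties b = new FunctionWithProperties(
                "com.example.MyUdf",
                Collections.singletonMap(FunctionWithProperties.LANGUAGE_KEY, "python"));
        FunctionPersistenceSupport catalog = new JavaOnlyCatalog();

        System.out.println(catalog.supportsLanguage(a.language));   // true
        System.out.println(catalog.supportsLanguage(b.language())); // false
        System.out.println(catalog.supportsProperties());           // false
    }
}
```

With a dedicated field the language is typed and always present; with the
properties map it is just a string that each catalog has to parse and
validate, but no interface change is needed when new attributes show up.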

I feel these questions are essential to Flink functions in the long run,
but most importantly, are also the minimum scope for Flink 1.10. Aspects
like resource loading security or compatibility with Hive syntax are
important too; however, if we focus on them now, we may not be able to get
the MVP out in time.

[1]
-
https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/Function.html
-
https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/FunctionType.html



On Sun, Oct 27, 2019 at 8:22 PM Peter Huang <huangzhenqiu0...@gmail.com>
wrote:

> Hi Timo,
>
> Thanks for the feedback. I replied and adjusted the design accordingly.
> Regarding the class loading concern, I think we need to distinguish the
> function class loading for Temporary and Permanent functions.
>
> 1) For Permanent function, we can add it to the job graph so that we don't
> need to load it multiple times for the different sessions.
> 2) For Temporary function, we can register function with a session key, and
> use different class loaders in RuntimeContext implementation.
>
> I added more description in the doc. Please review it again.
>
>
> Best Regards
> Peter Huang
>
>
>
>
> On Thu, Oct 24, 2019 at 2:14 AM Timo Walther <twal...@apache.org> wrote:
>
> > Hi Peter,
> >
> > thanks for your proposal. I left some comments in the FLIP document. I
> > agree with Terry that we can have an MVP in Flink 1.10 but should already
> > discuss the bigger picture as a DDL string cannot be changed easily once
> > released.
> >
> > In particular we should discuss how resources for function are loaded.
> > If they are simply added to the JobGraph they are available to all
> > functions and could potentially interfere with each other, right?
> >
> > Thanks,
> > Timo
> >
> >
> >
> > On 24.10.19 05:32, Terry Wang wrote:
> > > Hi Peter,
> > >
> > > Sorry for the late reply. Thanks for your efforts on this; I just looked
> > > through your design.
> > > I left some comments in the doc about the alter function section and the
> > > function catalog interface.
> > > IMO, the overall design is ok and we can discuss some details further.
> > > I also think it's necessary to limit this awesome feature to basic
> > > functions (of course better to have all :) ) in the 1.10 release.
> > >
> > > Best,
> > > Terry Wang
> > >
> > >
> > >
> > >> On Oct 16, 2019, at 14:19, Peter Huang <huangzhenqiu0...@gmail.com> wrote:
> > >>
> > >> Hi Xuefu,
> > >>
> > >> Thank you for the feedback. I think you are pointing out a concern
> > >> similar to Bowen's. Let me describe how the catalog function and
> > >> function factory will be changed in the implementation section.
> > >> Then, we can have more discussion in detail.
> > >>
> > >>
> > >> Best Regards
> > >> Peter Huang
> > >>
> > >> On Tue, Oct 15, 2019 at 4:18 PM Xuefu Z <usxu...@gmail.com> wrote:
> > >>
> > >>> Thanks to Peter for the proposal!
> > >>>
> > >>> I left some comments in the google doc. Besides what Bowen pointed
> > out, I'm
> > >>> unclear about how things  work end to end from the document. For
> > instance,
> > >>> SQL DDL-like function definition is mentioned. I guess just having a
> > DDL
> > >>> for it doesn't explain how it's supported functionally. I think it's
> > better
> > >>> to have some clarification on what is expected work and what's for
> the
> > >>> future.
> > >>>
> > >>> Thanks,
> > >>> Xuefu
> > >>>
> > >>>
> > >>> On Tue, Oct 15, 2019 at 11:05 AM Bowen Li <bowenl...@gmail.com>
> wrote:
> > >>>
> > >>>> Hi Zhenqiu,
> > >>>>
> > >>>> Thanks for taking on this effort!
> > >>>>
> > >>>> A couple questions:
> > >>>> - Though this FLIP is about function DDL, can we also think about
> how
> > the
> > >>>> created functions can be mapped to CatalogFunction and see if we
> need
> > to
> > >>>> modify CatalogFunction interface? Syntax changes need to be backed
> by
> > the
> > >>>> backend.
> > >>>> - Can we define a clearer, smaller scope targeting for Flink 1.10
> > among
> > >>> all
> > >>>> the proposed changes? The current overall scope seems to be quite
> > wide,
> > >>> and
> > >>>> it may be unrealistic to get everything in a single release, or
> even a
> > >>>> couple. However, I believe the most common user story can be
> > something as
> > >>>> simple as "being able to create and persist a java class-based udf
> and
> > >>> use
> > >>>> it later in queries", which will add great value for most Flink
> users
> > and
> > >>>> is achievable in 1.10.
> > >>>>
> > >>>> Bowen
> > >>>>
> > >>>> On Sun, Oct 13, 2019 at 10:46 PM Peter Huang <
> > huangzhenqiu0...@gmail.com
> > >>>>
> > >>>> wrote:
> > >>>>
> > >>>>> Dear Community,
> > >>>>>
> > >>>>> FLIP-79 Flink Function DDL Support
> > >>>>> <
> > >>>>>
> > >>>>
> > >>>
> >
> https://docs.google.com/document/d/16kkHlis80s61ifnIahCj-0IEdy5NJ1z-vGEJd_JuLog/edit#
> > >>>>>>
> > >>>>>
> > >>>>> This proposal aims to support function DDL with the consideration
> of
> > >>> SQL
> > >>>>> syntax, language compliance, and advanced external UDF lib
> > >>> registration.
> > >>>>> The Flink DDL was initiated and discussed in the design
> > >>>>> <
> > >>>>>
> > >>>>
> > >>>
> >
> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#heading=h.wpsqidkaaoil
> > >>>>>>
> > >>>>> [1] by Shuyi Chen and Timo. As the initial discussion mainly focused on
> > >>>>> the table, type and view, FLIP-69 [2] extends it with a more detailed
> > >>>>> discussion of DDL for catalog, database, and function. Originally the
> > >>>>> function DDL was under the scope of FLIP-69. After some discussion
> > >>>>> <https://issues.apache.org/jira/browse/FLINK-7151> with the
> > community,
> > >>>> we
> > >>>>> found that there are several ongoing efforts, such as FLIP-64 [3],
> > >>>> FLIP-65
> > >>>>> [4], and FLIP-78 [5]. As they will directly impact the SQL syntax
> of
> > >>>>> function DDL, the proposal wants to describe the problem clearly
> with
> > >>> the
> > >>>>> consideration of existing works and make sure the design aligns
> with
> > >>>>> efforts of API change of temporary objects and type inference for
> UDF
> > >>>>> defined by different languages.
> > >>>>>
> > >>>>> The FLIP outlines the requirements from related works, and proposes a
> > >>>>> SQL syntax to meet those requirements. The corresponding implementation
> > >>>>> is also discussed. Please kindly review and give feedback.
> > >>>>>
> > >>>>>
> > >>>>> Best Regards
> > >>>>> Peter Huang
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Xuefu Zhang
> > >>>
> > >>> "In Honey We Trust!"
> > >>>
> >
> >
>
