Hi Terry,

Thanks for the quick response. We are on the same page. For the
properties of function DDL, let's see whether there is such a need from
other people.
I will start voting on the design in 24 hours.


Best Regards
Peter Huang







On Thu, Oct 31, 2019 at 3:18 AM Terry Wang <zjuwa...@gmail.com> wrote:

> Hi Peter,
>
> I’d like to share some thoughts from my side:
> 1. what's the syntax to distinguish function language ?
>         +1 for using `[LANGUAGE JVM|PYTHON] USING JAR`
> 2. How to persist function language in backend catalog ?
>         +1 for a separate field in CatalogFunction. But for each specific
> backend, we may persist it case by case. One special case is how
> HiveCatalog stores the kind of CatalogFunction.
> 3. Do we really need to allow users to set a properties map for a udf?
>     There are use cases that require passing external arguments to a udf for
> sure, but the need can also be met by passing arguments to `eval` when
> calling the udf in sql.
> IMO, there is not much need to support setting a properties map for a udf.
>
> 4. Should a catalog implementation be able to decide whether it can take a
> properties map, and which language of a udf it can persist?
> IMO, it’s necessary for a catalog implementation to provide such
> information. But for flink 1.10 map goal, we can just skip this part.
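
Points 2 and 4 above could be sketched roughly as follows. This is only an illustration under stated assumptions: `FunctionLanguage`, `CatalogFunctionSketch`, and `to_backend_row` are hypothetical names, not Flink's actual CatalogFunction API. It shows the language as a dedicated field rather than a key in the properties map, and a backend declaring what it is able to persist.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class FunctionLanguage(Enum):
    """Hypothetical enum; the thread only names JVM and PYTHON."""
    JVM = "JVM"
    PYTHON = "PYTHON"


@dataclass(frozen=True)
class CatalogFunctionSketch:
    """Hypothetical stand-in for a catalog function, not Flink's interface."""
    class_name: str
    # Dedicated field, instead of a "language" key inside `properties`.
    language: FunctionLanguage = FunctionLanguage.JVM
    properties: Dict[str, str] = field(default_factory=dict)


def to_backend_row(fn: CatalogFunctionSketch,
                   supports_properties: bool,
                   supported_languages: Set[FunctionLanguage]) -> Dict[str, str]:
    """Map a function to what one backend can store, or fail early."""
    if fn.language not in supported_languages:
        raise ValueError(f"backend cannot persist {fn.language.value} functions")
    if fn.properties and not supports_properties:
        raise ValueError("backend cannot persist a properties map")
    row = {"class_name": fn.class_name, "language": fn.language.value}
    # Properties are namespaced so they can never shadow the dedicated fields.
    row.update({f"prop.{k}": v for k, v in fn.properties.items()})
    return row
```

A backend that only stores Java functions and no properties map would pass `supports_properties=False` and `{FunctionLanguage.JVM}`, so unsupported functions are rejected at creation time instead of being stored lossily.
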
>
>
>
> Best,
> Terry Wang
>
>
>
> > On Oct 30, 2019, at 13:52, Peter Huang <huangzhenqiu0...@gmail.com> wrote:
> >
> > Hi Bowen,
> >
> > I couldn't agree more that we should first agree on the DDL syntax and
> > focus on the MVP in the current phase.
> >
> > 1) what's the syntax to distinguish function language
> > Currently, there are two opinions:
> >
> >   - USING 'python .....'
> >   - [LANGUAGE JVM|PYTHON] USING JAR '...'
> >
> > As we need to support multiple resources as HQL does, we shouldn't repeat
> > the language symbol as a suffix of each resource.
> > I would prefer option two, but definitely open to more comments.
> >
> > 2) How to persist function language in backend catalog? as a k-v pair in
> > properties map, or a dedicated field?
> > Even though language type is also a property, I think a separate field in
> > CatalogFunction is a cleaner solution.
> >
> > 3) do we really need to allow users set a properties map for udf? what
> needs
> > to be stored there? what are they used for?
> >
> > I am considering a type of use case that uses UDFs for realtime inference.
> > The model is bundled in the udf as a resource, but multiple parameters
> > are customizable. In this case, users can use properties to define
> > those parameters.
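
To make option two from question 1) concrete, here is a minimal parsing sketch. The full statement shape (`CREATE FUNCTION <name> AS '<class>' ...`), the JVM default, and the names `FunctionDdl` and `parse_create_function` are assumptions for illustration, not the syntax settled in the FLIP. It shows the language declared once while several JAR resources follow a single USING clause.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed grammar, for illustration only:
#   CREATE FUNCTION <name> AS '<class>' [LANGUAGE JVM|PYTHON]
#       [USING JAR '<uri>' [, JAR '<uri>']*]
_DDL = re.compile(
    r"^\s*CREATE\s+FUNCTION\s+(?P<name>\w+)\s+AS\s+'(?P<cls>[^']+)'"
    r"(?:\s+LANGUAGE\s+(?P<lang>JVM|PYTHON))?"
    r"(?:\s+USING\s+(?P<res>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
_RESOURCE = re.compile(r"\s*(JAR)\s+'([^']+)'\s*", re.IGNORECASE)


@dataclass
class FunctionDdl:
    name: str
    class_name: str
    language: str = "JVM"  # assumed default when the clause is omitted
    resources: List[Tuple[str, str]] = field(default_factory=list)


def parse_create_function(sql: str) -> FunctionDdl:
    """Parse the assumed CREATE FUNCTION shape into a FunctionDdl."""
    m = _DDL.match(sql)
    if m is None:
        raise ValueError(f"not a supported CREATE FUNCTION statement: {sql!r}")
    resources = []
    if m.group("res"):
        # Naive split: this sketch does not handle URIs that contain commas.
        for part in m.group("res").split(","):
            r = _RESOURCE.fullmatch(part)
            if r is None:
                raise ValueError(f"bad resource clause: {part!r}")
            resources.append((r.group(1).upper(), r.group(2)))
    language = (m.group("lang") or "JVM").upper()
    return FunctionDdl(m.group("name"), m.group("cls"), language, resources)
```

Because the language appears once, ahead of the resource list, adding a second or third resource never requires repeating it, which is the argument made for option two.
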
> >
> > I only have answers to these questions. For questions about the catalog
> > implementation, I hope we can collect more feedback from the community.
> >
> >
> > Best Regards
> > Peter Huang
> >
> >
> > On Tue, Oct 29, 2019 at 11:31 AM Bowen Li <bowenl...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> Besides all the good questions raised above, we all seem to agree on an
> >> MVP for Flink 1.10, "to support users to create and persist a java
> >> class-based udf that's already in classpath (no extra resource loading),
> >> and use it later in queries".
> >>
> >> IIUC, to achieve that in 1.10, the following are currently the core
> >> issues/blockers we should figure out, and solve them as our **highest
> >> priority**:
> >>
> >> - what's the syntax to distinguish function language (java, scala,
> python,
> >> etc)? we only need to implement the java one in 1.10 but have to settle
> >> down the long term solution
> >> - how to persist function language in backend catalog? as a k-v pair in
> >> properties map, or a dedicated field?
> >> - do we really need to allow users set a properties map for udf? what
> needs
> >> to be stored there? what are they used for?
> >> - should a catalog impl be able to decide whether it can take a
> properties
> >> map (if we decide to have one), and which language of a udf it can
> persist?
> >>   - E.g. Hive metastore, which backs Flink's HiveCatalog, cannot take a
> >> properties map and is only able to persist java udf [1], unless we do
> >> something hacky to it
> >>
> >> I feel these questions are essential to Flink functions in the long run,
> >> but most importantly, are also the minimum scope for Flink 1.10. Aspects
> >> like resource loading security or compatibility with Hive syntax are
> >> important too, however if we focus on them now, we may not be able to
> get
> >> the MVP out in time.
> >>
> >> [1]
> >> -
> >>
> >>
> https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/Function.html
> >> -
> >>
> >>
> https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/FunctionType.html
> >>
> >>
> >>
> >> On Sun, Oct 27, 2019 at 8:22 PM Peter Huang <huangzhenqiu0...@gmail.com
> >
> >> wrote:
> >>
> >>> Hi Timo,
> >>>
> >>> Thanks for the feedback. I replied and adjusted the design accordingly.
> >>> Regarding the concern about class loading,
> >>> I think we need to distinguish the function class loading for Temporary
> >> and
> >>> Permanent function.
> >>>
> >>> 1) For Permanent function, we can add it to the job graph so that we
> >> don't
> >>> need to load it multiple times for the different sessions.
> >>> 2) For Temporary function, we can register function with a session key,
> >> and
> >>> use different class loaders in RuntimeContext implementation.
> >>>
> >>> I added more description in the doc. Please review it again.
> >>>
> >>>
> >>> Best Regards
> >>> Peter Huang
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, Oct 24, 2019 at 2:14 AM Timo Walther <twal...@apache.org>
> wrote:
> >>>
> >>>> Hi Peter,
> >>>>
> >>>> thanks for your proposal. I left some comments in the FLIP document. I
> >>>> agree with Terry that we can have a MVP in Flink 1.10 but should
> >> already
> >>>> discuss the bigger picture as a DDL string cannot be changed easily
> >> once
> >>>> released.
> >>>>
> >>>> In particular we should discuss how resources for function are loaded.
> >>>> If they are simply added to the JobGraph they are available to all
> >>>> functions and could potentially interfere with each other, right?
> >>>>
> >>>> Thanks,
> >>>> Timo
> >>>>
> >>>>
> >>>>
> >>>> On 24.10.19 05:32, Terry Wang wrote:
> >>>>> Hi Peter,
> >>>>>
> >>>>> Sorry for the late reply. Thanks for your efforts on this; I just looked
> >>>>> through your design.
> >>>>> I left some comments in the doc about the alter function section and
> >>>>> the function catalog interface.
> >>>>> IMO, the overall design is ok and we can discuss some details further.
> >>>>> I also think it’s necessary to have this awesome feature limited to
> >>>>> basic functions (of course better to have all :) ) in the 1.10 release.
> >>>>>
> >>>>> Best,
> >>>>> Terry Wang
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Oct 16, 2019, at 14:19, Peter Huang <huangzhenqiu0...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Xuefu,
> >>>>>>
> >>>>>> Thank you for the feedback. I think you are pointing out a similar
> >>>> concern
> >>>>>> with Bowen. Let me describe
> >>>>>> how the catalog function and function factory will be changed in the
> >>>>>> implementation section.
> >>>>>> Then, we can have more discussion in detail.
> >>>>>>
> >>>>>>
> >>>>>> Best Regards
> >>>>>> Peter Huang
> >>>>>>
> >>>>>> On Tue, Oct 15, 2019 at 4:18 PM Xuefu Z <usxu...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Thanks to Peter for the proposal!
> >>>>>>>
> >>>>>>> I left some comments in the google doc. Besides what Bowen pointed
> >>>> out, I'm
> >>>>>>> unclear about how things  work end to end from the document. For
> >>>> instance,
> >>>>>>> SQL DDL-like function definition is mentioned. I guess just having
> >> a
> >>>> DDL
> >>>>>>> for it doesn't explain how it's supported functionally. I think
> >> it's
> >>>> better
> >>>>>>> to have some clarification on what is expected work and what's for
> >>> the
> >>>>>>> future.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Xuefu
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Oct 15, 2019 at 11:05 AM Bowen Li <bowenl...@gmail.com>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Hi Zhenqiu,
> >>>>>>>>
> >>>>>>>> Thanks for taking on this effort!
> >>>>>>>>
> >>>>>>>> A couple questions:
> >>>>>>>> - Though this FLIP is about function DDL, can we also think about
> >>> how
> >>>> the
> >>>>>>>> created functions can be mapped to CatalogFunction and see if we
> >>> need
> >>>> to
> >>>>>>>> modify CatalogFunction interface? Syntax changes need to be backed
> >>> by
> >>>> the
> >>>>>>>> backend.
> >>>>>>>> - Can we define a clearer, smaller scope targeting for Flink 1.10
> >>>> among
> >>>>>>> all
> >>>>>>>> the proposed changes? The current overall scope seems to be quite
> >>>> wide,
> >>>>>>> and
> >>>>>>>> it may be unrealistic to get everything in a single release, or
> >>> even a
> >>>>>>>> couple. However, I believe the most common user story can be
> >>>> something as
> >>>>>>>> simple as "being able to create and persist a java class-based udf
> >>> and
> >>>>>>> use
> >>>>>>>> it later in queries", which will add great value for most Flink
> >>> users
> >>>> and
> >>>>>>>> is achievable in 1.10.
> >>>>>>>>
> >>>>>>>> Bowen
> >>>>>>>>
> >>>>>>>> On Sun, Oct 13, 2019 at 10:46 PM Peter Huang <
> >>>> huangzhenqiu0...@gmail.com
> >>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Dear Community,
> >>>>>>>>>
> >>>>>>>>> FLIP-79 Flink Function DDL Support
> >>>>>>>>> <
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>
> >>>
> >>
> https://docs.google.com/document/d/16kkHlis80s61ifnIahCj-0IEdy5NJ1z-vGEJd_JuLog/edit#
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This proposal aims to support function DDL with the consideration
> >>> of
> >>>>>>> SQL
> >>>>>>>>> syntax, language compliance, and advanced external UDF lib
> >>>>>>> registration.
> >>>>>>>>> The Flink DDL is initialized and discussed in the design
> >>>>>>>>> <
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>
> >>>
> >>
> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#heading=h.wpsqidkaaoil
> >>>>>>>>>>
> >>>>>>>>> [1] by Shuyi Chen and Timo. As the initial discussion mainly focused on
> >>>>>>>>> tables, types, and views, FLIP-69 [2] extends it with a more detailed
> >>>>>>>>> discussion of DDL for catalog, database, and function. Originally, the
> >>>>>>>>> function DDL was under the scope of FLIP-69. After some discussion
> >>>>>>>>> <https://issues.apache.org/jira/browse/FLINK-7151> with the
> >>>> community,
> >>>>>>>> we
> >>>>>>>>> found that there are several ongoing efforts, such as FLIP-64
> >> [3],
> >>>>>>>> FLIP-65
> >>>>>>>>> [4], and FLIP-78 [5]. As they will directly impact the SQL syntax
> >>> of
> >>>>>>>>> function DDL, the proposal wants to describe the problem clearly
> >>> with
> >>>>>>> the
> >>>>>>>>> consideration of existing works and make sure the design aligns
> >>> with
> >>>>>>>>> efforts of API change of temporary objects and type inference for
> >>> UDF
> >>>>>>>>> defined by different languages.
> >>>>>>>>>
> >>>>>>>>> The FLIP outlines the requirements from related works, and proposes a SQL
> >>>>>>>>> syntax to meet those requirements. The corresponding implementation is
> >>>>>>>>> also discussed. Please kindly review and give feedback.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Best Regards
> >>>>>>>>> Peter Huang
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Xuefu Zhang
> >>>>>>>
> >>>>>>> "In Honey We Trust!"
> >>>>>>>
> >>>>
> >>>>
> >>>
> >>
>
>
