Hi,

Regarding stateful functions and MapView/DataView/ListView: I think it’s best 
to keep that for a later FLIP and focus on a more basic version. Supporting 
stateful functions, especially with MapView can potentially be very slow so we 
have to see what we can do there.

For the method names, I don’t know. If FLIP-64 passes they have to be changed. 
So we could use the final names right away, but I’m also fine with using the 
old method names for now.

Best,
Aljoscha

> On 5. Sep 2019, at 12:40, jincheng sun <sunjincheng...@gmail.com> wrote:
> 
> Hi Aljoscha,
> 
> Thanks for your comments!
> 
> Regarding to the FLIP scope, it seems that we have agreed on the design of
> the stateless function support.
> What do you think about starting the development of the stateless function
> support firstly and continue the discussion of stateful function support?
> Or you think we should split the current FLIP into two FLIPs and discuss
> the stateful function support in another thread?
> 
> Currently, the Python DataView/MapView/ListView interfaces design follow
> the Java/Scala naming conversions.
> Of couse, We can continue to discuss whether there are better solutions,
> i.e. using annotations.
> 
> Regarding to the magic logic to support DataView/MapView/ListView, it will
> be done by the framework and is transparent for users.
> Per my understanding, the magic logic is unavoidable no matter what the
> interfaces will be.
> 
> Regarding to the catalog support of python function:1) If it's stored in
> memory as temporary object, just as you said, users can call
> TableEnvironment.register_function(will change to
> register_temporary_function in FLIP-64)
> 2) If it's persisted in external storage, users can call
> Catalog.create_function. There will be no API change per my understanding.
> 
> What do you think?
> Best,Jincheng
> 
> Aljoscha Krettek <aljos...@apache.org> 于2019年9月5日周四 下午5:32写道:
> 
>> Hi,
>> 
>> Another thing to consider is the Scope of the FLIP. Currently, we try to
>> support (stateful) AggregateFunctions. I have some concerns about whether
>> or not DataView/MapView/ListView is a good interface because it requires
>> quite some magic from the runners to make it work, such as messing with the
>> TypeInformation and injecting objects at runtime. If the FLIP aims for the
>> minimum of ScalarFunctions and the whole execution harness, that should be
>> easier to agree on.
>> 
>> Another point is the naming of the new methods. I think Timo hinted at the
>> fact that we have to consider catalog support for functions. There is
>> ongoing work about differentiating between temporary objects and objects
>> that are stored in a catalog (FLIP-64 [1]). With this in mind, the method
>> for registering functions should be called register_temporary_function()
>> and so on. Unless we want to already think about mixing Python and Java
>> functions in the catalog, which is outside the scope of this FLIP, I think.
>> 
>> Best,
>> Aljoscha
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>> 
>> 
>>> On 5. Sep 2019, at 05:01, jincheng sun <sunjincheng...@gmail.com> wrote:
>>> 
>>> Hi Aljoscha,
>>> 
>>> That's a good points, so far, most of the code will live in flink-python
>>> module, and the rules and relNodes will be put into the both blink and
>>> flink planner modules, some of the common interface of required by
>> planners
>>> will be placed in flink-table-common. I think you are right, we should
>> try
>>> to ensure the changes of this feature is minimal.  For more detail we
>> would
>>> follow this principle when review the PRs.
>>> 
>>> Great thanks for your questions and remind!
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> 
>>> Aljoscha Krettek <aljos...@apache.org> 于2019年9月4日周三 下午8:58写道:
>>> 
>>>> Hi,
>>>> 
>>>> Things looks interesting so far!
>>>> 
>>>> I had one question: Where will most of the support code for this live?
>>>> Will this add the required code to flink-table-common or the different
>>>> runners? Can we implement this in such a way that only a minimal amount
>> of
>>>> support code is required in the parts of the Table API (and Table API
>>>> runners) that  are not python specific?
>>>> 
>>>> Best,
>>>> Aljoscha
>>>> 
>>>>> On 4. Sep 2019, at 14:14, Timo Walther <twal...@apache.org> wrote:
>>>>> 
>>>>> Hi Jincheng,
>>>>> 
>>>>> 2. Serializability of functions: "#2 is very convenient for users"
>> means
>>>> only until they have the first backwards-compatibility issue, after that
>>>> they will find it not so convinient anymore and will ask why the
>> framework
>>>> allowed storing such objects in a persistent storage. I don't want to be
>>>> picky about it, but wanted to raise awareness that sometimes it is ok to
>>>> limit use cases to guide users for devloping backwards-compatible
>> programs.
>>>>> 
>>>>> Thanks for the explanation fo the remaining items. It sounds reasonable
>>>> to me. Regarding the example with `getKind()`, I actually meant
>>>> `org.apache.flink.table.functions.ScalarFunction#getKind` we don't allow
>>>> users to override this property. And I think we should do something
>> similar
>>>> for the getLanguage property.
>>>>> 
>>>>> Thanks,
>>>>> Timo
>>>>> 
>>>>> On 03.09.19 15:01, jincheng sun wrote:
>>>>>> Hi Timo,
>>>>>> 
>>>>>> Thanks for the quick reply ! :)
>>>>>> I have added more example for #3 and #5 to the FLIP. That are great
>>>>>> suggestions !
>>>>>> 
>>>>>> Regarding 2:
>>>>>> 
>>>>>> There are two kind Serialization for CloudPickle(Which is different
>> from
>>>>>> Java):
>>>>>> 1) For class and function which can be imported, CloudPickle only
>>>>>> serialize the full path of the class and function (just like java
>> class
>>>>>> name).
>>>>>> 2) For the class and function which can not be imported, CloudPickle
>>>> will
>>>>>> serialize the full content of the class and function.
>>>>>> For #2, It means that we can not just store the full path of the class
>>>> and
>>>>>> function.
>>>>>> 
>>>>>> The above serialization is recursive.
>>>>>> 
>>>>>> However, there is indeed an problem of backwards compatibility when
>> the
>>>>>> module path of the parent class changed. But I think this is an rare
>>>> case
>>>>>> and acceptable. i.e., For Flink framework we never change the user
>>>>>> interface module path if we want to keep backwards compatibility. For
>>>> user
>>>>>> code, if they change the interface of UDF's parent, they should
>>>> re-register
>>>>>> their functions.
>>>>>> 
>>>>>> If we do not want support #2, we can store the full path of class and
>>>>>> function, in that case we have no backwards compatibility problem.
>> But I
>>>>>> think the #2 is very convenient for users.
>>>>>> 
>>>>>> What do you think?
>>>>>> 
>>>>>> Regarding 4:
>>>>>> As I mentioned earlier, there may be built-in Python functions and I
>>>> think
>>>>>> language is a "function" concept. Function and Language are orthogonal
>>>>>> concepts.
>>>>>> We may have R, GO and other language functions in the future, not only
>>>>>> user-defined, but also built-in functions.
>>>>>> 
>>>>>> You are right that users will not set this method and for Python
>>>> functions,
>>>>>> it will be set in the code-generated Java function by the framework.
>>>> So, I
>>>>>> think we should declare the getLanguage() in FunctionDefinition for
>> now.
>>>>>> (I'm not pretty sure what do you mean by saying that getKind() is
>> final
>>>> in
>>>>>> UserDefinedFunction?)
>>>>>> 
>>>>>> Best,
>>>>>> Jincheng
>>>>>> 
>>>>>> Timo Walther <twal...@apache.org> 于2019年9月3日周二 下午6:01写道:
>>>>>> 
>>>>>>> Hi Jincheng,
>>>>>>> 
>>>>>>> thanks for your response.
>>>>>>> 
>>>>>>> 2. Serializability of functions: Using some arbitrary serialization
>>>>>>> format for shipping a function to worker sounds fine to me. But once
>> we
>>>>>>> store functions a the catalog we need to think about backwards
>>>>>>> compatibility and evolution of interfaces etc. I'm not sure if
>>>>>>> CloudPickle is the right long-term storage format for this. If we
>> don't
>>>>>>> think about this in advance, we are basically violating our code
>>>> quality
>>>>>>> guide [1] of never use Java Serialization but in the Python-way. We
>> are
>>>>>>> using the RPC serialization for persistence.
>>>>>>> 
>>>>>>> 3. TableEnvironment: Can you add some example to the FLIP? Because
>> API
>>>>>>> code like the following is not covered there:
>>>>>>> 
>>>>>>> self.t_env.register_function("add_one", udf(lambda i: i + 1,
>>>>>>> DataTypes.BIGINT(),
>>>>>>>                                            DataTypes.BIGINT()))
>>>>>>> self.t_env.register_function("subtract_one", udf(SubtractOne(),
>>>>>>> DataTypes.BIGINT(),
>>>>>>> DataTypes.BIGINT()))
>>>>>>> self.t_env.register_function("add", add)
>>>>>>> 
>>>>>>> 4. FunctionDefinition: Your response still doesn't answer my question
>>>>>>> entirely. Why do we need FunctionDefinition.getLanguage() if this is
>> a
>>>>>>> "user-defined function" concept and not a "function" concept. In any
>>>>>>> case, all users should not be able to set this method. So it must be
>>>>>>> final in UserDefinedFunction similar to getKind().
>>>>>>> 
>>>>>>> 5. Function characteristics: If UserDefinedFunction is defined in
>>>>>>> Python, why is it not used in your example in FLIP-58. You could you
>>>>>>> extend the example to show how to specify these attributes in the
>> FLIP?
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Timo
>>>>>>> 
>>>>>>> [1]
>>>> https://flink.apache.org/contributing/code-style-and-quality-java.html
>>>>>>> 
>>>>>>> On 02.09.19 15:35, jincheng sun wrote:
>>>>>>>> Hi Timo,
>>>>>>>> 
>>>>>>>> Great thanks for your feedback. I would like to share my thoughts
>> with
>>>>>>> you
>>>>>>>> inline. :)
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Jincheng
>>>>>>>> 
>>>>>>>> Timo Walther <twal...@apache.org> 于2019年9月2日周一 下午5:04写道:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> the FLIP looks awesome. However, I would like to discuss the
>> changes
>>>> to
>>>>>>>>> the user-facing parts again. Some feedback:
>>>>>>>>> 
>>>>>>>>> 1. DataViews: With the current non-annotation design for DataViews,
>>>> we
>>>>>>>>> cannot perform eager state declaration, right? At which point
>> during
>>>>>>>>> execution do we know which state is required by the function? We
>>>> need to
>>>>>>>>> instantiate the function first, right?
>>>>>>>>> 
>>>>>>>>>> We will analysis the Python AggregateFunction and extract the
>>>> DataViews
>>>>>>>> used in the Python AggregateFunction. This can be done
>>>>>>>> by instantiate a Python AggregateFunction, creating an accumulator
>> by
>>>>>>>> calling method create_accumulator and then analysis the created
>>>>>>>> accumulator. This is actually similar to the way that Java
>>>>>>>> AggregateFunction processing codegen logic. The extracted DataViews
>>>> can
>>>>>>>> then be used to construct the StateDescriptors in the operator,
>> i.e.,
>>>> we
>>>>>>>> should have hold the state spec and the state descriptor id in Java
>>>>>>>> operator and Python worker can access the state by specifying the
>>>>>>>> corresponding state descriptor id.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 2. Serializability of functions: How do we ensure serializability
>> of
>>>>>>>>> functions for catalog persistence? In the Scala/Java API, we would
>>>> like
>>>>>>>>> to register classes instead of instances soon. This is the only way
>>>> to
>>>>>>>>> store a function properly in a catalog or we need some
>>>>>>>>> serialization/deserialization logic in the function interfaces to
>>>>>>>>> convert an instance to string properties.
>>>>>>>>> 
>>>>>>>>>> The Python function will be serialized with CloudPickle anyway in
>>>> the
>>>>>>>> Python API as we need to transfer it to the Python worker which can
>>>> then
>>>>>>>> deserialize it for execution. The serialized Python function can be
>>>>>>> stored
>>>>>>>> into catalog.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 3. TableEnvironment: What is the signature of
>>>> `register_function(self,
>>>>>>>>> name, function)`? Does it accept both a class and function? Like
>>>> `class
>>>>>>>>> Sum` and `def split()`? Could you add some examples for registering
>>>> both
>>>>>>>>> kinds of functions?
>>>>>>>>> 
>>>>>>>>>> It has been already supported which you mentioned. You can find an
>>>>>>>> example in the POC code:
>>>>>>>> 
>>>>>>> 
>>>> 
>> https://github.com/dianfu/flink/commit/93f41ba173482226af7513fdec5acba72b274489#diff-34f619b31a7e38604e22a42a441fbe2fR26
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 4. FunctionDefinition: Function definition is not a user-defined
>>>>>>>>> function definition. It is the highest interface for both
>>>> user-defined
>>>>>>>>> and built-in functions. I'm not sure if getLanguage() should be
>> part
>>>> of
>>>>>>>>> this interface or one-level down which would be
>>>> `UserDefinedFunction`.
>>>>>>>>> Built-in functions will never be implemented in a different
>>>> language. In
>>>>>>>>> any case, I would vote for removing the UNKNOWN language, because
>> it
>>>>>>>>> does not solve anything. Why should a user declare a function that
>>>> the
>>>>>>>>> runtime can not handle? I also find the term `JAVA` confusing for
>>>> Scala
>>>>>>>>> users. How about `FunctionLanguage.JVM` instead?
>>>>>>>>> 
>>>>>>>>>> Actually we may have built-in Python functions in the future.
>>>> Regarding
>>>>>>>> to the following expression: py_udf1(a, b) + py_udf2(c), if there is
>>>>>>>> built-in Python
>>>>>>>> funciton for '+' operator, then we don't need to mix using Java and
>>>>>>> Python
>>>>>>>> UDFs. In this way, we can improve the execution performance.
>>>>>>>> Regarding to removing FunctionLanguage.UNKNOWN and renaming
>>>>>>>> FunctionLanguage.Java to FunctionLanguage.JVM, it makes more sense
>> to
>>>> me.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 5. Function characteristics: In the current design, function
>> classes
>>>> do
>>>>>>>>> not extend from any upper class. How can users declare
>>>> characteristics
>>>>>>>>> that are present in `FunctionDefinition` like determinism,
>>>> requirements,
>>>>>>>>> or soon also monotonism.
>>>>>>>>> 
>>>>>>>>>> Actually we have defined 'UserDefinedFunction' which is the base
>>>> class
>>>>>>>> for all user-defined functions.
>>>>>>>> We can define the deterministic, requirements, etc in this class.
>>>>>>>> Currently, we have already supported to define the deterministic.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Timo
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 02.09.19 03:38, Shaoxuan Wang wrote:
>>>>>>>>>> Hi Jincheng, Fudian, and Aljoscha,
>>>>>>>>>> I am assuming the proposed python UDX can also be applied to Flink
>>>> SQL.
>>>>>>>>>> Is this correct? If yes, I would suggest to title the FLIP as
>> "Flink
>>>>>>>>> Python
>>>>>>>>>> User-Defined Function" or "Flink Python User-Defined Function for
>>>>>>> Table".
>>>>>>>>>> Regards,
>>>>>>>>>> Shaoxuan
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Aug 28, 2019 at 12:22 PM jincheng sun <
>>>>>>> sunjincheng...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks for the feedback Bowen!
>>>>>>>>>>> 
>>>>>>>>>>> Great thanks for create the FLIP and bring up the VOTE Dian!
>>>>>>>>>>> 
>>>>>>>>>>> Best, Jincheng
>>>>>>>>>>> 
>>>>>>>>>>> Dian Fu <dian0511...@gmail.com> 于2019年8月28日周三 上午11:32写道:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> I have started a voting thread [1]. Thanks a lot for your help
>>>> during
>>>>>>>>>>>> creating the FLIP @Jincheng.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>> 
>>>>>>>>>>>> Very appreciated for your comments. I have replied you in the
>>>> design
>>>>>>>>> doc.
>>>>>>>>>>>> As it seems that the comments doesn't affect the overall design,
>>>> I'll
>>>>>>>>> not
>>>>>>>>>>>> cancel the vote for now and we can continue the discussion in
>> the
>>>>>>>>> design
>>>>>>>>>>>> doc.
>>>>>>>>>>>> 
>>>>>>>>>>>> [1]
>>>>>>>>>>>> 
>>>>>>> 
>>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html
>>>>>>>>>>>> <
>>>>>>>>>>>> 
>>>>>>> 
>>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Dian
>>>>>>>>>>>> 
>>>>>>>>>>>>> 在 2019年8月28日,上午11:05,Bowen Li <bowenl...@gmail.com> 写道:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Jincheng and Dian,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Sorry for being late to the party. I took a glance at the
>>>> proposal,
>>>>>>>>>>> LGTM
>>>>>>>>>>>> in
>>>>>>>>>>>>> general, and I left only a couple comments.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Bowen
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Aug 26, 2019 at 8:05 PM Dian Fu <dian0511...@gmail.com
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks! It works.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Dian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 在 2019年8月27日,上午10:55,jincheng sun <sunjincheng...@gmail.com>
>>>> 写道:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Dian, can you check if you have edit access? :)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Dian Fu <dian0511...@gmail.com> 于2019年8月26日周一 上午10:52写道:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Appreciated for the kind tips and offering of help.
>> Definitely
>>>>>>> need
>>>>>>>>>>>> it!
>>>>>>>>>>>>>>>> Could you grant me write permission for confluence? My Id:
>>>> Dian
>>>>>>> Fu
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Dian
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 在 2019年8月26日,上午9:53,jincheng sun <sunjincheng...@gmail.com
>>> 
>>>> 写道:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for your feedback Hequn & Dian.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Dian, I am glad to see that you want help to create the
>> FLIP!
>>>>>>>>>>>>>>>>> Everyone will have first time, and I am very willing to
>> help
>>>> you
>>>>>>>>>>>>>> complete
>>>>>>>>>>>>>>>>> your first FLIP creation. Here some tips:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> - First I'll give your account write permission for
>>>> confluence.
>>>>>>>>>>>>>>>>> - Before create the FLIP, please have look at the FLIP
>>>> Template
>>>>>>>>>>> [1],
>>>>>>>>>>>>>>>> (It's
>>>>>>>>>>>>>>>>> better to know more about FLIP by reading [2])
>>>>>>>>>>>>>>>>> - Create Flink Python UDFs related JIRAs after completing
>> the
>>>>>>> VOTE
>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> FLIP.(I think you also can bring up the VOTE thread, if you
>>>>>>> want!
>>>>>>>>> )
>>>>>>>>>>>>>>>>> Any problems you encounter during this period,feel free to
>>>> tell
>>>>>>> me
>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>> can solve them together. :)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Jincheng
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>> 
>>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>>>>>>>>>>>>>>>> Hequn Cheng <chenghe...@gmail.com> 于2019年8月23日周五
>> 上午11:54写道:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> +1 for starting the vote.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks Jincheng a lot for the discussion.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best, Hequn
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu <
>>>>>>> dian0511...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> +1 to start the FLIP create and VOTE on this feature. I'm
>>>>>>>>> willing
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> help
>>>>>>>>>>>>>>>>>>> on the FLIP create if you don't mind. As I haven't
>> created
>>>> a
>>>>>>>>> FLIP
>>>>>>>>>>>>>>>> before,
>>>>>>>>>>>>>>>>>>> it will be great if you could help on this. :)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>> Dian
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 在 2019年8月22日,下午11:41,jincheng sun <
>>>> sunjincheng...@gmail.com>
>>>>>>>>>>> 写道:
>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks a lot for your feedback. If there are no more
>>>>>>>>> suggestions
>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> comments, I think it's better to  initiate a vote to
>>>> create a
>>>>>>>>>>> FLIP
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>> Apache Flink Python UDFs.
>>>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best, Jincheng
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> jincheng sun <sunjincheng...@gmail.com> 于2019年8月15日周四
>>>>>>>>>>> 上午12:54写道:
>>>>>>>>>>>>>>>>>>>>> Hi Thomas,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks for your confirmation and the very important
>>>> reminder
>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>> bundle
>>>>>>>>>>>>>>>>>>>>> processing.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I have had add the description about how to perform
>>>> bundle
>>>>>>>>>>>>>> processing
>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>> the perspective of checkpoint and watermark. Feel free
>> to
>>>>>>>>> leave
>>>>>>>>>>>>>>>>>>> comments if
>>>>>>>>>>>>>>>>>>>>> there are anything not describe clearly.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>> Jincheng
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Dian Fu <dian0511...@gmail.com> 于2019年8月14日周三
>> 上午10:08写道:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Hi Thomas,
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thanks a lot the suggestions.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Regarding to bundle processing, there is a section
>>>>>>>>>>>> "Checkpoint"[1]
>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> design doc which talks about how to handle the
>>>> checkpoint.
>>>>>>>>>>>>>>>>>>>>>> However, I think you are right that we should talk
>> more
>>>>>>> about
>>>>>>>>>>>> it,
>>>>>>>>>>>>>>>>>> such
>>>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>> what's bundle processing, how it affects the
>> checkpoint
>>>> and
>>>>>>>>>>>>>>>>>> watermark,
>>>>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>>>>>>> to handle the checkpoint and watermark, etc.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>> 
>>>> 
>> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
>>>>>>>>>>>>>>>>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>> 
>>>> 
>> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>> Dian
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 在 2019年8月14日,上午1:01,Thomas Weise <t...@apache.org>
>> 写道:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks for putting this together. The proposal is
>> very
>>>>>>>>>>>> detailed,
>>>>>>>>>>>>>>>>>>>>>> thorough
>>>>>>>>>>>>>>>>>>>>>>> and for me as a Beam Flink runner contributor easy to
>>>>>>>>>>>> understand
>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>>>>>>> One thing that you should probably detail more is the
>>>>>>> bundle
>>>>>>>>>>>>>>>>>>>>>> processing. It
>>>>>>>>>>>>>>>>>>>>>>> is critically important for performance that multiple
>>>>>>>>>>> elements
>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>>> processed in a bundle. The default bundle size in the
>>>>>>> Flink
>>>>>>>>>>>>>> runner
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>> 1s or
>>>>>>>>>>>>>>>>>>>>>>> 1000 elements, whichever comes first. And for
>>>> streaming,
>>>>>>> you
>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>> find
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> logic necessary to align the bundle processing with
>>>>>>>>>>> watermarks
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> checkpointing here:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>> 
>>>> 
>> https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
>>>>>>>>>>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <
>>>>>>>>>>>>>>>>>>> sunjincheng...@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> The Python Table API(without Python UDF support) has
>>>>>>>>> already
>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>>>>>>>>> supported
>>>>>>>>>>>>>>>>>>>>>>>> and will be available in the coming release 1.9.
>>>>>>>>>>>>>>>>>>>>>>>> As Python UDF is very important for Python users,
>> we'd
>>>>>>> like
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> start
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> discussion about the Python UDF support in the
>> Python
>>>>>>> Table
>>>>>>>>>>>> API.
>>>>>>>>>>>>>>>>>>>>>>>> Aljoscha Krettek, Dian Fu and I have discussed
>> offline
>>>>>>> and
>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>>>>>>> drafted a
>>>>>>>>>>>>>>>>>>>>>>>> design doc[1]. It includes the following items:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> - The user-defined function interfaces.
>>>>>>>>>>>>>>>>>>>>>>>> - The user-defined function execution architecture.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> As mentioned by many guys in the previous discussion
>>>>>>>>>>>> thread[2],
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>> portability framework was introduced in Apache Beam
>> in
>>>>>>>>>>> latest
>>>>>>>>>>>>>>>>>>>>>> releases. It
>>>>>>>>>>>>>>>>>>>>>>>> provides well-defined, language-neutral data
>>>> structures
>>>>>>> and
>>>>>>>>>>>>>>>>>> protocols
>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>> language-neutral user-defined function execution.
>> This
>>>>>>>>>>> design
>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>> based
>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>> Beam's portability framework. We will introduce how
>> to
>>>>>>> make
>>>>>>>>>>>> use
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>> Beam's
>>>>>>>>>>>>>>>>>>>>>>>> portability framework for user-defined function
>>>>>>> execution:
>>>>>>>>>>>> data
>>>>>>>>>>>>>>>>>>>>>>>> transmission, state access, checkpoint, metrics,
>>>> logging,
>>>>>>>>>>> etc.
>>>>>>>>>>>>>>>>>>>>>>>> Considering that the design relies on Beam's
>>>> portability
>>>>>>>>>>>>>> framework
>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>> Python user-defined function execution and not all
>> the
>>>>>>>>>>>>>>>> contributors
>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>> Flink community are familiar with Beam's portability
>>>>>>>>>>>> framework,
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>>>>>>>>> done a prototype[3] for proof of concept and also
>>>> ease of
>>>>>>>>>>>>>>>>>>>>>> understanding of
>>>>>>>>>>>>>>>>>>>>>>>> the design.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Welcome any feedback.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>> Jincheng
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>> 
>>>> 
>> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>> 
>>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html
>>>>>>>>>>>>>>>>>>>>>>>> [3] https://github.com/dianfu/flink/commits/udf_poc
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 

Reply via email to