[GitHub] [flink] WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs.

GitBox Fri, 11 Oct 2019 00:52:01 -0700

WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument 
Python UDFs.
URL: https://github.com/apache/flink/pull/9865#issuecomment-540956430
 
 
   @hequn8128 Thanks for your feedback! Dian Fu and me has talked about these 
two approach and we come to an agreement that skip the optimization in 
`ExpressionReducer` and optimize the UDFs during runtime. Our thoughts are as 
follows:
   1. How to optimize Python UDFs during runtime?
   After support constant parameters in Python UDF(see [this 
PR](https://github.com/apache/flink/pull/9858)), we can do this optimization 
when evaluating the chained Python UDFs in python worker: 
   If the evaluated Python UDF is deterministic and has no argument or its 
arguments are all constant value, which means it will always return a constant 
value, replace it with the constant result value.
   This rule can be applied recursively until all the deterministic UDFs with 
constant inputs have been replaced. If the root nodes of the chained Python 
UDFs become constant values, we can only transmit them only once and replace 
them with None in following transmission to save IO resource. The Java operator 
also knows which fields of the evaluated result should be constant value rather 
than None because the reduce rule can be applied in Java side too. No 
additional interaction between Java operator and Python worker is needed in 
this design.
   2. Why not optimize Python UDFs during optimization phase?
   Reducing Python UDFs in optimization phase is not a easy work in current 
design. It means the generated Java wrappers of Python UDFs can be evaluated 
and return the correct result. In other word we need run Python UDFs in client 
side, but the Python UDFs is designed to run on cluster whose python 
environment may different from client side after we introduce environment and 
dependency management in the future. To solve the environment problem, we need 
to prepare a python environment that is the same as the python environment on 
cluster before reducing Python UDFs. To evaluate the Python UDF, we need to 
implement a new UDF runner which does not depend on Apache Beam(the client side 
of Flink Python API does not depend on Apache Beam). We know if we complete 
these it will be a perfect solution of this problem, but it is too expensive 
compared to runtime optimization.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs.

Reply via email to