WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs. URL: https://github.com/apache/flink/pull/9865#issuecomment-540956430

@hequn8128 Thanks for your feedback! Dian Fu and I have discussed these two approaches and agreed to skip the optimization in `ExpressionReducer` and instead optimize the UDFs at runtime. Our reasoning is as follows:

1. How to optimize Python UDFs at runtime? After constant parameters are supported in Python UDFs (see [this PR](https://github.com/apache/flink/pull/9858)), we can perform this optimization while evaluating the chained Python UDFs in the Python worker: if a Python UDF is deterministic and has no arguments, or all of its arguments are constant values, it will always return the same constant value, so we can replace it with its constant result. This rule can be applied recursively until all deterministic UDFs with constant inputs have been replaced. If the root nodes of the chained Python UDFs become constant values, we can transmit them only once and replace them with None in subsequent transmissions to save IO resources. The Java operator also knows which fields of the evaluated result should be constant values rather than None, because the same reduction rule can be applied on the Java side. No additional interaction between the Java operator and the Python worker is needed in this design.

2. Why not optimize Python UDFs during the optimization phase? Reducing Python UDFs in the optimization phase is not easy in the current design. It would require that the generated Java wrappers of Python UDFs can be evaluated and return correct results, in other words, that we can run Python UDFs on the client side. But Python UDFs are designed to run on the cluster, whose Python environment may differ from the client's once we introduce environment and dependency management in the future. To solve the environment problem, we would need to prepare a Python environment on the client identical to the one on the cluster before reducing Python UDFs. To evaluate the Python UDFs, we would also need to implement a new UDF runner that does not depend on Apache Beam (the client side of the Flink Python API does not depend on Apache Beam). If we completed all of this it would be a thorough solution to the problem, but it is far too expensive compared to the runtime optimization.
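For illustration, the recursive folding rule from point 1 could be sketched roughly as below. The `Constant`/`UdfCall` node classes and the `fold_constants` helper are hypothetical stand-ins for the chained-UDF representation in the Python worker, not actual Flink internals:

```python
# Hedged sketch: fold deterministic UDF calls with all-constant inputs
# into constants, applied bottom-up over a chained-UDF expression tree.
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass
class Constant:
    value: object

@dataclass
class UdfCall:
    func: Callable           # the wrapped Python UDF
    deterministic: bool
    # an argument is another UDF call, a constant, or a column reference (str)
    args: List[Union["UdfCall", Constant, str]]

def fold_constants(node):
    """Recursively replace deterministic UDF calls whose inputs are all
    constants with the constant result of evaluating them once."""
    if isinstance(node, (Constant, str)):
        return node
    folded_args = [fold_constants(a) for a in node.args]
    if node.deterministic and all(isinstance(a, Constant) for a in folded_args):
        return Constant(node.func(*(a.value for a in folded_args)))
    return UdfCall(node.func, node.deterministic, folded_args)

# upper(concat("a", "b")) folds all the way down to Constant("AB"),
# while upper(col) stays a UdfCall because "col" is a column reference.
expr = UdfCall(str.upper, True,
               [UdfCall(lambda x, y: x + y, True,
                        [Constant("a"), Constant("b")])])
print(fold_constants(expr))  # Constant(value='AB')
```

If the root of a chain folds to a `Constant`, its value would be sent to the Java operator once and replaced by None afterwards, as described above.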