Hi all, The Python Table API(without Python UDF support) has already been supported and will be available in the coming release 1.9. As Python UDF is very important for Python users, we'd like to start the discussion about the Python UDF support in the Python Table API. Aljoscha Krettek, Dian Fu and I have discussed offline and have drafted a design doc[1]. It includes the following items:
- The user-defined function interfaces. - The user-defined function execution architecture. As mentioned by many guys in the previous discussion thread[2], a portability framework was introduced in Apache Beam in latest releases. It provides well-defined, language-neutral data structures and protocols for language-neutral user-defined function execution. This design is based on Beam's portability framework. We will introduce how to make use of Beam's portability framework for user-defined function execution: data transmission, state access, checkpoint, metrics, logging, etc. Considering that the design relies on Beam's portability framework for Python user-defined function execution and not all the contributors in Flink community are familiar with Beam's portability framework, we have done a prototype[3] for proof of concept and also ease of understanding of the design. Welcome any feedback. Best, Jincheng [1] https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html [3] https://github.com/dianfu/flink/commits/udf_poc
