Hi everyone, I would like to start a discussion thread on "Support Pandas UDAF in PyFlink"
Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves the high serialization/deserialization overhead in Python UDF and makes it convenient to leverage the popular Python libraries such as Pandas, Numpy, etc. Since Pandas UDF has so many advantages, we want to support Pandas UDAF to extend usage of Pandas UDF. Dian Fu and I have discussed offline and have drafted the FLIP-137[2]. It includes the following items: - Support Pandas UDAF in Batch Group Aggregation - Support Pandas UDAF in Batch Group Window Aggregation - Support Pandas UDAF in Batch Over Window Aggregation - Support Pandas UDAF in Stream Group Window Aggregation - Support Pandas UDAF in Stream Bounded Over Window Aggregation Looking forward to your feedback! Best, Xingbo [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink