Hi everyone,

I would like to start a discussion thread on "Support Pandas UDAF in
PyFlink".

Pandas UDF has been supported since Flink 1.11 (FLIP-97[1]). It avoids the
high serialization/deserialization overhead of the regular Python UDF and
makes it convenient to leverage popular Python libraries such as Pandas,
NumPy, etc. Given these advantages, we want to support Pandas UDAF as well,
so that the same benefits are available for aggregations.

Dian Fu and I have discussed this offline and drafted FLIP-137[2]. It
includes the following items:
  - Support Pandas UDAF in Batch Group Aggregation
  - Support Pandas UDAF in Batch Group Window Aggregation
  - Support Pandas UDAF in Batch Over Window Aggregation
  - Support Pandas UDAF in Stream Group Window Aggregation
  - Support Pandas UDAF in Stream Bounded Over Window Aggregation
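
To make the idea concrete, here is a rough sketch in plain pandas of what a
Pandas UDAF does conceptually: it receives all values of a group (or window)
as one pandas.Series and returns a single scalar. This is only an
illustration and not the proposed PyFlink API (please see the FLIP for that);
the function name mean_udaf and the columns "key" and "value" are made up for
this example.

```python
import pandas as pd


def mean_udaf(values: pd.Series) -> float:
    # A Pandas-style aggregate: the whole group arrives as one
    # pandas.Series, and a single scalar is returned for it.
    return float(values.mean())


# Hypothetical input rows; "key" plays the role of the grouping column.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]})

# Group aggregation: mean_udaf is called once per group with the
# group's values batched together, not once per row.
result = df.groupby("key")["value"].agg(mean_udaf)
```

In the items above, the only thing that changes is how the groups are
formed (by key, by group window, or by over window); the aggregate function
keeps the same Series-in, scalar-out shape.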


Looking forward to your feedback!

Best,
Xingbo

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink
