Hi Jincheng,

Thanks a lot for joining the discussion and for suggesting that we discuss
FLIP-137 and FLIP-139 together.

>> 1. We also need to consider how pandas UDAF supports metrics, and whether
>> we need a custom interface for pandas UDAF?

Yes. We need to add an interface so that users can put logic, such as
creating metrics, in the `open` and `close` methods. I have added the
interface definition and a corresponding example to the doc.
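
As a rough illustration of the idea (a minimal sketch, not the interface
from the doc), the lifecycle could look like the code below. `Counter`,
`FunctionContext` and `MeanUdaf` are hypothetical stand-ins I made up for
this mail, not PyFlink classes, and a plain list stands in for the
pandas.Series that a Pandas UDAF would receive.

```python
from typing import Dict, List


class Counter:
    """Hypothetical stand-in for a metric counter (not a PyFlink class)."""

    def __init__(self) -> None:
        self.count = 0

    def inc(self, n: int = 1) -> None:
        self.count += n


class FunctionContext:
    """Hypothetical stand-in for the runtime context passed to open()."""

    def __init__(self) -> None:
        self.counters: Dict[str, Counter] = {}

    def counter(self, name: str) -> Counter:
        # Register the counter on first use, return the same one afterwards.
        return self.counters.setdefault(name, Counter())


class MeanUdaf:
    """Sketch of a vectorized aggregate with open/close lifecycle hooks."""

    def open(self, context: FunctionContext) -> None:
        # Called once before any batch is processed: create metrics here.
        self.batches = context.counter("batches")
        self.rows = context.counter("rows")

    def eval(self, values: List[float]) -> float:
        # Called once per group/window with the whole column of values.
        # A plain list stands in for the pandas.Series (assumption).
        self.batches.inc()
        self.rows.inc(len(values))
        return sum(values) / len(values)

    def close(self) -> None:
        # Called once at shutdown: release anything acquired in open().
        pass


context = FunctionContext()
mean = MeanUdaf()
mean.open(context)
results = [mean.eval([1.0, 2.0, 3.0]), mean.eval([10.0, 20.0])]
mean.close()
```

The point of the sketch is only that metrics are registered once in `open`
and updated on every call, which a bare function cannot express.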

>> 2. We have added @udaf(), so should it also be used for ordinary Python UDAF?

Yes. Looking at Python user-defined functions as a whole, we use @udf for
both general Python UDFs and Pandas UDFs, @udtf for Python UDTFs, and @udaf
for both general Python UDAFs and Pandas UDAFs, which is more unified. I
will discuss it in FLIP-139 later.
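
To make the "one decorator, two kinds of function" idea concrete, here is a
small sketch under my own assumptions. The `udaf` decorator and its
`func_type` parameter below are hypothetical and defined locally for
illustration; the actual signature is still to be settled in FLIP-139.

```python
from typing import Callable, List


def udaf(func_type: str = "general") -> Callable:
    """Hypothetical decorator: tags an aggregate as 'general' or 'pandas'."""
    if func_type not in ("general", "pandas"):
        raise ValueError(f"unsupported func_type: {func_type}")

    def wrap(func: Callable) -> Callable:
        # Record the kind so a planner could pick the matching operator.
        func.func_type = func_type
        return func

    return wrap


@udaf(func_type="pandas")
def mean_udaf(values: List[float]) -> float:
    # A plain list stands in for the pandas.Series (assumption).
    return sum(values) / len(values)
```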

Best,
Xingbo

jincheng sun <sunjincheng...@gmail.com> wrote on Mon, Aug 31, 2020 at 11:05 AM:

> Hi Xingbo,
>
> Thanks for the discussion! Overall, +1 for this FLIP.
> I have two points to add:
>
>  - We also need to consider how pandas UDAF supports metrics, and whether
> we need a custom interface for pandas UDAF?
>  - We have added @udaf(), so should it also be used for ordinary Python
> UDAF? If not, adding @udaf is not appropriate. We need to discuss it further.
>
> We can consider it in combination with FLIP-139 for the design. What do you think?
>
> Best,
> Jincheng
>
>
> Xingbo Huang <hxbks...@gmail.com> wrote on Mon, Aug 24, 2020 at 2:25 PM:
>
> > Hi everyone,
> >
> > I would like to start a discussion thread on "Support Pandas UDAF in
> > PyFlink".
> >
> > Pandas UDF has been supported since Flink 1.11 (FLIP-97[1]). It addresses
> > the high serialization/deserialization overhead of Python UDFs and makes
> > it convenient to leverage popular Python libraries such as Pandas, NumPy,
> > etc. Since Pandas UDF has so many advantages, we want to support Pandas
> > UDAF to extend the usage of Pandas UDF.
> >
> > Dian Fu and I have discussed this offline and drafted FLIP-137[2]. It
> > includes the following items:
> >   - Support Pandas UDAF in Batch Group Aggregation
> >   - Support Pandas UDAF in Batch Group Window Aggregation
> >   - Support Pandas UDAF in Batch Over Window Aggregation
> >   - Support Pandas UDAF in Stream Group Window Aggregation
> >   - Support Pandas UDAF in Stream Bounded Over Window Aggregation
> >
> >
> > Looking forward to your feedback!
> >
> > Best,
> > Xingbo
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
> > [2]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink
> >
>
