Thank you! looking forward to the voting :)

Best,
Jincheng


Xingbo Huang <hxbks...@gmail.com> 于2020年9月3日周四 下午2:39写道:

> Hi Jincheng,
>
> Yes, I agree that users can extend the class `AggregateFunction` if they
> want to define a Pandas UDAF by the way of custom classes. I have updated
> the part of the FLIP.
>
> Best,
> Xingbo
>
> jincheng sun <sunjincheng...@gmail.com> 于2020年9月3日周四 下午1:48写道:
>
> > Thanks for the update Xingbo!
> >
> > Pandas UDAF can reuse the `class aggregate function (user defined
> > function)` interface in FLIP-139, and the core logic of Pandas UDAF users
> > is written in the `accumulate` method. In this way, we can unify the
> > interface semantics of all UDAF.
> >
> > What do you think?
> >
> > Best,
> > Jincheng
> >
> >
> >
> > Xingbo Huang <hxbks...@gmail.com> 于2020年8月31日周一 下午6:06写道:
> >
> > > Hi Jincheng,
> > >
> > > Thanks a lot for joining the discussion and the suggestion of
> discussing
> > > FLIP-137 and FLIP-139 together.
> > >
> > > >> 1. We also need to consider how pandas UDAF supports metrics, and
> > > whether
> > > we need a custom interface for pandas UDAF?
> > >
> > > Yes. We need to add an interface so that users can add some logic in
> the
> > > `open` or `close` method such as creating metrics. I have added the
> > > definition of the interface and the corresponding example in the doc.
> > >
> > > >> 2. We have added @udaf(), so whether to use ordinary Python UDAF?
> > >
> > > Yes. From the overall view of Python User Defined Function, we use @udf
> > to
> > > describe general python udf and pandas udf, @udtf to describe python
> > udtf,
> > > and @udaf to describe general python udaf and pandas udaf, which is
> more
> > > unified. I will discuss it in FLIP-139 later.
> > >
> > > Best,
> > > Xingbo
> > >
> > > jincheng sun <sunjincheng...@gmail.com> 于2020年8月31日周一 上午11:05写道:
> > >
> > > > Hi Xingbo,
> > > >
> > > > Thanks for the discussion! Overall, + 1 for this FLIP.
> > > > I have two points to add:
> > > >
> > > >  - We also need to consider how pandas UDAF supports metrics, and
> > whether
> > > > we need a custom interface for pandas UDAF?
> > > >  - We have added @udaf(), so whether to use ordinary Python UDAF? If
> > not,
> > > > the addition of @udaf is not appropriate. We need to discuss it
> > further.
> > > >
> > > > We can consider it combination with FLIP-139 for design. What do you
> > > think?
> > > >
> > > > Best,
> > > > Jincheng
> > > >
> > > >
> > > > Xingbo Huang <hxbks...@gmail.com> 于2020年8月24日周一 下午2:25写道:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I would like to start a discussion thread on "Support Pandas UDAF
> in
> > > > > PyFlink"
> > > > >
> > > > > Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves
> > the
> > > > > high serialization/deserialization overhead in Python UDF and makes
> > it
> > > > > convenient to leverage the popular Python libraries such as Pandas,
> > > > Numpy,
> > > > > etc. Since Pandas UDF has so many advantages, we want to support
> > Pandas
> > > > > UDAF to extend usage of Pandas UDF.
> > > > >
> > > > > Dian Fu and I have discussed offline and have drafted the
> > FLIP-137[2].
> > > It
> > > > > includes the following items:
> > > > >   - Support Pandas UDAF in Batch Group Aggregation
> > > > >   - Support Pandas UDAF in Batch Group Window Aggregation
> > > > >   - Support Pandas UDAF in Batch Over Window Aggregation
> > > > >   - Support Pandas UDAF in Stream Group Window Aggregation
> > > > >   - Support Pandas UDAF in Stream Bounded Over Window Aggregation
> > > > >
> > > > >
> > > > > Looking forward to your feedback!
> > > > >
> > > > > Best,
> > > > > Xingbo
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink
> > > > >
> > > >
> > >
> >
>

Reply via email to