Hi Jincheng,

Thanks for putting this together. The proposal is very detailed, thorough
and for me as a Beam Flink runner contributor easy to understand :)

One thing that you should probably detail more is the bundle processing. It
is critically important for performance that multiple elements are
processed in a bundle. The default bundle size in the Flink runner is 1s or
1000 elements, whichever comes first. And for streaming, you can find the
logic necessary to align the bundle processing with watermarks and
checkpointing here:
https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java

Thomas







On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <sunjincheng...@gmail.com>
wrote:

> Hi all,
>
> The Python Table API(without Python UDF support) has already been supported
> and will be available in the coming release 1.9.
> As Python UDF is very important for Python users, we'd like to start the
> discussion about the Python UDF support in the Python Table API.
> Aljoscha Krettek, Dian Fu and I have discussed offline and have drafted a
> design doc[1]. It includes the following items:
>
> - The user-defined function interfaces.
> - The user-defined function execution architecture.
>
>  As mentioned by many guys in the previous discussion thread[2], a
> portability framework was introduced in Apache Beam in latest releases. It
> provides well-defined, language-neutral data structures and protocols for
> language-neutral user-defined function execution. This design is based on
> Beam's portability framework. We will introduce how to make use of Beam's
> portability framework for user-defined function execution: data
> transmission, state access, checkpoint, metrics, logging, etc.
>
> Considering that the design relies on Beam's portability framework for
> Python user-defined function execution and not all the contributors in
> Flink community are familiar with Beam's portability framework, we have
> done a prototype[3] for proof of concept and also ease of understanding of
> the design.
>
> Welcome any feedback.
>
> Best,
> Jincheng
>
> [1]
>
> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing
> [2]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html
> [3] https://github.com/dianfu/flink/commits/udf_poc
>

Reply via email to