I'm also late to the party here :) When I saw the first draft, I was wondering how exactly the design doc would tie in with Beam. Thanks for the update.

A couple of comments in this regard:

Flink provides a distributed cache mechanism and allows users to upload their files using
the "registerCachedFile" method in ExecutionEnvironment/StreamExecutionEnvironment. The Python files
that users specify through "add_python_file", "set_python_requirements" and "add_python_archive"
are eventually uploaded through this method as well.
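
As a rough illustration of the mechanism quoted above, here is a minimal Python sketch of registering a file on the client and localizing it on a worker. This is only a toy model: ToyDistributedCache, localize and the add_python_file helper below are hypothetical names, and none of this is Flink's actual API.

```python
import shutil
import tempfile
from pathlib import Path


class ToyDistributedCache:
    """Hypothetical stand-in for a distributed cache; NOT Flink's API."""

    def __init__(self):
        # Maps the published name to the source path on the client.
        self._registered = {}

    def register_cached_file(self, path, name):
        # Client side: remember which local file is published under `name`.
        self._registered[name] = Path(path)

    def localize(self, worker_dir):
        # Worker side: copy every registered file into the worker's
        # directory and return a mapping of name -> local path.
        worker_dir = Path(worker_dir)
        worker_dir.mkdir(parents=True, exist_ok=True)
        local = {}
        for name, src in self._registered.items():
            dst = worker_dir / name
            shutil.copy(src, dst)
            local[name] = dst
        return local


def add_python_file(cache, path):
    # Hypothetical helper: a Python-level call that ends up in the same
    # cache, published under the file's base name.
    cache.register_cached_file(path, Path(path).name)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Create a small UDF file on the "client".
        src = Path(tmp) / "my_udf.py"
        src.write_text("def add_one(x):\n    return x + 1\n")

        cache = ToyDistributedCache()
        add_python_file(cache, src)

        # Simulate a worker fetching everything that was registered.
        local = cache.localize(Path(tmp) / "worker")
        return local["my_udf.py"].read_text()
```

Calling demo() returns the contents of the copy that the simulated worker sees, which is the point of the quoted paragraph: every Python-level dependency call funnels into one registration method.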

For process-based execution we use Flink's cache distribution instead of Beam's artifact staging.

The Apache Beam Portability Framework already supports artifact staging that works out of the box with the Docker environment. We can use the artifact staging service defined in Apache Beam to transfer the dependencies from the operator to the Python SDK harness running in the Docker container.

Do we want to implement two different ways of staging artifacts? It seems sensible to use the same artifact staging functionality for the process-based execution as well. Apart from being simpler, this would also allow the process-based execution to run in environments other than the Flink TaskManager environment.

Thanks,
Max

On 15.10.19 11:13, Wei Zhong wrote:
Hi Thomas,

Thanks a lot for your suggestion!

As you can see from the "Goals" section, this FLIP focuses on
dependency management in process mode. However, the APIs and design proposed in this FLIP
also apply to the Docker mode. So it makes sense to me to also describe how this
design is integrated with the artifact staging service of Apache Beam in Docker mode. I have
updated the design doc and am looking forward to your feedback.

Thanks,
Wei

On Oct 15, 2019, at 01:54, Thomas Weise <t...@apache.org> wrote:

Sorry for joining the discussion late.

The Beam environment already supports artifact staging; it works out of the
box with the Docker environment. I think it would be helpful to explain in
the FLIP how this proposal relates to what Beam offers / how it would be
integrated.

Thanks,
Thomas


On Mon, Oct 14, 2019 at 8:09 AM Jeff Zhang <zjf...@gmail.com> wrote:

+1

On Mon, Oct 14, 2019 at 10:55 PM Hequn Cheng <chenghe...@gmail.com> wrote:

+1

Good job, Wei!

Best, Hequn

On Mon, Oct 14, 2019 at 2:54 PM Dian Fu <dian0511...@gmail.com> wrote:

Hi Wei,

+1 (non-binding). Thanks for driving this.

Thanks,
Dian

On Oct 14, 2019, at 1:40 PM, jincheng sun <sunjincheng...@gmail.com> wrote:

+1

On Sat, Oct 12, 2019 at 8:41 PM Wei Zhong <weizhong0...@gmail.com> wrote:

Hi all,

I would like to start the vote for FLIP-78 [1], which was discussed and
reached consensus in the discussion thread [2].

The vote will be open for at least 72 hours. I'll try to close it by
2019-10-16 18:00 UTC, unless there is an objection or not enough
votes.

Thanks,
Wei

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-UDF-Environment-and-Dependency-Management-td33514.html

--
Best Regards

Jeff Zhang

