[jira] [Commented] (BEAM-11077) Simplify use of the Python Portable runner for Go SDK pipelines

Danny McCormick (Jira) Sat, 04 Jun 2022 11:07:04 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-11077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548924#comment-17548924
 ]


Danny McCormick commented on BEAM-11077:
----------------------------------------

This issue has been migrated to https://github.com/apache/beam/issues/20534

> Simplify use of the Python Portable runner for Go SDK pipelines
> ---------------------------------------------------------------
>
>                 Key: BEAM-11077
>                 URL: https://issues.apache.org/jira/browse/BEAM-11077
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: Robert Burke
>            Priority: P3
>
> It's possible to execute Go SDK pipelines on any portable Beam runner, using 
> the "universal" runner and specifying the endpoint of the job server. 
> However, this is inconvenient in some instances as it requires having a 
> standing Job Management server for the runner in question.
> This task is to simplify using the Python Portable Runner for 
> arbitrary/novice Go SDK users. While it's generally better for performance to 
> keep a job management server around so it can execute multiple jobs, this 
> isn't required.
> The goal would be to create a "python" runner for the Go SDK, which would 
> start up the Python Portable Runner job server, submit a pipeline to it 
> in Loopback mode for execution using the "universal" runner, and wait for 
> the job to finish.
> This will give Go users access to a correct runner for testing, and allow 
> them to develop their pipelines confidently before moving them to distributed 
> runners like Flink, Spark, or Dataflow.
> Ideally, outside of some clearly indicated dependencies (and failures when 
> they aren't present), a user should be able to import the package, specify 
> --runner=python, and have their pipeline execute.
> The "long way" for using the Python Portable Runner with the Go SDK is on the 
> Go Tips page of the Dev wiki: 
> https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> The Go-side runner code is in 
> https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/runners
> The Python Portable Runner entry point is here: 
> https://github.com/apache/beam/blob/3d296c42f9d9dbb7c2234dec325f6a5255b821ee/sdks/python/apache_beam/runners/portability/portable_runner.py
>
> The simplest way to do this would probably be to require that users have 
> Docker installed, and for the Beam project to publish a Docker container 
> image that can start up the Python Runner job server appropriately. This 
> keeps the dependencies minimal and startup consistent for users, and we can 
> likely reuse the technique for other purposes. A similar technique would 
> also make developing new SDKs easier, as new SDKs could use the same 
> infrastructure from the start.
> Other approaches to solve the problem are of course welcome.
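
For readers who want to see what the manual flow in the description amounts to, here is a rough sketch in Python of a driver that starts the Python job server, waits for it to accept connections, and then runs a Go pipeline against it with the universal runner in Loopback mode. It is an illustration only, not the proposed "python" runner. The assumptions: apache_beam is installed in the Python environment used, the job server module and flags match those on the Go Tips wiki page, port 8099 is free, and the helper names (job_server_command, go_pipeline_flags, wait_for_port, run_go_pipeline) are made up for this example.

```python
import socket
import subprocess
import sys
import time


def job_server_command(port, python=sys.executable):
    """Command line that starts the Python portable runner's local job service.

    Assumes apache_beam is importable by the given interpreter.
    """
    return [
        python,
        "-m",
        "apache_beam.runners.portability.local_job_service_main",
        "--port",
        str(port),
    ]


def go_pipeline_flags(port, host="localhost"):
    """Flags appended to a Go SDK pipeline so it submits to the job server."""
    return [
        "--runner=universal",
        f"--endpoint={host}:{port}",
        "--environment_type=LOOPBACK",
    ]


def wait_for_port(host, port, timeout_s=30.0):
    """Poll until something accepts TCP connections on host:port, or time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)
    return False


def run_go_pipeline(pipeline_cmd, port=8099):
    """Start the job server, run the pipeline command against it, then stop it.

    pipeline_cmd is the command that launches the Go pipeline, given as a
    list of strings (hypothetical example: ["go", "run", "./wordcount.go"]).
    Returns the pipeline process's exit code.
    """
    server = subprocess.Popen(job_server_command(port))
    try:
        if not wait_for_port("localhost", port):
            raise RuntimeError(f"job server did not come up on port {port}")
        result = subprocess.run(list(pipeline_cmd) + go_pipeline_flags(port))
        return result.returncode
    finally:
        # The job server is only needed for this one job, so shut it down.
        server.terminate()
        server.wait()
```

Calling run_go_pipeline with the command that launches a pipeline would run it once and tear the job server down afterwards. This trades the performance of a standing job server for convenience, which is the trade-off the description accepts.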



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
