[
https://issues.apache.org/jira/browse/BEAM-11077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548924#comment-17548924
]
Danny McCormick commented on BEAM-11077:
----------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/20534
> Simplify use of the Python Portable runner for Go SDK pipelines
> ---------------------------------------------------------------
>
> Key: BEAM-11077
> URL: https://issues.apache.org/jira/browse/BEAM-11077
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P3
>
> It's possible to execute Go SDK pipelines on any portable Beam runner, using
> the "universal" runner and specifying the endpoint of the job server.
> However, this is inconvenient in some instances as it requires having a
> standing Job Management server for the runner in question.
> This task is to simplify using the Python Portable Runner for
> arbitrary/novice Go SDK users. While it's generally better for performance to
> keep a job management server around so it can execute multiple jobs, this
> isn't required.
> The goal would be to create a "python" runner for the Go SDK, which will
> start up the Python Portable Runner job server, submit a pipeline to it
> in Loopback mode for execution using the "universal" runner, and wait for
> the job to finish.
> This will give Go users access to a correct runner for testing, and allow
> them to develop their pipelines confidently before moving them to distributed
> runners like Flink, Spark, or Dataflow.
> Ideally, outside of some clearly indicated dependencies (and failures when
> they aren't present), a user should be able to import the package, specify
> --runner=python, and have their pipeline execute.
> The "long way" for using the Python Portable Runner with the Go SDK is on the
> Go Tips page of the Dev wiki:
> https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> The Go side runner code is in
> https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/runners
> The Python Portable runner entry point is here:
> https://github.com/apache/beam/blob/3d296c42f9d9dbb7c2234dec325f6a5255b821ee/sdks/python/apache_beam/runners/portability/portable_runner.py
>
>
> The simplest way to do this would probably be to require that users have
> Docker installed, and for the Beam project to publish a Docker container
> image that can start up the Python Runner job server appropriately. This
> keeps the dependencies minimal and startup consistent for users, and we can
> likely reuse the technique for other purposes. A similar technique would
> also make developing new SDKs easier, as new SDKs can use the same
> infrastructure from the start.
> Other approaches to solve the problem are of course welcome.
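
The flow described in the issue (start the Python job server, wait for it to
come up, then submit through the universal runner in Loopback mode) could be
sketched in Go roughly as follows. This is only an illustration under stated
assumptions, not the proposed runner: the helper names jobServerCmd,
waitForEndpoint, and universalRunnerFlags are hypothetical, port 8099 is an
arbitrary choice, and the job server module name and its --port flag are
assumed to match the Python SDK's local job service entry point. The sketch
uses only the Go standard library and does not start any process itself.

```go
package main

import (
	"fmt"
	"net"
	"os/exec"
	"time"
)

// jobServerCmd builds (but does not start) a command that would launch the
// Python portable runner job server on the given port.
// Assumption: the module name and --port flag match the installed Python SDK.
func jobServerCmd(port int) *exec.Cmd {
	return exec.Command("python", "-m",
		"apache_beam.runners.portability.local_job_service_main",
		fmt.Sprintf("--port=%d", port))
}

// waitForEndpoint polls addr until a TCP connection succeeds or the timeout
// elapses, so a pipeline is not submitted before the job server is listening.
func waitForEndpoint(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("tcp", addr, 200*time.Millisecond)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("job server at %s not reachable after %v: %w", addr, timeout, err)
		}
		time.Sleep(100 * time.Millisecond)
	}
}

// universalRunnerFlags returns the pipeline flags a "python" runner wrapper
// could pass to the Go SDK's universal runner for Loopback execution.
func universalRunnerFlags(endpoint string) []string {
	return []string{
		"--runner=universal",
		"--endpoint=" + endpoint,
		"--environment_type=LOOPBACK",
	}
}

func main() {
	const port = 8099 // arbitrary port chosen for this sketch
	endpoint := fmt.Sprintf("localhost:%d", port)

	// Only print the plan; nothing is started here.
	fmt.Println("would start:", jobServerCmd(port).Args)
	fmt.Println("then wait for:", endpoint)
	fmt.Println("then run the pipeline with:", universalRunnerFlags(endpoint))
}
```

A real runner would call Start on the command, call waitForEndpoint before
submitting, hand the flags to the universal runner, and stop the job server
once the job finishes. The Docker-based variant suggested in the issue would
swap only the command that launches the job server.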