Thanks Robert.

On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw <[email protected]> wrote:

> Now that we have the FnAPI, I started playing around with support for
> cross-language pipelines. This will allow things like IOs to be shared
> across all languages, SQL to be invoked from non-Java, TFX tensorflow
> transforms to be invoked from non-Python, etc. and I think is the next
> step in extending (and taking advantage of) the portability layer
> we've developed. These are often composite transforms whose inner
> structure depends in non-trivial ways on their configuration.
>

Some additional benefits of cross-language transforms are given below.

(1) The current large collection of Java IO connectors will become available
to other languages.
(2) Current Java and Python transforms will be available for Go and any
other future SDKs.
(3) New transform authors will be able to pick their language of choice and
make their transform available to all Beam SDKs. For example, this can be
the language the transform author is most familiar with or the only
language for which a client library is available for connecting to an
external data store.


> I created a PR [1] that basically follows the "expand via an external
> process" over RPC alternative from the proposals we came up with when
> we were discussing this last time [2]. There are still some unknowns,
> e.g. how to handle artifacts supplied by an alternative SDK (they
> currently must be provided by the environment), but I think this is a
> good incremental step forward that will already be useful in a large
> number of cases. It would be good to validate the general direction
> and I would be interested in any feedback others may have on it.
>

I think there are multiple semi-dependent problems we have to tackle to
reach the final goal of supporting fully-fledged cross-language transforms
in Beam. I agree with taking an incremental approach here with the overall
vision in mind. Some of the other problems we have to tackle include the
following.

* Defining a user API that will allow pipelines defined in an SDK X to use
transforms defined in an SDK Y.
* Updating various runners to use the URN/payload-based environment
definition [1].
* Updating various runners to support starting containers for multiple
environments/languages for the same pipeline, and to support executing
pipeline steps in the containers started for those environments.
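
To make the last two points a bit more concrete, here is a rough sketch of
what a runner would have to do. This is not Beam code: the Environment and
Step classes, the URN string, and the image names are all made up for
illustration. It only assumes that each step carries the id of the
environment it must run in, and that an environment is described by a URN
plus an opaque payload.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical stand-in for a URN/payload environment definition."""
    urn: str        # identifies the kind of environment (assumed value below)
    payload: bytes  # opaque, URN-specific configuration (here: an image name)


@dataclass(frozen=True)
class Step:
    """Hypothetical pipeline step tagged with the environment it runs in."""
    name: str
    environment_id: str


def group_steps_by_environment(steps):
    """Return {environment_id: [step names]}, preserving step order.

    A runner could use this to start one container per environment and
    route each step to the matching container.
    """
    grouped = defaultdict(list)
    for step in steps:
        grouped[step.environment_id].append(step.name)
    return dict(grouped)


# Illustrative values only; the URN and image names are assumptions.
environments = {
    "java_env": Environment("example:env:docker:v1", b"example/java-sdk-image"),
    "python_env": Environment("example:env:docker:v1", b"example/python-sdk-image"),
}
steps = [
    Step("ReadFromSource", "java_env"),   # IO connector written in Java
    Step("ParseRecords", "python_env"),   # user code written in Python
    Step("WriteToSink", "java_env"),
]
plan = group_steps_by_environment(steps)
```

With this input the runner would need two containers for a single pipeline,
one per environment, which is the case the runners would need to handle.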

Thanks,
Cham

[1]
https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L952

>
> - Robert
>
> [1] https://github.com/apache/beam/pull/7316
> [2] https://s.apache.org/beam-mixed-language-pipelines
>
