Thank you for starting on the cross-language feature, Robert!

Just to recap: Each SDK runs an ExpansionService which can be contacted during pipeline translation to expand transforms that are unknown to the SDK. The service returns the Proto definitions to the querying process.
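
To make the recap concrete, here is a minimal sketch of that lookup, assuming an in-process stand-in for the service. The names TransformProto, ToyExpansionService and translate, as well as the example URNs, are made up for illustration and are not part of Beam or of the PR; a real service would be contacted over RPC and would return actual proto messages.

```python
from dataclasses import dataclass, field


@dataclass
class TransformProto:
    """Stand-in for the proto definition returned by an expansion service."""
    urn: str
    environment: str
    subtransforms: list = field(default_factory=list)


class ToyExpansionService:
    """Hypothetical in-process stand-in for a remote expansion service."""

    def __init__(self, environment, expanders):
        self.environment = environment
        # Maps a URN to a function(payload) returning the sub-transform URNs.
        self.expanders = expanders

    def expand(self, urn, payload):
        if urn not in self.expanders:
            raise KeyError(f"unknown transform: {urn}")
        return TransformProto(urn, self.environment, self.expanders[urn](payload))


def translate(urn, payload, local_transforms, local_environment, service):
    """Expand locally if the SDK knows the URN, otherwise ask the service."""
    if urn in local_transforms:
        return TransformProto(urn, local_environment, local_transforms[urn](payload))
    return service.expand(urn, payload)


# Assumed example data: one transform known only to "java", one known locally.
java_service = ToyExpansionService(
    "java",
    {"example:read:v1": lambda payload: ["example:source:v1", "example:parse:v1"]},
)
python_transforms = {
    "example:count:v1": lambda payload: ["example:group:v1", "example:combine:v1"],
}

local = translate("example:count:v1", b"", python_transforms, "python", java_service)
remote = translate("example:read:v1", b"", python_transforms, "python", java_service)
print(local.environment, remote.environment)
```

The returned definition carries its own environment, which is what would later let execution pick the right environment per transform.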

There will be multiple environments, so that during execution a cross-language pipeline can select the appropriate environment for each transform.

It's not clear to me: should the expansion happen during pipeline construction or during translation by the Runner?

Thanks,
Max

On 23.01.19 04:12, Robert Bradshaw wrote:
No, this PR simply takes an endpoint address as a parameter, expecting
it to already be up and available. More convenient APIs, e.g. ones
that spin up an endpoint and tear it down, or catalog and locate code
and services offering these endpoints, could be provided as wrappers
on top of or extensions of this.
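
As a rough illustration of such a wrapper (not something the PR provides), the sketch below starts a placeholder service on a free local port, hands its address to code that only accepts an address, and tears the service down afterwards. The names temporary_endpoint and use_endpoint are hypothetical, and the echo server merely stands in for a real expansion service, which would speak gRPC.

```python
import contextlib
import socket
import socketserver
import threading


class _EchoHandler(socketserver.BaseRequestHandler):
    """Placeholder handler; a real expansion service would speak gRPC."""

    def handle(self):
        self.request.sendall(self.request.recv(1024))


@contextlib.contextmanager
def temporary_endpoint():
    """Start a placeholder service on a free port, yield its address, stop it."""
    server = socketserver.ThreadingTCPServer(("localhost", 0), _EchoHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        yield f"{host}:{port}"
    finally:
        # Tear down even if the caller raised.
        server.shutdown()
        server.server_close()
        thread.join()


def use_endpoint(address):
    """Hypothetical consumer that only takes an address, as described above."""
    host, port = address.rsplit(":", 1)
    with socket.create_connection((host, int(port)), timeout=5) as conn:
        conn.sendall(b"ping")
        return conn.recv(1024)


with temporary_endpoint() as address:
    reply = use_endpoint(address)
print(reply)
```

Keeping the address-only interface underneath means a wrapper like this stays optional: callers with an already running service can pass its address directly.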

On Wed, Jan 23, 2019 at 12:19 AM Kenneth Knowles <k...@apache.org> wrote:

Nice! If I recall correctly, there was mostly concern about how to launch and
manage the expansion service (Docker? Vendor-specific? Etc.). Does this PR take a
position on that question?

Kenn

On Tue, Jan 22, 2019 at 1:44 PM Chamikara Jayalath <chamik...@google.com> wrote:



On Tue, Jan 22, 2019 at 11:35 AM Udi Meiri <eh...@google.com> wrote:

Also debuggability: collecting logs from each of these systems.


Agree.



On Tue, Jan 22, 2019 at 10:53 AM Chamikara Jayalath <chamik...@google.com> 
wrote:

Thanks Robert.

On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw <rober...@google.com> wrote:

Now that we have the FnAPI, I started playing around with support for
cross-language pipelines. This will allow things like IOs to be shared
across all languages, SQL to be invoked from non-Java, TFX TensorFlow
transforms to be invoked from non-Python, etc., and I think it is the next
step in extending (and taking advantage of) the portability layer
we've developed. These are often composite transforms whose inner
structure depends in non-trivial ways on their configuration.


Some additional benefits of cross-language transforms are given below.

(1) The current large collection of Java IO connectors will become available to
other languages.
(2) Current Java and Python transforms will be available for Go and any other 
future SDKs.
(3) New transform authors will be able to pick their language of choice and 
make their transform available to all Beam SDKs. For example, this can be the 
language the transform author is most familiar with or the only language for 
which a client library is available for connecting to an external data store.


I created a PR [1] that basically follows the "expand via an external
process" over RPC alternative from the proposals we came up with when
we were discussing this last time [2]. There are still some unknowns,
e.g. how to handle artifacts supplied by an alternative SDK (they
currently must be provided by the environment), but I think this is a
good incremental step forward that will already be useful in a large
number of cases. It would be good to validate the general direction
and I would be interested in any feedback others may have on it.


I think there are multiple semi-dependent problems we have to tackle to reach 
the final goal of supporting fully-fledged cross-language transforms in Beam. I 
agree with taking an incremental approach here with the overall vision in mind.
Some other problems we have to tackle include the following.

* Defining a user API that will allow pipelines defined in an SDK X to use
transforms defined in SDK Y.
* Updating various runners to use the URN/payload-based environment definition [1].
* Updating various runners to support starting containers for multiple 
environments/languages for the same pipeline and supporting executing pipeline 
steps in containers started for multiple environments.
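
To illustrate the last point only, here is a toy sketch of how a runner might work out which environments need a container for a given pipeline. The step names, environment ids and the containers_to_start helper are invented for this example and do not reflect any runner's actual code.

```python
from collections import defaultdict

# Hypothetical pipeline steps as (step name, environment id) pairs.
steps = [
    ("ReadFromSource", "java"),
    ("ParseRecords", "java"),
    ("ComputeFeatures", "python"),
    ("WriteResults", "go"),
]


def containers_to_start(steps):
    """Return a mapping from environment id to the steps that run in it."""
    by_env = defaultdict(list)
    for name, env in steps:
        by_env[env].append(name)
    return dict(by_env)


plan = containers_to_start(steps)
for env, names in plan.items():
    print(f"start one {env} container for: {', '.join(names)}")
```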


I've been working with +Heejong Lee to add some of the missing pieces mentioned 
above.

We created the following doc that captures some of the ongoing work related to
cross-language transforms and which will hopefully serve as a knowledge base
for anybody who wishes to quickly learn the context related to this.
Feel free to refer to this and/or add to this.

https://docs.google.com/document/d/1H3yCyVFI9xYs1jsiF1GfrDtARgWGnLDEMwG5aQIx2AU/edit?usp=sharing




Thanks,
Cham

[1] 
https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L952









- Robert

[1] https://github.com/apache/beam/pull/7316
[2] https://s.apache.org/beam-mixed-language-pipelines
