Nice! If I recall correctly, the main concern was about how to launch and manage the expansion service (Docker? Vendor-specific? Etc.). Does this PR take a position on that question?
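To make the question concrete, the shape I'm picturing from the user's side is
something like the sketch below, where the service is launched out of band
(e.g. as a Docker container) and the pipeline author only supplies an address.
Everything here is hypothetical: "ExternalTransform", the URN, and the payload
format are placeholders I made up, not an existing API.

    # Hypothetical sketch only. The expansion service is assumed to be
    # launched separately, e.g.:
    #   docker run -p 8097:8097 <some-java-expansion-service-image>
    import apache_beam as beam

    with beam.Pipeline() as p:
      rows = p | ExternalTransform(                      # hypothetical wrapper
          urn='beam:external:java:some_io:read:v1',      # placeholder URN
          payload=b'...',                                # transform-specific config
          expansion_service='localhost:8097')            # address of the service

Whether that address points at a container we start on the user's behalf, a
vendor-managed service, or a long-running shared endpoint is exactly the
launch/management question above. (I've also appended, at the very bottom of
this mail, my rough reading of what the expansion RPC itself might look like.)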
Kenn

On Tue, Jan 22, 2019 at 1:44 PM Chamikara Jayalath <chamik...@google.com> wrote:
>
> On Tue, Jan 22, 2019 at 11:35 AM Udi Meiri <eh...@google.com> wrote:
>>
>> Also debuggability: collecting logs from each of these systems.
>
> Agree.
>
>> On Tue, Jan 22, 2019 at 10:53 AM Chamikara Jayalath <chamik...@google.com> wrote:
>>
>>> Thanks Robert.
>>>
>>> On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>>> Now that we have the FnAPI, I started playing around with support for
>>>> cross-language pipelines. This will allow things like IOs to be shared
>>>> across all languages, SQL to be invoked from non-Java, TFX TensorFlow
>>>> transforms to be invoked from non-Python, etc., and I think it is the next
>>>> step in extending (and taking advantage of) the portability layer
>>>> we've developed. These are often composite transforms whose inner
>>>> structure depends in non-trivial ways on their configuration.
>>>
>>> Some additional benefits of cross-language transforms are given below.
>>>
>>> (1) The current large collection of Java IO connectors will become
>>> available to other languages.
>>> (2) Current Java and Python transforms will be available to Go and any
>>> other future SDKs.
>>> (3) New transform authors will be able to pick their language of choice
>>> and make their transform available to all Beam SDKs. For example, this can
>>> be the language the transform author is most familiar with, or the only
>>> language for which a client library is available for connecting to an
>>> external data store.
>>>
>>>> I created a PR [1] that basically follows the "expand via an external
>>>> process" over RPC alternative from the proposals we came up with when
>>>> we were discussing this last time [2]. There are still some unknowns,
>>>> e.g. how to handle artifacts supplied by an alternative SDK (they
>>>> currently must be provided by the environment), but I think this is a
>>>> good incremental step forward that will already be useful in a large
>>>> number of cases. It would be good to validate the general direction,
>>>> and I would be interested in any feedback others may have on it.
>>>
>>> I think there are multiple semi-dependent problems we have to tackle to
>>> reach the final goal of supporting fully-fledged cross-language transforms
>>> in Beam. I agree with taking an incremental approach here with the overall
>>> vision in mind. Some of the other problems we have to tackle are the following.
>>>
>>> * Defining a user API that will allow pipelines defined in an SDK X to
>>> use transforms defined in an SDK Y.
>>> * Updating various runners to use the URN/payload-based environment
>>> definition [1].
>>> * Updating various runners to support starting containers for multiple
>>> environments/languages for the same pipeline, and executing pipeline
>>> steps in the containers started for those environments.
>
> I've been working with +Heejong Lee <heej...@google.com> to add some of
> the missing pieces mentioned above.
>
> We created the following doc that captures some of the ongoing work related
> to cross-language transforms and which will hopefully serve as a knowledge
> base for anybody who wishes to quickly learn the context related to this.
> Feel free to refer to this and/or add to this.
>
> https://docs.google.com/document/d/1H3yCyVFI9xYs1jsiF1GfrDtARgWGnLDEMwG5aQIx2AU/edit?usp=sharing
>
>>> Thanks,
>>> Cham
>>>
>>> [1] https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L952
>>>
>>>> - Robert
>>>>
>>>> [1] https://github.com/apache/beam/pull/7316
>>>> [2] https://s.apache.org/beam-mixed-language-pipelines
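(As mentioned at the top, appending my rough reading of what the "expand via
an external process over RPC" step might look like at the protocol level, from
the pipeline-construction SDK's side. The service and message names below
(ExpansionService, ExpansionRequest, ...) and the Python module paths are my
assumptions, not necessarily what the PR actually defines; treat this as a
sketch, not the implementation.)

    # Hedged sketch: assumes an Expand() RPC that takes a PTransform spec
    # (a URN plus an opaque payload) and returns the expanded composite
    # together with the components it needs.
    import grpc

    # Assumed module paths for the generated protos.
    from apache_beam.portability.api import beam_expansion_api_pb2
    from apache_beam.portability.api import beam_expansion_api_pb2_grpc
    from apache_beam.portability.api import beam_runner_api_pb2

    # The expansion service itself is launched out of band (Docker, a local
    # JVM process, a vendor-hosted endpoint, ...), which is the open question;
    # the caller only needs an address.
    channel = grpc.insecure_channel('localhost:8097')
    stub = beam_expansion_api_pb2_grpc.ExpansionServiceStub(channel)

    request = beam_expansion_api_pb2.ExpansionRequest(
        transform=beam_runner_api_pb2.PTransform(
            unique_name='MyExternalRead',
            spec=beam_runner_api_pb2.FunctionSpec(
                urn='beam:external:java:some_io:read:v1',  # placeholder URN
                payload=b'...')))                          # transform-specific config
    # (A real call would presumably also attach the components describing the
    # transform's inputs.)

    response = stub.Expand(request)
    # response.transform would be the expanded composite; the coders, windowing
    # strategies and (URN/payload-identified) environments it refers to would
    # come back in response.components, to be merged into the caller's
    # pipeline proto.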