Nice! If I recall correctly, the main concern was about how to launch and manage the expansion service (Docker? Vendor-specific? Etc.). Does this PR take a position on that question?
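To make the question concrete, the shape I'm picturing from the user's side is
something like the sketch below, where the service is launched out of band
(e.g. as a Docker container) and the pipeline author only supplies an address.
Everything here is hypothetical: "ExternalTransform", the URN, and the payload
format are placeholders I made up, not an existing API.

    # Hypothetical sketch only. The expansion service is assumed to be
    # launched separately, e.g.:
    #   docker run -p 8097:8097 <some-java-expansion-service-image>
    import apache_beam as beam

    with beam.Pipeline() as p:
      rows = p | ExternalTransform(                      # hypothetical wrapper
          urn='beam:external:java:some_io:read:v1',      # placeholder URN
          payload=b'...',                                # transform-specific config
          expansion_service='localhost:8097')            # address of the service

Whether that address points at a container we start on the user's behalf, a
vendor-managed service, or a long-running shared endpoint is exactly the
launch/management question above. (I've also appended, at the very bottom of
this mail, my rough reading of what the expansion RPC itself might look like.)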
Kenn

On Tue, Jan 22, 2019 at 1:44 PM Chamikara Jayalath <chamik...@google.com> wrote:
>
> On Tue, Jan 22, 2019 at 11:35 AM Udi Meiri <eh...@google.com> wrote:
>>
>> Also debuggability: collecting logs from each of these systems.
>
> Agree.
>
>> On Tue, Jan 22, 2019 at 10:53 AM Chamikara Jayalath <chamik...@google.com> wrote:
>>
>>> Thanks Robert.
>>>
>>> On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>>> Now that we have the FnAPI, I started playing around with support for
>>>> cross-language pipelines. This will allow things like IOs to be shared
>>>> across all languages, SQL to be invoked from non-Java, TFX TensorFlow
>>>> transforms to be invoked from non-Python, etc., and I think it is the next
>>>> step in extending (and taking advantage of) the portability layer
>>>> we've developed. These are often composite transforms whose inner
>>>> structure depends in non-trivial ways on their configuration.
>>>
>>> Some additional benefits of cross-language transforms are given below.
>>>
>>> (1) The current large collection of Java IO connectors will become
>>> available to other languages.
>>> (2) Current Java and Python transforms will be available to Go and any
>>> other future SDKs.
>>> (3) New transform authors will be able to pick their language of choice
>>> and make their transform available to all Beam SDKs. For example, this can
>>> be the language the transform author is most familiar with, or the only
>>> language for which a client library is available for connecting to an
>>> external data store.
>>>
>>>> I created a PR [1] that basically follows the "expand via an external
>>>> process" over RPC alternative from the proposals we came up with when
>>>> we were discussing this last time [2]. There are still some unknowns,
>>>> e.g. how to handle artifacts supplied by an alternative SDK (they
>>>> currently must be provided by the environment), but I think this is a
>>>> good incremental step forward that will already be useful in a large
>>>> number of cases. It would be good to validate the general direction,
>>>> and I would be interested in any feedback others may have on it.
>>>
>>> I think there are multiple semi-dependent problems we have to tackle to
>>> reach the final goal of supporting fully-fledged cross-language transforms
>>> in Beam. I agree with taking an incremental approach here with the overall
>>> vision in mind. Some of the other problems we have to tackle are the following.
>>>
>>> * Defining a user API that will allow pipelines defined in an SDK X to
>>> use transforms defined in an SDK Y.
>>> * Updating various runners to use the URN/payload-based environment
>>> definition [1].
>>> * Updating various runners to support starting containers for multiple
>>> environments/languages for the same pipeline, and executing pipeline
>>> steps in the containers started for those environments.
>
> I've been working with +Heejong Lee <heej...@google.com> to add some of
> the missing pieces mentioned above.
>
> We created the following doc that captures some of the ongoing work related
> to cross-language transforms and which will hopefully serve as a knowledge
> base for anybody who wishes to quickly learn the context related to this.
> Feel free to refer to this and/or add to this.
>
> https://docs.google.com/document/d/1H3yCyVFI9xYs1jsiF1GfrDtARgWGnLDEMwG5aQIx2AU/edit?usp=sharing
>
>>> Thanks,
>>> Cham
>>>
>>> [1] https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L952
>>>
>>>> - Robert
>>>>
>>>> [1] https://github.com/apache/beam/pull/7316
>>>> [2] https://s.apache.org/beam-mixed-language-pipelines
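(As mentioned at the top, appending my rough reading of what the "expand via
an external process over RPC" step might look like at the protocol level, from
the pipeline-construction SDK's side. The service and message names below
(ExpansionService, ExpansionRequest, ...) and the Python module paths are my
assumptions, not necessarily what the PR actually defines; treat this as a
sketch, not the implementation.)

    # Hedged sketch: assumes an Expand() RPC that takes a PTransform spec
    # (a URN plus an opaque payload) and returns the expanded composite
    # together with the components it needs.
    import grpc

    # Assumed module paths for the generated protos.
    from apache_beam.portability.api import beam_expansion_api_pb2
    from apache_beam.portability.api import beam_expansion_api_pb2_grpc
    from apache_beam.portability.api import beam_runner_api_pb2

    # The expansion service itself is launched out of band (Docker, a local
    # JVM process, a vendor-hosted endpoint, ...), which is the open question;
    # the caller only needs an address.
    channel = grpc.insecure_channel('localhost:8097')
    stub = beam_expansion_api_pb2_grpc.ExpansionServiceStub(channel)

    request = beam_expansion_api_pb2.ExpansionRequest(
        transform=beam_runner_api_pb2.PTransform(
            unique_name='MyExternalRead',
            spec=beam_runner_api_pb2.FunctionSpec(
                urn='beam:external:java:some_io:read:v1',  # placeholder URN
                payload=b'...')))                          # transform-specific config
    # (A real call would presumably also attach the components describing the
    # transform's inputs.)

    response = stub.Expand(request)
    # response.transform would be the expanded composite; the coders, windowing
    # strategies and (URN/payload-identified) environments it refers to would
    # come back in response.components, to be merged into the caller's
    # pipeline proto.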