On Wed, Jan 23, 2019 at 1:03 PM Robert Bradshaw <rober...@google.com> wrote:
> On Wed, Jan 23, 2019 at 6:38 PM Maximilian Michels <m...@apache.org> wrote:
> >
> > Thank you for starting on the cross-language feature Robert!
> >
> > Just to recap: Each SDK runs an ExpansionService which can be contacted during
> > pipeline translation to expand transforms that are unknown to the SDK. The
> > service returns the Proto definitions to the querying process.
>
> Yep. Technically it doesn't have to be the SDK, or even if it is there
> may be a variety of services (e.g. one offering SQL, one offering
> different IOs).
>
> > There will be multiple environments such that during execution cross-language
> > pipelines select the appropriate environment for a transform.
>
> Exactly. And fuses only those steps with compatible environments together.
>
> > It's not clear to me, should the expansion happen during pipeline construction
> > or during translation by the Runner?
>
> I think it needs to happen as part of construction because the set of
> outputs (and their properties) can be dynamic based on the expansion.

Also, without expansion at pipeline construction, we'll have to define all composite cross-language transforms as runner-native transforms, which won't be practical?

> > Thanks,
> > Max
> >
> > On 23.01.19 04:12, Robert Bradshaw wrote:
> > > No, this PR simply takes an endpoint address as a parameter, expecting
> > > it to already be up and available. More convenient APIs, e.g. ones
> > > that spin up an endpoint and tear it down, or catalog and locate code
> > > and services offering these endpoints, could be provided as wrappers
> > > on top of or extensions of this.
> > >
> > > On Wed, Jan 23, 2019 at 12:19 AM Kenneth Knowles <k...@apache.org> wrote:
> > >>
> > >> Nice! If I recall correctly, there was mostly concern about how to launch and manage the expansion service (Docker? Vendor-specific? Etc). Does this PR take a position on that question?
> > >>
> > >> Kenn
> > >>
> > >> On Tue, Jan 22, 2019 at 1:44 PM Chamikara Jayalath <chamik...@google.com> wrote:
> > >>>
> > >>> On Tue, Jan 22, 2019 at 11:35 AM Udi Meiri <eh...@google.com> wrote:
> > >>>>
> > >>>> Also debuggability: collecting logs from each of these systems.
> > >>>
> > >>> Agree.
> > >>>
> > >>>> On Tue, Jan 22, 2019 at 10:53 AM Chamikara Jayalath <chamik...@google.com> wrote:
> > >>>>>
> > >>>>> Thanks Robert.
> > >>>>>
> > >>>>> On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw <rober...@google.com> wrote:
> > >>>>>>
> > >>>>>> Now that we have the FnAPI, I started playing around with support for
> > >>>>>> cross-language pipelines. This will allow things like IOs to be shared
> > >>>>>> across all languages, SQL to be invoked from non-Java, TFX tensorflow
> > >>>>>> transforms to be invoked from non-Python, etc. and I think is the next
> > >>>>>> step in extending (and taking advantage of) the portability layer
> > >>>>>> we've developed. These are often composite transforms whose inner
> > >>>>>> structure depends in non-trivial ways on their configuration.
> > >>>>>
> > >>>>> Some additional benefits of cross-language transforms are given below.
> > >>>>>
> > >>>>> (1) Current large collection of Java IO connectors will become available to other languages.
> > >>>>> (2) Current Java and Python transforms will be available for Go and any other future SDKs.
> > >>>>> (3) New transform authors will be able to pick their language of choice and make their transform available to all Beam SDKs. For example, this can be the language the transform author is most familiar with or the only language for which a client library is available for connecting to an external data store.
> > >>>>>
> > >>>>>>
> > >>>>>> I created a PR [1] that basically follows the "expand via an external
> > >>>>>> process" over RPC alternative from the proposals we came up with when
> > >>>>>> we were discussing this last time [2]. There are still some unknowns,
> > >>>>>> e.g. how to handle artifacts supplied by an alternative SDK (they
> > >>>>>> currently must be provided by the environment), but I think this is a
> > >>>>>> good incremental step forward that will already be useful in a large
> > >>>>>> number of cases. It would be good to validate the general direction
> > >>>>>> and I would be interested in any feedback others may have on it.
> > >>>>>
> > >>>>> I think there are multiple semi-dependent problems we have to tackle to reach the final goal of supporting fully-fledged cross-language transforms in Beam. I agree with taking an incremental approach here with the overall vision in mind. Some other problems we have to tackle include the following.
> > >>>>>
> > >>>>> * Defining a user API that will allow pipelines defined in SDK X to use transforms defined in SDK Y.
> > >>>>> * Updating various runners to use URN/payload based environment definition [1]
> > >>>>> * Updating various runners to support starting containers for multiple environments/languages for the same pipeline and supporting executing pipeline steps in containers started for multiple environments.
> > >>>
> > >>> I've been working with +Heejong Lee to add some of the missing pieces mentioned above.
> > >>>
> > >>> We created the following doc that captures some of the ongoing work related to cross-language transforms and which will hopefully serve as a knowledge base for anybody who wishes to quickly learn the context related to this.
> > >>> Feel free to refer to this and/or add to this.
> > >>>
> > >>> https://docs.google.com/document/d/1H3yCyVFI9xYs1jsiF1GfrDtARgWGnLDEMwG5aQIx2AU/edit?usp=sharing
> > >>>
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Cham
> > >>>>>
> > >>>>> [1] https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L952
> > >>>>>
> > >>>>>>
> > >>>>>> - Robert
> > >>>>>>
> > >>>>>> [1] https://github.com/apache/beam/pull/7316
> > >>>>>> [2] https://s.apache.org/beam-mixed-language-pipelines
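
To make the recap in this thread concrete, here is a minimal, self-contained sketch of the two ideas discussed: expanding an unknown transform via an expansion service at construction time, and fusing only steps whose environments match. Everything in it is an assumption for illustration, not Beam's API: `Transform`, `ExpansionService`, `Pipeline`, `fuse`, and the `example:read` URN are hypothetical names, the "service" is an in-process object rather than an RPC endpoint, and fusion is reduced to grouping consecutive steps in a linear pipeline.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Transform:
    """A primitive step plus the environment (SDK container) able to run it."""
    name: str
    environment: str


class ExpansionService:
    """Hypothetical in-process stand-in for a remote expansion endpoint."""

    def __init__(self) -> None:
        self._registry: Dict[str, Callable[[dict], List[Transform]]] = {}

    def register(self, urn: str, fn: Callable[[dict], List[Transform]]) -> None:
        self._registry[urn] = fn

    def expand(self, urn: str, payload: dict) -> List[Transform]:
        # The caller only knows the URN and payload; the service decides
        # which concrete steps the composite turns into.
        if urn not in self._registry:
            raise KeyError(f"unknown transform URN: {urn}")
        return self._registry[urn](payload)


class Pipeline:
    """Linear pipeline only; real pipelines are graphs."""

    def __init__(self, environment: str) -> None:
        self.environment = environment
        self.steps: List[Transform] = []

    def apply(self, name: str) -> "Pipeline":
        self.steps.append(Transform(name, self.environment))
        return self

    def apply_external(self, urn: str, payload: dict,
                       service: ExpansionService) -> "Pipeline":
        # Expansion happens here, during construction, so the set of
        # resulting steps is known before any runner sees the pipeline.
        self.steps.extend(service.expand(urn, payload))
        return self


def fuse(steps: List[Transform]) -> List[List[str]]:
    """Group consecutive steps that share an environment into one stage."""
    return [[t.name for t in group]
            for _, group in groupby(steps, key=lambda t: t.environment)]


def expand_read(payload: dict) -> List[Transform]:
    # The inner structure depends on the configuration payload.
    steps = [Transform("Read", "java")]
    if payload.get("validate"):
        steps.append(Transform("Validate", "java"))
    return steps


java_service = ExpansionService()
java_service.register("example:read", expand_read)

pipeline = Pipeline("python")
pipeline.apply_external("example:read", {"validate": True}, java_service)
pipeline.apply("Format").apply("Count")
stages = fuse(pipeline.steps)
print(stages)
```

With `validate` set, the two Java steps end up in one stage and the two Python steps in another; with it unset, the expansion yields a single `Read` step. That configuration-dependent shape is the reason given above for expanding during construction rather than during translation by the runner.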