On Thu, May 23, 2019 at 11:07 AM Maximilian Michels <[email protected]> wrote:
> My motivation was to get rid of the Docker dependency for the Python VR
> tests. Similarly to how we use Python's LOOPBACK environment for
> executing all non-cross-language tests, I wanted to use Java's EMBEDDED
> environment to run the cross-language transforms.
>
> I suppose we could also go with an override of the default environment
> during startup of the Expansion Service. This would be less intrusive
> and still allow us to use the embedded environment in some of the tests.

Yes, I think for the purpose of cheap testing, customizing the expansion
service to provide embedded environments is a good solution.

Long term, I think the answer is the ability for a runner to intelligently
swap out (merge, reconcile) environments in a compatible way, e.g. a runner
could be instructed to use the embedded environment instead, as a cheap
replacement, whenever the standard Java environment is encountered. This
would allow one to write pipelines that (for example) embed the Java
portions when running on a Java runner, the Python portions when running
on a Python runner, and neither when running on a C++ runner, without
hard-coding expectations at the expansion call sites. I would imagine
runners could even be conservative about which environments they choose to
run in embedded mode, depending on whether the specified dependencies are
compatible with their own.
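To make the swapping idea a bit more concrete, here is a rough sketch (not
how any runner actually implements this today) of rewriting the
environments in an expanded pipeline proto so that Java Docker environments
are replaced by an embedded one. The EMBEDDED_URN constant and the
image-name check are placeholders/assumptions, not established conventions:

    from apache_beam.portability.api import beam_runner_api_pb2

    DOCKER_URN = 'beam:env:docker:v1'
    EMBEDDED_URN = 'beam:env:embedded:v1'  # assumed URN for an embedded env

    def embed_java_environments(pipeline_proto):
      """Swaps Java Docker environments for an embedded environment."""
      for env_id, env in pipeline_proto.components.environments.items():
        if env.urn != DOCKER_URN:
          continue
        docker = beam_runner_api_pb2.DockerPayload()
        docker.ParseFromString(env.payload)
        # Crude heuristic for "this is the standard Java environment"; a
        # real runner would also verify that the environment's declared
        # dependencies are compatible with its own before embedding it.
        if 'java' in docker.container_image:
          env.urn = EMBEDDED_URN
          env.payload = b''
      return pipeline_proto

A runner applying something like this at submission time would get the
"embed the Java portions on a Java runner" behavior without the expansion
call sites having to know anything about it.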
> > 2(c) can also be "hacked" inside an SDK as an explicit environment
> > override by the "user", where the expansion service isn't involved and
> > the user/SDK manipulates the expansion service response. As Chamikara
> > pointed out, I believe the response from the expansion service should
> > be "safe" instead of allowing it to return broken combinations.
>
> The responses are not safe today because we do not have a check whether
> an expanded transform is compatible with the default environment. I
> agree though that it is better to develop these checks first before
> allowing overrides by the user.
>
> I'll need to follow up with the artifact staging doc, then we can agree
> on the first steps for implementation.
>
> Thanks,
> Max
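For reference, the client-side "hack" described above would amount to
something like the following; purely illustrative, and deliberately showing
that nothing checks compatibility, which is exactly the unsafe part:

    def override_expansion_environments(expansion_response, replacement_env):
      """Points every environment in an ExpansionResponse at replacement_env.

      expansion_response: a beam_expansion_api_pb2.ExpansionResponse
      replacement_env: a beam_runner_api_pb2.Environment chosen by the user
      """
      envs = expansion_response.components.environments
      for env_id in list(envs.keys()):
        # Blindly replace; nothing verifies that the expanded transform can
        # actually execute in replacement_env.
        envs[env_id].CopyFrom(replacement_env)
      return expansion_response

An SDK doing this right after the Expand() call, before merging the
returned components into its pipeline, would effectively have
user-specified environments today, minus every safety property discussed
above.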
> On 22.05.19 20:13, Lukasz Cwik wrote:
> > 2(c) can also be "hacked" inside an SDK as an explicit environment
> > override by the "user", where the expansion service isn't involved and
> > the user/SDK manipulates the expansion service response. As Chamikara
> > pointed out, I believe the response from the expansion service should
> > be "safe" instead of allowing it to return broken combinations.
> >
> > On Wed, May 22, 2019 at 11:08 AM Chamikara Jayalath
> > <[email protected]> wrote:
> >
> >     On Wed, May 22, 2019 at 9:17 AM Maximilian Michels
> >     <[email protected]> wrote:
> >
> >         Hi,
> >
> >         Robert and I were discussing the subject of user-specified
> >         environments for external transforms [1]. We couldn't decide
> >         whether users should have direct control over the environment
> >         when they use an external transform in their pipeline.
> >
> >         In my mind, it is quite natural that the Expansion Service is
> >         a long-running service that gets started with a list of
> >         available environments. Such a list can be outdated, and users
> >         may write transforms for a new environment they want to use in
> >         their pipeline. The easiest way would be to allow passing the
> >         environment with the transform. Note that we already give
> >         users control over the "main" environment via the
> >         PortablePipelineOptions, so this wouldn't be an entirely new
> >         concept.
> >
> >     I think we are trying to generalize the expansion service along
> >     multiple axes:
> >
> >     (1) dependencies
> >         (a) dependencies embedded in an environment
> >         (b) dependencies specific to a transform
> >         (c) dependencies specified by the user expanding the transform
> >
> >     (2) environments
> >         (a) default environment
> >         (b) environments specified at startup of the expansion service
> >         (c) environments specified by the user expanding the transform
> >             (this proposal)
> >
> >     It's great if we can implement the most generic solution along all
> >     these axes, but I think we run the risk of ending up with broken
> >     combinations by trying to implement this before we have the other
> >     pieces needed to support a long-running expansion service, for
> >     example support for dynamically registering transforms and support
> >     for discovering transforms.
> >
> >     What is the need for implementing 2(c) now? If there's no real
> >     need now, I suggest we settle with 2(a) or 2(b) until we can truly
> >     support a long-running expansion service. Also, we'll have a
> >     better idea of how this kind of feature should evolve when we have
> >     at least two runners supporting cross-language transforms (we are
> >     in the process of updating Dataflow to support this). Just my 2
> >     cents though :)
> >
> >         The contrary position is that the Expansion Service should
> >         have full control over which environment is chosen. Going back
> >         to the discussion about artifact staging [2], this could
> >         enable more optimizations, such as merging environments or
> >         detecting conflicts. However, this only works if this
> >         information has been provided upfront to the Expansion
> >         Service. It wouldn't be impossible to provide these hints
> >         alongside the environment, as suggested in the previous
> >         paragraph.
> >
> >         Any opinions? Should we allow users to optionally specify an
> >         environment for external transforms?
> >
> >         Thanks,
> >         Max
> >
> >         [1] https://github.com/apache/beam/pull/8639
> >         [2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E
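Concretely, the split discussed above looks roughly like this from the
Python side. The flags are the existing PortablePipelineOptions ones, the
ExternalTransform call is simplified, and the commented-out environment
argument is hypothetical, shown only to illustrate what 2(c) would add; the
transform URN, payload, and endpoint addresses are made up:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms.external import ExternalTransform

    # The "main" environment is already under user control via the
    # portable pipeline options, as noted above.
    options = PipelineOptions([
        '--runner=PortableRunner',
        '--job_endpoint=localhost:8099',
        '--environment_type=LOOPBACK',
    ])

    with beam.Pipeline(options=options) as p:
      _ = (
          p
          | beam.Create(['a', 'b'])
          # Today the environment of the expanded transform is whatever
          # the expansion service was configured with (2(a)/2(b)); the
          # user has no say at this call site.
          | ExternalTransform(
              'my:fictional:transform:v1',  # made-up URN for illustration
              None,                         # optional payload
              expansion_service='localhost:8097',
              # environment=my_java_environment,  # hypothetical 2(c) knob
          ))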