Re: Environments for External Transforms

Robert Bradshaw Mon, 27 May 2019 04:34:29 -0700

On Mon, May 27, 2019 at 12:38 PM Maximilian Michels <[email protected]> wrote:
>
> > Which environment would be used to perform the expansion? I think this is 
> > an interesting option, as long as it does not introduce a hard dependency 
> > on docker.
>
> The same environment that the to-be-expanded transform requires during
> runtime.
>
> > Dataflow has been doing something similar in this route where it is trying 
> > to get rid of the driver program running on the user's machine. If you can 
> > get the expansion service to launch and run an environment to perform the 
> > expansion, you could also get it to create and submit a job as well 
> > returning data around the running job.
>
> Portability already runs without a driver on the user machine, apart
> from expansion and staging. For anything runtime-related the job server
> kicks in. It's worth thinking about delegating expansion and staging to
> the job server.


I think it makes a lot of sense for job servers to also act as
expansion services, but of course one can't defer expansion until job
submission.
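
To make the idea concrete, here is a minimal sketch of a job server that
also answers expansion requests. It assumes a simple in-memory registry;
the names Environment, ExpandedTransform, ExpansionService, JobServer and
the URN "example:urn:count" are hypothetical and are not Beam APIs. The
point it illustrates is that expansion is a separate call made while the
pipeline is being built, and that the service, not the caller, supplies
the environment for each transform it vendors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Environment:
    """Hypothetical stand-in for an SDK harness environment."""
    kind: str     # e.g. "docker" or "process"
    payload: str  # e.g. an image name or a command line


@dataclass(frozen=True)
class ExpandedTransform:
    urn: str
    sub_transforms: List[str]
    environment: Environment


class ExpansionService:
    """Maps a transform URN to its expansion function and environment."""

    def __init__(self) -> None:
        self._registry: Dict[
            str, Tuple[Callable[[dict], List[str]], Environment]
        ] = {}

    def register(self, urn: str, expand_fn: Callable[[dict], List[str]],
                 environment: Environment) -> None:
        # The transform author names the environment at registration time.
        self._registry[urn] = (expand_fn, environment)

    def expand(self, urn: str, config: dict) -> ExpandedTransform:
        if urn not in self._registry:
            raise KeyError(f"unknown transform: {urn}")
        expand_fn, environment = self._registry[urn]
        return ExpandedTransform(urn, expand_fn(config), environment)


class JobServer(ExpansionService):
    """A job server that also acts as an expansion service."""

    def __init__(self) -> None:
        super().__init__()
        self.submitted: List[List[ExpandedTransform]] = []

    def submit(self, pipeline: List[ExpandedTransform]) -> int:
        # Submission only accepts transforms that were already expanded.
        self.submitted.append(list(pipeline))
        return len(self.submitted) - 1


server = JobServer()
server.register(
    "example:urn:count",
    lambda cfg: ["group_by_key", "combine_" + cfg["fn"]],
    Environment("docker", "example/java-sdk:latest"),
)

# Expansion happens during pipeline construction, before submission.
expanded = server.expand("example:urn:count", {"fn": "sum"})
job_id = server.submit([expanded])
```

In this sketch the caller never passes an environment; allowing an
optional override would mean adding a parameter to expand(), which is
exactly the design question being discussed in this thread.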

> On 24.05.19 23:48, Lukasz Cwik wrote:
> > Dataflow has been doing something similar in this route where it is
> > trying to get rid of the driver program running on the user's machine. If
> > you can get the expansion service to launch and run an environment to
> > perform the expansion, you could also get it to create and submit a job
> > as well returning data around the running job.
> >
> > On Thu, May 23, 2019 at 7:47 AM Thomas Weise <[email protected]> wrote:
> >
> >
> >
> >     On Thu, May 23, 2019 at 3:46 AM Maximilian Michels <[email protected]> wrote:
> >
> >          >  Writing a new transform involves updating the expansion
> >         service to include their new transform.
> >
> >         Would it be conceivable that the expansion is performed via the
> >         environment? That would solve the problem of updating the expansion
> >         service, although it adds additional complexity for bringing up the
> >         environment.
> >
> >
> >     Which environment would be used to perform the expansion? I think
> >     this is an interesting option, as long as it does not introduce a
> >     hard dependency on docker.
> >
> >         On 23.05.19 11:31, Robert Bradshaw wrote:
> >          > On Wed, May 22, 2019 at 6:17 PM Maximilian Michels <[email protected]> wrote:
> >          >
> >          >     Hi,
> >          >
> >          >     Robert and I were discussing the subject of
> >         user-specified
> >          >     environments for external transforms [1]. We couldn't
> >         decide whether
> >          >     users should have direct control over the environment
> >         when they use an
> >          >     external transform in their pipeline.
> >          >
> >          >     In my mind, it is quite natural that the Expansion
> >         Service is a
> >          >     long-running service that gets started with a list of
> >         available
> >          >     environments.
> >          >
> >          >
> >          > +1.
> >          >
> >          > IMHO, the expansion service should be expected to provide valid
> >          > environments for the transforms it vendors. Removing this
> >         expectation
> >          > seems wrong. Making it cheap to specify non-default
> >         dependencies without
> >          > building (publishing, etc.) a docker image is probably key to
> >         making
> >          > this work well (and also allowing more powerful environment
> >         introspection).
> >          >
> >          >     Such a list can be outdated and users may write transforms
> >          >     for a new environment they want to use in their pipeline.
> >          >
> >          >
> >          > This is the part that I'm having trouble following. Writing a
> >         new
> >          > transform involves updating the expansion service to include
> >         their new
> >          > transform. The author of a transform (in other words, the one
> >         who
> >          > defines its expansion and implementation) is in the position
> >         to name its
> >          > dependencies, etc. and the user of the transform (the one
> >         invoking it)
> >          > is not in a generally good position to know what environments
> >         would be
> >          > valid.
> >          >
> >          >     The easiest
> >          >     way would be to allow passing the environment with the
> >         transform.
> >          >
> >          >
> >          > What this allows is using existing transforms in new
> >         environments. There
> >          > are possibly some use cases for this, e.g. expansion of a
> >         given transform
> >          > may be compatible with either version X or version Y of a
> >         library, left
> >          > up to the discretion of the caller, but I think that this is
> >         really just
> >          > a deficiency in our environment specifications (e.g. one
> >         should be
> >          > able to express this flexibility in the returned environment).
> >          >
> >          >     Note
> >          >     that we already give users control over the "main"
> >         environment via the
> >          >     PortablePipelineOptions, so this wouldn't be an entirely
> >         new concept.
> >          >
> >          >
> >          > Yes, the author of a pipeline/transform chooses the
> >         environment in which
> >          > those transforms execute.
> >          >
> >          >     The contrary position is that the Expansion Service
> >         should have full
> >          >     control over which environment is chosen. Going back to
> >         the discussion
> >          >     about artifact staging [2], this could enable more
> >          >     optimizations, such as merging environments or detecting
> >         conflicts.
> >          >     However, this only works if this information has been
> >         provided upfront
> >          >     to the Expansion Service. It wouldn't be impossible to
> >         provide these
> >          >     hints alongside the environment, as suggested in
> >         the previous
> >          >     paragraph.
> >          >
> >          >     Any opinions? Should we allow users to optionally specify an
> >          >     environment
> >          >     for external transforms?
> >          >
> >          >     Thanks,
> >          >     Max
> >          >
> >          >     [1] https://github.com/apache/beam/pull/8639
> >          >     [2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E
> >          >
> >
