On Mon, May 27, 2019 at 12:38 PM Maximilian Michels <[email protected]> wrote:
>
> > Which environment would be used to perform the expansion? I think this is
> > an interesting option, as long as it does not introduce a hard dependency
> > on docker.
>
> The same environment that the to-be-expanded transform requires during
> runtime.
>
> > Dataflow has been doing something similar along these lines, where it is
> > trying to get rid of the driver program running on the user's machine. If
> > you can get the expansion service to launch and run an environment to
> > perform the expansion, you could also get it to create and submit a job
> > as well, returning data about the running job.
>
> Portability already runs without a driver on the user machine, apart
> from expansion and staging. For anything runtime-related, the job server
> kicks in. It's worth thinking about delegating expansion and staging to
> the job server.
I think it makes a lot of sense for job servers to also act as expansion services, but of course one can't defer expansion until job submission.

> On 24.05.19 23:48, Lukasz Cwik wrote:
> > Dataflow has been doing something similar along these lines, where it is
> > trying to get rid of the driver program running on the user's machine. If
> > you can get the expansion service to launch and run an environment to
> > perform the expansion, you could also get it to create and submit a job
> > as well, returning data about the running job.
> >
> > On Thu, May 23, 2019 at 7:47 AM Thomas Weise <[email protected]> wrote:
> >
> > > On Thu, May 23, 2019 at 3:46 AM Maximilian Michels <[email protected]> wrote:
> > >
> > > > > Writing a new transform involves updating the expansion service
> > > > > to include their new transform.
> > > >
> > > > Would it be conceivable that the expansion is performed via the
> > > > environment? That would solve the problem of updating the expansion
> > > > service, although it adds additional complexity for bringing up the
> > > > environment.
> > >
> > > Which environment would be used to perform the expansion? I think
> > > this is an interesting option, as long as it does not introduce a
> > > hard dependency on docker.
> > >
> > > > On 23.05.19 11:31, Robert Bradshaw wrote:
> > > > > On Wed, May 22, 2019 at 6:17 PM Maximilian Michels <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Robert and I were discussing the subject of user-specified
> > > > > > environments for external transforms [1]. We couldn't decide
> > > > > > whether users should have direct control over the environment
> > > > > > when they use an external transform in their pipeline.
> > > > > >
> > > > > > In my mind, it is quite natural that the Expansion Service is a
> > > > > > long-running service that gets started with a list of available
> > > > > > environments.
> > > > >
> > > > > +1.
> > > > >
> > > > > IMHO, the expansion service should be expected to provide valid
> > > > > environments for the transforms it vendors. Removing this
> > > > > expectation seems wrong. Making it cheap to specify non-default
> > > > > dependencies without building (publishing, etc.) a docker image is
> > > > > probably key to making this work well (and also allowing more
> > > > > powerful environment introspection).
> > > > >
> > > > > > Such a list can be outdated and users may write transforms
> > > > > > for a new environment they want to use in their pipeline.
> > > > >
> > > > > This is the part that I'm having trouble following. Writing a new
> > > > > transform involves updating the expansion service to include their
> > > > > new transform. The author of a transform (in other words, the one
> > > > > who defines its expansion and implementation) is in the position to
> > > > > name its dependencies, etc., and the user of the transform (the one
> > > > > invoking it) is not generally in a good position to know what
> > > > > environments would be valid.
> > > > >
> > > > > > The easiest way would be to allow passing the environment with
> > > > > > the transform.
> > > > >
> > > > > What this allows is using existing transforms in new environments.
> > > > > There are possibly some use cases for this, e.g. expansion of a given
> > > > > transform may be compatible with either version X or version Y of a
> > > > > library, left up to the discretion of the caller, but I think that
> > > > > this is really just a deficiency in our environment specifications
> > > > > (e.g. one should be able to express this flexibility in the returned
> > > > > environment).
> > > > >
> > > > > > Note that we already give users control over the "main"
> > > > > > environment via the PortablePipelineOptions, so this wouldn't be
> > > > > > an entirely new concept.
> > > > >
> > > > > Yes, the author of a pipeline/transform chooses the environment in
> > > > > which those transforms execute.
> > > > >
> > > > > > The contrary position is that the Expansion Service should have
> > > > > > full control over which environment is chosen. Going back to the
> > > > > > discussion about artifact staging [2], this could enable more
> > > > > > optimizations, such as merging environments or detecting conflicts.
> > > > > > However, this only works if this information has been provided
> > > > > > upfront to the Expansion Service. It wouldn't be impossible to
> > > > > > provide these hints alongside the environment, as suggested in the
> > > > > > previous paragraph.
> > > > > >
> > > > > > Any opinions? Should we allow users to optionally specify an
> > > > > > environment for external transforms?
> > > > > >
> > > > > > Thanks,
> > > > > > Max
> > > > > >
> > > > > > [1] https://github.com/apache/beam/pull/8639
> > > > > > [2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E
