On Thu, May 23, 2019 at 3:46 AM Maximilian Michels <[email protected]> wrote:
> > Writing a new transform involves updating the expansion service to
> > include their new transform.
>
> Would it be conceivable that the expansion is performed via the
> environment? That would solve the problem of updating the expansion
> service, although it adds additional complexity for bringing up the
> environment.

Which environment would be used to perform the expansion? I think this
is an interesting option, as long as it does not introduce a hard
dependency on Docker.

> On 23.05.19 11:31, Robert Bradshaw wrote:
> > On Wed, May 22, 2019 at 6:17 PM Maximilian Michels <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > Robert and I were discussing the subject of user-specified
> > > environments for external transforms [1]. We couldn't decide whether
> > > users should have direct control over the environment when they use
> > > an external transform in their pipeline.
> > >
> > > In my mind, it is quite natural that the Expansion Service is a
> > > long-running service that gets started with a list of available
> > > environments.
> >
> > +1.
> >
> > IMHO, the expansion service should be expected to provide valid
> > environments for the transforms it vendors. Removing this expectation
> > seems wrong. Making it cheap to specify non-default dependencies
> > without building (publishing, etc.) a Docker image is probably key to
> > making this work well (and also allowing more powerful environment
> > introspection).
> >
> > > Such a list can be outdated and users may write transforms
> > > for a new environment they want to use in their pipeline.
> >
> > This is the part that I'm having trouble following. Writing a new
> > transform involves updating the expansion service to include their new
> > transform. The author of a transform (in other words, the one who
> > defines its expansion and implementation) is in the position to name
> > its dependencies, etc., and the user of the transform (the one
> > invoking it) is generally not in a good position to know what
> > environments would be valid.
> >
> > > The easiest way would be to allow passing the environment with the
> > > transform.
> >
> > What this allows is using existing transforms in new environments.
> > There are possibly some use cases for this, e.g. expansion of a given
> > transform may be compatible with either version X or version Y of a
> > library, left up to the discretion of the caller, but I think that
> > this is really just a deficiency in our environment specifications
> > (e.g. one should be able to express this flexibility in the returned
> > environment).
> >
> > > Note that we already give users control over the "main" environment
> > > via the PortablePipelineOptions, so this wouldn't be an entirely new
> > > concept.
> >
> > Yes, the author of a pipeline/transform chooses the environment in
> > which those transforms execute.
> >
> > > The contrary position is that the Expansion Service should have full
> > > control over which environment is chosen. Going back to the
> > > discussion about artifact staging [2], this could enable more
> > > optimizations, such as merging environments or detecting conflicts.
> > > However, this only works if this information has been provided
> > > upfront to the Expansion Service. It wouldn't be impossible to
> > > provide these hints alongside the environment, as suggested in the
> > > previous paragraph.
> > >
> > > Any opinions? Should we allow users to optionally specify an
> > > environment for external transforms?
> > >
> > > Thanks,
> > > Max
> > >
> > > [1] https://github.com/apache/beam/pull/8639
> > > [2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E
