On Thu, May 23, 2019 at 11:07 AM Maximilian Michels <[email protected]> wrote:
> My motivation was to get rid of the Docker dependency for the Python VR
> tests. Similarly to how we use Python's LOOPBACK environment for
> executing all non-cross-language tests, I wanted to use Java's EMBEDDED
> environment to run the cross-language transforms.
>
> I suppose we could also go with an override of the default environment
> during startup of the Expansion Service. This would be less intrusive
> and still allow us to use the embedded environment in some of the tests.

Yes, I think for the purpose of cheap testing, customizing the expansion
service to provide embedded environments is a good solution.

Long term, I think the answer is the ability for a runner to intelligently
swap out (merge, reconcile) environments in a compatible way, e.g. a runner
could be instructed to use the embedded environment instead, as a cheap
replacement, whenever the standard Java environment is encountered. This
would allow one to write pipelines that (for example) embed the Java
portions when running on a Java runner, the Python portions when running
on a Python runner, and neither when running on a C++ runner, without
hard-coding expectations at the expansion call sites. I would imagine
runners could even be conservative about which environments they choose to
run in embedded mode, depending on whether the specified dependencies are
compatible with their own.
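To make the swapping idea a bit more concrete, here is a rough sketch (not
how any runner actually implements this today) of rewriting the
environments in an expanded pipeline proto so that Java Docker environments
are replaced by an embedded one. The EMBEDDED_URN constant and the
image-name check are placeholders/assumptions, not established conventions:

    from apache_beam.portability.api import beam_runner_api_pb2

    DOCKER_URN = 'beam:env:docker:v1'
    EMBEDDED_URN = 'beam:env:embedded:v1'  # assumed URN for an embedded env

    def embed_java_environments(pipeline_proto):
      """Swaps Java Docker environments for an embedded environment."""
      for env_id, env in pipeline_proto.components.environments.items():
        if env.urn != DOCKER_URN:
          continue
        docker = beam_runner_api_pb2.DockerPayload()
        docker.ParseFromString(env.payload)
        # Crude heuristic for "this is the standard Java environment"; a
        # real runner would also verify that the environment's declared
        # dependencies are compatible with its own before embedding it.
        if 'java' in docker.container_image:
          env.urn = EMBEDDED_URN
          env.payload = b''
      return pipeline_proto

A runner applying something like this at submission time would get the
"embed the Java portions on a Java runner" behavior without the expansion
call sites having to know anything about it.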
> > 2(c) can also be "hacked" inside an SDK as an explicit environment
> > override by the "user", where the expansion service isn't involved and
> > the user/SDK manipulates the expansion service response. As Chamikara
> > pointed out, I believe the response from the expansion service should
> > be "safe" instead of allowing it to return broken combinations.
>
> The responses are not safe today because we do not have a check whether
> an expanded transform is compatible with the default environment. I
> agree though that it is better to develop these checks first before
> allowing overrides by the user.
>
> I'll need to follow up with the artifact staging doc, then we can agree
> on the first steps for implementation.
>
> Thanks,
> Max
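For reference, the client-side "hack" described above would amount to
something like the following; purely illustrative, and deliberately showing
that nothing checks compatibility, which is exactly the unsafe part:

    def override_expansion_environments(expansion_response, replacement_env):
      """Points every environment in an ExpansionResponse at replacement_env.

      expansion_response: a beam_expansion_api_pb2.ExpansionResponse
      replacement_env: a beam_runner_api_pb2.Environment chosen by the user
      """
      envs = expansion_response.components.environments
      for env_id in list(envs.keys()):
        # Blindly replace; nothing verifies that the expanded transform can
        # actually execute in replacement_env.
        envs[env_id].CopyFrom(replacement_env)
      return expansion_response

An SDK doing this right after the Expand() call, before merging the
returned components into its pipeline, would effectively have
user-specified environments today, minus every safety property discussed
above.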
> On 22.05.19 20:13, Lukasz Cwik wrote:
> > 2(c) can also be "hacked" inside an SDK as an explicit environment
> > override by the "user", where the expansion service isn't involved and
> > the user/SDK manipulates the expansion service response. As Chamikara
> > pointed out, I believe the response from the expansion service should
> > be "safe" instead of allowing it to return broken combinations.
> >
> > On Wed, May 22, 2019 at 11:08 AM Chamikara Jayalath
> > <[email protected]> wrote:
> >
> >     On Wed, May 22, 2019 at 9:17 AM Maximilian Michels
> >     <[email protected]> wrote:
> >
> >         Hi,
> >
> >         Robert and I were discussing the subject of user-specified
> >         environments for external transforms [1]. We couldn't decide
> >         whether users should have direct control over the environment
> >         when they use an external transform in their pipeline.
> >
> >         In my mind, it is quite natural that the Expansion Service is
> >         a long-running service that gets started with a list of
> >         available environments. Such a list can be outdated, and users
> >         may write transforms for a new environment they want to use in
> >         their pipeline. The easiest way would be to allow passing the
> >         environment with the transform. Note that we already give
> >         users control over the "main" environment via the
> >         PortablePipelineOptions, so this wouldn't be an entirely new
> >         concept.
> >
> >     I think we are trying to generalize the expansion service along
> >     multiple axes:
> >
> >     (1) dependencies
> >         (a) dependencies embedded in an environment
> >         (b) dependencies specific to a transform
> >         (c) dependencies specified by the user expanding the transform
> >
> >     (2) environments
> >         (a) default environment
> >         (b) environments specified at startup of the expansion service
> >         (c) environments specified by the user expanding the transform
> >             (this proposal)
> >
> >     It's great if we can implement the most generic solution along all
> >     these axes, but I think we run the risk of ending up with broken
> >     combinations by trying to implement this before we have the other
> >     pieces needed to support a long-running expansion service, for
> >     example support for dynamically registering transforms and support
> >     for discovering transforms.
> >
> >     What is the need for implementing 2(c) now? If there's no real
> >     need now, I suggest we settle with 2(a) or 2(b) until we can truly
> >     support a long-running expansion service. Also, we'll have a
> >     better idea of how this kind of feature should evolve when we have
> >     at least two runners supporting cross-language transforms (we are
> >     in the process of updating Dataflow to support this). Just my 2
> >     cents though :)
> >
> >         The contrary position is that the Expansion Service should
> >         have full control over which environment is chosen. Going back
> >         to the discussion about artifact staging [2], this could
> >         enable more optimizations, such as merging environments or
> >         detecting conflicts. However, this only works if this
> >         information has been provided upfront to the Expansion
> >         Service. It wouldn't be impossible to provide these hints
> >         alongside the environment, as suggested in the previous
> >         paragraph.
> >
> >         Any opinions? Should we allow users to optionally specify an
> >         environment for external transforms?
> >
> >         Thanks,
> >         Max
> >
> >         [1] https://github.com/apache/beam/pull/8639
> >         [2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E
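Concretely, the split discussed above looks roughly like this from the
Python side. The flags are the existing PortablePipelineOptions ones, the
ExternalTransform call is simplified, and the commented-out environment
argument is hypothetical, shown only to illustrate what 2(c) would add; the
transform URN, payload, and endpoint addresses are made up:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms.external import ExternalTransform

    # The "main" environment is already under user control via the
    # portable pipeline options, as noted above.
    options = PipelineOptions([
        '--runner=PortableRunner',
        '--job_endpoint=localhost:8099',
        '--environment_type=LOOPBACK',
    ])

    with beam.Pipeline(options=options) as p:
      _ = (
          p
          | beam.Create(['a', 'b'])
          # Today the environment of the expanded transform is whatever
          # the expansion service was configured with (2(a)/2(b)); the
          # user has no say at this call site.
          | ExternalTransform(
              'my:fictional:transform:v1',  # made-up URN for illustration
              None,                         # optional payload
              expansion_service='localhost:8097',
              # environment=my_java_environment,  # hypothetical 2(c) knob
          ))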