Re: Environments for External Transforms

Lukasz Cwik Wed, 22 May 2019 11:13:57 -0700

2(c) can also be "hacked" inside an SDK as an explicit environment override
by the "user" where the expansion service isn't involved and the user/SDK
manipulates the expansion service response. As Chamikara pointed out, I
believe the response from the expansion service should be "safe" instead of
allowing it to return broken combinations.
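
A minimal sketch of the SDK-side override described above, assuming a much simplified model of the expansion response. The `Transform` and `ExpansionResponse` classes, the `override_environment` helper, and the image names are all hypothetical illustrations; they are not the Beam protos or any real SDK API.

```python
from dataclasses import dataclass, replace
from typing import Dict


# Hypothetical, simplified stand-ins for the expansion response.
# These are NOT the Beam protos or a real SDK API.
@dataclass(frozen=True)
class Transform:
    name: str
    environment_id: str


@dataclass(frozen=True)
class ExpansionResponse:
    transforms: Dict[str, Transform]
    environments: Dict[str, str]  # environment id -> container image (assumed)


def override_environment(response: ExpansionResponse, env_id: str,
                         image: str) -> ExpansionResponse:
    """Return a copy of `response` whose transforms all use `env_id`."""
    transforms = {
        key: replace(t, environment_id=env_id)
        for key, t in response.transforms.items()
    }
    # Only the overriding environment is kept in this sketch.
    return ExpansionResponse(transforms=transforms,
                             environments={env_id: image})


# The expansion service answered with its own environment (2(a) or 2(b)).
response = ExpansionResponse(
    transforms={"t1": Transform("ExternalRead", "service_env")},
    environments={"service_env": "example/java-sdk:old"},
)

# The SDK rewrites the response afterwards; the service never sees this.
patched = override_environment(response, "user_env", "example/java-sdk:new")
```

Because the rewrite happens after expansion, nothing in this sketch checks that the transform and the new environment are compatible, which is exactly the kind of broken combination a "safe" response would rule out.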


On Wed, May 22, 2019 at 11:08 AM Chamikara Jayalath <[email protected]>
wrote:

>
>
> On Wed, May 22, 2019 at 9:17 AM Maximilian Michels <[email protected]> wrote:
>
>> Hi,
>>
>> Robert and I were discussing user-specified environments for external
>> transforms [1]. We couldn't decide whether users should have direct
>> control over the environment when they use an external transform in
>> their pipeline.
>>
>> In my mind, it is quite natural that the Expansion Service is a
>> long-running service that gets started with a list of available
>> environments. Such a list can be outdated and users may write transforms
>> for a new environment they want to use in their pipeline. The easiest
>> way would be to allow passing the environment with the transform. Note
>> that we already give users control over the "main" environment via the
>> PortablePipelineOptions, so this wouldn't be an entirely new concept.
>>
>
>
> I think we are trying to generalize the expansion service along multiple
> axes.
> (1) dependencies
>   (a) dependencies embedded in an environment
>   (b) dependencies specific to a transform
>   (c) dependencies specified by the user expanding the transform
>
> (2) environments
>   (a) default environment
>   (b) environments specified at startup of the expansion service
>   (c) environments specified by the user expanding the transform (this
>       proposal)
>
> It's great if we can implement the most generic solution along all these
> axes, but I think we run the risk of ending up with broken combinations by
> trying to implement this before we have the other pieces needed to support
> a long-running expansion service, for example support for dynamically
> registering transforms and support for discovering transforms.
>
> What is the need for implementing 2(c) now? If there's no real need now,
> I suggest we settle for 2(a) or 2(b) until we can truly support a
> long-running expansion service. Also, we'll have a better idea of how this
> kind of feature should evolve when we have at least two runners supporting
> cross-language transforms (we are in the process of updating Dataflow to
> support this). Just my 2 cents though :)
>
>
>>
>> The contrary position is that the Expansion Service should have full
>> control over which environment is chosen. Going back to the discussion
>> about artifact staging [2], this could enable more optimizations, such
>> as merging environments or detecting conflicts. However, this only works
>> if this information has been provided upfront to the Expansion Service.
>> It wouldn't be impossible to provide these hints alongside the
>> environment, as suggested in the previous paragraph.
>>
>> Any opinions? Should we allow users to optionally specify an environment
>> for external transforms?
>>
>> Thanks,
>> Max
>>
>> [1] https://github.com/apache/beam/pull/8639
>> [2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E
>>
>
