Re: Environments for External Transforms

Maximilian Michels Thu, 23 May 2019 02:16:50 -0700

My motivation was to get rid of the Docker dependency for the Python VR tests. Similar to how we use Python's LOOPBACK environment for executing all non-cross-language tests, I wanted to use Java's EMBEDDED environment to run the cross-language transforms.

I suppose we could also go with an override of the default environment during startup of the Expansion Service. This would be less intrusive and still allow us to use the embedded environment in some of the tests.
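
To make the startup-override idea concrete, here is a minimal sketch in plain Python. It does not use Beam's actual API: `Environment`, `ExpansionServiceConfig`, `DEFAULT_ENVIRONMENT`, and the "docker"/"embedded" labels are hypothetical names chosen for illustration. The only point is that the environment is fixed once when the service starts, not per expansion request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    """Hypothetical stand-in for an environment; not Beam's real proto."""
    urn: str           # illustrative label such as "docker" or "embedded"
    payload: str = ""  # e.g. a container image name (made up here)


# Hypothetical built-in default used when nothing is overridden.
DEFAULT_ENVIRONMENT = Environment(urn="docker", payload="example/java-sdk-image")


class ExpansionServiceConfig:
    """Holds the environment attached to every expanded transform."""

    def __init__(self, startup_override: Optional[Environment] = None):
        # The override is supplied once at startup; requests cannot change it.
        self._startup_override = startup_override

    def environment_for_expansion(self) -> Environment:
        # Prefer the startup override, otherwise fall back to the default.
        if self._startup_override is not None:
            return self._startup_override
        return DEFAULT_ENVIRONMENT
```

A test harness could then start the service with `ExpansionServiceConfig(Environment(urn="embedded"))`, while a production deployment would pass nothing and keep the default.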

2(c) can also be "hacked" inside an SDK as an explicit environment override by the "user" 
where the expansion service isn't involved and the user/SDK manipulates the expansion service response. As 
Chamikara pointed out, I believe the response from the expansion service should be "safe" instead 
of allowing it to return broken combinations.

The responses are not safe today because we do not have a check whether an expanded transform is compatible with the default environment. I agree though that it is better to develop these checks first before allowing overrides by the user.
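
As a rough illustration of what such a check could look like, the sketch below models compatibility as "every requirement of the transform is a capability of the environment". That model is an assumption made for this example; nothing in this thread settles how compatibility would actually be determined, and `EnvironmentSpec`, `ExpandedTransform`, and `check_expansion` are hypothetical names, not Beam APIs.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical description of what an environment can provide."""
    name: str
    capabilities: FrozenSet[str]


@dataclass(frozen=True)
class ExpandedTransform:
    """Hypothetical description of what an expanded transform needs."""
    name: str
    requirements: FrozenSet[str]


def missing_requirements(transform: ExpandedTransform,
                         environment: EnvironmentSpec) -> List[str]:
    """Return the requirements the environment cannot satisfy, sorted."""
    return sorted(transform.requirements - environment.capabilities)


def check_expansion(transform: ExpandedTransform,
                    environment: EnvironmentSpec) -> None:
    """Raise instead of returning a broken transform/environment pair."""
    missing = missing_requirements(transform, environment)
    if missing:
        raise ValueError(
            f"{transform.name} is not compatible with {environment.name}: "
            f"missing {', '.join(missing)}"
        )
```

With a check of this shape in place, the expansion service would reject an incompatible pairing before responding, which is the "safe" behaviour discussed above.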

I'll need to follow up on the artifact staging doc, then we can agree on the first steps for implementation.


Thanks,
Max

On 22.05.19 20:13, Lukasz Cwik wrote:

2(c) can also be "hacked" inside an SDK as an explicit environment override by the "user" where the expansion service isn't involved and the user/SDK manipulates the expansion service response. As Chamikara pointed out, I believe the response from the expansion service should be "safe" instead of allowing it to return broken combinations.

On Wed, May 22, 2019 at 11:08 AM Chamikara Jayalath <[email protected] <mailto:[email protected]>> wrote:




    On Wed, May 22, 2019 at 9:17 AM Maximilian Michels <[email protected]
    <mailto:[email protected]>> wrote:

        Hi,

        Robert and I were discussing user-specified environments for
        external transforms [1]. We couldn't decide whether users should
        have direct control over the environment when they use an
        external transform in their pipeline.

        In my mind, it is quite natural that the Expansion Service is a
        long-running service that gets started with a list of available
        environments. Such a list can be outdated, and users may write
        transforms for a new environment they want to use in their
        pipeline. The easiest way would be to allow passing the
        environment with the transform. Note that we already give users
        control over the "main" environment via the
        PortablePipelineOptions, so this wouldn't be an entirely new
        concept.



    I think we are trying to generalize the expansion service along
    multiple axes.
    (1) dependencies
    (a) dependencies embedded in an environment
    (b) dependencies specific to a transform
    (c) dependencies specified by the user expanding the transform

    (2) environments
    (a) default environment
    (b) environments specified at startup of the expansion service
    (c) environments specified by the user expanding the transform
    (this proposal)

    It's great if we can implement the most generic solution along all
    these axes, but I think we risk ending up with broken combinations
    by trying to implement this before we have the other pieces needed
    to support a long-running expansion service, for example, support
    for dynamically registering transforms and support for discovering
    transforms.

    What is the need for implementing 2(c) now? If there's no real
    need now, I suggest we settle on 2(a) or 2(b) until we can truly
    support a long-running expansion service. Also, we'll have a
    better idea of how this kind of feature should evolve when we have
    at least two runners supporting cross-language transforms (we are in
    the process of updating Dataflow to support this). Just my 2 cents
    though :)


        The contrary position is that the Expansion Service should have
        full control over which environment is chosen. Going back to the
        discussion about artifact staging [2], this could enable more
        optimizations, such as merging environments or detecting
        conflicts. However, this only works if this information has been
        provided upfront to the Expansion Service. It wouldn't be
        impossible to provide these hints alongside the environment, as
        suggested in the previous paragraph.

        Any opinions? Should we allow users to optionally specify an
        environment for external transforms?

        Thanks,
        Max

        [1] https://github.com/apache/beam/pull/8639
        [2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E
