Re: Environments for External Transforms

Maximilian Michels Thu, 23 May 2019 03:46:40 -0700

Writing a new transform involves updating the expansion service to include their new transform.

Would it be conceivable that the expansion is performed via the environment? That would solve the problem of updating the expansion service, although it adds additional complexity for bringing up the environment.


On 23.05.19 11:31, Robert Bradshaw wrote:

On Wed, May 22, 2019 at 6:17 PM Maximilian Michels <m...@apache.org> wrote:
    Hi,

    Robert and I were discussing the subject of user-specified
    environments for external transforms [1]. We couldn't decide whether
    users should have direct control over the environment when they use an
    external transform in their pipeline.

    In my mind, it is quite natural that the Expansion Service is a
    long-running service that gets started with a list of available
    environments.
+1.
IMHO, the expansion service should be expected to provide valid environments for the transforms it vendors. Removing this expectation seems wrong. Making it cheap to specify non-default dependencies without building (publishing, etc.) a docker image is probably key to making this work well (and also allowing more powerful environment introspection).
    Such a list can be outdated and users may write transforms
    for a new environment they want to use in their pipeline.
This is the part that I'm having trouble following. Writing a new transform involves updating the expansion service to include their new transform. The author of a transform (in other words, the one who defines its expansion and implementation) is in the position to name its dependencies, etc. and the user of the transform (the one invoking it) is not in a generally good position to know what environments would be valid.
    The easiest
    way would be to allow passing the environment with the transform.
What this allows is using existing transforms in new environments. There are possibly some use cases for this, e.g. expansion of a given transform may be compatible with either version X or version Y of a library, left up to the discretion of the caller, but I think that this is really just a deficiency in our environment specifications (e.g. one should be able to express this flexibility in the returned environment).
    Note
    that we already give users control over the "main" environment via the
    PortablePipelineOptions, so this wouldn't be an entirely new concept.
Yes, the author of a pipeline/transform chooses the environment in which those transforms execute.
    The contrary position is that the Expansion Service should have full
    control over which environment is chosen. Going back to the discussion
    about artifact staging [2], this could enable more
    optimizations, such as merging environments or detecting conflicts.
    However, this only works if this information has been provided upfront
    to the Expansion Service. It wouldn't be impossible to provide these
    hints alongside the environment, as suggested in the previous
    paragraph.

    Any opinions? Should we allow users to optionally specify an
    environment for external transforms?

    Thanks,
    Max

    [1] https://github.com/apache/beam/pull/8639
    [2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E
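
To make the two positions in this thread concrete, here is a rough sketch of how environment selection might look under each of them. It is a toy model only, not Beam's actual API: the Environment and ExpansionService classes, the resolve_environment method, the allow_override flag, the URN and the image names are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Environment:
    """Hypothetical stand-in for a portable environment specification."""
    kind: str    # e.g. "DOCKER" or "PROCESS"
    config: str  # e.g. a container image name


class ExpansionService:
    """Toy expansion service that knows one default environment per transform.

    This is an illustration of the discussion, not Beam's expansion service.
    """

    def __init__(self, defaults: Dict[str, Environment], allow_override: bool):
        self._defaults = defaults
        self._allow_override = allow_override

    def resolve_environment(
        self, urn: str, requested: Optional[Environment] = None
    ) -> Environment:
        # The service must know the transform in order to expand it at all.
        if urn not in self._defaults:
            raise KeyError(f"unknown transform: {urn}")
        # Position 1: the caller may pass an environment with the transform.
        if requested is not None and self._allow_override:
            return requested
        # Position 2: the service keeps full control and uses its own choice.
        return self._defaults[urn]


# Hypothetical URN and image names, for illustration only.
default_env = Environment("DOCKER", "example/java-sdk:latest")
custom_env = Environment("DOCKER", "example/custom:1.0")

strict = ExpansionService({"urn:example:read": default_env}, allow_override=False)
lenient = ExpansionService({"urn:example:read": default_env}, allow_override=True)
```

In this model the strict service returns its default even when the caller passes custom_env, which is what would let it merge environments or detect conflicts up front, while the lenient service hands back whatever the caller requested. Whether a strict service should silently ignore the request or reject it is a separate design choice that the sketch does not try to settle.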
