Hi,
Robert and I have been discussing user-specified environments for
external transforms [1]. We couldn't decide whether
users should have direct control over the environment when they use an
external transform in their pipeline.
In my mind, it is quite natural that the Expansion Service is a
long-running service that gets started with a list of available
environments. Such a list can be outdated and users may write transforms
for a new environment they want to use in their pipeline. The easiest
way would be to let users pass the environment with the transform. Note
that we already give users control over the "main" environment via the
PortablePipelineOptions, so this wouldn't be an entirely new concept.
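To make the first position concrete, here is a minimal sketch of the
lookup I have in mind. It does not use any Beam API: Environment,
ExpansionService, resolve() and the URNs and image names are all
hypothetical stand-ins, and I am assuming the service keys its
environment list by transform URN.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Environment:
    # Hypothetical stand-in for an environment description,
    # e.g. a container image name.
    container_image: str


class ExpansionService:
    """Hypothetical long-running service started with a fixed environment list."""

    def __init__(self, known: Dict[str, Environment]):
        self._known = dict(known)

    def resolve(self, transform_urn: str,
                user_env: Optional[Environment] = None) -> Environment:
        # A user-supplied environment takes precedence when present.
        if user_env is not None:
            return user_env
        # Otherwise fall back to the list the service was started with.
        try:
            return self._known[transform_urn]
        except KeyError:
            raise LookupError(f"no environment known for {transform_urn}")


service = ExpansionService({"urn:example:old": Environment("example/old:1.0")})
default_env = service.resolve("urn:example:old")
custom_env = service.resolve("urn:example:new", Environment("example/new:2.0"))
```

Without the optional argument, the second call would fail because the
service's list predates the new transform.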
The contrary position is that the Expansion Service should have full
control over which environment is chosen. Going back to the discussion
about artifact staging [2], this could enable more optimizations, such
as merging environments or detecting conflicts.
However, this only works if this information has been provided upfront
to the Expansion Service. It wouldn't be impossible to provide these
hints alongside the environment, as suggested in the previous
paragraph.
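As a rough illustration of what such hints could enable, the sketch
below merges per-transform hints and reports a conflict. The hint
format (dependency name mapped to a pinned version), merge_hints() and
EnvironmentConflict are assumptions made up for this example; nothing
like this exists in Beam today as far as this thread is concerned.

```python
from typing import Dict, Iterable

# Hypothetical hint format: dependency name -> pinned version.
Hints = Dict[str, str]


class EnvironmentConflict(Exception):
    """Raised when two transforms pin different versions of a dependency."""


def merge_hints(all_hints: Iterable[Hints]) -> Hints:
    merged: Hints = {}
    for hints in all_hints:
        for name, version in hints.items():
            # Keep the first pin seen; any later, different pin is a conflict.
            existing = merged.setdefault(name, version)
            if existing != version:
                raise EnvironmentConflict(
                    f"{name}: {existing} conflicts with {version}")
    return merged


merged = merge_hints([
    {"numpy": "1.16"},
    {"numpy": "1.16", "pandas": "0.24"},
])
```

Two transforms whose hints agree could then share one environment,
while disagreeing pins would be reported at expansion time rather than
at runtime.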
Any opinions? Should we allow users to optionally specify an environment
for external transforms?
Thanks,
Max
[1] https://github.com/apache/beam/pull/8639
[2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E