My motivation was to get rid of the Docker dependency for the Python VR
tests. Similarly to how we use Python's LOOPBACK environment for
executing all non-cross-language tests, I wanted to use Java's EMBEDDED
environment to run the cross-language transforms.
I suppose we could also go with an override of the default environment
during startup of the Expansion Service. This would be less intrusive
and still allow us to use the embedded environment in some of the tests.
> 2(c) can also be "hacked" inside an SDK as an explicit environment
> override by the "user" where the expansion service isn't involved and
> the user/SDK manipulates the expansion service response. As Chamikara
> pointed out, I believe the response from the expansion service should
> be "safe" instead of allowing it to return broken combinations.
The responses are not safe today because we do not check whether an
expanded transform is compatible with the default environment. I
agree, though, that it is better to develop these checks first before
allowing overrides by the user.
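To make the kind of check discussed here concrete, below is a minimal
sketch of how an expansion response could be validated before it is
returned. It is not Beam code: the Environment and ExpandedTransform
classes, the find_broken_combinations function, and the idea of
comparing a single "sdk" field are all hypothetical simplifications
chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Environment:
    """Hypothetical stand-in for an environment known to the expansion service."""
    env_id: str
    sdk: str  # language the environment can execute, e.g. "java"


@dataclass(frozen=True)
class ExpandedTransform:
    """Hypothetical stand-in for one transform in an expansion response."""
    name: str
    sdk: str     # language the transform is written in
    env_id: str  # environment the response assigns to the transform


def find_broken_combinations(
    transforms: List[ExpandedTransform],
    environments: Dict[str, Environment],
) -> List[str]:
    """Return one message per transform whose assigned environment cannot run it."""
    problems = []
    for t in transforms:
        env = environments.get(t.env_id)
        if env is None:
            problems.append(f"{t.name}: unknown environment '{t.env_id}'")
        elif env.sdk != t.sdk:
            problems.append(
                f"{t.name}: needs a {t.sdk} environment, "
                f"got {env.sdk} ('{t.env_id}')"
            )
    return problems
```

An empty result would mean the response is "safe" in the sense used
above; a non-empty one would be a reason to reject the expansion (or
the override) rather than return a broken combination.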
I'll need to follow up on the artifact staging doc; then we can agree
on the first steps for implementation.
Thanks,
Max
On 22.05.19 20:13, Lukasz Cwik wrote:
2(c) can also be "hacked" inside an SDK as an explicit environment
override by the "user" where the expansion service isn't involved and
the user/SDK manipulates the expansion service response. As Chamikara
pointed out, I believe the response from the expansion service should be
"safe" instead of allowing it to return broken combinations.
On Wed, May 22, 2019 at 11:08 AM Chamikara Jayalath wrote:
On Wed, May 22, 2019 at 9:17 AM Maximilian Michels wrote:
Hi,
Robert and I were discussing user-specified environments for external
transforms [1]. We couldn't decide whether users should have direct
control over the environment when they use an external transform in
their pipeline.
In my mind, it is quite natural that the Expansion Service is a
long-running service that gets started with a list of available
environments. Such a list can be outdated, and users may write
transforms for a new environment they want to use in their pipeline.
The easiest way would be to allow users to pass the environment with
the transform. Note that we already give users control over the "main"
environment via the PortablePipelineOptions, so this wouldn't be an
entirely new concept.
I think we are trying to generalize the expansion service along
multiple axes.
(1) dependencies
    (a) dependencies embedded in an environment
    (b) dependencies specific to a transform
    (c) dependencies specified by the user expanding the transform
(2) environments
    (a) default environment
    (b) environments specified at startup of the expansion service
    (c) environments specified by the user expanding the transform
        (this proposal)
It's great if we can implement the most generic solution along all
these axes, but I think we risk ending up with broken combinations by
trying to implement this before we have the other pieces needed to
support a long-running expansion service, for example support for
dynamically registering transforms and support for discovering
transforms.
What is the need for implementing 2(c) now? If there's no real need
now, I suggest we settle on 2(a) or 2(b) until we can truly support a
long-running expansion service. Also, we'll have a better idea of how
this kind of feature should evolve when we have at least two runners
supporting cross-language transforms (we are in the process of
updating Dataflow to support this). Just my 2 cents though :)
The contrary position is that the Expansion Service should have full
control over which environment is chosen. Going back to the discussion
about artifact staging [2], this could enable more optimizations, such
as merging environments or detecting conflicts. However, this only
works if this information has been provided upfront to the Expansion
Service. It wouldn't be impossible to provide these hints alongside
the environment, as suggested in the previous paragraph.
Any opinions? Should we allow users to optionally specify an
environment for external transforms?
Thanks,
Max
[1] https://github.com/apache/beam/pull/8639
[2] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d@%3Cdev.beam.apache.org%3E