I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
beam:env:docker:v1 VS beam:env:docker:v11 Is this on the right track? If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well. [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218 <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218> > On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <rober...@google.com> wrote: > > On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke.wu...@gmail.com > <mailto:ke.wu...@gmail.com>> wrote: >> >> Ideally, we do not want to expose anything directly to users and we, as the >> framework and platform provider, separate things out under the hood. >> >> I would expect users to author their DoFn(s) in the same way as they do >> right now, but we expect to change the DoFn(s) that we provide, will be >> annotated/marked so that it can be recognized during runtime. >> >> In our use case, application is executed in Kubernetes environment >> therefore, we are expecting to directly use different docker image to >> isolate dependencies. >> >> e.g. we have docker image A, which is beam core, that is used to start job >> server and runner process. We have a docker image B, which contains DoFn(s) >> that platform provides to serve as a external worker pool service to execute >> platform provided DoFn(s), last but not least, users would have their own >> docker image represent their application, which will be used to start the >> external worker pool service to handle their own UDF execution. >> >> Does this make sense ? > > In Python it's pretty trivial to annotate transforms (e.g. the > "platform" transforms) which could be used to mark their environments > prior to optimization (e.g. fusion). As mentioned, you could use > resource hints (even a "dummy" hint like > "use_platform_environment=True") to force these into a separate docker > image as well. > >> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote: >> >> That sounds neat. I think that before you try to figure out how to change >> Beam to fit this usecase is to think about what would be the best way for >> users to specify these requirements when they are constructing the pipeline. >> Once you have some samples that you could share the community would probably >> be able to give you more pointed advice. >> For example will they be running one application with a complicated class >> loader setup, if so then we could probably do away with multiple >> environments and try to have DoFn's recognize their specific class loader >> configuration and replicate it on the SDK harness side. >> >> Also, for performance reasons users may want to resolve their dependency >> issues to create a maximally fused graph to limit performance impact due to >> the encoding/decoding boundaries at the edges of those fused graphs. >> >> Finally, this could definitely apply to languages like Python and Go (now >> that Go has support for modules) as dependency issues are a common problem. >> >> >> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke.wu...@gmail.com> wrote: >>> >>> Thanks for the advice. >>> >>> Here are some more background: >>> >>> We are building a feature called “split deployment” such that, we can >>> isolate framework/platform core from user code/dependencies to address >>> couple of operational challenges such as dependency conflict, >>> alert/exception triaging. >>> >>> With Beam’s portability framework, runner and sdk worker process naturally >>> decouples beam core and user UDFs(DoFn), which is awesome! On top of this, >>> we could further distinguish DoFn(s) that end user authors from DoFn(s) >>> that platform provides, therefore, we would like these DoFn(s) to be >>> executed in different environments, even in the same language, e.g. Java. >>> >>> Therefore, I am exploring approaches and recommendations what are the >>> proper way to do that. >>> >>> Let me know your thoughts, any feedback/advice is welcome. >>> >>> Best, >>> Ke >>> >>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote: >>> >>> Resource hints have a limited use case and might fit your need. >>> You could also try to use the expansion service XLang route to bring in a >>> different Java environment. >>> Finally, you could modify the pipeline proto that is generated directly to >>> choose which environment is used for which PTransform. >>> >>> Can you provide additional details as to why you would want to have two >>> separate java environments (e.g. incompatible versions of libraries)? >>> >>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke.wu...@gmail.com> wrote: >>>> >>>> Thanks Luke for the reply, do you know what is the preferred way to >>>> configure a PTransform to be executed in a different environment from >>>> another PTransform when both are in the same SDK, e.g. Java ? >>>> >>>> Best, >>>> Ke >>>> >>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote: >>>> >>>> Environments that aren't exactly the same are already in separate >>>> ExecutableStages. The GreedyPCollectionFuser ensures that today[1]. >>>> >>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may >>>> also be effectively dead-code. >>>> >>>> 1: >>>> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144 >>>> >>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke.wu...@gmail.com> wrote: >>>>> >>>>> Hello All, >>>>> >>>>> We have a use case where in a java portable pipeline, we would like to >>>>> have multiple environments setup in order that some executable stage runs >>>>> in one environment while some other executable stages runs in another >>>>> environment. Couple of questions on this: >>>>> >>>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it >>>>> is feature pending support >>>>> 2. If we did support it, what would the ideal mechanism to distinguish >>>>> ParDo/ExecutableStage to be executed in different environment, is it >>>>> through ResourceHints? >>>>> >>>>> >>>>> Best, >>>>> Ke >>>>> >>>>> >>>>> [1] >>>>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344