Re: Multi Environment Support

Ke Wu Wed, 06 Oct 2021 13:12:54 -0700

I have a quick follow-up question.

When using multiple external environments, is there a way to configure 
multiple external service addresses? It looks like the current PipelineOptions 
only supports specifying one external address.


Best,
Ke

> On Oct 4, 2021, at 4:12 PM, Ke Wu <[email protected]> wrote:
> 
> This is great, let me try it out.
> 
> Best,
> Ke
> 
>> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <[email protected]> wrote:
>> 
>> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <[email protected]> wrote:
>>> 
>>> I am able to annotate/mark a java transform by setting its resource hints 
>>> [1] as well, which resulted in a different environment id, e.g.
>>> 
>>> beam:env:docker:v1 VS beam:env:docker:v11
>>> 
>>> Is this on the right track?
>> 
>> Yep.
>> 
>>> If yes, I suppose I then need to configure the job bundle factory to 
>>> understand multiple environments and configure them separately as well.
>> 
>> It should already do the right thing here. That's how multi-language works.
>> 
>>> [1] 
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>>> 
>>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <[email protected]> wrote:
>>> 
>>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <[email protected]> wrote:
>>> 
>>> 
>>> Ideally, we do not want to expose anything directly to users and we, as the 
>>> framework and platform provider, separate things out under the hood.
>>> 
>>> I would expect users to author their DoFn(s) in the same way as they do 
>>> right now, but we expect to change the DoFn(s) that we provide so that they 
>>> are annotated/marked and can be recognized at runtime.
>>> 
>>> In our use case, the application is executed in a Kubernetes environment; 
>>> therefore, we expect to use different docker images directly to isolate 
>>> dependencies.
>>> 
>>> e.g. we have docker image A, which is Beam core and is used to start the job 
>>> server and runner process. We have docker image B, which contains the DoFn(s) 
>>> that the platform provides and serves as an external worker pool service to 
>>> execute those platform-provided DoFn(s). Last but not least, users would have 
>>> their own docker image representing their application, which will be used to 
>>> start the external worker pool service that handles their own UDF execution.
>>> 
>>> Does this make sense?
>>> 
>>> 
>>> In Python it's pretty trivial to annotate transforms (e.g. the
>>> "platform" transforms) which could be used to mark their environments
>>> prior to optimization (e.g. fusion). As mentioned, you could use
>>> resource hints (even a "dummy" hint like
>>> "use_platform_environment=True") to force these into a separate docker
>>> image as well.
>>> 
>>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <[email protected]> wrote:
>>> 
>>> That sounds neat. I think that before you try to figure out how to change 
>>> Beam to fit this use case, you should think about what would be the best way 
>>> for users to specify these requirements when they are constructing the 
>>> pipeline. Once you have some samples that you could share, the community 
>>> would probably be able to give you more pointed advice.
>>> For example, will they be running one application with a complicated class 
>>> loader setup? If so, then we could probably do away with multiple 
>>> environments and try to have DoFns recognize their specific class loader 
>>> configuration and replicate it on the SDK harness side.
>>> 
>>> Also, for performance reasons users may want to resolve their dependency 
>>> issues to create a maximally fused graph to limit performance impact due to 
>>> the encoding/decoding boundaries at the edges of those fused graphs.
>>> 
>>> Finally, this could definitely apply to languages like Python and Go (now 
>>> that Go has support for modules) as dependency issues are a common problem.
>>> 
>>> 
>>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <[email protected]> wrote:
>>> 
>>> 
>>> Thanks for the advice.
>>> 
>>> Here is some more background:
>>> 
>>> We are building a feature called “split deployment” so that we can 
>>> isolate the framework/platform core from user code/dependencies to address 
>>> a couple of operational challenges, such as dependency conflicts and 
>>> alert/exception triaging.
>>> 
>>> With Beam’s portability framework, the runner and SDK worker processes 
>>> naturally decouple Beam core from user UDFs (DoFns), which is awesome! On top 
>>> of this, we could further distinguish DoFn(s) that the end user authors from 
>>> DoFn(s) that the platform provides; therefore, we would like these DoFn(s) to 
>>> be executed in different environments, even in the same language, e.g. Java.
>>> 
>>> Therefore, I am exploring approaches and looking for recommendations on the 
>>> proper way to do that.
>>> 
>>> Let me know your thoughts, any feedback/advice is welcome.
>>> 
>>> Best,
>>> Ke
>>> 
>>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <[email protected]> wrote:
>>> 
>>> Resource hints have a limited use case and might fit your need.
>>> You could also try to use the expansion service XLang route to bring in a 
>>> different Java environment.
>>> Finally, you could modify the pipeline proto that is generated directly to 
>>> choose which environment is used for which PTransform.
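
The third option above can be pictured with a small stand-in for the pipeline proto. This is a minimal sketch in plain Python, not Beam's API: the dictionaries only mimic the `environments` map and the per-transform `environment_id` field of the portable pipeline representation, and the transform names, image names, and the `assign_environment` helper are all hypothetical.

```python
import copy

# Toy stand-in for a portable pipeline's components (assumption: plain dicts,
# not the real protobuf classes). All names below are made up.
components = {
    "environments": {
        "env_user": {"urn": "beam:env:docker:v1", "image": "user-app:latest"},
    },
    "transforms": {
        "ReadEvents": {"environment_id": "env_user"},
        "PlatformEnrich": {"environment_id": "env_user"},
        "UserParDo": {"environment_id": "env_user"},
    },
}


def assign_environment(components, transform_names, env_id, env_spec):
    """Return a copy in which the named transforms point at env_id."""
    updated = copy.deepcopy(components)
    # Register the new environment alongside the existing ones.
    updated["environments"][env_id] = env_spec
    # Re-point only the selected transforms; the rest keep their environment.
    for name in transform_names:
        updated["transforms"][name]["environment_id"] = env_id
    return updated


rewritten = assign_environment(
    components,
    ["ReadEvents", "PlatformEnrich"],
    "env_platform",
    {"urn": "beam:env:docker:v1", "image": "platform-dofns:latest"},
)
```

In this sketch the rewrite happens on a copy, so the original components stay untouched; whether a real pipeline rewrite should be done in place or on a copy is a separate design choice.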
>>> 
>>> Can you provide additional details as to why you would want to have two 
>>> separate java environments (e.g. incompatible versions of libraries)?
>>> 
>>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <[email protected]> wrote:
>>> 
>>> 
>>> Thanks, Luke, for the reply. Do you know the preferred way to configure a 
>>> PTransform to be executed in a different environment from another 
>>> PTransform when both are in the same SDK, e.g. Java?
>>> 
>>> Best,
>>> Ke
>>> 
>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <[email protected]> wrote:
>>> 
>>> Environments that aren't exactly the same are already in separate 
>>> ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>> 
>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also 
>>> be effectively dead code.
>>> 
>>> 1: 
>>> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
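
To make the point about ExecutableStages concrete, here is a toy model in plain Python. It is not the GreedyPCollectionFusers logic, which considers more than the environment; it only illustrates that a linear chain of transforms splits into a new stage wherever the environment id changes. The transform names and environment ids are hypothetical.

```python
from itertools import groupby

# (transform name, environment id) in pipeline order; all names are made up.
chain = [
    ("PlatformRead", "env_platform"),
    ("PlatformDecode", "env_platform"),
    ("UserMap", "env_user"),
    ("UserFilter", "env_user"),
    ("PlatformWrite", "env_platform"),
]


def split_into_stages(chain):
    """Group consecutive transforms that share an environment id."""
    return [
        (env_id, [name for name, _ in group])
        for env_id, group in groupby(chain, key=lambda step: step[1])
    ]


stages = split_into_stages(chain)
for env_id, names in stages:
    print(env_id, names)
```

With this input the chain falls into three stages, because the pipeline returns to the platform environment after the user transforms. Each boundary between stages implies an encoding/decoding hop, so fewer environment switches means a more fused graph.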
>>> 
>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <[email protected]> wrote:
>>> 
>>> 
>>> Hello All,
>>> 
>>> We have a use case where, in a Java portable pipeline, we would like to have 
>>> multiple environments set up so that some executable stages run in one 
>>> environment while other executable stages run in another environment. 
>>> A couple of questions on this:
>>> 
>>> 1. Is this currently supported? I noticed a TODO in [1] which suggests it is 
>>> a feature pending support.
>>> 2. If we did support it, what would be the ideal mechanism to distinguish 
>>> ParDos/ExecutableStages to be executed in different environments? Is it 
>>> through ResourceHints?
>>> 
>>> 
>>> Best,
>>> Ke
>>> 
>>> 
>>> [1] 
>>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>> 
>>> 
>
