Re: Multi Environment Support

Ke Wu Wed, 17 Nov 2021 11:05:38 -0800

Hi Robert,

Thanks for the pointer; the expansion service hack seems to work!


On the other hand, since PipelineOptions is the place to configure the external 
service address anyway, do you think it makes sense to extend it so that it can 
specify a mapping from multiple external environments to external service 
addresses?
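
For illustration only, here is a minimal sketch of what such a mapping might 
look like. The "env_id=host:port" format and the function name 
parse_environment_addresses are hypothetical, not an existing Beam option or 
API; the environment ids and addresses below are made-up examples.

```python
from typing import Dict


def parse_environment_addresses(spec: str) -> Dict[str, str]:
    """Parse a hypothetical 'env_id=host:port,env_id=host:port' option value.

    This is NOT an existing Beam option; it only illustrates the idea of
    mapping each external environment id to its own service address.
    """
    mapping: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and empty segments
        # Split on the first '=' only; environment ids may contain ':'.
        env_id, sep, address = entry.partition("=")
        env_id, address = env_id.strip(), address.strip()
        if not sep or not env_id or not address:
            raise ValueError(f"Malformed entry: {entry!r}")
        if env_id in mapping:
            raise ValueError(f"Duplicate environment id: {env_id!r}")
        mapping[env_id] = address
    return mapping


if __name__ == "__main__":
    # Hypothetical value: one address per external environment.
    option_value = "platform_env=localhost:50000, user_env=localhost:50001"
    for env_id, address in parse_environment_addresses(option_value).items():
        print(f"{env_id} -> {address}")
```

A runner could then look up the address by the environment id attached to each 
executable stage, instead of assuming a single external address.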

Best,
Ke

> On Oct 6, 2021, at 2:09 PM, Robert Bradshaw <rober...@google.com> wrote:
> 
> On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <ke.wu...@gmail.com> wrote:
>> 
>> I have a quick follow-up question.
>> 
>> When using multiple external environments, is there a way to configure 
>> multiple external service addresses? It looks like the current PipelineOptions 
>> only supports specifying one external address.
> 
> PipelineOptions wasn't really built with the idea of multiple distinct
> environments in mind. One hack you could do is put one of the
> environments in an expansion service with its own environment (as if
> it were written in a different language) and configure that
> environment separately.
> 
>>> On Oct 4, 2021, at 4:12 PM, Ke Wu <ke.wu...@gmail.com> wrote:
>>> 
>>> This is great, let me try it out.
>>> 
>>> Best,
>>> Ke
>>> 
>>>> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <rober...@google.com> wrote:
>>>> 
>>>> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke.wu...@gmail.com> wrote:
>>>>> 
>>>>> I am able to annotate/mark a Java transform by setting its resource hints 
>>>>> [1] as well, which resulted in a different environment id, e.g.
>>>>> 
>>>>> beam:env:docker:v1 VS beam:env:docker:v11
>>>>> 
>>>>> Is this on the right track?
>>>> 
>>>> Yep.
>>>> 
>>>>> If yes, I suppose I then need to configure the job bundle factory to 
>>>>> understand multiple environments and configure them separately as well.
>>>> 
>>>> It should already do the right thing here. That's how multi-language works.
>>>> 
>>>>> [1] 
>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>>>>> 
>>>>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <rober...@google.com> wrote:
>>>>> 
>>>>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke.wu...@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> Ideally, we do not want to expose anything directly to users and we, as 
>>>>> the framework and platform provider, separate things out under the hood.
>>>>> 
>>>>> I would expect users to author their DoFn(s) in the same way as they do 
>>>>> right now, but the DoFn(s) that we provide will be annotated/marked so 
>>>>> that they can be recognized at runtime.
>>>>> 
>>>>> In our use case, the application is executed in a Kubernetes environment, 
>>>>> so we expect to directly use different docker images to isolate 
>>>>> dependencies.
>>>>> 
>>>>> For example, we have docker image A, containing Beam core, which is used 
>>>>> to start the job server and runner process. We have docker image B, which 
>>>>> contains the DoFn(s) that the platform provides and serves as an external 
>>>>> worker pool service to execute them. Last but not least, users would have 
>>>>> their own docker image representing their application, which will be used 
>>>>> to start the external worker pool service that handles their own UDF 
>>>>> execution.
>>>>> 
>>>>> Does this make sense?
>>>>> 
>>>>> 
>>>>> In Python it's pretty trivial to annotate transforms (e.g. the
>>>>> "platform" transforms) which could be used to mark their environments
>>>>> prior to optimization (e.g. fusion). As mentioned, you could use
>>>>> resource hints (even a "dummy" hint like
>>>>> "use_platform_environment=True") to force these into a separate docker
>>>>> image as well.
>>>>> 
>>>>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>>>>> 
>>>>> That sounds neat. I think that before you try to figure out how to change 
>>>>> Beam to fit this use case, you should think about what would be the best 
>>>>> way for users to specify these requirements when they are constructing the 
>>>>> pipeline. Once you have some samples that you could share, the community 
>>>>> would probably be able to give you more pointed advice.
>>>>> For example, will they be running one application with a complicated class 
>>>>> loader setup? If so, then we could probably do away with multiple 
>>>>> environments and try to have DoFns recognize their specific class loader 
>>>>> configuration and replicate it on the SDK harness side.
>>>>> 
>>>>> Also, for performance reasons users may want to resolve their dependency 
>>>>> issues to create a maximally fused graph to limit performance impact due 
>>>>> to the encoding/decoding boundaries at the edges of those fused graphs.
>>>>> 
>>>>> Finally, this could definitely apply to languages like Python and Go (now 
>>>>> that Go has support for modules) as dependency issues are a common 
>>>>> problem.
>>>>> 
>>>>> 
>>>>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke.wu...@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> Thanks for the advice.
>>>>> 
>>>>> Here is some more background:
>>>>> 
>>>>> We are building a feature called “split deployment” so that we can 
>>>>> isolate the framework/platform core from user code/dependencies to address 
>>>>> a couple of operational challenges, such as dependency conflicts and 
>>>>> alert/exception triaging.
>>>>> 
>>>>> With Beam’s portability framework, the runner and SDK worker processes 
>>>>> naturally decouple Beam core from user UDFs (DoFns), which is awesome! On 
>>>>> top of this, we could further distinguish DoFn(s) that end users author 
>>>>> from DoFn(s) that the platform provides. We would therefore like these 
>>>>> DoFn(s) to be executed in different environments, even within the same 
>>>>> language, e.g. Java.
>>>>> 
>>>>> Therefore, I am exploring approaches and looking for recommendations on 
>>>>> the proper way to do that.
>>>>> 
>>>>> Let me know your thoughts, any feedback/advice is welcome.
>>>>> 
>>>>> Best,
>>>>> Ke
>>>>> 
>>>>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>>>>> 
>>>>> Resource hints have a limited use case and might fit your need.
>>>>> You could also try to use the expansion service XLang route to bring in a 
>>>>> different Java environment.
>>>>> Finally, you could modify the pipeline proto that is generated directly 
>>>>> to choose which environment is used for which PTransform.
>>>>> 
>>>>> Can you provide additional details as to why you would want to have two 
>>>>> separate java environments (e.g. incompatible versions of libraries)?
>>>>> 
>>>>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke.wu...@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> Thanks Luke for the reply. Do you know what the preferred way is to 
>>>>> configure a PTransform to be executed in a different environment from 
>>>>> another PTransform when both are in the same SDK, e.g. Java?
>>>>> 
>>>>> Best,
>>>>> Ke
>>>>> 
>>>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>>>>> 
>>>>> Environments that aren't exactly the same are already in separate 
>>>>> ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>>>> 
>>>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may 
>>>>> also be effectively dead code.
>>>>> 
>>>>> 1: 
>>>>> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>>>>> 
>>>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke.wu...@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> Hello All,
>>>>> 
>>>>> We have a use case where, in a Java portable pipeline, we would like to 
>>>>> have multiple environments set up so that some executable stages run 
>>>>> in one environment while other executable stages run in another 
>>>>> environment. A couple of questions on this:
>>>>> 
>>>>> 1. Is this currently supported? I noticed a TODO in [1] which suggests it 
>>>>> is a feature pending support.
>>>>> 2. If we did support it, what would be the ideal mechanism to distinguish 
>>>>> ParDos/ExecutableStages to be executed in different environments? Is it 
>>>>> through ResourceHints?
>>>>> 
>>>>> 
>>>>> Best,
>>>>> Ke
>>>>> 
>>>>> 
>>>>> [1] 
>>>>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>>>> 
>>>>> 
>>> 
>>
