Re: Multi Environment Support

Ke Wu Wed, 15 Dec 2021 13:46:34 -0800

Thanks for the pointer! I missed the part that pipeline’s options is copied 
from expansion options!


Then, I think I just need a way to disable the auto creation of expansion 
service during job server start up.


> On Dec 15, 2021, at 11:29 AM, Chamikara Jayalath <[email protected]> wrote:
> 
> 
> 
> On Wed, Dec 15, 2021 at 10:38 AM Ke Wu <[email protected] 
> <mailto:[email protected]>> wrote:
> Thanks for Cham to chime in, having each expansion service instance to be 
> able to serve a distinct environment which is configurable is what I am 
> looking for, however, I don’t think it is practical atm.
> 
> The pipeline option it uses to register environment is NOT the pipeline 
> option of expansion service but the option of the pipeline that is created 
> with default, i.e. not configurable.
> 
> 
> AFAICT we get the options object to register the environment from the 
> Pipeline  here: 
> https://github.com/apache/beam/blob/bc82f6f84e72c520be397977edf4bf645782ac8f/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L527
>  
> <https://github.com/apache/beam/blob/bc82f6f84e72c520be397977edf4bf645782ac8f/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L527>
> 
> We set a copy of the input options passed to the expansion service in the 
> pipeline object here: 
> https://github.com/apache/beam/blob/bc82f6f84e72c520be397977edf4bf645782ac8f/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L583
>  
> <https://github.com/apache/beam/blob/bc82f6f84e72c520be397977edf4bf645782ac8f/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L583>
> 
> So probably you just need to make sure the options you need for configuring 
> the environment are copied over when creating "effectiveOpts". Which specific 
> options are you trying to use (assuming Java expansion service) ?
> 
> Thanks,
> Cham
> 
> 
>> On Dec 14, 2021, at 8:49 PM, Chamikara Jayalath <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> 
>> 
>> On Mon, Dec 13, 2021 at 11:12 AM Ke Wu <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Agreed, that is why I was hesitant to complicate ServerConfiguration such 
>> that it can configure the expansion service it brings up.
>> 
>> In this case, do you think we may just add one more option to disable the 
>> expansion service setup so for complicated use cases, expansion service can 
>> be brought up manually with desired options?
>> 
>> Another question is, if we update java Expansion service to use its own 
>> option instead of blindly uses pipeline’s option [1], will this cause any 
>> backward compatibility concern?
>> 
>> I haven't fully followed the thread but this indeed will be a backwards 
>> incompatible change. I believe currently we assume each expansion service 
>> instance to serve a distinct environment. Also we artificially limit the set 
>> of PipelineOptions that apply performing the expansion [1] since arbitrary 
>> PipelineOptions specified through the ExpansionService do not get propagated 
>> to the SDK that submits the pipeline to the runner. But we can expand this 
>> list as needed so that the environment of each expansion service is fully 
>> configurable through PipelineOptions
>> 
>> Thanks,
>> Cham
>> 
>> [1] 
>> https://github.com/apache/beam/blob/f9b0d6d6290436ff2a357fcebff8a281cc383025/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L572
>>  
>> <https://github.com/apache/beam/blob/f9b0d6d6290436ff2a357fcebff8a281cc383025/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L572>
>> 
>> 
>> Best,
>> Ke
>> 
>> [1] 
>> https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L528
>>  
>> <https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L528>
>> 
>> pipeline.getOptions() -> this.pipelineOptions
>> 
>>> On Dec 13, 2021, at 10:40 AM, Robert Bradshaw <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> I think bringing up the expansion service as part of the job server is
>>> just a convenience for the simple case.
>>> 
>>> On Mon, Dec 13, 2021 at 9:54 AM Ke Wu <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> Awesome, I am happy to make the update.
>>>> 
>>>> One more question, I noticed that expansion service is currently being 
>>>> brought up as part of job server in Java [1], if this is the preferred 
>>>> approach, does it mean, we should update ServerConfiguration to include 
>>>> such configurations, like expansion server port [2]
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> [1] 
>>>> https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58
>>>>  
>>>> <https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58>
>>>> [2] 
>>>> https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88
>>>>  
>>>> <https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88>
>>>> 
>>>> 
>>>> On Dec 9, 2021, at 4:34 PM, Robert Bradshaw <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> An expansion service should be able to specify its own environment, as
>>>> is done in Python, not blindly inherit it from the caller. Java should
>>>> be updated to have the same flexibility.
>>>> 
>>>> On Thu, Dec 9, 2021 at 4:27 PM Ke Wu <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> 
>>>> Hi Robert,
>>>> 
>>>> After some more digging and exploration, I realized that it is not clean 
>>>> and straightforward to update PipelineOption to support such.
>>>> 
>>>> Therefore, I spent more time exploring the external transform & expansion 
>>>> service route. However, I noticed that this approach has the same 
>>>> limitation where we cannot configure environments ourselves.
>>>> 
>>>> This is because when ExpansionService register environment [1], it uses 
>>>> the pipeline options of the original pipeline. This limitation not only 
>>>> restrict our capability to configure a different external environment in 
>>>> external mode but also restrict Docker mode to specify docker image for 
>>>> different SDKs but use defaults.
>>>> 
>>>> Is this expected or do you think ExpansionService should be updated to be 
>>>> configurable of the environment it registers ?
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> 
>>>> [1]
>>>> 
>>>> Java: 
>>>> https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L526
>>>>  
>>>> <https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L526>
>>>> 
>>>> Python:
>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service.py#L39
>>>>  
>>>> <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service.py#L39>
>>>> 
>>>> 
>>>> 
>>>> On Nov 17, 2021, at 1:02 PM, Robert Bradshaw <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> Could you give a concrete example of what such a specification would look 
>>>> like?
>>>> 
>>>> On Wed, Nov 17, 2021 at 11:05 AM Ke Wu <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> 
>>>> Hi Robert,
>>>> 
>>>> Thanks for the pointer, using expansion service hack seems to work!
>>>> 
>>>> On the other hand, since PipelineOptions is the place to configure 
>>>> external service address anyway, do you think it makes sense to expand it 
>>>> so it is capable of specifying multiple external environment to external 
>>>> service address mapping?
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> On Oct 6, 2021, at 2:09 PM, Robert Bradshaw <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> 
>>>> I have a quick follow up questions.
>>>> 
>>>> When using multiple external environments, is there a way to configure the 
>>>> multiple external service address? It looks like the current 
>>>> PipelineOptions only supports specifying one external address.
>>>> 
>>>> 
>>>> PipelineOptions wasn't really built with the idea of multiple distinct
>>>> environments in mind. One hack you could do is put one of the
>>>> environments in an expansion service with its own environment (as if
>>>> it were written in a different language) and configure that
>>>> environment separately.
>>>> 
>>>> On Oct 4, 2021, at 4:12 PM, Ke Wu <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> This is great, let me try it out.
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> 
>>>> I am able to annotate/mark a java transform by setting its resource hints 
>>>> [1] as well, which resulted in a different environment id, e.g.
>>>> 
>>>> beam:env:docker:v1 VS beam:env:docker:v11
>>>> 
>>>> Is this on the right track?
>>>> 
>>>> 
>>>> Yep.
>>>> 
>>>> If Yes, I suppose then I need to configure job bundle factory to be able 
>>>> to understand multiple environments and configure them separately as well.
>>>> 
>>>> 
>>>> It should already do the right thing here. That's how multi-language works.
>>>> 
>>>> [1] 
>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>>>>  
>>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218>
>>>> 
>>>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> 
>>>> Ideally, we do not want to expose anything directly to users and we, as 
>>>> the framework and platform provider, separate things out under the hood.
>>>> 
>>>> I would expect users to author their DoFn(s) in the same way as they do 
>>>> right now, but we expect to change the DoFn(s) that we provide, will be 
>>>> annotated/marked so that it can be recognized during runtime.
>>>> 
>>>> In our use case, application is executed in Kubernetes environment 
>>>> therefore, we are expecting to directly use different docker image to 
>>>> isolate dependencies.
>>>> 
>>>> e.g. we have docker image A, which is beam core, that is used to start job 
>>>> server and runner process. We have a docker image B, which contains 
>>>> DoFn(s) that platform provides to serve as a external worker pool service 
>>>> to execute platform provided DoFn(s), last but not least, users would have 
>>>> their own docker image represent their application, which will be used to 
>>>> start the external worker pool service to handle their own UDF execution.
>>>> 
>>>> Does this make sense ?
>>>> 
>>>> 
>>>> In Python it's pretty trivial to annotate transforms (e.g. the
>>>> "platform" transforms) which could be used to mark their environments
>>>> prior to optimization (e.g. fusion). As mentioned, you could use
>>>> resource hints (even a "dummy" hint like
>>>> "use_platform_environment=True") to force these into a separate docker
>>>> image as well.
>>>> 
>>>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> That sounds neat. I think that before you try to figure out how to change 
>>>> Beam to fit this usecase is to think about what would be the best way for 
>>>> users to specify these requirements when they are constructing the 
>>>> pipeline. Once you have some samples that you could share the community 
>>>> would probably be able to give you more pointed advice.
>>>> For example will they be running one application with a complicated class 
>>>> loader setup, if so then we could probably do away with multiple 
>>>> environments and try to have DoFn's recognize their specific class loader 
>>>> configuration and replicate it on the SDK harness side.
>>>> 
>>>> Also, for performance reasons users may want to resolve their dependency 
>>>> issues to create a maximally fused graph to limit performance impact due 
>>>> to the encoding/decoding boundaries at the edges of those fused graphs.
>>>> 
>>>> Finally, this could definitely apply to languages like Python and Go (now 
>>>> that Go has support for modules) as dependency issues are a common problem.
>>>> 
>>>> 
>>>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> 
>>>> Thanks for the advice.
>>>> 
>>>> Here are some more background:
>>>> 
>>>> We are building a feature called “split deployment” such that, we can 
>>>> isolate framework/platform core from user code/dependencies to address 
>>>> couple of operational challenges such as dependency conflict, 
>>>> alert/exception triaging.
>>>> 
>>>> With Beam’s portability framework, runner and sdk worker process naturally 
>>>> decouples beam core and user UDFs(DoFn), which is awesome! On top of this, 
>>>> we could further distinguish DoFn(s) that end user authors from DoFn(s) 
>>>> that platform provides, therefore, we would like these DoFn(s) to be 
>>>> executed in different environments, even in the same language, e.g. Java.
>>>> 
>>>> Therefore, I am exploring approaches and recommendations what are the 
>>>> proper way to do that.
>>>> 
>>>> Let me know your thoughts, any feedback/advice is welcome.
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> Resource hints have a limited use case and might fit your need.
>>>> You could also try to use the expansion service XLang route to bring in a 
>>>> different Java environment.
>>>> Finally, you could modify the pipeline proto that is generated directly to 
>>>> choose which environment is used for which PTransform.
>>>> 
>>>> Can you provide additional details as to why you would want to have two 
>>>> separate java environments (e.g. incompatible versions of libraries)?
>>>> 
>>>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> 
>>>> Thanks Luke for the reply, do you know what is the preferred way to 
>>>> configure a PTransform to be executed in a different environment from 
>>>> another PTransform when both are in the same SDK, e.g. Java ?
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> Environments that aren't exactly the same are already in separate 
>>>> ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>>> 
>>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may 
>>>> also be effectively dead-code.
>>>> 
>>>> 1: 
>>>> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>>>>  
>>>> <https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144>
>>>> 
>>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> 
>>>> Hello All,
>>>> 
>>>> We have a use case where in a java portable pipeline, we would like to 
>>>> have multiple environments setup in order that some executable stage runs 
>>>> in one environment while some other executable stages runs in another 
>>>> environment. Couple of questions on this:
>>>> 
>>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is 
>>>> feature pending support
>>>> 2. If we did support it, what would the ideal mechanism to distinguish 
>>>> ParDo/ExecutableStage to be executed in different environment, is it 
>>>> through ResourceHints?
>>>> 
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> 
>>>> [1] 
>>>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>>>  
>>>> <https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344>
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>

Re: Multi Environment Support

Reply via email to