> > "Environment" has become almost-but-not-quite assumed to be a docker > container image specification. TBH I haven't dug in deeply recently but > this is how I see it discussed. I am aware that there are LOOPBACK and > EMBEDDED environments as well.
Docker container images only cover the software deps part of the "Environment", right? A runner still needs to figure out hardware requirements for provisioning workers based on the Environment as well. I'm not sure how that could be inferred from a docker URL.

> In such a regime (where so much is hidden inside a docker URL) compatibility pretty much has to be runner-determined. First the runner reinterprets docker URLs (what aspects are important is partially runner-specific) and deps and in Joey's case licenses more abstractly, then has its own logic to decide which environments it can merge. This logic is inherently cross-SDK, both in terms of versions and languages. We could have a spec in docs and proto but it can't live as code in a particular SDK.

I think the spec may also need to accommodate user-configurable merge strategies (rough sketch at the bottom of this mail). For example, a GPU-rich user may be happy with greedy fusion, while a user with few GPUs would want only GPU-requiring transforms running on their GPU nodes.

On Tue, Jul 15, 2025 at 4:42 PM Kenneth Knowles <k...@apache.org> wrote:

> Altered subject because this has come up in a number of contexts and I wonder if now is a time to have fresh thoughts on it.
>
> "Environment" has become almost-but-not-quite assumed to be a docker container image specification. TBH I haven't dug in deeply recently but this is how I see it discussed. I am aware that there are LOOPBACK and EMBEDDED environments as well.
>
> The intention behind having environments is really that they could be somewhat abstract and *often* recognized by runners and elided for efficiency. It is an anti-goal to have the entirety of the contents of a container be "the spec" for an environment. This is begging to be trapped by Hyrum's Law.
>
> All that said, we are where we are. My presumption, for a while now, is that runners would have to recognize and/or parse docker images and *reinterpret* them as abstract specifications of environments. In other words the URL for the Beam Java SDK harness container (plus deps) would be reinterpreted as "default Beam Java SDK harness" and the runner would then run in any compatible way it desired.
>
> For example it is *intended* that non-portable runners could be seamlessly reused after verifying that all environments are compatible with their non-portable execution style. The Flink/Spark/Samza runner executing an all-Java pipeline via the portable gRPC protocols is a huge missed opportunity, just throwing existing functionality and performance away.
>
> In such a regime (where so much is hidden inside a docker URL) compatibility pretty much has to be runner-determined. First the runner reinterprets docker URLs (what aspects are important is partially runner-specific) and deps and in Joey's case licenses more abstractly, then has its own logic to decide which environments it can merge. This logic is inherently cross-SDK, both in terms of versions and languages. We could have a spec in docs and proto but it can't live as code in a particular SDK.
>
> Kenn
>
> On Thu, Jul 3, 2025 at 4:34 AM Joey Tran <joey.t...@schrodinger.com> wrote:
>
>> On Tue, Jul 1, 2025 at 2:37 PM Danny McCormick via dev <dev@beam.apache.org> wrote:
>>
>>> I think it is probably reasonable to automate this when a GPU resource hint is used.
>>> I think we still need to expose this as a config option for the ML containers (and it is the same with distroless) since it is pretty difficult to say with confidence that those images are/aren't needed (even if you're using a transform like RunInference, maybe you're using Spacy which isn't a default dependency included in the ML images) and there is a cost to using them (longer startup times).
>>>
>>> > This being the messy world of ML, would these images be machine/accelerator agnostic?
>>>
>>> That is the goal (at least to be agnostic within GPU types), and the images will be as simple as possible to accommodate this. I think building from an Nvidia base should accomplish this for most cases. For anything beyond that, I think it is reasonable to ask users to build their own container.
>>>
>>> On Tue, Jul 1, 2025 at 1:36 PM Robert Bradshaw <rober...@waymo.com> wrote:
>>>
>>>> On Tue, Jul 1, 2025 at 10:32 AM Kenneth Knowles <k...@apache.org> wrote:
>>>> >
>>>> > Obligatory question: can we automate this? Specifically: can we publish the ML-specific containers and then use them as appropriate without making it a user-facing knob?
>>>>
>>>> +1
>>>>
>>>> Transforms can declare their own environments. The only problem with this is that distinct environments prohibit fusion--we need a way to say that a given environment is a superset of another. (We can do this with dependencies, but not with arbitrary docker images.) (One could possibly get away with the "AnyOf" environment as the base environment as well, if we define (and enforce) a preference order.)
>>
>> This comes up a lot for us (Schrodinger). e.g. our runner allows for transforms to specify what licenses they require, but the current rules for environment compatibility make it difficult to allow transforms that have no license requirements to fuse with environments that do have requirements (as a workaround, we just implement this through transform annotations).
>>
>> It'd also be really convenient for us since we don't ship our software with GCP libraries so we need a separate environment for GCP-transforms. Allowing fusion of GCP-transforms with non-GCP-transforms will be a bit difficult with the current system.
>>
>>>> This being the messy world of ML, would these images be machine/accelerator agnostic?
>>>>
>>>> > Kenn
>>>> >
>>>> > On Mon, Jun 30, 2025 at 12:07 PM Danny McCormick via dev <dev@beam.apache.org> wrote:
>>>> >>
>>>> >> Hey everyone, I'd like to propose publishing some ML-specific Beam containers alongside our normal base containers. The end result would be allowing users to specify `--sdk_container_image=ml` or `--sdk_container_image=gpu` so that their jobs run in containers which work well with ML/GPU jobs.
>>>> >>
>>>> >> I put together a tiny design, please take a look and let me know what you think.
>>>> >>
>>>> >> https://docs.google.com/document/d/1JcVFJsPbVvtvaYdGi-DzWy9PIIYJhL7LwWGEXt2NZMk/edit?usp=sharing
>>>> >>
>>>> >> Thanks,
>>>> >> Danny
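To make the "user-configurable merge strategies" idea above a bit more concrete, here is a rough sketch of what a runner-side policy hook could look like once docker URLs have been reinterpreted into abstract specs. This is purely illustrative Python; none of these names (EnvSpec, MergePolicy, etc.) exist in Beam today, and the fields are just assumptions about what a runner might derive. It folds in Robert's point that fusion needs a "superset" relation, and treats licenses (Joey's case) as just another compatibility axis.

```python
# Hypothetical sketch only; not a real Beam API. All names and fields here are
# assumptions about what a runner could derive from an Environment.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnvSpec:
    """A runner's abstract reinterpretation of an Environment (e.g. a docker URL)."""
    sdk: str                           # e.g. "java" or "python", recognized from the image URL
    deps: frozenset = frozenset()      # extra packages layered on the base harness
    needs_gpu: bool = False            # derived from resource hints / image contents
    licenses: frozenset = frozenset()  # licenses required by the transforms in this env


def is_superset(a: EnvSpec, b: EnvSpec) -> bool:
    """True if anything that runs in b also runs in a (so a can absorb b when fusing)."""
    return (a.sdk == b.sdk
            and a.deps >= b.deps
            and a.licenses >= b.licenses
            and (a.needs_gpu or not b.needs_gpu))


class MergePolicy:
    """The user-configurable knob: should two *compatible* environments be fused?"""
    def should_fuse(self, a: EnvSpec, b: EnvSpec) -> bool:
        raise NotImplementedError


class GreedyFusion(MergePolicy):
    """GPU-rich user: fuse whenever one environment can absorb the other."""
    def should_fuse(self, a, b):
        return is_superset(a, b) or is_superset(b, a)


class GpuIsolation(MergePolicy):
    """GPU-poor user: never pull CPU-only work onto GPU workers."""
    def should_fuse(self, a, b):
        return a.needs_gpu == b.needs_gpu and GreedyFusion().should_fuse(a, b)


def merged(a: EnvSpec, b: EnvSpec, policy: MergePolicy) -> Optional[EnvSpec]:
    """Environment to run a fused stage in, or None if the policy forbids fusion."""
    if not policy.should_fuse(a, b):
        return None
    return a if is_superset(a, b) else b


# Tiny demo of the knob: same two environments, different user preference.
cpu_env = EnvSpec(sdk="python")
gpu_env = EnvSpec(sdk="python", needs_gpu=True)
assert merged(cpu_env, gpu_env, GreedyFusion()) == gpu_env   # fuse onto the GPU worker
assert merged(cpu_env, gpu_env, GpuIsolation()) is None      # keep CPU-only work off GPUs
```

The point being that the compatibility/superset relation could live in the spec (docs + proto) as Kenn suggests, while the should_fuse decision stays a per-user or per-runner knob, so no single SDK has to own the cross-SDK merge logic.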