> > "Environment" has become almost-but-not-quite assumed to be a docker > container image specification. TBH I haven't dug in deeply recently but > this is how I see it discussed. I am aware that there are LOOPBACK and > EMBEDDED environments as well.
Docker container images only cover the software deps part of the "Environment", right? A runner still needs to figure out hardware requirements for provisioning workers based on the Environment as well. I'm not sure how that could be inferred from a docker URL.

> In such a regime (where so much is hidden inside a docker URL) compatibility pretty much has to be runner-determined. First the runner reinterprets docker URLs (what aspects are important is partially runner-specific) and deps and in Joey's case licenses more abstractly, then has its own logic to decide which environments it can merge. This logic is inherently cross-SDK, both in terms of versions and languages. We could have a spec in docs and proto but it can't live as code in a particular SDK.

I think the spec may also need to accommodate user-configurable merge strategies (rough sketch at the bottom of this mail). For example, a GPU-rich user may be happy with greedy fusion, while a user with few GPUs would want only GPU-requiring transforms running on their GPU nodes.

On Tue, Jul 15, 2025 at 4:42 PM Kenneth Knowles <k...@apache.org> wrote:

> Altered subject because this has come up in a number of contexts and I wonder if now is a time to have fresh thoughts on it.
>
> "Environment" has become almost-but-not-quite assumed to be a docker container image specification. TBH I haven't dug in deeply recently but this is how I see it discussed. I am aware that there are LOOPBACK and EMBEDDED environments as well.
>
> The intention behind having environments is really that they could be somewhat abstract and *often* recognized by runners and elided for efficiency. It is an anti-goal to have the entirety of the contents of a container be "the spec" for an environment. This is begging to be trapped by Hyrum's Law.
>
> All that said, we are where we are. My presumption, for a while now, is that runners would have to recognize and/or parse docker images and *reinterpret* them as abstract specifications of environments. In other words the URL for the Beam Java SDK harness container (plus deps) would be reinterpreted as "default Beam Java SDK harness" and the runner would then run in any compatible way it desired.
>
> For example it is *intended* that non-portable runners could be seamlessly reused after verifying that all environments are compatible with their non-portable execution style. The Flink/Spark/Samza runner executing an all-Java pipeline via the portable gRPC protocols is a huge missed opportunity, just throwing existing functionality and performance away.
>
> In such a regime (where so much is hidden inside a docker URL) compatibility pretty much has to be runner-determined. First the runner reinterprets docker URLs (what aspects are important is partially runner-specific) and deps and in Joey's case licenses more abstractly, then has its own logic to decide which environments it can merge. This logic is inherently cross-SDK, both in terms of versions and languages. We could have a spec in docs and proto but it can't live as code in a particular SDK.
>
> Kenn
>
> On Thu, Jul 3, 2025 at 4:34 AM Joey Tran <joey.t...@schrodinger.com> wrote:
>
>> On Tue, Jul 1, 2025 at 2:37 PM Danny McCormick via dev <dev@beam.apache.org> wrote:
>>
>>> I think it is probably reasonable to automate this when a GPU resource hint is used.
>>> I think we still need to expose this as a config option for the ML containers (and it is the same with distroless) since it is pretty difficult to say with confidence that those images are/aren't needed (even if you're using a transform like RunInference, maybe you're using Spacy which isn't a default dependency included in the ML images) and there is a cost to using them (longer startup times).
>>>
>>> > This being the messy world of ML, would these images be machine/accelerator agnostic?
>>>
>>> That is the goal (at least to be agnostic within GPU types), and the images will be as simple as possible to accommodate this. I think building from an Nvidia base should accomplish this for most cases. For anything beyond that, I think it is reasonable to ask users to build their own container.
>>>
>>> On Tue, Jul 1, 2025 at 1:36 PM Robert Bradshaw <rober...@waymo.com> wrote:
>>>
>>>> On Tue, Jul 1, 2025 at 10:32 AM Kenneth Knowles <k...@apache.org> wrote:
>>>> >
>>>> > Obligatory question: can we automate this? Specifically: can we publish the ML-specific containers and then use them as appropriate without making it a user-facing knob?
>>>>
>>>> +1
>>>>
>>>> Transforms can declare their own environments. The only problem with this is that distinct environments prohibit fusion--we need a way to say that a given environment is a superset of another. (We can do this with dependencies, but not with arbitrary docker images.) (One could possibly get away with the "AnyOf" environment as the base environment as well, if we define (and enforce) a preference order.)
>>
>> This comes up a lot for us (Schrodinger). e.g. our runner allows for transforms to specify what licenses they require, but the current rules for environment compatibility make it difficult to allow transforms that have no license requirements to fuse with environments that do have requirements (as a workaround, we just implement this through transform annotations).
>>
>> It'd also be really convenient for us since we don't ship our software with GCP libraries so we need a separate environment for GCP-transforms. Allowing fusion of GCP-transforms with non-GCP-transforms will be a bit difficult with the current system.
>>
>>>> This being the messy world of ML, would these images be machine/accelerator agnostic?
>>>>
>>>> > Kenn
>>>> >
>>>> > On Mon, Jun 30, 2025 at 12:07 PM Danny McCormick via dev <dev@beam.apache.org> wrote:
>>>> >>
>>>> >> Hey everyone, I'd like to propose publishing some ML-specific Beam containers alongside our normal base containers. The end result would be allowing users to specify `--sdk_container_image=ml` or `--sdk_container_image=gpu` so that their jobs run in containers which work well with ML/GPU jobs.
>>>> >>
>>>> >> I put together a tiny design, please take a look and let me know what you think.
>>>> >>
>>>> >> https://docs.google.com/document/d/1JcVFJsPbVvtvaYdGi-DzWy9PIIYJhL7LwWGEXt2NZMk/edit?usp=sharing
>>>> >>
>>>> >> Thanks,
>>>> >> Danny
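To make the "user-configurable merge strategies" idea above a bit more concrete, here is a rough sketch of what a runner-side policy hook could look like once docker URLs have been reinterpreted into abstract specs. This is purely illustrative Python; none of these names (EnvSpec, MergePolicy, etc.) exist in Beam today, and the fields are just assumptions about what a runner might derive. It folds in Robert's point that fusion needs a "superset" relation, and treats licenses (Joey's case) as just another compatibility axis.

```python
# Hypothetical sketch only; not a real Beam API. All names and fields here are
# assumptions about what a runner could derive from an Environment.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnvSpec:
    """A runner's abstract reinterpretation of an Environment (e.g. a docker URL)."""
    sdk: str                           # e.g. "java" or "python", recognized from the image URL
    deps: frozenset = frozenset()      # extra packages layered on the base harness
    needs_gpu: bool = False            # derived from resource hints / image contents
    licenses: frozenset = frozenset()  # licenses required by the transforms in this env


def is_superset(a: EnvSpec, b: EnvSpec) -> bool:
    """True if anything that runs in b also runs in a (so a can absorb b when fusing)."""
    return (a.sdk == b.sdk
            and a.deps >= b.deps
            and a.licenses >= b.licenses
            and (a.needs_gpu or not b.needs_gpu))


class MergePolicy:
    """The user-configurable knob: should two *compatible* environments be fused?"""
    def should_fuse(self, a: EnvSpec, b: EnvSpec) -> bool:
        raise NotImplementedError


class GreedyFusion(MergePolicy):
    """GPU-rich user: fuse whenever one environment can absorb the other."""
    def should_fuse(self, a, b):
        return is_superset(a, b) or is_superset(b, a)


class GpuIsolation(MergePolicy):
    """GPU-poor user: never pull CPU-only work onto GPU workers."""
    def should_fuse(self, a, b):
        return a.needs_gpu == b.needs_gpu and GreedyFusion().should_fuse(a, b)


def merged(a: EnvSpec, b: EnvSpec, policy: MergePolicy) -> Optional[EnvSpec]:
    """Environment to run a fused stage in, or None if the policy forbids fusion."""
    if not policy.should_fuse(a, b):
        return None
    return a if is_superset(a, b) else b


# Tiny demo of the knob: same two environments, different user preference.
cpu_env = EnvSpec(sdk="python")
gpu_env = EnvSpec(sdk="python", needs_gpu=True)
assert merged(cpu_env, gpu_env, GreedyFusion()) == gpu_env   # fuse onto the GPU worker
assert merged(cpu_env, gpu_env, GpuIsolation()) is None      # keep CPU-only work off GPUs
```

The point being that the compatibility/superset relation could live in the spec (docs + proto) as Kenn suggests, while the should_fuse decision stays a per-user or per-runner knob, so no single SDK has to own the cross-SDK merge logic.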