That's a good question. I think the main difference is a matter of scope. Annotations would apply to a PTransform, while an environment applies to sets of transforms. Another difference is the optional nature of annotations: they don't affect correctness. Runners don't need to do anything with them and can still execute the pipeline correctly.
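
To make the "optional" part concrete, here is a rough sketch in plain Python. It is a toy model, not the Beam API: the `Transform` class, the `run` function, and the `secure_hardware` annotation key are all made up for illustration. The only point is that a runner which never reads annotations still produces the same output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Transform:
    # Hypothetical stand-in for a PTransform; this is not a Beam class.
    name: str
    fn: Callable[[int], int]
    environment: str  # one environment is shared by a set of transforms
    annotations: Dict[str, str] = field(default_factory=dict)  # per transform, optional


def run(pipeline: List[Transform], values: List[int]) -> List[int]:
    # A toy "runner" that applies each transform in order and never
    # looks at the annotations.
    for t in pipeline:
        values = [t.fn(v) for v in values]
    return values


def strip_annotations(pipeline: List[Transform]) -> List[Transform]:
    # Rebuild the same pipeline with every annotation dropped.
    return [Transform(t.name, t.fn, t.environment) for t in pipeline]


pipeline = [
    Transform("double", lambda x: x * 2, "python_env"),
    # Assumed annotation key and value, purely illustrative.
    Transform("add_one", lambda x: x + 1, "python_env",
              {"secure_hardware": "preferred"}),
]

result_with = run(pipeline, [1, 2, 3])
result_without = run(strip_annotations(pipeline), [1, 2, 3])
```

Both results are the same list, which is what "doesn't affect correctness" means here. A runner that does understand an annotation could act on it, but nothing breaks if it doesn't.
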
Consider a privacy analysis on a pipeline graph. An annotation indicating that a transform provides a certain level of anonymization can be used in an analysis to determine whether the downstream transforms are encountering raw data or not.

From my understanding (which can be wrong), environments are rigid. Transforms in different environments can't be fused. "This is the python env" and "this is the java env" can't be merged together. It's not clear to me that we have defined when environments are safely fusable outside of equality. There's value in that simplicity.

AFAICT, an environment has less to do with the machines a pipeline is executing on than with the kinds of SDK pipelines it understands and can execute.

On Mon, Nov 16, 2020, 10:36 AM Chad Dombrova <[email protected]> wrote:

>> Another example of an optional annotation is marking a transform to run
>> on secure hardware, or to give hints to profiling/dynamic analysis tools.
>
> There seems to be a lot of overlap between this idea and Environments.
> Can you talk about how you feel they may be different or related? For
> example, I could see annotations as a way of tagging transforms with an
> Environment, or I could see Environments becoming a specialized form of
> annotation.
>
> -chad
