Re: PTransform Annotations Proposal

Robert Burke Mon, 16 Nov 2020 10:54:20 -0800

That's a good question.

I think the main difference is a matter of scope. Annotations would apply
to a single PTransform, while an environment applies to sets of transforms.
Another difference is that annotations are optional: they don't affect
correctness. Runners can ignore them entirely and still execute the
pipeline correctly.

Consider a privacy analysis on a pipeline graph. An annotation indicating
that a transform provides a certain level of anonymization can be used in
an analysis to determine if the downstream transforms are encountering raw
data or not.
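
As a rough sketch of that kind of analysis (plain Python, not Beam APIs;
the Transform class and the "example:anonymizes" annotation key are made up
for illustration, and sources are assumed to emit raw data):

```python
from dataclasses import dataclass, field

# Hypothetical annotation key, not something defined by Beam or the proposal.
ANONYMIZES = "example:anonymizes"


@dataclass
class Transform:
    name: str
    inputs: list = field(default_factory=list)      # names of upstream transforms
    annotations: dict = field(default_factory=dict)  # optional metadata


def transforms_seeing_raw_data(transforms):
    """Return the names of transforms that receive raw data on some input.

    `transforms` must be listed in topological order (producers first).
    """
    emits_raw = {}
    sees_raw = []
    for t in transforms:
        gets_raw = any(emits_raw[i] for i in t.inputs)
        if gets_raw:
            sees_raw.append(t.name)
        is_source = not t.inputs  # assumption: sources read raw data
        # An anonymizing transform stops raw data from flowing downstream.
        emits_raw[t.name] = (is_source or gets_raw) and ANONYMIZES not in t.annotations
    return sees_raw


pipeline = [
    Transform("Read"),
    Transform("Parse", inputs=["Read"]),
    Transform("Anonymize", inputs=["Parse"], annotations={ANONYMIZES: "k=5"}),
    Transform("Aggregate", inputs=["Anonymize"]),
]
print(transforms_seeing_raw_data(pipeline))
```

The analysis only reads the annotations. A runner that never looks at them
would execute the same graph in the same way.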

From my understanding (which may be wrong), environments are rigid.
Transforms in different environments can't be fused: "this is the python
env" and "this is the java env" can't be merged together. It's not clear to
me that we have defined when environments are safely fuseable other than
when they are equal. There's value in that simplicity.
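
To make the "fuse only on equality" rule concrete, here is a toy sketch
over a linear chain of steps. The function name and the string environment
labels are invented for the example, and a real runner's fusion logic is
considerably more involved than this:

```python
from itertools import groupby


def fuse_linear_chain(steps):
    """Group adjacent steps into fused stages.

    `steps` is a list of (name, environment) pairs in execution order.
    Two neighbouring steps land in the same stage only when their
    environments compare equal.
    """
    return [
        [name for name, _ in group]
        for _, group in groupby(steps, key=lambda step: step[1])
    ]


steps = [
    ("Read", "java"),
    ("Parse", "java"),
    ("Score", "python"),
    ("Write", "java"),
]
print(fuse_linear_chain(steps))
```

Here "Read" and "Parse" fuse, while "Score" and "Write" each end up in
their own stage because their neighbours are in a different environment.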

AFAICT, an environment has less to do with the machines a pipeline is
executing on than with the kinds of SDK pipelines it understands and can
execute.

On Mon, Nov 16, 2020, 10:36 AM Chad Dombrova <[email protected]> wrote:

>
>> Another example of an optional annotation is marking a transform to run
>> on secure hardware, or to give hints to profiling/dynamic analysis tools.
>>
>
> There seems to be a lot of overlap between this idea and Environments.
> Can you talk about how you feel they may be different or related?  For
> example, I could see annotations as a way of tagging transforms with an
> Environment, or I could see Environments becoming a specialized form of
> annotation.
>
> -chad
>
>
