On the runner requirements side: if you have such a list at the pipeline
level, it is an opportunity for the list to be inconsistent with the
contents of the pipeline. For example, if a DoFn is marked "requires stable
input" but that requirement is not listed at the pipeline level, then the
runner may run it without actually providing stable input.
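To make the consistency concern concrete, here is a minimal sketch (the function, URN, and data shapes are hypothetical illustrations, not actual Beam proto fields) of a runner-side check that the pipeline-level requirements list covers every per-transform annotation:

```python
# Hypothetical URN for illustration; not necessarily the real Beam URN.
STABLE_INPUT_URN = "beam:requirement:requires_stable_input:v1"


def validate_requirements(pipeline_requirements, transform_annotations):
    """Reject a pipeline whose per-transform requirement annotations are
    not covered by the pipeline-level requirements list.

    pipeline_requirements: set of URNs declared at the pipeline level.
    transform_annotations: dict mapping transform name -> list of URNs
        that the transform itself declares.
    """
    for name, urns in transform_annotations.items():
        for urn in urns:
            if urn not in pipeline_requirements:
                raise ValueError(
                    f"Transform {name!r} declares requirement {urn!r}, "
                    "but the pipeline-level requirements list omits it.")
```

With a check like this, the inconsistency described above (a DoFn marked "requires stable input" that the pipeline-level list omits) becomes a hard validation error instead of a silent correctness bug.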

On the SDK requirements side: the constructing SDK owns the Environment
proto completely, so it is in a position to ensure the involved docker
images support the necessary features. Is it sufficient for each SDK
involved in a cross-language expansion to validate that it understands the
inputs? For example, if Python sends a PCollection with a pickle coder to
Java as input to an expansion, the expansion will fail. Conversely, it will
fail if the returned subgraph outputs a PCollection with a Java custom
coder. The more complex use cases I can imagine all seem futuristic and
unlikely to come to pass (e.g. Python passes a pickled DoFn to the Java
expansion service, which inserts it into the graph in a way where a
Java-based transform would have to invoke it on every element, etc.)
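As a sketch of that boundary validation (the function name is hypothetical; the coder URNs are an illustrative subset of Beam's standard coders), an expansion service could fail fast on any input coder it does not understand:

```python
# Illustrative subset of coder URNs the receiving SDK understands.
# A Python-specific pickle coder would not appear in a Java SDK's set.
KNOWN_CODER_URNS = {
    "beam:coder:bytes:v1",
    "beam:coder:varint:v1",
    "beam:coder:kv:v1",
}


def validate_expansion_inputs(input_coder_urns):
    """Reject a cross-language expansion request whose input PCollections
    use coder URNs this SDK does not understand."""
    unknown = [u for u in input_coder_urns if u not in KNOWN_CODER_URNS]
    if unknown:
        raise ValueError(
            f"Cannot expand: unrecognized input coder URNs {unknown}")
```

The same check, applied by the calling SDK to the coders on the returned subgraph's outputs, covers the converse case of a Java custom coder coming back to Python.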

Kenn

On Mon, Feb 10, 2020 at 5:03 PM Brian Hulette <bhule...@google.com> wrote:

> I like the capabilities/requirements idea. Would these capabilities be at
> a level that it would make sense to document in the capabilities matrix?
> i.e. could the URNs be the values of "X" Pablo described here [1].
>
> Brian
>
> [1]
> https://lists.apache.org/thread.html/e93ac64d484551d61e559e1ba0cf4a15b760e69d74c5b1d0549ff74f%40%3Cdev.beam.apache.org%3E
>
> On Mon, Feb 10, 2020 at 3:55 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> With an eye towards cross-language (which includes cross-version)
>> pipelines and services (specifically looking at Dataflow) supporting
>> portable pipelines, there's been a desire to stabilize the portability
>> protos. There are currently many cleanups we'd like to do [1] (some
>> essential, others nice to have); are there others that people would
>> like to see?
>>
>> Of course we would like it to be possible for the FnAPI and Beam
>> itself to continue to evolve. Most of this can be handled by runners
>> understanding various transform URNs, but not all. (An example that
>> comes to mind is support for large iterables [2], or the requirement
>> to observe and respect new fields on a PTransform or its payloads
>> [3]). One proposal for this is to add capabilities and/or
>> requirements. An environment (corresponding generally to an SDK) could
>> advertise various capabilities (as a list or map of URNs) which a
>> runner can take advantage of without requiring all SDKs to support all
>> features at the same time. For the other way around, we need a way of
>> marking something that a runner must reject if it does not understand
>> it. This could be a set of requirements (again, a list or map of URNs)
>> that designate capabilities required to at least be understood by the
>> runner to faithfully execute this pipeline. (These could be attached
>> to a transform or the pipeline itself.) Do these sound like reasonable
>> additions? Also, would they ever need to be parameterized (map), or
>> would a list suffice?
>>
>> [1] BEAM-2645, BEAM-2822, BEAM-3203, BEAM-3221, BEAM-3223, BEAM-3227,
>> BEAM-3576, BEAM-3577, BEAM-3595, BEAM-4150, BEAM-4180, BEAM-4374,
>> BEAM-5391, BEAM-5649, BEAM-8172, BEAM-8201, BEAM-8271, BEAM-8373,
>> BEAM-8539, BEAM-8804, BEAM-9229, BEAM-9262, BEAM-9266, and BEAM-9272
>> [2]
>> https://lists.apache.org/thread.html/70cac361b659516933c505b513d43986c25c13da59eabfd28457f1f2@%3Cdev.beam.apache.org%3E
>> [3]
>> https://lists.apache.org/thread.html/rdc57f240069c0807eae87ed2ff13d3ee503bc18e5f906d05624e6433%40%3Cdev.beam.apache.org%3E
>>
>
