I just want to highlight that "just returns its inputs" has a couple of
non-useless instantiations:

 - programmatically generating a transform (as in SqlTransform)
 - rearranging the Java or Python data structures (like PCollectionList and
PCollectionTuple) that are noops in the protocol buffers

Kenn

On Wed, Jan 13, 2021 at 8:01 AM Brian Hulette <bhule...@google.com> wrote:

> Thanks Robert. Docs PR is up: https://github.com/apache/beam/pull/13744
>
> Brian
>
> On Tue, Jan 12, 2021 at 2:04 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> Yes, a PTansform can have no sub-transforms, as long as it only returns
>> its inputs. Updating the docs would be a good idea.
>>
>> On Tue, Jan 12, 2021 at 1:04 PM Brian Hulette <bhule...@google.com>
>> wrote:
>>
>>> A recent bug with SqlTransform on Dataflow Runner V2 [1] revealed an
>>> interesting ambiguity in the Beam model: it's not clear if a composite
>>> transform is allowed to have zero sub-transforms [2]. This may sound like
>>> an academic concern, but it can happen if a PTransform returns its own
>>> input, making it a no-op.
>>>
>>> I tend to agree with Kenn's comment in the jira that we should allow it.
>>> If we don't this puts a burden on SDKs, they would need to either
>>> a) detect when a PTransform returns one of its inputs and raise an
>>> error, or
>>> b) find and replace any such no-ops before generating a portable
>>> pipeline graph
>>>
>>> If there aren't any objections to allowing "empty" composites I'll send
>>> a PR to clarify this in beam_runner_api.proto
>>>
>>> Brian
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-11614
>>> [2]
>>> https://github.com/apache/beam/blob/05c8471b27e03e5611a2a13137c4a785f2d17fc9/model/pipeline/src/main/proto/beam_runner_api.proto#L152-L155
>>>
>>

Reply via email to