With splittable DoFns, we should be aiming to have 'sources' take
PCollections for input as the default implementation. The common 'Read'
with no inputs will still exist and it would make sense to have PBegin.

We have found that most sinks actually could have meaningful output (e.g.
how many records written, what files were written, dead letter queue for
failed writes) and is useful for others to be able to create relationships
within the graph like write to X and then write to Y.



On Tue, Jul 14, 2020 at 5:42 PM Robert Bradshaw <rober...@google.com> wrote:

> SGTM.
>
> On Tue, Jul 14, 2020 at 5:28 PM Udi Meiri <eh...@google.com> wrote:
> >
> > So it sounds like we should:
> > - Make PBegin public
> > - Deprecate PDone return type in favor of None
> > - Update the programming guide's Composite Transforms section.
> >
> >
> > On Tue, Jul 14, 2020 at 5:13 PM Robert Burke <rob...@frantil.com> wrote:
> >>
> >> For contrast, the Go SDK provides an Impulse transform directly
> (analogous to PBegin, part of the model) and has a ParDo0 (which like PDone
> has no output Pcollections).  The numeral suffixing the go ParDo functions
> indicate the number of Output Pcollections are expected from the passed in
> DoFm.
> >>
> >> On Tue, Jul 14, 2020, 5:03 PM Robert Bradshaw <rober...@google.com>
> wrote:
> >>>
> >>> Yes, PBegin and PDone are used in the SDKs, but are not part of the
> model.
> >>>
> >>> I would be supportive of making PBegin more public to denote that a
> >>> transform is a "root" of the pipeline. PDone was required for Java,
> >>> however I don't think there's any use for it in the Python SDK (a
> >>> transform can simply not return any value (which is equivalent to
> >>> returning None) if it has no outputs.
> >>>
> >>> On Mon, Jul 13, 2020 at 8:17 PM Udi Meiri <eh...@google.com> wrote:
> >>> >
> >>> > Details:
> >>> > One item of interest that came up during the implementation of
> BEAM-10258 [1] is how to treat PTransforms that act like sources or sinks.
> >>> > These transforms have either no input or output PCollections,
> respectively.
> >>> >
> >>> > Internally, we use PBegin and PDone to denote this. (ex: [2])
> >>> > IIUC, PBegin and PDone aren't part of the Beam model, and in the
> pipeline description they manifest as empty input and output lists.
> >>> >
> >>> > To support type hinting, I propose making these types public.
> >>> > They are not currently listed in [3] and the documentation implies
> they're internal.
> >>> > Java SDK already supports these types and makes them public. [4]
> >>> >
> >>> >
> >>> > [1] https://github.com/apache/beam/pull/12009
> >>> > [2]
> https://github.com/apache/beam/blob/e73e1d1cce93930fa3d85046b9bbae7c724926bf/sdks/python/apache_beam/io/gcp/experimental/spannerio.py#L471-L506
> >>> > [3]
> https://github.com/apache/beam/blob/e73e1d1cce93930fa3d85046b9bbae7c724926bf/sdks/python/apache_beam/pvalue.py#L61
> >>> > [4]
> https://github.com/apache/beam/blob/e73e1d1cce93930fa3d85046b9bbae7c724926bf/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L53-L56
>

Reply via email to