Yes, I would expect sinks to provide similar additional interfaces
like sources, e.g. checkpointing. We could also use the
startBundle/processElement/finishBundle lifecycle methods to implement
checkpointing. I just wonder, if we want to make it more explicit.
Also, does it make sense that sinks can return a PCollection? You can
return PDone but you don't have to.

Since sinks are fundamental in streaming pipelines, it just seemed odd
to me that there is not dedicated interface. I understand a bit
clearer now that it is not viewed as crucial because we can use
existing primitives to create sinks. In a way, that might be elegant
but also less explicit.

On Fri, Apr 29, 2016 at 11:00 PM, Frances Perry <f...@google.com.invalid> wrote:
>>
>> @Frances Sources are not simple DoFns. They add additional
>> functionality, e.g. checkpointing, watermark generation, creating
>> splits. If we want sinks to be portable, we should think about a
>> dedicated interface. At least for the checkpointing.
>>
>
> We might be mixing sources and sinks in this conversation. ;-) Sources
> definitely provide additional functionality as you mentioned. But at least
> currently, sinks don't provide any new primitive functionality. Are you
> suggestion there needs to be a checkpointing interface for sinks beyond
> DoFn's bundle finalization? (Note that the existing Write for batch is just
> a PTransform based around ParDo.)

Reply via email to