Yes, I would expect sinks to provide similar additional interfaces like sources, e.g. checkpointing. We could also use the startBundle/processElement/finishBundle lifecycle methods to implement checkpointing. I just wonder, if we want to make it more explicit. Also, does it make sense that sinks can return a PCollection? You can return PDone but you don't have to.
Since sinks are fundamental in streaming pipelines, it just seemed odd to me that there is not dedicated interface. I understand a bit clearer now that it is not viewed as crucial because we can use existing primitives to create sinks. In a way, that might be elegant but also less explicit. On Fri, Apr 29, 2016 at 11:00 PM, Frances Perry <f...@google.com.invalid> wrote: >> >> @Frances Sources are not simple DoFns. They add additional >> functionality, e.g. checkpointing, watermark generation, creating >> splits. If we want sinks to be portable, we should think about a >> dedicated interface. At least for the checkpointing. >> > > We might be mixing sources and sinks in this conversation. ;-) Sources > definitely provide additional functionality as you mentioned. But at least > currently, sinks don't provide any new primitive functionality. Are you > suggestion there needs to be a checkpointing interface for sinks beyond > DoFn's bundle finalization? (Note that the existing Write for batch is just > a PTransform based around ParDo.)