Correct, Kafka doesn't support rollbacks of the producer. In Flink there is the RollingSink which supports transactional rolling files. Admittedly, that is the only one. Still, checkpointing sinks in Beam could be useful for users who are concerned about exactly once semantics. I'm not sure whether we can implement something similar with the bundle mechanism.
On Mon, May 2, 2016 at 11:50 PM, Raghu Angadi <rang...@google.com.invalid> wrote: > What are good examples of streaming sinks that support checkpointing (or > transactions/rollbacks)? I don't Kafka supports a rollback. > > On Mon, May 2, 2016 at 2:54 AM, Maximilian Michels <m...@apache.org> wrote: > >> Yes, I would expect sinks to provide similar additional interfaces >> like sources, e.g. checkpointing. We could also use the >> startBundle/processElement/finishBundle lifecycle methods to implement >> checkpointing. I just wonder, if we want to make it more explicit. >> Also, does it make sense that sinks can return a PCollection? You can >> return PDone but you don't have to. >> >> Since sinks are fundamental in streaming pipelines, it just seemed odd >> to me that there is not dedicated interface. I understand a bit >> clearer now that it is not viewed as crucial because we can use >> existing primitives to create sinks. In a way, that might be elegant >> but also less explicit. >> >> On Fri, Apr 29, 2016 at 11:00 PM, Frances Perry <f...@google.com.invalid> >> wrote: >> >> >> >> @Frances Sources are not simple DoFns. They add additional >> >> functionality, e.g. checkpointing, watermark generation, creating >> >> splits. If we want sinks to be portable, we should think about a >> >> dedicated interface. At least for the checkpointing. >> >> >> > >> > We might be mixing sources and sinks in this conversation. ;-) Sources >> > definitely provide additional functionality as you mentioned. But at >> least >> > currently, sinks don't provide any new primitive functionality. Are you >> > suggestion there needs to be a checkpointing interface for sinks beyond >> > DoFn's bundle finalization? (Note that the existing Write for batch is >> just >> > a PTransform based around ParDo.) >>