I was wondering: when checkpointing is enabled, who does the actual work, the streaming datasource or the execution engine/driver?
I have written a small, trivial datasource that just generates strings. After enabling checkpointing, I do see a folder being created under the checkpoint folder, but there's nothing else in it.

The same question applies to the write-ahead log and recovery. And when restarting a failed streaming session, who should set the offsets: the driver/Spark or the datasource?

Any pointers to design docs would also be greatly appreciated.

Thanks,
Jayesh