Jungtaek,

How would you contrast stateful streaming with checkpointing against the idea of writing updates to a Delta Lake table and then using that Delta Lake table as a streaming source for our state stream?

Thank you,
Bryan

On Mon, Sep 28, 2020 at 9:50 AM Debabrata Ghosh <mailford...@gmail.com> wrote:

> Thank you, Jungtaek and Amit! This is very helpful indeed!
>
> Cheers,
>
> Debu
>
> On Mon, Sep 28, 2020 at 5:33 AM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala
>>
>> You would need to implement CheckpointFileManager yourself, which is
>> tightly integrated with HDFS (the parameters and return types of its
>> methods are mostly from HDFS). That doesn't mean it's impossible to
>> implement CheckpointFileManager against a non-filesystem, but it would be
>> non-trivial to override all of the functionality and make it work
>> seamlessly.
>>
>> The required consistency is documented in the javadoc of
>> CheckpointFileManager. Please read through it and evaluate whether your
>> target storage can fulfill the requirements.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Mon, Sep 28, 2020 at 3:04 AM Amit Joshi <mailtojoshia...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> As far as I know, it depends on whether you are using Spark Streaming or
>>> Structured Streaming.
>>> In Spark Streaming you can write your own code to checkpoint.
>>> But in the case of Structured Streaming it should be a file location.
>>> But the main question is why you want to checkpoint in
>>> NoSQL, as it's eventually consistent.
>>>
>>> Regards
>>> Amit
>>>
>>> On Sunday, September 27, 2020, Debabrata Ghosh <mailford...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I had a query about Spark checkpoints: can I store the
>>>> checkpoints in NoSQL or Kafka instead of a filesystem?
>>>>
>>>> Regards,
>>>>
>>>> Debu