Re: Query around Spark Checkpoints

Bryan Jeffrey Tue, 29 Sep 2020 09:15:17 -0700

Jungtaek,

How would you contrast stateful streaming with checkpointing vs. writing
updates to a Delta Lake table and then using that Delta Lake table as a
streaming source for our state stream?


Thank you,

Bryan

On Mon, Sep 28, 2020 at 9:50 AM Debabrata Ghosh <mailford...@gmail.com>
wrote:

> Thank you, Jungtaek and Amit! This is very helpful indeed!
>
> Cheers,
>
> Debu
>
> On Mon, Sep 28, 2020 at 5:33 AM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>>
>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala
>>
>> You would need to implement CheckpointFileManager yourself, and its
>> interface is tightly coupled to HDFS (most method parameters and return
>> types are Hadoop types such as Path, FileStatus and FSDataInputStream).
>> That doesn't mean it's impossible to implement CheckpointFileManager
>> against storage that isn't a file system, but it would be non-trivial to
>> override all of its functionality and make it work seamlessly.
>>
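>> The required consistency guarantees are documented in the javadoc of
>> CheckpointFileManager (most importantly, createAtomic must write a
>> checkpoint file atomically, so that no partial file is ever visible).
>> Please read through it and evaluate whether your target storage can
>> fulfill them.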
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Mon, Sep 28, 2020 at 3:04 AM Amit Joshi <mailtojoshia...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> As far as I know, it depends on whether you are using Spark Streaming
>>> (DStreams) or Structured Streaming. In Spark Streaming you can write your
>>> own code to checkpoint (for example, storing source offsets yourself), but
>>> in Structured Streaming the checkpoint has to be a file system location.
>>> The main question, though, is why you want to checkpoint in NoSQL, since
>>> many NoSQL stores are only eventually consistent.
>>>
>>>
>>> Regards
>>> Amit
>>>
>>> On Sunday, September 27, 2020, Debabrata Ghosh <mailford...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>     I had a query about Spark checkpoints: can I store the
>>>> checkpoints in NoSQL or Kafka instead of a file system?
>>>>
>>>> Regards,
>>>>
>>>> Debu
>>>>
>>>
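Jungtaek's pointer to the CheckpointFileManager javadoc boils down to one core contract: createAtomic returns an output stream that is either closed, making the complete file visible, or cancelled, leaving nothing visible, so no partial checkpoint file is ever observed. The javadoc names write-to-temp-file-and-rename as one way to get this guarantee. The sketch below illustrates that contract on a local file system in Python. It is not Spark's API: AtomicFileWriter is a hypothetical name, and the sketch assumes a POSIX file system where os.replace and os.link are atomic within a single directory.

```python
import os
import tempfile


class AtomicFileWriter:
    """Hypothetical analogue of createAtomic -> close()/cancel(); not Spark's API.

    Bytes go to a temporary file in the target's directory. close() moves it
    into place in one step, so readers see either no file or the whole file.
    cancel() discards the temporary file without touching the target.
    """

    def __init__(self, path: str, overwrite: bool = False) -> None:
        self.path = path
        self.overwrite = overwrite
        # Same directory as the target, so the final rename/link stays on one file system.
        directory = os.path.dirname(os.path.abspath(path))
        fd, self.tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
        self._file = os.fdopen(fd, "wb")
        self._finished = False

    def write(self, data: bytes) -> None:
        self._file.write(data)

    def close(self) -> None:
        if self._finished:
            return
        self._finished = True
        self._file.flush()
        os.fsync(self._file.fileno())  # make the bytes durable before publishing
        self._file.close()
        try:
            if self.overwrite:
                # Atomically replaces any existing target on POSIX.
                os.replace(self.tmp_path, self.path)
            else:
                # link() fails with FileExistsError if the target exists,
                # so the existence check and the publish are a single step.
                os.link(self.tmp_path, self.path)
                os.remove(self.tmp_path)
        except BaseException:
            if os.path.exists(self.tmp_path):
                os.remove(self.tmp_path)
            raise

    def cancel(self) -> None:
        if self._finished:
            return
        self._finished = True
        self._file.close()
        os.remove(self.tmp_path)
```

A target store (NoSQL, Kafka, or anything else) that cannot provide an equivalent all-or-nothing write would have a hard time meeting the requirement Jungtaek describes.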

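For the Spark Streaming (DStreams) route Amit mentions, writing your own checkpoint code usually means tracking source offsets yourself. The Spark Streaming + Kafka integration guide suggests that, for data stores with transactions, saving offsets in the same transaction as the results keeps the two in sync even after failures. Below is a Spark-free sketch of that idea using Python's built-in sqlite3 module; the tables, the process_batch helper, and the word-count workload are all hypothetical, and a single writer is assumed.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) results and offsets tables if needed."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS word_counts ("
            " word TEXT PRIMARY KEY, n INTEGER NOT NULL)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS offsets ("
            " topic TEXT, part INTEGER, next_offset INTEGER NOT NULL,"
            " PRIMARY KEY (topic, part))"
        )
    return conn


def committed_offset(conn: sqlite3.Connection, topic: str, part: int) -> int:
    """Next offset to read for (topic, part); 0 if nothing was committed yet."""
    row = conn.execute(
        "SELECT next_offset FROM offsets WHERE topic = ? AND part = ?",
        (topic, part),
    ).fetchone()
    return row[0] if row else 0


def process_batch(conn: sqlite3.Connection, topic: str, part: int,
                  start: int, records: list) -> bool:
    """Apply records starting at offset `start` and advance the offset atomically.

    Returns False (and changes nothing) if `start` is not the committed offset,
    e.g. when a batch is replayed after a failure.
    """
    if start != committed_offset(conn, topic, part):
        return False
    with conn:  # one transaction: commit on success, roll back on any exception
        for word in records:
            conn.execute(
                "INSERT INTO word_counts (word, n) VALUES (?, 1) "
                "ON CONFLICT(word) DO UPDATE SET n = n + 1",
                (word,),
            )
        conn.execute(
            "INSERT INTO offsets (topic, part, next_offset) VALUES (?, ?, ?) "
            "ON CONFLICT(topic, part) DO UPDATE SET next_offset = excluded.next_offset",
            (topic, part, start + len(records)),
        )
    return True
```

Because the results and the next offset commit together, a replayed batch is detected and skipped rather than double-counted, and a batch that fails midway leaves neither results nor offset behind.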