Re: Query around Spark Checkpoints

Jungtaek Lim Tue, 29 Sep 2020 16:39:57 -0700

Sorry, I'm not familiar with Delta Lake. You may get a better answer from the
Delta Lake mailing list.


One thing is clear: stateful processing is an essential feature of almost
every streaming framework. If you're struggling with the state feature and
looking for a workaround, then something is probably going wrong. Please
feel free to share the details.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Wed, Sep 30, 2020 at 1:14 AM Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:

> Jungtaek,
>
> How would you contrast stateful streaming with checkpoint vs. the idea of
> writing updates to a Delta Lake table, and then using the Delta Lake table
> as a streaming source for our state stream?
>
> Thank you,
>
> Bryan
>
> On Mon, Sep 28, 2020 at 9:50 AM Debabrata Ghosh <mailford...@gmail.com>
> wrote:
>
>> Thank you, Jungtaek and Amit! This is very helpful indeed!
>>
>> Cheers,
>>
>> Debu
>>
>> On Mon, Sep 28, 2020 at 5:33 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>>
>>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala
>>>
>>> You would need to implement CheckpointFileManager yourself, and that
>>> interface is tightly integrated with HDFS (the parameters and return
>>> types of its methods mostly come from HDFS). That doesn't mean it's
>>> impossible to implement CheckpointFileManager against a non-filesystem
>>> store, but it would be non-trivial to override all of the functionality
>>> and make it work seamlessly.
>>>
>>> The required consistency is documented in the javadoc of
>>> CheckpointFileManager. Please read through it and evaluate whether your
>>> target storage can fulfill the requirements.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> On Mon, Sep 28, 2020 at 3:04 AM Amit Joshi <mailtojoshia...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> As far as I know, it depends on whether you are using Spark Streaming
>>>> or Structured Streaming.
>>>> In Spark Streaming you can write your own checkpointing code,
>>>> but in Structured Streaming the checkpoint should be a file location.
>>>> The main question, though, is why you want to checkpoint in NoSQL,
>>>> given that it is eventually consistent.
>>>>
>>>>
>>>> Regards
>>>> Amit
>>>>
>>>> On Sunday, September 27, 2020, Debabrata Ghosh <mailford...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>     I have a question about Spark checkpoints: can I store the
>>>>> checkpoints in NoSQL or Kafka instead of a filesystem?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Debu
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
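
To illustrate the consistency point in the quoted reply about
CheckpointFileManager: one key operation of that interface is createAtomic,
which must make a checkpoint file visible either completely or not at all.
The sketch below shows that all-or-nothing contract against a local
filesystem in plain Python. It is not Spark's implementation; the helper
name atomic_write and the file name "0" are hypothetical and used only for
illustration.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes, overwrite: bool = False) -> None:
    """Publish `data` at `path` so readers never observe a partial file.

    Hypothetical helper (not a Spark API): the bytes go to a temporary file
    in the same directory, which is then renamed into place in one step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Best-effort check only: a concurrent writer could still race between
    # this check and the rename below.
    if not overwrite and os.path.exists(path):
        raise FileExistsError(path)
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before publishing
        os.replace(tmp_path, path)  # the single publish step
    except BaseException:
        # On failure, remove the temporary file so no partial output remains.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        target = os.path.join(checkpoint_dir, "0")  # hypothetical file name
        atomic_write(target, b"offsets for batch 0")
        with open(target, "rb") as f:
            print(f.read())
```

A store that cannot offer an equivalent single publish step, for example
one where a reader may see a half-written or stale value, would likely
struggle to meet the requirements described in that javadoc.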
