Re: Idempotent count

Arush Kharbanda Tue, 17 Mar 2015 23:50:17 -0700

Hi

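Yes, Spark Streaming is capable of stateful stream processing. Whether an
operation keeps state or not is just a way of classifying the processing.
Checkpoints hold both metadata (the configuration, the DStream operations,
and incomplete batches) and data (the generated RDDs that carry the state).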
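
To make the difference between the two approaches in the question concrete,
here is a minimal sketch in plain Python. It is not Spark code: the dicts
standing in for the external store and for the checkpointed state, and the
function names apply_delta, apply_total and running_totals, are all
assumptions made up for illustration. It only shows why replaying a batch
over-counts when deltas are added, and why overwriting with a total
recomputed from restored state does not.

```python
# Toy model: a dict stands in for the external store (MySQL, VoltDB, ...).
# Nothing here is Spark API; the failure and replay are simulated by hand.

def apply_delta(store, batch):
    """Stateless style: add each batch's counts to what is already stored."""
    for key, n in batch.items():
        store[key] = store.get(key, 0) + n

def apply_total(store, totals):
    """Stateful style: overwrite stored values with running totals."""
    for key, n in totals.items():
        store[key] = n

def running_totals(state, batch):
    """Fold one batch into a copy of the state and return the new state."""
    new_state = dict(state)
    for key, n in batch.items():
        new_state[key] = new_state.get(key, 0) + n
    return new_state

batches = [{"a": 2, "b": 1}, {"a": 3}]

# Stateless: batch 2 is replayed after a failure, so its delta lands twice.
delta_store = {}
for batch in batches + [batches[1]]:
    apply_delta(delta_store, batch)

# Stateful: keep a copy of the state from before each batch, standing in
# for a checkpoint. Replaying batch 2 from that copy gives the same totals,
# so writing them again changes nothing.
total_store = {}
state = {}
checkpoint = {}
for batch in batches:
    checkpoint = state
    state = running_totals(state, batch)
    apply_total(total_store, state)

state = running_totals(checkpoint, batches[1])  # replay of batch 2
apply_total(total_store, state)

print(delta_store)  # {'a': 8, 'b': 1}
print(total_store)  # {'a': 5, 'b': 1}
```

The second store stays correct only because the write is an overwrite keyed
on the aggregation key. In a real job that would likely mean an upsert
against the external system rather than an increment.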


Thanks


On Wed, Mar 18, 2015 at 4:00 AM, Binh Nguyen Van <binhn...@gmail.com> wrote:

> Hi all,
>
> I am new to Spark, so please forgive me if my questions are stupid.
> I am trying to use Spark Streaming in an application that reads data
> from a queue (Kafka), does some aggregation (sum, count, ...), and
> then persists the result to an external storage system (MySQL, VoltDB, ...).
>
> From my understanding of Spark-Streaming, I can have two ways
> of doing aggregation:
>
>    - Stateless: I don't have to keep state and just apply new delta
>    values to the external system. From my understanding, doing it this way I
>    may end up over-counting when there is a failure and replay.
>    - Stateful: Use checkpoints to keep state and blindly save the new state
>    to the external system. Doing it this way I get a correct aggregation
>    result, but I have to keep data in two places (state and external system).
>
> My questions are:
>
>    - Is my understanding of stateless and stateful aggregation correct?
>    If not, please correct me!
>    - For stateful aggregation, what does Spark Streaming keep when
>    it saves a checkpoint?
>
> Please kindly help!
>
> Thanks
> -Binh
>



-- 

*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
