[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122274#comment-14122274 ]
Saisai Shao commented on SPARK-3129:
------------------------------------

Hi [~hshreedharan], thanks for your reply. Is this PR (https://github.com/apache/spark/pull/1195) the one you mentioned regarding storeReliably()? As I understand it, this API stores a batch of messages directly into the BlockManager to make them reliable. But for receivers such as Kafka and socket, data arrives one message at a time, and we cannot call storeReliably() for every message because of efficiency and throughput concerns. So we need to buffer the data locally up to some amount and then flush it to the BlockManager with storeReliably(). Data can therefore still be lost while it sits in the local buffer. I have been thinking about the WAL approach these days; IMHO a WAL would be a better solution than a blocking store API.

> Prevent data loss in Spark Streaming
> ------------------------------------
>
>                 Key: SPARK-3129
>                 URL: https://issues.apache.org/jira/browse/SPARK-3129
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>         Attachments: StreamingPreventDataLoss.pdf
>
>
> Spark Streaming can lose small amounts of data when the driver goes down and the
> sending system cannot re-send the data (or the data has already expired on the
> sender side). The document attached has more details.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
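[Editor's note] The buffer-then-flush pattern discussed in the comment above can be sketched as follows. This is a minimal illustration, not the actual Spark receiver API: the class name `BufferedReliableStore`, the `storeReliably` callback, and the method names are all hypothetical. It shows why messages held in the local buffer form a window of potential data loss until the next flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer incoming messages locally and flush them in
// batches to a reliable store (standing in for BlockManager.storeReliably()).
public class BufferedReliableStore<T> {
    private final int batchSize;
    private final Consumer<List<T>> storeReliably; // reliable batch store callback
    private final List<T> buffer = new ArrayList<>();

    public BufferedReliableStore(int batchSize, Consumer<List<T>> storeReliably) {
        this.batchSize = batchSize;
        this.storeReliably = storeReliably;
    }

    // Called once per incoming message (e.g. from a Kafka or socket receiver).
    public void store(T msg) {
        buffer.add(msg);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Push the whole buffered batch to reliable storage in one call.
    public void flush() {
        if (!buffer.isEmpty()) {
            storeReliably.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Messages not yet flushed: if the receiver dies now, these are lost.
    public int unflushedCount() {
        return buffer.size();
    }
}
```

The key point of the comment: between flushes, `unflushedCount()` is nonzero, and a crash at that moment loses exactly those messages. A WAL avoids this by persisting each message (or small group) before acknowledging it, rather than relying on a large in-memory batch.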