[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122274#comment-14122274 ]
Saisai Shao commented on SPARK-3129:
------------------------------------

Hi [~hshreedharan], thanks for your reply. Is this PR (https://github.com/apache/spark/pull/1195) the one you mentioned regarding storeReliably()? As I understand it, this API stores a batch of messages directly into the BlockManager to make them reliable. But for receivers such as Kafka and socket, data arrives one message at a time, and we cannot call storeReliably() for every message because of efficiency and throughput concerns. So we need to buffer the data locally up to some amount and then flush it to the BlockManager with storeReliably(). Data can therefore still be lost while it sits in the local buffer. I have been thinking about the WAL approach these days; IMHO a WAL would be a better solution than a blocking store API.

> Prevent data loss in Spark Streaming
> ------------------------------------
>
>                 Key: SPARK-3129
>                 URL: https://issues.apache.org/jira/browse/SPARK-3129
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>         Attachments: StreamingPreventDataLoss.pdf
>
>
> Spark Streaming can lose small amounts of data when the driver goes down and the
> sending system cannot re-send the data (or the data has already expired on the
> sender side). The document attached has more details.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
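[Editor's note] The buffer-then-flush pattern discussed in the comment above can be sketched as follows. This is a minimal illustration, not the actual Spark receiver API: the class name `BufferedReliableStore`, the `storeReliably` callback, and the method names are all hypothetical. It shows why messages held in the local buffer form a window of potential data loss until the next flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer incoming messages locally and flush them in
// batches to a reliable store (standing in for BlockManager.storeReliably()).
public class BufferedReliableStore<T> {
    private final int batchSize;
    private final Consumer<List<T>> storeReliably; // reliable batch store callback
    private final List<T> buffer = new ArrayList<>();

    public BufferedReliableStore(int batchSize, Consumer<List<T>> storeReliably) {
        this.batchSize = batchSize;
        this.storeReliably = storeReliably;
    }

    // Called once per incoming message (e.g. from a Kafka or socket receiver).
    public void store(T msg) {
        buffer.add(msg);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Push the whole buffered batch to reliable storage in one call.
    public void flush() {
        if (!buffer.isEmpty()) {
            storeReliably.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Messages not yet flushed: if the receiver dies now, these are lost.
    public int unflushedCount() {
        return buffer.size();
    }
}
```

The key point of the comment: between flushes, `unflushedCount()` is nonzero, and a crash at that moment loses exactly those messages. A WAL avoids this by persisting each message (or small group) before acknowledging it, rather than relying on a large in-memory batch.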