Reliable SQS Receiver for Spark Streaming

2015-06-13 Thread cizmazia
I would like to have a Spark Streaming *SQS Receiver* which deletes SQS
messages only *after* they were successfully stored on S3.
For this a *Custom Receiver* can be implemented with the semantics of the
Reliable Receiver.
The store(multiple-records) call  blocks until the given records have been
stored and replicated inside Spark
https://spark.apache.org/docs/latest/streaming-custom-receivers.html#receiver-reliability
 
.
If the *write-ahead logs* are enabled, all the data received from a receiver
gets written into a write  ahead log in the configuration checkpoint
directory
https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing
 
. The checkpoint directory can be pointed to S3.
After the store(multiple-records) blocking call finishes, are the records
already stored in the checkpoint directory (and thus can be safely deleted
from SQS)?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Reliable-SQS-Receiver-for-Spark-Streaming-tp23302.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Reliable SQS Receiver for Spark Streaming

2015-06-13 Thread Akhil Das
Yes, if you have enabled WAL and checkpointing then after the store, you
can simply delete the SQS Messages from your receiver.

Thanks
Best Regards

On Sat, Jun 13, 2015 at 6:14 AM, Michal Čizmazia mici...@gmail.com wrote:

 I would like to have a Spark Streaming SQS Receiver which deletes SQS
 messages only after they were successfully stored on S3.

 For this a Custom Receiver can be implemented with the semantics of
 the Reliable Receiver.

 The store(multiple-records) call blocks until the given records have
 been stored and replicated inside Spark.

 If the write-ahead logs are enabled, all the data received from a
 receiver gets written into a write ahead log in the configuration
 checkpoint directory. The checkpoint directory can be pointed to S3.

 After the store(multiple-records) blocking call finishes, are the
 records already stored in the checkpoint directory (and thus can be
 safely deleted from SQS)?

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Reliable SQS Receiver for Spark Streaming

2015-06-13 Thread Michal Čizmazia
Thanks Akhil!

I just looked it up in the code as well.

Receiver.store(ArrayBuffer[T], ...)
ReceiverSupervisorImpl.pushArrayBuffer(ArrayBuffer[T], ...)
ReceiverSupervisorImpl.pushAndReportBlock(...)
WriteAheadLogBasedBlockHandler.storeBlock(...)
This implementation stores the block into the block
manager as well as a write ahead log. It does this in parallel, using
Scala Futures, and returns only after the block has been stored in
both places.

https://www.codatlas.com/github.com/apache/spark/master/streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala?keyword=WriteAheadLogBasedBlockHandlerline=160


On 13 June 2015 at 06:46, Akhil Das ak...@sigmoidanalytics.com wrote:
 Yes, if you have enabled WAL and checkpointing then after the store, you can
 simply delete the SQS Messages from your receiver.

 Thanks
 Best Regards

 On Sat, Jun 13, 2015 at 6:14 AM, Michal Čizmazia mici...@gmail.com wrote:

 I would like to have a Spark Streaming SQS Receiver which deletes SQS
 messages only after they were successfully stored on S3.

 For this a Custom Receiver can be implemented with the semantics of
 the Reliable Receiver.

 The store(multiple-records) call blocks until the given records have
 been stored and replicated inside Spark.

 If the write-ahead logs are enabled, all the data received from a
 receiver gets written into a write ahead log in the configuration
 checkpoint directory. The checkpoint directory can be pointed to S3.

 After the store(multiple-records) blocking call finishes, are the
 records already stored in the checkpoint directory (and thus can be
 safely deleted from SQS)?

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Reliable SQS Receiver for Spark Streaming

2015-06-12 Thread Michal Čizmazia
I would like to have a Spark Streaming SQS Receiver which deletes SQS
messages only after they were successfully stored on S3.

For this a Custom Receiver can be implemented with the semantics of
the Reliable Receiver.

The store(multiple-records) call blocks until the given records have
been stored and replicated inside Spark.

If the write-ahead logs are enabled, all the data received from a
receiver gets written into a write ahead log in the configuration
checkpoint directory. The checkpoint directory can be pointed to S3.

After the store(multiple-records) blocking call finishes, are the
records already stored in the checkpoint directory (and thus can be
safely deleted from SQS)?

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org