[ 
https://issues.apache.org/jira/browse/SPARK-15842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323837#comment-15323837
 ] 

Prashant Sharma commented on SPARK-15842:
-----------------------------------------

Thank you for making it clear.

The actual question I had was: "What if we could give exactly-once guarantees
only for a configurable amount of time?"

In some sense, even a socket stream can have the concept of a per-record
offset, by introducing some kind of control bit. But it certainly does not
support the features (such as replaying an arbitrary sequence of past data)
that most message queues have built in. Also, having this would require our
own mechanism to support end-to-end exactly-once guarantees, and that is
non-trivial: one would need the receiver to run as a long-running thread, and
then have to worry about its failover and address challenges like scaling.

This certainly puts it at odds with the current design of Structured Streaming.

Also, anyone who would like to use a socket stream can always deploy Kafka or
a similar message queue as middleware and have all the guarantees that
streaming intends to provide.
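
As a rough illustration of the "configurable amount of time" idea in the
question above, here is a minimal Python sketch of a buffer that assigns a
per-record offset and keeps records replayable only within a retention
window. Everything in it is hypothetical and not part of Spark: the
`TimeBoundedBuffer` class, the `OffsetExpired` exception, the
`retention_seconds` parameter and the injectable `clock` are names made up
for this sketch. It also ignores receiver failover, scaling and overflow
handling, which are the hard parts mentioned above.

```python
import time
from collections import deque


class OffsetExpired(Exception):
    """Raised when a requested offset has already been evicted."""


class TimeBoundedBuffer:
    """Hypothetical buffer: assigns offsets, keeps records for a fixed time."""

    def __init__(self, retention_seconds, clock=time.monotonic):
        self.retention_seconds = retention_seconds
        self.clock = clock            # injectable so the window can be tested
        self.records = deque()        # (offset, arrival_time, value), oldest first
        self.next_offset = 0

    def _evict(self):
        """Drop records that arrived before the retention window."""
        cutoff = self.clock() - self.retention_seconds
        while self.records and self.records[0][1] < cutoff:
            self.records.popleft()

    def append(self, value):
        """Store one record read from the socket and return its offset."""
        self._evict()
        offset = self.next_offset
        self.records.append((offset, self.clock(), value))
        self.next_offset += 1
        return offset

    def get_range(self, start, end):
        """Replay records with start <= offset < end, if still retained."""
        self._evict()
        if end > self.next_offset:
            raise ValueError("range extends past the latest offset")
        if start >= end:
            return []
        oldest = self.records[0][0] if self.records else self.next_offset
        if start < oldest:
            # Replay is only guaranteed inside the retention window.
            raise OffsetExpired(f"offset {start} is no longer buffered")
        return [v for off, _, v in self.records if start <= off < end]
```

With this shape, a range inside the window can be replayed any number of
times, while a request that reaches before the window fails loudly instead of
returning partial data. That failure marks the point where a time-limited
exactly-once guarantee would end.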



> Add support for socket stream.
> ------------------------------
>
>                 Key: SPARK-15842
>                 URL: https://issues.apache.org/jira/browse/SPARK-15842
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Streaming
>            Reporter: Prashant Sharma
>            Assignee: Prashant Sharma
>
> Streaming so far has offset based sources with all the available sources like 
> file-source and memory-source that do not need additional capabilities to 
> implement offset for any given range.
> Socket stream at OS level has a very tiny buffer. Many message queues have 
> the ability to keep the message lingering until it is read by the receiver 
> end. ZeroMQ is one such example. However in the case of socket stream, this 
> is not supported. 
> The challenge here would be to implement a way to  buffer for a configurable 
> amount of time and discuss strategies for overflow and underflow.
> This JIRA will form the basis for implementing sources which do not have 
> native support for lingering a message for any amount of time until it is 
> read. It deals with design doc if necessary and supporting code to implement 
> such sources.


