[ 
https://issues.apache.org/jira/browse/SPARK-15842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323006#comment-15323006
 ] 

Tathagata Das commented on SPARK-15842:
---------------------------------------

This is slightly at odds with the fundamental design of structured streaming
sources. The contract for such a source is that it must be able to replay
exactly an arbitrary sequence of past data in the stream, given a range of
offsets. This means that only streaming sources like Kafka and Kinesis (which
have the concept of a per-record offset) fit into this model. This is the
assumption we have made to achieve end-to-end exactly-once guarantees.

So a socket stream does not quite fit into this model. 
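
To make the replay requirement concrete, here is a minimal Python sketch. It
is not Spark's actual source API: the class names `LogBackedSource` and
`OneShotSource` and the method `get_batch(start, end)` are hypothetical, chosen
only to illustrate the contract described above. The first class stands in for
a Kafka-like log where every record keeps its offset; the second stands in for
a socket-like stream where data is gone once it has been read.

```python
from typing import Iterable, List


class LogBackedSource:
    """Hypothetical Kafka-like source: records stay addressable by offset."""

    def __init__(self, records: Iterable[str]) -> None:
        self._log = list(records)

    def latest_offset(self) -> int:
        # One past the offset of the last record currently available.
        return len(self._log)

    def get_batch(self, start: int, end: int) -> List[str]:
        # Any past range [start, end) can be read again with identical results.
        return self._log[start:end]


class OneShotSource:
    """Hypothetical socket-like source: reading a record consumes it."""

    def __init__(self, records: Iterable[str]) -> None:
        self._stream = iter(records)
        self._position = 0  # offset of the next unread record

    def get_batch(self, start: int, end: int) -> List[str]:
        if start != self._position:
            # Earlier data no longer exists anywhere, so it cannot be replayed.
            raise LookupError(
                f"cannot serve offset {start}; stream is at {self._position}"
            )
        # range() comes first in zip() so no extra record is pulled
        # from the stream once the requested count is reached.
        batch = [rec for _, rec in zip(range(end - start), self._stream)]
        self._position += len(batch)
        return batch
```

Asking `LogBackedSource` for the same range twice returns the same records,
which is what recovery after a failure relies on. The same request against
`OneShotSource` fails the second time, which is the mismatch with the model.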

> Add support for socket stream.
> ------------------------------
>
>                 Key: SPARK-15842
>                 URL: https://issues.apache.org/jira/browse/SPARK-15842
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Streaming
>            Reporter: Prashant Sharma
>            Assignee: Prashant Sharma
>
> Streaming so far has only offset-based sources. The available sources, such
> as file-source and memory-source, do not need additional capabilities to
> serve data for any given range of offsets.
> A socket stream has only a very small buffer at the OS level. Many message
> queues have the ability to keep a message lingering until it is read by the
> receiving end. ZeroMQ is one such example. However, in the case of a socket
> stream, this is not supported.
> The challenge here would be to implement a way to buffer data for a
> configurable amount of time and to discuss strategies for overflow and
> underflow.
> This JIRA will form the basis for implementing sources which do not have
> native support for lingering a message for any amount of time until it is
> read. It covers a design doc, if necessary, and the supporting code to
> implement such sources.
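
One way to picture the buffering idea in the description above is the
following Python sketch. It is only an illustration under stated assumptions,
not a proposed design: the class `RetentionBuffer`, its parameters
`retention_seconds` and `max_records`, and the drop-oldest overflow policy are
all hypothetical. The buffer assigns an offset to each incoming record and can
replay any range that is still retained; ranges that have aged out or been
dropped raise an error rather than returning partial data.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class RetentionBuffer:
    """Hypothetical buffer: assigns offsets and retains records for a fixed time."""

    def __init__(self, retention_seconds: float, max_records: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._retention = retention_seconds
        self._max_records = max_records
        self._clock = clock
        # Each entry is (offset, arrival time, value), oldest first.
        self._records: Deque[Tuple[int, float, str]] = deque()
        self._next_offset = 0

    def append(self, value: str) -> int:
        self._evict_expired()
        if len(self._records) >= self._max_records:
            # Overflow policy assumed for this sketch: drop the oldest record.
            self._records.popleft()
        offset = self._next_offset
        self._records.append((offset, self._clock(), value))
        self._next_offset += 1
        return offset

    def get_batch(self, start: int, end: int) -> List[str]:
        self._evict_expired()
        earliest = self._records[0][0] if self._records else self._next_offset
        if start < earliest or end > self._next_offset:
            raise LookupError(
                f"range [{start}, {end}) is outside the retained offsets "
                f"[{earliest}, {self._next_offset})"
            )
        return [value for off, _, value in self._records if start <= off < end]

    def _evict_expired(self) -> None:
        # Remove records that arrived before the retention cutoff.
        cutoff = self._clock() - self._retention
        while self._records and self._records[0][1] < cutoff:
            self._records.popleft()
```

Within the retention window this behaves like an offset-based source. Outside
it, replay is impossible, so a buffer of this kind can only narrow the gap
with sources that retain data durably, not close it.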



