[ https://issues.apache.org/jira/browse/SPARK-15842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323837#comment-15323837 ]
Prashant Sharma commented on SPARK-15842:
-----------------------------------------

Thank you for making it clear. The actual question I had was: "What if we could give exactly-once guarantees only for a configurable amount of time?"

In some sense, even a socket stream can have the concept of a per-record offset, by introducing some kind of control bit. But it certainly does not support the features (such as replaying an arbitrary sequence of past data, and so on) that most message queues come with built in.

Also, having this would require our own mechanism to support end-to-end exactly-once guarantees, and that is non-trivial: one would need the receiver to run as a long-running thread, then worry about its failover, and also address challenges like scaling. This certainly puts it at odds with the current design of structured streaming.

Also, anyone who would like to use a socket stream can always deploy Kafka or a similar message queue as middleware and get all the guarantees that streaming intends to provide.

> Add support for socket stream.
> ------------------------------
>
>                 Key: SPARK-15842
>                 URL: https://issues.apache.org/jira/browse/SPARK-15842
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Streaming
>            Reporter: Prashant Sharma
>            Assignee: Prashant Sharma
>
> Streaming so far has offset-based sources; all the available sources, like
> file-source and memory-source, do not need additional capabilities to
> implement offsets for any given range.
> A socket stream at the OS level has a very tiny buffer. Many message queues
> have the ability to keep a message lingering until it is read by the
> receiving end. ZeroMQ is one such example. However, in the case of a socket
> stream, this is not supported.
> The challenge here would be to implement a way to buffer for a configurable
> amount of time and discuss strategies for overflow and underflow.
> This JIRA will form the basis for implementing sources which do not have
> native support for lingering a message for any amount of time until it is
> read. It covers a design doc, if necessary, and the supporting code to
> implement such sources.
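
To make the idea discussed in this message concrete (per-record offsets on a socket stream, buffered for a configurable amount of time), here is a minimal sketch in Python. It is not taken from the Spark code base: the class `TimeBoundedBuffer`, its parameters `retention_s` and `max_records`, and the drop-oldest overflow strategy are all assumptions made for illustration only.

```python
import time
from collections import deque


class TimeBoundedBuffer:
    """Hypothetical buffer: gives each record an offset and keeps it
    for at most `retention_s` seconds (names are illustrative only)."""

    def __init__(self, retention_s, max_records, clock=time.monotonic):
        self.retention_s = retention_s
        self.max_records = max_records
        self.clock = clock
        self.records = deque()  # (offset, arrival_time, value), oldest first
        self.next_offset = 0

    def _evict(self):
        now = self.clock()
        # Drop records that have outlived the retention window.
        while self.records and now - self.records[0][1] > self.retention_s:
            self.records.popleft()
        # Overflow strategy (an assumption): drop the oldest records.
        while len(self.records) > self.max_records:
            self.records.popleft()

    def append(self, value):
        """Store a record and return the offset assigned to it."""
        offset = self.next_offset
        self.records.append((offset, self.clock(), value))
        self.next_offset += 1
        self._evict()
        return offset

    def get_batch(self, start, end):
        """Return values with start <= offset < end.

        Raises LookupError if part of the range was never written or has
        already been evicted, i.e. replay is no longer possible."""
        self._evict()
        if start >= end:
            return []
        if end > self.next_offset:
            raise LookupError("range extends past the latest offset")
        oldest = self.records[0][0] if self.records else self.next_offset
        if start < oldest:
            raise LookupError("range starts before the oldest retained offset")
        return [value for (offset, _, value) in self.records
                if start <= offset < end]
```

With a buffer like this, a range of offsets can be replayed only while every record in it is still retained, so any exactly-once guarantee built on top of it would hold only within the configured window. A failed `get_batch` call is the point at which that guarantee is lost.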