[ https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553405#comment-15553405 ]
Michael Armbrust commented on SPARK-15406: ------------------------------------------ The programming guide has a pretty good description of the model we use in structured streaming: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html At a high level, this means for kafka, we pick ranges of offsets to process in each batch and write those to a WAL before processing. We then transactionally commit updates to our internal state and to the supported sinks. This is a much higher level of abstraction that DStreams (just like RDDs vs DataFrames), so I'm very curious to hear from users what works and what is missing. > Structured streaming support for consuming from Kafka > ----------------------------------------------------- > > Key: SPARK-15406 > URL: https://issues.apache.org/jira/browse/SPARK-15406 > Project: Spark > Issue Type: New Feature > Reporter: Cody Koeninger > > This is the parent JIRA to track all the work for the building a Kafka source > for Structured Streaming. Here is the design doc for an initial version of > the Kafka Source. > https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit?usp=sharing > ================== Old description ========================= > Structured streaming doesn't have support for kafka yet. I personally feel > like time based indexing would make for a much better interface, but it's > been pushed back to kafka 0.10.1 > https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org