[ https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553476#comment-15553476 ]
Cody Koeninger commented on SPARK-15406:
----------------------------------------

You cannot have reliable delivery semantics to a downstream data store (i.e. what people usually mean when they say "exactly once") without either idempotent writes or transactional writes.

The Structured Streaming API as it exists today provides no way to specify offsets on startup, and no batched way to access offsets for insertion into a data store. In practical terms, that means exactly-once depends on idempotence, and idempotence is not always an option.

The existing DStream API allows me to get reliable delivery of arbitrary aggregations to a partitioned, scalable downstream data store. The Structured Streaming wrapper around the DStream (which, honestly, is what it currently is) does not allow that.

I understand that you want to split the interface from the implementation, but I have yet to hear any concrete ideas on how to make the implementation meaningfully different from DStreams when it comes to Kafka (which is pretty clearly the primary use case).

> Structured streaming support for consuming from Kafka
> -----------------------------------------------------
>
>                 Key: SPARK-15406
>                 URL: https://issues.apache.org/jira/browse/SPARK-15406
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Cody Koeninger
>
> This is the parent JIRA to track all the work for building a Kafka source
> for Structured Streaming. Here is the design doc for an initial version of
> the Kafka Source.
> https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit?usp=sharing
> ================== Old description =========================
> Structured streaming doesn't have support for Kafka yet.
> I personally feel like time-based indexing would make for a much better interface, but it's
> been pushed back to Kafka 0.10.1
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
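The comment's point about exactly-once for non-idempotent aggregations can be illustrated with a small sketch. It assumes a downstream store that supports transactions. Python's stdlib sqlite3 stands in for that store, and the table and function names (counts, offsets, write_naively, write_transactionally) are hypothetical, not part of any Spark or Kafka API. Each batch is assumed to know its offset range per partition, as the DStream integration exposes through each batch's OffsetRange (topic, partition, fromOffset, untilOffset). The sketch also assumes a single writer per partition and that offsets start at 0 when nothing has been committed. Counter increments are not idempotent, so replaying a batch double-counts. Committing the results and the offset range in the same transaction lets a replay be detected and skipped, and the stored offset is what a restarted job would resume from.

```python
import sqlite3
from typing import Dict

# Hypothetical schema for illustration: aggregated results plus the last
# committed Kafka offset per (topic, partition), kept in the SAME database.
def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counts (key TEXT PRIMARY KEY, n INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE offsets (topic TEXT, part INTEGER, until_offset INTEGER NOT NULL,"
        " PRIMARY KEY (topic, part))"
    )
    conn.commit()
    return conn


def stored_offset(conn: sqlite3.Connection, topic: str, part: int) -> int:
    """Offset to resume from on startup; 0 if nothing has been committed yet (assumption)."""
    row = conn.execute(
        "SELECT until_offset FROM offsets WHERE topic = ? AND part = ?", (topic, part)
    ).fetchone()
    return row[0] if row else 0


def _increment(conn: sqlite3.Connection, batch_counts: Dict[str, int]) -> None:
    # Counter increments are NOT idempotent: applying them twice double-counts.
    for key, n in batch_counts.items():
        conn.execute("INSERT OR IGNORE INTO counts (key, n) VALUES (?, 0)", (key,))
        conn.execute("UPDATE counts SET n = n + ? WHERE key = ?", (n, key))


def write_naively(conn: sqlite3.Connection, batch_counts: Dict[str, int]) -> None:
    """Results only, no offsets: a replayed batch is silently counted twice."""
    with conn:
        _increment(conn, batch_counts)


def write_transactionally(conn: sqlite3.Connection, topic: str, part: int,
                          from_offset: int, until_offset: int,
                          batch_counts: Dict[str, int]) -> bool:
    """Results and offset range commit together; returns False for a replayed batch."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        if stored_offset(conn, topic, part) != from_offset:
            return False  # range already committed (replay) or there is a gap
        _increment(conn, batch_counts)
        conn.execute(
            "INSERT OR REPLACE INTO offsets (topic, part, until_offset) VALUES (?, ?, ?)",
            (topic, part, until_offset),
        )
        return True


def counts(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT key, n FROM counts ORDER BY key"))


if __name__ == "__main__":
    batch = {"a": 2, "b": 1}  # aggregated from offsets [0, 3) of partition 0

    naive = open_store()
    write_naively(naive, batch)
    write_naively(naive, batch)  # replay after a simulated failure
    print(counts(naive))  # {'a': 4, 'b': 2}

    tx = open_store()
    print(write_transactionally(tx, "events", 0, 0, 3, batch))  # True
    print(write_transactionally(tx, "events", 0, 0, 3, batch))  # False
    print(counts(tx), stored_offset(tx, "events", 0))  # {'a': 2, 'b': 1} 3
```

The design choice mirrors the comment: when the write itself cannot be made idempotent, the only remaining lever is putting offsets and results in one transaction. That requires the streaming API to hand you the offsets per batch and to accept starting offsets on restart.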