[jira] [Commented] (SPARK-15406) Structured streaming support for consuming from Kafka

Cody Koeninger (JIRA) Thu, 06 Oct 2016 15:58:38 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553476#comment-15553476
 ]


Cody Koeninger commented on SPARK-15406:
----------------------------------------

You cannot have reliable delivery semantics to a downstream data store (i.e., 
what people usually care about when they say "exactly once") without either 
idempotent writes or transactional writes.  The structured streaming API as it 
exists today provides no way to specify offsets on startup, and no batched way 
to access offsets for insertion into a data store, which means in practical 
terms that exactly-once depends on idempotence.  Idempotence is not always an 
option.
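
To make the transactional alternative concrete, here is a minimal sketch of 
storing a batch's results and its offsets in one transaction.  It is an 
illustration only, not Spark or Kafka API: it uses Python's sqlite3 as a 
stand-in for the downstream store, and the names open_store, commit_batch, and 
the counts/offsets tables are hypothetical.  It also assumes a single writer 
per topic-partition.  The write is a per-key increment, which is not 
idempotent, so a replayed batch has to be detected through the stored offset.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the (hypothetical) results and offsets tables."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts ("
        "key TEXT PRIMARY KEY, total INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets ("
        "topic TEXT, part INTEGER, until_offset INTEGER NOT NULL, "
        "PRIMARY KEY (topic, part))"
    )
    conn.commit()
    return conn


def commit_batch(conn, topic, part, from_offset, until_offset, counts):
    """Apply per-key increments and advance the offset in one transaction.

    Returns True if the batch was applied, False if it was skipped because
    the stored offset does not match from_offset (for example, a replay).
    """
    with conn:  # commits on normal exit, rolls back if an exception is raised
        row = conn.execute(
            "SELECT until_offset FROM offsets WHERE topic = ? AND part = ?",
            (topic, part),
        ).fetchone()
        # Assumption: if no offset is stored yet, any starting offset is accepted.
        if row is not None and row[0] != from_offset:
            return False
        for key, n in counts.items():
            cur = conn.execute(
                "UPDATE counts SET total = total + ? WHERE key = ?", (n, key)
            )
            if cur.rowcount == 0:
                conn.execute(
                    "INSERT INTO counts (key, total) VALUES (?, ?)", (key, n)
                )
        conn.execute(
            "INSERT OR REPLACE INTO offsets (topic, part, until_offset) "
            "VALUES (?, ?, ?)",
            (topic, part, until_offset),
        )
        return True
```

On restart, a job written this way would read the offsets table to decide 
where to resume, which is why the ability to specify starting offsets and to 
see each batch's offset range matters for this approach.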

The existing DStream allows me to get reliable delivery of arbitrary 
aggregations to a partitioned, scalable downstream data store.  The structured 
streaming wrapper around the DStream (which honestly is what it is currently) 
does not allow that.

I understand that you want to split the interface from the implementation, but 
I have not yet heard any concrete ideas on how to make the implementation 
meaningfully different from DStreams when it comes to Kafka (which is pretty 
clearly the primary use case).

> Structured streaming support for consuming from Kafka
> -----------------------------------------------------
>
>                 Key: SPARK-15406
>                 URL: https://issues.apache.org/jira/browse/SPARK-15406
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Cody Koeninger
>
> This is the parent JIRA to track all the work for building a Kafka source 
> for Structured Streaming. Here is the design doc for an initial version of 
> the Kafka Source.
> https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit?usp=sharing
> ================== Old description =========================
> Structured streaming doesn't have support for Kafka yet.  I personally feel 
> like time based indexing would make for a much better interface, but it's 
> been pushed back to Kafka 0.10.1
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index



