HeartSaVioR edited a comment on issue #25618: [SPARK-28908][SS] Implement Kafka EOS sink for Structured Streaming
URL: https://github.com/apache/spark/pull/25618#issuecomment-531519195

> About a new Kafka API to resolve Kafka transaction in distributed system, as @HeartSaVioR mentioned above, Kafka producer transaction is not provided only for Kafka Stream, and a new API for Spark/Flink/Hive may be customized. So I also think we should adapt Spark/Flink/Hive to it.

Sorry, you are understanding my comment the opposite way. My claim was that the Kafka producer transaction is designed "for" Kafka Streams. Please read my comment again carefully.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics

According to the design doc, the Kafka community took the "transaction per task" approach:

> In this design we take the approach to assign a separate producer per task so that any transaction contains only output messages of a single task.

which means they never need to worry about a transaction spanning multiple connections/JVMs, unlike other streaming frameworks. From that, I guess Kafka Streams leverages Kafka topics as shuffle storage and runs the user application as multiple connected `read-process-write` topologies. (So ensuring exactly-once for each of the connected parts brings exactly-once for the overall graph.) That approach is completely coupled with Kafka, and Spark can't (and shouldn't) do the same.
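For context, here is a minimal sketch of the single-task `read-process-write` loop that KIP-129's design builds on, written against the plain Kafka clients API (2.5+, where `sendOffsetsToTransaction` accepts `ConsumerGroupMetadata`). The topic names, group id, and the `task-0` transactional.id are illustrative assumptions, not anything from this PR:

```scala
import java.time.Duration
import java.util.{Collections, HashMap, Properties}
import scala.jdk.CollectionConverters._

import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.errors.ProducerFencedException
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}

object ReadProcessWriteTask {
  def main(args: Array[String]): Unit = {
    val consumerProps = new Properties()
    consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group")          // assumed group id
    consumerProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed") // skip aborted txns
    consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")

    val producerProps = new Properties()
    producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    // One fixed transactional.id per task: the broker uses it to fence a
    // zombie instance of the same task after failover.
    producerProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "task-0")       // assumed id

    val consumer = new KafkaConsumer(consumerProps, new StringDeserializer, new StringDeserializer)
    val producer = new KafkaProducer(producerProps, new StringSerializer, new StringSerializer)

    consumer.subscribe(Collections.singletonList("input"))
    producer.initTransactions()

    while (true) {
      val records = consumer.poll(Duration.ofMillis(100))
      if (!records.isEmpty) {
        producer.beginTransaction()
        try {
          val offsets = new HashMap[TopicPartition, OffsetAndMetadata]()
          records.asScala.foreach { r =>
            producer.send(new ProducerRecord("output", r.key, r.value.toUpperCase))
            offsets.put(new TopicPartition(r.topic, r.partition), new OffsetAndMetadata(r.offset + 1))
          }
          // The input offsets are committed inside the same transaction as the
          // output records, so the read and the write succeed or fail atomically.
          producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata())
          producer.commitTransaction()
        } catch {
          case _: ProducerFencedException => producer.close() // another instance took over
          case _: Exception               => producer.abortTransaction()
        }
      }
    }
  }
}
```

Note that every transaction here involves exactly one producer on one JVM; there is never a transaction shared across connections or executors, which is the property Spark's distributed sink cannot assume.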