wenxuanguan commented on issue #25618: [SPARK-28908][SS]Implement Kafka EOS sink for Structured Streaming URL: https://github.com/apache/spark/pull/25618#issuecomment-530713693 @gaborgsomogyi @HeartSaVioR Thanks for your reply. About transaction timeout, as described above and in the [mail list](http://apache-spark-developers-list.1001551.n3.nabble.com/SS-Kafka-EOS-transaction-timeout-solution-td27797.html), I think at present the only way to address the issue is: 1. lost data when transaction timeout. 2. resend data when we find transaction timeout. 3. set `transactional.timeout.ms` to a larger value, see `Integer.MAX_VALUE`. The first way is rejected since it leads to data loss. The second one is rejected since it is not consistent with exactly-once semantics. The last one seems workable. Because commit transaction is lightweight action, If transaction is not recovered after `Integer.MAX_VALUE` ms, we can say it is caused by Kafka cluster fatal error, and transaction can`t be recovered. At this situation, we can provider a way to resend data when user want to. @gaborgsomogyi About a new Kafka API to resolve Kafka transaction in distributed system, as @HeartSaVioR mentioned above, producer transaction is not provided only for kafka stream, and I also think we should adapt it to Spark/Flink/Hive.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org