wenxuanguan commented on issue #25618: [SPARK-28908][SS]Implement Kafka EOS 
sink for Structured Streaming
URL: https://github.com/apache/spark/pull/25618#issuecomment-530713693
 
 
   @gaborgsomogyi @HeartSaVioR Thanks for your reply.
   About transaction timeout, as described above and in the [mail 
list](http://apache-spark-developers-list.1001551.n3.nabble.com/SS-Kafka-EOS-transaction-timeout-solution-td27797.html),
  I think at present the only way to address the issue is: 
   1. lost data when transaction timeout.
   2. resend data when we find transaction timeout.
   3. set `transactional.timeout.ms` to a larger value, see `Integer.MAX_VALUE`.
   
   The first way is rejected since it leads to data loss. The second one is 
rejected since it is not consistent with exactly-once semantics. 
   The last one seems workable. Because commit transaction is lightweight 
action, If transaction is not recovered after `Integer.MAX_VALUE` ms,  we can 
say it is caused by Kafka cluster fatal error, and transaction can`t be 
recovered. At this situation, we can provider a way to resend data when user 
want to.
   
   @gaborgsomogyi About a new Kafka API to resolve Kafka transaction in 
distributed system, as @HeartSaVioR mentioned above,  producer transaction is 
not provided only for kafka stream, and I also think we should adapt it to 
Spark/Flink/Hive.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to