[ https://issues.apache.org/jira/browse/SPARK-25005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637541#comment-16637541 ]
Shixiong Zhu commented on SPARK-25005:
--------------------------------------

[~qambard] If `poll` returns and the offset has changed, the Kafka consumer fetched something, but all of the fetched messages are invisible to the consumer (for example, transaction markers or records from aborted transactions, as in this issue), so it returns an empty result. If `poll` returns but the offset has not changed, Kafka fetched nothing before the timeout. In that case we just throw a "TimeoutException", and Spark will either retry the task or fail the job. A long GC pause can cause such a timeout, and the user should tune the configs to avoid it. There is not much we can do in Spark.

> Structured streaming doesn't support kafka transaction (creating empty offset with abort & markers)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25005
>                 URL: https://issues.apache.org/jira/browse/SPARK-25005
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Quentin Ambard
>            Assignee: Shixiong Zhu
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Structured streaming can't consume kafka transaction.
> We could try to apply SPARK-24720 (DStream) logic to Structured Streaming source

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
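
For readers following the comment above, the two empty-`poll` cases it distinguishes can be sketched roughly as follows. This is only an illustration of the decision rule, not Spark's actual code: the names `classify_poll`, `fetch_or_raise`, `PollOutcome`, and `FetchTimeoutError` are hypothetical, and the offsets are assumed to be the consumer position read before and after a single `poll` call.

```python
from enum import Enum
from typing import List


class PollOutcome(Enum):
    RECORDS = "records"                # poll returned visible records
    INVISIBLE_ONLY = "invisible_only"  # offset moved, but nothing visible came back
    TIMED_OUT = "timed_out"            # offset did not move before the timeout


class FetchTimeoutError(Exception):
    """Hypothetical stand-in for the "TimeoutException" mentioned in the comment."""


def classify_poll(records: List[bytes], offset_before: int, offset_after: int) -> PollOutcome:
    """Classify one poll result from the returned records and the offset change."""
    if records:
        return PollOutcome.RECORDS
    if offset_after != offset_before:
        # Something was fetched, but every message was invisible to the consumer.
        return PollOutcome.INVISIBLE_ONLY
    # Nothing was fetched before the poll timeout elapsed.
    return PollOutcome.TIMED_OUT


def fetch_or_raise(records: List[bytes], offset_before: int, offset_after: int) -> PollOutcome:
    """Raise on a timeout so the caller (here, the task) can be retried or failed."""
    outcome = classify_poll(records, offset_before, offset_after)
    if outcome is PollOutcome.TIMED_OUT:
        raise FetchTimeoutError(
            f"no data fetched and offset stayed at {offset_before} before the timeout"
        )
    return outcome
```

In this sketch, only the "offset unchanged" case is treated as an error; an empty result with a moved offset is a normal outcome that lets the reader continue from the new offset.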