[jira] [Commented] (KAFKA-12226) High-throughput source tasks fail to commit offsets
[ https://issues.apache.org/jira/browse/KAFKA-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443975#comment-17443975 ]

Randall Hauch commented on KAFKA-12226:
---------------------------------------

Thanks for catching this, [~dajac]. I did miss merging this to the `3.1` branch -- I recall at the time looking for the branch and not seeing it. I'm in the process of building the branch after merging locally, so you should see this a bit later today.

> High-throughput source tasks fail to commit offsets
> ---------------------------------------------------
>
>                 Key: KAFKA-12226
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12226
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>             Fix For: 3.1.0
>
>
> The current source task thread has the following workflow:
> # Poll messages from the source task
> # Queue these messages to the producer and send them to Kafka asynchronously.
> # Add the message to outstandingMessages, or if a flush is currently active, outstandingMessagesBacklog
> # When the producer completes the send of a record, remove it from outstandingMessages
>
> The commit offsets thread has the following workflow:
> # Wait a flat timeout for outstandingMessages to flush completely
> # If this times out, add all of the outstandingMessagesBacklog to outstandingMessages and reset
> # If it succeeds, commit the source task offsets to the backing store.
> # Retry the above on a fixed schedule
>
> If the source task is producing records quickly (faster than the producer can send), then the producer will throttle the task thread by blocking in its {{send}} method, waiting at most {{max.block.ms}} for space in the {{buffer.memory}} to be available. This means that the number of records in {{outstandingMessages}} + {{outstandingMessagesBacklog}} is proportional to the size of the producer memory buffer.
> This amount of data might take more than {{offset.flush.timeout.ms}} to flush, and thus the flush will never succeed while the source task is rate-limited by the producer memory. This means that we may write multiple hours of data to Kafka and never commit source offsets for the connector. When the task is lost due to a worker failure, hours of data that were already successfully written to Kafka will be re-processed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
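The failure mode described above can be sketched with back-of-the-envelope arithmetic. The function below is illustrative, not Kafka Connect's actual code; it only assumes that a full producer buffer must drain within the flush timeout before offsets can be committed. The numbers used are Kafka's defaults: `buffer.memory` = 32 MiB and `offset.flush.timeout.ms` = 5000.

```python
def flush_succeeds(buffer_fill_bytes: int,
                   send_rate_bytes_per_sec: float,
                   offset_flush_timeout_ms: int) -> bool:
    """Return True if the outstanding records (bounded by the producer
    buffer) can drain to Kafka within the flat flush timeout."""
    drain_time_ms = buffer_fill_bytes / send_rate_bytes_per_sec * 1000
    return drain_time_ms <= offset_flush_timeout_ms

# A throttled producer sending 1 MiB/s needs ~32 s to drain a full
# 32 MiB buffer, so every 5 s flush attempt times out:
print(flush_succeeds(32 * 1024 * 1024, 1 * 1024 * 1024, 5000))   # False

# At 10 MiB/s the buffer drains in ~3.2 s and the flush completes:
print(flush_succeeds(32 * 1024 * 1024, 10 * 1024 * 1024, 5000))  # True
```

As long as the task stays rate-limited, the buffer refills between flush attempts, so the timeout recurs on every retry rather than only once.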
[jira] [Commented] (KAFKA-12226) High-throughput source tasks fail to commit offsets
[ https://issues.apache.org/jira/browse/KAFKA-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443699#comment-17443699 ]

David Jacot commented on KAFKA-12226:
-------------------------------------

[~rhauch] I don't see this commit in the 3.1.0 branch. Note that the 3.1.0 branch was cut on November 2nd.
[jira] [Commented] (KAFKA-12226) High-throughput source tasks fail to commit offsets
[ https://issues.apache.org/jira/browse/KAFKA-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17440067#comment-17440067 ]

Randall Hauch commented on KAFKA-12226:
---------------------------------------

Merged the PR to the `trunk` branch, which will be included in the upcoming 3.1.0 branch.