[GitHub] [kafka] C0urante commented on a diff in pull request #13948: KAFKA-15091: Fix misleading Javadoc for SourceTask::commit
C0urante commented on code in PR #13948: URL: https://github.com/apache/kafka/pull/13948#discussion_r1267283474 ## connect/api/src/main/java/org/apache/kafka/connect/source/SourceTask.java: ## @@ -105,9 +105,11 @@ public void initialize(SourceTaskContext context) { public abstract List poll() throws InterruptedException; /** - * - * Commit the offsets, up to the offsets that have been returned by {@link #poll()}. This - * method should block until the commit is complete. + * This method is invoked periodically when offsets are committed for this source task. Note that the offsets + * being committed won't necessarily correspond to the latest offsets returned by this source task via + * {@link #poll()}. When exactly-once support is disabled, offsets are committed periodically and asynchronously + * (i.e. on a separate thread from the one which calls {@link #poll()}). When exactly-once support is enabled, + * offsets are committed on transaction commits (also see {@link TransactionBoundary}). * * SourceTasks are not required to implement this functionality; Kafka Connect will record offsets * automatically. This hook is provided for systems that also need to store offsets internally Review Comment: The "store offsets internally" language isn't great, but I'd rather leave it for now and explore it further if/when we start discussing deprecating this method. Connector developers might theoretically use this for acknowledging JMS records, for example, which in a very loose sense is storing offsets (or at least, some JMS-specific equivalent of them) in that system. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] C0urante commented on a diff in pull request #13948: KAFKA-15091: Fix misleading Javadoc for SourceTask::commit
C0urante commented on code in PR #13948: URL: https://github.com/apache/kafka/pull/13948#discussion_r1267278752 ## connect/api/src/main/java/org/apache/kafka/connect/source/SourceTask.java: ## @@ -105,9 +105,11 @@ public void initialize(SourceTaskContext context) { public abstract List poll() throws InterruptedException; /** - * - * Commit the offsets, up to the offsets that have been returned by {@link #poll()}. This - * method should block until the commit is complete. + * This method is invoked periodically when offsets are committed for this source task. Note that the offsets + * being committed won't necessarily correspond to the latest offsets returned by this source task via + * {@link #poll()}. When exactly-once support is disabled, offsets are committed periodically and asynchronously + * (i.e. on a separate thread from the one which calls {@link #poll()}). When exactly-once support is enabled, + * offsets are committed on transaction commits (also see {@link TransactionBoundary}). Review Comment: Looks great, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] C0urante commented on a diff in pull request #13948: KAFKA-15091: Fix misleading Javadoc for SourceTask::commit
C0urante commented on code in PR #13948: URL: https://github.com/apache/kafka/pull/13948#discussion_r1262996406 ## connect/api/src/main/java/org/apache/kafka/connect/source/SourceTask.java: ## @@ -105,9 +105,11 @@ public void initialize(SourceTaskContext context) { public abstract List poll() throws InterruptedException; /** - * - * Commit the offsets, up to the offsets that have been returned by {@link #poll()}. This - * method should block until the commit is complete. + * This method is invoked periodically when offsets are committed for this source task. Note that the offsets + * being committed won't necessarily correspond to the latest offsets returned by this source task via + * {@link #poll()}. When exactly-once support is disabled, offsets are committed periodically and asynchronously + * (i.e. on a separate thread from the one which calls {@link #poll()}). When exactly-once support is enabled, + * offsets are committed on transaction commits (also see {@link TransactionBoundary}). Review Comment: I don't love how we're outlining differences in behavior when exactly-once support is enabled/disabled; it adds to the cognitive load and may tempt connector developers to write connectors that are designed to work exclusively with one mode or the other. Could it be enough to leave this bit out and rely on the "Note that the offsets being committed won't necessarily correspond to the latest offsets returned by this source task via `poll`" part? We can also refer people to [SourceTask::commitRecord](https://kafka.apache.org/35/javadoc/org/apache/kafka/connect/source/SourceTask.html#commitRecord(org.apache.kafka.connect.source.SourceRecord,org.apache.kafka.clients.producer.RecordMetadata)) for fine-grained tracking of records (though there's also no guarantee that all records that have been ack'd in that method will have their offsets committed before a call to `SourceTask::commit`). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org