[GitHub] [kafka] C0urante commented on a diff in pull request #13948: KAFKA-15091: Fix misleading Javadoc for SourceTask::commit

2023-07-18 Thread via GitHub


C0urante commented on code in PR #13948:
URL: https://github.com/apache/kafka/pull/13948#discussion_r1267283474


##
connect/api/src/main/java/org/apache/kafka/connect/source/SourceTask.java:
##
@@ -105,9 +105,11 @@ public void initialize(SourceTaskContext context) {
 public abstract List poll() throws InterruptedException;
 
 /**
- * 
- * Commit the offsets, up to the offsets that have been returned by {@link 
#poll()}. This
- * method should block until the commit is complete.
+ * This method is invoked periodically when offsets are committed for this 
source task. Note that the offsets
+ * being committed won't necessarily correspond to the latest offsets 
returned by this source task via
+ * {@link #poll()}. When exactly-once support is disabled, offsets are 
committed periodically and asynchronously
+ * (i.e. on a separate thread from the one which calls {@link #poll()}). 
When exactly-once support is enabled,
+ * offsets are committed on transaction commits (also see {@link 
TransactionBoundary}).
  * 
  * SourceTasks are not required to implement this functionality; Kafka 
Connect will record offsets
  * automatically. This hook is provided for systems that also need to 
store offsets internally

Review Comment:
   The "store offsets internally" language isn't great, but I'd rather leave it 
for now and explore it further if/when we start discussing deprecating this 
method. Connector developers might theoretically use this for acknowledging JMS 
records, for example, which in a very loose sense is storing offsets (or at 
least, some JMS-specific equivalent of them) in that system.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] C0urante commented on a diff in pull request #13948: KAFKA-15091: Fix misleading Javadoc for SourceTask::commit

2023-07-18 Thread via GitHub


C0urante commented on code in PR #13948:
URL: https://github.com/apache/kafka/pull/13948#discussion_r1267278752


##
connect/api/src/main/java/org/apache/kafka/connect/source/SourceTask.java:
##
@@ -105,9 +105,11 @@ public void initialize(SourceTaskContext context) {
 public abstract List poll() throws InterruptedException;
 
 /**
- * 
- * Commit the offsets, up to the offsets that have been returned by {@link 
#poll()}. This
- * method should block until the commit is complete.
+ * This method is invoked periodically when offsets are committed for this 
source task. Note that the offsets
+ * being committed won't necessarily correspond to the latest offsets 
returned by this source task via
+ * {@link #poll()}. When exactly-once support is disabled, offsets are 
committed periodically and asynchronously
+ * (i.e. on a separate thread from the one which calls {@link #poll()}). 
When exactly-once support is enabled,
+ * offsets are committed on transaction commits (also see {@link 
TransactionBoundary}).

Review Comment:
   Looks great, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] C0urante commented on a diff in pull request #13948: KAFKA-15091: Fix misleading Javadoc for SourceTask::commit

2023-07-13 Thread via GitHub


C0urante commented on code in PR #13948:
URL: https://github.com/apache/kafka/pull/13948#discussion_r1262996406


##
connect/api/src/main/java/org/apache/kafka/connect/source/SourceTask.java:
##
@@ -105,9 +105,11 @@ public void initialize(SourceTaskContext context) {
 public abstract List poll() throws InterruptedException;
 
 /**
- * 
- * Commit the offsets, up to the offsets that have been returned by {@link 
#poll()}. This
- * method should block until the commit is complete.
+ * This method is invoked periodically when offsets are committed for this 
source task. Note that the offsets
+ * being committed won't necessarily correspond to the latest offsets 
returned by this source task via
+ * {@link #poll()}. When exactly-once support is disabled, offsets are 
committed periodically and asynchronously
+ * (i.e. on a separate thread from the one which calls {@link #poll()}). 
When exactly-once support is enabled,
+ * offsets are committed on transaction commits (also see {@link 
TransactionBoundary}).

Review Comment:
   I don't love how we're outlining differences in behavior when exactly-once 
support is enabled/disabled; it adds to the cognitive load and may tempt 
connector developers to write connectors that are designed to work exclusively 
with one mode or the other.
   
   Could it be enough to leave this bit out and rely on the "Note that the 
offsets being committed won't necessarily correspond to the latest offsets 
returned by this source task via `poll`" part? We can also refer people to 
[SourceTask::commitRecord](https://kafka.apache.org/35/javadoc/org/apache/kafka/connect/source/SourceTask.html#commitRecord(org.apache.kafka.connect.source.SourceRecord,org.apache.kafka.clients.producer.RecordMetadata))
 for fine-grained tracking of records (though there's also no guarantee that 
all records that have been ack'd in that method will have their offsets 
committed before a call to `SourceTask::commit`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org