ableegoldman commented on code in PR #14735:
URL: https://github.com/apache/kafka/pull/14735#discussion_r1396623185


##########
streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java:
##########
@@ -1012,6 +1022,8 @@ private void prepareChangelogs(final Map<TaskId, Task> 
tasks,
                 // no records to restore; in this case we just initialize the 
sensor to zero
                 final long recordsToRestore = 
Math.max(changelogMetadata.restoreEndOffset - startOffset, 0L);
                 task.recordRestoration(time, recordsToRestore, true);
+            }  else if (changelogMetadata.stateManager.taskType() == 
TaskType.STANDBY && storeMetadata.endOffset() != null) {

Review Comment:
   That's a really good point, most (close to all) of the time we won't 
actually know the `endOffset` when `#onUpdateStart` is invoked. I think it's 
totally reasonable to remove this parameter from that callback. It's probably 
way more important/useful to track the end offset as it changes over the course 
of the standby updates anyways, compared to `#onRestoreStart` where the end 
offset never advances during restoration but it's nice to know it up front to 
compute how much there is to restore.
   
   So I'm in agreement with 2), you can just send a quick update to the KIP 
thread with a brief explanation. That said, I think we should still try to 
guarantee that the listener callbacks are always invoked, even if we don't know 
the end offset. 
    In particular `#onUpdateSuspended`, which I imagine might be used to clean 
up resources (such as metrics) created during `#onUpdateStart`. One could argue 
that since `#onBatchLoaded` is only ever called after a successful poll and so 
the end offset should always be known in that case, but just for the sake of 
consistency, I think we should treat both callbacks the same and invoke them 
whether `endOffset` is known or not. Also, the end offset is ultimately just 
one piece of information and not the sole source of value, so it feels weird to 
skip over the listener altogether just because a single field is missing.
   
   So I guess overall, I actually think we should do both of @coltmcnealy-lh 's 
suggestions, 1 and 2. As for whether to use `-1`, `null`, or 
`Optional.empty()`, I'll leave that up to the KIP authors. Of course since you 
should probably update the mailing list thread with this change as well, if you 
don't have a strong preference yourself, you can also just put the options out 
there and see what everyone agrees on.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to