Nikita-Shupletsov commented on code in PR #20767:
URL: https://github.com/apache/kafka/pull/20767#discussion_r2497817961


##########
streams/src/main/java/org/apache/kafka/streams/processor/internals/DefaultStateUpdater.java:
##########
@@ -349,23 +345,26 @@ private void handleTaskCorruptedException(final TaskCorruptedException taskCorru
         // TODO: we can let the exception encode the actual corrupted changelog partitions and only
         //       mark those instead of marking all changelogs
         private void removeCheckpointForCorruptedTask(final Task task) {
-            task.markChangelogAsCorrupted(task.changelogPartitions());
+            try {
+                task.markChangelogAsCorrupted(task.changelogPartitions());
 
-            // we need to enforce a checkpoint that removes the corrupted partitions
-            measureCheckpointLatency(() -> task.maybeCheckpoint(true));
+                // we need to enforce a checkpoint that removes the corrupted partitions
+                measureCheckpointLatency(() -> task.maybeCheckpoint(true));
+            } catch (final StreamsException e) {
+                log.warn("Checkpoint failed for corrupted task {}", task.id(), e);
+            }
         }
 
         private void handleStreamsException(final StreamsException streamsException) {
             log.info("Encountered streams exception: ", streamsException);
             if (streamsException.taskId().isPresent()) {
-                handleStreamsExceptionWithTask(streamsException);
+                handleStreamsExceptionWithTask(streamsException, streamsException.taskId().get());
             } else {
                 handleStreamsExceptionWithoutTask(streamsException);
             }
         }
 
-        private void handleStreamsExceptionWithTask(final StreamsException streamsException) {
-            final TaskId failedTaskId = streamsException.taskId().get();
+        private void handleStreamsExceptionWithTask(final StreamsException streamsException, final TaskId failedTaskId) {

Review Comment:
   There are a lot of places where we create a StreamsException and don't pass a task ID. My understanding from the code is that we pass the task ID when we are processing a batch of tasks (like `changelogReader.restore`) and want to know which task failed. When we process one task at a time, more often than not we don't pass a task ID. For example, ProcessorStateException is a StreamsException, but it never has a task ID in it.
   
   So my change here is to use the task ID we already know when we are processing one task at a time. When we process a batch, I keep the old logic and take it from the exception, as in handleStreamsException.
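
To make the distinction concrete, here is a minimal sketch of the two paths. It does not use the real Kafka classes: `SketchException`, `failedTaskForSingleTask`, and `failedTaskForBatch` are hypothetical stand-ins, and a plain `String` stands in for `TaskId`. It only illustrates where the failed task's ID comes from in each case.

```java
import java.util.Optional;

public class TaskIdResolutionSketch {

    // Hypothetical stand-in for StreamsException: the task ID is optional
    // because many throw sites never attach one.
    static class SketchException extends RuntimeException {
        private final String taskId; // null when the thrower did not attach an ID

        SketchException(final String message) {
            this(message, null);
        }

        SketchException(final String message, final String taskId) {
            super(message);
            this.taskId = taskId;
        }

        Optional<String> taskId() {
            return Optional.ofNullable(taskId);
        }
    }

    // Single-task path: the caller already knows which task it was working on,
    // so it passes that ID explicitly instead of reading it from the exception.
    static String failedTaskForSingleTask(final SketchException e, final String currentTaskId) {
        return currentTaskId;
    }

    // Batch path: several tasks were processed together, so only the exception
    // can identify the failed one. An empty result means "no task attached".
    static Optional<String> failedTaskForBatch(final SketchException e) {
        return e.taskId();
    }

    public static void main(final String[] args) {
        final SketchException withoutId = new SketchException("state store failure");
        final SketchException withId = new SketchException("restore failure", "0_2");

        System.out.println(failedTaskForSingleTask(withoutId, "0_1"));
        System.out.println(failedTaskForBatch(withId));
        System.out.println(failedTaskForBatch(withoutId));
    }
}
```

In the single-task case the exception's own (possibly missing) ID is never consulted, which is why an exception without a task ID can still be attributed to the right task.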


