ableegoldman commented on a change in pull request #9446:
URL: https://github.com/apache/kafka/pull/9446#discussion_r506616155



##########
File path: 
streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsPartitionAssignor.java
##########
@@ -409,9 +396,18 @@ public GroupAssignment assign(final Cluster metadata, 
final GroupSubscription gr
                 minSupportedMetadataVersion,
                 versionProbing,
                 probingRebalanceNeeded
-        );
+            );
 
-        return new GroupAssignment(assignment);
+            return new GroupAssignment(assignment);
+        } catch (final MissingSourceTopicException e) {
+            return new GroupAssignment(
+                errorAssignment(clientMetadataMap, 
AssignorError.INCOMPLETE_SOURCE_TOPIC_METADATA.code())
+            );
+        } catch (final TaskAssignmentException e) {
+            return new GroupAssignment(
+                errorAssignment(clientMetadataMap, 
AssignorError.ASSIGNMENT_ERROR.code())

Review comment:
       I'm honestly not sure that INCOMPLETE_SOURCE_TOPIC_METADATA is better 
than just throwing nothing for older clients. Is it better to shut everyone 
down immediately if it not only completely obfuscates the reason for shutting 
down, but actually points to a completely different (and wrong) root cause?
   
   Obviously a slow death by one thread at a time is not a good user 
experience, but this is slightly different. For one thing, you'd still see all 
the upgraded clients shut down, and only the older ones would remain. 
Presumably you're in the middle of a rolling bounce, so eventually all of those 
surviving older clients will eventually be upgraded to understand this error 
code and shut down. Note that the previous behavior is to just kill a single 
thread at a time upon hitting this exception, so it's not a regression. 
Throwing a misleading exception actually seems more like a regression to me. 
   
   WDYT? How important does it seem to ensure that the entire group shuts down 
immediately during a rolling bounce? It does seem preferable to cut the rolling 
bounce early if it's doomed from the start. But I personally feel like logging 
a misleading error is a worse user experience. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to