lianetm commented on code in PR #16686:
URL: https://github.com/apache/kafka/pull/16686#discussion_r1747345639
##########
clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java:
##########
@@ -1302,6 +1304,42 @@ private void releaseAssignmentAndLeaveGroup(final Timer
timer) {
}
}
+ /**
+ * The unsubscribe process requires a handful of back-and-forth trips
between the application thread and
+ * the background thread:
+ *
+ * <ol>
+ * <li>
+ * Application thread: enqueue {@link UnsubscribeEvent}
+ * </li>
+ * <li>
+ * Background thread: process {@link UnsubscribeEvent} and
+ * enqueue {@link
ConsumerRebalanceListenerCallbackNeededEvent}
+ * </li>
+ * <li>
+ * Application thread: process {@link
ConsumerRebalanceListenerCallbackNeededEvent},
+ * invoke appropriate {@link
ConsumerRebalanceListener} method, and
+ * enqueue {@link
ConsumerRebalanceListenerCallbackCompletedEvent}
+ * </li>
+ * <li>
+ * Background thread: process {@link
ConsumerRebalanceListenerCallbackCompletedEvent} and
+ * enqueue {@link
NetworkClientDelegate.UnsentRequest} to send the
+ * {@link ConsumerGroupHeartbeatRequest} to
leave the consumer group
+ * </li>
+ * </ol>
+ *
+ * In cases where the incoming {@link Timer timer} has very little
remaining time, e.g. 0, it is impossible
+ * to perform the thread switches necessary to leave the group. Therefore,
in cases where there isn't much
+ * of a timeout left, increase it slightly (presently 1000 ms.) to improve
the chances for the consumer to
+ * properly leave the group.
Review Comment:
very interesting, and good that you tried out the close(0) approach because
it reveals a bigger gap we still have, regardless of the interrupt issue: close
with low timeouts may still not send the leave.
This is definitely bigger than the interrupt, so what about we tackle this
separately, to properly think about the best approach (this wait for 1000 is
definitely one but seems brittle, what's the magic number there?). We could
maybe simply have a way to GenerateLeaveRequestEvent that we can explicitly
call and wait on when we know we didn't have time to Unsubscribe properly). We
should also add an integration test like the one you have here but for
close(0).
What about:
1. unblock this PR, only for the interrupt (description includes the low
timeouts too), with the approach you had of allowing a close with it's original
timeout.
2. new jira to review and fix close with low timeout that may not send leave
group
3. new jira blocked on #2, to review close behaviour on interrupt, and
attempt to respect the interrupted status better, by not allowing the close to
take up all the time it wants
Thoughts?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]