ableegoldman commented on a change in pull request #9984:
URL: https://github.com/apache/kafka/pull/9984#discussion_r565611900
##########
File path: streams/src/main/java/org/apache/kafka/streams/KafkaStreams.java
##########
@@ -91,6 +93,7 @@
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
Review comment:
There's actually a kafka-specific version of `TimeoutException` that you
should use to keep in line with other kafka APIs. It's
`org.apache.kafka.common.errors.TimeoutException`
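i.e. just swap the import, the rest of the code can stay the same since both classes are named `TimeoutException`:
```java
// kafka's own TimeoutException, consistent with the other public client APIs
import org.apache.kafka.common.errors.TimeoutException;
```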
##########
File path: streams/src/main/java/org/apache/kafka/streams/KafkaStreams.java
##########
@@ -1005,11 +1008,60 @@ private StreamThread createAndAddStreamThread(final long cacheSizePerThread, fin
                             || threads.size() == 1)) {
                         streamThread.shutdown();
                         if (!streamThread.getName().equals(Thread.currentThread().getName())) {
-                            streamThread.waitOnThreadState(StreamThread.State.DEAD);
+                            streamThread.waitOnThreadState(StreamThread.State.DEAD, -1);
                         }
                         threads.remove(streamThread);
                         final long cacheSizePerThread = getCacheSizePerThread(threads.size());
                         resizeThreadCache(cacheSizePerThread);
+                        final Collection<MemberToRemove> membersToRemove = Collections.singletonList(new MemberToRemove(streamThread.getGroupInstanceID()));
+                        adminClient.removeMembersFromConsumerGroup(config.getString(StreamsConfig.APPLICATION_ID_CONFIG), new RemoveMembersFromConsumerGroupOptions(membersToRemove));
+                        return Optional.of(streamThread.getName());
+                    }
+                }
+            }
            log.warn("There are no threads eligible for removal");
+        } else {
+            log.warn("Cannot remove a stream thread when Kafka Streams client is in state " + state());
+        }
+        return Optional.empty();
+    }
+
+    /**
+     * Removes one stream thread out of the running stream threads from this Kafka Streams client.
+     * <p>
+     * The removed stream thread is gracefully shut down. This method does not specify which stream
+     * thread is shut down.
+     * <p>
+     * Since the number of stream threads decreases, the sizes of the caches in the remaining stream
+     * threads are adapted so that the sum of the cache sizes over all stream threads equals the total
+     * cache size specified in configuration {@link StreamsConfig#CACHE_MAX_BYTES_BUFFERING_CONFIG}.
+     *
+     * @param timeout The length of time to wait for the thread to shutdown
+     * @throws TimeoutException if the thread does not stop in time
+     * @return name of the removed stream thread or empty if a stream thread could not be removed because
+     *         no stream threads are alive
+     */
+    public Optional<String> removeStreamThread(final Duration timeout) throws TimeoutException {
+        final String msgPrefix = prepareMillisCheckFailMsgPrefix(timeout, "timeout");
+        final long timeoutMs = validateMillisecondDuration(timeout, msgPrefix);
+        if (isRunningOrRebalancing()) {
+            synchronized (changeThreadCount) {
+                // make a copy of threads to avoid holding lock
+                for (final StreamThread streamThread : new ArrayList<>(threads)) {
+                    if (streamThread.isAlive() && (!streamThread.getName().equals(Thread.currentThread().getName())
+                            || threads.size() == 1)) {
+                        streamThread.shutdown();
+                        if (!streamThread.getName().equals(Thread.currentThread().getName())) {
+                            if (!streamThread.waitOnThreadState(StreamThread.State.DEAD, timeoutMs)) {
+                                log.warn("Thread " + streamThread.getName() + " did not stop in the allotted time");
+                                throw new TimeoutException("Thread " + streamThread.getName() + " did not stop in the allotted time");
+                            }
+                        }
+                        threads.remove(streamThread);
+                        final long cacheSizePerThread = getCacheSizePerThread(threads.size());
+                        resizeThreadCache(cacheSizePerThread);
+                        Collection<MemberToRemove> membersToRemove = Collections.singletonList(new MemberToRemove(streamThread.getGroupInstanceID()));
Review comment:
I'm not sure how `removeMembersFromConsumerGroup` would behave if you passed in `""` as the `group.instance.id`, do you know? If not, then let's just be safe: check what `streamThread.getGroupInstanceID()` returns, and skip this call entirely if there is no group.instance.id (i.e. if the thread is not using static membership).
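Roughly something like this (just a sketch, and I'm guessing at what `getGroupInstanceID()` returns, so adjust if it's a plain `String` rather than an `Optional<String>`):
```java
// skip the admin call entirely unless static membership is actually configured,
// i.e. the thread has a real group.instance.id
final Optional<String> groupInstanceID = streamThread.getGroupInstanceID();
if (groupInstanceID.isPresent()) {
    final Collection<MemberToRemove> membersToRemove =
        Collections.singletonList(new MemberToRemove(groupInstanceID.get()));
    adminClient.removeMembersFromConsumerGroup(
        config.getString(StreamsConfig.APPLICATION_ID_CONFIG),
        new RemoveMembersFromConsumerGroupOptions(membersToRemove));
}
```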
##########
File path: streams/src/main/java/org/apache/kafka/streams/KafkaStreams.java
##########
@@ -1005,11 +1008,60 @@ private StreamThread createAndAddStreamThread(final long cacheSizePerThread, fin
                             || threads.size() == 1)) {
                         streamThread.shutdown();
                         if (!streamThread.getName().equals(Thread.currentThread().getName())) {
-                            streamThread.waitOnThreadState(StreamThread.State.DEAD);
+                            streamThread.waitOnThreadState(StreamThread.State.DEAD, -1);
                         }
                         threads.remove(streamThread);
                         final long cacheSizePerThread = getCacheSizePerThread(threads.size());
                         resizeThreadCache(cacheSizePerThread);
+                        final Collection<MemberToRemove> membersToRemove = Collections.singletonList(new MemberToRemove(streamThread.getGroupInstanceID()));
+                        adminClient.removeMembersFromConsumerGroup(config.getString(StreamsConfig.APPLICATION_ID_CONFIG), new RemoveMembersFromConsumerGroupOptions(membersToRemove));
+                        return Optional.of(streamThread.getName());
+                    }
+                }
+            }
            log.warn("There are no threads eligible for removal");
+        } else {
+            log.warn("Cannot remove a stream thread when Kafka Streams client is in state " + state());
+        }
+        return Optional.empty();
+    }
+
+    /**
+     * Removes one stream thread out of the running stream threads from this Kafka Streams client.
+     * <p>
+     * The removed stream thread is gracefully shut down. This method does not specify which stream
+     * thread is shut down.
+     * <p>
+     * Since the number of stream threads decreases, the sizes of the caches in the remaining stream
+     * threads are adapted so that the sum of the cache sizes over all stream threads equals the total
+     * cache size specified in configuration {@link StreamsConfig#CACHE_MAX_BYTES_BUFFERING_CONFIG}.
+     *
+     * @param timeout The length of time to wait for the thread to shutdown
+     * @throws TimeoutException if the thread does not stop in time
+     * @return name of the removed stream thread or empty if a stream thread could not be removed because
+     *         no stream threads are alive
+     */
+    public Optional<String> removeStreamThread(final Duration timeout) throws TimeoutException {
+        final String msgPrefix = prepareMillisCheckFailMsgPrefix(timeout, "timeout");
+        final long timeoutMs = validateMillisecondDuration(timeout, msgPrefix);
+        if (isRunningOrRebalancing()) {
+            synchronized (changeThreadCount) {
+                // make a copy of threads to avoid holding lock
+                for (final StreamThread streamThread : new ArrayList<>(threads)) {
+                    if (streamThread.isAlive() && (!streamThread.getName().equals(Thread.currentThread().getName())
+                            || threads.size() == 1)) {
+                        streamThread.shutdown();
+                        if (!streamThread.getName().equals(Thread.currentThread().getName())) {
+                            if (!streamThread.waitOnThreadState(StreamThread.State.DEAD, timeoutMs)) {
+                                log.warn("Thread " + streamThread.getName() + " did not stop in the allotted time");
+                                throw new TimeoutException("Thread " + streamThread.getName() + " did not stop in the allotted time");
+                            }
+                        }
+                        threads.remove(streamThread);
+                        final long cacheSizePerThread = getCacheSizePerThread(threads.size());
+                        resizeThreadCache(cacheSizePerThread);
+                        Collection<MemberToRemove> membersToRemove = Collections.singletonList(new MemberToRemove(streamThread.getGroupInstanceID()));
+                        adminClient.removeMembersFromConsumerGroup(config.getString(StreamsConfig.APPLICATION_ID_CONFIG), new RemoveMembersFromConsumerGroupOptions(membersToRemove));
Review comment:
Ok, this is going to be a little tricky... `removeMembersFromConsumerGroup` is async, so we have two options: (1) just ignore the returned result and hope that it succeeded, or (2) check the returned `KafkaFuture` and make sure that it actually did succeed.
We should probably go with (2) and just apply whatever remains of the timeout. If you haven't worked with the `KafkaFuture` class before, I believe `KafkaFuture#get(long timeout, TimeUnit unit)` is what you'd need here.
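So, roughly (untested sketch; `startMs` and `remainingTimeMs` are names I'm making up, and `startMs` would be captured before the `waitOnThreadState` call so that wait counts against the budget too):
```java
// only spend whatever is left of the caller's timeout on the admin call
final long remainingTimeMs = Math.max(0, timeoutMs - (System.currentTimeMillis() - startMs));
final RemoveMembersFromConsumerGroupResult removeResult =
    adminClient.removeMembersFromConsumerGroup(
        config.getString(StreamsConfig.APPLICATION_ID_CONFIG),
        new RemoveMembersFromConsumerGroupOptions(membersToRemove));
try {
    removeResult.all().get(remainingTimeMs, TimeUnit.MILLISECONDS);
} catch (final java.util.concurrent.TimeoutException e) {
    // rethrow as the kafka TimeoutException to stay consistent with the rest of this method
    throw new TimeoutException("Thread " + streamThread.getName()
        + " was stopped but could not be removed from the consumer group in time", e);
} catch (final InterruptedException | ExecutionException e) {
    // handle these however makes sense; wrapping in a StreamsException is just a suggestion
    throw new StreamsException("Could not remove " + streamThread.getName() + " from the consumer group", e);
}
```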
##########
File path: streams/src/main/java/org/apache/kafka/streams/KafkaStreams.java
##########
@@ -1005,11 +1008,60 @@ private StreamThread createAndAddStreamThread(final long cacheSizePerThread, fin
                             || threads.size() == 1)) {
                         streamThread.shutdown();
                         if (!streamThread.getName().equals(Thread.currentThread().getName())) {
-                            streamThread.waitOnThreadState(StreamThread.State.DEAD);
+                            streamThread.waitOnThreadState(StreamThread.State.DEAD, -1);
Review comment:
To be consistent with the semantics of `KafkaStreams#close`, the overload with no parameter should probably default to fully blocking, i.e. a timeout of `Long.MAX_VALUE`. Also, to avoid duplicated code, I would just have this method call `removeStreamThread(final Duration timeout)` instead of doing everything twice. Again, something like what we do for `#close`.
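i.e. something like this (sketch, just mirroring how the no-arg `close()` delegates to the timed variant):
```java
/**
 * Removes one stream thread out of the running stream threads from this Kafka Streams client.
 * Blocks as long as needed for the thread to shut down, analogous to {@code close()} with no timeout.
 */
public Optional<String> removeStreamThread() {
    // fully blocking by default, same semantics as KafkaStreams#close()
    return removeStreamThread(Duration.ofMillis(Long.MAX_VALUE));
}
```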
##########
File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java
##########
@@ -610,17 +610,32 @@ public void setStreamsUncaughtExceptionHandler(final java.util.function.Consumer
         this.streamsUncaughtExceptionHandler = streamsUncaughtExceptionHandler;
     }
-    public void waitOnThreadState(final StreamThread.State targetState) {
+    public boolean waitOnThreadState(final StreamThread.State targetState, long timeoutMs) {
+        if (timeoutMs < 0) {
Review comment:
I think if you fix the semantics of `removeStreamThread()` to match that of `close()`, then there's no need for a `-1` sentinel. In that case we should just throw an `IllegalArgumentException` here (or, probably better, check and throw it in the actual `removeStreamThread(timeout)` call so we fail fast).
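e.g. at the top of `removeStreamThread(timeout)`, something like this (sketch, same pattern as the check in `close(Duration)`):
```java
final String msgPrefix = prepareMillisCheckFailMsgPrefix(timeout, "timeout");
final long timeoutMs = validateMillisecondDuration(timeout, msgPrefix);
if (timeoutMs < 0) {
    // fail fast on a nonsensical timeout instead of passing a sentinel down to the StreamThread
    throw new IllegalArgumentException("The timeout cannot be negative.");
}
```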
##########
File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java
##########
@@ -610,17 +610,32 @@ public void setStreamsUncaughtExceptionHandler(final java.util.function.Consumer
         this.streamsUncaughtExceptionHandler = streamsUncaughtExceptionHandler;
     }
-    public void waitOnThreadState(final StreamThread.State targetState) {
+    public boolean waitOnThreadState(final StreamThread.State targetState, long timeoutMs) {
+        if (timeoutMs < 0) {
+            timeoutMs = 0;
+        } else if (timeoutMs == 0) {
+            timeoutMs = Long.MAX_VALUE;
Review comment:
We definitely shouldn't modify the passed in timeout like this -- a user
should be able to pass in `0` to mean "don't block at all". Mysteriously
blocking forever when they do so would be pretty weird
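Just to illustrate what I mean, here's a rough sketch of `waitOnThreadState` that honors the caller's timeout as-is (assuming the existing `stateLock`/`state` fields), so that `0` really does mean "don't block":
```java
public boolean waitOnThreadState(final StreamThread.State targetState, final long timeoutMs) {
    final long startMs = System.currentTimeMillis();
    synchronized (stateLock) {
        while (state != targetState) {
            final long remainingMs = timeoutMs - (System.currentTimeMillis() - startMs);
            if (remainingMs <= 0) {
                // ran out of time (or timeoutMs was 0) without reaching the target state
                return false;
            }
            try {
                stateLock.wait(remainingMs);
            } catch (final InterruptedException e) {
                // it is ok: just move on to the next iteration
            }
        }
        return true;
    }
}
```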
##########
File path: streams/src/main/java/org/apache/kafka/streams/KafkaStreams.java
##########
@@ -91,6 +93,7 @@
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
Review comment:
(Tbh that drives me crazy, I once spent like 4 hours debugging something
only to realize that I wasn't using the correct TimeoutException 😠)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]