vamossagar12 commented on PR #12485: URL: https://github.com/apache/kafka/pull/12485#issuecomment-1209328983
> This should definitely come with a test :) > > I'm also not sure this is the best approach, since the [ExecutorService::shutdownNow Javadocs](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/ExecutorService.html#shutdownNow()) don't give us any guarantees about threads being interrupted: > > > There are no guarantees beyond best-effort attempts to stop processing actively executing tasks. For example, typical implementations will cancel via [Thread.interrupt()](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/Thread.html#interrupt()), so any task that fails to respond to interrupts may never terminate. > > And on top of that, by the time the thread is interrupted--if it's interrupted at all--we've already exhausted our graceful shutdown timeout. > > Can we leverage the existing shutdown logic in `KafkaBasedLog::stop` somehow, possibly by accounting for a `WakeupException` being thrown while reading to the end of the log during `KafkaBasedLog::start`? I'm not certain that it's safe to stop a log while another thread is in the middle of starting the log; we may have to tweak some of the logic there. We may also have to wake up/interrupt/shut down the admin client (if we're using one to read offsets) since that too could potentially block (but perhaps not indefinitely). Thanks Chris. I agree do need a test. Would figure out how to add one.. I think you brought up a great point. Here's what I understood as per the issue. If a DistributedHerder gets a signal to be shutdown in `stop` method, it invokes `shutdown` and waits for `graceful shutdown timeout`. `awaitTermination` can throw an `InterruptedException` and once that's thrown, the KafkaBasedLog remains in an infinite loop since I believe the initial readToEnd is off the same thread(switched to a `WorkerThread` later on. That's the interrupted exception that I was looking to handle in this PR. Do you think that makes sense? IIUC, the scenario you have described is if even after the graceful shutdown, if the Log isn't stopped and the herder never got to get interrupted, the potential infinite loop issue still remains. Is that correct? I think using `KafkaBasedLog::stop` might be a good idea in this case. BTW, I see there's something called `stopServices` which is effectively stopping these the backing consumers. That inherently calls the `KafkaBasedLog::stop` method and it's all wired using `halt` method which checks upon a `stopping` flag which is set in `stop` method (BTW, I know you know all this, just writing down for my own confirmation :D ) . Do you think the situation you described would still arise or the only unhandled case was the interruption of the herder executor. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org