vamossagar12 commented on PR #12485:
URL: https://github.com/apache/kafka/pull/12485#issuecomment-1209328983

   > This should definitely come with a test :)
   > 
   > I'm also not sure this is the best approach, since the 
[ExecutorService::shutdownNow 
Javadocs](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/ExecutorService.html#shutdownNow())
 don't give us any guarantees about threads being interrupted:
   > 
   > > There are no guarantees beyond best-effort attempts to stop processing 
actively executing tasks. For example, typical implementations will cancel via 
[Thread.interrupt()](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/Thread.html#interrupt()),
 so any task that fails to respond to interrupts may never terminate.
   > 
   > And on top of that, by the time the thread is interrupted--if it's 
interrupted at all--we've already exhausted our graceful shutdown timeout.
   > 
   > Can we leverage the existing shutdown logic in `KafkaBasedLog::stop` 
somehow, possibly by accounting for a `WakeupException` being thrown while 
reading to the end of the log during `KafkaBasedLog::start`? I'm not certain 
that it's safe to stop a log while another thread is in the middle of starting 
the log; we may have to tweak some of the logic there. We may also have to wake 
up/interrupt/shut down the admin client (if we're using one to read offsets) 
since that too could potentially block (but perhaps not indefinitely).
   
   Thanks Chris. I agree do need a test. Would figure out how to add one..
   
   I think you brought up a great point. Here's what I understood as per the 
issue. If a DistributedHerder gets a signal to be shutdown in `stop` method, it 
invokes `shutdown` and waits for `graceful shutdown timeout`. 
`awaitTermination` can throw an `InterruptedException` and once that's thrown, 
the KafkaBasedLog remains in an infinite loop since I believe the initial 
readToEnd is off the same thread(switched to a `WorkerThread` later on. That's 
the interrupted exception that I was looking to handle in this PR. Do you think 
that makes sense?
   
   IIUC, the scenario you have described is if even after the graceful 
shutdown, if the Log isn't stopped and the herder never got to get interrupted, 
the potential infinite loop issue still remains. Is that correct?
   
   I think using `KafkaBasedLog::stop` might be a good idea in this case. BTW, 
I see there's something called `stopServices` which is effectively stopping 
these the backing consumers. That inherently calls the  `KafkaBasedLog::stop` 
method and it's all wired using `halt` method which checks upon a `stopping` 
flag which is set in `stop` method (BTW, I know you know all this, just writing 
down for my own confirmation :D ) . Do you think the situation you described 
would still arise or the only unhandled case was the interruption of the herder 
executor. WDYT?
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to