[ https://issues.apache.org/jira/browse/BOOKKEEPER-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547726#comment-13547726 ]
Sijie Guo commented on BOOKKEEPER-538:
--------------------------------------

Sounds great. I agree that shutting down the scheduler and mainWorkerPool before releaseExternalResources would be helpful. But awaitTermination is somewhat heavy-handed: it might terminate some ongoing operations forcefully. I am also not sure whether some channel could still leak without being closed and then block Netty#releaseExternalResources again.

I think the hedwig client already handles a similar issue well. You could check hedwig-client/src/main/java/org/apache/hedwig/client/netty/CleanupChannelMap.java. It ensures that all outstanding channels are eventually closed during close, so they cannot block Netty#releaseExternalResources.

> Race condition in BookKeeper#close
> ----------------------------------
>
>                 Key: BOOKKEEPER-538
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-538
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0
>
>         Attachments: 0001-BOOKKEEPER-538-Race-condition-in-BookKeeper-close.patch
>
>
> I've seen this with BookieAutoRecoveryTest. Basically, we interrupt and join
> the replicationworker thread, and then close the BookKeeper instance. The
> interrupt can leave a BookKeeper operation unfinished, and the executor then
> runs it after #close has closed the BookieClient. The operation opens a
> connection, and therefore we get a hang in releaseExternalResources().
> The solution is pretty simple: we should shut down all executors before
> closing the bookieClient. I'll attach a patch which does this.
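A minimal sketch of the close ordering discussed above, combining the patch's idea (shut down the executors before closing the client) with the safety net described for CleanupChannelMap (close every outstanding channel, including ones registered late). It does not use Netty or BookKeeper: `FakeChannel`, `ChannelRegistry`, `Client` and `submitOp` are hypothetical stand-ins, and `ChannelRegistry` only mimics the described behavior of CleanupChannelMap, not its actual code. Note that in this sketch awaitTermination itself only waits (with a bound); the forceful step is the shutdownNow() fallback.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OrderedClose {
    // Hypothetical stand-in for a Netty channel; it only tracks open/closed state.
    static final class FakeChannel {
        private volatile boolean open = true;
        void close() { open = false; }
        boolean isOpen() { return open; }
    }

    // Hypothetical registry in the spirit of hedwig's CleanupChannelMap:
    // once closeAll() has run, any channel registered later is closed at once.
    static final class ChannelRegistry {
        private final Set<FakeChannel> channels = new HashSet<>();
        private boolean closed = false;

        synchronized boolean add(FakeChannel ch) {
            if (closed) {
                ch.close(); // registered too late: close it so it cannot leak
                return false;
            }
            channels.add(ch);
            return true;
        }

        synchronized void closeAll() {
            closed = true;
            for (FakeChannel ch : channels) {
                ch.close();
            }
            channels.clear();
        }
    }

    // Hypothetical client holding the two executors named in the comment.
    static final class Client {
        final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        final ExecutorService mainWorkerPool = Executors.newFixedThreadPool(2);
        final ChannelRegistry registry = new ChannelRegistry();
        final List<FakeChannel> opened = Collections.synchronizedList(new ArrayList<>());

        // A queued operation that opens a connection only when it finally runs.
        void submitOp() {
            mainWorkerPool.submit(() -> {
                FakeChannel ch = new FakeChannel();
                opened.add(ch);
                registry.add(ch);
            });
        }

        void close() {
            // 1. Stop accepting new work; already-queued operations still run.
            scheduler.shutdown();
            mainWorkerPool.shutdown();
            try {
                // 2. Wait (bounded) for queued operations so none opens a channel later.
                if (!mainWorkerPool.awaitTermination(5, TimeUnit.SECONDS)) {
                    mainWorkerPool.shutdownNow(); // forceful fallback: interrupts running tasks
                }
                scheduler.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // 3. Close every outstanding channel (stand-in for closing the BookieClient).
            registry.closeAll();
            // 4. Only at this point would releaseExternalResources() be called.
        }
    }

    public static void main(String[] args) {
        Client c = new Client();
        for (int i = 0; i < 10; i++) {
            c.submitOp();
        }
        c.close();
        long stillOpen = c.opened.stream().filter(FakeChannel::isOpen).count();
        System.out.println("opened=" + c.opened.size() + " stillOpen=" + stillOpen);
    }
}
```

The registry covers the case raised in the comment: even if shutdownNow() cuts off a task that opens a channel after closeAll() has run, add() closes that channel immediately instead of letting it block the final resource release.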