[ https://issues.apache.org/jira/browse/THRIFT-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869573#comment-17869573 ]
Xinyu Tan edited comment on THRIFT-5230 at 7/30/24 8:50 AM:
------------------------------------------------------------

Hi,

We, the Apache IoTDB team, have encountered the same problem. It leaves the node unable to accept any further connections, which has a serious impact.

!screenshot-2.png!

I tried making some changes; they are similar to the scheme in https://github.com/apache/thrift/pull/2171/files

!screenshot-3.png!

In fact, from my point of view, for TThreadedSelectorServer, even letting the selector thread spin idle and consume CPU is better than the current unstable fix, which can cause the selector thread to die entirely. We need to discuss how to improve this JVM bug fix, or maybe it is better not to do the rebuildSelector fix at all.

For the above reasons, we will temporarily replace all servers in IoTDB with THsHaServer or TThreadPoolServer, which lack the rebuildSelector function of TThreadedSelectorServer.


> Fix connection leak and CancelledKeyException when handling Epoll bug
> ---------------------------------------------------------------------
>
>                 Key: THRIFT-5230
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5230
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.13.0
>         Environment: java version "1.8.0_161"
>            Reporter: zengji
>            Priority: Major
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> 1. When the Epoll bug occurs, TThreadedSelectorServer.rebuildSelector
> re-registers only the channels that have events; idle connections are
> ignored, which causes a connection leak.
>
> {code:java}
> for (SelectionKey key : oldSelector.selectedKeys()) {
>   if (!key.isValid() && key.readyOps() == 0)
>     continue;
>   SelectableChannel channel = key.channel();
>   Object attachment = key.attachment();
>   try {
>     if (attachment == null) {
>       channel.register(newSelector, key.readyOps());
>     } else {
>       channel.register(newSelector, key.readyOps(), attachment);
>     }
>   } catch (ClosedChannelException e) {
>     LOGGER.error("Register new selector key error.", e);
>   }
> }
> selector = newSelector;
> try {
>   oldSelector.close();
> } catch (IOException e) {
>   LOGGER.error("Close old selector error.", e);
> }
> {code}
> 2. When re-registering a channel with the new selector, the interest ops
> should be the same as before, not only the readyOps.
>
> 3. In the same code block, the channel is registered with a new selector
> and the previous selector is closed, but the FrameBuffer is still
> holding the previous selector, leaving the FrameBuffer in a wrong state.
> When the FrameBuffer then tries to process the channel, a
> CancelledKeyException may occur. This issue (CancelledKeyException) has
> been reported before: https://issues.apache.org/jira/browse/THRIFT-4847



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
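
For reference, the three points in the issue description could be addressed roughly as in the sketch below. This is not the actual Thrift patch, only an illustration using plain java.nio: `SelectorRebuildSketch`, `KeyAware`, and `updateSelectionKey` are hypothetical names, with `KeyAware` standing in for an attachment (such as FrameBuffer) that caches its SelectionKey.

```java
import java.io.IOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public final class SelectorRebuildSketch {

    /** Hypothetical hook: an attachment that caches its key and must be told about the new one. */
    public interface KeyAware {
        void updateSelectionKey(SelectionKey newKey);
    }

    private SelectorRebuildSketch() {}

    /** Moves every valid registration to a fresh selector, then closes the old one. */
    public static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();

        // Point 1: iterate keys(), not selectedKeys(), so idle connections are carried over too.
        for (SelectionKey oldKey : oldSelector.keys()) {
            if (!oldKey.isValid()) {
                continue;
            }
            SelectableChannel channel = oldKey.channel();
            Object attachment = oldKey.attachment();
            try {
                // Point 2: preserve the interest set, not the ready set.
                int interestOps = oldKey.interestOps();
                SelectionKey newKey = channel.register(newSelector, interestOps, attachment);

                // Point 3: let the attachment swap its cached key before the old one is invalidated.
                if (attachment instanceof KeyAware) {
                    ((KeyAware) attachment).updateSelectionKey(newKey);
                }
            } catch (ClosedChannelException | CancelledKeyException e) {
                // The channel was closed or the key cancelled in the meantime; nothing to migrate.
            }
        }

        // Closing the old selector invalidates all of its keys.
        oldSelector.close();
        return newSelector;
    }
}
```

In the 0.13.0 code quoted above, the equivalent loop lives in the selector thread, so a real change would assign the returned selector to the thread's `selector` field and would need FrameBuffer to accept a replacement key.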