[ 
https://issues.apache.org/jira/browse/THRIFT-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869573#comment-17869573
 ] 

Xinyu Tan edited comment on THRIFT-5230 at 7/30/24 8:50 AM:
------------------------------------------------------------

Hi, we on the Apache IoTDB team have encountered the same problem. It leaves the 
node unable to accept any further communication, which has a serious impact.
 !screenshot-2.png! 

I tried making some changes, similar to the approach in 
https://github.com/apache/thrift/pull/2171/files
 !screenshot-3.png! 

From my point of view, for TThreadedSelectorServer, even letting the selector 
thread spin idly and consume CPU is better than the current unstable fix, which 
can cause the selector thread to die entirely. We need to discuss how to 
improve this JVM bug workaround, or perhaps it is better not to apply the 
rebuildSelector fix at all.

For these reasons, we will temporarily replace all IoTDB servers with 
THsHaServer or TThreadPoolServer, which do not have the rebuildSelector 
function found in TThreadedSelectorServer.




> Fix connection leak and CancelledKeyException when handling Epoll bug
> ---------------------------------------------------------------------
>
>                 Key: THRIFT-5230
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5230
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.13.0
>         Environment: java version "1.8.0_161"
>            Reporter: zengji
>            Priority: Major
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> 1. When the Epoll bug occurs, TThreadedSelectorServer.rebuildSelector 
> re-registers only the channels that have pending events; idle connections 
> are ignored, which causes a connection leak.
>  
> {code:java}
> for (SelectionKey key : oldSelector.selectedKeys()) {
>   if (!key.isValid() && key.readyOps() == 0)
>     continue;
>   SelectableChannel channel = key.channel();
>   Object attachment = key.attachment();
>   try {
>     if (attachment == null) {
>       channel.register(newSelector, key.readyOps());
>     } else {
>       channel.register(newSelector, key.readyOps(), attachment);
>     }
>   } catch (ClosedChannelException e) {
>     LOGGER.error("Register new selector key error.", e);
>   }
> }
> selector = newSelector;
> try {
>   oldSelector.close();
> } catch (IOException e) {
>   LOGGER.error("Close old selector error.", e);
> }
> {code}
> 2. When re-registering a channel with the new selector, the interest ops 
> should be the same as before, not just the readyOps.
>  
> 3. In the same code block, the channel is registered with a new selector 
> and the previous selector is closed, but the FrameBuffer still holds the 
> previous selector, which leaves the FrameBuffer in a wrong state. When the 
> FrameBuffer tries to process the channel, a CancelledKeyException may 
> occur. This issue (CancelledKeyException) has been reported before: 
> https://issues.apache.org/jira/browse/THRIFT-4847
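
For illustration only, here is a minimal sketch of how points 1 and 2 above might be addressed with plain java.nio. It is not the Thrift implementation and not the change from any pull request. The class name SelectorRebuildSketch and the method rebuild are hypothetical. The sketch walks keys() instead of selectedKeys() so that idle channels are migrated too, and it re-registers each channel with interestOps() instead of readyOps().

```java
import java.io.IOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical helper for illustration; not part of Thrift.
final class SelectorRebuildSketch {
  private SelectorRebuildSketch() {}

  /** Moves every valid registration from oldSelector to a fresh selector. */
  static Selector rebuild(Selector oldSelector) throws IOException {
    Selector newSelector = Selector.open();
    // keys() contains every registered channel, including idle ones;
    // selectedKeys() contains only channels with pending events.
    for (SelectionKey key : oldSelector.keys()) {
      if (!key.isValid()) {
        continue;
      }
      SelectableChannel channel = key.channel();
      try {
        // Preserve the full interest set, not just the ops that are ready now.
        int interestOps = key.interestOps();
        Object attachment = key.attachment();
        SelectionKey newKey = channel.register(newSelector, interestOps, attachment);
        // Assumption: an attachment that caches its old key or selector
        // (point 3 above) would have to be handed newKey at this point.
      } catch (CancelledKeyException | ClosedChannelException e) {
        // The registration disappeared during migration; skip this channel.
      }
    }
    // Closing the old selector invalidates all of its remaining keys.
    oldSelector.close();
    return newSelector;
  }
}
```

The sketch assumes it is called from the thread that owns the selector, with no concurrent select in progress. Point 3 is only marked with a comment, because how the attachment learns about its new key depends on the server's own classes.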



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
