[ https://issues.apache.org/jira/browse/HBASE-21885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766735#comment-16766735 ]

Duo Zhang commented on HBASE-21885:
-----------------------------------

[~sershe] [~stack] FYI.

> Cancel remote procedure call if the remote procedure is succeeded
> -----------------------------------------------------------------
>
>                 Key: HBASE-21885
>                 URL: https://issues.apache.org/jira/browse/HBASE-21885
>             Project: HBase
>          Issue Type: Improvement
>          Components: proc-v2
>            Reporter: Duo Zhang
>            Priority: Major
>
> I used to think it would only very rarely happen that a region server can
> report back to the master while the master cannot get the response from the
> region server, and only when there are strange network errors. But when
> implementing HBASE-21875, I found a way to reproduce the problem without any
> strange network issues.
> The first time, we send the request to the region server and it accepts the
> request, but before it returns, a network error breaks the connection, so the
> master tries to send the request to the region server again. But by then the
> region server has become too busy and always returns
> CallQueueTooBigException, so the master retries forever, even though the
> region has already been opened on the region server.
> And this does not only waste resources: later we may close the region on the
> region server, and if the region server has recovered by then, it will
> receive an open region request and a close region request at the same time.
> I am not sure whether this will cause any problems, but at the very least we
> have not considered this condition yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
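
To make the scenario in the description concrete, here is a minimal sketch of a retry loop that stops once the remote procedure is already known to have succeeded. It is only an illustration under stated assumptions: none of the names below (RemoteCallRetrySketch, ServerBusyException, RemoteCall, CompletionCheck, dispatch) are HBase classes or methods, ServerBusyException merely stands in for CallQueueTooBigException, and the "already succeeded" check stands in for whatever state the master keeps once the region server has reported back.

```java
// Sketch only: every name here is hypothetical and is NOT an HBase API.
class RemoteCallRetrySketch {

  /** Stand-in for the "server too busy" rejection (CallQueueTooBigException in the issue). */
  static class ServerBusyException extends Exception {}

  /** Stand-in for the RPC that asks the region server to open a region. */
  interface RemoteCall {
    void send() throws ServerBusyException;
  }

  /** Stand-in for master-side state: has the region server already reported success? */
  interface CompletionCheck {
    boolean alreadySucceeded();
  }

  enum Outcome { ACKNOWLEDGED, CANCELLED_AFTER_REPORT, GAVE_UP }

  /**
   * Retries the call, but checks before every attempt whether the remote
   * procedure has already been reported as done; if so, it stops retrying.
   */
  static Outcome dispatch(RemoteCall call, CompletionCheck check, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (check.alreadySucceeded()) {
        return Outcome.CANCELLED_AFTER_REPORT;
      }
      try {
        call.send();
        return Outcome.ACKNOWLEDGED;
      } catch (ServerBusyException e) {
        // Server is busy: fall through and retry (a real dispatcher would back off here).
      }
    }
    return Outcome.GAVE_UP;
  }

  /** Scenario from the issue: every retry is rejected, and the report arrives after three rejections. */
  static Outcome demo(int[] attemptCounter) {
    RemoteCall alwaysBusy = () -> {
      attemptCounter[0]++;
      throw new ServerBusyException();
    };
    CompletionCheck reported = () -> attemptCounter[0] >= 3;
    return dispatch(alwaysBusy, reported, 1000);
  }

  public static void main(String[] args) {
    int[] attempts = {0};
    Outcome outcome = demo(attempts);
    System.out.println(outcome + " after " + attempts[0] + " rejected retries");
  }
}
```

Without the check at the top of the loop, the demo would keep retrying until maxAttempts is exhausted, which corresponds to the "retry forever" behaviour described above. Where such a check would actually live in proc-v2 is left open here.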