[ 
https://issues.apache.org/jira/browse/HBASE-21885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766735#comment-16766735
 ] 

Duo Zhang commented on HBASE-21885:
-----------------------------------

[~sershe] [~stack] FYI.

> Cancel remote procedure call if the remote procedure is succeeded
> -----------------------------------------------------------------
>
>                 Key: HBASE-21885
>                 URL: https://issues.apache.org/jira/browse/HBASE-21885
>             Project: HBase
>          Issue Type: Improvement
>          Components: proc-v2
>            Reporter: Duo Zhang
>            Priority: Major
>
> I used to think it would be very rare for a region server to report back 
> to the master while the master fails to get the response from the region 
> server, something that could happen only with strange network errors. But 
> while implementing HBASE-21875, I found a way to reproduce the problem 
> without any strange network issues.
> First, we send the request to the region server, and it accepts the 
> request, but before it returns, a network error breaks the connection, so 
> the master tries to send the request to the region server again. 
> But then the region server gets too busy and always returns 
> CallQueueTooBigException, so the master retries forever, even though the 
> region has already been opened on the region server.
> And this does not only waste resources: later we may close the region 
> on the region server, and if the region server recovers, it will receive 
> an open region request and a close region request at the same time. I am 
> not sure whether this causes any problems, but at the least we have not 
> thought about this case yet.
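
The retry behaviour described above, and the cancellation the issue title asks for, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HBase implementation: CancellableRemoteCall, Attempt, markRemoteSucceeded and the maxAttempts bound are all hypothetical names, and the CallQueueTooBigException here is a local stand-in rather than the real HBase class. The idea is that the master checks a "remote procedure already succeeded" flag before every resend, so a report from the region server stops the retry loop.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; none of these names come from HBase. */
final class CancellableRemoteCall {

    /** Local stand-in for the rejection a busy region server returns. */
    static final class CallQueueTooBigException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    /** One attempt to deliver the request to the region server. */
    interface Attempt {
        void send() throws Exception;
    }

    // Set when the region server reports back that the procedure finished.
    private final AtomicBoolean remoteSucceeded = new AtomicBoolean(false);

    /** Called from the (assumed) report-back path on the master. */
    void markRemoteSucceeded() {
        remoteSucceeded.set(true);
    }

    /**
     * Sends until an attempt is acknowledged, the remote side reports
     * success, or maxAttempts is reached. Returns the number of attempts made.
     */
    int run(Attempt attempt, int maxAttempts) {
        int attempts = 0;
        // Re-check the flag before every send so a late report cancels the retries.
        while (attempts < maxAttempts && !remoteSucceeded.get()) {
            attempts++;
            try {
                attempt.send();
                return attempts; // acknowledged, nothing left to retry
            } catch (Exception e) {
                // Connection broken or server too busy: loop and re-check the flag.
            }
        }
        return attempts;
    }
}
```

In this sketch, if the region server keeps rejecting the call but the report of the opened region arrives in the meantime, run returns on the next loop iteration instead of retrying forever. The attempt bound exists only to keep the example finite; the issue itself describes an unbounded retry.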



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
