[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116232#comment-13116232
 ] 

Ming Ma commented on HBASE-4497:
--------------------------------

ok, Ram.

Adding some more clarification.

1. It looks like ZKAssign.transitionNode already provides atomicity via the 
"expected version" feature in ZK, so we are good here.
2. A global AtomicInteger isn't necessary in this context. We can just use the 
"expected version" from ZK for a given ZNode, since the "expected version" only 
needs to be unique on a given ZNode, not globally.
3. With regard to the HBase .META. update, we can put the "expected version" as 
an ID into the .META. table and enforce that a new update's ID has to be 
greater than the previous one for a given region, via some new HBase API such 
as checkGreaterAndPut. This ID value is local to the region node, which should 
be ok: for a given region node, this value only ever increments. Currently this 
"expected version" is passed via the RPC RegionOpeningState 
openRegion(HRegionInfo region, int versionOfOfflineNode). Will that address the 
issue, Jonathan?
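
To make point 3 concrete, here is a minimal sketch of the check I have in mind. 
It is only an illustration: checkGreaterAndPut does not exist in HBase today, 
and the class name MetaVersionSketch, the in-memory map standing in for .META., 
and the region/server names are all made up for this example. It also assumes 
that the later assignment (RS2) was handed a higher ZNode version than the 
earlier one (RS1).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MetaVersionSketch {

    /** One row of a toy, in-memory stand-in for .META.: region -> (server, id). */
    static final class Location {
        final String server;
        final int id; // the ZK "expected version" carried along as an ID

        Location(String server, int id) {
            this.server = server;
            this.id = id;
        }
    }

    private final ConcurrentMap<String, Location> meta = new ConcurrentHashMap<>();

    /**
     * Hypothetical checkGreaterAndPut: store the new location only if its ID is
     * strictly greater than the ID already recorded for this region.
     * Returns true if the update was applied.
     */
    boolean checkGreaterAndPut(String region, String server, int id) {
        Location proposed = new Location(server, id);
        // compute() is atomic per key, so the compare and the put cannot interleave.
        Location stored = meta.compute(region,
            (key, current) -> (current == null || id > current.id) ? proposed : current);
        return stored == proposed;
    }

    /** Returns the server currently recorded for the region, or null if none. */
    String serverOf(String region) {
        Location loc = meta.get(region);
        return loc == null ? null : loc.server;
    }

    public static void main(String[] args) {
        MetaVersionSketch sketch = new MetaVersionSketch();
        // RS1 was handed version 1 but is slow; after the timeout RS2 gets version 2.
        boolean rs2Applied = sketch.checkGreaterAndPut("region-a", "RS2", 2);
        // RS1 wakes up late and tries to write with its stale version.
        boolean rs1Applied = sketch.checkGreaterAndPut("region-a", "RS1", 1);
        System.out.println(rs2Applied + " " + rs1Applied + " " + sketch.serverOf("region-a"));
    }
}
```

With this check, the late update from RS1 is rejected and the entry keeps 
pointing at RS2, which is the behavior we want for the scenario in this JIRA.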



Jonathan, Dhruba's suggestion is interesting. Could scale become an issue when 
HBase grows to the next level in terms of number of machines, number of 
regions, and number of region movements? The .META. table will be distributed 
across different RSs; putting it on the Master instead could be a bottleneck. 
However, we might first run into other, more important issues at such a large 
scale.
                
> If region opening fails after updating META HBCK reports it as inconsistent 
> and scanning the region throws NSRE
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4497
>                 URL: https://issues.apache.org/jira/browse/HBASE-4497
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Critical
>
> As per the discussion in the mail chain "HBCK reporting of possible mismatch 
> in RS assignment" this JIRA is created.
> Consider two RSs, RS1 and RS2.
> A region tries to open on RS1, but it takes a while. RS1 has still not 
> updated META and transitioned the node from OPENING to OPENED,
> so the timeout assigns the region to RS2. RS2 successfully updates META and 
> opens the region.
> Now RS1 tries to act on the region by first updating META and then 
> transitioning the node from OPENING to OPENED.
> RS1's transition of the node from OPENING to OPENED will fail, but the META 
> entry will have RS1 as the latest.
> Now HBCK reports this as an inconsistency, and if we try to scan the region we 
> get NotServingRegionException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
