[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117007#comment-13117007
 ] 

stack commented on HBASE-4497:
------------------------------

Good stuff Ming.

Looking at your pathological case, I think it is possible.  I could add a check 
to the checkAndPut that takes a version so that we never write back the same 
version: if the write would go in with a timestamp exactly equal to the version 
we are checking, add a millisecond (especially if the value we write back is 
the same again).
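
A minimal sketch of that guard, assuming a hypothetical in-memory VersionedCell 
rather than the real HBase checkAndPut (the class, its fields, and the method 
signature are all made up for illustration).  It also bumps when the proposed 
timestamp is older than the checked version, which goes slightly beyond the 
"exactly equal" case above:

```java
/** Hypothetical stand-in for one versioned cell; this is not the HBase API. */
final class VersionedCell {
    private String value;
    private long version; // timestamp (ms) carried by the last successful write

    VersionedCell(String value, long version) {
        this.value = value;
        this.version = version;
    }

    /**
     * Writes newValue only if the cell is still at expectedVersion.
     * proposedTs is the timestamp the write would normally carry.
     */
    synchronized boolean checkAndPut(long expectedVersion, String newValue, long proposedTs) {
        if (version != expectedVersion) {
            return false; // somebody else wrote in between; reject the stale writer
        }
        // Never write back the version we just checked: add a millisecond.
        // Assumption: "<=" also covers a proposed timestamp that lags the checked one.
        long ts = (proposedTs <= expectedVersion) ? expectedVersion + 1 : proposedTs;
        value = newValue;
        version = ts;
        return true;
    }

    synchronized long version() { return version; }

    synchronized String value() { return value; }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell("RS1", 1000L);
        // Same value, same millisecond: the write still moves the version forward.
        System.out.println(cell.checkAndPut(1000L, "RS1", 1000L) + " " + cell.version());
        // A second writer that also checked version 1000 is now rejected.
        System.out.println(cell.checkAndPut(1000L, "RS2", 1000L) + " " + cell.value());
    }
}
```

Without the bump, the second writer in main would pass its check even though 
the cell had been rewritten in between.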

I think we should do this, though the probability of the scenario you postulate 
is extremely low.

Why would RSs need access to a global counter?  The master assigns.  It'd need 
to keep its running counter in zk in case it crashed, but I'd think only the 
assigner would need to use it.  (Here are some notes on counters in zk from the 
zk mailing list: 
http://www.mail-archive.com/zookeeper-user@hadoop.apache.org/msg01968.html)

Would this counter be something other than ephemeral data?  The design dictum 
up to now has been that zk is for ephemeral data only.  Would keeping a counter 
change that?

Does the 'region assignment id' need to monotonically increase?  Can it just be 
unique (uuid?)?
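
To make the question concrete, here is a rough sketch of how a merely unique 
id could fence off a stale region server.  AssignmentTracker and its methods 
are hypothetical names, not existing HBase code, and the sketch assumes the 
only check needed is "is this the current assignment?", which requires 
equality but no ordering:

```java
/** Hypothetical master-side bookkeeping for one region; not existing HBase code. */
final class AssignmentTracker {
    private String currentId; // id handed out with the most recent assignment

    /** Called by the assigner each time the region is (re)assigned. */
    synchronized String assign() {
        // Assumption: uniqueness is enough, so a random UUID replaces a counter.
        currentId = java.util.UUID.randomUUID().toString();
        return currentId;
    }

    /** A META update is accepted only if it carries the current assignment id. */
    synchronized boolean acceptMetaUpdate(String assignmentId) {
        return assignmentId != null && assignmentId.equals(currentId);
    }

    public static void main(String[] args) {
        AssignmentTracker region = new AssignmentTracker();
        String rs1Id = region.assign(); // region first assigned to RS1
        String rs2Id = region.assign(); // timeout: region reassigned to RS2
        System.out.println(region.acceptMetaUpdate(rs2Id)); // current assignment
        System.out.println(region.acceptMetaUpdate(rs1Id)); // stale assignment
    }
}
```

A monotonically increasing counter would additionally let a reader order two 
assignments, which this sketch does not need.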

Good stuff Ming.




                
> If region opening fails after updating META HBCK reports it as inconsistent 
> and scanning the region throws NSRE
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4497
>                 URL: https://issues.apache.org/jira/browse/HBASE-4497
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Critical
>
> As per the discussion in the mail chain "HBCK reporting of possible mismatch 
> in RS assignment" this JIRA is created.
> Consider two RSs, RS1 and RS2.
> A region tries to open on RS1, but it takes a while.  RS1 has still not 
> updated META and transitioned the node from OPENING to OPENED,
> so the timeout assigns the region to RS2.  RS2 successfully updates META and 
> opens the region.
> Now RS1 tries to act on the region by first updating META and then 
> transitioning the node from OPENING to OPENED.
> RS1's transition of the node from OPENING to OPENED will fail, but the META 
> entry will have RS1 as the latest.
> Now HBCK reports this as an inconsistency, and if we try to scan the region we 
> get NotServingRegionException.


        
