[ 
https://issues.apache.org/jira/browse/HBASE-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550900#comment-13550900
 ] 

Anoop Sam John commented on HBASE-7034:
---------------------------------------

Did this code come in by mistake?
{code}
RecoverableZooKeeper#setData(String path, byte[] data, int version){
....
  byte[] revData = zk.getData(path, false, stat);
  int idLength = Bytes.toInt(revData, ID_LENGTH_SIZE);
  int dataLength = revData.length-ID_LENGTH_SIZE-idLength;
  int dataOffset = ID_LENGTH_SIZE+idLength;
  
  if(Bytes.compareTo(revData, ID_LENGTH_SIZE, id.length, 
          revData, dataOffset, dataLength) == 0) {
        // the bad version is caused by previous successful setData
        return stat;
  }
}
{code}
When we write the data to zk, we also write an identifier for the process. To 
check whether the BADVERSION exception from zookeeper is due to a previous 
setData from the same process, we need to compare the id read from zookeeper 
with the id of this process (this.id). Or am I missing something? The offset 
and length math above, and the compare itself, look problematic to me: the 
compare checks two ranges of revData against each other and never looks at 
this.id.
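
To make the intended check concrete, here is a minimal, self-contained sketch 
of comparing the stored id with this process's id. It is not the actual HBase 
code. It assumes a znode layout of [4-byte id length][id bytes][payload], which 
is only inferred from the offset math above, and the names IdCheckSketch, wrap 
and writtenBy are hypothetical. It uses java.nio instead of the HBase Bytes 
utility so that it runs on its own.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch only: the layout [4-byte id length][id bytes][payload] is an assumption.
final class IdCheckSketch {
    static final int ID_LENGTH_SIZE = Integer.BYTES;

    // Builds znode data in the assumed layout (hypothetical helper).
    static byte[] wrap(byte[] id, byte[] payload) {
        return ByteBuffer.allocate(ID_LENGTH_SIZE + id.length + payload.length)
                .putInt(id.length)
                .put(id)
                .put(payload)
                .array();
    }

    // Returns true only if the id stored in revData equals the given process id.
    static boolean writtenBy(byte[] revData, byte[] id) {
        if (revData == null || revData.length < ID_LENGTH_SIZE) {
            return false;
        }
        // The id length is read from offset 0 (big-endian, as written by putInt).
        int idLength = ByteBuffer.wrap(revData).getInt(0);
        if (idLength != id.length || revData.length - ID_LENGTH_SIZE < idLength) {
            return false;
        }
        // Compare the stored id with this process's id, not with the payload.
        byte[] storedId = Arrays.copyOfRange(
                revData, ID_LENGTH_SIZE, ID_LENGTH_SIZE + idLength);
        return Arrays.equals(storedId, id);
    }
}
```

With a check of this shape, a BADVERSION on retry would be treated as success 
only when writtenBy(revData, this.id) is true, i.e. when the data currently in 
the znode was written by this same process.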

If so, I guess this is the cause of this bug.

From the log it is clear that there was no problem with the node and its 
version at first. [As part of the state transition from OPENING to OPENED, the 
present data is read first, and the check that follows shows that the data and 
its version are fine.] Immediately afterwards a connection loss happened, which 
triggers a retry of the setData. Maybe the previous operation had already 
changed the data in zookeeper, and the master got the data changed event. (?)

I think correcting the above code may solve the problems.
                
> Bad version, failed OPENING to OPENED but master thinks it is open anyways
> --------------------------------------------------------------------------
>
>                 Key: HBASE-7034
>                 URL: https://issues.apache.org/jira/browse/HBASE-7034
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 0.94.2
>            Reporter: stack
>
> I have this in RS log:
> {code}
> 2012-10-22 02:21:50,698 ERROR 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
> transitioning node 
> b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f.
>  from OPENING to OPENED -- closing region
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
> BadVersion for /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f
> {code}
> Master says this (it is bulk assigning):
> {code}
> ....
> 2012-10-22 02:21:40,673 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
> master:10302-0xb3a862e57a503ba Set watcher on existing znode 
> /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f
> ...
> then this
> ....
> 2012-10-22 02:23:47,089 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
> master:10302-0xb3a862e57a503ba Set watcher on existing znode 
> /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f
> ....
> 2012-10-22 02:24:34,176 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
> master:10302-0xb3a862e57a503ba Retrieved 112 byte(s) of data from znode 
> /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f and set watcher; 
> region=b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f.,
>  origin=sv4r17s44,10304,1350872216778, state=RS_ZK_REGION_OPENED
> etc.
> {code}
> Disagreement as to what is going on here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
