[ 
https://issues.apache.org/jira/browse/HBASE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418777#comment-13418777
 ] 

Jonathan Hsieh commented on HBASE-6310:
---------------------------------------

hbck writes directly to .META., but I don't think it ever writes to -ROOT- 
unless you turn on the -metaonly flag.

It is possible that, if there were two .META. region dirs, hbck tried to 
pull in the old .META. dir.  That would probably write something goofy to 
.META., though.  If you only used the -repair option, it would have first 
tried to merge regions before modifying .META. (and would likely not have 
modified -ROOT- either).
                
> -ROOT- corruption when .META. is using the old encoding scheme
> --------------------------------------------------------------
>
>                 Key: HBASE-6310
>                 URL: https://issues.apache.org/jira/browse/HBASE-6310
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.94.0
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.96.0, 0.94.2
>
>
> We're still working on the root cause here, but after the leap second 
> armageddon we had a hard time getting our 0.94 cluster back up. This is what 
> we saw in the logs until the master died by itself:
> {noformat}
> 2012-07-01 23:01:52,149 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> locateRegionInMeta parentTable=-ROOT-,
> metaLocation={region=-ROOT-,,0.70236052, hostname=sfor3s28,
> port=10304}, attempt=16 of 100 failed; retrying after sleep of 32000
> because: HRegionInfo was null or empty in -ROOT-,
> row=keyvalues={.META.,,1259448304806/info:server/1341124914705/Put/vlen=14/ts=0,
> .META.,,1259448304806/info:serverstartcode/1341124914705/Put/vlen=8/ts=0}
> {noformat}
> (it's strange that we retry this)
> This was really misleading because I could see the regioninfo in a scan:
> {noformat}
> hbase(main):002:0> scan '-ROOT-'
> ROW                                           COLUMN+CELL
>  .META.,,1                                    column=info:regioninfo,
> timestamp=1331755381142, value={NAME => '.META.,,1', STARTKEY => '',
> ENDKEY => '', ENCODED => 1028785192,}
>  .META.,,1                                    column=info:server,
> timestamp=1341183448693, value=sfor3s40:10304
>  .META.,,1
> column=info:serverstartcode, timestamp=1341183448693,
> value=1341183444689
>  .META.,,1                                    column=info:v,
> timestamp=1331755419291, value=\x00\x00
>  .META.,,1259448304806                        column=info:server,
> timestamp=1341124914705, value=sfor3s24:10304
>  .META.,,1259448304806
> column=info:serverstartcode, timestamp=1341124914705,
> value=1341124455863
> {noformat}
> But the devil is in the details: ".META.,,1" is not 
> ".META.,,1259448304806". Basically something writes to .META. by directly 
> creating the row key without caring if the row is in the old format. I did a 
> deleteall in the shell and it fixed the issue... until some time later it was 
> stuck again because the edits reappeared (still not sure why). This time the 
> PostOpenDeployTasksThread threads were stuck in the RS trying to update 
> .META., but there was no logging (I saw it with a jstack). I deleted the row 
> again to make it work.
> I'm marking this as a blocker against 0.94.2 since we're trying to get 0.94.1 
> out, but I wouldn't recommend upgrading to 0.94 if your cluster was created 
> before 0.89.
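
To make the row-key mismatch described above concrete, here is a minimal sketch in Python. It is not HBase code: the helper names (parse_region_name, rows_missing_regioninfo) are hypothetical, and the cell list is copied by hand from the scan '-ROOT-' output quoted above. It assumes region names of the form table,startkey,regionid and simply flags rows that have server columns but no info:regioninfo.

```python
# Cells copied by hand from the `scan '-ROOT-'` output quoted above,
# reduced to (row, column) pairs. Values are not needed for this check.
CELLS = [
    (".META.,,1", "info:regioninfo"),
    (".META.,,1", "info:server"),
    (".META.,,1", "info:serverstartcode"),
    (".META.,,1", "info:v"),
    (".META.,,1259448304806", "info:server"),
    (".META.,,1259448304806", "info:serverstartcode"),
]


def parse_region_name(name):
    """Split 'table,startkey,regionid' into its three parts.

    Hypothetical helper. The table name ends at the first comma and the
    region id starts after the last one, so a start key that itself
    contains commas stays intact.
    """
    table, rest = name.split(",", 1)
    start_key, region_id = rest.rsplit(",", 1)
    return table, start_key, region_id


def rows_missing_regioninfo(cells):
    """Return the sorted rows that have no info:regioninfo column."""
    columns_by_row = {}
    for row, column in cells:
        columns_by_row.setdefault(row, set()).add(column)
    return sorted(
        row
        for row, columns in columns_by_row.items()
        if "info:regioninfo" not in columns
    )


if __name__ == "__main__":
    for row in rows_missing_regioninfo(CELLS):
        # Show the parsed parts so the differing region id is visible.
        print(row, parse_region_name(row))
```

Both rows parse to the same table and an empty start key and differ only in the region id, which is why they look alike in a scan while only one of them carries the regioninfo.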


        
