[ https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196397#comment-14196397 ]
stack commented on HBASE-12319:
-------------------------------

Reverted from 0.98 and from branch-1. Now I am going to see if branch-1 goes stable again. Will report back.

> Inconsistencies during region recovery due to close/open of a region during recovery
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-12319
>                 URL: https://issues.apache.org/jira/browse/HBASE-12319
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.7, 0.99.1
>            Reporter: Devaraj Das
>            Assignee: Jeffrey Zhong
>             Fix For: 0.98.8, 0.99.2
>
>         Attachments: HBASE-12319.patch
>
>
> In one of my test runs, I saw the following:
> {noformat}
> 2014-10-14 13:45:30,782 DEBUG [StoreOpener-51af4bd23dc32a940ad2dd5435f00e1d-1] regionserver.HStore: loaded hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/test_cf/d6df5cfe15ca41d68c619489fbde4d04, isReference=false, isBulkLoadResult=false, seqid=141197, majorCompaction=true
> 2014-10-14 13:45:30,788 DEBUG [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Found 3 recovered edits file(s) under hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d
> .............
> .............
> 2014-10-14 13:45:31,916 WARN [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Null or non-existent edits file: hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/recovered.edits/0000000000000198080
> {noformat}
> The above logs are from a regionserver, say RS2. From the initial analysis it
> seemed like the master asked a certain regionserver (let's say RS1) to open
> the region and for some reason asked it to close soon after. The open was
> still proceeding on RS1 when the master reassigned the region to RS2. RS2 also
> started the recovery, but it ended up seeing an inconsistent view of the
> recovered-edits files (it reports missing files as per the logs above) since
> the first regionserver (RS1) deleted some files after it completed the
> recovery. When RS2 really opens the region, it might not see the recent data
> that was written by flushes on hor9n10 during the recovery process. Reads of
> that data would have inconsistencies.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)