[ 
https://issues.apache.org/jira/browse/HBASE-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891716#action_12891716
 ] 

Karthik Ranganathan commented on HBASE-2866:
--------------------------------------------

One thing we can try is to change the region's state to "CLOSED" in its 
UNASSIGNED znode in ZooKeeper...

Alternatively, is it possible to edit META somehow to mark the region as 
unassigned? 

> Region permanently offlined 
> ----------------------------
>
>                 Key: HBASE-2866
>                 URL: https://issues.apache.org/jira/browse/HBASE-2866
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Karthik Ranganathan
>            Priority: Blocker
>         Attachments: master.log
>
>
> After a split, the master attempts to reassign a region to a region server. 
> Occasionally, such a region can get permanently offlined.
> Master:
> ---------
> {code}
> 2010-07-22 01:26:00,914 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Processing MSG_REPORT_SPLIT_INCLUDES_DAUGHTERS: 
> test1,6512200000,1279784117114.6466481aa931f8c1fa87622735487a72.: Daughters; 
> test1,6512200000,1279787158624.6ead25ae677116cc88fc5420bb39d52e., 
> test1,6531790000,1279787158624.8d5490bfc166c687657cb09203bd7d44. from 
> test024.test.xyz.com,60020,1279780567744; 1 of 1
> 2010-07-22 01:26:00,935 DEBUG 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED 
> region 8d5490bfc166c687657cb09203bd7d44 in state = M2ZK_REGION_OFFLINE
> 2010-07-22 01:26:00,935 DEBUG 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED 
> region 8d5490bfc166c687657cb09203bd7d44 in state = M2ZK_REGION_OFFLINE
> 2010-07-22 01:26:00,945 INFO org.apache.hadoop.hbase.master.RegionManager: 
> Assigning region 
> test1,6531790000,1279787158624.8d5490bfc166c687657cb09203bd7d44. to 
> test024.test.xyz.com,60020,1279780567744
> 2010-07-22 01:26:00,949 DEBUG 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: While updating UNASSIGNED 
> region 8d5490bfc166c687657cb09203bd7d44 exists, state = M2ZK_REGION_OFFLINE
> 2010-07-22 01:26:00,954 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> Created UNASSIGNED zNode 
> test1,6531790000,1279787158624.8d5490bfc166c687657cb09203bd7d44. in state 
> M2ZK_REGION_OFFLINE
> {code}
> Region Server:
> --------------
> {code}
> 2010-07-22 01:26:00,947 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: 
> test1,6531790000,1279787158624.8d5490bfc166c687657cb09203bd7d44.
> 2010-07-22 01:26:00,947 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: 
> test1,6512200000,1279787158624.6ead25ae677116cc88fc5420bb39d52e.
> 2010-07-22 01:26:00,947 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: 
> test1,6531790000,1279787158624.8d5490bfc166c687657cb09203bd7d44.
> 2010-07-22 01:26:00,948 DEBUG 
> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode 
> /hbase/UNASSIGNED/8d5490bfc166c687657cb09203bd7d44 with 
> [RS2ZK_REGION_OPENING] expected version = 0
> 2010-07-22 01:26:00,952 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, 
> state: SyncConnected, type: NodeDataChanged, path: 
> /hbase/UNASSIGNED/8d5490bfc166c687657cb09203bd7d44
> 2010-07-22 01:26:00,974 WARN 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: 
> <msgstorectrl001.test.xyz.com,msgstorectrl021.test.xyz.com,msgstorectrl041.test.xyz.com,msgstorectrl061.test.xyz.com,msgstorectrl081.ash2.facebook.com:/hbase,test024.test.xyz.com,60020,1279780567744>Failed to write data to 
> ZooKeeper
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
> BadVersion for /hbase/UNASSIGNED/8d5490bfc166c687657cb09203bd7d44
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>         at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>         at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
>         at 
> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
>         at 
> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1428)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1337)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-07-22 01:26:00,975 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening 
> test1,6531790000,1279787158624.8d5490bfc166c687657cb09203bd7d44.
> java.io.IOException: 
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
> BadVersion for /hbase/UNASSIGNED/8d5490bfc166c687657cb09203bd7d44
>         at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1072)
> {code}
> Meta:
> -----
> Relevant section of META.
> Note that these are the only two entries for the problem region. The first 
> one is the parent region (the problem region is its splitB). For the second 
> one, note that there are no "info:server" or "info:serverstartcode" columns.
> {code}
>  test1,6512200000,1279784117114.6466481aa931f8c1fa87622735487a72.
>    column=info:splitB, timestamp=1279787160693,
>    value=\x00\x0A6551820000\x00\x00\x00\x01)\xf9...@test1,6531790000,1279787158624.8d5490bfc166c687657cb09203bd7d44.\x00\x0A6531790000\x00\x00\x00\x05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x08\x07actions\x00\x00\x00\x08\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x04NONE\x00\x00\x00\x11REPLICATION_SCOPE\x00\x00\x00\x010\x00\x00\x00\x0BCOMPRESSION\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true\xFE\xA0\xFD\xC5
>  ..
>  test1,6531790000,1279787158624.8d5490bfc166c687657cb09203bd7d44.
>    column=info:regioninfo, timestamp=1279787160782,
>    value=REGION => {NAME => 
>    'test1,6531790000,1279787158624.8d5490bfc166c687657cb09203bd7d44.', 
>    STARTKEY => '6531790000', ENDKEY => '6551820000', 
>    ENCODED => 8d5490bfc166c687657cb09203bd7d44, TABLE => {{NAME => 'test1', 
>    FAMILIES => [{NAME => 'actions', BLOOMFILTER => 'NONE', 
>    REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', 
>    TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', 
>    BLOCKCACHE => 'true'}]}}
> {code}
> I think Karthik has a handle on the first part (i.e. why the RS ran into the 
> version mismatch, and aborted opening the region). He'll add details to the 
> JIRA. But what we aren't clear about at this stage is why the base scanner 
> didn't kick in and try to reassign the region.
> BTW, HBase "hbck" reported this as well (which was good!):
> {code}
> Number of Tables: 5
> Number of live region servers:92
> Number of dead region servers:0
> .........
> ERROR: Region 
> test1,6512200000,1279784117114.6466481aa931f8c1fa87622735487a72. is not 
> served by any region server  but is listed in META to be on server null
> ERROR: Region 
> test1,6531790000,1279787158624.8d5490bfc166c687657cb09203bd7d44. is not 
> served by any region server  but is listed in META to be on server null
> {code}
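
For readers following the region server log quoted above: the BadVersion 
failure is what a version-checked write produces when another writer updates 
the same znode first. The issue text does not confirm the root cause, so the 
sketch below is only one hypothetical interleaving that fits the quoted 
timestamps (master creates the node, region server picks version 0, master 
writes again, region server's write is rejected). VersionedNode, 
BadVersionError and replay are made-up names for a toy in-memory model; this 
is not HBase or ZooKeeper code.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version does not match the node's."""


class VersionedNode:
    """Toy stand-in for a znode: data plus a version bumped on every write."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version):
        # Compare-and-set: reject the write if another writer got in first.
        if expected_version != self.version:
            raise BadVersionError(
                f"expected version {expected_version}, node is at {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version


def replay():
    """Replay one assumed ordering of master and region server writes."""
    node = VersionedNode("M2ZK_REGION_OFFLINE")  # master creates the node
    rs_expected = node.version                   # region server plans on version 0
    # Assumption: the master updates the existing node before the RS write lands.
    node.set_data("M2ZK_REGION_OFFLINE", node.version)
    try:
        node.set_data("RS2ZK_REGION_OPENING", rs_expected)
        outcome = "opening"
    except BadVersionError:
        outcome = "bad version"
    return outcome, node.data, node.version
```

In this model the node stays in M2ZK_REGION_OFFLINE after the rejected write, 
which matches the symptom in the description: nothing moves the region forward 
unless something retries the open or reassigns it.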
