Region is left unassigned after a split/rebalancing, throws NSRE
----------------------------------------------------------------
Key: HBASE-851
URL: https://issues.apache.org/jira/browse/HBASE-851
Project: Hadoop HBase
Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Jean-Daniel Cryans
Fix For: 0.19.0
Master log:
{code}
2008-08-28 12:12:27,174 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
from 192.168.1.95:60020
<jdcryans> 2008-08-28 12:12:27,174 INFO
org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN:
web_pages,http://www.salonskincare.co.uk/product_info.php/products_id/168,1219939934794
from 192.168.1.95:60020
<jdcryans> 2008-08-28 12:12:27,174 INFO
org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN:
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
from 192.168.1.95:60020
<jdcryans> 2008-08-28 12:12:27,174 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Server 192.168.1.95:60020 is
overloaded. Server load: 8 avg: 7.0
<jdcryans> 2008-08-28 12:12:27,174 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 1 regions.
mostLoadedRegions has 8 regions in it.
<jdcryans> 2008-08-28 12:12:27,174 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Going to close region
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
<jdcryans> 2008-08-28 12:12:27,174 DEBUG
org.apache.hadoop.hbase.master.HMaster: Main processing loop:
PendingOpenOperation from 192.168.1.95:60020
<jdcryans> 2008-08-28 12:12:27,175 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
web_pages,http://www.salonskincare.co.uk/product_info.php/products_id/168,1219939934794
open on 192.168.1.95:60020
<jdcryans> 2008-08-28 12:12:27,175 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperation: numberOfMetaRegions: 1,
onlineMetaRegions.size(): 1
<jdcryans> 2008-08-28 12:12:27,175 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
web_pages,http://www.salonskincare.co.uk/product_info.php/products_id/168,1219939934794
in region .META.,,1 with startcode 1219931259154 and server 192.168.1.95:60020
<jdcryans> 2008-08-28 12:12:30,352 INFO
org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE:
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
from 192.168.1.95:60020
<jdcryans> 2008-08-28 12:1
<jdcryans> 2008-08-28 12:12:32,557 DEBUG
org.apache.hadoop.hbase.master.ServerManager: Total Load: 103, Num Servers: 15,
Avg Load: 7.0
<jdcryans> 2008-08-28 12:12:34,093 DEBUG
org.apache.hadoop.hbase.master.HMaster: Main processing loop:
PendingOpenOperation from 192.168.1.95:60020
<jdcryans> 2008-08-28 12:12:34,093 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
open on 192.168.1.95:60020
<jdcryans> 2008-08-28 12:12:34,093 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperation: numberOfMetaRegions: 1,
onlineMetaRegions.size(): 1
<jdcryans> 2008-08-28 12:12:34,093 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
in region .META.,,1 with startcode 1219931259154 and server 192.168.1.95:60020
{code}
HRS 192.168.1.95 log:
{code}
<jdcryans> 2008-08-28 12:12:24,953 DEBUG
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested
for region:
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
<jdcryans> 2008-08-28 12:12:27,307 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794:
[EMAIL PROTECTED]
<jdcryans> 2008-08-28 12:12:27,307 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794:
[EMAIL PROTECTED]
<jdcryans> 2008-08-28 12:12:27,308 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Compactions and cache flushes
disabled for region
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
<jdcryans> 2008-08-28 12:12:27,308 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Scanners disabled for region
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
<jdcryans> 2008-08-28 12:12:27,308 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: No more active scanners for
region
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
<jdcryans> 2008-08-28 12:12:27,308 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
<jdcryans> 2008-08-28 12:12:27,308 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: No more row locks outstanding on
region
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
<jdcryans> 2008-08-28 12:12:27,308 DEBUG
org.apache.hadoop.hbase.regionserver.HStore: closed 1860667227/attribute
<jdcryans> 2008-08-28 12:12:27,308 INFO
org.apache.hadoop.hbase.regionserver.HRegion: closed
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
<jdcryans> 2008-08-28 12:12:34,246 INFO org.apache.hadoop.ipc.Server: IPC
Server handler 1 on 60020, call batchUpdate([EMAIL PROTECTED], row =>
http://www.simplewebengines.com/, {column => attribute:traveliness, value =>
'...', column => attribute:processed_at, value => '...', column =>
attribute:content, value => '...', column => attribute:refs, value => '...',
column => attribute:crawled_at, value => '...', column =>
attribute:html, value => '...', column => attribute:crawled, value =>
'...'}) from 192.168.1.96:50102: error:
org.apache.hadoop.hbase.NotServingRegionException:
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
<jdcryans> org.apache.hadoop.hbase.NotServingRegionException:
web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
[the same NSRE repeats a hundred times]
{code}
Restarting the cluster cleared the issue, but this is a nasty bug. A proposed
band-aid: if we still get an NSRE after the retries, ask the master to scan the
region servers (HRS) to see whether the region is located somewhere else. If it
is not, assign it somewhere. Finally, update META.
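
As a rough illustration of the workaround proposed above, the sketch below models its three steps in plain Java. It is not HBase code: NsreRecoverySketch, Master, RegionServer, findServing, assignToLeastLoaded and recover are hypothetical names, META is reduced to an in-memory map, and picking the least-loaded server is an assumption, since the proposal only says to assign the region somewhere.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the proposed recovery path; none of these types are HBase classes. */
class NsreRecoverySketch {

    /** Stand-in for a region server: an address plus the regions it really serves. */
    static final class RegionServer {
        final String address;
        final Set<String> onlineRegions = new HashSet<>();

        RegionServer(String address) {
            this.address = address;
        }
    }

    /** Stand-in for the master, with META modelled as region name -> server address. */
    static final class Master {
        final Map<String, RegionServer> servers = new LinkedHashMap<>();
        final Map<String, String> meta = new HashMap<>();

        void addServer(RegionServer rs) {
            servers.put(rs.address, rs);
        }

        /** Step 1: scan every region server to see whether one already serves the region. */
        Optional<RegionServer> findServing(String region) {
            return servers.values().stream()
                    .filter(rs -> rs.onlineRegions.contains(region))
                    .findFirst();
        }

        /** Step 2: nobody serves it, so assign it (assumption: to the least-loaded server). */
        RegionServer assignToLeastLoaded(String region) {
            RegionServer target = servers.values().stream()
                    .min(Comparator.comparingInt((RegionServer rs) -> rs.onlineRegions.size()))
                    .orElseThrow(() -> new IllegalStateException("no region servers available"));
            target.onlineRegions.add(region);
            return target;
        }

        /** Entry point, meant to be called once the retries have all ended in an NSRE. */
        String recover(String region) {
            RegionServer holder = findServing(region)
                    .orElseGet(() -> assignToLeastLoaded(region));
            // Step 3: update META so it matches where the region really is.
            meta.put(region, holder.address);
            return holder.address;
        }
    }
}
```

In the situation logged above, META still names 192.168.1.95 for a region that server has already closed. Calling recover(...) in this model would find no server holding the region, assign it to one, and overwrite the stale META entry.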