[ https://issues.apache.org/jira/browse/HBASE-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637224#action_12637224 ]
Jim Kellerman commented on HBASE-851:
-------------------------------------
It appears that the master got the REPORT_OPEN from the server but not
the REPORT_CLOSE.
So when the NSRE happens, the master thinks the region is open, but
has told the region server to close it.
The region server has not yet reported the region as closed. It may
have removed the region from onlineRegions but not yet finished the
close and reported it. Or, because of a thread scheduling problem,
either the heartbeat has not been sent to the master or the master
has not polled the heartbeat message queue. Beyond that, the logs do
not show enough information.
Thread dumps of the master and the region server would be really
useful next time.
If that is really all that is in the logs except for the NSREs,
something is wedged.
If the master never receives MSG_REPORT_CLOSE, it is never going to
reassign the region.
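To make that last point concrete, here is a minimal sketch of the
master's view of a single region. It is a simplified model, not the
actual master code: the class MasterRegionView, its State enum and all
method names are made up for illustration. It assumes the master only
treats a region as reassignable once the close has been reported.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the master's view of regions; not HBase code. */
class MasterRegionView {
    enum State { UNASSIGNED, OPEN, PENDING_CLOSE }

    private final Map<String, State> regions = new HashMap<>();

    /** The region server reported that it opened the region. */
    void onReportOpen(String region) {
        regions.put(region, State.OPEN);
    }

    /** The master decided to move the region and told the server to close it. */
    void requestClose(String region) {
        if (regions.get(region) == State.OPEN) {
            regions.put(region, State.PENDING_CLOSE);
        }
    }

    /** The region server reported the close; only now does the region become unassigned. */
    void onReportClose(String region) {
        if (regions.get(region) == State.PENDING_CLOSE) {
            regions.put(region, State.UNASSIGNED);
        }
    }

    State stateOf(String region) {
        return regions.getOrDefault(region, State.UNASSIGNED);
    }

    /** The region may be handed to another server only once it is known to be closed. */
    boolean canReassign(String region) {
        return stateOf(region) == State.UNASSIGNED;
    }

    public static void main(String[] args) {
        MasterRegionView master = new MasterRegionView();
        String region = "web_pages,example-row,1"; // placeholder region name
        master.onReportOpen(region);
        master.requestClose(region);
        // The close report has not arrived: the region is stuck, clients see NSREs.
        System.out.println(master.stateOf(region) + " reassignable=" + master.canReassign(region));
        master.onReportClose(region);
        System.out.println(master.stateOf(region) + " reassignable=" + master.canReassign(region));
    }
}
```

In this model a region whose close report is lost stays in
PENDING_CLOSE indefinitely, which matches the symptom: the server
answers with NSRE while the master never hands the region to anyone
else.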
> Region is left unassigned after a split/rebalancing, throws NSRE
> ----------------------------------------------------------------
>
> Key: HBASE-851
> URL: https://issues.apache.org/jira/browse/HBASE-851
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.2.0, 0.2.1
> Reporter: Jean-Daniel Cryans
> Fix For: 0.19.0
>
>
> Master log:
> {code}
> 2008-08-28 12:12:27,174 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received MSG_REPORT_PROCESS_OPEN:
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> from 192.168.1.95:60020
> <jdcryans> 2008-08-28 12:12:27,174 INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN:
> web_pages,http://www.salonskincare.co.uk/product_info.php/products_id/168,1219939934794
> from 192.168.1.95:60020
> <jdcryans> 2008-08-28 12:12:27,174 INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN:
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> from 192.168.1.95:60020
> <jdcryans> 2008-08-28 12:12:27,174 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Server 192.168.1.95:60020 is
> overloaded. Server load: 8 avg: 7.0
> <jdcryans> 2008-08-28 12:12:27,174 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 1 regions.
> mostLoadedRegions has 8 regions in it.
> <jdcryans> 2008-08-28 12:12:27,174 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Going to close region
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> <jdcryans> 2008-08-28 12:12:27,174 DEBUG
> org.apache.hadoop.hbase.master.HMaster: Main processing loop:
> PendingOpenOperation from 192.168.1.95:60020
> <jdcryans> 2008-08-28 12:12:27,175 INFO
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
> web_pages,http://www.salonskincare.co.uk/product_info.php/products_id/168,1219939934794
> open on 192.168.1.95:60020
> <jdcryans> 2008-08-28 12:12:27,175 DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperation: numberOfMetaRegions: 1,
> onlineMetaRegions.size(): 1
> <jdcryans> 2008-08-28 12:12:27,175 INFO
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
> web_pages,http://www.salonskincare.co.uk/product_info.php/products_id/168,1219939934794
> in region .META.,,1 with startcode 1219931259154 and server
> 192.168.1.95:60020
> <jdcryans> 2008-08-28 12:12:30,352 INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE:
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> from 192.168.1.95:60020
> <jdcryans> 2008-08-28 12:1
> <jdcryans> 2008-08-28 12:12:32,557 DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Total Load: 103, Num Servers:
> 15, Avg Load: 7.0
> <jdcryans> 2008-08-28 12:12:34,093 DEBUG
> org.apache.hadoop.hbase.master.HMaster: Main processing loop:
> PendingOpenOperation from 192.168.1.95:60020
> <jdcryans> 2008-08-28 12:12:34,093 INFO
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> open on 192.168.1.95:60020
> <jdcryans> 2008-08-28 12:12:34,093 DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperation: numberOfMetaRegions: 1,
> onlineMetaRegions.size(): 1
> <jdcryans> 2008-08-28 12:12:34,093 INFO
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> in region .META.,,1 with startcode 1219931259154 and server
> 192.168.1.95:60020
> {code}
> HRS 192.168.1.95
> {code}
> <jdcryans> 2008-08-28 12:12:24,953 DEBUG
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested
> for region:
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> <jdcryans> 2008-08-28 12:12:27,307 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794:
> [EMAIL PROTECTED]
> <jdcryans> 2008-08-28 12:12:27,307 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794:
> [EMAIL PROTECTED]
> <jdcryans> 2008-08-28 12:12:27,308 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: Compactions and cache flushes
> disabled for region
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> <jdcryans> 2008-08-28 12:12:27,308 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: Scanners disabled for region
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> <jdcryans> 2008-08-28 12:12:27,308 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: No more active scanners for
> region
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> <jdcryans> 2008-08-28 12:12:27,308 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> <jdcryans> 2008-08-28 12:12:27,308 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: No more row locks outstanding
> on region
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> <jdcryans> 2008-08-28 12:12:27,308 DEBUG
> org.apache.hadoop.hbase.regionserver.HStore: closed 1860667227/attribute
> <jdcryans> 2008-08-28 12:12:27,308 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: closed
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> <jdcryans> 2008-08-28 12:12:34,246 INFO org.apache.hadoop.ipc.Server: IPC
> Server handler 1 on 60020, call batchUpdate([EMAIL PROTECTED], row =>
> http://www.simplewebengines.com/, {column => attribute:traveliness, value =>
> '...', column => attribute:processed_at, value => '...', column =>
> attribute:content, value => '...', column => attribute:refs, value => '...',
> column => attribute:crawled_at, value => '...', column =>
> attribute:html, value => '...', column => attribute:crawled, value =>
> '...'}) from 192.168.1.96:50102: error:
> org.apache.hadoop.hbase.NotServingRegionException:
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> <jdcryans> org.apache.hadoop.hbase.NotServingRegionException:
> web_pages,http://www.senior-community.net/michigan/charlevoix.htm,1219939934794
> NSRE for a hundred times
> {code}
> Restarting the cluster cleared the issue, but this is a nasty bug. A proposed
> band-aid: if we still get an NSRE after the retries, ask the master to scan
> the HRSs to see whether the region is located somewhere else. If not, assign
> it somewhere. Finally, update META.
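The band-aid proposed in the description could look roughly like the
sketch below. This is only an illustration under stated assumptions:
NsreRecovery, onlineRegions, meta and recover are hypothetical names,
not HBase APIs. The two maps stand in for scanning each region server
and for the META table, picking the least loaded server is an arbitrary
choice, and the assignment is done synchronously for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed band-aid; none of these names are HBase APIs. */
class NsreRecovery {
    /** Server address -> regions it reports as online (stand-in for scanning each HRS). */
    final Map<String, Set<String>> onlineRegions = new LinkedHashMap<>();
    /** Region name -> server address (stand-in for the META table). */
    final Map<String, String> meta = new LinkedHashMap<>();

    /** Called once a client has exhausted its retries with an NSRE for the region. */
    String recover(String region) {
        // 1. Scan every region server to see whether the region is served somewhere.
        String host = null;
        for (Map.Entry<String, Set<String>> e : onlineRegions.entrySet()) {
            if (e.getValue().contains(region)) {
                host = e.getKey();
                break;
            }
        }
        // 2. If no server has it, assign it somewhere (assumption: the least loaded server).
        if (host == null) {
            for (Map.Entry<String, Set<String>> e : onlineRegions.entrySet()) {
                if (host == null || e.getValue().size() < onlineRegions.get(host).size()) {
                    host = e.getKey();
                }
            }
            if (host == null) {
                throw new IllegalStateException("no region servers available");
            }
            onlineRegions.get(host).add(region);
        }
        // 3. Finally, update META so that clients can locate the region again.
        meta.put(region, host);
        return host;
    }

    public static void main(String[] args) {
        NsreRecovery recovery = new NsreRecovery();
        recovery.onlineRegions.put("server-a:60020", new HashSet<>(Arrays.asList("r1", "r2")));
        recovery.onlineRegions.put("server-b:60020", new HashSet<>(Arrays.asList("r3")));
        recovery.meta.put("lost-region", "server-a:60020"); // stale META entry
        System.out.println("lost-region now on " + recovery.recover("lost-region"));
    }
}
```

A real fix would also have to guard against racing with a close that
is still in progress on the original server, which this sketch ignores.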