[jira] Updated: (HADOOP-2343) [hbase] Stuck regionserver?

2008-01-15 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2343:
--

Priority: Trivial  (was: Minor)

 [hbase] Stuck regionserver?
 ---

 Key: HADOOP-2343
 URL: https://issues.apache.org/jira/browse/HADOOP-2343
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Assignee: stack
Priority: Trivial

 According to the logs, a regionserver went down because it could not contact 
 the master for more than 60 seconds.  In the log, the HRS (HRegionServer) 
 repeatedly checks all 150 loaded regions, with a pause of about 5 seconds 
 between runs.  Then there is a suspicious 60+ second gap with no logging, as 
 though the regionserver had hung on something:
 {code}
 2007-12-03 13:14:54,178 DEBUG hbase.HRegionServer - flushing region 
 postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635
 2007-12-03 13:14:54,178 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635: snapshotMemcaches() 
 determined that there was nothing to do
 2007-12-03 13:14:54,205 DEBUG hbase.HRegionServer - flushing region 
 postlog,img247/230/seanpaul4li.jpg,1196615889965
 2007-12-03 13:14:54,205 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img247/230/seanpaul4li.jpg,1196615889965: snapshotMemcaches() 
 determined that there was nothing to do
 2007-12-03 13:16:04,305 FATAL hbase.HRegionServer - unable to report to 
 master for 67467 milliseconds - aborting server
 2007-12-03 13:16:04,455 INFO  hbase.Leases - 
 regionserver/0:0:0:0:0:0:0:0:60020 closing leases
 2007-12-03 13:16:04,455 INFO  hbase.Leases$LeaseMonitor - 
 regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker exiting
 {code}
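
The silence in the excerpt above can be found mechanically rather than by eye. The following is a minimal sketch, not part of HBase: it assumes each log entry starts with a timestamp in the format shown above, treats any line without one as the continuation of a wrapped entry, and uses a hypothetical helper name (`find_gaps`) and an arbitrary 30-second threshold.

```python
import re
from datetime import datetime

# Matches the leading timestamp of an entry, e.g. "2007-12-03 13:14:54,178".
TIMESTAMP = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def find_gaps(lines, threshold_seconds=30.0):
    """Return (previous, current, gap_seconds) for each silence above the threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # continuation of a wrapped log entry, no timestamp of its own
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None:
            gap = (current - previous).total_seconds()
            if gap > threshold_seconds:
                gaps.append((previous, current, gap))
        previous = current
    return gaps


# Shortened copy of the regionserver excerpt, including one wrapped entry.
SAMPLE = """\
2007-12-03 13:14:54,178 DEBUG hbase.HRegionServer - flushing region
2007-12-03 13:14:54,205 DEBUG hbase.HRegion - Not flushing cache for region
2007-12-03 13:16:04,305 FATAL hbase.HRegionServer - unable to report to
master for 67467 milliseconds - aborting server
""".splitlines()

for start, end, seconds in find_gaps(SAMPLE):
    print(f"{seconds:.1f}s of silence between {start} and {end}")
```

On the excerpt this reports a single gap of about 70 seconds, between the last DEBUG entry and the FATAL entry.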
 The master seems to be running fine, scanning its ~700 regions.  Then this 
 appears in its log, before the HRS shuts itself down:
 {code}
 2007-12-03 13:14:31,416 INFO  hbase.Leases - HMaster.leaseChecker lease 
 expired 153260899/153260899
 2007-12-03 13:14:31,417 INFO  hbase.HMaster - 
 XX.XX.XX.102:60020 lease expired
 {code}
 ... and the master goes on to process the shutdown.
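
The timestamps in the two excerpts can be lined up with a little arithmetic. This is only a sketch under two assumptions that the report does not confirm: that the 67467 ms figure in the FATAL message counts from the regionserver's last successful report, and that the clocks on the two hosts agree closely enough to be compared directly.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S,%f"

abort_logged = datetime.strptime("2007-12-03 13:16:04,305", FMT)   # regionserver FATAL entry
lease_expired = datetime.strptime("2007-12-03 13:14:31,416", FMT)  # master lease-expired entry
unreported = timedelta(milliseconds=67467)                         # figure from the FATAL message

# Assumption: the 67467 ms counter started at the last successful report.
last_report = abort_logged - unreported

# Assumption: both hosts' clocks are comparable without correction.
lead = (last_report - lease_expired).total_seconds()

print("estimated last successful report:", last_report.time())
print(f"master logged the lease expiry {lead:.3f}s before that")
```

Under those assumptions the master logged the lease expiry roughly 25 seconds before the regionserver's estimated last successful report. That ordering would be worth checking against clock skew before reading anything into it.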

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2343) [hbase] Stuck regionserver?

2007-12-07 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2343:
--

Priority: Minor  (was: Major)

I think this was probably caused by thread starvation in the master or by not 
having enough server threads allocated in the master.

The latter is a configuration parameter. If the former, it was probably caused 
by the bug in HADOOP-2338.

Changing priority to minor.
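
To illustrate why too few server threads could delay a regionserver's report, here is a toy queueing model. It is a sketch only and does not reflect the actual HBase RPC server: the function name, the request durations, and the handler counts are all hypothetical, and requests are assumed to be served first come, first served by a fixed pool.

```python
import heapq


def completion_times(requests, handler_count):
    """Simulate a FIFO queue served by a fixed pool of handler threads.

    requests: list of (arrival_seconds, service_seconds), sorted by arrival.
    Returns the time at which each request finishes, in the same order.
    """
    free_at = [0.0] * handler_count  # min-heap of times at which each handler is idle
    heapq.heapify(free_at)
    finished = []
    for arrival, service in requests:
        # The request starts once it has arrived and the earliest handler is idle.
        start = max(arrival, heapq.heappop(free_at))
        end = start + service
        heapq.heappush(free_at, end)
        finished.append(end)
    return finished


# Hypothetical load: four slow 40-second calls arrive first, then a report at t=1s.
slow_calls = [(0.0, 40.0)] * 4
heartbeat = (1.0, 0.01)

for handlers in (2, 10):
    done = completion_times(slow_calls + [heartbeat], handlers)[-1]
    print(f"{handlers} handlers: report answered {done - heartbeat[0]:.2f}s after it arrived")
```

In this model the report waits behind the slow calls for more than a minute when only two handlers exist, and is answered almost immediately when ten are available.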

 [hbase] Stuck regionserver?
 ---

 Key: HADOOP-2343
 URL: https://issues.apache.org/jira/browse/HADOOP-2343
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Assignee: stack
Priority: Minor




[jira] Updated: (HADOOP-2343) [hbase] Stuck regionserver?

2007-12-05 Thread Bryan Duxbury (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Duxbury updated HADOOP-2343:
--

Priority: Major  (was: Minor)

Affects cluster stability, but cluster recovers on restart, so changing to 
major.

 [hbase] Stuck regionserver?
 ---

 Key: HADOOP-2343
 URL: https://issues.apache.org/jira/browse/HADOOP-2343
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Assignee: stack
