-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of
Jean-
Daniel Cryans
Sent: Thursday, February 26, 2009 8:01 AM
To: [email protected]
Subject: Re: Hbase stops working
Andrew,
At 512MB I'm sure it swapped, tho it would be hard to tell now
unless you
installed some monitoring software. Next time, you should at least
monitor
your job using top to make sure it's the root cause of the problem.
Something that may help you is in
http://wiki.apache.org/hadoop/Hbase/Troubleshooting#7 mainly the
third
bullet point in Resolution (adding -XX:+UseConcMarkSweepGC in hbase-
env.sh
in the HBASE_OPTS and uncommenting it).
J-D
On Thu, Feb 26, 2009 at 10:14 AM, Andrew McCall
<[email protected]>wrote:
Hi J-D,
I'm running the HBase 0.19 release downloaded to the website on a
pretty
lacklustre linux machine; 1.6Ghz single core processor, 512MB ram
with
1.5GB
swap space enabled and 80GB hard disk. It's a testing machine,
I've been
using for development. It's running Hadoop core 0.19 in pseudo
distributed
mode and running a map/reduce reduce crawl using nutch and HBase.
I've
been
planning to move to an EC3 cluster in the near future but I've been
putting
it off since everything seemed to come up and work on the testing
box.
It's entirely possible that it is a result of a swap, how do I tell?
I've
looked in all the logs for hadoop, hbase and my job and can't see
that
anything stopped logging around then, the logs all have entries for
9:34.
I've also increased my lease timeout from the default to 600,000
ms in
my
hbase-site, which is below as a result of another issue. I did
search of
the
mailing list for the other issue and answers suggested that the
lease
timeout should be the same as my map/reduce job timeout.
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
<description>The directory shared by region servers.
</description>
</property>
<property>
<name>hbase.regionserver.lease.period</name>
<value>600000</value>
</property>
</configuration>
Thanks for the help,
Andrew
On 26 Feb 2009, at 13:41, Jean-Daniel Cryans wrote:
Andrew,
Oh yes sorry, your setup is in local mode so no region server in a
different
thread. My bad!
That lease timeout probably means that you were swapping and a
full jvm
garbage collecting occurred which blocked the main thread. Can
you make
sure
that it did swap at that moment?
It would also be helpful to get the usual info like hardware, OS,
hbase
version, etc.
Thx!
J-D
On Thu, Feb 26, 2009 at 8:35 AM, Andrew McCall
<[email protected]
wrote:
Hi J-D,
I did check the region server logs and they didn't seem to be very
useful
at all:
Mon Feb 23 22:11:57 GMT 2009 Starting regionserver on
betty.goroam.net
ulimit -n 1024
2009-02-23 22:11:58,817 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: Not starting a
distinct
region server because hbase.master is set to 'local' mode
Is all that is in them, once for each time I started HBase.
I saw the shutdown message, which was preceded by this:
2009-02-24 09:34:41,579 INFO
org.apache.hadoop.hbase.master.ServerManager:
127.0.0.1:40076 lease expired
I can't find any evidence it's related to my map/reduce task,
it's the
only
one in the log files and it's suspiciously just before the process
shutdown
message - but, I'm new to this and honestly don't know what to
make of
it.
Andrew
On 26 Feb 2009, at 12:57, Jean-Daniel Cryans wrote:
Andrew,
For some reason, your region server shut down around 9:36:
2009-02-24 09:36:05,884 INFO org.apache.hadoop.hbase.
master.RegionServerOperation: process shutdown of server
127.0.0.1:40076
:
logSplit: false, rootRescanned: false, numberOfMetaRegions: 1,
But just before that the master received a startup message from
it.
Did
you
take a look at your region server log? What does it say around
that
time?
Thx
J-D
On Thu, Feb 26, 2009 at 5:34 AM, Andrew McCall <
[email protected]
wrote:
I've been testing a nuch on HBase and I'm running into an issue
with
it
when it's been running for a while.
I'm running hbase 0.19 on a single local host pseudo distributed
hadoop
0.19 cluster.
After some time I get the following in my logs:
2009-02-24 09:31:18,664 INFO
org.apache.hadoop.hbase.master.BaseScanner:
All 1 .META. region(s) scanned
2009-02-24 09:31:18,664 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname:
.META.,,1,
startKey: <>, server: 127.0.0.1:40076}
2009-02-24 09:31:24,293 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region
{regionname:
-ROOT-,,0, startKey: <>, server: 127.0.0.1:40
076} complete
2009-02-24 09:31:24,293 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {regionname: -
ROOT-
,,0,
startKey: <>, server: 127.0.0.1:40076}
2009-02-24 09:31:27,217 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 2 row(s) of meta region
{regionname:
.META.,,1, startKey: <>, server: 127.0.0.1:40
076} complete
2009-02-24 09:31:27,217 INFO
org.apache.hadoop.hbase.master.BaseScanner:
All 1 .META. region(s) scanned
2009-02-24 09:31:55,533 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region
{regionname:
-ROOT-,,0, startKey: <>, server: 127.0.0.1:40
076} complete
2009-02-24 09:32:18,665 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname:
.META.,,1,
startKey: <>, server: 127.0.0.1:40076}
2009-02-24 09:32:35,749 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {regionname: -
ROOT-
,,0,
startKey: <>, server: 127.0.0.1:40076}
2009-02-24 09:32:38,957 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 2 row(s) of meta region
{regionname:
.META.,,1, startKey: <>, server: 127.0.0.1:40
076} complete
2009-02-24 09:32:38,957 INFO
org.apache.hadoop.hbase.master.BaseScanner:
All 1 .META. region(s) scanned
2009-02-24 09:33:18,771 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname:
.META.,,1,
startKey: <>, server: 127.0.0.1:40076}
2009-02-24 09:33:54,889 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region
{regionname:
-ROOT-,,0, startKey: <>, server: 127.0.0.1:40
076} complete
2009-02-24 09:33:57,387 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {regionname: -
ROOT-
,,0,
startKey: <>, server: 127.0.0.1:40076}
2009-02-24 09:34:41,579 INFO
org.apache.hadoop.hbase.master.ServerManager:
127.0.0.1:40076 lease expired
2009-02-24 09:34:53,440 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 2 row(s) of meta region
{regionname:
.META.,,1, startKey: <>, server: 127.0.0.1:40
076} complete
2009-02-24 09:34:53,441 INFO
org.apache.hadoop.hbase.master.BaseScanner:
All 1 .META. region(s) scanned
2009-02-24 09:34:53,441 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname:
.META.,,1,
startKey: <>, server: 127.0.0.1:40076}
2009-02-24 09:35:38,720 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 2 row(s) of meta region
{regionname:
.META.,,1, startKey: <>, server: 127.0.0.1:40
076} complete
2009-02-24 09:35:38,720 INFO
org.apache.hadoop.hbase.master.BaseScanner:
All 1 .META. region(s) scanned
2009-02-24 09:35:56,951 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname:
.META.,,1,
startKey: <>, server: 127.0.0.1:40076}
2009-02-24 09:36:05,667 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer:
MSG_CALL_SERVER_STARTUP
2009-02-24 09:36:05,668 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region
{regionname:
-ROOT-,,0, startKey: <>, server: 127.0.0.1:40
076} complete
2009-02-24 09:36:05,884 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: process
shutdown
of
server 127.0.0.1:40076: logSplit: false, rootRescanned: false,
numberOfMetaRegions: 1,
onlineMetaRegions.size(): 1
2009-02-24 09:36:17,689 INFO
org.apache.hadoop.hbase.regionserver.HLog:
Splitting 1 log(s) in
hdfs://localhost:9000/hbase/log_127.0.0.1_1235427119323_40076
2009-02-24 09:36:25,034 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 2 row(s) of meta region
{regionname:
.META.,,1, startKey: <>, server: 127.0.0.1:40
076} complete
2009-02-24 09:36:25,034 INFO
org.apache.hadoop.hbase.master.BaseScanner:
All 1 .META. region(s) scanned
2009-02-24 09:36:36,598 INFO
org.apache.hadoop.hbase.regionserver.HLog:
log
file splitting completed for
hdfs://localhost:9000/hbase/log_127.0.0.1_1235427119323_40076
2009-02-24 09:36:56,953 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname:
.META.,,1,
startKey: <>, server: 127.0.0.1:40076}
2009-02-24 09:37:05,279 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Failed
openScanner
org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer
.java:2065)
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServ
er.java:1699)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown
Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
pl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:
632)
at
org.apache.hadoop.hbase.ipc.HBaseServer
$Handler.run(HBaseServer.java:895)
2009-02-24 09:37:06,102 INFO
org.apache.hadoop.ipc.HBaseServer: IPC
Server
handler 4 on 40076, call openScanner([...@1ac5a78, [...@6799dc,
[...@1a66cfc,
9223372036854775807, null) from 127.0.0.1
:50139: error:
org.apache.hadoop.hbase.NotServingRegionException:
.META.,,1
org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer
.java:2065)
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServ
er.java:1699)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown
Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
pl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:
632)
at
org.apache.hadoop.hbase.ipc.HBaseServer
$Handler.run(HBaseServer.java:895)
2009-02-24 09:37:07,051 WARN
org.apache.hadoop.hbase.master.BaseScanner:
Scan one META region: {regionname: .META.,,1, startKey: <>,
server:
127.0.0.1:40076}
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer
.java:2065)
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServ
er.java:1699)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown
Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
pl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:
632)
at
org.apache.hadoop.hbase.ipc.HBaseServer
$Handler.run(HBaseServer.java:895)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun
.reflect
.NativeConstructorAccessorImpl.newInstance(NativeConstructorAcc
essorImpl.java:39)
at
sun
.reflect
.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstr
uctorAccessorImpl.java:27)
at
java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org
.apache
.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(Remot
eExceptionHandler.java:95)
at
org
.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:
185
)
at
org
.apache
.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.j
ava:73)
at
org
.apache
.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.jav
a:129)
at
org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:
137)
at org.apache.hadoop.hbase.Chore.run(Chore.java:65)
2009-02-24 09:37:07,230 INFO
org.apache.hadoop.hbase.master.BaseScanner:
All 1 .META. region(s) scanned
2009-02-24 09:37:50,642 INFO
org.apache.hadoop.hbase.regionserver.HRegion:
Closed .META.,,1
2009-02-24 09:37:56,955 INFO
org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname:
.META.,,1,
startKey: <>, server: 127.0.0.1:40076}
2009-02-24 09:37:57,126 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Failed
openScanner
org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer
.java:2065)
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServ
er.java:1699)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown
Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
pl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:
632)
at
org.apache.hadoop.hbase.ipc.HBaseServer
$Handler.run(HBaseServer.java:895)
2009-02-24 09:37:59,129 INFO
org.apache.hadoop.ipc.HBaseServer: IPC
Server
handler 2 on 40076, call openScanner([...@155bf4e, [...@82b436,
[...@1056757,
9223372036854775807, null) from 127.0.0.1
:35804: error:
org.apache.hadoop.hbase.NotServingRegionException:
.META.,,1
org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer
.java:2065)
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServ
er.java:1699)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown
Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
pl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:
632)
at
org.apache.hadoop.hbase.ipc.HBaseServer
$Handler.run(HBaseServer.java:895)
2009-02-24 09:38:00,412 WARN
org.apache.hadoop.hbase.master.BaseScanner:
Scan one META region: {regionname: .META.,,1, startKey: <>,
server:
127.0.0.1:40076}
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer
.java:2065)
at
org
.apache
.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServ
er.java:1699)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown
Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
pl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:
632)
at
org.apache.hadoop.hbase.ipc.HBaseServer
$Handler.run(HBaseServer.java:895)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun
.reflect
.NativeConstructorAccessorImpl.newInstance(NativeConstructorAcc
essorImpl.java:39)
at
sun
.reflect
.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstr
uctorAccessorImpl.java:27)
at
java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org
.apache
.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(Remot
eExceptionHandler.java:95)
at
org
.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:
185
)
at
org
.apache
.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.j
ava:73)
at
org
.apache
.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.jav
a:129)
at
org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:
137)
at org.apache.hadoop.hbase.Chore.run(Chore.java:65)
This just repeats forever, until I restart HBase.
Any help would be greatly appreciated.
Andrew