Hi J-D,
Thanks for your guide.

Following is my cluster and hardware information:

Node0:   HDFS NameNode, HBase Master
Node1-5: HDFS DataNode, HBase RegionServer

Dell 2950 servers; each node has 4GB of memory.
Node0-4 each have a 1TB disk (0.8TB for HDFS).
Node5 has a 1.5TB disk (1.2TB for HDFS).

I am using hadoop-0.19.1 and hbase-0.19.0.

My table schema is:
{NAME => 'TESTTAB', IS_ROOT => 'false', IS_META => 'false', FAMILIES =>
[{NAME => 'cdr', BLOOMFILTER => 'false', VERSIONS => '1', COMPRESSION =>
'BLOCK', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}], INDEXES => []}
I am using BLOCKCACHE and COMPRESSION now, and I plan to use BLOOMFILTER
once it becomes available (0.20.x).

My region size is 128MB.

At about 70,000,000 rows (each row carries about 400B of raw data), there
are about 125 regions. In HDFS, only 16GB is used (per "hadoop fs -dus
/hbase/TESTTAB"). On the HDFS web GUI, only 1.x% of storage is shown as
used. I think this volume is small for my hardware and storage.
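A quick back-of-the-envelope check (my own rough sketch; the compression
ratio below is only what these numbers imply, not a measured value):

```python
# Sanity-check the observed HDFS usage and region count.
rows = 70_000_000
row_bytes = 400                       # ~400B of raw data per row
raw_bytes = rows * row_bytes          # 28,000,000,000 B = 28 GB raw

observed_gb = 16                      # from "hadoop fs -dus /hbase/TESTTAB"
region_mb = 128                       # configured max region size

ratio = observed_gb * 1e9 / raw_bytes       # implied compression ratio, ~0.57
regions = observed_gb * 1000 / region_mb    # 16,000 MB / 128 MB = 125

print(f"raw ~= {raw_bytes / 1e9:.0f} GB, "
      f"ratio ~= {ratio:.2f}, regions ~= {regions:.0f}")
```

So 125 regions of 128MB each is exactly the 16GB seen in HDFS, which is
consistent with BLOCK compression shrinking the ~28GB of raw data.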

I checked the HDFS logs; there are some exceptions, but it is hard to
correlate them with these issues. I will dig further into HDFS and try the
HBase 0.19.1 RC1. Thanks.

In the namenode log, there are exceptions like:
.16_1236922775003_60020/hlog.dat.1236925456649 because dir.getFileBlocks()
is null  and pendingFile is null
2009-03-13 14:33:03,828 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 9000, call
complete(/hbase/log_10.24.1.16_1236922775003_60020/hlog.dat.1236925456649,
DFSClient_-1142813574) from 10.24.1.16:58444: error: java.io.IOException:
Could not complete write to file
/hbase/log_10.24.1.16_1236922775003_60020/hlog.dat.1236925456649 by
DFSClient_-1142813574
java.io.IOException: Could not complete write to file
/hbase/log_10.24.1.16_1236922775003_60020/hlog.dat.1236925456649 by
DFSClient_-1142813574
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:378)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

In the datanode logs, there are exceptions like:
2009-03-13 14:28:45,519 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
10.24.1.12:50010, storageID=DS-1393238225-10.24.1.12-50010-1236857766916,
infoPort=50075, ipcPort=50020):Got exception while serving
blk_-4945343177244655889_64091 to /10.24.1.10:
java.io.IOException: Block blk_-4945343177244655889_64091 is not valid.
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:726)
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:714)
        at
org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)

2009-03-13 14:28:45,519 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
10.24.1.12:50010, storageID=DS-1393238225-10.24.1.12-50010-1236857766916,
infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_-4945343177244655889_64091 is not valid.
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:726)
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:714)
        at
org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)

Schubert

On Fri, Mar 13, 2009 at 10:31 PM, Jean-Daniel Cryans <[email protected]> wrote:

> schubert,
>
> Yeah at 70 000 000 rows with only 5 region servers you might expect
> stuff like that... But this is not good. Can you take a look in your
> datanodes logs to see if there are any obvious exceptions around the
> time of the "error closing and deleting..." exception?
>
> I'd like to know more about your hardware.
>
> Also I strongly suggest that you try out the 0.19.1 RC1, it has really
> nice fixes.
>
> Thx for reporting all that stuff,
>
> J-D
>
> On Fri, Mar 13, 2009 at 5:41 AM, schubert zhang <[email protected]> wrote:
> > I have a 5+1 HBase/Hadoop cluster (5 region servers and 1 master), and a
> > table TESTTAB with a single column family with 36 qualifiers.
> > A process on the master node uses batchUpdate (autoFlush=false) to insert
> > random rows into this table.
> > After about 70,000,000 rows were inserted, it failed. On the HBase web
> > GUI, I can only find 4 regionservers. Then I ssh'd into the missing node
> > to check the log.
> >
> > There is a very big log file (12GB), and when I "tail -f" this log file,
> > an endless stream of FileNotFoundException is being printed, like the
> > following:
> >
> > 2009-03-13 14:32:21,617 WARN
> > org.apache.hadoop.hbase.regionserver.HRegionServer: error getting store
> file
> > index size for 310680591/cdr: java.io.FileNotFoundException: File does
> not
> > exist:
> >
> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
> > at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.doMetrics(HRegionServer.java:941)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:334)
> > at java.lang.Thread.run(Thread.java:619) 2009-03-13 14:32:21,627 WARN
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message
> > (Retry: 0) java.io.FileNotFoundException: File does not exist:
> >
> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
> > at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:625)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:341)
> > at java.lang.Thread.run(Thread.java:619) 2009-03-13 14:32:21,628 WARN
> > org.apache.hadoop.hbase.regionserver.HRegionServer: error getting store
> file
> > index size for 310680591/cdr: java.io.FileNotFoundException: File does
> not
> > exist:
> >
> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
> > at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.doMetrics(HRegionServer.java:941)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:334)
> > at java.lang.Thread.run(Thread.java:619) 2009-03-13 14:32:21,629 WARN
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message
> > (Retry: 1) java.io.FileNotFoundException: File does not exist:
> >
> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
> > at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:625)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:341)
> > at java.lang.Thread.run(Thread.java:619)
> >
> > .......
> >
> > 2009-03-13 14:32:21,641 WARN
> > org.apache.hadoop.hbase.regionserver.HRegionServer: error getting store
> file
> > index size for 310680591/cdr: java.io.FileNotFoundException: File does
> not
> > exist:
> >
> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
> >        at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
> >        at
> >
> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
> >        at
> >
> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
> >        at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.doMetrics(HRegionServer.java:941)
> >        at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:334)
> >        at java.lang.Thread.run(Thread.java:619)
> >
> > 2009-03-13 14:32:21,641 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Exceeded max retries:
> 10
> > java.io.FileNotFoundException: File does not exist:
> >
> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
> >        at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
> >        at
> >
> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
> >        at
> >
> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
> >        at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:625)
> >        at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:341)
> >        at java.lang.Thread.run(Thread.java:619)
> >
> > But it does not stop; it keeps printing this exception with "Exceeded max
> > retries: 10".
> > ....endless...
> >
> > ======================================
> >
> > Before this exception, there was no other exception on this node.
> > But on the other nodes, there are some like the following:
> >
> > 2009-03-13 09:22:19,389 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: error closing and
> > deleting HLog org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> > Could not complete write to file
> > /hbase/log_10.24.1.16_1236857785825_60020/hlog.dat.1236906952707 by
> > DFSClient_-1142813574 at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:378)
> > at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597) at
> > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at
> > org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) at
> > org.apache.hadoop.ipc.Client.call(Client.java:697) at
> > org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at
> > $Proxy1.complete(Unknown Source) at
> > sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source) at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597) at
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> > at
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> > at $Proxy1.complete(Unknown Source) at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3130)
> > at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3054)
> > at
> >
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
> > at
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
> > at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:959)
> at
> > org.apache.hadoop.hbase.regionserver.HLog.close(HLog.java:421) at
> > org.apache.hadoop.hbase.regionserver.HLog.closeAndDelete(HLog.java:404)
> at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:373)
> > at java.lang.Thread.run(Thread.java:619)
> >
> >
> > and...
> >
> > 2009-03-13 10:03:40,149 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
> > org.apache.hadoop.hbase.NotServingRegionException: TESTTAB,,1236871823378
> at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
> > at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597) at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> > 2009-03-13 10:03:40,150 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
> > org.apache.hadoop.hbase.NotServingRegionException: TESTTAB,,1236871823378
> at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
> > at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597) at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> > 2009-03-13 10:03:40,150 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
> > org.apache.hadoop.hbase.NotServingRegionException: TESTTAB,,1236871823378
> at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
> > at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597) at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> >
> >
> > and ...
> >
> > 2009-03-13 13:01:58,220 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
> > org.apache.hadoop.hbase.NotServingRegionException:
> > CDRWAP,13576300...@2009-01-31 10:44:29.720,1236888415970 at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597) at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> > 2009-03-13 13:01:58,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> > handler 1 on 60020, call openScanner([...@7a5fe108, [...@2caf12fc,
> [...@1a07754f,
> > 9223372036854775807,
> > org.apache.hadoop.hbase.filter.whilematchrowfil...@5fa6a2e2) from
> > 10.24.1.10:44605: error:
> org.apache.hadoop.hbase.NotServingRegionException:
> > CDRWAP,13576300...@2009-01-31 10:44:29.720,1236888415970
> > org.apache.hadoop.hbase.NotServingRegionException:
> > CDRWAP,13576300...@2009-01-31 10:44:29.720,1236888415970 at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597) at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> > 2009-03-13 13:02:00,712 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
> > org.apache.hadoop.hbase.NotServingRegionException:
> > CDRWAP,13576300...@2009-01-31 10:44:29.720,1236888415970 at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597) at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> > 2009-03-13 13:02:00,730 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> > handler 8 on 60020, call openScanner([...@1eb57
> >
>
