I have a 5+1 HBase/Hadoop cluster (5 region servers and 1 master). A table,
TESTTAB, has only one column family with 36 qualifiers.
A process on the master node uses batchUpdate (autoFlush=false) to insert
random rows into this table.
After about 70,000,000 rows had been inserted, the insert failed. On the HBase
web GUI I could find only 4 region servers, so I ssh'd into the missing node to
check the log.
There is a very big log file (12GB), and when I "tail -f .." this log file,
an endless stream of FileNotFoundException is printed, like the following:
2009-03-13 14:32:21,617 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: error getting store file
index size for 310680591/cdr: java.io.FileNotFoundException: File does not
exist:
hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
at
org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.doMetrics(HRegionServer.java:941)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:334)
at java.lang.Thread.run(Thread.java:619)
2009-03-13 14:32:21,627 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message
(Retry: 0) java.io.FileNotFoundException: File does not exist:
hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
at
org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:625)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:341)
at java.lang.Thread.run(Thread.java:619)
2009-03-13 14:32:21,628 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: error getting store file
index size for 310680591/cdr: java.io.FileNotFoundException: File does not
exist:
hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
at
org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.doMetrics(HRegionServer.java:941)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:334)
at java.lang.Thread.run(Thread.java:619)
2009-03-13 14:32:21,629 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message
(Retry: 1) java.io.FileNotFoundException: File does not exist:
hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
at
org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:625)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:341)
at java.lang.Thread.run(Thread.java:619)
.......
2009-03-13 14:32:21,641 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: error getting store file
index size for 310680591/cdr: java.io.FileNotFoundException: File does not
exist:
hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
at
org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.doMetrics(HRegionServer.java:941)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:334)
at java.lang.Thread.run(Thread.java:619)
2009-03-13 14:32:21,641 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Exceeded max retries: 10
java.io.FileNotFoundException: File does not exist:
hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
at
org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:625)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:341)
at java.lang.Thread.run(Thread.java:619)
But it does not stop there: it keeps printing this exception, together with
"Exceeded max retries: 10".
....endless...
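Since tailing a 12GB file is not practical, a small script could count how
often each missing file is reported instead. This is only a sketch under
assumptions: the function name `summarize` and the two regular expressions are
hypothetical helpers, not anything from HBase, and it assumes each record
starts with a timestamp followed by the log level, as in the excerpt above.

```python
import re
from collections import Counter

# Start of a log record, e.g. "2009-03-13 14:32:21,617 WARN" (assumed layout).
RECORD_START = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (DEBUG|INFO|WARN|ERROR|FATAL)\b"
)
# The HDFS path that follows "File does not exist:".
HDFS_PATH = re.compile(r"hdfs://\S+")


def summarize(lines):
    """Count records per log level and FileNotFoundException hits per HDFS path.

    `lines` is any iterable of strings, so an open file can be streamed
    without loading the whole log into memory.
    """
    levels = Counter()
    missing = Counter()
    awaiting_path = False  # True after a FileNotFoundException until its path is seen
    for line in lines:
        start = RECORD_START.search(line)
        if start:
            levels[start.group(1)] += 1
            awaiting_path = False  # a new record begins; drop any pending state
        if "FileNotFoundException" in line:
            awaiting_path = True
        if awaiting_path:
            path = HDFS_PATH.search(line)
            if path:
                missing[path.group(0)] += 1
                awaiting_path = False
    return levels, missing
```

Passing an open file handle (opened with `errors="replace"` in case of stray
bytes) to `summarize` and printing `missing.most_common(10)` would show whether
the whole log is really the same index file over and over.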
======================================
Before this exception, there is no other exception on this node.
But on the other nodes there are some, like the following:
2009-03-13 09:22:19,389 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: error closing and
deleting HLog org.apache.hadoop.ipc.RemoteException: java.io.IOException:
Could not complete write to file
/hbase/log_10.24.1.16_1236857785825_60020/hlog.dat.1236906952707 by
DFSClient_-1142813574 at
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:378)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at
org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) at
org.apache.hadoop.ipc.Client.call(Client.java:697) at
org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at
$Proxy1.complete(Unknown Source) at
sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.complete(Unknown Source) at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3130)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3054)
at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:959) at
org.apache.hadoop.hbase.regionserver.HLog.close(HLog.java:421) at
org.apache.hadoop.hbase.regionserver.HLog.closeAndDelete(HLog.java:404) at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:373)
at java.lang.Thread.run(Thread.java:619)
and...
2009-03-13 10:03:40,149 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
org.apache.hadoop.hbase.NotServingRegionException: TESTTAB,,1236871823378 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-03-13 10:03:40,150 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
org.apache.hadoop.hbase.NotServingRegionException: TESTTAB,,1236871823378 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-03-13 10:03:40,150 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
org.apache.hadoop.hbase.NotServingRegionException: TESTTAB,,1236871823378 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
and ...
2009-03-13 13:01:58,220 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
org.apache.hadoop.hbase.NotServingRegionException:
CDRWAP,13576300...@2009-01-31 10:44:29.720,1236888415970 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-03-13 13:01:58,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60020, call openScanner([...@7a5fe108, [...@2caf12fc,
[...@1a07754f,
9223372036854775807,
org.apache.hadoop.hbase.filter.whilematchrowfil...@5fa6a2e2) from
10.24.1.10:44605: error: org.apache.hadoop.hbase.NotServingRegionException:
CDRWAP,13576300...@2009-01-31 10:44:29.720,1236888415970
org.apache.hadoop.hbase.NotServingRegionException:
CDRWAP,13576300...@2009-01-31 10:44:29.720,1236888415970 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-03-13 13:02:00,712 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
org.apache.hadoop.hbase.NotServingRegionException:
CDRWAP,13576300...@2009-01-31 10:44:29.720,1236888415970 at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-03-13 13:02:00,730 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 60020, call openScanner([...@1eb57