It finally failed between 7M and 8M records; below is the tail of the log. The other two region servers don't have much activity in their logs, but I can post those if necessary.
===================================================
2009-05-26 10:28:06,550 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3233282543359573226_1303 bad datanode[0] 192.168.240.175:50010
2009-05-26 10:28:06,550 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3233282543359573226_1303 in pipeline 192.168.240.175:50010, 192.168.240.180:50010: bad datanode 192.168.240.175:50010
2009-05-26 10:28:11,714 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.240.175:60733 remote=/192.168.240.180:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
2009-05-26 10:28:11,715 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3233282543359573226_1311 bad datanode[0] 192.168.240.180:50010
2009-05-26 10:28:11,715 FATAL org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with ioe: java.io.IOException: All datanodes 192.168.240.180:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2444)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1996)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-05-26 10:28:11,716 FATAL org.apache.hadoop.hbase.regionserver.HLog: Could not append. Requesting close of log
java.io.IOException: All datanodes 192.168.240.180:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2444)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1996)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-05-26 10:28:11,717 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: All datanodes 192.168.240.180:50010 are bad. Aborting...
2009-05-26 10:28:11,726 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=2, stores=4, storefiles=6, storefileIndexSize=0, memcacheSize=40, usedHeap=94, maxHeap=2999
2009-05-26 10:28:11,726 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2009-05-26 10:28:11,726 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020, call batchUpdates([...@41fb8e4, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@3ea382d9) from 192.168.240.152:17086: error: java.io.IOException: All datanodes 192.168.240.180:50010 are bad. Aborting...
java.io.IOException: All datanodes 192.168.240.180:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2444)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1996)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-05-26 10:28:12,894 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2009-05-26 10:28:12,895 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: exiting
2009-05-26 10:28:12,895 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: exiting
2009-05-26 10:28:12,895 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60020
2009-05-26 10:28:12,896 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: exiting
2009-05-26 10:28:12,896 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting
2009-05-26 10:28:12,896 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: exiting
2009-05-26 10:28:12,897 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: exiting
2009-05-26 10:28:12,897 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: exiting
2009-05-26 10:28:12,898 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: exiting
2009-05-26 10:28:12,898 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2009-05-26 10:28:12,898 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: exiting
2009-05-26 10:28:12,898 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: exiting
2009-05-26 10:28:12,898 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-05-26 10:28:12,901 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
2009-05-26 10:28:12,908 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:60030
2009-05-26 10:28:13,345 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs]
2009-05-26 10:28:13,346 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.webapplicationhand...@6ad3c65d
2009-05-26 10:28:13,687 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/static,/static]
2009-05-26 10:28:13,687 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.webapplicationhand...@3adec8b3
2009-05-26 10:28:14,039 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/]
2009-05-26 10:28:14,040 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.ser...@6e79839
2009-05-26 10:28:14,040 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
2009-05-26 10:28:14,040 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
2009-05-26 10:28:14,040 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: regionserver/0.0.0.0:60020.cacheFlusher exiting
2009-05-26 10:28:14,040 INFO org.apache.hadoop.hbase.regionserver.LogFlusher: regionserver/0.0.0.0:60020.logFlusher exiting
2009-05-26 10:28:14,040 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver/0.0.0.0:60020.compactor exiting
2009-05-26 10:28:14,040 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker: regionserver/0.0.0.0:60020.majorCompactionChecker exiting
2009-05-26 10:28:14,041 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TableA,ROW_KEY,1243357190459
2009-05-26 10:28:14,041 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TableA,,1243357190459
2009-05-26 10:28:14,041 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: 192.168.240.175:60020
2009-05-26 10:28:14,044 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/0.0.0.0:60020 exiting
2009-05-26 10:28:14,270 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020.leaseChecker closing leases
2009-05-26 10:28:14,271 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020.leaseChecker closed leases
2009-05-26 10:28:14,273 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
2009-05-26 10:28:14,273 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
===================================================

--
View this message in context: http://www.nabble.com/HBase-looses-regions.-tp23657983p23727987.html
Sent from the HBase User mailing list archive at Nabble.com.
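[Archive annotation: not a diagnosis of why these two datanodes went bad, but for anyone landing here with the same "SocketTimeoutException ... while waiting for channel to be ready for write" followed by an "All datanodes ... are bad" regionserver abort, the usual first suspects on an HDFS cluster backing HBase are the datanode transceiver limit and the datanode socket write timeout. A sketch of the commonly suggested hdfs-site.xml settings follows; the values are typical starting points from HBase troubleshooting advice of this era, not something verified against this particular cluster.]

```xml
<!-- hdfs-site.xml on every datanode; restart HDFS after changing.
     Values below are commonly suggested starting points, not verified
     against the cluster in this thread. -->
<property>
  <!-- Per-datanode transceiver (xceiver) thread limit. The default (256)
       is widely considered too low for HBase; exhausting it causes
       datanodes to drop write pipelines, which the client then reports
       as "bad datanode". Note the property name really is misspelled
       "xcievers" in Hadoop of this vintage. -->
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
<property>
  <!-- Socket write timeout used on the datanode write path, in
       milliseconds; 0 disables it. Often suggested when DataStreamer
       write timeouts kill otherwise-healthy pipelines under load. -->
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
```

Raising the open-file ulimit (e.g. `ulimit -n 32768`) for the user running the datanodes is usually recommended alongside the xcievers change, since each transceiver holds file descriptors.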