Region servers get killed during the write operation in the reduce phase (writing 64,000 rows with 10,000 columns each, on a 3-node cluster).
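For context, the reduce side writes each matrix row through the plain client API, roughly like the sketch below. This is a minimal sketch against the 0.19-era BatchUpdate interface; the table name matches the log, but the reducer class name, the "column:" family, and the qualifiers are placeholders for illustration, not the actual job code:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MatrixWriteReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  private HTable table;

  public void configure(JobConf job) {
    try {
      // One HTable instance per reduce task; table name taken from the log below.
      table = new HTable(new HBaseConfiguration(), "DenseMatrix_randgnegu");
    } catch (IOException e) {
      throw new RuntimeException("Cannot open table", e);
    }
  }

  public void reduce(Text row, Iterator<Text> values,
      OutputCollector<Text, Text> collector, Reporter reporter)
      throws IOException {
    // One BatchUpdate per 10,000-column row; each value becomes one cell.
    // The "column:" family and numeric qualifiers are made-up placeholders.
    BatchUpdate update = new BatchUpdate(row.toString());
    int column = 0;
    while (values.hasNext()) {
      update.put("column:" + column++, Bytes.toBytes(values.next().toString()));
    }
    // commit() sends the row to the region server hosting it.
    table.commit(update);
  }
}

Each commit() turns into a batch-update RPC against the region server holding that row, which is the call that exhausts its retries in the trace below.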
----
09/01/14 13:07:59 INFO mapred.JobClient:  map 100% reduce 36%
09/01/14 13:11:38 INFO mapred.JobClient:  map 100% reduce 33%
09/01/14 13:11:38 INFO mapred.JobClient: Task Id : attempt_200901140952_0010_r_000017_1, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 61.247.201.163:60020 for region DenseMatrix_randgnegu,,1231905480938, row '000000000000287', but failed after 10 attempts.
Exceptions:
java.io.IOException: java.io.IOException: Server not running, aborting
  at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2103)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1611)
----

Also, I can't stop HBase:

[d8g053:/root]# hbase-trunk/bin/stop-hbase.sh
stopping master..........................................................................

Can it be recovered?

----
Region server log:

2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-4005955194083205373_14543 bad datanode[0] nodes == null
2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Aborting...
2009-01-14 13:03:56,629 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region DenseMatrix_randllnma,000000000000,18,7-29116,1231898419257
java.io.IOException: Could not read from stream
  at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:119)
  at java.io.DataInputStream.readByte(DataInputStream.java:248)
  at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
  at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
  at org.apache.hadoop.io.Text.readString(Text.java:400)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2779)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2704)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
2009-01-14 13:03:56,631 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region DenseMatrix_randllnma,00000000000,16,19-26373,1231898311583
2009-01-14 13:03:56,692 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2009-01-14 13:03:56,692 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2009-01-14 13:03:56,693 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2009-01-14 13:03:56,693 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2009-01-14 13:03:57,521 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2009-01-14 13:03:57,810 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
2009-01-14 13:03:57,810 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2612702056484946948_14554
2009-01-14 13:03:59,343 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
2009-01-14 13:03:59,344 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-5255885897790790367_14543 bad datanode[0] nodes == null
2009-01-14 13:03:59,344 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Aborting...
2009-01-14 13:03:59,344 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: DenseMatrix_randgnegu,,1231905480938
  at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:896)
  at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
  at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
  at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137)
Caused by: java.io.IOException: Could not read from stream
  at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:119)
  at java.io.DataInputStream.readByte(DataInputStream.java:248)
  at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
  at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
  at org.apache.hadoop.io.Text.readString(Text.java:400)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2779)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2704)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
2009-01-14 13:03:59,359 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=15, regions=48, stores=192, storefiles=756, storefileIndexSize=6, memcacheSize=338, usedHeap=395, maxHeap=971
2009-01-14 13:03:59,359 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher exiting
2009-01-14 13:03:59,368 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://dev3.nm2.naver.com:9000/hbase/log_61.247.201.165_1231894400437_60020/hlog.dat.1231905813472, entries=896500. New log writer: /hbase/log_61.247.201.165_1231894400437_60020/hlog.dat.1231905839367
2009-01-14 13:03:59,368 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.

--
Best Regards, Edward J. Yoon @ NHN, corp.
edwardy...@apache.org
http://blog.udanax.org