Everything runs on 3 nodes:

node1: namenode, datanode, jobtracker, tasktracker, hmaster, regionserver
node2: datanode, tasktracker, regionserver
node3: datanode, tasktracker, regionserver
----
>> > 2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient:
>> > DataStreamer Exception: java.io.IOException: Unable to create new
>> > block.
>> >         at

namenode logs:
----
2009-01-14 13:03:54,452 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
2009-01-14 13:03:54,452 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/DenseMatrix_randllnma/compaction.dir/89428128/block/mapfiles/1969914437577830056/data. blk_8609709792065065878_14543
2009-01-14 13:03:55,781 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 61.247.201.163:50010 is added to blk_8609709792065065878_14543 size 67108864
2009-01-14 13:03:55,782 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
2009-01-14 13:03:55,782 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/DenseMatrix_randllnma/compaction.dir/89428128/block/mapfiles/1969914437577830056/data. blk_7500745129745458361_14543
2009-01-14 13:03:56,057 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
2009-01-14 13:03:56,057 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
2009-01-14 13:03:56,058 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 61.247.201.165:50010 to delete blk_-3959166394378699051_13308 blk_502886823558166676_9198 [... roughly 100 more block IDs snipped ...]
2009-01-14 13:03:56,457 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 61.247.201.163:50010 is added to blk_7500745129745458361_14543 size 41025370
2009-01-14 13:03:56,458 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 5731 Total time for transactions(ms): 46 Number of syncs: 2923 SyncTimes(ms): 131124
2009-01-14 13:03:56,628 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/61.247.201.165 cmd=listStatus src=/hbase/DenseMatrix_randllnma/1262088429/attribute/mapfiles dst=null perm=null
2009-01-14 13:03:56,629 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/61.247.201.165 cmd=listStatus src=/hbase/DenseMatrix_randllnma/1262088429/block/mapfiles dst=null perm=null
2009-01-14 13:03:56,631 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/61.247.201.165

On Wed, Jan 14, 2009 at 6:33 PM, Samuel Guo <[email protected]> wrote:
> A 3-node cluster of Hadoop?
> A 3-node cluster of HBase?
>
> Can you attach the logs of the hadoop namenode, datanodes, hbase master,
> and hbase regionservers? Thanks in advance.
> I suspect that too many open files caused the datanode to use up all of
> its xceivers, so the DFSClient couldn't create a new block.
>
> On Wed, Jan 14, 2009 at 5:20 PM, Edward J. Yoon <[email protected]> wrote:
>
>> I tried a 10,000 by 10,000 matrix-matrix multiplication on 3 nodes.
>>
>> - The random matrices were successfully generated.
>> - The collecting jobs completed successfully.
>> - The multiplication in the map phase succeeded.
>>
>> Then, during the reduce job (the sum and data-insert operations), the
>> following happened.
>>
>> ---------- Forwarded message ----------
>> From: stack <[email protected]>
>> Date: Wed, Jan 14, 2009 at 3:50 PM
>> Subject: Re: Frequent downs of region server
>> To: [email protected]
>>
>>
>> Edward J. Yoon wrote:
>> > During the write operation in the reduce phase, region servers get
>> > killed. (64,000 rows with 10,000 columns, 3 nodes)
>>
>> 10k columns is probably over what hbase is currently able to do
>> (hbase-867).
>>
>> You've seen the notes at the end of the
>> http://wiki.apache.org/hadoop/Hbase/Troubleshooting page?
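(A note on Samuel's xceivers hypothesis and the troubleshooting page stack
points at: on the Hadoop of this era, the relevant knob is the datanode's
cap on concurrent DataXceiver threads, which defaulted to 256 and is easily
exhausted by HBase, since open store files and in-flight client writes each
hold xceivers. A minimal sketch of the setting, assuming a Hadoop version
where the cap is configurable; it goes in the datanodes' hadoop-site.xml,
the value below is illustrative rather than tuned, and note the property
name's historical misspelling "xcievers":

  <!-- Cap on concurrent DataXceiver threads per datanode. The old
       default of 256 is too low for HBase workloads. -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>1023</value>
  </property>

Raising it only helps if the OS open-file limit for the datanode user,
ulimit -n, is raised alongside, since each xceiver holds sockets and file
descriptors; the wiki page of the time covered this same pairing.)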
>>
>> See other notes below:
>>
>> > ----
>> > 09/01/14 13:07:59 INFO mapred.JobClient:  map 100% reduce 36%
>> > 09/01/14 13:11:38 INFO mapred.JobClient:  map 100% reduce 33%
>> > 09/01/14 13:11:38 INFO mapred.JobClient: Task Id :
>> > attempt_200901140952_0010_r_000017_1, Status : FAILED
>> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> > contact region server 61.247.201.163:60020 for region
>> > DenseMatrix_randgnegu,,1231905480938, row '000000000000287', but
>> > failed after 10 attempts.
>> > Exceptions:
>> > java.io.IOException: java.io.IOException: Server not running, aborting
>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2103)
>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1611)
>> > ----
>>
>> You upped the hbase client timeouts?
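(For reference, "upping the hbase client timeouts" in this era usually
means the client-side retry knobs; the "failed after 10 attempts" above is
the default retry count being exhausted. A minimal sketch for the
hbase-site.xml visible to the MapReduce job's client; both values are
illustrative, not recommendations:

  <!-- How many times the client retries a failed region server call;
       the default of 10 matches the RetriesExhaustedException above. -->
  <property>
    <name>hbase.client.retries.number</name>
    <value>20</value>
  </property>
  <!-- Milliseconds the client sleeps between retries. -->
  <property>
    <name>hbase.client.pause</name>
    <value>10000</value>
  </property>

Note that retries only paper over slow servers; here the server answers
"Server not running, aborting", so no retry count will succeed until the
region server is back up.)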
>>
>> > And, I can't stop the hbase.
>> >
>> > [d8g053:/root]# hbase-trunk/bin/stop-hbase.sh
>> > stopping master...........................................................
>> >
>> > Can it be recovered?
>>
>> What does master log say?  Why ain't it going down?  On the tail of the
>> log it'll usually say why it's staying up.  Probably a particular
>> HRegionServer?
>>
>> > ----
>> > Region server log:
>> >
>> > 2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient:
>> > DataStreamer Exception: java.io.IOException: Unable to create new
>> > block.
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>>
>> These look like the issue that the configuration on the troubleshooting
>> page might address (check your datanode logs).  You are using 0.18.0
>> hbase?
>>
>> St.Ack
>>
>>
>> On Tue, Jan 13, 2009 at 8:42 PM, Edward J. Yoon <[email protected]> wrote:
>>
>> > [... snip: same report, JobClient output, and stop-hbase.sh output as
>> > quoted above ...]
>> >
>> > ----
>> > Region server log:
>> >
>> > 2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient:
>> > DataStreamer Exception: java.io.IOException: Unable to create new
>> > block.
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>> > 2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient: Error
>> > Recovery for block blk_-4005955194083205373_14543 bad datanode[0]
>> > nodes == null
>> > 2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient: Could
>> > not get block locations. Aborting...
>> > 2009-01-14 13:03:56,629 ERROR
>> > org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>> > Compaction/Split failed for region
>> > DenseMatrix_randllnma,000000000000,18,7-29116,1231898419257
>> > java.io.IOException: Could not read from stream
>> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:119)
>> >         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>> >         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
>> >         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
>> >         at org.apache.hadoop.io.Text.readString(Text.java:400)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2779)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2704)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>> > 2009-01-14 13:03:56,631 INFO
>> > org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on
>> > region DenseMatrix_randllnma,00000000000,16,19-26373,1231898311583
>> > 2009-01-14 13:03:56,692 INFO org.apache.hadoop.io.compress.CodecPool:
>> > Got brand-new decompressor
>> > 2009-01-14 13:03:56,692 INFO org.apache.hadoop.io.compress.CodecPool:
>> > Got brand-new decompressor
>> > 2009-01-14 13:03:56,693 INFO org.apache.hadoop.io.compress.CodecPool:
>> > Got brand-new decompressor
>> > 2009-01-14 13:03:56,693 INFO org.apache.hadoop.io.compress.CodecPool:
>> > Got brand-new decompressor
>> > 2009-01-14 13:03:57,521 INFO org.apache.hadoop.io.compress.CodecPool:
>> > Got brand-new compressor
>> > 2009-01-14 13:03:57,810 INFO org.apache.hadoop.hdfs.DFSClient:
>> > Exception in createBlockOutputStream java.io.IOException: Could not
>> > read from stream
>> > 2009-01-14 13:03:57,810 INFO org.apache.hadoop.hdfs.DFSClient:
>> > Abandoning block blk_-2612702056484946948_14554
>> > 2009-01-14 13:03:59,343 WARN org.apache.hadoop.hdfs.DFSClient:
>> > DataStreamer Exception: java.io.IOException: Unable to create new
>> > block.
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>> >
>> > 2009-01-14 13:03:59,344 WARN org.apache.hadoop.hdfs.DFSClient: Error
>> > Recovery for block blk_-5255885897790790367_14543 bad datanode[0]
>> > nodes == null
>> > 2009-01-14 13:03:59,344 WARN org.apache.hadoop.hdfs.DFSClient: Could
>> > not get block locations. Aborting...
>> > 2009-01-14 13:03:59,344 FATAL
>> > org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog
>> > required. Forcing server shutdown
>> > org.apache.hadoop.hbase.DroppedSnapshotException: region:
>> > DenseMatrix_randgnegu,,1231905480938
>> >         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:896)
>> >         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
>> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
>> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137)
>> > Caused by: java.io.IOException: Could not read from stream
>> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:119)
>> >         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>> >         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
>> >         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
>> >         at org.apache.hadoop.io.Text.readString(Text.java:400)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2779)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2704)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>> > 2009-01-14 13:03:59,359 INFO
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
>> > request=15, regions=48, stores=192, storefiles=756,
>> > storefileIndexSize=6, memcacheSize=338, usedHeap=395, maxHeap=971
>> > 2009-01-14 13:03:59,359 INFO
>> > org.apache.hadoop.hbase.regionserver.MemcacheFlusher:
>> > regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher exiting
>> > 2009-01-14 13:03:59,368 INFO
>> > org.apache.hadoop.hbase.regionserver.HLog: Closed
>> > hdfs://dev3.nm2.naver.com:9000/hbase/log_61.247.201.165_1231894400437_60020/hlog.dat.1231905813472,
>> > entries=896500. New log writer:
>> > /hbase/log_61.247.201.165_1231894400437_60020/hlog.dat.1231905839367
>> >
>> > 2009-01-14 13:03:59,368 INFO
>> > org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon @ NHN, corp.
>> > [email protected]
>> > http://blog.udanax.org
>>
>>
>> --
>> Best Regards, Edward J. Yoon @ NHN, corp.
>> [email protected]
>> http://blog.udanax.org


--
Best Regards, Edward J. Yoon @ NHN, corp.
[email protected]
http://blog.udanax.org
