Re: Cluster in Safe Mode
On Wed, Apr 7, 2010 at 7:27 PM, Edson Ramiro wrote:

> To solve the safemode problem, you may first start the DFS, leave the
> safemode and do a fsck.
>
> ./bin/start-dfs.sh
> ./bin/hadoop dfsadmin -safemode leave
> ./bin/hadoop fsck /
>
> After this, restart the DFS.
>
> You can configure HADOOP_OPTS in conf/hadoop-env.sh to give more mem. to
> Java.
> Also configure HADOOP_HEAPSIZE.

Yes, that's exactly what I did, and the DataNodes are back. I've taken them out of safe mode.

I'm planning to upgrade the Hadoop instance to the latest stable release. How wise would that be?

> # export HADOOP_OPTS="-server -XX:+HeapDumpOnOutOfMemoryError
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseParallelGC
> -XX:ParallelGCThreads=4 -XX:NewSize=1G -XX:MaxNewSize=1G"
>
> Edson Ramiro
Re: Cluster in Safe Mode
To solve the safe mode problem, first start the DFS, leave safe mode, and run fsck:

./bin/start-dfs.sh
./bin/hadoop dfsadmin -safemode leave
./bin/hadoop fsck /

After this, restart the DFS.

You can set HADOOP_OPTS in conf/hadoop-env.sh to give Java more memory. Also set HADOOP_HEAPSIZE.

# export HADOOP_OPTS="-server -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:NewSize=1G -XX:MaxNewSize=1G"

Edson Ramiro
Re: Cluster in Safe Mode
On Wed, Apr 7, 2010 at 10:59 AM, Sagar Shukla wrote:

> Hi Manish,
> Do you see any errors in the DataNode log files? It is quite likely that,
> after the NameNode starts, the processes on the DataNodes are failing to
> start, causing the NameNode to wait in safe mode for the DataNode services
> to start.

I do see the following in the DataNode.out file whenever I start a DataNode, on both of my DataNodes. After some time they are marked as dead, as expected.

Exception in thread "DataNode: [/root/Datadir/hadoop/dfs/data]" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
    at org.apache.hadoop.io.UTF8.writeChars(UTF8.java:274)
    at org.apache.hadoop.io.UTF8.writeString(UTF8.java:246)
    at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:120)
    at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:126)
    at org.apache.hadoop.ipc.RPC$Invocation.write(RPC.java:109)
    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:474)
    at org.apache.hadoop.ipc.Client.call(Client.java:706)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy4.blockReport(Unknown Source)
    at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:744)
    at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2967)
    at java.lang.Thread.run(Thread.java:619)
RE: Cluster in Safe Mode
Hi Manish,

Do you see any errors in the DataNode log files? It is quite likely that, after the NameNode starts, the processes on the DataNodes are failing to start, causing the NameNode to wait in safe mode for the DataNode services to start.

Thanks,
Sagar

-----Original Message-----
From: Manish N [mailto:m1n...@gmail.com]
Sent: Wednesday, April 07, 2010 10:47 AM
To: common-user@hadoop.apache.org
Subject: Cluster in Safe Mode

Hey all,

I've a 2 Node cluster which is now running in Safe Mode. Its been 15-16 hrs now & yet to come out of Safe Mode. Does it normally take that long ?
Re: Cluster in Safe Mode
Looks like all your data nodes are down. Please make sure your data nodes are up and running (check the NameNode web UI, and run jps on the data nodes). Fsck shows 0 minimally replicated blocks and an average block replication of 0. Also, please verify that the data directories on your data nodes actually contain blocks.

- Ravi

On 4/6/10 10:16 PM, "Manish N" wrote:

CORRUPT FILES: 1601525
MISSING BLOCKS: 1601927
MISSING SIZE: 540525108291 B
CORRUPT BLOCKS: 1601927
Minimally replicated blocks: 0 (0.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 0.0
Corrupt blocks: 1601927
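Ravi's reading of the fsck summary can be sketched as a small check. This is only an illustration in Python: the function names `parse_fsck_summary` and `no_replicas_reported` are hypothetical, the sample text is the summary quoted above, and the rule (missing blocks present, zero minimally replicated blocks, zero average replication) is an assumption about how to flag "no DataNode has reported any block", not something fsck itself outputs.

```python
import re

# Summary lines quoted in the thread, one "Label: value" pair per line.
FSCK_SUMMARY = """\
CORRUPT FILES: 1601525
MISSING BLOCKS: 1601927
MISSING SIZE: 540525108291 B
CORRUPT BLOCKS: 1601927
Minimally replicated blocks: 0 (0.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 0.0
Corrupt blocks: 1601927
"""


def parse_fsck_summary(text):
    """Map each lower-cased label to the first number that follows it."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(\d+(?:\.\d+)?)", line)
        if match:
            stats[match.group(1).strip().lower()] = float(match.group(2))
    return stats


def no_replicas_reported(stats):
    """Hypothetical rule: blocks are missing and no replica of any block is known."""
    return (
        stats.get("missing blocks", 0) > 0
        and stats.get("minimally replicated blocks") == 0
        and stats.get("average block replication") == 0
    )


stats = parse_fsck_summary(FSCK_SUMMARY)
print("all replicas unreported:", no_replicas_reported(stats))
```

When this check fires, the files are not necessarily lost. It is equally consistent with the DataNodes simply not having delivered their block reports, which is why the first thing to verify is that the DataNode processes are running.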
Cluster in Safe Mode
Hey all,

I have a 2-node cluster that is now running in safe mode. It's been 15-16 hours and it has yet to come out of safe mode. Does it normally take that long?

The DataNode logs on the node running the NameNode show the following, and the slave node (running only a DataNode) shows similar output.

2010-04-07 10:03:10,687 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-310922324774702076_996024
2010-04-07 10:03:10,705 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_3302288729849061244_813694
2010-04-07 10:03:10,730 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-7252548330326272479_1259723
2010-04-07 10:03:10,745 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-5909954202848831867_1075933
2010-04-07 10:03:10,886 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-3213723859645738103_1075939
2010-04-07 10:03:10,910 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-2209269106581706132_676390
2010-04-07 10:03:10,923 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-6007998488187910667_676379
2010-04-07 10:03:11,086 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-1024215056075897357_676383
2010-04-07 10:03:11,127 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_3780597313184168671_1270304
2010-04-07 10:03:11,160 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_8891623760013835158_676336

One thing I wanted to point out is that some time back I had to run setrep on the entire cluster. Are these verification messages related to that?

Also, while going through the NameNode logs I encountered the following.

2010-04-05 21:01:31,383 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.21:50010
2010-04-05 21:01:49,240 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.100.21:50010
2010-04-05 21:01:49,243 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.2:50010
2010-04-05 21:02:01,791 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.100.2:50010

Then again at:

2010-04-06 06:41:56,290 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.21:50010
2010-04-06 06:41:56,290 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.100.21:50010
2010-04-06 06:41:56,290 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.2:50010
2010-04-06 06:41:56,290 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.100.2:50010

I had to restart the cluster, after which I got both nodes back.

2010-04-06 10:11:24,325 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.100.21:50010 storage DS-455083797-192.168.100.21-50010-1268220157729
2010-04-06 10:11:24,328 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.100.21:50010
2010-04-06 10:11:25,245 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /data/listing/image/5/84025/35924c87e664a43893904effbd2be601_list.jpg. blk_-1845977707636580795_1665561
2010-04-06 10:11:25,342 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 192.168.100.21:50010 is added to blk_-1845977707636580795_1665561 size 72753
2010-04-06 10:11:44,257 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 64 Total time for transactions(ms): 4 Number of syncs: 45 SyncTimes(ms): 387
2010-04-06 10:11:51,485 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.100.2:50010 storage DS-1237294752-192.168.100.2-50010-1252010614375
2010-04-06 10:11:51,488 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.100.2:50010

Then they were subsequently removed again. No clue why this happened. Ever since, I'm seeing the following in the logs:

2010-04-06 10:00:49,052 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310, call create(/data/listing/image/4/43734/5af88437f6c6a88d62c5f900b06ab8dd_high.jpg, rwxr-xr-x, DFSClient_1226879860, true, 2, 67108864) from 192.168.100.5:40437: error: org.apache.hadoop.dfs.SafeModeException: Cannot create file/data/listing/image/4/43734/5af88437f6c6a88d62c5f900b06ab8dd_high.jpg. Name node is in safe mode. The ratio of reported blocks 0. has not reached the threshold 0.9990. Safe mode will be turned off automatically.
org.apache.hadoop.dfs.SafeModeException: Cannot create file/data/listing/image/4/43734/5af88437f6c6a88d62c5f900b06ab8dd_high.jpg. Name node is in safe mode. The ratio of reported blocks 0. has not reached the threshold 0.9990. Safe mo
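The SafeModeException above reports a ratio of reported blocks of 0 against a threshold of 0.9990. The arithmetic behind that message can be sketched as follows. This is an illustration only, not the NameNode's code: the function names are hypothetical, and the total of 1601927 blocks is an assumption taken from the "MISSING BLOCKS" count in the fsck summary quoted elsewhere in this thread.

```python
import math

# Threshold printed in the SafeModeException message.
THRESHOLD = 0.999


def reported_ratio(reported_blocks, total_blocks):
    """Fraction of known blocks that have at least one reported replica."""
    if total_blocks == 0:
        return 1.0  # an empty namespace has nothing to wait for
    return reported_blocks / total_blocks


def can_leave_safe_mode(reported_blocks, total_blocks, threshold=THRESHOLD):
    """True once enough blocks have been reported by the DataNodes."""
    return reported_ratio(reported_blocks, total_blocks) >= threshold


def blocks_needed(total_blocks, threshold=THRESHOLD):
    """Smallest number of reported blocks that satisfies the threshold."""
    return math.ceil(total_blocks * threshold)


TOTAL_BLOCKS = 1601927  # assumption: taken from the fsck output in the thread

print("ratio with no block reports:", reported_ratio(0, TOTAL_BLOCKS))
print("can leave safe mode:", can_leave_safe_mode(0, TOTAL_BLOCKS))
print("blocks that must be reported:", blocks_needed(TOTAL_BLOCKS))
```

With no successful block report the ratio stays at 0 no matter how long the NameNode waits, so safe mode will not end on its own until the DataNodes manage to report their blocks.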