Re: Cluster in Safe Mode

2010-04-07 Thread Manish N

On Wed, Apr 7, 2010 at 7:27 PM, Edson Ramiro  wrote:

> To solve the safe mode problem, first start the DFS, leave safe mode, and
> run fsck:
>
> ./bin/start-dfs.sh
> ./bin/hadoop dfsadmin -safemode leave
> ./bin/hadoop fsck /
>
> After this, restart the DFS.
>
> You can configure HADOOP_OPTS in conf/hadoop-env.sh to give more mem. to
> Java.
> Also configure HADOOP_HEAPSIZE.
>


Yes, that's exactly what I did, and the DataNodes are back. I've taken the
cluster out of safe mode.

I'm planning to upgrade the Hadoop instance to the latest stable release.
How wise would this be?





Re: Cluster in Safe Mode

2010-04-07 Thread Edson Ramiro

To solve the safe mode problem, first start the DFS, leave safe mode, and
run fsck:

./bin/start-dfs.sh
./bin/hadoop dfsadmin -safemode leave
./bin/hadoop fsck /

After this, restart the DFS.
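
The sequence above could be scripted roughly as follows. This is only a
sketch: it assumes the scripts live under bin/ of the Hadoop install, uses
dfsadmin -safemode leave (safe mode is controlled through the dfsadmin
subcommand), and the install path /opt/hadoop and the helper names are made
up for illustration. Forcing the NameNode out of safe mode does not bring
back missing blocks, so it is worth checking the DataNode logs first.

```python
import subprocess
from pathlib import PurePosixPath


def recovery_commands(hadoop_home):
    """Return the command lines for the recovery steps, in order."""
    bin_dir = PurePosixPath(hadoop_home) / "bin"
    hadoop = str(bin_dir / "hadoop")
    return [
        [str(bin_dir / "start-dfs.sh")],             # start NameNode and DataNodes
        [hadoop, "dfsadmin", "-safemode", "leave"],  # force the NameNode out of safe mode
        [hadoop, "fsck", "/"],                       # check the whole namespace
        [str(bin_dir / "stop-dfs.sh")],              # restart the DFS: stop ...
        [str(bin_dir / "start-dfs.sh")],             # ... and start again
    ]


def run_recovery(hadoop_home, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for argv in recovery_commands(hadoop_home):
        print(" ".join(argv))
        if not dry_run:
            # fsck may exit non-zero on an unhealthy filesystem, so the
            # return code is reported rather than treated as fatal.
            result = subprocess.run(argv)
            print("exit code:", result.returncode)


if __name__ == "__main__":
    run_recovery("/opt/hadoop")  # hypothetical install path; dry run only
```

With dry_run left at its default the script only prints the commands, which
makes it easy to review the order before running anything against a live
cluster.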

You can configure HADOOP_OPTS in conf/hadoop-env.sh to give more memory to
Java. Also configure HADOOP_HEAPSIZE (the maximum heap size, in MB).

# export HADOOP_OPTS="-server -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:NewSize=1G -XX:MaxNewSize=1G"

Edson Ramiro



Re: Cluster in Safe Mode

2010-04-07 Thread Manish N

On Wed, Apr 7, 2010 at 10:59 AM, Sagar Shukla  wrote:

> Hi Manish,
>  Do you see any errors in the DataNode log files? It is quite likely that,
> after the namenode starts, the processes on the datanodes are failing to
> start, causing the namenode to wait in safe mode for the datanode services
> to start.
>

I do see the following in the DataNode.out file whenever I start a DataNode,
on both of my DataNodes. After some time they are marked as dead, as
expected.

Exception in thread "DataNode: [/root/Datadir/hadoop/dfs/data]" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
    at org.apache.hadoop.io.UTF8.writeChars(UTF8.java:274)
    at org.apache.hadoop.io.UTF8.writeString(UTF8.java:246)
    at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:120)
    at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:126)
    at org.apache.hadoop.ipc.RPC$Invocation.write(RPC.java:109)
    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:474)
    at org.apache.hadoop.ipc.Client.call(Client.java:706)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy4.blockReport(Unknown Source)
    at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:744)
    at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2967)
    at java.lang.Thread.run(Thread.java:619)







RE: Cluster in Safe Mode

2010-04-06 Thread Sagar Shukla

Hi Manish,
  Do you see any errors in the DataNode log files? It is quite likely that,
after the namenode starts, the processes on the datanodes are failing to
start, causing the namenode to wait in safe mode for the datanode services
to start.

Thanks,
Sagar

-----Original Message-----
From: Manish N [mailto:m1n...@gmail.com]
Sent: Wednesday, April 07, 2010 10:47 AM
To: common-user@hadoop.apache.org
Subject: Cluster in Safe Mode


Re: Cluster in Safe Mode

2010-04-06 Thread Ravi Phulari

It looks like all your data nodes are down. Please make sure your data nodes
are up and running (check the NameNode web UI, and run jps on the data nodes).
Fsck shows 0 minimally replicated blocks and an average block replication
of 0.
Also, please verify that the data directories on your data nodes contain
blocks.
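
As a rough illustration of why this keeps the NameNode in safe mode: it
waits until the fraction of blocks reported by DataNodes reaches a threshold
(0.9990 in the SafeModeException quoted elsewhere in this thread). The sketch
below is a simplified model of that check, not Hadoop's own code; the
function names are made up, and the block total is borrowed from the fsck
figures purely as an example.

```python
import math

SAFEMODE_THRESHOLD = 0.999  # threshold shown in the SafeModeException message


def stays_in_safe_mode(reported_blocks, total_blocks, threshold=SAFEMODE_THRESHOLD):
    """True while the reported-block ratio is below the threshold."""
    if total_blocks == 0:
        return False  # nothing to wait for on an empty filesystem
    return reported_blocks / total_blocks < threshold


def blocks_needed(total_blocks, threshold=SAFEMODE_THRESHOLD):
    """Smallest number of reported blocks that satisfies the threshold."""
    return math.ceil(total_blocks * threshold)


if __name__ == "__main__":
    total = 1601927  # illustrative: the block count from the fsck output
    print(stays_in_safe_mode(0, total))  # no DataNode has reported any block
    print(blocks_needed(total))
```

With no DataNode reporting, the ratio stays at 0, so under this model the
NameNode would wait indefinitely rather than for some fixed time.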

-
Ravi


On 4/6/10 10:16 PM, "Manish N"  wrote:

CORRUPT FILES: 1601525
MISSING BLOCKS: 1601927
MISSING SIZE: 540525108291 B
CORRUPT BLOCKS: 1601927

Minimally replicated blocks: 0 (0.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 0.0
Corrupt blocks: 1601927
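
A summary like this can also be checked mechanically. The following is a
minimal sketch that assumes the summary lines look like the ones quoted here
(a name, a colon, then a number); the function names are hypothetical and the
layout of fsck output may differ between Hadoop versions.

```python
import re

# Summary lines copied from the fsck output quoted in this thread.
FSCK_SUMMARY = """\
CORRUPT FILES:1601525
  MISSING BLOCKS:1601927
  MISSING SIZE:540525108291 B
  CORRUPT BLOCKS: 1601927
Minimally replicated blocks:0 (0.0 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:0 (0.0 %)
Mis-replicated blocks:0 (0.0 %)
Default replication factor:2
Average block replication:0.0
Corrupt blocks:1601927
"""

# A name made of letters, spaces and hyphens, a colon, then a number.
_LINE = re.compile(r"^([A-Za-z][A-Za-z -]*?):\s*(\d+(?:\.\d+)?)")


def parse_fsck_summary(text):
    """Map each 'name: number' summary line to an int or float."""
    summary = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            name, value = match.groups()
            summary[name] = float(value) if "." in value else int(value)
    return summary


def no_replicas_reported(summary):
    """True when fsck saw missing blocks and no live replica of any block."""
    return (summary.get("Minimally replicated blocks") == 0
            and summary.get("Average block replication") == 0.0
            and summary.get("MISSING BLOCKS", 0) > 0)


if __name__ == "__main__":
    print(no_replicas_reported(parse_fsck_summary(FSCK_SUMMARY)))
```

When every block is missing and the average replication is 0, the likelier
explanation is that no DataNode has reported in, not that the data itself is
gone.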

Ravi
--

Cluster in Safe Mode

2010-04-06 Thread Manish N

Hey all,

I have a 2-node cluster which is now running in safe mode. It's been 15-16
hours now and it has yet to come out of safe mode. Does it normally take that
long?

The DataNode logs on the node running the NameNode show the following, and
the output on the slave node (running only a DataNode) is similar.

2010-04-07 10:03:10,687 INFO org.apache.hadoop.dfs.DataBlockScanner:
Verification succeeded for blk_-310922324774702076_996024
2010-04-07 10:03:10,705 INFO org.apache.hadoop.dfs.DataBlockScanner:
Verification succeeded for blk_3302288729849061244_813694
2010-04-07 10:03:10,730 INFO org.apache.hadoop.dfs.DataBlockScanner:
Verification succeeded for blk_-7252548330326272479_1259723
2010-04-07 10:03:10,745 INFO org.apache.hadoop.dfs.DataBlockScanner:
Verification succeeded for blk_-5909954202848831867_1075933
2010-04-07 10:03:10,886 INFO org.apache.hadoop.dfs.DataBlockScanner:
Verification succeeded for blk_-3213723859645738103_1075939
2010-04-07 10:03:10,910 INFO org.apache.hadoop.dfs.DataBlockScanner:
Verification succeeded for blk_-2209269106581706132_676390
2010-04-07 10:03:10,923 INFO org.apache.hadoop.dfs.DataBlockScanner:
Verification succeeded for blk_-6007998488187910667_676379
2010-04-07 10:03:11,086 INFO org.apache.hadoop.dfs.DataBlockScanner:
Verification succeeded for blk_-1024215056075897357_676383
2010-04-07 10:03:11,127 INFO org.apache.hadoop.dfs.DataBlockScanner:
Verification succeeded for blk_3780597313184168671_1270304
2010-04-07 10:03:11,160 INFO org.apache.hadoop.dfs.DataBlockScanner:
Verification succeeded for blk_8891623760013835158_676336

One thing I wanted to point out is that some time back I had to run setrep on
the entire cluster. Are these verification messages related to that?

Also, while going through the NameNode logs I came across the following.

2010-04-05 21:01:31,383 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.21:50010
2010-04-05 21:01:49,240 INFO org.apache.hadoop.net.NetworkTopology: Removing
a node: /default-rack/192.168.100.21:50010
2010-04-05 21:01:49,243 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.2:50010
2010-04-05 21:02:01,791 INFO org.apache.hadoop.net.NetworkTopology: Removing
a node: /default-rack/192.168.100.2:50010

Then again at:

2010-04-06 06:41:56,290 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.21:50010
2010-04-06 06:41:56,290 INFO org.apache.hadoop.net.NetworkTopology: Removing
a node: /default-rack/192.168.100.21:50010
2010-04-06 06:41:56,290 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.2:50010
2010-04-06 06:41:56,290 INFO org.apache.hadoop.net.NetworkTopology: Removing
a node: /default-rack/192.168.100.2:50010
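
To see at a glance when each DataNode dropped out, the lost-heartbeat entries
in the NameNode log above can be grouped per node. This is a small sketch
that assumes the log format shown in this message; the function name is made
up, and whitespace is collapsed first because the lines were wrapped by the
mail client.

```python
import re
from collections import defaultdict

# Excerpt copied from the NameNode log above, wrapped as it arrived by mail.
NAMENODE_LOG = """\
2010-04-05 21:01:31,383 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.21:50010
2010-04-05 21:01:49,240 INFO org.apache.hadoop.net.NetworkTopology: Removing
a node: /default-rack/192.168.100.21:50010
2010-04-05 21:01:49,243 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.2:50010
2010-04-06 06:41:56,290 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.21:50010
2010-04-06 06:41:56,290 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.2:50010
"""

_LOST = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO \S+ BLOCK\* "
    r"NameSystem\.heartbeatCheck: lost heartbeat from (\S+)"
)


def lost_heartbeats(log_text):
    """Return {node: [timestamps]} for every lost-heartbeat entry."""
    flat = " ".join(log_text.split())  # undo the mail client's line wrapping
    events = defaultdict(list)
    for timestamp, node in _LOST.findall(flat):
        events[node].append(timestamp)
    return dict(events)


if __name__ == "__main__":
    for node, times in lost_heartbeats(NAMENODE_LOG).items():
        print(node, times)
```

Both nodes dropping out within seconds of each other, twice, suggests a
cause common to the two DataNodes and not a fault on a single machine.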

I had to restart the cluster, after which I got both nodes back.

2010-04-06 10:11:24,325 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.registerDatanode: node registration from
192.168.100.21:50010 storage DS-455083797-192.168.100.21-50010-1268220157729
2010-04-06 10:11:24,328 INFO org.apache.hadoop.net.NetworkTopology: Adding a
new node: /default-rack/192.168.100.21:50010
2010-04-06 10:11:25,245 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.allocateBlock:
/data/listing/image/5/84025/35924c87e664a43893904effbd2be601_list.jpg.
blk_-1845977707636580795_1665561
2010-04-06 10:11:25,342 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 192.168.100.21:50010 is added
to blk_-1845977707636580795_1665561 size 72753
2010-04-06 10:11:44,257 INFO org.apache.hadoop.fs.FSNamesystem: Number of
transactions: 64 Total time for transactions(ms): 4 Number of syncs: 45
SyncTimes(ms): 387
2010-04-06 10:11:51,485 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.registerDatanode: node registration from
192.168.100.2:50010 storage DS-1237294752-192.168.100.2-50010-1252010614375
2010-04-06 10:11:51,488 INFO org.apache.hadoop.net.NetworkTopology: Adding a
new node: /default-rack/192.168.100.2:50010

Subsequently they were removed again. I have no clue why this happened.

Ever since, I have been seeing the following in the logs:

2010-04-06 10:00:49,052 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 54310, call
create(/data/listing/image/4/43734/5af88437f6c6a88d62c5f900b06ab8dd_high.jpg,
rwxr-xr-x, DFSClient_1226879860, true, 2, 67108864) from 192.168.100.5:40437:
error: org.apache.hadoop.dfs.SafeModeException: Cannot create
file/data/listing/image/4/43734/5af88437f6c6a88d62c5f900b06ab8dd_high.jpg.
Name node is in safe mode.
The ratio of reported blocks 0. has not reached the threshold 0.9990.
Safe mode will be turned off automatically.
org.apache.hadoop.dfs.SafeModeException: Cannot create
file/data/listing/image/4/43734/5af88437f6c6a88d62c5f900b06ab8dd_high.jpg.
Name node is in safe mode.
The ratio of reported blocks 0. has not reached the threshold 0.9990.
Safe mo
