Re: live/dead node problem

Ravi Prakash Wed, 30 Mar 2011 08:53:47 -0700

I haven't used 0.21. You can compare the source code of the two versions.

I set these in the NameNode's hdfs-site.xml to 1. I'm not sure you'd want to do that
on a production cluster, especially if it's a big one.
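For reference, here is a small Python sketch that prints what those entries might look like in the NameNode's hdfs-site.xml. It assumes the 0.22 property names listed further down this thread and the value 1 from this reply. The helper name hdfs_site_fragment is hypothetical, and the units (seconds for dfs.heartbeat.interval, milliseconds for the recheck interval) are my assumption, so check them against your version's defaults.

```python
import xml.etree.ElementTree as ET


def hdfs_site_fragment(props: dict[str, str]) -> str:
    """Hypothetical helper: build an hdfs-site.xml <configuration>
    element from name/value pairs and return it as a string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Property names as given for 0.22 in this thread; value "1" per the reply.
# Assumed units: milliseconds for the recheck interval,
# seconds for dfs.heartbeat.interval.
print(hdfs_site_fragment({
    "dfs.namenode.heartbeat.recheck-interval": "1",
    "dfs.heartbeat.interval": "1",
}))
```

My understanding is that DataNodes also read dfs.heartbeat.interval to decide how often to send heartbeats, so it may need to go into their hdfs-site.xml as well; treat that as an assumption to verify.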



On 3/29/11 7:13 PM, "Rita" <rmorgan...@gmail.com> wrote:

What about for 0.21?

Also, where do you set this: in the DataNode configuration or the NameNode's?
It seems the default is set to "3 seconds".

On Tue, Mar 29, 2011 at 5:37 PM, Ravi Prakash <ravip...@yahoo-inc.com> wrote:
I set these parameters to detect live/dead nodes more quickly.

For 0.20: heartbeat.recheck.interval
For 0.22: dfs.namenode.heartbeat.recheck-interval and dfs.heartbeat.interval
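As a rough guide to why the defaults give "about 10 minutes": as I understand the NameNode code of this era, a DataNode is declared dead once it has been silent for 2 x the recheck interval + 10 x the heartbeat interval. The Python sketch below just does that arithmetic. The function name is hypothetical, and both the formula and the defaults (300000 ms recheck, 3 s heartbeat) are assumptions worth confirming against the source of the version you run.

```python
def heartbeat_expire_ms(recheck_interval_ms: int, heartbeat_interval_s: int) -> int:
    """Hypothetical helper: how long (in ms) a DataNode can stay silent
    before the NameNode marks it dead, ASSUMING
    expire = 2 * recheck interval + 10 * heartbeat interval."""
    # The recheck interval is assumed to be in milliseconds and the
    # heartbeat interval in seconds, so convert the latter to ms.
    return 2 * recheck_interval_ms + 10 * heartbeat_interval_s * 1000


# Assumed defaults: 5-minute recheck, 3-second heartbeat.
default_ms = heartbeat_expire_ms(300_000, 3)
print(default_ms / 60_000)  # 10.5 (minutes)

# Both parameters set to 1, taking "1" literally in each one's units.
fast_ms = heartbeat_expire_ms(1, 1)
print(fast_ms / 1000)  # 10.002 (seconds)
```

Under these assumptions the recheck interval dominates: it is doubled and already in minutes by default, which is why lowering it matters far more than lowering the 3-second heartbeat.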

Cheers,
Ravi


On 3/29/11 10:24 AM, "Michael Segel" <michael_se...@hotmail.com> wrote:



Rita,

When the NameNode hasn't seen a heartbeat from a DataNode for about 10 minutes,
it marks that node as dead.

Per the Hadoop online documentation:
"Each DataNode sends a Heartbeat message to the NameNode periodically. A
network partition can cause a subset of DataNodes to lose connectivity with
the NameNode. The NameNode detects this condition by the absence of a
Heartbeat message. The NameNode marks DataNodes without recent Heartbeats as
dead and does not forward any new IO requests to them. Any data that was
registered to a dead DataNode is not available to HDFS any more. DataNode
death may cause the replication factor of some blocks to fall below their
specified value. The NameNode constantly tracks which blocks need to be
replicated and initiates replication whenever necessary. The necessity for
re-replication may arise due to many reasons: a DataNode may become
unavailable, a replica may become corrupted, a hard disk on a DataNode may
fail, or the replication factor of a file may be increased."

I tried to find an hdfs-site.xml parameter that would shorten this period,
but couldn't find one.

HTH

-Mike


----------------------------------------
> Date: Tue, 29 Mar 2011 08:13:43 -0400
> Subject: live/dead node problem
> From: rmorgan...@gmail.com
> To: common-user@hadoop.apache.org
>
> Hello All,
>
> Is there a parameter or procedure to check more aggressively whether a node
> is live or dead? Even after I kill the Hadoop process, the node still shows
> as active on the "Live Nodes" page for more than 10 minutes. Fortunately,
> the "Last Contact" value keeps increasing.
>
>
> Using branch-0.21, 0985326
>
> --
> --- Get your facts first, then you can distort them as you please.--
