Thanks Colin and Suresh!
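
For reference, here is a rough sketch of the arithmetic discussed in the thread below. It assumes only the values quoted there (3 second heartbeat interval, 10 minute timeout, 30 second stale threshold from HDFS-3703). The constant and function names are made up for illustration and are not HDFS configuration keys or APIs.

```python
# Values as quoted in this thread; names are hypothetical, not HDFS settings.
HEARTBEAT_INTERVAL_S = 3      # default DataNode heartbeat interval
DEAD_TIMEOUT_S = 10 * 60      # roughly 10 minutes before a node is considered dead
STALE_TIMEOUT_S = 30          # "stale" threshold described for HDFS-3703


def missed_heartbeats(timeout_s: int, interval_s: int) -> int:
    """Number of whole heartbeat intervals that fit inside a timeout."""
    return timeout_s // interval_s


# Heartbeats that must be missed before each state is reached.
dead_after = missed_heartbeats(DEAD_TIMEOUT_S, HEARTBEAT_INTERVAL_S)
stale_after = missed_heartbeats(STALE_TIMEOUT_S, HEARTBEAT_INTERVAL_S)

print(f"dead after {dead_after} missed heartbeats")
print(f"stale after {stale_after} missed heartbeats")
```

So, with these numbers, the stale state would kick in after about 10 missed heartbeats, well before the 200 needed for the full timeout.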

On Wed, Mar 13, 2013 at 3:08 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:

> My understanding is that the 10 minute timeout helps to avoid replication
> storms, especially during startup.
>
> You might be interested in HDFS-3703, which adds a "stale" state which
> datanodes are placed into after 30 seconds of missing heartbeats.  (This is
> an optional feature controlled by dfs.namenode.check.stale.datanode.)
>
> best,
> Colin
>
>
> On Tue, Mar 12, 2013 at 5:29 PM, André Oriani <aori...@gmail.com> wrote:
>
> > No take on this one?
> >
> > In ZooKeeper the heartbeats are sent every third of the timeout.  If I am
> > not mistaken, the recommended timeout is more than 2 minutes to avoid false
> > positives.
> >
> > But I still cannot see the relationship in HDFS between the heartbeat
> > interval and the timeout. Okay, 10 minutes seems to be a conservative value
> > to avoid false positives in a big cluster. But that means 200 heartbeats.
> > Heartbeats in HDFS are used not only for liveness detection but also to
> > send information about free space and load and to receive commands from
> > the NameNode. So they are also essential for block placement decisions and
> > for ensuring the replication levels. Would that then be the reason why
> > heartbeats are so frequent? Can a lot happen to a DataNode in just three
> > seconds?
> >
> >
> > Thanks,
> > André Oriani
> >
> >
> >
> > On Thu, Mar 7, 2013 at 10:37 PM, André Oriani <aori...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Is there any particular reason why the default heartbeat interval is 3
> > > seconds and the timeout is 10 minutes? Everywhere I looked (code, Google,
> > > ...) only mentions the values but gives no clue as to why those values
> > > were chosen.
> > >
> > >
> > > Thanks in advance,
> > > André Oriani
> > >
> >
>
