[jira] [Commented] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC

Yi Liu (JIRA) Mon, 21 Sep 2015 01:09:36 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900352#comment-14900352
 ]


Yi Liu commented on HDFS-9107:
------------------------------

Sorry, I just saw Steve's comments.
{quote}
cores on different sockets may give different answers
{quote}
About {{nanoTime}}: yes, I have also seen similar claims and discussions, 
but they do not seem to be correct, and {{nanoTime}} is safe. See more 
discussion at 
http://stackoverflow.com/questions/510462/is-system-nanotime-completely-useless 
(it includes some links to Oracle articles).
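
For illustration, here is a minimal sketch of the usual way to measure elapsed time with {{nanoTime}}. It is not taken from the attached patch; the class and method names ({{MonotonicElapsed}}, {{elapsedMillis}}, {{pauseExceeded}}) are hypothetical. The only assumption it relies on is what the {{System.nanoTime}} Javadoc states: readings are meaningful only as differences, and two readings should be compared by subtraction so that numeric wraparound is harmless.

```java
import java.util.concurrent.TimeUnit;

public class MonotonicElapsed {

    // Elapsed milliseconds between two System.nanoTime() readings.
    // Subtracting the readings (instead of comparing them directly)
    // stays correct even if the counter wraps around Long.MAX_VALUE.
    static long elapsedMillis(long startNanos, long endNanos) {
        return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
    }

    // Hypothetical helper: true if more than thresholdMillis passed
    // between the two readings, e.g. because of a long GC pause.
    static boolean pauseExceeded(long lastCheckNanos, long nowNanos,
                                 long thresholdMillis) {
        return elapsedMillis(lastCheckNanos, nowNanos) > thresholdMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(50); // stand-in for real work or a pause
        long now = System.nanoTime();
        System.out.println("elapsed ms: " + elapsedMillis(start, now));
        System.out.println("pause > 10 s? "
            + pauseExceeded(start, now, 10_000L));
    }
}
```

Unlike {{System.currentTimeMillis}}, the value is not tied to the wall clock, so a clock adjustment during the pause would not distort the measured interval.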

> Prevent NN's unrecoverable death spiral after full GC
> -----------------------------------------------------
>
>                 Key: HDFS-9107
>                 URL: https://issues.apache.org/jira/browse/HDFS-9107
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-9107.patch
>
>
> A full GC pause in the NN that exceeds the dead node interval can lead to an 
> infinite cycle of full GCs.  The most common situation that precipitates an 
> unrecoverable state is a network issue that temporarily cuts off multiple 
> racks.
> The NN wakes up and falsely starts marking nodes dead. This bloats the 
> replication queues which increases memory pressure. The replications create a 
> flurry of incremental block reports and a glut of over-replicated blocks.
> The "dead" nodes heartbeat within seconds. The NN forces a re-registration 
> which requires a full block report - more memory pressure. The NN now has to 
> invalidate all the over-replicated blocks. The extra blocks are added to 
> invalidation queues, tracked in an excess blocks map, etc - much more memory 
> pressure.
> All the memory pressure can push the NN into another full GC which repeats 
> the entire cycle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
