[
https://issues.apache.org/jira/browse/HADOOP-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649504#action_12649504
]
Raghu Angadi commented on HADOOP-4061:
--------------------------------------
> I would rather go with the approach, which counts down decommissioned blocks
> as they are replicated. Then there is no need to scan all blocks to verify
> the node is decommissioned, just check the counter.
The above is also useful and should help a lot here. Note that these unnecessary
blocks add overhead to other operations as well, not just to the check for
decommissioned nodes. Essentially, there is no need to treat them separately;
we just need to give preference to deleting blocks from decommissioned nodes
when there is excess replication.
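As a minimal sketch of the counter-based idea quoted above (the class and
method names below are hypothetical, not from the Hadoop source): once the
namenode tracks how many under-replicated blocks a decommissioning node still
holds, deciding whether decommission has finished becomes a constant-time
counter check instead of a scan over all of the node's blocks.
{noformat}
// Hypothetical sketch (not the actual Hadoop code): keep a per-node
// countdown of blocks that still need replication, so checking whether
// decommission has finished is O(1) instead of a full block scan.
class DecommissioningNodeStatus {
  // Initialized when decommission starts: blocks on this node that are
  // not yet sufficiently replicated on other nodes.
  private int pendingBlocks;

  DecommissioningNodeStatus(int underReplicatedBlocks) {
    this.pendingBlocks = underReplicatedBlocks;
  }

  // Call when a block hosted on this node reaches its target replication
  // elsewhere in the cluster.
  synchronized void blockReplicated() {
    if (pendingBlocks > 0) {
      pendingBlocks--;
    }
  }

  // Cheap check the decommission monitor can use instead of scanning
  // every block on the node.
  synchronized boolean decommissionComplete() {
    return pendingBlocks == 0;
  }
}
{noformat}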
> Large number of decommission freezes the Namenode
> -------------------------------------------------
>
> Key: HADOOP-4061
> URL: https://issues.apache.org/jira/browse/HADOOP-4061
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.17.2
> Reporter: Koji Noguchi
> Assignee: Tsz Wo (Nicholas), SZE
> Attachments: 4061_20081119.patch
>
>
> On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks
> each. The other 1500 nodes were almost empty.
> When decommissioning started, the namenode's queue overflowed every 6 minutes.
> Looking at the CPU usage, we saw that every 5 minutes the
> org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread took
> 100% of the CPU for 1 minute, causing the queue to overflow.
> {noformat}
>   public synchronized void decommissionedDatanodeCheck() {
>     for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
>          it.hasNext();) {
>       DatanodeDescriptor node = it.next();
>       checkDecommissionStateInternal(node);
>     }
>   }
> {noformat}
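To make the cost concrete: checkDecommissionStateInternal has to walk every
block the node holds and count its live replicas, so each monitor pass does
work proportional to (decommissioning nodes) x (blocks per node), all while
holding the global FSNamesystem lock. Below is a rough, hypothetical sketch
of that shape; the helper names (blocksOn, countLiveReplicas,
expectedReplication) are illustrative and not the real FSNamesystem methods.
{noformat}
// Rough sketch of why the per-node check is expensive. With 400
// decommissioning nodes holding ~30k blocks each, the inner loop runs
// roughly 12 million times per monitor pass.
boolean isReplicationStillNeeded(DatanodeDescriptor node) {
  for (Block b : blocksOn(node)) {            // ~30k blocks per node
    if (countLiveReplicas(b) < expectedReplication(b)) {
      return true;                            // block still under-replicated
    }
  }
  return false;                               // node can be marked decommissioned
}
{noformat}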