[jira] Issue Comment Edited: (HADOOP-4061) Large number of decommission freezes the Namenode

Raghu Angadi (JIRA) Thu, 20 Nov 2008 15:13:10 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649525#action_12649525
 ]


rangadi edited comment on HADOOP-4061 at 11/20/08 3:12 PM:
----------------------------------------------------------------

(edit : correct jira number)
> I don't think we should redesign decommission feature here. Removing blocks 
> will be a redesign. 

Hmm, this is not done in the patch anyway, and it does not look like a redesign 
to me. It seems fairly simple: an excess replica on a normal DN will be deleted 
(right?), so there is no reason to keep it when it happens to be on a 
decommissioned node. Alright, I noticed HADOOP-4701.

      was (Author: rangadi):
    > I don't think we should redesign decommission feature here. Removing 
blocks will be a redesign. 

hmm.. this is not done in the patch anyway. I does not look like redesign in 
any way to me. It is fairly simple to me : if excess replica is on a normal DN, 
it will be deleted (right?), then there is no reason to keep it if that happens 
to be a decommissioned node. Alright, I noticed HADOOP-4071.
  
> Large number of decommission freezes the Namenode
> -------------------------------------------------
>
>                 Key: HADOOP-4061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4061
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.2
>            Reporter: Koji Noguchi
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 4061_20081119.patch, 4061_20081120.patch
>
>
> On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks 
> each. The other 1500 nodes were almost empty.
> When decommissioning started, the namenode's queue overflowed every 6 minutes.
> The CPU usage showed that every 5 minutes the 
> org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread was taking 
> 100% of the CPU for 1 minute, causing the queue to overflow.
> {noformat}
>   public synchronized void decommissionedDatanodeCheck() {
>     for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
>          it.hasNext();) {
>       DatanodeDescriptor node = it.next();
>       checkDecommissionStateInternal(node);
>     }
>   }
> {noformat}
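
For illustration only: the loop quoted above visits every datanode while 
holding the namesystem lock, which matches the reported one-minute stall. One 
possible way to bound the lock hold time is to check a fixed number of nodes 
per lock acquisition and resume from a cursor on the next call. The sketch 
below is not taken from the attached patches; the class 
BatchedDecommissionCheck, the nodesPerBatch parameter, and the String 
stand-ins for DatanodeDescriptor are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: check a bounded number of datanodes per lock acquisition. */
public class BatchedDecommissionCheck {
    private final List<String> nodes;   // stand-in for datanodeMap.values()
    private final int nodesPerBatch;    // assumed tuning knob, not an actual Hadoop setting
    private final List<String> checked = new ArrayList<>(); // records the check order
    private int cursor = 0;             // index of the next node to check

    public BatchedDecommissionCheck(List<String> nodes, int nodesPerBatch) {
        if (nodesPerBatch <= 0) {
            throw new IllegalArgumentException("nodesPerBatch must be positive");
        }
        this.nodes = new ArrayList<>(nodes);
        this.nodesPerBatch = nodesPerBatch;
    }

    /**
     * Checks at most nodesPerBatch nodes while holding the lock, then returns
     * so that other callers waiting on the lock can make progress.
     * Returns the number of nodes checked in this call.
     */
    public synchronized int checkNextBatch() {
        int count = 0;
        while (count < nodesPerBatch && cursor < nodes.size()) {
            checkDecommissionStateInternal(nodes.get(cursor));
            cursor++;
            count++;
        }
        if (cursor >= nodes.size()) {
            cursor = 0; // sweep finished; the next call starts a new sweep
        }
        return count;
    }

    /** Placeholder for the real per-node check; here it only records the visit. */
    private void checkDecommissionStateInternal(String node) {
        checked.add(node);
    }

    /** Returns a copy of the nodes checked so far, in order. */
    public synchronized List<String> checkedNodes() {
        return new ArrayList<>(checked);
    }
}
```

A monitor thread would call checkNextBatch() on each wake-up instead of 
sweeping the whole map. The trade-off is that a full sweep takes several 
wake-ups, so decommission completion is detected later.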

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
