[jira] Created: (HADOOP-3810) NameNode seems unstable on a cluster with little space left

Raghu Angadi (JIRA) Tue, 22 Jul 2008 13:47:53 -0700

NameNode seems unstable on a cluster with little space left
-----------------------------------------------------------


                 Key: HADOOP-3810
                 URL: https://issues.apache.org/jira/browse/HADOOP-3810
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.17.1
            Reporter: Raghu Angadi
            Assignee: Raghu Angadi



NameNode seems unresponsive and unstable when the cluster has very 
little space left. Clients time out. The main problem is that it is not 
clear to the user what is going on. Once I have more details about a NameNode 
that was in this state, I will fill them in here.

If there is not enough space left on a cluster, it is ok for clients to receive 
something like a "DiskOutOfSpace" exception.

Right now it looks like NameNode tries too hard to find a node with any space 
left and ends up being slow to respond to clients. If the CPU taken by 
chooseTarget() is the main reason, there are two possible fixes:

# chooseTarget() iterates and takes quite a bit of CPU for allocating 
datanodes. Usually this is not much of a problem. It takes even more CPU when 
it needs to search multiple racks for a datanode. We could probably reduce the 
CPU used for these allocations. The benefit should be measurable.

# Also, once NameNode cannot find any datanode that has space on a rack, it 
could mark the rack as "full" and skip searching that rack for the next minute 
or so. The flag would be cleared after a minute or when any new node is added 
to the rack.
#* Of course, this might not be optimal w.r.t. disk space usage, but only for a 
short duration. Once a cluster is mostly full, the user does expect errors.
#* On the flip side, this fix does not require an extremely CPU-optimized 
version of chooseTarget(). 
#* I think it is reasonable for NameNode to throw a DiskOutOfSpace exception, 
even though it could have found space if it had searched extensively.
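
A rough sketch of the "full" rack flag from the second fix, to make the expiry rules concrete. Everything here is hypothetical: FullRackTracker, its method names, and the use of a plain rack path string as the key are assumptions for illustration, not existing NameNode code. The caller passes in the current time so the expiry logic is easy to test.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not Hadoop code): remembers racks recently found full. */
final class FullRackTracker {
    private final long ttlMillis;
    // rack path -> time (ms) at which the rack was last marked full
    private final Map<String, Long> markedAt = new HashMap<>();

    FullRackTracker(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Record that no datanode with free space was found on this rack. */
    synchronized void markFull(String rack, long nowMillis) {
        markedAt.put(rack, nowMillis);
    }

    /** True if the rack was marked full less than ttlMillis ago. */
    synchronized boolean shouldSkip(String rack, long nowMillis) {
        Long t = markedAt.get(rack);
        if (t == null) {
            return false;
        }
        if (nowMillis - t >= ttlMillis) {
            markedAt.remove(rack); // flag expired, search the rack again
            return false;
        }
        return true;
    }

    /** A new node joined the rack, so it may have space again. */
    synchronized void nodeAdded(String rack) {
        markedAt.remove(rack);
    }
}
```

Under these assumptions, chooseTarget() would call shouldSkip() before scanning a rack and markFull() after a scan that finds no usable datanode. If every candidate rack is skipped, the caller could report the out-of-space condition to the client right away instead of searching further.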

 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
