[ 
https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2053:
------------------------------

    Fix Version/s: 0.23.0

> Bug in INodeDirectory#computeContentSummary warning
> ---------------------------------------------------
>
>                 Key: HDFS-2053
>                 URL: https://issues.apache.org/jira/browse/HDFS-2053
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.3, 0.20.204.0, 0.20.205.0
>         Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch 
> applied.
> My impression is that the same issue also exists in the other branches to 
> which the HDFS-1377 patch has been applied (see description).
>            Reporter: Michael Noll
>            Assignee: Michael Noll
>            Priority: Minor
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: HDFS-2053_v1.txt, HDFS-2053_v2.txt, HDFS-2053_v3.txt, 
> hdfs-2053_v3-b20.patch
>
>
> *How to reproduce*
> {code}
> # create test directories
> $ hadoop fs -mkdir /hdfs-1377/A
> $ hadoop fs -mkdir /hdfs-1377/B
> $ hadoop fs -mkdir /hdfs-1377/C
> # ...add some test data (few kB or MB) to all three dirs...
> # set space quota for subdir C only
> $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C
> # the following two commands _on the parent dir_ trigger the warning
> $ hadoop fs -dus /hdfs-1377
> $ hadoop fs -count -q /hdfs-1377
> {code}
> Warning message in the namenode logs:
> {code}
> 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355
> {code}
> Note that the commands are run on the _parent directory_ but the warning is 
> shown for the _subdirectory_ with space quota.
> *Background*
> The bug was introduced by the HDFS-1377 patch, which is currently committed 
> to at least branch-0.20, branch-0.20-security, branch-0.20-security-204, 
> branch-0.20-security-205 and release-0.20.3-rc2.  In the patch, 
> {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was 
> updated to trigger the warning above if the cached and computed diskspace 
> values are not the same for a directory with quota.
> The warning is written by {{computeContentSummary(long[] summary)}} in 
> {{INodeDirectory}}. The method recursively walks an inode's children, passing 
> the same {{summary}} array down and updating it along the way.
> {code}
>   /** {@inheritDoc} */
>   long[] computeContentSummary(long[] summary) {
>     if (children != null) {
>       for (INode child : children) {
>         child.computeContentSummary(summary);
>       }
>     }
> {code}
> The condition that triggers the warning message compares the current node's 
> cached diskspace (obtained via {{node.diskspaceConsumed()}} and held in 
> {{space}}) with the corresponding field in {{summary}}, i.e. {{summary[3]}}.
> {code}
>       if (-1 != node.getDsQuota() && space != summary[3]) {
>         NameNode.LOG.warn("Inconsistent diskspace for directory "
>           +getLocalName()+". Cached: "+space+" Computed: "+summary[3]);
> {code}
> However, at this point {{summary}} may already include diskspace information 
> from other inodes, i.e. from subtrees other than the one rooted at the node 
> for which the warning is shown.  In our example tree at {{/hdfs-1377}}, 
> {{summary}} can already contain the diskspace of {{/hdfs-1377/A}} and 
> {{/hdfs-1377/B}} when it is passed to inode {{/hdfs-1377/C}}.  Hence the 
> computed value for {{C}} can differ from its cached value even when the 
> cached value is correct, and the warning is spurious.
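To make the failure mode concrete, here is a simplified, hypothetical Java model of the walk. The classes `Node`, `FileNode` and `Dir`, the file sizes and the `run` helper are made up for illustration and are not the HDFS classes. As in the excerpt above, a single `summary` array is shared by the whole walk (index 3 holds diskspace), so the quota check on `C` also sees the diskspace of `A` and `B`. Warnings are collected in a list instead of being logged so they can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

class SharedSummaryBug {
    // Hypothetical stand-ins for INode / INodeFile / INodeDirectory.
    abstract static class Node {
        final String name;
        Node(String name) { this.name = name; }
        abstract void computeContentSummary(long[] summary, List<String> warnings);
    }

    static class FileNode extends Node {
        final long diskspace;
        FileNode(String name, long diskspace) { super(name); this.diskspace = diskspace; }

        @Override
        void computeContentSummary(long[] summary, List<String> warnings) {
            summary[3] += diskspace; // index 3 = diskspace, as in the excerpt above
        }
    }

    static class Dir extends Node {
        final List<Node> children = new ArrayList<>();
        final boolean hasDsQuota; // stands in for getDsQuota() != -1
        long cachedDiskspace;     // stands in for diskspaceConsumed()

        Dir(String name, boolean hasDsQuota) { super(name); this.hasDsQuota = hasDsQuota; }

        Dir add(Node child) { children.add(child); return this; }

        @Override
        void computeContentSummary(long[] summary, List<String> warnings) {
            // Every child updates the same array that the caller passed in.
            for (Node child : children) {
                child.computeContentSummary(summary, warnings);
            }
            // Flawed check: summary[3] may already hold earlier siblings' diskspace.
            if (hasDsQuota && cachedDiskspace != summary[3]) {
                warnings.add("Inconsistent diskspace for directory " + name
                        + ". Cached: " + cachedDiskspace + " Computed: " + summary[3]);
            }
        }
    }

    // Builds /hdfs-1377 with A, B and C (quota on C only) and walks it once.
    static List<String> run() {
        Dir a = new Dir("A", false).add(new FileNode("a.dat", 100));
        Dir b = new Dir("B", false).add(new FileNode("b.dat", 200));
        Dir c = new Dir("C", true).add(new FileNode("c.dat", 300));
        c.cachedDiskspace = 300; // C's cached value is correct
        Dir root = new Dir("hdfs-1377", false).add(a).add(b).add(c);
        List<String> warnings = new ArrayList<>();
        root.computeContentSummary(new long[4], warnings);
        return warnings;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
        // prints: Inconsistent diskspace for directory C. Cached: 300 Computed: 600
    }
}
```

With the made-up sizes of 100, 200 and 300 bytes, the check on `C` compares its correct cached value of 300 with 600, the combined diskspace of `A`, `B` and `C`.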
> *How to fix*
> The supplied patch creates a fresh summary array for the subtree of the 
> current node.  The walk through the children passes and updates this 
> {{subtreeSummary}} array, and the condition is checked against 
> {{subtreeSummary}} instead of the original {{summary}}.  The original 
> {{summary}} is updated with the values of {{subtreeSummary}} before the 
> method returns.
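The fix can be sketched on the same kind of simplified model. This is a hypothetical illustration of the approach described above, not the actual patch: `Node`, `FileNode`, `Dir`, the `warnings` list and the `run` helper are made-up names, and folding the subtree totals into the caller's array by addition is an assumption about how "updated with the values of" is meant. Each directory accumulates into a fresh `subtreeSummary`, checks its quota against that array, and then adds the subtree totals into the caller's `summary`.

```java
import java.util.ArrayList;
import java.util.List;

class SubtreeSummaryFix {
    // Hypothetical stand-ins for INode / INodeFile / INodeDirectory.
    abstract static class Node {
        final String name;
        Node(String name) { this.name = name; }
        abstract void computeContentSummary(long[] summary, List<String> warnings);
    }

    static class FileNode extends Node {
        final long diskspace;
        FileNode(String name, long diskspace) { super(name); this.diskspace = diskspace; }

        @Override
        void computeContentSummary(long[] summary, List<String> warnings) {
            summary[3] += diskspace; // index 3 = diskspace
        }
    }

    static class Dir extends Node {
        final List<Node> children = new ArrayList<>();
        final boolean hasDsQuota; // stands in for getDsQuota() != -1
        long cachedDiskspace;     // stands in for diskspaceConsumed()

        Dir(String name, boolean hasDsQuota) { super(name); this.hasDsQuota = hasDsQuota; }

        Dir add(Node child) { children.add(child); return this; }

        @Override
        void computeContentSummary(long[] summary, List<String> warnings) {
            // Fresh array: holds only this directory's own subtree.
            long[] subtreeSummary = new long[summary.length];
            for (Node child : children) {
                child.computeContentSummary(subtreeSummary, warnings);
            }
            // The check now sees only this subtree's diskspace.
            if (hasDsQuota && cachedDiskspace != subtreeSummary[3]) {
                warnings.add("Inconsistent diskspace for directory " + name
                        + ". Cached: " + cachedDiskspace + " Computed: " + subtreeSummary[3]);
            }
            // Fold the subtree totals into the caller's summary before returning.
            for (int i = 0; i < summary.length; i++) {
                summary[i] += subtreeSummary[i];
            }
        }
    }

    // Builds /hdfs-1377 with A, B and C (quota on C only) and walks it once.
    static long[] run(long cachedForC, List<String> warnings) {
        Dir a = new Dir("A", false).add(new FileNode("a.dat", 100));
        Dir b = new Dir("B", false).add(new FileNode("b.dat", 200));
        Dir c = new Dir("C", true).add(new FileNode("c.dat", 300));
        c.cachedDiskspace = cachedForC;
        Dir root = new Dir("hdfs-1377", false).add(a).add(b).add(c);
        long[] summary = new long[4];
        root.computeContentSummary(summary, warnings);
        return summary;
    }

    public static void main(String[] args) {
        List<String> ok = new ArrayList<>();
        long[] summary = run(300, ok);
        // prints: correct cache: 0 warning(s), total diskspace 600
        System.out.println("correct cache: " + ok.size() + " warning(s), total diskspace " + summary[3]);

        List<String> stale = new ArrayList<>();
        run(250, stale);
        // prints: Inconsistent diskspace for directory C. Cached: 250 Computed: 300
        stale.forEach(System.out::println);
    }
}
```

`main` runs the walk once with a correct cached value for `C` (no warning, and the parent still sees the full 600 bytes) and once with a deliberately stale value (one warning), so the check still reports a genuine mismatch after the change.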
> *Unit Tests*
> I have run "ant test" on my patched build without any errors*.  However, the 
> existing unit tests did not catch this issue with the original HDFS-1377 
> patch, so this might not mean much. ;-)
> That said, I am unsure what the most appropriate way to unit test this issue 
> would be.  A straightforward approach would be to automate the steps in the 
> _How to reproduce_ section above and check whether the NN logs a spurious 
> warning message.  But I'm not sure how this check could be implemented.  Feel 
> free to provide some pointers if you have any ideas.
> *Note about Fix Version/s*
> The patch _should_ apply to all branches to which the HDFS-1377 patch has 
> been committed.  In my environment, the build was the Hadoop 0.20.203.0 
> release with a (trivial) backport of HDFS-1377 (the 0.20.203.0 release does 
> not ship with the HDFS-1377 fix).  I could apply the patch successfully to 
> {{branch-0.20-security}}, {{branch-0.20-security-204}} and 
> {{release-0.20.3-rc2}}, for instance.  Since I'm a bit confused about the 
> upcoming 0.20.x release versions (0.20.x vs. 0.20.20x.y), I have taken the 
> liberty of adding 0.20.203.0 to the list of affected versions even though it 
> is actually only affected when HDFS-1377 is applied to it...
> Best,
> Michael
> *Well, I get one error for {{TestRumenJobTraces}}, but, first, it seems to be 
> completely unrelated and, second, I get the same test error when running the 
> tests on the stock 0.20.203.0 release build.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
