[jira] Updated: (HDFS-1276) Put the failed volumes in the report of HDFS status
[ https://issues.apache.org/jira/browse/HDFS-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated HDFS-1276: - Attachment: HDFS_1276.patch Attaching the patch. The unit test depends on HDFS-1273, so I will attach the unit test after HDFS-1273 is committed. > Put the failed volumes in the report of HDFS status > --- > > Key: HDFS-1276 > URL: https://issues.apache.org/jira/browse/HDFS-1276 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Jeff Zhang > Fix For: 0.21.0 > > Attachments: HDFS_1276.patch > > > Currently, users do not know which volumes have failed unless they look into > the logs, which is not convenient. I plan to put the failed > volumes in the HDFS status report. Then HDFS administrators can use the command > "bin/hadoop dfsadmin -report" to find which volumes have failed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883785#action_12883785 ] sam rash commented on HDFS-1057: from the raw console output of hudson: [exec] [junit] Tests run: 3, Failures: 0, Errors: 1, Time elapsed: 0.624 sec [exec] [junit] Test org.apache.hadoop.hdfs.security.token.block.TestBlockToken FAILED -- [exec] [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.706 sec [exec] [junit] Test org.apache.hadoop.hdfs.server.common.TestJspHelper FAILED -- [exec] [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 28.477 sec [exec] [junit] Test org.apache.hadoop.hdfsproxy.TestHdfsProxy FAILED I ran the tests locally and the first 2 succeed. The third fails on the latest trunk without hdfs-1057. I think from the test perspective, this is safe to commit. 1. TestBlockToken run-test-hdfs: [delete] Deleting directory /data/users/srash/apache/hadoop-hdfs/build/test/data [mkdir] Created dir: /data/users/srash/apache/hadoop-hdfs/build/test/data [delete] Deleting directory /data/users/srash/apache/hadoop-hdfs/build/test/logs [mkdir] Created dir: /data/users/srash/apache/hadoop-hdfs/build/test/logs [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/usr/local/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/home/srash/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Running org.apache.hadoop.hdfs.security.token.block.TestBlockToken [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.248 sec 2. TestJspHelper run-test-hdfs: [delete] Deleting directory /data/users/srash/apache/hadoop-hdfs/build/test/data [mkdir] Created dir: /data/users/srash/apache/hadoop-hdfs/build/test/data [delete] Deleting directory /data/users/srash/apache/hadoop-hdfs/build/test/logs [mkdir] Created dir: /data/users/srash/apache/hadoop-hdfs/build/test/logs [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/usr/local/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/home/srash/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Running org.apache.hadoop.hdfs.server.common.TestJspHelper [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.275 sec > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Fix For: 0.20-append > > Attachments: conurrent-reader-patch-1.txt, > conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, > HDFS-1057-0.20-append.patch, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, > hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt, > hdfs-1057-trunk-6.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
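To make the race in the HDFS-1057 description above concrete, here is a minimal standalone sketch of the ordering the fix implies: flush the buffered bytes to disk before advertising the new length to readers. The class and method names are illustrative stand-ins, not the actual BlockReceiver/replica code.

{code:java}
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Toy illustration of the ordering discussed in HDFS-1057: bytes are flushed
 * to disk before the visible length is advanced, so a concurrent reader that
 * trusts the visible length never reads data still sitting in the writer's
 * buffers. Names are illustrative, not Hadoop internals.
 */
public class VisibleLengthWriter {
  private final BufferedOutputStream out;
  // What a concurrent reader is allowed to read up to.
  private final AtomicLong bytesOnDisk = new AtomicLong(0);

  public VisibleLengthWriter(String path) throws IOException {
    this.out = new BufferedOutputStream(new FileOutputStream(path));
  }

  public void receivePacket(byte[] data) throws IOException {
    out.write(data);
    // Flush first so the bytes (and their checksums) really reach the disk...
    out.flush();
    // ...and only then publish the new readable length.
    bytesOnDisk.addAndGet(data.length);
  }

  public long getBytesOnDisk() {
    return bytesOnDisk.get();
  }

  public void close() throws IOException {
    out.close();
  }
}
{code}

A reader that bounds its reads by getBytesOnDisk() then never reads past data that has actually been flushed, which is the property the patch is after.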
[jira] Commented: (HDFS-1276) Put the failed volumes in the report of HDFS status
[ https://issues.apache.org/jira/browse/HDFS-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883764#action_12883764 ] Jeff Zhang commented on HDFS-1276: -- The new report output should look like this (the red line is the failed volume information): Configured Capacity: 118633709568 (110.49 GB) Present Capacity: 85291678356 (79.43 GB) DFS Remaining: 85291560960 (79.43 GB) DFS Used: 117396 (114.64 KB) DFS Used%: 0% Under replicated blocks: 0 Blocks with corrupt replicas: 0 Missing blocks: 0 - Datanodes available: 2 (2 total, 0 dead) Live datanodes: Name: 127.0.0.1:43452 (localhost) Decommission Status : Normal {color:red}Failed Volumes: /home/zjffdu/workspace/HDFS_trunk/build/test/data/dfs/data/data3{color} Configured Capacity: 39544569856 (36.83 GB) DFS Used: 44362 (43.32 KB) Non DFS Used: 4005174 (10.35 GB) DFS Remaining: 28430520320 (26.48 GB) DFS Used%: 0% DFS Remaining%: 71.89% Last contact: Wed Jun 30 10:06:17 CST 2010 Name: 127.0.0.1:52494 (localhost) Decommission Status : Normal Configured Capacity: 79089139712 (73.66 GB) DFS Used: 73034 (71.32 KB) Non DFS Used: 8026038 (20.7 GB) DFS Remaining: 56861040640 (52.96 GB) DFS Used%: 0% DFS Remaining%: 71.89% Last contact: Wed Jun 30 10:06:19 CST 2010 > Put the failed volumes in the report of HDFS status > --- > > Key: HDFS-1276 > URL: https://issues.apache.org/jira/browse/HDFS-1276 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Jeff Zhang > Fix For: 0.21.0 > > > Currently, users do not know which volumes have failed unless they look into > the logs, which is not convenient. I plan to put the failed > volumes in the HDFS status report. Then HDFS administrators can use the command > "bin/hadoop dfsadmin -report" to find which volumes have failed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
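A rough sketch of how the extra "Failed Volumes" line shown above might be assembled when the -report text is built. The failedVolumes parameter stands in for whatever accessor the attached patch adds; it is not an existing DatanodeInfo method, and the real formatting lives in the dfsadmin report code.

{code:java}
import java.util.List;

public class DatanodeReportFormatter {
  // Illustrative only: builds the per-datanode section of the -report output
  // with the proposed "Failed Volumes" line when any volumes have failed.
  public static String format(String name, String decommissionStatus,
                              List<String> failedVolumes,
                              long configuredCapacity, long dfsUsed) {
    StringBuilder sb = new StringBuilder();
    sb.append("Name: ").append(name).append('\n');
    sb.append("Decommission Status : ").append(decommissionStatus).append('\n');
    if (failedVolumes != null && !failedVolumes.isEmpty()) {
      sb.append("Failed Volumes: ");
      for (int i = 0; i < failedVolumes.size(); i++) {
        if (i > 0) sb.append(", ");
        sb.append(failedVolumes.get(i));
      }
      sb.append('\n');
    }
    sb.append("Configured Capacity: ").append(configuredCapacity).append('\n');
    sb.append("DFS Used: ").append(dfsUsed).append('\n');
    return sb.toString();
  }
}
{code}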
[jira] Created: (HDFS-1276) Put the failed volumes in the report of HDFS status
Put the failed volumes in the report of HDFS status --- Key: HDFS-1276 URL: https://issues.apache.org/jira/browse/HDFS-1276 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.21.0 Reporter: Jeff Zhang Fix For: 0.21.0 Currently, users do not know which volumes have failed unless they look into the logs, which is not convenient. I plan to put the failed volumes in the HDFS status report. Then HDFS administrators can use the command "bin/hadoop dfsadmin -report" to find which volumes have failed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HDFS-1275) Enabling Kerberized SSL on NameNode
[ https://issues.apache.org/jira/browse/HDFS-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang resolved HDFS-1275. - Resolution: Duplicate Closing this one as it is a duplicate of HDFS-1004. > Enabling Kerberized SSL on NameNode > --- > > Key: HDFS-1275 > URL: https://issues.apache.org/jira/browse/HDFS-1275 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kan Zhang > Attachments: h6584-03.patch > > > This is the HDFS part of HADOOP-6584. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1004) Update NN to support Kerberized SSL from HADOOP-6584
[ https://issues.apache.org/jira/browse/HDFS-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang updated HDFS-1004: Attachment: h6584-03.patch Uploading a new patch that is simply a port of Jakob's patch for HADOOP-6584. > Update NN to support Kerberized SSL from HADOOP-6584 > > > Key: HDFS-1004 > URL: https://issues.apache.org/jira/browse/HDFS-1004 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node >Reporter: Jakob Homan >Assignee: Jakob Homan > Attachments: h6584-03.patch, HDFS-1004.patch > > > Namenode needs to be tweaked to use the new kerberized-back ssl connector. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1275) Enabling Kerberized SSL on NameNode
[ https://issues.apache.org/jira/browse/HDFS-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang updated HDFS-1275: Attachment: h6584-03.patch Uploading a patch that is simply a porting of Jakob's patch for HADOOP-6584. > Enabling Kerberized SSL on NameNode > --- > > Key: HDFS-1275 > URL: https://issues.apache.org/jira/browse/HDFS-1275 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kan Zhang > Attachments: h6584-03.patch > > > This is the HDFS part of HADOOP-6584. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1004) Update NN to support Kerberized SSL from HADOOP-6584
[ https://issues.apache.org/jira/browse/HDFS-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883745#action_12883745 ] Kan Zhang commented on HDFS-1004: - Sorry, I created HDFS-1275 before noticing this one. Will close HDFS-1275 as duplicate. > Update NN to support Kerberized SSL from HADOOP-6584 > > > Key: HDFS-1004 > URL: https://issues.apache.org/jira/browse/HDFS-1004 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node >Reporter: Jakob Homan >Assignee: Jakob Homan > Attachments: HDFS-1004.patch > > > Namenode needs to be tweaked to use the new kerberized-back ssl connector. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1275) Enabling Kerberized SSL on NameNode
Enabling Kerberized SSL on NameNode --- Key: HDFS-1275 URL: https://issues.apache.org/jira/browse/HDFS-1275 Project: Hadoop HDFS Issue Type: Improvement Reporter: Kan Zhang This is the HDFS part of HADOOP-6584. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1271) Decommissioning nodes not persisted between NameNode restarts
[ https://issues.apache.org/jira/browse/HDFS-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883700#action_12883700 ] Allen Wittenauer commented on HDFS-1271: +1. I've just been too lazy to file a bug. :) > Decommissioning nodes not persisted between NameNode restarts > - > > Key: HDFS-1271 > URL: https://issues.apache.org/jira/browse/HDFS-1271 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Travis Crawford > > Datanodes in the process of being decommissioned should still be > decommissioning after namenode restarts. Currently they are marked as dead > after a restart. > Details: > Nodes can be safely removed from a cluster by marking them as decommissioned > and waiting for their data to be replicated elsewhere. This is accomplished > by adding a node to the file referenced by dfs.hosts.excluded, then > refreshing nodes. > Decommissioning means block reports from the decommissioned datanode are no > longer accepted by the namenode, meaning for decommissioning to occur the NN > must have an existing block report. That is, a datanode can transition from: > live --> decommissioning --> dead. Nodes can NOT transition from: dead --> > decommissioning --> dead. > Operationally this is problematic because intervention is required should the > NN restart while nodes are decommissioning, meaning in-house administration > tools must be more complex, or more likely admins have to babysit the > decommissioning process. > Someone more familiar with the code might have a better idea, but perhaps the > first block report for dfs.hosts.excluded hosts should be accepted? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1258) Clearing namespace quota on "/" corrupts FS image
[ https://issues.apache.org/jira/browse/HDFS-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883699#action_12883699 ] Aaron T. Myers commented on HDFS-1258: -- Both of those test failures were failing in trunk before I created the patch. > Clearing namespace quota on "/" corrupts FS image > - > > Key: HDFS-1258 > URL: https://issues.apache.org/jira/browse/HDFS-1258 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Blocker > Fix For: 0.20.3, 0.21.0, 0.22.0 > > Attachments: clear-quota.patch, clear-quota.patch > > > The HDFS root directory starts out with a default namespace quota of > Integer.MAX_VALUE. If you clear this quota (using "hadoop dfsadmin -clrQuota > /"), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, > and the NN will not come back up from a restart. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete
[ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883686#action_12883686 ] Rodrigo Schmidt commented on HDFS-1111: --- I thought the DistributedFileSystem and Hdfs classes contact the namenode via RPC, using ClientProtocol. Maybe I'm missing something, but I think that even if we change Hdfs and DistributedFileSystem, getCorruptFiles() will have to be part of ClientProtocol. > getCorruptFiles() should give some hint that the list is not complete > - > > Key: HDFS-1111 > URL: https://issues.apache.org/jira/browse/HDFS-1111 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Rodrigo Schmidt >Assignee: Rodrigo Schmidt > Attachments: HADFS-1111.0.patch > > > The list of corrupt files returned by the namenode doesn't say anything if > the number of corrupted files is larger than the call output limit (which > means the list is not complete). There should be a way to hint incompleteness > to clients. > A simple hack would be to add an extra entry to the array returned with the > value null. Clients could interpret this as a sign that there are other > corrupt files in the system. > We should also do some rephrasing of the fsck output to make it more > confident when the list is complete and less confident when the list is > known to be incomplete. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883676#action_12883676 ] Hadoop QA commented on HDFS-1057: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448323/hdfs-1057-trunk-6.txt against trunk revision 957669. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/416/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/416/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/416/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/416/console This message is automatically generated. > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Fix For: 0.20-append > > Attachments: conurrent-reader-patch-1.txt, > conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, > HDFS-1057-0.20-append.patch, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, > hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt, > hdfs-1057-trunk-6.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1271) Decommissioning nodes not persisted between NameNode restarts
[ https://issues.apache.org/jira/browse/HDFS-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883655#action_12883655 ] dhruba borthakur commented on HDFS-1271: This is a *real* problem that we have faced on our clusters too. > Decommissioning nodes not persisted between NameNode restarts > - > > Key: HDFS-1271 > URL: https://issues.apache.org/jira/browse/HDFS-1271 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Travis Crawford > > Datanodes in the process of being decommissioned should still be > decommissioning after namenode restarts. Currently they are marked as dead > after a restart. > Details: > Nodes can be safely removed from a cluster by marking them as decommissioned > and waiting for their data to be replicated elsewhere. This is accomplished > by adding a node to the file referenced by dfs.hosts.excluded, then > refreshing nodes. > Decommissioning means block reports from the decommissioned datanode are no > longer accepted by the namenode, meaning for decommissioning to occur the NN > must have an existing block report. That is, a datanode can transition from: > live --> decommissioning --> dead. Nodes can NOT transition from: dead --> > decommissioning --> dead. > Operationally this is problematic because intervention is required should the > NN restart while nodes are decommissioning, meaning in-house administration > tools must be more complex, or more likely admins have to babysit the > decommissioning process. > Someone more familiar with the code might have a better idea, but perhaps the > first block report for dfs.hosts.excluded hosts should be accepted? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-233) Support for snapshots
[ https://issues.apache.org/jira/browse/HDFS-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883633#action_12883633 ] dhruba borthakur commented on HDFS-233: --- There hasn't been much work in this direction. It was deemed complex to implement because it needs lots of changes to the current NameNode/DataNode code. I have a proposal in mind that can implement "HDFS snapshots" as a layer on top of the current HDFS code with negligible changes to the existing NameNode/DataNode architecture. If you have any ideas regarding this, or are willing to contribute to this effort, that would be great! Thanks. > Support for snapshots > - > > Key: HDFS-233 > URL: https://issues.apache.org/jira/browse/HDFS-233 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: dhruba borthakur >Assignee: dhruba borthakur > Attachments: Snapshots.pdf, Snapshots.pdf > > > Support HDFS snapshots. It should support creating snapshots without shutting > down the file system. Snapshot creation should be lightweight and a typical > system should be able to support a few thousand concurrent snapshots. There > should be a way to surface (i.e. mount) a few of these snapshots > simultaneously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883632#action_12883632 ] dhruba borthakur commented on HDFS-1094: @Koji: we have files with replication factor of 3. if a large number of datanodes fail at the same time, we do see missing blocks. Sometimes, the datanode process on these machines fail to start even after repeated start-dfs.sh attempts, sometimes the entire machine fails to reboot. Then we have to manually fix a few of those bad datanode machines and make them come online; this fixes the "missing blocks" problem but is a manual process and is painful. > Intelligent block placement policy to decrease probability of block loss > > > Key: HDFS-1094 > URL: https://issues.apache.org/jira/browse/HDFS-1094 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: dhruba borthakur >Assignee: Rodrigo Schmidt > Attachments: prob.pdf, prob.pdf > > > The current HDFS implementation specifies that the first replica is local and > the other two replicas are on any two random nodes on a random remote rack. > This means that if any three datanodes die together, then there is a > non-trivial probability of losing at least one block in the cluster. This > JIRA is to discuss if there is a better algorithm that can lower probability > of losing a block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
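As a back-of-envelope illustration of the "non-trivial probability" claim in the HDFS-1094 description, the sketch below estimates the chance that a 3-replica block loses all of its replicas when a few nodes die at once, assuming purely random placement. It ignores rack awareness and re-replication, and the cluster and block counts are made-up example values, so it only shows the order of magnitude.

{code:java}
public class BlockLossEstimate {
  // C(n, k) computed as a double; adequate for order-of-magnitude estimates.
  static double choose(int n, int k) {
    double r = 1.0;
    for (int i = 1; i <= k; i++) {
      r *= (n - k + i) / (double) i;
    }
    return r;
  }

  public static void main(String[] args) {
    int nodes = 1000;            // cluster size (example value)
    int failed = 3;              // simultaneous node failures
    long blocks = 100000000L;    // total blocks in the cluster (example value)

    // A 3-replica block is lost only if all 3 replicas land on failed nodes.
    double pPerBlock = choose(failed, 3) / choose(nodes, 3);
    double expectedLost = blocks * pPerBlock;
    System.out.printf("P(single block lost) = %.3e%n", pPerBlock);
    System.out.printf("Expected lost blocks = %.2f%n", expectedLost);
  }
}
{code}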
[jira] Updated: (HDFS-617) Support for non-recursive create() in HDFS
[ https://issues.apache.org/jira/browse/HDFS-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HDFS-617: - Affects Version/s: 0.20-append > Support for non-recursive create() in HDFS > -- > > Key: HDFS-617 > URL: https://issues.apache.org/jira/browse/HDFS-617 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client, name-node >Affects Versions: 0.20-append >Reporter: Kan Zhang >Assignee: Kan Zhang > Fix For: 0.21.0 > > Attachments: h617-01.patch, h617-02.patch, h617-03.patch, > h617-04.patch, h617-06.patch, HDFS-617_20-append.patch > > > HADOOP-4952 calls for a create call that doesn't automatically create missing > parent directories. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-617) Support for non-recursive create() in HDFS
[ https://issues.apache.org/jira/browse/HDFS-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HDFS-617: - Attachment: HDFS-617_20-append.patch Backport of patch for 0.20-append branch > Support for non-recursive create() in HDFS > -- > > Key: HDFS-617 > URL: https://issues.apache.org/jira/browse/HDFS-617 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client, name-node >Affects Versions: 0.20-append >Reporter: Kan Zhang >Assignee: Kan Zhang > Fix For: 0.21.0 > > Attachments: h617-01.patch, h617-02.patch, h617-03.patch, > h617-04.patch, h617-06.patch, HDFS-617_20-append.patch > > > HADOOP-4952 calls for a create call that doesn't automatically create missing > parent directories. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
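For readers wondering what non-recursive create looks like from the client side, here is a hedged usage sketch. The createNonRecursive overload shown is the commonly cited FileSystem form; check the attached patches (e.g. HDFS-617_20-append.patch) for the exact signature in each branch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NonRecursiveCreateExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Unlike create(), this does not silently create missing parent
    // directories; if /existing-dir is absent the call fails instead.
    Path file = new Path("/existing-dir/part-00000");
    FSDataOutputStream out =
        fs.createNonRecursive(file, true /* overwrite */, 4096, (short) 3,
                              64 * 1024 * 1024, null /* progress */);
    out.writeBytes("hello");
    out.close();
    fs.close();
  }
}
{code}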
[jira] Commented: (HDFS-1274) ability to send replication traffic on a separate port to the Datanode
[ https://issues.apache.org/jira/browse/HDFS-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883613#action_12883613 ] dhruba borthakur commented on HDFS-1274: > Is this for TCP-level QoS? precisely. > ability to send replication traffic on a separate port to the Datanode > -- > > Key: HDFS-1274 > URL: https://issues.apache.org/jira/browse/HDFS-1274 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Reporter: dhruba borthakur > > The datanode receives data from a client write request or from a replication > request. It is useful to configure the cluster to so that dedicated bandwidth > is allocated for client writes and replication traffic. This requires that > the client-writes and replication traffic be configured to operate on two > different ports to the datanode. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam rash updated HDFS-1057: --- Status: Patch Available (was: Open) > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Fix For: 0.20-append > > Attachments: conurrent-reader-patch-1.txt, > conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, > HDFS-1057-0.20-append.patch, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, > hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt, > hdfs-1057-trunk-6.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam rash updated HDFS-1057: --- Status: Open (was: Patch Available) > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Fix For: 0.20-append > > Attachments: conurrent-reader-patch-1.txt, > conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, > HDFS-1057-0.20-append.patch, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, > hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt, > hdfs-1057-trunk-6.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam rash updated HDFS-1057: --- Attachment: hdfs-1057-trunk-6.txt -fixed warnings -fixed fd leak in some of the added tests > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Fix For: 0.20-append > > Attachments: conurrent-reader-patch-1.txt, > conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, > HDFS-1057-0.20-append.patch, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, > hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt, > hdfs-1057-trunk-6.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete
[ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883602#action_12883602 ] Sanjay Radia commented on HDFS-1111: >I really think the correct design choice is to export basic APIs like >getCorruptFiles() as RPCs. I suspect you have a misunderstanding of how the client side connects via RPC. We have no plans to expose the RPCs directly for now. In order to allow tools to access such functionality it is not necessary to use the RPC directly; Hdfs and DistributedFileSystem (which extend AbstractFileSystem and FileSystem) are effectively the client-side library to access a NN. >On the other hand, if we do take getCorruptFiles() out of ClientProtocol, we >will make HDFS-1171 overly complicated or expensive. Not if you add the method to Hdfs and DistributedFileSystem. You simply need to make the case for adding getCorruptFiles to these two classes. It appears that this functionality got slipped in as part of HDFS-1171. > getCorruptFiles() should give some hint that the list is not complete > - > > Key: HDFS-1111 > URL: https://issues.apache.org/jira/browse/HDFS-1111 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Rodrigo Schmidt >Assignee: Rodrigo Schmidt > Attachments: HADFS-1111.0.patch > > > The list of corrupt files returned by the namenode doesn't say anything if > the number of corrupted files is larger than the call output limit (which > means the list is not complete). There should be a way to hint incompleteness > to clients. > A simple hack would be to add an extra entry to the array returned with the > value null. Clients could interpret this as a sign that there are other > corrupt files in the system. > We should also do some rephrasing of the fsck output to make it more > confident when the list is complete and less confident when the list is > known to be incomplete. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
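The point about Hdfs/DistributedFileSystem being the client-side library can be pictured as a simple delegation: the FileSystem facade exposes the call and forwards it to the namenode RPC. The interfaces and signatures below are simplified stand-ins for ClientProtocol and DistributedFileSystem, not the real declarations.

{code:java}
import java.io.IOException;

// Stands in for ClientProtocol: the RPC interface the client library speaks
// to the namenode. Tools do not call this directly.
interface NamenodeRpc {
  String[] getCorruptFiles() throws IOException;
}

// Stands in for DistributedFileSystem/Hdfs: the client-side facade that
// tools such as fsck or a RaidNode would actually call.
class ClientFacade {
  private final NamenodeRpc namenode;

  ClientFacade(NamenodeRpc namenode) {
    this.namenode = namenode;
  }

  // Exposing the feature here keeps the RPC private while still letting
  // external services use it through the client library.
  public String[] listCorruptFiles() throws IOException {
    return namenode.getCorruptFiles();
  }
}
{code}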
[jira] Commented: (HDFS-1274) ability to send replication traffic on a separate port to the Datanode
[ https://issues.apache.org/jira/browse/HDFS-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883598#action_12883598 ] Todd Lipcon commented on HDFS-1274: --- Is this for TCP-level QoS? (eg using linux traffic shaping to balance bandwidth allocations?) > ability to send replication traffic on a separate port to the Datanode > -- > > Key: HDFS-1274 > URL: https://issues.apache.org/jira/browse/HDFS-1274 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Reporter: dhruba borthakur > > The datanode receives data from a client write request or from a replication > request. It is useful to configure the cluster to so that dedicated bandwidth > is allocated for client writes and replication traffic. This requires that > the client-writes and replication traffic be configured to operate on two > different ports to the datanode. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1274) ability to send replication traffic on a separate port to the Datanode
[ https://issues.apache.org/jira/browse/HDFS-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HDFS-1274: --- Component/s: data-node > ability to send replication traffic on a separate port to the Datanode > -- > > Key: HDFS-1274 > URL: https://issues.apache.org/jira/browse/HDFS-1274 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Reporter: dhruba borthakur > > The datanode receives data from a client write request or from a replication > request. It is useful to configure the cluster to so that dedicated bandwidth > is allocated for client writes and replication traffic. This requires that > the client-writes and replication traffic be configured to operate on two > different ports to the datanode. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1274) ability to send replication traffic on a separate port to the Datanode
ability to send replication traffic on a separate port to the Datanode -- Key: HDFS-1274 URL: https://issues.apache.org/jira/browse/HDFS-1274 Project: Hadoop HDFS Issue Type: Improvement Reporter: dhruba borthakur The datanode receives data from a client write request or from a replication request. It is useful to configure the cluster so that dedicated bandwidth is allocated for client writes and for replication traffic. This requires that client writes and replication traffic be configured to operate on two different datanode ports. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
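Since no patch is attached yet, the following is only a sketch of what the proposal might look like from the configuration side: one datanode listener for client traffic and a second, hypothetical one for replication traffic. dfs.datanode.address is the real streaming-address key of this era; dfs.datanode.replication.address is an invented name used purely for illustration.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class SeparateReplicationPortConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Existing streaming address used by client writes (default 50010).
    conf.set("dfs.datanode.address", "0.0.0.0:50010");
    // Hypothetical second listener dedicated to inter-datanode replication,
    // so the two flows could be shaped independently at the TCP level.
    conf.set("dfs.datanode.replication.address", "0.0.0.0:50011");

    System.out.println("client port:      " + conf.get("dfs.datanode.address"));
    System.out.println("replication port: " + conf.get("dfs.datanode.replication.address"));
  }
}
{code}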
[jira] Commented: (HDFS-1273) Handle disk failure when writing new blocks on datanode
[ https://issues.apache.org/jira/browse/HDFS-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883534#action_12883534 ] Hadoop QA commented on HDFS-1273: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448294/HDFS_1273.patch against trunk revision 957669. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/415/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/415/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/415/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/415/console This message is automatically generated. > Handle disk failure when writing new blocks on datanode > --- > > Key: HDFS-1273 > URL: https://issues.apache.org/jira/browse/HDFS-1273 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.21.0 >Reporter: Jeff Zhang > Fix For: 0.21.0 > > Attachments: HDFS_1273.patch > > > This issues relates to HDFS-457, in the patch of HDFS-457 only disk failure > when reading is handled. This jira is to handle the disk failure when writing > new blocks on data node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete
[ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883481#action_12883481 ] Rodrigo Schmidt commented on HDFS-1111: --- The RaidNode is not currently using this API, although its use was one of the motivations I had for adding getCorruptFiles() to ClientProtocol. Originally, raid was part of HDFS and I could certainly see how Raid (and possibly other parts of HDFS) could benefit from this as an RPC to the namenode. I thought the others saw it too because when I got to HDFS-729, having getCorruptFiles() on ClientProtocol was not under discussion anymore. The JIRA that is responsible for making the RaidNode call getCorruptFiles is HDFS-1171. Most probably we will have to extend DistributedFileSystem to export getCorruptFiles(). That's why I said we don't have to be external to HDFS, but we can be external to the namenode. On the other hand, if we do take getCorruptFiles() out of ClientProtocol, we will make HDFS-1171 overly complicated or expensive. I really think the correct design choice is to export basic APIs like getCorruptFiles() as RPCs and build services like fsck and raid completely outside the namenode. After looking at the fsck code from the inside out and having experienced how it can sometimes compromise the whole filesystem because the namenode is using most of its resources to calculate outputs for fsck requests, I'm convinced it should be outside the namenode. For the sake of horizontal scalability of the namenode, we should be working on redesigning things like the current fsck implementation, instead of reinforcing it. That's what I meant when I said we should be taking things out of the namenode. In my opinion, even if my case about having other parts of HDFS call getCorruptFiles() is not convincing, taking it out of ClientProtocol only reinforces the design choice of running fsck inside the namenode, which I think is bad. As we have more and more discussions about a distributed namenode, things like fsck should be the first ones running externally to it (to the namenode, not to HDFS). I see this as a low-hanging fruit towards a more scalable and distributed namenode. > getCorruptFiles() should give some hint that the list is not complete > - > > Key: HDFS-1111 > URL: https://issues.apache.org/jira/browse/HDFS-1111 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Rodrigo Schmidt >Assignee: Rodrigo Schmidt > Attachments: HADFS-1111.0.patch > > > The list of corrupt files returned by the namenode doesn't say anything if > the number of corrupted files is larger than the call output limit (which > means the list is not complete). There should be a way to hint incompleteness > to clients. > A simple hack would be to add an extra entry to the array returned with the > value null. Clients could interpret this as a sign that there are other > corrupt files in the system. > We should also do some rephrasing of the fsck output to make it more > confident when the list is complete and less confident when the list is > known to be incomplete. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-457) better handling of volume failure in Data Node storage
[ https://issues.apache.org/jira/browse/HDFS-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883466#action_12883466 ] Jeff Zhang commented on HDFS-457: - Created jira HDFS-1273 for this issue and put the patch there. > better handling of volume failure in Data Node storage > -- > > Key: HDFS-457 > URL: https://issues.apache.org/jira/browse/HDFS-457 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Fix For: 0.21.0 > > Attachments: HDFS-457-1.patch, HDFS-457-2.patch, HDFS-457-2.patch, > HDFS-457-2.patch, HDFS-457-3.patch, HDFS-457.patch, HDFS-457_20-append.patch, > HDFS_457.patch, jira.HDFS-457.branch-0.20-internal.patch, TestFsck.zip > > > Current implementation shuts DataNode down completely when one of the > configured volumes of the storage fails. > This is rather wasteful behavior because it decreases utilization (good > storage becomes unavailable) and imposes extra load on the system > (replication of the blocks from the good volumes). These problems will become > even more prominent when we move to mixed (heterogeneous) clusters with many > more volumes per Data Node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1273) Handle disk failure when writing new blocks on datanode
[ https://issues.apache.org/jira/browse/HDFS-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated HDFS-1273: - Status: Patch Available (was: Open) > Handle disk failure when writing new blocks on datanode > --- > > Key: HDFS-1273 > URL: https://issues.apache.org/jira/browse/HDFS-1273 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.21.0 >Reporter: Jeff Zhang > Fix For: 0.21.0 > > Attachments: HDFS_1273.patch > > > This issues relates to HDFS-457, in the patch of HDFS-457 only disk failure > when reading is handled. This jira is to handle the disk failure when writing > new blocks on data node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1273) Handle disk failure when writing new blocks on datanode
[ https://issues.apache.org/jira/browse/HDFS-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated HDFS-1273: - Attachment: HDFS_1273.patch Attach the patch > Handle disk failure when writing new blocks on datanode > --- > > Key: HDFS-1273 > URL: https://issues.apache.org/jira/browse/HDFS-1273 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.21.0 >Reporter: Jeff Zhang > Fix For: 0.21.0 > > Attachments: HDFS_1273.patch > > > This issues relates to HDFS-457, in the patch of HDFS-457 only disk failure > when reading is handled. This jira is to handle the disk failure when writing > new blocks on data node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1273) Handle disk failure when writing new blocks on datanode
Handle disk failure when writing new blocks on datanode --- Key: HDFS-1273 URL: https://issues.apache.org/jira/browse/HDFS-1273 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.21.0 Reporter: Jeff Zhang Fix For: 0.21.0 This issue relates to HDFS-457; the patch for HDFS-457 only handles disk failure when reading. This jira is to handle disk failure when writing new blocks on the data node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-457) better handling of volume failure in Data Node storage
[ https://issues.apache.org/jira/browse/HDFS-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883456#action_12883456 ] Eli Collins commented on HDFS-457: -- Hey Jeff, Nice catch. Please file a new jira. Thanks, Eli > better handling of volume failure in Data Node storage > -- > > Key: HDFS-457 > URL: https://issues.apache.org/jira/browse/HDFS-457 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Fix For: 0.21.0 > > Attachments: HDFS-457-1.patch, HDFS-457-2.patch, HDFS-457-2.patch, > HDFS-457-2.patch, HDFS-457-3.patch, HDFS-457.patch, HDFS-457_20-append.patch, > HDFS_457.patch, jira.HDFS-457.branch-0.20-internal.patch, TestFsck.zip > > > Current implementation shuts DataNode down completely when one of the > configured volumes of the storage fails. > This is rather wasteful behavior because it decreases utilization (good > storage becomes unavailable) and imposes extra load on the system > (replication of the blocks from the good volumes). These problems will become > even more prominent when we move to mixed (heterogeneous) clusters with many > more volumes per Data Node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-457) better handling of volume failure in Data Node storage
[ https://issues.apache.org/jira/browse/HDFS-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883447#action_12883447 ] Jeff Zhang commented on HDFS-457: - This is my first patch for HDFS; I'm not sure whether it is right to attach the patch here, or do I need to create a new jira for this issue? > better handling of volume failure in Data Node storage > -- > > Key: HDFS-457 > URL: https://issues.apache.org/jira/browse/HDFS-457 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Fix For: 0.21.0 > > Attachments: HDFS-457-1.patch, HDFS-457-2.patch, HDFS-457-2.patch, > HDFS-457-2.patch, HDFS-457-3.patch, HDFS-457.patch, HDFS-457_20-append.patch, > HDFS_457.patch, jira.HDFS-457.branch-0.20-internal.patch, TestFsck.zip > > > Current implementation shuts DataNode down completely when one of the > configured volumes of the storage fails. > This is rather wasteful behavior because it decreases utilization (good > storage becomes unavailable) and imposes extra load on the system > (replication of the blocks from the good volumes). These problems will become > even more prominent when we move to mixed (heterogeneous) clusters with many > more volumes per Data Node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-457) better handling of volume failure in Data Node storage
[ https://issues.apache.org/jira/browse/HDFS-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated HDFS-457: Attachment: HDFS_457.patch Attaching a patch. It does checkDiskError in BlockReceiver before allocating a volume for the new block, so after checkDiskError it is guaranteed that the remaining volumes are all healthy and the failed volumes have been removed. > better handling of volume failure in Data Node storage > -- > > Key: HDFS-457 > URL: https://issues.apache.org/jira/browse/HDFS-457 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Fix For: 0.21.0 > > Attachments: HDFS-457-1.patch, HDFS-457-2.patch, HDFS-457-2.patch, > HDFS-457-2.patch, HDFS-457-3.patch, HDFS-457.patch, HDFS-457_20-append.patch, > HDFS_457.patch, jira.HDFS-457.branch-0.20-internal.patch, TestFsck.zip > > > Current implementation shuts DataNode down completely when one of the > configured volumes of the storage fails. > This is rather wasteful behavior because it decreases utilization (good > storage becomes unavailable) and imposes extra load on the system > (replication of the blocks from the good volumes). These problems will become > even more prominent when we move to mixed (heterogeneous) clusters with many > more volumes per Data Node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
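The "check before allocate" ordering described in the comment above can be pictured with a small standalone sketch: failed volumes are pruned first, and only the surviving volumes are eligible to receive a new block. Names are illustrative, not the actual FSDataset/BlockReceiver code.

{code:java}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class VolumeChooser {
  private final List<File> volumes = new ArrayList<File>();
  private int next = 0;

  public VolumeChooser(List<File> configuredVolumes) {
    volumes.addAll(configuredVolumes);
  }

  /** Remove volumes whose directories are no longer usable. */
  public void checkDiskError() {
    for (Iterator<File> it = volumes.iterator(); it.hasNext();) {
      File dir = it.next();
      if (!dir.isDirectory() || !dir.canWrite()) {
        it.remove();
      }
    }
  }

  /** Write path: check first, then round-robin over the healthy survivors. */
  public File getVolumeForNewBlock() throws IOException {
    checkDiskError();
    if (volumes.isEmpty()) {
      throw new IOException("No healthy volumes left");
    }
    next = next % volumes.size();
    return volumes.get(next++);
  }
}
{code}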
[jira] Commented: (HDFS-1258) Clearing namespace quota on "/" corrupts FS image
[ https://issues.apache.org/jira/browse/HDFS-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883441#action_12883441 ] Hadoop QA commented on HDFS-1258: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448268/clear-quota.patch against trunk revision 957669. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/208/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/208/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/208/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/208/console This message is automatically generated. > Clearing namespace quota on "/" corrupts FS image > - > > Key: HDFS-1258 > URL: https://issues.apache.org/jira/browse/HDFS-1258 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Blocker > Fix For: 0.20.3, 0.21.0, 0.22.0 > > Attachments: clear-quota.patch, clear-quota.patch > > > The HDFS root directory starts out with a default namespace quota of > Integer.MAX_VALUE. If you clear this quota (using "hadoop dfsadmin -clrQuota > /"), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, > and the NN will not come back up from a restart. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.