[jira] [Commented] (HDFS-5919) FileJournalManager doesn't purge empty and corrupt inprogress edits files
[ https://issues.apache.org/jira/browse/HDFS-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897594#comment-13897594 ] Hadoop QA commented on HDFS-5919: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627976/HDFS-5919.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager org.apache.hadoop.hdfs.server.namenode.TestEditLog org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6106//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6106//console This message is automatically generated. 
> FileJournalManager doesn't purge empty and corrupt inprogress edits files > - > > Key: HDFS-5919 > URL: https://issues.apache.org/jira/browse/HDFS-5919 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HDFS-5919.patch > > > FileJournalManager doesn't purge empty and corrupt inprogress edit files. > These stale files will be accumulated over time. > These should be cleared along with the purging of other edit logs -- This message was sent by Atlassian JIRA (v6.1.5#6160)
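The purge rule the issue asks for can be sketched as a small predicate (the file-name conventions, suffixes, and method names below are illustrative assumptions, not the actual FileJournalManager code):

```java
// Illustrative sketch: decide whether a stale inprogress edits file is
// purgeable along with finalized edit logs. Naming conventions assumed.
public class InprogressPurge {
    static final String INPROGRESS_PREFIX = "edits_inprogress_";

    // An inprogress file is purgeable if it is empty, was renamed aside
    // as corrupt/stale, or its start txid is below the purge floor.
    public static boolean isPurgeable(String name, long length, long minTxIdToKeep) {
        if (!name.startsWith(INPROGRESS_PREFIX)) {
            return false;                 // finalized logs are handled elsewhere
        }
        if (length == 0) {
            return true;                  // empty inprogress file
        }
        if (name.endsWith(".corrupt") || name.endsWith(".stale")) {
            return true;                  // already marked aside as unusable
        }
        String txPart = name.substring(INPROGRESS_PREFIX.length());
        int dot = txPart.indexOf('.');
        if (dot >= 0) {
            txPart = txPart.substring(0, dot);
        }
        try {
            return Long.parseLong(txPart) < minTxIdToKeep;
        } catch (NumberFormatException e) {
            return false;                 // unrecognized name: leave it alone
        }
    }
}
```

Running such a predicate during the normal `purgeLogsOlderThan`-style sweep would keep the stale files from accumulating.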
[jira] [Commented] (HDFS-5899) Add configuration flag to disable/enable support for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897591#comment-13897591 ] Chris Nauroth commented on HDFS-5899: - I've submitted a patch on issue HDFS-5925 to make this change. > Add configuration flag to disable/enable support for ACLs. > -- > > Key: HDFS-5899 > URL: https://issues.apache.org/jira/browse/HDFS-5899 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: HDFS ACLs (HDFS-4685) > > Attachments: HDFS-5899.1.patch, HDFS-5899.2.patch > > > Add a new configuration property that allows administrators to toggle support > for HDFS ACLs on/off. By default, the flag will be off. This is a > conservative choice, and administrators interested in using ACLs can enable > it explicitly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897590#comment-13897590 ] Vinayakumar B commented on HDFS-5583: - {code} + OOB_TYPE1 = 8; // Quick restart + OOB_TYPE2 = 9; // Reserved + OOB_TYPE3 = 10; // Reserved + OOB_TYPE4 = 11; // Reserved {code} I think instead of OOB_TYPE1, OOB_TYPE2, better names could be given. Any thoughts? {code} if (!responderClosed) { // Abnormal termination.{code} I think the comment no longer holds good; maybe it can be removed. Changes done in {{sendAckUpstream()}} are not formatted correctly and contain tab characters too. Javadoc could be added for {{Status myStatus}} in {{sendAckUpstream()}}. > Make DN send an OOB Ack on shutdown before restarting > > > Key: HDFS-5583 > URL: https://issues.apache.org/jira/browse/HDFS-5583 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-5583.patch > > > Add an ability for data nodes to send an OOB response in order to indicate an > upcoming upgrade-restart. Client should ignore the pipeline error from the > node for a configured amount of time and try to reconstruct the pipeline without > excluding the restarted node. If the node does not come back in time, > regular pipeline recovery should happen. > This feature is useful for the applications with a need to keep blocks local. > If the upgrade-restart is fast, the wait is preferable to losing locality. > It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
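Following the naming suggestion in the review comment above, one hypothetical shape for more descriptive OOB types, keeping the reserved wire values 8-11 from the patch (the names below are invented for illustration; nothing was settled in this thread):

```java
// Hypothetical rename sketch for the OOB ack types; wire values 8-11
// match the {code} snippet in the review, the names are illustrative.
public enum OobType {
    OOB_RESTART(8),      // DN is about to do a quick restart
    OOB_RESERVED1(9),
    OOB_RESERVED2(10),
    OOB_RESERVED3(11);

    private final int wireValue;

    OobType(int wireValue) {
        this.wireValue = wireValue;
    }

    public int getWireValue() {
        return wireValue;
    }

    // Map a wire value back to its type, or null if out of range.
    public static OobType fromWireValue(int v) {
        for (OobType t : values()) {
            if (t.wireValue == v) {
                return t;
            }
        }
        return null;
    }
}
```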
[jira] [Updated] (HDFS-5925) ACL configuration flag must only reject ACL API calls, not ACLs present in fsimage or edits.
[ https://issues.apache.org/jira/browse/HDFS-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5925: Attachment: HDFS-5925.1.patch Attaching patch. > ACL configuration flag must only reject ACL API calls, not ACLs present in > fsimage or edits. > > > Key: HDFS-5925 > URL: https://issues.apache.org/jira/browse/HDFS-5925 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-5925.1.patch > > > In follow-up discussion on HDFS-5899, we decided that it would cause less > harm to administrators if setting {{dfs.namenode.acls.enabled}} to false only > causes ACL API calls to be rejected. Existing ACLs found in fsimage or edits > will be loaded and enforced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HDFS-5925) ACL configuration flag must only reject ACL API calls, not ACLs present in fsimage or edits.
[ https://issues.apache.org/jira/browse/HDFS-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5925 started by Chris Nauroth. > ACL configuration flag must only reject ACL API calls, not ACLs present in > fsimage or edits. > > > Key: HDFS-5925 > URL: https://issues.apache.org/jira/browse/HDFS-5925 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-5925.1.patch > > > In follow-up discussion on HDFS-5899, we decided that it would cause less > harm to administrators if setting {{dfs.namenode.acls.enabled}} to false only > causes ACL API calls to be rejected. Existing ACLs found in fsimage or edits > will be loaded and enforced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5925) ACL configuration flag must only reject ACL API calls, not ACLs present in fsimage or edits.
Chris Nauroth created HDFS-5925: --- Summary: ACL configuration flag must only reject ACL API calls, not ACLs present in fsimage or edits. Key: HDFS-5925 URL: https://issues.apache.org/jira/browse/HDFS-5925 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth In follow-up discussion on HDFS-5899, we decided that it would cause less harm to administrators if setting {{dfs.namenode.acls.enabled}} to false only causes ACL API calls to be rejected. Existing ACLs found in fsimage or edits will be loaded and enforced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
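The behavior agreed on here — reject ACL API calls while still loading and enforcing existing ACLs — could be gated roughly like this (class name, method names, and message text are hypothetical sketches, not the actual NameNode code):

```java
// Hypothetical sketch of the dfs.namenode.acls.enabled behavior agreed
// on in this thread: the API is gated, enforcement is not.
public class AclGate {
    private final boolean aclsEnabled;

    public AclGate(boolean aclsEnabled) {
        this.aclsEnabled = aclsEnabled;
    }

    // Called at the top of every ACL RPC (setAcl, getAclStatus, ...).
    public void checkAclApiAllowed() {
        if (!aclsEnabled) {
            throw new UnsupportedOperationException(
                "The ACL operation has been rejected. Support for ACLs has "
                + "been disabled by setting dfs.namenode.acls.enabled to false.");
        }
    }

    // Enforcement of ACLs already present in fsimage/edits is NOT gated:
    // existing ACLs keep being loaded and checked regardless of the flag.
    public boolean enforcesExistingAcls() {
        return true;
    }
}
```

This matches the stated rationale: an administrator who flips the flag off cannot accidentally stop enforcement of ACLs that were already persisted.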
[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897581#comment-13897581 ] Hadoop QA commented on HDFS-5810: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628147/HDFS-5810.020.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.http.TestHttpServerLifecycle {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6105//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6105//console This message is automatically generated. 
> Unify mmap cache and short-circuit file descriptor cache > > > Key: HDFS-5810 > URL: https://issues.apache.org/jira/browse/HDFS-5810 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 2.3.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, > HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, > HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch, > HDFS-5810.020.patch > > > We should unify the client mmap cache and the client file descriptor cache. > Since mmaps are granted corresponding to file descriptors in the cache > (currently FileInputStreamCache), they have to be tracked together to do > "smarter" things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
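The "tracked together" idea can be illustrated with a toy cache in which one entry owns both the short-circuit file descriptors and any mmap granted from them, so they are reference-counted and evicted as a unit (a conceptual sketch only; the real patch's class names and mechanics differ):

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch: one cache entry owns both a replica's short-circuit
// file descriptors and any mmap created from them, so the two can only
// be granted and evicted together. All names are illustrative.
public class UnifiedReplicaCache {
    static class Entry {
        final long blockId;
        boolean hasMmap;     // stands in for a real MappedByteBuffer
        int refCount = 1;    // streams currently using the fds/mmap
        Entry(long blockId) { this.blockId = blockId; }
    }

    private final Map<Long, Entry> entries = new HashMap<>();

    // Fetch-or-create the entry for a block's fds; bump its refcount.
    public Entry ref(long blockId) {
        Entry e = entries.get(blockId);
        if (e == null) {
            e = new Entry(blockId);
            entries.put(blockId, e);
        } else {
            e.refCount++;
        }
        return e;
    }

    // An mmap is only granted against an entry already in the cache,
    // which is what ties the former two caches together.
    public void grantMmap(long blockId) {
        Entry e = entries.get(blockId);
        if (e == null) {
            throw new IllegalStateException("no cached fds for block " + blockId);
        }
        e.hasMmap = true;
    }

    // Releasing the last reference evicts fds and mmap in one step.
    public void unref(long blockId) {
        Entry e = entries.get(blockId);
        if (e != null && --e.refCount == 0) {
            entries.remove(blockId);   // fds and mmap go away together
        }
    }

    public boolean isCached(long blockId) {
        return entries.containsKey(blockId);
    }
}
```

With a single owner per replica, "smarter" policies like the HDFS-5182 kind become possible because the mmap's lifetime can never outlive, or be tracked separately from, its file descriptors.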
[jira] [Resolved] (HDFS-5914) Incorporate ACLs with the changes from HDFS-5698
[ https://issues.apache.org/jira/browse/HDFS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5914. - Resolution: Fixed Fix Version/s: HDFS ACLs (HDFS-4685) Hadoop Flags: Reviewed +1 for the patch. I committed it to the HDFS-4685 branch. Thanks again for taking care of this, Haohui. > Incorporate ACLs with the changes from HDFS-5698 > > > Key: HDFS-5914 > URL: https://issues.apache.org/jira/browse/HDFS-5914 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode, security >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: HDFS ACLs (HDFS-4685) > > Attachments: HDFS-5914.000.patch, HDFS-5914.001.patch > > > HDFS-5698 uses protobuf to serialize the FSImage. The code needs to be > updated to work with these changes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897548#comment-13897548 ] Kihwal Lee edited comment on HDFS-5583 at 2/11/14 5:38 AM: --- The patch makes DN send OOB acks to clients who are writing. The added test case currently doesn't do much, but after the client-side changes, it will be updated. The OOB Ack sending can still be verified from running the new test case. The test log should show something like the following: {panel} [DataNode] 2014-02-10 23:23:52,412 INFO datanode.DataNode (DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before restart 2014-02-10 23:23:52,412 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(731)) - Shutting down for restart (BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002). 2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type OOB_TYPE1 [Upstream Datanode] 2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:run(1060)) - Relaying an out of band ack of type OOB_TYPE1 [Client] 2014-02-10 23:23:52,414 WARN hdfs.DFSClient (DFSOutputStream.java:run(784)) - DFSOutputStream ResponseProcessor exception for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 java.io.IOException: Bad response OOB_TYPE1 for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 127.0.0.1:55182 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732) {panel} was (Author: kihwal): The patch makes DN send OOB acks to clients who are writing. The added test case currently doesn't do much, but after the client-side changes, it will be updated. The OOB Ack sending can still be verified from running the new test case. 
The test log should show something like the following: {panel} [DataNode] 2014-02-10 23:23:52,412 INFO datanode.DataNode (DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before restart 2014-02-10 23:23:52,412 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(731)) - Shutting down for restart (BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002). 2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type OOB_TYPE1 [Upstream Datanode] 2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:run(1060)) - Relaying an out of band ack of type OOB_TYPE1 [Client] 2014-02-10 23:23:52,414 WARN hdfs.DFSClient (DFSOutputStream.java:run(784)) - DFSOutputStream ResponseProcessor exception for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 java.io.IOException: Bad response OOB_TYPE1 for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 127.0.0.1:55182 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732) {panel} > Make DN send an OOB Ack on shutdown before restarting > > > Key: HDFS-5583 > URL: https://issues.apache.org/jira/browse/HDFS-5583 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-5583.patch > > > Add an ability for data nodes to send an OOB response in order to indicate an > upcoming upgrade-restart. Client should ignore the pipeline error from the > node for a configured amount of time and try to reconstruct the pipeline without > excluding the restarted node. If the node does not come back in time, > regular pipeline recovery should happen. > This feature is useful for the applications with a need to keep blocks local. > If the upgrade-restart is fast, the wait is preferable to losing locality. > It could also be used in general instead of the draining-writer strategy. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5583: - Attachment: HDFS-5583.patch The patch makes DN send OOB acks to clients who are writing. The added test case currently doesn't do much, but after the client-side changes, it will be updated. The OOB Ack sending can still be verified from running the new test case. The test log should show something like the following: {noformat} [DataNode] 2014-02-10 23:23:52,412 INFO datanode.DataNode (DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before restart 2014-02-10 23:23:52,412 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(731)) - Shutting down for restart (BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002). 2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type OOB_TYPE1 [Upstream Datanode] 2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:run(1060)) - Relaying an out of band ack of type OOB_TYPE [Client] 2014-02-10 23:23:52,414 WARN hdfs.DFSClient (DFSOutputStream.java:run(784)) - DFSOutputStream ResponseProcessor exception for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 java.io.IOException: Bad response OOB_TYPE1 for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 127.0.0.1:55182 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732) {noformat} > Make DN send an OOB Ack on shutdown before restarting > > > Key: HDFS-5583 > URL: https://issues.apache.org/jira/browse/HDFS-5583 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-5583.patch > > > Add an ability for data nodes to send an OOB response in order to indicate an > upcoming upgrade-restart. 
Client should ignore the pipeline error from the > node for a configured amount of time and try to reconstruct the pipeline without > excluding the restarted node. If the node does not come back in time, > regular pipeline recovery should happen. > This feature is useful for the applications with a need to keep blocks local. > If the upgrade-restart is fast, the wait is preferable to losing locality. > It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
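The client-side behavior described above — tolerate the error from a restarting node for a configured window, then fall back to regular pipeline recovery — can be modeled as a small decision helper (names are hypothetical; the client-side patch had not been posted in this thread):

```java
// Hypothetical sketch of the client decision after receiving an OOB
// restart ack: keep the node in the pipeline while the restart window
// is open, otherwise exclude it and do regular pipeline recovery.
public class RestartWindow {
    private final long windowMillis;      // configured wait, e.g. 30_000
    private long restartAckAtMillis = -1; // -1 means no OOB ack seen yet

    public RestartWindow(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Record the time the OOB restart ack arrived from the datanode.
    public void onOobRestartAck(long nowMillis) {
        restartAckAtMillis = nowMillis;
    }

    // true => ignore the pipeline error and retry with the same node;
    // false => the node did not come back in time, recover normally.
    public boolean shouldKeepRestartingNode(long nowMillis) {
        return restartAckAtMillis >= 0
            && nowMillis - restartAckAtMillis <= windowMillis;
    }
}
```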
[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897545#comment-13897545 ] Kihwal Lee commented on HDFS-5583: -- This jira depends on HDFS-5585. I will post a patch, which applies on top of HDFS-5585. > Make DN send an OOB Ack on shutdown before restarting > > > Key: HDFS-5583 > URL: https://issues.apache.org/jira/browse/HDFS-5583 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Kihwal Lee > > Add an ability for data nodes to send an OOB response in order to indicate an > upcoming upgrade-restart. Client should ignore the pipeline error from the > node for a configured amount of time and try to reconstruct the pipeline without > excluding the restarted node. If the node does not come back in time, > regular pipeline recovery should happen. > This feature is useful for the applications with a need to keep blocks local. > If the upgrade-restart is fast, the wait is preferable to losing locality. > It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Reopened] (HDFS-5396) FSImage.getFsImageName should check whether fsimage exists
[ https://issues.apache.org/jira/browse/HDFS-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong reopened HDFS-5396: I made a mistake when I resolved this as Not A Problem, because {{for (Iterator it = dirIterator(NameNodeDirType.IMAGE); it.hasNext();) sd = it.next();}} will return the last image StorageDirectory, but due to HDFS-5367, that directory may not have an fsimage in it. > FSImage.getFsImageName should check whether fsimage exists > -- > > Key: HDFS-5396 > URL: https://issues.apache.org/jira/browse/HDFS-5396 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 1.2.1 >Reporter: zhaoyunjiong >Assignee: zhaoyunjiong > Fix For: 1.3.0 > > Attachments: HDFS-5396-branch-1.2.patch > > > In https://issues.apache.org/jira/browse/HDFS-5367, the fsimage may not be written to > all IMAGE dirs, so we need to check whether the fsimage exists before > FSImage.getFsImageName returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
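The fix being described — don't blindly return the last IMAGE directory, check that an fsimage actually exists in it — can be modeled like this (a simplified, filesystem-free sketch, not the branch-1 FSImage code):

```java
// Simplified model of the HDFS-5396 fix: iterate the IMAGE-type storage
// directories and return the last one that actually contains an fsimage,
// instead of blindly returning the last directory of the iterator.
public class ImagePicker {
    // dirs: IMAGE storage directories in iteration order.
    // dirsWithImage: directories where an fsimage file really exists
    // (after HDFS-5367, a save may legitimately skip some directories).
    public static String getFsImageDir(String[] dirs, String[] dirsWithImage) {
        String found = null;
        for (String dir : dirs) {
            if (contains(dirsWithImage, dir)) {
                found = dir;           // remember the last dir with an image
            }
        }
        return found;                  // null if no directory has one
    }

    private static boolean contains(String[] arr, String s) {
        for (String a : arr) {
            if (a.equals(s)) {
                return true;
            }
        }
        return false;
    }
}
```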
[jira] [Updated] (HDFS-5919) FileJournalManager doesn't purge empty and corrupt inprogress edits files
[ https://issues.apache.org/jira/browse/HDFS-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-5919: Status: Patch Available (was: Open) > FileJournalManager doesn't purge empty and corrupt inprogress edits files > - > > Key: HDFS-5919 > URL: https://issues.apache.org/jira/browse/HDFS-5919 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HDFS-5919.patch > > > FileJournalManager doesn't purge empty and corrupt inprogress edit files. > These stale files will be accumulated over time. > These should be cleared along with the purging of other edit logs -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5899) Add configuration flag to disable/enable support for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897519#comment-13897519 ] Colin Patrick McCabe commented on HDFS-5899: bq. Here is a compromise proposal. Let's reject the API calls when dfs.namenode.acls.enabled is false, but let's still load and enforce all existing ACLs found in fsimage or edits. Sounds reasonable. > Add configuration flag to disable/enable support for ACLs. > -- > > Key: HDFS-5899 > URL: https://issues.apache.org/jira/browse/HDFS-5899 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: HDFS ACLs (HDFS-4685) > > Attachments: HDFS-5899.1.patch, HDFS-5899.2.patch > > > Add a new configuration property that allows administrators to toggle support > for HDFS ACLs on/off. By default, the flag will be off. This is a > conservative choice, and administrators interested in using ACLs can enable > it explicitly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897518#comment-13897518 ] Hadoop QA commented on HDFS-5810: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628100/HDFS-5810.019.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6104//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6104//console This message is automatically generated. 
> Unify mmap cache and short-circuit file descriptor cache > > > Key: HDFS-5810 > URL: https://issues.apache.org/jira/browse/HDFS-5810 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 2.3.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, > HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, > HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch, > HDFS-5810.020.patch > > > We should unify the client mmap cache and the client file descriptor cache. > Since mmaps are granted corresponding to file descriptors in the cache > (currently FileInputStreamCache), they have to be tracked together to do > "smarter" things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5917) Have an ability to refresh deadNodes list periodically
[ https://issues.apache.org/jira/browse/HDFS-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-5917: Description: In the current HBase + HDFS trunk implementation, if a node is added into the deadNodes map, it cannot be chosen again until deadNodes.clear() is invoked. When I fixed HDFS-5637, I had a rough thought: since quite a few conditions can trigger a node being added into the deadNodes map, it would be better to have the ability to refresh this cache automatically. It's good for the HBase scenario at least. E.g., before HDFS-5637 was fixed, if a local node was added into deadNodes, reads would go remote even if the local node was actually live. Even more unfortunately, if the block is in a huge HFile that isn't picked into any minor compaction for a short period, the performance penalty continues until a large compaction runs, the region is reopened, or deadNodes.clear() is invoked... (was: In current HBase + HDFS trunk impl, if one node is inserted into deadNodes list, before deadNodes.clear() be invoked, this node could not be choose always. When i fixed HDFS-5637, i had a raw thought, since there're not a few conditions could trigger a node be inserted into deadNodes, we should have an ability to refresh this important cache list info automaticly. It's benefit for HBase scenario at least, e.g. before HDFS-5637 fixed, if a local node be inserted into deadNodes, then it will read remotely even the local node is not dead:) if more unfortunately, this block is in a huge HFile which doesn't be picked into any minor compaction in short period, the performance penality will be continued until a large compaction or region reopend or deadNodes.clear() be invoked...) 
> Have an ability to refresh deadNodes list periodically > -- > > Key: HDFS-5917 > URL: https://issues.apache.org/jira/browse/HDFS-5917 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HDFS-5917.txt > > > In the current HBase + HDFS trunk implementation, if a node is added into the deadNodes map, > it cannot be chosen again until deadNodes.clear() is invoked. > When I fixed HDFS-5637, I had a rough thought: since quite a few > conditions can trigger a node being added into the deadNodes map, it would be > better to have the ability to refresh this cache automatically. It's > good for the HBase scenario at least. E.g., before HDFS-5637 was fixed, if a local > node was added into deadNodes, reads would go remote even if the local > node was actually live. Even more unfortunately, if the block is in a huge HFile > that isn't picked into any minor compaction for a short period, the > performance penalty continues until a large compaction runs, the region is > reopened, or deadNodes.clear() is invoked... -- This message was sent by Atlassian JIRA (v6.1.5#6160)
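The refresh idea above can be sketched with a small TTL-based cache: instead of keeping a node in the map until deadNodes.clear(), each entry expires after a configurable interval, so a node that has come back alive becomes eligible for reads again automatically. This is an illustrative sketch only; the class and method names (DeadNodeCache, markDead, isDead) are hypothetical, not HDFS APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of HDFS-5917's idea: a dead-node entry only lives for a
// bounded time, after which the node is retried instead of being skipped forever.
class DeadNodeCache {
    private final long ttlMillis;
    private final Map<String, Long> deadSince = new ConcurrentHashMap<>();

    DeadNodeCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Record that a read from this node failed at the given time. */
    void markDead(String node, long nowMillis) {
        deadSince.put(node, nowMillis);
    }

    /** A node counts as dead only while its entry is younger than the TTL. */
    boolean isDead(String node, long nowMillis) {
        Long since = deadSince.get(node);
        if (since == null) {
            return false;
        }
        if (nowMillis - since >= ttlMillis) {
            deadSince.remove(node); // entry expired: node becomes eligible again
            return false;
        }
        return true;
    }
}
```

With a 1-second TTL, a node marked dead at t=0 is skipped at t=500ms but chosen again at t=2000ms, which is exactly the "performance penalty does not persist forever" behavior the description asks for.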
[jira] [Updated] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5583: - Summary: Make DN send an OOB Ack on shutdown before restarting (was: Make DN send an OOB Ack on upgrade-shutdown) > Make DN send an OOB Ack on shutdown before restarting > > > Key: HDFS-5583 > URL: https://issues.apache.org/jira/browse/HDFS-5583 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Kihwal Lee > > Add an ability for data nodes to send an OOB response in order to indicate an > upcoming upgrade-restart. The client should ignore the pipeline error from the > node for a configured amount of time and try to reconstruct the pipeline without > excluding the restarted node. If the node does not come back in time, > regular pipeline recovery should happen. > This feature is useful for applications that need to keep blocks local. > If the upgrade-restart is fast, the wait is preferable to losing locality. > It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on upgrade-shutdown
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897509#comment-13897509 ] Kihwal Lee commented on HDFS-5583: -- The client-side logic will be done in HDFS-5924. If the client-side change is missing, the OOB ack will simply be treated as an error by clients. > Make DN send an OOB Ack on upgrade-shutdown > --- > > Key: HDFS-5583 > URL: https://issues.apache.org/jira/browse/HDFS-5583 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Kihwal Lee > > Add an ability for data nodes to send an OOB response in order to indicate an > upcoming upgrade-restart. The client should ignore the pipeline error from the > node for a configured amount of time and try to reconstruct the pipeline without > excluding the restarted node. If the node does not come back in time, > regular pipeline recovery should happen. > This feature is useful for applications that need to keep blocks local. > If the upgrade-restart is fast, the wait is preferable to losing locality. > It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5583) Make DN send an OOB Ack on upgrade-shutdown
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5583: - Summary: Make DN send an OOB Ack on upgrade-shutdown (was: Add OOB upgrade response and client-side logic for writes) > Make DN send an OOB Ack on upgrade-shutdown > --- > > Key: HDFS-5583 > URL: https://issues.apache.org/jira/browse/HDFS-5583 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kihwal Lee >Assignee: Kihwal Lee > > Add an ability for data nodes to send an OOB response in order to indicate an > upcoming upgrade-restart. The client should ignore the pipeline error from the > node for a configured amount of time and try to reconstruct the pipeline without > excluding the restarted node. If the node does not come back in time, > regular pipeline recovery should happen. > This feature is useful for applications that need to keep blocks local. > If the upgrade-restart is fast, the wait is preferable to losing locality. > It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
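The client-side behavior these HDFS-5583 updates describe (and HDFS-5924 will implement) amounts to a small decision rule: a node that announced a restart via the OOB ack is kept in the rebuilt pipeline for a bounded wait, while an ordinary error excludes the node as usual. A minimal, hypothetical sketch of that rule; the class name and ack statuses below are illustrative, not the actual DataTransferProtocol values:

```java
// Illustrative sketch (not actual HDFS code) of the pipeline-recovery decision:
// on OOB_RESTART the client waits for the node instead of excluding it, falling
// back to regular recovery once the configured restart window has elapsed.
class PipelineRecoveryPolicy {
    enum AckStatus { SUCCESS, ERROR, OOB_RESTART }

    private final long restartWaitMillis;

    PipelineRecoveryPolicy(long restartWaitMillis) {
        this.restartWaitMillis = restartWaitMillis;
    }

    /** Should the failed node be excluded from the rebuilt pipeline? */
    boolean excludeNode(AckStatus ack, long waitedMillis) {
        if (ack == AckStatus.OOB_RESTART && waitedMillis < restartWaitMillis) {
            return false; // node announced a restart: keep it and retry
        }
        return ack != AckStatus.SUCCESS; // ordinary error: exclude as usual
    }
}
```

This also shows why the client-side change matters: without it, OOB_RESTART falls through to the same branch as ERROR, i.e. the OOB ack is simply treated as an error.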
[jira] [Created] (HDFS-5924) Client-side OOB upgrade message processing for writes
Kihwal Lee created HDFS-5924: Summary: Client-side OOB upgrade message processing for writes Key: HDFS-5924 URL: https://issues.apache.org/jira/browse/HDFS-5924 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee Assignee: Kihwal Lee -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5810: --- Attachment: HDFS-5810.020.patch > Unify mmap cache and short-circuit file descriptor cache > > > Key: HDFS-5810 > URL: https://issues.apache.org/jira/browse/HDFS-5810 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 2.3.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, > HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, > HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch, > HDFS-5810.020.patch > > > We should unify the client mmap cache and the client file descriptor cache. > Since mmaps are granted corresponding to file descriptors in the cache > (currently FileInputStreamCache), they have to be tracked together to do > "smarter" things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout
[ https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897505#comment-13897505 ] Wilfred Spiegelenburg commented on HDFS-4858: - I agree with Aaron: the change could be confined to DatanodeProtocolClientSideTranslatorPB, leaving the Client as is. That will remove the chance of regressions in other areas that rely on the Client. Whether to use Client#getTimeout or Client#getPingInterval is up to you to decide. > HDFS DataNode to NameNode RPC should timeout > > > Key: HDFS-4858 > URL: https://issues.apache.org/jira/browse/HDFS-4858 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha > Environment: Redhat/CentOS 6.4 64 bit Linux >Reporter: Jagane Sundar >Assignee: Konstantin Boudnik >Priority: Minor > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-4858.patch, HDFS-4858.patch > > > The DataNode is configured with ipc.client.ping false and ipc.ping.interval > 14000. This configuration means that the IPC Client (DataNode, in this case) > should timeout in 14000 seconds if the Standby NameNode does not respond to a > sendHeartbeat. > What we observe is this: If the Standby NameNode happens to reboot for any > reason, the DataNodes that are heartbeating to this Standby get stuck forever > while trying to sendHeartbeat. See Stack trace included below. When the > Standby NameNode comes back up, we find that the DataNode never re-registers > with the Standby NameNode. Thereafter failover completely fails. > The desired behavior is that the DataNode's sendHeartbeat should timeout in > 14 seconds, and keep retrying till the Standby NameNode comes back up. When > it does, the DataNode should reconnect, re-register, and offer service. 
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the > method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to > create the DatanodeProtocolPB object. > Stack trace of thread stuck in the DataNode after the Standby NN has rebooted: > Thread 25 (DataNode: [file:///opt/hadoop/data] heartbeating to > vmhost6-vm1/10.10.10.151:8020): > State: WAITING > Blocked count: 23843 > Waited count: 45676 > Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5 > Stack: > java.lang.Object.wait(Native Method) > java.lang.Object.wait(Object.java:485) > org.apache.hadoop.ipc.Client.call(Client.java:1220) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) > sun.proxy.$Proxy10.sendHeartbeat(Unknown Source) > sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > java.lang.reflect.Method.invoke(Method.java:597) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) > sun.proxy.$Proxy10.sendHeartbeat(Unknown Source) > > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676) > java.lang.Thread.run(Thread.java:662) > DataNode RPC to Standby NameNode never times out. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5888) Cannot get the FileStatus of the root inode from the new Globber
[ https://issues.apache.org/jira/browse/HDFS-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897478#comment-13897478 ] Andrew Wang commented on HDFS-5888: --- I poked around to debug the failing test. It turns out we have some FC only code in TestGlobPaths#TestGlobFillsInScheme: {code} if (fc != null) { // If we're using FileContext, then we can list a file:/// URI. // Since everyone should have the root directory, we list that. statuses = wrap.globStatus(new Path("file:///"), new AcceptAllPathFilter()); Assert.assertEquals(1, statuses.length); Path filePath = statuses[0].getPath(); Assert.assertEquals("file", filePath.toUri().getScheme()); Assert.assertEquals("/", filePath.toUri().getPath()); } {code} The tricky part here is that the default filesystem for this FileContext is an HDFS, which is why Jenkins is picking up "localhost:port" for the authority in Globber#authorityFromPath: {code} authority = fc.getDefaultFileSystem().getUri().getAuthority(); {code} If I change it to this, the test passes: {code} authority = fc.getFSofPath(path).getUri().getAuthority(); {code} I think the error stems from how file:// URIs have a null authority, and we shouldn't fill it in. I think the fix is to use getFSofPath for both FC and FS in authorityFromPath. > Cannot get the FileStatus of the root inode from the new Globber > > > Key: HDFS-5888 > URL: https://issues.apache.org/jira/browse/HDFS-5888 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Andrew Wang >Assignee: Colin Patrick McCabe > Attachments: HDFS-5888.002.patch > > > We can no longer get the correct FileStatus of the root inode "/" from the > Globber. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
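Andrew's point about file:// URIs can be checked with plain java.net.URI, which parses "file:///" with a null authority; filling in the default filesystem's authority (e.g. "localhost:port" for HDFS) therefore produces a path that no longer refers to the local filesystem. A small demonstration, with a hypothetical helper name:

```java
import java.net.URI;

// Demonstrates the root cause discussed above: a "file:///" URI has no
// authority component, so java.net.URI reports it as null, unlike an
// hdfs:// URI whose authority is the namenode host:port.
class AuthorityDemo {
    /** Returns the authority component, or null when the URI has none. */
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }
}
```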
[jira] [Commented] (HDFS-5923) Do not persist the ACL bit in the FsPermission
[ https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897475#comment-13897475 ] Fengdong Yu commented on HDFS-5923: --- Thanks, Jing Zhao. Another question: HDFS-5968 serialized the FsImage using Protobuf; does that also serialize the ACL state? I don't think we've done that, because HDFS-4685 is not merged to trunk yet. > Do not persist the ACL bit in the FsPermission > -- > > Key: HDFS-5923 > URL: https://issues.apache.org/jira/browse/HDFS-5923 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode, security >Reporter: Haohui Mai >Assignee: Haohui Mai > > The current implementation persists an ACL bit in the FSImage and editlogs. > Moreover, the security decisions also depend on whether the bit is set. > The problem here is that we have to maintain the implicit invariant, which is > that the ACL bit is set if and only if the inode has an AclFeature. The invariant > has to be maintained everywhere, otherwise it can lead to a security > vulnerability. In the worst case, an attacker can toggle the bit and bypass > the ACL checks. > This jira proposes to treat the ACL bit as a transient bit. The bit should not > be persisted onto the disk, nor should it affect any security decisions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout
[ https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897468#comment-13897468 ] Aaron T. Myers commented on HDFS-4858: -- bq. It may seem a simple fix, but you are absolutely right that this will affect everything that is using Client.java, and there are a lot of things out there, such as your TaskTracker, that we don't know about but could break because of this. Why don't we open a separate jira for your proposal. If you want to do a small fix that is just isolated to the DN, and has no further-reaching implications, then my suggestion would be to remove the changes to {{Client}} from this patch and change the call to {{Client#getTimeout}} in {{DatanodeProtocolClientSideTranslatorPB#createNamenode}} to instead call {{Client#getPingInterval}}. This should have the same net effect for DN RPCs without possibly impacting anything else. > HDFS DataNode to NameNode RPC should timeout > > > Key: HDFS-4858 > URL: https://issues.apache.org/jira/browse/HDFS-4858 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha > Environment: Redhat/CentOS 6.4 64 bit Linux >Reporter: Jagane Sundar >Assignee: Konstantin Boudnik >Priority: Minor > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-4858.patch, HDFS-4858.patch > > > The DataNode is configured with ipc.client.ping false and ipc.ping.interval > 14000. This configuration means that the IPC Client (DataNode, in this case) > should timeout in 14000 seconds if the Standby NameNode does not respond to a > sendHeartbeat. > What we observe is this: If the Standby NameNode happens to reboot for any > reason, the DataNodes that are heartbeating to this Standby get stuck forever > while trying to sendHeartbeat. See Stack trace included below. When the > Standby NameNode comes back up, we find that the DataNode never re-registers > with the Standby NameNode. 
Thereafter failover completely fails. > The desired behavior is that the DataNode's sendHeartbeat should timeout in > 14 seconds, and keep retrying till the Standby NameNode comes back up. When > it does, the DataNode should reconnect, re-register, and offer service. > Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the > method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to > create the DatanodeProtocolPB object. > Stack trace of thread stuck in the DataNode after the Standby NN has rebooted: > Thread 25 (DataNode: [file:///opt/hadoop/data] heartbeating to > vmhost6-vm1/10.10.10.151:8020): > State: WAITING > Blocked count: 23843 > Waited count: 45676 > Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5 > Stack: > java.lang.Object.wait(Native Method) > java.lang.Object.wait(Object.java:485) > org.apache.hadoop.ipc.Client.call(Client.java:1220) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) > sun.proxy.$Proxy10.sendHeartbeat(Unknown Source) > sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > java.lang.reflect.Method.invoke(Method.java:597) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) > sun.proxy.$Proxy10.sendHeartbeat(Unknown Source) > > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676) > java.lang.Thread.run(Thread.java:662) > DataNode RPC to Standby NameNode never times 
out. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
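The fix direction discussed in this issue, creating the DN-to-NN proxy with a real rpcTimeout (RPC.getProtocolProxy rather than RPC.getProxy), boils down to bounding a blocking call so the heartbeat thread cannot wait forever, as in the stack trace above. A generic, Hadoop-free illustration of that behavior using a Future with a timeout; the helper name is hypothetical:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Generic illustration (not the Hadoop IPC code itself): run a blocking call,
// but give up after timeoutMillis instead of waiting on the reply forever.
class BoundedCall {
    static <T> T callWithTimeout(Callable<T> call, long timeoutMillis)
            throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<T> f = exec.submit(call);
            // Throws TimeoutException if the call does not finish in time,
            // which the caller can treat as a failed heartbeat and retry.
            return f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            exec.shutdownNow(); // interrupt the stuck call, like a socket timeout
        }
    }
}
```

With a bound like this, a heartbeat to a rebooting standby fails fast and the retry loop eventually re-registers with it, which is the desired behavior the report describes.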
[jira] [Commented] (HDFS-5923) Do not persist the ACL bit in the FsPermission
[ https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897466#comment-13897466 ] Jing Zhao commented on HDFS-5923: - Hi Fengdong, here Haohui refers to the ACL bit, not the whole ACL state. The ACL information will still be persisted in editlog and fsimage. > Do not persist the ACL bit in the FsPermission > -- > > Key: HDFS-5923 > URL: https://issues.apache.org/jira/browse/HDFS-5923 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode, security >Reporter: Haohui Mai >Assignee: Haohui Mai > > The current implementation persists an ACL bit in the FSImage and editlogs. > Moreover, the security decisions also depend on whether the bit is set. > The problem here is that we have to maintain the implicit invariant, which is > that the ACL bit is set if and only if the inode has an AclFeature. The invariant > has to be maintained everywhere, otherwise it can lead to a security > vulnerability. In the worst case, an attacker can toggle the bit and bypass > the ACL checks. > This jira proposes to treat the ACL bit as a transient bit. The bit should not > be persisted onto the disk, nor should it affect any security decisions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5923) Do not persist the ACL bit in the FsPermission
[ https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897462#comment-13897462 ] Fengdong Yu commented on HDFS-5923: --- Do all ACL settings disappear after a NN restart if we don't persist the ACL state in the FsImage? > Do not persist the ACL bit in the FsPermission > -- > > Key: HDFS-5923 > URL: https://issues.apache.org/jira/browse/HDFS-5923 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode, security >Reporter: Haohui Mai >Assignee: Haohui Mai > > The current implementation persists an ACL bit in the FSImage and editlogs. > Moreover, the security decisions also depend on whether the bit is set. > The problem here is that we have to maintain the implicit invariant, which is > that the ACL bit is set if and only if the inode has an AclFeature. The invariant > has to be maintained everywhere, otherwise it can lead to a security > vulnerability. In the worst case, an attacker can toggle the bit and bypass > the ACL checks. > This jira proposes to treat the ACL bit as a transient bit. The bit should not > be persisted onto the disk, nor should it affect any security decisions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5923) Do not persist the ACL bit in the FsPermission
Haohui Mai created HDFS-5923: Summary: Do not persist the ACL bit in the FsPermission Key: HDFS-5923 URL: https://issues.apache.org/jira/browse/HDFS-5923 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai The current implementation persists an ACL bit in the FSImage and editlogs. Moreover, the security decisions also depend on whether the bit is set. The problem here is that we have to maintain the implicit invariant, which is that the ACL bit is set if and only if the inode has an AclFeature. The invariant has to be maintained everywhere, otherwise it can lead to a security vulnerability. In the worst case, an attacker can toggle the bit and bypass the ACL checks. This jira proposes to treat the ACL bit as a transient bit. The bit should not be persisted onto the disk, nor should it affect any security decisions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
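The proposal can be pictured as masking the flag out of the permission word on every write to disk, so the in-memory bit can never leak into the FSImage or edit log and the "bit set iff AclFeature present" invariant does not have to be enforced at every call site. The sketch below is purely illustrative: the bit position and names are made up, not the real FsPermission layout.

```java
// Hypothetical sketch of a transient ACL flag: it can be set on the in-memory
// permission word, but the persisted form always has it stripped.
class PermissionBits {
    // Classic rwxrwxrwx fits in 9 bits; use a higher bit for the transient flag.
    static final short ACL_BIT = (short) (1 << 12);

    /** Set the in-memory ACL flag. */
    static short withAclBit(short perm) {
        return (short) (perm | ACL_BIT);
    }

    /** The form that gets written to disk: the transient flag is always masked out. */
    static short toPersisted(short perm) {
        return (short) (perm & ~ACL_BIT);
    }
}
```

Because toPersisted is applied unconditionally on the serialization path, an attacker toggling the bit in a stored image gains nothing: the bit is recomputed from the presence of the AclFeature on load, never read from disk.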
[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout
[ https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897452#comment-13897452 ] Konstantin Shvachko commented on HDFS-4858: --- Ok, sounds like you don't want it fixed in this release. > HDFS DataNode to NameNode RPC should timeout > > > Key: HDFS-4858 > URL: https://issues.apache.org/jira/browse/HDFS-4858 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha > Environment: Redhat/CentOS 6.4 64 bit Linux >Reporter: Jagane Sundar >Assignee: Konstantin Boudnik >Priority: Minor > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-4858.patch, HDFS-4858.patch > > > The DataNode is configured with ipc.client.ping false and ipc.ping.interval > 14000. This configuration means that the IPC Client (DataNode, in this case) > should timeout in 14000 seconds if the Standby NameNode does not respond to a > sendHeartbeat. > What we observe is this: If the Standby NameNode happens to reboot for any > reason, the DataNodes that are heartbeating to this Standby get stuck forever > while trying to sendHeartbeat. See Stack trace included below. When the > Standby NameNode comes back up, we find that the DataNode never re-registers > with the Standby NameNode. Thereafter failover completely fails. > The desired behavior is that the DataNode's sendHeartbeat should timeout in > 14 seconds, and keep retrying till the Standby NameNode comes back up. When > it does, the DataNode should reconnect, re-register, and offer service. > Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the > method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to > create the DatanodeProtocolPB object. 
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted: > Thread 25 (DataNode: [file:///opt/hadoop/data] heartbeating to > vmhost6-vm1/10.10.10.151:8020): > State: WAITING > Blocked count: 23843 > Waited count: 45676 > Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5 > Stack: > java.lang.Object.wait(Native Method) > java.lang.Object.wait(Object.java:485) > org.apache.hadoop.ipc.Client.call(Client.java:1220) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) > sun.proxy.$Proxy10.sendHeartbeat(Unknown Source) > sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > java.lang.reflect.Method.invoke(Method.java:597) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) > sun.proxy.$Proxy10.sendHeartbeat(Unknown Source) > > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676) > java.lang.Thread.run(Thread.java:662) > DataNode RPC to Standby NameNode never times out. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897447#comment-13897447 ] Colin Patrick McCabe commented on HDFS-5810: munmap is going to be manipulating things in memory; mmap often has to hit disk. That's why the latter is more expensive. Recent Linux kernels have more fine-grained locking in this area, although I'm not an expert on that area of the kernel. We can't do I/O while holding a global client-side lock-- clients like HBase have on the order of 10k open files and we don't want to block everyone. bq. ClientContext#getFromConf, can we push the creation of a new DFSClient.Conf into #get when it's necessary? Seems better to avoid doing all those hash lookups. That method is really only for tests, where it's inconvenient to dig around to get a DFSClient.Conf. I will add a comment explaining that this is mostly for testing. (I think JspHelper uses it too.) bq. We removed the javadoc parameter descriptions in a few places, some of which were helpful (e.g. len of -1 means read as many bytes as possible). Could we add the one-line docs back to the builder variables? Good idea. I added javadoc for the BlockReaderFactory members. bq. Mind adding "dfs.client.cached.conn.retry" to hdfs-default.xml? OK. bq. cacheTries now counts down instead of counting up, so I think it needs a new name. cacheTriesRemaining isn't great, but something like that. ok bq. cacheTries used to also only tick when we got a stale peer out of the cache. Now, nextTcpPeer and nextDomainPeer tick cacheTries unconditionally. The effect is the same, since if we get a non-stale (i.e. usable) peer out of the cache, we're done. Centralizing it is a good idea since it avoids the kind of bugs we had in the past where we forgot to handle certain kinds of retries correctly. bq. Previously, we would disable domain sockets or throw an exception if we hit an error when using a new Peer (domain or TCP respectively). 
Now, we don't know if a peer is cached or new, and spin until we run out of cacheTries (which isn't really related here). OK, that's fair. That variable is supposed to be about how many times we'll try the *cache*, not how many times we'll retry in general. Fixed. > Unify mmap cache and short-circuit file descriptor cache > > > Key: HDFS-5810 > URL: https://issues.apache.org/jira/browse/HDFS-5810 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 2.3.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, > HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, > HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch > > > We should unify the client mmap cache and the client file descriptor cache. > Since mmaps are granted corresponding to file descriptors in the cache > (currently FileInputStreamCache), they have to be tracked together to do > "smarter" things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout
[ https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897445#comment-13897445 ] Aaron T. Myers commented on HDFS-4858: -- Even if we choose to not fix this in a more general way in this JIRA, I don't think we should be changing the default behavior of whether or not to do client pings in this patch. That change also has the potential to affect things well beyond HDFS. > HDFS DataNode to NameNode RPC should timeout > > > Key: HDFS-4858 > URL: https://issues.apache.org/jira/browse/HDFS-4858 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha > Environment: Redhat/CentOS 6.4 64 bit Linux >Reporter: Jagane Sundar >Assignee: Konstantin Boudnik >Priority: Minor > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-4858.patch, HDFS-4858.patch > > > The DataNode is configured with ipc.client.ping false and ipc.ping.interval > 14000. This configuration means that the IPC Client (DataNode, in this case) > should timeout in 14000 seconds if the Standby NameNode does not respond to a > sendHeartbeat. > What we observe is this: If the Standby NameNode happens to reboot for any > reason, the DataNodes that are heartbeating to this Standby get stuck forever > while trying to sendHeartbeat. See Stack trace included below. When the > Standby NameNode comes back up, we find that the DataNode never re-registers > with the Standby NameNode. Thereafter failover completely fails. > The desired behavior is that the DataNode's sendHeartbeat should timeout in > 14 seconds, and keep retrying till the Standby NameNode comes back up. When > it does, the DataNode should reconnect, re-register, and offer service. > Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the > method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to > create the DatanodeProtocolPB object. 
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted: > Thread 25 (DataNode: [file:///opt/hadoop/data] heartbeating to > vmhost6-vm1/10.10.10.151:8020): > State: WAITING > Blocked count: 23843 > Waited count: 45676 > Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5 > Stack: > java.lang.Object.wait(Native Method) > java.lang.Object.wait(Object.java:485) > org.apache.hadoop.ipc.Client.call(Client.java:1220) > > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) > sun.proxy.$Proxy10.sendHeartbeat(Unknown Source) > sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > java.lang.reflect.Method.invoke(Method.java:597) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) > sun.proxy.$Proxy10.sendHeartbeat(Unknown Source) > > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525) > > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676) > java.lang.Thread.run(Thread.java:662) > DataNode RPC to Standby NameNode never times out. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout
[ https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897431#comment-13897431 ] Konstantin Shvachko commented on HDFS-4858: --- I understand now what you mean. So this is the same problem, but you want to fix it in a generic way. It may seem a simple fix, but you are absolutely right that this will affect everything that is using Client.java, and there are a lot of things out there, such as your TaskTracker, which we don't know about but could break because of it. Why don't we open a separate jira for your proposal? Are you OK with committing this? My +1
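The stack trace above shows the heartbeat RPC parked in Object.wait() with no deadline. The effect of adding a timeout can be sketched outside Hadoop's RPC machinery with a plain socket read: a read deadline turns an indefinite wait into an exception the heartbeat loop can catch and retry, which is the behavior the report asks for. Class and method names below are illustrative, not DataNode code.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutSketch {
    // Returns true when the read gives up after timeoutMs instead of blocking
    // forever, mirroring what an RPC-level timeout would do for the stuck
    // sendHeartbeat in the stack trace above.
    static boolean readWithTimeout(int timeoutMs) {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
            client.setSoTimeout(timeoutMs); // without this, read() can wait indefinitely
            client.getInputStream().read(); // the peer never writes anything
            return false;
        } catch (SocketTimeoutException expected) {
            return true;                    // caller can now retry / re-register
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithTimeout(200)); // prints "true"
    }
}
```

With no timeout set, the same read blocks until the connection dies, which is exactly the WAITING state captured in the thread dump.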
[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5920: Description: This jira provides rollback functionality for NameNode and JournalNode in rolling upgrade. Currently the proposed rollback for rolling upgrade is: 1. Shutdown both NN 2. Start one of the NN using "-rollingUpgrade rollback" option 3. This NN will load the special fsimage right before the upgrade marker, then discard all the editlog segments after the txid of the fsimage 4. The NN will also send RPC requests to all the JNs to discard editlog segments. This call expects response from all the JNs. The NN will keep running if the call succeeds. 5. We start the other NN using bootstrapstandby rather than "-rollingUpgrade rollback" was:This jira provides rollback functionality for NameNode and JournalNode in rolling upgrade. > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch, > HDFS-5920.001.patch > > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. > Currently the proposed rollback for rolling upgrade is: > 1. Shutdown both NN > 2. Start one of the NN using "-rollingUpgrade rollback" option > 3. This NN will load the special fsimage right before the upgrade marker, > then discard all the editlog segments after the txid of the fsimage > 4. The NN will also send RPC requests to all the JNs to discard editlog > segments. This call expects response from all the JNs. The NN will keep > running if the call succeeds. > 5. We start the other NN using bootstrapstandby rather than "-rollingUpgrade > rollback" -- This message was sent by Atlassian JIRA (v6.1.5#6160)
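Steps 3 and 4 of the proposal reduce to a pruning rule over editlog segments: after loading the pre-upgrade fsimage, discard every segment that starts after the image's txid. A minimal sketch of that rule, with a hypothetical helper name rather than the actual NameNode/JournalNode API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RollbackSketch {
    // Hypothetical helper: keep only the editlog segments whose starting
    // transaction id is at or below the txid of the rollback fsimage,
    // as in steps 3-4 of the rollback proposal above.
    static List<Long> discardSegmentsAfter(List<Long> segmentStartTxIds, long imageTxId) {
        List<Long> kept = new ArrayList<>();
        for (long start : segmentStartTxIds) {
            if (start <= imageTxId) {
                kept.add(start);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // image at txid 100; segments starting at txids 1, 50, 101, 200
        System.out.println(discardSegmentsAfter(Arrays.asList(1L, 50L, 101L, 200L), 100L));
        // prints "[1, 50]"
    }
}
```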
[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5920: Attachment: HDFS-5920.001.patch Update the patch: # address Suresh's comments # add unit tests for JN's rollback # fix a bug in JN to update the committedTxnId after discarding journal segments.
[jira] [Commented] (HDFS-5914) Incorporate ACLs with the changes from HDFS-5698
[ https://issues.apache.org/jira/browse/HDFS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897393#comment-13897393 ] Haohui Mai commented on HDFS-5914: -- Thanks Chris for the comments. The v1 patch no longer serializes the ACLs for a symlink. Based on the discussion of HDFS-5899, this patch removes the {{TestAclConfigFlag#testFsImage}} test. > Incorporate ACLs with the changes from HDFS-5698 > > > Key: HDFS-5914 > URL: https://issues.apache.org/jira/browse/HDFS-5914 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode, security >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5914.000.patch, HDFS-5914.001.patch > > > HDFS-5698 uses protobuf to serialize the FSImage. The code needs to be > updated to work with these changes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5914) Incorporate ACLs with the changes from HDFS-5698
[ https://issues.apache.org/jira/browse/HDFS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5914: - Attachment: HDFS-5914.001.patch
[jira] [Commented] (HDFS-5899) Add configuration flag to disable/enable support for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897387#comment-13897387 ] Chris Nauroth commented on HDFS-5899: - Both [~cmccabe] and [~wheat9] have expressed concerns about causing pain for administrators if we have code that aborts intentionally while loading fsimage or edits, so I think I need to reconsider this. Regarding skipping enforcement, my concern is the risk of unintentionally widening permissions due to interactions with the mask entry. (The full explanation is in my prior comment.) Here is a compromise proposal. Let's reject the API calls when {{dfs.namenode.acls.enabled}} is false, but let's still load *and enforce* all existing ACLs found in fsimage or edits. I expect that addresses the concerns about administrative pain, and it addresses my concerns about weakening enforcement. This does mean that the config flag is not a hard restriction, but admins who really want to nuke all ACLs can still use the procedure I described, and I expect this to be a rare occurrence. It looks like an acceptable compromise to me. Do others agree? If so, then I'll file a new issue for the change. Thank you, Colin and Haohui. > Add configuration flag to disable/enable support for ACLs. > -- > > Key: HDFS-5899 > URL: https://issues.apache.org/jira/browse/HDFS-5899 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: HDFS ACLs (HDFS-4685) > > Attachments: HDFS-5899.1.patch, HDFS-5899.2.patch > > > Add a new configuration property that allows administrators to toggle support > for HDFS ACLs on/off. By default, the flag will be off. This is a > conservative choice, and administrators interested in using ACLs can enable > it explicitly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
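The compromise can be stated as a small decision table: the flag gates only the ACL-mutating API calls, never the loading or enforcement of ACLs already on disk. A hedged model of that policy (names are illustrative, not the NameNode's code):

```java
public class AclFlagSketch {
    enum Op { SET_ACL_RPC, LOAD_EXISTING_ACL, ENFORCE_EXISTING_ACL }

    // Model of the compromise above: with dfs.namenode.acls.enabled set to
    // false, ACL-modifying RPCs are rejected, but ACLs already present in
    // the fsimage or edits are still loaded and enforced.
    static boolean permitted(Op op, boolean aclsEnabled) {
        switch (op) {
            case SET_ACL_RPC:
                return aclsEnabled;  // reject the API call when the flag is off
            default:
                return true;         // existing ACLs survive and stay enforced
        }
    }

    public static void main(String[] args) {
        System.out.println(permitted(Op.SET_ACL_RPC, false));          // prints "false"
        System.out.println(permitted(Op.ENFORCE_EXISTING_ACL, false)); // prints "true"
    }
}
```

This is why the flag is not a hard restriction: nothing in the "disabled" column ever weakens enforcement of ACLs that were set while the flag was on.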
[jira] [Resolved] (HDFS-95) UnknownHostException if the system can't determine its own name and you go DNS.getIPs("name-of-an-unknown-interface");
[ https://issues.apache.org/jira/browse/HDFS-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HDFS-95. Resolution: Fixed Fix Version/s: 0.21.0 Assignee: Steve Loughran fixed in HADOOP-3426 > UnknownHostException if the system can't determine its own name and you go > DNS.getIPs("name-of-an-unknown-interface"); > -- > > Key: HDFS-95 > URL: https://issues.apache.org/jira/browse/HDFS-95 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 0.21.0 > > > If you give an interface that doesn't exist, DNS.getIPs falls back to > InetAddress.getLocalHost().getHostAddress() > But there's an assumption there: that InetAddress.getLocalHost() is valid. > If it doesn't resolve properly, you get an UnknownHostException > java.net.UnknownHostException: k2: k2 > at java.net.InetAddress.getLocalHost(InetAddress.java:1353) > at org.apache.hadoop.net.DNS.getIPs(DNS.java:96) > at > org.apache.hadoop.net.TestDNS.testIPsOfUnknownInterface(TestDNS.java:73) > It is possible to catch this and return something else. The big question: > what to fall back to? 127.0.0.1 would be an obvious choice -- This message was sent by Atlassian JIRA (v6.1.5#6160)
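The fallback floated at the end of the report can be sketched as a guarded lookup that degrades to loopback instead of throwing. This is illustrative code, not the HADOOP-3426 fix itself:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsFallbackSketch {
    // Sketch of the fallback discussed above: if the host cannot resolve its
    // own name, return the loopback address instead of propagating
    // UnknownHostException to the caller.
    static String localAddressOrLoopback() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return "127.0.0.1"; // the "obvious choice" suggested in the issue
        }
    }

    public static void main(String[] args) {
        // Always yields a usable dotted-quad string, even on a host like
        // "k2" from the report whose own name does not resolve.
        System.out.println(localAddressOrLoopback().isEmpty()); // prints "false"
    }
}
```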
[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897377#comment-13897377 ] Hudson commented on HDFS-5921: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5142 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5142/]) HDFS-5921. Cannot browse file system via NN web UI if any directory has the sticky bit set. Contributed by Aaron T. Myers. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1566916) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/explorer.js > Cannot browse file system via NN web UI if any directory has the sticky bit > set > --- > > Key: HDFS-5921 > URL: https://issues.apache.org/jira/browse/HDFS-5921 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Critical > Fix For: 2.3.0 > > Attachments: HDFS-5921.patch, HDFS-5921.patch > > > You'll see an error like this in the JS console if any directory has the > sticky bit set: > {noformat} > 'helper_to_permission': function(chunk, ctx, bodies, params) { > > var exec = ((parms.perm % 10) & 1) == 1; > Uncaught ReferenceError: parms is not defined > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
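Aside from the parms/params typo, the helper's arithmetic works on the permission rendered as decimal digits (e.g. 755, or 1755 when the sticky bit is set); that representation is an assumption on my part. The decoding is sketched in Java rather than the explorer.js original:

```java
public class PermSketch {
    // The helper in explorer.js derives the "other" execute bit from the last
    // octal digit: ((perm % 10) & 1) == 1. The crash reported above was only
    // the parms/params typo; the arithmetic itself is reproduced here.
    static boolean otherExec(int perm) {
        return ((perm % 10) & 1) == 1;
    }

    // A leading fourth digit of 1 marks the sticky bit, e.g. 1755.
    static boolean stickyBit(int perm) {
        return perm / 1000 == 1;
    }

    public static void main(String[] args) {
        System.out.println(otherExec(755));   // prints "true"  (…r-x for other)
        System.out.println(stickyBit(1755));  // prints "true"  (sticky bit set)
    }
}
```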
[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5921: - Resolution: Fixed Fix Version/s: 2.3.0 Target Version/s: 2.3.0 (was: 2.4.0) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've just committed this to trunk, branch-2, and branch-2.3.
[jira] [Commented] (HDFS-5888) Cannot get the FileStatus of the root inode from the new Globber
[ https://issues.apache.org/jira/browse/HDFS-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897358#comment-13897358 ] Hadoop QA commented on HDFS-5888: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627683/HDFS-5888.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.TestGlobPaths {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6102//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6102//console This message is automatically generated. > Cannot get the FileStatus of the root inode from the new Globber > > > Key: HDFS-5888 > URL: https://issues.apache.org/jira/browse/HDFS-5888 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Andrew Wang >Assignee: Colin Patrick McCabe > Attachments: HDFS-5888.002.patch > > > We can no longer get the correct FileStatus of the root inode "/" from the > Globber. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897357#comment-13897357 ] Aaron T. Myers commented on HDFS-5921: -- Since Jenkins came back clean I'm going to go ahead and commit this based on Andrew and Haohui's +1's. Thanks for the quick reviews, gents.
[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897297#comment-13897297 ] Hadoop QA commented on HDFS-5921: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628058/HDFS-5921.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6101//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6101//console This message is automatically generated.
[jira] [Commented] (HDFS-5922) DN heartbeat thread can get stuck in tight loop
[ https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897282#comment-13897282 ] Arpit Agarwal commented on HDFS-5922: - That sounds fine too. Thanks. > DN heartbeat thread can get stuck in tight loop > --- > > Key: HDFS-5922 > URL: https://issues.apache.org/jira/browse/HDFS-5922 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Arpit Agarwal > > We saw an issue recently on a test cluster where one of the DN threads was > consuming 100% of a single CPU. Running jstack indicated that it was the DN > heartbeat thread. I believe I've tracked down the cause to a bug in the > accounting around the value of {{pendingReceivedRequests}}. > More details in the first comment. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads
[ https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897283#comment-13897283 ] Hudson commented on HDFS-5915: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5141 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5141/]) HDFS-5915. Refactor FSImageFormatProtobuf to simplify cross section reads. Contributed by Haohui Mai. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1566824) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeduplicationMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageStorageInspector.java > Refactor FSImageFormatProtobuf to simplify cross section reads > -- > > Key: HDFS-5915 > URL: https://issues.apache.org/jira/browse/HDFS-5915 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0 >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 3.0.0 > > Attachments: HDFS-5915.000.patch, HDFS-5915.001.patch > > > The PB-based FSImage puts the user name and the group name into a separate > section for deduplication. This jira refactors the code so that it is easier > to apply the same techniques for other types of data (e.g., > {{INodeReference}}) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
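The deduplication the description refers to can be sketched as a simple string-to-id map: each distinct name is serialized once in its own section and referenced by id everywhere else. A generic illustration, not the actual FSImageFormatProtobuf code:

```java
import java.util.HashMap;
import java.util.Map;

public class DedupMapSketch {
    // Assigns a stable small id to each distinct string; repeated entries
    // reuse the id, so the image stores each user/group name only once.
    private final Map<String, Integer> ids = new HashMap<>();

    int getId(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = ids.size() + 1; // id 0 left reserved for "absent"
            ids.put(s, id);
        }
        return id;
    }

    public static void main(String[] args) {
        DedupMapSketch m = new DedupMapSketch();
        System.out.println(m.getId("hdfs"));   // prints "1"
        System.out.println(m.getId("hadoop")); // prints "2"
        System.out.println(m.getId("hdfs"));   // prints "1" again: deduplicated
    }
}
```

The refactoring in this JIRA is about making the same id-indirection reusable for other record types such as {{INodeReference}}.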
[jira] [Updated] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads
[ https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5915: Affects Version/s: 3.0.0
[jira] [Updated] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads
[ https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5915: Resolution: Fixed Fix Version/s: 3.0.0 Target Version/s: 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 I committed this to trunk. Haohui, thank you for the patch.
[jira] [Updated] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads
[ https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5915: Component/s: namenode
[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897176#comment-13897176 ] Hadoop QA commented on HDFS-5810: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628100/HDFS-5810.019.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6103//console This message is automatically generated. > Unify mmap cache and short-circuit file descriptor cache > > > Key: HDFS-5810 > URL: https://issues.apache.org/jira/browse/HDFS-5810 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 2.3.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, > HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, > HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch > > > We should unify the client mmap cache and the client file descriptor cache. > Since mmaps are granted corresponding to file descriptors in the cache > (currently FileInputStreamCache), they have to be tracked together to do > "smarter" things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897170#comment-13897170 ] Andrew Wang commented on HDFS-5810: --- Hi Colin, some replies and new comments. I looked at the remaining parts of the previous patch, I haven't looked at the newest rev yet: Replies: bq. Sure, let's just doc it. bq. polymorphic Object in SCReplica Sure, this is just a style nit. If you tried it the other way and it looked worse, it's fine to leave it as is. bq. I guess this is for my own edification, but isn't munmap going to be approximately the same cost as mmap? Both involve updating the page tables and a TLB flush AFAIK, which should be order microseconds. This could be pushed up to milliseconds if the page tables are swapped out, but that's again an issue for both. I'd like to be internally consistent with regard to our locking, if it's a performance argument. Overall, I feel like microseconds are not a big deal, and mmap/munmap themselves have to grab a kernel lock. The code savings from removing the CV also aren't bad, since we could reduce the polymorphism of SCReplica#mmapData. Some new comments too (I think I've looked at all the changed files at this point): ClientContext: * ClientContext#confAsString has a dupe of socketCacheExpiry. Do we also need the mmap cache settings here? * ClientContext#getFromConf, can we push the creation of a new DFSClient.Conf into #get when it's necessary? Seems better to avoid doing all those hash lookups. BlockReaderFactory: * We removed the javadoc parameter descriptions in a few places, some of which were helpful (e.g. {{len}} of {{-1}} means read as many bytes as possible). Could we add the one-line docs back to the builder variables? * Mind adding "dfs.client.cached.conn.retry" to hdfs-default.xml? * cacheTries now counts down instead of counting up, so I think it needs a new name. cacheTriesRemaining isn't great, but something like that. 
* cacheTries used to also only tick when we got a stale peer out of the cache. Now, nextTcpPeer and nextDomainPeer tick cacheTries unconditionally. * Previously, we would disable domain sockets or throw an exception if we hit an error when using a new Peer (domain or TCP respectively). Now, we don't know if a peer is cached or new, and spin until we run out of cacheTries (which isn't really related here).
[jira] [Work started] (HDFS-5922) DN heartbeat thread can get stuck in tight loop
[ https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5922 started by Arpit Agarwal.
[jira] [Assigned] (HDFS-5922) DN heartbeat thread can get stuck in tight loop
[ https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HDFS-5922: --- Assignee: Arpit Agarwal
[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897159#comment-13897159 ] Colin Patrick McCabe commented on HDFS-5810: I uploaded a new version which is rebased on trunk. It changes the "caller strings" for dumping stack traces, uses {{dfs.client.read.shortcircuit.streams.cache.size}} as an upper bound on the size of both mmapped and non-mmapped replicas, and uses {{TimeUnit}} for time conversions. I changed the handling of {{outstandingMmapCount}} a little bit. Although we still track this stat, we don't try to cap the number of outstanding mmaps. That is up to the caller code, not to us. This is similar to how we handle opening new FDs in general... we do it on request, no matter how many existing FDs there are. Only when something is returned to the cache do we apply the limits. > Unify mmap cache and short-circuit file descriptor cache > > > Key: HDFS-5810 > URL: https://issues.apache.org/jira/browse/HDFS-5810 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 2.3.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, > HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, > HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch > > > We should unify the client mmap cache and the client file descriptor cache. > Since mmaps are granted corresponding to file descriptors in the cache > (currently FileInputStreamCache), they have to be tracked together to do > "smarter" things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
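The caching policy Colin describes above (open new descriptors on request no matter how many are outstanding; apply the size limit only when something is returned to the cache) can be sketched generically. This is a hypothetical illustration, not the HDFS-5810 patch; class and method names are invented:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a cache that enforces its bound only on release.
// Acquiring never blocks or fails due to the cap, mirroring the described
// policy for short-circuit FDs: limits apply only to idle cached entries.
class EvictOnReleaseCache<T> {
    private final int maxIdle;
    private final Deque<T> idle = new ArrayDeque<>();

    EvictOnReleaseCache(int maxIdle) {
        this.maxIdle = maxIdle;
    }

    // Take a cached entry if one exists; otherwise the caller creates a
    // new one. No limit is checked here, however many are outstanding.
    T acquireOrNull() {
        return idle.pollFirst();
    }

    // Return an entry to the cache; if the cap is exceeded, evict the
    // oldest idle entry and hand it back to the caller to be closed.
    T release(T entry) {
        idle.addLast(entry);
        return idle.size() > maxIdle ? idle.pollFirst() : null;
    }

    int idleCount() {
        return idle.size();
    }
}
```

Under this sketch, {{dfs.client.read.shortcircuit.streams.cache.size}} would play the role of {{maxIdle}}: a bound on cached (idle) replicas, not on how many a client may have open.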
[jira] [Commented] (HDFS-5922) DN heartbeat thread can get stuck in tight loop
[ https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897155#comment-13897155 ] Aaron T. Myers commented on HDFS-5922: -- Hi Arpit, yes please do take a look at fixing it. I was hoping you'd notice it since I'm less familiar with this code. :) I didn't file it as a blocker against 2.3 because the window for hitting this is really quite narrow, it's not the end of the world if a DN ends up hitting this, and I don't want to further hold up the 2.3.0 release. I personally think we should target this for 2.3.1 / 2.4.0. That said, if you think this is more serious than I do, then we can certainly raise the priority and target it for 2.3.0 if you want. > DN heartbeat thread can get stuck in tight loop > --- > > Key: HDFS-5922 > URL: https://issues.apache.org/jira/browse/HDFS-5922 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers > > We saw an issue recently on a test cluster where one of the DN threads was > consuming 100% of a single CPU. Running jstack indicated that it was the DN > heartbeat thread. I believe I've tracked down the cause to a bug in the > accounting around the value of {{pendingReceivedRequests}}. > More details in the first comment. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5810: --- Attachment: HDFS-5810.019.patch > Unify mmap cache and short-circuit file descriptor cache > > > Key: HDFS-5810 > URL: https://issues.apache.org/jira/browse/HDFS-5810 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 2.3.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, > HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, > HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch > > > We should unify the client mmap cache and the client file descriptor cache. > Since mmaps are granted corresponding to file descriptors in the cache > (currently FileInputStreamCache), they have to be tracked together to do > "smarter" things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5922) DN heartbeat thread can get stuck in tight loop
[ https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897151#comment-13897151 ] Arpit Agarwal commented on HDFS-5922: - Hi Aaron, Good catch and thanks for the detailed explanation. I can fix it today if you haven't started. This probably needs to be in 2.3. > DN heartbeat thread can get stuck in tight loop > --- > > Key: HDFS-5922 > URL: https://issues.apache.org/jira/browse/HDFS-5922 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers > > We saw an issue recently on a test cluster where one of the DN threads was > consuming 100% of a single CPU. Running jstack indicated that it was the DN > heartbeat thread. I believe I've tracked down the cause to a bug in the > accounting around the value of {{pendingReceivedRequests}}. > More details in the first comment. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5922) DN heartbeat thread can get stuck in tight loop
[ https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897137#comment-13897137 ] Aaron T. Myers commented on HDFS-5922: --
In the heartbeat thread in BPServiceActor, we have the following:
{code}
if (waitTime > 0 && pendingReceivedRequests == 0) {
  try {
    pendingIncrementalBRperStorage.wait(waitTime);
{code}
This means that if for some reason the value of {{pendingReceivedRequests}} permanently stays positive then we will never sleep in the heartbeat thread. The question, then, is what can cause this value to stay positive. I believe the issue is that in {{BPServiceActor#addPendingReplicationBlockInfo}} we might not increase the size of the {{PerStoragePendingIncrementalBR}} if there is already an entry for a given block in there:
{code}
// Make sure another entry for the same block is first removed.
// There may only be one such entry.
for (Map.Entry entry : pendingIncrementalBRperStorage.entrySet()) {
  if (entry.getValue().removeBlockInfo(bInfo)) {
    break;
  }
}
getIncrementalBRMapForStorage(storageUuid).putBlockInfo(bInfo);
{code}
But in {{BPServiceActor#notifyNamenodeBlockImmediately}} we will always increment {{pendingReceivedRequests}} regardless of whether or not there was already an entry for the block:
{code}
void notifyNamenodeBlockImmediately(
    ReceivedDeletedBlockInfo bInfo, String storageUuid) {
  synchronized (pendingIncrementalBRperStorage) {
    addPendingReplicationBlockInfo(bInfo, storageUuid);
    pendingReceivedRequests++;
    pendingIncrementalBRperStorage.notifyAll();
  }
}
{code}
Then, in {{BPServiceActor#reportReceivedDeletedBlocks}}, we will only subtract the number of blocks that are actually in the {{PerStoragePendingIncrementalBR}} from {{pendingReceivedRequests}}:
{code}
ReceivedDeletedBlockInfo[] rdbi = perStorageMap.dequeueBlockInfos();
pendingReceivedRequests = (pendingReceivedRequests > rdbi.length ?
    (pendingReceivedRequests - rdbi.length) : 0);
{code}
This means that if we ever call {{BPServiceActor#notifyNamenodeBlockImmediately}} twice without calling {{BPServiceActor#reportReceivedDeletedBlocks}} in between, we will have {{pendingReceivedRequests}} at 2, but then only subtract 1 from it. [~andrew.wang] also pointed out offline that it is perhaps incorrect to be subtracting the number of _deleted_ blocks from {{pendingReceivedRequests}} in {{BPServiceActor#reportReceivedDeletedBlocks}}, but the result of that is somewhat less serious, since in that case the worst case is just that we send a somewhat delayed IBR. > DN heartbeat thread can get stuck in tight loop > --- > > Key: HDFS-5922 > URL: https://issues.apache.org/jira/browse/HDFS-5922 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers > > We saw an issue recently on a test cluster where one of the DN threads was > consuming 100% of a single CPU. Running jstack indicated that it was the DN > heartbeat thread. I believe I've tracked down the cause to a bug in the > accounting around the value of {{pendingReceivedRequests}}. > More details in the first comment. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
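The accounting drift described above can be reproduced in miniature. The following is a simplified, hypothetical model (not the actual BPServiceActor code; all names are invented): a counter incremented on every notification, paired with a queue that deduplicates entries, will stay permanently positive once the same block is notified twice between reports.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the pendingReceivedRequests accounting bug: the
// queue deduplicates per-block entries, but the buggy notify path bumps
// the counter unconditionally, so a duplicate notification leaves the
// counter one higher than the number of queued entries forever.
class PendingBrModel {
    private final Set<String> pendingBlocks = new LinkedHashSet<>();
    int pendingReceivedRequests = 0;

    // Buggy behavior: increment even when the block was already queued.
    void notifyBlockBuggy(String blockId) {
        pendingBlocks.remove(blockId); // remove any existing entry first
        pendingBlocks.add(blockId);
        pendingReceivedRequests++;     // unconditional increment
    }

    // Fixed behavior: only count notifications that add a new entry.
    void notifyBlockFixed(String blockId) {
        boolean wasQueued = pendingBlocks.remove(blockId);
        pendingBlocks.add(blockId);
        if (!wasQueued) {
            pendingReceivedRequests++;
        }
    }

    // Report: subtract only the number of entries actually dequeued,
    // flooring at zero, as in the quoted reportReceivedDeletedBlocks code.
    void report() {
        int dequeued = pendingBlocks.size();
        pendingBlocks.clear();
        pendingReceivedRequests =
            Math.max(pendingReceivedRequests - dequeued, 0);
    }
}
```

With the buggy path, two notifications for the same block followed by one report leave the counter at 1 with an empty queue, which is exactly the state in which the heartbeat loop never waits.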
[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897124#comment-13897124 ] Hadoop QA commented on HDFS-5921: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628038/HDFS-5921.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6100//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6100//console This message is automatically generated. > Cannot browse file system via NN web UI if any directory has the sticky bit > set > --- > > Key: HDFS-5921 > URL: https://issues.apache.org/jira/browse/HDFS-5921 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. 
Myers >Priority: Critical > Attachments: HDFS-5921.patch, HDFS-5921.patch > > > You'll see an error like this in the JS console if any directory has the > sticky bit set: > {noformat} > 'helper_to_permission': function(chunk, ctx, bodies, params) { > > var exec = ((parms.perm % 10) & 1) == 1; > Uncaught ReferenceError: parms is not defined > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5922) DN heartbeat thread can get stuck in tight loop
Aaron T. Myers created HDFS-5922: Summary: DN heartbeat thread can get stuck in tight loop Key: HDFS-5922 URL: https://issues.apache.org/jira/browse/HDFS-5922 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.3.0 Reporter: Aaron T. Myers We saw an issue recently on a test cluster where one of the DN threads was consuming 100% of a single CPU. Running jstack indicated that it was the DN heartbeat thread. I believe I've tracked down the cause to a bug in the accounting around the value of {{pendingReceivedRequests}}. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5914) Incorporate ACLs with the changes from HDFS-5698
[ https://issues.apache.org/jira/browse/HDFS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897116#comment-13897116 ] Chris Nauroth commented on HDFS-5914: - Sorry I missed the HDFS-5915 pre-requisite first time. A few minor comments: # {{FSImageFormatPBINode}}: Symlinks don't get ACLs of their own. Shall we skip serialization/deserialization of ACLs here for symlinks? # {{TestAclConfigFlag#testFsImage}} is failing now, because it allowed loading of an fsimage containing an ACL even though ACLs were disabled in configuration. Previously, this was rejected by {{FSImageFormat#loadAclFeature}} checking the config flag. # {{FSImageFormatProtobuf}}: Minor typo: I think {{saveExtendAclSection}} was meant to be named {{saveExtendedAclSection}}. > Incorporate ACLs with the changes from HDFS-5698 > > > Key: HDFS-5914 > URL: https://issues.apache.org/jira/browse/HDFS-5914 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode, security >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5914.000.patch > > > HDFS-5698 uses protobuf to serialize the FSImage. The code needs to be > updated to work with these changes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897092#comment-13897092 ] Jing Zhao commented on HDFS-5920: - Thanks for the comments Suresh! bq. JournalNodeRpcServer#doRollingRollback is an empty method Oops.. I forgot to put "jn.doRollingRollback(journalId, startTxId)" there... The functionality has been included in the 000 patch except the call there. bq. "their editlog starting at or above the given txid." Here I want to mean ">=". I will update it to make it more clear. bq. Instead of RollingRollbackRequest in QJournalProtocol, we may be able to call it discardSegments? Will rename it in the next patch. > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch > > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897074#comment-13897074 ] Suresh Srinivas commented on HDFS-5920: --- Few early comments: # Instead of RollingRollbackRequest in QJournalProtocol, we may be able to call it discardSegments? # I am assuming that based on your previous comment, JournalNodeRpcServer#doRollingRollback is an empty method and you are still implementing the functionality. # "their editlog starting at or above the given txid." Is this correct? Journal must delete records starting from given txid. If the transaction ends before this txid, then the journal can ignore the request. > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch > > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
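The discardSegments semantics being discussed can be sketched. This is a hypothetical illustration, not the HDFS-5920 patch: per the ">=" reading, segments whose first transaction id is at or above the given txid are discarded, and segments that end before it are ignored. Truncating a segment that straddles the txid is out of scope for this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of discardSegments: drop every edit-log segment
// whose start txid is >= the given txid; leave earlier segments alone.
// Not the actual JournalNode code; Segment is an invented stand-in.
class SegmentDiscarder {
    static class Segment {
        final long startTxId;
        final long endTxId;
        Segment(long startTxId, long endTxId) {
            this.startTxId = startTxId;
            this.endTxId = endTxId;
        }
    }

    // Remove and return the segments starting at or above startTxId.
    static List<Segment> discardSegments(List<Segment> segments,
                                         long startTxId) {
        List<Segment> discarded = new ArrayList<>();
        for (Iterator<Segment> it = segments.iterator(); it.hasNext(); ) {
            Segment s = it.next();
            if (s.startTxId >= startTxId) {
                it.remove();
                discarded.add(s);
            }
        }
        return discarded;
    }
}
```

For example, with segments [1,100], [101,200], [201,300] and txid 101, the last two segments are discarded and [1,100] survives, matching "a segment that ends before this txid can be ignored."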
[jira] [Commented] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads
[ https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897073#comment-13897073 ] Hadoop QA commented on HDFS-5915: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628034/HDFS-5915.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6098//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6098//console This message is automatically generated. > Refactor FSImageFormatProtobuf to simplify cross section reads > -- > > Key: HDFS-5915 > URL: https://issues.apache.org/jira/browse/HDFS-5915 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5915.000.patch, HDFS-5915.001.patch > > > The PB-based FSImage puts the user name and the group name into a separate > section for deduplication. 
This jira refactors the code so that it is easier > to apply the same techniques for other types of data (e.g., > {{INodeReference}}) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897069#comment-13897069 ] Jimmy Xiang commented on HDFS-4239: --- Ping. Can anyone take a look patch v4? Thanks. > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Jimmy Xiang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, > hdfs-4239_v4.patch, hdfs-4239_v5.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after its been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897059#comment-13897059 ] Suresh Srinivas commented on HDFS-5920: --- [~tlipcon], if you have time, can you please take a look at this patch as well? > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch > > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897054#comment-13897054 ] Haohui Mai commented on HDFS-5921: -- Looks good to me. +1 > Cannot browse file system via NN web UI if any directory has the sticky bit > set > --- > > Key: HDFS-5921 > URL: https://issues.apache.org/jira/browse/HDFS-5921 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Critical > Attachments: HDFS-5921.patch, HDFS-5921.patch > > > You'll see an error like this in the JS console if any directory has the > sticky bit set: > {noformat} > 'helper_to_permission': function(chunk, ctx, bodies, params) { > > var exec = ((parms.perm % 10) & 1) == 1; > Uncaught ReferenceError: parms is not defined > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5920: -- Status: Open (was: Patch Available) > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch > > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5920: -- Status: Patch Available (was: Open) > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch > > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5921: - Attachment: HDFS-5921.patch Here's an updated patch which does 's/slice/substr/g'. > Cannot browse file system via NN web UI if any directory has the sticky bit > set > --- > > Key: HDFS-5921 > URL: https://issues.apache.org/jira/browse/HDFS-5921 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Critical > Attachments: HDFS-5921.patch, HDFS-5921.patch > > > You'll see an error like this in the JS console if any directory has the > sticky bit set: > {noformat} > 'helper_to_permission': function(chunk, ctx, bodies, params) { > > var exec = ((parms.perm % 10) & 1) == 1; > Uncaught ReferenceError: parms is not defined > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5888) Cannot get the FileStatus of the root inode from the new Globber
[ https://issues.apache.org/jira/browse/HDFS-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897004#comment-13897004 ] Hadoop QA commented on HDFS-5888: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627683/HDFS-5888.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.TestGlobPaths org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6097//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6097//console This message is automatically generated. > Cannot get the FileStatus of the root inode from the new Globber > > > Key: HDFS-5888 > URL: https://issues.apache.org/jira/browse/HDFS-5888 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Andrew Wang >Assignee: Colin Patrick McCabe > Attachments: HDFS-5888.002.patch > > > We can no longer get the correct FileStatus of the root inode "/" from the > Globber. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896948#comment-13896948 ] Andrew Wang commented on HDFS-5921: --- +1 pending Haohui's comment and Jenkins. I'd like to see this in 2.3.0 too, since it's a rather embarrassing bug. > Cannot browse file system via NN web UI if any directory has the sticky bit > set > --- > > Key: HDFS-5921 > URL: https://issues.apache.org/jira/browse/HDFS-5921 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Critical > Attachments: HDFS-5921.patch > > > You'll see an error like this in the JS console if any directory has the > sticky bit set: > {noformat} > 'helper_to_permission': function(chunk, ctx, bodies, params) { > > var exec = ((parms.perm % 10) & 1) == 1; > Uncaught ReferenceError: parms is not defined > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-4564: -- Target Version/s: 2.4.0 (was: 2.3.0) From what I understand, this is an existing issue with 2.2 and is NOT a regression. This patch can go in if need be, but I am moving it to 2.4 to unblock 2.3. Please revert back if you disagree. Thanks! > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Blocker > Attachments: HDFS-4564.branch-23.patch, HDFS-4564.branch-23.patch, > HDFS-4564.branch-23.patch, HDFS-4564.patch > > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples include rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896919#comment-13896919 ] Haohui Mai commented on HDFS-5921: --
{code}
+var otherExec = ((ctx.current().permission % 10) & 1) == 1;
+res = res.slice(0, res.length - 1) + (otherExec ? 't' : 'T');
{code}
You probably want to use {{substr}} instead of {{slice}}, as {{substr}} usually performs better than {{slice}} in this use case. (http://jsperf.com/string-slice-vs-substr). Here is an example:
{code}
var exec = ((ctx.current().permission % 10) & 1) == 1;
res = res.substr(0, res.length - 1) + (exec ? 't' : 'T');
{code}
> Cannot browse file system via NN web UI if any directory has the sticky bit > set > --- > > Key: HDFS-5921 > URL: https://issues.apache.org/jira/browse/HDFS-5921 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Critical > Attachments: HDFS-5921.patch > > > You'll see an error like this in the JS console if any directory has the > sticky bit set: > {noformat} > 'helper_to_permission': function(chunk, ctx, bodies, params) { > > var exec = ((parms.perm % 10) & 1) == 1; > Uncaught ReferenceError: parms is not defined > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
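The helper logic under review (replace the final character of an rwx permission string with 't' or 'T' depending on the other-execute bit when the sticky bit is set) can be sketched in Java for clarity. The real helper is JavaScript in the NN web UI; the class and method names below are invented:

```java
// Hypothetical Java rendering of the web UI sticky-bit helper: given a
// 9-character rwx string and the octal permission value, the last char
// becomes 't' if "other" has execute, or 'T' if it does not.
class StickyBitFormatter {
    static String applyStickyBit(String rwx, int octalPerm) {
        // Last octal digit holds the "other" bits; bit 0 is execute.
        boolean otherExec = ((octalPerm % 10) & 1) == 1;
        return rwx.substring(0, rwx.length() - 1) + (otherExec ? 't' : 'T');
    }
}
```

For instance, mode 1755 on a directory would render as "rwxr-xr-t", while 1754 (no other-execute) would render as "rwxr-xr-T", matching conventional ls output.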
[jira] [Commented] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads
[ https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896916#comment-13896916 ] Jing Zhao commented on HDFS-5915: - +1 pending Jenkins. > Refactor FSImageFormatProtobuf to simplify cross section reads > -- > > Key: HDFS-5915 > URL: https://issues.apache.org/jira/browse/HDFS-5915 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5915.000.patch, HDFS-5915.001.patch > > > The PB-based FSImage puts the user name and the group name into a separate > section for deduplication. This jira refactors the code so that it is easier > to apply the same techniques for other types of data (e.g., > {{INodeReference}}) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5920: Attachment: HDFS-5920.000.patch Fixed some bugs and added a simple unit test covering the NN's local directory rollback. Still need to add tests for JNs' rollback. > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch > > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5921: - Attachment: HDFS-5921.patch Used the wrong file name for the patch. > Cannot browse file system via NN web UI if any directory has the sticky bit > set > --- > > Key: HDFS-5921 > URL: https://issues.apache.org/jira/browse/HDFS-5921 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Critical > Attachments: HDFS-5921.patch > > > You'll see an error like this in the JS console if any directory has the > sticky bit set: > {noformat} > 'helper_to_permission': function(chunk, ctx, bodies, params) { > > var exec = ((parms.perm % 10) & 1) == 1; > Uncaught ReferenceError: parms is not defined > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896904#comment-13896904 ] Hadoop QA commented on HDFS-5921: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6099//console This message is automatically generated. > Cannot browse file system via NN web UI if any directory has the sticky bit > set > --- > > Key: HDFS-5921 > URL: https://issues.apache.org/jira/browse/HDFS-5921 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Critical > Attachments: HDFS-5921.patch > > > You'll see an error like this in the JS console if any directory has the > sticky bit set: > {noformat} > 'helper_to_permission': function(chunk, ctx, bodies, params) { > > var exec = ((parms.perm % 10) & 1) == 1; > Uncaught ReferenceError: parms is not defined > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5921: - Attachment: (was: HDFS-5291.patch) > Cannot browse file system via NN web UI if any directory has the sticky bit > set > --- > > Key: HDFS-5921 > URL: https://issues.apache.org/jira/browse/HDFS-5921 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Critical > > You'll see an error like this in the JS console if any directory has the > sticky bit set: > {noformat} > 'helper_to_permission': function(chunk, ctx, bodies, params) { > > var exec = ((parms.perm % 10) & 1) == 1; > Uncaught ReferenceError: parms is not defined > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5916) provide API to bulk delete directories/files
[ https://issues.apache.org/jira/browse/HDFS-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896902#comment-13896902 ] Sergey Shelukhin commented on HDFS-5916: 1-3 are all up to you; for the case I have in mind it should operate like a sequence of regular deletes: for (1) probably best-effort, for (2) no, and for (3) non-atomically. But that could be controlled by parameters. For (4), what do other operations do? As far as I recall some of them can recover. Can you provide details on how to batch multiple RPC calls into one for this case? We currently use the FileSystem/DistributedFileSystem interface. The workaround wouldn't work, both because of legacy users and because the files/dirs are already under the same path - it's just that we don't want to delete all of them. E.g., from /path/A, /path/B, /path/C and /path/D we only want to delete B and D (with longer lists, of course). > provide API to bulk delete directories/files > > > Key: HDFS-5916 > URL: https://issues.apache.org/jira/browse/HDFS-5916 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin > > It would be nice to have an API to delete directories and files in bulk - for > example, when deleting Hive partitions or HBase regions in large numbers, the > code could avoid many trips to NN. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5921: - Target Version/s: 2.4.0 Status: Patch Available (was: Open) > Cannot browse file system via NN web UI if any directory has the sticky bit > set > --- > > Key: HDFS-5921 > URL: https://issues.apache.org/jira/browse/HDFS-5921 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Critical > Attachments: HDFS-5291.patch > > > You'll see an error like this in the JS console if any directory has the > sticky bit set: > {noformat} > 'helper_to_permission': function(chunk, ctx, bodies, params) { > > var exec = ((parms.perm % 10) & 1) == 1; > Uncaught ReferenceError: parms is not defined > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
[ https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5921: - Attachment: HDFS-5291.patch Simple patch which fixes the issue. The code in question should never have been referencing the "params" variable at all, and the code that attempted to replace the last character of the string in the sticky-bit case never actually modified anything, since String.prototype.replace returns a new string rather than mutating the original. I tested this manually and it seems to work as intended. > Cannot browse file system via NN web UI if any directory has the sticky bit > set > --- > > Key: HDFS-5921 > URL: https://issues.apache.org/jira/browse/HDFS-5921 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Critical > Attachments: HDFS-5291.patch > > > You'll see an error like this in the JS console if any directory has the > sticky bit set: > {noformat} > 'helper_to_permission': function(chunk, ctx, bodies, params) { > > var exec = ((parms.perm % 10) & 1) == 1; > Uncaught ReferenceError: parms is not defined > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
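For context, the expression in the failing helper tests the execute bit of the "other" digit of the permission value. A hedged Java illustration of that arithmetic (this mirrors the JS expression only; it is not the web UI's actual code):

```java
// Illustration of the permission check in the failing helper: given a
// permission value whose decimal digits are the octal rwx triples
// (e.g. 755), (perm % 10) isolates the "other" digit, and bit 0 of that
// digit is the execute bit. Names here are illustrative only.
public class PermBitExample {
    static boolean otherExec(int perm) {
        return ((perm % 10) & 1) == 1;
    }

    public static void main(String[] args) {
        System.out.println(otherExec(755)); // 5 = r-x, exec set
        System.out.println(otherExec(754)); // 4 = r--, exec clear
    }
}
```

The bug itself was simply that the helper read `parms` where the function parameter was named `params`, so the expression threw a ReferenceError before this arithmetic ever ran.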
[jira] [Commented] (HDFS-5916) provide API to bulk delete directories/files
[ https://issues.apache.org/jira/browse/HDFS-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896889#comment-13896889 ] Haohui Mai commented on HDFS-5916: -- I have a few questions: # What would be the semantics of the call if one of the deletions fails? # Should this operation be atomic? # When should the changes propagate to other users? # What should happen if the operation occurs in the middle of an NN failover? I can't think of good answers to any of these questions, so it looks to me that the semantics at the file system layer are unclear. Maybe it is better to implement this as multiple RPC calls where the RPC messages are sent in the same packet. Alternatively, if you are able to put the files into a single directory then it might solve your problem :-) > provide API to bulk delete directories/files > > > Key: HDFS-5916 > URL: https://issues.apache.org/jira/browse/HDFS-5916 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin > > It would be nice to have an API to delete directories and files in bulk - for > example, when deleting Hive partitions or HBase regions in large numbers, the > code could avoid many trips to NN. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5837) dfs.namenode.replication.considerLoad does not consider decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896883#comment-13896883 ] Tao Luo commented on HDFS-5837: --- Thanks Konstantin! > dfs.namenode.replication.considerLoad does not consider decommissioned nodes > > > Key: HDFS-5837 > URL: https://issues.apache.org/jira/browse/HDFS-5837 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.6-alpha, 2.2.0 >Reporter: Bryan Beaudreault >Assignee: Tao Luo > Fix For: 2.3.0 > > Attachments: HDFS-5837.patch, HDFS-5837_B.patch, HDFS-5837_C.patch, > HDFS-5837_C_branch_2.2.0.patch, HDFS-5837_branch_2.2.0.patch > > > In DefaultBlockPlacementPolicy, there is a setting > dfs.namenode.replication.considerLoad which tries to balance the load of the > cluster when choosing replica locations. This code does not take into > account decommissioned nodes. > The code for considerLoad calculates the load by doing: TotalClusterLoad / > numNodes. However, numNodes includes decommissioned nodes (which have 0 > load). Therefore, the average load is artificially low. Example: > TotalLoad = 250 > numNodes = 100 > decommissionedNodes = 70 > remainingNodes = numNodes - decommissionedNodes = 30 > avgLoad = 250/100 = 2.50 > trueAvgLoad = 250 / 30 = 8.33 > If the real load of the remaining 30 nodes is (on average) 8.33, this is more > than 2x the calculated average load of 2.50. This causes these nodes to be > rejected as replica locations. The final result is that all nodes are > rejected, and no replicas can be placed. > See exceptions printed from client during this scenario: > https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
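The averaging bug described in the HDFS-5837 report above can be sketched as follows; this is an illustration of the arithmetic only, with made-up names, not the actual DefaultBlockPlacementPolicy code:

```java
// Sketch of the considerLoad averaging bug: decommissioned nodes carry
// 0 load but are still counted in the denominator, so the computed
// average is artificially low and live nodes look overloaded.
public class ConsiderLoadExample {
    static double avgLoad(double totalLoad, int nodeCount) {
        return nodeCount == 0 ? 0.0 : totalLoad / nodeCount;
    }

    public static void main(String[] args) {
        double totalLoad = 250;
        int numNodes = 100;       // includes decommissioned nodes
        int decommissioned = 70;  // each contributes 0 load

        // Buggy average: diluted by the 70 idle, decommissioned nodes.
        double avg = avgLoad(totalLoad, numNodes);                      // 2.50
        // True average over the 30 live nodes.
        double trueAvg = avgLoad(totalLoad, numNodes - decommissioned); // 8.33

        // considerLoad rejects a node whose load exceeds 2x the average,
        // so every live node (~8.33 > 2 * 2.50) gets rejected.
        System.out.println(avg + " vs " + trueAvg);
    }
}
```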
[jira] [Updated] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads
[ https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5915: - Attachment: HDFS-5915.001.patch > Refactor FSImageFormatProtobuf to simplify cross section reads > -- > > Key: HDFS-5915 > URL: https://issues.apache.org/jira/browse/HDFS-5915 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5915.000.patch, HDFS-5915.001.patch > > > The PB-based FSImage puts the user name and the group name into a separate > section for deduplication. This jira refactors the code so that it is easier > to apply the same techniques to other types of data (e.g., > {{INodeReference}}) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5899) Add configuration flag to disable/enable support for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896858#comment-13896858 ] Chris Nauroth commented on HDFS-5899: - bq. I agree that we should never wipe ACLs automatically. But what's the problem with just not enforcing them when dfs.namenode.acls.enabled is false? Why do we have to fail to start up? That seems like it will introduce problems for admins. If ACLs were defined, but not enforced, then the cluster would be in a state of partial enforcement. The traditional permission bits would be enforced, but the ACLs would be ignored during permission checks. In all respects, it would appear to end users that they have set an ACL correctly on their file, but they wouldn't know that the rules aren't really being enforced. This could open a risk of unauthorized access. It's particularly dangerous when we consider that for an inode with an ACL, the group permission bits store the mask, not the group permissions. The default setting of the mask is calculated as the union of permissions for all named user entries, named group entries, and the unnamed group entry in the ACL. This union may be wider than the permissions intended for the file's group. The combination of {{dfs.permissions.enabled=false}} + {{dfs.namenode.acls.enabled=true}} would work for deployments that want to allow setting of ACLs but skip enforcement (and also skip enforcement of permission bits). The motivation for this patch was to provide a "feature flag". (Sorry to bring that phrase up again and risk confusion with HDFS-5223, but it's the best description.) An admin can leave this toggled off and be guaranteed that the feature is completely off, including no consumption of RAM or disk by ACLs. Note that in order to reach this state, the admin must have toggled ACL support on in configuration at some point. It's off by default, so turning it on was a conscious decision. 
Then, the admin has a change of heart and decides to turn ACLs off, but meanwhile, a user snuck in with a setfacl. I expect this to be a rare situation. bq. How do you propose that the admin do this? Our existing tools have it covered. Start up with ACLs enabled. Remove ACLs using setfacl -x. There is a recursive option if it's necessary to remove from a whole sub-tree. Enter safe mode. Save a new checkpoint. Restart with ACLs disabled. > Add configuration flag to disable/enable support for ACLs. > -- > > Key: HDFS-5899 > URL: https://issues.apache.org/jira/browse/HDFS-5899 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: HDFS ACLs (HDFS-4685) > > Attachments: HDFS-5899.1.patch, HDFS-5899.2.patch > > > Add a new configuration property that allows administrators to toggle support > for HDFS ACLs on/off. By default, the flag will be off. This is a > conservative choice, and administrators interested in using ACLs can enable > it explicitly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
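The default-mask rule described in the comment above (the mask is the union of the permissions of all named user entries, named group entries, and the unnamed group entry) can be sketched as follows; the types and names here are hypothetical stand-ins, not HDFS's real AclEntry classes:

```java
import java.util.List;

// Hypothetical sketch of the default-mask calculation: union the rwx
// bits of every named user entry, every named group entry, and the
// unnamed group entry. The owner and "other" entries are excluded.
public class AclMaskSketch {
    enum Type { USER, GROUP, OTHER }

    static final class Entry {
        final Type type;
        final String name;  // null for the unnamed owner/group/other entries
        final int perm;     // rwx bits, e.g. 6 = rw-
        Entry(Type type, String name, int perm) {
            this.type = type; this.name = name; this.perm = perm;
        }
    }

    static int defaultMask(List<Entry> acl) {
        int mask = 0;
        for (Entry e : acl) {
            boolean named = e.name != null && e.type != Type.OTHER;
            boolean unnamedGroup = e.type == Type.GROUP && e.name == null;
            if (named || unnamedGroup) {
                mask |= e.perm;  // union of the relevant entries
            }
        }
        return mask;
    }
}
```

For example, an ACL with group r-- (4) plus a named user with rw- (6) yields a mask of rw- (6), which is wider than the group's own permissions. That is exactly why silently ignoring ACLs while still honoring the group bits (which store the mask) would over-grant access.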
[jira] [Updated] (HDFS-5837) dfs.namenode.replication.considerLoad does not consider decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Luo updated HDFS-5837: -- Attachment: HDFS-5837_C_branch_2.2.0.patch > dfs.namenode.replication.considerLoad does not consider decommissioned nodes > > > Key: HDFS-5837 > URL: https://issues.apache.org/jira/browse/HDFS-5837 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.6-alpha, 2.2.0 >Reporter: Bryan Beaudreault >Assignee: Tao Luo > Fix For: 2.3.0 > > Attachments: HDFS-5837.patch, HDFS-5837_B.patch, HDFS-5837_C.patch, > HDFS-5837_C_branch_2.2.0.patch, HDFS-5837_branch_2.2.0.patch > > > In DefaultBlockPlacementPolicy, there is a setting > dfs.namenode.replication.considerLoad which tries to balance the load of the > cluster when choosing replica locations. This code does not take into > account decommissioned nodes. > The code for considerLoad calculates the load by doing: TotalClusterLoad / > numNodes. However, numNodes includes decommissioned nodes (which have 0 > load). Therefore, the average load is artificially low. Example: > TotalLoad = 250 > numNodes = 100 > decommissionedNodes = 70 > remainingNodes = numNodes - decommissionedNodes = 30 > avgLoad = 250/100 = 2.50 > trueAvgLoad = 250 / 30 = 8.33 > If the real load of the remaining 30 nodes is (on average) 8.33, this is more > than 2x the calculated average load of 2.50. This causes these nodes to be > rejected as replica locations. The final result is that all nodes are > rejected, and no replicas can be placed. > See exceptions printed from client during this scenario: > https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5914) Incorporate ACLs with the changes from HDFS-5698
[ https://issues.apache.org/jira/browse/HDFS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896846#comment-13896846 ] Haohui Mai commented on HDFS-5914: -- You'll need to apply HDFS-5915 before this patch. > Incorporate ACLs with the changes from HDFS-5698 > > > Key: HDFS-5914 > URL: https://issues.apache.org/jira/browse/HDFS-5914 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode, security >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5914.000.patch > > > HDFS-5698 uses protobuf to serialize the FSImage. The code needs to be > updated to work with these changes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
Aaron T. Myers created HDFS-5921: Summary: Cannot browse file system via NN web UI if any directory has the sticky bit set Key: HDFS-5921 URL: https://issues.apache.org/jira/browse/HDFS-5921 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Critical You'll see an error like this in the JS console if any directory has the sticky bit set: {noformat} 'helper_to_permission': function(chunk, ctx, bodies, params) { var exec = ((parms.perm % 10) & 1) == 1; Uncaught ReferenceError: parms is not defined {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896844#comment-13896844 ] Jing Zhao commented on HDFS-5920: - I will add unit tests later. > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5920.000.patch > > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads
[ https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896841#comment-13896841 ] Jing Zhao commented on HDFS-5915: - The patch looks pretty good to me. It would be better to have a unit test for the new Loader/SaverContext. +1 after addressing the comment. > Refactor FSImageFormatProtobuf to simplify cross section reads > -- > > Key: HDFS-5915 > URL: https://issues.apache.org/jira/browse/HDFS-5915 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5915.000.patch > > > The PB-based FSImage puts the user name and the group name into a separate > section for deduplication. This jira refactors the code so that it is easier > to apply the same techniques to other types of data (e.g., > {{INodeReference}}) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5920: Attachment: HDFS-5920.000.patch Preliminary patch for review. The patch still depends on some functionality provided by HDFS-5889. For rollback (for rolling upgrade) in JNs, this patch simply adds a new RPC call "doRollingRollback" (doRollback is used for rollback in an HA setup). This rolling rollback is idempotent and expects responses from all the JNs. > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5920.000.patch > > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896828#comment-13896828 ] Jing Zhao commented on HDFS-5920: - HDFS-5753 already defines the Rollback option for rolling upgrade. Users can use "-rollingUpgrade rollback" to start the NameNode and roll it back to its state from before the rolling upgrade started. Since the NN does a checkpoint right before the upgrade marker, for rollback we only need to go back to that fsimage and delete all the editlog segments at or above (marker txid - 1). This editlog deletion should happen in both the NN's local directory and the shared storage (JNs in a QJM HA setup). > Support rollback of rolling upgrade in NameNode and JournalNodes > > > Key: HDFS-5920 > URL: https://issues.apache.org/jira/browse/HDFS-5920 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: journal-node, namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > > This jira provides rollback functionality for NameNode and JournalNode in > rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
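The edit-log pruning described in the comment above can be sketched as follows, assuming segments are identified by their first txid; EditSegment and keepForRollback are hypothetical names, not the actual FileJournalManager API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of rollback pruning: after loading the pre-upgrade checkpoint,
// discard every edit log segment whose first txid is at or above
// (marker txid - 1), keeping only segments that predate the marker.
public class RollbackPruneSketch {
    record EditSegment(long firstTxId, long lastTxId) {}

    static List<EditSegment> keepForRollback(List<EditSegment> segments,
                                             long markerTxId) {
        List<EditSegment> kept = new ArrayList<>();
        for (EditSegment s : segments) {
            if (s.firstTxId() < markerTxId - 1) {
                kept.add(s);  // predates the rolling-upgrade marker
            }
        }
        return kept;
    }
}
```

As the comment notes, the same pruning would have to run against both the NN's local edits directories and the shared storage (the JNs in a QJM HA setup) so that all copies of the log agree after rollback.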
[jira] [Commented] (HDFS-5899) Add configuration flag to disable/enable support for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896821#comment-13896821 ] Colin Patrick McCabe commented on HDFS-5899: bq. dfs.permissions.enabled continues to work as expected, suppressing permission checks if set to false, whether the permissions are defined via permission bits or ACLs. bq. The superuser is still immune to all permission checks, whether they come from permission bits or ACLs. bq. If ACLs are not in use, then permission checks go through the exact same code path that we have in FSPermissionChecker today. We go down a separate path only if the inode has an ACL. That makes sense to me. bq. When ACLs are disabled, all APIs related to ACLs will fail intentionally, an fsimage containing an ACL will cause the NameNode to abort during startup, and ACLs present in the edit log will cause the NameNode to abort. bq. Existing ACLs never get wiped automatically. This recovery procedure is a conscious decision by the cluster admin. I agree that we should never wipe ACLs automatically. But what's the problem with just not enforcing them when {{dfs.namenode.acls.enabled}} is false? Why do we have to fail to start up? That seems like it will introduce problems for admins. bq. If ACLs accidentally crept into the fsimage or edits (i.e. accidentally started with ACLs enabled, but now the admin wants to switch them off), then the recovery procedure would be to restart with ACLs enabled, remove all ACLs, save a new checkpoint, and then restart with ACLs disabled. How do you propose that the admin do this? > Add configuration flag to disable/enable support for ACLs. 
> -- > > Key: HDFS-5899 > URL: https://issues.apache.org/jira/browse/HDFS-5899 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS ACLs (HDFS-4685) >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: HDFS ACLs (HDFS-4685) > > Attachments: HDFS-5899.1.patch, HDFS-5899.2.patch > > > Add a new configuration property that allows administrators to toggle support > for HDFS ACLs on/off. By default, the flag will be off. This is a > conservative choice, and administrators interested in using ACLs can enable > it explicitly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
Jing Zhao created HDFS-5920: --- Summary: Support rollback of rolling upgrade in NameNode and JournalNodes Key: HDFS-5920 URL: https://issues.apache.org/jira/browse/HDFS-5920 Project: Hadoop HDFS Issue Type: Sub-task Components: journal-node, namenode Reporter: Jing Zhao Assignee: Jing Zhao This jira provides rollback functionality for NameNode and JournalNode in rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5888) Cannot get the FileStatus of the root inode from the new Globber
[ https://issues.apache.org/jira/browse/HDFS-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896814#comment-13896814 ] Colin Patrick McCabe commented on HDFS-5888: build failed because: {code} # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 1078712 bytes for Chunk::new # An error report file with more information is saved as: # /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hs_err_pid17045.log {code} > Cannot get the FileStatus of the root inode from the new Globber > > > Key: HDFS-5888 > URL: https://issues.apache.org/jira/browse/HDFS-5888 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Andrew Wang >Assignee: Colin Patrick McCabe > Attachments: HDFS-5888.002.patch > > > We can no longer get the correct FileStatus of the root inode "/" from the > Globber. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5805) TestCheckpoint.testCheckpoint fails intermittently on branch2
[ https://issues.apache.org/jira/browse/HDFS-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved HDFS-5805. - Resolution: Cannot Reproduce I was not able to reproduce the test failure even once despite many attempts. Closing this for now. > TestCheckpoint.testCheckpoint fails intermittently on branch2 > - > > Key: HDFS-5805 > URL: https://issues.apache.org/jira/browse/HDFS-5805 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > > {noformat} > java.lang.AssertionError: Bad value for metric GetEditAvgTime > Expected: gt(0.0) > got: <0.0> > at org.junit.Assert.assertThat(Assert.java:780) > at > org.apache.hadoop.test.MetricsAsserts.assertGaugeGt(MetricsAsserts.java:341) > at > org.apache.hadoop.hdfs.server.namenode.TestCheckpoint.testCheckpoint(TestCheckpoint.java:1070) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HDFS-5805) TestCheckpoint.testCheckpoint fails intermittently on branch2
[ https://issues.apache.org/jira/browse/HDFS-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5805 started by Mit Desai. > TestCheckpoint.testCheckpoint fails intermittently on branch2 > - > > Key: HDFS-5805 > URL: https://issues.apache.org/jira/browse/HDFS-5805 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > > {noformat} > java.lang.AssertionError: Bad value for metric GetEditAvgTime > Expected: gt(0.0) > got: <0.0> > at org.junit.Assert.assertThat(Assert.java:780) > at > org.apache.hadoop.test.MetricsAsserts.assertGaugeGt(MetricsAsserts.java:341) > at > org.apache.hadoop.hdfs.server.namenode.TestCheckpoint.testCheckpoint(TestCheckpoint.java:1070) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work stopped] (HDFS-5805) TestCheckpoint.testCheckpoint fails intermittently on branch2
[ https://issues.apache.org/jira/browse/HDFS-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5805 stopped by Mit Desai. > TestCheckpoint.testCheckpoint fails intermittently on branch2 > - > > Key: HDFS-5805 > URL: https://issues.apache.org/jira/browse/HDFS-5805 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > > {noformat} > java.lang.AssertionError: Bad value for metric GetEditAvgTime > Expected: gt(0.0) > got: <0.0> > at org.junit.Assert.assertThat(Assert.java:780) > at > org.apache.hadoop.test.MetricsAsserts.assertGaugeGt(MetricsAsserts.java:341) > at > org.apache.hadoop.hdfs.server.namenode.TestCheckpoint.testCheckpoint(TestCheckpoint.java:1070) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)