[jira] [Commented] (HDFS-1348) Improve NameNode reponsiveness while it is checking if datanode decommissions are complete
[ https://issues.apache.org/jira/browse/HDFS-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343920#comment-14343920 ]

Konstantin Shvachko commented on HDFS-1348:
-------------------------------------------

This is still a valid optimization. The code has since moved to separate read and write locks, but the locking granularity for decommissioning checks is still 5 DNs at a time, and the nodes have bigger drives than they did 5 years ago. Too bad we did not commit it back then.

Improve NameNode reponsiveness while it is checking if datanode decommissions are complete
------------------------------------------------------------------------------------------

Key: HDFS-1348
URL: https://issues.apache.org/jira/browse/HDFS-1348
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Attachments: decomissionImp1.patch, decomissionImp2.patch, decommission.patch, decommission1.patch

The NameNode is normally busy all the time, and its log shows activity every second. But once in a while the NameNode seems to pause for more than 10 seconds without doing anything, leaving a gap in its log even though no garbage collection is happening. All other requests to the NameNode are blocked while this is happening.

One culprit is DecommissionManager. Its monitor holds the FSNamesystem lock during the whole process of checking whether decommissioning DataNodes have finished, during which it checks every block of up to 5 datanodes (the default).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
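To make the granularity point concrete, here is a minimal Java sketch of checking blocks in small batches and releasing the lock between batches, so that other requests can interleave. It is not taken from any of the attached patches, whose contents are not shown in this thread. The class `BatchedDecommissionCheck`, the `namesystemLock` field, the `batchSize` parameter, and the use of an `int[]` of replica counts are all assumptions made for illustration.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not Hadoop code: every name here is illustrative.
class BatchedDecommissionCheck {
    // Stands in for the namesystem lock that blocks other NameNode requests.
    static final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();

    /**
     * Counts blocks with fewer live replicas than expected. The lock is held
     * for at most batchSize blocks at a time instead of for the whole scan.
     * replicas[i] is the live replica count of block i.
     */
    static int countUnderReplicated(int[] replicas, int expected, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;
        for (int start = 0; start < replicas.length; start += batchSize) {
            int end = Math.min(start + batchSize, replicas.length);
            namesystemLock.readLock().lock();
            try {
                for (int i = start; i < end; i++) {
                    if (replicas[i] < expected) {
                        pending++;
                    }
                }
            } finally {
                // Releasing here gives waiting threads a chance to run
                // before the next batch is scanned.
                namesystemLock.readLock().unlock();
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        int[] replicas = {3, 2, 3, 1, 3, 3, 0};
        System.out.println(countUnderReplicated(replicas, 3, 2)); // prints 3
    }
}
```

One consequence of this design is that the state can change between batches, so a real implementation would have to tolerate blocks being added or removed while the scan is in progress.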
[jira] [Commented] (HDFS-1348) Improve NameNode reponsiveness while it is checking if datanode decommissions are complete
[ https://issues.apache.org/jira/browse/HDFS-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310186#comment-14310186 ]

Hadoop QA commented on HDFS-1348:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12455885/decomissionImp2.patch
  against trunk revision da2fb2b.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9473//console

This message is automatically generated.
[jira] Commented: (HDFS-1348) Improve NameNode reponsiveness while it is checking if datanode decommissions are complete
[ https://issues.apache.org/jira/browse/HDFS-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996793#comment-12996793 ]

Hadoop QA commented on HDFS-1348:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12455885/decomissionImp2.patch
  against trunk revision 1072023.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    -1 patch. The patch command could not apply the patch.

Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/189//console

This message is automatically generated.
[jira] Commented: (HDFS-1348) Improve NameNode reponsiveness while it is checking if datanode decommissions are complete
[ https://issues.apache.org/jira/browse/HDFS-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974020#action_12974020 ]

Hadoop QA commented on HDFS-1348:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12455885/decomissionImp2.patch
  against trunk revision 1051669.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these core unit tests:
       org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
       org.apache.hadoop.hdfs.TestFileConcurrentReader
       org.apache.hadoop.hdfs.TestHDFSTrash

    -1 contrib tests. The patch failed contrib unit tests.

    +1 system test framework. The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/22//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/22//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/22//console

This message is automatically generated.
Improve NameNode reponsiveness while it is checking if datanode decommissions are complete
------------------------------------------------------------------------------------------

Key: HDFS-1348
URL: https://issues.apache.org/jira/browse/HDFS-1348
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.22.0
Attachments: decomissionImp1.patch, decomissionImp2.patch, decommission.patch, decommission1.patch
[jira] Commented: (HDFS-1348) Improve NameNode reponsiveness while it is checking if datanode decommissions are complete
[ https://issues.apache.org/jira/browse/HDFS-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916276#action_12916276 ]

Dmytro Molkov commented on HDFS-1348:
-------------------------------------

The patch looks good to me. +1
[jira] Commented: (HDFS-1348) Improve NameNode reponsiveness while it is checking if datanode decommissions are complete
[ https://issues.apache.org/jira/browse/HDFS-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901492#action_12901492 ]

Hairong Kuang commented on HDFS-1348:
-------------------------------------

With r/w locks, the first and second cases could use the read lock. The third case has to use the write lock, because it updates some stats of the decommissioning node.
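The split between read-only checks and a state-updating step might be sketched as follows. This is only an illustration of the general pattern: the thread does not spell out what the three cases are, and the class `ReadThenWriteCheck`, its `Node` type, and the `underReplicatedBlocks` field are hypothetical names, not Hadoop APIs. Because `java.util.concurrent.locks.ReentrantReadWriteLock` does not support upgrading a read lock to a write lock, the sketch releases the read lock, takes the write lock, and re-checks the condition before updating.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not Hadoop code: every name here is illustrative.
class ReadThenWriteCheck {
    enum State { DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

    // Stands in for the namesystem read/write lock.
    static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Minimal stand-in for a decommissioning datanode.
    static final class Node {
        State state = State.DECOMMISSION_IN_PROGRESS;
        int underReplicatedBlocks;

        Node(int underReplicatedBlocks) {
            this.underReplicatedBlocks = underReplicatedBlocks;
        }
    }

    /** Returns true if the node was marked as decommissioned by this call. */
    static boolean checkAndFinish(Node node) {
        boolean looksDone;
        // Read-only check: other readers can proceed concurrently.
        lock.readLock().lock();
        try {
            looksDone = node.underReplicatedBlocks == 0;
        } finally {
            lock.readLock().unlock();
        }
        if (!looksDone) {
            return false;
        }
        // The update needs the write lock. The state may have changed after
        // the read lock was released, so the condition is checked again.
        lock.writeLock().lock();
        try {
            if (node.underReplicatedBlocks != 0) {
                return false;
            }
            node.state = State.DECOMMISSIONED;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this shape, the common case (the node is not finished yet) never takes the write lock, so it does not block concurrent readers.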
[jira] Commented: (HDFS-1348) Improve NameNode reponsiveness while it is checking if datanode decommissions are complete
[ https://issues.apache.org/jira/browse/HDFS-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901086#action_12901086 ]

dhruba borthakur commented on HDFS-1348:
----------------------------------------

+1. Is it possible to do the check with the FSNamesystem read lock only?