[jira] [Commented] (HDFS-4261) TestBalancerWithNodeGroup times out
[ https://issues.apache.org/jira/browse/HDFS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672844#comment-13672844 ]
Tsz Wo (Nicholas), SZE commented on HDFS-4261:
Sure, let's fix the failure in HDFS-4376. Thanks for the update.
TestBalancerWithNodeGroup times out
Key: HDFS-4261
URL: https://issues.apache.org/jira/browse/HDFS-4261
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer
Affects Versions: 1.0.4, 1.1.1, 2.0.2-alpha
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Junping Du
Fix For: 3.0.0
Attachments: HDFS-4261-branch-1.patch, HDFS-4261-branch-1-v2.patch, HDFS-4261-branch-2.patch, HDFS-4261.patch, HDFS-4261-v2.patch, HDFS-4261-v3.patch, HDFS-4261-v4.patch, HDFS-4261-v5.patch, HDFS-4261-v6.patch, HDFS-4261-v7.patch, HDFS-4261-v8.patch, jstack-mac-18567, jstack-win-5488, org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.mac, org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.win, test-balancer-with-node-group-timeout.txt
When I manually ran TestBalancerWithNodeGroup, it always timed out on my machine. Looking at the Jenkins report [build #3573|https://builds.apache.org/job/PreCommit-HDFS-Build/3573//testReport/org.apache.hadoop.hdfs.server.balancer/], TestBalancerWithNodeGroup was somehow skipped, so the problem was not detected.
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4382) Fix typo MAX_NOT_CHANGED_INTERATIONS
[ https://issues.apache.org/jira/browse/HDFS-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo (Nicholas), SZE updated HDFS-4382:
Affects Version/s: (was: 3.0.0)
Fix Version/s: (was: 3.0.0) 2.1.0-beta
Merged this to branch-2.
Fix typo MAX_NOT_CHANGED_INTERATIONS
Key: HDFS-4382
URL: https://issues.apache.org/jira/browse/HDFS-4382
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Fix For: 2.1.0-beta
Attachments: hdfs-4382-v1.txt
Here is an example:
{code}
+    if (notChangedIterations >= MAX_NOT_CHANGED_INTERATIONS) {
{code}
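The fix is a straight rename of the misspelled constant. A minimal sketch of the corrected check, assuming the comparison operator is `>=` (JIRA's HTML rendering appears to have eaten it in the snippet above); the constant's value and the class name here are illustrative, not the real Balancer code:

```java
// Sketch of the corrected no-progress check from HDFS-4382.
// MAX_NOT_CHANGED_ITERATIONS is the fixed spelling; the value 5 is illustrative.
public class BalancerIterationCheck {
    static final int MAX_NOT_CHANGED_ITERATIONS = 5;

    // Returns true once the balancer has made no progress for too many rounds.
    static boolean shouldStop(int notChangedIterations) {
        return notChangedIterations >= MAX_NOT_CHANGED_ITERATIONS;
    }

    public static void main(String[] args) {
        System.out.println(shouldStop(4)); // false
        System.out.println(shouldStop(5)); // true
    }
}
```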
[jira] [Updated] (HDFS-4261) TestBalancerWithNodeGroup times out
[ https://issues.apache.org/jira/browse/HDFS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo (Nicholas), SZE updated HDFS-4261:
Resolution: Fixed
Fix Version/s: (was: 3.0.0) 1.3.0, 2.1.0-beta, 1-win
Status: Resolved (was: Patch Available)
Merged this to branch-2 and also committed the branch-1 patch. Thanks, Junping!
Key: HDFS-4261
URL: https://issues.apache.org/jira/browse/HDFS-4261
[jira] [Commented] (HDFS-4261) TestBalancerWithNodeGroup times out
[ https://issues.apache.org/jira/browse/HDFS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672869#comment-13672869 ]
Junping Du commented on HDFS-4261:
Thanks Nicholas!
Key: HDFS-4261
URL: https://issues.apache.org/jira/browse/HDFS-4261
[jira] [Updated] (HDFS-4860) Add additional attributes to JMX beans
[ https://issues.apache.org/jira/browse/HDFS-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Trevor Lorimer updated HDFS-4860:
Status: Open (was: Patch Available)
Add additional attributes to JMX beans
Key: HDFS-4860
URL: https://issues.apache.org/jira/browse/HDFS-4860
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.0.4-alpha, 0.20.204.1, 3.0.0, 2.1.0-beta
Reporter: Trevor Lorimer
Attachments: 0001-HDFS-4860.patch
Currently the JMX bean returns much of the data contained on the HDFS Health webpage (dfsHealth.html); however, several other attributes still need to be added. I intend to add the following items to the appropriate bean (named in parentheses): Started time (NameNodeInfo), Compiled info (NameNodeInfo), JVM MaxHeap and MaxNonHeap (JvmMetrics), Node Usage stats, i.e. Min, Median, Max, stdDev (NameNodeInfo), Count of decommissioned Live and Dead nodes (FSNamesystemState), Journal Status (NameNodeInfo)
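For readers unfamiliar with how such bean attributes are consumed, here is a small, self-contained sketch of reading an attribute from an MBean server. It uses a standard JVM bean (`java.lang:type=Runtime`, attribute `StartTime`) as a stand-in, since the NameNode beans discussed in this issue require a running NameNode; the class name is illustrative:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Reads one attribute from the platform MBean server, the same access
// pattern a monitoring client would use against the NameNode's JMX beans.
public class JmxReadDemo {
    static long jvmStartTime() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // For a NameNode this would be e.g. "Hadoop:service=NameNode,name=NameNodeInfo".
        return (Long) mbs.getAttribute(
            new ObjectName("java.lang:type=Runtime"), "StartTime");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("JVM StartTime (epoch ms): " + jvmStartTime());
    }
}
```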
[jira] [Updated] (HDFS-4860) Add additional attributes to JMX beans
[ https://issues.apache.org/jira/browse/HDFS-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Trevor Lorimer updated HDFS-4860:
Attachment: (was: 0001-HDFS-4860.patch)
Key: HDFS-4860
URL: https://issues.apache.org/jira/browse/HDFS-4860
[jira] [Updated] (HDFS-4860) Add additional attributes to JMX beans
[ https://issues.apache.org/jira/browse/HDFS-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Trevor Lorimer updated HDFS-4860:
Attachment: 0002-HDFS-4860.patch
Added messages to the asserts. The test that was breaking seems to be unrelated to my changes.
Key: HDFS-4860
URL: https://issues.apache.org/jira/browse/HDFS-4860
[jira] [Updated] (HDFS-4860) Add additional attributes to JMX beans
[ https://issues.apache.org/jira/browse/HDFS-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Trevor Lorimer updated HDFS-4860:
Status: Patch Available (was: Open)
Key: HDFS-4860
URL: https://issues.apache.org/jira/browse/HDFS-4860
[jira] [Commented] (HDFS-4860) Add additional attributes to JMX beans
[ https://issues.apache.org/jira/browse/HDFS-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673070#comment-13673070 ]
Trevor Lorimer commented on HDFS-4860:
An example display of the new attributes in NameNodeInfo:
NodeUsage: {nodeUsage:{min:1.02%,median:1.02%,max:1.02%,stdDev:0.00%}}
NameJournalStatus: [{stream:EditLogFileOutputStream,Required:false,manager:FileJournalManage,streamLocation:(/opt/hadoop/hdfs/namenode/current/edits_inprogress_364),Disabled:false,OpenForWrite:true,managerLocation:(root=/opt/hadoop/hdfs/namenode)}]
NNStarted: Fri May 31 15:29:25 BST 2013
CompileInfo: 2013-05-31T14:14Z by trevorlorimer from hadoop-2.0.4-wdd3.6
Key: HDFS-4860
URL: https://issues.apache.org/jira/browse/HDFS-4860
[jira] [Commented] (HDFS-4860) Add additional attributes to JMX beans
[ https://issues.apache.org/jira/browse/HDFS-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673171#comment-13673171 ]
Hadoop QA commented on HDFS-4860:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585840/0002-HDFS-4860.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4468//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4468//console
This message is automatically generated.
Key: HDFS-4860
URL: https://issues.apache.org/jira/browse/HDFS-4860
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravi Prakash updated HDFS-4832:
Status: Open (was: Patch Available)
The patch passed test-patch.sh on my machine several times. Rolling the dice again.
Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.7, 3.0.0, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch
Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for a long time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate a lost datanode. The opposite case also has problems (i.e. a Datanode failing while the NN is in safemode doesn't lead to a missing-blocks message). Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
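The reproduction above hinges on the NameNode noticing dead DataNodes quickly. A minimal hdfs-site.xml fragment for the setting the reporter mentions; the property name is taken from the report, and the 1 ms value is the reporter's test setting, not a production value:

```xml
<!-- hdfs-site.xml: shrink the dead-node detection window so a killed
     DataNode is marked dead almost immediately (value in milliseconds;
     1 is only suitable for reproducing the bug, not for real clusters). -->
<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <value>1</value>
</property>
```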
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravi Prakash updated HDFS-4832:
Status: Patch Available (was: Open)
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
[jira] [Commented] (HDFS-3934) duplicative dfs_hosts entries handled wrong
[ https://issues.apache.org/jira/browse/HDFS-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673334#comment-13673334 ]
Hudson commented on HDFS-3934:
Integrated in Hadoop-trunk-Commit #3840 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3840/])
HDFS-3934. duplicative dfs_hosts entries handled wrong. (cmccabe) (Revision 1489065)
Result = FAILURE
cmccabe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1489065
Files :
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
duplicative dfs_hosts entries handled wrong
Key: HDFS-3934
URL: https://issues.apache.org/jira/browse/HDFS-3934
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.0.1-alpha
Reporter: Andy Isaacson
Assignee: Colin Patrick McCabe
Priority: Minor
Attachments: HDFS-3934.001.patch, HDFS-3934.002.patch, HDFS-3934.003.patch, HDFS-3934.004.patch, HDFS-3934.005.patch, HDFS-3934.006.patch, HDFS-3934.007.patch, HDFS-3934.008.patch, HDFS-3934.010.patch, HDFS-3934.011.patch, HDFS-3934.012.patch, HDFS-3934.013.patch, HDFS-3934.014.patch, HDFS-3934.015.patch, HDFS-3934.016.patch, HDFS-3934.017.patch
A dead DN listed in dfs_hosts_allow.txt by IP and in dfs_hosts_exclude.txt by hostname ends up being displayed twice in {{dfsnodelist.jsp?whatNodes=DEAD}} after the NN restarts, because {{getDatanodeListForReport}} does not handle such a pseudo-duplicate correctly:
# the "Remove any nodes we know about from the map" loop no longer has the knowledge to remove the spurious entries
# the "The remaining nodes are ones that are referenced by the hosts files" loop does not do hostname lookups, so it does not know that the IP and hostname refer to the same host.
Relatedly, such an IP-based dfs_hosts entry results in a cosmetic problem in the JSP output: the *Node* column shows :50010 as the nodename, with HTML markup {{<a href="http://:50075/browseDirectory.jsp?namenodeInfoPort=50070&amp;dir=%2F&amp;nnaddr=172.29.97.196:8020" title="172.29.97.216:50010">:50010</a>}}.
[jira] [Commented] (HDFS-3934) duplicative dfs_hosts entries handled wrong
[ https://issues.apache.org/jira/browse/HDFS-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673340#comment-13673340 ]
Colin Patrick McCabe commented on HDFS-3934:
I talked to Daryn offline about this and he said he was OK with this going in, though he didn't have time this week to re-review.
Key: HDFS-3934
URL: https://issues.apache.org/jira/browse/HDFS-3934
[jira] [Created] (HDFS-4870) periodically re-resolve hostnames in included and excluded datanodes list
Colin Patrick McCabe created HDFS-4870:
Summary: periodically re-resolve hostnames in included and excluded datanodes list
Key: HDFS-4870
URL: https://issues.apache.org/jira/browse/HDFS-4870
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Colin Patrick McCabe
Priority: Minor
We currently resolve the hostnames in the included and excluded datanodes lists only once, when the list is read. The rationale is that in big clusters, DNS resolution for thousands of nodes can take a long time (when generating a datanode list in getDatanodeListForReport, for example). However, if the DNS information for one of these hosts changes, we should reflect that. A background thread could do these DNS resolutions every few minutes without blocking any foreground operations.
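The idea above can be sketched as a cached hostname-to-IP map refreshed off the critical path. This is a hedged illustration only: the class and method names (`HostListResolver`, `refresh`, `lookup`) are hypothetical, not HDFS APIs, and the refresh period is shortened for the demo.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Caches hostname -> IP for an include/exclude list; a background task
// re-resolves periodically so foreground calls never block on DNS.
public class HostListResolver {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Re-resolve every host; intended to run on a background thread.
    public void refresh(List<String> hosts) {
        for (String h : hosts) {
            try {
                cache.put(h, InetAddress.getByName(h).getHostAddress());
            } catch (UnknownHostException e) {
                // Keep any stale entry rather than dropping the host.
            }
        }
    }

    // Cheap, non-blocking lookup against the cache (null if never resolved).
    public String lookup(String host) {
        return cache.get(host);
    }

    public static void main(String[] args) throws Exception {
        HostListResolver r = new HostListResolver();
        List<String> hosts = Arrays.asList("localhost");
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        // A real NN might use minutes here; seconds keep the demo short.
        exec.scheduleAtFixedRate(() -> r.refresh(hosts), 0, 5, TimeUnit.SECONDS);
        Thread.sleep(500); // allow the first refresh to complete
        System.out.println("localhost -> " + r.lookup("localhost"));
        exec.shutdownNow();
    }
}
```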
[jira] [Commented] (HDFS-3934) duplicative dfs_hosts entries handled wrong
[ https://issues.apache.org/jira/browse/HDFS-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673349#comment-13673349 ]
Hudson commented on HDFS-3934:
Integrated in Hadoop-trunk-Commit #3841 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3841/])
Add needed file for HDFS-3934 (cmccabe) (Revision 1489068)
Result = SUCCESS
cmccabe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1489068
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/HostFileManager.java
Key: HDFS-3934
URL: https://issues.apache.org/jira/browse/HDFS-3934
[jira] [Commented] (HDFS-3934) duplicative dfs_hosts entries handled wrong
[ https://issues.apache.org/jira/browse/HDFS-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673375#comment-13673375 ]
Hudson commented on HDFS-3934:
Integrated in Hadoop-trunk-Commit #3843 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3843/])
Remove extra code that was erroneously committed in HDFS-3934 (cmccabe) (Revision 1489083)
Result = SUCCESS
cmccabe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1489083
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/HostFileManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
Key: HDFS-3934
URL: https://issues.apache.org/jira/browse/HDFS-3934
[jira] [Updated] (HDFS-4870) periodically re-resolve hostnames in included and excluded datanodes list
[ https://issues.apache.org/jira/browse/HDFS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin Patrick McCabe updated HDFS-4870:
Assignee: Colin Patrick McCabe
Affects Version/s: 2.0.5-alpha
Status: Patch Available (was: Open)
Attachments: HDFS-4870.001.patch
Key: HDFS-4870
URL: https://issues.apache.org/jira/browse/HDFS-4870
[jira] [Updated] (HDFS-4870) periodically re-resolve hostnames in included and excluded datanodes list
[ https://issues.apache.org/jira/browse/HDFS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin Patrick McCabe updated HDFS-4870:
Attachment: HDFS-4870.001.patch
Key: HDFS-4870
URL: https://issues.apache.org/jira/browse/HDFS-4870
[jira] [Commented] (HDFS-4860) Add additional attributes to JMX beans
[ https://issues.apache.org/jira/browse/HDFS-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673378#comment-13673378 ] Todd Lipcon commented on HDFS-4860: --- It looks like your substring math is off -- manager:FileJournalManage Plus I think it's awfully hacky to assume that the toString() of a manager happens to have this particular format... why not add something like JournalManager.generateAttributeMap() so that each JM implementation can include its appropriate statistics in JMX without this string parsing hackery? Also, this code is unnecessarily verbose:
{code}
+    if (jas.isDisabled()) {
+      jasMap.put("Disabled", Boolean.TRUE.toString());
+    } else {
+      jasMap.put("Disabled", Boolean.FALSE.toString());
+    }
{code}
You could just do:
{code}
jasMap.put("Disabled", String.valueOf(jas.isDisabled()));
{code}
Add additional attributes to JMX beans -- Key: HDFS-4860 URL: https://issues.apache.org/jira/browse/HDFS-4860 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 0.20.204.1, 3.0.0, 2.1.0-beta, 2.0.4-alpha Reporter: Trevor Lorimer Attachments: 0002-HDFS-4860.patch Currently the JMX bean returns much of the data contained on the HDFS Health webpage (dfsHealth.html). However, several other attributes need to be added. I intend to add the following items to the appropriate bean in parentheses: Started time (NameNodeInfo), Compiled info (NameNodeInfo), Jvm MaxHeap, MaxNonHeap (JvmMetrics), Node Usage stats (i.e. Min, Median, Max, stdev) (NameNodeInfo), Count of decommissioned Live and Dead nodes (FSNamesystemState), Journal Status (NameNodeInfo) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
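Todd's generateAttributeMap() suggestion could look roughly like the sketch below. The interface and class names are illustrative, not the actual HDFS API: each journal-manager implementation reports its own attributes instead of callers parsing its toString().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of Todd's suggestion: each JournalManager
// implementation reports its own JMX attributes, so no toString() parsing
// is needed. Names here are hypothetical.
interface JournalManagerMXAttributes {
  Map<String, String> generateAttributeMap();
}

class FileJournalManagerSketch implements JournalManagerMXAttributes {
  private final String storageDir;
  private final boolean disabled;

  FileJournalManagerSketch(String storageDir, boolean disabled) {
    this.storageDir = storageDir;
    this.disabled = disabled;
  }

  @Override
  public Map<String, String> generateAttributeMap() {
    Map<String, String> attrs = new LinkedHashMap<>();
    attrs.put("Type", "FileJournalManager");
    attrs.put("StorageDir", storageDir);
    // String.valueOf replaces the verbose if/else from the patch
    attrs.put("Disabled", String.valueOf(disabled));
    return attrs;
  }
}
```

Each implementation controls its own keys, so adding a new journal-manager type never requires touching the JMX-reporting code.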
[jira] [Commented] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673388#comment-13673388 ] Ravi Prakash commented on HDFS-4832: Hi Kihwal, that change was made in https://issues.apache.org/jira/browse/HDFS-1295 . Matt reports some statistics there. Please let me know if it's worthwhile to take that performance hit to report the correct block status. Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave - Key: HDFS-4832 URL: https://issues.apache.org/jira/browse/HDFS-4832 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta Reporter: Ravi Prakash Assignee: Ravi Prakash Priority: Critical Attachments: HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch Courtesy Karri VRK Reddy! {quote} 1. Namenode lost datanodes causing missing blocks 2. Namenode was put in safe mode 3. Datanode restarted on dead nodes 4. Waited for lots of time for the NN UI to reflect the recovered blocks. 5. Forced NN out of safe mode and suddenly, no more missing blocks anymore. {quote} I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate a lost datanode. The opposite case also has problems (i.e. a Datanode failing while the NN is in safemode doesn't lead to a missing blocks message). Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4871) Skip failing commons tests on Windows
Arpit Agarwal created HDFS-4871: --- Summary: Skip failing commons tests on Windows Key: HDFS-4871 URL: https://issues.apache.org/jira/browse/HDFS-4871 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.1.0-beta This is a temporary fix proposed to get CI working. We will skip the following failing tests on Windows: # TestChRootedFs # TestFSMainOperationsLocalFileSystem # TestFcCreateMkdirLocalFs # TestFcMainOperationsLocalFs # TestFcPermissionsLocalFs # TestLocalFSFileContextSymlink # TestLocalFileSystem # TestShellCommandFencer # TestSocketIOWithTimeout # TestViewFsLocalFs # TestViewFsTrash # TestViewFsWithAuthorityLocalFs -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4871) Skip failing commons tests on Windows
[ https://issues.apache.org/jira/browse/HDFS-4871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-4871: Description: This is a temporary fix proposed to get CI working. We will skip the following failing tests on Windows: # TestChRootedFs # TestFSMainOperationsLocalFileSystem # TestFcCreateMkdirLocalFs # TestFcMainOperationsLocalFs # TestFcPermissionsLocalFs # TestLocalFSFileContextSymlink - HADOOP-9527 # TestLocalFileSystem # TestShellCommandFencer - HADOOP-9526 # TestSocketIOWithTimeout - HADOOP-8982 # TestViewFsLocalFs # TestViewFsTrash # TestViewFsWithAuthorityLocalFs The tests will be re-enabled as we fix each. JIRAs for remaining failing tests to follow soon. was: This is a temporary fix proposed to get CI working. We will skip the following failing tests on Windows: # TestChRootedFs # TestFSMainOperationsLocalFileSystem # TestFcCreateMkdirLocalFs # TestFcMainOperationsLocalFs # TestFcPermissionsLocalFs # TestLocalFSFileContextSymlink # TestLocalFileSystem # TestShellCommandFencer # TestSocketIOWithTimeout # TestViewFsLocalFs # TestViewFsTrash # TestViewFsWithAuthorityLocalFs Skip failing commons tests on Windows - Key: HDFS-4871 URL: https://issues.apache.org/jira/browse/HDFS-4871 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.1.0-beta This is a temporary fix proposed to get CI working. We will skip the following failing tests on Windows: # TestChRootedFs # TestFSMainOperationsLocalFileSystem # TestFcCreateMkdirLocalFs # TestFcMainOperationsLocalFs # TestFcPermissionsLocalFs # TestLocalFSFileContextSymlink - HADOOP-9527 # TestLocalFileSystem # TestShellCommandFencer - HADOOP-9526 # TestSocketIOWithTimeout - HADOOP-8982 # TestViewFsLocalFs # TestViewFsTrash # TestViewFsWithAuthorityLocalFs The tests will be re-enabled as we fix each. JIRAs for remaining failing tests to follow soon. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
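Skipping a test class on Windows is typically done with a JUnit assumption guard. The snippet below only sketches the OS check itself; the assumeTrue call appears in a comment because JUnit is not assumed to be on the classpath here, and the class name is hypothetical.

```java
// Minimal sketch of an OS guard for conditionally skipping tests on Windows.
// In a JUnit test this would typically be used in a @Before method as:
//   org.junit.Assume.assumeTrue(!PlatformCheck.isWindows());
// which marks the test as skipped rather than failed.
class PlatformCheck {
  static boolean isWindows() {
    return System.getProperty("os.name").toLowerCase().startsWith("windows");
  }
}
```

Guarding with an assumption rather than deleting the tests keeps them visible in reports as skipped, which makes it easy to re-enable them one by one as the underlying JIRAs are fixed.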
[jira] [Commented] (HDFS-4870) periodically re-resolve hostnames in included and excluded datanodes list
[ https://issues.apache.org/jira/browse/HDFS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673490#comment-13673490 ] Hadoop QA commented on HDFS-4870: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585903/HDFS-4870.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4469//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4469//console This message is automatically generated. periodically re-resolve hostnames in included and excluded datanodes list - Key: HDFS-4870 URL: https://issues.apache.org/jira/browse/HDFS-4870 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-4870.001.patch We currently only resolve the hostnames in the included and excluded datanodes list once-- when the list is read. 
The rationale for this is that in big clusters, DNS resolution for thousands of nodes can take a long time (when generating a datanode list in getDatanodeListForReport, for example). However, if the DNS information changes for one of these hosts, we should reflect that. A background thread could do these DNS resolutions every few minutes without blocking any foreground operations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4849) Idempotent create and append operations.
[ https://issues.apache.org/jira/browse/HDFS-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-4849: -- Summary: Idempotent create and append operations. (was: Idempotent create, append and delete operations.) Idempotent create and append operations. Key: HDFS-4849 URL: https://issues.apache.org/jira/browse/HDFS-4849 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.4-alpha Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko create, append and delete operations can be made idempotent. This will reduce chances for a job or other app failures when NN fails over. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4872) Idempotent delete operation.
Konstantin Shvachko created HDFS-4872: - Summary: Idempotent delete operation. Key: HDFS-4872 URL: https://issues.apache.org/jira/browse/HDFS-4872 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.4-alpha Reporter: Konstantin Shvachko Making delete idempotent is important to provide uninterrupted job execution in case of HA failover. This is to discuss different approaches to idempotent implementation of delete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4867) metaSave NPEs when there are invalid blocks in repl queue.
[ https://issues.apache.org/jira/browse/HDFS-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-4867: --- Attachment: HDFS-4867.trunk.patch Attaching patch with unit test to print orphaned blocks from metaSave. This will fix the immediate issue but I struggle to understand WHY this is happening in the first place... I am able to simulate orphaned blocks in the unit test by deleting the created file immediately before metaSave is called. metaSave NPEs when there are invalid blocks in repl queue. -- Key: HDFS-4867 URL: https://issues.apache.org/jira/browse/HDFS-4867 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Kihwal Lee Assignee: Ravi Prakash Attachments: HDFS-4867.trunk.patch Since metaSave cannot get the inode holding a orphaned/invalid block, it NPEs and stops generating further report. Normally ReplicationMonitor removes them quickly, but if the queue is huge, it takes very long time. Also in safe mode, they stay. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-4867) metaSave NPEs when there are invalid blocks in repl queue.
[ https://issues.apache.org/jira/browse/HDFS-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov reassigned HDFS-4867: -- Assignee: Plamen Jeliazkov (was: Ravi Prakash) metaSave NPEs when there are invalid blocks in repl queue. -- Key: HDFS-4867 URL: https://issues.apache.org/jira/browse/HDFS-4867 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Kihwal Lee Assignee: Plamen Jeliazkov Attachments: HDFS-4867.trunk.patch Since metaSave cannot get the inode holding a orphaned/invalid block, it NPEs and stops generating further report. Normally ReplicationMonitor removes them quickly, but if the queue is huge, it takes very long time. Also in safe mode, they stay. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4867) metaSave NPEs when there are invalid blocks in repl queue.
[ https://issues.apache.org/jira/browse/HDFS-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673553#comment-13673553 ] Plamen Jeliazkov commented on HDFS-4867: Ravi, I am going to take this issue up. If you would like to take it back please let me know and I will back off. metaSave NPEs when there are invalid blocks in repl queue. -- Key: HDFS-4867 URL: https://issues.apache.org/jira/browse/HDFS-4867 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Kihwal Lee Assignee: Plamen Jeliazkov Attachments: HDFS-4867.trunk.patch Since metaSave cannot get the inode holding a orphaned/invalid block, it NPEs and stops generating further report. Normally ReplicationMonitor removes them quickly, but if the queue is huge, it takes very long time. Also in safe mode, they stay. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4873) callGetBlockLocations returns incorrect number of blocks for snapshotted files
Hari Mankude created HDFS-4873: -- Summary: callGetBlockLocations returns incorrect number of blocks for snapshotted files Key: HDFS-4873 URL: https://issues.apache.org/jira/browse/HDFS-4873 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0 Reporter: Hari Mankude Assignee: Jing Zhao callGetBlockLocations() returns all the blocks of a file even when they are not present in the snap version -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4867) metaSave NPEs when there are invalid blocks in repl queue.
[ https://issues.apache.org/jira/browse/HDFS-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-4867: --- Fix Version/s: 3.0.0 Status: Patch Available (was: Open) metaSave NPEs when there are invalid blocks in repl queue. -- Key: HDFS-4867 URL: https://issues.apache.org/jira/browse/HDFS-4867 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.4-alpha, 0.23.7 Reporter: Kihwal Lee Assignee: Plamen Jeliazkov Fix For: 3.0.0 Attachments: HDFS-4867.trunk.patch Since metaSave cannot get the inode holding a orphaned/invalid block, it NPEs and stops generating further report. Normally ReplicationMonitor removes them quickly, but if the queue is huge, it takes very long time. Also in safe mode, they stay. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4873) callGetBlockLocations returns incorrect number of blocks for snapshotted files
[ https://issues.apache.org/jira/browse/HDFS-4873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673558#comment-13673558 ] Hari Mankude commented on HDFS-4873: The sequence of operations to reproduce the problem: 1. create a file of size one block 2. take a snapshot 3. append some data to this file. 4. use DFSClient.callGetBlockLocations() to get block locations of the snapshot version of the file. The file length is specified as Long.MAX_VALUE. 5. This call returns two LocatedBlocks for the snapshot version of the file instead of one block. callGetBlockLocations returns incorrect number of blocks for snapshotted files -- Key: HDFS-4873 URL: https://issues.apache.org/jira/browse/HDFS-4873 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0 Reporter: Hari Mankude Assignee: Jing Zhao callGetBlockLocations() returns all the blocks of a file even when they are not present in the snap version -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4873) callGetBlockLocations returns incorrect number of blocks for snapshotted files
[ https://issues.apache.org/jira/browse/HDFS-4873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673562#comment-13673562 ] Hari Mankude commented on HDFS-4873: Looks like the problem is in getBlockLocationsUpdateTimes() where length is not truncated to fileSize before calling createLocatedBlocks(). There are other solutions possible if snap inode is passed in. callGetBlockLocations returns incorrect number of blocks for snapshotted files -- Key: HDFS-4873 URL: https://issues.apache.org/jira/browse/HDFS-4873 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0 Reporter: Hari Mankude Assignee: Jing Zhao callGetBlockLocations() returns all the blocks of a file even when they are not present in the snap version -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
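The truncation Hari describes can be illustrated with plain arithmetic; the class, block size, and lengths below are made-up numbers for illustration, not HDFS internals.

```java
// Illustrative arithmetic for the fix Hari describes: clamp the requested
// range to the file size recorded in the snapshot before enumerating blocks.
class BlockRangeSketch {
  // Number of blocks overlapped by [offset, offset + length), after clamping
  // length so it never reaches past fileSize. Written to avoid overflow when
  // the caller passes Long.MAX_VALUE as the length, as DFSClient does.
  static long blocksInRange(long offset, long length,
                            long fileSize, long blockSize) {
    long end = (length >= fileSize - offset) ? fileSize : offset + length;
    if (offset >= end) {
      return 0;
    }
    return (end - 1) / blockSize - offset / blockSize + 1;
  }
}
```

With a one-block snapshot of a file that later grew to two blocks, clamping to the snapshot's fileSize yields one block, matching the expected behavior in this JIRA.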
[jira] [Created] (HDFS-4874) create with OVERWRITE deletes existing file without checking the lease: feature or a bug.
Konstantin Shvachko created HDFS-4874: - Summary: create with OVERWRITE deletes existing file without checking the lease: feature or a bug. Key: HDFS-4874 URL: https://issues.apache.org/jira/browse/HDFS-4874 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.4-alpha Reporter: Konstantin Shvachko create with OVERWRITE flag will remove a file under construction even if the issuing client does not hold a lease on the file. It could be a bug or the feature that applications rely upon. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4867) metaSave NPEs when there are invalid blocks in repl queue.
[ https://issues.apache.org/jira/browse/HDFS-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673725#comment-13673725 ] Hadoop QA commented on HDFS-4867: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585941/HDFS-4867.trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4470//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4470//console This message is automatically generated. metaSave NPEs when there are invalid blocks in repl queue. -- Key: HDFS-4867 URL: https://issues.apache.org/jira/browse/HDFS-4867 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Kihwal Lee Assignee: Plamen Jeliazkov Fix For: 3.0.0 Attachments: HDFS-4867.trunk.patch Since metaSave cannot get the inode holding a orphaned/invalid block, it NPEs and stops generating further report. Normally ReplicationMonitor removes them quickly, but if the queue is huge, it takes very long time. 
Also in safe mode, they stay. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4867) metaSave NPEs when there are invalid blocks in repl queue.
[ https://issues.apache.org/jira/browse/HDFS-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673794#comment-13673794 ] Ravi Prakash commented on HDFS-4867: Hi Plamen, Please feel free to take this up. metaSave NPEs when there are invalid blocks in repl queue. -- Key: HDFS-4867 URL: https://issues.apache.org/jira/browse/HDFS-4867 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Kihwal Lee Assignee: Plamen Jeliazkov Fix For: 3.0.0 Attachments: HDFS-4867.trunk.patch Since metaSave cannot get the inode holding a orphaned/invalid block, it NPEs and stops generating further report. Normally ReplicationMonitor removes them quickly, but if the queue is huge, it takes very long time. Also in safe mode, they stay. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4862) SafeModeInfo.isManual() returns true when resources are low even if it wasn't entered into manually
[ https://issues.apache.org/jira/browse/HDFS-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-4862: --- Attachment: HDFS-4862.patch Hi Kihwal! There already exists a method to check for low resources (areResourcesLow()). So I don't understand why we need to club that into isManual. To me isManual clearly means that the safemode was entered into manually. Moreover I could also argue that the NN should be taken out of low-resource safemode automatically when ResourceMonitor detects adequate resources, so it may not necessarily be a manual step. This is a patch which IMHO corrects these behaviors. Could you please review it? SafeModeInfo.isManual() returns true when resources are low even if it wasn't entered into manually --- Key: HDFS-4862 URL: https://issues.apache.org/jira/browse/HDFS-4862 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Ravi Prakash Attachments: HDFS-4862.patch HDFS-1594 changed isManual to this
{code}
private boolean isManual() {
  return extension == Integer.MAX_VALUE && !resourcesLow;
}
{code}
One immediate impact of this is that when resources are low, the NN will throw away all block reports from DNs. This is undesirable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
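Ravi's proposed separation can be sketched as two independent predicates. The field and class names below are illustrative, not the actual FSNamesystem code.

```java
// Sketch of the separation Ravi proposes: "entered manually" and
// "resources low" are independent conditions, each with its own predicate,
// rather than low resources being folded into isManual().
class SafeModeInfoSketch {
  static final int EXT_NEVER_LEAVE = Integer.MAX_VALUE;
  private final int extension;
  private final boolean resourcesLow;

  SafeModeInfoSketch(int extension, boolean resourcesLow) {
    this.extension = extension;
    this.resourcesLow = resourcesLow;
  }

  // True only for safemode entered manually, matching the HDFS-1594 check.
  boolean isManual() {
    return extension == EXT_NEVER_LEAVE && !resourcesLow;
  }

  // Low resources are reported separately, so callers (e.g. block-report
  // handling) can distinguish low-resource safemode from manual safemode.
  boolean areResourcesLow() {
    return resourcesLow;
  }
}
```

Keeping the predicates separate also leaves room for the ResourceMonitor to clear low-resource safemode automatically, as Ravi suggests, without touching the manual-safemode logic.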
[jira] [Commented] (HDFS-4867) metaSave NPEs when there are invalid blocks in repl queue.
[ https://issues.apache.org/jira/browse/HDFS-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673904#comment-13673904 ] Konstantin Shvachko commented on HDFS-4867: --- metaSave is probably a casualty here. Should we take a look at why orphaned / missing blocks are kept in replication queues in the first place? It seems that when we delete a file blocks can also be removed from replication queue, because what is the point of replicating them if they don't belong to any files. It still makes sense to have this case covered in metaSave(). The patch looks good. Couple of nits: # Could you remove 3 unused imports in the test. # Also it would be good to close BufferedReader in the end of both test cases. metaSave NPEs when there are invalid blocks in repl queue. -- Key: HDFS-4867 URL: https://issues.apache.org/jira/browse/HDFS-4867 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Kihwal Lee Assignee: Plamen Jeliazkov Fix For: 3.0.0 Attachments: HDFS-4867.trunk.patch Since metaSave cannot get the inode holding a orphaned/invalid block, it NPEs and stops generating further report. Normally ReplicationMonitor removes them quickly, but if the queue is huge, it takes very long time. Also in safe mode, they stay. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4874) create with OVERWRITE deletes existing file without checking the lease: feature or a bug.
[ https://issues.apache.org/jira/browse/HDFS-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13674059#comment-13674059 ] Suresh Srinivas commented on HDFS-4874: --- I think the current behavior is the right one. The OVERWRITE flag indicates that if a file already exists, it needs to be overwritten, irrespective of whether the file is in a complete state or still being written. My vote would be to close this as Not a problem. create with OVERWRITE deletes existing file without checking the lease: feature or a bug. - Key: HDFS-4874 URL: https://issues.apache.org/jira/browse/HDFS-4874 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.4-alpha Reporter: Konstantin Shvachko create with OVERWRITE flag will remove a file under construction even if the issuing client does not hold a lease on the file. It could be a bug or the feature that applications rely upon. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4872) Idempotent delete operation.
[ https://issues.apache.org/jira/browse/HDFS-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13674065#comment-13674065 ] Suresh Srinivas commented on HDFS-4872: --- bq. Add modTime parameter to delete operation with the meaning that the object is deleted only if its modification time is <= the modTime parameter. How is time synchronization between client and server done? Another approach: use the inode ID to delete a file. But this has the disadvantage of the client having to know the inode ID before issuing the delete. Idempotent delete operation. Key: HDFS-4872 URL: https://issues.apache.org/jira/browse/HDFS-4872 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.4-alpha Reporter: Konstantin Shvachko Making delete idempotent is important to provide uninterrupted job execution in case of HA failover. This is to discuss different approaches to idempotent implementation of delete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4872) Idempotent delete operation.
[ https://issues.apache.org/jira/browse/HDFS-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13674068#comment-13674068 ] Suresh Srinivas commented on HDFS-4872: --- {quote} Just mark delete idempotent. A delete retry may delete an object that has been recreated or replaced between the retries in this case. {quote} I am -1 on this. {quote} Replace delete with idempotent rename to a temporary object, then delete the latter with non-idempotent delete. See the beginning of this comment. {quote} Since this requires two requests (one for the rename and one for the delete), the better approach is to get the inode ID and then delete the file by its inode ID. Delete with a unique inode ID is idempotent. Idempotent delete operation. Key: HDFS-4872 URL: https://issues.apache.org/jira/browse/HDFS-4872 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.4-alpha Reporter: Konstantin Shvachko Making delete idempotent is important to provide uninterrupted job execution in case of HA failover. This is to discuss different approaches to idempotent implementation of delete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
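The modTime-guarded variant under discussion amounts to a compare-and-delete. The toy namespace below is purely illustrative (not NameNode code): the delete succeeds only if the file's modification time does not exceed the client-supplied bound, so a retry cannot remove a file recreated after the original request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy namespace illustrating the modTime-guarded delete under discussion.
class GuardedDeleteSketch {
  private final Map<String, Long> mtimes = new ConcurrentHashMap<>();

  void create(String path, long mtime) {
    mtimes.put(path, mtime);
  }

  // Returns true if the path was deleted, or was already absent (making a
  // retry a no-op); false if a newer file now occupies the path.
  boolean delete(String path, long maxMtime) {
    Long mtime = mtimes.get(path);
    if (mtime == null) {
      return true;  // already gone: the earlier attempt must have succeeded
    }
    if (mtime > maxMtime) {
      return false; // recreated after the original delete was issued
    }
    mtimes.remove(path);
    return true;
  }
}
```

This captures why the guard makes retries safe, while leaving open Suresh's question of how the client obtains a modTime comparable to the server's clock; deleting by inode ID sidesteps that issue at the cost of an extra lookup.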