Auto-Re: [jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
Your email has been received! Thank you!
[jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605080#comment-14605080 ]

Brahma Reddy Battula commented on HDFS-8586:

[~vinayrpet] Kindly review the attached patch. Thanks.

Dead Datanode is allocated for write when client is from deadnode
-----------------------------------------------------------------

Key: HDFS-8586
URL: https://issues.apache.org/jira/browse/HDFS-8586
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Critical
Attachments: HDFS-8586.patch

*{color:blue}DataNode marked as Dead{color}*

2015-06-11 19:39:00,862 | INFO | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e | BLOCK* *removeDeadDatanode: lost heartbeat from XX.XX.39.33:25009* | org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:584)
2015-06-11 19:39:00,863 | INFO | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e | Removing a node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:488)

*{color:blue}Deadnode got Allocated{color}*

2015-06-11 19:39:45,148 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
2015-06-11 19:39:45,149 | WARN | IPC Server handler 26 on 25000 | The cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
2015-06-11 19:39:45,149 | INFO | IPC Server handler 26 on 25000 | BLOCK* *allocate blk_1073754030_13252* {UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-e8d29773-dfc2-4224-b1d6-9b0588bca55e:NORMAL:{color:red}XX.XX.39.33:25009{color}|RBW], ReplicaUC[[DISK]DS-f7d2ab3c-88f7-470c-9097-84387c0bec83:NORMAL:XX.XX.38.32:25009|RBW], ReplicaUC[[DISK]DS-8c2a464a-ac81-4651-890a-dbfd07ddd95f:NORMAL: *XX.XX.38.33:25009|RBW]]* } for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)
2015-06-11 19:39:45,191 | INFO | IPC Server handler 35 on 25000 | BLOCK* allocate blk_1073754031_13253{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-ed8ad579-50c0-4e3e-8780-9776531763b6:NORMAL:XX.XX.39.31:25009|RBW], ReplicaUC[[DISK]DS-19ddd6da-4a3e-481a-8445-dde5c90aaff3:NORMAL:XX.XX.37.32:25009|RBW], ReplicaUC[[DISK]DS-4ce4ce39-4973-42ce-8c7d-cb41f899db85: {{NORMAL:XX.XX.37.33:25009}} |RBW]]} for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597398#comment-14597398 ]

Hadoop QA commented on HDFS-8586:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 18m 3s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 48s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 2m 18s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 17s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 158m 22s | Tests passed in hadoop-hdfs. |
| | | 205m 15s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12741218/HDFS-8586.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 41ae776 |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11444/artifact/patchprocess/whitespace.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11444/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11444/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11444/console |

This message was automatically generated.

Dead Datanode is allocated for write when client is from deadnode
-----------------------------------------------------------------

Key: HDFS-8586
URL: https://issues.apache.org/jira/browse/HDFS-8586
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Critical
Attachments: HDFS-8586.patch
[jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597156#comment-14597156 ]

Vinayakumar B commented on HDFS-8586:

Thanks [~brahmareddy] for reporting this.

This happens when the NameNode already has the node in its list of dead nodes and a block allocation request arrives from the same machine as the dead DataNode. The dead node is then chosen as the local node, regardless of whether it is still part of the cluster.

Adding one check in {{BlockPlacementPolicyDefault.java#chooseLocalStorage(..)}} will fix this.

Regarding the test proposed above, it will not always fail: it is a MiniDFSCluster test, so all DataNodes are on the same machine, and the dead node is not guaranteed to be chosen as local storage.

Dead Datanode is allocated for write when client is from deadnode
-----------------------------------------------------------------

Key: HDFS-8586
URL: https://issues.apache.org/jira/browse/HDFS-8586
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Critical
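The check Vinayakumar describes can be pictured with a small standalone model. This is only a sketch under stated assumptions, not the attached patch and not Hadoop code: `LocalNodeCheck`, `Topology`, and `chooseLocal` are hypothetical names, and the node addresses are made up. It shows the idea that the writer's own node is accepted as the local target only while the topology still contains it.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Illustrative model only: these are NOT Hadoop classes.
class LocalNodeCheck {

    /** Hypothetical stand-in for the NameNode's network topology. */
    static final class Topology {
        private final Set<String> nodes = new HashSet<>();

        void add(String node) { nodes.add(node); }

        void remove(String node) { nodes.remove(node); }

        boolean contains(String node) { return nodes.contains(node); }
    }

    /**
     * Returns the writer's own node as the first replica target only if the
     * topology still contains it; otherwise returns empty so the caller can
     * fall back to some other node.
     */
    static Optional<String> chooseLocal(String writerNode, Topology topology) {
        if (writerNode != null && topology.contains(writerNode)) {
            return Optional.of(writerNode);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Topology topology = new Topology();
        topology.add("dn1:25009");
        topology.add("dn2:25009");

        // While dn1 is live, a writer running on dn1 gets its local node.
        System.out.println(chooseLocal("dn1:25009", topology).isPresent());

        // Once dn1 is declared dead and removed, it is no longer chosen.
        topology.remove("dn1:25009");
        System.out.println(chooseLocal("dn1:25009", topology).isPresent());
    }
}
```

Running `main` prints `true` and then `false`. Without the `contains` test, the second call would still hand back the removed node, which matches the behaviour reported in the logs.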
[jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597196#comment-14597196 ]

Brahma Reddy Battula commented on HDFS-8586:

[~vinayrpet] Thanks a lot for taking a look at this issue. I added the check in {{BlockPlacementPolicyDefault.java#chooseLocalStorage(..)}} and corrected the test case. Kindly review.

Dead Datanode is allocated for write when client is from deadnode
-----------------------------------------------------------------

Key: HDFS-8586
URL: https://issues.apache.org/jira/browse/HDFS-8586
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Critical
Attachments: HDFS-8586.patch
[jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586118#comment-14586118 ]

Brahma Reddy Battula commented on HDFS-8586:

As a workaround, enable *dfs.namenode.avoid.write.stale.datanode*. A node is considered stale if there has been no heartbeat from it for 30 seconds (by default), so it is then not allocated for writes. Does anyone have other thoughts?

Dead Datanode is allocated for write when client is from deadnode
-----------------------------------------------------------------

Key: HDFS-8586
URL: https://issues.apache.org/jira/browse/HDFS-8586
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Critical
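The stale-node workaround rests on a simple time comparison, which the following standalone sketch illustrates. It is a simplified model and not Hadoop's implementation: `StaleNodeCheck`, `isStale`, and `isWriteCandidate` are hypothetical names, and the 30-second interval is taken from the comment above.

```java
// Illustrative model only: these are NOT Hadoop classes.
class StaleNodeCheck {

    // 30 seconds, the default stale interval mentioned in the comment above.
    static final long STALE_INTERVAL_MS = 30_000L;

    /** A node is stale when its last heartbeat is older than the interval. */
    static boolean isStale(long lastHeartbeatMs, long nowMs) {
        return nowMs - lastHeartbeatMs > STALE_INTERVAL_MS;
    }

    /**
     * With the avoid-stale-for-write switch on, stale nodes are skipped as
     * write targets; with it off, staleness is ignored.
     */
    static boolean isWriteCandidate(long lastHeartbeatMs, long nowMs,
                                    boolean avoidStaleForWrite) {
        return !(avoidStaleForWrite && isStale(lastHeartbeatMs, nowMs));
    }

    public static void main(String[] args) {
        long now = 100_000L;
        long heartbeat45sAgo = now - 45_000L;
        long heartbeat5sAgo = now - 5_000L;

        // A node silent for 45 s is skipped when the switch is enabled.
        System.out.println(isWriteCandidate(heartbeat45sAgo, now, true));
        // With the switch disabled, the same node is still a candidate.
        System.out.println(isWriteCandidate(heartbeat45sAgo, now, false));
        // A node heard from 5 s ago is a candidate either way.
        System.out.println(isWriteCandidate(heartbeat5sAgo, now, true));
    }
}
```

Running `main` prints `false`, `true`, `true`. A node that has stopped heartbeating crosses the stale threshold well before it is declared dead, which is why the switch also keeps such a node out of new write pipelines.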
[jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584465#comment-14584465 ]

Brahma Reddy Battula commented on HDFS-8586:

*Test code to reproduce this bug*

{code}
public void testDeadDatanodeForBlockLocation() throws Exception {
  Configuration conf = new HdfsConfiguration();
  conf.setInt(DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY, 500);
  conf.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L);
  cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
  cluster.waitActive();
  String poolId = cluster.getNamesystem().getBlockPoolId();

  // wait for datanode to be marked live
  DataNode dn = cluster.getDataNodes().get(0);
  DatanodeRegistration reg = DataNodeTestUtils.getDNRegistrationForBP(dn, poolId);
  DFSTestUtil.waitForDatanodeState(cluster, reg.getDatanodeUuid(), true, 2);

  // Shutdown and wait for data node to be marked dead
  dn.shutdown();
  DFSTestUtil.waitForDatanodeState(cluster, reg.getDatanodeUuid(), false, 2);
  System.out.println("Dn downXXX: " + dn.getDisplayName());

  // Write a file while the datanode is dead
  Path file = new Path("afile");
  try (FSDataOutputStream outputStream = cluster.getFileSystem().create(file)) {
    outputStream.writeChars("testContent");
  }

  // None of the block's locations may be the dead datanode
  BlockLocation block = cluster.getFileSystem().getFileBlockLocations(file, 0, 10)[0];
  System.out.println("Dn down: " + dn.getDisplayName());
  for (String node : block.getNames()) {
    System.out.println(node);
    if (node.equals(dn.getDisplayName())) {
      fail("Not expecting the block in a dead node");
    }
  }
}
{code}

*Impact I have seen*

{color:red}The cluster has 9 DataNodes, of which 5 were stopped, with dfs.replication=3. Files were put to HDFS continuously, but some operations failed.{color}

I think dead nodes can also be checked here:

{code}
if (isGoodTarget(storage, blockSize, maxNodesPerRack, considerLoad,
    results, avoidStaleNodes, storageType)) {
  results.add(storage);
  // add node and related nodes to excludedNode
  return addToExcludedNodes(storage.getDatanodeDescriptor(), excludedNodes);
}
{code}

Dead Datanode is allocated for write when client is from deadnode
-----------------------------------------------------------------

Key: HDFS-8586
URL: https://issues.apache.org/jira/browse/HDFS-8586
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
[jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
[ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584502#comment-14584502 ]

Brahma Reddy Battula commented on HDFS-8586:

Currently only stale nodes are excluded, but dead nodes can be considered here as well.

Dead Datanode is allocated for write when client is from deadnode
-----------------------------------------------------------------

Key: HDFS-8586
URL: https://issues.apache.org/jira/browse/HDFS-8586
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula