[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904121#comment-13904121 ]

Kihwal Lee commented on HDFS-5225:
----------------------------------

[~lars_francke], what is the version of Hadoop you are using?

datanode keeps logging the same 'is no longer in the dataset' message over and over again
-----------------------------------------------------------------------------------------

                Key: HDFS-5225
                URL: https://issues.apache.org/jira/browse/HDFS-5225
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: datanode
   Affects Versions: 2.1.1-beta
           Reporter: Roman Shaposhnik
           Assignee: Tsuyoshi OZAWA
           Priority: Blocker
        Attachments: HDFS-5225-reproduce.1.txt, HDFS-5225.1.patch, HDFS-5225.2.patch

I was running the usual Bigtop testing on 2.1.1-beta RC1 with the following configuration: 4 nodes fully distributed cluster with security on. All of a sudden my DN ate up all of the space in /var/log logging the following message repeatedly:

{noformat}
2013-09-18 20:51:12,046 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369 is no longer in the dataset
{noformat}

It wouldn't answer to a jstack and jstack -F ended up being useless. Here's what I was able to find in the NameNode logs regarding this block ID:

{noformat}
fgrep -rI 'blk_1073742189' hadoop-hdfs-namenode-ip-10-224-158-152.log
2013-09-18 18:03:16,972 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/jenkins/testAppendInputWedSep18180222UTC2013/test4.fileWedSep18180222UTC2013._COPYING_. BP-1884637155-10.224.158.152-1379524544853 blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]}
2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.224.158.152:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.34.74.206:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
2013-09-18 18:03:17,224 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.83.107.80:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
2013-09-18 18:03:17,899 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369, newGenerationStamp=1370, newLength=1048576, newNodes=[10.83.107.80:1004, 10.34.74.206:1004, 10.224.158.152:1004], clientName=DFSClient_NONMAPREDUCE_-450304083_1)
2013-09-18 18:03:17,904 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369) successfully to BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370
2013-09-18 18:03:18,540 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370, newGenerationStamp=1371, newLength=2097152, newNodes=[10.83.107.80:1004, 10.34.74.206:1004, 10.224.158.152:1004], clientName=DFSClient_NONMAPREDUCE_-450304083_1)
2013-09-18 18:03:18,548 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370) successfully to BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1371
2013-09-18 18:03:26,150 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073742189_1371 10.83.107.80:1004 10.34.74.206:1004 10.224.158.152:1004
2013-09-18 18:03:26,847 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 10.34.74.206:1004 to delete [blk_1073742178_1359, blk_1073742183_1362, blk_1073742184_1363, blk_1073742186_1366, blk_1073742188_1368, blk_1073742189_1371]
2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 10.224.158.152:1004 to delete [blk_1073742177_1353, blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356,
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904144#comment-13904144 ]

Lars Francke commented on HDFS-5225:
------------------------------------

We're running CDH 4.5.0, which is using Hadoop 2.0. I see that a fix for this issue is in CDH 4.6, but that's not released yet.
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903144#comment-13903144 ]

Lars Francke commented on HDFS-5225:
------------------------------------

We're being hit by this issue at the moment. I didn't follow all the explanations, so I'm unsure what to do now. Is changing the log settings for that class the correct workaround, or is there anything else we can do to break the loop?
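As a stopgap for the question above, the repeating message can be suppressed by raising the log level of the scanner class in the DataNode's log4j.properties. This is a sketch of a workaround only: it hides the symptom and does not repair the underlying blockMap/blockInfoSet inconsistency, and the exact location of log4j.properties depends on the distribution.

```properties
# Stopgap only: silence the repeating BlockPoolSliceScanner INFO message.
# Add to the DataNode's log4j.properties and restart the DataNode.
# This hides the symptom; it does not fix the underlying inconsistency.
log4j.logger.org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner=WARN
```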
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780209#comment-13780209 ]

Tsuyoshi OZAWA commented on HDFS-5225:
--------------------------------------

Thank you for investigating, Kihwal! I'm now creating a test that can reproduce the bug you mentioned.
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780220#comment-13780220 ]

Tsuyoshi OZAWA commented on HDFS-5225:
--------------------------------------

[~djp], OK. But the current reproducer patch only triggers the bug rarely, as I mentioned. I'll update the reproducer patch based on [~kihwal]'s comment.
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780358#comment-13780358 ]

Kihwal Lee commented on HDFS-5225:
----------------------------------

Due to HDFS-4797, two BlockScanInfo entries for different blocks could collide with each other in blockInfoSet if the last scan time is the same. This can easily create inconsistencies between blockMap and blockInfoSet. This problem was fixed recently by HDFS-5031. Other than this, I cannot find any other reason for the two data structures going out of sync.

[~ozawa], can you reproduce the problem with HDFS-5031 applied? Your reproducer patch artificially introduces duplicate entries; that should not happen with proper synchronization. Other flaws in the algorithm may cause wrong records to get removed or added, but the two data structures should stay in sync after HDFS-5031.

If we want to make sure this is the case now and in the future, we could add sanity checks to addBlockInfo() and delBlockInfo(). For addBlockInfo(), an addition to either data structure should never see a duplicate entry. For delBlockInfo(), both data structures should contain the entry to be deleted.
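The sanity checks Kihwal proposes can be sketched as follows. This is a simplified stand-in, not the real BlockPoolSliceScanner: the class name, the `Info` type, and the field layout are hypothetical. The comparator tie-breaks on block id so that two blocks with the same last scan time cannot collide in the ordered set (the HDFS-4797 failure mode), and each mutation throws if the two structures would go out of sync.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Simplified stand-in for the scanner's two data structures; "Info" is a
// hypothetical stand-in for BlockScanInfo.
class ScannerSketch {
    static class Info {
        final long blockId;
        final long lastScanTime;
        Info(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
    }

    private final Map<Long, Info> blockMap = new HashMap<>();
    // Order by last scan time, tie-breaking on block id so equal scan times
    // cannot make two distinct blocks collide in the set.
    private final TreeSet<Info> blockInfoSet = new TreeSet<>(
        (a, b) -> a.lastScanTime != b.lastScanTime
            ? Long.compare(a.lastScanTime, b.lastScanTime)
            : Long.compare(a.blockId, b.blockId));

    synchronized void addBlockInfo(Info info) {
        boolean added = blockInfoSet.add(info);
        Info previous = blockMap.put(info.blockId, info);
        // Sanity check: an addition must never see a duplicate in either structure.
        if (!added || previous != null) {
            throw new IllegalStateException("duplicate entry for block " + info.blockId);
        }
    }

    synchronized void delBlockInfo(Info info) {
        boolean inSet = blockInfoSet.remove(info);
        boolean inMap = blockMap.remove(info.blockId) != null;
        // Sanity check: a deletion must find the entry in both structures.
        if (!inSet || !inMap) {
            throw new IllegalStateException(
                "blockMap/blockInfoSet out of sync for block " + info.blockId);
        }
    }

    synchronized int size() {
        return blockInfoSet.size();
    }
}
```

With these checks, a desynchronization surfaces immediately as an exception at the faulty call site instead of as an endless stream of scanner log messages later.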
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780396#comment-13780396 ] Kihwal Lee commented on HDFS-5225: -- In HDFS-5031, Vinay explained:
bq. BlockScanInfo.equals() strictly checks for an instance of BlockScanInfo, but almost all retrievals from blockMap are done using an instance of Block, so the lookup always returns null and the scan happens again.
Since the check returns null, addBlockInfo() can be called without removing the existing entry first. This replaces the existing entry in blockMap, since that map uses Block's hash code. However, blockInfoSet can end up with a duplicate entry for the block, since it is keyed on the last scan time. If the older entry becomes the first in the ordered set, "is no longer in the dataset" will show up repeatedly if the block was deleted. If it was not deleted, "Verification succeeded for ..." will appear instead. I will pull HDFS-5031 in to branch-2.1-beta.
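The equals() asymmetry described above can be illustrated with a minimal sketch. The Block and BlockScanInfo classes below are simplified, hypothetical stand-ins for the real Hadoop classes, kept only to show the broken contract:

```java
// Simplified stand-in: equality keyed on block ID, accepts any Block.
class Block {
    final long id;
    Block(long id) { this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof Block && ((Block) o).id == id;
    }
    @Override public int hashCode() { return Long.hashCode(id); }
}

// Subclass whose equals() strictly requires another BlockScanInfo,
// mirroring the pre-HDFS-5031 behavior described in the comment.
class BlockScanInfo extends Block {
    BlockScanInfo(long id) { super(id); }
    @Override public boolean equals(Object o) {
        return o instanceof BlockScanInfo && ((BlockScanInfo) o).id == id;
    }
}

public class EqualsAsymmetry {
    public static void main(String[] args) {
        Block block = new Block(42);
        BlockScanInfo info = new BlockScanInfo(42);
        // Symmetry of the equals() contract is violated: whether a
        // collection lookup "finds" the stored entry now depends on which
        // side's equals() the collection happens to invoke.
        System.out.println(block.equals(info));  // true
        System.out.println(info.equals(block));  // false
    }
}
```

When a lookup misses because of this asymmetry, the scanner re-adds a fresh entry instead of updating the existing one, which is how the duplicate can arise.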
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780501#comment-13780501 ] Tsuyoshi OZAWA commented on HDFS-5225: -- [~kihwal], thanks for the explanation. The point I mentioned was:
{code}
if (((now - getEarliestScanTime()) >= scanPeriod)
    || ((!blockInfoSet.isEmpty()) && !(this.isFirstBlockProcessed()))) {
  verifyFirstBlock();
}
{code}
but this is not a problem, because isFirstBlockProcessed() correctly checks, with synchronization, whether blockInfoSet is empty. LGTM. [~rvs], what do you think?
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780611#comment-13780611 ] Roman Shaposhnik commented on HDFS-5225: -- Once this shows up in the branch for beta 2.1.2 I can re-run my tests in Bigtop and let you guys know.
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13778534#comment-13778534 ] Junping Du commented on HDFS-5225: -- Hi [~ozawa], thanks for the patch! Besides fixing the test failures, it would be nice if you could attach the log from your reproduction of the bug (only the relevant part), so that Roman can judge whether it is the same bug.
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13778879#comment-13778879 ] Kihwal Lee commented on HDFS-5225: -- I am seeing cases of repeated logging of "Verification succeeded for xxx" for the same block. Since it loops, the disk fills up very quickly. These nodes were told to write a replica, and minutes later it was deleted. Minutes after that, the same block with the identical gen stamp was transferred to the node. All of these operations were successful. In the next block scanner scan period, however, after verifying some number of blocks, the thread gets into a seemingly infinite loop of verifying this one block replica.
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13779241#comment-13779241 ] Kihwal Lee commented on HDFS-5225: -- The scan spins because getEarliestScanTime() returns the last scan time of the oldest block. Since that entry never gets removed, the scanner keeps calling verifyFirstBlock(). Also, the JDK doc on TreeSet states:
bq. Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.
This is the case in branch-0.23, but broken in branch-2.
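The consistency-with-equals requirement the TreeSet javadoc imposes can be demonstrated with a small, self-contained sketch. Entry here is a hypothetical stand-in for the scanner's block-scan record, assuming an ordering keyed on last scan time while equality is keyed on block ID:

```java
import java.util.TreeSet;

// Sketch of ordering inconsistent with equals: TreeSet membership is
// decided entirely by compareTo(), so two entries that are "equal" by
// block ID are still both kept when their scan times differ.
public class TreeSetDuplicates {
    static final class Entry implements Comparable<Entry> {
        final long blockId;
        final long lastScanTime;
        Entry(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
        // Ordered by scan time only -- inconsistent with equals() below.
        @Override public int compareTo(Entry o) {
            return Long.compare(lastScanTime, o.lastScanTime);
        }
        @Override public boolean equals(Object o) {
            return o instanceof Entry && ((Entry) o).blockId == blockId;
        }
        @Override public int hashCode() { return Long.hashCode(blockId); }
    }

    public static void main(String[] args) {
        TreeSet<Entry> blockInfoSet = new TreeSet<>();
        blockInfoSet.add(new Entry(42, 1000L));
        // Re-adding the "same" block with a newer scan time does not
        // replace the old entry: compareTo() says they differ.
        blockInfoSet.add(new Entry(42, 2000L));
        System.out.println(blockInfoSet.size());               // 2
        // The stale entry stays first in the ordered set, so a method like
        // getEarliestScanTime() keeps returning its old timestamp and the
        // scanner never makes progress past it.
        System.out.println(blockInfoSet.first().lastScanTime); // 1000
    }
}
```

This mirrors the spin described above: the duplicate's older scan time keeps the stale entry at the head of the set, so the scanner keeps re-verifying (or re-reporting) the same block.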
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13778181#comment-13778181 ] Hadoop QA commented on HDFS-5225: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12605069/HDFS-5225.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestMultipleNNDataBlockScanner org.apache.hadoop.hdfs.web.TestWebHDFS {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5036//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5036//console This message is automatically generated. 
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13775924#comment-13775924 ] Tsuyoshi OZAWA commented on HDFS-5225: --
{quote}
Note that this may not be complete: I've seen the 'is no longer in the dataset' log dumped repeatedly, but only sometimes with my code.
{quote}
I mean that the log message itself is dumped every time, but logging the same 'is no longer in the dataset' message over and over again happens only sometimes.
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771699#comment-13771699 ] Junping Du commented on HDFS-5225: -- Hi Tsuyoshi, I think this fix may also work, as this is the only unsynchronized block that accesses blockInfoSet. However, why not merge the new synchronized block with the code block above it, which is already synchronized? It would also be nice to have a unit test that reproduces the bug and verifies the patch fixes it.
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771417#comment-13771417 ] Tsuyoshi OZAWA commented on HDFS-5225: -- How should we deal with this problem: with a log4j setting, or with a source-code-level fix? If we fix it at the source-code level, we can add a configuration option to limit the 'is no longer in the dataset' messages.
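[Editorial note] For the log4j side of that question, one hedged sketch (the logger name is taken from the message in the issue description; the appender name and exact file locations are illustrative and depend on the cluster's existing log4j.properties) is to route the scanner's output to its own size-capped rolling file so it cannot fill /var/log:

```properties
# Illustrative only: give BlockPoolSliceScanner its own size-capped
# rolling file appender so runaway repetition cannot fill /var/log.
log4j.logger.org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner=INFO, scannerRFA
log4j.additivity.org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner=false
log4j.appender.scannerRFA=org.apache.log4j.RollingFileAppender
log4j.appender.scannerRFA.File=${hadoop.log.dir}/block-scanner.log
log4j.appender.scannerRFA.MaxFileSize=64MB
log4j.appender.scannerRFA.MaxBackupIndex=2
log4j.appender.scannerRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.scannerRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

This only bounds the disk usage; as the later comments point out, it does not address why the scanner loops in the first place.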
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771422#comment-13771422 ] Roman Shaposhnik commented on HDFS-5225: The problem here is really not log overflow; that can be dealt with (as I mentioned). The problem is that the observed behavior seems to indicate a deeper (potentially significant) problem along the lines of what [~cmccabe] has mentioned.
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771499#comment-13771499 ] Colin Patrick McCabe commented on HDFS-5225: It's not a log4j problem. The problem is that the DataNode is logging millions of identical lines about the same block. The scanner has basically gotten stuck in a loop.
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771520#comment-13771520 ] Junping Du commented on HDFS-5225: -- Hi Colin and Roman, I think the following code won't delete a block from blockInfoSet if the block has already been removed from blockMap:
{code}
/** Deletes the block from internal structures */
synchronized void deleteBlock(Block block) {
  BlockScanInfo info = blockMap.get(block);
  if ( info != null ) {
    delBlockInfo(info);
  }
}
{code}
Then, if that block happens to be the first block in blockInfoSet, the log will loop forever. The inconsistency between blockMap and blockInfoSet may arise because blockInfoSet is an unsynchronized TreeSet, which is not thread-safe. Maybe we should replace it with a ConcurrentSkipListMap (which preserves both ordering and concurrency). Thoughts?
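[Editorial note] The failure mode described above can be sketched in a few lines. This is a simplified stand-in, not the actual BlockPoolSliceScanner code: the class and field names here are illustrative, and the "info" values are plain Longs. The point is that a deleteBlock() which consults only the map leaves a stale entry in the sorted set, so a scanner that always takes the set's first element revisits the same block forever.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Minimal sketch of the blockMap/blockInfoSet inconsistency; all names
// are simplified stand-ins for the real scanner's structures.
public class StuckScannerSketch {
    static Map<Long, Long> blockMap = new HashMap<>();   // blockId -> scan info
    static TreeSet<Long> blockInfoSet = new TreeSet<>(); // ordered scan queue

    static void addBlock(long id) {
        blockMap.put(id, id);
        blockInfoSet.add(id);
    }

    // Mirrors the deleteBlock() quoted above: when the entry is already
    // gone from the map, the set entry is never cleaned up.
    static void deleteBlock(long id) {
        Long info = blockMap.get(id);
        if (info != null) {
            blockMap.remove(id);
            blockInfoSet.remove(info);
        }
    }

    public static void main(String[] args) {
        addBlock(42L);
        blockMap.remove(42L); // e.g. a racing code path cleared only the map
        deleteBlock(42L);     // no-op: the map lookup fails, set entry survives
        // The scanner's "first block" is now permanently 42, producing the
        // endless "is no longer in the dataset" messages for one block.
        System.out.println(blockInfoSet.first()); // 42
        System.out.println(blockInfoSet.size());  // 1
    }
}
```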
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771538#comment-13771538 ] Tsuyoshi OZAWA commented on HDFS-5225: -- Colin and Roman, thank you for sharing! Now I understand the problem correctly. Junping, in this case both blockMap and blockInfoSet must be synchronized on the BlockPoolSliceScanner instance. The verifyFirstBlock method reads blockInfoSet, but it's not synchronized. IMHO, if we make the verifyFirstBlock method synchronized, this problem may be solved. What do you think?
[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again
[ https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771542#comment-13771542 ] Tsuyoshi OZAWA commented on HDFS-5225: -- {quote} The verifyFirstBlock method reads blockInfoSet, but it's not synchronized. IMHO, if we make the verifyFirstBlock method synchronized, this problem may be solved. What do you think? {quote} Sorry, I meant not the verifyFirstBlock method but the scan method.