[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2014-02-18 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904121#comment-13904121
 ] 

Kihwal Lee commented on HDFS-5225:
--

[~lars_francke], which version of Hadoop are you using?

 datanode keeps logging the same 'is no longer in the dataset' message over 
 and over again
 -

 Key: HDFS-5225
 URL: https://issues.apache.org/jira/browse/HDFS-5225
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.1.1-beta
Reporter: Roman Shaposhnik
Assignee: Tsuyoshi OZAWA
Priority: Blocker
 Attachments: HDFS-5225-reproduce.1.txt, HDFS-5225.1.patch, 
 HDFS-5225.2.patch


 I was running the usual Bigtop testing on 2.1.1-beta RC1 with the following 
 configuration: a 4-node, fully distributed cluster with security on.
 All of a sudden my DN ate up all of the space in /var/log logging the 
 following message repeatedly:
 {noformat}
 2013-09-18 20:51:12,046 INFO 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
 BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369 is no longer 
 in the dataset
 {noformat}
 It wouldn't respond to jstack, and jstack -F ended up being useless.
 Here's what I was able to find in the NameNode logs regarding this block ID:
 {noformat}
 fgrep -rI 'blk_1073742189' hadoop-hdfs-namenode-ip-10-224-158-152.log
 2013-09-18 18:03:16,972 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 allocateBlock: 
 /user/jenkins/testAppendInputWedSep18180222UTC2013/test4.fileWedSep18180222UTC2013._COPYING_.
  BP-1884637155-10.224.158.152-1379524544853 
 blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
 replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
 ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
 ReplicaUnderConstruction[10.224.158.152:1004|RBW]]}
 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: 
 blockMap updated: 10.224.158.152:1004 is added to 
 blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
 replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
 ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
 ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: 
 blockMap updated: 10.34.74.206:1004 is added to 
 blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
 replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
 ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
 ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
 2013-09-18 18:03:17,224 INFO BlockStateChange: BLOCK* addStoredBlock: 
 blockMap updated: 10.83.107.80:1004 is added to 
 blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
 replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
 ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
 ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
 2013-09-18 18:03:17,899 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369,
  newGenerationStamp=1370, newLength=1048576, newNodes=[10.83.107.80:1004, 
 10.34.74.206:1004, 10.224.158.152:1004], 
 clientName=DFSClient_NONMAPREDUCE_-450304083_1)
 2013-09-18 18:03:17,904 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369)
  successfully to 
 BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370
 2013-09-18 18:03:18,540 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370,
  newGenerationStamp=1371, newLength=2097152, newNodes=[10.83.107.80:1004, 
 10.34.74.206:1004, 10.224.158.152:1004], 
 clientName=DFSClient_NONMAPREDUCE_-450304083_1)
 2013-09-18 18:03:18,548 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370)
  successfully to 
 BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1371
 2013-09-18 18:03:26,150 INFO BlockStateChange: BLOCK* addToInvalidates: 
 blk_1073742189_1371 10.83.107.80:1004 10.34.74.206:1004 10.224.158.152:1004 
 2013-09-18 18:03:26,847 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 InvalidateBlocks: ask 10.34.74.206:1004 to delete [blk_1073742178_1359, 
 blk_1073742183_1362, blk_1073742184_1363, blk_1073742186_1366, 
 blk_1073742188_1368, blk_1073742189_1371]
 2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 InvalidateBlocks: ask 10.224.158.152:1004 to delete [blk_1073742177_1353, 
 blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356, 
 

[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2014-02-18 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904144#comment-13904144
 ] 

Lars Francke commented on HDFS-5225:


We're running CDH 4.5.0, which is based on Hadoop 2.0. I see that a fix for this 
issue is in CDH 4.6, but that's not released yet.


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2014-02-17 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903144#comment-13903144
 ] 

Lars Francke commented on HDFS-5225:


We're being hit by this issue at the moment. I didn't follow all of the 
explanations, so I'm unsure what to do now.

Is changing the log settings for that class the correct workaround, or is there 
anything else we can do to break the loop?
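
The setting I have in mind (an unverified assumption on my part, not a confirmed 
fix) would be raising the scanner's log level in the DataNode's log4j.properties, 
which should only silence the flood of messages rather than fix the scan loop itself:

{noformat}
# Hypothetical log4j.properties entry for the DataNode: raises the level for the
# block pool scanner so the repeated INFO message is suppressed; it does not fix
# the underlying loop.
log4j.logger.org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner=WARN
{noformat}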


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780209#comment-13780209
 ] 

Tsuyoshi OZAWA commented on HDFS-5225:
--

Thank you for the investigation, Kihwal! I'm now creating a test that can 
reproduce the bug you mentioned.


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780220#comment-13780220
 ] 

Tsuyoshi OZAWA commented on HDFS-5225:
--

[~djp], OK. But the current reproducer patch only triggers the bug rarely, as I 
mentioned. I'll update the reproducer patch based on [~kihwal]'s comment.


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-27 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780358#comment-13780358
 ] 

Kihwal Lee commented on HDFS-5225:
--

Due to HDFS-4797, two BlockScanInfo entries for different blocks could collide with 
each other in blockInfoSet if their last scan times are the same. This can easily 
create inconsistencies between blockMap and blockInfoSet. That problem was recently 
fixed by HDFS-5031. Other than this, I cannot find any other reason for the two data 
structures to go out of sync.

[~ozawa], can you reproduce the problem with HDFS-5031 applied? Your reproducer patch 
artificially introduces duplicate entries; that should not happen with proper 
synchronization. Other flaws in the algorithm may cause wrong records to be removed 
or added, but the two data structures should stay in sync after HDFS-5031.

If we want to make sure this is the case now, and that it stays that way in the 
future, we could add sanity checks to addBlockInfo() and delBlockInfo(). For 
addBlockInfo(), the addition to either data structure should not encounter a 
duplicate entry. For delBlockInfo(), both data structures should contain the entry 
being deleted.
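
A rough, self-contained sketch of what such checks could look like (ScanInfo, 
blockInfoSet and blockMap here are stand-ins for illustration, not the actual 
BlockPoolSliceScanner types and fields):

{code}
// Illustrative sketch only; not the real BlockPoolSliceScanner code.
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

class ScanTracker {
  static final class ScanInfo {
    final long blockId;
    final long lastScanTime;
    ScanInfo(long blockId, long lastScanTime) {
      this.blockId = blockId;
      this.lastScanTime = lastScanTime;
    }
  }

  // Ordered by last scan time, tie-broken on block id so two blocks with the
  // same scan time cannot collide (the HDFS-4797 pitfall).
  private final TreeSet<ScanInfo> blockInfoSet = new TreeSet<>(
      Comparator.comparingLong((ScanInfo s) -> s.lastScanTime)
                .thenComparingLong(s -> s.blockId));
  private final Map<Long, ScanInfo> blockMap = new HashMap<>();

  synchronized void addBlockInfo(ScanInfo info) {
    boolean added = blockInfoSet.add(info);
    ScanInfo previous = blockMap.put(info.blockId, info);
    if (!added || previous != null) {
      // Sanity check: neither structure should already hold this block.
      System.err.println("WARN: adding already existing block " + info.blockId);
    }
  }

  synchronized void delBlockInfo(ScanInfo info) {
    boolean removed = blockInfoSet.remove(info);
    ScanInfo previous = blockMap.remove(info.blockId);
    if (!removed || previous == null) {
      // Sanity check: both structures should have contained this block.
      System.err.println("WARN: removing non-existent block " + info.blockId);
    }
  }
}
{code}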


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-27 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780396#comment-13780396
 ] 

Kihwal Lee commented on HDFS-5225:
--

In HDFS-5031, Vinay explained:

bq. BlockScanInfo.equals() is strictly checking for the instance of 
BlockScanInfo, but in almost all retrievals from blockMap are done using 
instance of Block, so always will get null value and hence scan will happen 
again.

Since the check returns null, addBlockInfo() can be called without removing the 
entry first. This replaces the existing entry in blockMap, since it uses Block's 
hash code. However, blockInfoSet can end up with a duplicate entry for the block, 
since it is keyed on the last scan time. If the older entry becomes the first in 
the ordered set, "is no longer in the dataset" will show up repeatedly if the block 
was deleted. If the block was not deleted, "Verification succeeded for..." will 
appear.

I will pull HDFS-5031 into branch-2.1-beta.
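
To illustrate the equals() asymmetry Vinay described (stand-in classes for 
illustration only, not the actual Block/BlockScanInfo code):

{code}
// Stand-in classes; the names only mirror the discussion above.
class Block {
  final long id;
  Block(long id) { this.id = id; }
  @Override public boolean equals(Object o) {
    return o instanceof Block && ((Block) o).id == id;
  }
  @Override public int hashCode() { return Long.hashCode(id); }
}

class ScanInfo extends Block {
  final long lastScanTime;
  ScanInfo(long id, long lastScanTime) { super(id); this.lastScanTime = lastScanTime; }
  // Overly strict: only ever equal to another ScanInfo.
  @Override public boolean equals(Object o) {
    return o instanceof ScanInfo && ((ScanInfo) o).id == id;
  }
  @Override public int hashCode() { return Long.hashCode(id); }
}

public class AsymmetricEqualsDemo {
  public static void main(String[] args) {
    Block key = new Block(1);
    ScanInfo stored = new ScanInfo(1, 100L);
    System.out.println(key.equals(stored));  // true
    System.out.println(stored.equals(key));  // false: the symmetry contract is broken.
    // A lookup that asks the stored entry whether it equals a plain Block key gets
    // "no", the caller concludes the block is absent, addBlockInfo() runs again,
    // and a stale duplicate is left behind in the scan-time-ordered set.
  }
}
{code}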



[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780501#comment-13780501
 ] 

Tsuyoshi OZAWA commented on HDFS-5225:
--

[~kihwal], thanks for the explanation. The point I mentioned was:

{code}
if (((now - getEarliestScanTime()) >= scanPeriod)
    || ((!blockInfoSet.isEmpty()) && !(this.isFirstBlockProcessed()))) {
  verifyFirstBlock();
}
{code}

but this is not a problem, because isFirstBlockProcessed() correctly checks whether 
blockInfoSet is empty under synchronization. LGTM. [~rvs], what do you think?


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-27 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780611#comment-13780611
 ] 

Roman Shaposhnik commented on HDFS-5225:


Once this shows up in the 2.1.2-beta branch, I can re-run my Bigtop tests and let 
you guys know.


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778534#comment-13778534
 ] 

Junping Du commented on HDFS-5225:
--

Hi [~ozawa], thanks for the patch! Besides fixing the test failures, it would be 
nice if you could attach the log from your bug reproduction (only the relevant 
part) so that Roman can judge whether it is the same bug.


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-26 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778879#comment-13778879
 ] 

Kihwal Lee commented on HDFS-5225:
--

I am seeing cases of repeated logging of "Verification succeeded for xxx" for the 
same block. Since it loops, the disk fills up very quickly.

These nodes were told to write a replica, and minutes later it was deleted. More 
minutes went by, and the same block with the identical gen stamp was transferred 
to the node. All of these operations were successful. In the next block scanner 
scan period, however, the thread gets into a seemingly infinite loop of verifying 
this block replica, after verifying some number of other blocks.




[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-26 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779241#comment-13779241
 ] 

Kihwal Lee commented on HDFS-5225:
--

The scan spins because getEarliestScanTime() returns the last scan time of the 
oldest block. Since that entry never gets removed, the scanner keeps calling 
verifyFirstBlock().

Also, the JDK doc on TreeSet states: "Note that the ordering maintained by a set 
(whether or not an explicit comparator is provided) must be consistent with equals 
if it is to correctly implement the Set interface." This is the case in 
branch-0.23, but broken in branch-2.
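
A minimal standalone demonstration of that TreeSet caveat (a stand-in class, not 
the branch-2 code): when the comparator orders only on the scan time and equals() 
looks at something else, remove() with an updated object silently fails and the 
stale head entry stays put.

{code}
import java.util.Comparator;
import java.util.TreeSet;

// Stand-in for illustration only.
public class TreeSetConsistencyDemo {
  static final class ScanInfo {
    final long blockId;
    final long lastScanTime;
    ScanInfo(long blockId, long lastScanTime) {
      this.blockId = blockId;
      this.lastScanTime = lastScanTime;
    }
    @Override public boolean equals(Object o) {
      return o instanceof ScanInfo && ((ScanInfo) o).blockId == blockId;
    }
    @Override public int hashCode() { return Long.hashCode(blockId); }
  }

  public static void main(String[] args) {
    // Ordering is based only on lastScanTime, which is NOT consistent with equals().
    TreeSet<ScanInfo> blockInfoSet =
        new TreeSet<>(Comparator.comparingLong((ScanInfo s) -> s.lastScanTime));
    blockInfoSet.add(new ScanInfo(1, 100L));

    // The same block, looked up after its scan time has moved on: the comparator
    // never locates the old entry, so nothing is removed and the head of the set
    // keeps reporting the stale earliest scan time forever.
    boolean removed = blockInfoSet.remove(new ScanInfo(1, 200L));
    System.out.println(removed);                       // false
    System.out.println(blockInfoSet.first().blockId);  // 1 (still there)
  }
}
{code}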




[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13778181#comment-13778181
 ] 

Hadoop QA commented on HDFS-5225:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12605069/HDFS-5225.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.datanode.TestMultipleNNDataBlockScanner
  org.apache.hadoop.hdfs.web.TestWebHDFS

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5036//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5036//console

This message is automatically generated.


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-23 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13775924#comment-13775924
 ] 

Tsuyoshi OZAWA commented on HDFS-5225:
--

{quote}
Note that this may be not complete: I've faced the log 'is no longer in the 
dataset' dumped repeatedly, but sometimes with my code.
{quote}

I mean that the log message itself is dumped each time, but the situation where the 
same 'is no longer in the dataset' message is logged over and over again only happens 
sometimes.


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771699#comment-13771699
 ] 

Junping Du commented on HDFS-5225:
--

Hi Tsuyoshi, I think this fix may also work, as this is the only unsynchronized 
block that accesses blockInfoSet. However, why don't you merge the new 
synchronized block with the code block above, which is also synchronized? It would 
be nice if you could add a unit test that reproduces the bug and verifies the patch 
fixes it.


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-18 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771417#comment-13771417
 ] 

Tsuyoshi OZAWA commented on HDFS-5225:
--

Which way should we deal with this problem: a log4j setting or a source-code-level 
fix? We could add a configuration option to limit the 'is no longer in the 
dataset' message if we fix it at the source-code level.
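
If we go with a source-code-level fix, here is a minimal sketch of the kind of 
limiter that could be meant (purely illustrative; this class does not exist in 
Hadoop and is not the eventual patch):
{code}
/**
 * Illustrative sketch only: allow a repeated message at most once per interval
 * so a stuck scanner cannot fill /var/log with identical lines.
 */
public class ThrottledLogGuard {
  private final long intervalMs;
  private long lastLogTimeMs = Long.MIN_VALUE;

  public ThrottledLogGuard(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  /** Returns true if the caller should log now; false while still throttled. */
  public synchronized boolean shouldLog(long nowMs) {
    if (lastLogTimeMs == Long.MIN_VALUE || nowMs - lastLogTimeMs >= intervalMs) {
      lastLogTimeMs = nowMs;
      return true;
    }
    return false;
  }
}
{code}
The caller would then wrap the existing logging call, e.g. 
{{if (guard.shouldLog(System.currentTimeMillis())) LOG.info(block + " is no longer in the dataset");}}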


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-18 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771422#comment-13771422
 ] 

Roman Shaposhnik commented on HDFS-5225:


The problem here is really not log overflow; that can be dealt with (as I 
mentioned). The problem is that the observed behavior seems to indicate a deeper 
(potentially significant) problem along the lines of what [~cmccabe] has 
mentioned.


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771499#comment-13771499
 ] 

Colin Patrick McCabe commented on HDFS-5225:


It's not a log4j problem.  The problem is that the DataNode is logging millions 
of identical lines about the same block.  The scanner has gotten stuck in a 
loop, basically.


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771520#comment-13771520
 ] 

Junping Du commented on HDFS-5225:
--

Hi Colin and Roman, I think the following code won't delete the block from 
blockInfoSet if the block has already been removed from blockMap:
{code}
  /** Deletes the block from internal structures */
  synchronized void deleteBlock(Block block) {
    BlockScanInfo info = blockMap.get(block);
    if ( info != null ) {
      delBlockInfo(info);
    }
  }
{code}
Then, if that block happens to be the first block in blockInfoSet, the log will 
loop forever.
The inconsistency between blockMap and blockInfoSet may be because blockInfoSet 
is an unsynchronized TreeSet, so it is not thread-safe. Maybe we should replace 
it with a ConcurrentSkipListMap (which keeps both ordering and concurrency). 
Thoughts?
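
Sketching what that replacement could look like (illustrative only; the names mirror 
the scanner's fields but this is not a patch against BlockPoolSliceScanner): the 
ordered structure becomes a concurrent skip-list map keyed by the scan info, so the 
delete path and the scanner thread no longer share an unsynchronized TreeSet.
{code}
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

class ScanQueueSketch {
  static final class BlockScanInfo {
    final long blockId;
    final long lastScanTime;
    BlockScanInfo(long blockId, long lastScanTime) {
      this.blockId = blockId;
      this.lastScanTime = lastScanTime;
    }
  }

  // blockId -> current scan info, safe for concurrent access.
  private final Map<Long, BlockScanInfo> blockMap = new ConcurrentHashMap<>();

  // Ordered by (lastScanTime, blockId) so iteration still yields the
  // least-recently-scanned block first, but without an external lock.
  private final ConcurrentSkipListMap<BlockScanInfo, Boolean> blockInfoSet =
      new ConcurrentSkipListMap<>(
          Comparator.comparingLong((BlockScanInfo i) -> i.lastScanTime)
              .thenComparingLong(i -> i.blockId));

  void addBlock(BlockScanInfo info) {
    blockMap.put(info.blockId, info);
    blockInfoSet.put(info, Boolean.TRUE);
  }

  // Remove from the lookup map first, then from the ordered view.
  void deleteBlock(long blockId) {
    BlockScanInfo info = blockMap.remove(blockId);
    if (info != null) {
      blockInfoSet.remove(info);
    }
  }

  /** Oldest entry by scan time, or null if nothing is left to scan. */
  BlockScanInfo earliest() {
    Map.Entry<BlockScanInfo, Boolean> first = blockInfoSet.firstEntry();
    return first == null ? null : first.getKey();
  }
}
{code}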


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-18 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771538#comment-13771538
 ] 

Tsuyoshi OZAWA commented on HDFS-5225:
--

Colin and Roman,

thank you for sharing! Now I understand the problem correctly.

Junping,

Both blockMap and blockInfoSet must be synchronized on the BlockPoolSliceScanner 
instance in this case. The verifyFirstBlock method reads blockInfoSet, but it is 
not synchronized. IMHO, if we make the verifyFirstBlock method synchronized, 
this problem may be solved. What do you think?


[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2013-09-18 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771542#comment-13771542
 ] 

Tsuyoshi OZAWA commented on HDFS-5225:
--

 The verifyFirstBlock method reads blockInfoSet, but it is not synchronized. 
 IMHO, if we make the verifyFirstBlock method synchronized, this problem may 
 be solved. What do you think?

Sorry, I meant not the verifyFirstBlock method, but the scan method.
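
A minimal sketch of that direction (again with illustrative names, not the actual 
patch): every code path that touches blockInfoSet, including the scan step, holds 
the scanner instance lock, so the scan can no longer keep re-reading an entry that 
another thread is in the middle of removing.
{code}
import java.util.TreeSet;

class ScannerSketch {
  private final TreeSet<Long> blockInfoSet = new TreeSet<>();

  synchronized void addBlock(long blockId) {
    blockInfoSet.add(blockId);
  }

  synchronized void deleteBlock(long blockId) {
    blockInfoSet.remove(blockId);
  }

  // The scan step takes the same lock while it looks at the head of the set,
  // so deleteBlock() cannot interleave and leave a stale first entry behind.
  synchronized Long pickFirstBlock() {
    return blockInfoSet.isEmpty() ? null : blockInfoSet.first();
  }
}
{code}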
