[jira] [Commented] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2018-05-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479982#comment-16479982
 ] 

Íñigo Goiri commented on HDFS-12378:


We'd like to backport this to branch-2.9; opened HDFS-13590.

> TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk
> --
>
> Key: HDFS-12378
> URL: https://issues.apache.org/jira/browse/HDFS-12378
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0-alpha4
>Reporter: Xiao Chen
>Assignee: Lei (Eddy) Xu
>Priority: Blocker
>  Labels: flaky-test
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-12378.00.patch, HDFS-12378.01.patch
>
>
> Saw on 
> https://builds.apache.org/job/PreCommit-HDFS-Build/20928/testReport/org.apache.hadoop.hdfs/TestClientProtocolForPipelineRecovery/testZeroByteBlockRecovery/:
> Error Message
> {noformat}
> Failed to replace a bad datanode on the existing pipeline due to no more good 
> datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[127.0.0.1:51925,DS-274e8cc9-280b-4370-b494-6a4f0d67ccf4,DISK]],
>  
> original=[DatanodeInfoWithStorage[127.0.0.1:51925,DS-274e8cc9-280b-4370-b494-6a4f0d67ccf4,DISK]]).
>  The current failed datanode replacement policy is ALWAYS, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> {noformat}
> Stacktrace
> {noformat}
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[127.0.0.1:51925,DS-274e8cc9-280b-4370-b494-6a4f0d67ccf4,DISK]],
>  
> original=[DatanodeInfoWithStorage[127.0.0.1:51925,DS-274e8cc9-280b-4370-b494-6a4f0d67ccf4,DISK]]).
>  The current failed datanode replacement policy is ALWAYS, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1322)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1388)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1587)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1488)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1470)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1274)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:684)
> {noformat}
> Standard Output
> {noformat}
> 2017-08-30 18:02:37,714 [main] INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:(469)) - starting cluster: numNameNodes=1, 
> numDataNodes=3
> Formatting using clusterid: testClusterID
> 2017-08-30 18:02:37,716 [main] INFO  namenode.FSEditLog 
> (FSEditLog.java:newInstance(224)) - Edit logging is async:false
> 2017-08-30 18:02:37,716 [main] INFO  namenode.FSNamesystem 
> (FSNamesystem.java:(742)) - KeyProvider: null
> 2017-08-30 18:02:37,716 [main] INFO  namenode.FSNamesystem 
> (FSNamesystemLock.java:(120)) - fsLock is fair: true
> 2017-08-30 18:02:37,716 [main] INFO  namenode.FSNamesystem 
> (FSNamesystemLock.java:(136)) - Detailed lock hold time metrics 
> enabled: false
> 2017-08-30 18:02:37,717 [main] INFO  namenode.FSNamesystem 
> (FSNamesystem.java:(763)) - fsOwner = jenkins (auth:SIMPLE)
> 2017-08-30 18:02:37,717 [main] INFO  namenode.FSNamesystem 
> (FSNamesystem.java:(764)) - supergroup  = supergroup
> 2017-08-30 18:02:37,717 [main] INFO  namenode.FSNamesystem 
> (FSNamesystem.java:(765)) - isPermissionEnabled = true
> 2017-08-30 18:02:37,717 [main] INFO  namenode.FSNamesystem 
> (FSNamesystem.java:(776)) - HA Enabled: false
> 2017-08-30 18:02:37,718 [main] INFO  common.Util 
> (Util.java:isDiskStatsEnabled(395)) - 
> dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO 
> profiling
> 2017-08-30 18:02:37,718 [main] INFO  blockmanagement.DatanodeManager 
> (DatanodeManager.java:(301)) - dfs.block.invalidate.limit: 
> configured=1000, counted=60, effected=1000
> 2017-08-30 18:02:37,718 [main] INFO  blockmanagement.DatanodeManager 
> (DatanodeManager.java:(309)) - 
> dfs.namenode.datanode.registration.ip-hostname-check=true
> 2017-08-30 18:02:37,719 [main] INFO  blockmanagement.BlockManager 
> (InvalidateBlocks.java:printBlockDeletionTime(76)) - 
> dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
> 2017-08-30 18:02:37,719 [main] INFO  blockmanagement.BlockManager 
> (InvalidateBlocks.java:printBlockDeletionTime(82)) - The block deletion will 
> start around 2017 Aug 30 18:02:37
> {noformat}
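
For reference, the policy named in the error message above is a client-side setting. A minimal, hypothetical sketch of relaxing it on the client (this is general HDFS client configuration, not the fix applied in this JIRA; the class name is made up for illustration):

{noformat}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ReplaceDatanodePolicyExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Key quoted in the error message above; the message says the policy
    // in effect was ALWAYS, and NEVER/DEFAULT are the other values.
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy",
        "NEVER");
    // Opens a client filesystem that will use the relaxed policy on
    // pipeline recovery instead of failing the write.
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Client filesystem: " + fs.getUri());
  }
}
{noformat}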

[jira] [Commented] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2017-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167151#comment-16167151
 ] 

Hudson commented on HDFS-12378:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12877 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12877/])
HDFS-12378.  (lei: rev 61cee3a0b9a8ea2e4f6257c17c2d90c7c930cc34)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java



[jira] [Commented] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2017-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167111#comment-16167111
 ] 

Hadoop QA commented on HDFS-12378:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 26s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.TestDFSUpgrade |
|   | hadoop.hdfs.TestLeaseRecoveryStriped |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.TestReadWhileWriting |
|   | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
| Timed out junit tests | org.apache.hadoop.hdfs.TestWriteReadStripedFile |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:71bbb86 |
| JIRA Issue | HDFS-12378 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12887182/HDFS-12378.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 5f8d457ecde9 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 7ee02d1 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21142/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21142/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21142/console |

[jira] [Commented] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2017-09-14 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166992#comment-16166992
 ] 

Andrew Wang commented on HDFS-12378:


+1 pending, thanks for working on this, Eddy!


[jira] [Commented] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2017-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166918#comment-16166918
 ] 

Hadoop QA commented on HDFS-12378:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
48s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m  0s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m 58s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  
org.apache.hadoop.hdfs.server.datanode.DataNode.transferReplicaForPipelineRecovery(ExtendedBlock,
 DatanodeInfo[], StorageType[], String[], String) explicitly invokes run on a 
thread (did you mean to start it instead?)  At DataNode.java:explicitly invokes 
run on a thread (did you mean to start it instead?)  At DataNode.java:[line 
3005] |
| Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
|   | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
|   | hadoop.hdfs.TestClose |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.TestLeaseRecoveryStriped |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.TestParallelUnixDomainRead |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.TestReconstructStripedFile |
| Timed out junit tests | org.apache.hadoop.hdfs.TestWriteReadStripedFile |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:71bbb86 |
| JIRA Issue | HDFS-12378 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12887150/HDFS-12378.00.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 8c0a18212bad 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 
18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
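
The FindBugs -1 above flags a classic pattern: invoking Thread.run() executes the task synchronously on the calling thread, whereas start() is what actually spawns a new thread. A minimal illustration of the warning (hypothetical names, not the actual DataNode code):

{noformat}
public class RunVsStart {
  public static void main(String[] args) {
    Thread transfer = new Thread(() ->
        System.out.println("running on " + Thread.currentThread().getName()));
    transfer.run();   // prints "running on main": no new thread, the pattern FindBugs flags
    transfer.start(); // spawns a separate thread, which is usually the intent
  }
}
{noformat}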

[jira] [Commented] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2017-09-14 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166882#comment-16166882
 ] 

Brahma Reddy Battula commented on HDFS-12378:
-

+1. Bisecting was a good option; thanks [~andrew.wang] and [~eddyxu]. I had spent 
time debugging this myself.


[jira] [Commented] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2017-09-14 Thread Ajay Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166788#comment-16166788
 ] 

Ajay Kumar commented on HDFS-12378:
---

+1 (non-binding). Tested locally; the test case passes with the patch applied.


[jira] [Commented] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2017-09-14 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166621#comment-16166621
 ] 

Lei (Eddy) Xu commented on HDFS-12378:
--

Sure, [~andrew.wang]. Looking into this today.


[jira] [Commented] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2017-09-14 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165874#comment-16165874
 ] 

Andrew Wang commented on HDFS-12378:


I bisected this to HDFS-12215 with a script that runs the test 10 times. 
[~eddyxu] could you dig into this?
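
A sketch of what such a repeat-runner can look like (the actual script wasn't posted; this assumes the JUnit 4 API that these tests are written against, and the harness class name is made up):

{noformat}
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

public class RepeatFlakyTest {
  public static void main(String[] args) throws ClassNotFoundException {
    Class<?> test = Class.forName(
        "org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery");
    // Run the whole test class ten times; a flaky test usually trips
    // at least once within that many iterations.
    for (int i = 1; i <= 10; i++) {
      Result result = JUnitCore.runClasses(test);
      System.out.println("run " + i + ": "
          + (result.wasSuccessful() ? "PASS" : "FAIL"));
      if (!result.wasSuccessful()) {
        System.exit(1); // non-zero exit marks the commit bad for git bisect
      }
    }
  }
}
{noformat}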


[jira] [Commented] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2017-09-13 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165067#comment-16165067
 ] 

Andrew Wang commented on HDFS-12378:


I think we can fix the flaky tests after beta1. If people would like to post 
patches to disable them in the meantime, that's a good short-term fix. Of course, 
patches that fix them would be even better.


[jira] [Commented] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2017-09-13 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164866#comment-16164866
 ] 

Rushabh S Shah commented on HDFS-12378:
---

There are many EC-related and other flaky tests in trunk.
Should we open tickets for all such flaky tests and mark them as blockers for 
3.0.0-beta1?
