[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246990#comment-15246990 ]

Lin Yiqun commented on HDFS-10275:
----------------------------------

Thanks [~walter.k.su] for the commit!

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-10275
>                 URL: https://issues.apache.org/jira/browse/HDFS-10275
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>             Fix For: 2.7.3
>
>         Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure output looks like this:
> {code}
> Results :
>
> Failed tests:
>   TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 expected: but was:
>
> Tests in error:
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout happens at line 279 of {{TestDataNodeMetrics}}. Looking into the code, the real reason is that the {{TotalWriteTime}} metric frequently counts 0 in each file-creation iteration, and this leads to retrying until the timeout.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}} metric always counts 0 is that the test uses {{SimulatedFSDataset}} for a write-timing test. {{SimulatedFSDataset}} measures write time around its inner class method {{SimulatedOutputStream#write}}, and that method just updates {{length}} and throws the data away:
> {code}
> @Override
> public void write(byte[] b,
>                   int off,
>                   int len) throws IOException {
>   length += len;
> }
> {code}
> So the write operation costs almost no time, and we should create the file in a real way instead of the simulated way. In my local testing, the test passes on the first try once the simulated dataset is removed, whereas the old way retries many times to accumulate write time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
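The no-op write quoted in the description is easy to reproduce outside Hadoop. The following is a hypothetical, self-contained sketch (the class names here are invented for illustration, not taken from the HDFS source): an {{OutputStream}} whose {{write}} only bumps a counter finishes a megabyte of "writes" in effectively zero milliseconds, which is why a wall-clock-based {{TotalWriteTime}} metric stays at 0 under the simulated dataset.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical reproduction of the no-op write described in the issue:
// write() only records the length and discards the bytes.
public class NoOpWriteTiming {

    static class DiscardingOutputStream extends OutputStream {
        long length = 0;

        @Override
        public void write(int b) {
            length++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            // Mirrors the quoted SimulatedOutputStream#write: track length, drop data.
            length += len;
        }
    }

    /** Writes {@code chunks} buffers of {@code chunkSize} bytes; returns elapsed millis. */
    public static long timeNoOpWrites(int chunks, int chunkSize) throws IOException {
        DiscardingOutputStream out = new DiscardingOutputStream();
        byte[] buf = new byte[chunkSize];
        long start = System.currentTimeMillis();
        for (int i = 0; i < chunks; i++) {
            out.write(buf, 0, buf.length);
        }
        // Millisecond resolution sees almost nothing: the loop does no real I/O.
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("elapsed ms for 1 MiB of no-op writes: "
            + timeNoOpWrites(1024, 1024));
    }
}
```

With a real dataset the bytes actually reach disk, so the timed interval is nonzero and the metric advances.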
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245624#comment-15245624 ]

Hudson commented on HDFS-10275:
-------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #9626 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9626/])
HDFS-10275. TestDataNodeMetrics failing intermittently due to (waltersu4549: rev ab903029a9d353677184ff5602966b11ffb408b9)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245569#comment-15245569 ]

Walter Su commented on HDFS-10275:
----------------------------------

Sorry, I didn't see that. The patch LGTM. +1.
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245521#comment-15245521 ]

Lin Yiqun commented on HDFS-10275:
----------------------------------

Hi [~walter.k.su], I have removed {{SimulatedFSDataset.setFactory(conf);}} in my patch. Do you mean there is also no need to bump the timeout?
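For context on the timeout being discussed: the test polls the metric until it becomes positive, failing with a timeout when a deadline passes (the real test uses Hadoop's {{GenericTestUtils.waitFor}}; the {{MetricWaiter}} class below is a simplified, hypothetical stand-in). With {{TotalWriteTime}} stuck at 0, the loop can never succeed and always burns the full timeout, which is why switching to a real dataset also removes any need for a longer timeout.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.LongSupplier;

// Simplified, hypothetical sketch of a poll-until-timeout wait loop,
// modeled on the pattern the test relies on.
public class MetricWaiter {

    /** Polls {@code metric} every {@code checkEveryMs} until it is positive. */
    public static void waitForPositive(LongSupplier metric, long checkEveryMs,
            long timeoutMs) throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (metric.getAsLong() <= 0) {
            if (System.currentTimeMillis() > deadline) {
                // A metric that never advances (e.g. TotalWriteTime == 0 under
                // the simulated dataset) always ends up here.
                throw new TimeoutException("Timed out waiting for metric > 0");
            }
            Thread.sleep(checkEveryMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // A positive metric returns immediately; a metric stuck at 0 times out.
        waitForPositive(() -> 1L, 10, 100);
        try {
            waitForPositive(() -> 0L, 10, 100);
            throw new AssertionError("expected TimeoutException");
        } catch (TimeoutException expected) {
            System.out.println("metric stuck at 0: timed out as expected");
        }
    }
}
```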
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245394#comment-15245394 ]

Walter Su commented on HDFS-10275:
----------------------------------

Good analysis! I think a better way to do this is to use a real FSDataset. Just remove {{SimulatedFSDataset.setFactory(conf);}}. What do you think?
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234581#comment-15234581 ]

Hadoop QA commented on HDFS-10275:
----------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 16m 54s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 8m 3s | trunk passed |
| +1 | compile | 1m 12s | trunk passed with JDK v1.8.0_77 |
| +1 | compile | 0m 56s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 24s | trunk passed |
| +1 | mvnsite | 1m 7s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 2m 32s | trunk passed |
| +1 | javadoc | 1m 37s | trunk passed with JDK v1.8.0_77 |
| +1 | javadoc | 2m 25s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 1m 4s | the patch passed |
| +1 | compile | 1m 6s | the patch passed with JDK v1.8.0_77 |
| +1 | javac | 1m 6s | the patch passed |
| +1 | compile | 0m 53s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 53s | the patch passed |
| +1 | checkstyle | 0m 21s | the patch passed |
| +1 | mvnsite | 1m 1s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 46s | the patch passed |
| +1 | javadoc | 1m 28s | the patch passed with JDK v1.8.0_77 |
| +1 | javadoc | 2m 17s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 106m 38s | hadoop-hdfs in the patch failed with JDK v1.8.0_77. |
| -1 | unit | 115m 55s | hadoop-hdfs in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 36s | Patch does not generate ASF License warnings. |
| | | 272m 38s | |

|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.TestDFSUpgradeFromImage |
| | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| | hadoop.hdfs.server.namenode.TestEditLog |
| | hadoop.hdfs.TestSafeModeWithStripedFile |
| | hadoop.hdfs.server.mover.TestStorageMover |
| | hadoop.hdfs.qjournal.TestSecureNNWithQJM |
| | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
| | hadoop.hdfs.server.namenode.ha.TestRequestHedgingProxyProvider |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| | hadoop.fs.contract.hdfs.TestHDFSContractSeek |
| JDK v1.7.0_95 Failed junit tests |