[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-18 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246990#comment-15246990
 ] 

Lin Yiqun commented on HDFS-10275:
--

Thanks [~walter.k.su] for the commit!

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Reporter: Lin Yiqun
> Assignee: Lin Yiqun
> Fix For: 2.7.3
>
> Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure
> output is as follows:
> {code}
> Results :
> Failed tests: 
>   TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout occurs at line 279 of {{TestDataNodeMetrics}}. I looked into the
> code and found that the real cause is that the {{TotalWriteTime}} metric
> frequently stays at 0 in each file-creation iteration, which leads to retries
> until the test times out.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}}
> metric always stays at 0 is that the time-spend test uses
> {{SimulatedFSDataset}}. {{SimulatedFSDataset}} writes through its inner
> class's method {{SimulatedOutputStream#write}}, so that is the call whose
> time is counted, and this method just updates {{length}} and throws the data
> away.
> {code}
> @Override
> public void write(byte[] b, int off, int len) throws IOException {
>   length += len;
> }
> {code} 
> So the write operation takes almost no time. We should therefore create the
> file through a real dataset instead of the simulated one. I tested this
> locally: with the simulated dataset removed, the test passes on the first
> attempt, whereas with the old approach it retries many times before any write
> time is counted.
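
To illustrate the point in the description above, here is a minimal, self-contained Java sketch (not Hadoop code). The names {{DiscardingWriteDemo}}, {{DiscardingOutputStream}}, and {{timeWritesMillis}} are hypothetical. The sketch also assumes the write time is accumulated in whole milliseconds; that is an assumption about how the metric is measured and is not stated in this thread. Under that assumption, a stream whose write only bumps a counter will usually report an elapsed time of 0 ms.

```java
import java.io.IOException;
import java.io.OutputStream;

public class DiscardingWriteDemo {

  /** Hypothetical stand-in for a simulated stream: counts bytes, stores nothing. */
  static class DiscardingOutputStream extends OutputStream {
    private long length = 0;

    @Override
    public void write(int b) throws IOException {
      length++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
      // Same shape as the quoted method: only the length is updated.
      length += len;
    }

    long getLength() {
      return length;
    }
  }

  /** Writes {@code count} buffers of {@code size} bytes; returns elapsed whole milliseconds. */
  static long timeWritesMillis(OutputStream out, int count, int size) throws IOException {
    byte[] buf = new byte[size];
    long startNanos = System.nanoTime();
    for (int i = 0; i < count; i++) {
      out.write(buf, 0, size);
    }
    // Integer division truncates sub-millisecond durations to 0.
    return (System.nanoTime() - startNanos) / 1_000_000L;
  }

  public static void main(String[] args) throws IOException {
    DiscardingOutputStream out = new DiscardingOutputStream();
    long millis = timeWritesMillis(out, 100, 4096);
    System.out.println("bytes counted: " + out.getLength());
    System.out.println("elapsed ms: " + millis);
  }
}
```

The elapsed value depends on the machine, so no particular output is claimed here; the point is only that a test waiting for a positive accumulated write time can keep retrying when each write costs less than the timer's resolution.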



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245624#comment-15245624
 ] 

Hudson commented on HDFS-10275:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9626 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9626/])
HDFS-10275. TestDataNodeMetrics failing intermittently due to (waltersu4549: 
rev ab903029a9d353677184ff5602966b11ffb408b9)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java




[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-18 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245569#comment-15245569
 ] 

Walter Su commented on HDFS-10275:
--

Sorry, I didn't see that. The patch LGTM. +1.



[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-18 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245521#comment-15245521
 ] 

Lin Yiqun commented on HDFS-10275:
--

Hi [~walter.k.su], I have removed {{SimulatedFSDataset.setFactory(conf);}} in 
my patch. Do you mean there is no need to bump the timeout as well?



[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-18 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245394#comment-15245394
 ] 

Walter Su commented on HDFS-10275:
--

Good analysis! I think a better way to do this is to use a real FSDataset: just 
remove {{SimulatedFSDataset.setFactory(conf);}}. What do you think?



[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234581#comment-15234581
 ] 

Hadoop QA commented on HDFS-10275:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 54s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 25s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 106m 38s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 115m 55s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 272m 38s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.namenode.TestEditLog |
|   | hadoop.hdfs.TestSafeModeWithStripedFile |
|   | hadoop.hdfs.server.mover.TestStorageMover |
|   | hadoop.hdfs.qjournal.TestSecureNNWithQJM |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.server.namenode.ha.TestRequestHedgingProxyProvider |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.fs.contract.hdfs.TestHDFSContractSeek |
| JDK v1.7.0_95 Failed junit tests |