[jira] [Updated] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2017-01-05 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-10275:
--
Fix Version/s: 2.8.0

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Fix For: 2.8.0, 2.7.3, 3.0.0-alpha1
>
> Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure 
> output looks like this:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout happens at line 279 of {{TestDataNodeMetrics}}. Looking into the 
> code, the real reason is that the {{TotalWriteTime}} metric frequently 
> counts 0 in each file-creation iteration, which leads to repeated retries 
> until the timeout is reached.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}} 
> metric always counts 0 is that the test uses {{SimulatedFSDataset}} while 
> measuring elapsed write time. {{SimulatedFSDataset}} performs the timed 
> write through the inner class method {{SimulatedOutputStream#write}}, which 
> just updates {{length}} and throws the data away.
> {code}
> @Override
> public void write(byte[] b,
>   int off,
>   int len) throws IOException  {
>   length += len;
> }
> {code} 
> So the write operation costs hardly any time, and we should create the file 
> with a real dataset instead of the simulated one. I have verified locally 
> that the test passes on the first attempt once the simulated dataset is 
> removed, whereas with the old approach the test retries many times before it 
> accumulates any write time.
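
To illustrate the point above, here is a small self-contained Java sketch (not HDFS code; the class and numbers are made up for illustration) showing why timing a write to a stream that only records the length, like the quoted {{SimulatedOutputStream#write}}, measures essentially zero elapsed time:

{code}
import java.io.IOException;
import java.io.OutputStream;

/**
 * Self-contained illustration (not HDFS code) of why timing a write to a
 * stream that merely records the length yields ~0 ms: no real I/O happens.
 */
public class DiscardingWriteTiming {

  // Mimics the quoted SimulatedOutputStream behaviour: track length, drop data.
  static class DiscardingOutputStream extends OutputStream {
    long length;

    @Override
    public void write(int b) {
      length += 1;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
      length += len;
    }
  }

  public static void main(String[] args) throws IOException {
    DiscardingOutputStream out = new DiscardingOutputStream();
    byte[] data = new byte[64 * 1024];
    long begin = System.nanoTime();
    out.write(data, 0, data.length);   // nothing is actually written anywhere
    long elapsedMs = (System.nanoTime() - begin) / 1_000_000;
    // elapsedMs is almost always 0, so a TotalWriteTime counter fed from
    // measurements like this never advances.
    System.out.println("elapsed ms = " + elapsedMs);
  }
}
{code}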



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-18 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-10275:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8, branch-2.7. Thanks [~linyiqun] for 
the contribution!

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.7.3
>
> Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure 
> output looks like this:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout happens at line 279 of {{TestDataNodeMetrics}}. Looking into the 
> code, the real reason is that the {{TotalWriteTime}} metric frequently 
> counts 0 in each file-creation iteration, which leads to repeated retries 
> until the timeout is reached.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}} 
> metric always counts 0 is that the test uses {{SimulatedFSDataset}} while 
> measuring elapsed write time. {{SimulatedFSDataset}} performs the timed 
> write through the inner class method {{SimulatedOutputStream#write}}, which 
> just updates {{length}} and throws the data away.
> {code}
> @Override
> public void write(byte[] b,
>   int off,
>   int len) throws IOException  {
>   length += len;
> }
> {code} 
> So the write operation costs hardly any time, and we should create the file 
> with a real dataset instead of the simulated one. I have verified locally 
> that the test passes on the first attempt once the simulated dataset is 
> removed, whereas with the old approach the test retries many times before it 
> accumulates any write time.
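
To make the failure mode concrete, below is a minimal, self-contained sketch (not the actual test code; helper names and the timeout value are hypothetical) of the retry-until-timeout pattern described above: the loop keeps creating files and polling the write-time counter, and because the simulated stream discards the data the counter never moves, so the wait ends only when the deadline expires with a "Timed out waiting for ..." error.

{code}
import java.util.concurrent.TimeoutException;

/**
 * Sketch of the retry loop: poll a write-time counter until it increases or a
 * deadline passes. All names and values here are illustrative, not HDFS code.
 */
public class TotalWriteTimeWaitSketch {

  // Hypothetical stand-in for reading the DataNode's TotalWriteTime metric.
  static long readTotalWriteTime() {
    return 0L; // with SimulatedFSDataset the write costs ~0 ms, so this stays 0
  }

  // Hypothetical stand-in for one iteration of creating a small file.
  static void createOneSmallFile() {
    // no-op in this sketch
  }

  public static void main(String[] args) throws Exception {
    long start = readTotalWriteTime();
    long deadline = System.currentTimeMillis() + 60_000; // illustrative timeout
    while (readTotalWriteTime() <= start) {
      if (System.currentTimeMillis() > deadline) {
        // This is the path the flaky test ends up on.
        throw new TimeoutException("Timed out waiting for TotalWriteTime to increase");
      }
      createOneSmallFile();
      Thread.sleep(100); // poll interval (illustrative)
    }
  }
}
{code}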





[jira] [Updated] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-10 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10275:
-
Attachment: HDFS-10275.001.patch

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure 
> output looks like this:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout happens at line 279 of {{TestDataNodeMetrics}}. Looking into the 
> code, the real reason is that the {{TotalWriteTime}} metric frequently 
> counts 0 in each file-creation iteration, which leads to repeated retries 
> until the timeout is reached.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}} 
> metric always counts 0 is that the test uses {{SimulatedFSDataset}} while 
> measuring elapsed write time. {{SimulatedFSDataset}} performs the timed 
> write through the inner class method {{SimulatedOutputStream#write}}, which 
> just updates {{length}} and throws the data away.
> {code}
> @Override
> public void write(byte[] b,
>   int off,
>   int len) throws IOException  {
>   length += len;
> }
> {code} 
> So the write operation costs hardly any time, and we should create the file 
> with a real dataset instead of the simulated one. I have verified locally 
> that the test passes on the first attempt once the simulated dataset is 
> removed, whereas with the old approach the test retries many times before it 
> accumulates any write time.





[jira] [Updated] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-10 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10275:
-
Status: Patch Available  (was: Open)

Attaching a simple patch. I also bumped the timeout in the patch so the test 
can tolerate running on a busy Jenkins slave. Kindly review.
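
For reference, a rough sketch of the kind of change described here (the authoritative change is the attached HDFS-10275.001.patch): run the write against a real {{MiniDFSCluster}} dataset rather than {{SimulatedFSDataset}}, and give the test a larger timeout. The class name, timeout value, and file parameters below are illustrative only.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

public class DataNodeWriteTimeSketch {

  // Larger timeout so the test tolerates a busy Jenkins slave (value illustrative).
  @Test(timeout = 120_000)
  public void testWriteTimeWithRealDataset() throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Deliberately no SimulatedFSDataset.setFactory(conf) here: the point of
    // the fix is to write through a real dataset so write time accumulates.
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(1).build();
    try {
      cluster.waitActive();
      FileSystem fs = cluster.getFileSystem();
      // A real write takes measurable time, so the metric can move past 0.
      DFSTestUtil.createFile(fs, new Path("/sketch-file"), 1024L, (short) 1, 0L);
    } finally {
      cluster.shutdown();
    }
  }
}
{code}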

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure 
> output looks like this:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout happens at line 279 of {{TestDataNodeMetrics}}. Looking into the 
> code, the real reason is that the {{TotalWriteTime}} metric frequently 
> counts 0 in each file-creation iteration, which leads to repeated retries 
> until the timeout is reached.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}} 
> metric always counts 0 is that the test uses {{SimulatedFSDataset}} while 
> measuring elapsed write time. {{SimulatedFSDataset}} performs the timed 
> write through the inner class method {{SimulatedOutputStream#write}}, which 
> just updates {{length}} and throws the data away.
> {code}
> @Override
> public void write(byte[] b,
>   int off,
>   int len) throws IOException  {
>   length += len;
> }
> {code} 
> So the write operation costs hardly any time, and we should create the file 
> with a real dataset instead of the simulated one. I have verified locally 
> that the test passes on the first attempt once the simulated dataset is 
> removed, whereas with the old approach the test retries many times before it 
> accumulates any write time.





[jira] [Updated] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-10 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10275:
-
Description: 
The unit test {{TestDataNodeMetrics}} fails intermittently. The failure 
output looks like this:
{code}
Results :

Failed tests: 
  
TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
 expected: but was:

Tests in error: 
  TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
Min...
  TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for 
...
  TestHFlush.testHFlushInterrupted ? IO The stream is closed
{code}
The timeout happens at line 279 of {{TestDataNodeMetrics}}. Looking into the 
code, the real reason is that the {{TotalWriteTime}} metric frequently counts 
0 in each file-creation iteration, which leads to repeated retries until the 
timeout is reached.
I debugged the test locally. The most likely reason the {{TotalWriteTime}} 
metric always counts 0 is that the test uses {{SimulatedFSDataset}} while 
measuring elapsed write time. {{SimulatedFSDataset}} performs the timed write 
through the inner class method {{SimulatedOutputStream#write}}, which just 
updates {{length}} and throws the data away.
{code}
@Override
public void write(byte[] b,
  int off,
  int len) throws IOException  {
  length += len;
}
{code} 
So the write operation costs hardly any time, and we should create the file 
with a real dataset instead of the simulated one. I have verified locally that 
the test passes on the first attempt once the simulated dataset is removed, 
whereas with the old approach the test retries many times before it 
accumulates any write time.

  was:
The unit test {{TestDataNodeMetrics}} fails intermittently. The failed info 
show these:
{code}
Results :

Failed tests: 
  
TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
 expected: but was:

Tests in error: 
  TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
Min...
  TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for 
...
  TestHFlush.testHFlushInterrupted ? IO The stream is closed
{code}
In line 279 in {{TestDataNodeMetrics}}, it takes place timed out. Then I looked 
into the code and found the real reason is that the metric of 
{{TotalWriteTime}} frequently count 0 in each iteration of creating file. And 
the this leads to retry operations till timeout.
I debug the test in my local. I found the most suspect reason whic cause 
{{TotalWriteTime}} metric count always be 0 is that we using the 
{{SimulatedFSDataset}} for spending time test. In {{SimulatedFSDataset}}, it 
will use the inner class's method {{SimulatedOutputStream#write}} to count the 
write time and the method of this class just updates the {{length}} and throws 
its data away.
{code}
@Override
public void write(byte[] b,
  int off,
  int len) throws IOException  {
  length += len;
}
{code} 
So the writing operation hardly not costs any time. So we should use a real way 
to create file instead of simulated way. I have tested in my local that the 
test is passed just one time when I delete the simulated way, while the test 
retries many times to count write time in old way.


> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure 
> output looks like this:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout happens at line 279 of {{TestDataNodeMetrics}}. Looking into the 
> code, the real reason is that the {{TotalWriteTime}} metric frequently 
> counts 0 in each file-creation iteration, which leads to repeated retries 
> until the timeout is reached.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}} 
> metric always counts 0 is that the test uses {{SimulatedFSDataset}} while 
> measuring elapsed write time. {{SimulatedFSDataset}} performs the timed 
> write through the inner class method {{SimulatedOutputStream#write}}, which 
> just updates {{length}} and throws the data away.