Lin Yiqun created HDFS-10275:
--------------------------------

             Summary: TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
                 Key: HDFS-10275
                 URL: https://issues.apache.org/jira/browse/HDFS-10275
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: test
            Reporter: Lin Yiqun
            Assignee: Lin Yiqun


The unit test {{TestDataNodeMetrics}} fails intermittently, with the following output:
{code}
Results :

Failed tests:
  TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 expected:<false> but was:<true>

Tests in error:
  TestOpenFilesWithSnapshot.testWithCheckpoint:94 » IO Timed out waiting for Min...
  TestDataNodeMetrics.testDataNodeTimeSpend:279 » Timeout Timed out waiting for ...
  TestHFlush.testHFlushInterrupted » IO The stream is closed
{code}
The timeout occurs at line 279 of {{TestDataNodeMetrics}}. Looking into the code, I found that the real cause is that the {{TotalWriteTime}} metric frequently stays at 0 in each iteration of file creation, so the test keeps retrying until it times out.
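The retry behaviour described above can be pictured with a small, self-contained sketch. It does not reproduce the real test code: {{attemptsUntilGrowth}}, its parameters, and the attempt limit (standing in for the test's timeout) are hypothetical names chosen for illustration.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the retry pattern: keep creating a file and re-reading
// a cumulative write-time counter until it grows, or give up after maxAttempts
// (a stand-in for the test's wall-clock timeout). Not the actual test code.
public class RetryUntilMetricGrows {

    // Returns the attempt on which the counter grew, or -1 if it never did.
    public static int attemptsUntilGrowth(Runnable createFile,
                                          LongSupplier totalWriteTime,
                                          int maxAttempts) {
        long before = totalWriteTime.getAsLong();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            createFile.run();
            if (totalWriteTime.getAsLong() > before) {
                return attempt; // metric advanced: the check passes
            }
        }
        return -1; // metric never advanced: the real test would time out here
    }

    public static void main(String[] args) {
        long[] counter = {0};
        // Each "file" adds 0 ms, as with a write that costs no measurable time.
        System.out.println(attemptsUntilGrowth(() -> counter[0] += 0, () -> counter[0], 10)); // -1
        // Each "file" adds a few ms, as with a write that does real I/O.
        System.out.println(attemptsUntilGrowth(() -> counter[0] += 3, () -> counter[0], 10)); // 1
    }
}
```

When every iteration adds 0 to the counter, the loop exhausts its budget, which mirrors the "Timed out waiting for ..." failure above.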
I debugged the test locally. The most likely reason the {{TotalWriteTime}} metric stays at 0 is that the time-spent test uses {{SimulatedFSDataset}}. With {{SimulatedFSDataset}}, the measured write time covers the inner class method {{SimulatedOutputStream#write}}, and that method only updates {{length}} and throws the data away:
{code}
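    @Override
    public void write(byte[] b,
              int off,
              int len) throws IOException {
      // Only the byte count is recorded; the data itself is discarded,
      // so this call takes almost no time.
      length += len;
    }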
{code} 
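To see why this stream adds almost nothing to the measured write time, the sketch below mirrors the quoted method in a standalone class. {{LengthOnlyStream}} and {{timedWriteMillis}} are hypothetical names, and truncating the elapsed time to whole milliseconds is an assumption about how the metric is accumulated, not something confirmed here.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for SimulatedOutputStream: like the quoted method,
// it only counts bytes and throws the data away.
public class LengthOnlyStream extends OutputStream {
    private long length = 0;

    @Override
    public void write(int b) throws IOException {
        length += 1; // count the single byte, keep nothing
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        length += len; // same as the quoted excerpt: update length, drop the data
    }

    public long getLength() {
        return length;
    }

    // Writes `total` bytes in `chunk`-sized pieces and returns the elapsed time
    // truncated to whole milliseconds (an ASSUMED metric granularity).
    public static long timedWriteMillis(OutputStream out, int total, int chunk)
            throws IOException {
        byte[] buf = new byte[chunk];
        long start = System.nanoTime();
        for (int written = 0; written < total; written += chunk) {
            out.write(buf, 0, Math.min(chunk, total - written));
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        LengthOnlyStream s = new LengthOnlyStream();
        long millis = timedWriteMillis(s, 64 * 1024, 4096);
        // Nothing is stored anywhere, so the elapsed time is usually well under 1 ms.
        System.out.println(s.getLength() + " bytes counted in " + millis + " ms");
    }
}
```

On a typical machine the elapsed time here truncates to 0 ms, which matches the symptom seen in {{TotalWriteTime}}.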
As a result, the write operation takes almost no time. We should therefore create the file through the real dataset instead of the simulated one. I tested this locally: with the simulated dataset removed, the test passes on the first attempt, whereas the old approach retries many times waiting for the write time to be counted.
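For contrast, a real write path can be sketched with plain JDK file I/O. The bytes go to an actual temporary file and are forced to the storage device with {{FileDescriptor#sync}}, so the elapsed time reflects real I/O. This is an illustration under that assumption, not the DataNode's write path or the actual patch; {{RealWriteTiming}} and {{writeAndSync}} are hypothetical names.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical comparison sketch: write real bytes to a real file and sync them,
// so the measured time includes actual I/O (unlike a length-only stream).
// Plain JDK I/O, not the DataNode's implementation.
public class RealWriteTiming {

    // Writes `total` bytes to `file` in `chunk`-sized pieces, forces them to the
    // storage device, and returns the resulting file size.
    public static long writeAndSync(Path file, int total, int chunk) throws IOException {
        byte[] buf = new byte[chunk];
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            for (int written = 0; written < total; written += chunk) {
                out.write(buf, 0, Math.min(chunk, total - written));
            }
            out.getFD().sync(); // flush OS buffers so the timing includes device I/O
        }
        return Files.size(file);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("write-timing", ".bin");
        try {
            long start = System.nanoTime();
            long size = writeAndSync(tmp, 1 << 20, 64 * 1024);
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.out.println(size + " bytes written in " + elapsedMicros + " us");
        } finally {
            Files.deleteIfExists(tmp); // clean up the temporary file
        }
    }
}
```

Because real bytes reach a real file, each iteration has a much better chance of contributing a non-zero amount to a cumulative write-time counter.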



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
