[ https://issues.apache.org/jira/browse/HDFS-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053953#comment-15053953 ]

Tony Wu commented on HDFS-9493:
-------------------------------

Hi [~liuml07], I would like to work on fixing this test.

I did some analysis of the failure by printing out the metasave content. It 
turns out the metasave output for the current test contains 2 Datanodes:
{code}
metasave out: 1 files and directories, 0 blocks = 1 total filesystem objects
metasave out: Live Datanodes: 1
metasave out: Dead Datanodes: 1
metasave out: Metasave: Blocks waiting for replication: 0
metasave out: Mis-replicated blocks that have been postponed:
metasave out: Metasave: Blocks being replicated: 0
metasave out: Metasave: Blocks 4 waiting deletion from 2 datanodes.
metasave out: 127.0.0.1:53465
metasave out: LightWeightHashSet(size=2, modification=2, entries.length=16)
metasave out: 127.0.0.1:53469
metasave out: LightWeightHashSet(size=2, modification=2, entries.length=16)
metasave out: Metasave: Number of datanodes: 2
metasave out: 127.0.0.1:53465 IN 998093619200(929.55 GB) 10270(10.03 KB) 0.00% 882663514112(822.04 GB) 0(0 B) 0(0 B) 100.00% 0(0 B) Fri Dec 11 17:48:41 PST 2015
metasave out: 127.0.0.1:53469 IN 998093619200(929.55 GB) 8192(8 KB) 0.00% 882663825408(822.04 GB) 0(0 B) 0(0 B) 100.00% 0(0 B) Fri Dec 11 17:48:26 PST 2015
{code}
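
(For reference, a throwaway helper roughly like the sketch below can be used to 
produce the {{metasave out:}} lines above; the class and method names here are 
made up for illustration and are not part of {{TestMetaSave}}.)
{code:java}
// Illustration only, not part of TestMetaSave: dump a metasave output file,
// prefixing each line with "metasave out: " as in the output pasted above.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class MetasaveDebugDump {
  static void dumpMetasave(String metasaveFile) throws IOException {
    try (BufferedReader reader = new BufferedReader(new FileReader(metasaveFile))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println("metasave out: " + line);
      }
    }
  }
}
{code}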

This leads me to believe the following wait time was not long enough: 
{code:java}
    // wait for namenode to discover that a datanode is dead
    Thread.sleep(15000);
{code}

After I increased the sleep time to 30 seconds, the test passed consistently.

The invalid block count shown in the {{Blocks x waiting deletion...}} line is 
updated by {{blockManager.removeBlocksAssociatedTo()}}, which is called from 
{{DatanodeManager#removeDeadDatanode()}}, and that only happens in 
{{HeartbeatManager#heartbeatCheck()}}. Using a sleep may not be the best way to 
ensure the dead Datanode has been removed by the Namenode.
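
One way to avoid relying on timing at all would be to trigger the dead-node 
check directly from the test, e.g. something like the sketch below (assuming 
{{BlockManagerTestUtil#checkHeartbeat()}} is an acceptable hook here; the 
wrapper class is only for illustration):
{code:java}
// Sketch only. Assumption: BlockManagerTestUtil#checkHeartbeat() is an
// acceptable hook for this test; the wrapper class is illustrative.
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerTestUtil;

class DeadNodeCheckSketch {
  static void forceDeadNodeCheck(MiniDFSCluster cluster) {
    // Runs HeartbeatManager#heartbeatCheck(), which removes dead Datanodes via
    // DatanodeManager#removeDeadDatanode() and thus triggers
    // blockManager.removeBlocksAssociatedTo() for their blocks.
    BlockManagerTestUtil.checkHeartbeat(cluster.getNamesystem().getBlockManager());
  }
}
{code}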

I will upload a patch with a more robust way of waiting for the Datanode to be 
removed, instead of relying on {{Thread.sleep()}}.
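
Roughly, the idea is to poll until the Namenode has actually noticed the dead 
Datanode (e.g. with {{GenericTestUtils#waitFor()}}) rather than sleeping for a 
fixed amount of time. The sketch below only illustrates the shape; the exact 
condition and names in the actual patch may differ:
{code:java}
// Sketch only; names and the exact condition are placeholders. The real patch
// may wait on something tied more directly to the invalidate-block bookkeeping.
import java.util.concurrent.TimeoutException;

import com.google.common.base.Supplier;

import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;
import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager;
import org.apache.hadoop.test.GenericTestUtils;

class DeadDatanodeWaitSketch {
  static void waitForDeadDatanode(MiniDFSCluster cluster)
      throws TimeoutException, InterruptedException {
    final DatanodeManager dm =
        cluster.getNamesystem().getBlockManager().getDatanodeManager();
    // Poll every 200 ms, for at most 30 s, until the Namenode reports exactly
    // one dead Datanode, instead of sleeping for a fixed amount of time.
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        return dm.getDatanodeListForReport(DatanodeReportType.DEAD).size() == 1;
      }
    }, 200, 30000);
  }
}
{code}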

> Test o.a.h.hdfs.server.namenode.TestMetaSave fails in trunk
> -----------------------------------------------------------
>
>                 Key: HDFS-9493
>                 URL: https://issues.apache.org/jira/browse/HDFS-9493
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Mingliang Liu
>
> Tested in both Gentoo Linux and Mac.
> {quote}
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> Running org.apache.hadoop.hdfs.server.namenode.TestMetaSave
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 34.159 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestMetaSave
> testMetasaveAfterDelete(org.apache.hadoop.hdfs.server.namenode.TestMetaSave)  Time elapsed: 15.318 sec  <<< FAILURE!
> java.lang.AssertionError: null
>       at org.junit.Assert.fail(Assert.java:86)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at org.junit.Assert.assertTrue(Assert.java:52)
>       at org.apache.hadoop.hdfs.server.namenode.TestMetaSave.testMetasaveAfterDelete(TestMetaSave.java:154)
> {quote}


