[ https://issues.apache.org/jira/browse/HDFS-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2177:
-------------------------------

    Attachment: test_HDFS-2177.sh

The version of HDFS running on the 10-node clusters was remotes/origin/MR-279.
I wasn't able to reproduce this issue on trunk, which was running on my
single-node cluster. On MR-279 (e3d9a2bcbcab817043b1c4c41efb7036ce00904f) it's
pretty easy to reproduce. I'm attaching a test you can use to replicate the
behavior on a single-node cluster. I'm not going to investigate this any further
(i.e. why it's happening), because the issue doesn't occur on trunk (perhaps it
has already been fixed there?).

To run the test:
1. Set your checkpoint period (dfs.namenode.checkpoint.period) to 10 seconds.
2. The idea is to make the NN shut down at exactly the moment the SNN is doing a
checkpoint. On my machine, the fillHDFS function takes just the right amount of
time for that to happen. You might have to adjust the sleep times, while watching
the NN and SNN logs, to replicate this.
3. The test fills up /tmp to create a big edits file. I'm not sure whether that
is necessary.
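Step 1 could be applied with an hdfs-site.xml fragment like the one below. This is a minimal sketch. It assumes the property goes in the hdfs-site.xml read by the SNN and that the value is given in seconds, which matches the "10 seconds" above. The file placement is an assumption, not part of the attached test.

```xml
<!-- Sketch only: checkpoint every 10 seconds so that a NN restart is likely
     to coincide with an SNN checkpoint. Value assumed to be in seconds. -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>10</value>
</property>
```

A short period like this is only useful for reproducing the race. It should not be left in place on a real cluster.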

> Restarting the namenode when the secondary namenode is checkpointing seems to 
> remove everything from /
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2177
>                 URL: https://issues.apache.org/jira/browse/HDFS-2177
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: test_HDFS-2177.sh
>
>
> This was again discovered by Arpit Gupta! Restarting the namenode when the 
> secondary namenode is checkpointing seems to remove everything from /

