[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

Ravi Prakash (JIRA) Tue, 21 May 2013 13:45:23 -0700

     [ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ravi Prakash updated HDFS-4832:
-------------------------------

    Description: 
Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes 
4. Waited a long time for the NN UI to reflect the recovered blocks.
5. Forced the NN out of safe mode and, suddenly, there were no more missing blocks.
{quote}

I was able to reproduce this on 0.23 and trunk. I set 
dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate a 
"lost" datanode. The opposite case is also broken: a Datanode failing while the 
NN is in safemode does not lead to a missing-blocks message.
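
A rough sketch of why lowering the recheck interval helps reproduce this quickly. It assumes the NN declares a datanode dead after 2 * recheck-interval + 10 * heartbeat-interval, which is my reading of DatanodeManager and may differ between versions. It also assumes the stock defaults of 300000 ms for dfs.namenode.heartbeat.recheck-interval and 3 s for dfs.heartbeat.interval. The class and method names below are hypothetical and are not part of Hadoop.

```java
// Hypothetical helper, not Hadoop code: estimates how long the NN waits
// before marking a silent datanode as dead.
class DeadNodeTimeout {

    // Assumed formula: 2 * recheck interval (ms) + 10 * heartbeat interval (s, converted to ms).
    static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec;
    }

    public static void main(String[] args) {
        long heartbeatSec = 3; // assumed default for dfs.heartbeat.interval

        // Assumed default recheck interval of 300000 ms.
        System.out.println("default: " + expireIntervalMs(300_000, heartbeatSec) + " ms");

        // Recheck interval of 1 ms, as used in the reproduction above.
        System.out.println("repro:   " + expireIntervalMs(1, heartbeatSec) + " ms");
    }
}
```

Under these assumptions the killed DN is treated as lost after roughly 30 seconds instead of about ten and a half minutes.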

Unless the NN updates its count of missing blocks while in safemode, the grid 
admins cannot tell when to take the cluster out of safemode.

  was:
Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes 
4. Waited a long time for the NN UI to reflect the recovered blocks.
5. Forced the NN out of safe mode and, suddenly, there were no more missing blocks.
{quote}

I was able to reproduce this on 0.23 and trunk. I set 
dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate a 
"lost" datanode.

Unless the NN updates its count of missing blocks while in safemode, the grid 
admins cannot tell when to take the cluster out of safemode.

    
> Namenode doesn't change the number of missing blocks in safemode when DNs 
> rejoin or leave
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-4832
>                 URL: https://issues.apache.org/jira/browse/HDFS-4832
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>            Priority: Critical
>         Attachments: HDFS-4832.patch
>
>
> Courtesy Karri VRK Reddy!
> {quote}
> 1. Namenode lost datanodes causing missing blocks
> 2. Namenode was put in safe mode
> 3. Datanode restarted on dead nodes 
> 4. Waited a long time for the NN UI to reflect the recovered blocks.
> 5. Forced the NN out of safe mode and, suddenly, there were no more missing blocks.
> {quote}
> I was able to reproduce this on 0.23 and trunk. I set 
> dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate a 
> "lost" datanode. The opposite case is also broken: a Datanode failing while the 
> NN is in safemode does not lead to a missing-blocks message.
> Unless the NN updates its count of missing blocks while in safemode, the grid 
> admins cannot tell when to take the cluster out of safemode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
