[ 
https://issues.apache.org/jira/browse/HDFS-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205052#comment-13205052
 ] 

Bikas Saha commented on HDFS-2781:
----------------------------------

bq. friendly, eg if storage restoration is already enabled you might not think 
that you should try to enable it to get this side effect.
In that case, rolling the logs will restore the directories, just as it does 
today.
HA imposes stricter requirements than what works today, so we might need to 
do something special for HA only, such as trying to restore failed 
directories while transitioning to active (and maybe also to standby).
From what I read of the code, the standby doesn't seem to bother with marking 
failed directories since its operations are all read-only. So there might be 
no need for the standby to shut down gracefully.
If the active moves to safe mode (SM) because of a bad required directory, 
then it should restore all required directories when it leaves safe mode, or 
else complain and stay in safe mode. All this should happen after the admin 
has completed the necessary prerequisites and issued a -safemode leave command.
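
To make the proposal above concrete, here is a rough sketch of the "restore or stay in safe mode" check. It is only an illustration under stated assumptions: SafeModeExitSketch, RequiredDir, restoreRequiredDirs and leaveSafeMode are hypothetical names, not existing NameNode classes or methods, and the restorer callback stands in for whatever the real restore attempt would be.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names come from the HDFS code base. */
public class SafeModeExitSketch {

    /** A required storage directory and whether it is currently marked failed. */
    static final class RequiredDir {
        final String path;
        boolean failed;

        RequiredDir(String path, boolean failed) {
            this.path = path;
            this.failed = failed;
        }
    }

    /**
     * Tries to restore every failed required directory.
     * The restorer returns true when the directory at the given path is usable again.
     * Returns the paths that are still failed; an empty list means all are healthy.
     */
    static List<String> restoreRequiredDirs(List<RequiredDir> dirs,
                                            Predicate<String> restorer) {
        List<String> stillFailed = new ArrayList<>();
        for (RequiredDir dir : dirs) {
            if (dir.failed && restorer.test(dir.path)) {
                dir.failed = false; // restore succeeded
            }
            if (dir.failed) {
                stillFailed.add(dir.path);
            }
        }
        return stillFailed;
    }

    /**
     * Handles an admin request to leave safe mode.
     * Returns true if safe mode can be left, false if the node must stay in it.
     */
    static boolean leaveSafeMode(List<RequiredDir> dirs, Predicate<String> restorer) {
        List<String> stillFailed = restoreRequiredDirs(dirs, restorer);
        if (!stillFailed.isEmpty()) {
            // Complain and stay in safe mode, as proposed above.
            System.err.println("Staying in safe mode; could not restore: " + stillFailed);
            return false;
        }
        return true;
    }
}
```

The point of the sketch is that the restore attempt runs only in response to the admin's explicit request, so nothing comes back without the admin asking for it.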
bq. There's some interaction with fencing, here, though... one likely reason 
that the NN will lose touch with the shared storage is that another node has 
requested that the NAS device fence the host. Then, after the failover, the 
administrator might unfence the host from the NAS, and we don't want the NN to 
automatically "come back to life" at this point.
Does the NN come back out of safemode automatically or only after an admin 
command?
                
> Add client protocol and DFSadmin for command to restore failed storage
> ----------------------------------------------------------------------
>
>                 Key: HDFS-2781
>                 URL: https://issues.apache.org/jira/browse/HDFS-2781
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>
> Per HDFS-2769, it's important that an admin be able to ask the NN to try to 
> restore failed storage since we may drop into SM until the shared edits dir 
> is restored (w/o having to wait for the next checkpoint). There's currently 
> an API (and usage in DFSAdmin) to flip the flag indicating whether the NN 
> should try to restore failed storage but not that it should actually attempt 
> to do so. This jira is to add one. This is useful outside HA but doing as an 
> HDFS-1623 sub-task since it's motivated by HA.

