[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531297#comment-14531297
 ] 

Jesse Yates commented on HDFS-6440:
-----------------------------------

More comments, as I actually get back into the code:
{quote}
In StandbyCheckpointer#doCheckpoint, unless I'm missing something, I don't 
think the variable "ie" can ever be non-null, and yet we check for whether or 
not it's null later in the method to determine if we should shut down.
{quote}
We can get either an InterruptedException or an IOException when transferring 
the checkpoint. The InterruptedException ("ie") is thrown if we are interrupted 
while waiting for any checkpoint upload to complete. The IOException ("ioe") is 
thrown if there is an ExecutionException while doing the checkpoint. 

After we get out of waiting for the uploads, if we got an "ioe" or an "ie" then 
we force the rest of the threads that we started for the image transfer to quit 
by shutting down the thread pool (and then forcibly shutting it down shortly 
after that). We then check each exception again to ensure we throw the right 
one back up.
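
Roughly, the flow described above looks like the sketch below. This is a 
simplified stand-in and not the code from the patch: the class and method names 
(CheckpointUploadSketch, uploadToAll), the Callable-based upload tasks, and the 
100 ms grace period are all made up for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the StandbyCheckpointer implementation.
final class CheckpointUploadSketch {

  // Runs every upload task in a pool and rethrows the first failure,
  // keeping InterruptedException and IOException distinct for the caller.
  static void uploadToAll(List<Callable<Void>> uploads)
      throws IOException, InterruptedException {
    ExecutorService executor =
        Executors.newFixedThreadPool(Math.max(1, uploads.size()));
    List<Future<Void>> futures = new ArrayList<>();
    for (Callable<Void> upload : uploads) {
      futures.add(executor.submit(upload));
    }

    InterruptedException ie = null;
    IOException ioe = null;
    for (Future<Void> f : futures) {
      try {
        f.get();
      } catch (ExecutionException e) {
        // The upload itself failed; surface it as an IOException.
        ioe = new IOException("Exception during image upload", e.getCause());
        break;
      } catch (InterruptedException e) {
        // We were interrupted while waiting on an upload.
        ie = e;
        break;
      }
    }

    // Stop accepting work; on failure, give the remaining uploads a short
    // grace period (assumed value) and then interrupt them.
    executor.shutdown();
    if (ie != null || ioe != null) {
      try {
        if (!executor.awaitTermination(100, TimeUnit.MILLISECONDS)) {
          executor.shutdownNow();
        }
      } catch (InterruptedException e) {
        executor.shutdownNow();
        if (ie == null && ioe == null) {
          ie = e;
        }
      }
    }

    // Check each exception again so the right one goes back up.
    if (ie != null) {
      throw ie;
    }
    if (ioe != null) {
      throw ioe;
    }
  }
}
```

The two separate null checks at the end are the point: the caller still sees 
an InterruptedException as an interrupt and an upload failure as an 
IOException, without any change to the method signature.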

We could wrap the exceptions into a parent exception and then just throw that 
back up to the caller (resulting in fewer checks), but I didn't want to change 
the method signature because an interrupt means something very different from 
an IOException.

Can do whatever you want there though; it doesn't really matter to me. We just 
need to make sure either exception is rethrown.

> Support more than 2 NameNodes
> -----------------------------
>
>                 Key: HDFS-6440
>                 URL: https://issues.apache.org/jira/browse/HDFS-6440
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: auto-failover, ha, namenode
>    Affects Versions: 2.4.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>         Attachments: Multiple-Standby-NameNodes_V1.pdf, 
> hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
> hdfs-6440-trunk-v1.patch, hdfs-multiple-snn-trunk-v0.patch
>
>
> Most of the work is already done to support more than 2 NameNodes (one 
> active, one standby). This would be the last bit to support running multiple 
> _standby_ NameNodes; one of the standbys should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some 
> complexity around managing the checkpointing, and updating a whole lot of 
> tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
