[ 
https://issues.apache.org/jira/browse/HDFS-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222573#comment-15222573
 ] 

Jian Fang commented on HDFS-3743:
---------------------------------

I haven't had a chance to work on this yet, but I am coming back to this issue 
now.

Since HADOOP-7001 still has a long way to go, I would like to fix one specific 
case first: letting QJM format a new journal node after a journal node is 
replaced.

My thought is to add some logic at the beginning of the following method in 
QuorumJournalManager:

Map<AsyncLogger, NewEpochResponseProto> createNewUniqueEpoch()
      throws IOException

to check all available journal nodes by calling the following method:

QuorumCall<AsyncLogger, Boolean> call =
        loggers.isFormatted();

The call would wait for all journal nodes to respond and time out after a 
given period to avoid waiting forever. If the call times out, simply ignore 
the result and continue the workflow in createNewUniqueEpoch(). If the call 
succeeds, check whether any journal node reports that it is not formatted, 
and if so call format(nsInfo) on that logger. The nsInfo is already 
available to QJM, so I think it should be able to format the new journal 
node successfully.
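
To make the proposed control flow concrete, here is a rough, self-contained 
sketch. It does not use the real Hadoop classes: JournalClient and 
InMemoryJournal are hypothetical stand-ins for AsyncLogger, a plain String 
stands in for nsInfo, and CompletableFuture stands in for QuorumCall. It only 
illustrates "ask everyone, give up on timeout, format whoever says no".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FormatCheckSketch {

    /** Hypothetical stand-in for a per-journal-node client (not the Hadoop AsyncLogger API). */
    public interface JournalClient {
        CompletableFuture<Boolean> isFormatted();
        CompletableFuture<Void> format(String nsInfo);
    }

    /**
     * Asks every journal whether it is formatted and waits up to timeoutMs for
     * ALL answers. On timeout or failure the check is skipped and 0 is returned,
     * so the caller can continue its normal workflow. Otherwise every journal
     * that answered "not formatted" is formatted.
     *
     * @return the number of journals that were formatted
     */
    public static int formatUnformattedJournals(
            List<? extends JournalClient> journals, String nsInfo, long timeoutMs) {
        List<CompletableFuture<Boolean>> answers = new ArrayList<>();
        for (JournalClient journal : journals) {
            answers.add(journal.isFormatted());
        }
        CompletableFuture<Void> all =
                CompletableFuture.allOf(answers.toArray(new CompletableFuture[0]));
        try {
            all.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return 0; // ignore the check and let the caller carry on
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 0;
        }
        int formatted = 0;
        for (int i = 0; i < journals.size(); i++) {
            if (!answers.get(i).join()) {
                // Assumption: format is awaited without its own timeout here;
                // a real change would probably bound this wait as well.
                journals.get(i).format(nsInfo).join();
                formatted++;
            }
        }
        return formatted;
    }

    /** In-memory fake used only to exercise the sketch. */
    public static final class InMemoryJournal implements JournalClient {
        private volatile boolean formatted;
        private final boolean responsive;

        public InMemoryJournal(boolean formatted, boolean responsive) {
            this.formatted = formatted;
            this.responsive = responsive;
        }

        @Override
        public CompletableFuture<Boolean> isFormatted() {
            // An unresponsive journal returns a future that never completes.
            return responsive
                    ? CompletableFuture.completedFuture(formatted)
                    : new CompletableFuture<>();
        }

        @Override
        public CompletableFuture<Void> format(String nsInfo) {
            formatted = true;
            return CompletableFuture.completedFuture(null);
        }

        public boolean formattedNow() {
            return formatted;
        }
    }
}
```

Note that in this sketch one unresponsive journal node causes the whole check 
to be skipped, including for nodes that did answer. That matches the "wait for 
all, ignore on timeout" idea above, but it may or may not be the behavior we 
actually want.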

But I have a couple of questions:

1) Will this extra step, with its wait time, cause any trouble for the newly 
active QJM?
2) Would this extra step introduce significant overhead in the normal case, 
when no journal node needs to be formatted?
3) In our case we need to restart the name nodes after a new journal node is 
in place, so createNewUniqueEpoch() should be called once at that point and 
format the new journal node. Is this assumption valid?
4) Once a new journal node is formatted, are any extra steps needed to make 
it sync data from its peers, or is that already handled by the quorum 
protocol?

Thanks.



> QJM: improve formatting behavior for JNs
> ----------------------------------------
>
>                 Key: HDFS-3743
>                 URL: https://issues.apache.org/jira/browse/HDFS-3743
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>
> Currently, the JournalNodes automatically format themselves when a new writer 
> takes over, if they don't have any data for that namespace. However, this has 
> a few problems:
> 1) if the administrator accidentally points a new NN at the wrong quorum (eg 
> corresponding to another cluster), it will auto-format a directory on those 
> nodes. This doesn't cause any data loss, but would be better to bail out with 
> an error indicating that they need to be formatted.
> 2) if a journal node crashes and needs to be reformatted, it should be able 
> to re-join the cluster and start storing new segments without having to fail 
> over to a new NN.
> 3) if 2/3 JNs get accidentally reformatted (eg the mount point becomes 
> undone), and the user starts the NN, it should fail to start, because it may 
> end up missing edits. If it auto-formats in this case, the user might have 
> silent "rollback" of the most recent edits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
