[ https://issues.apache.org/jira/browse/HDFS-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222573#comment-15222573 ]

Jian Fang commented on HDFS-3743:
---------------------------------

I didn't get a chance to work on this earlier and am now coming back to this issue.

Since HADOOP-7001 still has a long way to go, I would like to start by fixing one specific case first, i.e., making QJM able to format a new journal node after a journal node is replaced.

My thought is to add some logic at the beginning of the following method in QuorumJournalManager

    Map<AsyncLogger, NewEpochResponseProto> createNewUniqueEpoch() throws IOException

to check all available journal nodes by calling the following method:

    QuorumCall<AsyncLogger, Boolean> call = loggers.isFormatted();

The call would wait for all journal nodes to respond and would time out after a given period to avoid waiting forever. If the call times out, simply ignore it and continue the normal workflow in createNewUniqueEpoch(). If the call succeeds, check whether any journal node is not formatted and, if so, call format(nsInfo) on that logger to format it. The nsInfo is available to QJM, so I think it should be able to format the new journal node successfully.

But I have a couple of questions:

1) Will this extra step, with its wait time, cause any trouble for the new active QJM?
2) Would this extra step introduce a lot of overhead under normal conditions, when no journal node needs to be formatted?
3) In our case, we need to restart the name nodes after a new journal node is in place, so createNewUniqueEpoch() should be called once and format the new journal node. Is this assumption valid?
4) Once a new journal node is formatted, are any extra steps needed to make it sync data from its peers, or is this already handled by the quorum protocol?

Thanks.
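
For illustration only, the pre-check described above could be sketched roughly as follows. This is a simplified, self-contained model and not the actual Hadoop code: JournalLogger, FakeLogger, and formatUnformatted are hypothetical stand-ins for AsyncLogger, the logger set, and the logic that would sit at the start of createNewUniqueEpoch(); nsInfo is modeled as a plain String, and the timeout value is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FormatCheckSketch {

    /** Hypothetical stand-in for AsyncLogger; not the real Hadoop interface. */
    interface JournalLogger {
        String name();
        CompletableFuture<Boolean> isFormatted();
        void format(String nsInfo);
    }

    /** In-memory logger used only to exercise the sketch. */
    static final class FakeLogger implements JournalLogger {
        private final String name;
        private final CompletableFuture<Boolean> reply;
        boolean formatCalled = false;

        FakeLogger(String name, CompletableFuture<Boolean> reply) {
            this.name = name;
            this.reply = reply;
        }

        public String name() { return name; }
        public CompletableFuture<Boolean> isFormatted() { return reply; }
        public void format(String nsInfo) { formatCalled = true; }
    }

    /**
     * Asks every logger whether it is formatted and waits up to timeoutMs for
     * all replies. On timeout or failure the check is skipped and nothing is
     * formatted. Otherwise each unformatted logger is formatted.
     * Returns the names of the loggers that were formatted.
     */
    static List<String> formatUnformatted(List<? extends JournalLogger> loggers,
                                          String nsInfo, long timeoutMs) {
        List<CompletableFuture<Boolean>> calls = new ArrayList<>();
        for (JournalLogger logger : loggers) {
            calls.add(logger.isFormatted());
        }
        List<String> formatted = new ArrayList<>();
        try {
            CompletableFuture.allOf(calls.toArray(new CompletableFuture<?>[0]))
                    .get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return formatted; // ignore the check and continue the normal workflow
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return formatted;
        }
        for (int i = 0; i < loggers.size(); i++) {
            if (!calls.get(i).join()) {
                loggers.get(i).format(nsInfo);
                formatted.add(loggers.get(i).name());
            }
        }
        return formatted;
    }

    public static void main(String[] args) {
        List<FakeLogger> loggers = Arrays.asList(
                new FakeLogger("jn1", CompletableFuture.completedFuture(true)),
                new FakeLogger("jn2", CompletableFuture.completedFuture(true)),
                new FakeLogger("jn3", CompletableFuture.completedFuture(false)));
        System.out.println(formatUnformatted(loggers, "ns-1", 1000L)); // prints [jn3]
    }
}
```

Note that this sketch formats every unformatted node it finds. It does not guard against the case from the issue description where a majority of journal nodes have been accidentally reformatted, which would need a separate check.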

> QJM: improve formatting behavior for JNs
> ----------------------------------------
>
>                 Key: HDFS-3743
>                 URL: https://issues.apache.org/jira/browse/HDFS-3743
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>
> Currently, the JournalNodes automatically format themselves when a new writer
> takes over, if they don't have any data for that namespace. However, this has
> a few problems:
> 1) if the administrator accidentally points a new NN at the wrong quorum (eg
> corresponding to another cluster), it will auto-format a directory on those
> nodes. This doesn't cause any data loss, but would be better to bail out with
> an error indicating that they need to be formatted.
> 2) if a journal node crashes and needs to be reformatted, it should be able
> to re-join the cluster and start storing new segments without having to fail
> over to a new NN.
> 3) if 2/3 JNs get accidentally reformatted (eg the mount point becomes
> undone), and the user starts the NN, it should fail to start, because it may
> end up missing edits. If it auto-formats in this case, the user might have
> silent "rollback" of the most recent edits.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)