[ 
https://issues.apache.org/jira/browse/HDFS-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454201#comment-13454201
 ] 

Todd Lipcon commented on HDFS-3926:
-----------------------------------

>     order to actually increase the number of failures the system can tolerate, you
>     should run an odd number of JNs, (i.e. 3, 5, 7, etc.)

Probably should explicitly say that, with N JNs, we tolerate (N - 1)/2 failures.
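
To make the arithmetic concrete, here is a minimal sketch, assuming a simple majority quorum and integer (floor) division. The function name is made up for illustration and is not part of Hadoop:

```python
def tolerated_failures(num_journal_nodes: int) -> int:
    """Return how many JournalNode failures a group of this size can survive.

    Assumes writes must reach a majority of the JNs, so at most
    floor((N - 1) / 2) of them may be down at once.
    """
    if num_journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return (num_journal_nodes - 1) // 2


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(f"{n} JNs tolerate {tolerated_failures(n)} failure(s)")
```

Note that 4 JNs tolerate the same single failure as 3, which is why the doc recommends an odd number.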

>   * <<dfs.namenode.shared.edits.dir>> - the URI which identifies the grou of JNs
>     where the NameNodes will write/read edits
Typo: grou

>     This is where one configures the addresses of the JournalNodes which the
>     Standby NameNode uses to stay up-to-date with all the file system changes the
>     Active NameNode makes
This seems to indicate that the SBN is the only node using this setting. 
Perhaps better to say something like "the addresses of the JournalNodes which 
provide the shared edits storage, written to by the Active NameNode and read by 
the Standby NameNode." or somesuch?

>     should be the a URI that uses the scheme "qjournal", where the scheme-specific
typo: "the a"


>     The value of this setting
>     should be the a URI that uses the scheme "qjournal", where the scheme-specific
>     part is a semicolon-separated list of <host:port> addresses where the
>     JournalNodes can be contacted, and the path component is a unique identifier for
>     this nameservice. Though not a requirement, it's a good idea to reuse the
>     nameservice ID for the journal identifier.

Though this is correct, I find it hard to read. Perhaps it would be simpler to 
say:

The URI should be of the form: 
qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>. The Journal 
ID is a unique identifier for this nameservice, which allows a single set of 
JournalNodes to provide storage for multiple federated namesystems. Though not 
a requirement, it's a good idea to reuse the nameservice ID for the journal 
identifier.

(also keep the example paragraph that you have following this one)
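
For illustration, a minimal sketch of how such a URI is assembled. The helper name and the example.com hostnames are hypothetical, and the sketch assumes the JNs listen on port 8485:

```python
def build_qjournal_uri(journal_nodes, journal_id):
    """Build a qjournal:// URI from (host, port) pairs and a journal ID.

    The host:port pairs are joined with semicolons, and the journal ID
    becomes the path component.
    """
    if not journal_nodes:
        raise ValueError("at least one JournalNode address is required")
    if not journal_id:
        raise ValueError("journal ID must not be empty")
    authority = ";".join(f"{host}:{port}" for host, port in journal_nodes)
    return f"qjournal://{authority}/{journal_id}"


if __name__ == "__main__":
    # Hypothetical hosts; "mycluster" stands in for the nameservice ID.
    nodes = [
        ("node1.example.com", 8485),
        ("node2.example.com", 8485),
        ("node3.example.com", 8485),
    ]
    print(build_qjournal_uri(nodes, "mycluster"))
```

Reusing the nameservice ID as the journal ID, as in the usage above, keeps the two easy to match up when one set of JNs serves several nameservices.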

>     Thus, during a failover, we first ensure that the Active NameNode is either
>     in the Standby state, or the process has terminated, before transitioning the
>     other NameNode to the Active state. In order to do this, you must configure at
>     least one <<fencing method.>> These are configured as a
>     carriage-return-separated list, which will be attempted in order until one
>     indicates that fencing has succeeded. There are two methods which ship with
>     Hadoop: <shell> and <sshfence>. For information on implementing your own custom
>     fencing method, see the <org.apache.hadoop.ha.NodeFencer> class.

This paragraph no longer seems relevant.

> Redundancy for this data is provided by running multiple separate JournalNodes

add: or configuring this directory on a locally attached RAID array.

>   can be done by running the command "<hdfs journalnode>" and waiting for the
>   daemon to start on each of the relevant machines.

Should probably have them use hdfs-daemon.sh instead, so that the process daemonizes.

>   of your NameNode metadata directories to the other, unformatted NameNode
>   using the command "<hdfs namenode -bootstrapStandby>". Running this command

Specify that this command should be run on the machine that will be the standby.

Perhaps it would be better to format this section as a step-by-step list.
                
> QJM: Add user documentation for QJM
> -----------------------------------
>
>                 Key: HDFS-3926
>                 URL: https://issues.apache.org/jira/browse/HDFS-3926
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: documentation
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-3926.patch, qjm-ha-doc.diff, regular-ha-doc.diff
>
>
> We should add user-facing documentation for how to configure/use the QJM.

