[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027179#comment-14027179
 ] 

Konstantin Shvachko edited comment on HDFS-6469 at 6/11/14 12:32 AM:
---------------------------------------------------------------------

Todd, these are interesting points, and you actually answered most of them 
yourself in your comments.

h4. > Coordinated reads
The motivation for a configurable design based on file names is to make 
coordinated reads available to other applications without changing them.
An alternative is to provide a new option (parameter) for read operations 
specifying whether the read should be coordinated or not. Then the application 
developers, rather than administrators, will be in full control of what they 
coordinate.
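
To make the second alternative concrete, here is a rough sketch of what such a 
per-read option could look like. The names below are made up for illustration 
only; they are not part of the current HDFS API or of the attached design.

{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;

// Sketch only: ReadConsistency and ConsensusClientProtocol are invented names
// illustrating a per-read choice; they are not existing HDFS interfaces.
enum ReadConsistency {
  COORDINATED, // the read is submitted through the Coordination Engine first
  LOCAL        // the read is served from the local replica immediately
}

interface ConsensusClientProtocol {
  // Like ClientProtocol.getBlockLocations(src, offset, length), but with an
  // explicit consistency choice made by the application developer.
  LocatedBlocks getBlockLocations(String src, long offset, long length,
                                  ReadConsistency consistency)
      throws IOException;
}
{code}

With something like this the decision stays in the code that issues the read, 
which is the whole point of the alternative.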

h4. > Journals everywhere
If I follow your logic correctly, QJM, being Paxos-based, uses a journal by 
itself, so we are not increasing journaling here. Looking at the bigger picture, 
we see even more journals around: HBase uses a WAL along with NN edits, which 
are themselves persisted on ext4, a journaling file system.
As you said, if you need to separate them, one can use different drives or SSDs.
Besides, as I said in a previous comment, one can choose to eliminate CNode 
edits completely.

h4. > Determinism
Determinism is not as hard as it may seem, and not harder than multi-grain 
locking. Say, you need to watch that incremental counters like genStamp are 
advanced only in agreements and never in proposals, similar to how you watch 
that object locks are acquired in the right order. Most of that is already in 
the NameNode, thanks to the StandbyNode implementation.
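
To illustrate, here is a minimal sketch of the kind of check I have in mind. 
The class and method names are invented for this example; this is not NameNode 
code.

{code:java}
// Sketch only: enforces that a replicated counter advances only in the
// deterministic "apply agreement" path, never while a proposal is prepared.
public class CoordinatedGenerationStamp {
  private long genStamp = 0;
  // True only while this node is applying an agreed-upon event; agreements
  // are assumed to be applied by a single thread in the agreed order.
  private boolean applyingAgreement = false;

  public void startAgreement()  { applyingAgreement = true; }
  public void finishAgreement() { applyingAgreement = false; }

  public long nextGenerationStamp() {
    // Analogous to asserting that locks are acquired in the right order:
    // fail fast if a proposal path tries to mutate replicated state.
    if (!applyingAgreement) {
      throw new IllegalStateException(
          "genStamp may only advance while applying an agreement");
    }
    return ++genStamp;
  }
}
{code}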

h4. > AA vs AS HA
There are several advantages of the AA approach over the evolution of the 
current one outlined in your comment:
* All CNodes are writeable, while in the current approach only the active NN is.
* If a GC pause hits the active NN, service is interrupted. CNodes can continue, 
as other nodes can process writes.
* Reads from an SBN are always stale, and in order to write everybody has to go 
to the active. So they will be writing to the namespace from the future, so to 
speak. I think with mixed workloads all clients will end up working with the 
active NN only, and there won't be enough load balancing.

As you mentioned, your design will be addressing the same problems as 
ConsensusNode. The question is what you would rather have in the end: 
active-active or active-read-only-standby.

h4. > why this design makes it any easier to implement a distributed namespace?
I meant the advantage of introducing a coordination engine in general, not the 
CNode itself.
Making coordinated updates that involve different parts of a distributed 
namespace is rather trivial with a coordination engine. An example is an atomic 
rename, that is, a move of a file from one parent to another when the parents 
are on different partitions.
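
Here is a rough sketch of that idea; the types below are invented for the 
illustration and are not the interfaces from the attached design. A single 
proposal carries both parents, and every partition applies the resulting 
agreement in the same global order.

{code:java}
import java.io.Serializable;

// Sketch only: illustrative types, not the design document's interfaces.
final class CrossPartitionRename implements Serializable {
  final String src;  // file under a parent owned by one partition
  final String dst;  // target path under a parent owned by another partition
  CrossPartitionRename(String src, String dst) {
    this.src = src;
    this.dst = dst;
  }
}

interface CoordinationEngine {
  // Every replica (and every namespace partition) learns the corresponding
  // agreement in the same global sequence, so removing the file from one
  // parent and adding it under the other is applied as one atomic step
  // relative to all other coordinated updates.
  void submitProposal(Serializable proposal);
}
{code}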

h6. > locking
~This should really belong to a different issue. I just want to mention here 
that I don't see that one contradicts the other. On the contrary, they can be 
very much in sync. I remember we talked about an optimization scheme for a 
coordinated namespace where different parts of the namespace are coordinated 
independently (and in parallel) under different state machines.~


was (Author: shv):
Sorry hit the submit button too early. Will repost shortly.

> Coordinated replication of the namespace using ConsensusNode
> ------------------------------------------------------------
>
>                 Key: HDFS-6469
>                 URL: https://issues.apache.org/jira/browse/HDFS-6469
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: CNodeDesign.pdf
>
>
> This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
> which enables replication of the namespace on multiple nodes of an HDFS 
> cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
