Hi Flavio,

> I don't understand why you have to ship the log to the read only replicas. 
> Aren't you storing the log on HDFS currently? Can't they read from HDFS 
> directly?

Possibly the replicas can "tail" the master's WAL; I was using the term log 
shipping in the abstract. However, I'm not an HDFS expert, so I'm unsure whether 
we could read the last (partial) block in the WAL. Apart from the WAL, newly 
written data exists only in memory until flush, so without some sort of direct 
replication the WAL would be the only option for transmitting it.
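
To make the "tail" idea concrete, here is a minimal sketch in Python. It is an illustration only: it does not use any HDFS or HBase API, and the length-prefixed record framing and the names encode_record and tail_records are assumptions made for this example. The point is that a reader can poll whatever bytes are visible and apply only complete records, leaving a partially written tail for the next poll.

```python
import struct


def encode_record(payload: bytes) -> bytes:
    """Frame one edit as a 4-byte big-endian length followed by the payload.

    Hypothetical framing for this sketch, not the real WAL format.
    """
    return struct.pack(">I", len(payload)) + payload


def tail_records(log: bytes, offset: int):
    """Return (complete_records, new_offset) for the bytes visible from offset.

    A record whose header or payload is not fully visible yet is left for
    the next poll, so a partially written tail is never applied.
    """
    records = []
    while offset + 4 <= len(log):
        (length,) = struct.unpack_from(">I", log, offset)
        end = offset + 4 + length
        if end > len(log):
            break  # payload is only partially visible; retry on the next poll
        records.append(log[offset + 4:end])
        offset = end
    return records, offset
```

A replica would keep the returned offset and call tail_records again on each poll. Whether HDFS actually exposes the last partial block to a concurrent reader is the open question above; the sketch only shows what the reader side could look like if it does.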

> I wonder why you are choosing 3 for the size of a clique and not letting it 
> be a free parameter.

It would be, but 3 seems a reasonable default. (?)

> Are you choosing 3 to avoid the replication overhead?

Yes.
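
As a rough illustration of the trade-off, assuming a majority-quorum protocol (which is how Zab commits writes), the arithmetic for a clique of a given size can be sketched as below. The function names are made up for this example.

```python
def majority(clique_size: int) -> int:
    """Smallest number of members that forms a majority quorum."""
    return clique_size // 2 + 1


def tolerated_failures(clique_size: int) -> int:
    """Members that can fail while a majority can still be formed."""
    return clique_size - majority(clique_size)


def copies_sent_per_write(clique_size: int) -> int:
    """Copies the region master sends per write, one to each replica."""
    return clique_size - 1
```

With a clique of 3, the master needs an acknowledgement from one of its two replicas to commit, and one member can fail. A clique of 5 tolerates two failures but doubles the number of copies sent per write.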

>> #1 is relatively simple but trades away the consistency
>
> I don't see where you could have inconsistencies here. Would you mind 
> elaborating a bit further?

At any given instant, queries to a replica may not return the same result as 
the (write) master for data that is still in the memstore and (possibly) in the 
last block of the WAL.
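
A toy model of that window, in Python. This is a sketch under stated assumptions, not HBase code: the Master and Replica classes, the sync step that makes WAL edits visible to readers, and catch_up are all hypothetical names for this example.

```python
class Master:
    """Toy region master: every put goes to the WAL and the memstore."""

    def __init__(self):
        self.memstore = {}
        self.wal = []      # all edits, in write order
        self.visible = 0   # how many WAL edits a reader can currently see

    def put(self, key, value):
        self.wal.append((key, value))
        self.memstore[key] = value

    def sync(self):
        # Assumption: edits become readable by others only after a sync.
        self.visible = len(self.wal)

    def get(self, key):
        return self.memstore.get(key)


class Replica:
    """Toy read-only replica that applies the visible prefix of the WAL."""

    def __init__(self):
        self.store = {}
        self.applied = 0

    def catch_up(self, master):
        for key, value in master.wal[self.applied:master.visible]:
            self.store[key] = value
        self.applied = master.visible

    def get(self, key):
        return self.store.get(key)
```

If the master writes a new value for a row and the replica polls before the next sync, the master returns the new value while the replica still returns the old one. The two agree again only after the edit becomes visible and the replica catches up.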

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)

--- On Wed, 2/2/11, Flavio Junqueira <f...@yahoo-inc.com> wrote:

From: Flavio Junqueira <f...@yahoo-inc.com>
Subject: Re: Extracting Zab from Zookeeper
Date: Wednesday, February 2, 2011, 2:14 AM

Hi Andrew,

Interesting use case, thanks for sharing. I'm curious about a few things:

On Feb 1, 2011, at 5:38 PM, Andrew Purtell wrote:

Two ideas actually:

1) Do pretty straightforward log shipping from region master to read only 
replicas.


I don't understand why you have to ship the log to the read only replicas. 
Aren't you storing the log on HDFS currently? Can't they read from HDFS 
directly?


2) Divide the cluster into quorum 3-cliques. Extract ZAB and use it to maintain 
consensus on writes from region master to two read only replicas. Run the 
consensus protocol in parallel with HDFS hflush to the write ahead log. Needs a 
lot of work filling in the detail, obviously, but that's the general notion.


I wonder why you are choosing 3 for the size of a clique and not letting it be 
a free parameter. I would think that this is a decision of the user. Are you 
choosing 3 to avoid the replication overhead?

#1 is relatively simple but trades away the consistency for which HBase is 
indicated for higher availability (for reads) when regions are in transition.

I don't see where you could have inconsistencies here. Would you mind 
elaborating a bit further?


#2 is not simple at all but may let us maintain replicas that are fully consistent 
at all times with the region master, not lower region master write performance 
unacceptably, and also gain the higher availability (for reads) when regions 
are in transition.


Agreed, it will be tricky, especially because we would have to extract Zab 
first.

Cheers,
-Flavio

flavio junqueira
research scientist
f...@yahoo-inc.com
direct +34 93-183-8828
 
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

 



