Others have followed up on the central question, which is about durability, and
have pointed out that the quoted passage from the Yahoo! report is misleading.
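
The write path described in the design-overview passage quoted below (append
each edit to the HLog, then apply it to the per-Store Memcache) can be
illustrated with a toy Java model. This is a hypothetical sketch: the class and
method names are invented for illustration and are not HBase's API. Only the
writeToWAL idea comes from the Put source quoted below.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical names, not HBase's API) of "log first, then memory".
public class WritePathSketch {

    /** One logged edit: which store, which row, what value. */
    static final class Edit {
        final String store;
        final String row;
        final String value;

        Edit(String store, String row, String value) {
            this.store = store;
            this.row = row;
            this.value = value;
        }
    }

    /** Stands in for the single HLog shared by all regions on a server. */
    final List<Edit> log = new ArrayList<>();

    /** Stands in for one sorted in-memory Memcache per Store. */
    Map<String, TreeMap<String, String>> memcaches = new HashMap<>();

    /** Mirrors the idea behind Put's writeToWAL flag (default true). */
    void put(String store, String row, String value, boolean writeToWAL) {
        if (writeToWAL) {
            // In a real server, durability depends on this append reaching
            // stable storage before the client is acknowledged.
            log.add(new Edit(store, row, value));
        }
        // Only after logging is the edit applied to the in-memory store.
        memcaches.computeIfAbsent(store, s -> new TreeMap<>()).put(row, value);
    }

    /** Simulate a crash: in-memory state is lost, the log survives. */
    void crash() {
        memcaches = new HashMap<>();
    }

    /** Rebuild in-memory state by replaying the log in order. */
    void replay() {
        for (Edit e : log) {
            memcaches.computeIfAbsent(e.store, s -> new TreeMap<>()).put(e.row, e.value);
        }
    }

    String get(String store, String row) {
        TreeMap<String, String> m = memcaches.get(store);
        return m == null ? null : m.get(row);
    }

    public static void main(String[] args) {
        WritePathSketch server = new WritePathSketch();
        server.put("cf1", "row1", "logged", true);
        server.put("cf1", "row2", "unlogged", false);
        server.crash();
        server.replay();
        System.out.println(server.get("cf1", "row1")); // logged
        System.out.println(server.get("cf1", "row2")); // null
    }
}
```

In this model the edit written with the log enabled survives the simulated
crash, while the edit that skipped the log is lost. That is the kind of data
loss the durability question in this thread is about.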

However, more generally, regarding the question "Does HBase do in-memory
replication of rows?":

HBase will have a replication feature in the next release, independent of
HDFS-layer data block replication:

  HBASE-1295: https://issues.apache.org/jira/browse/HBASE-1295

This is cluster-to-cluster replication, at the HBase layer, and at a finer 
granularity than the row.
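
A minimal sketch of asynchronous log-shipping replication between two
clusters, assuming (as a simplification, not a description of the HBASE-1295
design) that the source cluster acknowledges the client, queues individual
cell edits for selected column families, and ships them to a peer cluster
later. In this model, shipping cell edits rather than whole rows is what lets
replication operate at a finer granularity than the row. All names are
hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of cluster-to-cluster replication by shipping cell edits.
// Names are illustrative only; this is not the HBASE-1295 implementation.
public class ReplicationSketch {

    /** One cell-level edit, finer-grained than a whole row. */
    static final class CellEdit {
        final String row;
        final String family;
        final String qualifier;
        final String value;

        CellEdit(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        String key() {
            return row + "/" + family + ":" + qualifier;
        }
    }

    /** A "cluster" reduced to a map from cell coordinates to values. */
    static final class Cluster {
        final Map<String, String> cells = new HashMap<>();

        void apply(CellEdit e) {
            cells.put(e.key(), e.value);
        }
    }

    final Cluster source = new Cluster();
    final Cluster peer = new Cluster();
    // Edits applied on the source but not yet shipped to the peer.
    final Queue<CellEdit> pending = new ArrayDeque<>();
    // Column families selected for replication (an assumed per-family scope).
    final Set<String> replicatedFamilies;

    ReplicationSketch(Set<String> replicatedFamilies) {
        this.replicatedFamilies = replicatedFamilies;
    }

    /** The client is acknowledged once the source has applied the edit. */
    void write(CellEdit e) {
        source.apply(e);
        if (replicatedFamilies.contains(e.family)) {
            pending.add(e);
        }
    }

    /** Called separately from write(), standing in for a background shipper, so the peer can lag. */
    int ship() {
        int shipped = 0;
        while (!pending.isEmpty()) {
            peer.apply(pending.poll());
            shipped++;
        }
        return shipped;
    }

    public static void main(String[] args) {
        ReplicationSketch r = new ReplicationSketch(new HashSet<>(Arrays.asList("info")));
        // Two cells of the same row; only the "info" family is replicated.
        r.write(new CellEdit("row1", "info", "name", "alice"));
        r.write(new CellEdit("row1", "tmp", "scratch", "x"));
        System.out.println(r.peer.cells.size()); // 0
        System.out.println(r.ship());            // 1
        System.out.println(r.peer.cells);        // {row1/info:name=alice}
    }
}
```

Because shipping happens after the client is acknowledged, the peer can
briefly lag the source; that is a consequence of the asynchronous design
assumed here.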

HBase may also in the future evolve an optional extension to the BigTable 
architecture:

  HBASE-2357: https://issues.apache.org/jira/browse/HBASE-2357

and this, I think, also meets the definition of in-memory replication. While
HBASE-2357 talks about availability, I see it as a means of offering higher
read scalability for use cases that can accept a relaxation of HBase's
ACID guarantees.

So part of the answer to "Does HBase do in-memory replication of rows?" is:
we might actually do that, independently of providing durability by other means.

   - Andy

> From: MauMau
> Subject: Does HBase do in-memory replication of rows?
> To: hbase-user@hadoop.apache.org
> Date: Saturday, May 8, 2010, 5:16 AM
> Hello,
> 
> I'm comparing HBase and Cassandra, which I think are the
> most promising distributed key-value stores, to determine
> which one to choose for future OLTP and data analysis.
> I found the following benchmark report by Yahoo! Research,
> which evaluates HBase, Cassandra, PNUTS, and sharded MySQL.
> 
> http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
> http://www.brianfrankcooper.net/pubs/ycsb.pdf
> 
> The above report refers to HBase 0.20.3.
> Reading this and HBase's documentation, two questions about
> load balancing and replication have arisen. Could anyone give
> me any information to help answer these questions?
> 
> [Q2] replication
> Does HBase perform in-memory replication of rows like
> Cassandra?
> Does HBase sync updates to disk before returning success to
> clients?
> 
> According to the following paragraph in the HBase design
> overview, HBase syncs writes.
> 
> ----------------------------------------
> Write Requests
> When a write request is received, it is first written to a
> write-ahead log called a HLog. All write requests for every
> region the region server is serving are written to the same
> HLog. Once the request has been written to the HLog, the
> result of changes is stored in an in-memory cache called the
> Memcache. There is one Memcache for each Store.
> ----------------------------------------
> 
> The source code of the Put class appears to show the above
> (though I don't understand the server-side code yet):
> 
>  private boolean writeToWAL = true;
> 
> However, Yahoo!'s report says the following. Is this
> incorrect? What is in-memory replication? I know HBase
> relies on HDFS to replicate data in storage, but not in
> memory.
> 
> ----------------------------------------
> For Cassandra, sharded MySQL and PNUTS, all updates were
> synched to disk before returning to the client. HBase does
> not sync to disk, but relies on in-memory replication across
> multiple servers for durability; this increases write
> throughput and reduces latency, but can result in data loss
> on failure.
> ----------------------------------------
> 
> Maumau