For more architectural details of HBase, check out the Bigtable paper;
it's fairly detailed, short, and accessible.

On Sat, May 8, 2010 at 2:39 PM, Amandeep Khurana <ama...@gmail.com> wrote:
> HBase does not do in-memory replication. Your data goes into a region, which
> has only one instance. Writes go to the write-ahead log first, which is
> written to disk. However, since HDFS doesn't yet have fully working flush
> functionality, there is a chance of losing a chunk of data. The next
> release of HBase will guarantee data durability, since by then the flush
> functionality will be fully working.
>
> Regarding replication - the difference between Cassandra and HBase is that
> when you do a write in Cassandra, it doesn't return until it has written to
> W nodes, where W is configurable. In the case of HBase, replication is taken
> care of by the filesystem (HDFS). When the region is flushed to disk,
> HDFS replicates the HFiles (in which the data for the regions is stored).
> For more details on how this works, read the Bigtable paper and
> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html.
>
>
> 2010/5/8 MauMau <maumau...@gmail.com>
>
>> Hello,
>>
>> I'm comparing HBase and Cassandra, which I think are the most promising
>> distributed key-value stores, to determine which one to choose for
>> future OLTP and data analysis.
>> I found the following benchmark report by Yahoo! Research, which evaluates
>> HBase, Cassandra, PNUTS, and sharded MySQL.
>>
>> http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
>> http://www.brianfrankcooper.net/pubs/ycsb.pdf
>>
>> The above report refers to HBase 0.20.3.
>> Reading this and HBase's documentation, two questions about load balancing
>> and replication have arisen. Could anyone give me any information to help
>> answer these questions?
>>
>> [Q2] replication
>> Does HBase perform in-memory replication of rows like Cassandra?
>> Does HBase sync updates to disk before returning success to clients?
>>
>> According to the following paragraph in the HBase design overview, HBase
>> syncs writes.
>>
>> ----------------------------------------
>> Write Requests
>> When a write request is received, it is first written to a write-ahead log
>> called a HLog. All write requests for every region the region server is
>> serving are written to the same HLog. Once the request has been written to
>> the HLog, the result of changes is stored in an in-memory cache called the
>> Memcache. There is one Memcache for each Store.
>> ----------------------------------------
>>
>> The source code of the Put class appears to confirm the above (though I
>> don't understand the server-side code yet):
>>
>>  private boolean writeToWAL = true;
>>
>> However, Yahoo's report states the following. Is this incorrect? What is
>> in-memory replication? I know HBase relies on HDFS to replicate data in
>> storage, but not in memory.
>>
>> ----------------------------------------
>> For Cassandra, sharded MySQL and PNUTS, all updates were
>> synched to disk before returning to the client. HBase does
>> not sync to disk, but relies on in-memory replication across
>> multiple servers for durability; this increases write throughput
>> and reduces latency, but can result in data loss on failure.
>> ----------------------------------------
>>
>> Maumau
>>
>>
>
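
The write path described in the thread (append to the write-ahead log first, then store the change in an in-memory cache) can be illustrated with a toy model. This is only a sketch under stated assumptions: `WritePathSketch`, `WalSketch`, `StoreSketch` and `recoverFrom` are hypothetical names, not HBase classes or APIs, and the `writeToWAL` parameter merely mirrors the field quoted from the Put class. The model also treats every log append as durable, which the thread says was not yet guaranteed on HDFS at the time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the write path discussed in the thread.
// None of these names are real HBase classes or APIs.
public class WritePathSketch {

    /** Stand-in for the write-ahead log (the HLog in the quoted overview). */
    static class WalSketch {
        final List<String[]> entries = new ArrayList<>();

        // Assumption: an append is durable as soon as this method returns.
        void append(String row, String value) {
            entries.add(new String[] {row, value});
        }
    }

    /** Stand-in for one store: a sorted in-memory map backed by a log. */
    static class StoreSketch {
        final WalSketch wal;
        final TreeMap<String, String> memstore = new TreeMap<>();

        StoreSketch(WalSketch wal) {
            this.wal = wal;
        }

        // Log first (unless the caller opts out), then update memory.
        void put(String row, String value, boolean writeToWAL) {
            if (writeToWAL) {
                wal.append(row, value);
            }
            memstore.put(row, value);
        }

        // Simulates a restart: memory is gone, so rebuild it from the log.
        static StoreSketch recoverFrom(WalSketch wal) {
            StoreSketch fresh = new StoreSketch(wal);
            for (String[] entry : wal.entries) {
                fresh.memstore.put(entry[0], entry[1]);
            }
            return fresh;
        }
    }

    public static void main(String[] args) {
        WalSketch wal = new WalSketch();
        StoreSketch store = new StoreSketch(wal);
        store.put("row1", "logged", true);
        store.put("row2", "not logged", false);

        StoreSketch recovered = StoreSketch.recoverFrom(wal);
        System.out.println("before crash: " + store.memstore.keySet());
        System.out.println("after replay: " + recovered.memstore.keySet());
    }
}
```

In this model only the row that went through the log survives the simulated restart. That is the sense in which durability depends on the log reaching disk, rather than on any replication of the in-memory cache.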
