I think it was "newbie question on stopping Hbase". So in 0.20 and prior, the only way the WAL was really useful was if a WAL file had been closed (that's why we keep them small, ~64MB). Data loss in the face of machine failure is real.
In Hadoop 0.21, which includes the popular HDFS-265, we currently use the "hflush" feature which, once called, guarantees us that the appended edits are sent to 3 replicas so that all data is durably persisted (unless the 3 nodes die around the same time). Check out the latest HBase trunk with the 0.21 Hadoop branch to test it, but I can already tell you that it works very, very well.

J-D

On Mon, Nov 30, 2009 at 6:17 PM, JQ Hadoop <[email protected]> wrote:
> I wonder if you can point me to the message or the title of the
> message. It seems that I cannot find the message ...
>
> Thanks,
> -JQ
>
> On Mon, Nov 30, 2009 at 4:36 PM, Berk D. Demir <[email protected]> wrote:
>> Short answer is "yes, you can still lose data".
>> It's about HDFS, and HDFS-265 will solve this.
>> A similar question was on the list a couple of days ago.
>> Here's Ryan's answer.
>> http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200911.mbox/browser
>>
>> On Mon, Nov 30, 2009 at 00:18, JQ Hadoop <[email protected]> wrote:
>>> I have a question regarding the durability of HBase. After I have put
>>> a record to HBase and have received a confirmation of the put from the
>>> region server (assuming both autoFlush and writeToWAL are set to
>>> true), can I be sure that my record will be there no matter what? I've
>>> checked the code and found that the region server will first append a
>>> log record to the WAL (a SequenceFile) and then may call the sync()
>>> function; however, it appears to me that this does not guarantee the
>>> log record is in HDFS even if the sync() function is called. That is
>>> to say, in the case of a RegionServer crash, the records put into HBase
>>> may be lost; am I missing anything?
>>>
>>> Thanks,
>>> -JQ
>>>
>>
>
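
For readers following along, here is a minimal sketch of the client-side pattern hflush enables, assuming a Hadoop 0.21+ client where FSDataOutputStream exposes hflush() (HDFS-265). The path and file layout are made up for illustration; HBase's real WAL writer is more involved than this.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical WAL-like file; HBase keeps its logs under the region server's log dir.
        FSDataOutputStream out = fs.create(new Path("/tmp/wal-sketch"));

        out.write("edit-1".getBytes("UTF-8"));

        // hflush() returns only after the bytes have been pushed to all DataNodes in the
        // write pipeline (3 by default), which is what gives the edit its durability.
        // Before HDFS-265 there was no way to force buffered appends out of the client,
        // so edits in an open, unclosed log file could be lost on a crash.
        out.hflush();

        out.close();
        fs.close();
    }
}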

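On the HBase side of JQ's question, a hedged sketch of what a client write with autoFlush and writeToWAL enabled looks like. Exact class names and method signatures vary between 0.20 and later releases, and the table, family, and column names here are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DurablePutSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");  // hypothetical table name

        // autoFlush=true sends each Put to the region server immediately instead of
        // buffering it in the client-side write buffer.
        table.setAutoFlush(true);

        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));

        // writeToWAL=true (the default) makes the region server append the edit to the
        // write-ahead log before acknowledging; whether that append is actually durable
        // is exactly the sync()/hflush() question discussed in the thread above.
        put.setWriteToWAL(true);

        table.put(put);
        table.close();
    }
}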