There was a discussion over the weekend on the incubator dist-list about Accumulo, describing what was borrowed from Hadoop-core and HBase.
http://mail-archives.apache.org/mod_mbox/incubator-general/201109.mbox/%3C8 96611333.83615.1315154356465.javamail.r...@linzimmb04o.imo.intelink.gov%3E

5400 lines: slightly modified versions of Hadoop BCFile and related classes (our current file format extends BCFile)
4300 lines: heavily modified versions of MapFile and SequenceFile (no longer our default file format, but still included for backward compatibility)
2000 lines: heavily modified versions of HBase BlockCache and related files (Adam didn't count the tests when he said 1500 lines)
1300 lines: heavily modified versions of Hadoop BloomFilters
419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo
325 lines: our Value is an immutable version of Hadoop BytesWritable
142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader

On 9/5/11 5:35 PM, "Joey Echeverria" <[email protected]> wrote:

>The Accumulo implementation of the WAL is a separate set of daemons.
>When you write to the WAL, you send your transactions to three of the
>logging servers. When you do a recovery, I believe one of the three
>servers that has the WAL for the down server copies it to HDFS and
>then a MapReduce job splits the log and re-inserts the recovered data.
>You should have the same survivability that you get with HDFS.
>
>-Joey
>
>On Mon, Sep 5, 2011 at 5:06 PM, Bill <[email protected]> wrote:
>> On 04/09/11 07:43, Mathias Herberts wrote:
>>>
>>> On Sep 4, 2011 1:39 AM, "Bill de hÓra"<[email protected]> wrote:
>>>>
>>>> On 02/09/11 19:06, Stack wrote:
>>>>>
>>>>> What do folks think?
>>>>
>>>>
>>>> Not putting the log into hdfs seems like a good idea.
>>>
>>> I was somehow thinking the opposite as it makes irrecoverable machine
>>> failures much more problematic. What makes you say it's a good idea?
>>>
>>
>> Allows more control over the write path, specifically sequential I/O and
>> crash recovery. Granted the commit needs to be replicated, but you need
>> that regardless. Thinking a bit more it might not square with the
>> regionserver model anyway, plus the Accumulo proposal mentions a service
>> rather than a local disk. The WAL seems to be hardened up these days
>> anyway making things like
>> https://issues.apache.org/jira/browse/HBASE-4107 more of an edge case..
>>
>> Bill
>>
>
>
>
>--
>Joseph Echeverria
>Cloudera, Inc.
>443.305.9434
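
For anyone trying to picture the write and recovery path Joey describes in the quoted message (each write goes to three logging servers, and recovery reads the log back from one of them), here is a toy sketch. It is not Accumulo code: the Logger and TabletServer classes and the recover function are hypothetical names made up for illustration, everything lives in memory, and the "copy to HDFS, then MapReduce splits the log" step is reduced to sorting one surviving copy by sequence number.

```python
import random


class Logger:
    """Hypothetical stand-in for one logging daemon (in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.logs = {}  # server id -> list of (sequence number, mutation)

    def append(self, server_id, seq, mutation):
        if not self.alive:
            raise ConnectionError(self.name + " is down")
        self.logs.setdefault(server_id, []).append((seq, mutation))


class TabletServer:
    """Hypothetical writer that logs every mutation to `replicas` loggers."""

    def __init__(self, server_id, loggers, replicas=3, rng=None):
        rng = rng or random.Random()
        self.server_id = server_id
        # Assumption: the set of loggers is chosen once, at startup.
        self.loggers = rng.sample(loggers, replicas)
        self.seq = 0

    def write(self, mutation):
        self.seq += 1
        # The write is only considered logged once every chosen logger has it.
        for logger in self.loggers:
            logger.append(self.server_id, self.seq, mutation)


def recover(server_id, loggers):
    """Replay the log of a failed server from any one surviving copy."""
    for logger in loggers:
        if logger.alive and server_id in logger.logs:
            # Stand-in for "copy to HDFS, then sort/split with MapReduce".
            return [mutation for _, mutation in sorted(logger.logs[server_id])]
    raise RuntimeError("no surviving copy of the log for " + server_id)
```

With five loggers and three copies per write, the log survives the loss of any two loggers, which is the same kind of survivability argument made for HDFS with three replicas.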
