On Mon, Jan 30, 2012 at 10:53 AM, Eric Newton <[email protected]> wrote: > I will definitely look at using HBase's WAL. What did you guys do to > distribute the log-sort/split?
In 0.90 the log sorting was done serially by the master, which as you can imagine was slow after a full outage. In 0.92 we have distributed log splitting: https://issues.apache.org/jira/browse/hbase-1364 I don't think there is a single design doc for it anywhere, but basically we use ZK as a distributed work queue. The master shoves items in there, and the region servers try to claim them. Each item corresponds to a log segment (the HBase WAL rolls periodically). After the segments are split, they're moved into the appropriate region directories, so when the region is next opened, they are replayed. -Todd > > On Mon, Jan 30, 2012 at 1:37 PM, Todd Lipcon <[email protected]> wrote: > >> On Mon, Jan 30, 2012 at 10:05 AM, Aaron Cordova >> <[email protected]> wrote: >> >> The big problem is in the fact that writing replicas in HDFS is done in >> a pipeline, rather than in parallel. There is a ticket to change this >> (HDFS-1783), but no movement on it since last summer. >> > >> > ugh - why would they change this? Pipelining maximizes bandwidth usage. >> It'd be cool if the log stream could be configured to return after written >> to one, two, or more nodes though. >> > >> >> The JIRA proposes to allow "star replication" instead of "pipeline >> replication" on a per-stream basis. Pipelining trades off latency for >> bandwidth -- multiple RTTs instead of 1 RTT. >> >> A few other notes relevant to the discussion above (sorry for losing >> the quote history): >> >> Regarding HDFS's being designed for large sequential writes rather >> than small records, that was originally true, but now its actually >> fairly efficient. We have optimizations like HDFS-895 specifically for >> the WAL use case which approximate things like group commit, and when >> you combine that with group commit at the tablet-server level you can >> get very good throughput along with durability guarantees. I haven't >> benchmarked vs Accumulo's Loggers ever, but I'd be surprised if the >> difference were substantial - we tend to be network bound on the WAL >> unless the edits are really quite tiny. >> >> We're also looking at making our WAL implementation pluggable: see >> HBASE-4529. Maybe a similar approach could be taken in Accumulo such >> that HBase could use Accumulo loggers, or Accumulo could use HBase's >> existing WAL class? >> >> -Todd >> -- >> Todd Lipcon >> Software Engineer, Cloudera >> -- Todd Lipcon Software Engineer, Cloudera
