I will definitely look at using HBase's WAL.  What did you guys do to
distribute the log-sort/split?

Actually, if you have any design description (Jira, or email)... I would
love to read it over.

-Eric

On Mon, Jan 30, 2012 at 1:37 PM, Todd Lipcon <[email protected]> wrote:

> On Mon, Jan 30, 2012 at 10:05 AM, Aaron Cordova
> <[email protected]> wrote:
> >> The big problem is in the fact that writing replicas in HDFS is done
> >> in a pipeline, rather than in parallel. There is a ticket to change
> >> this (HDFS-1783), but no movement on it since last summer.
> >
> > ugh - why would they change this? Pipelining maximizes bandwidth
> > usage. It'd be cool if the log stream could be configured to return
> > after written to one, two, or more nodes though.
> >
>
> The JIRA proposes to allow "star replication" instead of "pipeline
> replication" on a per-stream basis. Pipelining trades off latency for
> bandwidth -- multiple RTTs instead of 1 RTT.
>
> A few other notes relevant to the discussion above (sorry for losing
> the quote history):
>
> Regarding HDFS's being designed for large sequential writes rather
> than small records, that was originally true, but now it's actually
> fairly efficient. We have optimizations like HDFS-895 specifically for
> the WAL use case which approximate things like group commit, and when
> you combine that with group commit at the tablet-server level you can
> get very good throughput along with durability guarantees. I haven't
> benchmarked vs Accumulo's Loggers ever, but I'd be surprised if the
> difference were substantial - we tend to be network bound on the WAL
> unless the edits are really quite tiny.
>
> We're also looking at making our WAL implementation pluggable: see
> HBASE-4529. Maybe a similar approach could be taken in Accumulo such
> that HBase could use Accumulo loggers, or Accumulo could use HBase's
> existing WAL class?
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
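
For reference, the latency/bandwidth trade-off described above (multiple
RTTs for pipeline replication versus one RTT for star replication) can be
sketched roughly as below. This is only a back-of-the-envelope model, not
HDFS code: the function names and the example RTT values are made up for
illustration, and it assumes the write is acknowledged only after every
replica has it.

```python
def pipeline_ack_latency(hop_rtts_ms):
    """Pipeline: client -> DN1 -> DN2 -> DN3, acks flow back up the chain.

    Each hop adds its own round trip, so the per-hop RTTs add up.
    """
    return sum(hop_rtts_ms)


def star_ack_latency(client_rtts_ms):
    """Star: the client writes to every replica in parallel.

    The write is acknowledged once the slowest replica has answered.
    """
    return max(client_rtts_ms)


def client_bytes_sent(record_bytes, replicas, star):
    """Bytes leaving the client's NIC for one record.

    In a pipeline the client sends one copy and the datanodes forward it;
    in a star the client sends one copy per replica.
    """
    return record_bytes * (replicas if star else 1)


if __name__ == "__main__":
    rtts = [0.5, 0.5, 0.5]  # hypothetical per-link RTTs in milliseconds
    print("pipeline latency (ms):", pipeline_ack_latency(rtts))
    print("star latency (ms):    ", star_ack_latency(rtts))
    print("pipeline client bytes:", client_bytes_sent(1000, 3, star=False))
    print("star client bytes:    ", client_bytes_sent(1000, 3, star=True))
```

Under these assumptions the star layout cuts the acknowledgement latency
to a single round trip, but the client pays for it by sending every
record once per replica.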
