On Mon, Jan 30, 2012 at 10:05 AM, Aaron Cordova <[email protected]> wrote:
>> The big problem is in the fact that writing replicas in HDFS is done in a
>> pipeline, rather than in parallel. There is a ticket to change this
>> (HDFS-1783), but no movement on it since last summer.
>
> ugh - why would they change this? Pipelining maximizes bandwidth usage. It'd
> be cool if the log stream could be configured to return after written to one,
> two, or more nodes though.
>
The JIRA proposes to allow "star replication" instead of "pipeline
replication" on a per-stream basis. Pipelining trades latency for
bandwidth -- a write takes multiple RTTs instead of 1 RTT.

A few other notes relevant to the discussion above (sorry for losing the
quote history):

Regarding HDFS being designed for large sequential writes rather than
small records: that was originally true, but now it's actually fairly
efficient. We have optimizations like HDFS-895 specifically for the WAL
use case which approximate things like group commit, and when you combine
that with group commit at the tablet-server level you can get very good
throughput along with durability guarantees.

I've never benchmarked against Accumulo's Loggers, but I'd be surprised if
the difference were substantial - we tend to be network bound on the WAL
unless the edits are really quite tiny.

We're also looking at making our WAL implementation pluggable: see
HBASE-4529. Maybe a similar approach could be taken in Accumulo such that
HBase could use Accumulo loggers, or Accumulo could use HBase's existing
WAL class?

-Todd
--
Todd Lipcon
Software Engineer, Cloudera
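
For anyone who wants the latency-for-bandwidth trade-off above in concrete
terms, here is a back-of-the-envelope sketch in Python. It is not HDFS
code: the function names and the RTT figures are made up for illustration,
and the model ignores transfer time, disk syncs, and queueing. It only
counts round trips and the bytes the writing client has to push out.

```python
def pipeline_ack_latency(hop_rtts):
    """Chained replicas (client -> r1 -> r2 -> ...): the ack has to travel
    back along every hop, so the per-hop round trips add up."""
    return sum(hop_rtts)


def star_ack_latency(client_rtts):
    """Parallel replicas: the client writes to all of them at once and
    waits for the slowest ack."""
    return max(client_rtts)


def client_bytes_sent(payload_bytes, replicas, star):
    """Bytes leaving the client's own link for one write. In a pipeline the
    client sends one copy; in a star it sends one copy per replica."""
    return payload_bytes * replicas if star else payload_bytes


if __name__ == "__main__":
    rtts_ms = [0.5, 0.5, 0.5]  # assumed per-hop RTTs for 3 replicas
    print("pipeline ack latency (ms):", pipeline_ack_latency(rtts_ms))
    print("star ack latency (ms):", star_ack_latency(rtts_ms))
    print("client bytes, pipeline:", client_bytes_sent(1000, 3, star=False))
    print("client bytes, star:", client_bytes_sent(1000, 3, star=True))
```

Under these assumptions the pipeline pays three round trips per
acknowledged write where the star pays one, while the star makes the
client send three copies of every edit instead of one.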
