+1 for ELR. I think having some data structure where we prepare the next stage of sync() operations instead of holding the row lock over the sync would be a big win for hot regions without a huge refactor. I think the other two optimizations are useful to think about, but wouldn't have the same impact/effort ratio as ELR.
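A minimal Java sketch of the ELR idea, under stated assumptions. `HypotheticalWal`, `append()`, `sync()` and `increment()` are made-up names, not HBase's actual HLog/HRegion API, and the disk sync is simulated with a sleep. The row lock covers only the in-memory apply and the buffered WAL append. The slow sync happens after the lock is released, and the client is acknowledged only once its own edit is durable. Appends are ordered under the lock and durability advances as a prefix of the log, so a later edit to the same row can't become durable before an earlier one. A real design would also have to decide what readers may see of edits that aren't synced yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of early lock release (ELR) for one hot row; not HBase's API.
class EarlyLockReleaseSketch {

    /** Hypothetical WAL: append() only buffers; sync() is the slow, durable step. */
    static final class HypotheticalWal {
        private final List<String> buffer = new ArrayList<>();
        private long appendedSeq = 0;
        private long syncedSeq = 0;

        synchronized long append(String edit) {
            buffer.add(edit);
            return ++appendedSeq;
        }

        /** Blocks until every edit up to seq is durable; one flush covers all buffered edits. */
        void sync(long seq) throws InterruptedException {
            long target;
            synchronized (this) {
                if (syncedSeq >= seq) {
                    return; // another caller's flush already covered this edit
                }
                target = appendedSeq; // everything buffered so far rides on this flush
            }
            Thread.sleep(5); // stand-in for an hflush/fsync round trip
            synchronized (this) {
                syncedSeq = Math.max(syncedSeq, target);
            }
        }

        synchronized long syncedSeq() {
            return syncedSeq;
        }
    }

    private final ReentrantLock rowLock = new ReentrantLock();
    private final HypotheticalWal wal = new HypotheticalWal();
    private long counter = 0; // value of the hot row, guarded by rowLock

    /** Increments the row and returns the new value only once the edit is durable. */
    long increment(long delta) throws InterruptedException {
        long seq;
        long newValue;
        rowLock.lock();
        try {
            newValue = counter + delta;
            counter = newValue;                    // apply to the in-memory store
            seq = wal.append("counter+=" + delta); // log order matches apply order
        } finally {
            rowLock.unlock();                      // ELR: release before the slow sync
        }
        wal.sync(seq);                             // wait for durability outside the lock
        return newValue;                           // only now acknowledge the client
    }

    /** Runs threads * perThread increments concurrently; returns {counter, syncedSeq}. */
    static long[] runDemo(int threads, int perThread) {
        EarlyLockReleaseSketch row = new EarlyLockReleaseSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    try {
                        row.increment(1);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() makes the workers' writes to counter visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] {row.counter, row.wal.syncedSeq()};
    }

    public static void main(String[] args) {
        long[] r = runDemo(4, 25);
        System.out.println("counter=" + r[0] + " syncedSeq=" + r[1]);
    }
}
```

Because a single flush covers every edit buffered before it started, concurrent writers to the same hot row share syncs instead of queuing behind the row lock for each one.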
On 12/29/10 11:32 AM, "Stack" <[email protected]> wrote:
>Nice list of things we need to do to make logging faster (with useful
>citations on the current state of the art). This notion of early lock release
>(ELR) is worth looking into (Jon, for high rates of counter
>transactions, you've been talking about aggregating counts in front of
>the WAL lock... maybe an ELR and then a hold on the transaction until
>confirmation of flush would be the way to go?). Regarding flush-pipelining,
>it would be interesting to see if there are traces of the sys-time
>that Dhruba is seeing in his NN out in HBase servers. My guess is
>that it's probably drowned by other context switches done in our
>servers. Definitely worth study.
>
>St.Ack
>P.S. Minimizing context switches, a system for ELR and
>flush-pipelining, recasting the server to make use of one of the DI or
>OSGi frameworks, moving off log4j, etc. Is it just me or do others
>feel a server rewrite coming on?
>
>
>On Mon, Dec 27, 2010 at 11:48 AM, Dhruba Borthakur <[email protected]>
>wrote:
>> HDFS currently uses Hadoop RPC, and the server thread blocks till the WAL
>> is written to disk. In earlier deployments, I thought we could safely
>> ignore flush-pipelining by creating more server threads. But in our
>> largest HDFS systems, I am starting to see 20% sys-time usage on the
>> namenode machine; most of this could be thread scheduling. If so, then it
>> makes sense to enhance the logging code to release server threads even
>> before the WAL is flushed to disk (but, of course, we still have to delay
>> the transaction response to the client till the WAL is synced to disk).
>>
>> Does anybody have any idea how to figure out what percentage of the above
>> sys-time is spent in thread scheduling vs. the time spent in other system
>> calls (especially in the Namenode context)?
>>
>> thanks,
>> dhruba
>>
>>
>> On Fri, Dec 24, 2010 at 8:17 PM, Todd Lipcon <[email protected]> wrote:
>>
>>> Via Hammer - I thought this was a pretty good read, some good ideas for
>>> optimizations for our WAL.
>>>
>>> http://infoscience.epfl.ch/record/149436/files/vldb10aether.pdf
>>>
>>> -Todd
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>>
>> --
>> Connect to me at http://www.facebook.com/dhruba
>>
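Dhruba's suggestion (free the server thread before the WAL is flushed, but hold the client's response until it is synced) can be sketched as a group-commit queue. This is a minimal Java illustration with hypothetical names (`FlushPipelineSketch`, `submit()`, `runDemo()`), not the actual Hadoop RPC or FSEditLog code, and the write plus fsync is simulated with a short sleep. A handler enqueues its edit and gets a `CompletableFuture` back immediately. One syncer thread drains whatever has queued up, syncs it as a single batch, then completes the futures. The response is sent from the completion callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical group-commit WAL front end; not the Hadoop RPC or FSEditLog API.
class FlushPipelineSketch implements AutoCloseable {
    /** One queued edit plus the future that completes once it is durable. */
    private record Pending(String edit, CompletableFuture<Long> durable) {}

    private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger syncCalls = new AtomicInteger();
    private final Thread syncer = new Thread(this::syncLoop, "wal-syncer");
    private volatile boolean running = true;
    private long txid = 0; // touched only by the syncer thread

    FlushPipelineSketch() {
        syncer.setDaemon(true);
        syncer.start();
    }

    /** Called from an RPC handler: enqueue and return at once, freeing the handler thread. */
    CompletableFuture<Long> submit(String edit) {
        CompletableFuture<Long> durable = new CompletableFuture<>();
        queue.add(new Pending(edit, durable));
        return durable;
    }

    int syncCalls() {
        return syncCalls.get();
    }

    private void syncLoop() {
        List<Pending> batch = new ArrayList<>();
        while (running || !queue.isEmpty()) {
            try {
                Pending first = queue.poll(10, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue;
                }
                batch.add(first);
                queue.drainTo(batch); // group commit: take everything queued so far
                Thread.sleep(2);      // stand-in for writing and fsyncing each batch's edits
                syncCalls.incrementAndGet();
                for (Pending p : batch) {
                    p.durable().complete(++txid); // only now may the client be answered
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } finally {
                batch.clear();
            }
        }
    }

    @Override
    public void close() {
        running = false; // the syncer drains anything left, then exits
        try {
            syncer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Submits n edits as if from RPC handlers; returns {responses sent, sync calls}. */
    static int[] runDemo(int n) {
        AtomicInteger responses = new AtomicInteger();
        List<CompletableFuture<Void>> acks = new ArrayList<>();
        try (FlushPipelineSketch wal = new FlushPipelineSketch()) {
            for (int i = 0; i < n; i++) {
                // A handler would return right here; the reply goes out from the callback.
                acks.add(wal.submit("edit-" + i).thenAccept(id -> responses.incrementAndGet()));
            }
            CompletableFuture.allOf(acks.toArray(new CompletableFuture[0])).join();
            return new int[] {responses.get(), wal.syncCalls()};
        }
    }

    public static void main(String[] args) {
        int[] r = runDemo(200);
        System.out.println("responses=" + r[0] + " syncCalls=" + r[1]);
    }
}
```

Under load, one sync can cover many edits, and handler threads are never parked waiting on the disk, so the thread count no longer has to grow with the number of in-flight syncs. That may reduce the scheduling overhead Dhruba suspects, though it would need measuring on a real namenode.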
