I am currently experimenting with out-of-process compaction. I will post my results (and a patch) when they are ready. Compaction in HBase is an area with many possible improvements.
-Vladimir Rodionov

On Thu, Oct 9, 2014 at 2:30 PM, Andrew Purtell <[email protected]> wrote:

> On Thu, Oct 9, 2014 at 6:31 AM, Jean-Marc Spaggiari <[email protected]>
> wrote:
>
> > For #4, one more thing we might want to add is a safety valve to
> > increase throttle in case the compaction queue becomes bigger than a
> > certain value?
> >
> > JM
>
> Would that make resolution of the problem leading to a large queue in
> the first place more difficult, do you think?
>
> > 2014-10-09 1:20 GMT-04:00 lars hofhansl <[email protected]>:
> >
> > > Hi Michael,
> > >
> > > your math is right.
> > >
> > > I think the issue is that it actually is easy to max out the ToR
> > > switch (and hence starve out other traffic), so we might want to
> > > protect the ToR switch from prolonged heavy compaction traffic in
> > > order to keep some of the bandwidth free for other traffic.
> > > Vladimir's issues were around slowing other traffic while
> > > compactions are running.
> > >
> > > -- Lars
> > >
> > > ----- Original Message -----
> > > From: Michael Segel <[email protected]>
> > > To: [email protected]; lars hofhansl <[email protected]>
> > > Cc: Vladimir Rodionov <[email protected]>
> > > Sent: Wednesday, October 8, 2014 12:30 PM
> > > Subject: Re: Compactions nice to have features
> > >
> > > On Oct 5, 2014, at 11:01 PM, lars hofhansl <[email protected]> wrote:
> > >
> > > >>> - rack IO throttle. We should add that to accommodate for over
> > > >>> subscription at the ToR level.
> > > >> Can you decipher that, Lars?
> > > >
> > > > ToR is "Top of Rack" switch. Over subscription means that a ToR
> > > > switch usually does not have enough bandwidth to serve traffic in
> > > > and out of rack at full speed.
> > > > For example if you had 40 machines in a rack with 1ge links each,
> > > > and the ToR switch has a 10ge uplink, you'd say the ToR switch is
> > > > 4 to 1 over subscribed.
> > > >
> > > > Was just trying to say: "Yeah, we need that" :)
> > >
> > > Hmmm.
> > >
> > > Rough math… using 3.5” SATA II (7200 RPM) drives … 4 drives would
> > > max out 1GbE. So then a server with 12 drives would max out 3Gb/S.
> > > Assuming 3.5” drives. 2.5” drives and SATAIII would push this up.
> > > So in theory you could get 5Gb/S or more from a node.
> > >
> > > 16 servers per rack… (again YMMV based on power, heat, etc … )
> > > that’s 48Gb/S and up.
> > >
> > > If you had 20 servers and they had smaller (2.5” drives)
> > > 5Gb/S x 20 = 100Gb/S.
> > >
> > > So what’s the width of the fabric? (YMMV based on ToR)
> > >
> > > I don’t know why you’d want to ‘throttle’ because the limits of the
> > > ToR would throttle you already.
> > >
> > > Of course I’m assuming that you’re running a M/R job that’s going
> > > full bore.
> > >
> > > Are you seeing this?
> > > I would imagine that you’d have a long running job maxing out the
> > > I/O and seeing a jump in wait CPU over time.
> > >
> > > And what’s the core to spindle ratio?
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
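
For anyone who wants to replay the rough math quoted in this thread, here is a minimal Python sketch. The function names are made up for illustration and are not part of HBase; the only inputs are the figures already quoted (40 machines on 1GbE behind a 10GbE uplink, about 4 drives per 1Gb/s, 12 drives per server, 16 or 20 servers per rack). Treat it as back-of-the-envelope arithmetic, not a measurement.

```python
def oversubscription_ratio(machines, link_gbps, uplink_gbps):
    """Total in-rack link capacity divided by the ToR uplink capacity."""
    return (machines * link_gbps) / uplink_gbps


def server_disk_gbps(drives, drives_per_gbps=4):
    """Rough per-server disk throughput, assuming (as in the thread)
    that about 4 drives are enough to saturate 1 Gb/s."""
    return drives / drives_per_gbps


def rack_disk_gbps(servers, per_server_gbps):
    """Aggregate disk throughput a rack could push if every server ran flat out."""
    return servers * per_server_gbps


if __name__ == "__main__":
    # Lars's example: 40 machines with 1GbE links, 10GbE uplink.
    print("oversubscription:", oversubscription_ratio(40, 1, 10))

    # Michael's example: 12 drives per server, 16 servers per rack.
    per_server = server_disk_gbps(12)
    print("per server Gb/s:", per_server)
    print("rack of 16 Gb/s:", rack_disk_gbps(16, per_server))

    # Michael's second example: 20 servers at 5 Gb/s each.
    print("rack of 20 Gb/s:", rack_disk_gbps(20, 5))
```

Comparing the rack totals with the uplink capacity shows why both points in the thread can hold at once: the disks can offer far more than the uplink carries, so the switch does cap the traffic, but it caps all traffic and not just compactions.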
