[ Saturday, December  4, 1999 ] Mika Kuoppala wrote:
> My patch does this 'closest serves' thing exactly. With few
> exceptions...like the logic to kick idling disks in the game.

I'll read over the patch and see what it does... It should certainly
(and your results agree with this) be a large gain over the current
balancing in most if not all configurations.

Unfortunately, I believe read balancing across drives is going to be
subject to heuristics that need to be settable, because the optimal
settings (like what defines an idling disk in your logic above) need
to change based on the drives and controllers in question... the
range of optimal solutions may be small enough that we could hard-code
it, but picking "optimal" settings based on any one raid1 (or even a
small group of raid1) configuration(s) doesn't make sense IMHO :)
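
To make "settable" concrete, I'm imagining a handful of knobs hanging
off each array, something like the following (all names hypothetical;
none of this exists in raid1.c today):

	/*
	 * Per-array read-balancing knobs.  These are exactly the kind
	 * of values that want to be sysctls rather than compile-time
	 * constants, since the right numbers depend on the drives and
	 * controllers underneath.
	 */
	struct raid1_balance_tunables {
		int idle_threshold;	/* pending I/Os below which a disk counts as idling */
		int max_seek_sectors;	/* farthest we'll pull a "sequential" read off its disk */
		int max_run_sectors;	/* cap on consecutive sectors served by one disk */
	};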

To get an idea of what I mean (namely, that the "optimal" raid1
balancing will vary per configuration), let's consider the cases.

When a read comes in, we have primarily two cases:
 - sequential - the last read went to drive X, so since the head of
                drive X is already at that location, it makes sense
                to send the next read to the same drive... now if
                this means drive X ends up getting overloaded with
                requests, we'd like to be able to redirect them over
                to drive Y, but only if drive Y's head isn't *too*
                far away... how overloaded we judge drive X to be
                should influence how much of a seek penalty we're
                willing to take on a read to drive Y (I've sketched
                this decision below, after the two cases)

                Also, many (but certainly not all, or perhaps even most)
                drives are going to include aggressive read-ahead and
                a large on-drive cache.  Whether read-ahead exists and
                whether the data to be read is likely to already be in the
                drive's cache should *also* affect the switchover metric,
                because we don't want to waste drive X's already-executed
                read of that block.

                Lastly, the two drives (X and Y) may or may not be on the
                same SCSI channel... if they are, and we're bottlenecking
                on the channel itself, we should be much less aggressive
                about pushing the read over to drive Y since we risk
                losing lower-level gathering of the multiple requests
                to drive X that could help make the SCSI traffic much
                more efficient (one twice-as-large SCSI read request to
                X is better on a congested channel than one to X and
                one to Y, just based on saving the arbitration cycles
                if nothing else).  Of course, if they're on separate
                channels (but possibly on channels with other devices),
                we should be more aggressive about distributing the load
                over both channels to increase parallelism.

 - random -     In the random case, I really feel like "whoever's
                closest" should work fine, since that should split
                fairly evenly between sticking with the same drive
                and moving over, which should take care of the
                balancing by itself.
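
To make the above concrete, here's a rough sketch of the decision I'm
picturing (a compilable toy, not a patch -- struct mirror_info and
all of its fields, head_position / pending_ios / channel, are
hypothetical stand-ins for whatever state raid1.c really keeps):

	/* Hypothetical per-mirror state. */
	struct mirror_info {
		unsigned long head_position;	/* sector the head was left at */
		int pending_ios;		/* requests queued to this disk */
		int channel;			/* controller channel it sits on */
	};

	#define OVERLOAD_THRESHOLD 32	/* made-up; the sort of number
					   that should come from a tunable */

	static unsigned long seek_dist(const struct mirror_info *m,
				       unsigned long sector)
	{
		return (sector > m->head_position)
			? sector - m->head_position
			: m->head_position - sector;
	}

	/*
	 * Pick a mirror to serve a read at 'sector'.  "Closest serves",
	 * with an escape hatch: if the closest disk is overloaded, move
	 * the read to another mirror, but only if that mirror's head is
	 * within max_seek sectors (otherwise the seek costs more than
	 * the queueing) and only if it doesn't share a channel with the
	 * disk we'd be relieving (a crude stand-in for the channel-
	 * congestion point above).
	 */
	static int pick_mirror(const struct mirror_info *m, int n,
			       unsigned long sector, unsigned long max_seek)
	{
		int i, best = 0;

		for (i = 1; i < n; i++)
			if (seek_dist(&m[i], sector) < seek_dist(&m[best], sector))
				best = i;

		if (m[best].pending_ios > OVERLOAD_THRESHOLD) {
			for (i = 0; i < n; i++) {
				if (i == best || m[i].channel == m[best].channel)
					continue;
				if (seek_dist(&m[i], sector) <= max_seek &&
				    m[i].pending_ios < m[best].pending_ios)
					return i;
			}
		}
		return best;
	}

Nothing here knows about the drive's own read-ahead yet; folding that
in would presumably mean relaxing the overload check when the read
looks like it continues a run the closest disk has already cached.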

Perhaps rather than "closest drive, but check for idles" we could do
something closer to "closest drive, but a maximum amount to a single
drive at a time", which is closer (at least in that enforced maximum)
to what the current raid1 balancing attempts.  The two may seem
similar, but I think the enforced maximum would be a little faster and
perhaps more efficient.  Best of all (imho), we can add a sysctl /proc
entry so this maximum can be changed on the fly quite easily, making
for simple bash for loops of one echo and one tiobuild.pl call to find
a configuration's optimal enforced maximum single-drive block.
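
In code, the enforced maximum might look something like this (reusing
seek_dist() and mirror_info from the sketch above, with a hypothetical
run_sectors counter added to each mirror; max_run_sectors is the value
I'd export through the /proc entry):

	static int pick_mirror_capped(struct mirror_info *m, int n,
				      unsigned long sector,
				      unsigned long nr_sectors,
				      unsigned long max_run_sectors)
	{
		int i, best = 0, alt = -1;

		/* Closest serves, as before. */
		for (i = 1; i < n; i++)
			if (seek_dist(&m[i], sector) < seek_dist(&m[best], sector))
				best = i;

		/*
		 * But no disk serves more than max_run_sectors in a row:
		 * once the closest disk hits its quota, hand the read to
		 * the next-closest mirror and reset the run.
		 */
		if (m[best].run_sectors + nr_sectors > max_run_sectors) {
			for (i = 0; i < n; i++)
				if (i != best &&
				    (alt < 0 || seek_dist(&m[i], sector) <
						seek_dist(&m[alt], sector)))
					alt = i;
			if (alt >= 0) {
				m[best].run_sectors = 0;
				m[alt].run_sectors += nr_sectors;
				return alt;
			}
		}
		m[best].run_sectors += nr_sectors;
		return best;
	}

With that, tuning really is just a loop of echoing a candidate value
into the (hypothetical) /proc entry and running tiobuild.pl, once per
candidate maximum.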

Perhaps studies have shown that an optimal raid1 balancing exists
statically over a large chunk (3*sigma would be fine to me :) of
configurations, and given enough disassembly patience I could find it
somewhere deep in the firmware of a Mylex DAC1164 (no idea :), so I'll
toss this to the linux-raid list to get an idea of whether I'm totally
off base here.

Thanks,

James
-- 
Miscellaneous Engineer --- IBM Netfinity Performance Development
