Rich Freeman posted on Sun, 04 Oct 2015 08:21:53 -0400 as excerpted:

> On Sun, Oct 4, 2015 at 8:03 AM, Lionel Bouton
> <lionel-subscript...@bouton.name> wrote:
>>
>> This focus on single reader RAID1 performance surprises me.
>>
>> 1/ AFAIK the kernel md RAID1 code behaves the same (last time I checked
>> you need 2 processes to read from 2 devices at once) and I've never
>> seen anyone arguing that the current md code is unstable.

I'm not a coder and could be wrong, but AFAIK, md/raid1 balances reads 
either per thread (so even a single multi-threaded process should 
multiplex I/O across the raid1 devices), or across multiple outstanding 
AIO requests in parallel, if not both.

(If I'm laboring under a severe misconception, and I could be, please do 
correct me -- I'll rather be publicly corrected and have just that, my 
world-view corrected to align with reality, than be wrong, publicly or 
privately, and never know it, thus never correcting it!  =:^)

IOW, the primary case where I believe md/raid1 does single-device serial 
access is where the single process is doing just that: issuing one 
request at a time and sleeping until the data's ready.  Otherwise, read 
requests are spread among the available spindles.  =:^)
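To make that concrete, here's a toy sketch of load-aware read balancing in the spirit of md/raid1's read_balance logic, greatly simplified: send each read to the mirror with the fewest requests in flight, rather than keying on the caller's PID. This is a hypothetical model for illustration, not md's actual code.

```python
# Toy model: route each read to the least-loaded mirror, independent of
# which PID submitted it. NOT the real md/raid1 code, just the idea.

in_flight = [0, 0]  # outstanding reads per mirror

def submit_read(in_flight):
    # pick the mirror with the fewest in-flight requests
    mirror = min(range(len(in_flight)), key=lambda i: in_flight[i])
    in_flight[mirror] += 1
    return mirror

# 1000 concurrent reads interleave across both mirrors, whatever the PIDs:
picks = [submit_read(in_flight) for _ in range(1000)]
assert in_flight == [500, 500]  # both spindles stay busy
```

With any load-aware (or even round-robin) policy, no accident of PID assignment can idle one mirror while the other saturates.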

But...

> Perhaps, but with btrfs it wouldn't be hard to get 1000 processes
> reading from a raid1 in btrfs and have every single request directed to
> the same disk with the other disk remaining completely idle.  I believe
> the algorithm is just whether the pid is even or odd, and doesn't take
> into account disk activity at all, let alone disk performance or
> anything more sophisticated than that.
> 
> I'm sure md does a better job than that.

Exactly.  Even/odd PID scheduling is great for testing, since it's simple 
enough to load either side exclusively or both sides exactly evenly, but 
it's absolutely horrible for multi-task workloads, since the worst-case 
single-device bottleneck is all too easy to hit by accident, and even a 
purely random PID distribution is going to favor one side or the other to 
some extent, most of the time.
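The failure mode is easy to demonstrate.  Below is a toy model of the even/odd-PID mirror selection described above (a simulation of the behavior, not the actual btrfs kernel code): if the reading processes happen to share PID parity, every read lands on one mirror.

```python
# Toy model of btrfs raid1 read selection as described: the reader's
# PID parity picks the mirror, with no regard for device load.
# A sketch of the described behavior, not the kernel source.

def pick_mirror(pid, num_mirrors=2):
    # even PID -> mirror 0, odd PID -> mirror 1
    return pid % num_mirrors

# 1000 readers that happen to all have even PIDs (easy to get by
# accident from fork patterns) land on mirror 0, every single one:
reads = [pick_mirror(pid) for pid in range(1000, 3000, 2)]
assert all(m == 0 for m in reads)  # mirror 1 remains completely idle
```

Nothing in the policy looks at queue depth, head position, or device speed, so the degenerate case isn't even unlikely.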

Even worse, btrfs allocates each new chunk on the devices with the most 
remaining free space, and raid1 mirrors in pairs only, no matter the 
number of devices.  Try to use 3+ devices of differing sizes, and until 
the free space on the largest pair drops to that of the others, that 
largest pair gets every allocation.  Consider a bunch of quarter-TiB 
devices in raid1, with a pair of 2 TiB devices as well.  The quarter-TiB 
devices will remain idle until the 2 TiB pair reaches 1.75 TiB full, 
equalizing its free space with that of the other devices on the 
filesystem.  Of course, that means reads, too, are tied to only those 
two devices for anything in that first 1.75 TiB of data, and if all 
those reads come from even (or all from odd) PIDs, they hit only ONE 
of... perhaps 10 devices!  Possibly hundreds of read threads 
bottlenecking on a single device of ten, while the other 9/10 of the 
filesystem-array remains entirely idle! =:^(
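The allocation arithmetic above can be checked with a toy model of the most-free-space chunk allocator (sizes in GiB, 1 GiB chunks; a sketch of the described policy, not the actual kernel allocator):

```python
# Toy model: each raid1 chunk goes to the two devices with the most
# unallocated space. Sketch of the described policy, not kernel code.

def allocate_chunk(free):
    # pick the two devices with the most free space (ties by index)
    a, b = sorted(range(len(free)), key=lambda i: -free[i])[:2]
    free[a] -= 1
    free[b] -= 1
    return a, b

free = [256] * 8 + [2048, 2048]  # eight 0.25 TiB devices + a 2 TiB pair
pairs = [allocate_chunk(free) for _ in range(1792)]  # first 1.75 TiB

# every one of those 1792 chunks lands on the big pair (devices 8 and 9)
assert all(p == (8, 9) for p in pairs)
assert free[8] == free[9] == 256  # only now equal to the small devices
```

Only once the big pair's free space matches the quarter-TiB devices do the other eight see any allocations at all.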

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html