Um. Derp. Yeah, it's actually sd[defh].

Thanks for the continuing education.
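
For anyone else following along: the even/odd read allocation Duncan describes boils down to picking a mirror from the low bit of the PID. A minimal sketch (the function name and signature here are my own illustration, not the actual kernel symbols):

```c
#include <assert.h>
#include <sys/types.h>

/* Illustrative sketch of btrfs raid1/raid10 read-mirror selection:
 * the copy to read is derived from the requesting task's PID, so a
 * single process always hits the same mirror of each stripe.
 * pick_mirror() is a hypothetical name, not the real kernel code. */
static int pick_mirror(pid_t pid, int num_mirrors)
{
    /* num_mirrors is 2 for current btrfs raid1/raid10 */
    return pid % num_mirrors;
}
```

With this scheme, two readers with same-parity PIDs pile onto the same mirrors, which is exactly the "relatively dumb" behavior showing up in the iostat output below.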

On Fri, Aug 22, 2014 at 8:24 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> G. Richard Bellamy posted on Fri, 22 Aug 2014 14:36:22 -0700 as excerpted:
>
>> An interesting exercise saw me reading data from my RAID10 to a USB
>> device, which produced the following representative iostat:
>>
>> Linux 3.14.17-1-lts (eanna) 08/22/2014 _x86_64_ (24 CPU)
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            3.53    0.00    0.50    2.83    0.00   93.14
>>
>> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
>> sda               1.89         0.01         0.01        839        998
>> sdc               0.00         0.00         0.00          1          0
>> sdb               1.23         0.02         0.01       1254        998
>
>> sdi             175.40         0.00        20.26         39    1454881
>
>> sdd               0.26         0.01         0.00        827         58
>> sde              28.86        12.29         0.00     882447         61
>> sdf               0.00         0.00         0.00          1          0
>> sdh              25.25        12.29         0.00     882448         57
>> sdg               0.25         0.01         0.00        826         60
>>
>> /dev/sdi is the USB drive, and /dev/sd[defg] are the four devices in the
>> raid10 volume. I'm reading a large (1.1T) file from the raid10 volume
>> and writing it to the USB drive.
>>
>> You can see that there are approximately two drives from the raid10
>> which are being read from - I assume this corresponds to the two spans
>> (the 'no lower than the (n/spans)x' speed I mentioned in my original
>> post - and that they aggregate to 24.58MB/s reads. This corresponds to
>> the 20.26MB/s writes to the USB drive.
>>
>> The raid10 volume is only being used for this file operation, nothing
>> else is touching it but the kernel and btrfs.
>>
>> I'm curious how others would read this?
>
> Something's not adding up.  You say sd[defg] are the btrfs raid10, but
> it's sde and sdh that are getting the read traffic.  Are you sure sdh
> isn't part of the raid10, with one of sd[dfg] outside it instead
> (perhaps f, since d and g appear to balance out, leaving f the odd one
> out)?
>
> Assuming sdh is indeed part of the raid10, it makes sense, and the fact
> that only two of the four devices are being actively read matches what's
> known about btrfs raid1/10 at this point -- it has a relatively dumb
> read allocation algorithm that was good enough for a first
> implementation but obviously isn't optimal: reads are allocated based on
> the last bit of the PID (or TID, I don't recall which), so even/odd.
> Since this is a single transfer process, all the activity lands on one
> parity or the other, so it's reading across the two-device-wide stripe,
> but always from the same one of the two mirrors backing each stripe.
>
> If you had a second read process going with the same even/odd pid, you'd
> be doubling up on the same two devices.  Only with a relatively even mix
> of even and odd pid readers will you see things even out across all four
> devices.  See what I mean about a "relatively dumb", not-well-optimized
> first implementation?
>
> As they say, btrfs is stabilizing now, so presumably one of these kernel
> cycles we'll see a better read-mirror allocation algorithm, perhaps as
> part of N-way-mirroring when that gets implemented (roadmapped for after
> raid5/6 is completed; it's two-way-mirroring only now, regardless of the
> number of devices).
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html