Um. Derp. Yeah, it's actually sd[defh]. Thanks for the continuing education.
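For anyone else following along: the even/odd read selection Duncan describes below is simple enough to sketch. This is just an illustration of the idea, not the actual btrfs code (which IIRC lives in fs/btrfs/volumes.c), and pick_mirror/num_mirrors are made-up names:

#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch of btrfs's first-cut raid1/raid10 read allocation as Duncan
 * describes it: the mirror is chosen from the low bit of the reader's
 * PID.  Illustrative only -- function and parameter names are invented.
 */
static int pick_mirror(int num_mirrors)
{
	/*
	 * With num_mirrors == 2 this is just PID parity: an even-PID
	 * process always reads copy 0, an odd-PID process always reads
	 * copy 1.  A single reader never spreads across both copies.
	 */
	return (int)(getpid() % num_mirrors);
}

That would explain why a lone cp sees reads on only one mirror of each stripe (sde and sdh here): every request from the same process resolves to the same copy, and you'd need a mix of even- and odd-PID readers to light up all four devices.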
On Fri, Aug 22, 2014 at 8:24 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> G. Richard Bellamy posted on Fri, 22 Aug 2014 14:36:22 -0700 as excerpted:
>
>> An interesting exercise saw me reading data from my RAID10 to a USB
>> device, which produced the following representative iostat:
>>
>> Linux 3.14.17-1-lts (eanna)    08/22/2014    _x86_64_    (24 CPU)
>>
>> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>>            3.53   0.00     0.50     2.83    0.00  93.14
>>
>> Device:       tps  MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
>> sda          1.89       0.01       0.01      839      998
>> sdc          0.00       0.00       0.00        1        0
>> sdb          1.23       0.02       0.01     1254      998
>> sdi        175.40       0.00      20.26       39  1454881
>> sdd          0.26       0.01       0.00      827       58
>> sde         28.86      12.29       0.00   882447       61
>> sdf          0.00       0.00       0.00        1        0
>> sdh         25.25      12.29       0.00   882448       57
>> sdg          0.25       0.01       0.00      826       60
>>
>> /dev/sdi is the USB drive, and /dev/sd[defg] are the four devices in the
>> raid10 volume. I'm reading a large (1.1T) file from the raid10 volume
>> and writing it to the USB drive.
>>
>> You can see that there are approximately two drives from the raid10
>> which are being read from - I assume this corresponds to the two spans
>> (the 'no lower than the (n/spans)x' speed I mentioned in my original
>> post) - and that they aggregate to 24.58MB/s reads. This corresponds to
>> the 20.26MB/s writes to the USB drive.
>>
>> The raid10 volume is only being used for this file operation, nothing
>> else is touching it but the kernel and btrfs.
>>
>> I'm curious how others would read this?
>
> Something's not adding up. You say sd[defg] are the btrfs raid10, but
> it's sde and sdh that are getting the read traffic. Are you sure sdh
> isn't part of the raid10 and one of sd[dfg] (perhaps f, seeing d and g
> appear to balance out, leaving f the odd one out?) is?
>
> Assuming sdh is indeed part of the raid10, it makes sense, and the fact
> that only two of the four devices are being actively read matches what's
> known about btrfs raid1/10 at this point -- it has a relatively dumb read
> allocation algorithm that was good enough for a first implementation but
> obviously isn't optimal: reads are allocated based on the last bit of the
> PID (or TID, I don't recall which), so even/odd. Since this is a single
> transfer process, all the activity is on one or the other, so it's
> reading from the two-device-wide stripe, but always from the same one of
> the two mirrors supporting each stripe.
>
> If you had a second read process going on and it was the same even/odd
> pid, you'd be doubling up on the same two devices. Only with a
> relatively even mix of even/odd pid reads will you see things even out
> across all four. See what I mean about a "relatively dumb", not well
> optimized first implementation?
>
> As they say, btrfs is stabilizing now, so presumably one of these kernel
> cycles we'll see something better in terms of a read mirror allocation
> algorithm, perhaps as part of N-way-mirroring, when that gets implemented
> (roadmapped for after raid5/6 is completed; it's two-way-mirroring only
> now, regardless of the number of devices).
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."
> Richard Stallman
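To make Duncan's "second reader" point concrete, here's a toy driver for the pick_mirror() sketch above -- fork a child, and since consecutive PIDs usually differ in the low bit, the two processes (typically) end up pinned to different copies. Again, purely illustrative; none of this is real btrfs code:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* pick_mirror() as sketched earlier in this mail. */
static int pick_mirror(int num_mirrors)
{
	return (int)(getpid() % num_mirrors);
}

int main(void)
{
	if (fork() == 0) {
		/* Child PID is often parent PID + 1, flipping the low bit,
		 * so the two readers usually land on different mirrors. */
		printf("child  pid=%d reads mirror %d\n",
		       (int)getpid(), pick_mirror(2));
		return 0;
	}
	printf("parent pid=%d reads mirror %d\n",
	       (int)getpid(), pick_mirror(2));
	wait(NULL);
	return 0;
}

If both readers happened to share PID parity, they'd pile onto the same two devices instead -- which is exactly the "doubling up" case Duncan describes.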