Hugo Mills posted on Mon, 26 Oct 2015 09:24:57 +0000 as excerpted:

> On Mon, Oct 26, 2015 at 09:14:00AM +0000, Duncan wrote:
>> Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:
>> 
>>> I think the PID-based solution is not the best one. Why not simply pick
>>> a random device? Then at least all drives in the volume are equally
>>> loaded (on average).
>> 
>> Nobody argues that the even/odd-PID-based read-scheduling solution is
>> /optimal/, in a production sense at least.  But [it's near ideal for
>> testing, and "good enough" for the most general case].
> 
> For what it's worth, David tried implementing round-robin (IIRC)
> some time ago, and found that it performed *worse* than the pid-based
> system. (It may have been random, but memory says it was round-robin).
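
To make the three policies in the quoted exchange concrete, here is a minimal sketch of each as a mirror-picking function. It assumes a mirror is just an index from 0 to num_mirrors - 1, and the function names are hypothetical, made up for illustration rather than taken from btrfs. The PID variant models the "even/odd PID" idea (for two copies, pid % 2 is the parity).

```python
import random
from itertools import count

# Hypothetical helpers, not btrfs code: each returns the index of the
# mirror (copy) that should service the next read.

def pick_by_pid(pid: int, num_mirrors: int) -> int:
    """PID-based: the reading process's PID picks the mirror.
    With two copies this is simply even PIDs -> 0, odd PIDs -> 1."""
    return pid % num_mirrors

def make_round_robin(num_mirrors: int):
    """Round-robin: successive reads cycle through the mirrors in order."""
    counter = count()
    def pick() -> int:
        return next(counter) % num_mirrors
    return pick

def pick_random(num_mirrors: int, rng: random.Random) -> int:
    """Random: each read goes to a uniformly chosen mirror."""
    return rng.randrange(num_mirrors)
```

Under the PID policy a single-threaded reader always lands on the same mirror, so one stream never spreads across disks, while round-robin alternates every read. One possible reason round-robin measured worse (an assumption, not something established in this thread) is that splitting a single sequential stream across spindles makes each disk seek more than it would if it kept serving its own stream.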

What I'd like to know is what mdraid1 uses, and whether btrfs could adopt
it.  Several upgrades ago, after trying mdraid6 for the main system and
mdraid0 for some parts (with mdraid1 for boot, since grub1 could deal
with it but not with the others), I eventually settled on 4-way mdraid1
for everything, using the same disks I had used for the raid6 and raid0.

And I was rather blown away by the mdraid1 speed in comparison,
especially compared to raid0, which I had expected to beat raid1.  I
guess my use-case is multi-threaded and read-heavy enough that, whatever
mdraid1 uses, I was getting up to four separate reads (one per spindle)
going at once.  Writes still happened at single-spindle speed: with SATA
(as opposed to the older IDE; this was when SATA was still new), each
spindle had its own channel, so all four could write in parallel, with
the bottleneck being how fast the slowest of the four completed its
write.  So writes were single-spindle speed, still far faster than the
raid6 read-modify-write cycle, while reads... it really did appear to
multitask one per spindle.
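
The behavior described above can be modeled roughly as follows. This is a simplified sketch, assuming the scheduler sends each new concurrent read to whichever mirror has the fewest reads outstanding; md's real policy has more cases than this. The Mirror class and both function names are hypothetical. It also shows why writes run at single-spindle speed: a raid1 write must land on every mirror, so it finishes only when the slowest disk does.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Mirror:
    """One spindle in a hypothetical N-way raid1 set."""
    name: str
    pending: int = 0  # reads currently outstanding on this spindle

def dispatch_reads(mirrors: list[Mirror], num_reads: int) -> list[str]:
    """Send each new concurrent read to the mirror with the fewest
    pending reads (ties go to the first such mirror in the list)."""
    placement = []
    for _ in range(num_reads):
        target = min(mirrors, key=lambda m: m.pending)
        target.pending += 1
        placement.append(target.name)
    return placement

def write_latency_ms(per_disk_ms: list[float]) -> float:
    """A raid1 write is complete only once every mirror has written it,
    so the slowest disk determines the latency."""
    return max(per_disk_ms)
```

With four idle mirrors, four concurrent reads land on four different spindles, and a fifth queues behind the first. A write with per-disk times of 8.0, 9.5, 8.2 and 10.1 ms completes at 10.1 ms.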

Also, mdraid1 may actually have taken spindle head location into account
as well, scheduling each read to the spindle whose head was already
positioned closest to the target, tho I'm not sure on that.
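
As far as I know, md's raid1 read_balance() does track a per-mirror head position for rotational disks and prefers the mirror with the shortest seek distance. The sketch below is a heavily simplified model of that idea only, not md's code. The Spindle class and pick_nearest() are hypothetical, and it assumes that a disk's head simply ends up at the last sector it read.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Spindle:
    """A mirror whose head position we track (hypothetical model)."""
    name: str
    head_sector: int  # assumed: last sector this disk serviced

def pick_nearest(spindles: list[Spindle], target_sector: int) -> Spindle:
    """Choose the mirror whose head is already closest to the target
    sector, i.e. the one needing the shortest seek."""
    best = min(spindles, key=lambda s: abs(s.head_sector - target_sector))
    best.head_sector = target_sector  # assume the head stays where it just read
    return best
```

With heads at sectors 0, 500,000 and 1,000,000, a read at sector 480,000 goes to the middle disk, and a following read at sector 10 goes to the first, so each disk tends to keep serving the region it is already near.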

But whatever mdraid1's scheduling does, I was totally astonished at how
efficient it was, and it really did turn my thinking on the most
efficient raid choices upside down.  So if btrfs could simply take that
scheduler and modify it as necessary for btrfs specifics, provided the
modifications weren't /too/ heavy, I really do think that'd be the
ideal.  (Btrfs does read-time checksum verification, though, so even the
most direct adaptation of the algorithm possible might not reach
anything like the same efficiency.)  And of course it's freedomware code
in the same kernel, so reusing the mdraid read-scheduler shouldn't be
the problem it might be in other circumstances, tho the caveat of
btrfs-specific implementation issues does remain.

And of course someone would have to take the time to adapt it to work
with btrfs, which gets us back onto the practical side of things: the
"opportunity rich, developer-time poor" situation that is btrfs coding
reality, premature optimization, possibly doing it at the same time as
N-way mirroring, etc.

But anyway, mdraid's raid1 read-scheduler really does seem to be 
impressively efficient, the benchmark to try to match, if possible.  If 
that can be done by reusing some of the same code, so much the better. 
=:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html