On Wed, 20 Jun 2007 10:35:36 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-06-19 at 21:44 -0700, Andrew Morton wrote:
>
> > Anyway, this is all arse-about. What is the design? What algorithms
> > do we need to implement to do this successfully? Answer me that, then
> > we can decide upon these implementation details.
>
> Building on the per BDI patches, how about integrating feedback from the
> full-ness of device queues. That is, when we are happily doing IO and we
> cannot possibly saturate the active devices (as measured by their queue
> never reaching 75%?) then we can safely increase the total dirty limit.
>
> OTOH, when even with the per BDI dirty limit the device queue is
> constantly saturated (contended) we ought to lower the total dirty
> limit.
>
> Lots of detail here to work out, but does this sound workable?
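
For concreteness, the feedback loop proposed in the quoted text might look
roughly like the sketch below. This is plain userspace C, not kernel code,
and it is not taken from any posted patch: struct queue_sample,
adjust_dirty_limit(), the 10% step and the min/max clamps are all
hypothetical. Only the 75% figure comes from the proposal itself.

```c
#include <assert.h>
#include <stddef.h>

/* Only the 75% figure is from the thread; the rest are assumed values. */
#define QUEUE_BUSY_PCT   75UL               /* "never reaching 75%?" */
#define LIMIT_STEP_PCT   10UL               /* assumed adjustment step */
#define LIMIT_MIN_PAGES  1024UL             /* assumed lower clamp */
#define LIMIT_MAX_PAGES  (1024UL * 1024UL)  /* assumed upper clamp */

/* One observation of a device queue (hypothetical type). */
struct queue_sample {
	unsigned int in_flight;   /* requests queued at sample time */
	unsigned int capacity;    /* maximum queue depth, must be > 0 */
};

/*
 * Given the samples taken over one observation window, return the new
 * total dirty limit in pages:
 *   - no sample reached 75% of capacity -> raise the limit
 *   - every sample was at capacity      -> lower the limit
 *   - anything in between               -> leave it alone
 */
unsigned long adjust_dirty_limit(unsigned long limit,
				 const struct queue_sample *s, size_t n)
{
	unsigned long step = limit / 100 * LIMIT_STEP_PCT;
	size_t i, busy = 0, full = 0;

	if (n == 0)
		return limit;

	for (i = 0; i < n; i++) {
		assert(s[i].capacity > 0);
		if ((unsigned long)s[i].in_flight * 100 >=
		    (unsigned long)s[i].capacity * QUEUE_BUSY_PCT)
			busy++;
		if (s[i].in_flight >= s[i].capacity)
			full++;
	}

	if (busy == 0)
		limit += step;          /* queues never got busy */
	else if (full == n)
		limit -= step;          /* queues constantly saturated */

	if (limit < LIMIT_MIN_PAGES)
		limit = LIMIT_MIN_PAGES;
	if (limit > LIMIT_MAX_PAGES)
		limit = LIMIT_MAX_PAGES;
	return limit;
}
```

Note that the sketch counts requests, not bytes, so a queue full of tiny
writes looks exactly as saturated as one full of large ones.
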

It's pretty easy to fill the queues - I'd expect that there are a lot of
not-very-heavy workloads which cause the kernel to shove a lot of little
writes into the queue when it visits the blockdev mapping: a shower of
inodes, directory entries, indirect blocks, etc. With very little dirty
memory associated with it.

But back away further.

What do we actually want the kernel to *do*? Stated in terms of "when the
dirty memory state is A, do B" and "when userspace does C, the kernel
should do D".

Top-level statement: "when userspace does anything, the kernel should not
suck" ;)

Some refinement is needed there.

I _think_ the problem is basically one of latency:

a) writes starving reads and

b) dirty memory causing page reclaim to stall and

c) inter-device contention on the global memory limits.

Hard.

If the device isn't doing anything else then we can shove data at it
freely. If reads (or synchronous writes) come in then perhaps the VM
should back off and permit dirty memory to go higher.

The anticipatory scheduler(s) are supposed to fix this. Perhaps our queues
are too long - if the VFS _does_ back off, it'll take some time for that to
have an effect.

Perhaps the fact that the queue size knows nothing about the _size_ of the
requests in the queue is a problem.

Back away even further here.

What user-visible problem(s) are we attempting to fix?