On Tue, 03 Apr 2007 19:46:04 +0900 Tomoki Sekiyama <[EMAIL PROTECTED]> wrote:
> This patchset is to avoid the problem that write(2) can be blocked for a > long time if a system has several disks with different speed and is > under heavy I/O pressure. > > -Description of the problem: > While Dirty+Writeback pages get more than 40%(`dirty_ratio') of memory, > generators of dirty pages are blocked in balance_dirty_pages() until > they start writeback of a specific number (`write_chunk', typically=1536) > of dirty pages on the disks they write to. > > Under this rule, if a process writes to the disk which has only a few > (less than 1536) dirty pages, that process will be blocked until > writeback of the other disks is completed and % of Dirty+Writeback goes > below 40%. > > Thus, if a slow device (such as a USB disk) has many dirty pages, the > processes which write small data to the other disks can be blocked for > quite a long time. > > -Solution: > This patch introduces high/low-watermark algorithm in > balance_dirty_pages() in order to throttle only the processes which > write to disks with heavy load. > > This patch adds `dirty_start_writeback_ratio' for the low-watermark, > and modifies get_dirty_limits() to calculate and return the writeback > starting level of dirty pages based on `dirty_start_writeback_ratio'. > > If % of Dirty+Writeback > `dirty_writeback_start_ratio', generators of > dirty pages start writeback of dirty pages by themselves. At that time, > these processes are not blocked in balance_dirty_pages(), but they may > be blocked if the write-requests-queue of the written disk is full > (that is, the length of the queue > `nr_requests'). By this behavior, > we can throttle only processes which write to the disks with heavy load, > and can allow processes to write to the other disks without blocking. > > If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages > are throttled as current Linux does, not to fill up memory with dirty > pages. Does this actually solve the problem? If the request queue is sufficiently large (relative to the various dirty-memory thresholds) then I'd expect that a heavy-writer will be able to very quickly take the total dirty+writeback memory up to the dirty_ratio (should be renamed throttle_threshold, but it's too late for that). I suspect the reason why this patch was successful in your testing was because dirty_start_writeback_ratio happens to exceed the size of the disk request queues, so the heavy writer is getting stuck on disk request queue exhaustion. But that won't work if we have a lot of processes writing to a lot of disks, and it won't work if the request queue size is large, or if the dirty-memory thresholds are small (relative to the request queue size). Do the patches still work after `echo 10000 > /sys/block/sda/queue/nr_requests'? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/