Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

Daniel Phillips Mon, 18 Jun 2001 06:58:54 -0700
On Sunday 17 June 2001 12:05, Mike Galbraith wrote:
> It _juuust_ so happens that I was tinkering... what do you think of
> something like the below?  (and boy do I ever wonder what a certain
> box doing slrn stuff thinks of it.. hint hint;)

It's too subtle for me ;-)  (Not shy about sying that because this part of 
the kernel is probably subtle for everyone.)

The question I'm tackling right now is how the system behaves when the load 
goes away, or doesn't get heavy.  Your patch doesn't measure the load 
directly - it may attempt to predict it as a function of memory pressure, but 
that's a little more loosely coupled than what I had in mind.

I'm now in the midst of hatching a patch. [1] The first thing I had to do is 
go explore the block driver code, yum yum.  I found that it already computes 
the statistic I'm interested in, namely queued_sectors, which is used to pace 
the IO on block devices.  It's a little crude - we really want this to be 
per-queue and have one queue per "spindle" - but even in its current form 
it's workable.

The idea is that when queued_sectors drops below some threshold we have 
'unused disk bandwidth' so it would be nice to do something useful with it:

  1) Do an early 'sync_old_buffers'
  2) Do some preemptive pageout

The benefit of (1) is that it lets disks go idle a few seconds earlier, and 
(2) should improve the system's latency in response to load surges.  There 
are drawbacks too, which have been pointed out to me privately, but they tend 
to be pretty minor, for example: on a flash disk you'd do a few extra writes 
and wear it out ever-so-slightly sooner.  All the same, such special devices 
can be dealt easily once we progress a little further in improving the 
kernel's 'per spindle' intelligence.

Now how to implement this.  I considered putting a (newly minted) 
wakeup_kflush in blk_finished_io, conditional on a loaded-to-unloaded 
transition, and that's fine except it doesn't do the whole job: we also need 
to have the early flush for any write to a disk file while the disks are 
lightly loaded, i.e., there is no convenient loaded-to-unloaded transition to 
trigger it.  The missing trigger could be inserted into __mark_dirty, but 
that would penalize the loaded state (a little, but that's still too much).  
Furthermore, it's probably desirable to maintain a small delay between the 
dirty and the flush.  So what I'll try first is just running kflush's timer 
faster, and make its reschedule period vary with disk load, i.e., when there 
are fewer queued_sectors, kflush looks at the dirty buffer list more often.

The rest of what has to happen in kflush is pretty straightforward.  It just 
uses queued_sectors to determine how far to walk the dirty buffer list, which 
is maintained in time-since-dirtied order.  If queued_sectors is below some 
threshold the entire list is flushed.  Note that we want to change the sense 
of b_flushtime to b_timedirtied.  It's more efficient to do it this way 
anyway.

I haven't done anything about preemptive pageout yet, but similar ideas apply.

[1] This is an experiment, do not worry, it will not show up in your tree any 
time soon.  IOW, constructive criticism appreciated, flames copied to 
/dev/null.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

Reply via email to