"Stephen C. Tweedie" wrote:


> > Let us define a buffer's state as FLUSHTIME_NON_EXPANDING if flushing it
> > requires no additional memory, and FLUSHTIME_EXPANDING otherwise.
>
> Yes, the XFS people were using a similar "reserved" flag to indicate
> whether a particular memory allocation was accounted against dirty
> reserved memory, and they want Linux to support an upper bound on such
> reserved memory so that they can guarantee memory deadlock freedom.
> ext3 would certainly be able to benefit from such a feature in the VM.
>
> > I see the following separate issues:
>
> > how to drive a kernel subsystem to flush some memory.  I advocate that
> > the vm system push, and the subsystems give it calls for doing the
> > pushing.
>
> Yep.  The main problem is a matter of "cost": it is more expensive to
> free up journaled buffer_heads than to free up unreferenced cache pages,
> for example.  The "priority" counters in the vm try_to_free_page loop
> are an obvious place to access such cost information, so that we don't
> try too hard to wait for journal commits if we aren't all that short of
> memory.
>
> > How to ensure that there is at least largest_reservation buffers of
> > FLUSHTIME_NON_EXPANDING memory at all times, where largest_reservation
> > is the sum of the amount every kernel subsystem says it might need at
> > maximum.
>
> Would it be better to place an upper limit on the EXPANDING memory
> instead?  That way, you wouldn't care where the extra memory required to
> flush those buffers comes from, and you'd leave the VM to free up
> whatever stuff in core was least useful at the time you need the
> memory.
>
> > There would be a reserve() and unreserve() for the kernel subsystems
> > to call.  I hypothesize that if largest_reservation is unnecessarily
> > large, so long as it is not completely obscene performance will not
> > suffer (and might gain), and the code simplicity/performance will be
> > improved as a result of using the maximum possible to need rather than
> > tracking the amount actually needed.
>
> The counter-argument is that if you have deferred allocation of written
> data, then the more reserved data you allow, the later you allow the
> allocation to take place and the more flexibility you have to let the
> filesystem choose appropriate allocations for data which is being
> written sequentially.  This is definitely something that XFS want.

I merely hypothesize that the maximum amount of FLUSHTIME_NON_EXPANDING memory
required will usually be less than 1% of memory, and therefore won't have an
impact.  Keeping 1% of memory around for use by text segments and other
FLUSHTIME_NON_EXPANDING buffers is unlikely to be a bad thing.
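
To make the accounting concrete, here is a rough user-space sketch, not kernel
code.  Only reserve(), unreserve(), largest_reservation and the two state names
come from this thread; struct vm_model, its page counters and
can_dirty_expanding() are hypothetical names invented for the illustration.  It
also assumes, as a simplification, that every page which is not
FLUSHTIME_EXPANDING counts as FLUSHTIME_NON_EXPANDING.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not an existing kernel interface. */
struct vm_model {
    size_t total_pages;          /* all pages the VM manages */
    size_t expanding_pages;      /* FLUSHTIME_EXPANDING pages in core */
    size_t largest_reservation;  /* sum of every subsystem's worst case */
};

/* A subsystem declares the most pages it might need in order to flush. */
void reserve(struct vm_model *vm, size_t max_pages)
{
    vm->largest_reservation += max_pages;
}

/* A subsystem withdraws a declaration it made earlier. */
void unreserve(struct vm_model *vm, size_t max_pages)
{
    assert(max_pages <= vm->largest_reservation);
    vm->largest_reservation -= max_pages;
}

/* Simplifying assumption: every page that is not EXPANDING is NON_EXPANDING. */
size_t non_expanding_pages(const struct vm_model *vm)
{
    return vm->total_pages - vm->expanding_pages;
}

/* Allow n more EXPANDING pages only if the reservation stays covered. */
bool can_dirty_expanding(const struct vm_model *vm, size_t n)
{
    size_t avail = non_expanding_pages(vm);

    return n <= avail && avail - n >= vm->largest_reservation;
}
```

With 1000 pages and two subsystems reserving 6 and 4 pages, this model lets up
to 990 pages become FLUSHTIME_EXPANDING and refuses the 991st, which is the 1%
case hypothesized above.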

>
>
> > For reiserfs, it would simplify our balancing code (fix_nodes() in
> > particular) and improve our performance if we could efficiently
> > reserve.  Roma, think about this.
>
> It should definitely be possible to establish a fairly clean common
> kernel API for this.  Doing so would have the extra advantage that if
> you had mixed ReiserFS and XFS partitions on the same machine, the
> VM's memory reservation would be able to cope cleanly with multiple
> users of reserved memory.
>
> --Stephen

OK, so we agree that we need it, and we are still refining the details.
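
One possible shape for the "vm pushes, subsystems supply the calls" idea quoted
above, with Stephen's cost concern folded in, is sketched below as plain
user-space C.  Every name here (flush_fn, register_flusher(), vm_push(), the
cost and pressure numbers, the two demo subsystems) is an assumption made for
illustration; none of it is an existing kernel API, and the rising pressure
counter only loosely imitates the priority loop in try_to_free_page.

```c
#include <stddef.h>

#define MAX_FLUSHERS 8

/* Callback a subsystem gives the VM; returns the pages actually freed. */
typedef size_t (*flush_fn)(size_t want_pages);

struct flusher {
    flush_fn flush;
    int cost;   /* 0 = cheap (clean cache), higher = dearer (journal commit) */
};

struct flusher_table {
    struct flusher entries[MAX_FLUSHERS];
    size_t count;
};

/* Returns 0 on success, -1 if the table is full. */
int register_flusher(struct flusher_table *t, flush_fn fn, int cost)
{
    if (t->count == MAX_FLUSHERS)
        return -1;
    t->entries[t->count].flush = fn;
    t->entries[t->count].cost = cost;
    t->count++;
    return 0;
}

/*
 * Push subsystems until want_pages are freed.  Pressure starts at 0 and
 * rises; a flusher is skipped until pressure reaches its cost, so dear
 * flushes run only when the cheap ones were not enough.
 */
size_t vm_push(struct flusher_table *t, size_t want_pages, int max_pressure)
{
    size_t freed = 0;

    for (int pressure = 0; pressure <= max_pressure; pressure++) {
        for (size_t i = 0; i < t->count && freed < want_pages; i++) {
            if (t->entries[i].cost <= pressure)
                freed += t->entries[i].flush(want_pages - freed);
        }
        if (freed >= want_pages)
            break;
    }
    return freed;
}

/* Demo subsystems (illustrative only): a cheap cache and a dear journal. */
size_t cache_pages = 3, journal_pages = 10, journal_commits = 0;

size_t flush_cache(size_t want)
{
    size_t n = want < cache_pages ? want : cache_pages;

    cache_pages -= n;
    return n;
}

size_t flush_journal(size_t want)
{
    size_t n = want < journal_pages ? want : journal_pages;

    journal_pages -= n;
    journal_commits++;   /* count how often the expensive path was taken */
    return n;
}
```

If the journal is registered with cost 2 and the cache with cost 0, a request
for 2 pages is satisfied from the cache alone and never forces a commit; only a
request the cache cannot cover raises the pressure far enough to reach the
journal.  Because the table holds any number of callbacks, mixed filesystems on
one machine would simply register one entry each.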

Hans

--
Get Linux (http://www.kernel.org) plus ReiserFS
 (http://devlinux.org/namesys).  If you sell an OS or
internet appliance, buy a port of ReiserFS!  If you
need customizations and industrial grade support, we sell them.

