On Wed, 1 Mar 2006, Rohit Jalan wrote:

My problem is that I need to enforce a single memory limit on the total number of pages used by multiple zones.

The limit changes dynamically based on the number of pages being used by other non-zone allocations and also on the amount of available swap and memory.

I've tried to do the same in various ways with the stock kernel but I was unsuccessful due to reasons detailed below. In the end I had to patch the UMA subsystem to achieve my goal.

Is there a better method of doing the same? Something that would not involve patching the kernel. Please advise.

Currently, UMA supports limits on allocation by keg, so if two zones don't share the same keg, they won't share the same limit. Supporting a limit shared across multiple kegs requires a change as things stand.

On the general topic of how to implement this -- I'm not sure what the best approach is. Your approach gives quite a bit of flexibility. I wonder, though, if it would be better to add an explicit accounting feature rather than a more flexible callback feature? I.e., have a notion of a UMA accounting group which can be shared by one or more kegs to impose a common limit across them?
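To make the accounting-group idea a bit more concrete, here is a minimal userspace sketch in C. Nothing in it exists in UMA: struct acct_group, struct grp_keg, and the charge/uncharge helpers are hypothetical names invented for illustration, and a real kernel version would also need locking or atomic updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: a page budget shared by several kegs. Not part of UMA. */
struct acct_group {
	size_t pages_used;	/* pages currently charged to the group */
	size_t pages_max;	/* shared limit; may be changed at run time */
};

/* Hypothetical stand-in for a keg that points at its accounting group. */
struct grp_keg {
	struct acct_group *group;
	size_t pages;		/* pages held by this keg alone */
};

/*
 * Charge npages to the keg's group. Refuses the request if it would push
 * the group past its limit, including when the limit has been lowered
 * below the current usage.
 */
static bool
grp_keg_charge(struct grp_keg *keg, size_t npages)
{
	struct acct_group *g = keg->group;

	if (npages > g->pages_max || g->pages_used > g->pages_max - npages)
		return (false);
	g->pages_used += npages;
	keg->pages += npages;
	return (true);
}

/* Return npages previously charged by this keg to the shared budget. */
static void
grp_keg_uncharge(struct grp_keg *keg, size_t npages)
{
	assert(keg->pages >= npages);
	keg->pages -= npages;
	keg->group->pages_used -= npages;
}
```

With two kegs attached to one group whose pages_max is 4, a charge of 3 pages on the first keg leaves room for only 1 more page on the second, which is the shared-limit behaviour described above.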

Something similar to this might also be useful in the mbuf allocator, where we currently have quite a few kegs and zones floating around, making implementing a common limit quite difficult.

Robert N M Watson


----------------------------------------------------------------------
TMPFS uses multiple UMA zones to store filesystem metadata.
These zones are allocated on a per mount basis for reasons described in
the documentation. Because of fragmentation that can occur in a zone due
to dynamic allocations and frees, the actual memory in use can be more
than the sum of the contained item sizes. This makes it difficult to
track and limit the space being used by a filesystem.
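As a rough illustration of the fragmentation problem, the following userspace C sketch counts how many pages stay held when only some items are live. It is a toy model, not UMA code: the 4096-byte page size, the absence of a per-slab header, and all frag_* names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define FRAG_PAGE_SIZE 4096u	/* assumed page size */

/* Items that fit in one page-sized slab, ignoring any per-slab header. */
static size_t
frag_items_per_slab(size_t item_size)
{
	assert(item_size > 0 && item_size <= FRAG_PAGE_SIZE);
	return (FRAG_PAGE_SIZE / item_size);
}

/*
 * Count pages still held. live[i] is nonzero if item slot i is in use;
 * a slab stays allocated as long as any slot in it is live.
 */
static size_t
frag_pages_held(const unsigned char *live, size_t nitems, size_t item_size)
{
	size_t ips = frag_items_per_slab(item_size);
	size_t pages = 0;

	for (size_t base = 0; base < nitems; base += ips) {
		for (size_t i = base; i < nitems && i < base + ips; i++) {
			if (live[i]) {
				pages++;
				break;
			}
		}
	}
	return (pages);
}

/* Bytes occupied by live items alone. */
static size_t
frag_bytes_live(const unsigned char *live, size_t nitems, size_t item_size)
{
	size_t n = 0;

	for (size_t i = 0; i < nitems; i++)
		n += live[i] ? 1 : 0;
	return (n * item_size);
}
```

For example, with 1024-byte items (four per slab) and two live items that sit in different slabs, the model holds two pages (8192 bytes) for only 2048 bytes of live data.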

Even though the zone API provides scope for custom item constructors
and destructors, the necessary information (number of pages used) is
stored inside a keg structure, which itself is part of the opaque
uma_zone_t object. One could include <vm/uma_int.h> and access
the keg information in the custom constructor, but it would require
messy code to calculate the change delta because one would have to
track the older value to see how many pages have been added or
subtracted.
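The delta bookkeeping described above might look roughly like the userspace C sketch below. It does not touch any real keg field: struct page_tracker and tracker_update() are hypothetical, and the current page count is simply passed in by the caller instead of being read out of uma_int.h structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-zone tracker that remembers the previous sample. */
struct page_tracker {
	size_t last_pages;	/* page count seen on the previous call */
	size_t *total;		/* running total shared by all zones */
};

/*
 * Called with the zone's current page count. Applies only the change
 * since the last call to the shared total, then remembers the new value.
 */
static void
tracker_update(struct page_tracker *t, size_t cur_pages)
{
	assert(t->total != NULL);
	if (cur_pages >= t->last_pages)
		*t->total += cur_pages - t->last_pages;
	else
		*t->total -= t->last_pages - cur_pages;
	t->last_pages = cur_pages;
}
```

Every zone needs its own tracker and every constructor and destructor call has to run the update, which is the extra state and code that makes this approach unattractive.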

The zone API also provides custom page allocation and free hooks.
These are ideal for my purpose as they allow me to control
page allocation and frees effectively. But the callback interface is
lacking: unlike the constructor and destructor, it does not allow one
to specify an argument, making it difficult to update custom
information from within the uma_free callback because it is passed
neither the zone pointer nor an argument.

Presently I have patched my private sources to modify the UMA API to
support passing an argument to the page allocation and free callbacks.
Unlike the constructor and destructor callback argument which is specified
on each call, the argument to uma_alloc or uma_free is specified
when setting the callback via uma_zone_set_allocf() or uma_zone_set_freef().
This argument is stored in the keg and passed to the callback whenever
it is called.
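The shape of that change can be sketched in userspace C as follows. This is not the patch itself and does not reproduce the real uma_alloc or uma_free signatures: struct cb_keg, the cb_* setters, and the counting callbacks are hypothetical, with malloc() standing in for the page allocator and a 4096-byte page size assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CB_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical hook types: page alloc/free callbacks plus a trailing arg. */
typedef void *(*cb_alloc_t)(size_t bytes, void *arg);
typedef void (*cb_free_t)(void *mem, size_t bytes, void *arg);

/* Toy keg: keeps each callback together with the argument set alongside it. */
struct cb_keg {
	cb_alloc_t allocf;
	void *alloc_arg;
	cb_free_t freef;
	void *free_arg;
};

/* The argument is supplied once, when the callback is installed. */
static void
cb_keg_set_allocf(struct cb_keg *k, cb_alloc_t f, void *arg)
{
	k->allocf = f;
	k->alloc_arg = arg;
}

static void
cb_keg_set_freef(struct cb_keg *k, cb_free_t f, void *arg)
{
	k->freef = f;
	k->free_arg = arg;
}

/* The keg hands the stored argument to the callback on every call. */
static void *
cb_keg_page_alloc(struct cb_keg *k, size_t bytes)
{
	assert(k->allocf != NULL);
	return (k->allocf(bytes, k->alloc_arg));
}

static void
cb_keg_page_free(struct cb_keg *k, void *mem, size_t bytes)
{
	assert(k->freef != NULL);
	k->freef(mem, bytes, k->free_arg);
}

/* Example callbacks: arg points at a page counter owned by the caller. */
static void *
counting_alloc(size_t bytes, void *arg)
{
	size_t *pages = arg;
	void *mem = malloc(bytes);

	if (mem != NULL)
		*pages += bytes / CB_PAGE_SIZE;
	return (mem);
}

static void
counting_free(void *mem, size_t bytes, void *arg)
{
	size_t *pages = arg;

	free(mem);
	*pages -= bytes / CB_PAGE_SIZE;
}
```

Installing counting_alloc and counting_free with a pointer to the same counter on several kegs gives all of them one shared page count, without the callbacks needing a zone pointer.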

The scheme implemented by my patch imposes an overhead of
passing an extra argument to the uma_alloc and uma_free callbacks.
The uma_keg structure size is also increased by (2 * sizeof(void*)).

My patch changes the present custom alloc and free callback routines
(e.g., page_alloc, page_free, etc.) to accept an extra argument
which is ignored.

The static page_alloc and page_free routines are made global and
are renamed to uma_page_alloc and uma_page_free respectively.
This is so that they may be called from other custom allocators,
as is the case with my code.

----------------------------------------------------------------------

Patches:
         http://download.purpe.com/files/TMPFS_FreeBSD_7-uma-1.dif
         http://download.purpe.com/files/TMPFS_FreeBSD_7-uma-2.dif

Regards,

rohit --



On Tue, Feb 28, 2006 at 10:04:41PM +0000, Robert Watson wrote:
On Mon, 27 Feb 2006, Rohit Jalan wrote:

Is there an upper limit on the amount of fragmentation / wastage that can
occur in a UMA zone?

Is there a method to know the total number of pages used by a UMA zone at
some instance of time?

Hey there Rohit,

UMA allocates pages retrieved from VM as "slabs".  Its behavior depends a
bit on how large the allocated object is, as it's a question of packing
objects into page-sized slabs for small objects, or packing objects into
sets of pages making up a slab for larger objects.  You can
programmatically access information on UMA using libmemstat(3), which
allows you to do things like query the current object cache size, total
lifetime allocations for the zone, allocation failure count, sizes of
per-cpu caches, etc.  You may want to take a glance at the source code for
vmstat -z and netstat -m for examples of it in use.  You'll notice, for
example, that netstat -m reports on both the mbufs in active use, and also
the memory allocated to mbufs in the percpu + zone caches, since that
memory is also (for the time being) committed to the mbuf allocator.  The
mbuf code is a little hard to follow because there are actually two zones
that allocate mbufs, the mbuf zone and the packet secondary zone, so let me
know if you have any questions.

If you want to dig down a bit more, uma_int.h includes the keg and zone
definitions, and you can extract information like the page maximum, the
number of items per page or pages per item, etc.  If there's useful
information that you need but isn't currently exposed by libmemstat, we can
add it easily enough.  You might also be interested in some of the tools at

    http://www.watson.org/~robert/freebsd/libmemstat/

These include memtop, which is basically an activity monitor for kernel memory
types.  As an FYI, kernel malloc is wrapped around UMA, so if you view both
malloc and UMA stats at once, there is double-counting.

Robert N M Watson
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
