Introduction:

The kernel's I/O buffers are currently allocated in buffer_map, which
is sized statically upon boot and never grows.  This limits the scale
of I/O performance on a host with large physical memory.  We used to
tune NBUF to cope with that problem.  This workaround, however, leaves
a lot of wired pages unavailable to user processes, which is not
acceptable for memory-bound applications.

In order to run both I/O-bound and memory-bound processes on the same
host, it is essential to achieve:

A) allocation of buffers from kernel_map, to break the limit imposed
   by the map size, and

B) page reclaim from idle buffers, to regulate the number of wired
   pages.

The patch at:

http://people.FreeBSD.org/~tanimura/patches/dynamicbuf.diff.gz

implements buffer allocation from kernel_map and reclaim of buffer
pages.  With this patch, make kernel-depend && make kernel completes
about 30-60 seconds faster on my PC.


Implementation in Detail:

A) is easy; first, do s/buffer_map/kernel_map/.  Since an arbitrary
number of buffer pages can then be allocated dynamically, buffer
headers (struct buf) should be allocated dynamically as well.  Glue
them together into a list so that they can be traversed by boot()
et al.
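
As a rough sketch (not the patch itself): buffer headers come from
malloc(9) and are glued onto a global tail queue.  The allbufs list,
the b_bufglue linkage, and allocbuf_header() are illustrative names
assumed here, not necessarily those used by the patch.

/*
 * Sketch: dynamically allocated buffer headers on a global list,
 * so that boot() et al. can still walk every buffer.  Assumes a
 * new TAILQ_ENTRY(buf) b_bufglue field added to struct buf.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <sys/queue.h>
#include <sys/buf.h>

static TAILQ_HEAD(, buf) allbufs = TAILQ_HEAD_INITIALIZER(allbufs);

static struct buf *
allocbuf_header(void)
{
	struct buf *bp;

	/* M_BIOBUF is the buffer cache's malloc type in vfs_bio.c. */
	bp = malloc(sizeof(*bp), M_BIOBUF, M_WAITOK | M_ZERO);
	TAILQ_INSERT_TAIL(&allbufs, bp, b_bufglue);
	return (bp);
}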

In order to accomplish B), we must find buffers that neither the
filesystem nor the I/O code will touch.  The clean buffer queue holds
such buffers.  (Exception: if the vnode associated with a clean buffer
is held by the namecache, it may access the buffer page.)  Thus, we
should unwire the pages of a buffer prior to enqueuing it to the clean
queue, and wire the pages back down in bremfree() if they have not
been reclaimed.
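
The pairing might look roughly like this (a sketch; both helpers are
illustrative names, and the VM locking of the day is omitted):

#include <sys/param.h>
#include <sys/buf.h>
#include <vm/vm.h>
#include <vm/vm_page.h>

/* Sketch: unwire a buffer's pages as it enters the clean queue. */
static void
buf_unwire_pages(struct buf *bp)
{
	int i;

	for (i = 0; i < bp->b_npages; i++)
		vm_page_unwire(bp->b_pages[i], 0);	/* 0: inactive queue */
}

/* Sketch: wire the pages back down; called from bremfree() if the
 * pages have not been reclaimed in the meantime. */
static void
buf_rewire_pages(struct buf *bp)
{
	int i;

	for (i = 0; i < bp->b_npages; i++)
		vm_page_wire(bp->b_pages[i]);
}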

Although unwiring gives a page a chance of being reclaimed, we can go
further.  On Solaris, it is known that file cache pages should be
reclaimed prior to the other kinds of pages (anonymous, executable,
etc.) for better performance.  Mainly due to a lack of time to work on
distinguishing the kind of a page being unwired, I simply pass all
unwired pages to vm_page_dontneed().  This approach places most of the
unwired buffer pages just one step away from the cache queue.
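
Building on the sketch above, the unwire path then hints every page
(again, the helper name is illustrative):

/*
 * Sketch: unwire and immediately hint each page.
 * vm_page_dontneed() deactivates the page aggressively, leaving
 * it one step away from the cache queue.
 */
static void
buf_unwire_dontneed(struct buf *bp)
{
	int i;

	for (i = 0; i < bp->b_npages; i++) {
		vm_page_unwire(bp->b_pages[i], 0);
		vm_page_dontneed(bp->b_pages[i]);
	}
}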


Experimental Evaluation and Results:

The time taken to complete make kernel-depend && make kernel just
after booting into single-user mode has been measured on my ThinkPad
600E (CPU: Pentium II 366MHz, RAM: 160MB) by time(1).  The number
passed to the -j option of make(1) has been varied from 1 to 30 in
order to control the memory pressure from user processes.  The
baseline is the kernel without my patch.

The following table shows the results.  All of the times are in
seconds.

-j      baseline                w/ my patch
        real    user    sys     real    user    sys
1       1608.21 1387.94 125.96  1577.88 1391.02 100.90
10      1576.10 1360.17 132.76  1531.79 1347.30 103.60
20      1568.01 1280.89 133.22  1509.36 1276.75 104.69
30      1923.42 1215.00 155.50  1865.13 1219.07 113.43

Most of the improvement in the real times comes from the speedup of
system calls.  The hit ratio of getblk() may have increased, but this
has not been examined yet.

Another interesting result is the number of swaps, shown below.

-j      baseline                w/ my patch
1       0                       0
10      0                       0
20      141                     77
30      530                     465

Since the baseline kernel does not free buffer pages at all(*), it
may be putting too much pressure on memory.

(*) bfreekva() is called only when the whole KVA is too fragmented.


Userland Interfaces:

The sysctl variable vfs.bufspace now reports the size of the pages
allocated for buffers, both wired and unwired.  A new sysctl variable,
vfs.bufwiredspace, tells the size of the buffer pages wired down.

vfs.bufkvaspace returns the size of the KVA space for buffers.
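
These counters can be read from userland with sysctlbyname(3).  A
minimal sketch, which assumes each counter is exported as either a
long or an int:

/* Read the buffer-space counters exported by the patch. */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

static void
print_counter(const char *name)
{
	long lval;
	int ival;
	size_t len;

	/* The counter may be exported as a long or an int. */
	len = sizeof(lval);
	if (sysctlbyname(name, &lval, &len, NULL, 0) == 0 &&
	    len == sizeof(lval)) {
		printf("%s: %ld\n", name, lval);
		return;
	}
	len = sizeof(ival);
	if (sysctlbyname(name, &ival, &len, NULL, 0) == 0)
		printf("%s: %d\n", name, ival);
	else
		perror(name);
}

int
main(void)
{
	print_counter("vfs.bufspace");		/* wired + unwired */
	print_counter("vfs.bufwiredspace");	/* wired only */
	print_counter("vfs.bufkvaspace");	/* KVA for buffers */
	return (0);
}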


Future Work:

The handling of unwired pages can be improved by scanning only buffer
pages.  In that case, we may have to run the vm page scanner more
frequently, as does Solaris.

vfs.bufspace does not track the buffer pages reclaimed by the page
scanner.  They are counted only when the buffers associated with those
pages are removed from the clean queue, which is too late.

Benchmark tools concentrating on disk I/O performance (bonnie,
iozone, postmark, etc.) may be more suitable than make kernel for
evaluation.


Comments and flames are welcome.  Thanks a lot.

-- 
Seigo Tanimura <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
