On 04/06/2012 03:38, Konstantin Belousov wrote:
On Thu, Apr 05, 2012 at 01:25:49PM -0500, Alan Cox wrote:
On 04/05/2012 12:31, Konstantin Belousov wrote:
On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote:
On 04/04/2012 02:17, Konstantin Belousov wrote:
On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
Hi,

I open the file, then call mmap() on the whole file and get a pointer,
then I work with this pointer.  I expect that each page should have to be
touched only once to get it into memory (disk cache?), but this doesn't work!

I wrote the test (attached) and ran it against a 1G file generated from
/dev/random; the procedure and results follow.
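
(A minimal sketch of such a test, assuming it touches one byte per page on
each pass and then classifies the pages with mincore(2); the actual
attached program may differ.  It is run as ./mmap <file> <passes>, as in
the transcripts below.)

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        struct stat st;
        struct timespec ts, te;
        double elapsed;
        char *map, *vec;
        size_t i, npages, pagesize;
        int fd, none, other, pass, passes, res, super;
        volatile char c;

        if (argc != 3)
                errx(1, "usage: %s file passes", argv[0]);
        passes = atoi(argv[2]);
        if ((fd = open(argv[1], O_RDONLY)) == -1)
                err(1, "open");
        if (fstat(fd, &st) == -1)
                err(1, "fstat");
        map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED)
                err(1, "mmap");
        pagesize = getpagesize();
        npages = (st.st_size + pagesize - 1) / pagesize;
        if ((vec = malloc(npages)) == NULL)
                err(1, "malloc");

        for (pass = 1; pass <= passes; pass++) {
                clock_gettime(CLOCK_MONOTONIC, &ts);
                /* Touch one byte in every page of the mapping. */
                for (i = 0; i < npages; i++)
                        c = map[i * pagesize];
                clock_gettime(CLOCK_MONOTONIC, &te);
                elapsed = (te.tv_sec - ts.tv_sec) +
                    (te.tv_nsec - ts.tv_nsec) / 1e9;

                /* Classify pages: not resident, resident, superpage-mapped. */
                if (mincore(map, st.st_size, vec) == -1)
                        err(1, "mincore");
                none = res = super = other = 0;
                for (i = 0; i < npages; i++) {
                        if (vec[i] == 0)
                                none++;
                        else if (vec[i] & MINCORE_SUPER)
                                super++;
                        else if (vec[i] & MINCORE_INCORE)
                                res++;
                        else
                                other++;
                }
                printf("mmap: %2d pass took: %10.6f (none: %6d; res: %6d; "
                    "super: %6d; other: %6d)\n",
                    pass, elapsed, none, res, super, other);
        }
        return (0);
}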

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap:  1 pass took:   7.431046 (none: 262112; res:     32; super:      0; other:      0)
mmap:  2 pass took:   7.356670 (none: 261648; res:    496; super:      0; other:      0)
mmap:  3 pass took:   7.307094 (none: 260521; res:   1623; super:      0; other:      0)
mmap:  4 pass took:   7.350239 (none: 258904; res:   3240; super:      0; other:      0)
mmap:  5 pass took:   7.392480 (none: 257286; res:   4858; super:      0; other:      0)
mmap:  6 pass took:   7.292069 (none: 255584; res:   6560; super:      0; other:      0)
mmap:  7 pass took:   7.048980 (none: 251142; res:  11002; super:      0; other:      0)
mmap:  8 pass took:   6.899387 (none: 247584; res:  14560; super:      0; other:      0)
mmap:  9 pass took:   7.190579 (none: 242992; res:  19152; super:      0; other:      0)
mmap: 10 pass took:   6.915482 (none: 239308; res:  22836; super:      0; other:      0)
mmap: 11 pass took:   6.565909 (none: 232835; res:  29309; super:      0; other:      0)
mmap: 12 pass took:   6.423945 (none: 226160; res:  35984; super:      0; other:      0)
mmap: 13 pass took:   6.315385 (none: 208555; res:  53589; super:      0; other:      0)
mmap: 14 pass took:   6.760780 (none: 192805; res:  69339; super:      0; other:      0)
mmap: 15 pass took:   5.721513 (none: 174497; res:  87647; super:      0; other:      0)
mmap: 16 pass took:   5.004424 (none: 155938; res: 106206; super:      0; other:      0)
mmap: 17 pass took:   4.224926 (none: 135639; res: 126505; super:      0; other:      0)
mmap: 18 pass took:   3.749608 (none: 117952; res: 144192; super:      0; other:      0)
mmap: 19 pass took:   3.398084 (none:  99066; res: 163078; super:      0; other:      0)
mmap: 20 pass took:   3.029557 (none:  74994; res: 187150; super:      0; other:      0)
mmap: 21 pass took:   2.379430 (none:  55231; res: 206913; super:      0; other:      0)
mmap: 22 pass took:   2.046521 (none:  40786; res: 221358; super:      0; other:      0)
mmap: 23 pass took:   1.152797 (none:  30311; res: 231833; super:      0; other:      0)
mmap: 24 pass took:   0.972617 (none:  16196; res: 245948; super:      0; other:      0)
mmap: 25 pass took:   0.577515 (none:   8286; res: 253858; super:      0; other:      0)
mmap: 26 pass took:   0.380738 (none:   3712; res: 258432; super:      0; other:      0)
mmap: 27 pass took:   0.253583 (none:   1193; res: 260951; super:      0; other:      0)
mmap: 28 pass took:   0.157508 (none:      0; res: 262144; super:      0; other:      0)
mmap: 29 pass took:   0.156169 (none:      0; res: 262144; super:      0; other:      0)
mmap: 30 pass took:   0.156550 (none:      0; res: 262144; super:      0; other:      0)

If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:

$ ./mmap /mnt/random-1024 5
mmap:  1 pass took:   0.337657 (none:      0; res: 262144; super:      0; other:      0)
mmap:  2 pass took:   0.186137 (none:      0; res: 262144; super:      0; other:      0)
mmap:  3 pass took:   0.186132 (none:      0; res: 262144; super:      0; other:      0)
mmap:  4 pass took:   0.186535 (none:      0; res: 262144; super:      0; other:      0)
mmap:  5 pass took:   0.190353 (none:      0; res: 262144; super:      0; other:      0)

This is what I expect.  But why doesn't this work without reading the file
manually first?
The issue seems to be some change in the behaviour of the reservation or
physical memory allocator.  I Cc:ed Alan.
I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years.  Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

                                         pmap_remove_all(mt);
                                         if (mt->dirty != 0)
                                                 vm_page_deactivate(mt);
                                         else
                                                 vm_page_cache(mt);

to:

                                         vm_page_dontneed(mt);

because I suspect that the current code does more harm than good.  In
theory, it saves activations of the page daemon.  However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations.  The sequential access
detection heuristic is just too easily triggered.  For example, I've
seen it triggered by demand paging of the gcc text segment.  Also, I
think that pmap_remove_all() and especially vm_page_cache() are too
severe for a detection heuristic that is so easily triggered.
Yes, I agree that such a change should be an improvement, and I expect
that Andrey will test it.

On the other hand, I do think that the allocator should prefer unnamed
pages over pages which still have valid content.  On my 12G desktop,
I have never seen more than 100MB of cached pages, and similar numbers
are observed on 32-48GB servers.  I suppose that this is related.
On allocation, the system does prefer free pages over cached pages.
When cached pages are added to the physical memory allocator, they are
added to VM_FREEPOOL_CACHE.  When pages are allocated, they are taken
from VM_FREEPOOL_DEFAULT.  Generally, pages only move from the CACHE
pool to the DEFAULT pool when the DEFAULT pool is depleted.  (However,
occasionally, they do move because of coalescing.)  When I redid the
physical memory allocator, I looked at the rate of cached page
reactivation under the old and the new allocators.  At least for the
tests that I did, the rates weren't that different; both were in the low
single-digit percentages.  I think the highest likelihood of
reactivation comes from the pages that are cached by the sequential
access heuristic because it is so overzealous.
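
To illustrate the preference order, here is a simplified kernel-context
sketch.  It assumes the pre-NUMA vm_phys_alloc_pages(pool, order)
interface and is not how vm_page_alloc() is actually structured; in the
real code the fallback happens inside vm_phys, with the free page queue
lock held.

#include <sys/param.h>

#include <vm/vm.h>
#include <vm/vm_param.h>
#include <vm/vm_page.h>
#include <vm/vm_phys.h>

/*
 * Illustrative sketch only, not FreeBSD source: allocations are
 * satisfied from the default (free) pool first; the cache pool, whose
 * pages still hold valid file contents, is used only as a fallback.
 */
static vm_page_t
alloc_page_sketch(void)
{
        vm_page_t m;

        /* Prefer a page with no identity from the default pool. */
        m = vm_phys_alloc_pages(VM_FREEPOOL_DEFAULT, 0);
        if (m != NULL)
                return (m);

        /*
         * Default pool depleted: reuse a cached page, discarding its
         * still-valid contents and any chance of a cheap reactivation.
         */
        return (vm_phys_alloc_pages(VM_FREEPOOL_CACHE, 0));
}

The point of the split is that a cached page keeps its object/offset
identity until it is actually reused, so a later fault on the same file
page can reactivate it without I/O.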

I don't think it's related.  You see modest numbers of cached pages
simply because the page daemon met its target for the sum of free and
cached pages.  So, it just stopped moving pages from the inactive queue
into the physical memory allocator's cache/free queues.
No, I mean something else.  Specifically, I mean that somehow the
preference for unnamed pages does not take effect.  At least, I cannot
find any other explanation for the following experiment.

Let's take stock HEAD without the change in vm_fault.c.  The initial
state of the 8GB machine is as follows; the test file has not even been
stat(2)-ed yet:
Mem: 37M Active, 18M Inact, 150M Wired, 236K Cache, 27M Buf, 7612M Free

Now, run Andrey's original, unmodified test with only one pass, making a
sequential read of the mmap of a 5GB file from a UFS volume.  After the
run:
Mem: 38M Active, 18M Inact, 153M Wired, 21M Cache, 30M Buf, 7586M Free

Please note that the cached count increased by only 20M, even though 5GB
worth of pages passed through vm_page_cache().  In other words, it seems
that the allocator almost never touches free memory, always preferring
cached pages.  This mostly coincides with what I saw when I profiled the
original problem reported by Andrey.

Ah, I understand.
