On 04/04/2012 02:17, Konstantin Belousov wrote:
On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
Hi,

I open a file, call mmap() on the whole file, and then work with the
returned pointer.  I expect that a page needs to be touched only once
to get it into memory (disk cache?), but this doesn't work!

I wrote a test (attached) and ran it on a 1G file generated from
/dev/random; the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap:  1 pass took:   7.431046 (none: 262112; res:     32; super:      0; other:      0)
mmap:  2 pass took:   7.356670 (none: 261648; res:    496; super:      0; other:      0)
mmap:  3 pass took:   7.307094 (none: 260521; res:   1623; super:      0; other:      0)
mmap:  4 pass took:   7.350239 (none: 258904; res:   3240; super:      0; other:      0)
mmap:  5 pass took:   7.392480 (none: 257286; res:   4858; super:      0; other:      0)
mmap:  6 pass took:   7.292069 (none: 255584; res:   6560; super:      0; other:      0)
mmap:  7 pass took:   7.048980 (none: 251142; res:  11002; super:      0; other:      0)
mmap:  8 pass took:   6.899387 (none: 247584; res:  14560; super:      0; other:      0)
mmap:  9 pass took:   7.190579 (none: 242992; res:  19152; super:      0; other:      0)
mmap: 10 pass took:   6.915482 (none: 239308; res:  22836; super:      0; other:      0)
mmap: 11 pass took:   6.565909 (none: 232835; res:  29309; super:      0; other:      0)
mmap: 12 pass took:   6.423945 (none: 226160; res:  35984; super:      0; other:      0)
mmap: 13 pass took:   6.315385 (none: 208555; res:  53589; super:      0; other:      0)
mmap: 14 pass took:   6.760780 (none: 192805; res:  69339; super:      0; other:      0)
mmap: 15 pass took:   5.721513 (none: 174497; res:  87647; super:      0; other:      0)
mmap: 16 pass took:   5.004424 (none: 155938; res: 106206; super:      0; other:      0)
mmap: 17 pass took:   4.224926 (none: 135639; res: 126505; super:      0; other:      0)
mmap: 18 pass took:   3.749608 (none: 117952; res: 144192; super:      0; other:      0)
mmap: 19 pass took:   3.398084 (none:  99066; res: 163078; super:      0; other:      0)
mmap: 20 pass took:   3.029557 (none:  74994; res: 187150; super:      0; other:      0)
mmap: 21 pass took:   2.379430 (none:  55231; res: 206913; super:      0; other:      0)
mmap: 22 pass took:   2.046521 (none:  40786; res: 221358; super:      0; other:      0)
mmap: 23 pass took:   1.152797 (none:  30311; res: 231833; super:      0; other:      0)
mmap: 24 pass took:   0.972617 (none:  16196; res: 245948; super:      0; other:      0)
mmap: 25 pass took:   0.577515 (none:   8286; res: 253858; super:      0; other:      0)
mmap: 26 pass took:   0.380738 (none:   3712; res: 258432; super:      0; other:      0)
mmap: 27 pass took:   0.253583 (none:   1193; res: 260951; super:      0; other:      0)
mmap: 28 pass took:   0.157508 (none:      0; res: 262144; super:      0; other:      0)
mmap: 29 pass took:   0.156169 (none:      0; res: 262144; super:      0; other:      0)
mmap: 30 pass took:   0.156550 (none:      0; res: 262144; super:      0; other:      0)

If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:

$ ./mmap /mnt/random-1024 5
mmap:  1 pass took:   0.337657 (none:      0; res: 262144; super:      0; other:      0)
mmap:  2 pass took:   0.186137 (none:      0; res: 262144; super:      0; other:      0)
mmap:  3 pass took:   0.186132 (none:      0; res: 262144; super:      0; other:      0)
mmap:  4 pass took:   0.186535 (none:      0; res: 262144; super:      0; other:      0)
mmap:  5 pass took:   0.190353 (none:      0; res: 262144; super:      0; other:      0)

This is what I expect.  But why doesn't this work without reading the
file manually?
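
For reference, the manual pre-read done with cat above amounts to a
plain sequential read() loop over the file. A minimal sketch, assuming
nothing beyond POSIX; warm_cache() and its 64K buffer size are
hypothetical and not part of the attached test:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Read the whole file sequentially and discard the data, roughly what
 * "cat file > /dev/null" does.  Returns the number of bytes read, or -1
 * on error.  warm_cache() is a hypothetical helper name.
 */
static long long
warm_cache(const char *path)
{
        char buf[65536];
        long long total = 0;
        ssize_t n;
        int fd;

        fd = open(path, O_RDONLY);
        if (fd == -1)
                return (-1);
        while ((n = read(fd, buf, sizeof(buf))) > 0)
                total += n;
        close(fd);
        return (n == -1 ? -1 : total);
}
```

Calling warm_cache(argv[1]) before the mmap() loop would stand in for
the cat step; whether it gives the same timings as above is untested.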
The issue seems to be a change in the behaviour of the reserv or
phys allocator. I Cc:ed Alan.

What happens is that the fault handler deactivates or caches the pages
preceding the one which would satisfy the fault. See the if()
statement starting at line 463 of vm/vm_fault.c. Since all pages
of the object in your test are clean, the pages are cached.

The next fault needs to allocate some more pages for a different index
of the same object. What I see is that vm_reserv_alloc_page() returns a
page that is from the cache for the same object, but a different pindex.
As an obvious result, the page is invalidated and repurposed. When the
next loop starts, the page is not resident anymore, so it has to be
re-read from disk.

I'm pretty sure that the pages aren't being repurposed this quickly. Instead, I believe that the explanation is to be found in mincore(). mincore() is only reporting pages that are in the object's memq as resident. It is not reporting cache pages as resident.

The behaviour of the allocator is not consistent, so some pages are not
reused, allowing the test to converge and to collect all pages of the
object eventually.

Calling madvise(MADV_RANDOM) fixes the issue, because the code to
deactivate/cache the pages is turned off. On the other hand, it also
turns off read-ahead for faulting, and the first loop becomes eternally
long.

Indeed, MADV_WILLNEED does not fix the problem, since willneed
reactivates the pages of the object at the time of the call. To use
MADV_WILLNEED, you would need to call it between faults/memcpy.

I've also never seen super pages. How do I make them work?
They just work, at least for me. Look at the output of procstat -v
after enough loops finished to not cause disk activity.
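
A side note on the attached test: it checks MINCORE_INCORE before
MINCORE_SUPER, so a page that carries both flags is counted as "res"
and never as "super". As far as I can tell, FreeBSD reports a mapped
superpage with both bits set, which may be why the super column stays
at zero. A minimal sketch of a classifier that tests MINCORE_SUPER
first; classify_pages() and struct page_counts are hypothetical names,
and the fallback flag values are assumptions taken from FreeBSD 9's
<sys/mman.h>:

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <stddef.h>

/*
 * Fallbacks so the sketch also compiles where these flags are absent;
 * the values are assumptions based on FreeBSD 9.
 */
#ifndef MINCORE_INCORE
#define MINCORE_INCORE  0x1
#endif
#ifndef MINCORE_SUPER
#define MINCORE_SUPER   0x20
#endif

struct page_counts {
        size_t none, incore, super, other;
};

/* Classify a mincore() vector, testing MINCORE_SUPER before INCORE. */
static void
classify_pages(const char *vec, size_t npages, struct page_counts *pc)
{
        size_t i;

        pc->none = pc->incore = pc->super = pc->other = 0;
        for (i = 0; i < npages; i++) {
                if (vec[i] == 0)
                        pc->none++;
                else if (vec[i] & MINCORE_SUPER)
                        pc->super++;
                else if (vec[i] & MINCORE_INCORE)
                        pc->incore++;
                else
                        pc->other++;
        }
}
```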

I've been playing with madvise and posix_fadvise, but with no luck.  BTW,
posix_fadvise(POSIX_FADV_WILLNEED) does nothing, as the comment in the
code says; shouldn't this be documented in the manual page?

All tests were run under 9.0-STABLE (r233744).

--
Andrey Zonov
/*-
  * Andrey Zonov (c) 2011
  */

#include <sys/mman.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        int i;
        int fd;
        int num;
        int block;
        int pagesize;
        size_t n;
        size_t size;
        size_t none, incore, super, other;
        char *p;
        char *tmp;
        char *vec;
        char *vecp;
        struct stat sb;
        struct timeval tp, tp1, tp2;

        if (argc < 2 || argc > 4)
                errx(1, "usage: mmap <filename> [num] [block]");

        fd = open(argv[1], O_RDONLY);
        if (fd == -1)
                err(1, "open()");

        num = 1;
        if (argc >= 3)
                num = atoi(argv[2]);

        pagesize = getpagesize();
        block = pagesize;
        if (argc == 4)
                block = atoi(argv[3]);

        if (fstat(fd, &sb) == -1)
                err(1, "fstat()");
        size = sb.st_size;

#if 0
        if (posix_fadvise(fd, (off_t)0, (off_t)0, POSIX_FADV_WILLNEED) == -1)
                err(1, "posix_fadvise()");
#endif

        p = mmap(NULL, sb.st_size, PROT_READ,
            /*MAP_PREFAULT_READ |*/ MAP_PRIVATE, fd, (off_t)0);
        if (p == MAP_FAILED)
                err(1, "mmap()");

#if 0
        if (madvise(p, (size_t)size, MADV_WILLNEED) == -1)
                err(1, "madvise()");
#endif

        tmp = calloc(1, block);
        if (tmp == NULL)
                err(1, "calloc()");
        vec = calloc(1, size / pagesize);
        if (vec == NULL)
                err(1, "calloc()");
        for (i = 0; i < num; i++) {
                gettimeofday(&tp1, NULL);
                for (n = 0; n < size / block; n++)
                        memcpy(tmp, p + (n * block), block);
                gettimeofday(&tp2, NULL);
                timersub(&tp2, &tp1, &tp);

                if (mincore(p, size, vec) == -1)
                        err(1, "mincore()");

                none = incore = super = other = 0;
                for (vecp = vec; (size_t)(vecp - vec) < size / pagesize;
                    vecp++) {
                        if (*vecp == 0)
                                none++;
                        else if (*vecp & MINCORE_INCORE)
                                incore++;
                        else if (*vecp & MINCORE_SUPER)
                                super++;
                        else
                                other++;
                }
                /* size_t counters are printed with %zu; the output is unchanged. */
                warnx("%2d pass took: %3ld.%06ld (none: %6zu; res: %6zu; "
                    "super: %6zu; other: %6zu)",
                    i + 1, (long)tp.tv_sec, (long)tp.tv_usec, none, incore,
                    super, other);
        }
        free(vec);
        free(tmp);

        if (munmap(p, sb.st_size) == -1)
                err(1, "munmap()");

        close(fd);

        exit(0);
}
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
