Re: problems with mmap() and disk caching
On 4/30/12 3:49 AM, Alan Cox wrote:
On 04/11/2012 01:07, Andrey Zonov wrote:
On 10.04.2012 20:19, Alan Cox wrote:
On 04/09/2012 10:26, John Baldwin wrote:
On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:
On 04/04/2012 02:17, Konstantin Belousov wrote:
On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, then call mmap() on the whole file and get a pointer, then I work with this pointer. I expect that a page should be touched only once to get it into memory (disk cache?), but this doesn't work! I wrote a test (attached) and ran it on a 1G file generated from /dev/random; the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)

If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:
$ ./mmap /mnt/random-1024 5
mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)

This is what I expect. But why doesn't this work without reading the file manually?

The issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in about twelve years. Otherwise, I agree with your analysis. On more than one occasion, I've been tempted to change:

    pmap_remove_all(mt);
    if (mt->dirty != 0)
        vm_page_deactivate(mt);
    else
        vm_page_cache(mt);

to:

    vm_page_dontneed(mt);

because I suspect that the current code does more harm than good. In theory, it saves activations of the page daemon. However, more often than not, I suspect that we are spending more on page reactivations than we are saving on page daemon activations. The sequential access detection heuristic is just too easily triggered. For example, I've seen it triggered by demand paging of the gcc text segment. Also, I think that pmap_remove_all() and especially vm_page_cache() are too severe for a detection heuristic that is so easily triggered.

Are you planning to commit this?

Not yet. I did some tests with a file that was several times larger than DRAM, and I didn't like what I saw. Initially, everything behaved as expected, but about halfway through the test the bulk of the pages were active. Despite the call to pmap_clear_reference() in vm_page_dontneed(), the page daemon is finding the pages to be referenced and reactivating them. The net result is that the time it takes to
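(The test program itself was sent as an attachment and is not preserved in this digest. Below is a minimal sketch of what such a test might look like, reconstructed from the output format above; the classification behind the original's "other" counter is unknown, so this sketch simply leaves it at zero.)

    /*
     * Sketch of the mmap test: map the file, touch one byte per page on
     * each pass, then classify the pages with mincore(2).  Counter names
     * follow the output above; "other" is never incremented here because
     * the original's definition of it is not known.
     */
    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
        struct timespec t0, t1;
        struct stat st;
        char *p, *vec;
        size_t i, npages, pgsz;
        int fd, npasses, pass;

        if (argc != 3)
            errx(1, "usage: mmap file npasses");
        npasses = atoi(argv[2]);
        if ((fd = open(argv[1], O_RDONLY)) == -1)
            err(1, "open");
        if (fstat(fd, &st) == -1)
            err(1, "fstat");
        pgsz = getpagesize();
        npages = ((size_t)st.st_size + pgsz - 1) / pgsz;
        p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");
        if ((vec = malloc(npages)) == NULL)
            err(1, "malloc");

        for (pass = 1; pass <= npasses; pass++) {
            size_t none = 0, res = 0, super = 0, other = 0;
            volatile char dummy;

            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (i = 0; i < npages; i++)
                dummy = p[i * pgsz];    /* fault the page in */
            clock_gettime(CLOCK_MONOTONIC, &t1);

            if (mincore(p, (size_t)st.st_size, vec) == -1)
                err(1, "mincore");
            for (i = 0; i < npages; i++) {
                if ((vec[i] & MINCORE_INCORE) == 0)
                    none++;
                else if (vec[i] & MINCORE_SUPER)
                    super++;
                else
                    res++;
            }
            printf("mmap: %d pass took: %f (none: %zu; res: %zu; "
                "super: %zu; other: %zu)\n", pass,
                (t1.tv_sec - t0.tv_sec) +
                (t1.tv_nsec - t0.tv_nsec) / 1e9,
                none, res, super, other);
        }
        return (0);
    }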
mlock/mlockall (was: Re: problems with mmap() and disk caching)
Andrey writes:

Wired memory: kernel memory, and yes, an application may get wired memory through mlock()/mlockall(), but I haven't seen any real application which calls mlock().

Apps with real-time considerations may need to lock memory to prevent having to wait for a page-in from disk or swap.
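(As an illustration of the real-time case, a process can wire its entire address space with mlockall(2); a minimal sketch, not from the original thread:)

    /*
     * Wire every page currently mapped and every page mapped in the
     * future, so the process never has to wait for a page-in from disk
     * or swap.  Requires sufficient privilege (or RLIMIT_MEMLOCK room).
     */
    #include <sys/mman.h>
    #include <err.h>

    int
    main(void)
    {
        if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
            err(1, "mlockall");

        /* ... latency-sensitive work runs here without major faults ... */
        return (0);
    }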
Re: problems with mmap() and disk caching
On 06.04.2012 12:13, Konstantin Belousov wrote:
On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
On 05.04.2012 23:41, Konstantin Belousov wrote:
On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote:
On 05.04.2012 19:54, Alan Cox wrote:
On 04/04/2012 02:17, Konstantin Belousov wrote:
On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

[snip]

This is what I expect. But why doesn't this work without reading the file manually?

The issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in about twelve years. Otherwise, I agree with your analysis. On more than one occasion, I've been tempted to change:

    pmap_remove_all(mt);
    if (mt->dirty != 0)
        vm_page_deactivate(mt);
    else
        vm_page_cache(mt);

to:

    vm_page_dontneed(mt);

Thanks Alan! Now it works as I expect! But I have more questions to you and kib@; they are in my test below.

So, prepare the file as earlier, and take information about memory usage from top(1).

After preparation, but before the test:
Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free

First run:
$ ./mmap /mnt/random
mmap: 1 pass took: 7.462865 (none: 0; res: 262144; super: 0; other: 0)

No super pages after the first run, why?

Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free

Now the file is in inactive memory, that's good.

Second run:
$ ./mmap /mnt/random
mmap: 1 pass took: 0.004191 (none: 0; res: 262144; super: 511; other: 0)

All super pages are here, nice.

Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free

Wow, all inactive pages moved to active and sit there even after the process was terminated. That's not good, what do you think?

Why do you think this is 'not good'? You have plenty of free memory, there is no memory pressure, and all pages were referenced recently. There is no reason for them to be deactivated.

I always thought that active memory is the sum of the resident memory of all processes, inactive shows the disk cache, and wired shows the kernel itself.

So you are wrong. Both active and inactive memory can be mapped and not mapped, and both can belong to vnode or to anonymous objects, etc. The active/inactive distinction is only the amount of references that was noted by the pagedaemon, or some other page history like the way it was unwired. Wired does not necessarily mean kernel-used pages; user processes can wire their pages as well.

Let's talk about that in detail. My understanding is the following:

Active memory: the memory which is referenced by an application. An application may get memory only through mmap() (the allocator doesn't use brk()/sbrk() any more). The resident memory of an application is the sum of the physically used memory. So, the sum of RSS is active memory.

Inactive memory: the memory which has no references. Once we call read() on a file, the file is in inactive memory, because we have no references to this object, we just read it. This is also memory released by free().

Cache memory: I don't know what it is. It's always small enough not to think about it.

Wired memory: kernel memory, and yes, an application may get wired memory through mlock()/mlockall(), but I haven't seen any real application which calls mlock().

Read the file:
$ cat /mnt/random > /dev/null
Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free

Now the file is in wired memory. I do not understand why.

You do use UFS, right?

Yes.

There are enough buffer headers and buffer KVA to have buffers allocated for the whole file content. Since buffers wire the corresponding pages, you get pages migrated to wired. When buffer pressure appears (i.e., any other i/o is started), the buffers will be repurposed and the pages moved to inactive.

OK, how can I get the amount of disk cache?

You cannot. At least I am not aware of any counter that keeps track of the resident pages belonging to the vnode pager. Buffers should not be thought of as a disk cache; pages cache disk content. Instead, VMIO buffers only provide a bread()/bwrite()-compatible interface to the page cache (*) for filesystems.

(*) - The cache term is used in the generic sense here, not to be confused with the cached pages counter from top etc.

Yes, I know that. Let me try once again to ask my question about buffers: is it reasonable to use 10% of physical memory for them, or could a rational upper limit be set automatically?

[snip]
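(There is no single "disk cache" counter, as explained above, but the page-queue counters behind top(1)'s Mem: line are exported via sysctl. A minimal sketch of reading them with sysctl(3) follows; it is not from the original thread, and the sysctl names and the long type of vfs.bufspace are assumptions based on the FreeBSD of that era:)

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <err.h>
    #include <stdio.h>

    /* Read one u_int page-queue counter by sysctl name. */
    static u_int
    pages(const char *name)
    {
        u_int v;
        size_t len = sizeof(v);

        if (sysctlbyname(name, &v, &len, NULL, 0) == -1)
            err(1, "%s", name);
        return (v);
    }

    int
    main(void)
    {
        long bufspace;
        size_t len = sizeof(bufspace);

        printf("active:   %u pages\n", pages("vm.stats.vm.v_active_count"));
        printf("inactive: %u pages\n", pages("vm.stats.vm.v_inactive_count"));
        printf("wired:    %u pages\n", pages("vm.stats.vm.v_wire_count"));
        printf("cache:    %u pages\n", pages("vm.stats.vm.v_cache_count"));
        printf("free:     %u pages\n", pages("vm.stats.vm.v_free_count"));

        /* Space consumed by buffers (top's "Buf"), in bytes. */
        if (sysctlbyname("vfs.bufspace", &bufspace, &len, NULL, 0) == -1)
            err(1, "vfs.bufspace");
        printf("buf:      %ld bytes\n", bufspace);
        return (0);
    }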
Re: problems with mmap() and disk caching
On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote:
On 06.04.2012 12:13, Konstantin Belousov wrote:
On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
On 05.04.2012 23:41, Konstantin Belousov wrote:
On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote:
On 05.04.2012 19:54, Alan Cox wrote:

[snip]

I always thought that active memory is the sum of the resident memory of all processes, inactive shows the disk cache, and wired shows the kernel itself.

So you are wrong. Both active and inactive memory can be mapped and not mapped, and both can belong to vnode or to anonymous objects, etc. The active/inactive distinction is only the amount of references that was noted by the pagedaemon, or some other page history like the way it was unwired. Wired does not necessarily mean kernel-used pages; user processes can wire their pages as well.

Let's talk about that in detail. My understanding is the following:

Active memory: the memory which is referenced by an application.

Assuming the part 'by an application' is removed, this sentence is almost right. Any managed mapping of the page participates in the active references.

An application may get memory only through mmap() (the allocator doesn't use brk()/sbrk() any more). The resident memory of an application is the sum of the physically used memory. So, the sum of RSS is active memory.

First, brk/sbrk is still used. Second, there is no requirement that resident pages are referenced. E.g. a page could have participated in a buffer, and unwiring on the buffer dissolve put it into the inactive state. Or the pagedaemon cleared the reference and moved the page to the inactive queue. Or the page was prefaulted by different optimizations. Moreover, there is a subtle difference between 'resident' and 'not causing a fault on access': a page may be resident, but the pte was not preinstalled, or the pte was flushed, etc.

Inactive memory: the memory which has no references. Once we call read() on a file, the file is in inactive memory, because we have no references to this object, we just read it. This is also memory released by free().

On buffer dissolve, the buffer cache explicitly puts the pages constituting the buffer into the inactive queue. In fact, this is not quite right; e.g. if the same pages are mapped and actively referenced, then the pagedaemon has slightly more work now to move the page from inactive to active. And free(3) operates at so much higher a level than the vm subsystem that describing the interaction between the two is impossible in any definitive way. Old naive mallocs put the block description at the beginning of the block, actually causing free() to reference at least the first page of the block. Jemalloc often does madvise(MADV_FREE) for large freed allocations; MADV_FREE moves pages between queues probabilistically.

Cache memory: I don't know what it is. It's always small enough not to think about it.

This was the bug you reported, and which Alan fixed on Sunday.

Wired memory: kernel memory, and yes, an application may get wired memory through mlock()/mlockall(), but I haven't seen any real application which calls mlock().

ntpd, amd from the base system. gpg and similar programs try to mlock the key store to avoid sensitive material leakage to the swap. cdrecord(8) tried to mlock itself to avoid indefinite stalls during write.
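(A minimal sketch of the MADV_FREE pattern mentioned above; this is an illustration, not code from the thread:)

    /*
     * MADV_FREE: the mapping stays valid, but the kernel may reclaim the
     * pages without writing them to swap; after a reclaim, a read returns
     * zeroes instead of the old contents.
     */
    #include <sys/mman.h>
    #include <err.h>
    #include <string.h>

    int
    main(void)
    {
        size_t sz = 4 * 1024 * 1024;
        char *p;

        p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");
        memset(p, 1, sz);               /* dirty the pages */

        /* Done with the contents; let the VM system discard them. */
        if (madvise(p, sz, MADV_FREE) == -1)
            err(1, "madvise");
        return (0);
    }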
Re: problems with mmap() and disk caching
On Mon, Apr 9, 2012 at 1:18 PM, Konstantin Belousov kostik...@gmail.com wrote:
On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote:
On 06.04.2012 12:13, Konstantin Belousov wrote:
On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:

[snip]

First, brk/sbrk is still used. Second, there is no requirement that resident pages are referenced. [snip]

From the user point of view: how can the memory be active if no one (I mean no application) uses it? What I have seen more than once is that a program which had worked for a long time with a big mmap()'ed file could not work well (many page faults) with a new version of the file, until I manually flushed active memory by re-mounting the filesystem. The new version couldn't force out the old one. In my opinion, if the VM moved cached objects to the inactive queue after program termination, I wouldn't see this problem.

In fact, this is not quite right; e.g. if the same pages are mapped and actively referenced, then the pagedaemon has slightly more work now to move the page from inactive to active.

Yes, sure; if someone else uses the object, it should be active. Even better would be to introduce a new SHARED counter, like the one in MacOSX and Linux.

Jemalloc often does madvise(MADV_FREE) for large freed allocations; MADV_FREE moves pages between queues probabilistically.

That's exactly what I meant by free(): we drop act_count to 0 and move the page to the inactive queue by vm_page_dontneed().

This was the bug you reported, and which Alan fixed on Sunday.

I've tested this patch under 9.0-STABLE and should say that it introduces problems with interactivity on heavily disk-loaded machines. With the patch that I tested before, I didn't observe such problems.

ntpd, amd from the base system. gpg and similar programs try to mlock the key store to avoid sensitive material leakage to the swap. cdrecord(8) tried to mlock itself to avoid indefinite stalls during write.

Nice catch ;-)

[snip]
Re: problems with mmap() and disk caching
On Mon, Apr 09, 2012 at 03:35:30PM +0400, Andrey Zonov wrote:
On Mon, Apr 9, 2012 at 1:18 PM, Konstantin Belousov kostik...@gmail.com wrote:
On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote:

[snip]

From the user point of view: how can the memory be active if no one (I mean no application) uses it? [snip] In my opinion, if the VM moved cached objects to the inactive queue after program termination, I wouldn't see this problem.

Moving pages to inactive just because some mapping was destroyed is plain silly. The pages migrate between active/inactive/cache/free by the pagedaemon algorithms. BTW, you do not need to actually remount the filesystem to flush the pages of its vnodes: it is enough to try to unmount it while cd'ed into the filesystem root.

Yes, sure; if someone else uses the object, it should be active. Even better would be to introduce a new SHARED counter, like the one in MacOSX and Linux.

Counter for what? There is already the ref counter for a vm object.

[snip]
Re: problems with mmap() and disk caching
On 05.04.2012 23:54, Andrey Zonov wrote:
On 05.04.2012 23:41, Konstantin Belousov wrote:

You do use UFS, right?

Yes. I've also run the test on ZFS:

Mem: 2645M Active, 363M Inact, 2042M Wired, 1406M Buf, 42G Free
$ ./mmap /mnt/random
Mem: 3669M Active, 363M Inact, 3067M Wired, 1406M Buf, 40G Free

It eats 2Gb, as I understand it.

# umount /mnt
# zfs mount -a
Mem: 2645M Active, 363M Inact, 2042M Wired, 1406M Buf, 42G Free
$ cat /mnt/random > /dev/null
Mem: 2645M Active, 363M Inact, 3067M Wired, 1406M Buf, 41G Free

That's correct: 1Gb.

About Buf memory: is it reasonable to set it to 10% of physical memory? I've lost 10Gb by default on machines with 96Gb.

-- 
Andrey Zonov
Re: problems with mmap() and disk caching
On 04/04/2012 02:17, Konstantin Belousov wrote:
On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

[snip]

This is what I expect. But why doesn't this work without reading the file manually?

The issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan.

What happens is that the fault handler deactivates or caches the pages previous to the one which would satisfy the fault; see the if() statement starting at line 463 of vm/vm_fault.c. Since all pages of the object in your test are clean, the pages are cached. The next fault would need to allocate some more pages for a different index of the same object. What I see is that vm_reserv_alloc_page() returns a page that is in the cache for the same object, but at a different pindex. As an obvious result, the page is invalidated and repurposed. When the next loop starts, the page is not resident anymore, so it has to be re-read from disk.

I'm pretty sure that the pages aren't being repurposed this quickly. Instead, I believe that the explanation is to be found in mincore(): mincore() only reports pages that are in the object's memq as resident; it does not report cache pages as resident.

The behaviour of the allocator is not consistent, so some pages are not reused, allowing the test to converge and to collect all pages of the object eventually. Calling madvise(MADV_RANDOM) fixes
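(A minimal sketch of applying the MADV_RANDOM suggestion to a mapping like the one in the test; this is an illustration, not code from the thread, and map_file_random is a hypothetical helper:)

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <err.h>
    #include <fcntl.h>

    /*
     * Map a file read-only and opt out of the sequential-access
     * detection discussed above: with MADV_RANDOM the fault handler
     * should not deactivate or cache the pages behind the faulting one.
     */
    static void *
    map_file_random(const char *path, size_t *szp)
    {
        struct stat st;
        void *p;
        int fd;

        if ((fd = open(path, O_RDONLY)) == -1)
            err(1, "open");
        if (fstat(fd, &st) == -1)
            err(1, "fstat");
        p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");
        if (madvise(p, (size_t)st.st_size, MADV_RANDOM) == -1)
            err(1, "madvise");
        *szp = (size_t)st.st_size;
        return (p);
    }

    int
    main(int argc, char **argv)
    {
        size_t sz;

        if (argc != 2)
            errx(1, "usage: maprandom file");
        (void)map_file_random(argv[1], &sz);
        return (0);
    }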
Re: problems with mmap() and disk caching
On 04/06/2012 03:38, Konstantin Belousov wrote: On Thu, Apr 05, 2012 at 01:25:49PM -0500, Alan Cox wrote: On 04/05/2012 12:31, Konstantin Belousov wrote: On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote: On 04/04/2012 02:17, Konstantin Belousov wrote: On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote: [snip original test description and results, quoted in full above] This is what I expect. But why doesn't this work without reading the file manually? Issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan.
I'm pretty sure that the behavior here hasn't significantly changed in about twelve years. Otherwise, I agree with your analysis. On more than one occasion, I've been tempted to change: pmap_remove_all(mt); if (mt->dirty != 0) vm_page_deactivate(mt); else vm_page_cache(mt); to: vm_page_dontneed(mt); because I suspect that the current code does more harm than good. In theory, it saves activations of the page daemon. However, more often than not, I suspect that we are spending more on page reactivations than we are saving on page daemon activations. The sequential access detection heuristic is just too easily triggered.
Re: problems with mmap() and disk caching
On 04/04/2012 02:17, Konstantin Belousov wrote: On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote: [snip original test description and results, quoted in full above] This is what I expect. But why doesn't this work without reading the file manually? Issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan.
I'm pretty sure that the behavior here hasn't significantly changed in about twelve years. Otherwise, I agree with your analysis. On more than one occasion, I've been tempted to change: pmap_remove_all(mt); if (mt->dirty != 0) vm_page_deactivate(mt); else vm_page_cache(mt); to: vm_page_dontneed(mt); because I suspect that the current code does more harm than good. In theory, it saves activations of the page daemon. However, more often than not, I suspect that we are spending more on page reactivations than we are saving on page daemon activations. The sequential access detection heuristic is just too easily triggered. For example, I've seen it triggered by demand paging of the gcc text segment. Also, I think that pmap_remove_all() and especially vm_page_cache() are too severe for a detection heuristic that is so easily triggered.
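Laid out for readability, the change Alan describes looks like this (a sketch of the quoted fragment as it would appear in the vm_fault.c sequential-access path, not a complete or verified patch; mt is the vm_page being unmapped behind the faulting page):

    /* Current behaviour: unmap the trailing page and immediately
     * deactivate it, or move it to the cache queue if it is clean. */
    pmap_remove_all(mt);
    if (mt->dirty != 0)
            vm_page_deactivate(mt);
    else
            vm_page_cache(mt);

    /* Proposed replacement: leave the mapping intact and only mark
     * the page as a preferred reclaim candidate. */
    vm_page_dontneed(mt);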
Re: problems with mmap() and disk caching
On 04/04/2012 04:36, Andrey Zonov wrote: On 04.04.2012 11:17, Konstantin Belousov wrote: Calling madvise(MADV_RANDOM) fixes the issue, because the code to deactivate/cache the pages is turned off. On the other hand, it also turns off read-ahead for faulting, and the first loop becomes eternally long. Now it takes 5 times longer. Anyway, thanks for the explanation. Doing MADV_WILLNEED does not fix the problem indeed, since willneed reactivates the pages of the object at the time of call. To use MADV_WILLNEED, you would need to call it between faults/memcpy. I played with it, but no luck so far. I've also never seen super pages, how to make them work? They just work, at least for me. Look at the output of procstat -v after enough loops have finished to not cause disk activity. The problem was in my test program. I fixed it; now I see super pages, but I'm still not satisfied. There are several tests below: 1. With madvise(MADV_RANDOM) I see almost all super pages: $ ./mmap /mnt/random-1024 5 mmap: 1 pass took: 26.438535 (none: 0; res: 262144; super: 511; other: 0) mmap: 2 pass took: 0.187311 (none: 0; res: 262144; super: 511; other: 0) mmap: 3 pass took: 0.184953 (none: 0; res: 262144; super: 511; other: 0) mmap: 4 pass took: 0.186007 (none: 0; res: 262144; super: 511; other: 0) mmap: 5 pass took: 0.185790 (none: 0; res: 262144; super: 511; other: 0) Should it be 512? Check the starting virtual address. It is probably not aligned on a superpage boundary. Hence, a few pages at the start and end of your mapped region are not in a superpage. 2. Without madvise(MADV_RANDOM): $ ./mmap /mnt/random-1024 50 mmap: 1 pass took: 7.629745 (none: 262112; res: 32; super: 0; other: 0) mmap: 2 pass took: 7.301720 (none: 261202; res: 942; super: 0; other: 0) mmap: 3 pass took: 7.261416 (none: 260226; res: 1918; super: 1; other: 0) [skip] mmap: 49 pass took: 0.155368 (none: 0; res: 262144; super: 323; other: 0) mmap: 50 pass took: 0.155438 (none: 0; res: 262144; super: 323; other: 0) Only 323 super pages. 3. If I just re-run the test I don't see super pages with any size of block. $ ./mmap /mnt/random-1024 5 $((1<<30)) mmap: 1 pass took: 1.013939 (none: 0; res: 262144; super: 0; other: 0) mmap: 2 pass took: 0.267082 (none: 0; res: 262144; super: 0; other: 0) mmap: 3 pass took: 0.270711 (none: 0; res: 262144; super: 0; other: 0) mmap: 4 pass took: 0.268940 (none: 0; res: 262144; super: 0; other: 0) mmap: 5 pass took: 0.269634 (none: 0; res: 262144; super: 0; other: 0) 4. If I activate madvise(MADV_WILLNEED) in the copy loop and re-run the test then I see super pages only if I use a block greater than 2 MB.
$ ./mmap /mnt/random-1024 1 $((1<<21)) mmap: 1 pass took: 0.299722 (none: 0; res: 262144; super: 0; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<22)) mmap: 1 pass took: 0.271828 (none: 0; res: 262144; super: 170; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<23)) mmap: 1 pass took: 0.333188 (none: 0; res: 262144; super: 258; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<24)) mmap: 1 pass took: 0.339250 (none: 0; res: 262144; super: 303; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<25)) mmap: 1 pass took: 0.418812 (none: 0; res: 262144; super: 324; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<26)) mmap: 1 pass took: 0.360892 (none: 0; res: 262144; super: 335; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<27)) mmap: 1 pass took: 0.401122 (none: 0; res: 262144; super: 342; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<28)) mmap: 1 pass took: 0.478764 (none: 0; res: 262144; super: 345; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<29)) mmap: 1 pass took: 0.607266 (none: 0; res: 262144; super: 346; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<30)) mmap: 1 pass took: 0.901269 (none: 0; res: 262144; super: 347; other: 0) 5. If I activate madvise(MADV_WILLNEED) immediately after mmap() then I see some number of super pages (the number from test #2). $ ./mmap /mnt/random-1024 5 mmap: 1 pass took: 0.178666 (none: 0; res: 262144; super: 323; other: 0) mmap: 2 pass took: 0.158889 (none: 0; res: 262144; super: 323; other: 0) mmap: 3 pass took: 0.157229 (none: 0; res: 262144; super: 323; other: 0) mmap: 4 pass took: 0.156895 (none: 0; res: 262144; super: 323; other: 0) mmap: 5 pass took: 0.162938 (none: 0; res: 262144; super: 323; other: 0) 6. If I read the file manually before the test then I don't see super pages with any size of block, and madvise(MADV_WILLNEED) doesn't help. $ ./mmap /mnt/random-1024 5 $((1<<30)) mmap: 1 pass took: 0.996767 (none: 0; res: 262144; super: 0; other: 0) mmap: 2 pass took: 0.311129 (none: 0; res: 262144; super: 0; other: 0) mmap: 3 pass took: 0.317430 (none: 0; res: 262144; super: 0; other: 0) mmap: 4 pass took: 0.314437 (none: 0; res: 262144; super: 0; other: 0) mmap: 5 pass took: 0.310757 (none: 0; res: 262144; super: 0; other: 0)
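Test #4's "madvise(MADV_WILLNEED) in the copy loop" plausibly looks like the following sketch (hypothetical; len is introduced here, while n, size, block, p, and tmp are the test program's own variables):

    size_t n, len;

    for (n = 0; n < size; n += block) {
            len = (size - n < (size_t)block) ? size - n : (size_t)block;
            /* Advise the next chunk before copying the current one, so
             * the willneed reactivation lands between faults, as kib
             * suggested. */
            if (n + len < size)
                    madvise(p + n + len, len, MADV_WILLNEED);
            memcpy(tmp, p + n, len);
    }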
Re: problems with mmap() and disk caching
On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote: On 04/04/2012 02:17, Konstantin Belousov wrote: On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote: [snip original test description and results, quoted in full above] This is what I expect. But why doesn't this work without reading the file manually?
Issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan. [snip Alan's reply about replacing pmap_remove_all()/vm_page_deactivate()/vm_page_cache() with vm_page_dontneed(), quoted in full above]
Re: problems with mmap() and disk caching
On 04/05/2012 12:31, Konstantin Belousov wrote: On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote: On 04/04/2012 02:17, Konstantin Belousov wrote: On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote: [snip original test description and results, quoted in full above] This is what I expect.
But why doesn't this work without reading the file manually? Issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan. [snip Alan's reply about vm_page_dontneed(), quoted in full above]
Re: problems with mmap() and disk caching
On 05.04.2012 19:54, Alan Cox wrote: On 04/04/2012 02:17, Konstantin Belousov wrote: On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote: [snip] This is what I expect. But why doesn't this work without reading the file manually? Issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan. I'm pretty sure that the behavior here hasn't significantly changed in about twelve years. Otherwise, I agree with your analysis. On more than one occasion, I've been tempted to change: pmap_remove_all(mt); if (mt->dirty != 0) vm_page_deactivate(mt); else vm_page_cache(mt); to: vm_page_dontneed(mt); Thanks Alan! Now it works as I expect! But I have more questions for you and kib@. They are in my test below. So, prepare the file as earlier, and take information about memory usage from top(1). After preparation, but before the test: Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free First run: $ ./mmap /mnt/random mmap: 1 pass took: 7.462865 (none: 0; res: 262144; super: 0; other: 0) No super pages after the first run, why?.. Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free Now the file is in inactive memory, that's good. Second run: $ ./mmap /mnt/random mmap: 1 pass took: 0.004191 (none: 0; res: 262144; super: 511; other: 0) All super pages are here, nice. Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free Wow, all inactive pages moved to active and sit there even after the process was terminated, that's not good, what do you think? Read the file: $ cat /mnt/random > /dev/null Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free Now the file is in wired memory. I do not understand why. Could you please give me an explanation of active/inactive/wired memory? because I suspect that the current code does more harm than good. In theory, it saves activations of the page daemon. However, more often than not, I suspect that we are spending more on page reactivations than we are saving on page daemon activations. The sequential access detection heuristic is just too easily triggered. For example, I've seen it triggered by demand paging of the gcc text segment. Also, I think that pmap_remove_all() and especially vm_page_cache() are too severe for a detection heuristic that is so easily triggered. [snip] -- Andrey Zonov
Re: problems with mmap() and disk caching
On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote: On 05.04.2012 19:54, Alan Cox wrote: On 04/04/2012 02:17, Konstantin Belousov wrote: On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote: [snip] This is what I expect. But why doesn't this work without reading the file manually? Issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan. I'm pretty sure that the behavior here hasn't significantly changed in about twelve years. Otherwise, I agree with your analysis. On more than one occasion, I've been tempted to change: pmap_remove_all(mt); if (mt->dirty != 0) vm_page_deactivate(mt); else vm_page_cache(mt); to: vm_page_dontneed(mt); Thanks Alan! Now it works as I expect! But I have more questions for you and kib@. They are in my test below. So, prepare the file as earlier, and take information about memory usage from top(1). After preparation, but before the test: Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free First run: $ ./mmap /mnt/random mmap: 1 pass took: 7.462865 (none: 0; res: 262144; super: 0; other: 0) No super pages after the first run, why?.. Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free Now the file is in inactive memory, that's good. Second run: $ ./mmap /mnt/random mmap: 1 pass took: 0.004191 (none: 0; res: 262144; super: 511; other: 0) All super pages are here, nice. Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free Wow, all inactive pages moved to active and sit there even after the process was terminated, that's not good, what do you think? Why do you think this is 'not good'? You have plenty of free memory, there is no memory pressure, and all pages were referenced recently. There is no reason for them to be deactivated. Read the file: $ cat /mnt/random > /dev/null Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free Now the file is in wired memory. I do not understand why. You do use UFS, right? There are enough buffer headers and buffer KVA to have buffers allocated for the whole file content. Since buffers wire the corresponding pages, you get pages migrated to wired. When buffer pressure appears (i.e., any other I/O is started), the buffers will be repurposed and the pages moved to inactive. Could you please give me an explanation of active/inactive/wired memory? because I suspect that the current code does more harm than good. In theory, it saves activations of the page daemon. However, more often than not, I suspect that we are spending more on page reactivations than we are saving on page daemon activations. The sequential access detection heuristic is just too easily triggered. For example, I've seen it triggered by demand paging of the gcc text segment. Also, I think that pmap_remove_all() and especially vm_page_cache() are too severe for a detection heuristic that is so easily triggered. [snip] -- Andrey Zonov
Re: problems with mmap() and disk caching
On 05.04.2012 23:41, Konstantin Belousov wrote: On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote: On 05.04.2012 19:54, Alan Cox wrote: On 04/04/2012 02:17, Konstantin Belousov wrote: On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote: [snip] This is what I expect. But why doesn't this work without reading the file manually? Issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan. I'm pretty sure that the behavior here hasn't significantly changed in about twelve years. Otherwise, I agree with your analysis. On more than one occasion, I've been tempted to change: pmap_remove_all(mt); if (mt->dirty != 0) vm_page_deactivate(mt); else vm_page_cache(mt); to: vm_page_dontneed(mt); Thanks Alan! Now it works as I expect! But I have more questions for you and kib@. They are in my test below. So, prepare the file as earlier, and take information about memory usage from top(1). After preparation, but before the test: Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free First run: $ ./mmap /mnt/random mmap: 1 pass took: 7.462865 (none: 0; res: 262144; super: 0; other: 0) No super pages after the first run, why?.. Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free Now the file is in inactive memory, that's good. Second run: $ ./mmap /mnt/random mmap: 1 pass took: 0.004191 (none: 0; res: 262144; super: 511; other: 0) All super pages are here, nice. Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free Wow, all inactive pages moved to active and sit there even after the process was terminated, that's not good, what do you think? Why do you think this is 'not good'? You have plenty of free memory, there is no memory pressure, and all pages were referenced recently. There is no reason for them to be deactivated. I always thought that active memory is the sum of the resident memory of all processes, that inactive shows the disk cache, and that wired shows the kernel itself. Read the file: $ cat /mnt/random > /dev/null Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free Now the file is in wired memory. I do not understand why. You do use UFS, right? Yes. There are enough buffer headers and buffer KVA to have buffers allocated for the whole file content. Since buffers wire the corresponding pages, you get pages migrated to wired. When buffer pressure appears (i.e., any other I/O is started), the buffers will be repurposed and the pages moved to inactive. OK, how can I get the amount of disk cache? Could you please give me an explanation of active/inactive/wired memory? because I suspect that the current code does more harm than good. In theory, it saves activations of the page daemon. However, more often than not, I suspect that we are spending more on page reactivations than we are saving on page daemon activations. The sequential access detection heuristic is just too easily triggered. For example, I've seen it triggered by demand paging of the gcc text segment. Also, I think that pmap_remove_all() and especially vm_page_cache() are too severe for a detection heuristic that is so easily triggered. [snip] -- Andrey Zonov
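The counters behind top(1)'s Active/Inact/Wired/Buf columns can also be read programmatically. A small sketch, assuming the stock FreeBSD vm.stats.vm.* and vfs.bufspace sysctl names (this example is an editorial illustration, not part of the thread's test program):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <err.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Print one page-count sysctl, scaled from pages to megabytes. */
    static void
    print_counter(const char *name, long pagesize)
    {
            u_int pages;
            size_t len = sizeof(pages);

            if (sysctlbyname(name, &pages, &len, NULL, 0) == -1)
                    err(1, "sysctlbyname(%s)", name);
            printf("%s: %juM\n", name,
                (uintmax_t)pages * pagesize / (1024 * 1024));
    }

    int
    main(void)
    {
            long pagesize = getpagesize();
            long bufspace;
            size_t len = sizeof(bufspace);

            print_counter("vm.stats.vm.v_active_count", pagesize);
            print_counter("vm.stats.vm.v_inactive_count", pagesize);
            print_counter("vm.stats.vm.v_wire_count", pagesize);
            print_counter("vm.stats.vm.v_cache_count", pagesize);
            print_counter("vm.stats.vm.v_free_count", pagesize);
            /* The wired buffer cache kib describes shows up here. */
            if (sysctlbyname("vfs.bufspace", &bufspace, &len, NULL, 0) == 0)
                    printf("vfs.bufspace: %ldM\n", bufspace / (1024 * 1024));
            return (0);
    }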
Re: problems with mmap() and disk caching
On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote: [snip original test description and results, quoted in full above] This is what I expect. But why doesn't this work without reading the file manually? Issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan.
What happens is that the fault handler deactivates or caches the pages previous to the one which would satisfy the fault. See the if() statement starting at line 463 of vm/vm_fault.c. Since all pages of the object in your test are clean, the pages are cached. The next fault would need to allocate some more pages for a different index of the same object. What I see is that vm_reserv_alloc_page() returns a page that is from the cache for the same object, but a different pindex. As an obvious result, the page is invalidated and repurposed. When the next loop starts, the page is not resident anymore, so it has to be re-read from disk. The behaviour of the allocator is not consistent, so some pages are not reused, allowing the test to converge and to collect all pages of the object eventually. Calling madvise(MADV_RANDOM) fixes the issue, because the code to deactivate/cache the pages is turned off. On the other hand, it also turns off read-ahead for faulting, and the first loop becomes eternally long. Doing MADV_WILLNEED does not fix the problem indeed, since willneed reactivates the pages of the object at the time of call.
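From userland, opting a mapping out of this heuristic is a single call right after mmap(). A hedged sketch (the PROT_READ/MAP_PRIVATE flags are assumptions, since the attached test program is truncated; fd, p, and sb are its variables):

    if (fstat(fd, &sb) == -1)
            err(1, "fstat()");
    p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
            err(1, "mmap()");
    /* Disables the deactivate/cache-behind logic, at the cost of
     * fault-time read-ahead. */
    if (madvise(p, sb.st_size, MADV_RANDOM) == -1)
            err(1, "madvise()");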
Re: problems with mmap() and disk caching
On 04.04.2012 11:17, Konstantin Belousov wrote: Calling madvise(MADV_RANDOM) fixes the issue, because the code to deactivate/cache the pages is turned off. On the other hand, it also turns off read-ahead for faulting, and the first loop becomes eternally long. Now it takes 5 times longer. Anyway, thanks for the explanation. Doing MADV_WILLNEED does not fix the problem indeed, since willneed reactivates the pages of the object at the time of call. To use MADV_WILLNEED, you would need to call it between faults/memcpy. I played with it, but no luck so far. I've also never seen super pages, how to make them work? They just work, at least for me. Look at the output of procstat -v after enough loops have finished to not cause disk activity. The problem was in my test program. I fixed it; now I see super pages, but I'm still not satisfied. There are several tests below: 1. With madvise(MADV_RANDOM) I see almost all super pages: $ ./mmap /mnt/random-1024 5 mmap: 1 pass took: 26.438535 (none: 0; res: 262144; super: 511; other: 0) mmap: 2 pass took: 0.187311 (none: 0; res: 262144; super: 511; other: 0) mmap: 3 pass took: 0.184953 (none: 0; res: 262144; super: 511; other: 0) mmap: 4 pass took: 0.186007 (none: 0; res: 262144; super: 511; other: 0) mmap: 5 pass took: 0.185790 (none: 0; res: 262144; super: 511; other: 0) Should it be 512? 2. Without madvise(MADV_RANDOM): $ ./mmap /mnt/random-1024 50 mmap: 1 pass took: 7.629745 (none: 262112; res: 32; super: 0; other: 0) mmap: 2 pass took: 7.301720 (none: 261202; res: 942; super: 0; other: 0) mmap: 3 pass took: 7.261416 (none: 260226; res: 1918; super: 1; other: 0) [skip] mmap: 49 pass took: 0.155368 (none: 0; res: 262144; super: 323; other: 0) mmap: 50 pass took: 0.155438 (none: 0; res: 262144; super: 323; other: 0) Only 323 super pages. 3. If I just re-run the test I don't see super pages with any size of block. $ ./mmap /mnt/random-1024 5 $((1<<30)) mmap: 1 pass took: 1.013939 (none: 0; res: 262144; super: 0; other: 0) mmap: 2 pass took: 0.267082 (none: 0; res: 262144; super: 0; other: 0) mmap: 3 pass took: 0.270711 (none: 0; res: 262144; super: 0; other: 0) mmap: 4 pass took: 0.268940 (none: 0; res: 262144; super: 0; other: 0) mmap: 5 pass took: 0.269634 (none: 0; res: 262144; super: 0; other: 0) 4. If I activate madvise(MADV_WILLNEED) in the copy loop and re-run the test then I see super pages only if I use a block greater than 2 MB. $ ./mmap /mnt/random-1024 1 $((1<<21)) mmap: 1 pass took: 0.299722 (none: 0; res: 262144; super: 0; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<22)) mmap: 1 pass took: 0.271828 (none: 0; res: 262144; super: 170; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<23)) mmap: 1 pass took: 0.333188 (none: 0; res: 262144; super: 258; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<24)) mmap: 1 pass took: 0.339250 (none: 0; res: 262144; super: 303; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<25)) mmap: 1 pass took: 0.418812 (none: 0; res: 262144; super: 324; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<26)) mmap: 1 pass took: 0.360892 (none: 0; res: 262144; super: 335; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<27)) mmap: 1 pass took: 0.401122 (none: 0; res: 262144; super: 342; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<28)) mmap: 1 pass took: 0.478764 (none: 0; res: 262144; super: 345; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<29)) mmap: 1 pass took: 0.607266 (none: 0; res: 262144; super: 346; other: 0) $ ./mmap /mnt/random-1024 1 $((1<<30)) mmap: 1 pass took: 0.901269 (none: 0; res: 262144; super: 347; other: 0) 5.
If I activate madvise(MADV_WILLNEED) immediately after mmap() then I see some number of super pages (the number from test #2). $ ./mmap /mnt/random-1024 5 mmap: 1 pass took: 0.178666 (none: 0; res: 262144; super: 323; other: 0) mmap: 2 pass took: 0.158889 (none: 0; res: 262144; super: 323; other: 0) mmap: 3 pass took: 0.157229 (none: 0; res: 262144; super: 323; other: 0) mmap: 4 pass took: 0.156895 (none: 0; res: 262144; super: 323; other: 0) mmap: 5 pass took: 0.162938 (none: 0; res: 262144; super: 323; other: 0) 6. If I read the file manually before the test then I don't see super pages with any size of block, and madvise(MADV_WILLNEED) doesn't help. $ ./mmap /mnt/random-1024 5 $((1<<30)) mmap: 1 pass took: 0.996767 (none: 0; res: 262144; super: 0; other: 0) mmap: 2 pass took: 0.311129 (none: 0; res: 262144; super: 0; other: 0) mmap: 3 pass took: 0.317430 (none: 0; res: 262144; super: 0; other: 0) mmap: 4 pass took: 0.314437 (none: 0; res: 262144; super: 0; other: 0) mmap: 5 pass took: 0.310757 (none: 0; res: 262144; super: 0; other: 0)
Re: problems with mmap() and disk caching
I forgot to attach my test program. On 04.04.2012 13:36, Andrey Zonov wrote: [snip the superpage test results, quoted in full above] -- Andrey Zonov [snip attachment; the test program is reproduced, as far as it survives, with the original posting at the end of this digest]
problems with mmap() and disk caching
Hi, I open the file, then call mmap() on the whole file and get a pointer, then I work with this pointer. I expect that a page should be touched only once to get it into memory (disk cache?), but this doesn't work! I wrote the test (attached) and ran it for a 1G file generated from /dev/random; the result is the following: Prepare file: # swapoff -a # newfs /dev/ada0b # mount /dev/ada0b /mnt # dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024 Purge cache: # umount /mnt # mount /dev/ada0b /mnt Run test: $ ./mmap /mnt/random-1024 30 [snip the 30-pass results, quoted in full above] If I run this: $ cat /mnt/random-1024 > /dev/null before the test, then the result is the following: $ ./mmap /mnt/random-1024 5 [snip the 5-pass results, quoted in full above] This is what I expect. But why doesn't this work without reading the file manually? I've also never seen super pages, how to make them work? I've been playing with madvise and posix_fadvise but no luck.
BTW, posix_fadvise(POSIX_FADV_WILLNEED) does nothing, as the commentary says; shouldn't this be documented in the manual page? All tests were run under 9.0-STABLE (r233744). -- Andrey Zonov

/*_
 * Andrey Zonov (c) 2011
 */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int i;
	int fd;
	int num;
	int block;
	int pagesize;
	size_t n;
	size_t size;
	size_t none, incore, super, other;
	char *p;
	char *tmp;
	char *vec;
	char *vecp;
	struct stat sb;
	struct timeval tp, tp1, tp2;

	if (argc < 2 || argc > 4)
		errx(1, "usage: mmap filename [num] [block]");

	fd = open(argv[1], O_RDONLY);
	if (fd == -1)
		err(1, "open()");

	num = 1;
	if (argc >= 3)
		num = atoi(argv[2]);

	pagesize = getpagesize();
	block = pagesize;
	if (argc == 4)
		block = atoi(argv[3]);
	if
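The attachment is cut off at the dangling "if" above. Based on the declared variables and the program's output format, the remainder plausibly looked roughly like the sketch below. This is a hypothetical reconstruction, not the original source: it additionally needs #include <stdio.h> for printf, the mmap flags are assumptions, and the scaling of the superpage count to whole 2 MB superpages is inferred from the reported numbers.

	if (block % pagesize != 0)
		errx(1, "block must be a multiple of the page size");

	if (fstat(fd, &sb) == -1)
		err(1, "fstat()");
	size = sb.st_size;

	p = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		err(1, "mmap()");

	tmp = malloc(block);
	vec = malloc(size / pagesize);
	if (tmp == NULL || vec == NULL)
		err(1, "malloc()");

	for (i = 1; i <= num; i++) {
		/* Touch every page by copying block-sized chunks out. */
		gettimeofday(&tp1, NULL);
		for (n = 0; n < size; n += block)
			memcpy(tmp, p + n,
			    (size - n < (size_t)block) ? size - n : (size_t)block);
		gettimeofday(&tp2, NULL);
		timersub(&tp2, &tp1, &tp);

		/* Classify every page of the mapping.  Note that pages on
		 * the cache queue come back as 0, i.e. they count as
		 * "none" even though their contents are still in RAM. */
		if (mincore(p, size, vec) == -1)
			err(1, "mincore()");
		none = incore = super = other = 0;
		for (vecp = vec; vecp < vec + size / pagesize; vecp++) {
			if (*vecp == 0)
				none++;
			else if (*vecp & MINCORE_INCORE)
				incore++;
			else
				other++;
			if (*vecp & MINCORE_SUPER)
				super++;
		}
		/* Report "super" in whole superpages, assuming 2 MB
		 * superpages (512 base pages on amd64). */
		printf("mmap: %d pass took: %ld.%06ld (none: %zu; res: %zu; "
		    "super: %zu; other: %zu)\n", i, (long)tp.tv_sec,
		    (long)tp.tv_usec, none, incore,
		    super / ((2 * 1024 * 1024) / pagesize), other);
	}

	munmap(p, size);
	close(fd);
	return (0);
}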