Re: mmap() question
On 12.10.2013, at 18:14, Konstantin Belousov kostik...@gmail.com wrote:

>> First I tried with some swap space configured. The OS started to swap out my process after it reached about 20GB, which is also not what I expected: what is the reason to swap out regions of read-only mmap()ed files? Is it the expected behaviour?
>
> How did you conclude that the pages from your r/o mappings were paged out? VM never does this. Only anonymous memory can be written to the swap file, including the shadow pages for writeable COW mappings. I suspect that you had another 20GB of something else in use on the machine at the same time.

Yes, sorry, I tried again with swap space configured, and it really is some other processes that are being swapped out: sshd, other users' shells, etc.

>>> Below is the prototype patch, against HEAD. It is not applicable to stable, please use a HEAD kernel for testing.

I tried your patch on a stable/10 system and I can confirm that my process is no longer killed because of OOM.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: mmap() question
On Fri, Oct 11, 2013 at 09:57:24AM +0400, Dmitry Sivachenko wrote:
> On 11.10.2013, at 9:17, Konstantin Belousov kostik...@gmail.com wrote:
>> On Wed, Oct 09, 2013 at 03:42:27PM +0400, Dmitry Sivachenko wrote:
>>> Hello!
>>>
>>> I have a program which mmap()s a lot of large files (total size more than RAM, and I have no swap), but it needs only small parts of those files at a time.
>>>
>>> My understanding is that when using mmap, when I access some memory region the OS reads the relevant portion of that file from disk and caches the result in memory. If there is no free memory, the OS will purge a previously read part of an mmap'ed file to free memory for the new chunk.
>>>
>>> But this is not the case. I use the following simple program, which gets a list of files as command-line arguments, mmap()s them all, and then selects a random file and a random 1K part of that file and computes an XOR of the bytes from that region. After some time the program dies:
>>>
>>> pid 63251 (a.out), uid 1232, was killed: out of swap space
>>>
>>> It seems I understand incorrectly how mmap() works; can you please clarify what's going wrong?
>>>
>>> I expect that program to run indefinitely, purging some regions out of RAM and reading the relevant parts of files.
>>
>> You did not specify several very important parameters for your test:
>> 1. total amount of RAM installed
>
> 24GB
>
>> 2. count of the test files and size of the files
>
> To be precise: I used 57 files with sizes varying from 74MB to 19GB. The total size of these files is 270GB.
>
>> 3. which filesystem the files are located on
>
> UFS @ SSD drive
>
>> 4. version of the system.
>
> FreeBSD 9.2-PRERELEASE #0 r254880M: Wed Aug 28 11:07:54 MSK 2013

I was not able to reproduce the situation locally. I even tried to start a lot of threads accessing the mapped regions, to try to outrun the pagedaemon. The user threads sleep on the disk read, while the pagedaemon has a lot of time to rebalance the queues. It might be a case where the SSD indeed makes a difference.

Still, I see how this situation could appear. The code which triggers OOM never fires if there is free space in the swap file, so the absence of swap is a necessary condition to trigger the bug. Next, the OOM calculation does not account for the possibility that almost all pages on the queues can be reused. It just fires if free pages are depleted too much or the free target cannot be reached.

IMO one of the possible solutions is to account for the queued pages in addition to the swap space. This is not entirely accurate, since some pages on the queues cannot be reused, at least transiently. The most precise algorithm would count the held and busy pages globally and subtract this count from the queue lengths, but it is probably too costly. Instead, I think we could rely on the numbers which are counted by the pagedaemon threads during their passes. Due to the transient nature of the pagedaemon failures, this should be fine.

Below is the prototype patch, against HEAD. It is not applicable to stable, please use a HEAD kernel for testing.

diff --git a/sys/sys/vmmeter.h b/sys/sys/vmmeter.h
index d2ad920..ee5159a 100644
--- a/sys/sys/vmmeter.h
+++ b/sys/sys/vmmeter.h
@@ -93,9 +93,10 @@ struct vmmeter {
 	u_int v_free_min;	/* (c) pages desired free */
 	u_int v_free_count;	/* (f) pages free */
 	u_int v_wire_count;	/* (a) pages wired down */
-	u_int v_active_count;	/* (q) pages active */
+	u_int v_active_count;	/* (a) pages active */
 	u_int v_inactive_target; /* (c) pages desired inactive */
-	u_int v_inactive_count;	/* (q) pages inactive */
+	u_int v_inactive_count;	/* (a) pages inactive */
+	u_int v_queue_sticky;	/* (a) pages on queues but cannot process */
 	u_int v_cache_count;	/* (f) pages on cache queue */
 	u_int v_cache_min;	/* (c) min pages desired on cache queue */
 	u_int v_cache_max;	/* (c) max pages in cached obj (unused) */
diff --git a/sys/vm/vm_meter.c b/sys/vm/vm_meter.c
index 713a2be..4bb1f1f 100644
--- a/sys/vm/vm_meter.c
+++ b/sys/vm/vm_meter.c
@@ -316,6 +316,7 @@ VM_STATS_VM(v_active_count, "Active pages");
 VM_STATS_VM(v_inactive_target, "Desired inactive pages");
 VM_STATS_VM(v_inactive_count, "Inactive pages");
 VM_STATS_VM(v_cache_count, "Pages on cache queue");
+VM_STATS_VM(v_queue_sticky, "Pages which cannot be moved from queues");
 VM_STATS_VM(v_cache_min, "Min pages on cache queue");
 VM_STATS_VM(v_cache_max, "Max pages on cached queue");
 VM_STATS_VM(v_pageout_free_min, "Min pages reserved for kernel");
diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h
index 7846702..6943a0e 100644
--- a/sys/vm/vm_page.h
+++ b/sys/vm/vm_page.h
@@ -226,6 +226,7 @@ struct vm_domain {
 	long vmd_segs;	/* bitmask of the segments */
 	boolean_t vmd_oom;
 	int vmd_pass;	/* local pagedaemon pass */
+	int vmd_queue_sticky;	/* pages on queues which cannot be processed */
 	struct vm_page vmd_marker; /* marker for pagedaemon private
Re: mmap() question
On 12.10.2013, at 13:59, Konstantin Belousov kostik...@gmail.com wrote:

> I was not able to reproduce the situation locally. I even tried to start a lot of threads accessing the mapped regions, to try to outrun the pagedaemon. The user threads sleep on the disk read, while the pagedaemon has a lot of time to rebalance the queues. It might be a case where the SSD indeed makes a difference.

With an ordinary SATA drive it will take hours just to read 20GB of data from disk because of the random access: it will do a lot of seeks, and the reading speed will be extremely low. An SSD dramatically improves reading speed.

> Still, I see how this situation could appear. The code which triggers OOM never fires if there is free space in the swap file, so the absence of swap is a necessary condition to trigger the bug. Next, the OOM calculation does not account for the possibility that almost all pages on the queues can be reused. It just fires if free pages are depleted too much or the free target cannot be reached.

First I tried with some swap space configured. The OS started to swap out my process after it reached about 20GB, which is also not what I expected: what is the reason to swap out regions of read-only mmap()ed files? Is it the expected behaviour?

> Below is the prototype patch, against HEAD. It is not applicable to stable, please use a HEAD kernel for testing.

Thanks, I will test the patch soon and report the results.
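A rough estimate shows why the disk type matters here. It assumes each random 1K access faults in one 4 KiB page with its own disk read, and assumes about 100 random reads per second for a rotating SATA disk versus about 20,000 for an SSD; these are illustrative figures, not measurements from the thread, and any read-ahead the VM applies to faults would change them.

```latex
\frac{20\ \text{GiB}}{4\ \text{KiB}} = \frac{20 \cdot 2^{30}}{2^{12}} = 5\,242\,880\ \text{reads}

\frac{5\,242\,880}{100\ \text{s}^{-1}} \approx 52\,429\ \text{s} \approx 14.6\ \text{h},
\qquad
\frac{5\,242\,880}{20\,000\ \text{s}^{-1}} \approx 262\ \text{s} \approx 4.4\ \text{min}
```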
Re: mmap() question
On Sat, Oct 12, 2013 at 04:04:31PM +0400, Dmitry Sivachenko wrote:
> On 12.10.2013, at 13:59, Konstantin Belousov kostik...@gmail.com wrote:
>> I was not able to reproduce the situation locally. I even tried to start a lot of threads accessing the mapped regions, to try to outrun the pagedaemon. The user threads sleep on the disk read, while the pagedaemon has a lot of time to rebalance the queues. It might be a case where the SSD indeed makes a difference.
>
> With an ordinary SATA drive it will take hours just to read 20GB of data from disk because of the random access: it will do a lot of seeks, and the reading speed will be extremely low. An SSD dramatically improves reading speed.
>
>> Still, I see how this situation could appear. The code which triggers OOM never fires if there is free space in the swap file, so the absence of swap is a necessary condition to trigger the bug. Next, the OOM calculation does not account for the possibility that almost all pages on the queues can be reused. It just fires if free pages are depleted too much or the free target cannot be reached.
>
> First I tried with some swap space configured. The OS started to swap out my process after it reached about 20GB, which is also not what I expected: what is the reason to swap out regions of read-only mmap()ed files? Is it the expected behaviour?

How did you conclude that the pages from your r/o mappings were paged out? VM never does this. Only anonymous memory can be written to the swap file, including the shadow pages for writeable COW mappings. I suspect that you had another 20GB of something else in use on the machine at the same time.

>> Below is the prototype patch, against HEAD. It is not applicable to stable, please use a HEAD kernel for testing.
>
> Thanks, I will test the patch soon and report the results.
Re: mmap() question
On 11.10.2013, at 9:17, Konstantin Belousov kostik...@gmail.com wrote:
> On Wed, Oct 09, 2013 at 03:42:27PM +0400, Dmitry Sivachenko wrote:
>> Hello!
>>
>> I have a program which mmap()s a lot of large files (total size more than RAM, and I have no swap), but it needs only small parts of those files at a time.
>>
>> My understanding is that when using mmap, when I access some memory region the OS reads the relevant portion of that file from disk and caches the result in memory. If there is no free memory, the OS will purge a previously read part of an mmap'ed file to free memory for the new chunk.
>>
>> But this is not the case. I use the following simple program, which gets a list of files as command-line arguments, mmap()s them all, and then selects a random file and a random 1K part of that file and computes an XOR of the bytes from that region. After some time the program dies:
>>
>> pid 63251 (a.out), uid 1232, was killed: out of swap space
>>
>> It seems I understand incorrectly how mmap() works; can you please clarify what's going wrong?
>>
>> I expect that program to run indefinitely, purging some regions out of RAM and reading the relevant parts of files.
>
> You did not specify several very important parameters for your test:
> 1. total amount of RAM installed

24GB

> 2. count of the test files and size of the files

To be precise: I used 57 files with sizes varying from 74MB to 19GB. The total size of these files is 270GB.

> 3. which filesystem the files are located on

UFS @ SSD drive

> 4. version of the system.

FreeBSD 9.2-PRERELEASE #0 r254880M: Wed Aug 28 11:07:54 MSK 2013
Re: mmap() question
On Wed, Oct 09, 2013 at 03:42:27PM +0400, Dmitry Sivachenko wrote:
> Hello!
>
> I have a program which mmap()s a lot of large files (total size more than RAM, and I have no swap), but it needs only small parts of those files at a time.
>
> My understanding is that when using mmap, when I access some memory region the OS reads the relevant portion of that file from disk and caches the result in memory. If there is no free memory, the OS will purge a previously read part of an mmap'ed file to free memory for the new chunk.
>
> But this is not the case. I use the following simple program, which gets a list of files as command-line arguments, mmap()s them all, and then selects a random file and a random 1K part of that file and computes an XOR of the bytes from that region. After some time the program dies:
>
> pid 63251 (a.out), uid 1232, was killed: out of swap space
>
> It seems I understand incorrectly how mmap() works; can you please clarify what's going wrong?
>
> I expect that program to run indefinitely, purging some regions out of RAM and reading the relevant parts of files.

You did not specify several very important parameters for your test:
1. total amount of RAM installed
2. count of the test files and size of the files
3. which filesystem the files are located on
4. version of the system.
mmap() question
Hello!

I have a program which mmap()s a lot of large files (total size more than RAM, and I have no swap), but it needs only small parts of those files at a time.

My understanding is that when using mmap, when I access some memory region the OS reads the relevant portion of that file from disk and caches the result in memory. If there is no free memory, the OS will purge a previously read part of an mmap'ed file to free memory for the new chunk.

But this is not the case. I use the following simple program, which gets a list of files as command-line arguments, mmap()s them all, and then selects a random file and a random 1K part of that file and computes an XOR of the bytes from that region. After some time the program dies:

pid 63251 (a.out), uid 1232, was killed: out of swap space

It seems I understand incorrectly how mmap() works; can you please clarify what's going wrong?

I expect that program to run indefinitely, purging some regions out of RAM and reading the relevant parts of files.

Thanks!

/* Build with: cc prog.c -lm  (floor() lives in libm) */
#include <err.h>
#include <fcntl.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_FILES 500

struct f_data {
    char *beg;      /* start of the read-only mapping */
    off_t size;     /* file size in bytes */
};

int main(int argc, char* argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s file ...\n", argv[0]);
        exit(1);
    }
    if (argc - 1 > MAX_FILES)
        errx(1, "at most %d files are supported", MAX_FILES);
    int i, j, fd;
    struct stat st;
    struct f_data FILES[MAX_FILES];
    int NUM_FILES;
    void *p;
    NUM_FILES = argc - 1;
    /* Map every file read-only; nothing is read from disk yet. */
    for (i = 1; i < argc; i++) {
        printf("%s... ", argv[i]);
        if ((fd = open(argv[i], O_RDONLY)) < 0)
            err(1, "open");
        if (fstat(fd, &st) != 0)
            err(1, "fstat");
        /* MAP_SHARED: plain file-backed mapping; MAP_NOCORE (FreeBSD):
           leave these pages out of core dumps. */
        if ((p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED | MAP_NOCORE, fd, 0)) == MAP_FAILED)
            err(1, "mmap");
        FILES[i-1].beg = (char*)p;
        FILES[i-1].size = st.st_size;
        if (msync(p, st.st_size, MS_INVALIDATE) != 0)
            err(1, "msync");
        printf("Ok.\n");
    }
    /* volatile so the compiler cannot drop the otherwise unused reads */
    volatile char chk = 0;
    while (1) {
        /* random() returns 0 .. 2^31 - 1; dividing by 2^31 keeps both
           indices strictly below their bounds. Assumes every file is
           larger than 1K. */
        int rf = floor((double)random() / 2147483648.0 * NUM_FILES);
        off_t offs = floor((double)random() / 2147483648.0 * (FILES[rf].size - 1024));
        for (j = 0; j < 1024; j++)
            chk ^= *(FILES[rf].beg + offs + j);
    }
    return 0;
}
Re: mmap() question
On Wed, 9 Oct 2013 15:42:27 +0400
Dmitry Sivachenko wrote:

> Hello!
>
> I have a program which mmap()s a lot of large files (total size more than RAM, and I have no swap), but it needs only small parts of those files at a time.
>
> My understanding is that when using mmap, when I access some memory region the OS reads the relevant portion of that file from disk and caches the result in memory. If there is no free memory, the OS will purge a previously read part of an mmap'ed file to free memory for the new chunk.
> ...
> It seems I understand incorrectly how mmap() works; can you please clarify what's going wrong?
>
> I expect that program to run indefinitely, purging some regions out of RAM and reading the relevant parts of files.

I think your problem is that you are accessing the memory so rapidly that the pages can't even get out of the active queue. The VM system isn't optimized for this kind of abnormal access.
/dev/dsp mmap question
Guys,

if we do the following on FreeBSD (pseudo-code):

    fd = open("/dev/dsp", O_RDWR);
    mmap(PROT_READ, fd);
    mmap(PROT_WRITE, fd);

this won't work entirely correctly, right?

I base my question on some observations of how a particular program behaves on FreeBSD and on the following comment in sys/dev/sound/pcm/dsp.c:

    /*
     * XXX The linux api uses the nprot to select read/write buffer
     * our vm system doesn't allow this, so force write buffer.
     *
     * This is just a quack to fool full-duplex mmap, so that at
     * least playback _or_ recording works. If you really got the
     * urge to make _both_ work at the same time, avoid O_RDWR.
     * Just open each direction separately and mmap() it.
     *
     * Failure is not an option due to INVARIANTS check within
     * device_pager.c, which means, we have to give up one over
     * another.
     */

P.S. Is this something that can easily be fixed or not?
--
Andriy Gapon
Re: KLD mmap question
On Thu, 30 May 2002, Tom Tang wrote:

> Thanks for the reply, I'll check it out. However, if you'll notice in my previous mail, I stated that I was trying to contigmalloc 4K... Hard to believe that the system doesn't have 4K lying around.

Good point. Contigmallocing a page is pretty silly, since you get a page anyway. Why not just use normal malloc?

Doug White | FreeBSD: The Power to Serve
[EMAIL PROTECTED] | www.FreeBSD.org

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message
KLD mmap question
Hello,

I have a question about implementing mmap functions in device drivers. Thinking it would be simple, I contigmalloc'd a buffer of PAGE_SIZE and returned it using atop like other mmap device implementations. However, when my userland program mmaps the device with offset 0 and I try accessing the returned pointer, I get an invalid memory address error.

Any help would be appreciated...

- Tom

--
Tom Tang
tangj AT cs DOT ucdavis DOT edu
Re: KLD mmap question
On Thu, 30 May 2002, Tom Tang wrote:

> I have a question about implementing mmap functions in device drivers. Thinking it would be simple, I contigmalloc'd a buffer of PAGE_SIZE and returned it using atop like other mmap device implementations. However, when my userland program mmaps the device with offset 0 and I try accessing the returned pointer, I get an invalid memory address error.
>
> Any help would be appreciated...

Generally, after the machine is started, memory gets too fragmented to use contigmalloc. If you preload the module and do the contigmalloc at attach time, it should succeed.

Doug White | FreeBSD: The Power to Serve
[EMAIL PROTECTED] | www.FreeBSD.org
Re: KLD mmap question
Doug,

Thanks for the reply, I'll check it out. However, if you'll notice in my previous mail, I stated that I was trying to contigmalloc 4K... Hard to believe that the system doesn't have 4K lying around.

- Tom

On Thu, 30 May 2002, Doug White wrote:

> On Thu, 30 May 2002, Tom Tang wrote:
>
>> I have a question about implementing mmap functions in device drivers. Thinking it would be simple, I contigmalloc'd a buffer of PAGE_SIZE and returned it using atop like other mmap device implementations. However, when my userland program mmaps the device with offset 0 and I try accessing the returned pointer, I get an invalid memory address error.
>>
>> Any help would be appreciated...
>
> Generally, after the machine is started, memory gets too fragmented to use contigmalloc. If you preload the module and do the contigmalloc at attach time, it should succeed.

--
Tom Tang
[EMAIL PROTECTED]
Re: KLD mmap question
On Thu, 2002-05-30 at 13:14, Tom Tang wrote:

> Hello,
>
> I have a question about implementing mmap functions in device drivers. Thinking it would be simple, I contigmalloc'd a buffer of PAGE_SIZE and returned it using atop like other mmap device implementations. However, when my userland program mmaps the device with offset 0 and I try accessing the returned pointer, I get an invalid memory address error.
>
> Any help would be appreciated...

An associated question: while I was looking at the DRM's mmap for the shared memory area, I think I figured out that memory which is going to be mmapped doesn't need to be physically contiguous (since the device pager gets each page of it separately). Was I right?

--
Eric Anholt [EMAIL PROTECTED]
http://gladstone.uoregon.edu/~eanholt/dri/
Re: mmap question
In article 000101bec73c$e20e3660$[EMAIL PROTECTED], Kelly Yancey [EMAIL PROTECTED] wrote:

> Also, in case it hasn't been noticed already (I'm running -stable from May 18th), the mmap(2) manpage has a typo: it has "#include sys/mman.h"

So what's the typo, exactly?

John
--
John Polstra [EMAIL PROTECTED]
John D. Polstra Co., Inc. Seattle, Washington USA
"No matter how cynical I get, I just can't keep up." -- Nora Ephron
Re: mmap question
: Which is fine and dandy, I'll just stat() the file to get the filesize and
: mmap() it. But what happens if someone comes along and replaces the file
: with a larger file? I understand that my view of the file will change to the
: new file, but only the length that I mmap()ed originally. Do I have to
: periodically stat() the file to determine if I need to re-mmap() it should
: the file size change? And if so, doesn't that partly diminish the usefulness
: of mmap()? I mean, sure you can edit the file as a file and the edits are
: reflected in the in-memory image, but how many edits don't change the file
: size?
:
: Kelly
: ~[EMAIL PROTECTED]~

You can mmap() an area that is larger than the file. For example, you could mmap a 100-byte file into a 32MB area. If the file then grows, you can access the new data up to the amount of space you reserved. However, accessing pages beyond the file EOF via the mmap() will result in a segfault. This is also true if a file is truncated out from under you: previously valid data pages will disappear.

If you mmap() the exact size of a file and the file grows, you have to mmap() the new area of the file, or unmap the old area and remap the entire file, to gain access to the additional data. You can mmap() areas of a file in a piecemeal fashion, though this should not be taken to extremes since it will slow down page-fault handling.

Most programs using mmap() use it on files which are not expected to change out from under the program's control. Thus most programs using mmap() simply map the file's full size and do not try to do anything fancy.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
mmap question
I have a quick question about mmap; hopefully someone can smack me and point out what I'm missing :)

The man page says:

    The mmap() function causes the pages starting at addr and continuing for at most len bytes to be mapped from the object described by fd, starting at byte offset offset. If len is not a multiple of the pagesize, the mapped region may extend past the specified range. Any such extension beyond the end of the mapped object will be zero-filled.

Which is fine and dandy, I'll just stat() the file to get the filesize and mmap() it. But what happens if someone comes along and replaces the file with a larger file? I understand that my view of the file will change to the new file, but only the length that I mmap()ed originally. Do I have to periodically stat() the file to determine if I need to re-mmap() it should the file size change? And if so, doesn't that partly diminish the usefulness of mmap()? I mean, sure you can edit the file as a file and the edits are reflected in the in-memory image, but how many edits don't change the file size?

Also, in case it hasn't been noticed already (I'm running -stable from May 18th), the mmap(2) manpage has a typo: it has #include sys/mman.h

Thanks for your help,

Kelly
~kby...@posi.net~
FreeBSD - The Power To Serve - http://www.freebsd.org/
Join Team FreeBSD - http://www.posi.net/freebsd/Team-FreeBSD