Re: problems with mmap() and disk caching
On 06.04.2012 12:13, Konstantin Belousov wrote: On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote: On 05.04.2012 23:41, Konstantin Belousov wrote: On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote: On 05.04.2012 19:54, Alan Cox wrote: On 04/04/2012 02:17, Konstantin Belousov wrote: On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote: [snip]

This is what I expect. But why doesn't this work without reading the file manually?

The issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in about twelve years. Otherwise, I agree with your analysis. On more than one occasion, I've been tempted to change:

    pmap_remove_all(mt);
    if (mt->dirty != 0)
        vm_page_deactivate(mt);
    else
        vm_page_cache(mt);

to:

    vm_page_dontneed(mt);

Thanks Alan! Now it works as I expect! But I have more questions for you and kib@; they are in my test below. So, prepare the file as earlier, and take the memory usage information from top(1).

After preparation, but before the test:
Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free

First run:
$ ./mmap /mnt/random
mmap: 1 pass took: 7.462865 (none: 0; res: 262144; super: 0; other: 0)

No super pages after the first run, why?..

Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free

Now the file is in inactive memory, that's good.

Second run:
$ ./mmap /mnt/random
mmap: 1 pass took: 0.004191 (none: 0; res: 262144; super: 511; other: 0)

All super pages are here, nice.

Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free

Wow, all inactive pages moved to active and sit there even after the process was terminated; that's not good, what do you think?

Why do you think this is 'not good'? You have plenty of free memory, there is no memory pressure, and all pages were referenced recently. There is no reason for them to be deactivated.

I always thought that active memory is the sum of the resident memory of all processes, inactive shows the disk cache, and wired shows the kernel itself.

So you are wrong. Both active and inactive memory can be mapped and not mapped, and both can belong to a vnode or to anonymous objects, etc. The active/inactive distinction reflects only the amount of references noted by the pagedaemon, or some other page history, like the way it was unwired. Wired does not necessarily mean kernel-used pages; user processes can wire their pages as well.

Let's talk about that in detail. My understanding is the following:

Active memory: the memory which is referenced by an application. An application may get memory only through mmap() (allocators don't use brk()/sbrk() any more). The resident memory of an application is the sum of physically used memory. So, the sum of RSS is active memory.

Inactive memory: the memory which has no references. Once we call read() on the file, the file is in inactive memory, because we have no references to this object, we just read it. This is also memory released by free().

Cache memory: I don't know what it is. It's always small enough not to think about it.

Wired memory: kernel memory, and yes, an application may get wired memory through mlock()/mlockall(), but I haven't seen any real application which calls mlock().

Read the file:
$ cat /mnt/random > /dev/null
Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free

Now the file is in wired memory. I do not understand why.

You do use UFS, right? Yes. There are enough buffer headers and buffer KVA to have buffers allocated for the whole file content.
Since buffers wire the corresponding pages, you get pages migrated to wired. When buffer pressure appears (i.e., any other i/o is started), the buffers will be repurposed and the pages moved to inactive.

OK, how can I get the amount of disk cache?

You cannot. At least I am not aware of any counter that keeps track of the resident pages belonging to the vnode pager. Buffers should not be thought of as disk cache; pages cache disk content. Instead, VMIO buffers only provide a bread()/bwrite()-compatible interface to the page cache (*) for filesystems. (*) - The cache term is used in a generic sense here, not to be confused with the cached pages counter from top etc.

Yes, I know that. I will try once again to ask my question about buffers. Is it reasonable to use 10% of the physical memory for them, or may we set a rational upper limit automatically? Could you please give me an explanation about active/inactive/wired memory?

because I suspect that the current code does more harm than good. In theory, it saves activations of the page daemon. However, more often than not, I suspect that we are spending more on page reactivations than we are saving on page daemon activations. The sequential access detection heuristic is just too easily triggered. For example, I've seen it triggered by demand paging of the gcc text segment. Also, I think that pmap_remove_all() and especially
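For readers reproducing this exchange: while there is no counter for vnode-pager resident pages, the per-queue totals that top(1) itself displays are exported through sysctl, so the page migrations described above can be watched from the command line. The counter names below are from the stock vm.stats tree; values are page counts, not bytes:

    $ sysctl vm.stats.vm.v_active_count vm.stats.vm.v_inactive_count \
          vm.stats.vm.v_wire_count vm.stats.vm.v_cache_count \
          vm.stats.vm.v_free_count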
Re: problems with mmap() and disk caching
On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote: On 06.04.2012 12:13, Konstantin Belousov wrote: [snip]

Let's talk about that in detail. My understanding is the following:

Active memory: the memory which is referenced by an application.

Assuming the part 'by application' is removed, this sentence is almost right. Any managed mapping of the page participates in the active references.

An application may get memory only through mmap() (allocators don't use brk()/sbrk() any more). The resident memory of an application is the sum of physically used memory. So, the sum of RSS is active memory.

First, brk/sbrk is still used. Second, there is no requirement that resident pages are referenced. E.g. a page could have participated in a buffer, and unwiring on the buffer dissolve put it into the inactive state. Or the pagedaemon cleared the reference and moved the page to the inactive queue. Or the page was prefaulted by different optimizations. Moreover, there is a subtle difference between 'resident' and 'not causing a fault on access'. A page may be resident, but the pte was not preinstalled, or the pte was flushed, etc.

Inactive memory: the memory which has no references.
Once we call read() on the file, the file is in inactive memory, because we have no references to this object, we just read it. This is also memory released by free().

On buffer dissolve, the buffer cache explicitly puts the pages constituting the buffer into the inactive queue. In fact, this is not quite right: e.g., if the same pages are mapped and actively referenced, then the pagedaemon has slightly more work now to move the page from inactive to active. And free(3) operates at so much higher a level than the vm subsystem that describing the interaction between the two is impossible in any definitive way. Old naive mallocs put the block description at the beginning of the block, actually causing free() to reference at least the first page of the block. Jemalloc often does madvise(MADV_FREE) for large freed allocations. MADV_FREE moves pages between queues probabilistically.

Cache memory: I don't know what it is. It's always small enough not to think about it.

This was the bug you reported, and which Alan fixed on Sunday.

Wired memory: kernel memory, and yes, an application may get wired memory through mlock()/mlockall(), but I haven't seen any real application which calls mlock().

ntpd and amd from the base system do. gpg and similar programs try to mlock the key store to avoid sensitive material leakage to the
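To make the jemalloc behaviour Konstantin mentions concrete, here is a minimal, self-contained sketch of the MADV_FREE pattern; it is illustrative only, and jemalloc's actual retention policy is more involved:

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <err.h>

    int
    main(void)
    {
    	size_t len = 32 * 1024 * 1024;	/* a "large" allocation */
    	char *run = mmap(NULL, len, PROT_READ | PROT_WRITE,
    	    MAP_ANON | MAP_PRIVATE, -1, 0);
    	if (run == MAP_FAILED)
    		err(1, "mmap");
    	run[0] = 1;			/* dirty at least one page */
    	/* "Free" the run without unmapping it: the kernel may now
    	   reclaim the pages without laundering them to swap. */
    	if (madvise(run, len, MADV_FREE) != 0)
    		err(1, "madvise");
    	return (0);
    }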
Re: problems with mmap() and disk caching
On Mon, Apr 9, 2012 at 1:18 PM, Konstantin Belousov kostik...@gmail.com wrote: On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote: On 06.04.2012 12:13, Konstantin Belousov wrote: [snip]

First, brk/sbrk is still used. Second, there is no requirement that resident pages are referenced. E.g. a page could have participated in a buffer, and unwiring on the buffer dissolve put it into the inactive state. Or the pagedaemon cleared the reference and moved the page to the inactive queue. Or the page was prefaulted by different optimizations. Moreover, there is a subtle difference between 'resident' and 'not causing a fault on access'. A page may be resident, but the pte was not preinstalled, or the pte was flushed, etc.

From the user point of view: how can the memory be active if no one (I mean no application) uses it? What I have really seen, more than once, is that a program which had worked for a long time with a big mmap()'ed file couldn't work well (many page faults) with a new version of the file until I manually flushed active memory by re-mounting the FS. The new version couldn't force out the old one. In my opinion, if the VM moved cached objects to the inactive queue after program termination, I wouldn't see this problem.

On buffer dissolve, the buffer cache explicitly puts the pages constituting the buffer into the inactive queue. In fact, this is not quite right: e.g., if the same pages are mapped and actively referenced, then the pagedaemon has slightly more work now to move the page from inactive to active.

Yes, sure, if someone else uses the object it should be active, and it would be even better to introduce a new SHARED counter, like the one in MacOSX and Linux.

And free(3) operates at so much higher a level than the vm subsystem that describing the interaction between the two is impossible in any definitive way. Old naive mallocs put the block description at the beginning of the block, actually causing free() to reference at least the first page of the block. Jemalloc often does madvise(MADV_FREE) for large freed allocations. MADV_FREE moves pages between queues probabilistically.

That's exactly what I meant by free(). We drop act_count to 0 and move the page to the inactive queue with vm_page_dontneed().

Cache memory: I don't know what it is. It's always small enough not to think about it.
This was the bug you reported, and which Alan fixed on Sunday.

I've tested this patch under 9.0-STABLE and should say that it introduces problems with interactivity on heavily disk-loaded machines. With the patch that I tested before, I didn't observe such problems.

Wired memory: kernel memory, and yes, an application may get wired memory through mlock()/mlockall(), but I haven't seen any real application which calls mlock().

ntpd and amd from the base system do. gpg and similar programs try to mlock the key store to avoid sensitive material leakage to the swap. cdrecord(8) tried to mlock itself to avoid indefinite stalls during write.

Nice catch ;-) [snip]
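As a footnote to those examples, the call involved is small; here is a sketch of the gpg-style usage (wiring a buffer of key material so it cannot be paged out; needs root or a sufficient RLIMIT_MEMLOCK):

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <err.h>
    #include <string.h>

    int
    main(void)
    {
    	static char key[4096];			/* pretend key material */
    	if (mlock(key, sizeof(key)) != 0)	/* page becomes wired */
    		err(1, "mlock");
    	/* ... use the key ... */
    	memset(key, 0, sizeof(key));		/* scrub before unwiring */
    	if (munlock(key, sizeof(key)) != 0)
    		err(1, "munlock");
    	return (0);
    }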
Re: problems with mmap() and disk caching
On Mon, Apr 09, 2012 at 03:35:30PM +0400, Andrey Zonov wrote: On Mon, Apr 9, 2012 at 1:18 PM, Konstantin Belousov kostik...@gmail.com wrote: [snip]

From the user point of view: how can the memory be active if no one (I mean no application) uses it? What I have really seen, more than once, is that a program which had worked for a long time with a big mmap()'ed file couldn't work well (many page faults) with a new version of the file until I manually flushed active memory by re-mounting the FS. The new version couldn't force out the old one. In my opinion, if the VM moved cached objects to the inactive queue after program termination, I wouldn't see this problem.

Moving pages to inactive just because some mapping was destroyed is plain silly. The pages migrate between active/inactive/cache/free by the pagedaemon algorithms. BTW, you do not need to actually remount the filesystem to flush the pages of its vnodes. It is enough to try to unmount it while cd'ed into the filesystem root.

Yes, sure, if someone else uses the object it should be active, and it would be even better to introduce a new SHARED counter, like the one in MacOSX and Linux.

Counter for what? There is already the ref counter for a vm object. [snip]
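Spelled out, the trick Konstantin describes looks like this; the unmount is expected to fail with 'Device busy' because the shell holds the directory, and the useful side effect is that the pages of the filesystem's vnodes have been flushed by the time it fails (exact error text may vary by release):

    $ cd /mnt
    $ umount /mnt
    umount: unmount of /mnt failed: Device busy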
Re: Graphical Terminal Environment
I'm still avidly trying to work on this idea, but right now the issue seems to be with AMD and NVIDIA not documenting their protocols. Intel does a good job, but I don't have any Intel chips with graphics lying around. Right now I've targeted what I think is the main issue, and that is the closed-protocol GPU. I'm working on a minimal GPU right now on an FPGA. Not sure if it will actually end up going anywhere, but I would really like to see an open-hardware GPU out on the market. Certainly it would not be an NVIDIA or AMD killer, but it would be a good card for people who just want to watch videos, browse the web, run terminals, etc. The main focus of this GPU would be to maximize resolutions and monitors, and minimize cost. Currently it looks like I could run 4 monitors at 1080p for about $50 (that's not taking into account bulk-order costs). I could try to work with nouveau (as I did before) but I'll just never feel OK with using a system that relies on 'blobs' (the nouveau term for the bits that are sent to the card without knowing what they really are). -Brandon

On 4/8/2012 3:45 PM, Michael Cardell Widerkrantz wrote: Since Brandon started this in a sort of rambling mood I'm keeping up with the tradition... This is just what's on top of my mind right now.

per...@pluto.rain.com, 2012-03-06 17:05 (+0100): I _think_ SunTools/SunView were proprietary,

Absolutely.

although it's possible that Sun released the source code at some point.

Much of the actual window system in SunView was implemented in the kernel, IIRC. That might not be interesting in this case.

Another system I used on quite memory-starved Sun 3/50s (with as little as 4 meg) and 3/60s, and later on SPARCstations, was the Bellcore MGR window system: http://hack.org/mc/mgr/ http://en.wikipedia.org/wiki/ManaGeR

Many users in the Lysator academic computing society, where I first met MGR, preferred it to SunView. It was really nice on monochrome monitors at 1152x900. It's also network transparent, so you can run remote graphics applications. MGR was ported to a lot of systems, including FreeBSD. It might still compile, but it's unlikely to support anything higher than 640x480 on FreeBSD. If anyone tries to compile it and runs into problems I might be able to help. Just e-mail me.

To support higher resolutions on FreeBSD, Brandon would have to rewrite the functions in libbitblit. One way to do it would be to use vgl(3) to implement the libbitblit functions. Should be pretty straightforward, I think, and not too much work. On the other hand, vgl(3) probably only supports VESA, so Brandon will still have to write a special libbitblit for the nvidia card he mentions. MGR doesn't tile windows, but Brandon might want to add a mode to do that. MGR has a slightly bothersome license, though, forbidding commercial sales, so this might not be the best way forward.

On Sun SPARCs under SunOS it was also possible to run a tiling window system called Oberon. It shares its name with a programming language and a complete native operating system. Oberon is a complete environment using the Oberon programming language, so it might not be what Brandon wants, but it might be interesting to look at nonetheless. I believe Oberon is still available and can run either as a native operating system or as an environment under other systems. The SPARC port I used many years ago ran under SunOS but directly on the console. I don't know if there are any modern Oberon systems that can do that.
Incidentally, Oberon was one of the inspirations behind Rob Pike's acme editor on Plan 9. Acme, however, just handles text. Oberon does graphics as well.

I've been thinking along the same lines as Brandon for several years now: to write a lightweight window system. For many years I resisted X and kept using MGR, even going so far as porting MGR to Solaris and to Linux/SPARC just to be able to keep using MGR on more modern systems. I gave up, I think, around 1994. If I were to do it again I would probably not work on MGR, but I might use it for some ideas. One thing that MGR did that I wouldn't do is force all graphics operations to be done through escape codes in terminal windows. While it might be great for network transparency, it's not so great for the speed of local programs.

The Wayland project is interesting but seems very Linux-oriented. On the other hand, work on KMS/GEM support on FreeBSD is coming along. It might be possible to get Wayland running on FreeBSD. I haven't looked into it myself (yet).

James Gosling, who wrote both the Andrew window system and Sun's NeWS (not SunView, the *other* Sun window system, the one with a PostScript interpreter), has written an interesting paper about window system design. I have a copy here: http://hack.org/mc/texts/gosling-wsd.pdf

Some people have mentioned Plan 9's 8 1/2 and rio. They are both very interesting window systems. While I think they have a very clean design I think
Re: time stops in vmware
On Sun, 08 Apr 2012 02:11:25 -0500, Daniel Braniss da...@cs.huji.ac.il wrote: Hi All. There was some mention before that time stops under vmware, and now it's happened to me :-) The clock has stopped; the system is responsive, but e.g. 'sleep 1' never finishes. Is there a solution? BTW, I'm running 8.2-stable; I'll try 8.3 soon.

Can you recreate it? Does it go away if you use kern.hz=200 in loader.conf? We used to have to do that.
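For reference, the workaround mentioned is a single loader tunable; lowering HZ reduces the timer interrupt rate, so missed ticks under the hypervisor do less damage (200 is simply the value that worked in that setup, not a magic number):

    # /boot/loader.conf
    kern.hz="200"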
Re: problems with mmap() and disk caching
On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote: On 04/04/2012 02:17, Konstantin Belousov wrote: On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi, I open the file, then call mmap() on the whole file and get a pointer, then I work with this pointer. I expect that each page should be touched only once to get it into memory (disk cache?), but this doesn't work! I wrote a test (attached) and ran it for a 1G file generated from /dev/random; the result is the following:

Prepare the file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge the cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run the test:
$ ./mmap /mnt/random-1024 30
mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)

If I run $ cat /mnt/random-1024 > /dev/null before the test, then the result is the following:

$ ./mmap /mnt/random-1024 5
mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)

This is what I expect. But why doesn't this work without reading the file manually?
The issue seems to be in some change of the behaviour of the reserv or phys allocator. I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in about twelve years. Otherwise, I agree with your analysis. On more than one occasion, I've been tempted to change:

    pmap_remove_all(mt);
    if (mt->dirty != 0)
        vm_page_deactivate(mt);
    else
        vm_page_cache(mt);

to:

    vm_page_dontneed(mt);

because I suspect that the current code does more harm than good. In theory, it saves activations of the page daemon. However, more often than not, I suspect that we are spending more on page reactivations than we are saving on page daemon
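The test program itself did not survive the digest, but its shape is easy to reconstruct from the output: mmap the file, touch every page, and classify the pages with mincore(2). A minimal single-pass sketch follows; the none/res/super/other bucketing is my reading of the output, and note that Andrey's 'super' column evidently counts distinct superpages (about 512 for a 1G file with 2M superpages) while this sketch counts 4K pages carrying the MINCORE_SUPER flag, so the numbers will not match his exactly:

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
    	if (argc < 2)
    		errx(1, "usage: mmap file");
    	int fd = open(argv[1], O_RDONLY);
    	if (fd == -1)
    		err(1, "open");
    	struct stat st;
    	if (fstat(fd, &st) == -1)
    		err(1, "fstat");
    	size_t pagesz = getpagesize();
    	size_t npages = ((size_t)st.st_size + pagesz - 1) / pagesz;
    	char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    	if (p == MAP_FAILED)
    		err(1, "mmap");
    	volatile char c = 0;
    	for (size_t i = 0; i < npages; i++)	/* touch every page */
    		c += p[i * pagesz];
    	char *vec = malloc(npages);
    	if (vec == NULL || mincore(p, st.st_size, vec) == -1)
    		err(1, "mincore");
    	long none = 0, res = 0, super = 0, other = 0;
    	for (size_t i = 0; i < npages; i++) {
    		if (vec[i] == 0)
    			none++;
    		else if (vec[i] & MINCORE_SUPER)
    			super++;		/* resident, superpage-backed */
    		else if (vec[i] & MINCORE_INCORE)
    			res++;			/* resident, 4K mapping */
    		else
    			other++;
    	}
    	printf("(none: %ld; res: %ld; super: %ld; other: %ld)\n",
    	    none, res, super, other);
    	return (0);
    }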
Re: Starvation of realtime priority threads
On Thursday, April 05, 2012 9:08:24 pm Sushanth Rai wrote: I understand the downside of a badly written realtime app. In my case the application runs in userspace without making many syscalls, and by all means it is a well-behaved application. Yes, I can wire memory, change the application to use mutexes instead of spinlocks, and those changes should help, but they are still working around the problem. I still believe the kernel should not lower realtime priority when blocking on resources. This can lead to priority inversion, especially since these threads run at fixed priorities and the kernel doesn't muck with them. As you suggested, _sleep() should not adjust the priorities of realtime threads.

Hmm, sched_sleep() for both SCHED_4BSD and SCHED_ULE already does the right thing here in HEAD:

    if (PRI_BASE(td->td_pri_class) != PRI_TIMESHARE)
        return;

Which OS version did you see this on?

-- John Baldwin
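For context on the setup being discussed, placing a process in the realtime class is done with rtprio(2); a minimal sketch of what an application like the one described presumably does at startup (requires root on stock kernels):

    #include <sys/types.h>
    #include <sys/rtprio.h>
    #include <err.h>

    int
    main(void)
    {
    	/* Enter the realtime class at the highest rtprio (0). */
    	struct rtprio rtp = { RTP_PRIO_REALTIME, 0 };
    	if (rtprio(RTP_SET, 0, &rtp) != 0)
    		err(1, "rtprio");
    	/* ... latency-sensitive work, spinlocks, etc. ... */
    	return (0);
    }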
Re: Starvation of realtime priority threads
I'm on 7.2. sched_sleep() on 7.2 just records the sleep time. That's why I thought _sleep might be the right place to do the check.

Thanks, Sushanth

--- On Mon, 4/9/12, John Baldwin j...@freebsd.org wrote: [snip]
Re: Starvation of realtime priority threads
On Monday, April 09, 2012 2:08:50 pm Sushanth Rai wrote: I'm on 7.2. sched_sleep() on 7.2 just records the sleep time. That's why I thought _sleep might be the right place to do the check.

Nah, sched_sleep() is more accurate since the sleep priority can have other side effects. Hmm, in stock 7.2 the rtprio range is below things like PVM, etc., so that shouldn't actually be buggy in that regard. I fixed this in 9.0 and HEAD when I moved the rtprio range up above the kernel sleep priorities. Are you using local patches to 7.2 to raise the priority of rtprio threads?

-- John Baldwin [snip]
Re: [RFT][patch] Scheduling for HTT and not only
On 04/05/12 21:45, Alexander Motin wrote: On 05.04.2012 21:12, Arnaud Lacombe wrote: Hi, [Sorry for the delay, I got a bit sidetrack'ed...] 2012/2/17 Alexander Motin m...@freebsd.org: On 17.02.2012 18:53, Arnaud Lacombe wrote: On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin m...@freebsd.org wrote: On 02/15/12 21:54, Jeff Roberson wrote: On Wed, 15 Feb 2012, Alexander Motin wrote:

I've decided to stop those cache black magic practices and focus on things that really exist in this world -- SMT and CPU load. I've dropped most of the cache-related things from the patch and made the rest more strict and predictable: http://people.freebsd.org/~mav/sched.htt34.patch

This looks great. I think there is value in considering the other approach further, but I would like to do this part first. It would be nice to also add priority as a greater influence in the load balancing as well.

I haven't got a good idea yet about balancing priorities, but I've rewritten the balancer itself. Since sched_lowest()/sched_highest() are more intelligent now, they allowed the topology traversal to be removed from the balancer itself. That should fix the double-swapping problem, allow some affinity to be kept while moving threads, and make balancing more fair. I did a number of tests running 4, 8, 9 and 16 CPU-bound threads on 8 CPUs. With 4, 8 and 16 threads everything is stationary, as it should be. With 9 threads I see regular and random load movement between all 8 CPUs. Measurements over a 5-minute run show a deviation of only about 5 seconds. It is the same deviation as I see caused by merely scheduling 16 threads on 8 cores without any balancing needed at all. So I believe this code works as it should. Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch

I plan this to be the final patch of this series (more to come :)) and if there are no problems or objections, I am going to commit it (except some debugging KTRs) in about ten days. So now is a good time for reviews and testing. :)

Is there a place where all the patches are available?

All my scheduler patches are cumulative, so all you need is only the last one mentioned here, sched.htt40.patch.

You may want to have a look at the results I collected in the `runs/freebsd-experiments' branch of: https://github.com/lacombar/hackbench/ and compare them with the vanilla FreeBSD 9.0 and -CURRENT results available in `runs/freebsd'. On the dual-package platform, your patch is not a definite win.

But in some cases, especially for multi-socket systems, to let it show its best, you may want to apply an additional patch from avg@ to better detect CPU topology: https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd

The test I conducted specifically for this patch did not show much improvement...

If I understand right, this test runs thousands of threads sending and receiving data over pipes. It is quite likely that all CPUs will always be busy, and so load balancing is not really important in this test. What looks good is that the more complicated new code is not slower than the old one. While this test seems very scheduler-intensive, it may depend on many other factors, such as syscall performance, context switches, etc. I'll try to play more with it.

My profiling on an 8-core Core i7 system shows that the sched_ule.c code at the top of the profile still consumes only 13% of kernel CPU time, while doing a million context switches per second. cpu_search(), affected by this patch, even less -- only 8%. The rest of the time is spread between many other small functions.
I did some optimizations in r234066 to reduce cpu_search() time to 6%, but looking at how unstable the results of this test are, hardly any difference there can really be measured by it. I have a strong feeling that while this test may be interesting for profiling, its own results depend in the first place not on how fast the scheduler is, but on the pipe capacity and other such things. Can somebody give me a hint about what, except pipe capacity and the context switch to the unblocked receiver, prevents the sender from sending all its data in a batch and then the receiver from receiving it all in a batch? If different OSes have different policies there, I think the results could be incomparable.

-- Alexander Motin
Re: Graphical Terminal Environment
Brandon writes: I'm still avidly trying to work on this idea, but right now the issue seems to be with AMD and NVIDIA not documenting their protocols. Intel does a good job, but I don't have any Intel chips with graphics lying around.

I thought that AMD had documented most of it by now, with the major exception of the UVD?

I'm working on a minimal GPU right now on an FPGA. Currently it looks like I could run 4 monitors at 1080p for about $50.

Have FPGA prices come down that much? The OGP-D1 was quite a bit more than that, last time I looked. Or would that be the price for a production version with an ASIC?
Re: Starvation of realtime priority threads
I'm using stock 7.2. The priorities as defined in priority.h are in this range:

    /*
     * Priorities range from 0 to 255, but differences of less then 4 (RQ_PPQ)
     * are insignificant.  Ranges are as follows:
     *
     * Interrupt threads:           0 - 63
     * Top half kernel threads:     64 - 127
     * Realtime user threads:       128 - 159
     * Time sharing user threads:   160 - 223
     * Idle user threads:           224 - 255
     *
     * XXX If/When the specific interrupt thread and top half thread ranges
     * disappear, a larger range can be used for user processes.
     */

The trouble is with vm_waitpfault(), which explicitly sleeps at PUSER.

Sushanth

--- On Mon, 4/9/12, John Baldwin j...@freebsd.org wrote: [snip]
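For reference, the sleep being pointed at is the msleep() in vm_waitpfault(), whose priority argument is what clobbers the fixed rtprio. The following is an approximate from-memory reconstruction of the 7.x code, not a verbatim copy; the wait channel and flags may differ in detail:

    /* sys/vm/vm_page.c, FreeBSD 7.x (approximate sketch) */
    void
    vm_waitpfault(void)
    {
    	mtx_lock(&vm_page_queue_free_mtx);
    	if (!vm_pages_needed) {
    		vm_pages_needed = 1;
    		wakeup(&vm_pages_needed);
    	}
    	/* The PUSER priority passed here is applied to the sleeping
    	   thread, dropping a realtime thread into the timeshare range. */
    	msleep(&cnt.v_free_count, &vm_page_queue_free_mtx, PDROP | PUSER,
    	    "pfault", 0);
    }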
mlockall() on FreeBSD 7.2 + amd64 returns EAGAIN
Hello, I have a simple program that links with the math library. The only thing the program does is call mlockall(MCL_CURRENT | MCL_FUTURE). This call to mlockall fails with EAGAIN. I figured out that the kernel's vm_fault() is returning KERN_PROTECTION_FAILURE when it tries to fault in the mmap'ed math library address, but I can't figure out why.

/proc/mypid/map returns the following for the process:

    0x800634000 0x80064c000 24 0 0xff0025571510 r-x 104 52 0x1000 COW NC vnode /lib/libm.so.5
    0x80064c000 0x80064d000 1 0 0xff016f11c5e8 r-x 1 0 0x3100 COW NNC vnode /lib/libm.so.5
    0x80064d000 0x80074c000 4 0 0xff0025571510 r-x 104 52 0x1000 COW NC vnode /lib/libm.so.5

Since ntpd calls mlockall with the same options and links with the math library too, I looked at the map output of ntpd, which looks slightly different in the resident column (3rd column) on the 3rd line:

    0x800682000 0x80069a000 8 0 0xff0025571510 r-x 100 50 0x1000 COW NC vnode /lib/libm.so.5
    0x80069a000 0x80069b000 1 0 0xff0103b85870 r-x 1 0 0x3100 COW NNC vnode /lib/libm.so.5
    0x80069b000 0x80079a000 0 0 0xff0025571510 r-x 100 50 0x1000 COW NC vnode /lib/libm.so.5

I don't know if that has anything to do with the failure. The snippet of code that returns the failure in vm_fault() is the following:

    if (fs.pindex >= fs.object->size) {
        unlock_and_deallocate(&fs);
        return (KERN_PROTECTION_FAILURE);
    }

Any help would be appreciated. Thanks, Sushanth
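A minimal reproducer matching this description, for anyone who wants to follow along (link with -lm; the volatile argument keeps the compiler from folding the cosf() call away, so libm really gets mapped):

    /* cc -o lockall lockall.c -lm */
    #include <sys/mman.h>
    #include <err.h>
    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
    	volatile float x = 0.5f;
    	printf("%f\n", (double)cosf(x));	/* force libm in */
    	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
    		err(1, "mlockall");		/* EAGAIN as described */
    	printf("mlockall succeeded\n");
    	return (0);
    }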