Re: Call for comments - raising vfs.ufs.dirhash_reclaimage?
On Mon, Oct 07, 2013 at 07:34:24PM +0200, Davide Italiano wrote:
> > What would perhaps be better than a hardcoded reclaim age would be to
> > use an LRU-type approach and perhaps set a target percentage to
> > reclaim. That is, suppose you were to reclaim the oldest 10% of hashes
> > on each lowmem call (and make the '10%' the tunable value). Then you
> > will always make some amount of progress in a low-memory situation
> > (and if the situation remains dire you will eventually empty the
> > entire cache), but the effective maximum age will be more dynamic.
> > Right now, if you haven't touched UFS in 5 seconds, the first lowmem
> > event throws the entire cache out. The LRU approach would only throw
> > the oldest 10% out on the first call, but would eventually throw it
> > all out if the situation remained dire.
> >
> > -- John Baldwin
>
> I liked your idea more than what's available in HEAD right now, so I
> implemented it:
>
>   http://people.freebsd.org/~davide/review/ufs_direclaimage.diff
>
> I was unsure what heuristic to use to select which (10% of) entries
> should be evicted, so I just removed the first 10% from the head of the
> ufs_dirhash list (which should be the oldest). The code keeps rescanning
> the cache until 10% (or the percentage set via sysctl) of the entries
> have been freed, but we could discuss whether this limit should be
> relaxed to a single scan over the list. Unfortunately I don't have a
> test case to prove the effectiveness (or ineffectiveness) of the
> approach, but perhaps Ivan or Peter could give it a spin.

I gave this patch a spin for 12 hours without finding any problems. I can
do more testing at a later time, if you want.

- Peter
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Memory reserves or lack thereof
On Mon, Nov 12, 2012 at 03:36:38PM +0200, Konstantin Belousov wrote:
> On Sun, Nov 11, 2012 at 03:40:24PM -0600, Alan Cox wrote:
> > On Sat, Nov 10, 2012 at 7:20 AM, Konstantin Belousov
> > <kostik...@gmail.com> wrote:
> > > On Fri, Nov 09, 2012 at 07:10:04PM +0000, Sears, Steven wrote:
> > > > I have a memory subsystem design question that I'm hoping someone
> > > > can answer. I've been looking at a machine that is completely out
> > > > of memory, as in v_free_count = 0, v_cache_count = 0. I wondered
> > > > how a machine could completely run out of memory like this,
> > > > especially after finding a lack of interrupt storms or other
> > > > pathologies that would tend to overcommit memory. So I started
> > > > investigating.
> > > >
> > > > Most allocators come down to vm_page_alloc(), which has this
> > > > guard:
> > > >
> > > >   if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
> > > >           page_req = VM_ALLOC_SYSTEM;
> > > >   };
> > > >   if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
> > > >       (page_req == VM_ALLOC_SYSTEM &&
> > > >        cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
> > > >       (page_req == VM_ALLOC_INTERRUPT &&
> > > >        cnt.v_free_count + cnt.v_cache_count > 0)) {
> > > >
> > > > The key observation is that if VM_ALLOC_INTERRUPT is set, it will
> > > > allocate every last page. From the name one might expect
> > > > VM_ALLOC_INTERRUPT to be somewhat rare, perhaps only used from
> > > > interrupt threads. Not so; see kmem_malloc() or uma_small_alloc(),
> > > > which both contain this mapping:
> > > >
> > > >   if ((flags & (M_NOWAIT | M_USE_RESERVE)) == M_NOWAIT)
> > > >           pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
> > > >   else
> > > >           pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
> > > >
> > > > Note that M_USE_RESERVE has been deprecated and is used in just a
> > > > handful of places. Also note that lots of code paths come through
> > > > these routines. What this means is that essentially _any_
> > > > allocation using M_NOWAIT will bypass whatever reserves have been
> > > > held back and will take every last page available. There is no
> > > > documentation stating that M_NOWAIT has this side effect of
> > > > essentially being privileged, so any innocuous piece of code that
> > > > can't block will use it. And of course M_NOWAIT is literally used
> > > > all over.
> > > >
> > > > It looks to me like the design goal of the BSD allocators is on
> > > > recovery; they will give all pages away knowing they can recover.
> > > > Am I missing anything? I would have expected some small number of
> > > > pages to be held in reserve just in case. And I didn't expect
> > > > M_NOWAIT to be a sort of back door for grabbing memory.
> > >
> > > Your analysis is right, there is nothing to add or correct. This is
> > > the reason to strongly prefer M_WAITOK.
> >
> > Agreed. Once upon a time, before SMPng, M_NOWAIT was rarely used. It
> > was well understood that it should only be used by interrupt handlers.
> >
> > The trouble is that M_NOWAIT conflates two orthogonal things. The
> > obvious one is that the allocation shouldn't sleep. The other is how
> > far we're willing to deplete the cache/free page queues. When
> > fine-grained locking got sprinkled throughout the kernel, we all too
> > often found ourselves wanting to do allocations without the
> > possibility of blocking. So M_NOWAIT became commonplace, where it
> > wasn't before. This had the unintended consequence of introducing a
> > lot of memory allocations in the top half of the kernel, i.e.,
> > non-interrupt-handling code, that were digging deep into the
> > cache/free page queues.
> >
> > Also, ironically, in today's kernel an M_NOWAIT | M_USE_RESERVE
> > allocation is less likely to succeed than an M_NOWAIT allocation.
> > However, prior to FreeBSD 7.x, M_NOWAIT couldn't allocate a cached
> > page; it could only allocate a free page. M_USE_RESERVE said that it
> > was OK to allocate a cached page even though M_NOWAIT was specified.
> > Consequently, the system wouldn't dig as far into the free page queue
> > if M_USE_RESERVE was specified, because it was allowed to reclaim a
> > cached page.
> >
> > In conclusion, I think it's time that we change M_NOWAIT so that it
> > doesn't dig any deeper into the cache/free page queues than M_WAITOK
> > does, and reintroduce an M_USE_RESERVE-like flag that says to dig deep
> > into the cache/free page queues. The trouble is that we then need to
> > identify all of those places that are implicitly depending on the
> > current behavior of M_NOWAIT also digging deep into the cache/free
> > page queues, so that we can add an explicit M_USE_RESERVE.
> >
> > Alan
> >
> > P.S. I suspect that we should also increase the size of the page
> > reserve that is kept for VM_ALLOC_INTERRUPT allocations in
> > vm_page_alloc*().
>
> How many legitimate users of a new M_USE_RESERVE-like flag in today's
> kernel could actually be satisfied by two pages?
>
> I am almost sure that most of the people who use the M_NOWAIT flag do
> not know about the 'allow a deeper drain of the free queue' effect. As
> such, I believe we should flip the meaning of
Re: Kernel threads inherit CPU affinity from random sibling
On Sat, Jan 28, 2012 at 02:39:17PM +0100, Attilio Rao wrote:
> 2012/1/28 Attilio Rao <atti...@freebsd.org>:
> > 2012/1/28 Ryan Stone <ryst...@gmail.com>:
> > > On Fri, Jan 27, 2012 at 10:41 PM, Attilio Rao <atti...@freebsd.org> wrote:
> > > > I think what you found out is very sensitive. However, the patch
> > > > is not correct, as you cannot call cpuset_setthread() with
> > > > thread_lock held.
> > >
> > > Whoops! I actually discovered that for myself and had already fixed
> > > it, but apparently I included an old version of the patch in the
> > > email.
> >
> > Hence this is my fix:
> >
> >   http://www.freebsd.org/~attilio/cpuset_root.patch
>
> Oh, I do like this better. I tried something similar myself but
> abandoned it because I misread how sched_affinity() was implemented by
> 4BSD (I had gotten the impression that once TSF_AFFINITY is set it
> could never be cleared).
>
> Do you have a pathological test-case for it? Are you going to test the
> patch?
>
> BTW, I've just now updated the patch in order to remove an added white
> line and s/priority/affinity in comments.

I've tested this patch with the threaded test scenarios I have, for 14
hours, without finding any issues.

- Peter
Seeking testers for change to lib/libc/gen/fts.c
PROBLEM: find(1) dumps core with extremely long path names. See also
kern/12855.

CAUSE: fts.c does not handle realloc of its buffer space correctly.

FIX: Upgrade fts.c from OpenBSD version 1.9 to 1.20. The fix for the bug
where, when fts_open() is used with the FTS_NOCHDIR option, the full
path entry of type FTS_DP is returned with a trailing '/' if the final
directory is empty was also incorporated in version 1.20. Thanx to Todd
Miller [EMAIL PROTECTED].

The patch is available at http://www.freebsd.org/~pho/fts.diff

--
Peter Holm
To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message
NFS V3 and mkdir bug
ent# gdb -k -s kernel.debug -e /var/crash/kernel.6 -c /var/crash/vmcore.6
IdlePTD 3932160
initial pcb at 33cfc0
panicstr: ffs_valloc: dup alloc
panic messages:
--- panic: ffs_valloc: dup alloc ---
#0  boot (howto=256) at ../../kern/kern_shutdown.c:291
291             dumppcb.pcb_cr3 = rcr3();
(kgdb) bt
#0  boot (howto=256) at ../../kern/kern_shutdown.c:291
#1  0xc016710d in panic (fmt=0xc02f69c1 "ffs_valloc: dup alloc")
    at ../../kern/kern_shutdown.c:505
#2  0xc0224103 in ffs_valloc (pvp=0xc8744a80, mode=16888, cred=0xc0b94384,
    vpp=0xc85d8a04) at ../../ufs/ffs/ffs_alloc.c:605
#3  0xc0236353 in ufs_mkdir (ap=0xc85d8bc4) at ../../ufs/ufs/ufs_vnops.c:1307
#4  0xc02374a1 in ufs_vnoperate (ap=0xc85d8bc4)
    at ../../ufs/ufs/ufs_vnops.c:2316
#5  0xc01cc26d in nfsrv_mkdir (nfsd=0xc0b94300, slp=0xc09e4600,
    procp=0xc7c05de0, mrq=0xc85d8dc4) at vnode_if.h:611
#6  0xc01da76e in nfssvc_nfsd (nsd=0xc85d8e80, argp=0x8071bc0 "",
    p=0xc7c05de0) at ../../nfs/nfs_syscalls.c:650
#7  0xc01da08f in nfssvc (p=0xc7c05de0, uap=0xc85d8f80)
    at ../../nfs/nfs_syscalls.c:346
#8  0xc026d496 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
    tf_edi = 4, tf_esi = 1, tf_ebp = -1077944892, tf_isp = -933392428,
    tf_ebx = 0, tf_edx = -1077944336, tf_ecx = 0, tf_eax = 155,
    tf_trapno = 12, tf_err = 2, tf_eip = 134517008, tf_cs = 31,
    tf_eflags = 646, tf_esp = -1077945284, tf_ss = 47})
    at ../../i386/i386/trap.c:1056
#9  0xc025e526 in Xint0x80_syscall ()
#10 0x80480e9 in ?? ()
(kgdb) quit
current# exit

Any suggestions as to where to investigate?

Regards
--
Peter Holm | mailto:[EMAIL PROTECTED] | http://login.dknet.dk/~pho/