PUFFS deadlock
Emmanuel Dreyfus wrote: > What about adding a timeout in struct puffs_msgpark and use it for > inactive operations? Returning EAGAIN from puffs_vnop_inactive seems an > easy way to work around this deadlock. I reply to myself. This will probably break many things since VOP_INACTIVE return code is not checked, neither in vrelel() nor in vclean(). As I understand, the problem will occur when: 1) a PUFFS filesystem does a namei(), most likely on a path component leading to its filesystem mont point. It sleeps for the parent vnode lock. 2) another process which locks the desired vnode goes the getnewvnode/getcleanvnode/vclean/VOP_INACTIVE path on a vnode in the PUFFS filesystem. 3) PUFFS is sleeping in kernel and will never service the VOP_INACTIVE. The vnode remains locked forever. Since it is very likely to be the root vnode, all processes doing a VOP_LOOKUP end up sleeping for the root vnode lock. How can this be fixed? -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: PUFFS deadlock
On Feb 3, 2012, at 5:45 PM, Emmanuel Dreyfus wrote: > Emmanuel Dreyfus wrote: > >> What about adding a timeout in struct puffs_msgpark and use it for >> inactive operations? Returning EAGAIN from puffs_vnop_inactive seems an >> easy way to work around this deadlock. > > I reply to myself. This will probably break many things since > VOP_INACTIVE return code is not checked, neither in vrelel() nor in > vclean(). > > As I understand, the problem will occur when: > > 1) a PUFFS filesystem does a namei(), most likely on a path component > leading to its filesystem mont point. It sleeps for the parent vnode > lock. > > 2) another process which locks the desired vnode goes the > getnewvnode/getcleanvnode/vclean/VOP_INACTIVE > path on a vnode in the PUFFS filesystem. So this is on 5.x ? For -current the getnewvnode()->getcleanvnode() path has been replaced with a cleaner thread so getnewvnode() always gets a new vnode. This was done to break deadlocks like this one (most time with layered fs involved). > 3) PUFFS is sleeping in kernel and will never service the VOP_INACTIVE. > The vnode remains locked forever. Since it is very likely to be the root > vnode, all processes doing a VOP_LOOKUP end up sleeping for the root > vnode lock. > > How can this be fixed? > > -- > Emmanuel Dreyfus > http://hcpnet.free.fr/pubz > m...@netbsd.org -- Juergen Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: PUFFS deadlock
J. Hannken-Illjes wrote: > So this is on 5.x ? Yes, this is on netbsd-5 > For -current the getnewvnode()->getcleanvnode() path has been replaced > with a cleaner thread so getnewvnode() always gets a new vnode. This was > done to break deadlocks like this one (most time with layered fs involved). Is it feasible/worth doing a pullup? It seems a big change, but I may be wrong. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: PUFFS deadlock
On Feb 3, 2012, at 7:23 PM, Emmanuel Dreyfus wrote: > J. Hannken-Illjes wrote: > >> So this is on 5.x ? > > Yes, this is on netbsd-5 > >> For -current the getnewvnode()->getcleanvnode() path has been replaced >> with a cleaner thread so getnewvnode() always gets a new vnode. This was >> done to break deadlocks like this one (most time with layered fs involved). > > Is it feasible/worth doing a pullup? It seems a big change, but I may be > wrong. kern/vfs_vnode.c rev. 1.12 and 1.13 should be sufficient -- but it needs some testing on netbsd-5. -- Juergen Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: PUFFS deadlock
J. Hannken-Illjes wrote: > kern/vfs_vnode.c rev. 1.12 and 1.13 should be sufficient -- but it needs some > testing on netbsd-5. kern/vfs_vnode.c does not even exist on netbsd-5, I guess it means it is not a reasonable pullup. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: PUFFS deadlock
On Feb 3, 2012, at 8:44 PM, Emmanuel Dreyfus wrote: > J. Hannken-Illjes wrote: > >> kern/vfs_vnode.c rev. 1.12 and 1.13 should be sufficient -- but it needs some >> testing on netbsd-5. > > kern/vfs_vnode.c does not even exist on netbsd-5, I guess it means it is > not a reasonable pullup. It is called vfs_subr.c here -- you could just try a patch. -- Juergen Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: PUFFS deadlock
J. Hannken-Illjes wrote: > It is called vfs_subr.c here -- you could just try a patch. I gave it a try, but it crashes very early, with no chance of debugging it: console is not yet usable NetBSD 5.1_STABLE (XEN3_DOMU_PUFFS) #370: Sat Feb 4 03:21:45 CET 2012 root@mai:/usr/src-5/sys/arch/i386/compile/obj/XEN3_DOMU_PUFFS total memory = 512 MB avail memory = 497 MB Condition variable error: lockdebug_alloc: already initialized lock address : 0xc04d751c type : spin initialized : 0xc037903e interlock: 00 panic: LOCKDEBUG fatal breakpoint trap in supervisor mode trap type 1 code 0 eip c03cbafc cs 9 eflags 246 cr2 0 ilevel 8 Stopped in pid 0.1 (system) at netbsd:breakpoint+0x4: popl%ebp db> xenconscn_getc(): not console -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org