PUFFS deadlock

2012-02-03 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> What about adding a timeout in struct puffs_msgpark and use it for
> inactive operations? Returning EAGAIN from puffs_vnop_inactive seems an
> easy way to work around this deadlock.

I reply to myself. This will probably break many things since
VOP_INACTIVE return code is not checked, neither in vrelel() nor in
vclean(). 

As I understand, the problem will occur when:

1) a PUFFS filesystem does a namei(), most likely on a path component
leading to its filesystem mont point. It sleeps for the parent vnode
lock.

2) another process which locks the desired vnode goes the
getnewvnode/getcleanvnode/vclean/VOP_INACTIVE
path on a vnode in the PUFFS filesystem.

3) PUFFS is sleeping in kernel and will never service the VOP_INACTIVE.
The vnode remains locked forever. Since it is very likely to be the root
vnode, all processes doing a VOP_LOOKUP end up sleeping for the root
vnode lock.
 
How can this be fixed?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: PUFFS deadlock

2012-02-03 Thread J. Hannken-Illjes
On Feb 3, 2012, at 5:45 PM, Emmanuel Dreyfus wrote:

> Emmanuel Dreyfus  wrote:
> 
>> What about adding a timeout in struct puffs_msgpark and use it for
>> inactive operations? Returning EAGAIN from puffs_vnop_inactive seems an
>> easy way to work around this deadlock.
> 
> I reply to myself. This will probably break many things since
> VOP_INACTIVE return code is not checked, neither in vrelel() nor in
> vclean(). 
> 
> As I understand, the problem will occur when:
> 
> 1) a PUFFS filesystem does a namei(), most likely on a path component
> leading to its filesystem mont point. It sleeps for the parent vnode
> lock.
> 
> 2) another process which locks the desired vnode goes the
> getnewvnode/getcleanvnode/vclean/VOP_INACTIVE
> path on a vnode in the PUFFS filesystem.

So this is on 5.x ?

For -current the getnewvnode()->getcleanvnode() path has been replaced
with a cleaner thread so getnewvnode() always gets a new vnode.  This was
done to break deadlocks like this one (most time with layered fs involved).

> 3) PUFFS is sleeping in kernel and will never service the VOP_INACTIVE.
> The vnode remains locked forever. Since it is very likely to be the root
> vnode, all processes doing a VOP_LOOKUP end up sleeping for the root
> vnode lock.
> 
> How can this be fixed?
> 
> -- 
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org

--
Juergen Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: PUFFS deadlock

2012-02-03 Thread Emmanuel Dreyfus
J. Hannken-Illjes  wrote:

> So this is on 5.x ?

Yes, this is on netbsd-5
 
> For -current the getnewvnode()->getcleanvnode() path has been replaced
> with a cleaner thread so getnewvnode() always gets a new vnode.  This was
> done to break deadlocks like this one (most time with layered fs involved).

Is it feasible/worth doing a pullup? It seems a big change, but I may be
wrong.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: PUFFS deadlock

2012-02-03 Thread J. Hannken-Illjes

On Feb 3, 2012, at 7:23 PM, Emmanuel Dreyfus wrote:

> J. Hannken-Illjes  wrote:
> 
>> So this is on 5.x ?
> 
> Yes, this is on netbsd-5
> 
>> For -current the getnewvnode()->getcleanvnode() path has been replaced
>> with a cleaner thread so getnewvnode() always gets a new vnode.  This was
>> done to break deadlocks like this one (most time with layered fs involved).
> 
> Is it feasible/worth doing a pullup? It seems a big change, but I may be
> wrong.

kern/vfs_vnode.c rev. 1.12 and 1.13 should be sufficient -- but it needs some
testing on netbsd-5.

--
Juergen Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: PUFFS deadlock

2012-02-03 Thread Emmanuel Dreyfus
J. Hannken-Illjes  wrote:

> kern/vfs_vnode.c rev. 1.12 and 1.13 should be sufficient -- but it needs some
> testing on netbsd-5.

kern/vfs_vnode.c does not even exist on netbsd-5, I guess it means it is
not a reasonable pullup.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: PUFFS deadlock

2012-02-03 Thread J. Hannken-Illjes

On Feb 3, 2012, at 8:44 PM, Emmanuel Dreyfus wrote:

> J. Hannken-Illjes  wrote:
> 
>> kern/vfs_vnode.c rev. 1.12 and 1.13 should be sufficient -- but it needs some
>> testing on netbsd-5.
> 
> kern/vfs_vnode.c does not even exist on netbsd-5, I guess it means it is
> not a reasonable pullup.

It is called vfs_subr.c here -- you could just try a patch.

--
Juergen Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: PUFFS deadlock

2012-02-03 Thread Emmanuel Dreyfus
J. Hannken-Illjes  wrote:

> It is called vfs_subr.c here -- you could just try a patch.

I gave it a try, but it crashes very early, with no chance of debugging
it: console is not yet usable


NetBSD 5.1_STABLE (XEN3_DOMU_PUFFS) #370: Sat Feb  4 03:21:45 CET 2012
root@mai:/usr/src-5/sys/arch/i386/compile/obj/XEN3_DOMU_PUFFS
total memory = 512 MB
avail memory = 497 MB
Condition variable error: lockdebug_alloc: already initialized

lock address : 0xc04d751c type :   spin
initialized  : 0xc037903e interlock: 00

panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c03cbafc cs 9 eflags 246 cr2 0 ilevel 8
Stopped in pid 0.1 (system) at  netbsd:breakpoint+0x4:  popl%ebp
db> xenconscn_getc(): not console

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org