Re: Possible kqueue related issue on STABLE/RC.

2013-09-26 Thread Patrick Lamaiziere
On Wed, 25 Sep 2013 11:06:33 +0300,
Konstantin Belousov wrote:

Hello,

> > > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote:
> > > > I'd like to understand why you think protecting these functions
> > > > w/ the _DETACHED check is correct...  In kern_event.c, all
> > > > calls to f_detach are followed by knote_drop which will ensure
> > > > that the knote is removed and free, so no more f_event calls
> > > > will be called on that knote..
> > > 
> > > My current belief is that what happens is a glitch in the
> > > kqueue_register(). After a new knote is created and attached, the
> > > kq lock is dropped and then f_event() is called. If the vnode is
> > > reclaimed or possible freed meantime, f_event() seems to
> > > dereference freed memory, since kn_hook points to freed vnode.
> > > 
> > > The issue as I see it is that vnode lifecycle is detached from the
> > > knote lifecycle.  Might be, only the second patch, which acquires
> > > a hold reference on the vnode for each knote, is really needed.
> > > But before going into any conclusions, I want to see the testing
> > > results.
> > 
> > Testing looks good with your latest patch. I was able to run a
> > complete poudriere bulk (870 packages). I'm running another bulk to
> > see..

I've run another bulk without problems (with the complete patch).

> > If you have other patches to test just ask, I have not updated my
> > packages because there was a change to make gvfsd to ignore some
> > poudriere activity. So I guess it will be harder to see this
> > problem.

> Could you, please, test with the only patch
> http://people.freebsd.org/~kib/misc/vnode_filter.1.patch
> applied ?  I wonder would it be enough.

Looks good with this single patch too: one poudriere bulk has
completed and I'm running another just in case (but I think it would
already have panicked; the panic is quite reproducible).

Thanks, regards.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Possible kqueue related issue on STABLE/RC.

2013-09-25 Thread John-Mark Gurney
Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 22:40 +0300:
> On Wed, Sep 25, 2013 at 09:19:54AM -0700, John-Mark Gurney wrote:
> > Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 00:21 +0300:
> > > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote:
> > > > I'd like to understand why you think protecting these functions w/
> > > > the _DETACHED check is correct...  In kern_event.c, all calls to
> > > > f_detach are followed by knote_drop which will ensure that the knote
> > > > is removed and free, so no more f_event calls will be called on that
> > > > knote..
> > > 
> > > My current belief is that what happens is a glitch in the
> > > kqueue_register(). After a new knote is created and attached, the kq
> > > lock is dropped and then f_event() is called. If the vnode is reclaimed
> > > or possible freed meantime, f_event() seems to dereference freed memory,
> > > since kn_hook points to freed vnode.
> > 
> > Well, if that happens, then the vnode isn't properly clearing up the
> > knote before it gets reclaimed...  It is the vnode's responsibility to
> > make sure any knotes that are associated w/ it get cleaned up properly..
> See below.
> 
> > 
> > > The issue as I see it is that vnode lifecycle is detached from the knote
> > > lifecycle.  Might be, only the second patch, which acquires a hold 
> > > reference
> > > on the vnode for each knote, is really needed.  But before going into any
> > > conclusions, I want to see the testing results.
> > 
> > The vnode lifecycle can't/shouldn't be detached from the knote lifecycle
> > since the knote contains a pointer to the vnode...  There is the function
> > knlist_clear that can be used to clean up knotes when the object goes
> > away..
> This is done from the vdropl() (null hold count) -> destroy_vpollinfo().
> But this is too late, IMO. vdropl() is only executing with the vnode
> interlock locked, and knote lock is vnode lock.  This way, you might
> get far enough into vdropl in other thread, while trying to operate on
> a vnode with zero hold count in some kqueue code path.
> 
> We do not drain the vnode lock holders when destroying vnode, because
> VFS interface require that anybody locking the vnode own a hold reference
> on it.  My short patch should fix exactly this issue, hopefully we will see.

Which clearly wasn't happening before...  With the above, and rereading
your patch, I understand how this patch should fix the issue and bring
the life cycle of the two back into sync...

> > I was looking at the code, is there a good reason why you do
> > VI_LOCK/VI_UNLOCK to protect the knote fields instead of getting it in
> > the vfs_knllock/vfs_knlunlock functions?  Because kq code will modify
> > the knote fields w/ only running the vfs_knllock/vfs_knlunlock functions,
> > so either the VI_LOCK/VI_UNLOCK are unnecessary, or should be moved to
> > vfs_knllock/vfs_knlunlock...
> 
> vfs_knllock() is fine. The problematic case if the
> VOP_{PRE,POST}->VFS_KNOTE->VN_KNOTE->KNOTE calls from the VOPs. If you
> look at the vfs_knl_assert_locked(), you would note that the function
> only asserts that vnode is locked, not that it is locked exclusively.
> This is because some filesystems started require from VFS to do e.g.
> VOP_WRITE() with the vnode only shared-locked, and KNOTE() is called
> with shared-locked vnode lock.
> 
> The vfs_knllock() obtain the exclusive lock on the vnode, so kqueue
> callers are fine. Taking vnode interlock inside the filters provides
> enough exclusion for the VOP callers.

Ahh, ok, makes sense now. Clearly I need to learn more about the
VFS/vnode system. :)

Thanks for the explanations...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: Possible kqueue related issue on STABLE/RC.

2013-09-25 Thread Konstantin Belousov
On Wed, Sep 25, 2013 at 09:19:54AM -0700, John-Mark Gurney wrote:
> Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 00:21 +0300:
> > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote:
> > > I'd like to understand why you think protecting these functions w/
> > > the _DETACHED check is correct...  In kern_event.c, all calls to
> > > f_detach are followed by knote_drop which will ensure that the knote
> > > is removed and free, so no more f_event calls will be called on that
> > > knote..
> > 
> > My current belief is that what happens is a glitch in the
> > kqueue_register(). After a new knote is created and attached, the kq
> > lock is dropped and then f_event() is called. If the vnode is reclaimed
> > or possible freed meantime, f_event() seems to dereference freed memory,
> > since kn_hook points to freed vnode.
> 
> Well, if that happens, then the vnode isn't properly clearing up the
> knote before it gets reclaimed...  It is the vnode's responsibility to
> make sure any knotes that are associated w/ it get cleaned up properly..
See below.

> 
> > The issue as I see it is that vnode lifecycle is detached from the knote
> > lifecycle.  Might be, only the second patch, which acquires a hold reference
> > on the vnode for each knote, is really needed.  But before going into any
> > conclusions, I want to see the testing results.
> 
> The vnode lifecycle can't/shouldn't be detached from the knote lifecycle
> since the knote contains a pointer to the vnode...  There is the function
> knlist_clear that can be used to clean up knotes when the object goes
> away..
This is done from vdropl() (when the hold count drops to zero) -> destroy_vpollinfo().
But this is too late, IMO. vdropl() executes with only the vnode
interlock locked, while the knote lock is the vnode lock. This way, another
thread can get far enough into vdropl() while some kqueue code path is
still operating on a vnode with a zero hold count.

We do not drain the vnode lock holders when destroying a vnode, because
the VFS interface requires that anybody locking the vnode own a hold
reference on it. My short patch should fix exactly this issue; hopefully
testing will confirm it.

> 
> I was looking at the code, is there a good reason why you do
> VI_LOCK/VI_UNLOCK to protect the knote fields instead of getting it in
> the vfs_knllock/vfs_knlunlock functions?  Because kq code will modify
> the knote fields w/ only running the vfs_knllock/vfs_knlunlock functions,
> so either the VI_LOCK/VI_UNLOCK are unnecessary, or should be moved to
> vfs_knllock/vfs_knlunlock...

vfs_knllock() is fine. The problematic case is the
VOP_{PRE,POST}->VFS_KNOTE->VN_KNOTE->KNOTE calls from the VOPs. If you
look at vfs_knl_assert_locked(), you will note that the function
only asserts that the vnode is locked, not that it is locked exclusively.
This is because some filesystems started to require the VFS to call, e.g.,
VOP_WRITE() with the vnode only shared-locked, so KNOTE() is called
with the vnode lock held shared.

vfs_knllock() obtains the exclusive lock on the vnode, so the kqueue
callers are fine. Taking the vnode interlock inside the filters provides
enough exclusion for the VOP callers.




Re: Possible kqueue related issue on STABLE/RC.

2013-09-25 Thread John-Mark Gurney
Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 00:21 +0300:
> On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote:
> > I'd like to understand why you think protecting these functions w/
> > the _DETACHED check is correct...  In kern_event.c, all calls to
> > f_detach are followed by knote_drop which will ensure that the knote
> > is removed and free, so no more f_event calls will be called on that
> > knote..
> 
> My current belief is that what happens is a glitch in the
> kqueue_register(). After a new knote is created and attached, the kq
> lock is dropped and then f_event() is called. If the vnode is reclaimed
> or possible freed meantime, f_event() seems to dereference freed memory,
> since kn_hook points to freed vnode.

Well, if that happens, then the vnode isn't properly clearing up the
knote before it gets reclaimed...  It is the vnode's responsibility to
make sure any knotes that are associated w/ it get cleaned up properly..

> The issue as I see it is that vnode lifecycle is detached from the knote
> lifecycle.  Might be, only the second patch, which acquires a hold reference
> on the vnode for each knote, is really needed.  But before going into any
> conclusions, I want to see the testing results.

The vnode lifecycle can't/shouldn't be detached from the knote lifecycle
since the knote contains a pointer to the vnode...  There is the function
knlist_clear that can be used to clean up knotes when the object goes
away..

I was looking at the code, is there a good reason why you do
VI_LOCK/VI_UNLOCK to protect the knote fields instead of getting it in
the vfs_knllock/vfs_knlunlock functions?  Because kq code will modify
the knote fields w/ only running the vfs_knllock/vfs_knlunlock functions,
so either the VI_LOCK/VI_UNLOCK are unnecessary, or should be moved to
vfs_knllock/vfs_knlunlock...



Re: Possible kqueue related issue on STABLE/RC.

2013-09-25 Thread Konstantin Belousov
On Wed, Sep 25, 2013 at 09:58:05AM +0200, Patrick Lamaiziere wrote:
> On Wed, 25 Sep 2013 00:21:27 +0300,
> Konstantin Belousov wrote:
> 
> Hello,
> 
> > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote:
> > > I'd like to understand why you think protecting these functions w/
> > > the _DETACHED check is correct...  In kern_event.c, all calls to
> > > f_detach are followed by knote_drop which will ensure that the knote
> > > is removed and free, so no more f_event calls will be called on that
> > > knote..
> > 
> > My current belief is that what happens is a glitch in the
> > kqueue_register(). After a new knote is created and attached, the kq
> > lock is dropped and then f_event() is called. If the vnode is
> > reclaimed or possible freed meantime, f_event() seems to dereference
> > freed memory, since kn_hook points to freed vnode.
> > 
> > The issue as I see it is that vnode lifecycle is detached from the
> > knote lifecycle.  Might be, only the second patch, which acquires a
> > hold reference on the vnode for each knote, is really needed.  But
> > before going into any conclusions, I want to see the testing results.
> 
> Testing looks good with your latest patch. I was able to run a complete
> poudriere bulk (870 packages). I'm running another bulk to see.
> 
> If you have other patches to test just ask, I have not updated my
> packages because there was a change to make gvfsd to ignore some
> poudriere activity. So I guess it will be harder to see this
> problem.

Very good, thank you.

Could you please test with only the patch
http://people.freebsd.org/~kib/misc/vnode_filter.1.patch
applied? I wonder whether it would be enough.




Re: Possible kqueue related issue on STABLE/RC.

2013-09-25 Thread Patrick Lamaiziere
On Wed, 25 Sep 2013 00:21:27 +0300,
Konstantin Belousov wrote:

Hello,

> On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote:
> > I'd like to understand why you think protecting these functions w/
> > the _DETACHED check is correct...  In kern_event.c, all calls to
> > f_detach are followed by knote_drop which will ensure that the knote
> > is removed and free, so no more f_event calls will be called on that
> > knote..
> 
> My current belief is that what happens is a glitch in the
> kqueue_register(). After a new knote is created and attached, the kq
> lock is dropped and then f_event() is called. If the vnode is
> reclaimed or possible freed meantime, f_event() seems to dereference
> freed memory, since kn_hook points to freed vnode.
> 
> The issue as I see it is that vnode lifecycle is detached from the
> knote lifecycle.  Might be, only the second patch, which acquires a
> hold reference on the vnode for each knote, is really needed.  But
> before going into any conclusions, I want to see the testing results.

Testing looks good with your latest patch. I was able to run a complete
poudriere bulk (870 packages). I'm running another bulk to see.

If you have other patches to test, just ask. I have not updated my
packages, because there was a change that makes gvfsd ignore some
poudriere activity, and I guess that would make this problem harder
to reproduce.

Many thanks Konstantin,
Regards


Re: Possible kqueue related issue on STABLE/RC.

2013-09-24 Thread Konstantin Belousov
On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote:
> I'd like to understand why you think protecting these functions w/
> the _DETACHED check is correct...  In kern_event.c, all calls to
> f_detach are followed by knote_drop which will ensure that the knote
> is removed and free, so no more f_event calls will be called on that
> knote..

My current belief is that what happens is a glitch in
kqueue_register(). After a new knote is created and attached, the kq
lock is dropped and then f_event() is called. If the vnode is reclaimed,
or possibly freed, in the meantime, f_event() seems to dereference freed
memory, since kn_hook points to the freed vnode.

The issue, as I see it, is that the vnode lifecycle is detached from the
knote lifecycle. It may be that only the second patch, which acquires a
hold reference on the vnode for each knote, is really needed. But before
drawing any conclusions, I want to see the testing results.




Re: Possible kqueue related issue on STABLE/RC.

2013-09-24 Thread John-Mark Gurney
Konstantin Belousov wrote this message on Tue, Sep 24, 2013 at 15:14 +0300:
> On Tue, Sep 24, 2013 at 11:47:38AM +0200, Patrick Lamaiziere wrote:
> > On Tue, 24 Sep 2013 11:29:09 +0300,
> > Konstantin Belousov wrote:
> > 
> > Hello,
> > 
> > ...
> > 
> > > > > > Ok This has been mfced to 9.2-STABLE. But I still see this panic
> > > > > > with 9-2/STABLE of today (Revision : 255811). This may be better
> > > > > > because before the box paniced within minutes and now within
> > > > > > hours (still using poudriere).
> > > > > > 
> > > > > > panic:
> > > > > > fault code  = supervisor read data, page not present
> > > > > > instruction pointer = 0x20:0x808ebfcd
> > > > > > stack pointer   = 0x28:0xff824c2e0630
> > > > > > frame pointer   = 0x28:0xff824c2e06a0
> > > > > > code segment= base 0x0, limit 0xf, type 0x1b
> > > > > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > > > processor eflags= interrupt enabled, resume, IOPL = 0
> > > > > > current process = 54243 (gvfsd-trash)
> > > > > > trap number = 12
> > > > > > panic: page fault
> > > > > > cpuid = 2
> > > > > > KDB: stack backtrace:
> > > > > > #0 0x80939ad6 at kdb_backtrace+0x66
> > > > > > #1 0x808ffacd at panic+0x1cd
> > > > > > #2 0x80cdfbe9 at trap_fatal+0x289
> > > > > > #3 0x80cdff4f at trap_pfault+0x20f
> > > > > > #4 0x80ce0504 at trap+0x344
> > > > > > #5 0x80cc9b43 at calltrap+0x8
> > > > > > #6 0x8099d043 at filt_vfsvnode+0xf3
> > > > > > #7 0x808c4793 at kqueue_register+0x3e3
> > > > > > #8 0x808c4de8 at kern_kevent+0x108
> > > > > > #9 0x808c5950 at sys_kevent+0x90
> > > > > > #10 0x80cdf3a8 at amd64_syscall+0x5d8
> > > > > > #11 0x80cc9e27 at Xfast_syscall+0xf7
> > > > > > 
> > > > > > Full core.txt : 
> > > > > > http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0
> > > > > 
> > > > > For start, please load the core into kgdb and for
> > > > > frame 8
> > > > > p *kn
> > > > 
> > > > (kgdb) frame 8
> > > > #8  0x8099d043 in filt_vfsvnode (kn=0xfe0147a7f000,
> > > > hint=0) at /usr/src/sys/kern/vfs_subr.c:4600
> > > > 4600  VI_LOCK(vp);
> > > > (kgdb) p *kn
> > > > $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, 
> > > >   kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, 
> > > >   kn_kq = 0xfe01079a6200, kn_kevent = {ident = 62, filter = -4, 
> > > > flags = 32784, fflags = 0, data = 0, udata = 0x0}, kn_status =
> > > > 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = {p_fp =
> > > > 0xfe016949e190, p_proc = 0xfe016949e190, p_aio =
> > > > 0xfe016949e190, p_lio = 0xfe016949e190}, kn_fop =
> > > > 0x812f0, kn_hook = 0xfe0119d0b1f8, kn_hookid = 0}
> > > From the kgdb, also please do
> > > p *(struct vnode *)0xfe0119d0b1f8
> > 
> > With a kernel with debug info, this panic becomes  mtx_lock() of
> > destroyed mutex
> > panic: mtx_lock() of destroyed mutex
> > 
> > http://user.lamaiziere.net/patrick/public/panic_mtx_lock.txt
> > 
> > @ /usr/src/sys/kern/vfs_subr.c:4600 cpuid = 2
> > KDB: stack backtrace:
> > #0 0x80920286 at kdb_backtrace+0x66
> > #1 0x808e738d at panic+0x1cd
> > #2 0x808d58d6 at _mtx_lock_flags+0x116
> > #3 0x8098143b at filt_vfsvnode+0x3b
> > #4 0x808b213a at kqueue_register+0x4ca
> > #5 0x808b2688 at kern_kevent+0x108
> > #6 0x808b3190 at sys_kevent+0x90
> > #7 0x80cbd975 at amd64_syscall+0x2f5
> > #8 0x80ca8557 at Xfast_syscall+0xf7
> > 
> > (kgdb) frame 5
> > #5  0x808b213a in kqueue_register (kq=0xfe00ddc98900, 
> > kev=0xff824bb5f880, td=0xfe00b1e7f000, waitok=1) at 
> > /usr/src/sys/kern/kern_event.c:1136
> > 1136  event = kn->kn_fop->f_event(kn, 0);
> > 
> > (kgdb) p *kn
> > $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 
> > 0xfe011c232b00}, kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 
> > 0x0}, kn_kq = 0xfe00ddc98900, 
> >   kn_kevent = {ident = 62, filter = -4, flags = 32784, fflags = 0, data = 
> > 0, udata = 0x0}, kn_status = 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = {
> > p_fp = 0xfe00ddd4d870, p_proc = 0xfe00ddd4d870, p_aio = 
> > 0xfe00ddd4d870, p_lio = 0xfe00ddd4d870}, kn_fop = 
> > 0x812fcca0, 
> >   kn_hook = 0xfe02064a6000, kn_hookid = 0}
> > 
> > (kgdb) p *(struct vnode *)0xfe02064a6000
> > $2 = {v_type = VBAD, v_tag = 0x80d89084 "none", v_op = 0x0, v_data 
> > = 0x0, v_mount = 0x0, v_nmntvnodes = {tqe_next = 0xfe020d3e6000, 
> > tqe_prev = 0xfe0086625a68}, v_un = {vu_mount = 0x0, vu_socket = 
> > 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, 
> > le_prev = 0xff8000de9698}, v_hash = 238022, v_cache_src = {lh_first 
> > = 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0

Re: Possible kqueue related issue on STABLE/RC.

2013-09-24 Thread Konstantin Belousov
On Tue, Sep 24, 2013 at 11:47:38AM +0200, Patrick Lamaiziere wrote:
> On Tue, 24 Sep 2013 11:29:09 +0300,
> Konstantin Belousov wrote:
> 
> Hello,
> 
> ...
> 
> > > > > Ok This has been mfced to 9.2-STABLE. But I still see this panic
> > > > > with 9-2/STABLE of today (Revision : 255811). This may be better
> > > > > because before the box paniced within minutes and now within
> > > > > hours (still using poudriere).
> > > > > 
> > > > > panic:
> > > > > fault code  = supervisor read data, page not present
> > > > > instruction pointer = 0x20:0x808ebfcd
> > > > > stack pointer   = 0x28:0xff824c2e0630
> > > > > frame pointer   = 0x28:0xff824c2e06a0
> > > > > code segment= base 0x0, limit 0xf, type 0x1b
> > > > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > > processor eflags= interrupt enabled, resume, IOPL = 0
> > > > > current process = 54243 (gvfsd-trash)
> > > > > trap number = 12
> > > > > panic: page fault
> > > > > cpuid = 2
> > > > > KDB: stack backtrace:
> > > > > #0 0x80939ad6 at kdb_backtrace+0x66
> > > > > #1 0x808ffacd at panic+0x1cd
> > > > > #2 0x80cdfbe9 at trap_fatal+0x289
> > > > > #3 0x80cdff4f at trap_pfault+0x20f
> > > > > #4 0x80ce0504 at trap+0x344
> > > > > #5 0x80cc9b43 at calltrap+0x8
> > > > > #6 0x8099d043 at filt_vfsvnode+0xf3
> > > > > #7 0x808c4793 at kqueue_register+0x3e3
> > > > > #8 0x808c4de8 at kern_kevent+0x108
> > > > > #9 0x808c5950 at sys_kevent+0x90
> > > > > #10 0x80cdf3a8 at amd64_syscall+0x5d8
> > > > > #11 0x80cc9e27 at Xfast_syscall+0xf7
> > > > > 
> > > > > Full core.txt : 
> > > > > http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0
> > > > 
> > > > For start, please load the core into kgdb and for
> > > > frame 8
> > > > p *kn
> > > 
> > > (kgdb) frame 8
> > > #8  0x8099d043 in filt_vfsvnode (kn=0xfe0147a7f000,
> > > hint=0) at /usr/src/sys/kern/vfs_subr.c:4600
> > > 4600  VI_LOCK(vp);
> > > (kgdb) p *kn
> > > $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, 
> > >   kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, 
> > >   kn_kq = 0xfe01079a6200, kn_kevent = {ident = 62, filter = -4, 
> > > flags = 32784, fflags = 0, data = 0, udata = 0x0}, kn_status =
> > > 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = {p_fp =
> > > 0xfe016949e190, p_proc = 0xfe016949e190, p_aio =
> > > 0xfe016949e190, p_lio = 0xfe016949e190}, kn_fop =
> > > 0x812f0, kn_hook = 0xfe0119d0b1f8, kn_hookid = 0}
> > From the kgdb, also please do
> > p *(struct vnode *)0xfe0119d0b1f8
> 
> With a kernel with debug info, this panic becomes  mtx_lock() of
> destroyed mutex
> panic: mtx_lock() of destroyed mutex
> 
> http://user.lamaiziere.net/patrick/public/panic_mtx_lock.txt
> 
> @ /usr/src/sys/kern/vfs_subr.c:4600 cpuid = 2
> KDB: stack backtrace:
> #0 0x80920286 at kdb_backtrace+0x66
> #1 0x808e738d at panic+0x1cd
> #2 0x808d58d6 at _mtx_lock_flags+0x116
> #3 0x8098143b at filt_vfsvnode+0x3b
> #4 0x808b213a at kqueue_register+0x4ca
> #5 0x808b2688 at kern_kevent+0x108
> #6 0x808b3190 at sys_kevent+0x90
> #7 0x80cbd975 at amd64_syscall+0x2f5
> #8 0x80ca8557 at Xfast_syscall+0xf7
> 
> (kgdb) frame 5
> #5  0x808b213a in kqueue_register (kq=0xfe00ddc98900, 
> kev=0xff824bb5f880, td=0xfe00b1e7f000, waitok=1) at 
> /usr/src/sys/kern/kern_event.c:1136
> 1136  event = kn->kn_fop->f_event(kn, 0);
> 
> (kgdb) p *kn
> $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 
> 0xfe011c232b00}, kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 
> 0x0}, kn_kq = 0xfe00ddc98900, 
>   kn_kevent = {ident = 62, filter = -4, flags = 32784, fflags = 0, data = 0, 
> udata = 0x0}, kn_status = 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = {
> p_fp = 0xfe00ddd4d870, p_proc = 0xfe00ddd4d870, p_aio = 
> 0xfe00ddd4d870, p_lio = 0xfe00ddd4d870}, kn_fop = 0x812fcca0, 
>   kn_hook = 0xfe02064a6000, kn_hookid = 0}
> 
> (kgdb) p *(struct vnode *)0xfe02064a6000
> $2 = {v_type = VBAD, v_tag = 0x80d89084 "none", v_op = 0x0, v_data = 
> 0x0, v_mount = 0x0, v_nmntvnodes = {tqe_next = 0xfe020d3e6000, 
> tqe_prev = 0xfe0086625a68}, v_un = {vu_mount = 0x0, vu_socket = 0x0, 
> vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, 
> le_prev = 0xff8000de9698}, v_hash = 238022, v_cache_src = {lh_first = 
> 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfe02064a6060}, 
> v_cache_dd = 0x0, 
>   v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_lock = {lock_object = 
> {lo_name = 0x80f56e48 "ufs", lo_flags = 91881472, lo_data = 0, 
>   lo_witness = 0xff80006c3400}, lk_lock = 1, lk_exslpfail = 0, 
> lk_timo = 51, lk_p

Re: Possible kqueue related issue on STABLE/RC.

2013-09-24 Thread Patrick Lamaiziere
On Tue, 24 Sep 2013 11:29:09 +0300,
Konstantin Belousov wrote:

Hello,

...

> > > > Ok This has been mfced to 9.2-STABLE. But I still see this panic
> > > > with 9-2/STABLE of today (Revision : 255811). This may be better
> > > > because before the box paniced within minutes and now within
> > > > hours (still using poudriere).
> > > > 
> > > > panic:
> > > > fault code  = supervisor read data, page not present
> > > > instruction pointer = 0x20:0x808ebfcd
> > > > stack pointer   = 0x28:0xff824c2e0630
> > > > frame pointer   = 0x28:0xff824c2e06a0
> > > > code segment= base 0x0, limit 0xf, type 0x1b
> > > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > processor eflags= interrupt enabled, resume, IOPL = 0
> > > > current process = 54243 (gvfsd-trash)
> > > > trap number = 12
> > > > panic: page fault
> > > > cpuid = 2
> > > > KDB: stack backtrace:
> > > > #0 0x80939ad6 at kdb_backtrace+0x66
> > > > #1 0x808ffacd at panic+0x1cd
> > > > #2 0x80cdfbe9 at trap_fatal+0x289
> > > > #3 0x80cdff4f at trap_pfault+0x20f
> > > > #4 0x80ce0504 at trap+0x344
> > > > #5 0x80cc9b43 at calltrap+0x8
> > > > #6 0x8099d043 at filt_vfsvnode+0xf3
> > > > #7 0x808c4793 at kqueue_register+0x3e3
> > > > #8 0x808c4de8 at kern_kevent+0x108
> > > > #9 0x808c5950 at sys_kevent+0x90
> > > > #10 0x80cdf3a8 at amd64_syscall+0x5d8
> > > > #11 0x80cc9e27 at Xfast_syscall+0xf7
> > > > 
> > > > Full core.txt : 
> > > > http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0
> > > 
> > > For start, please load the core into kgdb and for
> > > frame 8
> > > p *kn
> > 
> > (kgdb) frame 8
> > #8  0x8099d043 in filt_vfsvnode (kn=0xfe0147a7f000,
> > hint=0) at /usr/src/sys/kern/vfs_subr.c:4600
> > 4600  VI_LOCK(vp);
> > (kgdb) p *kn
> > $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, 
> >   kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, 
> >   kn_kq = 0xfe01079a6200, kn_kevent = {ident = 62, filter = -4, 
> > flags = 32784, fflags = 0, data = 0, udata = 0x0}, kn_status =
> > 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = {p_fp =
> > 0xfe016949e190, p_proc = 0xfe016949e190, p_aio =
> > 0xfe016949e190, p_lio = 0xfe016949e190}, kn_fop =
> > 0x812f0, kn_hook = 0xfe0119d0b1f8, kn_hookid = 0}
> From the kgdb, also please do
> p *(struct vnode *)0xfe0119d0b1f8

With a kernel built with the debugging options, this panic becomes
"mtx_lock() of destroyed mutex":
panic: mtx_lock() of destroyed mutex

http://user.lamaiziere.net/patrick/public/panic_mtx_lock.txt

@ /usr/src/sys/kern/vfs_subr.c:4600 cpuid = 2
KDB: stack backtrace:
#0 0x80920286 at kdb_backtrace+0x66
#1 0x808e738d at panic+0x1cd
#2 0x808d58d6 at _mtx_lock_flags+0x116
#3 0x8098143b at filt_vfsvnode+0x3b
#4 0x808b213a at kqueue_register+0x4ca
#5 0x808b2688 at kern_kevent+0x108
#6 0x808b3190 at sys_kevent+0x90
#7 0x80cbd975 at amd64_syscall+0x2f5
#8 0x80ca8557 at Xfast_syscall+0xf7

(kgdb) frame 5
#5  0x808b213a in kqueue_register (kq=0xfe00ddc98900, 
kev=0xff824bb5f880, td=0xfe00b1e7f000, waitok=1) at 
/usr/src/sys/kern/kern_event.c:1136
1136  event = kn->kn_fop->f_event(kn, 0);

(kgdb) p *kn
$1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0xfe011c232b00}, 
kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, kn_kq = 
0xfe00ddc98900, 
  kn_kevent = {ident = 62, filter = -4, flags = 32784, fflags = 0, data = 0, 
udata = 0x0}, kn_status = 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = {
p_fp = 0xfe00ddd4d870, p_proc = 0xfe00ddd4d870, p_aio = 
0xfe00ddd4d870, p_lio = 0xfe00ddd4d870}, kn_fop = 0x812fcca0, 
  kn_hook = 0xfe02064a6000, kn_hookid = 0}

(kgdb) p *(struct vnode *)0xfe02064a6000
$2 = {v_type = VBAD, v_tag = 0x80d89084 "none", v_op = 0x0, v_data = 
0x0, v_mount = 0x0, v_nmntvnodes = {tqe_next = 0xfe020d3e6000, 
tqe_prev = 0xfe0086625a68}, v_un = {vu_mount = 0x0, vu_socket = 0x0, 
vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, 
le_prev = 0xff8000de9698}, v_hash = 238022, v_cache_src = {lh_first = 
0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfe02064a6060}, 
v_cache_dd = 0x0, 
  v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_lock = {lock_object = 
{lo_name = 0x80f56e48 "ufs", lo_flags = 91881472, lo_data = 0, 
  lo_witness = 0xff80006c3400}, lk_lock = 1, lk_exslpfail = 0, lk_timo 
= 51, lk_pri = 96, lk_stack = {depth = 12, pcs = {18446744071571296822, 
18446744071573768556, 18446744071576111075, 18446744071606114523, 
18446744071576111075, 18446744071572113927, 18446744071572067653, 
18446744071606111219, 
18446744071572016126, 18446744071572018094

Re: Possible kqueue related issue on STABLE/RC.

2013-09-24 Thread Konstantin Belousov
On Tue, Sep 24, 2013 at 09:44:27AM +0200, Patrick Lamaiziere wrote:
> On Mon, 23 Sep 2013 23:31:41 +0300,
> Konstantin Belousov wrote:
> 
> Hello,
> 
> ...
> 
> 
> > > Ok This has been mfced to 9.2-STABLE. But I still see this panic
> > > with 9-2/STABLE of today (Revision : 255811). This may be better
> > > because before the box paniced within minutes and now within hours
> > > (still using poudriere).
> > > 
> > > panic:
> > > fault code  = supervisor read data, page not present
> > > instruction pointer = 0x20:0x808ebfcd
> > > stack pointer   = 0x28:0xff824c2e0630
> > > frame pointer   = 0x28:0xff824c2e06a0
> > > code segment= base 0x0, limit 0xf, type 0x1b
> > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags= interrupt enabled, resume, IOPL = 0
> > > current process = 54243 (gvfsd-trash)
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 2
> > > KDB: stack backtrace:
> > > #0 0x80939ad6 at kdb_backtrace+0x66
> > > #1 0x808ffacd at panic+0x1cd
> > > #2 0x80cdfbe9 at trap_fatal+0x289
> > > #3 0x80cdff4f at trap_pfault+0x20f
> > > #4 0x80ce0504 at trap+0x344
> > > #5 0x80cc9b43 at calltrap+0x8
> > > #6 0x8099d043 at filt_vfsvnode+0xf3
> > > #7 0x808c4793 at kqueue_register+0x3e3
> > > #8 0x808c4de8 at kern_kevent+0x108
> > > #9 0x808c5950 at sys_kevent+0x90
> > > #10 0x80cdf3a8 at amd64_syscall+0x5d8
> > > #11 0x80cc9e27 at Xfast_syscall+0xf7
> > > 
> > > Full core.txt : 
> > > http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0
> > 
> > For start, please load the core into kgdb and for
> > frame 8
> > p *kn
> 
> (kgdb) frame 8
> #8  0x8099d043 in filt_vfsvnode (kn=0xfe0147a7f000, hint=0)
> at /usr/src/sys/kern/vfs_subr.c:4600
> 4600  VI_LOCK(vp);
> (kgdb) p *kn
> $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, 
>   kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, 
>   kn_kq = 0xfe01079a6200, kn_kevent = {ident = 62, filter = -4, 
> flags = 32784, fflags = 0, data = 0, udata = 0x0}, kn_status = 24, 
>   kn_sfflags = 47, kn_sdata = 0, kn_ptr = {p_fp = 0xfe016949e190, 
> p_proc = 0xfe016949e190, p_aio = 0xfe016949e190, 
> p_lio = 0xfe016949e190}, kn_fop = 0x812fd440, 
>   kn_hook = 0xfe0119d0b1f8, kn_hookid = 0}

From kgdb, please also do
p *(struct vnode *)0xfe0119d0b1f8

> 
> 
> > Also, please follow
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > to recompile the kernel with the debugging options and try to recreate
> > the panic.
> 
> It's building.

Please try the following.

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index aa165a0..5715f35 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -4421,10 +4421,14 @@ filt_vfsdetach(struct knote *kn)
 static int
 filt_vfsread(struct knote *kn, long hint)
 {
-	struct vnode *vp = (struct vnode *)kn->kn_hook;
+	struct vnode *vp;
 	struct vattr va;
 	int res;
 
+	if ((kn->kn_status & KN_DETACHED) != 0)
+		return (0);
+	vp = (struct vnode *)kn->kn_hook;
+
 	/*
 	 * filesystem is gone, so set the EOF flag and schedule
 	 * the knote for deletion.
@@ -4450,8 +4454,11 @@ filt_vfsread(struct knote *kn, long hint)
 static int
 filt_vfswrite(struct knote *kn, long hint)
 {
-	struct vnode *vp = (struct vnode *)kn->kn_hook;
+	struct vnode *vp;
 
+	if ((kn->kn_status & KN_DETACHED) != 0)
+		return (0);
+	vp = (struct vnode *)kn->kn_hook;
 	VI_LOCK(vp);
 
 	/*
@@ -4469,9 +4476,12 @@ filt_vfswrite(struct knote *kn, long hint)
 static int
 filt_vfsvnode(struct knote *kn, long hint)
 {
-	struct vnode *vp = (struct vnode *)kn->kn_hook;
+	struct vnode *vp;
 	int res;
 
+	if ((kn->kn_status & KN_DETACHED) != 0)
+		return (0);
+	vp = (struct vnode *)kn->kn_hook;
 	VI_LOCK(vp);
 	if (kn->kn_sfflags & hint)
 		kn->kn_fflags |= hint;
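
The ordering this patch introduces can be illustrated outside the kernel. What follows is a minimal userspace sketch, not kernel code: `struct demo_knote`, `struct demo_vnode`, `demo_filt_vnode()` and the `DEMO_*` constants are hypothetical stand-ins, with the two flag values assumed to mirror `KN_DETACHED` and `NOTE_REVOKE` from FreeBSD's `sys/event.h`. The only point it makes is that the status bit is tested before `kn_hook` is dereferenced, so a detached knote whose hook may be stale is never followed; the event logic itself is reduced to "fire on revoke".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; values assumed to mirror FreeBSD's sys/event.h. */
#define DEMO_KN_DETACHED 0x08	/* knote no longer attached to its vnode */
#define DEMO_NOTE_REVOKE 0x40	/* access to the vnode was revoked */

struct demo_vnode {
	int interlock_held;	/* stands in for the vnode interlock (VI_LOCK) */
};

struct demo_knote {
	int kn_status;		/* KN_* status bits */
	void *kn_hook;		/* watched vnode; may be stale once detached */
};

/*
 * Filter with the patch's ordering: test the detached bit before
 * touching kn_hook.  Returns 1 if the event should fire, 0 otherwise.
 */
static int
demo_filt_vnode(struct demo_knote *kn, long hint)
{
	struct demo_vnode *vp;
	int res;

	if ((kn->kn_status & DEMO_KN_DETACHED) != 0)
		return (0);
	vp = (struct demo_vnode *)kn->kn_hook;
	assert(vp != NULL);		/* an attached knote must have a hook */
	vp->interlock_held = 1;		/* VI_LOCK(vp) */
	res = (hint == DEMO_NOTE_REVOKE);
	vp->interlock_held = 0;		/* VI_UNLOCK(vp) */
	return (res);
}
```

A knote built with `kn_status = DEMO_KN_DETACHED` and a NULL hook returns 0 instead of crashing; without the guard the same call would dereference NULL, loosely analogous to the page fault in `filt_vfsvnode()` reported above.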




Re: Possible kqueue related issue on STABLE/RC.

2013-09-24 Thread Patrick Lamaiziere
On Mon, 23 Sep 2013 23:31:41 +0300,
Konstantin Belousov wrote:

Hello,

...


> > OK. This has been MFCed to 9.2-STABLE, but I still see this panic
> > with today's 9.2-STABLE (revision 255811). It may be better, because
> > before the box panicked within minutes and now within hours
> > (still using poudriere).
> > 
> > panic:
> > fault code  = supervisor read data, page not present
> > instruction pointer = 0x20:0x808ebfcd
> > stack pointer   = 0x28:0xff824c2e0630
> > frame pointer   = 0x28:0xff824c2e06a0
> > code segment = base 0x0, limit 0xf, type 0x1b
> >              = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 54243 (gvfsd-trash)
> > trap number = 12
> > panic: page fault
> > cpuid = 2
> > KDB: stack backtrace:
> > #0 0x80939ad6 at kdb_backtrace+0x66
> > #1 0x808ffacd at panic+0x1cd
> > #2 0x80cdfbe9 at trap_fatal+0x289
> > #3 0x80cdff4f at trap_pfault+0x20f
> > #4 0x80ce0504 at trap+0x344
> > #5 0x80cc9b43 at calltrap+0x8
> > #6 0x8099d043 at filt_vfsvnode+0xf3
> > #7 0x808c4793 at kqueue_register+0x3e3
> > #8 0x808c4de8 at kern_kevent+0x108
> > #9 0x808c5950 at sys_kevent+0x90
> > #10 0x80cdf3a8 at amd64_syscall+0x5d8
> > #11 0x80cc9e27 at Xfast_syscall+0xf7
> > 
> > Full core.txt : 
> > http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0
> 
> To start, please load the core into kgdb and do
> frame 8
> p *kn

(kgdb) frame 8
#8  0x8099d043 in filt_vfsvnode (kn=0xfe0147a7f000, hint=0)
at /usr/src/sys/kern/vfs_subr.c:4600
4600  VI_LOCK(vp);
(kgdb) p *kn
$1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, 
  kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, 
  kn_kq = 0xfe01079a6200, kn_kevent = {ident = 62, filter = -4, 
flags = 32784, fflags = 0, data = 0, udata = 0x0}, kn_status = 24, 
  kn_sfflags = 47, kn_sdata = 0, kn_ptr = {p_fp = 0xfe016949e190, 
p_proc = 0xfe016949e190, p_aio = 0xfe016949e190, 
p_lio = 0xfe016949e190}, kn_fop = 0x812fd440, 
  kn_hook = 0xfe0119d0b1f8, kn_hookid = 0}
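
Several fields in this `p *kn` output are bitmasks printed in decimal. A small decoder makes them readable. This is a userspace sketch: `decode_flags()` and the table names are hypothetical, and the flag values are copied from FreeBSD 9's `sys/event.h` as an assumption that should be checked against the tree that produced the core. With those values, `filter = -4` is `EVFILT_VNODE`, `kn_status = 24` decodes to `KN_DETACHED|KN_INFLUX`, `flags = 32784` to `EV_ONESHOT|EV_EOF`, and `kn_sfflags = 47` to `NOTE_DELETE|NOTE_WRITE|NOTE_EXTEND|NOTE_ATTRIB|NOTE_RENAME`. If the values are right, the knote was already marked detached (and `kn_knlist` is NULL) when `filt_vfsvnode()` ran, which is the state the `KN_DETACHED` check proposed in this thread tests for.

```c
#include <assert.h>
#include <string.h>

struct flag_name {
	unsigned int bit;
	const char *name;
};

/* Values assumed from FreeBSD 9 sys/event.h; verify against your tree. */
static const struct flag_name kn_status_names[] = {
	{ 0x01, "KN_ACTIVE" },   { 0x02, "KN_QUEUED" },
	{ 0x04, "KN_DISABLED" }, { 0x08, "KN_DETACHED" },
	{ 0x10, "KN_INFLUX" },   { 0x20, "KN_MARKER" },
	{ 0x40, "KN_KQUEUE" },   { 0x80, "KN_HASKQLOCK" },
	{ 0, NULL }
};

static const struct flag_name ev_flag_names[] = {
	{ 0x0001, "EV_ADD" },     { 0x0002, "EV_DELETE" },
	{ 0x0004, "EV_ENABLE" },  { 0x0008, "EV_DISABLE" },
	{ 0x0010, "EV_ONESHOT" }, { 0x0020, "EV_CLEAR" },
	{ 0x0040, "EV_RECEIPT" }, { 0x0080, "EV_DISPATCH" },
	{ 0x2000, "EV_FLAG1" },   { 0x4000, "EV_ERROR" },
	{ 0x8000, "EV_EOF" },
	{ 0, NULL }
};

static const struct flag_name note_vnode_names[] = {
	{ 0x01, "NOTE_DELETE" }, { 0x02, "NOTE_WRITE" },
	{ 0x04, "NOTE_EXTEND" }, { 0x08, "NOTE_ATTRIB" },
	{ 0x10, "NOTE_LINK" },   { 0x20, "NOTE_RENAME" },
	{ 0x40, "NOTE_REVOKE" },
	{ 0, NULL }
};

/* Writes "A|B|..." for every known bit set in value; returns buf. */
static char *
decode_flags(unsigned int value, const struct flag_name *tbl, char *buf,
    size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (; tbl->name != NULL; tbl++) {
		if ((value & tbl->bit) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, tbl->name, len - strlen(buf) - 1);
	}
	return (buf);
}
```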


> Also, please follow
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> to recompile the kernel with the debugging options and try to recreate
> the panic.

It's building.

Thanks, regards
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Possible kqueue related issue on STABLE/RC.

2013-09-23 Thread Konstantin Belousov
On Mon, Sep 23, 2013 at 03:37:08PM +0200, Patrick Lamaiziere wrote:
> On Fri, 20 Sep 2013 15:17:05 +0200,
> Patrick Lamaiziere wrote:
> 
> > On Thu, 12 Sep 2013 10:36:43 +0300,
> > Konstantin Belousov wrote:
> > 
> > Hello,
> > 
> > > Maybe your issue is that some filesystems do not care about the
> > > proper locking mode for FIFOs.  UFS carefully disables shared
> > > locking for VFIFO, but it seems ZFS does not.  I can propose the
> > > following band-aid, which could help you.
> > > 
> > > I have no idea whether it is the same issue as the kqueue panic.
> > > 
> > > diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
> > > index c53030a..00bd998 100644
> > > --- a/sys/kern/vfs_vnops.c
> > > +++ b/sys/kern/vfs_vnops.c
> > > @@ -267,6 +267,8 @@ vn_open_vnode(struct vnode *vp, int fmode, struct ucred *cred,
> > >  			return (error);
> > >  		}
> > >  	}
> > > +	if (vp->v_type == VFIFO && VOP_ISLOCKED(vp) != LK_EXCLUSIVE)
> > > +		vn_lock(vp, LK_UPGRADE | LK_RETRY);
> > >  	if ((error = VOP_OPEN(vp, fmode, cred, td, fp)) != 0)
> > >  		return (error);
> > >  
> > > @@ -358,7 +360,7 @@ vn_close(vp, flags, file_cred, td)
> > >  	struct mount *mp;
> > >  	int error, lock_flags;
> > >  
> > > -	if (!(flags & FWRITE) && vp->v_mount != NULL &&
> > > +	if (vp->v_type != VFIFO && !(flags & FWRITE) && vp->v_mount != NULL &&
> > >  	    vp->v_mount->mnt_kern_flag & MNTK_EXTENDED_SHARED)
> > >  		lock_flags = LK_SHARED;
> > >  	else
> > 
> 
> OK. This has been MFCed to 9.2-STABLE, but I still see this panic with
> today's 9.2-STABLE (revision 255811). It may be better, because before
> the box panicked within minutes and now within hours (still using
> poudriere).
> 
> panic:
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x808ebfcd
> stack pointer   = 0x28:0xff824c2e0630
> frame pointer   = 0x28:0xff824c2e06a0
> code segment = base 0x0, limit 0xf, type 0x1b
>              = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 54243 (gvfsd-trash)
> trap number = 12
> panic: page fault
> cpuid = 2
> KDB: stack backtrace:
> #0 0x80939ad6 at kdb_backtrace+0x66
> #1 0x808ffacd at panic+0x1cd
> #2 0x80cdfbe9 at trap_fatal+0x289
> #3 0x80cdff4f at trap_pfault+0x20f
> #4 0x80ce0504 at trap+0x344
> #5 0x80cc9b43 at calltrap+0x8
> #6 0x8099d043 at filt_vfsvnode+0xf3
> #7 0x808c4793 at kqueue_register+0x3e3
> #8 0x808c4de8 at kern_kevent+0x108
> #9 0x808c5950 at sys_kevent+0x90
> #10 0x80cdf3a8 at amd64_syscall+0x5d8
> #11 0x80cc9e27 at Xfast_syscall+0xf7
> 
> Full core.txt : 
> http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0

To start, please load the core into kgdb and do
frame 8
p *kn

Also, please follow
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
to recompile the kernel with the debugging options and try to recreate the panic.




Re: Possible kqueue related issue on STABLE/RC.

2013-09-23 Thread Patrick Lamaiziere
On Fri, 20 Sep 2013 15:17:05 +0200,
Patrick Lamaiziere wrote:

> On Thu, 12 Sep 2013 10:36:43 +0300,
> Konstantin Belousov wrote:
> 
> Hello,
> 
> > Maybe your issue is that some filesystems do not care about the
> > proper locking mode for FIFOs.  UFS carefully disables shared
> > locking for VFIFO, but it seems ZFS does not.  I can propose the
> > following band-aid, which could help you.
> > 
> > I have no idea whether it is the same issue as the kqueue panic.
> > 
> > diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
> > index c53030a..00bd998 100644
> > --- a/sys/kern/vfs_vnops.c
> > +++ b/sys/kern/vfs_vnops.c
> > @@ -267,6 +267,8 @@ vn_open_vnode(struct vnode *vp, int fmode, struct ucred *cred,
> >  			return (error);
> >  		}
> >  	}
> > +	if (vp->v_type == VFIFO && VOP_ISLOCKED(vp) != LK_EXCLUSIVE)
> > +		vn_lock(vp, LK_UPGRADE | LK_RETRY);
> >  	if ((error = VOP_OPEN(vp, fmode, cred, td, fp)) != 0)
> >  		return (error);
> >  
> > @@ -358,7 +360,7 @@ vn_close(vp, flags, file_cred, td)
> >  	struct mount *mp;
> >  	int error, lock_flags;
> >  
> > -	if (!(flags & FWRITE) && vp->v_mount != NULL &&
> > +	if (vp->v_type != VFIFO && !(flags & FWRITE) && vp->v_mount != NULL &&
> >  	    vp->v_mount->mnt_kern_flag & MNTK_EXTENDED_SHARED)
> >  		lock_flags = LK_SHARED;
> >  	else
> 

OK. This has been MFCed to 9.2-STABLE, but I still see this panic with
today's 9.2-STABLE (revision 255811). It may be better, because before
the box panicked within minutes and now within hours (still using
poudriere).

panic:
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x808ebfcd
stack pointer   = 0x28:0xff824c2e0630
frame pointer   = 0x28:0xff824c2e06a0
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 54243 (gvfsd-trash)
trap number = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0x80939ad6 at kdb_backtrace+0x66
#1 0x808ffacd at panic+0x1cd
#2 0x80cdfbe9 at trap_fatal+0x289
#3 0x80cdff4f at trap_pfault+0x20f
#4 0x80ce0504 at trap+0x344
#5 0x80cc9b43 at calltrap+0x8
#6 0x8099d043 at filt_vfsvnode+0xf3
#7 0x808c4793 at kqueue_register+0x3e3
#8 0x808c4de8 at kern_kevent+0x108
#9 0x808c5950 at sys_kevent+0x90
#10 0x80cdf3a8 at amd64_syscall+0x5d8
#11 0x80cc9e27 at Xfast_syscall+0xf7

Full core.txt : 
http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0

Thanks, regards.


Re: Possible kqueue related issue on STABLE/RC.

2013-09-20 Thread Patrick Lamaiziere
On Thu, 12 Sep 2013 10:36:43 +0300,
Konstantin Belousov wrote:

Hello,

> Maybe your issue is that some filesystems do not care about the proper
> locking mode for FIFOs.  UFS carefully disables shared locking for
> VFIFO, but it seems ZFS does not.  I can propose the following band-aid,
> which could help you.
> 
> I have no idea whether it is the same issue as the kqueue panic.
> 
> diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
> index c53030a..00bd998 100644
> --- a/sys/kern/vfs_vnops.c
> +++ b/sys/kern/vfs_vnops.c
> @@ -267,6 +267,8 @@ vn_open_vnode(struct vnode *vp, int fmode, struct ucred *cred,
>  			return (error);
>  		}
>  	}
> +	if (vp->v_type == VFIFO && VOP_ISLOCKED(vp) != LK_EXCLUSIVE)
> +		vn_lock(vp, LK_UPGRADE | LK_RETRY);
>  	if ((error = VOP_OPEN(vp, fmode, cred, td, fp)) != 0)
>  		return (error);
>  
> @@ -358,7 +360,7 @@ vn_close(vp, flags, file_cred, td)
>  	struct mount *mp;
>  	int error, lock_flags;
>  
> -	if (!(flags & FWRITE) && vp->v_mount != NULL &&
> +	if (vp->v_type != VFIFO && !(flags & FWRITE) && vp->v_mount != NULL &&
>  	    vp->v_mount->mnt_kern_flag & MNTK_EXTENDED_SHARED)
>  		lock_flags = LK_SHARED;
>  	else

Hmm, so what is the fix for 9.2-STABLE? As far as I can see, there is no
vn_open_vnode() function here, and I don't see where I should apply the patch.

I see this panic too (with today's STABLE), while using poudriere + ZFS,
like Jimmy.

Thanks, regards


Re: Possible kqueue related issue on STABLE/RC.

2013-09-12 Thread Konstantin Belousov
On Fri, Sep 13, 2013 at 12:40:28AM +0300, Andriy Gapon wrote:
> on 12/09/2013 21:49 Konstantin Belousov said the following:
> > Ok, so it is ZFS indeed.  I think I will commit the band-aid to head
> > shortly.
> 
> I am not sure if my message <5231a016.7060...@freebsd.org> was intercepted by
> the NSA and didn't reach you...  At least I haven't seen any reaction to it.
> So, ZFS does not need this band-aid.  If you think that it may be needed for
> other filesystems or is useful in general, then okay.
> 
> Just in case, r254694 is not in releng/9.2 and I haven't seen any evidence that
> Jimmy has tested a tree that included that commit.

Good to know, thank you.  My change does not conflict with the ZFS fix;
moreover, the cost of the change for a well-behaved filesystem is zero,
since the upgrade is only initiated when needed.

I think both fixes can coexist usefully.
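
The claim that the band-aid costs nothing for well-behaved filesystems follows from its shape: the upgrade runs only for a FIFO that is not already exclusively locked. The sketch below models just that decision in userspace. `struct lockdemo_vnode`, its enums and both functions are hypothetical stand-ins for the vnode type and lock state, not `lockmgr` or VFS APIs, and the sketch deliberately ignores how a real `LK_UPGRADE` is carried out. The second function mirrors the `vn_close()` hunk, where FIFOs are excluded from the shared-lock path.

```c
#include <assert.h>

/* Hypothetical stand-ins for the vnode type and its lock state. */
enum lockdemo_type { LOCKDEMO_VREG, LOCKDEMO_VFIFO };
enum lockdemo_mode { LOCKDEMO_UNLOCKED, LOCKDEMO_SHARED, LOCKDEMO_EXCLUSIVE };

struct lockdemo_vnode {
	enum lockdemo_type v_type;
	enum lockdemo_mode lock;
	int upgrades;		/* how often the upgrade path was taken */
};

/*
 * Shape of the vn_open_vnode() hunk: a FIFO that is not exclusively
 * locked is upgraded before VOP_OPEN(); anything else is left alone.
 */
static void
lockdemo_before_open(struct lockdemo_vnode *vp)
{
	assert(vp->lock != LOCKDEMO_UNLOCKED);	/* caller holds the lock */
	if (vp->v_type == LOCKDEMO_VFIFO && vp->lock != LOCKDEMO_EXCLUSIVE) {
		vp->lock = LOCKDEMO_EXCLUSIVE;	/* vn_lock(vp, LK_UPGRADE | ...) */
		vp->upgrades++;
	}
}

/*
 * Shape of the vn_close() hunk: the shared lock is chosen only for a
 * non-FIFO vnode opened without FWRITE on a filesystem that allows it.
 */
static enum lockdemo_mode
lockdemo_close_mode(const struct lockdemo_vnode *vp, int opened_for_write,
    int fs_extended_shared)
{
	if (vp->v_type != LOCKDEMO_VFIFO && !opened_for_write &&
	    fs_extended_shared)
		return (LOCKDEMO_SHARED);
	return (LOCKDEMO_EXCLUSIVE);
}
```

On a filesystem that already hands `VOP_OPEN()` an exclusively locked FIFO (which ZFS should do after r253603, per Andriy), `upgrades` stays at 0, so both fixes can be present without extra work on the common path.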




Re: Possible kqueue related issue on STABLE/RC.

2013-09-12 Thread Andriy Gapon
on 12/09/2013 21:49 Konstantin Belousov said the following:
> On Thu, Sep 12, 2013 at 08:28:48PM +0200, Jimmy Olgeni wrote:
>>
>> On Thu, 12 Sep 2013, Konstantin Belousov wrote:
>>
>>> Maybe your issue is that some filesystems do not care about the proper
>>> locking mode for FIFOs.  UFS carefully disables shared locking for
>>> VFIFO, but it seems ZFS does not.  I can propose the following band-aid,
>>> which could help you.
>>
>> This certainly seems to improve things. I have been running builds
>> for the past couple of hours without any critical problem.
> Ok, so it is ZFS indeed.  I think I will commit the band-aid to head
> shortly.

I am not sure if my message <5231a016.7060...@freebsd.org> was intercepted by
the NSA and didn't reach you...  At least I haven't seen any reaction to it.
So, ZFS does not need this band-aid.  If you think that it may be needed for
other filesystems or is useful in general, then okay.

Just in case, r254694 is not in releng/9.2 and I haven't seen any evidence that
Jimmy has tested a tree that included that commit.

>> I spotted a few LORs but nothing bad happened so far:
>>
>>  http://olgeni.olgeni.com/~olgeni/dmesg-2013-09-12
> Out of curiosity, please look up the line for vm_object_terminate+0x1d8.
> 
>>
>> If it keeps working this way I hope there's still some time to fit it 
>> into an -RC.
> 
> For 10.0 yes, 9.2 is sealed (hopefully).
> 


-- 
Andriy Gapon


Re: Possible kqueue related issue on STABLE/RC.

2013-09-12 Thread Jimmy Olgeni


On Thu, 12 Sep 2013, Konstantin Belousov wrote:


>> This certainly seems to improve things. I have been running builds
>> for the past couple of hours without any critical problem.
>
> Ok, so it is ZFS indeed.  I think I will commit the band-aid to head
> shortly.


Thank you!


>> I spotted a few LORs but nothing bad happened so far:
>>
>>  http://olgeni.olgeni.com/~olgeni/dmesg-2013-09-12
>
> Out of curiosity, please look up the line for vm_object_terminate+0x1d8.


Here it is, VM_OBJECT_UNLOCK:

(kgdb) list *vm_object_terminate+0x1d8
0x80b5cc58 is in vm_object_terminate (/usr/src/sys/vm/vm_object.c:767).
762	
763		/*
764		 * Let the pager know object is dead.
765		 */
766		vm_pager_deallocate(object);
767		VM_OBJECT_UNLOCK(object);
768	
769		vm_object_destroy(object);
770	}
771
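
The `function+0xoffset` notation used throughout these backtraces, and in kgdb's `list *vm_object_terminate+0x1d8`, means "the function's start address plus that many bytes". The following is a minimal sketch of that lookup, assuming a hypothetical symbol table sorted by address (`struct demo_sym` and `demo_resolve()` are illustrative names, not kgdb or kernel APIs). The two start addresses used with it are simply the addresses printed in this thread minus their offsets, e.g. 0x80b5cc58 - 0x1d8 = 0x80b5ca80 and 0x8099d043 - 0xf3 = 0x8099cf50.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_sym {
	uint64_t start;		/* address of the function's first byte */
	const char *name;
};

/*
 * Find the symbol containing addr in a table sorted by start address,
 * i.e. the entry with the largest start <= addr.  Symbol sizes are not
 * modelled, so the last entry is assumed to extend indefinitely.
 * Returns NULL if addr lies below the first symbol; otherwise stores
 * addr - start in *off and returns the symbol name.
 */
static const char *
demo_resolve(const struct demo_sym *tab, size_t n, uint64_t addr,
    uint64_t *off)
{
	const struct demo_sym *best = NULL;

	assert(off != NULL);
	for (size_t i = 0; i < n && tab[i].start <= addr; i++)
		best = &tab[i];
	if (best == NULL)
		return (NULL);
	*off = addr - best->start;
	return (best->name);
}
```

With a table holding `{ 0x8099cf50, "filt_vfsvnode" }` and `{ 0x80b5ca80, "vm_object_terminate" }`, the address 0x80b5cc58 resolves to `vm_object_terminate` with offset 0x1d8, matching the kgdb output above.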

--
jimmy


Re: Possible kqueue related issue on STABLE/RC.

2013-09-12 Thread Konstantin Belousov
On Thu, Sep 12, 2013 at 08:28:48PM +0200, Jimmy Olgeni wrote:
> 
> On Thu, 12 Sep 2013, Konstantin Belousov wrote:
> 
> > Maybe your issue is that some filesystems do not care about the proper
> > locking mode for FIFOs.  UFS carefully disables shared locking for
> > VFIFO, but it seems ZFS does not.  I can propose the following band-aid,
> > which could help you.
> 
> This certainly seems to improve things. I have been running builds
> for the past couple of hours without any critical problem.
Ok, so it is ZFS indeed.  I think I will commit the band-aid to head
shortly.

> 
> I spotted a few LORs but nothing bad happened so far:
> 
>  http://olgeni.olgeni.com/~olgeni/dmesg-2013-09-12
Out of curiosity, please look up the line for vm_object_terminate+0x1d8.

> 
> If it keeps working this way I hope there's still some time to fit it 
> into an -RC.

For 10.0 yes, 9.2 is sealed (hopefully).





Re: Possible kqueue related issue on STABLE/RC.

2013-09-12 Thread Jimmy Olgeni


On Thu, 12 Sep 2013, Konstantin Belousov wrote:


> Maybe your issue is that some filesystems do not care about the proper
> locking mode for FIFOs.  UFS carefully disables shared locking for
> VFIFO, but it seems ZFS does not.  I can propose the following band-aid,
> which could help you.


This certainly seems to improve things. I have been running builds
for the past couple of hours without any critical problem.

I spotted a few LORs but nothing bad happened so far:

http://olgeni.olgeni.com/~olgeni/dmesg-2013-09-12

If it keeps working this way I hope there's still some time to fit it 
into an -RC.


--
jimmy


Re: Possible kqueue related issue on STABLE/RC.

2013-09-12 Thread Andriy Gapon
on 12/09/2013 10:36 Konstantin Belousov said the following:
> UFS carefully disables shared locking for
> VFIFO, but it seems ZFS does not.

In fact, ZFS should have been doing that since r253603 (MFC-ed to stables as
r254694 and r254695): http://svnweb.freebsd.org/changeset/base/253603
If it still doesn't, then it's a bug.

-- 
Andriy Gapon


Re: Possible kqueue related issue on STABLE/RC.

2013-09-12 Thread Jimmy Olgeni


On Thu, 12 Sep 2013, Konstantin Belousov wrote:


>> 4591 is the VI_LOCK(vp) in filt_vfsvnode:
>>
>> static int
>> filt_vfsvnode(struct knote *kn, long hint)
>> {
>> 	struct vnode *vp = (struct vnode *)kn->kn_hook;
>> 	int res;
>>
>> 	VI_LOCK(vp);

	^^^

>> 	if (kn->kn_sfflags & hint)
>> 		kn->kn_fflags |= hint;
>> 	if (hint == NOTE_REVOKE) {
>> 		kn->kn_flags |= EV_EOF;
>> 		VI_UNLOCK(vp);
>> 		return (1);
>> 	}
>> 	res = (kn->kn_fflags != 0);
>> 	VI_UNLOCK(vp);
>> 	return (res);
>> }
>
> Which line is 4591?


"VI_LOCK(vp);" which bumps into the assertion.

--
jimmy


Re: Possible kqueue related issue on STABLE/RC.

2013-09-12 Thread Jimmy Olgeni


On Thu, 12 Sep 2013, Konstantin Belousov wrote:


>> This time I tried with clang + these options and I got something more
>> interesting. All works fine until the lock violation below:
>
> Clang is, well, not relevant there.


Still, with clang I could get a hard reset rather than a hang. But
maybe there are two different issues. I'll run more tests and see if
the FIFO problem goes away with your patch below.


> It seems you edited the kernel output, at least rearranging large blocks
> of text.  I tried to interpret what I see in a useful way.


I got the message buffer from a minidump, here:

  http://olgeni.olgeni.com/~olgeni/textdump.tar.1


> Maybe your issue is that some filesystems do not care about the proper
> locking mode for FIFOs.  UFS carefully disables shared locking for
> VFIFO, but it seems ZFS does not.  I can propose the following band-aid,
> which could help you.


Thanks a lot! I'll give it a run when I get back home.

--
jimmy


Re: Possible kqueue related issue on STABLE/RC.

2013-09-12 Thread Konstantin Belousov
On Wed, Sep 11, 2013 at 10:32:31PM +0200, Jimmy Olgeni wrote:
> 
> Hi,
> 
> On Wed, 11 Sep 2013, Konstantin Belousov wrote:
> 
> > Could you list the lines around vfs_subr.c:4591 in your kernel?
> 
> 4591 is the VI_LOCK(vp) in filt_vfsvnode:
> 
> static int
> filt_vfsvnode(struct knote *kn, long hint)
> {
> 	struct vnode *vp = (struct vnode *)kn->kn_hook;
> 	int res;
> 
> 	VI_LOCK(vp);
> 	if (kn->kn_sfflags & hint)
> 		kn->kn_fflags |= hint;
> 	if (hint == NOTE_REVOKE) {
> 		kn->kn_flags |= EV_EOF;
> 		VI_UNLOCK(vp);
> 		return (1);
> 	}
> 	res = (kn->kn_fflags != 0);
> 	VI_UNLOCK(vp);
> 	return (res);
> }

Which line is 4591?

> 
> 
> Next test with INVARIANTS & C as soon as the build is done.
> 
> -- 
> jimmy




Re: Possible kqueue related issue on STABLE/RC.

2013-09-12 Thread Konstantin Belousov
On Wed, Sep 11, 2013 at 11:18:34PM +0200, Jimmy Olgeni wrote:
> 
> Hi,
> 
> On Wed, 11 Sep 2013, Konstantin Belousov wrote:
> 
> > Also, do you have all options listed at
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > enabled ?
> 
> This time I tried with clang + these options and I got something more 
> interesting. All works fine until the lock violation below:
Clang is, well, not relevant there.

> 
> acquiring duplicate lock of same type: "os.lock_mtx"
>   1st os.lock_mtx @ nvidia_os.c:748
>   2nd os.lock_mtx @ nvidia_os.c:748
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8360e1e2f0
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8360e1e3a0
> witness_checkorder() at witness_checkorder+0xc0a/frame 0xff8360e1e420
> _mtx_lock_flags() at _mtx_lock_flags+0x74/frame 0xff8360e1e460
> os_acquire_spinlock() at os_acquire_spinlock+0x17/frame 0xff8360e1e470
> _nv012281rm() at _nv012281rm+0x9/frame 0xff800cfadec0
> lock order reversal:
>   1st 0xfe003603c098 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1240
>   2nd 0xfe003603b848 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2335
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8360bf3660
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8360bf3710
> witness_checkorder() at witness_checkorder+0xc0a/frame 0xff8360bf3790
> __lockmgr_args() at __lockmgr_args+0x744/frame 0xff8360bf38b0
> vop_stdlock() at vop_stdlock+0x3c/frame 0xff8360bf38d0
> VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbe/frame 0xff8360bf3900
> _vn_lock() at _vn_lock+0x63/frame 0xff8360bf3960
> vputx() at vputx+0x34b/frame 0xff8360bf39c0
> dounmount() at dounmount+0x282/frame 0xff8360bf3a30
> sys_unmount() at sys_unmount+0x3a6/frame 0xff8360bf3b20
> amd64_syscall() at amd64_syscall+0x259/frame 0xff8360bf3c30
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8360bf3c30
> --- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x801918a7c, rsp = 0x7fffbf18, rbp = 0x802818800 ---
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff83611df6b0
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xff83611df760
> assert_vop_elocked() at assert_vop_elocked+0x6a/frame 0xff83611df790
> fifo_open() at fifo_open+0x38/frame 0xff83611df810
> VOP_OPEN_APV() at VOP_OPEN_APV+0xd1/frame 0xff83611df840
> vn_open_cred() at vn_open_cred+0x532/frame 0xff83611df9b0
> kern_openat() at kern_openat+0x1c1/frame 0xff83611dfb20
> amd64_syscall() at amd64_syscall+0x259/frame 0xff83611dfc30
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff83611dfc30
> 
> 
> Here it goes:
> 
> 
> --- syscall (5, FreeBSD ELF64, sys_open), rip = 0x800db3d3c, rsp = 0x7fffc968, rbp = 0 ---
> fifo_open: 0xfe0063251cd0 is not exclusive locked but should be
> KDB: enter: lock violation
> 
> 0xfe0063251cd0: tag zfs, type VFIFO
>  usecount 2, writecount 0, refcount 2 mountedhere 0xfe005048cc00
>  flags (VI_ACTIVE)
>  lock type zfs: SHARED (count 1)
> #0 0x808b22fa at __lockmgr_args+0xeba
> #1 0x8095df4c at vop_stdlock+0x3c
> #2 0x80d7f6ae at VOP_LOCK1_APV+0xbe
> #3 0x80980153 at _vn_lock+0x63
> #4 0x8096e321 at vget+0xa1
> #5 0x80959921 at cache_lookup_times+0x591
> #6 0x8095aa7d at vfs_cache_lookup+0x9d
> #7 0x80d7c881 at VOP_LOOKUP_APV+0xd1
> #8 0x80963aff at lookup+0x6bf
> #9 0x80963029 at namei+0x589
> #10 0x8097fa0f at vn_open_cred+0x27f
> #11 0x80978031 at kern_openat+0x1c1
> #12 0x80ccce49 at amd64_syscall+0x259
> #13 0x80cb5e9b at Xfast_syscall+0xfb
>  , fifo with 0 readers and 1 writers

It seems you edited the kernel output, at least rearranging large blocks
of text.  I tried to interpret what I see in a useful way.

Maybe your issue is that some filesystems do not care about the proper
locking mode for FIFOs.  UFS carefully disables shared locking for
VFIFO, but it seems ZFS does not.  I can propose the following band-aid,
which could help you.

I have no idea whether it is the same issue as the kqueue panic.

diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
index c53030a..00bd998 100644
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -267,6 +267,8 @@ vn_open_vnode(struct vnode *vp, int fmode, struct ucred *cred,
 			return (error);
 		}
 	}
+	if (vp->v_type == VFIFO && VOP_ISLOCKED(vp) != LK_EXCLUSIVE)
+		vn_lock(vp, LK_UPGRADE | LK_RETRY);
 	if ((error = VOP_OPEN(vp, fmode, cred, td, fp)) != 0)
 		return (error);
 
@@ -358,7 +360,7 @@ vn_close(vp, flags, file_cred, td)
 	struct mount *mp;
 	int error, lock_flags;
 
-	if (!(flags & FWRITE) && vp->v_mount != NULL &&
+	if (vp->v_type != VFIFO && !(flags &

Re: Possible kqueue related issue on STABLE/RC.

2013-09-11 Thread Jimmy Olgeni


Hi,

On Wed, 11 Sep 2013, Konstantin Belousov wrote:


> Also, do you have all the options listed at
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> enabled?


This time I tried with clang + these options and I got something more 
interesting. All works fine until the lock violation below:


acquiring duplicate lock of same type: "os.lock_mtx"
 1st os.lock_mtx @ nvidia_os.c:748
 2nd os.lock_mtx @ nvidia_os.c:748
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8360e1e2f0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8360e1e3a0
witness_checkorder() at witness_checkorder+0xc0a/frame 0xff8360e1e420
_mtx_lock_flags() at _mtx_lock_flags+0x74/frame 0xff8360e1e460
os_acquire_spinlock() at os_acquire_spinlock+0x17/frame 0xff8360e1e470
_nv012281rm() at _nv012281rm+0x9/frame 0xff800cfadec0
lock order reversal:
 1st 0xfe003603c098 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1240
 2nd 0xfe003603b848 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2335
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8360bf3660
kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8360bf3710
witness_checkorder() at witness_checkorder+0xc0a/frame 0xff8360bf3790
__lockmgr_args() at __lockmgr_args+0x744/frame 0xff8360bf38b0
vop_stdlock() at vop_stdlock+0x3c/frame 0xff8360bf38d0
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbe/frame 0xff8360bf3900
_vn_lock() at _vn_lock+0x63/frame 0xff8360bf3960
vputx() at vputx+0x34b/frame 0xff8360bf39c0
dounmount() at dounmount+0x282/frame 0xff8360bf3a30
sys_unmount() at sys_unmount+0x3a6/frame 0xff8360bf3b20
amd64_syscall() at amd64_syscall+0x259/frame 0xff8360bf3c30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8360bf3c30
--- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x801918a7c, rsp = 0x7fffbf18, rbp = 0x802818800 ---
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff83611df6b0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xff83611df760
assert_vop_elocked() at assert_vop_elocked+0x6a/frame 0xff83611df790
fifo_open() at fifo_open+0x38/frame 0xff83611df810
VOP_OPEN_APV() at VOP_OPEN_APV+0xd1/frame 0xff83611df840
vn_open_cred() at vn_open_cred+0x532/frame 0xff83611df9b0
kern_openat() at kern_openat+0x1c1/frame 0xff83611dfb20
amd64_syscall() at amd64_syscall+0x259/frame 0xff83611dfc30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff83611dfc30


Here it goes:


--- syscall (5, FreeBSD ELF64, sys_open), rip = 0x800db3d3c, rsp = 0x7fffc968, rbp = 0 ---
fifo_open: 0xfe0063251cd0 is not exclusive locked but should be
KDB: enter: lock violation

0xfe0063251cd0: tag zfs, type VFIFO
usecount 2, writecount 0, refcount 2 mountedhere 0xfe005048cc00
flags (VI_ACTIVE)
lock type zfs: SHARED (count 1)
#0 0x808b22fa at __lockmgr_args+0xeba
#1 0x8095df4c at vop_stdlock+0x3c
#2 0x80d7f6ae at VOP_LOCK1_APV+0xbe
#3 0x80980153 at _vn_lock+0x63
#4 0x8096e321 at vget+0xa1
#5 0x80959921 at cache_lookup_times+0x591
#6 0x8095aa7d at vfs_cache_lookup+0x9d
#7 0x80d7c881 at VOP_LOOKUP_APV+0xd1
#8 0x80963aff at lookup+0x6bf
#9 0x80963029 at namei+0x589
#10 0x8097fa0f at vn_open_cred+0x27f
#11 0x80978031 at kern_openat+0x1c1
#12 0x80ccce49 at amd64_syscall+0x259
#13 0x80cb5e9b at Xfast_syscall+0xfb
, fifo with 0 readers and 1 writers

0xfe00a21d3000: tag zfs, type VREG
usecount 4, writecount 0, refcount 5 mountedhere 0
flags (VI_ACTIVE)
v_object 0xfe00a0f71bc8 ref 3 pages 150
lock type zfs: SHARED (count 1)
#0 0x808b22fa at __lockmgr_args+0xeba
#1 0x8095df4c at vop_stdlock+0x3c
#2 0x80d7f6ae at VOP_LOCK1_APV+0xbe
#3 0x80980153 at _vn_lock+0x63
#4 0x8096e321 at vget+0xa1
#5 0x80b4e1de at vm_fault_hold+0x5ee
#6 0x80b4dba7 at vm_fault+0x77
#7 0x80ccc85b at trap_pfault+0x1bb
#8 0x80ccbf92 at trap+0x512
#9 0x80cb5bb2 at calltrap+0x8

0xfe0195389290: tag zfs, type VREG
usecount 4, writecount 0, refcount 5 mountedhere 0
flags (VI_ACTIVE)
v_object 0xfe000f2df740 ref 3 pages 223
lock type zfs: SHARED (count 1)
#0 0x808b22fa at __lockmgr_args+0xeba
#1 0x8095df4c at vop_stdlock+0x3c
#2 0x80d7f6ae at VOP_LOCK1_APV+0xbe
#3 0x80980153 at _vn_lock+0x63
#4 0x8096e321 at vget+0xa1
#5 0x80b4e1de at vm_fault_hold+0x5ee
#6 0x80b4dba7 at vm_fault+0x77
#7 0x80ccc85b at trap_pfault+0x1bb
#8 0x80ccbf92 at trap+0x512
#9 0x80cb5bb2 at calltrap+0x8

0xfe00a2207000: tag zfs, type VREG
usecount 4, writecount 0, refcount 5 mountedhere 0
flags (VI_ACTIVE)
v_object 0xfe000f0b8d98 ref 3 pages 201
lock type zfs: SHARED (count 1)
#0 0xf

Re: Possible kqueue related issue on STABLE/RC.

2013-09-11 Thread Jimmy Olgeni


On Wed, 11 Sep 2013, Volodymyr Kostyrko wrote:


> 11.09.2013 18:07, Jimmy Olgeni wrote:
>
>> Perhaps I found something weird while running 9.2-RC3
>> (FreeBSD 9.2-RC3 #0 r255393, ZFS-only setup).
>>
>> Unfortunately I'm not able to get a minidump for the latest RC, but at this
>> point I suspect that something is going on with glib20 and kqueue on both
>> -STABLE and -RC.
>
> Can you spare some more info on this?


Sure, here it goes:


> 1. What are your /etc/src.conf and /etc/make.conf files?


My /etc/src.conf:

===
PORTS_MODULES=emulators/virtualbox-ose-kmod sysutils/fusefs-kmod sysutils/pefs-kmod x11/nvidia-driver
===

My /etc/make.conf:

===
APACHE_PORT=www/apache22
DEFAULT_PGSQL_VER=92

WITH_NEW_XORG=yes

PERL_VERSION=5.14.4

.if (!empty(.CURDIR:M/usr/src*) || !empty(.CURDIR:M/usr/obj*))
.if !defined(NOCCACHE)
CC:=  /usr/local/libexec/ccache/world/cc
CXX:= /usr/local/libexec/ccache/world/c++
.endif
.endif
===


> 2. Does your copy of the sources have any third-party patches applied?


No extra patches were applied. For the RC tests I also removed the
whole /usr/src and checked it out from svn from scratch.

Currently I have this kernel config:

===
include GENERIC

ident   RELENG_9

device  crypto  # core crypto support
device  cryptodev   # /dev/crypto for access to h/w
device  enc # IPsec interface.

options DDB # Enable the ddb debugger backend.

options IPSEC   # IP security (requires device crypto)
options IPSEC_NAT_T # NAT-T support, UDP encap of ESP
options IPSEC_FILTERTUNNEL  # filter ipsec packets from a tunnel

options SC_DFLT_FONT        # compile font in
makeoptions SC_DFLT_FONT=cp437
options SC_HISTORY_SIZE=512 # number of history buffer lines
options VGA_WIDTH90 # support 90 column modes

options RACCT   # Resource Accounting
options RCTL# Resource Limits

# altq(9). Enable the base part of the hooks with the ALTQ option.
# Individual disciplines must be built into the base system and can not be
# loaded as modules at this point. In order to build a SMP kernel you must
# also have the ALTQ_NOPCC option.
options ALTQ
options ALTQ_CBQ# Class Bases Queueing
options ALTQ_RED# Random Early Detection
options ALTQ_RIO# RED In/Out
options ALTQ_HFSC   # Hierarchical Packet Scheduler
options ALTQ_CDNR   # Traffic conditioner
options ALTQ_PRIQ   # Priority Queueing
options ALTQ_NOPCC  # Required for SMP build

options TEKEN_UTF8
===

Also, my loader.conf:

===
autoboot_delay="5"

kern.cam.ada.legacy_aliases="0"
kern.cam.scsi_delay="1500"
net.inet.ip.fw.default_to_accept="1"
vm.pmap.pg_ps_enabled="1"

ahci_load="YES"
ipmi_load="YES"
zfs_load="YES"

geom_uzip_load="YES"

hw.memtest.tests="0"
hw.usb.no_pf="1"

vm.kmem_size_max="16G"
vm.kmem_size="12G"

vfs.root.mountfrom="zfs:rpool/zfsroot"

vfs.zfs.write_limit_override="1536M"
vfs.zfs.txg.synctime_ms="750"
vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1"

kern.ipc.semmns="512"
kern.ipc.semmnu="256"
kern.ipc.shmmni="256"
kern.ipc.shmseg="256"

nvidia_load="YES"
vboxdrv_load="YES"
amdtemp_load="YES"
snd_hda_load="YES"

hint.p4tcc.0.disabled="1"
hint.acpi_throttle.0.disabled="1"
===

sysctl.conf:

===
debug.kdb.break_to_debugger=1
hw.snd.default_unit=2
kern.coredump=0
kern.ipc.shm_allow_removed=1
kern.ipc.somaxconn=4096
kern.maxfiles=25000
kern.maxvnodes=25
kern.ps_arg_cache_limit=1
kern.sched.preempt_thresh=224
machdep.kdb_on_nmi=0
machdep.panic_on_nmi=0
net.inet.icmp.log_redirect=0
net.inet6.ip6.v6only=0
net.link.ether.inet.log_arp_movements=0
vfs.hirunningspace=5242880
vfs.read_max=128
vfs.ufs.dirhash_maxmem=33554432
vfs.usermount=1
vfs.zfs.prefetch_disable=1
===


3. Does this happen on more than one PC, i.e. are you sure hardware
is not involved?


First thing I thought of was either memory or the CPU temperature.

Right now I have only one PC available to test it, but:

- Memory looks ok, at least according to Memtest86/Memtest86+ (tested
  from Ultimate Boot CD)

- CPU looks OK, meaning that it can handle heavy workloads without a
  problem. I set dev.cpu.0.freq=2200 to avoid overheating and started
  4 different poudriere builds with -J2. I have the CPU temperature in
  my prompt and it hovers around 50C during the builds. Without gvfs it
  works just fine. Running buildworld always seems to work, and running
  sysutils/stress (stress -v -t 5m --cpu 8 --io 4 --vm 2 --vm-bytes 128M
  --hdd 4) did not seem to bother the system either.

- ZFS scrub says that it's all OK on the storage side (initially I
  thought about something going wrong with ZFS due to bad tuning).


Can you try to build world WITH_CLANG_IS_CC? Clang-generated code is
known to produce an instant coredump in situations where gcc-generated
code hits a loop or becomes unresponsive.

Re: Possible kqueue related issue on STABLE/RC.

2013-09-11 Thread Jimmy Olgeni


On Wed, 11 Sep 2013, Volodymyr Kostyrko wrote:

Can you try to build world WITH_CLANG_IS_CC? Clang-generated code is known to 
produce an instant coredump in situations where gcc-generated code hits a 
loop or becomes unresponsive.


I removed ccache, rebuilt with WITH_CLANG_IS_CC and it worked for a 
while, but then I got a hard reset without even a core dump.


I'm rebuilding with Konstantin's kernel debug options plus software 
watchdog and trying another round.


--
jimmy
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Possible kqueue related issue on STABLE/RC.

2013-09-11 Thread Jimmy Olgeni


Hi,

On Wed, 11 Sep 2013, Konstantin Belousov wrote:


Could you list the lines around vfs_subr.c:4591 in your kernel?


4591 is the VI_LOCK(vp) in filt_vfsvnode:

static int
filt_vfsvnode(struct knote *kn, long hint)
{
	struct vnode *vp = (struct vnode *)kn->kn_hook;
	int res;

	VI_LOCK(vp);		/* vfs_subr.c:4591 */
	if (kn->kn_sfflags & hint)
		kn->kn_fflags |= hint;
	if (hint == NOTE_REVOKE) {
		kn->kn_flags |= EV_EOF;
		VI_UNLOCK(vp);
		return (1);
	}
	res = (kn->kn_fflags != 0);
	VI_UNLOCK(vp);
	return (res);
}
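To picture how a knote can fault at VI_LOCK(vp), here is a minimal userspace sketch. It is not kernel code and not any patch from this thread. It assumes a hypothetical sim_vnode/sim_knote pair whose filter copies the decision logic of filt_vfsvnode() above, and it models the idea of each knote holding a reference on its vnode: the knote's pointer stays valid after the "filesystem" drops its own reference, and the vnode is only freed when the knote detaches. Without the hold taken in sim_knote_attach(), that first sim_vdrop() would free the vnode and the next filter call would lock freed memory. All sim_* names and SIM_* values are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model.  These constants are NOT the real
 * <sys/event.h> values; they only need to be distinct bits here.
 */
#define SIM_NOTE_WRITE  0x0002u
#define SIM_NOTE_REVOKE 0x0040u
#define SIM_EV_EOF      0x8000u

struct sim_vnode {
	int interlock_held;	/* stand-in for the vnode interlock */
	int holdcnt;		/* stand-in for the vnode hold count */
};

struct sim_knote {
	struct sim_vnode *hook;	/* like kn_hook: a raw pointer to the vnode */
	unsigned sfflags;	/* events the watcher subscribed to */
	unsigned fflags;	/* events delivered so far */
	unsigned flags;		/* EV_*-style flags */
};

/* Single-threaded model: the "lock" is a flag checked with assert(). */
void sim_vi_lock(struct sim_vnode *vp) {
	assert(!vp->interlock_held);
	vp->interlock_held = 1;
}

void sim_vi_unlock(struct sim_vnode *vp) {
	assert(vp->interlock_held);
	vp->interlock_held = 0;
}

/* Allocate a vnode carrying one reference owned by the "filesystem". */
struct sim_vnode *sim_vnode_alloc(void) {
	struct sim_vnode *vp = calloc(1, sizeof(*vp));

	if (vp != NULL)
		vp->holdcnt = 1;
	return vp;
}

void sim_vhold(struct sim_vnode *vp) {
	sim_vi_lock(vp);
	vp->holdcnt++;
	sim_vi_unlock(vp);
}

/* Drop one reference and free on the last one; returns 1 if freed. */
int sim_vdrop(struct sim_vnode *vp) {
	int last;

	sim_vi_lock(vp);
	assert(vp->holdcnt > 0);
	last = (--vp->holdcnt == 0);
	sim_vi_unlock(vp);
	if (last)
		free(vp);
	return last;
}

/* Same decision logic as filt_vfsvnode(). */
int sim_filt_vnode(struct sim_knote *kn, unsigned hint) {
	struct sim_vnode *vp = kn->hook;
	int res;

	sim_vi_lock(vp);	/* a stale vp would be dereferenced here */
	if (kn->sfflags & hint)
		kn->fflags |= hint;
	if (hint == SIM_NOTE_REVOKE) {
		kn->flags |= SIM_EV_EOF;
		sim_vi_unlock(vp);
		return 1;
	}
	res = (kn->fflags != 0);
	sim_vi_unlock(vp);
	return res;
}

/* Attach takes a hold so the vnode outlives every filter call on kn. */
void sim_knote_attach(struct sim_knote *kn, struct sim_vnode *vp,
    unsigned sfflags) {
	kn->hook = vp;
	kn->sfflags = sfflags;
	kn->fflags = 0;
	kn->flags = 0;
	sim_vhold(vp);
}

/* Detach releases the knote's hold; returns 1 if that freed the vnode. */
int sim_knote_detach(struct sim_knote *kn) {
	int freed = sim_vdrop(kn->hook);

	kn->hook = NULL;
	return freed;
}
```

A caller attaches with sim_knote_attach(&kn, vp, SIM_NOTE_WRITE), delivers events through sim_filt_vnode(), and ends with sim_knote_detach(). Tying the vnode's lifetime to the knote, rather than checking for a detached state inside the filter, means the pointer can never be stale while the knote exists.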


Next test with INVARIANTS etc. as soon as the build is done.

--
jimmy


Re: Possible kqueue related issue on STABLE/RC.

2013-09-11 Thread Konstantin Belousov
On Wed, Sep 11, 2013 at 05:07:10PM +0200, Jimmy Olgeni wrote:
> - However, this time I managed to get a minidump from the old -STABLE. I
>saved it here:
> 
>  http://olgeni.olgeni.com/~olgeni/core.txt.0
Could you list the lines around vfs_subr.c:4591 in your kernel?

Also, do you have all the options listed at
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
enabled?

> Unfortunately I'm not able to get a minidump for the latest RC, but at this
> point I suspect that something is going on with glib20 and kqueue on both
> -STABLE and -RC.

Do you run the software watchdog? If not, try it. It might let you get a
dump if the problem is in software.

But if the problem is similar to what you caught in core.0, the watchdog
would not help, whereas building the debugging kernel would.




Re: Possible kqueue related issue on STABLE/RC.

2013-09-11 Thread Volodymyr Kostyrko

11.09.2013 18:07, Jimmy Olgeni wrote:


Perhaps I found something weird while running 9.2-RC3 FreeBSD
9.2-RC3 #0 r255393 (ZFS-only setup).



Unfortunately I'm not able to get a minidump for the latest RC, but at this
point I suspect that something is going on with glib20 and kqueue on both
-STABLE and -RC.


Can you spare some more info on this?

1. What are your /etc/src.conf and /etc/make.conf files?
2. Does your copy of the sources have any third-party patches applied?
3. Does this happen on more than one PC, i.e. are you sure hardware is 
not involved?


Can you try to build world WITH_CLANG_IS_CC? Clang-generated code is 
known to produce an instant coredump in situations where gcc-generated 
code hits a loop or becomes unresponsive.


--
Sphinx of black quartz, judge my vow.