Re: Possible kqueue related issue on STABLE/RC.
Le Wed, 25 Sep 2013 11:06:33 +0300, Konstantin Belousov a écrit : Hello, > > > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote: > > > > I'd like to understand why you think protecting these functions > > > > w/ the _DETACHED check is correct... In kern_event.c, all > > > > calls to f_detach are followed by knote_drop which will ensure > > > > that the knote is removed and free, so no more f_event calls > > > > will be called on that knote.. > > > > > > My current belief is that what happens is a glitch in the > > > kqueue_register(). After a new knote is created and attached, the > > > kq lock is dropped and then f_event() is called. If the vnode is > > > reclaimed or possible freed meantime, f_event() seems to > > > dereference freed memory, since kn_hook points to freed vnode. > > > > > > The issue as I see it is that vnode lifecycle is detached from the > > > knote lifecycle. Might be, only the second patch, which acquires > > > a hold reference on the vnode for each knote, is really needed. > > > But before going into any conclusions, I want to see the testing > > > results. > > > > Testing looks good with your latest patch. I was able to run a > > complete poudriere bulk (870 packages). I'm running another bulk to > > see.. I've made another bulk without problem (with complete patch) > > If you have other patches to test just ask, I have not updated my > > packages because there was a change to make gvfsd to ignore some > > poudriere activity. So I guess it will be harder to see this > > problem. > Could you, please, test with the only patch > http://people.freebsd.org/~kib/misc/vnode_filter.1.patch > applied ? I wonder would it be enough. Looks good with this single patch too, one poudriere bulk is completed and I'm doing another just in case (but I think it would have already paniced, that's quite reproductible). Thanks, regards. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Possible kqueue related issue on STABLE/RC.
Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 22:40 +0300: > On Wed, Sep 25, 2013 at 09:19:54AM -0700, John-Mark Gurney wrote: > > Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 00:21 +0300: > > > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote: > > > > I'd like to understand why you think protecting these functions w/ > > > > the _DETACHED check is correct... In kern_event.c, all calls to > > > > f_detach are followed by knote_drop which will ensure that the knote > > > > is removed and free, so no more f_event calls will be called on that > > > > knote.. > > > > > > My current belief is that what happens is a glitch in the > > > kqueue_register(). After a new knote is created and attached, the kq > > > lock is dropped and then f_event() is called. If the vnode is reclaimed > > > or possible freed meantime, f_event() seems to dereference freed memory, > > > since kn_hook points to freed vnode. > > > > Well, if that happens, then the vnode isn't properly clearing up the > > knote before it gets reclaimed... It is the vnode's responsibility to > > make sure any knotes that are associated w/ it get cleaned up properly.. > See below. > > > > > > The issue as I see it is that vnode lifecycle is detached from the knote > > > lifecycle. Might be, only the second patch, which acquires a hold > > > reference > > > on the vnode for each knote, is really needed. But before going into any > > > conclusions, I want to see the testing results. > > > > The vnode lifecycle can't/shouldn't be detached from the knote lifecycle > > since the knote contains a pointer to the vnode... There is the function > > knlist_clear that can be used to clean up knotes when the object goes > > away.. > This is done from the vdropl() (null hold count) -> destroy_vpollinfo(). > But this is too late, IMO. vdropl() is only executing with the vnode > interlock locked, and knote lock is vnode lock. This way, you might > get far enough into vdropl in other thread, while trying to operate on > a vnode with zero hold count in some kqueue code path. > > We do not drain the vnode lock holders when destroying vnode, because > VFS interface require that anybody locking the vnode own a hold reference > on it. My short patch should fix exactly this issue, hopefully we will see. Which clearly wasn't happening before... With the above, and rereading your patch, I understand how this patch should fix the issue and bring the life cycle of the two back into sync... > > I was looking at the code, is there a good reason why you do > > VI_LOCK/VI_UNLOCK to protect the knote fields instead of getting it in > > the vfs_knllock/vfs_knlunlock functions? Because kq code will modify > > the knote fields w/ only running the vfs_knllock/vfs_knlunlock functions, > > so either the VI_LOCK/VI_UNLOCK are unnecessary, or should be moved to > > vfs_knllock/vfs_knlunlock... > > vfs_knllock() is fine. The problematic case if the > VOP_{PRE,POST}->VFS_KNOTE->VN_KNOTE->KNOTE calls from the VOPs. If you > look at the vfs_knl_assert_locked(), you would note that the function > only asserts that vnode is locked, not that it is locked exclusively. > This is because some filesystems started require from VFS to do e.g. > VOP_WRITE() with the vnode only shared-locked, and KNOTE() is called > with shared-locked vnode lock. > > The vfs_knllock() obtain the exclusive lock on the vnode, so kqueue > callers are fine. Taking vnode interlock inside the filters provides > enough exclusion for the VOP callers. 
Ahh, ok, makes sense now.. Clearly I need to learn more about the VFS/vnode system.. :) Thanks for the explanations...

-- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
Re: Possible kqueue related issue on STABLE/RC.
On Wed, Sep 25, 2013 at 09:19:54AM -0700, John-Mark Gurney wrote: > Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 00:21 +0300: > > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote: > > > I'd like to understand why you think protecting these functions w/ > > > the _DETACHED check is correct... In kern_event.c, all calls to > > > f_detach are followed by knote_drop which will ensure that the knote > > > is removed and free, so no more f_event calls will be called on that > > > knote.. > > > > My current belief is that what happens is a glitch in the > > kqueue_register(). After a new knote is created and attached, the kq > > lock is dropped and then f_event() is called. If the vnode is reclaimed > > or possible freed meantime, f_event() seems to dereference freed memory, > > since kn_hook points to freed vnode. > > Well, if that happens, then the vnode isn't properly clearing up the > knote before it gets reclaimed... It is the vnode's responsibility to > make sure any knotes that are associated w/ it get cleaned up properly.. See below. > > > The issue as I see it is that vnode lifecycle is detached from the knote > > lifecycle. Might be, only the second patch, which acquires a hold reference > > on the vnode for each knote, is really needed. But before going into any > > conclusions, I want to see the testing results. > > The vnode lifecycle can't/shouldn't be detached from the knote lifecycle > since the knote contains a pointer to the vnode... There is the function > knlist_clear that can be used to clean up knotes when the object goes > away.. This is done from the vdropl() (null hold count) -> destroy_vpollinfo(). But this is too late, IMO. vdropl() is only executing with the vnode interlock locked, and knote lock is vnode lock. This way, you might get far enough into vdropl in other thread, while trying to operate on a vnode with zero hold count in some kqueue code path. We do not drain the vnode lock holders when destroying vnode, because VFS interface require that anybody locking the vnode own a hold reference on it. My short patch should fix exactly this issue, hopefully we will see. > > I was looking at the code, is there a good reason why you do > VI_LOCK/VI_UNLOCK to protect the knote fields instead of getting it in > the vfs_knllock/vfs_knlunlock functions? Because kq code will modify > the knote fields w/ only running the vfs_knllock/vfs_knlunlock functions, > so either the VI_LOCK/VI_UNLOCK are unnecessary, or should be moved to > vfs_knllock/vfs_knlunlock... vfs_knllock() is fine. The problematic case if the VOP_{PRE,POST}->VFS_KNOTE->VN_KNOTE->KNOTE calls from the VOPs. If you look at the vfs_knl_assert_locked(), you would note that the function only asserts that vnode is locked, not that it is locked exclusively. This is because some filesystems started require from VFS to do e.g. VOP_WRITE() with the vnode only shared-locked, and KNOTE() is called with shared-locked vnode lock. The vfs_knllock() obtain the exclusive lock on the vnode, so kqueue callers are fine. Taking vnode interlock inside the filters provides enough exclusion for the VOP callers. pgpmOgoOWL9Qn.pgp Description: PGP signature
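For reference, the lock relationship described above (the knote list lock is the vnode lock, with the vnode interlock as the extra exclusion for VOP-side callers) comes from how the vnode's knlist is wired up in vfs_subr.c. The following is a rough sketch reconstructed from memory of the 9.x-era code, not an exact copy, so details may differ:

/*
 * Rough sketch of the existing wiring (reconstructed, may differ in detail
 * from the real vfs_subr.c): the knlist "lock" for a vnode is the vnode
 * lock itself, taken exclusively when kqueue code walks the list, while
 * KNOTE() calls coming from VOPs may hold the vnode lock only shared --
 * hence the extra VI_LOCK() inside the filters.
 */
static void
vfs_knllock(void *arg)
{
        struct vnode *vp = arg;

        vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
}

static void
vfs_knlunlock(void *arg)
{
        struct vnode *vp = arg;

        VOP_UNLOCK(vp, 0);
}

/* ...and in v_addpollinfo(), the vnode's knlist is initialized with these hooks: */
        knlist_init(&vi->vpi_selinfo.si_note, vp, vfs_knllock,
            vfs_knlunlock, vfs_knl_assert_locked, vfs_knl_assert_unlocked);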
Re: Possible kqueue related issue on STABLE/RC.
Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 00:21 +0300: > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote: > > I'd like to understand why you think protecting these functions w/ > > the _DETACHED check is correct... In kern_event.c, all calls to > > f_detach are followed by knote_drop which will ensure that the knote > > is removed and free, so no more f_event calls will be called on that > > knote.. > > My current belief is that what happens is a glitch in the > kqueue_register(). After a new knote is created and attached, the kq > lock is dropped and then f_event() is called. If the vnode is reclaimed > or possible freed meantime, f_event() seems to dereference freed memory, > since kn_hook points to freed vnode. Well, if that happens, then the vnode isn't properly clearing up the knote before it gets reclaimed... It is the vnode's responsibility to make sure any knotes that are associated w/ it get cleaned up properly.. > The issue as I see it is that vnode lifecycle is detached from the knote > lifecycle. Might be, only the second patch, which acquires a hold reference > on the vnode for each knote, is really needed. But before going into any > conclusions, I want to see the testing results. The vnode lifecycle can't/shouldn't be detached from the knote lifecycle since the knote contains a pointer to the vnode... There is the function knlist_clear that can be used to clean up knotes when the object goes away.. I was looking at the code, is there a good reason why you do VI_LOCK/VI_UNLOCK to protect the knote fields instead of getting it in the vfs_knllock/vfs_knlunlock functions? Because kq code will modify the knote fields w/ only running the vfs_knllock/vfs_knlunlock functions, so either the VI_LOCK/VI_UNLOCK are unnecessary, or should be moved to vfs_knllock/vfs_knlunlock... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
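To illustrate the knlist_clear() point: an object that hands out knotes is expected to detach them before it is freed, along the lines of the hypothetical teardown below (struct myobj and its obj_note knlist are invented for the example):

/*
 * Hypothetical teardown illustrating knlist_clear(): before the object the
 * knotes point at is freed, every knote on the list is detached so that no
 * further f_event() call can reach the dead object.
 */
static void
myobj_destroy(struct myobj *obj)
{

        knlist_clear(&obj->obj_note, 0);        /* 0: knlist lock not held */
        knlist_destroy(&obj->obj_note);
        free(obj, M_TEMP);
}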
Re: Possible kqueue related issue on STABLE/RC.
On Wed, Sep 25, 2013 at 09:58:05AM +0200, Patrick Lamaiziere wrote: > Le Wed, 25 Sep 2013 00:21:27 +0300, > Konstantin Belousov a ?crit : > > Hello, > > > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote: > > > I'd like to understand why you think protecting these functions w/ > > > the _DETACHED check is correct... In kern_event.c, all calls to > > > f_detach are followed by knote_drop which will ensure that the knote > > > is removed and free, so no more f_event calls will be called on that > > > knote.. > > > > My current belief is that what happens is a glitch in the > > kqueue_register(). After a new knote is created and attached, the kq > > lock is dropped and then f_event() is called. If the vnode is > > reclaimed or possible freed meantime, f_event() seems to dereference > > freed memory, since kn_hook points to freed vnode. > > > > The issue as I see it is that vnode lifecycle is detached from the > > knote lifecycle. Might be, only the second patch, which acquires a > > hold reference on the vnode for each knote, is really needed. But > > before going into any conclusions, I want to see the testing results. > > Testing looks good with your latest patch. I was able to run a complete > poudriere bulk (870 packages). I'm running another bulk to see. > > If you have other patches to test just ask, I have not updated my > packages because there was a change to make gvfsd to ignore some > poudriere activity. So I guess it will be harder to see this > problem. Very good, thank you. Could you, please, test with the only patch http://people.freebsd.org/~kib/misc/vnode_filter.1.patch applied ? I wonder would it be enough. pgp7_QMOxJKl9.pgp Description: PGP signature
Re: Possible kqueue related issue on STABLE/RC.
Le Wed, 25 Sep 2013 00:21:27 +0300, Konstantin Belousov a écrit : Hello, > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote: > > I'd like to understand why you think protecting these functions w/ > > the _DETACHED check is correct... In kern_event.c, all calls to > > f_detach are followed by knote_drop which will ensure that the knote > > is removed and free, so no more f_event calls will be called on that > > knote.. > > My current belief is that what happens is a glitch in the > kqueue_register(). After a new knote is created and attached, the kq > lock is dropped and then f_event() is called. If the vnode is > reclaimed or possible freed meantime, f_event() seems to dereference > freed memory, since kn_hook points to freed vnode. > > The issue as I see it is that vnode lifecycle is detached from the > knote lifecycle. Might be, only the second patch, which acquires a > hold reference on the vnode for each knote, is really needed. But > before going into any conclusions, I want to see the testing results. Testing looks good with your latest patch. I was able to run a complete poudriere bulk (870 packages). I'm running another bulk to see. If you have other patches to test just ask, I have not updated my packages because there was a change to make gvfsd to ignore some poudriere activity. So I guess it will be harder to see this problem. Many thanks Konstantin, Regards ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Possible kqueue related issue on STABLE/RC.
On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote:
> I'd like to understand why you think protecting these functions w/
> the _DETACHED check is correct... In kern_event.c, all calls to
> f_detach are followed by knote_drop which will ensure that the knote
> is removed and free, so no more f_event calls will be called on that
> knote..

My current belief is that what happens is a glitch in kqueue_register(). After a new knote is created and attached, the kq lock is dropped and then f_event() is called. If the vnode is reclaimed or possibly freed in the meantime, f_event() seems to dereference freed memory, since kn_hook points to the freed vnode.

The issue as I see it is that the vnode lifecycle is detached from the knote lifecycle. It might be that only the second patch, which acquires a hold reference on the vnode for each knote, is really needed. But before drawing any conclusions, I want to see the testing results.
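The "hold reference" idea can be sketched roughly as follows. This is only an illustration of the approach, not the actual vnode_filter.1.patch, and the *_sketch names and call sites are assumptions:

/*
 * Sketch of the per-knote hold-reference approach described above.  Each
 * knote attached to a vnode takes a hold reference with vhold(), so the
 * vnode memory cannot be recycled while a filter may still run on it.
 */
static int
filt_vfsattach_sketch(struct knote *kn, struct vnode *vp)
{

        kn->kn_hook = vp;
        vhold(vp);              /* pin the vnode for the knote's lifetime */
        v_addpollinfo(vp);
        knlist_add(&vp->v_pollinfo->vpi_selinfo.si_note, kn, 0);
        return (0);
}

static void
filt_vfsdetach_sketch(struct knote *kn)
{
        struct vnode *vp = (struct vnode *)kn->kn_hook;

        knlist_remove(&vp->v_pollinfo->vpi_selinfo.si_note, kn, 0);
        vdrop(vp);              /* release the hold taken at attach time */
}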
Re: Possible kqueue related issue on STABLE/RC.
Konstantin Belousov wrote this message on Tue, Sep 24, 2013 at 15:14 +0300: > On Tue, Sep 24, 2013 at 11:47:38AM +0200, Patrick Lamaiziere wrote: > > Le Tue, 24 Sep 2013 11:29:09 +0300, > > Konstantin Belousov a ?crit : > > > > Hello, > > > > ... > > > > > > > > Ok This has been mfced to 9.2-STABLE. But I still see this panic > > > > > > with 9-2/STABLE of today (Revision : 255811). This may be better > > > > > > because before the box paniced within minutes and now within > > > > > > hours (still using poudriere). > > > > > > > > > > > > panic: > > > > > > fault code = supervisor read data, page not present > > > > > > instruction pointer = 0x20:0x808ebfcd > > > > > > stack pointer = 0x28:0xff824c2e0630 > > > > > > frame pointer = 0x28:0xff824c2e06a0 > > > > > > code segment= base 0x0, limit 0xf, type 0x1b > > > > > > = DPL 0, pres 1, long 1, def32 0, gran 1 > > > > > > processor eflags= interrupt enabled, resume, IOPL = 0 > > > > > > current process = 54243 (gvfsd-trash) > > > > > > trap number = 12 > > > > > > panic: page fault > > > > > > cpuid = 2 > > > > > > KDB: stack backtrace: > > > > > > #0 0x80939ad6 at kdb_backtrace+0x66 > > > > > > #1 0x808ffacd at panic+0x1cd > > > > > > #2 0x80cdfbe9 at trap_fatal+0x289 > > > > > > #3 0x80cdff4f at trap_pfault+0x20f > > > > > > #4 0x80ce0504 at trap+0x344 > > > > > > #5 0x80cc9b43 at calltrap+0x8 > > > > > > #6 0x8099d043 at filt_vfsvnode+0xf3 > > > > > > #7 0x808c4793 at kqueue_register+0x3e3 > > > > > > #8 0x808c4de8 at kern_kevent+0x108 > > > > > > #9 0x808c5950 at sys_kevent+0x90 > > > > > > #10 0x80cdf3a8 at amd64_syscall+0x5d8 > > > > > > #11 0x80cc9e27 at Xfast_syscall+0xf7 > > > > > > > > > > > > Full core.txt : > > > > > > http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0 > > > > > > > > > > For start, please load the core into kgdb and for > > > > > frame 8 > > > > > p *kn > > > > > > > > (kgdb) frame 8 > > > > #8 0x8099d043 in filt_vfsvnode (kn=0xfe0147a7f000, > > > > hint=0) at /usr/src/sys/kern/vfs_subr.c:4600 > > > > 4600VI_LOCK(vp); > > > > (kgdb) p *kn > > > > $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, > > > > kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, > > > > kn_kq = 0xfe01079a6200, kn_kevent = {ident = 62, filter = -4, > > > > flags = 32784, fflags = 0, data = 0, udata = 0x0}, kn_status = > > > > 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = {p_fp = > > > > 0xfe016949e190, p_proc = 0xfe016949e190, p_aio = > > > > 0xfe016949e190, p_lio = 0xfe016949e190}, kn_fop = > > > > 0x812f0, kn_hook = 0xfe0119d0b1f8, kn_hookid = 0} > > > From the kgdb, also please do > > > p *(struct vnode *)0xfe0119d0b1f8 > > > > With a kernel with debug info, this panic becomes mtx_lock() of > > destroyed mutex > > panic: mtx_lock() of destroyed mutex > > > > http://user.lamaiziere.net/patrick/public/panic_mtx_lock.txt > > > > @ /usr/src/sys/kern/vfs_subr.c:4600 cpuid = 2 > > KDB: stack backtrace: > > #0 0x80920286 at kdb_backtrace+0x66 > > #1 0x808e738d at panic+0x1cd > > #2 0x808d58d6 at _mtx_lock_flags+0x116 > > #3 0x8098143b at filt_vfsvnode+0x3b > > #4 0x808b213a at kqueue_register+0x4ca > > #5 0x808b2688 at kern_kevent+0x108 > > #6 0x808b3190 at sys_kevent+0x90 > > #7 0x80cbd975 at amd64_syscall+0x2f5 > > #8 0x80ca8557 at Xfast_syscall+0xf7 > > > > (kgdb) frame 5 > > #5 0x808b213a in kqueue_register (kq=0xfe00ddc98900, > > kev=0xff824bb5f880, td=0xfe00b1e7f000, waitok=1) at > > /usr/src/sys/kern/kern_event.c:1136 > > 1136event = kn->kn_fop->f_event(kn, 0); > > > > (kgdb) p *kn > > $1 = {kn_link = 
{sle_next = 0x0}, kn_selnext = {sle_next = > > 0xfe011c232b00}, kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = > > 0x0}, kn_kq = 0xfe00ddc98900, > > kn_kevent = {ident = 62, filter = -4, flags = 32784, fflags = 0, data = > > 0, udata = 0x0}, kn_status = 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = { > > p_fp = 0xfe00ddd4d870, p_proc = 0xfe00ddd4d870, p_aio = > > 0xfe00ddd4d870, p_lio = 0xfe00ddd4d870}, kn_fop = > > 0x812fcca0, > > kn_hook = 0xfe02064a6000, kn_hookid = 0} > > > > (kgdb) p *(struct vnode *)0xfe02064a6000 > > $2 = {v_type = VBAD, v_tag = 0x80d89084 "none", v_op = 0x0, v_data > > = 0x0, v_mount = 0x0, v_nmntvnodes = {tqe_next = 0xfe020d3e6000, > > tqe_prev = 0xfe0086625a68}, v_un = {vu_mount = 0x0, vu_socket = > > 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, > > le_prev = 0xff8000de9698}, v_hash = 238022, v_cache_src = {lh_first > > = 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0
Re: Possible kqueue related issue on STABLE/RC.
On Tue, Sep 24, 2013 at 11:47:38AM +0200, Patrick Lamaiziere wrote: > Le Tue, 24 Sep 2013 11:29:09 +0300, > Konstantin Belousov a ?crit : > > Hello, > > ... > > > > > > Ok This has been mfced to 9.2-STABLE. But I still see this panic > > > > > with 9-2/STABLE of today (Revision : 255811). This may be better > > > > > because before the box paniced within minutes and now within > > > > > hours (still using poudriere). > > > > > > > > > > panic: > > > > > fault code = supervisor read data, page not present > > > > > instruction pointer = 0x20:0x808ebfcd > > > > > stack pointer = 0x28:0xff824c2e0630 > > > > > frame pointer = 0x28:0xff824c2e06a0 > > > > > code segment= base 0x0, limit 0xf, type 0x1b > > > > > = DPL 0, pres 1, long 1, def32 0, gran 1 > > > > > processor eflags= interrupt enabled, resume, IOPL = 0 > > > > > current process = 54243 (gvfsd-trash) > > > > > trap number = 12 > > > > > panic: page fault > > > > > cpuid = 2 > > > > > KDB: stack backtrace: > > > > > #0 0x80939ad6 at kdb_backtrace+0x66 > > > > > #1 0x808ffacd at panic+0x1cd > > > > > #2 0x80cdfbe9 at trap_fatal+0x289 > > > > > #3 0x80cdff4f at trap_pfault+0x20f > > > > > #4 0x80ce0504 at trap+0x344 > > > > > #5 0x80cc9b43 at calltrap+0x8 > > > > > #6 0x8099d043 at filt_vfsvnode+0xf3 > > > > > #7 0x808c4793 at kqueue_register+0x3e3 > > > > > #8 0x808c4de8 at kern_kevent+0x108 > > > > > #9 0x808c5950 at sys_kevent+0x90 > > > > > #10 0x80cdf3a8 at amd64_syscall+0x5d8 > > > > > #11 0x80cc9e27 at Xfast_syscall+0xf7 > > > > > > > > > > Full core.txt : > > > > > http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0 > > > > > > > > For start, please load the core into kgdb and for > > > > frame 8 > > > > p *kn > > > > > > (kgdb) frame 8 > > > #8 0x8099d043 in filt_vfsvnode (kn=0xfe0147a7f000, > > > hint=0) at /usr/src/sys/kern/vfs_subr.c:4600 > > > 4600 VI_LOCK(vp); > > > (kgdb) p *kn > > > $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, > > > kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, > > > kn_kq = 0xfe01079a6200, kn_kevent = {ident = 62, filter = -4, > > > flags = 32784, fflags = 0, data = 0, udata = 0x0}, kn_status = > > > 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = {p_fp = > > > 0xfe016949e190, p_proc = 0xfe016949e190, p_aio = > > > 0xfe016949e190, p_lio = 0xfe016949e190}, kn_fop = > > > 0x812f0, kn_hook = 0xfe0119d0b1f8, kn_hookid = 0} > > From the kgdb, also please do > > p *(struct vnode *)0xfe0119d0b1f8 > > With a kernel with debug info, this panic becomes mtx_lock() of > destroyed mutex > panic: mtx_lock() of destroyed mutex > > http://user.lamaiziere.net/patrick/public/panic_mtx_lock.txt > > @ /usr/src/sys/kern/vfs_subr.c:4600 cpuid = 2 > KDB: stack backtrace: > #0 0x80920286 at kdb_backtrace+0x66 > #1 0x808e738d at panic+0x1cd > #2 0x808d58d6 at _mtx_lock_flags+0x116 > #3 0x8098143b at filt_vfsvnode+0x3b > #4 0x808b213a at kqueue_register+0x4ca > #5 0x808b2688 at kern_kevent+0x108 > #6 0x808b3190 at sys_kevent+0x90 > #7 0x80cbd975 at amd64_syscall+0x2f5 > #8 0x80ca8557 at Xfast_syscall+0xf7 > > (kgdb) frame 5 > #5 0x808b213a in kqueue_register (kq=0xfe00ddc98900, > kev=0xff824bb5f880, td=0xfe00b1e7f000, waitok=1) at > /usr/src/sys/kern/kern_event.c:1136 > 1136 event = kn->kn_fop->f_event(kn, 0); > > (kgdb) p *kn > $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = > 0xfe011c232b00}, kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = > 0x0}, kn_kq = 0xfe00ddc98900, > kn_kevent = {ident = 62, filter = -4, flags = 32784, fflags = 0, data = 0, > udata = 0x0}, 
kn_status = 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = { > p_fp = 0xfe00ddd4d870, p_proc = 0xfe00ddd4d870, p_aio = > 0xfe00ddd4d870, p_lio = 0xfe00ddd4d870}, kn_fop = 0x812fcca0, > kn_hook = 0xfe02064a6000, kn_hookid = 0} > > (kgdb) p *(struct vnode *)0xfe02064a6000 > $2 = {v_type = VBAD, v_tag = 0x80d89084 "none", v_op = 0x0, v_data = > 0x0, v_mount = 0x0, v_nmntvnodes = {tqe_next = 0xfe020d3e6000, > tqe_prev = 0xfe0086625a68}, v_un = {vu_mount = 0x0, vu_socket = 0x0, > vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, > le_prev = 0xff8000de9698}, v_hash = 238022, v_cache_src = {lh_first = > 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfe02064a6060}, > v_cache_dd = 0x0, > v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_lock = {lock_object = > {lo_name = 0x80f56e48 "ufs", lo_flags = 91881472, lo_data = 0, > lo_witness = 0xff80006c3400}, lk_lock = 1, lk_exslpfail = 0, > lk_timo = 51, lk_p
Re: Possible kqueue related issue on STABLE/RC.
Le Tue, 24 Sep 2013 11:29:09 +0300, Konstantin Belousov a écrit : Hello, ... > > > > Ok This has been mfced to 9.2-STABLE. But I still see this panic > > > > with 9-2/STABLE of today (Revision : 255811). This may be better > > > > because before the box paniced within minutes and now within > > > > hours (still using poudriere). > > > > > > > > panic: > > > > fault code = supervisor read data, page not present > > > > instruction pointer = 0x20:0x808ebfcd > > > > stack pointer = 0x28:0xff824c2e0630 > > > > frame pointer = 0x28:0xff824c2e06a0 > > > > code segment= base 0x0, limit 0xf, type 0x1b > > > > = DPL 0, pres 1, long 1, def32 0, gran 1 > > > > processor eflags= interrupt enabled, resume, IOPL = 0 > > > > current process = 54243 (gvfsd-trash) > > > > trap number = 12 > > > > panic: page fault > > > > cpuid = 2 > > > > KDB: stack backtrace: > > > > #0 0x80939ad6 at kdb_backtrace+0x66 > > > > #1 0x808ffacd at panic+0x1cd > > > > #2 0x80cdfbe9 at trap_fatal+0x289 > > > > #3 0x80cdff4f at trap_pfault+0x20f > > > > #4 0x80ce0504 at trap+0x344 > > > > #5 0x80cc9b43 at calltrap+0x8 > > > > #6 0x8099d043 at filt_vfsvnode+0xf3 > > > > #7 0x808c4793 at kqueue_register+0x3e3 > > > > #8 0x808c4de8 at kern_kevent+0x108 > > > > #9 0x808c5950 at sys_kevent+0x90 > > > > #10 0x80cdf3a8 at amd64_syscall+0x5d8 > > > > #11 0x80cc9e27 at Xfast_syscall+0xf7 > > > > > > > > Full core.txt : > > > > http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0 > > > > > > For start, please load the core into kgdb and for > > > frame 8 > > > p *kn > > > > (kgdb) frame 8 > > #8 0x8099d043 in filt_vfsvnode (kn=0xfe0147a7f000, > > hint=0) at /usr/src/sys/kern/vfs_subr.c:4600 > > 4600VI_LOCK(vp); > > (kgdb) p *kn > > $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, > > kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, > > kn_kq = 0xfe01079a6200, kn_kevent = {ident = 62, filter = -4, > > flags = 32784, fflags = 0, data = 0, udata = 0x0}, kn_status = > > 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = {p_fp = > > 0xfe016949e190, p_proc = 0xfe016949e190, p_aio = > > 0xfe016949e190, p_lio = 0xfe016949e190}, kn_fop = > > 0x812f0, kn_hook = 0xfe0119d0b1f8, kn_hookid = 0} > From the kgdb, also please do > p *(struct vnode *)0xfe0119d0b1f8 With a kernel with debug info, this panic becomes mtx_lock() of destroyed mutex panic: mtx_lock() of destroyed mutex http://user.lamaiziere.net/patrick/public/panic_mtx_lock.txt @ /usr/src/sys/kern/vfs_subr.c:4600 cpuid = 2 KDB: stack backtrace: #0 0x80920286 at kdb_backtrace+0x66 #1 0x808e738d at panic+0x1cd #2 0x808d58d6 at _mtx_lock_flags+0x116 #3 0x8098143b at filt_vfsvnode+0x3b #4 0x808b213a at kqueue_register+0x4ca #5 0x808b2688 at kern_kevent+0x108 #6 0x808b3190 at sys_kevent+0x90 #7 0x80cbd975 at amd64_syscall+0x2f5 #8 0x80ca8557 at Xfast_syscall+0xf7 (kgdb) frame 5 #5 0x808b213a in kqueue_register (kq=0xfe00ddc98900, kev=0xff824bb5f880, td=0xfe00b1e7f000, waitok=1) at /usr/src/sys/kern/kern_event.c:1136 1136event = kn->kn_fop->f_event(kn, 0); (kgdb) p *kn $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0xfe011c232b00}, kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, kn_kq = 0xfe00ddc98900, kn_kevent = {ident = 62, filter = -4, flags = 32784, fflags = 0, data = 0, udata = 0x0}, kn_status = 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = { p_fp = 0xfe00ddd4d870, p_proc = 0xfe00ddd4d870, p_aio = 0xfe00ddd4d870, p_lio = 0xfe00ddd4d870}, kn_fop = 0x812fcca0, kn_hook = 0xfe02064a6000, kn_hookid = 0} (kgdb) p *(struct vnode *)0xfe02064a6000 $2 
= {v_type = VBAD, v_tag = 0x80d89084 "none", v_op = 0x0, v_data = 0x0, v_mount = 0x0, v_nmntvnodes = {tqe_next = 0xfe020d3e6000, tqe_prev = 0xfe0086625a68}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0xff8000de9698}, v_hash = 238022, v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfe02064a6060}, v_cache_dd = 0x0, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_lock = {lock_object = {lo_name = 0x80f56e48 "ufs", lo_flags = 91881472, lo_data = 0, lo_witness = 0xff80006c3400}, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96, lk_stack = {depth = 12, pcs = {18446744071571296822, 18446744071573768556, 18446744071576111075, 18446744071606114523, 18446744071576111075, 18446744071572113927, 18446744071572067653, 18446744071606111219, 18446744071572016126, 18446744071572018094
Re: Possible kqueue related issue on STABLE/RC.
On Tue, Sep 24, 2013 at 09:44:27AM +0200, Patrick Lamaiziere wrote: > Le Mon, 23 Sep 2013 23:31:41 +0300, > Konstantin Belousov a ?crit : > > Hello, > > ... > > > > > Ok This has been mfced to 9.2-STABLE. But I still see this panic > > > with 9-2/STABLE of today (Revision : 255811). This may be better > > > because before the box paniced within minutes and now within hours > > > (still using poudriere). > > > > > > panic: > > > fault code = supervisor read data, page not present > > > instruction pointer = 0x20:0x808ebfcd > > > stack pointer = 0x28:0xff824c2e0630 > > > frame pointer = 0x28:0xff824c2e06a0 > > > code segment= base 0x0, limit 0xf, type 0x1b > > > = DPL 0, pres 1, long 1, def32 0, gran 1 > > > processor eflags= interrupt enabled, resume, IOPL = 0 > > > current process = 54243 (gvfsd-trash) > > > trap number = 12 > > > panic: page fault > > > cpuid = 2 > > > KDB: stack backtrace: > > > #0 0x80939ad6 at kdb_backtrace+0x66 > > > #1 0x808ffacd at panic+0x1cd > > > #2 0x80cdfbe9 at trap_fatal+0x289 > > > #3 0x80cdff4f at trap_pfault+0x20f > > > #4 0x80ce0504 at trap+0x344 > > > #5 0x80cc9b43 at calltrap+0x8 > > > #6 0x8099d043 at filt_vfsvnode+0xf3 > > > #7 0x808c4793 at kqueue_register+0x3e3 > > > #8 0x808c4de8 at kern_kevent+0x108 > > > #9 0x808c5950 at sys_kevent+0x90 > > > #10 0x80cdf3a8 at amd64_syscall+0x5d8 > > > #11 0x80cc9e27 at Xfast_syscall+0xf7 > > > > > > Full core.txt : > > > http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0 > > > > For start, please load the core into kgdb and for > > frame 8 > > p *kn > > (kgdb) frame 8 > #8 0x8099d043 in filt_vfsvnode (kn=0xfe0147a7f000, hint=0) > at /usr/src/sys/kern/vfs_subr.c:4600 > 4600 VI_LOCK(vp); > (kgdb) p *kn > $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, > kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, > kn_kq = 0xfe01079a6200, kn_kevent = {ident = 62, filter = -4, > flags = 32784, fflags = 0, data = 0, udata = 0x0}, kn_status = 24, > kn_sfflags = 47, kn_sdata = 0, kn_ptr = {p_fp = 0xfe016949e190, > p_proc = 0xfe016949e190, p_aio = 0xfe016949e190, > p_lio = 0xfe016949e190}, kn_fop = 0x812fd440, > kn_hook = 0xfe0119d0b1f8, kn_hookid = 0} From the kgdb, also please do p *(struct vnode *)0xfe0119d0b1f8 > > > > Also, please follow > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html > > to recompile kernel with the debugging options and try to recreate > > the panic. > > It's building. Please try the following. diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c index aa165a0..5715f35 100644 --- a/sys/kern/vfs_subr.c +++ b/sys/kern/vfs_subr.c @@ -4421,10 +4421,14 @@ filt_vfsdetach(struct knote *kn) static int filt_vfsread(struct knote *kn, long hint) { - struct vnode *vp = (struct vnode *)kn->kn_hook; + struct vnode *vp; struct vattr va; int res; + if ((kn->kn_status & KN_DETACHED) != 0) + return (0); + vp = (struct vnode *)kn->kn_hook; + /* * filesystem is gone, so set the EOF flag and schedule * the knote for deletion. 
@@ -4450,8 +4454,11 @@ filt_vfsread(struct knote *kn, long hint)
 static int
 filt_vfswrite(struct knote *kn, long hint)
 {
-	struct vnode *vp = (struct vnode *)kn->kn_hook;
+	struct vnode *vp;
 
+	if ((kn->kn_status & KN_DETACHED) != 0)
+		return (0);
+	vp = (struct vnode *)kn->kn_hook;
 	VI_LOCK(vp);
 
 	/*
@@ -4469,9 +4476,12 @@ filt_vfswrite(struct knote *kn, long hint)
 static int
 filt_vfsvnode(struct knote *kn, long hint)
 {
-	struct vnode *vp = (struct vnode *)kn->kn_hook;
+	struct vnode *vp;
 	int res;
 
+	if ((kn->kn_status & KN_DETACHED) != 0)
+		return (0);
+	vp = (struct vnode *)kn->kn_hook;
 	VI_LOCK(vp);
 	if (kn->kn_sfflags & hint)
 		kn->kn_fflags |= hint;
Re: Possible kqueue related issue on STABLE/RC.
Le Mon, 23 Sep 2013 23:31:41 +0300, Konstantin Belousov a écrit : Hello, ... > > Ok This has been mfced to 9.2-STABLE. But I still see this panic > > with 9-2/STABLE of today (Revision : 255811). This may be better > > because before the box paniced within minutes and now within hours > > (still using poudriere). > > > > panic: > > fault code = supervisor read data, page not present > > instruction pointer = 0x20:0x808ebfcd > > stack pointer = 0x28:0xff824c2e0630 > > frame pointer = 0x28:0xff824c2e06a0 > > code segment= base 0x0, limit 0xf, type 0x1b > > = DPL 0, pres 1, long 1, def32 0, gran 1 > > processor eflags= interrupt enabled, resume, IOPL = 0 > > current process = 54243 (gvfsd-trash) > > trap number = 12 > > panic: page fault > > cpuid = 2 > > KDB: stack backtrace: > > #0 0x80939ad6 at kdb_backtrace+0x66 > > #1 0x808ffacd at panic+0x1cd > > #2 0x80cdfbe9 at trap_fatal+0x289 > > #3 0x80cdff4f at trap_pfault+0x20f > > #4 0x80ce0504 at trap+0x344 > > #5 0x80cc9b43 at calltrap+0x8 > > #6 0x8099d043 at filt_vfsvnode+0xf3 > > #7 0x808c4793 at kqueue_register+0x3e3 > > #8 0x808c4de8 at kern_kevent+0x108 > > #9 0x808c5950 at sys_kevent+0x90 > > #10 0x80cdf3a8 at amd64_syscall+0x5d8 > > #11 0x80cc9e27 at Xfast_syscall+0xf7 > > > > Full core.txt : > > http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0 > > For start, please load the core into kgdb and for > frame 8 > p *kn (kgdb) frame 8 #8 0x8099d043 in filt_vfsvnode (kn=0xfe0147a7f000, hint=0) at /usr/src/sys/kern/vfs_subr.c:4600 4600VI_LOCK(vp); (kgdb) p *kn $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0x0}, kn_kq = 0xfe01079a6200, kn_kevent = {ident = 62, filter = -4, flags = 32784, fflags = 0, data = 0, udata = 0x0}, kn_status = 24, kn_sfflags = 47, kn_sdata = 0, kn_ptr = {p_fp = 0xfe016949e190, p_proc = 0xfe016949e190, p_aio = 0xfe016949e190, p_lio = 0xfe016949e190}, kn_fop = 0x812fd440, kn_hook = 0xfe0119d0b1f8, kn_hookid = 0} > Also, please follow > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html > to recompile kernel with the debugging options and try to recreate > the panic. It's building. Thanks, regards ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
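For context on what registers these knotes: filter = -4 is EVFILT_VNODE, and kn_sfflags = 47 (0x2f) decodes to NOTE_DELETE|NOTE_WRITE|NOTE_EXTEND|NOTE_ATTRIB|NOTE_RENAME, i.e. an ordinary file monitor of the kind gvfsd sets up. A minimal userland sketch of such a registration (illustrative only, not gvfsd's actual code) looks like this:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
        struct kevent kev, ev;
        int kq, fd, n;

        if (argc < 2)
                errx(1, "usage: %s path", argv[0]);
        if ((kq = kqueue()) == -1)
                err(1, "kqueue");
        /* O_RDONLY is enough; the descriptor only identifies the vnode. */
        if ((fd = open(argv[1], O_RDONLY)) == -1)
                err(1, "open");

        EV_SET(&kev, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR,
            NOTE_DELETE | NOTE_WRITE | NOTE_EXTEND | NOTE_ATTRIB | NOTE_RENAME,
            0, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
                err(1, "kevent register");

        for (;;) {
                /* Each wakeup goes through filt_vfsvnode() in the kernel. */
                n = kevent(kq, NULL, 0, &ev, 1, NULL);
                if (n == -1)
                        err(1, "kevent wait");
                if (n > 0)
                        printf("vnode event, fflags 0x%x\n", (unsigned)ev.fflags);
        }
}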
Re: Possible kqueue related issue on STABLE/RC.
On Mon, Sep 23, 2013 at 03:37:08PM +0200, Patrick Lamaiziere wrote: > Le Fri, 20 Sep 2013 15:17:05 +0200, > Patrick Lamaiziere a ?crit : > > > Le Thu, 12 Sep 2013 10:36:43 +0300, > > Konstantin Belousov a ?crit : > > > > Hello, > > > > > Might be, your issue is that some filesystems do not care about > > > proper locking mode for the fifos. UFS carefully disables shared > > > locking for VFIFO, but it seems ZFS is not. I can propose the > > > following band-aid, which could help you. > > > > > > I have no idea is it the same issue as the kqueue panic. > > > > > > diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c > > > index c53030a..00bd998 100644 > > > --- a/sys/kern/vfs_vnops.c > > > +++ b/sys/kern/vfs_vnops.c > > > @@ -267,6 +267,8 @@ vn_open_vnode(struct vnode *vp, int fmode, > > > struct ucred *cred, return (error); > > > } > > > } > > > + if (vp->v_type == VFIFO && VOP_ISLOCKED(vp) != > > > LK_EXCLUSIVE) > > > + vn_lock(vp, LK_UPGRADE | LK_RETRY); > > > if ((error = VOP_OPEN(vp, fmode, cred, td, fp)) != 0) > > > return (error); > > > > > > @@ -358,7 +360,7 @@ vn_close(vp, flags, file_cred, td) > > > struct mount *mp; > > > int error, lock_flags; > > > > > > - if (!(flags & FWRITE) && vp->v_mount != NULL && > > > + if (vp->v_type != VFIFO && !(flags & FWRITE) && > > > vp->v_mount != NULL && vp->v_mount->mnt_kern_flag & > > > MNTK_EXTENDED_SHARED) lock_flags = LK_SHARED; > > > else > > > > Ok This has been mfced to 9.2-STABLE. But I still see this panic with > 9-2/STABLE of today (Revision : 255811). This may be better because > before the box paniced within minutes and now within hours (still using > poudriere). > > panic: > fault code = supervisor read data, page not present > instruction pointer = 0x20:0x808ebfcd > stack pointer = 0x28:0xff824c2e0630 > frame pointer = 0x28:0xff824c2e06a0 > code segment= base 0x0, limit 0xf, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags= interrupt enabled, resume, IOPL = 0 > current process = 54243 (gvfsd-trash) > trap number = 12 > panic: page fault > cpuid = 2 > KDB: stack backtrace: > #0 0x80939ad6 at kdb_backtrace+0x66 > #1 0x808ffacd at panic+0x1cd > #2 0x80cdfbe9 at trap_fatal+0x289 > #3 0x80cdff4f at trap_pfault+0x20f > #4 0x80ce0504 at trap+0x344 > #5 0x80cc9b43 at calltrap+0x8 > #6 0x8099d043 at filt_vfsvnode+0xf3 > #7 0x808c4793 at kqueue_register+0x3e3 > #8 0x808c4de8 at kern_kevent+0x108 > #9 0x808c5950 at sys_kevent+0x90 > #10 0x80cdf3a8 at amd64_syscall+0x5d8 > #11 0x80cc9e27 at Xfast_syscall+0xf7 > > Full core.txt : > http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0 For start, please load the core into kgdb and for frame 8 p *kn Also, please follow http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html to recompile kernel with the debugging options and try to recreate the panic. pgpNUT12ewyWc.pgp Description: PGP signature
Re: Possible kqueue related issue on STABLE/RC.
Le Fri, 20 Sep 2013 15:17:05 +0200, Patrick Lamaiziere a écrit : > Le Thu, 12 Sep 2013 10:36:43 +0300, > Konstantin Belousov a écrit : > > Hello, > > > Might be, your issue is that some filesystems do not care about > > proper locking mode for the fifos. UFS carefully disables shared > > locking for VFIFO, but it seems ZFS is not. I can propose the > > following band-aid, which could help you. > > > > I have no idea is it the same issue as the kqueue panic. > > > > diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c > > index c53030a..00bd998 100644 > > --- a/sys/kern/vfs_vnops.c > > +++ b/sys/kern/vfs_vnops.c > > @@ -267,6 +267,8 @@ vn_open_vnode(struct vnode *vp, int fmode, > > struct ucred *cred, return (error); > > } > > } > > + if (vp->v_type == VFIFO && VOP_ISLOCKED(vp) != > > LK_EXCLUSIVE) > > + vn_lock(vp, LK_UPGRADE | LK_RETRY); > > if ((error = VOP_OPEN(vp, fmode, cred, td, fp)) != 0) > > return (error); > > > > @@ -358,7 +360,7 @@ vn_close(vp, flags, file_cred, td) > > struct mount *mp; > > int error, lock_flags; > > > > - if (!(flags & FWRITE) && vp->v_mount != NULL && > > + if (vp->v_type != VFIFO && !(flags & FWRITE) && > > vp->v_mount != NULL && vp->v_mount->mnt_kern_flag & > > MNTK_EXTENDED_SHARED) lock_flags = LK_SHARED; > > else > Ok This has been mfced to 9.2-STABLE. But I still see this panic with 9-2/STABLE of today (Revision : 255811). This may be better because before the box paniced within minutes and now within hours (still using poudriere). panic: fault code = supervisor read data, page not present instruction pointer = 0x20:0x808ebfcd stack pointer = 0x28:0xff824c2e0630 frame pointer = 0x28:0xff824c2e06a0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 54243 (gvfsd-trash) trap number = 12 panic: page fault cpuid = 2 KDB: stack backtrace: #0 0x80939ad6 at kdb_backtrace+0x66 #1 0x808ffacd at panic+0x1cd #2 0x80cdfbe9 at trap_fatal+0x289 #3 0x80cdff4f at trap_pfault+0x20f #4 0x80ce0504 at trap+0x344 #5 0x80cc9b43 at calltrap+0x8 #6 0x8099d043 at filt_vfsvnode+0xf3 #7 0x808c4793 at kqueue_register+0x3e3 #8 0x808c4de8 at kern_kevent+0x108 #9 0x808c5950 at sys_kevent+0x90 #10 0x80cdf3a8 at amd64_syscall+0x5d8 #11 0x80cc9e27 at Xfast_syscall+0xf7 Full core.txt : http://user.lamaiziere.net/patrick/public/vfs_vnode-core.txt.0 Thanks, regards. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Possible kqueue related issue on STABLE/RC.
Le Thu, 12 Sep 2013 10:36:43 +0300, Konstantin Belousov a écrit : Hello, > Might be, your issue is that some filesystems do not care about proper > locking mode for the fifos. UFS carefully disables shared locking for > VFIFO, but it seems ZFS is not. I can propose the following band-aid, > which could help you. > > I have no idea is it the same issue as the kqueue panic. > > diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c > index c53030a..00bd998 100644 > --- a/sys/kern/vfs_vnops.c > +++ b/sys/kern/vfs_vnops.c > @@ -267,6 +267,8 @@ vn_open_vnode(struct vnode *vp, int fmode, struct > ucred *cred, return (error); > } > } > + if (vp->v_type == VFIFO && VOP_ISLOCKED(vp) != LK_EXCLUSIVE) > + vn_lock(vp, LK_UPGRADE | LK_RETRY); > if ((error = VOP_OPEN(vp, fmode, cred, td, fp)) != 0) > return (error); > > @@ -358,7 +360,7 @@ vn_close(vp, flags, file_cred, td) > struct mount *mp; > int error, lock_flags; > > - if (!(flags & FWRITE) && vp->v_mount != NULL && > + if (vp->v_type != VFIFO && !(flags & FWRITE) && > vp->v_mount != NULL && vp->v_mount->mnt_kern_flag & > MNTK_EXTENDED_SHARED) lock_flags = LK_SHARED; > else Hmmm, So what is the fix for 9.2-STABLE ? As far I can see there is no function vn_open_vnode() here and I don't see where I should patch. I see this panic too (with STABLE of today), while using poudriere + ZFS like Jimmy. Thanks, regards ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Possible kqueue related issue on STABLE/RC.
On Fri, Sep 13, 2013 at 12:40:28AM +0300, Andriy Gapon wrote:
> on 12/09/2013 21:49 Konstantin Belousov said the following:
> > Ok, so it is ZFS indeed. I think I will commit the band-aid to head
> > shortly.
>
> I am not sure if my message <5231a016.7060...@freebsd.org> was intercepted by
> NSA and didn't reach you... At least I haven't seen any reaction to it.
> So, ZFS does not need this band-aid. If you think that it may be needed for
> other filesystems or is useful in general, then okay.
>
> Just in case, r254694 is not in releng/9.2 and I haven't seen any evidence that
> Jimmy has tested a tree that included that commit.

Good to know, thank you. My change does not conflict with the ZFS fix; moreover, the cost of the change for a well-behaved filesystem is zero, since the upgrade is only initiated when needed. I think both fixes can coexist usefully.
Re: Possible kqueue related issue on STABLE/RC.
on 12/09/2013 21:49 Konstantin Belousov said the following:
> On Thu, Sep 12, 2013 at 08:28:48PM +0200, Jimmy Olgeni wrote:
>>
>> On Thu, 12 Sep 2013, Konstantin Belousov wrote:
>>
>>> Might be, your issue is that some filesystems do not care about proper
>>> locking mode for the fifos. UFS carefully disables shared locking for
>>> VFIFO, but it seems ZFS is not. I can propose the following band-aid,
>>> which could help you.
>>
>> This certainly seems to improve things. I have been running builds
>> for the past couple of hours without any critical problem.
>
> Ok, so it is ZFS indeed. I think I will commit the band-aid to head
> shortly.

I am not sure if my message <5231a016.7060...@freebsd.org> was intercepted by NSA and didn't reach you... At least I haven't seen any reaction to it. So, ZFS does not need this band-aid. If you think that it may be needed for other filesystems or is useful in general, then okay.

Just in case, r254694 is not in releng/9.2 and I haven't seen any evidence that Jimmy has tested a tree that included that commit.

>> I spotted a few LORs but nothing bad happened so far:
>>
>> http://olgeni.olgeni.com/~olgeni/dmesg-2013-09-12
>
> Out of curiousity, please look up the line for vm_object_terminate+0x1d8.
>
>> If it keeps working this way I hope there's still some time to fit it
>> into an -RC.
>
> For 10.0 yes, 9.2 is sealed (hopefully).

-- Andriy Gapon
Re: Possible kqueue related issue on STABLE/RC.
On Thu, 12 Sep 2013, Konstantin Belousov wrote:

This certainly seems to improve things. I have been running builds for the past couple of hours without any critical problem.

Ok, so it is ZFS indeed. I think I will commit the band-aid to head shortly.

Thank you!

I spotted a few LORs but nothing bad happened so far: http://olgeni.olgeni.com/~olgeni/dmesg-2013-09-12

Out of curiousity, please look up the line for vm_object_terminate+0x1d8.

Here it is, VM_OBJECT_UNLOCK:

(kgdb) list *vm_object_terminate+0x1d8
0x80b5cc58 is in vm_object_terminate (/usr/src/sys/vm/vm_object.c:767).
762
763     /*
764      * Let the pager know object is dead.
765      */
766     vm_pager_deallocate(object);
767     VM_OBJECT_UNLOCK(object);
768
769     vm_object_destroy(object);
770 }
771

-- jimmy
Re: Possible kqueue related issue on STABLE/RC.
On Thu, Sep 12, 2013 at 08:28:48PM +0200, Jimmy Olgeni wrote:
> On Thu, 12 Sep 2013, Konstantin Belousov wrote:
>
> > Might be, your issue is that some filesystems do not care about proper
> > locking mode for the fifos. UFS carefully disables shared locking for
> > VFIFO, but it seems ZFS is not. I can propose the following band-aid,
> > which could help you.
>
> This certainly seems to improve things. I have been running builds
> for the past couple of hours without any critical problem.

Ok, so it is ZFS indeed. I think I will commit the band-aid to head shortly.

> I spotted a few LORs but nothing bad happened so far:
>
> http://olgeni.olgeni.com/~olgeni/dmesg-2013-09-12

Out of curiosity, please look up the line for vm_object_terminate+0x1d8.

> If it keeps working this way I hope there's still some time to fit it
> into an -RC.

For 10.0 yes, 9.2 is sealed (hopefully).
Re: Possible kqueue related issue on STABLE/RC.
On Thu, 12 Sep 2013, Konstantin Belousov wrote:

Might be, your issue is that some filesystems do not care about proper locking mode for the fifos. UFS carefully disables shared locking for VFIFO, but it seems ZFS is not. I can propose the following band-aid, which could help you.

This certainly seems to improve things. I have been running builds for the past couple of hours without any critical problem.

I spotted a few LORs but nothing bad happened so far:

http://olgeni.olgeni.com/~olgeni/dmesg-2013-09-12

If it keeps working this way I hope there's still some time to fit it into an -RC.

-- jimmy
Re: Possible kqueue related issue on STABLE/RC.
on 12/09/2013 10:36 Konstantin Belousov said the following:
> UFS carefully disables shared locking for
> VFIFO, but it seems ZFS is not.

In fact, ZFS should do that since r253603 (MFC-ed to stables as r254694 and r254695): http://svnweb.freebsd.org/changeset/base/253603

If it still doesn't then it's a bug.

-- Andriy Gapon
Re: Possible kqueue related issue on STABLE/RC.
On Thu, 12 Sep 2013, Konstantin Belousov wrote:

4591 is the VI_LOCK(vp) in filt_vfsvnode:

static int
filt_vfsvnode(struct knote *kn, long hint)
{
        struct vnode *vp = (struct vnode *)kn->kn_hook;
        int res;

        VI_LOCK(vp);
        ^^^
        if (kn->kn_sfflags & hint)
                kn->kn_fflags |= hint;
        if (hint == NOTE_REVOKE) {
                kn->kn_flags |= EV_EOF;
                VI_UNLOCK(vp);
                return (1);
        }
        res = (kn->kn_fflags != 0);
        VI_UNLOCK(vp);
        return (res);
}

Which line is 4591 ?

"VI_LOCK(vp);" which bumps into the assertion.

-- jimmy
Re: Possible kqueue related issue on STABLE/RC.
On Thu, 12 Sep 2013, Konstantin Belousov wrote:

This time I tried with clang + these options and I got something more interesting. All works fine until the lock violation below:

Clang is, well, not relevant there.

Still, with clang I could get a hard reset rather than a hang. But maybe there are two different issues. I'll run more tests and see if the fifo problem goes away with your patch below.

It seems you edited the kernel output, at least rearranging large blocks of text. I tried to interpret what I see in a useful way.

I got the message buffer from a minidump, here:

http://olgeni.olgeni.com/~olgeni/textdump.tar.1

Might be, your issue is that some filesystems do not care about proper locking mode for the fifos. UFS carefully disables shared locking for VFIFO, but it seems ZFS is not. I can propose the following band-aid, which could help you.

Thanks a lot! I'll give it a run when I get back home.

-- jimmy
Re: Possible kqueue related issue on STABLE/RC.
On Wed, Sep 11, 2013 at 10:32:31PM +0200, Jimmy Olgeni wrote: > > Hi, > > On Wed, 11 Sep 2013, Konstantin Belousov wrote: > > > Could you list the lines around the the vfs_subr.c:4591 in your kernel ? > > 4591 is the VI_LOCK(vp) in filt_vfsvnode: > > static int > filt_vfsvnode(struct knote *kn, long hint) > { > struct vnode *vp = (struct vnode *)kn->kn_hook; > int res; > > VI_LOCK(vp); > if (kn->kn_sfflags & hint) > kn->kn_fflags |= hint; > if (hint == NOTE_REVOKE) { > kn->kn_flags |= EV_EOF; > VI_UNLOCK(vp); > return (1); > } > res = (kn->kn_fflags != 0); > VI_UNLOCK(vp); > return (res); > } Which line is 4591 ? > > > Next test with INVARIANTS & C as soon as the build is done. > > -- > jimmy pgpX1mbTn0uvT.pgp Description: PGP signature
Re: Possible kqueue related issue on STABLE/RC.
On Wed, Sep 11, 2013 at 11:18:34PM +0200, Jimmy Olgeni wrote: > > Hi, > > On Wed, 11 Sep 2013, Konstantin Belousov wrote: > > > Also, do you have all options listed at > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html > > enabled ? > > This time I tried with clang + these options and I got something more > interesting. All works fine until the lock violation below: Clang is, well, not relevant there. > > acquiring duplicate lock of same type: "os.lock_mtx" > 1st os.lock_mtx @ nvidia_os.c:748 > 2nd os.lock_mtx @ nvidia_os.c:748 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8360e1e2f0 > kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8360e1e3a0 > witness_checkorder() at witness_checkorder+0xc0a/frame 0xff8360e1e420 > _mtx_lock_flags() at _mtx_lock_flags+0x74/frame 0xff8360e1e460 > os_acquire_spinlock() at os_acquire_spinlock+0x17/frame 0xff8360e1e470 > _nv012281rm() at _nv012281rm+0x9/frame 0xff800cfadec0 > lock order reversal: > 1st 0xfe003603c098 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1240 > 2nd 0xfe003603b848 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2335 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8360bf3660 > kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8360bf3710 > witness_checkorder() at witness_checkorder+0xc0a/frame 0xff8360bf3790 > __lockmgr_args() at __lockmgr_args+0x744/frame 0xff8360bf38b0 > vop_stdlock() at vop_stdlock+0x3c/frame 0xff8360bf38d0 > VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbe/frame 0xff8360bf3900 > _vn_lock() at _vn_lock+0x63/frame 0xff8360bf3960 > vputx() at vputx+0x34b/frame 0xff8360bf39c0 > dounmount() at dounmount+0x282/frame 0xff8360bf3a30 > sys_unmount() at sys_unmount+0x3a6/frame 0xff8360bf3b20 > amd64_syscall() at amd64_syscall+0x259/frame 0xff8360bf3c30 > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8360bf3c30 > --- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x801918a7c, rsp = > 0x7fffbf18, rbp = 0x802818800 --- > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff83611df6b0 > kdb_backtrace() at kdb_backtrace+0x39/frame 0xff83611df760 > assert_vop_elocked() at assert_vop_elocked+0x6a/frame 0xff83611df790 > fifo_open() at fifo_open+0x38/frame 0xff83611df810 > VOP_OPEN_APV() at VOP_OPEN_APV+0xd1/frame 0xff83611df840 > vn_open_cred() at vn_open_cred+0x532/frame 0xff83611df9b0 > kern_openat() at kern_openat+0x1c1/frame 0xff83611dfb20 > amd64_syscall() at amd64_syscall+0x259/frame 0xff83611dfc30 > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff83611dfc30 > > > Here it goes: > > > --- syscall (5, FreeBSD ELF64, sys_open), rip = 0x800db3d3c, rsp = > 0x7fffc968, rbp = 0 --- > fifo_open: 0xfe0063251cd0 is not exclusive locked but should be > KDB: enter: lock violation > > 0xfe0063251cd0: tag zfs, type VFIFO > usecount 2, writecount 0, refcount 2 mountedhere 0xfe005048cc00 > flags (VI_ACTIVE) > lock type zfs: SHARED (count 1) > #0 0x808b22fa at __lockmgr_args+0xeba > #1 0x8095df4c at vop_stdlock+0x3c > #2 0x80d7f6ae at VOP_LOCK1_APV+0xbe > #3 0x80980153 at _vn_lock+0x63 > #4 0x8096e321 at vget+0xa1 > #5 0x80959921 at cache_lookup_times+0x591 > #6 0x8095aa7d at vfs_cache_lookup+0x9d > #7 0x80d7c881 at VOP_LOOKUP_APV+0xd1 > #8 0x80963aff at lookup+0x6bf > #9 0x80963029 at namei+0x589 > #10 0x8097fa0f at vn_open_cred+0x27f > #11 0x80978031 at kern_openat+0x1c1 > #12 0x80ccce49 at amd64_syscall+0x259 > #13 0x80cb5e9b at Xfast_syscall+0xfb > , fifo with 0 readers and 1 writers It 
seems you edited the kernel output, at least rearranging large blocks of text. I tried to interpret what I see in a useful way.

Might be, your issue is that some filesystems do not care about proper locking mode for the fifos. UFS carefully disables shared locking for VFIFO, but it seems ZFS is not. I can propose the following band-aid, which could help you.

I have no idea whether it is the same issue as the kqueue panic.

diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
index c53030a..00bd998 100644
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -267,6 +267,8 @@ vn_open_vnode(struct vnode *vp, int fmode, struct ucred *cred,
 			return (error);
 		}
 	}
+	if (vp->v_type == VFIFO && VOP_ISLOCKED(vp) != LK_EXCLUSIVE)
+		vn_lock(vp, LK_UPGRADE | LK_RETRY);
 	if ((error = VOP_OPEN(vp, fmode, cred, td, fp)) != 0)
 		return (error);
 
@@ -358,7 +360,7 @@ vn_close(vp, flags, file_cred, td)
 	struct mount *mp;
 	int error, lock_flags;
 
-	if (!(flags & FWRITE) && vp->v_mount != NULL &&
+	if (vp->v_type != VFIFO && !(flags &
Re: Possible kqueue related issue on STABLE/RC.
Hi,

On Wed, 11 Sep 2013, Konstantin Belousov wrote:
> Also, do you have all options listed at
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> enabled ?

This time I tried with clang + these options and I got something more
interesting. All works fine until the lock violation below:

acquiring duplicate lock of same type: "os.lock_mtx"
 1st os.lock_mtx @ nvidia_os.c:748
 2nd os.lock_mtx @ nvidia_os.c:748
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8360e1e2f0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8360e1e3a0
witness_checkorder() at witness_checkorder+0xc0a/frame 0xff8360e1e420
_mtx_lock_flags() at _mtx_lock_flags+0x74/frame 0xff8360e1e460
os_acquire_spinlock() at os_acquire_spinlock+0x17/frame 0xff8360e1e470
_nv012281rm() at _nv012281rm+0x9/frame 0xff800cfadec0

lock order reversal:
 1st 0xfe003603c098 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1240
 2nd 0xfe003603b848 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2335
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8360bf3660
kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8360bf3710
witness_checkorder() at witness_checkorder+0xc0a/frame 0xff8360bf3790
__lockmgr_args() at __lockmgr_args+0x744/frame 0xff8360bf38b0
vop_stdlock() at vop_stdlock+0x3c/frame 0xff8360bf38d0
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbe/frame 0xff8360bf3900
_vn_lock() at _vn_lock+0x63/frame 0xff8360bf3960
vputx() at vputx+0x34b/frame 0xff8360bf39c0
dounmount() at dounmount+0x282/frame 0xff8360bf3a30
sys_unmount() at sys_unmount+0x3a6/frame 0xff8360bf3b20
amd64_syscall() at amd64_syscall+0x259/frame 0xff8360bf3c30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8360bf3c30
--- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x801918a7c, rsp = 0x7fffbf18, rbp = 0x802818800 ---

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff83611df6b0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xff83611df760
assert_vop_elocked() at assert_vop_elocked+0x6a/frame 0xff83611df790
fifo_open() at fifo_open+0x38/frame 0xff83611df810
VOP_OPEN_APV() at VOP_OPEN_APV+0xd1/frame 0xff83611df840
vn_open_cred() at vn_open_cred+0x532/frame 0xff83611df9b0
kern_openat() at kern_openat+0x1c1/frame 0xff83611dfb20
amd64_syscall() at amd64_syscall+0x259/frame 0xff83611dfc30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff83611dfc30

Here it goes:

--- syscall (5, FreeBSD ELF64, sys_open), rip = 0x800db3d3c, rsp = 0x7fffc968, rbp = 0 ---
fifo_open: 0xfe0063251cd0 is not exclusive locked but should be
KDB: enter: lock violation
0xfe0063251cd0: tag zfs, type VFIFO
    usecount 2, writecount 0, refcount 2 mountedhere 0xfe005048cc00
    flags (VI_ACTIVE)
    lock type zfs: SHARED (count 1)
#0 0x808b22fa at __lockmgr_args+0xeba
#1 0x8095df4c at vop_stdlock+0x3c
#2 0x80d7f6ae at VOP_LOCK1_APV+0xbe
#3 0x80980153 at _vn_lock+0x63
#4 0x8096e321 at vget+0xa1
#5 0x80959921 at cache_lookup_times+0x591
#6 0x8095aa7d at vfs_cache_lookup+0x9d
#7 0x80d7c881 at VOP_LOOKUP_APV+0xd1
#8 0x80963aff at lookup+0x6bf
#9 0x80963029 at namei+0x589
#10 0x8097fa0f at vn_open_cred+0x27f
#11 0x80978031 at kern_openat+0x1c1
#12 0x80ccce49 at amd64_syscall+0x259
#13 0x80cb5e9b at Xfast_syscall+0xfb
, fifo with 0 readers and 1 writers
0xfe00a21d3000: tag zfs, type VREG
    usecount 4, writecount 0, refcount 5 mountedhere 0
    flags (VI_ACTIVE)
    v_object 0xfe00a0f71bc8 ref 3 pages 150
    lock type zfs: SHARED (count 1)
#0 0x808b22fa at __lockmgr_args+0xeba
#1 0x8095df4c at vop_stdlock+0x3c
#2 0x80d7f6ae at VOP_LOCK1_APV+0xbe
#3 0x80980153 at _vn_lock+0x63
#4 0x8096e321 at vget+0xa1
#5 0x80b4e1de at vm_fault_hold+0x5ee
#6 0x80b4dba7 at vm_fault+0x77
#7 0x80ccc85b at trap_pfault+0x1bb
#8 0x80ccbf92 at trap+0x512
#9 0x80cb5bb2 at calltrap+0x8
0xfe0195389290: tag zfs, type VREG
    usecount 4, writecount 0, refcount 5 mountedhere 0
    flags (VI_ACTIVE)
    v_object 0xfe000f2df740 ref 3 pages 223
    lock type zfs: SHARED (count 1)
#0 0x808b22fa at __lockmgr_args+0xeba
#1 0x8095df4c at vop_stdlock+0x3c
#2 0x80d7f6ae at VOP_LOCK1_APV+0xbe
#3 0x80980153 at _vn_lock+0x63
#4 0x8096e321 at vget+0xa1
#5 0x80b4e1de at vm_fault_hold+0x5ee
#6 0x80b4dba7 at vm_fault+0x77
#7 0x80ccc85b at trap_pfault+0x1bb
#8 0x80ccbf92 at trap+0x512
#9 0x80cb5bb2 at calltrap+0x8
0xfe00a2207000: tag zfs, type VREG
    usecount 4, writecount 0, refcount 5 mountedhere 0
    flags (VI_ACTIVE)
    v_object 0xfe000f0b8d98 ref 3 pages 201
    lock type zfs: SHARED (count 1)
#0 0xf
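For context, the "acquiring duplicate lock of same type" message above comes from WITNESS: it fires when a thread nests two locks that share the same witness name, as os.lock_mtx does twice at nvidia_os.c:748. The sketch below is not the nvidia driver's code (which is closed source), only a minimal illustration of the pattern using the standard mutex(9) API; the lock and function names are made up.

===
/* Hedged sketch: nesting two mutexes that share one witness name. */
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx lock_a, lock_b;

static void
dup_lock_init(void)
{
	/* Same name means the same witness type for both locks. */
	mtx_init(&lock_a, "os.lock_mtx", NULL, MTX_DEF);
	mtx_init(&lock_b, "os.lock_mtx", NULL, MTX_DEF);
	/* Initializing with MTX_DEF | MTX_DUPOK would tell WITNESS the nesting is intentional. */
}

static void
dup_lock_nest(void)
{
	mtx_lock(&lock_a);
	mtx_lock(&lock_b);	/* WITNESS: acquiring duplicate lock of same type */
	mtx_unlock(&lock_b);
	mtx_unlock(&lock_a);
}
===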
Re: Possible kqueue related issue on STABLE/RC.
On Wed, 11 Sep 2013, Volodymyr Kostyrko wrote:
> 11.09.2013 18:07, Jimmy Olgeni wrote:
>> Perhaps I found something weird while running 9.2-RC3 FreeBSD 9.2-RC3 #0
>> r255393 (ZFS-only setup). Unfortunately I'm not able to get a minidump for
>> the latest RC, but at this point I suspect that something is going on with
>> glib20 and kqueue on both -STABLE and -RC.
> Can you spare some more info on this?

Sure, here it goes:

> 1. What are your /etc/src.conf and /etc/make.conf files?

My /etc/src.conf:

===
PORTS_MODULES=emulators/virtualbox-ose-kmod sysutils/fusefs-kmod sysutils/pefs-kmod x11/nvidia-driver
===

My /etc/make.conf:

===
APACHE_PORT=www/apache22
DEFAULT_PGSQL_VER=92
WITH_NEW_XORG=yes
PERL_VERSION=5.14.4
.if (!empty(.CURDIR:M/usr/src*) || !empty(.CURDIR:M/usr/obj*))
.if !defined(NOCCACHE)
CC:= /usr/local/libexec/ccache/world/cc
CXX:= /usr/local/libexec/ccache/world/c++
.endif
.endif
===

> 2. Does your copy of the sources have any third-party patches applied?

No extra patches were applied. For the RC tests I also removed the whole
/usr/src and checked it out from svn from scratch.

Currently I have this kernel config:

===
include GENERIC
ident RELENG_9

device crypto              # core crypto support
device cryptodev           # /dev/crypto for access to h/w
device enc                 # IPsec interface.

options DDB                # Enable the ddb debugger backend.
options IPSEC              # IP security (requires device crypto)
options IPSEC_NAT_T        # NAT-T support, UDP encap of ESP
options IPSEC_FILTERTUNNEL # filter ipsec packets from a tunnel

options SC_DFLT_FONT       # compile font in
makeoptions SC_DFLT_FONT=cp437
options SC_HISTORY_SIZE=512 # number of history buffer lines
options VGA_WIDTH90        # support 90 column modes

options RACCT              # Resource Accounting
options RCTL               # Resource Limits

# altq(9). Enable the base part of the hooks with the ALTQ option.
# Individual disciplines must be built into the base system and can not be
# loaded as modules at this point. In order to build a SMP kernel you must
# also have the ALTQ_NOPCC option.
options ALTQ
options ALTQ_CBQ           # Class Bases Queueing
options ALTQ_RED           # Random Early Detection
options ALTQ_RIO           # RED In/Out
options ALTQ_HFSC          # Hierarchical Packet Scheduler
options ALTQ_CDNR          # Traffic conditioner
options ALTQ_PRIQ          # Priority Queueing
options ALTQ_NOPCC         # Required for SMP build

options TEKEN_UTF8
===

Also, my loader.conf:

===
autoboot_delay="5"
kern.cam.ada.legacy_aliases="0"
kern.cam.scsi_delay="1500"
net.inet.ip.fw.default_to_accept="1"
vm.pmap.pg_ps_enabled="1"
ahci_load="YES"
ipmi_load="YES"
zfs_load="YES"
geom_uzip_load="YES"
hw.memtest.tests="0"
hw.usb.no_pf="1"
vm.kmem_size_max="16G"
vm.kmem_size="12G"
vfs.root.mountfrom="zfs:rpool/zfsroot"
vfs.zfs.write_limit_override="1536M"
vfs.zfs.txg.synctime_ms="750"
vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1"
kern.ipc.semmns="512"
kern.ipc.semmnu="256"
kern.ipc.shmmni="256"
kern.ipc.shmseg="256"
nvidia_load="YES"
vboxdrv_load="YES"
amdtemp_load="YES"
snd_hda_load="YES"
hint.p4tcc.0.disabled="1"
hint.acpi_throttle.0.disabled="1"
===

sysctl.conf:

===
debug.kdb.break_to_debugger=1
hw.snd.default_unit=2
kern.coredump=0
kern.ipc.shm_allow_removed=1
kern.ipc.somaxconn=4096
kern.maxfiles=25000
kern.maxvnodes=25
kern.ps_arg_cache_limit=1
kern.sched.preempt_thresh=224
machdep.kdb_on_nmi=0
machdep.panic_on_nmi=0
net.inet.icmp.log_redirect=0
net.inet6.ip6.v6only=0
net.link.ether.inet.log_arp_movements=0
vfs.hirunningspace=5242880
vfs.read_max=128
vfs.ufs.dirhash_maxmem=33554432
vfs.usermount=1
vfs.zfs.prefetch_disable=1
===

> 3. Does this happen on more than one PC, i.e. are you sure hardware is not
> involved?

First thing I thought of was either memory or the CPU temperature. Right now
I have only one PC available to test it, but:

- Memory looks ok, at least according to Memtest86/Memtest86+ (tested from
  Ultimate Boot CD).
- CPU looks ok, meaning that it can process heavy workloads without a
  problem. I tried with dev.cpu.0.freq=2200 to avoid overheating, and by
  starting 4 different poudriere builds with -J2. I have the CPU temperature
  in the prompt and it hovers around 50C during the builds. Without gvfs it
  works just fine. Running buildworld always seems to work; also running
  sysutils/stress (stress -v -t 5m --cpu 8 --io 4 --vm 2 --vm-bytes 128M
  --hdd 4) did not seem to bother the system.
- ZFS scrub says that it's all OK on the storage side (initially I thought
  about something going wrong with ZFS due to bad tuning).

> Can you try to build world WITH_CLANG_IS_CC? Clang generated code is known
> to produce an instant coredump in situations where gcc generated code hits
> a loop or becomes unresponsive.
Re: Possible kqueue related issue on STABLE/RC.
On Wed, 11 Sep 2013, Volodymyr Kostyrko wrote:
> Can you try to build world WITH_CLANG_IS_CC? Clang generated code is known
> to produce an instant coredump in situations where gcc generated code hits
> a loop or becomes unresponsive.

I removed ccache, rebuilt with WITH_CLANG_IS_CC and it worked for a while,
but then I got a hard reset without even a core dump. I'm rebuilding with
Konstantin's kernel debug options plus software watchdog and trying another
round.

--
jimmy
Re: Possible kqueue related issue on STABLE/RC.
Hi,

On Wed, 11 Sep 2013, Konstantin Belousov wrote:
> Could you list the lines around the vfs_subr.c:4591 in your kernel ?

4591 is the VI_LOCK(vp) in filt_vfsvnode:

static int
filt_vfsvnode(struct knote *kn, long hint)
{
	struct vnode *vp = (struct vnode *)kn->kn_hook;
	int res;

	VI_LOCK(vp);
	if (kn->kn_sfflags & hint)
		kn->kn_fflags |= hint;
	if (hint == NOTE_REVOKE) {
		kn->kn_flags |= EV_EOF;
		VI_UNLOCK(vp);
		return (1);
	}
	res = (kn->kn_fflags != 0);
	VI_UNLOCK(vp);
	return (res);
}

Next test with INVARIANTS & C as soon as the build is done.

--
jimmy
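For reference, filt_vfsvnode() above is the kernel half of EVFILT_VNODE; the userland side that ends up exercising it looks roughly like the sketch below. This is a generic kevent(2) watcher, not glib/gvfs's actual monitor code, and the watched path is made up for illustration.

===
/* Minimal EVFILT_VNODE watcher; each delivered event goes through filt_vfsvnode(). */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct kevent change, ev;
	int kq, fd;

	if ((kq = kqueue()) == -1)
		err(1, "kqueue");
	/* Hypothetical path; gvfs would open the files it monitors. */
	if ((fd = open("/tmp/watched-file", O_RDONLY)) == -1)
		err(1, "open");

	EV_SET(&change, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR,
	    NOTE_WRITE | NOTE_DELETE | NOTE_RENAME | NOTE_REVOKE, 0, NULL);
	if (kevent(kq, &change, 1, NULL, 0, NULL) == -1)
		err(1, "kevent register");

	for (;;) {
		if (kevent(kq, NULL, 0, &ev, 1, NULL) == -1)
			err(1, "kevent wait");
		printf("vnode event, fflags 0x%x\n", (unsigned)ev.fflags);
		if (ev.fflags & (NOTE_DELETE | NOTE_REVOKE))
			break;	/* the filter reports EOF on revoke */
	}
	close(fd);
	close(kq);
	return (0);
}
===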
Re: Possible kqueue related issue on STABLE/RC.
On Wed, Sep 11, 2013 at 05:07:10PM +0200, Jimmy Olgeni wrote:
> - However, this time I managed to get a minidump from the old -STABLE. I
>   saved it here:
>
>   http://olgeni.olgeni.com/~olgeni/core.txt.0

Could you list the lines around the vfs_subr.c:4591 in your kernel?

Also, do you have all options listed at
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
enabled?

> Unfortunately I'm not able to get a minidump for the latest RC, but at this
> point I suspect that something is going on with glib20 and kqueue on both
> -STABLE and -RC.

Do you run the software watchdog? If not, try it. It might make it possible
to get a dump if the problem is in software. But if the problem is similar to
what you caught in core.0, this would not be helpful, while building the
debugging kernel is.
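The handbook page cited above boils down to a handful of kernel config options plus the software watchdog. As a hedged sketch only, the option list below is reproduced from memory in the same config style used earlier in the thread; double-check it against the handbook before building.

===
# Hedged sketch of a deadlock-debugging kernel config (verify against the
# developers-handbook page before use).
include GENERIC
ident   DEBUG

options INVARIANTS          # extra runtime consistency checks
options INVARIANT_SUPPORT   # support code required by INVARIANTS
options WITNESS             # lock order verification
options DEBUG_LOCKS         # record lockmgr lock holders
options DEBUG_VFS_LOCKS     # assert vnode locking (catches cases like fifo_open)
options DIAGNOSTIC          # additional, more expensive checks
options SW_WATCHDOG         # software watchdog, pairs with watchdogd(8)
===

With SW_WATCHDOG in the kernel, setting watchdogd_enable="YES" in rc.conf arms the watchdog so a hang drops into the debugger or panics with a dump instead of requiring a power cycle.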
Re: Possible kqueue related issue on STABLE/RC.
11.09.2013 18:07, Jimmy Olgeni wrote:
> Perhaps I found something weird while running 9.2-RC3 FreeBSD 9.2-RC3 #0
> r255393 (ZFS-only setup). Unfortunately I'm not able to get a minidump for
> the latest RC, but at this point I suspect that something is going on with
> glib20 and kqueue on both -STABLE and -RC.

Can you spare some more info on this?

1. What are your /etc/src.conf and /etc/make.conf files?

2. Does your copy of the sources have any third-party patches applied?

3. Does this happen on more than one PC, i.e. are you sure hardware is not
   involved?

Can you try to build world WITH_CLANG_IS_CC? Clang-generated code is known to
produce an instant coredump in situations where gcc-generated code hits a
loop or becomes unresponsive.

--
Sphinx of black quartz, judge my vow.