I opened PR210641 to track this after I hit it on i386 during the
sys/kqueue/kqueue_test:main ATF test. I hit the panic twice in 9 tries.

-Alan
On Wed, Jun 15, 2016 at 1:34 PM, Matthew Macy <mm...@nextbsd.org> wrote:
> ---- On Wed, 15 Jun 2016 10:45:24 -0700 Konstantin Belousov
> <kostik...@gmail.com> wrote ----
> > On Wed, Jun 15, 2016 at 10:39:42AM -0700, Matthew Macy wrote:
> > > You can use dwarf4 if you use GDB from ports
> > How would it help ?
>
> The following statement to a native speaker would imply that GDB is the
> problem: "There is not much gdb info here; I'll try to rebuild kgdb."
>
> If in fact %rip has been smashed that's a bit like saying "the light
> doesn't show anything on the table, I'll replace the light bulb" - when
> in fact there isn't anything on the table.
>
> > Problem for kgdb is that %rip is zero, due to function pointer being
> > set to NULL in a destroyed knlist. Either version of kgdb would not
> > find neither code nor unwind annotations for zero address.
> >
> > But the issue is understood and
>
> Yes. Since the initial e-mail.
>
> > we are working on the version of fix.
>
> I'm glad you're on it.
>
> -M
>
> > > ---- On Wed, 15 Jun 2016 04:50:00 -0700 Peter Holm <pe...@holm.cc>
> > > wrote ----
> > > On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
> > > > On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
> > > > > I believe they all have more or less the same cause. The crashes
> > > > > occur because we acquire a knlist lock via the KN_LIST_LOCK
> > > > > macro, but when we call KN_LIST_UNLOCK, the knote's knlist
> > > > > reference (kn->kn_knlist) has been cleared by another thread.
> > > > > Thus we are unable to unlock the previously acquired lock and
> > > > > hold it until something causes us to crash (such as the witness
> > > > > code noticing that we're returning to userland with the lock
> > > > > still held).
> > > > ...
> > > > > I believe there's also a small window where the KN_LIST_LOCK
> > > > > macro checks kn->kn_knlist and finds it to be non-NULL, but by
> > > > > the time it actually dereferences it, it has become NULL.
> > > > > This would produce the "page fault while in kernel mode" crash.
> > > > >
> > > > > If someone familiar with this code sees an obvious fix, I'll be
> > > > > happy to test it. Otherwise, I'd appreciate any advice on fixing
> > > > > this. My first thought is that a "struct knote" ought to have
> > > > > its own mutex for controlling access to the flag fields and
> > > > > ideally the "kn_knlist" field. I.e., you would first acquire a
> > > > > knote's lock and then the knlist lock, thus ensuring that no one
> > > > > could clear the kn_knlist variable while you hold the knlist
> > > > > lock. The knlist lock, however, usually comes from whichever
> > > > > event producing entity the knote tracks, so getting lock
> > > > > ordering right between the per-knote mutex and this other lock
> > > > > seems potentially hard. (Sometimes we call into functions in
> > > > > kern_event.c with the knlist lock already held, having been
> > > > > acquired in code outside of kern_event.c. Consider, for example,
> > > > > calling KNOTE_LOCKED from kern_exit.c; the PROC_LOCK macro has
> > > > > already been used to acquire the process lock, also serving
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
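
For readers skimming the thread, the lock/unlock mismatch Eric Badger describes in the quoted message can be illustrated with a small userland sketch. Everything here is an assumption made for illustration: the `toy_*` types and functions are hypothetical stand-ins and are not the kernel's `struct knote`, `struct knlist`, `KN_LIST_LOCK`, or `KN_LIST_UNLOCK`. The "other thread" is simulated by a plain function call so the interleaving is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a knlist: only tracks whether its lock is held. */
struct toy_knlist {
	int lock_depth;			/* 0 = unlocked, 1 = held */
};

/* Hypothetical stand-in for a knote with a clearable list reference. */
struct toy_knote {
	struct toy_knlist *kn_knlist;	/* may be cleared by "another thread" */
};

/* Lock through the knote, checking the reference at lock time. */
static bool
toy_lock_unsafe(struct toy_knote *kn)
{
	if (kn->kn_knlist == NULL)
		return (false);
	kn->kn_knlist->lock_depth++;
	return (true);
}

/* Unlock through the knote, re-reading the reference at unlock time. */
static void
toy_unlock_unsafe(struct toy_knote *kn)
{
	if (kn->kn_knlist != NULL)
		kn->kn_knlist->lock_depth--;
}

/* Simulates the other thread clearing the knote's list reference. */
static void
toy_detach(struct toy_knote *kn)
{
	kn->kn_knlist = NULL;
}

/* Lock, get detached, unlock: returns the lock depth left behind. */
int
toy_leak_scenario(void)
{
	struct toy_knlist list = { 0 };
	struct toy_knote kn = { &list };
	bool locked = toy_lock_unsafe(&kn);

	assert(locked);
	toy_detach(&kn);		/* reference cleared while lock is held */
	toy_unlock_unsafe(&kn);		/* sees NULL, so nothing is unlocked */
	return (list.lock_depth);
}

/* Same interleaving, but unlock uses the pointer captured at lock time. */
int
toy_snapshot_scenario(void)
{
	struct toy_knlist list = { 0 };
	struct toy_knote kn = { &list };
	struct toy_knlist *held = kn.kn_knlist;

	if (held != NULL)
		held->lock_depth++;
	toy_detach(&kn);
	if (held != NULL)
		held->lock_depth--;	/* unlocks the list that was locked */
	return (list.lock_depth);
}
```

In this toy, `toy_leak_scenario()` leaves the lock held because the unlock path can no longer find the list, which mirrors the "returning to userland with the lock still held" symptom. `toy_snapshot_scenario()` only shows that unlocking via the pointer captured at lock time avoids the mismatch in the toy. It says nothing about the lifetime of a destroyed knlist, and it is not the fix being worked on in the thread.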