Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors
2011/8/25 Kostik Belousov:
> On Thu, Aug 25, 2011 at 05:12:09PM -0500, Brandon Gooch wrote:
>> On Thu, Aug 25, 2011 at 4:53 PM, Kostik Belousov wrote:
>> > On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
>> >> We're having a crash in some internal code running on FreeBSD 7.2
>> >> (specifically 7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know
>> >> it's quite a bit behind) in which after 18-30 hours of running load
>> >> tests, the code panics with:
>> >>
>> >> panic: Bad link elm 0xff0044c09600 next->prev != elm
>> >> cpuid = 0
>> >> KDB: stack backtrace:
>> >> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
>> >> panic() at 0x80307c72 = panic+0x182
>> >> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
>> >>
>> >>
>> >> First question: where's the most appropriate place to ask about this
>> >> kind of bug on a back version.
>> > It is fine to ask there.
>> >
>> >>
>> >> Second: does this remind anyone of any bugs? Googling came up with a
>> >> few somewhat similar things but hasn't provided much insight so far.
>> > In 99% of the cases, it means that you forgot to dev_ref() some cdev.
>>
>> So dev_ref increments the reference count for a cdev. Even though the
>> word "loop" seems to indicate that we will iterate over a list of
>> objects (one of which we may be missing a reference to via a missing
>> dev_ref()), I'm not seeing how this can cause a panic from inside
>> devfs_populate_loop().
>>
>> Can you help me understand this?
>>
> Missing dev_ref() means that the memory for the cdev (and cdev_priv) is
> freed prematurely. If this happens before destroy_dev() is called,
> then the list which is iterated over by populate_loop() is corrupted.
>
> See e.g. MAKEDEV_REF flag for make_dev(9) and its use in the (old) clone
> handlers.
>

Ahhh, thanks Kostik. Reading make_dev(9) (and more source code) now...
-Brandon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors
On Thu, Aug 25, 2011 at 4:53 PM, Kostik Belousov wrote:
> On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
>> We're having a crash in some internal code running on FreeBSD 7.2
>> (specifically 7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know
>> it's quite a bit behind) in which after 18-30 hours of running load
>> tests, the code panics with:
>>
>> panic: Bad link elm 0xff0044c09600 next->prev != elm
>> cpuid = 0
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
>> panic() at 0x80307c72 = panic+0x182
>> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
>>
>>
>> First question: where's the most appropriate place to ask about this
>> kind of bug on a back version.
> It is fine to ask there.
>
>>
>> Second: does this remind anyone of any bugs? Googling came up with a
>> few somewhat similar things but hasn't provided much insight so far.
> In 99% of the cases, it means that you forgot to dev_ref() some cdev.

So dev_ref increments the reference count for a cdev. Even though the
word "loop" seems to indicate that we will iterate over a list of
objects (one of which we may be missing a reference to via a missing
dev_ref()), I'm not seeing how this can cause a panic from inside
devfs_populate_loop().

Can you help me understand this?

-Brandon
Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors
On Thu, Aug 25, 2011 at 11:16 PM, Charlie Martin wrote:
> We're having a crash in some internal code running on FreeBSD 7.2
> (specifically 7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know it's
> quite a bit behind) in which after 18-30 hours of running load tests, the
> code panics with:
>
> panic: Bad link elm 0xff0044c09600 next->prev != elm
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
> panic() at 0x80307c72 = panic+0x182
> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
>
>
> First question: where's the most appropriate place to ask about this kind of
> bug on a back version.

Probably -stable. I don't know how many developers are still running 7.
Most are on 8 at this point.

> Second: does this remind anyone of any bugs? Googling came up with a few
> somewhat similar things but hasn't provided much insight so far.

This panic is very common when list updates aren't adequately serialized.

> Third: I tried compiling with the sys/queue.h QUEUE_MACRO_DEBUG defined in
> order to get more useful information from the panic. The kernel build fails
> in pmap.c when this macro is defined, giving an error saying the CTASSERT
> macro is resolving to a negative array size. Is there any particular secret
> to using this macro (like, no one goes there any more?)

This is because you are running amd64 and the pv_entry constants were
defined assuming the default (smaller) list entry structure. I once fixed
this in a local tree, but I think I was so dismayed at the "obviousness"
of the bug I was tracking down that I neglected to commit the pmap update.
It shouldn't be too hard to calculate the correct constants.

Cheers
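
To illustrate the CTASSERT failure described in the reply above, here is
a minimal userland sketch. It is not the pmap.c code: the struct layouts,
CHUNK_BYTES, HEADER_BYTES and every function name are assumptions made up
for the example. It only shows the general mechanism: a per-page entry
count tuned for the default list entry no longer fits in a page once the
list entry grows extra debug fields, and a compile-time assertion written
as a negative array size catches that.

```c
#include <assert.h>
#include <stddef.h>

/* Compile-time assertion in the negative-array-size style. */
#define SKETCH_CTASSERT(cond, name) typedef char name[(cond) ? 1 : -1]

/* Hypothetical list-entry layouts: default, and with debug trace fields. */
struct link_plain {
	void *next;
	void **prev;
};
struct link_debug {
	void *next;
	void **prev;
	const char *lastfile;
	int lastline;
	const char *prevfile;
	int prevline;
};

/* Hypothetical entries embedding each kind of link. */
struct entry_plain { unsigned long va; struct link_plain link; };
struct entry_debug { unsigned long va; struct link_debug link; };

#define CHUNK_BYTES	4096	/* assumed page size */
#define HEADER_BYTES	64	/* assumed per-chunk bookkeeping */

/* How many entries of a given size fit after the header. */
static size_t
entries_per_chunk(size_t entry_size)
{
	return ((CHUNK_BYTES - HEADER_BYTES) / entry_size);
}

/* Total bytes a chunk occupies with n entries of the given size. */
static size_t
chunk_bytes(size_t n, size_t entry_size)
{
	return (HEADER_BYTES + n * entry_size);
}

/* The debug entry is larger, so a count tuned for the plain one cannot be reused. */
SKETCH_CTASSERT(sizeof(struct entry_debug) > sizeof(struct entry_plain),
    debug_entry_is_larger);
```

Deriving the count from sizeof(), as entries_per_chunk() does, is one way
to keep such an assertion true for both layouts; whether that suits the
real pmap code is a separate question.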
Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors
On Thu, Aug 25, 2011 at 05:12:09PM -0500, Brandon Gooch wrote:
> On Thu, Aug 25, 2011 at 4:53 PM, Kostik Belousov wrote:
> > On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
> >> We're having a crash in some internal code running on FreeBSD 7.2
> >> (specifically 7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know
> >> it's quite a bit behind) in which after 18-30 hours of running load
> >> tests, the code panics with:
> >>
> >> panic: Bad link elm 0xff0044c09600 next->prev != elm
> >> cpuid = 0
> >> KDB: stack backtrace:
> >> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
> >> panic() at 0x80307c72 = panic+0x182
> >> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
> >>
> >>
> >> First question: where's the most appropriate place to ask about this
> >> kind of bug on a back version.
> > It is fine to ask there.
> >
> >>
> >> Second: does this remind anyone of any bugs? Googling came up with a
> >> few somewhat similar things but hasn't provided much insight so far.
> > In 99% of the cases, it means that you forgot to dev_ref() some cdev.
>
> So dev_ref increments the reference count for a cdev. Even though the
> word "loop" seems to indicate that we will iterate over a list of
> objects (one of which we may be missing a reference to via a missing
> dev_ref()), I'm not seeing how this can cause a panic from inside
> devfs_populate_loop().
>
> Can you help me understand this?
>
Missing dev_ref() means that the memory for the cdev (and cdev_priv) is
freed prematurely. If this happens before destroy_dev() is called,
then the list which is iterated over by populate_loop() is corrupted.

See e.g. MAKEDEV_REF flag for make_dev(9) and its use in the (old) clone
handlers.
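
A minimal userland sketch of the mechanism described above, assuming the
TAILQ macros from <sys/queue.h>. It is not the devfs code: struct node,
next_link_ok(), simulate_premature_reuse() and demo_detects_corruption()
are hypothetical names for illustration. The check compares a
successor's back pointer with the address of the element's own "next"
field, which is the condition the "next->prev != elm" panic message
reports. Overwriting a still-linked element stands in for its memory
being freed and reused too early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a cdev-like object kept on a tail queue. */
struct node {
	int id;
	TAILQ_ENTRY(node) link;
};
TAILQ_HEAD(node_list, node);

/*
 * Consistency test before following a link: the successor's back
 * pointer must point at this element's own "next" field.
 */
static bool
next_link_ok(struct node *elm)
{
	struct node *next = TAILQ_NEXT(elm, link);

	return (next == NULL || next->link.tqe_prev == &elm->link.tqe_next);
}

/*
 * Simulate the memory being reused while the element is still linked:
 * its storage is overwritten, as a fresh allocation might do.
 */
static void
simulate_premature_reuse(struct node *victim)
{
	memset(victim, 0xa5, sizeof(*victim));
}

/* Build a three-element list and report whether the check trips. */
static bool
demo_detects_corruption(void)
{
	struct node_list head = TAILQ_HEAD_INITIALIZER(head);
	struct node a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };

	TAILQ_INSERT_TAIL(&head, &a, link);
	TAILQ_INSERT_TAIL(&head, &b, link);
	TAILQ_INSERT_TAIL(&head, &c, link);
	if (!next_link_ok(&a) || !next_link_ok(&b) || !next_link_ok(&c))
		return (false);		/* list should start out consistent */

	simulate_premature_reuse(&b);
	return (!next_link_ok(&a));	/* a's successor no longer points back */
}
```

The walker in this sketch notices the damage only when it reaches the
neighbour of the overwritten element, which is consistent with a panic
showing up in the code that iterates the list rather than in the code
that dropped the object.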
Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors
On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
> We're having a crash in some internal code running on FreeBSD 7.2
> (specifically 7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know
> it's quite a bit behind) in which after 18-30 hours of running load
> tests, the code panics with:
>
> panic: Bad link elm 0xff0044c09600 next->prev != elm
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
> panic() at 0x80307c72 = panic+0x182
> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
>
>
> First question: where's the most appropriate place to ask about this
> kind of bug on a back version.
It is fine to ask there.

>
> Second: does this remind anyone of any bugs? Googling came up with a
> few somewhat similar things but hasn't provided much insight so far.
In 99% of the cases, it means that you forgot to dev_ref() some cdev.