Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors

2011-08-25 Thread Brandon Gooch
2011/8/25 Kostik Belousov :
> On Thu, Aug 25, 2011 at 05:12:09PM -0500, Brandon Gooch wrote:
>> On Thu, Aug 25, 2011 at 4:53 PM, Kostik Belousov  wrote:
>> > On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
>> >> We're having a crash in some internal code running on FreeBSD 7.2
>> >> (specifically  7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know
>> >> it's quite a bit behind) in which after 18-30 hours of running load
>> >> tests, the code panics with:
>> >>
>> >> panic: Bad link elm 0xff0044c09600 next->prev != elm
>> >> cpuid = 0
>> >> KDB: stack backtrace:
>> >> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
>> >> panic() at 0x80307c72 = panic+0x182
>> >> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
>> >>
>> >>
>> >> First question: where's the most appropriate place to ask about this
>> >> kind of bug on a back version.
>> > It is fine to ask there.
>> >
>> >>
>> >> Second: does this remind anyone of any bugs?  Googling came up with a
>> >> few somewhat similar things but hasn't provided much insight so far.
>> > In 99% of the cases, it means that you forgot to dev_ref() some cdev.
>>
>> So dev_ref increments the reference count for a cdev. Even though the
>> word "loop" seems to indicate that we will iterate over a list of
>> objects (one of which we may be missing a reference to via a missing
>> dev_ref()), I'm not seeing how this can cause a panic from inside
>> devfs_populate_loop().
>>
>> Can you help me understand this?
>>
> Missing dev_ref() means that the memory for the cdev (and cdev_priv) is
> freed prematurely. If this happens before destroy_dev() is called,
> then the list which is iterated over by populate_loop(), is corrupted.
>
> See e.g. MAKEDEV_REF flag for make_dev(9) and its use in the (old) clone
> handlers.
>

Ahhh, thanks Kostik. Reading make_dev(9) (and more source code) now...

-Brandon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors

2011-08-25 Thread Brandon Gooch
On Thu, Aug 25, 2011 at 4:53 PM, Kostik Belousov  wrote:
> On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
>> We're having a crash in some internal code running on FreeBSD 7.2
>> (specifically  7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know
>> it's quite a bit behind) in which after 18-30 hours of running load
>> tests, the code panics with:
>>
>> panic: Bad link elm 0xff0044c09600 next->prev != elm
>> cpuid = 0
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
>> panic() at 0x80307c72 = panic+0x182
>> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
>>
>>
>> First question: where's the most appropriate place to ask about this
>> kind of bug on a back version.
> It is fine to ask there.
>
>>
>> Second: does this remind anyone of any bugs?  Googling came up with a
>> few somewhat similar things but hasn't provided much insight so far.
> In 99% of the cases, it means that you forgot to dev_ref() some cdev.

So dev_ref increments the reference count for a cdev. Even though the
word "loop" seems to indicate that we will iterate over a list of
objects (one of which we may be missing a reference to via a missing
dev_ref()), I'm not seeing how this can cause a panic from inside
devfs_populate_loop().

Can you help me understand this?

-Brandon


Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors

2011-08-25 Thread Kip Macy
On Thu, Aug 25, 2011 at 11:16 PM, Charlie Martin  wrote:
> We're having a crash in some internal code running on FreeBSD 7.2
> (specifically  7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know it's
> quite a bit behind) in which after 18-30 hours of running load tests, the
> code panics with:
>
> panic: Bad link elm 0xff0044c09600 next->prev != elm
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
> panic() at 0x80307c72 = panic+0x182
> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
>
>
> First question: where's the most appropriate place to ask about this kind of
> bug on a back version.

Probably -stable. I don't know how many developers are still running
7. Most are on 8 at this point.

> Second: does this remind anyone of any bugs?  Googling came up with a few
> somewhat similar things but hasn't provided much insight so far.

This panic is very common when list updates aren't adequately serialized.
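
As a rough userland illustration of what "adequately serialized" means, the
sketch below wraps every insert, remove, and traversal of a TAILQ in one
mutex. The names (struct item, items_lock, item_add, item_remove, item_count)
are made up for this example and pthreads stand in for kernel mutexes; it is
not code from devfs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; "link" holds the TAILQ pointers. */
struct item {
    int id;
    TAILQ_ENTRY(item) link;
};

TAILQ_HEAD(item_list, item);

/* One list, one lock. Every access to the list takes the lock first. */
static struct item_list items = TAILQ_HEAD_INITIALIZER(items);
static pthread_mutex_t items_lock = PTHREAD_MUTEX_INITIALIZER;

static void item_add(struct item *it)
{
    assert(it != NULL);
    pthread_mutex_lock(&items_lock);
    TAILQ_INSERT_TAIL(&items, it, link);
    pthread_mutex_unlock(&items_lock);
}

static void item_remove(struct item *it)
{
    assert(it != NULL);
    pthread_mutex_lock(&items_lock);
    TAILQ_REMOVE(&items, it, link);
    pthread_mutex_unlock(&items_lock);
}

/* Traversal also holds the lock, so it never sees a half-updated link. */
static size_t item_count(void)
{
    struct item *it;
    size_t n = 0;

    pthread_mutex_lock(&items_lock);
    TAILQ_FOREACH(it, &items, link)
        n++;
    pthread_mutex_unlock(&items_lock);
    return n;
}
```

If any one of these paths skipped the lock, two threads could each rewrite
the same next/prev pair, and a later traversal would find next->prev
pointing somewhere other than the element it came from.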

> Third: I tried compiling with the sys/queue.h QUEUE_MACRO_DEBUG defined in
> order to get more useful information from the panic.  The kernel build fails
> in pmap.c when this macro is defined, giving an error saying the CTASSERT
> macro is resolving to a negative array size.  Is there any particular secret
> to using this macro (like, no one goes there any more?)

This is because you are running amd64 and the pv_entry constants
were defined assuming the default (smaller) list entry structure. I
once fixed this in a local tree, but I think I was so dismayed at the
"obviousness" of the bug I was tracking down that I neglected to
commit the pmap update. It shouldn't be too hard to calculate the
correct constants.
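
A sketch of the arithmetic involved, under stated assumptions: a chunk of
entries must fill exactly one 4096-byte page, and a compile-time assertion in
the negative-array-size style checks that. The helper names and the byte
sizes used in the note after the code are illustrative assumptions for an
LP64 layout, not values read from the 7.2 tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed page size */

/*
 * Compile-time assertion in the classic style: a false condition
 * produces an array type of size -1, which the compiler rejects.
 */
#define MY_CTASSERT(cond, name) typedef char name[(cond) ? 1 : -1]

MY_CTASSERT(sizeof(uint64_t) == 8, uint64_is_eight_bytes);

/* How many fixed-size entries fit in a page after the chunk header? */
static size_t entries_per_page(size_t header_bytes, size_t entry_bytes)
{
    assert(entry_bytes > 0 && header_bytes < PAGE_BYTES);
    return (PAGE_BYTES - header_bytes) / entry_bytes;
}

/* Bytes left over that would need padding to reach exactly one page. */
static size_t slack_bytes(size_t header_bytes, size_t entry_bytes)
{
    size_t n = entries_per_page(header_bytes, entry_bytes);
    return (PAGE_BYTES - header_bytes) - n * entry_bytes;
}
```

With an assumed 64-byte header and 24-byte entries, entries_per_page()
gives 168 with no slack. If debug fields grew the header to 96 bytes and
each entry to 56 bytes (again assumed sizes), only 71 entries would fit and
24 bytes of padding would be needed, so an assertion written around the old
count would no longer hold. The size of any per-entry bitmap in the header
changes with the entry count too, which this sketch ignores.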

Cheers


Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors

2011-08-25 Thread Kostik Belousov
On Thu, Aug 25, 2011 at 05:12:09PM -0500, Brandon Gooch wrote:
> On Thu, Aug 25, 2011 at 4:53 PM, Kostik Belousov  wrote:
> > On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
> >> We're having a crash in some internal code running on FreeBSD 7.2
> >> (specifically  7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know
> >> it's quite a bit behind) in which after 18-30 hours of running load
> >> tests, the code panics with:
> >>
> >> panic: Bad link elm 0xff0044c09600 next->prev != elm
> >> cpuid = 0
> >> KDB: stack backtrace:
> >> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
> >> panic() at 0x80307c72 = panic+0x182
> >> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
> >>
> >>
> >> First question: where's the most appropriate place to ask about this
> >> kind of bug on a back version.
> > It is fine to ask there.
> >
> >>
> >> Second: does this remind anyone of any bugs?  Googling came up with a
> >> few somewhat similar things but hasn't provided much insight so far.
> > In 99% of the cases, it means that you forgot to dev_ref() some cdev.
> 
> So dev_ref increments the reference count for a cdev. Even though the
> word "loop" seems to indicate that we will iterate over a list of
> objects (one of which we may be missing a reference to via a missing
> dev_ref()), I'm not seeing how this can cause a panic from inside
> devfs_populate_loop().
> 
> Can you help me understand this?
> 
Missing dev_ref() means that the memory for the cdev (and cdev_priv) is
freed prematurely. If this happens before destroy_dev() is called,
then the list which is iterated over by populate_loop(), is corrupted.

See e.g. MAKEDEV_REF flag for make_dev(9) and its use in the (old) clone
handlers.
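
To make the mechanism concrete, here is a minimal userland sketch, not
kernel code. It builds a small doubly linked list in the sys/queue.h style
(prev holds the address of the previous element's next pointer), then
overwrites one element's links while it is still on the list, as if its
memory had been freed and reused. The names (struct node, next_link_ok,
simulate_stale_reuse, count_bad_links) are hypothetical, and the overwrite
only stands in for a real premature free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
    struct node  *next;   /* next element, or NULL at the tail */
    struct node **prev;   /* address of the previous element's next pointer */
    int           id;
};

struct list {
    struct node *first;
};

static void list_insert_head(struct list *l, struct node *n)
{
    assert(l != NULL && n != NULL);
    n->next = l->first;
    if (n->next != NULL)
        n->next->prev = &n->next;
    l->first = n;
    n->prev = &l->first;
}

/* The invariant behind "next->prev != elm": my successor points back at me. */
static bool next_link_ok(struct node *n)
{
    return n->next == NULL || n->next->prev == &n->next;
}

/* Stand-in for "memory was freed and reused while still on the list". */
static void simulate_stale_reuse(struct node *n)
{
    n->next = NULL;
    n->prev = NULL;
}

/* Walk the list, checking each link before following it. */
static int count_bad_links(struct list *l)
{
    int bad = 0;

    for (struct node *n = l->first; n != NULL; n = n->next) {
        if (!next_link_ok(n)) {
            bad++;
            break;   /* do not follow a link that failed the check */
        }
    }
    return bad;
}
```

On an intact list count_bad_links() returns 0. After
simulate_stale_reuse() on a middle element, the walk stops at its
predecessor, which is the position an iterating function would be in when
the check fires.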




Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors

2011-08-25 Thread Kostik Belousov
On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
> We're having a crash in some internal code running on FreeBSD 7.2 
> (specifically  7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know 
> it's quite a bit behind) in which after 18-30 hours of running load 
> tests, the code panics with:
> 
> panic: Bad link elm 0xff0044c09600 next->prev != elm
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
> panic() at 0x80307c72 = panic+0x182
> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
> 
> 
> First question: where's the most appropriate place to ask about this 
> kind of bug on a back version.
It is fine to ask there.

> 
> Second: does this remind anyone of any bugs?  Googling came up with a 
> few somewhat similar things but hasn't provided much insight so far.
In 99% of the cases, it means that you forgot to dev_ref() some cdev.

