Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors

2011-08-25 Thread Brandon Gooch
2011/8/25 Kostik Belousov:
> On Thu, Aug 25, 2011 at 05:12:09PM -0500, Brandon Gooch wrote:
>> On Thu, Aug 25, 2011 at 4:53 PM, Kostik Belousov  wrote:
>> > On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
>> >> We're having a crash in some internal code running on FreeBSD 7.2
>> >> (specifically  7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know
>> >> it's quite a bit behind) in which after 18-30 hours of running load
>> >> tests, the code panics with:
>> >>
>> >> panic: Bad link elm 0xff0044c09600 next->prev != elm
>> >> cpuid = 0
>> >> KDB: stack backtrace:
>> >> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
>> >> panic() at 0x80307c72 = panic+0x182
>> >> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
>> >>
>> >>
>> >> First question: where's the most appropriate place to ask about this
>> >> kind of bug on a back version.
>> > It is fine to ask there.
>> >
>> >>
>> >> Second: does this remind anyone of any bugs?  Googling came up with a
>> >> few somewhat similar things but hasn't provided much insight so far.
>> > In 99% of the cases, it means that you forgot to dev_ref() some cdev.
>>
>> So dev_ref() increments the reference count for a cdev. Even though the
>> word "loop" seems to indicate that we will iterate over a list of
>> objects (one of which we may be missing a reference to via a missing
>> dev_ref()), I'm not seeing how this can cause a panic from inside
>> devfs_populate_loop().
>>
>> Can you help me understand this?
>>
> Missing dev_ref() means that the memory for the cdev (and cdev_priv) is
> freed prematurely. If this happens before destroy_dev() is called,
> then the list which is iterated over by populate_loop(), is corrupted.
>
> See e.g. MAKEDEV_REF flag for make_dev(9) and its use in the (old) clone
> handlers.
>

Ahhh, thanks Kostik. Reading make_dev(9) (and more source code) now...

-Brandon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors

2011-08-25 Thread Brandon Gooch
On Thu, Aug 25, 2011 at 4:53 PM, Kostik Belousov wrote:
> On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
>> We're having a crash in some internal code running on FreeBSD 7.2
>> (specifically  7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know
>> it's quite a bit behind) in which after 18-30 hours of running load
>> tests, the code panics with:
>>
>> panic: Bad link elm 0xff0044c09600 next->prev != elm
>> cpuid = 0
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
>> panic() at 0x80307c72 = panic+0x182
>> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
>>
>>
>> First question: where's the most appropriate place to ask about this
>> kind of bug on a back version.
> It is fine to ask there.
>
>>
>> Second: does this remind anyone of any bugs?  Googling came up with a
>> few somewhat similar things but hasn't provided much insight so far.
> In 99% of the cases, it means that you forgot to dev_ref() some cdev.

So dev_ref() increments the reference count for a cdev. Even though the
word "loop" seems to indicate that we will iterate over a list of
objects (one of which we may be missing a reference to via a missing
dev_ref()), I'm not seeing how this can cause a panic from inside
devfs_populate_loop().

Can you help me understand this?

-Brandon


Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors

2011-08-25 Thread Kip Macy
On Thu, Aug 25, 2011 at 11:16 PM, Charlie Martin wrote:
> We're having a crash in some internal code running on FreeBSD 7.2
> (specifically  7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know it's
> quite a bit behind) in which after 18-30 hours of running load tests, the
> code panics with:
>
> panic: Bad link elm 0xff0044c09600 next->prev != elm
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
> panic() at 0x80307c72 = panic+0x182
> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
>
>
> First question: where's the most appropriate place to ask about this kind of
> bug on a back version.

Probably -stable. I don't know how many developers are still running
7. Most are on 8 at this point.

> Second: does this remind anyone of any bugs?  Googling came up with a few
> somewhat similar things but hasn't provided much insight so far.

This panic is very common when list updates aren't adequately serialized.
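
As an illustration of what "adequately serialized" means here, the following is a minimal userland sketch, not kernel code. It assumes a pthread mutex where kernel code would use a kernel lock such as mtx(9), and the type and function names (item, locked_list, locked_list_insert and so on) are hypothetical. The point is only that every insert, remove, and traversal takes the same lock, so no two threads ever update the link pointers at the same time.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical element type; the names are for illustration only. */
struct item {
	int value;
	TAILQ_ENTRY(item) link;
};

TAILQ_HEAD(item_list, item);

/* The list and the lock that protects it are kept together. */
struct locked_list {
	pthread_mutex_t lock;
	struct item_list head;
};

void
locked_list_init(struct locked_list *ll)
{
	pthread_mutex_init(&ll->lock, NULL);
	TAILQ_INIT(&ll->head);
}

/* Inserts hold the lock, so two writers never interleave pointer updates. */
void
locked_list_insert(struct locked_list *ll, struct item *it)
{
	pthread_mutex_lock(&ll->lock);
	TAILQ_INSERT_TAIL(&ll->head, it, link);
	pthread_mutex_unlock(&ll->lock);
}

/* Removals take the same lock as inserts. */
void
locked_list_remove(struct locked_list *ll, struct item *it)
{
	pthread_mutex_lock(&ll->lock);
	TAILQ_REMOVE(&ll->head, it, link);
	pthread_mutex_unlock(&ll->lock);
}

/* Readers also take the lock, so they never walk a half-updated list. */
size_t
locked_list_count(struct locked_list *ll)
{
	struct item *it;
	size_t n = 0;

	pthread_mutex_lock(&ll->lock);
	TAILQ_FOREACH(it, &ll->head, link)
		n++;
	pthread_mutex_unlock(&ll->lock);
	return (n);
}
```

If any one code path touches the list without taking the lock, the others can observe or create a half-linked element, which is the kind of damage the "next->prev != elm" check reports.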

> Third: I tried compiling with the sys/queue.h QUEUE_MACRO_DEBUG defined in
> order to get more useful information from the panic.  The kernel build fails
> in pmap.c when this macro is defined, giving an error saying the CTASSERT
> macro is resolving to a negative array size.  Is there any particular secret
> to using this macro (like, no one goes there any more?)

This is because you are running amd64 and the pv_entry constants
were defined assuming the default (smaller) list entry structure. I
once fixed this in a local tree, but I think I was so dismayed at the
"obviousness" of the bug I was tracking down that I neglected to
commit the pmap update. It shouldn't be too hard to calculate the
correct constants.
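
The arithmetic behind that failure can be sketched in a few lines. This is an illustration under stated assumptions, not code from pmap.c: the struct names, the 4096-byte page, and the 64-byte header used in the usage note are all hypothetical. It shows how extra debug fields in each list entry enlarge a record, so that a per-page record count computed for the smaller layout no longer fits, and how a CTASSERT-style macro turns a false condition into a negative array size at compile time.

```c
#include <stddef.h>

#define PAGE_BYTES 4096	/* assumed page size */

/* Link fields in the default layout: two pointers. */
struct plain_entry {
	void *next;
	void **prev;
};

/* Hypothetical stand-in for the trace fields a debug build adds. */
struct trace_info {
	const char *lastfile;
	int lastline;
	const char *prevfile;
	int prevline;
};

/* Link fields in the debug layout: the same pointers plus trace data. */
struct debug_entry {
	void *next;
	void **prev;
	struct trace_info trace;
};

/* A pv_entry-like record: one address plus the list linkage. */
struct rec_plain {
	unsigned long va;
	struct plain_entry link;
};

struct rec_debug {
	unsigned long va;
	struct debug_entry link;
};

/* How many records of a given size fit in a page after a fixed header. */
size_t
records_per_page(size_t header_bytes, size_t record_bytes)
{
	return ((PAGE_BYTES - header_bytes) / record_bytes);
}

/* CTASSERT-style check: a false condition yields a negative array size. */
#define SKETCH_CTASSERT(name, cond) typedef char name[(cond) ? 1 : -1]

/* Holds on any platform: the debug record is strictly larger. */
SKETCH_CTASSERT(debug_record_is_larger,
    sizeof(struct rec_debug) > sizeof(struct rec_plain));
```

With an assumed 64-byte header, records_per_page(64, 24) is 168 and records_per_page(64, 56) is 72. A constant chosen for the smaller record therefore overflows the page once the record grows, and an assertion that the chunk is exactly one page fails at build time.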

Cheers


Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors

2011-08-25 Thread Kostik Belousov
On Thu, Aug 25, 2011 at 05:12:09PM -0500, Brandon Gooch wrote:
> On Thu, Aug 25, 2011 at 4:53 PM, Kostik Belousov  wrote:
> > On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
> >> We're having a crash in some internal code running on FreeBSD 7.2
> >> (specifically  7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know
> >> it's quite a bit behind) in which after 18-30 hours of running load
> >> tests, the code panics with:
> >>
> >> panic: Bad link elm 0xff0044c09600 next->prev != elm
> >> cpuid = 0
> >> KDB: stack backtrace:
> >> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
> >> panic() at 0x80307c72 = panic+0x182
> >> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
> >>
> >>
> >> First question: where's the most appropriate place to ask about this
> >> kind of bug on a back version.
> > It is fine to ask there.
> >
> >>
> >> Second: does this remind anyone of any bugs?  Googling came up with a
> >> few somewhat similar things but hasn't provided much insight so far.
> > In 99% of the cases, it means that you forgot to dev_ref() some cdev.
> 
> So dev_ref() increments the reference count for a cdev. Even though the
> word "loop" seems to indicate that we will iterate over a list of
> objects (one of which we may be missing a reference to via a missing
> dev_ref()), I'm not seeing how this can cause a panic from inside
> devfs_populate_loop().
> 
> Can you help me understand this?
> 
A missing dev_ref() means that the memory for the cdev (and its cdev_priv)
is freed prematurely. If this happens before destroy_dev() is called,
the list that devfs_populate_loop() iterates over is corrupted.

See e.g. MAKEDEV_REF flag for make_dev(9) and its use in the (old) clone
handlers.
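
To make the mechanism concrete, here is a minimal userland sketch, not devfs code. The node type and every function name are hypothetical. It builds a small list laid out like a sys/queue.h tail queue, then simulates an element whose memory was recycled while it was still linked by overwriting its link fields. The check mirrors the condition in the panic message: an element's successor must point back at it.

```c
#include <stddef.h>

/* Hypothetical node type laid out like a TAILQ-style entry. */
struct node {
	int id;
	struct node *next;	/* next element, or NULL at the tail */
	struct node **prev;	/* address of the previous element's next */
};

struct node_list {
	struct node *first;
	struct node **last;
};

void
node_list_init(struct node_list *l)
{
	l->first = NULL;
	l->last = &l->first;
}

void
node_list_append(struct node_list *l, struct node *n)
{
	n->next = NULL;
	n->prev = l->last;
	*l->last = n;
	l->last = &n->next;
}

/* Returns 1 if n's successor still points back at n, 0 otherwise. */
int
next_link_ok(struct node *n)
{
	return (n->next == NULL || n->next->prev == &n->next);
}

/*
 * Simulates memory being recycled while the element is still on the
 * list: the link fields are overwritten, but nobody unlinked it first.
 */
void
simulate_reuse(struct node *n)
{
	n->id = 0;
	n->next = NULL;
	n->prev = NULL;
}

/* Walks the list and returns the first element whose check fails. */
struct node *
find_bad_link(struct node_list *l)
{
	struct node *n;

	for (n = l->first; n != NULL; n = n->next)
		if (!next_link_ok(n))
			return (n);
	return (NULL);
}
```

After appending three nodes a, b, and c and calling simulate_reuse(&b), find_bad_link() returns &a: the damage is reported against the neighbour of the recycled element, by whichever code next walks the list. That would be consistent with the panic firing inside devfs_populate_loop() rather than in the code that dropped the reference.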




Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors

2011-08-25 Thread Kostik Belousov
On Thu, Aug 25, 2011 at 03:16:09PM -0600, Charlie Martin wrote:
> We're having a crash in some internal code running on FreeBSD 7.2 
> (specifically  7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know 
> it's quite a bit behind) in which after 18-30 hours of running load 
> tests, the code panics with:
> 
> panic: Bad link elm 0xff0044c09600 next->prev != elm
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
> panic() at 0x80307c72 = panic+0x182
> devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548
> 
> 
> First question: where's the most appropriate place to ask about this 
> kind of bug on a back version.
It is fine to ask there.

> 
> Second: does this remind anyone of any bugs?  Googling came up with a 
> few somewhat similar things but hasn't provided much insight so far.
In 99% of the cases, it means that you forgot to dev_ref() some cdev.




Where to ask about a 7.2 bug, and debugging sys/queue.h errors

2011-08-25 Thread Charlie Martin
We're having a crash in some internal code running on FreeBSD 7.2 
(specifically  7.2-PRERELEASE FreeBSD 7.2-PRERELEASE and yeah, I know 
it's quite a bit behind) in which after 18-30 hours of running load 
tests, the code panics with:


panic: Bad link elm 0xff0044c09600 next->prev != elm
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at 0x8019119a = db_trace_self_wrapper+0x2a
panic() at 0x80307c72 = panic+0x182
devfs_populate_loop() at 0x802a43a8 = devfs_populate_loop+0x548


First question: where's the most appropriate place to ask about this
kind of bug on a back version?


Second: does this remind anyone of any bugs?  Googling came up with a 
few somewhat similar things but hasn't provided much insight so far.


Third: I tried compiling with the sys/queue.h QUEUE_MACRO_DEBUG defined 
in order to get more useful information from the panic.  The kernel 
build fails in pmap.c when this macro is defined, giving an error saying 
the CTASSERT macro is resolving to a negative array size.  Is there any 
particular secret to using this macro (like, no one goes there any more?)


Thanks
--

Charles R. (Charlie) Martin
Senior Software Engineer
SGI
1900 Pike Road
Longmont, CO 80501
Phone: 303-532-0209
E-Mail: crmar...@sgi.com 
Website: www.sgi.com 
