Jonathan Adams wrote:
> On Thu, Feb 26, 2009 at 11:42:59AM -0800, Edward Pilatowicz wrote:
>   
>> On Thu, Feb 26, 2009 at 01:42:24PM -0500, James Carlson wrote:
>>     
>>> I just spent a little over a day debugging a stack overflow problem in
>>> mdb itself.  It turned out to be a fairly simple problem -- I'd added
>>> a new dcmd, and one of the functions had a structure on the stack that
>>> turned out to be unexpectedly _huge_ (512K+) -- but the symptoms of
>>> the problem were fairly misleading and unexpected.  I saw panics that
>>> looked like this:
>>>
>>> kmdb ABORT: "../common/umem.c", line 1264: assertion failed: sp->slab_cache == cp
>>> Debugger aborted
>>> Program terminated
>>> {2} ok boot
>>>
>>> It turns out that allocating big things on the stack inside mdb can be
>>> somewhat toxic.
>>>
>>> I fixed my problem by allocating the offending structure with
>>> mdb_alloc, but that begs a question: are there other instances of this
>>> problem hiding in here?  Could this be near the root of weird problems
>>> like CR 6766866?
>>>
>>> It seems to me that the compiler must (obviously) know how much
>>> storage it's reserving for auto variables.  Is there any way to find
>>> this out and enforce a limit?  That wouldn't fix the problem of
>>> nesting too deeply (or just recursing), but it'd at least catch
>>> obvious blunders before they turn into lengthy trials.
>>>
>>>       
>> iirc, at some point, someone had a tool which could look at a kernel
>> panic stack trace and tell you the stack usage of each frame.  but that
>> is only for post mortem analysis.  afaik we don't have any way of
>> getting this information from the compiler, and we don't have any tools
>> that can do assembly analysis to determine this information.
>>     
>
> It's pretty easy to look for subtractions from %rsp in the code:
>
> dis /kernel/kmdb/amd64/genunix | grep 'subq.*rsp' |
>     sed 's/\(.*\):.*subq.*\$\(0x[0-9a-f]*\),%rsp.*$/\1 \2/' |
>     while read func off; do printf "%5d %s\n" "$off" "$func";
>     done | sort -n | egrep -v ' walkers\+| dcmds\+'
>
> gives:
>
> ...
>  4392 threadlist+0x21
>  6592 pfile_callback+0x1d
>  7336 vmem+0x1d
>  8504 calloutid+0x21
>
> So calloutid is the largest stack user.  Something similar will work
> for sparc.  We could put something like this in the kmdb module build,
> and have a "max allowed" value (10k?)
>
> It wouldn't solve heavily recursive stuff, but maybe a guard page or two could
> protect against that.
>   
There is a redzone page at the low end of all(?) kernel stacks.  When it
is touched, the system panics (better than overwriting other memory).
See the code for segkp_fault() in uts/common/vm/seg_kp.c.
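
For what it's worth, the "max allowed" build check suggested in the quoted
text could be sketched roughly as below.  This is only an illustration, not
something that exists in the build today: the check_frames name and the
10240-byte default are made up here, and it assumes x86 dis output with the
same "func+off: ... subq $0xNNN,%rsp" layout that the pipeline above matches.

```shell
# Hypothetical helper (not part of any existing build): reads `dis` output
# on stdin and prints every frame whose prologue subtracts more than LIMIT
# bytes from %rsp.  Returns non-zero if any such frame is found.
check_frames() {
    limit=${1:-10240}    # assumed default, taken from the "10k?" suggestion

    # Keep only "func+off 0xNNN" pairs from the subq-from-%rsp lines.
    sed -n 's/^\(.*\):.*subq.*\$\(0x[0-9a-f]*\),%rsp.*$/\1 \2/p' |
    while read -r func off; do
        # printf converts the hex constant to decimal.
        printf '%d %s\n' "$off" "$func"
    done |
    awk -v limit="$limit" '
        $1 > limit { print; bad = 1 }
        END { exit bad + 0 }
    '
}
```

It would be run as something like
"dis /kernel/kmdb/amd64/genunix | check_frames 10240", failing the build
when the function returns non-zero.  As with the original pipeline, it only
sees fixed-size frames, so it does nothing for deep nesting or recursion.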

max

> Cheers,
> - jonathan
>
>
> _______________________________________________
> mdb-discuss mailing list
> mdb-discuss at opensolaris.org
>
>   

