On Thu, Feb 26, 2009 at 09:07:00PM +0100, max at bruningsystems.com wrote:
> Jonathan Adams wrote:
> >On Thu, Feb 26, 2009 at 11:42:59AM -0800, Edward Pilatowicz wrote:
> >  
> >>On Thu, Feb 26, 2009 at 01:42:24PM -0500, James Carlson wrote:
> >>    
> >>>I just spent a little over a day debugging a stack overflow problem in
> >>>mdb itself.  It turned out to be a fairly simple problem -- I'd added
> >>>a new dcmd, and one of the functions had a structure on the stack that
> >>>turned out to be unexpectedly _huge_ (512K+) -- but the symptoms of
> >>>the problem were fairly misleading and unexpected.  I saw panics that
> >>>looked like this:
> >>>
> >>>kmdb ABORT: "../common/umem.c", line 1264: assertion failed: sp->slab_cache == cp
> >>>Debugger aborted
> >>>Program terminated
> >>>{2} ok boot
> >>>
> >>>It turns out that allocating big things on the stack inside mdb can be
> >>>somewhat toxic.
> >>>
> >>>I fixed my problem by allocating the offending structure with
> >>>mdb_alloc, but that raises a question: are there other instances of this
> >>>problem hiding in here?  Could this be near the root of weird problems
> >>>like CR 6766866?
> >>>
> >>>It seems to me that the compiler must (obviously) know how much
> >>>storage it's reserving for auto variables.  Is there any way to find
> >>>this out and enforce a limit?  That wouldn't fix the problem of
> >>>nesting too deeply (or just recursing), but it'd at least catch
> >>>obvious blunders before they turn into lengthy trials.
> >>>
> >>>      
> >>iirc, at some point, someone had a tool which could look at a kernel
> >>panic stack trace and tell you the stack usage of each frame.  but that
> >>is only for post mortem analysis.  afaik we don't have any way of
> >>getting this information from the compiler, and we don't have any tools
> >>that can do assembly analysis to determine this information.
> >>    
> >
> >It's pretty easy to look for subtractions from %rsp in the code:
> >
> >dis /kernel/kmdb/amd64/genunix | grep 'subq.*rsp' |
> >    sed 's/\(.*\):.*subq.*\$\(0x[0-9a-f]*\),%rsp.*$/\1 \2/' |
> >    while read func off; do printf "%5d %s\n" "$off" "$func";
> >    done | sort -n | egrep -v ' walkers\+| dcmds\+'
> >
> >gives:
> >
> >...
> > 4392 threadlist+0x21
> > 6592 pfile_callback+0x1d
> > 7336 vmem+0x1d
> > 8504 calloutid+0x21
> >
> >So calloutid is the largest stack user.  Something similar will work
> >for sparc.  We could put something like this in the kmdb module build,
> >and have a "max allowed" value (10k?)
> >
> >It wouldn't solve heavily recursive stuff, but maybe a guard page or two
> >could protect against that.
> >  
> There is a redzone page at the low end of all(?) kernel stacks.  When it
> is touched, the system panics (better than overwriting other memory).
> See the code for segkp_fault() in uts/common/vm/seg_kp.c.

But kmdb has its own stacks.  I don't know if they have redzone pages.
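A minimal sketch of the "max allowed" build check suggested earlier in the thread. It assumes the amd64 `dis` output format implied by the pipeline above (a `func+0xNN:` label followed later on the line by `subq $0xNNNN,%rsp`). The `check_frames` name and the limit argument are hypothetical; this is not an existing step in the kmdb module build.

```shell
#!/bin/sh
# Hypothetical build-time check (not an existing kmdb build step).
# Reads amd64 disassembly on stdin, assumed to contain lines such as
#     calloutid+0x21:   48 81 ec 38 21 00 00  subq   $0x2138,%rsp
# and reports every function whose "subq $N,%rsp" exceeds the limit.
# Usage: check_frames LIMIT_IN_BYTES < disassembly
# Exits 1 if any frame is larger than LIMIT_IN_BYTES, 0 otherwise.
check_frames() {
	limit=$1
	# Keep only "label hexsize" pairs from stack-pointer subtractions.
	sed -n 's/^[[:space:]]*\([^:]*\):.*subq[[:space:]]*\$\(0x[0-9a-fA-F][0-9a-fA-F]*\),%rsp.*$/\1 \2/p' |
	(
		# Explicit subshell so "exit" below cannot terminate the caller,
		# even in shells that run the last pipeline stage in-process.
		over=0
		while read -r func off; do
			size=$(($off))	# POSIX arithmetic accepts 0x hex constants
			if [ "$size" -gt "$limit" ]; then
				printf 'frame too large: %5d %s\n' "$size" "$func"
				over=1
			fi
		done
		exit "$over"
	)
}
```

In a build step this might be invoked as `dis /kernel/kmdb/amd64/genunix | check_frames 10240`. Like the original pipeline, it only sees each function's fixed frame reservation, so it does nothing about deep nesting or recursion.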

Cheers,
- jonathan
