Correction: krs_cpustack is only used when entering kmdb; after that it
uses kmdb_main_stack, which is mdb_alloc'ed. You probably overflowed
that stack.
alex.

On 02/26/09 12:22, Alexandre Chartre wrote:
>
> If I remember correctly, kmdb uses krs_cpustack from the kaif_cpusave_t
> structure as a stack (one for each cpu). krs_cpustack is just an array
> at the end of the structure, and all structures (one per cpu) are
> allocated together.
>
> So there's no redzone protection in that case; if the kmdb stack
> overflows, it will actually trash other information in kaif_cpusave_t,
> leading to very bad and random behavior (there's a recovery sequence,
> kmdb_fault, but it can fail if the kaif_cpusave_t structure is
> corrupted).
>
> alex.
>
>
> On 02/26/09 12:08, Jonathan Adams wrote:
>> On Thu, Feb 26, 2009 at 09:07:00PM +0100, max at bruningsystems.com wrote:
>>> Jonathan Adams wrote:
>>>> On Thu, Feb 26, 2009 at 11:42:59AM -0800, Edward Pilatowicz wrote:
>>>>
>>>>> On Thu, Feb 26, 2009 at 01:42:24PM -0500, James Carlson wrote:
>>>>>
>>>>>> I just spent a little over a day debugging a stack overflow
>>>>>> problem in mdb itself. It turned out to be a fairly simple
>>>>>> problem -- I'd added a new dcmd, and one of the functions had a
>>>>>> structure on the stack that turned out to be unexpectedly _huge_
>>>>>> (512K+) -- but the symptoms of the problem were fairly misleading
>>>>>> and unexpected. I saw panics that looked like this:
>>>>>>
>>>>>> kmdb ABORT: "../common/umem.c", line 1264: assertion failed:
>>>>>> sp->slab_cache == cp
>>>>>> Debugger aborted
>>>>>> Program terminated
>>>>>> {2} ok boot
>>>>>>
>>>>>> It turns out that allocating big things on the stack inside mdb
>>>>>> can be somewhat toxic.
>>>>>>
>>>>>> I fixed my problem by allocating the offending structure with
>>>>>> mdb_alloc, but that raises a question: are there other instances
>>>>>> of this problem hiding in here? Could this be near the root of
>>>>>> weird problems like CR 6766866?
>>>>>>
>>>>>> It seems to me that the compiler must (obviously) know how much
>>>>>> storage it's reserving for auto variables. Is there any way to
>>>>>> find this out and enforce a limit? That wouldn't fix the problem
>>>>>> of nesting too deeply (or just recursing), but it'd at least
>>>>>> catch obvious blunders before they turn into lengthy trials.
>>>>>>
>>>>>
>>>>> iirc, at some point, someone had a tool which could look at a
>>>>> kernel panic stack trace and tell you the stack usage of each
>>>>> frame. but that is only for post mortem analysis. afaik we don't
>>>>> have any way of getting this information from the compiler, and we
>>>>> don't have any tools that can do assembly analysis to determine
>>>>> this information.
>>>>>
>>>> It's pretty easy to look for subtractions from %rsp in the code:
>>>>
>>>> dis /kernel/kmdb/amd64/genunix | grep 'subq.*rsp' |
>>>>     sed 's/^\(.*\):.*subq.*\$\(0x[0-9a-f]*\),%rsp.*$/\1 \2/' |
>>>>     while read func off; do printf "%5d %s\n" "$off" "$func"; done |
>>>>     sort -n | egrep -v ' walkers\+| dcmds\+'
>>>>
>>>> gives:
>>>>
>>>> ...
>>>> 4392 threadlist+0x21
>>>> 6592 pfile_callback+0x1d
>>>> 7336 vmem+0x1d
>>>> 8504 calloutid+0x21
>>>>
>>>> So calloutid is the largest stack user. Something similar will work
>>>> for sparc. We could put something like this in the kmdb module
>>>> build, and have a "max allowed" value (10k?)
>>>>
>>>> It wouldn't solve heavily recursive stuff, but maybe a guard page
>>>> or two could protect against that.
>>>>
>>> There is a redzone page at the low end of all(?) kernel stacks. When
>>> it is touched, the system panics (better than overwriting other
>>> memory). See the code for segkp_fault() in uts/common/vm/seg_kp.c.
>>
>> But kmdb has its own stacks. I don't know if they have redzone pages.
>>
>> Cheers,
>> - jonathan
>>
>> _______________________________________________
>> mdb-discuss mailing list
>> mdb-discuss at opensolaris.org
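As a footnote, Jonathan's one-liner can be sketched as a standalone script. This is a sketch only: the here-document below is hypothetical sample data standing in for real `dis /kernel/kmdb/amd64/genunix` output, and the 10240-byte limit is just the arbitrary "max allowed" value (10k) floated in the thread.

```shell
#!/bin/sh
# Sketch of the stack-frame scan discussed above, assuming dis-style
# output lines of the form "func+0x21:  subq  $0x1128,%rsp".
# The here-document is hypothetical sample data; on a live system you
# would pipe `dis /kernel/kmdb/amd64/genunix` in instead.
LIMIT=10240    # arbitrary "max allowed" frame size, per the thread

cat <<'EOF' |
threadlist+0x21:	subq	$0x1128,%rsp
pfile_callback+0x1d:	subq	$0x19c0,%rsp
calloutid+0x21:	subq	$0x2138,%rsp
EOF
grep 'subq.*%rsp' |
sed 's/^\(.*\):.*subq.*\$\(0x[0-9a-f]*\),%rsp.*$/\2 \1/' |
while read off func; do
	dec=$((off))                  # $(( )) accepts C-style 0x constants
	flag=
	[ "$dec" -gt "$LIMIT" ] && flag=" (over limit)"
	printf '%5d %s%s\n' "$dec" "$func" "$flag"
done | sort -n
```

With the sample data this prints the same frame sizes the thread reports (4392, 6592 and 8504 bytes), none of which exceed the 10k limit; a real build check would exit nonzero when any frame is flagged.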