Correction: krs_cpustack is only used when entering kmdb; after that, kmdb
switches to kmdb_main_stack, which is mdb_alloc'ed. You probably overflowed
that stack.

alex.

On 02/26/09 12:22, Alexandre Chartre wrote:
> 
>  If I remember correctly, kmdb uses krs_cpustack from kaif_cpusave_t
> structure as a stack (one for each cpu). krs_cpustack is just an array
> at the end of the structure, and all structures (one per cpu) are
> allocated together.
> 
>  So there's no redzone protection in that case: if the kmdb stack overflows,
> it will actually trash other information in kaif_cpusave_t, leading to very
> bad and random behavior.  (There is a recovery sequence, kmdb_fault, but it
> can fail if the kaif_cpusave_t structure is corrupted.)
> 
> alex.
> 
> 
> On 02/26/09 12:08, Jonathan Adams wrote:
>> On Thu, Feb 26, 2009 at 09:07:00PM +0100, max at bruningsystems.com wrote:
>>> Jonathan Adams wrote:
>>>> On Thu, Feb 26, 2009 at 11:42:59AM -0800, Edward Pilatowicz wrote:
>>>>  
>>>>> On Thu, Feb 26, 2009 at 01:42:24PM -0500, James Carlson wrote:
>>>>>   
>>>>>> I just spent a little over a day debugging a stack overflow problem
>>>>>> in mdb itself.  It turned out to be a fairly simple problem -- I'd
>>>>>> added a new dcmd, and one of the functions had a structure on the
>>>>>> stack that turned out to be unexpectedly _huge_ (512K+) -- but the
>>>>>> symptoms of the problem were fairly misleading and unexpected.  I saw
>>>>>> panics that looked like this:
>>>>>>
>>>>>> kmdb ABORT: "../common/umem.c", line 1264: assertion failed: 
>>>>>> sp->slab_cache == cp
>>>>>> Debugger aborted
>>>>>> Program terminated
>>>>>> {2} ok boot
>>>>>>
>>>>>> It turns out that allocating big things on the stack inside mdb can
>>>>>> be somewhat toxic.
>>>>>>
>>>>>> I fixed my problem by allocating the offending structure with
>>>>>> mdb_alloc, but that raises a question: are there other instances of
>>>>>> this problem hiding in here?  Could this be near the root of weird
>>>>>> problems like CR 6766866?
>>>>>>
>>>>>> It seems to me that the compiler must (obviously) know how much
>>>>>> storage it's reserving for auto variables.  Is there any way to find
>>>>>> this out and enforce a limit?  That wouldn't fix the problem of
>>>>>> nesting too deeply (or just recursing), but it'd at least catch
>>>>>> obvious blunders before they turn into lengthy trials.
>>>>>>
>>>>>>      
>>>>> iirc, at some point, someone had a tool which could look at a kernel
>>>>> panic stack trace and tell you the stack usage of each frame.  but that
>>>>> is only for post mortem analysis.  afaik we don't have any way of
>>>>> getting this information from the compiler, and we don't have any tools
>>>>> that can do assembly analysis to determine this information.
>>>>>    
>>>> It's pretty easy to look for subtractions from %rsp in the code:
>>>>
>>>> dis /kernel/kmdb/amd64/genunix | grep 'subq.*rsp' |
>>>>     sed 's/\(.*\):.*subq.*\$\(0x[0-9a-f]*\),%rsp.*$/\1 \2/' |
>>>>     while read func off; do printf "%5d %s\n" "$off" "$func"; done |
>>>>     sort -n | egrep -v ' walkers\+| dcmds\+'
>>>>
>>>> gives:
>>>>
>>>> ...
>>>> 4392 threadlist+0x21
>>>> 6592 pfile_callback+0x1d
>>>> 7336 vmem+0x1d
>>>> 8504 calloutid+0x21
>>>>
>>>> So calloutid is the largest stack user.  Something similar will work
>>>> for sparc.  We could put something like this in the kmdb module build,
>>>> and have a "max allowed" value (10k?)
>>>>
>>>> It wouldn't solve heavily recursive stuff, but maybe a guard page or 
>>>> two could
>>>> protect against that.
>>>>  
>>> There is a redzone page at the low end of all(?) kernel stacks.  When it
>>> is touched, the system panics (better than overwriting other memory).
>>> See the code for segkp_fault() in uts/common/vm/seg_kp.c.
>>
>> But kmdb has its own stacks.  I don't know if they have redzone pages.
>>
>> Cheers,
>> - jonathan
>>
>> _______________________________________________
>> mdb-discuss mailing list
>> mdb-discuss at opensolaris.org
> 
