Hi Matt,
Maybe this should be moved to mdb-discuss...
My comments are at the end.

Matt Harrison wrote:
> Matt Harrison wrote:
>   
>> m...@bruningsystems.com wrote:
>>     
>>> Hi Matt,
>>>
>>> Matt Harrison wrote:
>>>       
>>>> Matt Harrison wrote:
>>>>  
>>>>         
>>>>> thanks Ian, I'll look into this tomorrow.
>>>>>
>>>>>     
>>>>>           
>>>> Well I'm not sure if it's good news or not. I've got the machine 
>>>> running memtest86+ with the standard tests and so far it's done 2 
>>>> passes (3 hours runtime) without a single error.
>>>>
>>>> I'm going to leave it running overnight but does it seem there could 
>>>> be another problem other than memory?
>>>>   
>>>>         
>>> Have you gotten anywhere yet with this hang?  Have you tried
>>> set snooping=1 in /etc/system?  How about booting with kmdb and
>>> forcing a dump?  I'm not sure why this is necessarily hardware related...
>>>
>>> max
>>>
>>>
>>>       
>> Unfortunately not yet...I was forced to bring the server back up to get 
>> some files from it, and I haven't had a chance to take it down again yet.
>>
>> I still need to make sure it can survive 24h solid of memtest, but I am 
>> happy to try other things.
>>
>> I'm not familiar with the snooping variable, nor with kmdb, although I 
>> have read about it being used here and there.
>>
>> I'll go ahead with the memtest when I can and report back.
>>     
>
> Ok, sorry it's taken a while but I've had the server run memtest for 24 
> hours and it hasn't found any errors whatsoever.
>
> Does anyone have an idea as to what I could try next?
>   
I would try booting with kmdb (or, alternatively, loading kmdb once the
machine is up but before it hangs).  You can do this from a command-line
console login (no graphics) by running:

# mdb -K  <-- this will load kmdb and drop into it
:c  <-- this will continue

If you must have a windowing system to reproduce the hang, you can still
use kmdb, but unless you can redirect console input/output to a serial
port, you won't be able to see what you are doing.  That is OK.
You type:

# mdb -K -F  <-- again, loads kmdb and drops into it.  The machine will
                 appear hung.

Now, carefully, with no typos:

:c   <-- then Enter (that's colon, c, Enter: 3 keystrokes).  The machine
         should come back (unless you made a typo).

Now, do whatever it is that causes the machine to hang.
When the machine is hung, press F1-a (that is, function key F1 and "a"
together).  Unless the machine is "hard hung", this will put you into
kmdb.  Again, you won't be able to see what is happening if your console
is on a windowing system.  Then type (again, no typos):

$<systemdump    <-- this will give you a panic dump and reboot.

If the above doesn't work, either you made a typo or your machine is
hard hung.  If it is hard hung, add this line to /etc/system (of course,
you'll have to bounce the machine to get it back up to do this):

set snooping=1

Then reboot.  This sets a "deadman" timer.  Again, do your thing to cause
the hang.  If the scheduling clock does not run for (by default) 50
seconds, the machine will panic, giving you a dump.
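
If you would rather script the /etc/system edit than do it by hand, here
is a minimal sketch.  It assumes a POSIX-style shell; enable_deadman is a
hypothetical helper name of mine, not a Solaris command, and the ".orig"
backup suffix is just a convention I picked.

```shell
#!/bin/sh
# Sketch only: append "set snooping=1" to a system file unless it is
# already there.  enable_deadman is a made-up helper name.
enable_deadman() {
    file=$1
    # Line already present?  Then leave the file alone.
    if grep '^set snooping=1' "$file" >/dev/null 2>&1; then
        return 0
    fi
    # Keep a backup copy before touching the file.
    cp "$file" "$file.orig" || return 1
    echo 'set snooping=1' >> "$file"
}
```

Run as root, that would be "enable_deadman /etc/system", followed by the
reboot.  Calling it a second time changes nothing.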

If neither of these works, it implies that the real time clock is blocked
out.  This is highly unlikely, but it can occur.

Once you have the dump, report back...
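
To check that a dump was actually saved after the reboot, something like
the sketch below may help.  It assumes savecore wrote the usual unix.N and
vmcore.N pair into its directory (by default /var/crash/<hostname>, but
check dumpadm for yours) and that you have a POSIX-style shell such as ksh
or bash.  latest_dump is a hypothetical helper name, not a system command.

```shell
#!/bin/sh
# Sketch only: print the highest N for which both unix.N and vmcore.N
# exist in the given directory.  latest_dump is a made-up helper name.
latest_dump() {
    dir=$1
    best=
    for f in "$dir"/vmcore.*; do
        [ -f "$f" ] || continue            # glob matched nothing
        n=${f##*.}                         # suffix after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;       # skip non-numeric suffixes
        esac
        [ -f "$dir/unix.$n" ] || continue  # need the matching unix.N
        if [ -z "$best" ] || [ "$n" -gt "$best" ]; then
            best=$n
        fi
    done
    if [ -n "$best" ]; then
        echo "$best"
    else
        return 1                           # no complete dump pair found
    fi
}
```

If it prints, say, 0, you can open the dump with "mdb unix.0 vmcore.0"
from that directory; ::status, ::msgbuf and $C are good first things to
run and include in your report.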

max


_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
