Hi Matt, Maybe this should be moved to mdb-discuss... My comments are at the end.

Matt Harrison wrote:
> Matt Harrison wrote:
>> m...@bruningsystems.com wrote:
>>> Hi Matt,
>>>
>>> Matt Harrison wrote:
>>>> Matt Harrison wrote:
>>>>> thanks Ian, I'll look into this tomorrow.
>>>>
>>>> Well I'm not sure if it's good news or not. I've got the machine
>>>> running memtest86+ with the standard tests and so far it's done 2
>>>> passes (3 hours runtime) without a single error.
>>>>
>>>> I'm going to leave it running overnight but does it seem there could
>>>> be another problem other than memory?
>>>
>>> Have you gotten anywhere yet with this hang? Have you tried set
>>> snooping=1 in /etc/system? How about booting with kmdb and forcing
>>> a dump? I'm not sure why this is necessarily hardware related...
>>>
>>> max
>>
>> Unfortunately not yet...I was forced to bring the server back up to get
>> some files from it, and I haven't had a chance to take it down again yet.
>>
>> I still need to make sure it can survive 24h solid of memtest, but I am
>> happy to try other things.
>>
>> I'm not familiar with the snooping variable, nor with kmdb, although I
>> have read about it being used here and there.
>>
>> I'll go ahead with the memtest when I can and report back.
>
> Ok, sorry it's taken a while but I've had the server run memtest for 24
> hours and it hasn't found any errors whatsoever.
>
> Does anyone have an idea as to what I could try next?

I would try booting with kmdb (or, alternatively, load kmdb once the
machine is up but before it is hung). You can do this from a command-line
console login (no graphics) by running:

# mdb -K      <-- this will load kmdb and drop into it
:c            <-- this will continue

If you must have a windowing system to reproduce the hang, you can still
use kmdb, but, unless you can redirect console input/output from/to a
serial port, you won't be able to see what you are doing. But, it is ok.
You type:

# mdb -K -F   <-- again, loads kmdb and drops into it.

The machine will appear hung. Now, carefully, with no typos:

:c            <-- and enter (that's colon, c, enter: 3 keystrokes)

The machine should come back (unless you have a typo). Now, do whatever
you are doing that causes the machine to hang. When the machine is hung,
type F1-a (that is, function key F1 and "a" together). Unless the machine
is "hard hung", this will put you into kmdb. Again, you won't be able to
see what is happening if your console is on a windowing system. Then type
(again, no typos):

$<systemdump  <-- this will give you a panic dump and reboot.

If the above doesn't work, you either made typos, or your machine is hard
hung. If it is hard hung, add this line to /etc/system (of course, you'll
have to bounce the machine to get it back up to do this):

set snooping=1

Then reboot. This sets a "deadman" timer. Again, do your thing to cause
the hang. If the scheduling clock does not run for (by default) 50
seconds, the machine will panic, giving you a dump.

If neither of these works, it implies that the real time clock is blocked
out. This is highly unlikely, but can occur.

Once you have the dump, report back...

max

_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
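
Since the deadman timer only helps if the setting really made it into /etc/system before the reboot, a quick check along these lines might be worth running first. This is only a sketch: check_snooping is a hypothetical helper name, not a Solaris tool, and it assumes the line was added exactly as written above, starting in the first column.

```shell
#!/bin/sh
# check_snooping is a hypothetical helper, not part of Solaris.
# It reports whether a file contains an active "set snooping=1" line.
# /etc/system comments start with "*", so a commented-out line will not
# match, because the pattern is anchored to the start of the line.
# Assumption: the line is written exactly as "set snooping=1".
check_snooping() {
    file="${1:-/etc/system}"
    if grep '^set snooping=1' "$file" >/dev/null 2>&1; then
        echo "snooping=1 is set in $file"
    else
        echo "snooping=1 not found in $file"
        return 1
    fi
}
```

After sourcing this, running check_snooping with no argument looks at /etc/system; passing a path checks a different file, which is handy for trying it on a copy first.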