Apologies for the delay in replying - our fileserver died a grisly death
on Friday afternoon.

> Could you do the following for me, please:
> 
> 0) In your BIOS options look for an option to remap/reclaim
>     the dram hole, and if it is offering a choice of "software"
>     vs "hardware" remapping select hardware.  If that was
>     not already selected reboot with that setting and see
>     if the hang occurs.  Set the following in /etc/system
>     to increase the scrub rate we apply so (if we're guilty)
>     you won't have to wait 90 minutes each time:
> 
>       set cpu\.AuthenticAMD\.15:ao_scrub_rate_dram=1
> 
As far as I could see, the BIOS Setup Utility has no option to remap
the DRAM hole.

Adding the above line to /etc/system did indeed reduce the time to
crash, so much so that the machine didn't get as far as the login
prompt :( As I didn't have the nous or time to roll the Smart Array
drivers into a Live CD or the failsafe boot environment, I reinstalled.


> 1) Regardless of the results of 0, in mdb dump out the memory
>     controller nvlist info and then the full memory controller structure:
> 
> mdb -k <<EOM
> *mc_list::list mc_t mc_next | ::print mc_t mc_nvl | ::nvlist
> *mc_list::list mc_t mc_next | ::print mc_t
> EOM
> 

The output is pretty long, so instead of attaching it I've put it here:

http://chrislf.freeshell.org/mdb.out

> 2) If 0 did not take care of it, rename the two AMD cpu modules and reboot:
> 
> # mv /platform/i86pc/kernel/cpu/cpu.AuthenticAMD.15 \
>       /platform/i86pc/kernel/cpu/cpu.AuthenticAMD.15-
> 
> # mv /platform/i86pc/kernel/cpu/amd64/cpu.AuthenticAMD.15 \
>       /platform/i86pc/kernel/cpu/amd64/cpu.AuthenticAMD.15-
> 
> # init 6

This worked.
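
For completeness, undoing the rename later should just be the reverse
mv. A small sketch, assuming the trailing "-" naming from step 2;
restore_cpu_module is a made-up helper name, not a Solaris tool, and
the module path is passed in as an argument:

```shell
# Sketch only: restore_cpu_module is a hypothetical helper.
# It moves "<path>-" back to "<path>" if the renamed module exists
# and the original name is not already in use.
restore_cpu_module() {
    mod=$1
    if [ -f "${mod}-" ] && [ ! -f "${mod}" ]; then
        mv "${mod}-" "${mod}"
    else
        echo "nothing to restore for ${mod}" >&2
        return 1
    fi
}

# Example (needs root, and a reboot such as "init 6" afterwards):
# restore_cpu_module /platform/i86pc/kernel/cpu/cpu.AuthenticAMD.15
# restore_cpu_module /platform/i86pc/kernel/cpu/amd64/cpu.AuthenticAMD.15
```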

> 
> That's a bit heavy-handed since it eliminates all the config
> operations we perform and not just the scrubber stuff.  We will
> fallback to the dumb generic support.
> 
> 3) If 2) appears to let you survive longer than 90 minutes you can
>     add the following to /etc/system as a workaround:
> 
>       set cpu\.AuthenticAMD\.15:ao_scrub_policy=1
>       set cpu\.AuthenticAMD\.15:ao_scrub_rate_dram=0
> 
>     which will stop us enabling the dram scrubber.  If you
>     set the dram scrub rate in 0) above to 1 be sure to replace
>     that line with the 0 setting above.

This also worked fine! Let me know if you would like further details of
my hardware setup.
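
In case it helps anyone else hitting this, here is a rough sketch of
how the two workaround lines could be applied while replacing any
earlier ao_scrub_rate_dram setting, as step 3 asks. The function name
apply_scrub_workaround and the .bak backup suffix are my own inventions
for illustration. It takes the file path as an argument so it can be
tried on a copy before touching the real /etc/system.

```shell
# Sketch only: apply_scrub_workaround is a hypothetical helper,
# not a Solaris tool.
# Usage: apply_scrub_workaround /path/to/system-file
apply_scrub_workaround() {
    file=$1
    # Keep a copy of the original so the edit can be undone.
    cp "$file" "$file.bak" || return 1
    # Drop any existing ao_scrub_policy / ao_scrub_rate_dram lines
    # (for example the rate=1 line from step 0).
    sed -e '/ao_scrub_policy=/d' -e '/ao_scrub_rate_dram=/d' \
        "$file.bak" > "$file" || return 1
    # Append the two workaround settings from step 3.
    cat >> "$file" <<'EOF'
set cpu\.AuthenticAMD\.15:ao_scrub_policy=1
set cpu\.AuthenticAMD\.15:ao_scrub_rate_dram=0
EOF
}

# Example: try it on a scratch copy first.
# cp /etc/system /tmp/system.test
# apply_scrub_workaround /tmp/system.test
```

The settings only take effect at the next boot, and the .bak copy is
there in case the edited file needs to be put back.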

Cheers,

Chris



_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
