Re: [newbie] Constant computer crashes

Tom Brinkman Wed, 25 Sep 2002 17:06:46 -0700

On Wednesday September 25 2002 08:25 am, Marcia wrote:
> Dear Tom
>
> Tom Brinkman wrote:
> >>Since the 'nopentium' bandaid didn't fix it, let's start again
> >>Marcia. List the hardware involved, particularly mobo, psu, video,
> >> and what Mandrake version, which video drivers are used.  Ram
> >> vendor, if you know?  IIRC, it's Mdk 8.2, with an ECS mobo. Got
> >> the model/ revision/bios vendor and numbers?
>
> The link for my board is http://www.ecsusa.com/ and my motherboard is
> the L7VMM.


   AMD apprv'd for your 1600+. Unfortunately, I have no experience with 
these new micro-boards (an' I'm not an ECS fan). The latest bios is 1.0a
http://www.ecsusa.com/ecsusa/www.ecs.com.tw/download/l7vmm.htm
"1. Remove "CPU warning temp item" in BIOS setup
 The ITE8705 chipset use the same high and low limit for "CPU warning 
temp & CPU shutdown temp"
 2. To fix Hynix 128M X 2 or Samsung 128M X 2 system will auto-restart 
when running"
....  either fix could be pertinent to your crash problem, so update if 
you don't already have 1.0a.  Both are worrisome in that they deal with 
auto shutdowns (crashes), one for temp, the other for ram.

> I disabled the onboard lan because even though it worked
> it was grabbing the same irq as sound. The company sent me a new lan
> card which helped that it seems. This is an Athlon 1600+ XP with
> 512MB PC2100 DDR, 266 MHZ SDRAM, 

   Yes, but who makes the ram? Two important points: the actual ram 
chips and the pcb (board) implementation of the chips. IOW, Micron 
chips (good) on a generic pcb (bad) ... well two wrongs don't make a 
right ;>   Look in bios and see what the ram timings are. The most 
lenient are CAS 3-3-3, and if there's a setting for 'bank 
interleaving', disable it. At least while we're tryin to get your crash 
problem solved, go for lenient.  2-2-2 and 4-bank are the optimum, but 
only good ram on a good mobo with a good PSU can do it.  
Also it's 133 MHz x2 ram.  (the x2, and DDR are mostly marketing talk)

  Probly now's a good time to run the machine overnite booting to 
memtest86. Look on your CD's, or use SoftwareManager, you should find 
somethin like  memtest86-3.0-2mdk .  Install that rpm, it'll add a 
memtest86 boot option to lilo (or grub). When you re-boot, choose this 
option and let the tests run overnite. 

    Plan B, if your machine doesn't like booting this option, then look 
in /boot. After installin the memtest rpm you'll see a file like 
memtest-3.0.bin.  So put in a good floppy and type
'dd if=/boot/memtest-3.0.bin of=/dev/fd0' (caution, your memtest version 
is probly different than mine). That'll make a memtest86 floppy you 
can boot from. Just choose 'floppy' from lilo.  If you can't run 
memtest86 overnite with -0- errors, then we probly have found the 
problem ... the ram, or how well your motherboard gets along with it, 
or both. Could still be PSU tho.
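
Since the file name changes with the memtest version, a small sketch like 
this can pick up whatever image the rpm put in /boot before the floppy gets 
written. The function name find_memtest_image is made up for illustration, 
and the 'memtest*' pattern is only an assumption about how the file is named.

```shell
#!/bin/sh
# Hypothetical helper: print the first memtest image found in a directory
# (default /boot). The 'memtest*' pattern is an assumption about the name.
find_memtest_image() {
    dir=${1:-/boot}
    for f in "$dir"/memtest*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] || continue
        printf '%s\n' "$f"
        return 0
    done
    return 1
}

# Usage (as root, with a good floppy in the drive):
#   img=$(find_memtest_image) && dd if="$img" of=/dev/fd0
```

If it prints nothing, the rpm probly isn't installed or the file is named 
some other way, so look in /boot by hand.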

> I had the cooler master added plus
> an extra case fan. This is a brand new machine. I have Win95 as a
> dual boot and Win does not have the problems that my Linux side has.

  Win9.x --> WinXP tolerates sloppy (win)hardware, actually encourages 
it IMO.  Most all CoolerMaster hs/fans are AMD appr'vd, so we probly 
don't need to look there. I'd advise you tho, that it's probly usin a 
thermal pad to contact the cpu's die, and this will deteriorate over 
time, might even fail. Thermal grease is much better, now and later.


>  cat /proc/interrupts
   |
>  11:        154          XT-PIC  usb-uhci, usb-uhci

   What USB devices do you have? Appears two are sharing IRQ11 or it's 
possibly a double entry.  Everything else looked good.
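
To spot lines like that one without reading the whole table, something 
along these lines might help. It's only a sketch: shared_irqs is a made-up 
name, and it assumes the single-cpu layout shown above (IRQ number, count, 
controller, then a comma separated device list).

```shell
#!/bin/sh
# Hypothetical helper: list IRQs that have more than one device attached.
# Assumes one count column (single cpu), as in the line quoted above.
shared_irqs() {
    awk '$1 ~ /^[0-9]+:$/ {
        irq = $1; sub(/:$/, "", irq)
        # Everything after the controller column is the device list.
        devs = ""
        for (i = 4; i <= NF; i++) devs = devs (i > 4 ? " " : "") $i
        # A comma in the device list means the IRQ is shared.
        if (index(devs, ",") > 0) print irq ": " devs
    }' "$@"
}

# Usage:
#   shared_irqs /proc/interrupts
```

Sharing isn't always a problem, the output just shows where to look first.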


>
> There is a temperature and performance utility in the bios.  What are
> lm_sensors/gkrellm? I would gladly install this if needed.

    Most common causes of random, occasional lockups and reboots are 
faulty ram, or overheating. Even a lot of Windoze problems get blamed 
on M$, when these two culprits are really at fault (specially Winsux 
Registry errors).

    The temp you see in bios is really only good for verifying that you 
have hardware support for temp, voltage, fan monitoring.  When you see 
this temp the system is not under load, and usually is comin from a 
cool state. Specially if it's been off for more'n just a few seconds. 
Processor core temp is _very_ dynamic.  Also there's only a very few 
current mobo's that can really access AthlonXP internal diode core 
temps (Asus, Gigabyte).  All other boards, including yours an' mine, 
measure the temp from an external probe. 'Bout like tryin to see if the 
electric wires inside a wall are too hot, by holding your hand against 
the sheetrock. Still it's somethin to go by. Figure your cpu core temp 
is 10 to 20C hotter than the probe reports tho.

    So we need lm_sensors. It's on your CD's, install 
liblm_sensors1-2.6.4-4mdk
lm_sensors-2.6.4-4mdk      ...or just type 'lm-sensors' into 
SoftwareManager.  We won't fool with gkrellm just yet.  After the rpms 
are installed, su to root and run 'sensors-detect'. All the default 
answers to the questions it presents should be ok, just keep hitting 
<Enter>. When it gets towards the end, it'll output some lines that 
you need to edit into the end of both /etc/rc.d/rc.local and 
/etc/modules.conf.  While we're at it, add 'i2c-proc' (w/o the quotes) 
to /etc/modules. Gettin back to 'sensors-detect', it probly has one 
more question ... install the sensors.conf file?, say Yes.  Then back 
in ('cd' to) /etc/rc.d/   ... type './rc.local'  to restart rc.local 
and have the modules take effect. Then as user you should see 
temp/voltage/fan outputs when you type 'sensors' in a terminal. Some 
have reported a reboot is necessary, but I've never needed to.
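
For the 'i2c-proc' step, a tiny helper like the one below would keep the 
line from being added twice if the steps get repeated. This is just a 
sketch, add_line_once is a made-up name, and it only covers that one edit. 
The lines sensors-detect prints still have to be copied in by hand.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if the exact line
# is not already in it.
add_line_once() {
    # $1 = file, $2 = line to add
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage (as root):
#   add_line_once /etc/modules i2c-proc
```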

     We'll concentrate on the cpu temp for now. Under extreme load (eg, a 
kernel compile, specially 'make modules', running cpuburn, etc.) the cpu 
temp should stay under 60C (from a probe), under 55C is better. For 
normal operation it should be in the mid 40's to low 50's.  It's 
during high temp spikes or sustained load that system freezes or 
spontaneous reboots occur. Keep an eye on system voltages too tho, 
they should be very close to slightly (+10%) over the voltages spec'd 
for your motherboard/cpu, and stay very steady.
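
The 10% rule of thumb is easy to check with a little arithmetic. A minimal 
sketch, assuming you read the measured value off 'sensors' yourself; 
within_pct is a made-up name and the numbers in the usage line are only 
examples, not readings from any real board.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a measured voltage is within a given
# percentage of the nominal value.
within_pct() {
    # $1 = nominal volts, $2 = measured volts, $3 = allowed deviation (%)
    awk -v n="$1" -v m="$2" -v p="$3" 'BEGIN {
        d = m - n; if (d < 0) d = -d            # absolute deviation
        lim = (n < 0 ? -n : n) * p / 100        # allowed deviation
        exit !(d <= lim)
    }'
}

# Usage (example numbers only):
#   within_pct 12 11.5 10 || echo "12V reading is more than 10% off"
```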

    So for the acid test, cpuburn.  It's probly on your CD's, if not get 
it here   http://users.ev1.net/~redelm/    For your XP 1600+ you want 
to run 'burnK7'. While doin so, in another terminal check the output of 
'sensors' frequently. If the cpu temp climbs to 65C and starts going 
over, abort burnK7 (Ctrl+C), and figure out what you need to do to 
improve cooling. 'Cause that's most likely your crash problem.  If you 
notice the -5 and -12 volt outputs dropping too low (more'n 10%), then 
the PSU could be the problem. Voltage drops cause lockups/freezes too.  
If it all looks OK, and you can run burnK7 for an hour, your crash 
problem almost surely isn't hardware.
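
Rather than retyping 'sensors' by hand while burnK7 runs, a loop along 
these lines could print the cpu temp every few seconds in the second 
terminal. Treat it as a sketch under assumptions: the function names are 
made up, and the label 'temp1' is a guess, since what 'sensors' calls the 
cpu reading depends on the chip and on sensors.conf.

```shell
#!/bin/sh
# Hypothetical helper: read 'sensors' output on stdin and print the first
# number on the line that starts with the given label.
cpu_temp_from() {
    awk -v l="$1" 'index($0, l ":") == 1 {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[+-]?[0-9]/) { print $i + 0; exit }
    }'
}

# Hypothetical helper: succeed if temperature $1 is at or over limit $2.
temp_at_or_over() {
    awk -v t="$1" -v lim="$2" 'BEGIN { exit !(t + 0 >= lim + 0) }'
}

# Print the cpu temp every 5 seconds; warn once it reaches 65C.
# The default label "temp1" is an assumption, pass your own as $1.
watch_cpu_temp() {
    label=${1:-temp1}
    while sleep 5; do
        t=$(sensors | cpu_temp_from "$label")
        echo "$(date '+%H:%M:%S') cpu temp: ${t:-unknown}"
        if [ -n "$t" ] && temp_at_or_over "$t" 65; then
            echo "65C reached, abort burnK7 (Ctrl+C in its terminal)"
        fi
    done
}

# Usage, in a second terminal while burnK7 runs in the first:
#   watch_cpu_temp temp1
```

Stop the loop with Ctrl+C. If it keeps printing 'unknown', the label 
doesn't match anything in your 'sensors' output.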

    Sorry for being so long-winded, but I warned you that diagnosing 
hardware over the phone was difficult ;)
-- 
    Tom Brinkman                  Corpus Christi, Texas
