Re: [newbie] Constant computer crashes

Marcia Wed, 25 Sep 2002 20:32:32 -0700

On Wednesday 25 September 2002 07:25 pm, you wrote:
> On Wednesday September 25 2002 08:25 am, Marcia wrote:
> > Dear Tom
> >
> > Tom Brinkman wrote:
> > >>Since the 'nopentium' bandaid didn't fix it, let's start again
> > >>Marcia. List the hardware involved, particularly mobo, psu, video,
> > >> and what Mandrake version, which video drivers are used.  Ram
> > >> vendor, if you know?  IIRC, it's Mdk 8.2, with an ECS mobo. Got
> > >> the model/ revision/bios vendor and numbers?
> >
> > The link for my board is http://www.ecsusa.com/ and my motherboard is
> > the L7VMM.
>
>    AMD apprv'd for your 1600+. Unfortunately, I have no experience with
> these new micro-boards (an' I'm not an ECS fan). The latest bios is 1.0a
> http://www.ecsusa.com/ecsusa/www.ecs.com.tw/download/l7vmm.htm
> "1. Remove "CPU warning temp item" in BIOS setup
>  The ITE8705 chipset use the same high and low limit for "CPU warning
> temp & CPU shutdown temp"
>  2. To fix Hynix 128M X 2 or Samsung 128M X 2 system will auto-restart
> when running"
> ....  either fix could be pertinent to your crash problem, so update if
> you don't already have 1.0a.  Both are worrisome in that they deal with
> auto shutdowns (crashes), one for temp, the other for ram.


I will update the bios then for starters. I have never done this so what is 
the procedure for doing this?

>
> > I disabled the onboard lan because even though it worked
> > it was grabbing the same irq as sound. The company sent me a new lan
> > card which helped that it seems. This is an Athlon 1600+ XP with
> > 512MB PC2100 DDR, 266 MHZ SDRAM,
>
>    Yes, but who makes the ram? Two important points: the actual ram
> chips and the pcb (board) implementation of the chips. IOW, Micron
> chips (good) on a generic pcb (bad) ... well two wrongs don't make a
> right ;>   Look in bios and see what the ram timings are. The most
> lenient are CAS 3-3-3, and if there's a setting for 'bank
> interleaving', disable it. At least till we get your crash
> problem solved, go for lenient.  2-2-2 and 4-bank are the optimum, but
> only good ram on a good mobo with a good PSU can do it.
> Also it's 133 MHz x2 ram.  (the x2, and DDR are mostly marketing talk)
>
>   Probly now's a good time to run the machine overnite booting to
> memtest86. Look on your CD's, or use SoftwareManager, you should find
> somethin like  memtest86-3.0-2mdk .  Install that rpm, it'll add a
> memtest86 boot option to lilo (or grub). When you re-boot, choose this
> option and let the tests run overnite.
>
>     Plan B, if your machine doesn't like booting this option, then look
> in /boot. After installin the memtest rpm you'll see a file like
> memtest-3.0.bin.  So put in a good floppy and type
> 'dd if=/boot/memtest-3.0.bin of=/dev/fd0' (caution your memtest version
> is probly different than mine). That'll make a memtest86 floppy you
> can boot from. Just choose 'floppy' from lilo.  If you can't run
> memtest86 overnite with -0- errors, then we probly have found the
> problem ... the ram, or how well your motherboard gets along with it,
> or both. Could still be PSU tho.
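
The Plan B floppy step quoted above can be sketched as a small shell helper. This is only an illustration under assumptions: the function names find_memtest_image and write_memtest_floppy are made up here, the image is assumed to match memtest*.bin under /boot (the exact file name depends on the installed version), and head -c is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: print the first memtest86 image found in a directory
# (default /boot). The real file name depends on the packaged version.
find_memtest_image() {
    dir=${1:-/boot}
    for f in "$dir"/memtest*.bin; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -f "$f" ] && { printf '%s\n' "$f"; return 0; }
    done
    return 1
}

# Hypothetical wrapper: copy the image to a target (e.g. /dev/fd0) with dd,
# then read back the same number of bytes and compare them to the image.
write_memtest_floppy() {
    img=$1
    dev=$2
    dd if="$img" of="$dev" || return 1
    size=$(( $(wc -c < "$img") ))
    head -c "$size" "$dev" | cmp -s - "$img"
}
```

Run as root with a good floppy in the drive, it might be used as: img=$(find_memtest_image) && write_memtest_floppy "$img" /dev/fd0
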
>
> > I had the cooler master added plus
>
> > an extra case fan. This is a brand new machine. I have Win95 as a
> > dual boot and Win does not have the problems that my Linux side has.
>
>   Win9.x --> WinXP tolerates sloppy (win)hardware, actually encourages
> it IMO.  Most all CoolerMaster hs/fans are AMD appr'vd, so we probly
> don't need to look there. I'd advise you tho, that it's probly usin a
> thermal pad to contact the cpu's die, and this will deteriorate over
> time, might even fail. Thermal grease is much better, now and later.
>
> >  cat /proc/interrupts
> >
> >  11:        154          XT-PIC  usb-uhci, usb-uhci
>
>    What USB devices do you have? Appears two are sharing IRQ11 or it's
> possibly a double entry.  Everything else looked good.

I have a usb HP 4300 scanjet scanner and a HP 940c usb printer.
>
> > There is a temperature and performance utility in the bios.  What are
> > lm_sensors/gkrellm? I would gladly install this if needed.
>
>     Most common causes of random, occasional lockups and reboots are
> faulty ram, or overheating. Even a lot of Windoze problems get blamed
> on M$, when these two culprits are really at fault (specially Winsux
> Registry errors).
>
>     The temp you see in bios is really only good for verifying that you
> have hardware support for temp, voltage, fan monitoring.  When you see
> this temp the system is not under load, and usually is comin from a
> cool state. Specially if it's been off for more'n just a few seconds.
> Processor core temp is _very_ dynamic.  Also there's only a very few
> current mobo's that can really access AthlonXP internal diode core
> temps (Asus, Gigabyte).  All other boards, including yours an' mine,
> measure the temp from an external probe. 'Bout like tryin to see if the
> electric wires inside a wall are too hot, by holding your hand against
> the sheetrock. Still it's somethin to go by. Figure your cpu core temp
> is 10 to 20C hotter than the probe reports tho.
>
>     So we need lm_sensors. It's on your CD's, install
> liblm_sensors1-2.6.4-4mdk
> lm_sensors-2.6.4-4mdk      ...or just type 'lm-sensors' into
> SoftwareManager.  We won't fool with gkrellm just yet.  After the rpms
> are installed, su to root and run 'sensors-detect'. All the default
> answers to the questions it presents should be ok, just keep hitting
> <Enter>. When it gets towards the end, it'll output some lines that
> you need to edit into the end of /etc/rc.d/rc.local and
> /etc/modules.conf   While we're at it, add 'i2c-proc' (w/o the quotes)
> to /etc/modules. Gettin back to 'sensors-detect', it probly has one
> more question ... install the sensors.conf file?, say Yes.  Then back
> in ('cd' to) /etc/rc.d/   ... type './rc.local'  to restart rc.local
> and have the modules take effect. Then as user you should see
> temp/voltage/fan outputs when you type 'sensors' in a terminal. Some
> have reported a reboot is necessary, but I've never needed to.
>
>      We'll concentrate on the cpu temp for now. The cpu temp should stay
> under 60C (from a probe), under 55C is better under extreme load (eg, a
> kernel compile, specially 'make modules', running cpuburn, etc.) For
> normal operation it should be in the mid 40's to low 50's, or lower.  It's
> during high temp spikes or sustained load that system freezes or
> spontaneous reboots occur. Keep an eye on system voltages too tho,
> they should be very close to slightly (+10%) over the voltages spec'd
> for your motherboard/cpu, and stay very steady.
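
The probe-temperature thresholds quoted above (under 55C preferred under load, under 60C as the limit) can be checked with a short script. This is a rough sketch, not part of lm_sensors: extract_temp and classify_cpu_temp are hypothetical names, and the "temp2" label and line layout are assumptions, since the output of 'sensors' differs from one sensor chip to another.

```shell
#!/bin/sh
# Hypothetical helper: read 'sensors'-style text on stdin and print the first
# number that follows "<label>:" on the first line starting with that label.
extract_temp() {
    awk -v label="$1" '
        index($0, label ":") == 1 {
            line = substr($0, length(label) + 2)
            if (match(line, /[0-9]+(\.[0-9]+)?/)) {
                print substr(line, RSTART, RLENGTH)
                exit
            }
        }'
}

# Classify a probe reading (degrees C) using the thresholds from the text:
# 60 and over is too hot, 55 up to 60 is warm, anything lower is ok.
classify_cpu_temp() {
    awk -v t="$1" 'BEGIN {
        if (t + 0 >= 60) print "too hot"
        else if (t + 0 >= 55) print "warm"
        else print "ok"
    }'
}
```

With a working sensors setup this might be used as: classify_cpu_temp "$(sensors | extract_temp temp2)", after replacing temp2 with whatever label the board reports for the cpu.
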
>
>     So for the acid test, cpuburn.  It's probly on your CD's, if not get
> it here   http://users.ev1.net/~redelm/    For your XP 1600+ you want
> to run 'burnK7'. While doin so, in another terminal check the output of
> 'sensors' frequently. If the cpu temp climbs to 65C and starts going
> over, abort burnK7 (Ctrl+C), and figure out what you need to do to
> improve cooling. 'Cause that's most likely your crash problem.  If you
> notice the -5 and -12 volt outputs dropping too low (more'n 10%), then
> the PSU could be the problem. Voltage drops cause lockups/freezes too.
> If it all looks OK, and you can run burnK7 for an hour, your crash
> problem almost surely isn't hardware.
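
The 10% voltage rule quoted above is simple arithmetic and can be sketched as a shell function. The name voltage_ok is hypothetical, and the nominal and measured values are typed in by hand from the 'sensors' output rather than parsed, because label names vary by board.

```shell
#!/bin/sh
# Hypothetical check: succeed (exit 0) when a measured rail voltage is within
# 10% of its nominal value, fail (exit 1) otherwise.
# Usage: voltage_ok NOMINAL MEASURED   e.g. voltage_ok -12 -11.6
voltage_ok() {
    awk -v nom="$1" -v meas="$2" 'BEGIN {
        # Deviation as a fraction of nominal; take the magnitude so the
        # same test works for negative rails such as -5 and -12.
        dev = (meas - nom) / nom
        if (dev < 0) dev = -dev
        exit (dev > 0.10 ? 1 : 0)
    }'
}
```

For example, voltage_ok -12 -10.5 fails because the rail is off by 12.5%, while voltage_ok -12 -11.5 passes at roughly 4%.
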
>
>     Sorry for being so long winded, but I warned you that diagnosing
> hardware over the phone was difficult ;)


Thank you very much for your detailed information here. I really appreciate 
your time on this. I just got this tonight so will study and try these things 
the next few days. I will let you know my results. I am sure this will be 
resolved eventually.

Thanks again.

Sincerely,

Marcia

