Jeremy Domingue wrote:
>
> Okay, well, the whole reason I'm trying to lower the usage of cron (or
> should I say, the processes it runs) is because I thought that the load on
> the machine may have been what has been crashing it over the past couple of
> months. I've been having a looooooooooooooong running problem of the server
> crashing every 5 days or so, now I'm down to about every 1-2 days the dang
> thing crashes. No matter what I do it will not stay up, and the problem
> continues to get worse.

If you want to test whether load is what crashes it, just start
as many programs as you can until you fill both the RAM and the
swap (try a loop, a big "make -j", or several Netscapes). This
_may_ actually crash it, but I think you'll get some errors
logged.

I tried this once. With no swap (I had lost the swap partition to
the buggy fstool on RH 5.0), I filled the RAM and it crashed. But
with 128M swap (64M RAM) I couldn't crash it: at some point it was
so busy swapping that I wasn't able to start any more programs
for about 3 hours. Stopping programs made it come back to life. I
hope your cron jobs do not last for 3 hours :)

The RAM/swap ratio may be related. The usual suggestion in the
Unix world is swap = 2 x RAM. Maybe this ratio ends up causing
"protective" heavy swapping, which prevents launching new apps
before the swap space is exhausted. I'm not sure whether the same
ratio should apply to SMP.
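
To see where a machine stands against that 2 x RAM rule of thumb,
something like the sketch below might do. It assumes "MemTotal:" and
"SwapTotal:" lines in kB, as /proc/meminfo has on current Linux
kernels (older kernels may lay the file out differently), and the
function name swap_ratio is just made up for this example.

```shell
# Print RAM, swap, and the swap/RAM ratio from a meminfo-style file.
# Assumption: the file has "MemTotal:" and "SwapTotal:" lines in kB,
# like /proc/meminfo on current Linux kernels.
swap_ratio() {
    LC_ALL=C awk '
        /^MemTotal:/  { ram = $2 }
        /^SwapTotal:/ { swap = $2 }
        END {
            if (ram > 0)
                printf "ram_kb=%d swap_kb=%d ratio=%.2f\n", ram, swap, swap / ram
            else
                print "could not read MemTotal"
        }
    ' "${1:-/proc/meminfo}"
}
```

Running "swap_ratio" with no argument reads /proc/meminfo; a ratio
near 2.00 matches the rule above.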

You can try monitoring the machine until it crashes (or until the
log fills the disk) using "top > logfile", but I'm not sure the
file will survive the fsck. You might find it in lost+found. This
may reveal peak loads or memory leaks as the root cause.
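
A lighter alternative to a full top dump could be a small loop that
appends the load average and free memory to a file and calls sync
after every sample, so the last lines have a better chance of being
on disk when the box goes down. This is only a sketch: log_sample and
monitor are hypothetical names, the 60 second default is arbitrary,
and it assumes /proc/loadavg plus "MemFree:" and "SwapFree:" lines in
/proc/meminfo.

```shell
# Append one timestamped sample of load average and memory counters
# to a log file, then flush buffers to disk with sync.
# Assumption: /proc/loadavg exists and /proc/meminfo has
# "MemFree:" and "SwapFree:" lines.
log_sample() {
    logfile=$1
    {
        printf '%s load: ' "$(date '+%Y-%m-%d %H:%M:%S')"
        cat /proc/loadavg
        grep -E '^(MemFree|SwapFree):' /proc/meminfo
    } >> "$logfile"
    sync
}

# Hypothetical wrapper: take a sample every INTERVAL seconds
# (default 60) until interrupted.
monitor() {
    logfile=$1
    interval=${2:-60}
    while :; do
        log_sample "$logfile"
        sleep "$interval"
    done
}
```

Started as "monitor /var/log/loadwatch.log 60 &" (the path is just an
example), it leaves a trail you can read after the reboot to see
whether load or memory was climbing before the crash.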
But...
>
> Here are the machine specs:
>
> Dual Pentium II 266
> 512mb EDO ECC SDRAM
> Adaptec 7880 on board SCSI controller
> 2 - 4.1 GIG IBM UW-SCSI hard drives
> 3com 10/100 TP Ethernet Card
>
> Redhat 5.0
> Kernel 2.0.34 w/SMP enabled, no modules. Both SCSI and Netcard driver are
> built in the kernel.

Amazingly close to what a friend has. His machine was crashing
out of the blue, or not booting at all. He mentioned something
about SCSI+SMP hardware failures.

I don't know exactly how he solved them, but it now runs RH 5.0
with kernel 2.1.106 (w/SMP), no modules, and everything seems OK
despite heavy load peaks (I have witnessed loads over 16):
$ uptime
12:46am up 6 days, 10:24h, 8 users, load average: 0.61, 0.44, 0.38
Maybe try upgrading the kernel?
Hope it helps,
dan