Re: [OpenIndiana-discuss] OI Crash

Doug Hughes Fri, 18 Jan 2013 20:46:42 -0800

On 1/18/2013 7:53 PM, dormitionsk...@hotmail.com wrote:

On Jan 17, 2013, at 8:47 PM, Reginald Beardsley wrote:

As far as I'm concerned, problems like this are a bottomless abyss.  Which is 
why I'm still putting up w/ my OI box hanging.  It's annoying, but not 
critical.  It's also why critical stuff still runs on Solaris 10.

Intermittent failures are the worst time sink there is. There is no assurance 
that devoting all your time to the problem will fix it even at very high skill 
levels w/ a full complement of the very best tools.

If you're getting crash dumps there is hope of finding the cause, so that's a 
big improvement.

Good luck,
Reg

BTW Back in the 80's there was a VAX operator in Texas who went out to his 
truck, got a .357 and shot the computer.  His employer was not happy.  But I 
can certainly understand how the operator felt.



From 1992 to 1998, I worked at the Denver Museum of Natural 
History -- now the Denver Museum of Nature and Science.  We had two or three 
DEC Vax's and an AIX machine there.  It was their policy that once a week we 
had to power each of the servers all the way down to clear out any memory 
problems -- or whatever -- as preventive maintenance.

Since then, I've always had the habit of setting up a cron job to reboot my 
servers once a week.  It's not as good as a full power down, but it's better 
than nothing.  And in all these years, I've never had to deal with intermittent 
problems like this, except for a few brief times when I used Red Hat Linux ten 
plus years ago.  (I've tried most of Red Hat's versions since 6.2, and RHEL 6 
is the first version I've found that runs decent enough on our hardware, and 
that I'm happy enough with, for us to use.)

So, if you can do it, you might want to try setting up a cron job to reboot your 
server once a week -- or every night.  I reboot our LTSP thin client server 
every night just because it gets hit with running lots of desktop applications 
that I think give it a greater potential for these kinds of memory problems.

On the other hand, we have all of our websites hosted on one of our 
parishioner's servers -- and he doesn't reboot his machines periodically like I 
do -- and about every two months, I have to call him up and tell him something 
is wrong.  And he goes and powers down his system -- sometimes he has to even 
unplug it -- and then turn it back on, and everything works again.

I know there are system admins that just love to brag about how great their 
up-times are on their machines -- but this might just save you a lot of time 
and grief.

Of course, if you're running a real high-volume server, this might not be 
workable for you; but it only takes 2-5 minutes or so to reboot... Perhaps in 
the middle of the night you might be able to spare it being down that short 
time?

Just a friendly suggestion.

Shared experience.

I know others may tell you that that's no longer necessary anymore in these 
more modern times; but my experience has been otherwise.

I hope it helps.

+Peter, hieromonk

Haven't we passed the days of mystical sysadmin without understanding and
characterization? Keeping up tradition for tradition's sake without
understanding the underlying reasons really doesn't do anybody a favor. If
there are memory leaks, we possess the technology to find them. My
organization has thousands of machines that run jobs sometimes for months at a
time. If I had to reboot servers once a week, my users would be at the doors
with pitchforks. The only time we take downtime is when there are reasons to
do so, including OS updates, hardware failures, and user software run amok.
They can run a very long time like this.

Not that memory leaks never happen. Of course they do, but they eventually
get found and fixed, or the program causing them passes into obsolescence.
Always.
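
As one illustration of characterizing a suspected leak instead of guessing,
here is a minimal sketch in Python. It assumes you have already collected
(seconds, resident-memory-in-kB) samples for a process, for example from
periodic ps output. The function names and the 1024 kB/hour threshold are
hypothetical choices for this example, not anything from this thread, and a
steady upward trend is only a hint that a process deserves a closer look.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (seconds since first sample, memory in kB)


def growth_rate(samples: List[Sample]) -> float:
    """Return the least-squares slope of memory use, in kB per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def looks_like_leak(samples: List[Sample],
                    min_kb_per_hour: float = 1024.0) -> bool:
    """Flag a process whose memory trend grows faster than the threshold.

    The threshold is an assumed, illustrative value; tune it per workload.
    """
    return growth_rate(samples) * 3600.0 >= min_kb_per_hour


if __name__ == "__main__":
    steady = [(0.0, 1000.0), (3600.0, 1000.0), (7200.0, 1000.0)]
    growing = [(0.0, 1000.0), (3600.0, 3000.0), (7200.0, 5000.0)]
    print(looks_like_leak(steady))
    print(looks_like_leak(growing))
```

A fitted slope is used rather than a simple first-to-last difference so that
one noisy sample does not dominate the result.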

I encourage discovery rather than superstition, and diagnosis rather than
repetition.


Be a knight, not a victim!


_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss
