Re: Maybe OT but a big problem with SOLARIS

2004-11-24 Thread Michael Schaller
Hey amanda-users!
First: Thanks to all for your tips and hints.
Second: The story goes on ...
SUN updated some packages of the OS (Solaris 8), updated the OBP, and
ran some stress tests and so on - without success.

During the last week we had the console output logged to a PuTTY session,
and what can I say: on the console the machine reported some CPU
problems. SUN replaced the complete CPU board and the affected
SCSI controller to which the tape devices are attached. Without success.

Today SUN will replace the centerplane and we'll see ;-)
I'll keep you informed.
Greetings from Stuttgart
Michael

Michael Schaller wrote:
Hey!
Maybe this thread is a little OT, maybe it isn't.
For a few days we've had big problems with one of our servers.
It's a SUN FIRE V480R with an external StorEdge and an external changer
(Overland LoaderXpress with a single LTO-1 drive from HP) attached.

As I wrote a few weeks ago, the system ran fine with AMANDA and the
changer. Only on the first night after configuring AMANDA with the
changer, the automatic backup started and the complete system crashed!!
Solaris didn't give any messages in /var/adm/messages ...
The system was frozen; the only way to get the system back to life was a
power-off. After that the system was really fine for a few weeks.

Last week the same shit happened. The complete system was frozen.
We opened a call, but without any messages SUN was not able to solve the
problem. During the last two days the system crashed two times.
But now the system does an automatic reboot.

Anybody ANY hints??
I'm going crazy ...
Thanks in advance
Michael




Re: Maybe OT but a big problem with SOLARIS

2004-11-23 Thread Paul Bijnens
Michael Schaller wrote:
As I wrote a few weeks ago, the system ran fine with AMANDA and the
changer. Only on the first night after configuring AMANDA with the
changer, the automatic backup started and the complete system crashed!!
Solaris didn't give any messages in /var/adm/messages ...
The system was frozen, the only way to get the system back to life was a 
poweroff. After that the system was really fine for a few weeks.

Last week the same shit happened. The complete system was frozen.

The trick is to find out what is happening shortly before the crash.
I guess amanda just happens to stress the machine enough to tickle
the problem.  Amanda uses IO (network, disk, tape), CPU and RAM and
loads the bus (anything else in a computer?).

Under /bin/script (*), run a loop that gathers all the interesting
information, like ps -efl, netstat -ni, vmstat, iostat, df, maybe
even dmesg|tail, all intermingled with date to get some timestamps,
and hope the resulting file contains some hints of what is happening
just before the crash.
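
A minimal sketch of such a sampling loop, written in Python rather than as a shell loop under script, and assuming a Python 3.5+ interpreter is available on the box (not a given on Solaris 8). The names COMMANDS, snapshot and monitor, the "df -k" flag and the log path in the usage note are illustrative assumptions, not something from this thread. Each round is flushed and fsync'ed so that the last samples have a chance of surviving a hard freeze.

```python
import os
import subprocess
import time

# Commands named in the post above. The "-k" flag for df is an assumption;
# adjust the list for your system.
COMMANDS = [
    ["ps", "-efl"],
    ["netstat", "-ni"],
    ["vmstat"],
    ["iostat"],
    ["df", "-k"],
    ["sh", "-c", "dmesg | tail"],
]


def snapshot(commands=COMMANDS, timeout=30):
    """Run each command once and return its output under a timestamped header."""
    parts = []
    for cmd in commands:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        parts.append("=== %s  %s" % (stamp, " ".join(cmd)))
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                timeout=timeout,
            )
            parts.append(proc.stdout.rstrip("\n"))
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hung tool must not stop the sampling loop.
            parts.append("<failed: %s>" % exc)
    return "\n".join(parts) + "\n"


def monitor(logfile, commands=COMMANDS, interval=10, rounds=None):
    """Append a snapshot every `interval` seconds; rounds=None loops forever."""
    done = 0
    with open(logfile, "a") as log:
        while rounds is None or done < rounds:
            log.write(snapshot(commands))
            log.flush()
            os.fsync(log.fileno())  # push the data to disk before a possible freeze
            done += 1
            if rounds is None or done < rounds:
                time.sleep(interval)
```

For example, monitor("/var/tmp/precrash.log", interval=10) would append a snapshot every ten seconds until interrupted; the path is hypothetical.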
Maybe the console shows a useful error message that never made it
into /var/adm/messages.  Make sure it does not go into power-save
mode or lose the information on the screen when the machine reboots.
E.g. connect a serial line to a PC running a terminal emulator
(HyperTerminal on Windows, or Kermit on Linux) with a very large
scrollback buffer.
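
If the PC on the other end of the serial line can run Python 3, the console output can also be written to a timestamped file instead of relying on a scrollback buffer. A rough sketch, assuming the serial port has already been configured by other means (for example with stty); log_stream is a hypothetical helper, not part of any tool mentioned here.

```python
import time


def log_stream(src, dst, clock=time.time):
    """Copy lines from a binary stream to a text stream, timestamping each one.

    Returns the number of lines copied once the source reaches end of file.
    """
    count = 0
    for raw in src:
        # Console output may contain stray bytes; replace them rather than fail.
        line = raw.decode("ascii", errors="replace").rstrip("\r\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        dst.write("%s  %s\n" % (stamp, line))
        dst.flush()  # keep the file current so the last lines are not lost
        count += 1
    return count
```

On a Linux PC this might be called as log_stream(open("/dev/ttyS0", "rb"), open("console.log", "a")), where both the device name and the log file name are assumptions.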

Also some hardware testing utilities would be nice.

We opened a call, but without any messages SUN was not able to solve the
problem. During the last two days the system crashed two times.
But now the system does an automatic reboot.

Nice, eh, paid support :-)

(*) A nice tip I learned on this list a few days ago, by Eric Siegerman:
http://marc.theaimsgroup.com/?l=amanda-users&m=109959188008684&w=2
--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  Are you sure?  ...   YES   ...   Phew ...   I'm out  *
***



Re: Maybe OT but a big problem with SOLARIS

2004-11-23 Thread Martin Hepworth
Michael,
There are lots of diagnostics you can run on a SUN. It may well be bad RAM or
some other hardware issue. SUN should be able to talk you through all this.

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300


Re: Maybe OT but a big problem with SOLARIS

2004-11-23 Thread Eric Siegerman
On Tue, Nov 23, 2004 at 09:37:15AM +0100, Paul Bijnens wrote:
 Maybe the [Sun box's] console contains a useful error message [...]
 E.g. connect a serial line to a PC with a terminal emulation (Hyperterm 
 on Windows, or Kermit on Linux) having a very large screen history
 buffer.

I don't know your box, only our much more ancient Suns.  On
those, here's how it works:
  - Power off the system

  - Connect the terminal to the *first* serial port (on our
    boxes, it's labelled A)

  - Set the terminal to 9600-8-N-1

  - *Disconnect* the Sun keyboard

This has to be done with the power off.  Besides the usual
concern about frying the hardware, it's only at power-up time,
when the POST detects the missing keyboard, that the boot ROM
switches to serial-console mode.

Maybe you already know this stuff; if so, consider this post as
being for the benefit of the archives :-)  

--

|  | /\
|-_|/ Eric Siegerman, Toronto, Ont.        [EMAIL PROTECTED]
|  |  /
The animal that coils in a circle is the serpent; that's why so
many cults and myths of the serpent exist, because it's hard to
represent the return of the sun by the coiling of a hippopotamus.
- Umberto Eco, Foucault's Pendulum


RE: Maybe OT but a big problem with SOLARIS

2004-11-23 Thread Rebecca Pakish Crum
What version of Solaris are you running? Have you thought about
installing the Solaris BSM (Basic Security Module) to monitor exactly
what is happening? There's a great article in the Solaris Administration
magazine on the BSM with some scripts on how to use it effectively.


