Re: Maybe OT but a big problem with SOLARIS
Hey "amanda-users"!

First: thanks to all for your tips and hints.

Second: the story goes on ... SUN updated some packages of the OS (Solaris 8), updated the OBP, ran some stress tests and so on, all without success. During the last week we had the console output logged to a PuTTY session, and what shall I say: on the console the machine reported some CPU problems. SUN replaced the complete CPU board and the affected SCSI controller to which the tape devices are attached. Without success.

Today SUN will replace the centerplane and we'll see ;-) I'll keep you informed.

Greets from Stuttgart
Michael

Michael Schaller wrote:
> Hey!
>
> Maybe this thread is a little OT, maybe it isn't.
>
> For the past few days we've had big problems with one of our servers.
> It's a SUN FIRE V480R with an external StorEdge and an external changer
> (Overland LoaderXpress with a single LTO-1 drive from HP) attached.
>
> As I wrote a few weeks ago, the system ran fine with AMANDA and the
> changer. Only on the first night after configuring AMANDA with the
> changer, the automatic backup started and the complete system
> crashed!! Solaris didn't give any messages in /var/adm/messages ...
> The system was frozen; the only way to get the system back to life
> was a "poweroff". After that the system was really fine for a few weeks.
>
> Last week the same shit happened. The complete system was frozen.
>
> We opened a call, but without any messages Sun was not able to solve
> the problem. During the last two days the system crashed two times.
> But now the system does an automatic reboot.
>
> Anybody ANY hints??
>
> I'm going crazy ...
>
> Thanks in advance
> Michael
RE: Maybe OT but a big problem with SOLARIS
What version of Solaris are you running? Have you thought about installing the Solaris BSM to monitor what exactly is happening? There's a great article in the Solaris Administration magazine on the BSM with some scripts on how to use it effectively.

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Schaller
> Sent: Tuesday, November 23, 2004 1:08 AM
> To: [EMAIL PROTECTED]
> Subject: Maybe OT but a big problem with SOLARIS
>
> [...]
Re: Maybe OT but a big problem with SOLARIS
On Tue, Nov 23, 2004 at 09:37:15AM +0100, Paul Bijnens wrote:
> Maybe the [Sun box's] console contains a useful error message [...]
> E.g. connect a serial line to a PC with a terminal emulation (Hyperterm
> on Windows, or Kermit on Linux) having a very large screen history
> buffer.

I don't know your box, only our much more ancient Suns. On those, here's how it works:
  - Power off the system
  - Connect the terminal to the *first* serial port (on our boxes, it's labelled "A")
  - 9600-8-N-1
  - *Disconnect* the Sun keyboard

This has to be done with the power off. Besides the usual concern about frying the hardware, it's only at power-up time, when the POST detects the missing keyboard, that the boot ROM switches to serial-console mode.

Maybe you already know this stuff; if so, consider this post as being for the benefit of the archives :-)

--

|  | /\
|-_|/  >   Eric Siegerman, Toronto, Ont.        [EMAIL PROTECTED]
|  |  /
The animal that coils in a circle is the serpent; that's why so
many cults and myths of the serpent exist, because it's hard to
represent the return of the sun by the coiling of a hippopotamus.
        - Umberto Eco, "Foucault's Pendulum"
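For the archives as well: once the box is talking on its serial port, the PC end can write the console output to a file with timestamps rather than relying on a terminal emulator's scrollback. What follows is only a rough sketch in Python, not something from the posts above. It assumes a Unix-like PC; the device name /dev/ttyS0 and the function names open_console and log_console are placeholders chosen for illustration.

```python
import os
import termios
import time


def open_console(device="/dev/ttyS0"):
    """Open a serial device read-only and set it to 9600 baud, 8-N-1, raw."""
    # O_NONBLOCK keeps open() from waiting for a carrier signal.
    fd = os.open(device, os.O_RDONLY | os.O_NOCTTY | os.O_NONBLOCK)
    attrs = termios.tcgetattr(fd)
    # attrs is [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[0] = 0  # no input translation
    attrs[1] = 0  # no output processing
    # 8 data bits, no parity, 1 stop bit; ignore modem control lines
    attrs[2] = termios.CS8 | termios.CREAD | termios.CLOCAL
    attrs[3] = 0  # no echo, no line editing
    attrs[4] = attrs[5] = termios.B9600
    attrs[6][termios.VMIN] = 1   # a read returns once one byte is available
    attrs[6][termios.VTIME] = 0
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    os.set_blocking(fd, True)    # back to blocking reads for the logging loop
    return fd


def log_console(fd, logfile):
    """Copy bytes from fd to logfile, one timestamped line at a time, until EOF."""
    pending = b""
    with open(logfile, "ab") as out:
        while True:
            chunk = os.read(fd, 1024)
            if not chunk:
                break
            # Everything before the last newline is complete; keep the rest.
            *lines, pending = (pending + chunk).split(b"\n")
            for line in lines:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S ").encode()
                out.write(stamp + line.rstrip(b"\r") + b"\n")
            out.flush()
            os.fsync(out.fileno())  # get the lines onto disk right away
        if pending:
            out.write(pending + b"\n")  # keep a trailing partial line
```

Something like log_console(open_console(), "console.log") would then run until interrupted. The timestamps are taken on the PC, so they show when a line arrived, not when the Sun produced it.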
Re: Maybe OT but a big problem with SOLARIS
Michael

There's lots of diagnostics you can do with a SUN. It may well be bad RAM or another hardware issue. SUN should be able to talk you through all this.

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300

Michael Schaller wrote:
> [...]
Re: Maybe OT but a big problem with SOLARIS
Michael Schaller wrote:
> As I wrote a few weeks ago, the system ran fine with AMANDA and the
> changer. Only on the first night after configuring AMANDA with the
> changer, the automatic backup started and the complete system
> crashed!! Solaris didn't give any messages in /var/adm/messages ...
> The system was frozen; the only way to get the system back to life
> was a "poweroff". After that the system was really fine for a few weeks.
>
> Last week the same shit happened. The complete system was frozen.

The trick is to find out what is happening shortly before the crash. I guess amanda just happens to stress the machine enough to tickle the problem. Amanda uses IO (network, disk, tape), CPU and RAM, and loads the bus (anything else in a computer?).

Run under /bin/script (*) a loop which gathers all interesting information, like "ps -efl", "netstat -ni", vmstat, iostat, df, maybe even "dmesg|tail", all intermingled with "date" to get some timestamps, and hope the resulting file contains some hints of what is happening just before the crash.

Maybe the console contains a useful error message which did not make it into /var/adm/messages. Make sure the console does not go into power-save mode or lose the info on the screen by rebooting. E.g. connect a serial line to a PC with a terminal emulation (Hyperterm on Windows, or Kermit on Linux) having a very large screen history buffer.

Also some hardware testing utilities would be nice.

> We opened a call, but without any messages Sun was not able to solve
> the problem. During the last two days the system crashed two times.
> But now the system does an automatic reboot.

Nice, eh, paid support :-)

(*) A nice tip I learned on this list a few days ago, by Eric Siegerman:
http://marc.theaimsgroup.com/?l=amanda-users&m=109959188008684&w=2

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  [EMAIL PROTECTED]
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$,  shutdown,  *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************
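
The timestamped gathering loop described in this message could look roughly like the sketch below. It is written in Python rather than as a shell loop under script, which is a swap made here and not Paul's method. The log file name, the 60-second interval and the function names snapshot and watch are all assumptions for illustration. The command list follows the post but leaves out "dmesg|tail", and the exact flags may need adjusting per system.

```python
import os
import subprocess
import time

# Commands named in the post; flags and availability vary between systems.
COMMANDS = [
    ["ps", "-efl"],
    ["netstat", "-ni"],
    ["vmstat"],
    ["iostat"],
    ["df"],
]


def snapshot(commands):
    """Run each command once and return its output under a timestamp header."""
    parts = []
    for cmd in commands:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        parts.append(f"=== {stamp} {' '.join(cmd)}\n")
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, errors="replace", timeout=30
            )
            parts.append(result.stdout + result.stderr)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hung command should not stop the loop.
            parts.append(f"(failed: {exc})\n")
    return "".join(parts)


def watch(logfile="crashwatch.log", interval=60):
    """Append a snapshot every `interval` seconds until interrupted."""
    with open(logfile, "a") as out:
        while True:
            out.write(snapshot(COMMANDS))
            out.flush()
            os.fsync(out.fileno())  # push the data to disk before a possible freeze
            time.sleep(interval)
```

Calling watch() keeps appending until it is interrupted. The fsync after each round is there because a frozen machine never flushes its buffers, so anything still in memory at the time of the hang would be lost. A log on a local disk may still miss the final seconds; writing it to another host is one way around that.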