Re: random system hang
I also forgot to mention that xscreensaver seems better behaved. The OpenGL screensavers no longer "bleed" onto the screen in preview mode (from KDE), and most of them seem to work when quickly scrolling through them.

Ron

On Tue, June 21, 2005 17:23, Ron Farrer said:
> Based on some off-list discussion I decided to try recompiling the nvidia
> kernel driver with gcc 3.4.4 (instead of 3.3.6) and give a kernel boot
> option of "acpi=off". The results so far show much better stability in X
> and the kernel no longer complains about losing ticks. I'll keep an eye
> on it but ut2004 (amd64 binary) no longer randomly crashes - something I
> did not notice before because I didn't play the game long enough on this
> machine (hence why I stated it was fine in a previous post). However, it
> hasn't been that long (day and a half) and only time will tell if either
> change fixed anything...

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: random system hang
On Tue, June 21, 2005 2:40, Thomas Steffen said:
>
> It might just be the X server going crazy. Since it runs as root, it
> can take the whole system down if it crashes. You may try to compile
> "Magic keys" into the kernel, so that you can kill the server using
> the keyboard.

That's good advice, although with only one exception I have been able to ssh in and restart X or reboot the machine.

> But what I would really recommend is to try X.org 6.8 from ubuntu. It
> has solved many many problems for me (ATI card), and my machine runs
> fine now.

I'd like to try x.org, but I think I'll wait until it's in sid.

> Hm... I had some issues with my SATA drive. The connection was not
> reliable, so that it would "unplug" itself. A new cable fixed that.
> Is /var/log/messages on the SATA drive?

This is possible, but I have done a LOT of heavy I/O to/from the disk without even a hiccup. I even (recently) copied 200GB from an ATA/100 drive to a SATA drive without issue.

Based on some off-list discussion I decided to try recompiling the nvidia kernel driver with gcc 3.4.4 (instead of 3.3.6) and give a kernel boot option of "acpi=off". The results so far show much better stability in X and the kernel no longer complains about losing ticks. I'll keep an eye on it but ut2004 (amd64 binary) no longer randomly crashes - something I did not notice before because I didn't play the game long enough on this machine (hence why I stated it was fine in a previous post). However, it hasn't been that long (day and a half) and only time will tell if either change fixed anything...

Ron
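For anyone repeating the "acpi=off" experiment, it may help to confirm that the option actually reached the running kernel before drawing conclusions. The following is only a minimal sketch in Python: it assumes a Linux system that exposes /proc/cmdline, and the function names parse_cmdline and acpi_disabled are made up for this illustration. It splits on whitespace only, so quoted parameter values containing spaces are not handled.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {parameter: value or None}.

    Simplification (assumption): tokens are separated by whitespace and
    quoted values with embedded spaces are not supported.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def acpi_disabled(cmdline: str) -> bool:
    """Return True if the command line carries acpi=off."""
    return parse_cmdline(cmdline).get("acpi") == "off"


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        print("acpi=off active:", acpi_disabled(cmdline_path.read_text()))
    else:
        print("no /proc/cmdline on this system")
```

Since two things were changed at once here (compiler version and ACPI), a check like this only confirms one of them; it says nothing about which change, if either, improved stability.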
Re: random system hang
On Tue, June 21, 2005 0:09, Olleg Samoylov said:
>
> I have Tyan S2875 with one Opteron 240, DDR400. I can give some advice.
> 1. I had problems with SATA drive. Update BIOS to newest. BIOS for S2875
> is very buggy, especially for SATA (maybe your swap is on SATA?).

I am running the latest BIOS (3.00) because it is required to get this board to POST with Opteron 252 processors (found that out the hard way).

> 2. I had hangs with DDR400. memtest86+ showed a buggy DIMM. Try this useful
> test.

I left memtest running overnight (I know, not that long...) and there were no problems.

> 3. I very often have hangs with mplayer. But mplayer is not in debian,
> thus I can't comment on this.

I can't comment here as I don't use mplayer. I do use xine and it runs perfectly from a chroot (for w32 codecs). XMMS runs fine natively.

Ron
Re: random system hang
I'm using a Tyan Thunder K8W S2885 mobo, but have no SATA drives attached.

Hmmm, most of the time the machine hangs when I'm away, so the screensaver is running, though it's only using the blank screen. Sometimes, however, it has hung while I'm in the middle of using the system.

I moved to the 2.6.12 kernel yesterday. Let's see if that makes any difference. Haven't tried xorg.

Charles.

On Tue, 2005-06-21 at 15:28 +0100, Manuel Capinha wrote:
> Yet another "Me too" message.
>
> I've seen this happen a lot on a Tyan mobo with 2 Opterons (can't
> recall the exact model but I could look it up). For us, disabling
> xscreensaver solved it. I can't pin it on one specific
> screensaver (we even stress tested a lot of them) but _almost_
> every time it crashed it was running xscreensaver. The times it crashed
> without xscreensaver, I believe the screensaver could have been just
> starting.
>
> Our first guess was the amd64 java package from sun, but after
> removing all the java apps from there it kept crashing. It kept
> crashing when we started using the x86 java so... we then removed the
> screensaver and it has never crashed since. We're using the DPMS
> stuff to turn off the monitor and nothing else.
>
> Since you're all seeing problems in X maybe this is related and YMMV :)
>
> Cheers,
> Manuel
>
> On 6/21/05, Thomas Steffen <[EMAIL PROTECTED]> wrote:
> > On 6/18/05, Ron Farrer <[EMAIL PROTECTED]> wrote:
> > > I guess I didn't "knock on wood" quick enough or something. After I sent
> > > my reply I left for lunch and upon returning (1 hour) the thing was locked
> > > up!
> > >
> > > I went over to another machine, logged in via ssh. I looked at the CPU
> > > usage and X was pegging one of the CPUs out to 100% usage.
> >
> > It might just be the X server going crazy. Since it runs as root, it
> > can take the whole system down if it crashes. You may try to compile
> > "Magic keys" into the kernel, so that you can kill the server using
> > the keyboard.
> >
> > But what I would really recommend is to try X.org 6.8 from ubuntu. It
> > has solved many many problems for me (ATI card), and my machine runs
> > fine now.
> >
> > > So I tried to run "less
> > > /var/log/messages" and it hung.
> >
> > Hm... I had some issues with my SATA drive. The connection was not
> > reliable, so that it would "unplug" itself. A new cable fixed that.
> > Is /var/log/messages on the SATA drive?
> >
> > Thomas
Re: random system hang
Yet another "Me too" message.

I've seen this happen a lot on a Tyan mobo with 2 Opterons (can't recall the exact model but I could look it up). For us, disabling xscreensaver solved it. I can't pin it on one specific screensaver (we even stress tested a lot of them) but _almost_ every time it crashed it was running xscreensaver. The times it crashed without xscreensaver, I believe the screensaver could have been just starting.

Our first guess was the amd64 java package from sun, but after removing all the java apps from there it kept crashing. It kept crashing when we started using the x86 java so... we then removed the screensaver and it has never crashed since. We're using the DPMS stuff to turn off the monitor and nothing else.

Since you're all seeing problems in X maybe this is related and YMMV :)

Cheers,
Manuel

On 6/21/05, Thomas Steffen <[EMAIL PROTECTED]> wrote:
> On 6/18/05, Ron Farrer <[EMAIL PROTECTED]> wrote:
> > I guess I didn't "knock on wood" quick enough or something. After I sent
> > my reply I left for lunch and upon returning (1 hour) the thing was locked
> > up!
> >
> > I went over to another machine, logged in via ssh. I looked at the CPU
> > usage and X was pegging one of the CPUs out to 100% usage.
>
> It might just be the X server going crazy. Since it runs as root, it
> can take the whole system down if it crashes. You may try to compile
> "Magic keys" into the kernel, so that you can kill the server using
> the keyboard.
>
> But what I would really recommend is to try X.org 6.8 from ubuntu. It
> has solved many many problems for me (ATI card), and my machine runs
> fine now.
>
> > So I tried to run "less
> > /var/log/messages" and it hung.
>
> Hm... I had some issues with my SATA drive. The connection was not
> reliable, so that it would "unplug" itself. A new cable fixed that.
> Is /var/log/messages on the SATA drive?
>
> Thomas
Re: random system hang
On 6/18/05, Ron Farrer <[EMAIL PROTECTED]> wrote:
> I guess I didn't "knock on wood" quick enough or something. After I sent
> my reply I left for lunch and upon returning (1 hour) the thing was locked
> up!
>
> I went over to another machine, logged in via ssh. I looked at the CPU
> usage and X was pegging one of the CPUs out to 100% usage.

It might just be the X server going crazy. Since it runs as root, it can take the whole system down if it crashes. You may try to compile "Magic keys" into the kernel, so that you can kill the server using the keyboard.

But what I would really recommend is to try X.org 6.8 from ubuntu. It has solved many many problems for me (ATI card), and my machine runs fine now.

> So I tried to run "less
> /var/log/messages" and it hung.

Hm... I had some issues with my SATA drive. The connection was not reliable, so that it would "unplug" itself. A new cable fixed that. Is /var/log/messages on the SATA drive?

Thomas
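The "Magic keys" suggestion above refers to the kernel's Magic SysRq support. Once it is compiled in, the kernel can still restrict which key functions are allowed at runtime, so it may be worth checking before relying on it during a hang. Below is a rough Python sketch, assuming a kernel that exposes /proc/sys/kernel/sysrq and uses the bitmask meanings from the current kernel documentation; older kernels such as the 2.6.11 mentioned in this thread may treat the value as a plain on/off switch. The names SYSRQ_FLAGS and describe_sysrq are invented for this example.

```python
from pathlib import Path

# Bit meanings as listed in the kernel's sysrq documentation (assumed to
# apply to the kernel being inspected; very old kernels may only know 0/1).
SYSRQ_FLAGS = {
    2: "console log level control",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps",
    16: "sync",
    32: "remount read-only",
    64: "signalling processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of real-time tasks",
}


def describe_sysrq(value: int) -> list:
    """Return the SysRq function groups enabled by a sysrq setting."""
    if value == 0:
        return []  # SysRq disabled completely
    if value == 1:
        return list(SYSRQ_FLAGS.values())  # everything enabled
    return [name for bit, name in SYSRQ_FLAGS.items() if value & bit]


if __name__ == "__main__":
    sysrq_path = Path("/proc/sys/kernel/sysrq")
    if sysrq_path.exists():
        setting = int(sysrq_path.read_text().strip())
        print(f"sysrq={setting}:", describe_sysrq(setting) or "disabled")
    else:
        print("Magic SysRq does not appear to be compiled into this kernel")
```

Killing everything on the current console from the keyboard falls under the "keyboard control" group, so that entry is the one to look for if the goal is to recover from a wedged X server.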
Re: random system hang
Ron Farrer wrote:
> Hardware list (to look for any possible common connections):
> 2 x Opteron 252
> Tyan S2875
> 2GB DDR400 (2 x 1GB)
> SATA (one seagate drive)
> IDE (one sony DVDRW)
> EVGA Nvidia Geforce 6800 Ultra 256MB (AGP 8x, FW, and SBA enabled)
> no PCI devices installed

I have Tyan S2875 with one Opteron 240, DDR400. I can give some advice.

1. I had problems with SATA drive. Update BIOS to newest. BIOS for S2875 is very buggy, especially for SATA (maybe your swap is on SATA?).
2. I had hangs with DDR400. memtest86+ showed a buggy DIMM. Try this useful test.
3. I very often have hangs with mplayer. But mplayer is not in debian, thus I can't comment on this.

--
Olleg Samoylov
Re: random system hang
On Fri, June 17, 2005 13:09, Ron Farrer said:
> On one of the lockups I started killing off processes and finally
> determined that X was not stopping when given the HUP and SEGV signals.
> Running (I use KDE) "/etc/init.d/kdm stop" would end in an error about the
> xserver not responding. Luckily (for me) X would stop with a KILL signal
> (-9) and I was able to restart X with "/etc/init.d/kdm restart" which
> would then return the local console and the frozen screen to normal and
> the machine would operate normally from that point on.

I guess I didn't "knock on wood" quick enough or something. After I sent my reply I left for lunch and upon returning (1 hour) the thing was locked up!

I went over to another machine and logged in via ssh. I looked at the CPU usage and X was pegging one of the CPUs out to 100% usage. xscreensaver appeared to have died (it was not in the output of "ps aux") but it was clearly the last thing on the frozen screen. I noted the system load was 2.95 (the machine was idle when I left it). I killed X (with -9) and restarted kdm.

I opened (still on ssh) a file in vim (a text file that I had been editing before I left for lunch) and vim hung. So I opened another ssh session and ran "ps aux" and it hung before finishing the list (got probably 3/4 of the way through). So I tried to run "less /var/log/messages" and it hung. OK, so I opened yet another ssh session and tried to run "top" and the whole system stopped responding (first time ever). At this point I could no longer ssh in. Left with no other choice I pressed "reset" and the box came back up fine.

Just a bit of FYI if anyone has an idea about the cause. BTW I'm running Debian kernel 2.6.11-9-amd64-k8-smp.

Ron
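Since the logs show nothing after these hangs, one low-cost option might be to leave a small "heartbeat" running that records the load average at a fixed interval and forces each entry to disk. After a reset, the last entry gives an approximate time of the hang and shows whether the load was climbing beforehand. This is only a sketch in Python under stated assumptions: the function names and the default 60 second interval are made up for illustration, and nothing like it was used in this thread.

```python
import os
import time


def format_sample(timestamp: float, load: tuple) -> str:
    """Format one log line from a Unix timestamp and a (1, 5, 15 min) load tuple."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    one, five, fifteen = load
    return f"{stamp} load={one:.2f} {five:.2f} {fifteen:.2f}\n"


def record_sample(log_path: str) -> str:
    """Append the current load average to log_path and force it to disk."""
    line = format_sample(time.time(), os.getloadavg())
    with open(log_path, "a") as log:
        log.write(line)
        log.flush()
        # fsync so the entry survives a hard reset shortly afterwards.
        os.fsync(log.fileno())
    return line


def monitor(log_path: str, interval: float = 60.0, samples: int = 0) -> None:
    """Record samples forever (samples=0) or a fixed number of times."""
    taken = 0
    while samples == 0 or taken < samples:
        record_sample(log_path)
        taken += 1
        if samples == 0 or taken < samples:
            time.sleep(interval)
```

It would be started with something like monitor("/var/log/hang-trail.log"), where the path is purely hypothetical. If the hang is caused by the disk itself becoming unreachable, as suggested elsewhere in the thread, the heartbeat would block on the write, and a log that simply stops would be consistent with that.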
Re: random system hang
On Thu, June 16, 2005 15:56, Charles Leggett said:
>
> My dual opteron system hangs at random intervals. Sometimes it's stable
> for a week, sometimes it hangs after just a few hours. The symptoms
> are always the same - NONE. Carefully scanning the system logs shows
> absolutely nothing occurred to cause a hang. No kernel oopses, no error
> messages. It's been this way ever since I installed debian 6 months ago -
> before that I was running CentOS, and it never died then, so I'm pretty
> sure it's not a hardware problem.

I don't really have a solution; this is more of a "me too" reply.

I have also been seeing seemingly random lockups on a dual Opteron system. The screen is frozen and there is no keyboard response. Once it has locked up, however, I have been able to log in via ssh and do some looking around. Most commands take about 1-2 minutes to complete. Just typing "vim somefile" can take as long as 2 minutes before it completes. Logging in via ssh sometimes takes 30 seconds or more.

On one of the lockups I started killing off processes and finally determined that X was not stopping when given the HUP and SEGV signals. Running (I use KDE) "/etc/init.d/kdm stop" would end in an error about the xserver not responding. Luckily (for me) X would stop with a KILL signal (-9) and I was able to restart X with "/etc/init.d/kdm restart" which would then return the local console and the frozen screen to normal and the machine would operate normally from that point on.

There doesn't seem to be any visible connection between the lockups outside X and friends. It can be anywhere from a few hours (rare) to a week. So far most of the lockups were while I wasn't even in the office - one was in the middle of the night and another was during the day when I was away.

I have been having weird behavior from xscreensaver which could be the root of the problem: it doesn't want to start sometimes, some screensavers (especially opengl ones) will bleed onto the screen in preview mode (forcing me to restart X to regain control), and sometimes (rare) it doesn't want to stop on keyboard or mouse activity. I have not yet tried running without xscreensaver and if the lockups continue I may try stopping it. I configured xscreensaver to use the "slide show" screensaver and I've only seen one lockup so far (knock on wood). The machine has never locked up while in use, only when idle.

This system has been otherwise rock solid. It runs everything extremely well including games like ut2004 (amd64) and doom3 (i386 chroot). When a lockup happens there is nothing in any of the logs. I am running sid and keep it up-to-date. This machine has been heavily tested with a huge range of tasks (games, compiling, benchmarks, etc.) and none have shown signs of any problems. I like to compile large packages (for comparison to other machines and to look for any signs of instability). After compiling xserver-xfree86 (33 minutes) there were no errors or unusual behavior (although I did not actually try using the compiled packages). Other compiled packages have built and run fine (although none are as large as X) so I'm leaning towards a problem with X or xscreensaver.

Hardware list (to look for any possible common connections):
2 x Opteron 252
Tyan S2875
2GB DDR400 (2 x 1GB)
SATA (one seagate drive)
IDE (one sony DVDRW)
EVGA Nvidia Geforce 6800 Ultra 256MB (AGP 8x, FW, and SBA enabled)
no PCI devices installed

Right now the machine has been up 7 days without a lockup and I'll continue to track it - but narrowing the problem down is difficult when the lockups only happen about a week or two apart...

Regards,
Ron
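The signal escalation described in this message (a polite signal first, KILL only as a last resort) could be scripted roughly as follows. This is a minimal Python sketch with several assumptions: it is Linux-only because it reads /proc/<pid>/stat, the function names are invented for the example, and the default sequence uses SIGTERM as the middle step instead of the SEGV mentioned above, since TERM is the more conventional choice. It would need to run as root to signal an X server.

```python
import os
import signal
import time

# Assumed default sequence: SIGTERM is swapped in for the SEGV used in the thread.
ESCALATION = (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL)


def is_running(pid: int) -> bool:
    """Return True if pid exists and is not a zombie (Linux /proc only)."""
    try:
        with open(f"/proc/{pid}/stat") as stat_file:
            stat = stat_file.read()
    except (FileNotFoundError, ProcessLookupError):
        return False
    # Format is "pid (comm) state ..."; comm may contain spaces or ")".
    state = stat.rsplit(")", 1)[1].split()[0]
    return state != "Z"


def stop_with_escalation(pid: int, signals=ESCALATION, grace: float = 5.0):
    """Send each signal in turn, waiting up to `grace` seconds after each.

    Returns the signal after which the process went away, or None if it was
    already gone. Raises TimeoutError if it survives every signal.
    """
    last_sent = None
    for sig in signals:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return last_sent
        last_sent = sig
        deadline = time.monotonic() + grace
        while time.monotonic() < deadline:
            if not is_running(pid):
                return sig
            time.sleep(0.1)
    raise TimeoutError(f"process {pid} survived {len(signals)} signals")
```

A process stuck in uninterruptible sleep will not go away even on KILL, so a TimeoutError here would point at something below X, such as a driver or disk problem, rather than at X itself.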