On Tue, 2012-03-13 at 17:04 +0000, Darac Marjal wrote:
> On Mon, Mar 05, 2012 at 01:32:17AM -0500, KS wrote:
> > On Mon, Mar 5, 2012, at 12:51 AM, KS wrote:
> > > Hi all,
> > >
> > > The last few days I ahve noticed that when I return to my machine
> > > (always ON), the screen doesn't respond. Keyboard (caps lock, num lock)
> > > works. I can also ssh to the machine and have noticed that Xorg takes
> > > 100% CPU. I couldn't find anything in the Xorg log or syslog files.
> > >
> > > Today however, the screen stopped responding after a beep while I was
> > > using the machine. Below is what I found on sys log:
> > >
> > > Mar 5 00:32:28 gurh kernel: [17901.730462] NVRM: GPU at 0000:01:00.0
> > > has fallen off the bus.
>
> This doesn't sound particularly good. It would suggest to me that your
> graphics card (the GPU) is no longer attached to the PCI bus. Probably
> the best case scenario is that this is a physical problem: Open up your
> computer, pull out the card and push it back in, making sure it's fully
> seated.
>
> If the problem persists, then it may be that the card is locking up
> completely such that the PCI bus THINKS you've pulled it out. You may
> find monitoring the output of "nvclock -T" useful.
>
> >
> > Syslog gave the warning again as above!
> >
> > So it this just a kernel issue?
> >
> > Thanks,
> > KS
> >

Hi Darac,

I don't think this is a hardware issue: I have been seeing it for some
time now on two different machines. All I can find is the following:

root@laptop:~# head -20 /var/log/syslog
May 31 22:28:59 laptop syslog-ng[1860]: Configuration reload request received, reloading configuration;
May 31 22:28:59 laptop syslog-ng[1860]: EOF on control channel, closing connection;
May 31 22:29:00 laptop anacron[11394]: Job `cron.daily' terminated
May 31 22:29:00 laptop anacron[11394]: Normal exit (1 job run)
May 31 22:49:00 laptop -- MARK --
May 31 23:05:40 laptop kernel: [32915.745040] sdc: detected capacity change from 8019509248 to 0
May 31 23:05:52 laptop kernel: [32927.622139] usb 2-1: USB disconnect, device number 8
May 31 23:08:11 laptop kernel: [33066.384097] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
May 31 23:08:11 laptop kernel: [33066.384102] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
May 31 23:08:11 laptop kernel: [33066.384120] NVRM: os_pci_init_handle: invalid context!
May 31 23:08:11 laptop kernel: [33066.384124] NVRM: os_pci_init_handle: invalid context!
May 31 23:08:11 laptop kernel: [33066.384176] NVRM: os_pci_init_handle: invalid context!
May 31 23:08:11 laptop kernel: [33066.384179] NVRM: os_pci_init_handle: invalid context!
May 31 23:13:06 laptop kernel: [ 0.000000] Initializing cgroup subsys cpuset
May 31 23:13:06 laptop kernel: [ 0.000000] Initializing cgroup subsys cpu
May 31 23:13:06 laptop kernel: [ 0.000000] Linux version 3.2.0-2-686-pae (Debian 3.2.18-1) (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-5) ) #1 SMP Mon May 21 18:24:12 UTC 2012
May 31 23:13:06 laptop kernel: [ 0.000000] BIOS-provided physical RAM map:
May 31 23:13:06 laptop kernel: [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
May 31 23:13:06 laptop kernel: [ 0.000000] BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
May 31 23:13:06 laptop kernel: [ 0.000000] BIOS-e820: 0000000000100000 - 00000000bfe5a800 (usable)

Then, on the Xorg side, I have this:

[ 30399.257] (II) config/udev: Adding input device ELECOM ELECOM USB mouse with wheel (/dev/input/mouse2)
[ 30399.257] (II) No input driver specified, ignoring this device.
[ 30399.257] (II) This device may have been added with another device file.
[ 33119.907] [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
[ 33119.907]
[ 33119.907] Backtrace:
[ 33120.497] 0: /usr/bin/Xorg (xorg_backtrace+0x49) [0xb7778099]
[ 33120.497] 1: /usr/bin/Xorg (mieqEnqueue+0x22b) [0xb77569ab]
[ 33120.497] 2: /usr/bin/Xorg (0xb75fb000+0x51265) [0xb764c265]
[ 33120.497] 3: /usr/bin/Xorg (xf86PostMotionEventM+0xf9) [0xb7686119]
[ 33120.497] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0xb4255000+0x35ad) [0xb42585ad]
[ 33120.497] 5: /usr/lib/xorg/modules/input/evdev_drv.so (0xb4255000+0x4a2c) [0xb4259a2c]
[ 33120.497] 6: /usr/bin/Xorg (0xb75fb000+0x7a8e1) [0xb76758e1]
[ 33120.497] 7: /usr/bin/Xorg (0xb75fb000+0xa050a) [0xb769b50a]
[ 33120.497] 8: (vdso) (__kernel_sigreturn+0x0) [0xb75dd400]
[ 33120.497] 9: (vdso) (__kernel_vsyscall+0x10) [0xb75dd424]
[ 33120.497] 10: /lib/i386-linux-gnu/i686/cmov/libc.so.6 (__gettimeofday+0x16) [0xb7309916]
[ 33120.497] 11: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0xb486f000+0x62e0d) [0xb48d1e0d]
[ 33120.497]
[ 33120.497] [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
[ 33120.497] [mi] mieq is *NOT* the cause. It is a victim.
[ 33120.983] (WW) NVIDIA(0): WAIT (0, 7, 0x8000, 0x00009354, 0x00009354)
[ 33120.983] [mi] Increasing EQ size to 512 to prevent dropped events.
[ 33120.983] [mi] EQ processing has resumed after 31 dropped events.
[ 33120.983] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.
[ 33123.984] (WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x000098ec, 0x000098ec)
[ 33126.986] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000a1bc)
[ 33133.986] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000a1bc)
[ 33136.987] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000a214)
[ 33143.987] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000a214)
[ 33146.988] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x00009b7c, 0x0000b190)
[ 33153.988] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x00009b7c, 0x0000b190)
[ 33156.989] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000b1e8)
[ 33163.989] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000b1e8)
[ 33166.993] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x00009b7c, 0x0000b780)
[ 33173.993] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x00009b7c, 0x0000b780)
[ 33176.997] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000c050)
[ 33183.997] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000c050)
[ 33186.999] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x00009b7c, 0x0000cfcc)
[ 33193.999] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x00009b7c, 0x0000cfcc)
[ 33197.000] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d024)
[ 33204.000] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d024)
[ 33207.005] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d8f4)
[ 33214.005] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d8f4)
[ 33220.157] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d94c)
[ 33227.157] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d94c)
[ 33230.158] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d980)
[ 33237.158] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d980)
[ 33240.159] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d9b4)
[ 33247.159] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d9b4)
[ 33250.160] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d9e8)
[ 33257.160] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d9e8)
[ 33260.161] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000da1c)
[ 33267.161] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000da1c)
[ 33270.162] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000da50)
[ 33277.162] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000da50)
[ 33280.163] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000da84)
[ 33287.163] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000da84)
[ 33290.164] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000dab8)
[ 33297.164] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000dab8)

I'm sure this is either a kernel or an Xorg issue, as a hardware failure
could not happen on multiple workstations at the same time.

root@laptop:~# lspci
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:01.0 PCI bridge: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port (rev 0c)
00:1a.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HM (ICH8M) LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: NVIDIA Corporation G86M [Quadro NVS 135M] (rev a1)
03:01.0 CardBus bridge: O2 Micro, Inc. Cardbus bridge (rev 21)
03:01.4 FireWire (IEEE 1394): O2 Micro, Inc. Firewire (IEEE 1394) (rev 02)
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5755M Gigabit Ethernet PCI Express (rev 02)
0c:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)

I've also seen this happen more and more often when the system is under
load, and especially when gnome-shell runs its special effects after I
hit the "Activities" button.

PS: Please keep me in CC as I'm not subscribed to this list.

Cheers,
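
For anyone who wants to check their own syslog for the same pattern, here is a minimal Python sketch. It assumes the kernel lines look like the "NVRM: GPU at ... has fallen off the bus" entries quoted above; the function name find_gpu_drops and the regular expression are only illustrative, not part of any tool.

```python
import re

# Assumed line shape, taken from the syslog excerpt above, e.g.:
#   May 31 23:08:11 laptop kernel: [33066.384097] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+"
    r"NVRM: GPU at (?P<pci>[0-9a-fA-F:.]+) has fallen off the bus"
)


def find_gpu_drops(lines):
    """Return (timestamp, host, uptime_seconds, pci_address) for each matching line."""
    events = []
    for line in lines:
        match = PATTERN.match(line)
        if match is None:
            continue  # not a "fallen off the bus" message
        events.append(
            (match["stamp"], match["host"], float(match["uptime"]), match["pci"])
        )
    return events


# Small self-contained demo using lines copied from the log above.
SAMPLE = [
    "May 31 23:05:52 laptop kernel: [32927.622139] usb 2-1: USB disconnect, device number 8",
    "May 31 23:08:11 laptop kernel: [33066.384097] NVRM: GPU at 0000:01:00.0 has fallen off the bus.",
    "May 31 23:08:11 laptop kernel: [33066.384120] NVRM: os_pci_init_handle: invalid context!",
]

for stamp, host, uptime, pci in find_gpu_drops(SAMPLE):
    print(f"{stamp} {host}: GPU {pci} dropped at uptime {uptime:.3f}s")
```

To scan a real log, pass an open file instead of SAMPLE, for example find_gpu_drops(open("/var/log/syslog")). Comparing the uptime values against the Xorg log timestamps may help show whether the WAIT warnings start before or after the GPU drops off the bus.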