On Tue, 2012-03-13 at 17:04 +0000, Darac Marjal wrote:

> On Mon, Mar 05, 2012 at 01:32:17AM -0500, KS wrote:
> > On Mon, Mar 5, 2012, at 12:51 AM, KS wrote:
> > > Hi all,
> > > 
> > > For the last few days I have noticed that when I return to my machine
> > > (which is always on), the screen doesn't respond. The keyboard (caps lock,
> > > num lock) works. I can also ssh into the machine, and I have noticed that
> > > Xorg takes 100% CPU. I couldn't find anything in the Xorg log or syslog files.
> > > 
> > > Today, however, the screen stopped responding after a beep while I was
> > > using the machine. Below is what I found in syslog:
> > > 
> > > Mar  5 00:32:28 gurh kernel: [17901.730462] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
> 
> This doesn't sound particularly good. It would suggest to me that your
> graphics card (the GPU) is no longer attached to the PCI bus. Probably
> the best case scenario is that this is a physical problem: Open up your
> computer, pull out the card and push it back in, making sure it's fully
> seated.
> 
> If the problem persists, then it may be that the card is locking up
> completely such that the PCI bus THINKS you've pulled it out. You may
> find monitoring the output of "nvclock -T" useful.
> 
> > 
> > Syslog gave the same warning again!
> > 
> > So is this just a kernel issue?
> > 
> > Thanks,
> > KS
> > 

Hi Darac,

I don't think this is a hardware issue: I have been seeing it for some
time now on two different machines. All I can find is the following:

root@laptop:~# head -20 /var/log/syslog
May 31 22:28:59 laptop syslog-ng[1860]: Configuration reload request received, reloading configuration;
May 31 22:28:59 laptop syslog-ng[1860]: EOF on control channel, closing connection;
May 31 22:29:00 laptop anacron[11394]: Job `cron.daily' terminated
May 31 22:29:00 laptop anacron[11394]: Normal exit (1 job run)
May 31 22:49:00 laptop -- MARK --
May 31 23:05:40 laptop kernel: [32915.745040] sdc: detected capacity change from 8019509248 to 0
May 31 23:05:52 laptop kernel: [32927.622139] usb 2-1: USB disconnect, device number 8
May 31 23:08:11 laptop kernel: [33066.384097] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
May 31 23:08:11 laptop kernel: [33066.384102] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
May 31 23:08:11 laptop kernel: [33066.384120] NVRM: os_pci_init_handle: invalid context!
May 31 23:08:11 laptop kernel: [33066.384124] NVRM: os_pci_init_handle: invalid context!
May 31 23:08:11 laptop kernel: [33066.384176] NVRM: os_pci_init_handle: invalid context!
May 31 23:08:11 laptop kernel: [33066.384179] NVRM: os_pci_init_handle: invalid context!
May 31 23:13:06 laptop kernel: [    0.000000] Initializing cgroup subsys cpuset
May 31 23:13:06 laptop kernel: [    0.000000] Initializing cgroup subsys cpu
May 31 23:13:06 laptop kernel: [    0.000000] Linux version 3.2.0-2-686-pae (Debian 3.2.18-1) (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-5) ) #1 SMP Mon May 21 18:24:12 UTC 2012
May 31 23:13:06 laptop kernel: [    0.000000] BIOS-provided physical RAM map:
May 31 23:13:06 laptop kernel: [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
May 31 23:13:06 laptop kernel: [    0.000000]  BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
May 31 23:13:06 laptop kernel: [    0.000000]  BIOS-e820: 0000000000100000 - 00000000bfe5a800 (usable)

Then, on the Xorg side, I have this:

[ 30399.257] (II) config/udev: Adding input device ELECOM ELECOM USB mouse with wheel  (/dev/input/mouse2)
[ 30399.257] (II) No input driver specified, ignoring this device.
[ 30399.257] (II) This device may have been added with another device file.
[ 33119.907] [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.
[ 33119.907] 
[ 33119.907] Backtrace:
[ 33120.497] 0: /usr/bin/Xorg (xorg_backtrace+0x49) [0xb7778099]
[ 33120.497] 1: /usr/bin/Xorg (mieqEnqueue+0x22b) [0xb77569ab]
[ 33120.497] 2: /usr/bin/Xorg (0xb75fb000+0x51265) [0xb764c265]
[ 33120.497] 3: /usr/bin/Xorg (xf86PostMotionEventM+0xf9) [0xb7686119]
[ 33120.497] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0xb4255000+0x35ad) [0xb42585ad]
[ 33120.497] 5: /usr/lib/xorg/modules/input/evdev_drv.so (0xb4255000+0x4a2c) [0xb4259a2c]
[ 33120.497] 6: /usr/bin/Xorg (0xb75fb000+0x7a8e1) [0xb76758e1]
[ 33120.497] 7: /usr/bin/Xorg (0xb75fb000+0xa050a) [0xb769b50a]
[ 33120.497] 8: (vdso) (__kernel_sigreturn+0x0) [0xb75dd400]
[ 33120.497] 9: (vdso) (__kernel_vsyscall+0x10) [0xb75dd424]
[ 33120.497] 10: /lib/i386-linux-gnu/i686/cmov/libc.so.6 (__gettimeofday+0x16) [0xb7309916]
[ 33120.497] 11: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0xb486f000+0x62e0d) [0xb48d1e0d]
[ 33120.497] 
[ 33120.497] [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
[ 33120.497] [mi] mieq is *NOT* the cause.  It is a victim.
[ 33120.983] (WW) NVIDIA(0): WAIT (0, 7, 0x8000, 0x00009354, 0x00009354)
[ 33120.983] [mi] Increasing EQ size to 512 to prevent dropped events.
[ 33120.983] [mi] EQ processing has resumed after 31 dropped events.
[ 33120.983] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.
[ 33123.984] (WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x000098ec, 0x000098ec)
[ 33126.986] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000a1bc)
[ 33133.986] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000a1bc)
[ 33136.987] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000a214)
[ 33143.987] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000a214)
[ 33146.988] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x00009b7c, 0x0000b190)
[ 33153.988] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x00009b7c, 0x0000b190)
[ 33156.989] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000b1e8)
[ 33163.989] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000b1e8)
[ 33166.993] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x00009b7c, 0x0000b780)
[ 33173.993] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x00009b7c, 0x0000b780)
[ 33176.997] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000c050)
[ 33183.997] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000c050)
[ 33186.999] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x00009b7c, 0x0000cfcc)
[ 33193.999] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x00009b7c, 0x0000cfcc)
[ 33197.000] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d024)
[ 33204.000] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d024)
[ 33207.005] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d8f4)
[ 33214.005] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d8f4)
[ 33220.157] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d94c)
[ 33227.157] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d94c)
[ 33230.158] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d980)
[ 33237.158] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d980)
[ 33240.159] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d9b4)
[ 33247.159] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d9b4)
[ 33250.160] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000d9e8)
[ 33257.160] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000d9e8)
[ 33260.161] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000da1c)
[ 33267.161] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000da1c)
[ 33270.162] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000da50)
[ 33277.162] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000da50)
[ 33280.163] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000da84)
[ 33287.163] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000da84)
[ 33290.164] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0x00009b7c, 0x0000dab8)
[ 33297.164] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0x00009b7c, 0x0000dab8)


I'm sure this is either a kernel or an Xorg issue, because a hardware
failure could not happen on multiple workstations at the same time.

root@laptop:~# lspci 
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:01.0 PCI bridge: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port (rev 0c)
00:1a.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HM (ICH8M) LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: NVIDIA Corporation G86M [Quadro NVS 135M] (rev a1)
03:01.0 CardBus bridge: O2 Micro, Inc. Cardbus bridge (rev 21)
03:01.4 FireWire (IEEE 1394): O2 Micro, Inc. Firewire (IEEE 1394) (rev 02)
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5755M Gigabit Ethernet PCI Express (rev 02)
0c:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)


I've also seen this happen more and more often when the system is under
load, especially when gnome-shell runs its visual effects after I click
the "Activities" button.

PS: Please keep me in CC as I'm not subscribed to this list.

Cheers,
