Bug#513904: Info received (Bug#513904: Xorg runaway in NVSync(), see #336774)

2009-02-03 Thread Sergio Gelato
* Sergio Gelato [2009-02-02 14:49:30 +0100]:
> #define READ_GET(pNv) ((pNv)->FIFO[0x0011] >> 2)
>   while(READ_GET(pNv) != pNv->dmaPut);
> 
> so it looks like a polling wait for an event that isn't happening.

Regardless of what is ultimately causing the problem on our system,
I don't like the fact that that loop is unbounded. Since it chews CPU
time anyway, I think one could easily afford a few cycles to maintain
a counter and bail out after some reasonable number of iterations.
Maybe one should also log some debugging info whenever that happens,
or take additional measures like disabling acceleration if it happens
too often. Having to kill the X server is painful for users; being able
to limp along and at least save one's work would be nicer.
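
A minimal sketch in C of the bounded wait suggested above. It is not the driver's actual code: the NVSketch struct, the nv_sync_bounded() name and the SYNC_MAX_SPINS limit are hypothetical, and only the FIFO[0x0011] read and the dmaPut comparison are taken from the loop quoted earlier.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver state: only the two fields
 * that the polling loop touches. */
typedef struct {
    volatile uint32_t *FIFO;  /* memory-mapped FIFO registers */
    uint32_t dmaPut;          /* last position handed to the hardware */
} NVSketch;

#define READ_GET(pNv) ((pNv)->FIFO[0x0011] >> 2)

/* Assumed iteration limit; a real value would need tuning on hardware. */
#define SYNC_MAX_SPINS 1000000u

/* Poll until the hardware's GET pointer reaches dmaPut, but give up
 * after max_spins iterations instead of spinning forever.
 * Returns true on success, false on timeout. */
static bool nv_sync_bounded(NVSketch *pNv, uint32_t max_spins)
{
    for (uint32_t i = 0; i < max_spins; i++) {
        if (READ_GET(pNv) == pNv->dmaPut)
            return true;
    }
    /* Log the state so the hang can be diagnosed afterwards. */
    fprintf(stderr, "NVSync: gave up after %u polls (get=0x%x put=0x%x)\n",
            (unsigned)max_spins, (unsigned)READ_GET(pNv),
            (unsigned)pNv->dmaPut);
    return false;
}
```

A caller could invoke nv_sync_bounded(pNv, SYNC_MAX_SPINS), count the false returns, and fall back to unaccelerated rendering once they exceed some threshold, which would let the user save their work instead of killing the X server.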



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#513904: Xorg runaway in NVSync(), see #336774

2009-02-02 Thread Sergio Gelato
Package: xserver-xorg-video-nv
Version: 1:2.0.3-1

This seems to be a follow-up of archived bug #336774, which was closed
due to the submitter no longer having access to a test system on which
to reproduce the issue.

Symptom: Xorg starts using nearly 100% of the CPU, becomes unresponsive.

Stack backtrace, obtained by attaching gdb to the running process:
#0  0xb7c24532 in NVSync () from /usr/lib/xorg/modules/drivers/nv_drv.so
#1  0xb6b00668 in XAAComposite () from /usr/lib/xorg/modules/libxaa.so
#2  0x08157597 in DamageDamageRegion ()
#3  0x08144517 in CompositePicture ()
#4  0x0814a853 in PanoramiXRenderReset ()
#5  0x081476b5 in AllocatePicturePrivate ()
#6  0x08086ccb in Dispatch ()
#7  0x0806e6b9 in main ()

The symptoms have appeared only recently on this system, so they could signal 
a hardware problem.






Bug#513904: Info received (Bug#513904: Xorg runaway in NVSync(), see #336774)

2009-02-02 Thread Sergio Gelato
It just happened again, and here is a disassembly of the location:

0xb7b96530 <NVSync+48>: mov    (%ecx),%eax
0xb7b96532 <NVSync+50>: shr    $0x2,%eax
0xb7b96535 <NVSync+53>: cmp    %edx,%eax
0xb7b96537 <NVSync+55>: jne    0xb7b96530 <NVSync+48>

Since neither %ecx nor %edx is changed during the loop, the only way
out of it is for the memory at (%ecx) to be volatile.

In the source code, that loop is
#define READ_GET(pNv) ((pNv)->FIFO[0x0011] >> 2)
while(READ_GET(pNv) != pNv->dmaPut);

so it looks like a polling wait for an event that isn't happening.
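
To make the correspondence explicit, here is a small C sketch of one iteration of that loop, annotated with the instructions from the disassembly. The helper name fifo_poll_once() is hypothetical. Treating %ecx as the address of FIFO[0x0011] and %edx as a cached copy of dmaPut is an inference from the listing, not something checked against the full function.

```c
#include <stdbool.h>
#include <stdint.h>

/* One iteration of the polling loop. `get_reg` plays the role of %ecx
 * (presumably the address of FIFO[0x0011]) and `dma_put` the role of
 * %edx. Hypothetical helper, for illustration only. */
static bool fifo_poll_once(volatile const uint32_t *get_reg, uint32_t dma_put)
{
    /* mov (%ecx),%eax : the volatile qualifier forces a fresh load
     * from memory on every call. */
    uint32_t get = *get_reg;

    /* shr $0x2,%eax */
    get >>= 2;

    /* cmp %edx,%eax ; jne : the original loop repeats while this
     * comparison is false. */
    return get == dma_put;
}
```

The loop can only end if the hardware updates the word behind get_reg; nothing on the CPU side changes either operand.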

This part of the source code looks exactly the same in 2.1.10, so I
don't think backporting it will be worth our time.






Bug#513904: Xorg runaway in NVSync(), see #336774

2009-02-02 Thread Brice Goglin
Sergio Gelato wrote:
> Package: xserver-xorg-video-nv
> Version: 1:2.0.3-1
>
> This seems to be a follow-up of archived bug #336774, which was closed
> due to the submitter no longer having access to a test system on which
> to reproduce the issue.
>
> Symptom: Xorg starts using nearly 100% of the CPU, becomes unresponsive.
>
> Stack backtrace, obtained by attaching gdb to the running process:
> #0  0xb7c24532 in NVSync () from /usr/lib/xorg/modules/drivers/nv_drv.so
> #1  0xb6b00668 in XAAComposite () from /usr/lib/xorg/modules/libxaa.so
> #2  0x08157597 in DamageDamageRegion ()
> #3  0x08144517 in CompositePicture ()
> #4  0x0814a853 in PanoramiXRenderReset ()
> #5  0x081476b5 in AllocatePicturePrivate ()
> #6  0x08086ccb in Dispatch ()
> #7  0x0806e6b9 in main ()
>

Any chance you could try a more recent driver? 2.0.3 is old. We have
2.1.10 in testing/unstable and 2.1.12 in experimental.

thanks,
Brice







Bug#513904: Xorg runaway in NVSync(), see #336774

2009-02-02 Thread Sergio Gelato
* Brice Goglin [2009-02-02 12:50:32 +0100]:
> Any chance you try with a more recent driver? 2.0.3 is old. We have
> 2.1.10 in testing/unstable and 2.1.12 in experimental?

I've been thinking about that, but it looks like 2.1.10 build-depends on
xserver-xorg-dev (>= 2:1.4), so a backport to etch may not be entirely
trivial. This is a production system, and I can't very well turn my user
into a guinea pig; he's got work to do. I've only just started preparing
our upgrade to lenny.

I can give backporting 2.1.10 a try, but I don't have a lot of time to
spare. Upgrading to lenny will happen, but probably not before release.

I should have mentioned that the hardware this happens on is an
NV37GL (Quadro NVS280, Dell OEM). Since we have several machines with
these, all running etch, and only one is misbehaving, I'm suspecting
either hardware or the user's specific application mix. The latter
seems unlikely, however: the problem appeared while gnome-screensaver
was running. (Still, it could be the result of earlier memory corruption
or whatnot.)


