Re: [r300] VB lockup found and fixed

Nicolai Haehnle Fri, 18 Feb 2005 08:50:44 -0800

On Friday 18 February 2005 16:03, Keith Whitwell wrote:
> Ben Skeggs wrote:
> >> I still have a 100% reproducible bug which I need to find the cause of,
> >> but time is once again a problem for me.  If I move a window over the top
> >> of a glxgears window my machine locks up immediately, but sysrq still works
> >> fine.
> > 
> > 
> > I just discovered (and should've checked before), that I can ssh in and
> > successfully kill glxgears, then X returns to normal.  I can have a partially
> > covered glxgears window and everything is fine, but as soon as the entire
> > window (not incl. window decorations) is covered, it seems that the 2d driver
> > is unable to update the screen.
> 
> I think some of the other drivers do a 'sched_yield()' or 'usleep(0)' in 
> the zero cliprect case to get away from this sort of behaviour.


Well, I can reproduce this bug and I tracked it down. There are a number of 
problems here, and they all have to do with DMA buffer accounting.
The first (trivial) problem is that nr_released_bufs was never reset to 0. 
I've already fixed that in CVS.
The real problem is that the following situation can occur when we have zero 
cliprects:
1. The command buffer contains a DISCARD command for a DMA buffer.
2. We simply drop that command buffer because there are no cliprects, i.e. 
nothing can be drawn.
3. As a consequence, DMA buffers aren't freed.
4. The rendering loop continues even though DMA buffers have been leaked, 
which eventually causes all DMA buffers to be exhausted, and this causes an 
infinite loop in r300RefillCurrentDmaRegion.
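
To make the sequence above concrete, here is a minimal user-space sketch of the accounting problem. It is not the r300 code: the pool size, the struct and every function name are made up for illustration, and where the real r300RefillCurrentDmaRegion would loop forever, this model simply reports the frame at which allocation first fails.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_DMA_BUFS 4 /* hypothetical pool size, chosen only for the example */

/* Toy model of the DMA buffer pool (not the real driver structure). */
struct dma_pool {
    bool in_use[NUM_DMA_BUFS];
};

/* Return the index of a free buffer, or -1 if the pool is exhausted. */
static int pool_alloc(struct dma_pool *p)
{
    for (int i = 0; i < NUM_DMA_BUFS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Stand-in for the kernel processing a DISCARD command. */
static void pool_discard(struct dma_pool *p, int idx)
{
    assert(idx >= 0 && idx < NUM_DMA_BUFS);
    p->in_use[idx] = false;
}

/* Submit a command buffer that holds one DISCARD for buffer idx.
 * If drop_when_clipped is set and there are no cliprects, the whole
 * command buffer is thrown away, DISCARD included (the bug). */
static void submit(struct dma_pool *p, int idx, int num_cliprects,
                   bool drop_when_clipped)
{
    if (num_cliprects == 0 && drop_when_clipped)
        return; /* buffer idx is leaked here */
    pool_discard(p, idx);
}

/* Render up to max_frames frames and return how many completed
 * before a buffer allocation failed. */
static int render_frames(int max_frames, int num_cliprects,
                         bool drop_when_clipped)
{
    struct dma_pool pool = {{false}};

    for (int frame = 0; frame < max_frames; frame++) {
        int idx = pool_alloc(&pool);
        if (idx < 0)
            return frame; /* the real code would spin here instead */
        submit(&pool, idx, num_cliprects, drop_when_clipped);
    }
    return max_frames;
}
```

In this model, render_frames(100, 0, true) stops after NUM_DMA_BUFS frames because every DISCARD was dropped, while the same call with at least one cliprect, or with drop_when_clipped set to false, completes all 100 frames.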

The root cause is that we drop the command buffers with the DISCARD. I can 
see two possible solutions to this problem:
1. Wait until we have a cliprect again before submitting command buffers.
2. Submit command buffers even when we have no cliprects. The kernel module 
would basically ignore everything but the DISCARD commands.
3. Something else?
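
Option (2) could look roughly like the sketch below. This is only a model of the idea, not the actual kernel module: the command types, the struct layout and the function name are assumptions invented for the example. The point is that DISCARD is honoured regardless of the cliprect count, while everything else is skipped when nothing can be drawn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_DMA_BUFS 4 /* hypothetical pool size */

/* Hypothetical command types; the real command stream has more. */
enum cmd_type { CMD_PACKET, CMD_DISCARD };

struct cmd {
    enum cmd_type type;
    int buf_idx; /* only meaningful for CMD_DISCARD */
};

struct result {
    int executed;  /* drawing commands that were actually run */
    int discarded; /* DMA buffers released */
};

/* Walk a command buffer. DISCARD commands are always processed so
 * that buffers are released; all other commands are ignored when
 * there are no cliprects. */
static struct result process_cmdbuf(const struct cmd *cmds, size_t n,
                                    int num_cliprects,
                                    bool in_use[NUM_DMA_BUFS])
{
    struct result r = {0, 0};

    for (size_t i = 0; i < n; i++) {
        if (cmds[i].type == CMD_DISCARD) {
            assert(cmds[i].buf_idx >= 0 && cmds[i].buf_idx < NUM_DMA_BUFS);
            in_use[cmds[i].buf_idx] = false;
            r.discarded++;
        } else if (num_cliprects > 0) {
            r.executed++; /* stand-in for emitting the command */
        }
    }
    return r;
}
```

With zero cliprects, a buffer containing two drawing commands and one DISCARD yields executed == 0 and discarded == 1, so the DMA buffer still returns to the pool.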

I don't like option (1) because it somehow assumes that the user program 
only cares about OpenGL (and that's quite selfish). There are many use 
cases where it is plainly the incorrect thing to do:
- A user running something like Quake in listenserver mode; if they switch 
away from Quake for some reason (incoming messages, whatever), the server 
will stop and eventually all clients will time out.
- Imagine a chat application that uses some fancy 3D graphics for whatever 
reason (glitz, for example). Now this application may just be in the middle 
of drawing something when the user moves some other application above it. 
The end result will be that the application essentially becomes locked up 
until it becomes visible again; in the meantime, the chat might time out 
and disconnect the user.
So (1) clearly isn't a good solution.

Option (2) is more correct, but it does seem a little bit hackish.

Any better ideas? Perhaps tracking which buffers were discarded? That's not 
exactly beautiful either.

cu,
Nicolai

> 
> Keith
