http://bugs.freedesktop.org/show_bug.cgi?id=5092

------- Comment #9 from [EMAIL PROTECTED]  2007-06-25 04:18 PST -------
(In reply to comment #2)
> The situation appears to be more complicated than I initially thought. I did
> additional debugging and now think that all the games are suffering from the
> same driver bug, since the symptoms are quite similar. However, there are many
> ways to "activate" this bug, so every game (or GL app in general) has its own
> workaround. For instance, in Trigger you should avoid GL_LINEAR_MIPMAP_LINEAR,
> in Torcs you should disable GL_ALPHA_TEST when rendering multitextures (it is
> always connected with textures somehow), and so on. Usually, the program hangs
> between a return statement and the next line of code, i.e. in the sample code
> below:
> 
> int some_func() {
>   ...
>   printf("BEFORE\n");
>   return 1;
> }
> ...
> while (some_func()) {
>   printf("AFTER\n");
> }
> 
> you will see the "BEFORE" line but not "AFTER".
> 
> So, I've prepared a very simple demo program (see attachment) which hangs my
> computer. I hope it will help with debugging the driver. More details are in
> the attachment comments.

I looked into this a bit more and came to the following conclusions. I realise
that you approached the bug in terms of high-level errors in the Mesa DRI code.
I, however, went lower than that to try to find the exact root cause. My
observations may differ from yours, but here goes.
1. Debugging with gdb showed a hard lock in the glFlush() portion of the GLX
code, which in turn calls __mesa_Flush() in unichrome_dri.so. The lock happens
at different points in the code, so I figured that some asynchronous,
event-driven code is causing it.
2. I then went into the DRM portion of the code (libdrm), which uses ioctls to
run kernel-level code from user space.
3. Adding printk's to the DRM code finally isolated the problem. There is a
function in via_irq.c called via_driver_vblank_wait(), which is probably
serviced when the VIA_IRQ_VBLANK_PENDING interrupt bit is set. It calls
viadrv_acknowledge_irqs().
4. This reads VIA_INTERRUPTS_REG using the VIA_READ macro (which is a readl PCI
post), ORs the value with the VIA_IRQ_VBLANK_PENDING bit, and then writes it
back to VIA_INTERRUPTS_REG using VIA_WRITE. QUESTION: If it is interrupt
driven, this bit should already be set. Why is it being set during acknowledge?
5. Looking at the sequence of printk's, I see that VIA_READ and VIA_WRITE happen
several times and that at one point VIA_READ simply locks up.
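
On the QUESTION in step 4: one common hardware convention, which I am only
assuming here and have not confirmed for this chipset, is "write-1-to-clear",
where writing a 1 to a pending status bit acknowledges it rather than setting
it. Under that assumption the read/OR/write sequence makes sense. Below is a
minimal user-space simulation in C. The sim_* names and the bit positions are
hypothetical and are not taken from via_irq.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. The real
 * VIA_IRQ_VBLANK_PENDING value is defined in via_irq.c. */
#define SIM_IRQ_VBLANK_PENDING (1u << 3)
#define SIM_IRQ_ENABLE         (1u << 19)

/* Simulated interrupt register. ASSUMPTION: status bits listed in
 * w1c_mask are write-1-to-clear; all other bits are plain read/write. */
struct sim_irq_reg {
    uint32_t value;
    uint32_t w1c_mask;
};

static uint32_t sim_read(const struct sim_irq_reg *r)
{
    assert(r != NULL);
    return r->value;
}

static void sim_write(struct sim_irq_reg *r, uint32_t v)
{
    uint32_t rw_bits;
    uint32_t status;

    assert(r != NULL);
    /* Plain bits simply take the written value. */
    rw_bits = v & ~r->w1c_mask;
    /* Status bits survive only if a 0 was written to them. */
    status = r->value & r->w1c_mask & ~v;
    r->value = rw_bits | status;
}

/* Same shape as the sequence in step 4: read, OR in the pending bit,
 * write back. Under the write-1-to-clear assumption this clears it. */
static void sim_acknowledge(struct sim_irq_reg *r, uint32_t pending_mask)
{
    uint32_t status = sim_read(r);

    sim_write(r, status | pending_mask);
}
```

With a register that holds both the pending bit and the enable bit,
sim_acknowledge() leaves only the enable bit set, whereas writing the value
back without the pending bit would leave the interrupt pending. If the real
register does not behave this way, the QUESTION above still stands.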

Observations:

1. Since the lock happens in an MMIO PCI post, it probably means there is a
bus arbitration problem (the memory space must be mapped to agpgart). So is the
bug in agpgart? Or does something in the hardware forbid reading and writing HW
registers back to back using PCI posts, so that gaps or delays should be
introduced between READs and WRITEs?
2. Since the HW is MMIO, I would imagine that PCI posting (reading and writing
together), although non-blocking, would be handled properly by the bus
arbitration queue. It would be a great help if we had the manufacturer's specs.
This is all the weirder because it happens only on a few VIA chipsets
(Unichrome Pro B).
3. I think it must be related to certain HW timing differences between the
chipsets. Matters are not helped by the fact that the bug seems to lie in
kernel space, where debugging is a lot more difficult. Debugging with Linice
seems a good way to avoid wasting time, but I don't think it is stable enough
for the latest 2.6.x kernels.
4. Finally, just adding arbitrary udelays does not seem to solve the problem.
They only slow the system down further, and VIA_READ still hangs. If it is a
timing issue, then there is more to it than a simple delay between reading and
writing HW registers.
5. I would very much like someone to go further into this and, if possible, get
help from the DRI architects, as they may be the best people to deal with this
problem, with or without HW specs for the chipset.
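
For anyone who wants to repeat the experiment, the bracketing I used to find
the hanging access can be sketched roughly as follows. This is a user-space
stand-in and not the actual patch: in the DRM module the fprintf calls would be
printk's around VIA_READ, and traced_read, reg_read_fn and fake_read are
hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical backend type: any function that reads a 32-bit register
 * at the given offset. In the driver this role is played by VIA_READ. */
typedef uint32_t (*reg_read_fn)(uint32_t offset);

/* Sequence counter so that matching enter/leave lines can be paired up. */
static unsigned long trace_seq;

/* Log before and after the access. If the read never returns, the last
 * line in the log is an "enter" line with no matching result line. */
static uint32_t traced_read(reg_read_fn read_fn, uint32_t offset, FILE *log)
{
    unsigned long seq;
    uint32_t v;

    assert(read_fn != NULL && log != NULL);
    seq = ++trace_seq;
    fprintf(log, "[%lu] READ 0x%03x enter\n", seq, (unsigned)offset);
    fflush(log); /* make sure the line is out before we might hang */
    v = read_fn(offset);
    fprintf(log, "[%lu] READ 0x%03x -> 0x%08x\n", seq, (unsigned)offset,
            (unsigned)v);
    fflush(log);
    return v;
}

/* Stand-in backend for trying the wrapper without any hardware. */
static uint32_t fake_read(uint32_t offset)
{
    return offset ^ 0xdeadbeefu;
}
```

Calling traced_read(fake_read, offset, stderr) prints one "enter" line and one
result line per access. An "enter" line with no result line after it marks the
access that hung, which is how the VIA_READ lock in step 5 showed up.
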
I hope this helps in some way. I would welcome comments or corrections on what
I have written. Your code flow may well be entirely different; please let me
know if so.


-- 
Configure bugmail: http://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel