On 03/31/2013 01:01 PM, Martin Andersson wrote:
On Sun, Mar 31, 2013 at 1:08 AM, Vadim Girlin <vadimgir...@gmail.com> wrote:
On 03/30/2013 05:35 AM, Martin Andersson wrote:

I found an issue with the shader compiler for Cayman when I looked
into why the ext_transform_feedback/order test case caused a GPU stall.
It turned out the stall was an infinite loop that was the result of broken
calculation in the shader function. The issue is that Cayman uses the
tgsi_umad function for UMAD, but that does not work since it does not
populate the y, z and w slots for UMUL that cayman requires.

This patch implements a cayman_umad. There are some things I'm unsure of
though.

The UMUL for Cayman is compiled to, as far as I can tell,
ALU_OP2_MULLO_INT and
not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the
uint
version in cayman_umad. In the patch I used the uint one, because that
seemed
the most logical.


Probably the use of MULLO_INT for UMUL on cayman is just a typo, AFAIK
MULLO_UINT should be used.

Ok, I will send a patch for that as well then.



The add part of UMAD I copied from tgsi_umad and that had a loop around
the
variable lasti, but the variable lasti is usally not used in cayman
specific code.


The only difference with umad on cayman is in the mul part - each MULLO_UINT
should be expanded to 4 slots on cayman. Add part doesn't need any changes.


This is used in tgsi functions.
int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);


This is used to determine last written vector component from the write mask,
so that if tgsi instruction doesn't write e.g. W component, we don't have to
emit R600 instruction(s) for that component.



But in cayman specific code this is used instead.
int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3;


This is used for instructions like RECIP_xxx (see the comment at
r600_shader.c:40) that should be expanded to 3 slots with optional 4th slot
if the write to the W component is required, but MULLO_UINT is different -
it should be expanded to 4 instruction slots always. By the way, it seems
cayman_mul_int_instr is incorrect in this regard.



It does not work to switch lasti with last_slot, since that makes the loop
run too
many times (in my test case lasti is 0 and last_slot is 3). So I just
removed the
loop, was that correct or should i resolve that in some other way?


No, it's not correct, there should be a loop over the vector components for
addition as well - it should be performed in the same way as on the
pre-cayman chips. In your patch you are only performing the addition for one
component.

Basically, the only required change for UMAD on cayman is that you need to
expand each one-slot MULLO_xx on pre-cayman into 4 instruction slots on
cayman.

Should I keep the cayman_umad function or should I modify tgsi_umad and
add the cayman specific part there?

I think it's better to modify tgsi_umad (to avoid unnecessary code duplication).

Vadim


Vadim



Martin Andersson (1):
    r600g: Add a Cayman specific version of UMAD

   src/gallium/drivers/r600/r600_shader.c | 47
+++++++++++++++++++++++++++++++++-
   1 file changed, 46 insertions(+), 1 deletion(-)



//Martin


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to