Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

Vadim Girlin Sun, 14 Apr 2013 09:45:51 -0700

On 04/13/2013 09:54 PM, Martin Andersson wrote:

On Sat, Apr 13, 2013 at 4:23 AM, Vadim Girlin <vadimgir...@gmail.com> wrote:

On 04/12/2013 11:36 PM, Martin Andersson wrote:


I have made some progress with this issue.

Vadim, I did as you suggested and tried to mimic the output from the
shader analyser
tool. I used your patch as a base and then tried various ways to see
what would work.
After many tries (and lockups) I did managed to get the
ext_transform_feedback/order
test to pass.

It is a very ugly patch but it should illustrate what the problem (and
potential solution) is.

Your test program fails however because explicit break statements do
not work. It
should be possible to use the same code for the explicit breaks as for
the implicit
loop break.The reason it does not is that I detect the implicit break
with a hack and
it does notwork for explicit breaks.

The problem is that I need to detect the break statement when creating the
corresponding if statement. So that I can treat it differently than
other "regular" if
statements. Anyone knows how I could do that, or is this the wrong
approach?


It doesn't work with my test app because IF/ENDIF blocks with BRK may
contain other code, so you can't simply throw away IF/ENDIF making that code
execute unconditionally.


Yeah my hack is not an viable option.

By the way, shader analyzer in some cases also produces the code with
JUMP/POP around PRED_SET-BREAK, though I'm not sure if that code will really
work as expected with catalyst. Possibly we're simply missing something in
the hardware configuration.

Also there is one thing that I didn't take into account in my initial patch
- r600g converts ALU followed by POP to ALU_POP_AFTER and this might explain
why my initial patch doesn't work. Possibly if we prevent that optimization
for ALU containing PRED_SET-BREAK and leave separate POP, it might be enough
to make it work. I'm attaching the additional patch that will force POP to
be a separate instruction in this case, please test it (on top of the my
first patch). This would be at least not very intrusive.


No, that patch did not help either.

If this won't help, then I think we should understand what exactly we are
trying to fix before implementing any big changes, possibly there is a
better solution or at least a more clean workaround. In the worst case we
can return to your approach and improve it to handle other cases.


I'm starting to think that there is nothing wrong with the shader
compiler. It seems to me that a push, pop inside a nested loop clears
the break status on a thread.

shift_reg = 1u;
count = 0u;
while (true) {
     if (x == 1u)
         break;
      while (true) {
          if (x != 1u)
               count = 10u;
          if (x == 1u)
               count = 20u;
          break;
      }
      shift_reg = 2u;
      break;
}

input: x == 0
actual ouput: shift_reg == 2, count == 10
expected output: shift_reg == 2, count == 10

input: x == 1
actual ouput: shift_reg == 2, count == 20
expected output: shift_reg == 1, count == 0

If I swap the if statements in the inner loop I get different results.

shift_reg = 1u;
count = 0u;
while (true) {
     if (x == 1u)
         break;
      while (true) {
          if (x == 1u)
               count = 20u;
          if (x != 1u)
               count = 10u;
          break;
      }
      shift_reg = 2u;
      break;
}

input: x == 0
actual ouput: shift_reg == 2, count == 10
expected output: shift_reg == 2, count == 10

input: x == 1
actual ouput: shift_reg == 2, count == 0
expected output: shift_reg == 1, count == 0

I tested both cases on mesa master and mesa master + Vadims two
patches with the same results.

This turned out to be a known issue with cayman: BREAK/CONTINUE followedby LOOP_STARTxxx for nested loop may put the branch stack into the statesuch that ALU_PUSH_BEFORE doesn't work as expected.

It seems the simplest workaround is either to avoid ALU_PUSH_BEFORE innested loops completely or to replace it with separate PUSH and ALU.

We can check if we actually have BREAK/CONTINUE in the outer loop beforeLOOP_START for the inner loop, but I think it will be true in mostcases, so the simplest fix for r600g is to replace all ALU_PUSH_BEFOREwith PUSH + ALU in the nested loops on cayman.


Vadim

//Martin

Vadim

//Martin

On Thu, Apr 11, 2013 at 5:31 PM, Vadim Girlin <vadimgir...@gmail.com>
wrote:


On 04/11/2013 02:08 AM, Marek Olšák wrote:



Here's the output:

creating vs ...
shader compilation status: OK
creating fs ...
shader compilation status: OK
thread #0 (0;0) : ref = 16608
thread #1 (1;0) : ref = 27873
thread #2 (0;1) : ref = 16608
thread #3 (1;1) : ref = 27877
results:
    thread 0 (0, 0): expected = 16608, observed = 27876, FAIL
    thread 1 (1, 0): expected = 27873, observed = 27873, OK
    thread 2 (0, 1): expected = 16608, observed = 27876, FAIL
    thread 3 (1, 1): expected = 27877, observed = 27877, OK


Thanks. According to these results, it looks like LOOP_START_DX10 for
inner
loop somehow reactivates the threads that were put into inactive-break
state
by the LOOP_BREAK in the outer loop. Also it seems LOOP_BREAK in the
inner
loop doesn't work as expected in this case. In other words, it looks
weird.

I can't explain why would this happen. It might be interesting to run
these
tests with llvm backend to see if there are any differences.

Probably it might help if we'll implement LOOP_BREAK via EXECUTE_MASK_OP
in
the PRED_SET encoding as in my earlier patch, but without any stack
push/pop
operations and jumps (where it's possible), closer to what the catalyst
(shader analyzer) does. I'm not sure if it will help though, and anyway
we'll need stack operations in some cases, so I'm afraid this won't fix
the
issue completely.

So far I have no other ideas.

Vadim

Marek


On Wed, Apr 10, 2013 at 11:42 PM, Vadim Girlin
<vadimgir...@gmail.com>wrote:

On 04/10/2013 01:53 PM, Marek Olšák wrote:

glsl-fs-loop-nested passes here.

nstack is 3 and adding 4 to it doesn't help.


Ok, thanks.

Also I wrote a simple test app that should reproduce the issue if it's
really related to diverging control flow with nested loops and might
more
information about what's going wrong.

The source is in the attachment and needs to be compiled with -lGL
-lglut
-lGLEW. The app renders four points and computes some value for each
point
in the loops similar to the transform feedback order test, but it
doesn't
use tfb. It should render four green or red squares depending on
correctness of the result.

Here is the correct output produced for me on evergreen:

    thread 0 (0, 0): expected = 16608, observed = 16608, OK
    thread 1 (1, 0): expected = 27873, observed = 27873, OK
    thread 2 (0, 1): expected = 16608, observed = 16608, OK
    thread 3 (1, 1): expected = 27877, observed = 27877, OK

Please post the output if it fails on cayman.

Vadim

Marek


On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin <vadimgir...@gmail.com>
wrote:

    On 04/10/2013 03:58 AM, Marek Olšák wrote:




    Hi Vadim,




your patch does not fix the test.

Hmm, I'm out of ideas then. Thanks for testing.

I've checked the shader dump few times but I don't see anything
obviously
wrong there, and the same code (except the minor ALU grouping changes
due
to the VLIW4/VLIW5 difference) works fine for me on evergreen.

According to the Martin's observations it looks like if the threads
that
shouldn't execute the loop body were incorrectly left in the active
state.
LOOP_BREAK should put them into the inactive-break state, but
something
goes wrong. Do the other piglit tests with nested loops (e.g.
glsl-fs-loop-nested) work on cayman? Though possibly there are no
other
tests with the diverging loops as in this case.

I'll try to write a simpler test with the diverging loops to see if
the
issue is really caused by the incorrect control flow handling, and to
figure out the exact instruction that results in the incorrect active
state.

Also probably it worth checking if the stack size is correct for that
shader (latest mesa should print nstack value in the shader
disassemble
header, I think it should be 3 for that shader) and maybe try adding
some
constant, e.g. 4 to the bc->nstack in the r600_bytecode_build just to
be
sure that we reserve enough of stack space, though I don't think
stack
size
is the cause of this issue.

Vadim



    Marek





On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin
<vadimgir...@gmail.com>
wrote:

     On 04/09/2013 10:58 AM, Martin Andersson wrote:


     On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák <mar...@gmail.com>
wrote:


     Pushed, thanks. The transform feedback test still doesn't
pass,
but
at

least
the hardlocks are gone.


    Thanks, I have looked into the other issue as well





http://lists.freedesktop.org/******archives/mesa-dev/2013-**March/**<http://lists.freedesktop.org/****archives/mesa-dev/2013-March/**>
**036941.html<http://lists.**freedesktop.org/**archives/**


mesa-dev/2013-March/**036941.**html<http://lists.freedesktop.org/**archives/mesa-dev/2013-March/**036941.html>



<http://lists.**freedesktop.**org/archives/mesa-**<http://freedesktop.org/archives/mesa-**>
dev/2013-March/036941.html<htt**p://lists.freedesktop.org/**


archives/mesa-dev/2013-March/**036941.html<http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html>


The problem arises when there are nested loops. If I rework the
code
so there are
no nested loops the issue disappears. At least one pixel also
needs
to
enter the
outer loop. The pixels that should enter the outer loop behaves
correctly. It is those
pixels that should not enter the outer loop that misbehaves. It
does
not matter if they
also fails the test for the inner loop, they will still execute
the
instruction inside. That
leads to the strange results for that test.


    Please test the attached patch.




Vadim


     The strangeness is easier to see if the NUM_POINTS in the

ext_transform_feedback/
order.c are run with smaller values,like 3, 6 and 9. Disable the
code
that fail the test
and print starting_x, shift_reg_final and iteration_count.

Marek, since you implemented transform feedback for r600, do you
think
the issue
is with the tranform feedback code or the shader compiler or some
other
thing?

//Martin
______________________________******_________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org


http://lists.freedesktop.org/******mailman/listinfo/mesa-dev<http://lists.freedesktop.org/****mailman/listinfo/mesa-dev>


<h**ttp://lists.freedesktop.org/****mailman/listinfo/mesa-dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>



<htt**p://lists.freedesktop.**org/**mailman/listinfo/mesa-**dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>


<http://lists.freedesktop.**org/mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev>



    ______________________________****_________________



mesa-dev mailing list
mesa-dev@lists.freedesktop.org


http://lists.freedesktop.org/****mailman/listinfo/mesa-dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>


<htt**p://lists.freedesktop.org/**mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev>


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

Reply via email to