Reply to all this time...

On Tue, Mar 30, 2010 at 8:13 AM, Marek Olšák <mar...@gmail.com> wrote:
>> > 1) Branching and looping
>> >
>> > This is the most important one, and there are 3 things that need to be
>> > done.
>> > * Unrolling loops and converting conditionals to multiplications. This is
>> > crucial for R3xx-R4xx GLSL support. I'm not saying it will work in all
>> > cases, but it should be fine for the most common ones. This is more or
>> > less standard in all proprietary drivers supporting Shader Model 2.0. It
>> > would be nice to have it work with pure TGSI shaders so that drivers like
>> > nvfx can reuse it too, and I personally prefer to have this feature first
>> > before going on.
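For illustration only, a minimal sketch of full loop unrolling for hardware without flow control. It assumes a made-up toy IR (not TGSI and not the r300 compiler's IR) and a loop whose trip count is a compile-time constant; the body is copied once per iteration with the loop counter replaced by a constant, so only straight-line code remains.

```python
# Toy IR (hypothetical, not TGSI): an instruction is (opcode, dst, src...).
# An operand is a register name, a float immediate, or an indexed array
# access ("name", index). The loop counter is a plain name such as "i".
# Only source operands are rewritten; destinations are assumed not to be
# indexed by the counter.

def substitute(operand, counter, value):
    """Replace the loop counter inside one operand with a constant."""
    if isinstance(operand, tuple):
        name, index = operand
        return (name, value if index == counter else index)
    if operand == counter:
        return value
    return operand


def unroll(body, counter, trip_count):
    """Copy the loop body trip_count times with the counter made constant."""
    code = []
    for value in range(trip_count):
        for opcode, dst, *srcs in body:
            new_srcs = [substitute(s, counter, value) for s in srcs]
            code.append((opcode, dst, *new_srcs))
    return code


def run(code, regs):
    """Execute straight-line toy code. Only MAD (dst = a * b + c) is modeled."""
    def read(operand):
        if isinstance(operand, tuple):
            name, index = operand
            return regs[name][index]
        if isinstance(operand, str):
            return regs[operand]
        return operand  # immediate constant

    for opcode, dst, *srcs in code:
        if opcode != "MAD":
            raise ValueError(f"unsupported opcode {opcode}")
        a, b, c = (read(s) for s in srcs)
        regs[dst] = a * b + c
    return regs
```

For example, `unroll([("MAD", "sum", ("tex", "i"), ("w", "i"), "sum")], "i", 4)` yields four MAD instructions indexing elements 0 through 3 directly. Loops whose trip count depends on runtime values cannot be handled this way.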
>>
>> Would you be able to provide a small example of how to convert the
>> conditionals to multiplications?  I understand the basic idea is to mask
>> values based on the result of the conditional, but it would help me to see
>> an example.  On IRC, eosie mentioned an alternate technique for emulating
>> conditionals: Save the values of variables that might be affected by
>> the conditional statement.  Then, after executing both the if and the else
>> branches, roll back the variables that were affected by the branch that
>> was not supposed to be taken. Would this technique work as well?
>
> Well, I am eosie. Thanks for the info; it's always cool to be reminded what
> I've written on IRC. ;)
>
> Another idea was to convert TGSI to an SSA form. That would make unrolling
> branches much easier, as the Phi function would basically become a linear
> interpolation; loops and subroutines with conditional return statements
> might be trickier. The r300 compiler already uses SSA for its optimization
> passes, so maybe you wouldn't need to mess with TGSI that much...
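A minimal sketch of converting a conditional into multiplications, modeled with plain floats rather than shader registers (function names are hypothetical). Both branches are always computed; the condition is reduced to 1.0 or 0.0, as an SLT-style compare would produce, and the result is picked with cond * a + (1 - cond) * b, which is also how a Phi node turns into a linear interpolation. The second function shows one way to realize the save-and-roll-back idea from IRC, under the same assumptions.

```python
def branchy(x, a, b):
    # Original shader logic: if (x < 0.5) y = a * 2; else y = b + 1;
    if x < 0.5:
        return a * 2.0
    return b + 1.0


def by_multiplication(x, a, b):
    cond = float(x < 0.5)   # 1.0 or 0.0, like the result of SLT x, 0.5
    then_val = a * 2.0      # "if" branch, always executed
    else_val = b + 1.0      # "else" branch, always executed
    # Phi(then_val, else_val) expressed as a linear interpolation.
    return cond * then_val + (1.0 - cond) * else_val


def by_rollback(x, a, b):
    y = 0.0                 # variable written by both branches
    saved_y = y             # save it before the conditional
    y = a * 2.0             # execute the "if" branch
    then_y, y = y, saved_y  # keep its result, restore state for the "else"
    y = b + 1.0             # execute the "else" branch
    cond = float(x < 0.5)
    # Roll back to the value produced by the branch that should have run.
    return cond * then_y + (1.0 - cond) * y
```

Restoring before the else branch matters when that branch reads variables the if branch wrote. Note that the multiply form breaks if the unused value is Inf or NaN (0 * Inf is NaN), so a compare/select instruction such as CMP, where available, is the safer way to do the final selection.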

Note that my Git repository already contains an implementation of
branch emulation and some additional optimizations, see here:
http://cgit.freedesktop.org/~nh/mesa/log/?h=r300g-glsl

Shame on me for abandoning it. I should really get around to making
sure it fits in with recent changes and merging it into master. The main
problem is that it produces "somewhat" inefficient code. Adding and
improving peephole and similar optimizations should help tremendously.
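As a rough illustration of the kind of peephole cleanup that helps after branch emulation, here is a sketch over a hypothetical toy IR (this is not what the r300g-glsl branch or the r300 compiler actually implements). It folds multiplications by 1.0 or 0.0 and additions of 0.0 into MOVs, then drops MOVs of a register onto itself. Folding x * 0 to 0 assumes shader math where that identity holds; it is not IEEE-safe for Inf/NaN.

```python
# Toy IR (hypothetical): (opcode, dst, src0, src1); MOV ignores src1.
# Float operands are immediates, strings are register names.

def simplify(inst):
    """Rewrite one instruction into a cheaper equivalent, if possible."""
    op, dst, s0, s1 = inst
    if op == "MUL":
        if s0 == 1.0:
            return ("MOV", dst, s1, None)
        if s1 == 1.0:
            return ("MOV", dst, s0, None)
        if s0 == 0.0 or s1 == 0.0:
            return ("MOV", dst, 0.0, None)  # assumes 0 * x == 0
    if op == "ADD":
        if s0 == 0.0:
            return ("MOV", dst, s1, None)
        if s1 == 0.0:
            return ("MOV", dst, s0, None)
    return inst


def peephole(code):
    """Simplify each instruction and drop the ones that do nothing."""
    out = []
    for inst in code:
        inst = simplify(inst)
        if inst[0] == "MOV" and inst[1] == inst[2]:
            continue  # MOV rX, rX has no effect
        out.append(inst)
    return out
```

A real pass would also need to respect write masks and swizzles, and would typically be followed by copy propagation and dead-code elimination to remove the remaining MOVs.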

<snip>
>> > 2) Derivative instructions fix
>> >
>> > It's implemented but broken. From the docs: "If src0 is computed in the
>> > previous instruction, then a NOP needs to be inserted between the two
>> > instructions. Do this by setting the NOP flag in the previous instruction.
>> > This is not required if the previous instruction is a texture lookup."
>> > ...and that should be the fix.
>>
>> Is the only problem here that NOP is being inserted after texture
>> lookups when it shouldn't be?
>
> Well, the derivatives don't work, and a NOP is not being inserted anywhere.
> The quoted statement from the docs was supposed to give you a clue. A NOP
> after a texture lookup is *not required*, which means it would be silly to
> put it there, but it shouldn't break anything.

I seem to recall that there is a bit in the opcodes to get a NOP cycle
without actually inserting a NOP instruction. This might be less
efficient, though I've never actually tested it.
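A minimal sketch of the fix described in the quoted docs, over a hypothetical instruction list (the `Inst` class, its `nop` field, and the opcode sets are illustrative assumptions, not r300g structures): when a derivative instruction reads, as src0, the register written by the immediately preceding instruction, set the NOP flag on that previous instruction, unless it is a texture lookup. It compares whole register names and ignores write masks and swizzles.

```python
from dataclasses import dataclass, field

TEXTURE_OPS = {"TEX", "TXB", "TXP"}  # assumed texture-lookup opcodes
DERIV_OPS = {"DDX", "DDY"}


@dataclass
class Inst:
    opcode: str
    dst: str
    srcs: list = field(default_factory=list)
    nop: bool = False  # stand-in for the hardware NOP flag


def fix_derivative_hazards(code):
    """Set the NOP flag on instructions whose result feeds a derivative next."""
    for prev, cur in zip(code, code[1:]):
        if (cur.opcode in DERIV_OPS
                and cur.srcs
                and cur.srcs[0] == prev.dst
                and prev.opcode not in TEXTURE_OPS):
            prev.nop = True
    return code
```

Setting the flag on the producer rather than emitting a separate NOP instruction keeps the instruction count unchanged.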

cu,
Nicolai

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
