This is getting off-topic, but anyway...

Luca Barbieri wrote:
>> There are several deep challenges in making TGSI <-> LLVM IR translation
>> lossless -- I'm sure we'll get around to overcoming them -- but I don't
>> think that using LLVM is a requirement for this module. Having a shared
>> IR for a simple TGSI optimization module would go a long way by itself.
> 
> What are these challenges?

Control flow is hard.  Writing a TGSI backend for LLVM would be a lot 
of work.  Etc.


> If you keep vectors and don't scalarize, I don't see why it shouldn't
> just work, especially if you just roundtrip without running any
> passes.
> The DAG instruction matcher should be able to match writemasks,
> swizzles, etc. fine.
> 
> Control flow may not be exactly reconstructed, but I think LLVM has
> control flow canonicalization that should allow to reconstruct a
> loop/if control flow structure of equivalent efficiency.

LLVM only has branch instructions, while GPU instruction sets avoid 
branching and instead use explicit conditional and loop constructs.  
Analyzing the LLVM IR branches to reconstruct GPU loops and 
conditionals isn't easy.
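To make the problem concrete, here's a toy sketch (in Python, with a made-up CFG representation -- not the real LLVM API or real TGSI) of turning branch-based IR back into the structured IF/ELSE/ENDIF opcodes a GPU ISA wants.  Even the simplest pattern, an if/else "diamond", takes explicit pattern matching; real IR with arbitrary branch targets, loops, and early exits is far messier:

```python
# Hypothetical CFG: each block maps to (instructions, terminator).
# A terminator is ("br", cond, then_label, else_label),
# ("jmp", label), or ("ret",).  This format is invented for
# illustration only.
cfg = {
    "entry": (["TEMP = SLT r0, r1"],
              ("br", "TEMP", "then", "else")),
    "then":  (["MOV r2, r0"], ("jmp", "merge")),
    "else":  (["MOV r2, r1"], ("jmp", "merge")),
    "merge": (["MOV o0, r2"], ("ret",)),
}

def match_diamond(cfg, entry):
    """Recognize the simplest structured pattern: a conditional
    branch whose two successors both jump straight to a common
    merge block.  Anything less regular (shared blocks, breaks,
    irreducible control flow) needs much more analysis."""
    insns, term = cfg[entry]
    if term[0] != "br":
        return None
    _, cond, then_lbl, else_lbl = term
    then_insns, then_term = cfg[then_lbl]
    else_insns, else_term = cfg[else_lbl]
    if then_term[0] == "jmp" and then_term == else_term:
        merge = then_term[1]
        out = list(insns)
        out.append(f"IF {cond}")
        out += ["  " + i for i in then_insns]
        out.append("ELSE")
        out += ["  " + i for i in else_insns]
        out.append("ENDIF")
        out += cfg[merge][0]
        return out
    return None

structured = match_diamond(cfg, "entry")
print("\n".join(structured))
```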


> Using LLVM has the obvious advantage that all optimizations have
> already been written and tested.
> And for complex shaders, you may really need a good full optimizer
> (that can do inter-basic-block and interprocedural optimizations,
> alias analysis, advanced loop optimizations, and so on), especially if
> we start supporting OpenCL over TGSI.
> 
> There is also the option of having the driver directly consume the
> LLVM IR, and the frontend directly produce it (e.g. clang supports
> OpenCL -> LLVM).
> 
> Some things, like inlining, are easy to do directly in TGSI (but only
> because all regs are global).

Inlining isn't always easy.  The Mesa GLSL compiler inlines function 
calls whenever possible, but there are some tricky cases.  For 
example, if the function we want to inline has deeply nested early 
return statements, you have to convert the returns into something 
else (e.g. writes to a "returned" flag plus predicating the rest of 
the inlined body on it) to avoid mistakenly returning from the 
calling function.  The LLVM optimizer may handle this just fine, but 
translating the resulting LLVM IR back to TGSI could be hard (see above).
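Here's a small illustration of that flag-variable transformation (written in Python for concreteness; this is a sketch of the general technique, not what Mesa's GLSL compiler literally emits).  The nested `return 100` in the callee becomes a flag write, and the code after it is predicated on the flag so the caller keeps running:

```python
def callee(x):
    if x > 0:
        if x > 10:
            return 100        # deeply nested early return
        x = x + 1
    return x

def caller(x):
    return callee(x) + 1

def caller_inlined(x):
    # Mechanically inlined version: the nested early return becomes
    # `result = 100; returned = True`, and every later statement of
    # the callee is predicated on `not returned`.
    returned = False
    result = None
    if x > 0:
        if x > 10:
            result = 100
            returned = True
        if not returned:
            x = x + 1
    if not returned:
        result = x
    return result + 1         # caller's own code continues here
```

Mishandle that and the inlined `return` would bail out of the *caller*, which is exactly the bug Brian describes.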


> However, even determining the minimum number of loop iterations for
> loop unrolling is very hard to do without a full compiler.
> 
> For instance, consider code like this:
> if(foo >= 6)
> {
>   if(foo == 1)
>     iters = foo + 3;
>   else if(bar == 1)
>     iters = foo + 5 + bar;
>   else
>     iters = foo + 7;
> 
>    for(i = 0; i < iters; ++i) LOOP_BODY;
> 
> }
> 
> You need a non-trivial optimizer (with control flow support, value
> range propagation, and constant folding) to find out that the loop
> always executes at least 12 iterations, which you need to know to
> unroll it optimally.
> More complex examples are possible.

Yup, it's hard.
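For what it's worth, the bound in the quoted example checks out: under the guard `foo >= 6`, the `foo == 1` branch is dead, and both remaining branches yield at least `foo + 6 = 12`.  A brute-force Python sketch over a sample of the guarded range confirms it (a real optimizer would derive this symbolically via value-range propagation and constant folding, not by enumeration):

```python
def iters_for(foo, bar):
    """The quoted fragment as straight-line Python; only ever
    called with the guard foo >= 6 satisfied."""
    if foo == 1:              # dead: contradicts foo >= 6
        return foo + 3
    elif bar == 1:
        return foo + 5 + bar  # foo + 6, so >= 12
    else:
        return foo + 7        # foo + 7, so >= 13

# Sample the guarded region; the minimum trip count is reached
# at foo == 6, bar == 1.
minimum = min(iters_for(foo, bar)
              for foo in range(6, 50)
              for bar in range(-5, 6))
print(minimum)  # 12
```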


> In general, anything that requires (approximately) determining any
> property of the program potentially benefits from having the most
> complex and powerful optimizer available.

I also think that some optimizations are more effective if they're 
applied at a higher level (in the GLSL compiler, for example).  But 
that's another topic of conversation.

-Brian

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
