> On Tue, 2010-03-30 at 09:52 -0700, Luca Barbieri wrote:
>> > There are several deep challenges in making TGSI <-> LLVM IR translation
>> > lossless -- I'm sure we'll get around to overcoming them -- but I don't
>> > think that using LLVM is a requirement for this module. Having a shared
>> > IR for a simple TGSI optimization module would go a long way by itself.
>>
>> What are these challenges?
>
> - Control flow as you mentioned -- gets broken into jump spaghetti.

LoopSimplify seems to do at least some of the work for loops.

Not sure if there is an if-construction pass, but it should be relatively easy.

Once you have an acyclic CFG subgraph (which hopefully LoopSimplify
gives you easily), every basic block with more than one out-edge will
need an if/else construct generated for it.

Now find the first block, in topological sort order, that every path
from the branching block must pass through before reaching any later
block. I believe this is the (nearest common) post-dominator of the
branch targets, and LLVM has a post-dominator analysis that gives you
that easily.

After that, just duplicate the CFG between the branching block and that
post-dominator to build each branch of the if, and recursively process
the branches.
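
Something along these lines (an untested sketch; header paths and names
depend on the LLVM version, and emitIfElse() is a made-up placeholder for
the actual TGSI emission) should find the join block using LLVM's
post-dominator analysis:

  #include "llvm/IR/Function.h"
  #include "llvm/IR/Instructions.h"
  #include "llvm/Analysis/PostDominators.h"

  using namespace llvm;

  void structurizeConditionals(Function &F) {
    PostDominatorTree PDT;
    PDT.recalculate(F);

    for (BasicBlock &BB : F) {
      BranchInst *Br = dyn_cast<BranchInst>(BB.getTerminator());
      if (!Br || !Br->isConditional())
        continue;

      // The nearest common post-dominator of the two successors is the
      // block where both paths rejoin, i.e. where the reconstructed
      // if/else ends.
      BasicBlock *Join = PDT.findNearestCommonDominator(Br->getSuccessor(0),
                                                        Br->getSuccessor(1));

      // The CFG between BB and Join becomes the two branches; recurse
      // into each of them.
      // emitIfElse(&BB, Join);   // hypothetical
      (void)Join;
    }
  }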

If you end up with a DDX/DDY present in multiple if branches, you are
screwed, but that won't happen without optimization, and hopefully
fragment program optimization can be tuned so that it doesn't happen at
all.

> - Predicates can't be represented -- you need to use AND / NAND masking.
> I know people have asked support for this in the LLVM list so it might
> change someday.

For the LLVM->TGSI part, x86 has condition codes.
Not sure how LLVM represents them, but I suppose predicates can be
handled in the same way.

Multiple predicate registers may not work well, but GPUs probably
don't have them in hardware anyway (e.g. nv30/nv40 only have one or
two).

For the TGSI->LLVM part, Mesa never outputs predicates afaik.
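
If predicates ever do need to be translated, a select is probably the
simplest LLVM-side encoding. A minimal sketch (untested, helper name made
up): a predicated write becomes a select between the new result and the
old register contents, which a backend can pattern-match back into a
predicated move, or lower to the AND/NAND masking mentioned above.

  #include "llvm/IR/IRBuilder.h"

  using namespace llvm;

  // Emit "if (pred) dst = newVal;" without real predicate support in the IR.
  Value *emitPredicatedWrite(IRBuilder<> &B, Value *Pred,
                             Value *NewVal, Value *OldVal) {
    return B.CreateSelect(Pred, NewVal, OldVal, "pred.write");
  }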

> - missing intrinsics -- TGSI has a much richer instruction set than
> LLVM's builtin instructions, so it would be necessary to add several
> llvm.tgsi.xxx intrinsics (e.g., for max, min, madd, exp2, log2, etc),
> and teach LLVM to do constant propagation for every single one of them.

Yes, of course.
Initially you could do without constant propagation.
Also, again, x86/SSE has many of the same intrinsics, so their approach
can be imitated.

I think MAD can be handled by mul + add if you don't care whether an
extra rounding step is done (and I think that, for GPU shaders, it's not
really a high-priority issue).
Anyway, SSE5 has fused multiply/add, so LLVM has, or will have, a way to
express it.
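
For example (untested sketch, helper names made up), MAD and MAX could
initially be emitted with plain instructions, no new intrinsics needed:

  #include "llvm/IR/IRBuilder.h"

  using namespace llvm;

  // MAD as separate multiply and add -- fine if the extra rounding step
  // doesn't matter; an FMA-capable backend (or an llvm.fmuladd-style
  // intrinsic) can fuse them again later.
  Value *emitMAD(IRBuilder<> &B, Value *A, Value *X, Value *C) {
    Value *Mul = B.CreateFMul(A, X, "mad.mul");
    return B.CreateFAdd(Mul, C, "mad.add");
  }

  // MAX without a dedicated intrinsic: compare and select.
  Value *emitMAX(IRBuilder<> &B, Value *A, Value *X) {
    Value *Cmp = B.CreateFCmpOGT(A, X, "max.cmp");
    return B.CreateSelect(Cmp, A, X, "max.sel");
  }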

> - Constants -- you often want to make specialized version of shaders for
> certain constants, especially when you have control flow statements
> whose arguments are constants (e.g., when doing TNL with a big glsl
> shader), and therefore should be factored out. You may also want to
> factor out constant operations (e.g., MUL TEMP[1], CONST[0], CONST[1]).
> But LLVM can't help you with that given that for LLVM IR constants are
> ordinary memory, like the inputs. LLVM doesn't know that a shader will
> be invoked millions of times with the same constants but varying inputs.

If you want to do that, you must of course run LLVM for each constant
set, telling it what the constant values are.
You can probably identify the branch-relevant constants from the LLVM
SSA form to restrict that set.

For the MUL TEMP[1], CONST[0], CONST[1], I suppose you could enclose
the shader code in a big loop to simulate the rasterizer.

LLVM will then move the CONST[0] * CONST[1] outside the loop, and you
can codegen the part outside the loop with an LLVM CPU backend.
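
Roughly, the code you hand to LLVM would correspond to something like this
(a C-style sketch of the generated code, not real translator output):

  /* The shader body is wrapped in a loop over fragments, so anything that
   * only depends on constants is loop-invariant; LICM hoists it above the
   * loop, and that hoisted code is exactly the "pre-shader" you can run
   * once on the CPU per constant set. */
  void run_shader(const float *CONST, const float in[][4], float out[][4],
                  int num_fragments)
  {
    for (int i = 0; i < num_fragments; ++i) {
      /* MUL TEMP[1], CONST[0], CONST[1] -- invariant, gets hoisted */
      float temp1 = CONST[0] * CONST[1];

      /* ... rest of the shader body, reading in[i], writing out[i] ... */
      out[i][0] = temp1 * in[i][0];
    }
  }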

In this case, using LLVM will give you automatic "pre-shader"
generation for the CPU mostly for free.

Alternatively, you could have a basic IF-simplifier on TGSI that only
handles conditionals comparing a constant against something else (using
the rasterizer loop trick can give you simpler conditionals to start
with).
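
A sketch of the simplest case (untested; the structs below are made up for
illustration, not the real tgsi_* ones): when the IF operand is a constant
whose value is known for the specialization, keep the live branch and drop
the rest.

  #include <vector>

  // Hypothetical, flattened view of a TGSI shader -- just enough to show
  // the folding; real code would walk the actual TGSI tokens.
  struct Inst {
    enum Op { IF, ELSE, ENDIF, OTHER } op;
    bool src_is_const;    // IF operand comes from the constant file
    float const_value;    // its value for the constant set being specialized
  };

  // Assumes well-formed IF/ELSE/ENDIF nesting.
  std::vector<Inst> simplify_const_ifs(const std::vector<Inst> &in)
  {
    enum Mode { COPY, LIVE, DEAD };      // state of each open IF
    std::vector<Mode> stack;
    std::vector<Inst> out;

    auto emitting = [&]() -> bool {      // false while inside any dead branch
      for (Mode m : stack)
        if (m == DEAD)
          return false;
      return true;
    };

    for (const Inst &inst : in) {
      switch (inst.op) {
      case Inst::IF:
        if (inst.src_is_const) {
          // Fold: TGSI IF takes the branch when the operand is non-zero.
          stack.push_back(inst.const_value != 0.0f ? LIVE : DEAD);
        } else {
          stack.push_back(COPY);         // keep this IF as-is
          if (emitting())
            out.push_back(inst);
        }
        break;
      case Inst::ELSE:
        if (stack.back() == LIVE)        stack.back() = DEAD;
        else if (stack.back() == DEAD)   stack.back() = LIVE;
        else if (emitting())             out.push_back(inst);
        break;
      case Inst::ENDIF:
        if (stack.back() == COPY && emitting())
          out.push_back(inst);
        stack.pop_back();
        break;
      default:
        if (emitting())
          out.push_back(inst);
        break;
      }
    }
    return out;
  }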

> If people can make this TGSI optimization module work quickly on top of
> LLVM then it's fine by me. I'm just pointing out that between the
> extreme of sharing nothing between each pipe driver compiler, and
> sharing everything with LLVM, there's a middle ground which is sharing
> between pipe drivers but not LLVM.  Once that module exists having it
> use LLVM internally would then be pretty easy. It looks to me like a better
> way to parallelize the effort than to be blocked for quite some time on
> making TGSI <-> LLVM IR translation lossless.

Yes, sure, a minimal module can be written first and then LLVM use can
be investigated later.

In other words, it's not necessarily trivial, but it definitely seems doable.
In particular, getting it to work on anything non-GLSL should be
relatively straightforward.
