> On Tue, 2010-03-30 at 09:52 -0700, Luca Barbieri wrote:
>> > There are several deep challenges in making TGSI <-> LLVM IR
>> > translation lossless -- I'm sure we'll get around to overcoming
>> > them -- but I don't think that using LLVM is a requirement for this
>> > module. Having a shared IR for a simple TGSI optimization module
>> > would go a long way by itself.
>>
>> What are these challenges?
>
> - Control flow, as you mentioned -- gets broken into jump spaghetti.
LoopSimplify seems to do at least some of the work for loops. I'm not
sure there is an if-reconstruction pass, but writing one should be
relatively easy. Once you have an acyclic CFG subgraph (which hopefully
LoopSimplify gives you easily), every basic block with more than one
out-edge needs an if/else construct generated for it. Then find the
first block, in topological sort order, that every path from the
branching block reaches before reaching any later block in that order.
I think this is called the post-dominator (the "forward dominator"),
and LLVM should have an analysis that gives you it easily. After that,
duplicate the CFG between the branching block and its post-dominator to
build each arm of the if, and recurse into the arms.

If a DDX/DDY ends up duplicated into multiple arms of an if, you are
screwed, but that won't happen without optimization, and hopefully
fragment-program optimization can be tuned so it never happens.

> - Predicates can't be represented -- you need to use AND / NAND masking.
> I know people have asked support for this on the LLVM list so it might
> change someday.

For the LLVM->TGSI direction, x86 has condition codes; I'm not sure how
LLVM represents them, but I suppose predicates can be handled the same
way. Multiple predicate registers may not work well, but GPUs probably
don't have them in hardware anyway (e.g. nv30/nv40 only have one or
two). For the TGSI->LLVM direction, Mesa never outputs predicates afaik.

> - Missing intrinsics -- TGSI has a much richer instruction set than
> LLVM's builtin instructions, so it would be necessary to add several
> llvm.tgsi.xxx intrinsics (e.g., for max, min, mad, exp2, log2, etc.),
> and teach LLVM to do constant propagation for every single one of them.

Yes, of course. Initially you could do without constant propagation.
Also, x86/SSE again has many of the same intrinsics, so that approach
can be imitated.
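To illustrate the constant-propagation point, here is a minimal sketch
in Python. It is purely illustrative: the `FOLDERS` table and
`fold_intrinsic` helper are invented names, not LLVM or TGSI API; in a
real port each entry would become a constant-folding case for the
corresponding `llvm.tgsi.*` intrinsic.

```python
import math

# Hypothetical folding table: maps a TGSI-like opcode to the scalar
# function that evaluates it on constant operands.
FOLDERS = {
    "MAX": max,
    "MIN": min,
    "MAD": lambda a, b, c: a * b + c,
    "EX2": lambda a: 2.0 ** a,
    "LG2": lambda a: math.log2(a),
}

def fold_intrinsic(opcode, operands):
    """Fold one instruction if every operand is a known constant.

    Returns the folded float, or None if the opcode is unknown or some
    operand is still symbolic (represented here as a string).
    """
    if opcode not in FOLDERS:
        return None
    if not all(isinstance(o, float) for o in operands):
        return None
    return FOLDERS[opcode](*operands)
```

For example, `fold_intrinsic("MAD", [2.0, 3.0, 1.0])` folds to `7.0`,
while an instruction with a symbolic operand is left alone.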
I think MAD can be handled as mul + add, if you don't care whether an
extra rounding is done or not (and for GPU shaders I don't think that's
a high-priority issue). Anyway, SSE5 has fused multiply-add, so LLVM
has, or will have, a way to express it.

> - Constants -- you often want to make specialized versions of shaders
> for certain constants, especially when you have control flow statements
> whose arguments are constants (e.g., when doing TNL with a big GLSL
> shader), and therefore they should be factored out. You may also want
> to factor out constant operations (e.g., MUL TEMP[1], CONST[0],
> CONST[1]). But LLVM can't help you with that, given that for LLVM IR
> constants are ordinary memory, like the inputs. LLVM doesn't know that
> a shader will be invoked millions of times with the same constants but
> varying inputs.

If you want that, you must of course rerun LLVM for each constant set,
telling it what the constant values are. You can probably identify the
branch-relevant constants from the LLVM SSA form to restrict that set.

For the MUL TEMP[1], CONST[0], CONST[1] case, I suppose you could
enclose the shader code in a big loop that simulates the rasterizer.
LLVM will then move the CONST[0] * CONST[1] outside the loop, and you
can codegen the part outside the loop with an LLVM CPU backend. In this
case, using LLVM gives you automatic "pre-shader" generation for the
CPU mostly for free.

Alternatively, you could write a basic IF-simplifier on TGSI that only
supports conditionals comparing a constant to something else (the
rasterizer-loop trick can also get you simpler conditionals).

> If people can make this TGSI optimization module work quickly on top of
> LLVM then it's fine by me. I'm just pointing out that between the
> extreme of sharing nothing between each pipe driver compiler, and
> sharing everything with LLVM, there's a middle ground which is sharing
> between pipe drivers but not LLVM.
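Going back to the rasterizer-loop idea for a moment: what LICM would do
for you there can be mimicked by hand with a trivial pass over a
TGSI-like instruction list. This is a toy sketch under invented
conventions (the tuple-based instruction format and the
`split_preshader` name are assumptions of this example, not Gallium
code), and it assumes the body is already in execution order.

```python
# Toy TGSI-like instructions: (opcode, dst, src0, src1, ...), with
# register names as plain strings.  This mimics by hand what LLVM's
# LICM does once the shader body is wrapped in a rasterizer loop: any
# instruction whose sources are all CONST registers (or results of
# already-hoisted instructions) is loop-invariant and moves into a
# CPU-executed "pre-shader".
def split_preshader(body):
    invariant = set()  # dst registers whose values are loop-invariant

    def is_invariant(reg):
        return reg.startswith("CONST") or reg in invariant

    pre, per_fragment = [], []
    for op, dst, *srcs in body:
        if srcs and all(is_invariant(s) for s in srcs):
            invariant.add(dst)
            pre.append((op, dst, *srcs))
        else:
            per_fragment.append((op, dst, *srcs))
    return pre, per_fragment
```

For MUL TEMP[1], CONST[0], CONST[1] followed by ADD TEMP[2], TEMP[1],
IN[0], the MUL is hoisted into the pre-shader and the ADD stays
per-fragment, since TEMP[1] becomes invariant once its producer is
hoisted.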
> Once that module exists, having it use LLVM internally would then be
> pretty easy. It looks to me like a better way to parallelize the
> effort than to be blocked for quite some time on making TGSI <-> LLVM
> IR lossless.

Yes, sure: a minimal module can be written first, and LLVM use can be
investigated later.

In other words, it's not necessarily trivial, but it definitely seems
doable. In particular, getting it to work on anything non-GLSL should
be relatively straightforward.

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev