> Way back I actually looked into LLVM for R300. I was totally
> unconvinced by their vector support back then, but that may well have
> changed. In particular, I'm curious about how LLVM deals with
> writemasks. Writing to only a select subset of components of a vector
> is something I've seen in a lot of shaders, but it doesn't seem to be
> too popular in CPU-bound SSE code, which is probably why LLVM didn't
> support it well. Has that improved?
>
> The trouble with writemasks is that it's not something you can just
> implement one module for. All your optimization passes, from simple
> peephole to the smartest loop modifications need to understand the
> meaning of writemasks.

You should be able to just use shufflevector/insertelement/extractelement
to mix the newly computed values with the previous values in the vector
register (as well as doing swizzles).

There is also the option of immediately scalarizing, optimizing the
scalar code, and then revectorizing. This risks pessimizing the input
code, but might turn out to work well.

> I agree, though if I were to start an LLVM-based compilation project,
> I would do it for R600+, not for R300. That would be a very different
> kind of project.
>
> A LLVM->TGSI conversion is not the best way to go because TGSI doesn't
> match the hardware all that well, at least in the Radeon family.
> R300-R500 fragment programs have the weird RGB/A split, and R600+ is
> yet another beast that looks quite different from TGSI. So at least
> for Radeon, I believe it would be best to generate hardware-level
> instructions directly from LLVM, possibly via some Radeon-family
> specific intermediate representation.

The advantage of LLVM->TGSI would be that it works with all drivers
without any driver-specific code, so it probably makes sense as an
initial step.

nv30/nv40 fragment programs map almost directly to TGSI (with the
addition of condition codes, half-float precision, and a few other
things).

Things that end up using an existing graphics API, like VMware SVGA, or
that use the LLVM optimizer for game development, also need TGSI-like
output.

Thus, even if TGSI itself becomes irrelevant at some point, any
nontrivial parts of the LLVM->TGSI code would still be needed for those
cases.

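To make the shufflevector suggestion above a bit more concrete, here is a rough sketch in plain Python. It only models the lane-selection semantics of LLVM's shufflevector (result lane i is taken from the concatenation of the two inputs); it does not use any LLVM API, and the helper names write_masked and swizzle are made up for illustration.

```python
# Plain-Python model of LLVM's shufflevector lane selection.
# Helper names (write_masked, swizzle) are hypothetical, for illustration only.

def shufflevector(a, b, mask):
    """Select lanes from the concatenation of a and b, indexed by mask."""
    assert len(a) == len(b)
    joined = list(a) + list(b)
    return [joined[i] for i in mask]

def write_masked(dest, new, writemask):
    """Emulate a writemask: lane i comes from `new` if enabled, else from `dest`."""
    n = len(dest)
    # Indices 0..n-1 pick from dest, indices n..2n-1 pick from new.
    mask = [n + i if enabled else i for i, enabled in enumerate(writemask)]
    return shufflevector(dest, new, mask)

def swizzle(v, pattern):
    """Emulate a swizzle such as .wzyx; pattern is a string over 'xyzw'."""
    return shufflevector(v, v, ["xyzw".index(c) for c in pattern])

old = [1.0, 2.0, 3.0, 4.0]
computed = [10.0, 20.0, 30.0, 40.0]

# Equivalent of writing only the .xz components of the destination.
result = write_masked(old, computed, [True, False, True, False])
print(result)                # [10.0, 2.0, 30.0, 4.0]
print(swizzle(old, "wzyx"))  # [4.0, 3.0, 2.0, 1.0]
```

The point is that a masked write becomes an ordinary value-producing operation on whole vectors, so optimization passes would not need a separate notion of writemasks; whether a backend can then fold the shuffle back into a hardware writemask is a separate question.
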
_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev