> Way back I actually looked into LLVM for R300. I was totally
> unconvinced by their vector support back then, but that may well have
> changed. In particular, I'm curious about how LLVM deals with
> writemasks. Writing to only a select subset of components of a vector
> is something I've seen in a lot of shaders, but it doesn't seem to be
> too popular in CPU-bound SSE code, which is probably why LLVM didn't
> support it well. Has that improved?
>
> The trouble with writemasks is that they are not something you can
> support by implementing a single module. All your optimization passes,
> from simple peephole to the smartest loop modifications, need to
> understand the meaning of writemasks.

You should be able to just use
shufflevector/insertelement/extractelement to mix the new computed
values with the previous values in the vector register (as well as
doing swizzles).
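
As a rough illustration of that idea, here is a small Python model of
shufflevector semantics (not actual LLVM IR or LLVM API). The helper
names shufflevector() and write_masked() and the "xyzw" component
naming are assumptions made up for this sketch. A writemask becomes a
shuffle mask that picks the new value for written components and the
old value for the rest.

```python
COMPONENTS = "xyzw"

def shufflevector(a, b, mask):
    """Model of shufflevector: each index selects from a followed by b."""
    combined = list(a) + list(b)
    return [combined[i] for i in mask]

def write_masked(old, new, writemask):
    """Blend 'new' into 'old' for the components named in writemask."""
    n = len(old)
    # Index i selects old[i]; index n + i selects new[i].
    mask = [n + i if COMPONENTS[i] in writemask else i for i in range(n)]
    return shufflevector(old, new, mask)

old = [1.0, 2.0, 3.0, 4.0]
new = [10.0, 20.0, 30.0, 40.0]

# "dst.xz = new" keeps the old y and w components.
blended = write_masked(old, new, "xz")

# A swizzle such as ".zyxw" is a shuffle of a vector with itself.
swizzled = shufflevector(old, old, [2, 1, 0, 3])
```

Since the blend is then an ordinary vector operation, existing passes
would not need a separate notion of writemasks, at least in principle.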

There is also the option of immediately scalarizing, optimizing the
scalar code, and then revectorizing. This risks pessimizing the input
code, but might turn out to work well.
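
A toy sketch of the scalarize/revectorize round trip, again in Python
rather than as a real LLVM pass. The tuple instruction format
(op, dst, writemask, srcs) and both function names are hypothetical,
swizzles are ignored, and the scalar optimization step in between is
left out. The revectorizer only merges adjacent scalar operations that
share the same opcode and registers, so it is far more naive than
anything usable in practice.

```python
def scalarize(instrs):
    """Split each (op, dst, writemask, srcs) into one op per component."""
    scalars = []
    for op, dst, mask, srcs in instrs:
        for comp in mask:
            scalars.append((op, dst, comp, srcs))
    return scalars

def revectorize(scalars):
    """Greedily merge adjacent scalar ops on the same op and registers."""
    merged = []
    for op, dst, comp, srcs in scalars:
        if merged:
            p_op, p_dst, p_mask, p_srcs = merged[-1]
            same = (p_op, p_dst, p_srcs) == (op, dst, srcs)
            if same and comp not in p_mask:
                # Extend the previous instruction's writemask.
                merged[-1] = (op, dst, p_mask + comp, srcs)
                continue
        merged.append((op, dst, comp, srcs))
    return merged

program = [
    ("mul", "r0", "x", ("r1", "r2")),
    ("mul", "r0", "y", ("r1", "r2")),
    ("add", "r3", "xyzw", ("r0", "r1")),
]

scalar_code = scalarize(program)
vector_code = revectorize(scalar_code)
```

In this example the two single-component multiplies come back as one
instruction with writemask "xy", which is the kind of win one would
hope for. The pessimization risk shows up when the revectorizer fails
to rediscover a grouping the input already had.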

> I agree, though if I were to start an LLVM-based compilation project,
> I would do it for R600+, not for R300. That would be a very different
> kind of project.

> An LLVM->TGSI conversion is not the best way to go because TGSI doesn't
> match the hardware all that well, at least in the Radeon family.
> R300-R500 fragment programs have the weird RGB/A split, and R600+ is
> yet another beast that looks quite different from TGSI. So at least
> for Radeon, I believe it would be best to generate hardware-level
> instructions directly from LLVM, possibly via some Radeon-family
> specific intermediate representation.

The advantage of LLVM->TGSI would be that it works with all drivers
without any driver-specific code, so it probably makes sense as an
initial step.

nv30/nv40 fragment programs map almost directly to TGSI (with the
addition of condition codes, half-float precision, and a few other
things).

Targets that end up going through an existing graphics API, such as
VMware SVGA, or uses of the LLVM optimizer for game development, also
need TGSI-like output. Thus, even if TGSI itself becomes irrelevant at
some point, any nontrivial parts of the LLVM->TGSI code should still be
needed for those cases.

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev