On Thu, 2009-07-23 at 16:38 -0700, Zack Rusin wrote:
> On Thursday 23 July 2009 14:50:48 José Fonseca wrote:
> > On Thu, 2009-07-23 at 11:14 -0700, Zack Rusin wrote:
> > > Before anything else the problem of representation needs to be solved. The
> > > two step approach that the code in there started on using is again, imho,
> > > by far the best but it likely needs a solid discussion to get everyone on
> > > the same page.
> >
> > I don't think that representation is such a big problem. IMO, gallivm
> > should be just a library of TGSI -> LLVM IR building blocks. For
> > example, the class Instruction should be all virtuals, and a pipe driver
> > would override the methods it wants. LLVM IR -> hardware assembly
> > backend would then be necessary. If the hardware has higher level
> > statements which are not part of LLVM IR, then it should override and
> > generate the intrinsics itself, 
> I thought about that and discarded that for the following reasons:
> 1) it doesn't solve the main/core problem of the representation: how to
> represent vectors. Without that we can't generate anything. We are dealing
> with two main architectures here: mimd (e.g. nvidia) and simd (e.g.
> larrabee), with the latter coming in multiple permutations. For mimd the
> preferred layout will be simple AOS (x,y,z,w); for simd it will be
> vector-wide SOA (so for larrabee that would be (x,x,x,x, x,x,x,x, x,x,x,x,
> x,x,x,x)). So for SOA we'd likely need to scale vectors at least between
> 4 components (for simple sse) and 16 components. So it's not even certain
> that our vectors would have 4 components.
> 2) It means that the driver would have to be compiled with a C++ compiler.
> While obviously simply solvable by sprinkling tons of extern "C" everywhere,
> it makes the whole thing a lot uglier.
> 3) It means that Gallium's public interface is a combination of C and C++. So
> implementing Gallium means: oo C structures (p_context) and C++ classes,
> which quite frankly makes it just ugly. Interfaces are a lot like
> indentation: if they're not consistent, they're just difficult to read,
> understand and follow.
> So while I do like C++ a lot and would honestly prefer it all over the place,
> mixing languages like that, especially in an interface, is just not a good
> idea.
> > or even better, generate asm statements directly from those methods.
> That wouldn't work because LLVM wouldn't know what to do with them, which
> would defeat the whole reason for using LLVM (i.e. it would make optimization
> passes do nothing).
> > Currently, my main interest for LLVM is to speed up softpipe with the
> > TGSI -> SSE2 translation. I'd like to code generate with LLVM the whole
> > function to rasterize and render a triangle (texture sampling, fragment
> > shading, blending, etc). I'm not particularly worried about the
> > representation as vanilla LLVM can already represent almost everything
> > needed.
> That sounds great :) 
> For that you don't need gallivm at all though. Or do you want to mix the
> rasterization code with actual shading, i.e. inject the fragment shader into
> the rasterizer? I'm not sure if the latter would win us anything.
> If the shader didn't use any texture mapping then it's possible that you
> could get perfect cache-coherency when rasterizing very small patches, but
> texture sampling will thrash the caches anyway and it's going to make
> the whole process a lot harder to debug/understand.

There are a lot of choices about how much you incorporate into a single
function -- it seems to be a tradeoff between how much recompilation you
need to do (and how many compiled shaders you need to keep around) vs.
the overhead of function calls between separately compiled blobs.

At the moment we compile the shader but nothing else, which leaves a lot
of scope.  I'm thinking about an architecture that generates the
following independent functions:

  - shader:
        - parameter interpolation
        - traditional fragment shader
             - ** calls samplers **
        - alpha test
        - depth/stencil test
        - depth write
        - ** call color combine **

  - samplers
        - coordinate wrap
        - texel fetch from tile
        - min/mag/mip filter

  - color combine
        - color read
        - logicop/blend
        - color mask
        - color write

The shader-ztest and color-combine blobs would operate on a list of
quads within a tile, but the sampler functions would probably operate
one quad at a time.
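
To make the call structure of that first layout concrete, here is a rough
sketch in plain C rather than generated code.  Every name in it (struct quad,
struct blobs, shader_blob, the stub functions) is made up for illustration and
is not taken from Mesa or gallivm; parameter interpolation and the
depth/stencil steps are left out, and the "sampler" and "color combine" are
trivial stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

#define QUAD_SIZE 4                      /* one quad = 2x2 fragments */

/* Hypothetical per-quad data; not a real Mesa structure. */
struct quad {
    float    texcoord[QUAD_SIZE][2];     /* interpolated (s, t) per fragment */
    float    color[QUAD_SIZE][4];        /* RGBA result per fragment */
    uint32_t mask;                       /* bit i set = fragment i is live */
};

/* Separately compiled blobs, reached through function pointers. */
typedef void (*sampler_fn)(struct quad *q);                          /* one quad */
typedef void (*combine_fn)(const struct quad *quads, size_t count);  /* quad list */

struct blobs {
    sampler_fn sample;
    combine_fn color_combine;
};

static size_t fragments_written;         /* stand-in for the color buffer */

/* Stand-in sampler: (s, t) become red/green, s doubles as alpha. */
static void sample_stub(struct quad *q)
{
    for (int i = 0; i < QUAD_SIZE; i++) {
        q->color[i][0] = q->texcoord[i][0];
        q->color[i][1] = q->texcoord[i][1];
        q->color[i][2] = 0.0f;
        q->color[i][3] = q->texcoord[i][0];
    }
}

/* Stand-in color combine: only counts the fragments that survived. */
static void combine_stub(const struct quad *quads, size_t count)
{
    for (size_t n = 0; n < count; n++)
        for (int i = 0; i < QUAD_SIZE; i++)
            if (quads[n].mask & (1u << i))
                fragments_written++;
}

/* The "shader" blob: works on a list of quads, calls the sampler per quad,
 * runs the alpha test, then hands the whole list to color combine. */
static void shader_blob(const struct blobs *b, struct quad *quads,
                        size_t count, float alpha_ref)
{
    for (size_t n = 0; n < count; n++) {
        b->sample(&quads[n]);                        /* ** calls samplers ** */
        for (int i = 0; i < QUAD_SIZE; i++)          /* alpha test */
            if (quads[n].color[i][3] < alpha_ref)
                quads[n].mask &= ~(1u << i);
    }
    /* depth/stencil test and depth write would go here */
    b->color_combine(quads, count);                  /* ** call color combine ** */
}
```

Swapping the sampler or the color-combine blob only means changing a pointer
in struct blobs; the shader blob itself is untouched.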

You could argue that the fragment-shader and depth/stencil tests should
be kept in separate blobs so that you avoid just about all
non-orthogonal state in fragment shader compilation.  I'd be ok with
that too -- it would just mean that you call the fragment shader from
within the depth-stencil blob so that the early-z optimization is not
visible from outside:

   - shader
        - parameter interpolation
        - traditional shader
        - ** call sampler functions **

   - depth-stencil-alpha
        - integer z interpolation
        - early z test
        - ** call shader **
        - alpha test
        - late z test
        - depth write
        - ** call color combine **

   - samplers

   - color combine

With either of the above, you don't need to recompile the triangle
function -- it's basically invariant and just calls the current
depth-stencil-shader blob with a list of quads, each which is just

If you process 8 or more quads at a time, any overhead from not
co-generating the triangle and shader functions should go quickly to

At the very least, the above looks like a fairly manageable task -- we can get
there somewhat incrementally.
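
As a rough illustration of the invariant triangle function, the C sketch
below batches quads and hands each full batch to whatever blob is currently
bound.  All names here (struct raster_state, emit_quad, rasterize_box,
counting_blob) are hypothetical, the batch size of 8 is simply the figure
mentioned above, and the "triangle" walk is reduced to a bounding box with
every quad treated as fully covered.

```c
#include <stddef.h>

#define BATCH_QUADS 8                    /* assumed batch size */

/* Hypothetical quad descriptor: top-left pixel plus coverage mask. */
struct quad_ref {
    int      x, y;
    unsigned mask;
};

/* Signature of the currently bound depth-stencil-shader blob. */
typedef void (*quad_blob_fn)(const struct quad_ref *quads, size_t count);

struct raster_state {
    quad_blob_fn    run_quads;           /* rebinding this needs no recompile */
    struct quad_ref batch[BATCH_QUADS];
    size_t          used;
};

/* Hand any pending quads to the bound blob in a single call. */
static void flush_batch(struct raster_state *rs)
{
    if (rs->used) {
        rs->run_quads(rs->batch, rs->used);
        rs->used = 0;
    }
}

static void emit_quad(struct raster_state *rs, int x, int y, unsigned mask)
{
    rs->batch[rs->used++] = (struct quad_ref){ x, y, mask };
    if (rs->used == BATCH_QUADS)
        flush_batch(rs);
}

/* Stand-in for the invariant triangle function: walks a box in 2x2 steps.
 * A real version would do edge tests and produce partial masks. */
static void rasterize_box(struct raster_state *rs,
                          int x0, int y0, int x1, int y1)
{
    for (int y = y0; y < y1; y += 2)
        for (int x = x0; x < x1; x += 2)
            emit_quad(rs, x, y, 0xFu);
    flush_batch(rs);
}

/* Example blob that only records how often and with how much it was called. */
static size_t blob_calls, blob_quads;

static void counting_blob(const struct quad_ref *quads, size_t count)
{
    (void)quads;
    blob_calls++;
    blob_quads += count;
}
```

The indirect call is paid once per batch rather than once per quad, which is
the cost being traded against recompiling the triangle function for every
shader.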

When we *do* get there, we'll have a pretty fast rasterizer and will be
able to answer the question whether there's further benefit in inlining
all the generated code into single functions.

I suspect that inlining will help for very simple cases (eg. gouraud
shading where there effectively is no fragment shader), but for more
complicated shaders there won't be a benefit.

It's also interesting to try and figure out how to take advantage of
iterative parameter interpolation for things like calculating lambda
when sampling from an unmodified INPUT texcoord.
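
One possible reading of that idea, sketched in C under some assumptions: if
the texcoord comes straight from an INPUT, then s/w, t/w and 1/w are linear in
screen space, so the texcoord derivatives follow from the interpolation
coefficients by the quotient rule instead of from differences across a quad.
The names below (struct plane, plane_eval, rho_squared) are invented for the
sketch.  It returns the squared scale factor; lambda would then be
0.5 * log2(rho squared), plus any LOD bias.

```c
/* Hypothetical plane equation for one interpolated attribute. */
struct plane {
    float a0;      /* value at pixel (0, 0) */
    float dadx;    /* step per pixel in x */
    float dady;    /* step per pixel in y */
};

/* Direct evaluation; an iterative rasterizer would instead add dadx/dady
 * as it steps from pixel to pixel. */
static float plane_eval(const struct plane *p, float x, float y)
{
    return p->a0 + p->dadx * x + p->dady * y;
}

/* S = s/w, T = t/w, Q = 1/w are each linear in screen space, and the
 * texcoord is (S/Q, T/Q).  Returns the square of
 * rho = max(|d(u,v)/dx|, |d(u,v)/dy|) in texels per pixel. */
static float rho_squared(const struct plane *S, const struct plane *T,
                         const struct plane *Q, float x, float y,
                         float tex_width, float tex_height)
{
    float s_w = plane_eval(S, x, y);
    float t_w = plane_eval(T, x, y);
    float q   = plane_eval(Q, x, y);
    float inv_q2 = 1.0f / (q * q);

    /* Quotient rule: d(S/Q)/dx = (S_x * Q - S * Q_x) / Q^2 */
    float dudx = (S->dadx * q - s_w * Q->dadx) * inv_q2 * tex_width;
    float dudy = (S->dady * q - s_w * Q->dady) * inv_q2 * tex_width;
    float dvdx = (T->dadx * q - t_w * Q->dadx) * inv_q2 * tex_height;
    float dvdy = (T->dady * q - t_w * Q->dady) * inv_q2 * tex_height;

    float rx2 = dudx * dudx + dvdx * dvdx;
    float ry2 = dudy * dudy + dvdy * dvdy;
    return rx2 > ry2 ? rx2 : ry2;
}
```

In the non-perspective case (Q constant) the result is the same at every
pixel, so lambda could be computed once per triangle.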


Mesa3d-dev mailing list
