This patch series is intended to resolve the issue of semantic-based shader 
linkage in Gallium.
It can also be found in the RFC-gallium-semantics branch.

It does not change the current Gallium design, but rather formalizes some 
limitations to it, and provides infrastructure to implement this model more 
easily in drivers, along with a full nv30/nv40 implementation.

These limitations are added to allow an efficient implementation for both 
hardware lacking special support and hardware having support but also special 
constraints.

Note that this does NOT resolve all issues, and quite a bit is left to 
future refinement.

In particular, the following issues are still open:
1. COLOR clamping (and floating point framebuffers)
2. A linkage table CSO that allows specifying non-identity linkage
3. BCOLOR/FACE-related issues
4. Adding a cap to inform the state tracker that more than 219 generic indices 
are provided

This topic has already been discussed extensively.
See http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg10865.html 
for some early inconclusive discussion around an early implementation that 
modified the GLSL linker (which is NOT being proposed here).
See http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg12016.html 
for some more discussion that seemed to mostly reach a consensus over the 
approach proposed here.
See in particular 
http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg12041.html .

That said, I'm going to try to repeat all information here, partially by 
copy&pasting from earlier messages.
This message should probably be adapted into gallium/docs if/when this is 
accepted.

Here is the short summary; the long rationale follows after it.

The proposal here is to add the following limitations to Gallium, for the 
intermediate semantics:
1. TGSI_SEMANTIC_NORMAL is removed, using a commit by Michal Krol that was 
never merged
2. Every semantic except GENERIC, COLOR and BCOLOR can only be used with 
semantic index 0
3. COLOR and BCOLOR can only be used with semantic index 0-1 (note that this 
doesn't apply to fragment outputs)
4. GENERIC can be used with semantic indices 0-218 on any driver, if BCOLOR is 
not used
5. GENERIC can be used with semantic indices 0-216 on any driver, if BCOLOR IS 
used
6. GENERIC can be used with semantic indices 0-255 on almost all drivers (those 
that don't need the 0-218 limitation)
7. Some drivers may also choose to support GENERIC with arbitrary indices, but 
that should generally not happen
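
The portable part of these limits (points 2-5) can be checked mechanically.
What follows is only a minimal sketch and is not code from the patch series:
the enum, the constants and the function name are hypothetical stand-ins
(real code would use the TGSI_SEMANTIC_* values), and the fragment output
exception from point 3 is deliberately not handled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the TGSI_SEMANTIC_* values. */
enum sem_name {
   SEM_POSITION, SEM_COLOR, SEM_BCOLOR, SEM_FOG, SEM_PSIZE,
   SEM_GENERIC, SEM_FACE, SEM_EDGEFLAG, SEM_PRIMID, SEM_INSTANCEID
};

/* Highest GENERIC index that every driver is expected to accept. */
#define MAX_PORTABLE_GENERIC_NO_BCOLOR 218u
#define MAX_PORTABLE_GENERIC_BCOLOR    216u

/*
 * Return true if (name, index) stays within the portable limits.
 * uses_bcolor tells whether BCOLOR is used at all.
 * Fragment outputs are not handled here.
 */
static bool
semantic_is_portable(enum sem_name name, unsigned index, bool uses_bcolor)
{
   switch (name) {
   case SEM_GENERIC:
      return index <= (uses_bcolor ? MAX_PORTABLE_GENERIC_BCOLOR
                                   : MAX_PORTABLE_GENERIC_NO_BCOLOR);
   case SEM_COLOR:
   case SEM_BCOLOR:
      return index <= 1;
   default:
      /* Every other semantic is a singleton. */
      return index == 0;
   }
}
```

A state tracker could run such a check over its declarations and renumber or 
fall back when it fails; a driver accepting 0-255 would simply use a larger 
GENERIC bound.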

The reason for this, in short, is that this maps directly to DirectX 9 SM3, 
which is the most problematic interface of all.

The peculiar problem we have here is that we have two competing constraints 
that force us into choosing the exact SM3 value:
1. The VMware SVGA driver must deal with an SM3 host interface and would 
ideally want to directly feed the Gallium semantics to the host
2. A hypothetical DirectX 9 state tracker needs to support SM3 and would 
ideally want to directly feed the SM3 semantics to Gallium

Note that this is not a reference to the VMware DirectX 9 state tracker, since 
its authors haven't provided details about its handling of shader semantics.

SM3 ends up supporting 219 generic indices: 16 indices in 14 classes, minus 
POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 which are the only ones that 
wouldn't be mapped to GENERIC.
However, Gallium drivers that, unlike svga and r600, don't benefit from 
having specific constraints are supposed to support 256 indices, and my 
nv30/nv40 work does that.
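
To make the 219 figure concrete: 14 * 16 = 224 name/index pairs, minus the 5 
special ones, leaves 219 pairs for GENERIC. The sketch below maps a GENERIC 
index to the n-th non-special pair. It is an illustration only: the packed 
layout (usage * 16 + index) and the helper names are my assumptions here and 
not necessarily what the tgsi patch in this series does, and the usage 
numbers are assumed to follow the D3DDECLUSAGE ordering (POSITION = 0, 
PSIZE = 4, COLOR = 10, FOG = 11).

```c
#include <stdbool.h>

#define SM3_USAGES   14  /* semantic classes 0-13 */
#define SM3_INDICES  16  /* indices 0-15 per class */

/* Assumed packing for illustration: usage * 16 + index. */
#define SM3_PACK(usage, index) ((unsigned)(usage) * SM3_INDICES + (index))

/* Usage numbers assumed to follow the D3DDECLUSAGE ordering. */
enum { SM3_POSITION = 0, SM3_PSIZE = 4, SM3_COLOR = 10, SM3_FOG = 11 };

/* True for the five pairs that map to non-GENERIC Gallium semantics. */
static bool
sm3_is_special(unsigned packed)
{
   return packed == SM3_PACK(SM3_POSITION, 0) ||
          packed == SM3_PACK(SM3_PSIZE, 0) ||
          packed == SM3_PACK(SM3_COLOR, 0) ||
          packed == SM3_PACK(SM3_COLOR, 1) ||
          packed == SM3_PACK(SM3_FOG, 0);
}

/*
 * Map a GENERIC index to the packed value of the n-th non-special
 * SM3 pair. Returns -1 if the index does not fit (219 and above).
 */
static int
generic_to_sm3(unsigned generic_index)
{
   unsigned packed, seen = 0;

   for (packed = 0; packed < SM3_USAGES * SM3_INDICES; packed++) {
      if (sm3_is_special(packed))
         continue;
      if (seen++ == generic_index)
         return (int)packed;
   }
   return -1;
}
```

A real implementation would more likely use a precomputed table than a loop; 
the loop is only meant to make the counting explicit.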

The expected implementation, if no hardware support exists, is to build a list 
of relocations to apply to either the fragment or the vertex shader, and patch 
one of them at validation time to match the other.
Data structures are provided in gallium/auxiliary to ease this, and try to 
minimize the number of times where this needs to be performed.
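
As a rough illustration of the relocation approach, here is a minimal sketch 
that patches fragment program input registers to match the slots chosen by 
the vertex shader. Everything in it is hypothetical: the struct, the names 
and in particular the instruction encoding (input register in the low 4 bits 
of a word) are invented for the example and reflect neither the nv30/nv40 
hardware nor the actual gallium/auxiliary data structures.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_GENERIC 256   /* index range supported by this sketch */
#define SLOT_NONE   0xff  /* vertex shader does not write this semantic */

/* One fragment-program location that reads a GENERIC input. */
struct fp_reloc {
   size_t  word;     /* index of the instruction word to patch */
   uint8_t generic;  /* GENERIC semantic index read there */
};

/* Invented encoding: the input register lives in the low 4 bits. */
#define FP_INPUT_MASK 0xfu

/*
 * Patch every relocated word so that it reads the hardware slot the
 * vertex shader assigned to the same GENERIC index.
 * Returns the number of relocations whose input the vertex shader
 * does not provide; those words are left untouched.
 */
static unsigned
fp_apply_relocs(uint32_t *code,
                const struct fp_reloc *relocs, size_t num_relocs,
                const uint8_t vp_slot[NUM_GENERIC])
{
   unsigned missing = 0;
   size_t i;

   for (i = 0; i < num_relocs; i++) {
      uint8_t slot = vp_slot[relocs[i].generic];

      if (slot == SLOT_NONE) {
         missing++;
         continue;
      }
      code[relocs[i].word] =
         (code[relocs[i].word] & ~FP_INPUT_MASK) | (slot & FP_INPUT_MASK);
   }
   return missing;
}
```

The relocation list is built once when the fragment program is translated, so 
validation only has to walk it when the bound vertex shader's slot assignment 
differs from the one the code was last patched for.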

Let's now proceed to the discussion and detailed rationale, mostly constructed 
by copy&pasting older messages.

===============
Michal Krol's proposal
===============

First of all, see Michal Krol's proposal at 
http://www.opensource-archive.org/showthread.php?t=148573, and in particular:
<<
name index range
----------------------------
POSITION no limit?
COLOR 0..1, explicit clamp?
BCOLOR 0..1, explicit clamp?
FOG remove?
PSIZE 0
GENERIC 0..<max generics>
NORMAL remove
FACE 0
EDGEFLAG 0
PRIMID 0
INSTANCEID 0
>>

My proposal follows this, except for limiting POSITION to 0 too.
Not sure why Michal thought "no limit" could make sense: the POSITION is 
fundamentally a singleton, since it is the input to the rasterizer unit.


======================
An overview of hardware support
======================

Hardware with no capabilities.
- nv30 does not support any mapping. However, we already need to patch
fragment programs to insert constants, so we can patch input register
numbers as well. The current driver only supports 0-7 generic indices,
but I already implemented support for 0-255 indices with in-driver
linkage and patching. Note that nv30 lacks control flow in fragment
programs.
- nv40 is like nv30, but supports fp control flow, and may have some
configurable mapping support, with unknown behavior

Hardware with capabilities that must be configured for each fp/vp pair.
- nv40 might have this but the nVidia OpenGL driver does not use it
- nv50 has configurable vp->gp and gp->fp mappings with 64 entries.
The current Gallium driver seems to support arbitrary 0-2^32 indices, but uses 
an inefficient O(n^2) algorithm to be able to do that

- r300 appears to have a configurable vp->fp mapping. The current
driver only supports 0-15 generic indices, but redefining
ATTR_GENERIC_COUNT could be enough to have it support larger numbers.

Hardware with automatic linkage when semantics match:
- VMware svga appears to support 14 * 16 semantics, but the current
driver only supports 0-15 generic indices. This could be fixed by
mapping GENERIC into all non-special SM3 semantics.

Hardware that can do both configurable mappings and automatic linkage:
- r600 supports linkage in hardware between matching apparently
byte-sized semantic ids

Other hardware:
- i915 has no hardware vertex shading
The current driver is broken and only supports 0-7 indices: this seems
easy to fix though
- Not sure about i965

===================
An overview of software APIs
===================

1. DirectX 9 SM3 supports indices in the 0-15 range associated with 
semantics in the 0-13 range.

A few of the name/index pairs have special meanings, but the others
are just cosmetic as long as the fixed pipeline is not used.

Thus, SM3 wants to use 14 * 16 indices overall.

Of these, POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 map to non-GENERIC
semantics, leaving 219 semantics handled by GENERIC

2. SM2 and non-GLSL OpenGL just want to use as many indices as the
hardware interpolator count, sometimes limiting that further

They are the easiest and most straightforward ones.

3. DirectX 10 seems to only require a 0-31 range.

In particular, the fxc.exe compiler allows specifying arbitrary _strings_ and
32-bit indices.

However, this information is encoded as metadata in the output file, and
the shader bytecode itself uses integers in the 0-31 range to refer to the
metadata.

It seems that the metadata is resolved by the Microsoft DirectX 10 runtime,
and the driver only sees 0-31 indices on the DDI interface.

However, this is a bit unclear: confirmation or correction would be
appreciated.

4. GLSL requires both shaders to be provided at link time, and thus does
not constrain the implementation in any way.

However, it may be possible to mix GLSL with other shaders, leading to
the need to reserve the texcoord slots.

In that case, GLSL will need about 8 more slots than the number of
effectively used semantics.

This is the case with the current Mesa/Gallium implementation.

5. GLSL with EXT_separate_shader_objects does not add requirements
because only gl_TexCoord and other builtin varyings are supported.
User-defined varyings are not supported

See in particular the following text from the extension:
<<
        It is undesirable from a performance standpoint to attempt to
        support "rendezvous by name" for arbitrary separate shaders
        because the separate shaders won't be naturally compiled to
        match their varying inputs and outputs of the same name without
        a special link step.  Such a special link would introduce an
        extra validation overhead to binding separate shaders.  The link
        itself would have to be deferred until glBegin time since separate
        shaders won't match when transitioning from one set of consistent
        shaders to another.  This special link would still create errors
        or undefined behavior when the names of input and output varyings
        matched but their types did not match.
>>

6. A hypothetical version of EXT_separate_shader_objects extended to
support user-defined varyings would either want arbitrary 32-bit
generic indices (by interning strings to generate the indices) or the
ability to specify a custom mapping between shader indices.

7. A hypothetical "no-op" implementation of the GLSL linker would have
the same requirement.
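
To illustrate what "interning strings to generate the indices" could look 
like, here is a minimal sketch of an intern table that hands out sequential 
indices for varying names. It is not taken from Mesa: the names and the fixed 
capacities are hypothetical, and a real implementation would use a hash table 
and grow dynamically.

```c
#include <string.h>

#define MAX_VARYINGS 64
#define MAX_NAME_LEN 32

/* Tiny intern table: each distinct name gets the next sequential index. */
struct varying_table {
   char     names[MAX_VARYINGS][MAX_NAME_LEN];
   unsigned count;
};

/*
 * Return the index for name, adding it if it has not been seen yet.
 * Returns -1 if the table is full or the name is too long.
 */
static int
varying_intern(struct varying_table *t, const char *name)
{
   unsigned i;

   for (i = 0; i < t->count; i++)
      if (strcmp(t->names[i], name) == 0)
         return (int)i;

   if (t->count == MAX_VARYINGS || strlen(name) >= MAX_NAME_LEN)
      return -1;

   strcpy(t->names[t->count], name);
   return (int)t->count++;
}
```

If separately compiled vertex and fragment shaders share one such table, a 
varying of the same name gets the same GENERIC index on both sides, but the 
indices grow with the number of distinct names ever seen, which is why this 
model wants a large index range.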


====================
About non-GENERIC semantics
====================

Also note that non-GENERIC semantics have peculiar properties.

For COLOR and BCOLOR:
1. SM3 and OpenGL with glClampColor appropriately set want it to
_not_ be clamped to [0, 1]
2. SM2 and normal OpenGL apparently want it to be clamped to [0, 1]
(sometimes for fixed point targets only) and may also allow using
U8_UNORM precision for it instead of FP32
3. OpenGL allows enabling two-sided lighting, in which case COLOR in
the fragment shader is automagically set to BCOLOR for back faces
4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING.
Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware.
The latest hardware probably supports FACING only.

Any API that requires special semantics for COLOR and BCOLOR (i.e.
non-SM3) seems to only want 0-1 indices.

Note that SM3 does *not* include BCOLOR, so basically the limits for
generic indices would need to be conditional on BCOLOR being present
or not (e.g. if it is present, we must reserve two semantic slots in
svga for it).

POSITION0 is obviously special.
PSIZE0 is also special for points.

FOG0 seems right now to just be a GENERIC with a single component.
Gallium could be extended to support fixed function fog, which most
DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal
to the semantic issue.

==============
Current Gallium users
==============

Right now no open-source users of Gallium fundamentally require arbitrary 
indices.
In particular:
1. GLSL and anything with similar link-by-name can of course be modified to use 
sequential indices
2. ARB fragment program and vertex program use index-limited texcoord slots
3. g3dvl needs and uses 8 texcoord slots, indices 0-7
4. vega and xorg use indices 0-1
5. DX10 seems to restrict semantics to 0-N range, if I'm not mistaken
6. The GL_EXT_separate_shader_objects extension does not provide
arbitrary index matching for GLSL, but merely lets it use a model
similar to ARB fp/vp

However, the GLSL linker needs them in its current form, and the capability can 
be generally useful anyway.

===================
Discussion of possible options
===================

[Options from Keith Whitwell, see 
http://www.opensource-archive.org/showthread.php?p=180719]
a) Picking a lower number like 128, that an SM3 state tracker could
usually be able to directly translate incoming semantics into, but which
would force it to renumber under rare circumstances. This would make
life easier for the open drivers at the expense of the closed code.

b) Picking 256 to make life easier for some closed-source SM3 state
tracker, but harder for open drivers.

c) Picking 219 (or some other magic number) that happens to work with
the current set of constraints, but makes gallium fragile in the face of
new constraints.

d) Abandoning the current gallium linkage rules and coming up with
something new, for instance forcing the state trackers to renumber
always and making life trivial for the drivers...

[Options from me]

(e) Allow arbitrary 32-bit indices. This requires slightly more
complicated data structures in some cases, and will require svga and
r600 to fall back to software linkage if numbers are too high.

(f) Limit semantic indices to hardware interpolators _and_ introduce
an interface to let the user specify an

Personally I think the simplest idea for now could be to have all
drivers support 256 indices or, in the case of r600 and svga, the
maximum value supported by the hardware, and expose that as a cap (as
well as another cap for the number of different semantic values
supported at once).
The minimum guaranteed value is set to the lowest hardware constraint,
which would be svga with 219 indices (assuming no bcolor is used).
If some new constraints pop up, we just lower it and change SM3 state
trackers to check for it and fall back otherwise.

This should just require simple fixes to svga and r300, and
significant code for nv30/nv40, which is however already implemented.

Luca Barbieri (5):
  tgsi: formalize limits on semantic indices
  tgsi: add support for packing semantics in SM3 byte values
  gallium/auxiliary: add semantic linkage utility code
  nvfx: support proper shader linkage - adds glsl support
  nvfx: expose GLSL

Michal Krol (1):
  gallium: Remove TGSI_SEMANTIC_NORMAL.

