> Where the semantic indicates some relationship to actual system resources, I 
> agree that the number is constrained by the number of those system resources. 
>  In the case of the gallium "GENERIC" semantic, there is explicitly no system 
> resource that semantic is referring to and hence no limit on the index.

GENERIC[i] refers to a slot in the output register file of the vertex
shader or a slot in the input register file of the fragment shader.
It also refers to the interpolator unit that interpolates data between
those two registers.

Since interpolators are usually available only in finite number, and
register files are usually physically limited as well, there should be
a limit on the index.
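To make this model concrete, here is a minimal C sketch. It assumes 8 interpolators (an arbitrary number chosen for illustration), and the names vec4 and interpolate_generic are hypothetical, not taken from any Gallium driver. Interpolator k reads slot k of the vertex shader output registers of the three vertices, blends the values with the fragment's barycentric weights, and writes slot k of the fragment shader input registers. An index with no slot has nowhere to go.

```c
#include <stdbool.h>

/* Assumed hardware limit, for illustration only: the number of
 * interpolators, which here also sizes the VS output and FS input
 * register files that hold GENERIC values. */
#define NUM_INTERPOLATORS 8

typedef struct { float c[4]; } vec4;

/* Interpolator k reads slot k of the vertex shader output register file
 * (for each of the three triangle vertices), blends the values with the
 * barycentric weights of the current fragment, and writes slot k of the
 * fragment shader input register file. GENERIC[k] is only meaningful if
 * k names an existing slot and interpolator. */
static bool interpolate_generic(unsigned k,
                                vec4 vs_out[3][NUM_INTERPOLATORS],
                                const float bary[3],
                                vec4 fs_in[NUM_INTERPOLATORS])
{
    if (k >= NUM_INTERPOLATORS)
        return false; /* no register slot and no interpolator: reject */

    for (int i = 0; i < 4; i++)
        fs_in[k].c[i] = bary[0] * vs_out[0][k].c[i] +
                        bary[1] * vs_out[1][k].c[i] +
                        bary[2] * vs_out[2][k].c[i];
    return true;
}
```

Perspective correction is omitted; the point is only that the index selects a physical slot and a physical interpolator.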

> I feel like we're going in circles here.  We agree that we want to improve 
> linkage, you have a patch that helps your driver, but please accept that it 
> doesn't solve the wider problem.

Yes.
In the following I try to write out my reasoning step by step, in the
hope of making it clearer and making it easier to both establish which
facts we agree are true and pinpoint what we may disagree on.

This is a list of steps that lead me to the conclusion that it is best
to change the Gallium rules so that semantic indices must be in the
range 0..N-1, where N is the maximum number of simultaneously
available registers with that semantic, to apply my GLSL patch to fix
GLSL, and, after doing that, to consider extending Gallium by letting
the user specify a routing table to link these limited-index semantics
with something other than an identity mapping.

Please tell me which points you find incorrect, or why any deduction
does not follow from its antecedents.

1. All the semantic indices in OpenGL are limited, according to the
ARB specification
2. All the semantic indices in DirectX 9/10 are limited, according to
http://msdn.microsoft.com/en-us/library/ee418355%28VS.85%29.aspx
3. In the OpenGL/DirectX 9/10 model, there are a fixed number of
interpolators, numbered from 0 to N - 1. Interpolator K reads from
vertex shader output register K, interpolates and writes to fragment
shader input register K.
4. Some cards (e.g. r300), but not all, allow configuring the vertex
shader output register and the fragment shader input register that
interpolator K reads and writes.
5. Such register inputs are usually offsets in a physical register
file, and thus are limited to the physical size of that register file
6. No API exposes the functionality in (4.); all expose the more
rigid model in (3.)
7. Gallium GENERIC is equivalent to OpenGL texcoord and DirectX 9/10
TEXCOORD semantics
8. "texcoord" is so named for historical reasons, since fixed-function
pipelines could use the values only for texture sampling.
GENERIC is called GENERIC instead of TEXCOORD because Gallium was
designed with a programmable pipeline in mind.
9. The current Mesa implementation of ARB_fp/vp translates texcoord[i]
to GENERIC[i]
10. fragment.texcoord[K] has K limited by GL_MAX_TEXTURE_COORDS_ARB
11. Because of (9.) and (10.), the current Mesa implementation of
ARB_vp/fp uses GENERIC indices limited by GL_MAX_TEXTURE_COORDS_ARB
(perhaps plus a very small constant)
12. Because of (2.), a straightforward Gallium DirectX state tracker
would also use GENERIC indices limited by the number of interpolators
13. If GLSL did not reserve semantic indices for unused gl_TexCoord[]
varyings, but allocated varyings sequentially, then it would use
semantic indices sequentially starting from 0
14. My patch implements (13.)
15. The xorg, vega and g3dvl state trackers use GENERIC indices
starting from 0 up to 1, 1 and 7 respectively
16. Because of (11.), (12.), (13.), (14.) and (15.), after applying my
patch, limiting the value of GENERIC semantic indices to the number of
interpolators would not adversely affect Mesa/Gallium functionality in
any way, probably including the VMware DirectX state tracker
17. Driver code would be simplified by not having to worry about any
register semantic remapping. It would be possible to compile fragment
and vertex shaders separately on all hardware. The CPU usage of all
drivers would be reduced, especially when switching shaders (a fast
path!)
18. Thus, (16.) is a net gain for Gallium, and should go forward
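Point (13.) can be sketched roughly as follows. This is a hypothetical illustration in C, not the code in my patch or in Mesa's GLSL linker; struct varying and allocate_generics_sequentially are made-up names. Only used varyings receive GENERIC indices, handed out in order from 0, so the highest index is the number of used varyings minus one, and a bound of N interpolators can be checked directly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one varying written by the vertex shader
 * and read by the fragment shader. */
struct varying {
    const char *name;
    bool used;
    int generic_index; /* output: assigned GENERIC index, or -1 if unused */
};

/* Sequential allocation: only varyings that are actually used get a
 * GENERIC index, handed out in order starting from 0. Returns the number
 * of indices used, or -1 if more than max_generic would be needed (the
 * shader pair cannot be linked on this hardware). */
static int allocate_generics_sequentially(struct varying *v, size_t count,
                                          int max_generic)
{
    int next = 0;
    for (size_t i = 0; i < count; i++) {
        if (!v[i].used) {
            v[i].generic_index = -1; /* no slot reserved for unused ones */
            continue;
        }
        if (next >= max_generic)
            return -1;
        v[i].generic_index = next++;
    }
    return next;
}
```

For example, a shader pair using only gl_TexCoord[5] and one user varying gets GENERIC[0] and GENERIC[1], rather than keeping GENERIC[5] for gl_TexCoord[5] as the texcoord-reserving scheme would.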

[Note: my current nv40 tree does exactly this (16.), and it does not
seem to cause any problems]

Points that lead me to propose a routing table CSO *IN ADDITION* to
applying my GLSL patch:

19. Some current 3D APIs (ARB_fp/vp, DX PS,
EXT_separate_shader_objects) link vertex and fragment shaders by
matching physical register file offsets, limited to index N - 1, where
N is the maximum number of usable variables (see (3.))
20. Other 3D APIs (GLSL) link by matching variable names. This is why
unextended GLSL requires both the fragment and vertex shaders to be
provided at once in the link step
21. No API links by matching abstract unlimited variable number,
except some Gallium driver interfaces such as r300
22. It would be desirable to further extend
GL_EXT_separate_shader_objects to support linking user-defined
varyings *without* needing a link step that sees both vertex and
fragment shader at once
23. Binding by unlimited semantic index does not solve (22.), because
we want to link by matching variable names, not numbers
24. Turning semantic indices into strings to obviate (23.) is
undesirable, because it's obviously something the hardware won't
accelerate and the shading language may have more complex linking
rules than just strcmp
25. Binding by unlimited semantic index is thus not useful at all. All
APIs that can be implemented with that can also work with the more
straightforward sequential semantic indices, and the other
APIs/functionality cannot be implemented with either
26. If we had the ability to configure interpolator K to read vertex
shader output V[K] and write fragment shader input F[K], we could do
GLSL binding by name, compiling the shaders once and only changing the
routing tables in the link step
[this is accomplished by allocating varyings sequentially in each
shader, recording the allocation in a dictionary keyed by variable name
and associated with each state tracker shader structure, and then, at
link time, emitting a routing entry for each variable name common to
both dictionaries]
27. Such routing table functionality is thus greatly superior to
binding by unlimited integer semantic indices
28. r300 and maybe other hardware provides exactly that functionality
29. Such routing table functionality would however be unnecessary for
the vast majority of OpenGL applications
30. Because of (23.) and (27.), the best solution is to start by
changing the Gallium rules so that semantic indices must be in the
range 0..N-1, where N is the maximum number of simultaneously
available registers with that semantic, and by applying my GLSL patch
to fix GLSL
31. After doing that, it might be desirable to extend Gallium by
letting the user specify a routing table
32. Since the routing table would be specific to the fragment/vertex
shader pair, it is desirable to include the shader handles when
creating the routing table CSO, so that the driver may better process
it
33. It may be desirable to add light_twoside to the routing table,
moving it out of the rasterizer
34. It may be desirable to optionally extend light_twoside, by
supporting two separate routing tables, one for each face
35. It may be desirable to optionally support swizzling during routing
36. It may be desirable to optionally support routing specific scalar
components of vec4 registers, for use with nv50 and other scalar
architectures
37. We actually need multiple routing tables. Ideally, an
input->vertex shader one, a vertex->geometry shader one, a
geometry->fragment one, and a fragment->render targets one.
38. We may even want some kind of sampler remapping table, and maybe a
constant buffer remapping table
39. All this functionality should either have separate caps, or we
should figure out if there are some subsets that are always supported
together in hardware
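The routing-table linking of (26.) could look roughly like the following minimal C sketch. Everything in it is hypothetical (shader_varyings, route, build_routing_table, and MAX_VARYINGS = 16); it is not an existing Gallium interface. Each shader is compiled alone and records which register it gave each varying name; the link step then matches names on the CPU and emits one entry per fragment shader input, so the hardware only ever sees small integer register indices, in line with (24.).

```c
#include <string.h>

#define MAX_VARYINGS 16 /* assumed limit, for illustration only */

/* Per-shader dictionary built when each shader is compiled on its own:
 * the varying at position i was allocated register index i. */
struct shader_varyings {
    const char *names[MAX_VARYINGS];
    unsigned count;
};

/* One routing entry: the interpolator reads VS output register vs_reg
 * and writes FS input register fs_reg. */
struct route {
    unsigned vs_reg;
    unsigned fs_reg;
};

/* Link step: for every fragment shader input, find the vertex shader
 * output with the same name and emit a routing entry; entry n configures
 * interpolator n. Neither shader is recompiled, only the table changes.
 * Returns the number of entries, or -1 if an FS input has no matching
 * VS output or the table is too small. */
static int build_routing_table(const struct shader_varyings *vs,
                               const struct shader_varyings *fs,
                               struct route *table, unsigned max_routes)
{
    unsigned n = 0;
    for (unsigned f = 0; f < fs->count; f++) {
        int match = -1;
        for (unsigned v = 0; v < vs->count; v++) {
            if (strcmp(fs->names[f], vs->names[v]) == 0) {
                match = (int)v;
                break;
            }
        }
        if (match < 0 || n >= max_routes)
            return -1;
        table[n].vs_reg = (unsigned)match;
        table[n].fs_reg = f;
        n++;
    }
    return (int)n;
}
```

Swizzles (35.), per-component routing (36.) and separate per-face tables (34.) are left out to keep the sketch short; vertex shader outputs the fragment shader never reads simply get no entry.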

On limits

40. Most (all?) hardware has a fixed number of interpolators, so the
number of different semantic indices is hardware-limited
41. Thus, regardless of whether we allow arbitrary numbers as semantic
indices, we have a hardware limit, and we may want or need to add a
cap for it
42. Hardware may in theory use software/multiplexed hardware
interpolation and VRAM storage to support unlimited
registers/interpolators
43. Such hardware does not exist, is not common, or is not fast in
that configuration
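Point (41.) can be illustrated with two hypothetical validation functions in C (not an existing Gallium API), where n stands for the interpolator count a driver would report through the cap mentioned above. Under the proposed rule a single range check suffices; under the arbitrary-index rule the indices themselves are unbounded, but the number of distinct indices must still fit in the same N interpolators, so the driver needs N either way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Proposed rule: every GENERIC index must lie in 0..n-1. */
static bool indices_in_range(const unsigned *idx, size_t count, unsigned n)
{
    for (size_t i = 0; i < count; i++)
        if (idx[i] >= n)
            return false;
    return true;
}

/* Arbitrary-index rule: the index values are unbounded, but the number
 * of distinct indices still has to fit in the n hardware interpolators. */
static bool distinct_indices_fit(const unsigned *idx, size_t count, unsigned n)
{
    size_t distinct = 0;
    for (size_t i = 0; i < count; i++) {
        bool seen = false;
        for (size_t j = 0; j < i; j++) {
            if (idx[j] == idx[i]) {
                seen = true;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct <= n;
}
```

The second check also needs a remapping step before the indices can address hardware slots, which is the extra driver work (17.) aims to remove.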

This is the reasoning that supports my conclusion.
At which steps do you disagree with me?

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
