> Where the semantic indicates some relationship to actual system resources, I
> agree that the number is constrained by the number of those system resources.
> In the case of the gallium "GENERIC" semantic, there is explicitly no system
> resource that semantic is referring to and hence no limit on the index.

GENERIC[i] refers to a slot in the output register file of the vertex shader or a slot in the input register file of the fragment shader. It also refers to the interpolator unit that interpolates data between those two registers.

Since interpolators are usually available in a finite number, and register files are also usually physically limited, there should be a limit on the index.

> I feel like we're going in circles here. We agree that we want to improve
> linkage, you have a patch that helps your driver, but please accept that it
> doesn't solve the wider problem.

Yes. In the following I try to write out my reasoning step by step, in the hope of making it clearer and making it easier both to establish which facts we agree are true and to pinpoint what we may disagree on.

This is a list of steps that lead me to the conclusion that it is best to change the Gallium rules so that semantic indices must be in the range 0..N-1, where N is the maximum number of simultaneously available registers with that semantic, apply my GLSL patch to fix GLSL, and after doing that, consider extending Gallium by letting the user specify a routing table to link these limited-index semantics with something other than an identity mapping.

Please tell me which points you find incorrect, or why any deduction does not follow from the antecedents.

1. All the semantic indices in OpenGL are limited, according to the ARB specification
2. All the semantic indices in DirectX 9/10 are limited, according to http://msdn.microsoft.com/en-us/library/ee418355%28VS.85%29.aspx
3. In the OpenGL/DirectX 9/10 model, there is a fixed number of interpolators, numbered from 0 to N - 1. Interpolator K reads from vertex shader output register K, interpolates and writes to fragment shader input register K.
4. Some cards (e.g. r300), but not all, allow configuring the vertex shader output register that interpolator K reads and the fragment shader input register that it writes.
5. Such register settings are usually offsets into a physical register file, and thus are limited by the physical size of that register file
6. No API exposes the functionality in (4.) and all expose the more rigid model in (3.)
7. Gallium GENERIC is equivalent to the OpenGL texcoord and DirectX 9/10 TEXCOORD semantics
8. "texcoord" is so named for historical reasons, since fixed pipelines could use the values only for texture sampling. GENERIC is called GENERIC instead of TEXCOORD because Gallium was designed with a programmable pipeline in mind.
9. The current Mesa implementation of ARB_fp/vp translates texcoord[i] to GENERIC[i]
10. fragment.texcoord[K] has K limited by GL_MAX_TEXTURE_COORDS_ARB
11. Because of (9.) and (10.), the current Mesa implementation of ARB_vp/fp uses GENERIC indices limited by GL_MAX_TEXTURE_COORDS_ARB (perhaps plus a very small constant)
12. Because of (2.), a straightforward Gallium DirectX state tracker would also use GENERIC indices limited by the number of interpolators
13. If GLSL did not reserve semantic indices for unused gl_TexCoord[] varyings, but allocated varyings sequentially, then it would use semantic indices sequentially starting from 0
14. My patch implements (13.)
15. The xorg, vega and g3dvl state trackers use GENERIC indices starting from 0 up to 1, 1 and 7 respectively
16. Because of (11.), (12.), (13.), (14.) and (15.), after applying my patch, limiting the value of GENERIC semantic indices to the number of interpolators would not adversely affect Mesa/Gallium functionality in any way, probably including the VMware DirectX state tracker
17. Driver code would be simplified by not having to worry about any register semantic remapping. It will be possible to separately compile fragment and vertex shaders on all hardware. The CPU usage of all drivers will be reduced, especially when switching shaders (a fast path!)
18. Thus, (16.) is a net gain for Gallium, and should go forward [Note: my current nv40 tree does exactly (16.), and this does not seem to be a source of any problem]

Points that lead me to propose a routing table CSO *IN ADDITION* to applying my GLSL patch:

19. Some current 3D APIs (ARB_fp/vp, DX PS, EXT_separate_shader_objects) link vertex and fragment shaders by matching physical register file offset, limited to index N - 1 where N is the maximum number of usage variables (see (3.))
20. Other 3D APIs (GLSL) link by matching variable name. This forces unextended GLSL to require that both the fragment and vertex shaders be provided at once in the link step
21. No API links by matching an abstract, unlimited variable number, except some Gallium driver interfaces such as r300
22. It would be desirable to further extend GL_EXT_separate_shader_objects to support linking user-defined varyings *without* needing a link step that sees both vertex and fragment shader at once
23. Binding by unlimited semantic index does not solve (22.), because we want to link by matching variable names, not numbers
24. Turning semantic indices into strings to get around (23.) is undesirable, because it's obviously something the hardware won't accelerate and the shading language may have more complex linking rules than just strcmp
25. Binding by unlimited semantic index is thus not useful at all. All APIs that can be implemented with that can also work with the more straightforward sequential semantic indices, and the other APIs/functionality cannot be implemented with either
26. If we had the ability to configure interpolator K to read vertex shader output V[K] and write fragment shader input F[K], we could do GLSL binding by name, compiling the shaders once and only changing the routing tables in the link step [this is accomplished by sequentially allocating variables in each shader and then, for each common variable, emitting a routing entry between the values found under that variable's name in the dictionaries associated with each state tracker shader structure]
27. Such routing table functionality is thus greatly superior to binding by unlimited integer semantic indices
28. r300 and maybe other hardware provides exactly that functionality
29. Such routing table functionality would however be unnecessary for the vast majority of OpenGL applications
30. Because of (23.) and (27.), the best solution is to start by changing the Gallium rules so that semantic indices must be in the range 0..N-1, where N is the maximum number of simultaneously available registers with that semantic, and applying my GLSL patch to fix GLSL
31. After doing that, it might be desirable to extend Gallium by letting the user specify a routing table
32. Since the routing table would be specific to the fragment/vertex shader pair, it is desirable to include the shader handles when creating the routing table CSO, so that the driver may better process it
33. It may be desirable to add light_twoside to the routing table, moving it out of the rasterizer
34. It may be desirable to optionally extend light_twoside, by supporting two separate routing tables, one for each face
35. It may be desirable to optionally support swizzling during routing
36. It may be desirable to optionally support routing specific scalar components of vec4 registers, for use with nv50 and other scalar architectures
37. We actually need multiple routing tables. Ideally, an input->vertex shader one, a vertex->geometry shader one, a geometry->fragment one, and a fragment->render targets one.
38. We may even want some kind of sampler remapping tables, and maybe a constant buffer remapping table
39. All this functionality should either have separate caps, or we should figure out if there are some subsets that are always supported together in hardware

On limits

40. Most (all?) hardware has a fixed number of interpolators, so the number of different semantic indices is hardware-limited
41. Thus, regardless of whether we allow arbitrary numbers as semantic indices, we have a hardware limit and we may want or need to add a cap for it
42. Hardware may in theory use software/multiplexed hardware interpolation and VRAM storage to support unlimited registers/interpolators
43. Such hardware does not exist, is not common, or is not fast in that configuration

This is the reasoning that supports my conclusion. At which steps do you disagree with me?
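
To make the link step described in (26.) more concrete, here is a minimal sketch in C of how a state tracker might build such a routing table. None of this is existing Gallium code: struct shader_varyings, struct route, MAX_VARYINGS and build_routing_table are hypothetical names chosen for illustration, and matching with a plain strcmp is an assumption (as noted in (24.), a real shading language may have more complex linking rules). It assumes each shader was compiled once, with its variables allocated sequentially, so that names[i] is the variable living in register i.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARYINGS 32   /* hypothetical upper bound, for illustration only */

/* Hypothetical per-shader table kept by the state tracker: names[i] is the
 * variable that was sequentially allocated to register i at compile time. */
struct shader_varyings {
    unsigned count;
    const char *names[MAX_VARYINGS];
};

/* Hypothetical routing entry: one interpolator reads vertex shader output
 * register vs_out and writes fragment shader input register fs_in. */
struct route {
    unsigned vs_out;
    unsigned fs_in;
};

/* Link step: for every fragment shader input that has a vertex shader output
 * of the same name, emit one routing entry. The shaders themselves are not
 * recompiled. Returns the number of routes written, or -1 if the pair needs
 * more than num_interpolators interpolators. */
static int build_routing_table(const struct shader_varyings *vs,
                               const struct shader_varyings *fs,
                               unsigned num_interpolators,
                               struct route *routes)
{
    unsigned n = 0;

    assert(vs->count <= MAX_VARYINGS && fs->count <= MAX_VARYINGS);

    for (unsigned f = 0; f < fs->count; f++) {
        for (unsigned v = 0; v < vs->count; v++) {
            if (strcmp(fs->names[f], vs->names[v]) != 0)
                continue;
            if (n == num_interpolators)
                return -1;          /* hardware limit exceeded, see (40.) */
            routes[n].vs_out = v;
            routes[n].fs_in = f;
            n++;
            break;                  /* each input is fed by one output */
        }
    }
    return (int)n;
}
```

With a vertex shader that writes "normal", "color", "uv" to registers 0, 1, 2 and a fragment shader that reads "uv", "normal" from registers 0, 1, this would produce the two routes 2->0 and 0->1. Relinking the same vertex shader against another fragment shader only rebuilds this table, which is the point of (26.).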