Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
I propose that the following language be added to the spec:

"""
Gallium has no explicit mechanism for linking shaders. Shaders are implicitly linked in a pipeline at render time. Thus, linking must not fail and the pipe driver is permitted to change shader semantics to preserve linking.

If a shader uses non-consecutive semantics, the pipe driver may remap them to be consecutive, within the bounds of other currently bound state. For example, if a shader samples from samplers 3 and 5, but the only samplers bound are 0 and 1, the driver may adjust the shader to sample from samplers 0 and 1 instead.

If the bound shader pipeline does not have consistent outputs and inputs, the driver is permitted to remap them to be consistent through a driver-specific mechanism. For example, if a vertex shader writes generic 1, a fragment shader references generic 0, and there is no geometry shader bound, the pipe driver may change the vertex shader to write generic 0, or it may use a driver-specific internal routing table to route generic 1 to generic 0.

If the total count of each semantic attribute written by the vertex shader does not match the counts read by the fragment shader, the driver may discard extraneous attributes and the values of missing attributes are undefined.

If a geometry shader is present, this adjustment may happen on the vertex to geometry linking and geometry to fragment linking.
"""

I think this about covers it. This language seems *very* reasonable to me, while not totally ignoring the problems faced by nv50.

~ C.

On Mon, Feb 1, 2010 at 7:29 AM, Luca Barbieri wrote:
>> I can't really use a routing table state to produce a cso, because the hw routing table I generate depends on rasterizer state, e.g. I must not put in back face colour (we have a 2 to 1 mapping here) if twoside is disabled.
>>
>> Also, I'm routing based on the scalar *components* the FP reads, not whole TGSI pseudo vec4 registers (NUM_INTERPOLATORS will thus be inaccurate) - set_routing_table will have to pass me the respective programs too.
>>
>> Well, I can still use the cso and insert it into the rest of the routing table that still needs to be assembled on the fly; I did that before the 1:1 mapping between FP and VP regs was removed.
>
> You are right, the routing table CSO needs to contain the fragment and vertex shader handles, and ideally light_twoside should be moved to the vertex->fragment routing table since it is really an attribute of that and not polygon rasterization/setup.
>
> You can then just look at your internal data structure and construct a scalar routing table from the vec4 one provided by Gallium.
>
> We could also, as a further extension, support scalar routing tables directly in Gallium.
> Note however that radeon hardware presumably only supports vector ones, so we would need all 3 options with caps.
> A further intermediate step could be vector routing tables with swizzling.

--
Only fools are easily impressed by what is only barely beyond their reach. ~ Unknown

Corbin Simpson

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
___
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
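A minimal sketch of the positional remapping that the proposed spec language permits, written in Python for brevity. Everything in it is an assumption made for illustration: the function name build_routing, the (semantic name, index) tuples, and the choice to pair outputs with inputs in ascending index order within each semantic name. A real pipe driver would do this in C against its own shader structures.

```python
from collections import defaultdict


def build_routing(vs_outputs, fs_inputs):
    """Pair VS outputs with FS inputs positionally, per semantic name.

    Both arguments are iterables of (semantic_name, semantic_index).
    Returns {fs_input: vs_output or None}. None marks a missing
    attribute whose value is undefined; surplus VS outputs are dropped.
    """
    outputs_by_name = defaultdict(list)
    for name, index in sorted(vs_outputs):
        outputs_by_name[name].append(index)

    routing = {}
    position = defaultdict(int)  # next unused output per semantic name
    for name, index in sorted(fs_inputs):
        k = position[name]
        position[name] += 1
        available = outputs_by_name[name]
        if k < len(available):
            routing[(name, index)] = (name, available[k])
        else:
            routing[(name, index)] = None  # value is undefined
    return routing
```

With the example from the proposed text, a vertex shader writing GENERIC 1 and a fragment shader reading GENERIC 0 give a single route from GENERIC 1 to GENERIC 0.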
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
> I can't really use a routing table state to produce a cso, because the hw routing table I generate depends on rasterizer state, e.g. I must not put in back face colour (we have a 2 to 1 mapping here) if twoside is disabled.
>
> Also, I'm routing based on the scalar *components* the FP reads, not whole TGSI pseudo vec4 registers (NUM_INTERPOLATORS will thus be inaccurate) - set_routing_table will have to pass me the respective programs too.
>
> Well, I can still use the cso and insert it into the rest of the routing table that still needs to be assembled on the fly; I did that before the 1:1 mapping between FP and VP regs was removed.

You are right, the routing table CSO needs to contain the fragment and vertex shader handles, and ideally light_twoside should be moved to the vertex->fragment routing table since it is really an attribute of that and not polygon rasterization/setup.

You can then just look at your internal data structure and construct a scalar routing table from the vec4 one provided by Gallium.

We could also, as a further extension, support scalar routing tables directly in Gallium.
Note however that radeon hardware presumably only supports vector ones, so we would need all 3 options with caps.
A further intermediate step could be vector routing tables with swizzling.
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On 01.02.2010 15:32, Luca Barbieri wrote:
> An overview of the possible options.
>
> Let's call vertex shader outputs "v" and fragment shader inputs "f".
> Let v -> f mean that v connects to f.
> NUM_INTERPOLATORS is the number of available interpolators. It is usually between 8 and 32.
>
> 1. Current Gallium
>    v -> f if and only if v == f
>    Any values of v and f are legal
>
> 2. My proposal, basic version
>    v -> f if and only if v == f
>    v >= NUM_INTERPOLATORS and f >= NUM_INTERPOLATORS are illegal
>
> 3. My proposal plus routing tables (Corbin Simpson's idea)
>    v -> f if and only if (v, f) is an entry in the routing table
>    v >= NUM_INTERPOLATORS and f >= NUM_INTERPOLATORS are illegal
>    If no routing table is set, (2) is used instead
>    The routing table is set via a new ->set_routing_table() entry point, taking the processor linkage affected, and an array of pairs of registers to link.
>    This may be made a CSO (it is constant for each pair of state tracker shaders).

I can't really use a routing table state to produce a cso, because the hw routing table I generate depends on rasterizer state, e.g. I must not put in back face colour (we have a 2 to 1 mapping here) if twoside is disabled.

Also, I'm routing based on the scalar *components* the FP reads, not whole TGSI pseudo vec4 registers (NUM_INTERPOLATORS will thus be inaccurate) - set_routing_table will have to pass me the respective programs too.

Well, I can still use the cso and insert it into the rest of the routing table that still needs to be assembled on the fly; I did that before the 1:1 mapping between FP and VP regs was removed.

On a sidenote, if a VP output is not written, I map it to 0 or 1 (4th component); FOG has all 4 components written in VP thus yzw don't become 0 or 1 in FP as is expected; I can adjust the write mask myself though ...

Christoph

> The problem with (1) is that if the hardware has no routing support, there is no way to implement it except by recompiling either the vertex or fragment shader when the other changes.
>
> Furthermore, no known user of Gallium actually needs (1) except the current GLSL linker, but only due to the particular implementation and not due to the OpenGL/GLSL API itself.
>
> Only (2) is actually needed by the APIs, and it is also supported trivially by all hardware.
>
> Additionally there is no way to implement "bind by name" (i.e. having GLSL varyings with the same name be linked) with either (1) or (2) without recompiling one shader in response to changing the other.
>
> With (3) instead, the state tracker can look at the symbol tables of both programs, build a routing table, and set it independently of the shaders. Functionality provided by (1) can also be emulated in this way.
>
> Thus, I propose replacing (1) with (2) and later expand to (3) if we desire to do so.
>
> I think that in particular, the "bind by name" argument is decisive, since binding by name is what GLSL and high level languages really want, and the added flexibility of (1) is useless for that, while coming at significant driver complexity and performance cost.
>
> What do you think?
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On Mon, Feb 1, 2010 at 3:38 PM, Keith Whitwell wrote:
> This seems like a very different idea of semantics. These aren't intended to be hardware resources, and there is no concept of querying the driver to figure out how many the hardware supports. Further, the indices for different semantic names are considered to be disjoint, permitting FOG[0], COLOR[0] and GENERIC[0], each of which would potentially consume an interpolator on some hardware, subverting the idea of a maximum semantic index.

There would be a maximum semantic index for each semantic type. Note this is exactly like the existing OpenGL limit on fragment.texcoord[i] and how ARBfp/vp work.

All APIs as far as I know have such limits, simply because they indeed refer to hardware resources.

Why shouldn't Gallium semantic indices refer to hardware resources too?
What is the advantage of using abstract identifiers that the driver needs to map, when no API needs those, and when there is no use for them? (except for slightly simplifying the GLSL implementation at the expense of greater complexity in all drivers)

> I think if you want to improve linkage semantics, some of your other suggestions are more promising. I'd like to dig into those a little more if that's ok.

Yes, sure. Are you referring to the routing tables idea? (actually initially suggested by Corbin Simpson)
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
This seems like a very different idea of semantics. These aren't intended to be hardware resources, and there is no concept of querying the driver to figure out how many the hardware supports. Further, the indices for different semantic names are considered to be disjoint, permitting FOG[0], COLOR[0] and GENERIC[0], each of which would potentially consume an interpolator on some hardware, subverting the idea of a maximum semantic index.

There's currently no concept of driver-specified maximum semantic indexes, and I don't think it's the right direction to be taking things. If we are going to have a limit, it would be on the total number of inputs/outputs from a given stage, not the way that individual ones are labelled.

I think if you want to improve linkage semantics, some of your other suggestions are more promising. I'd like to dig into those a little more if that's ok.

Keith

From: luca.barbi...@gmail.com [luca.barbi...@gmail.com] On Behalf Of Luca Barbieri [l...@luca-barbieri.com]
Sent: Monday, February 01, 2010 6:11 AM
To: Keith Whitwell
Cc: Brian Paul; mesa3d-dev@lists.sourceforge.net
Subject: Re: [PATCH] glsl: put varyings in texcoord slots

> In GL, there doesn't seem to be a requirement for sequential usage - an app using ARB_vp/fp could explicitly pass TEXCOORD[10] and ignore 0..9 if it wanted to. In ARB_vp, that effectively means the shader would be using discontiguous register numbers, ie OUTPUT[0], OUTPUT[10], etc.

Yes, but TEXCOORD[10] will only work if the hardware supports 11 textures.

> In DX9, there is also no requirement for sequential usage - an app can use TEXCOORD[1] without [0], or COLOR[0] and [2] without [1] or [3]. I would expect any DX9 gallium state tracker would also end up using non-sequential indices if it directly translated DX9 semantics to gallium.

I think there is some terminology confusion.

I'm not proposing to require indices to be "sequential" in the way you are describing.

By "sequential" I mean that if the hardware supports 8 interpolators, then the available semantic indices should run from 0 to 7.

Thus, the *available* semantic indices are sequential, but the user is free to use any subset of them.

So you can start with OUTPUT[10], but only if at least 11 interpolators are supported (so that the used set {10} is a subset of the available set [0-10]).
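The "available set" rule above reduces to a subset check. A tiny sketch in Python, where the function name indices_are_legal and the idea of passing the interpolator count as a plain integer are assumptions for illustration:

```python
def indices_are_legal(used_indices, num_interpolators):
    """True if every used semantic index lies in [0, num_interpolators).

    The used set may have gaps; only the available set is contiguous.
    """
    return all(0 <= i < num_interpolators for i in used_indices)
```

Using only index 10 is legal with 11 interpolators and illegal with 8, while a sparse set such as {0, 2} is fine in both cases.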
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
An overview of the possible options.

Let's call vertex shader outputs "v" and fragment shader inputs "f".
Let v -> f mean that v connects to f.
NUM_INTERPOLATORS is the number of available interpolators. It is usually between 8 and 32.

1. Current Gallium
   v -> f if and only if v == f
   Any values of v and f are legal

2. My proposal, basic version
   v -> f if and only if v == f
   v >= NUM_INTERPOLATORS and f >= NUM_INTERPOLATORS are illegal

3. My proposal plus routing tables (Corbin Simpson's idea)
   v -> f if and only if (v, f) is an entry in the routing table
   v >= NUM_INTERPOLATORS and f >= NUM_INTERPOLATORS are illegal
   If no routing table is set, (2) is used instead
   The routing table is set via a new ->set_routing_table() entry point, taking the processor linkage affected, and an array of pairs of registers to link.
   This may be made a CSO (it is constant for each pair of state tracker shaders).

The problem with (1) is that if the hardware has no routing support, there is no way to implement it except by recompiling either the vertex or fragment shader when the other changes.

Furthermore, no known user of Gallium actually needs (1) except the current GLSL linker, but only due to the particular implementation and not due to the OpenGL/GLSL API itself.

Only (2) is actually needed by the APIs, and it is also supported trivially by all hardware.

Additionally there is no way to implement "bind by name" (i.e. having GLSL varyings with the same name be linked) with either (1) or (2) without recompiling one shader in response to changing the other.

With (3) instead, the state tracker can look at the symbol tables of both programs, build a routing table, and set it independently of the shaders. Functionality provided by (1) can also be emulated in this way.

Thus, I propose replacing (1) with (2) and later expand to (3) if we desire to do so.

I think that in particular, the "bind by name" argument is decisive, since binding by name is what GLSL and high level languages really want, and the added flexibility of (1) is useless for that, while coming at significant driver complexity and performance cost.

What do you think?
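To make option (3) concrete, here is a minimal sketch of how a state tracker might build a routing table from the two symbol tables, in Python for brevity. The function name link_by_name and the dictionary layout are assumptions; the returned list of (v, f) pairs stands in for what would be passed to the proposed ->set_routing_table() entry point, which does not exist yet.

```python
def link_by_name(vs_symbols, fs_symbols, num_interpolators):
    """Build a list of (v, f) register pairs by matching varying names.

    vs_symbols : {varying name: vertex shader output register}
    fs_symbols : {varying name: fragment shader input register}
    Registers at or above num_interpolators are illegal under (2)/(3).
    """
    table = []
    for name, f in sorted(fs_symbols.items()):
        if name not in vs_symbols:
            continue  # input not written by the vertex shader
        v = vs_symbols[name]
        if v >= num_interpolators or f >= num_interpolators:
            raise ValueError("register index out of range for " + name)
        table.append((v, f))
    return table
```

Neither shader has to be recompiled when the other changes; only the table is rebuilt.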
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
> In GL, there doesn't seem to be a requirement for sequential usage - an app using ARB_vp/fp could explicitly pass TEXCOORD[10] and ignore 0..9 if it wanted to. In ARB_vp, that effectively means the shader would be using discontiguous register numbers, ie OUTPUT[0], OUTPUT[10], etc.

Yes, but TEXCOORD[10] will only work if the hardware supports 11 textures.

> In DX9, there is also no requirement for sequential usage - an app can use TEXCOORD[1] without [0], or COLOR[0] and [2] without [1] or [3]. I would expect any DX9 gallium state tracker would also end up using non-sequential indices if it directly translated DX9 semantics to gallium.

I think there is some terminology confusion.

I'm not proposing to require indices to be "sequential" in the way you are describing.

By "sequential" I mean that if the hardware supports 8 interpolators, then the available semantic indices should run from 0 to 7.

Thus, the *available* semantic indices are sequential, but the user is free to use any subset of them.

So you can start with OUTPUT[10], but only if at least 11 interpolators are supported (so that the used set {10} is a subset of the available set [0-10]).

> With quote:
> --
> In Direct3D 10, adjacent stages effectively share a register array, where the output shader (or pipeline stage) writes data to specific locations in the register array and the input shader must read from the same locations. The API uses shader signatures to bind shader outputs with inputs without the overhead of semantic resolution.
> --
>
> So right now, we seem to actually be close to the DX9 and GL models. If we were going to change, I'd suggest moving closer to DX10 rather than coming up with some new way of doing things.

According to the document you quoted, it seems the difference is that in DX10 vertex and fragment shaders must match exactly, i.e. no vertex shader output may be left unused and no fragment shader input may be default zero/undefined.

We can't go with the DX10 model because it will break GL.
We should however decide whether fragment inputs not written by the vertex shader are zero or undefined, and make sure all drivers correctly implement this.

What I'm proposing, excluding the routing table stuff, is exactly the idea of having output shader and input shader read/write from a (conceptual) register array.

If you have 8 interpolators, you use vertex outputs 0-7 and fragment inputs 0-7 with identity mapping.

If we want to add routing support, I suggested adding the ability to explicitly set a mapping from shader outputs to shader inputs as a set of pairs.

Unlike the current method, this allows layering GLSL-like match by name semantics over it, which is impossible with the current proposal, since you can't map strings to integers, and you can't specify routing separately from shaders.

> Ultimately, it shouldn't be all that important as everybody should be running these shaders through a proper optimizer for their hardware, which should have no trouble translating this stuff into any format it likes.

This is only true if the optimizer works on both the vertex and fragment shaders together, which is something that would be nice to avoid (at least, avoid _having_ to do it, it might possibly be beneficial sometimes).

If they are optimized/compiled separately, the routing and semantic index choice is fundamental.
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
Luca,

In GL, there doesn't seem to be a requirement for sequential usage - an app using ARB_vp/fp could explicitly pass TEXCOORD[10] and ignore 0..9 if it wanted to. In ARB_vp, that effectively means the shader would be using discontiguous register numbers, ie OUTPUT[0], OUTPUT[10], etc.

In DX9, there is also no requirement for sequential usage - an app can use TEXCOORD[1] without [0], or COLOR[0] and [2] without [1] or [3]. I would expect any DX9 gallium state tracker would also end up using non-sequential indices if it directly translated DX9 semantics to gallium.

In DX10, it seems to be the same, but there are additional changes to simplify things. Here's a link to some msdn about DX9 and DX10 semantics:

http://msdn.microsoft.com/en-us/library/ee415668%28VS.85%29.aspx#Porting_Shaders

With quote:
--
A semantic is a string attached to a shader input or output that conveys information about the intended use of a parameter. Semantics are required on all variables passed between shader stages. The syntax for adding a semantic to a shader variable is shown here (Variable Syntax (DirectX HLSL)).

In general, data passed between pipeline stages is completely generic and is not uniquely interpreted by the system; arbitrary semantics are allowed which have no special meaning. Parameters (in Direct3D 10) which contain these special semantics are referred to as System-Value Semantics.
--

This is fairly close to gallium.

It's hard to tell what goes on at the assembly level in DX10, as HLSL is the public interface. But it seems that DX10 goes further than gallium, and also matches based on register number, ie PS.input[0] always corresponds to VS.output[0], though I'm not sure how that works in the face of things like the "face" register which appears for the first time in the PS.

Here's a link:

http://msdn.microsoft.com/en-us/library/ee418358%28VS.85%29.aspx

With quote:
--
In Direct3D 10, adjacent stages effectively share a register array, where the output shader (or pipeline stage) writes data to specific locations in the register array and the input shader must read from the same locations. The API uses shader signatures to bind shader outputs with inputs without the overhead of semantic resolution.
--

So right now, we seem to actually be close to the DX9 and GL models. If we were going to change, I'd suggest moving closer to DX10 rather than coming up with some new way of doing things.

Ultimately, it shouldn't be all that important as everybody should be running these shaders through a proper optimizer for their hardware, which should have no trouble translating this stuff into any format it likes.

Keith

From: luca.barbi...@gmail.com [luca.barbi...@gmail.com] On Behalf Of Luca Barbieri [l...@luca-barbieri.com]
Sent: Thursday, January 28, 2010 11:48 PM
To: Brian Paul; Keith Whitwell
Cc: Luca Barbieri; mesa3d-dev@lists.sourceforge.net
Subject: Re: [PATCH] glsl: put varyings in texcoord slots

I'd like to have some more definitive review comments on this patch (sending to Brian and Keith for this).

Right now GLSL is the *only* Gallium user that does not use sequential indexes starting from 0 for vertex shader outputs and fragment shader inputs.

This causes problems for some drivers such as nv30/nv40 that don't remap the indexes right now.

This can be addressed in three ways:
1. Don't require Gallium users to use sequential indices, and require vertex shader outputs and fragment shader inputs to match perfectly
2. Don't require Gallium users to use sequential indices, and change nv30/nv40 and possibly other drivers to remap indices
3. Fix the only problematic user, GLSL, to use sequential indices

(1) will break the Mesa state tracker in a very hard to fix way.
(2) is complex and means that nv30/nv40 and maybe other drivers can no longer compile vertex and fragment shaders independently.
(3) is a simple fix, provided by this patch.

I feel that (3), implemented by this patch, is the best solution, since driver simplicity is one of the Gallium design goals, and I don't see any significant advantages in supporting discontiguous vertex shader output / fragment shader input values.

OpenGL and DirectX 9/10 don't seem to allow arbitrary numbers for vertex shader outputs and fragment shader inputs, and instead require 0-7, 0-15 or 0-31 depending on feature level.
If this is wrong, please correct me.

I propose that Gallium should also require 0-x indices and not arbitrary values.
Thus, GLSL should be fixed to respect that.

Note that this change cannot be done in the state tracker because it requires seeing both the fragment and vertex shaders at once, which only happens in the GLSL linker.

Thus, while the change has been discussed with Gallium in mind, it is done at the Mesa program level, and it actually results in Mesa programs with contiguous indices. This also
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On Fri, Jan 29, 2010 at 11:09 PM, Corbin Simpson wrote:
> I would say that the routing table really needs to be handled by the driver implicitly. When you're told to draw things, you do your shader routing/linking before you draw.
>
> If the routing table really does depend on the shaders and only the shaders, then I wouldn't be opposed to some CSO, but I don't think it's horribly necessary.

The problem with implicit handling is that you can't match by name.

You can have the driver automatically bind vs output 34134 to fs input 34134, but that isn't really helpful if you want to bind vs output variable "foo" to fs input variable "foo".

And this is what GLSL and HLSL really want.

Of course one could replace the semantic index integer with a string, but at this point hardware clearly won't accelerate that, so it seems better to have the state tracker send an explicit list of (vs output, fs input) pairs.

It could be a CSO that the state tracker keeps indexed by (vs, fs) pair.

However note that, unless I'm mistaken, no state tracker _needs_ to expose this functionality right now.

We could however expose it as an additional GLSL extension based on GL_EXT_separate_shader_objects, that would allow linking by user-defined varying values too.

We could also use it to make GLSL linking a no-op on hardware supporting routing tables.

In other words, it seems to me that this approach could both fix nv30/nv40, simplify drivers and provide additional, useful, functionality.

[As already mentioned, we have geometry shaders too so probably it's actually going to be (vs, gs) and (gs, fs)]
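A minimal sketch of a routing-table object kept by the state tracker and indexed by (vs, fs) pair, in Python for brevity. The class name RoutingCache and the build_table callback are hypothetical; in Gallium terms the cached value would be whatever handle a routing CSO create call returned, if such a call were added.

```python
class RoutingCache:
    """Cache one routing table per (vertex shader, fragment shader) pair."""

    def __init__(self, build_table):
        # build_table(vs, fs) is supplied by the caller and returns the
        # list of (vs output, fs input) pairs for that shader pair.
        self._build_table = build_table
        self._tables = {}

    def lookup(self, vs, fs):
        key = (vs, fs)
        if key not in self._tables:
            self._tables[key] = self._build_table(vs, fs)
        return self._tables[key]
```

The table is built the first time a shader pair is bound together and reused on later binds. With geometry shaders the same cache could be instantiated twice, once for (vs, gs) and once for (gs, fs).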
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
I would say that the routing table really needs to be handled by the driver implicitly. When you're told to draw things, you do your shader routing/linking before you draw.

If the routing table really does depend on the shaders and only the shaders, then I wouldn't be opposed to some CSO, but I don't think it's horribly necessary.

And no, pre-optimizing TGSI is stupid. Every single backend does something different with its shaders. You can't deliver something besides TGSI to drivers, either; it will suck badly for out-of-tree drivers and compat. Just write a real compiler for your HW.

~ C.

On Fri, Jan 29, 2010 at 12:47 PM, Luca Barbieri wrote:
> On Fri, Jan 29, 2010 at 8:49 PM, Keith Whitwell wrote:
>> So the nv40 code is doing the wrong thing... :)
>>
>> The rule currently is that the generic tags are just tags and are used only to establish mapping between fragment shader and vertex shader. Additionally the vertex shader should always supply at least the inputs required by the fragment shader. Taking a guess, without specific hardware support for routing, nv40 should translate the fragment shader inputs into successive SRC_TC slots, and build a mapping from semantic-index -> SRC_TC slot that can be applied when translating the vertex shader.
>>
>> You might complain that in this world a vertex shader will require multiple translations depending on which fragment shader is bound, as one taking fewer inputs will have different numberings. That's true, but you also get the opportunity to specialize the vertex shader as a result, eliminating dead code in each case - which may be worth the effort.
>>
>> Anyway, it sounds like what you really want is to change the linkage rules. This is possible, of course, but we should discuss it in those terms. And then if we do revisit those rules, we should take a deeper look at linkage and figure out if anything else can be changed.
>
> Yes, this is what I'm proposing.
>
> Actually, those rules do not seem to be documented, so I was not sure whether they were a requirement or were just a "bonus" feature of some drivers.
>
> Right now it seems to me that no users of Gallium require this functionality.
> In particular:
> 1. GLSL and anything with similar link-by-name can of course be modified to use sequential indices
> 2. ARB fragment program and vertex program use index-limited texcoord slots
> 3. g3dvl needs and uses 8 texcoord slots, indices 0-7
> 4. vega and xorg use indices 0-1
> 5. DX10 seems to restrict semantics to 0-N range, if I'm not mistaken
> 6. The GL_EXT_separate_shader_objects extension does not provide arbitrary index matching for GLSL, but merely lets it use a model similar to ARB fp/vp
>
> Thus, unless I am mistaken, no current API needs this feature.
>
> In addition, note that current rules do *not* allow binding inputs and outputs by name, since the semantic index is an integer and not a string.
> However, the hardware implementation could probably allow that, since it most likely works by specifying the fragment shader input a given vertex shader output maps to.
>
> Thus, I propose to:
> 1. Change the rules to require semantic indices to be in a 0-N range. Document this appropriately.
> 2. Fix GLSL to respect that using the patch I posted. Everything else should be already fine.
> 3. If necessary, add a ->set_routing_table function that takes pairs of vertex shader outputs and fragment shader inputs to associate
>
> This allows the state tracker to support seamless "link by variable name string" by packing the indices in sequential numbers, and then using a dictionary kept in the state tracker to build an appropriate routing table.
>
> There are also geometry programs, so both vs->gs and gs->fs routing tables are probably needed.
>
> What do you think?
>
> As an aside, regarding shader optimization, I would suggest a driver-independent LLVM-based middle layer that takes TGSI as input and can provide either optimized LLVM bytecode or optimized TGSI to the driver (letting the driver choose with appropriate caps).

--
Only fools are easily impressed by what is only barely beyond their reach. ~ Unknown

Corbin Simpson
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On Fri, Jan 29, 2010 at 8:49 PM, Keith Whitwell wrote:
> So the nv40 code is doing the wrong thing... :)
>
> The rule currently is that the generic tags are just tags and are used only to establish mapping between fragment shader and vertex shader. Additionally the vertex shader should always supply at least the inputs required by the fragment shader. Taking a guess, without specific hardware support for routing, nv40 should translate the fragment shader inputs into successive SRC_TC slots, and build a mapping from semantic-index -> SRC_TC slot that can be applied when translating the vertex shader.
>
> You might complain that in this world a vertex shader will require multiple translations depending on which fragment shader is bound, as one taking fewer inputs will have different numberings. That's true, but you also get the opportunity to specialize the vertex shader as a result, eliminating dead code in each case - which may be worth the effort.
>
> Anyway, it sounds like what you really want is to change the linkage rules. This is possible, of course, but we should discuss it in those terms. And then if we do revisit those rules, we should take a deeper look at linkage and figure out if anything else can be changed.

Yes, this is what I'm proposing.

Actually, those rules do not seem to be documented, so I was not sure whether they were a requirement or were just a "bonus" feature of some drivers.

Right now it seems to me that no users of Gallium require this functionality. In particular:
1. GLSL and anything with similar link-by-name can of course be modified to use sequential indices
2. ARB fragment program and vertex program use index-limited texcoord slots
3. g3dvl needs and uses 8 texcoord slots, indices 0-7
4. vega and xorg use indices 0-1
5. DX10 seems to restrict semantics to 0-N range, if I'm not mistaken
6. The GL_EXT_separate_shader_objects extension does not provide arbitrary index matching for GLSL, but merely lets it use a model similar to ARB fp/vp

Thus, unless I am mistaken, no current API needs this feature.

In addition, note that current rules do *not* allow binding inputs and outputs by name, since the semantic index is an integer and not a string. However, the hardware implementation could probably allow that, since it most likely works by specifying the fragment shader input a given vertex shader output maps to.

Thus, I propose to:
1. Change the rules to require semantic indices to be in a 0-N range. Document this appropriately.
2. Fix GLSL to respect that using the patch I posted. Everything else should be already fine.
3. If necessary, add a ->set_routing_table function that takes pairs of vertex shader outputs and fragment shader inputs to associate

This allows the state tracker to support seamless "link by variable name string" by packing the indices in sequential numbers, and then using a dictionary kept in the state tracker to build an appropriate routing table.

There are also geometry programs, so both vs->gs and gs->fs routing tables are probably needed.

What do you think?

As an aside, regarding shader optimization, I would suggest a driver-independent LLVM-based middle layer that takes TGSI as input and can provide either optimized LLVM bytecode or optimized TGSI to the driver (letting the driver choose with appropriate caps).
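The "link by variable name" idea in the message above could look roughly like the following on the state tracker side. This is only a sketch under stated assumptions: `struct route`, `build_routes` and `MAX_VARYINGS` are hypothetical names, not Gallium API, and the table it produces stands in for whatever a ->set_routing_table hook would eventually accept. Each shader is assumed to number its varyings 0..N-1 in declaration order, so a name's position in its array is its packed index.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARYINGS 32   /* hypothetical limit for this sketch */

/* Hypothetical routing entry: one vertex shader output feeding one
 * fragment shader input, both given as packed 0..N-1 indices. */
struct route {
    unsigned vs_output;
    unsigned fs_input;
};

/* Position of `name` in names[0..count), or -1 if it is not present. */
static int find_name(const char *const *names, int count, const char *name)
{
    for (int i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return i;
    return -1;
}

/* For every fragment shader input, look up the vertex shader output with
 * the same variable name and record the pair. Returns the number of pairs
 * written, or -1 if an input has no matching output or there are too many
 * inputs for the table. */
static int build_routes(const char *const *vs_outs, int n_vs,
                        const char *const *fs_ins, int n_fs,
                        struct route *table)
{
    assert(n_vs >= 0 && n_fs >= 0);
    if (n_fs > MAX_VARYINGS)
        return -1;
    for (int i = 0; i < n_fs; i++) {
        int out = find_name(vs_outs, n_vs, fs_ins[i]);
        if (out < 0)
            return -1;
        table[i].vs_output = (unsigned)out;
        table[i].fs_input = (unsigned)i;
    }
    return n_fs;
}
```

With vertex outputs {"normal", "color", "uv"} and fragment inputs {"uv", "normal"}, the table comes out as (2 -> 0) and (0 -> 1); "color" is simply not routed. A vs->gs and a gs->fs table would each be built the same way.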
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
I just read the extension, and it seems to me that it clearly indicates that routing is *not* used by OpenGL.

In particular, varyings with the same name are not linked together, and instead the builtin varyings must be used. As far as I know, the builtin varyings are gl_TexCoord[i] where i < MaxTextureCoords.

Am I missing anything? Does GLSL, or any extension of it, specify any builtin varyings with arbitrary indices?

Note that the extension spec even explicitly says the following:
<<
It is undesirable from a performance standpoint to attempt to support "rendezvous by name" for arbitrary separate shaders because the separate shaders won't be naturally compiled to match their varying inputs and outputs of the same name without a special link step. Such a special link would introduce an extra validation overhead to binding separate shaders. The link itself would have to be deferred until glBegin time since separate shaders won't match when transitioning from one set of consistent shaders to another. This special link would still create errors or undefined behavior when the names of input and output varyings matched but their types did not match.
>>

It explicitly rules out doing a special link step/routing.
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On Fri, Jan 29, 2010 at 12:49 PM, Keith Whitwell wrote:
> So the nv40 code is doing the wrong thing... :)
>
> The rule currently is that the generic tags are just tags and are used only to establish mapping between fragment shader and vertex shader. Additionally the vertex shader should always supply at least the inputs required by the fragment shader. Taking a guess, without specific hardware support for routing, nv40 should translate the fragment shader inputs into successive SRC_TC slots, and build a mapping from semantic-index -> SRC_TC slot that can be applied when translating the vertex shader.
>
> You might complain that in this world a vertex shader will require multiple translations depending on which fragment shader is bound, as one taking fewer inputs will have different numberings. That's true, but you also get the opportunity to specialize the vertex shader as a result, eliminating dead code in each case - which may be worth the effort.
>
> Anyway, it sounds like what you really want is to change the linkage rules. This is possible, of course, but we should discuss it in those terms. And then if we do revisit those rules, we should take a deeper look at linkage and figure out if anything else can be changed.

Just a heads-up: The new GL_EXT_separate_shader_objects extension (http://www.opengl.org/registry/specs/EXT/separate_shader_objects.txt) allows binding/using vertex shaders and fragment shaders without an explicit linking step. Using the semantic info to match inputs to outputs will be key for that, I believe.

-Brian
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
So the nv40 code is doing the wrong thing... :)

The rule currently is that the generic tags are just tags and are used only to establish mapping between fragment shader and vertex shader. Additionally the vertex shader should always supply at least the inputs required by the fragment shader. Taking a guess, without specific hardware support for routing, nv40 should translate the fragment shader inputs into successive SRC_TC slots, and build a mapping from semantic-index -> SRC_TC slot that can be applied when translating the vertex shader.

You might complain that in this world a vertex shader will require multiple translations depending on which fragment shader is bound, as one taking fewer inputs will have different numberings. That's true, but you also get the opportunity to specialize the vertex shader as a result, eliminating dead code in each case - which may be worth the effort.

Anyway, it sounds like what you really want is to change the linkage rules. This is possible, of course, but we should discuss it in those terms. And then if we do revisit those rules, we should take a deeper look at linkage and figure out if anything else can be changed.

Keith

From: luca.barbi...@gmail.com [luca.barbi...@gmail.com] On Behalf Of Luca Barbieri [l...@luca-barbieri.com]
Sent: Friday, January 29, 2010 10:15 AM
To: Brian Paul
Cc: Keith Whitwell; mesa3d-dev@lists.sourceforge.net
Subject: Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

As a concrete example, the current nv40 code does this during fragment program translation.

    case TGSI_SEMANTIC_GENERIC:
        if (fdec->Semantic.Index <= 7) {
            hw = NV40_FP_OP_INPUT_SRC_TC(fdec->Semantic.Index);
        } else {
            NOUVEAU_ERR("bad generic semantic index\n");
            return FALSE;
        }
        break;

As a result, GLSL completely fails to work on nv30 and nv40.

nv50 instead seems to have hardware routing support, which is used by the driver.
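The mapping Keith describes (successive SRC_TC slots for the fragment shader inputs, then the same table applied while translating the vertex shader) might be sketched as below. This is an illustration, not nv40 driver code: `struct tc_map`, `tc_map_build`, `tc_map_vs_output` and both limits are hypothetical, with the slot count of 8 taken from the `<= 7` check in the quoted snippet.

```c
#include <assert.h>

#define MAX_SEMANTIC_INDEX 255   /* hypothetical bound for this sketch */
#define NUM_TC_SLOTS       8     /* the quoted nv40 check accepts 0..7 */
#define SLOT_UNUSED        (-1)

struct tc_map {
    int slot_of[MAX_SEMANTIC_INDEX + 1];  /* semantic index -> texcoord slot */
    int num_slots;                        /* slots handed out so far */
};

/* Hand out successive slots to the GENERIC indices the fragment shader
 * reads. Returns 0 on success, -1 if an index is out of range or the
 * shader needs more slots than the hardware has. */
static int tc_map_build(struct tc_map *map, const unsigned *fs_inputs, int count)
{
    assert(map != 0 && count >= 0);
    for (int i = 0; i <= MAX_SEMANTIC_INDEX; i++)
        map->slot_of[i] = SLOT_UNUSED;
    map->num_slots = 0;

    for (int i = 0; i < count; i++) {
        unsigned sem = fs_inputs[i];
        if (sem > MAX_SEMANTIC_INDEX)
            return -1;
        if (map->slot_of[sem] != SLOT_UNUSED)
            continue;                     /* index already has a slot */
        if (map->num_slots == NUM_TC_SLOTS)
            return -1;
        map->slot_of[sem] = map->num_slots++;
    }
    return 0;
}

/* Slot a vertex shader output with semantic index `sem` should be written
 * to, or SLOT_UNUSED if the fragment shader never reads it, in which case
 * the write is a candidate for dead-code elimination. */
static int tc_map_vs_output(const struct tc_map *map, unsigned sem)
{
    return sem <= MAX_SEMANTIC_INDEX ? map->slot_of[sem] : SLOT_UNUSED;
}
```

A fragment shader reading GENERIC[0] and GENERIC[10] would get slots 0 and 1. The cost Keith mentions is visible here too: the table depends on the bound fragment shader, so a vertex shader translated against one table is only valid for fragment shaders that produce the same table.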
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On Fri, Jan 29, 2010 at 12:55 PM, Luca Barbieri wrote:
>> FWIW, I think DX10 required or at least encouraged semantic mapping support in hardware. R6xx+ radeons support this and r3xx-r5xx hardware does to a lesser degree. You can use arbitrary, driver specific ids and the hardware will match up inputs and outputs based on those ids.
>
> Can you provide a reference to the DX10 API for doing that?
> I'm not very familiar with DX10, and a quick search on MSDN didn't turn up anything relevant.

I'm not sure offhand. That's the answer I got internally as to why we have semantic mapping support in hw.

Alex

> Does the VMWare DirectX state tracker use non-sequential semantic indices in the implementation of such a feature?
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
Look harder. There *should* be routing tables of some sort on there. If there really aren't, then you might have to actually start in on your shader compilers instead of just translating opcodes one-at-a-time, but this isn't a bad thing as it's what everybody needs to do for performance.

Also, as stated before, it's a bad idea to fix GLSL and pretend that everything's kosher, because it'll break again next time somebody provides a shader generator of non-trivial complexity. HLSL is the obvious candidate, but there might be GPGPU languages that do the same thing.

While I have no serious objection to changing Gallium's TGSI semantics and permanently forbidding this, I kind of wish the drivers could just be more robust.

~ C.

On Fri, Jan 29, 2010 at 10:15 AM, Luca Barbieri wrote:
> As a concrete example, the current nv40 code does this during fragment program translation.
>
>     case TGSI_SEMANTIC_GENERIC:
>         if (fdec->Semantic.Index <= 7) {
>             hw = NV40_FP_OP_INPUT_SRC_TC(fdec->Semantic.Index);
>         } else {
>             NOUVEAU_ERR("bad generic semantic index\n");
>             return FALSE;
>         }
>         break;
>
> As a result, GLSL completely fails to work on nv30 and nv40.
>
> nv50 instead seems to have hardware routing support, which is used by the driver.

--
Only fools are easily impressed by what is only barely beyond their reach. ~ Unknown
Corbin Simpson
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
As a concrete example, the current nv40 code does this during fragment program translation.

    case TGSI_SEMANTIC_GENERIC:
        if (fdec->Semantic.Index <= 7) {
            hw = NV40_FP_OP_INPUT_SRC_TC(fdec->Semantic.Index);
        } else {
            NOUVEAU_ERR("bad generic semantic index\n");
            return FALSE;
        }
        break;

As a result, GLSL completely fails to work on nv30 and nv40.

nv50 instead seems to have hardware routing support, which is used by the driver.
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
> Luca,
>
> Let me make sure I understand the problem here.
>
> Are you specifically concerned about the GENERIC[x] semantic labels/indexes that are attached to VS outputs and FS inputs?

Yes.

> This is as intended. The semantic indexes are used to match up inputs/outputs logically but they should not affect which hardware interpolation slots are used.

If remapping hardware is not available (or it is inefficient), then they must also select the interpolation slots, unless the driver is changed to recompile both shaders every time one of them changes. I'd really like to avoid having to do that in drivers if possible.

If I'm not mistaken, the current GLSL linker is the only thing that currently relies on the capability to use semantic indices N and larger, where N is the number of interpolators, so fixing that seems to be enough. For instance, users of ARB fragment/vertex program will use matched texcoord slots, up to the implementation limit.

The situation might change if DirectX 10 provides a remapping API, or if there is an OpenGL extension doing that.
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
> FWIW, I think DX10 required or at least encouraged semantic mapping support in hardware. R6xx+ radeons support this and r3xx-r5xx hardware does to a lesser degree. You can use arbitrary, driver specific ids and the hardware will match up inputs and outputs based on those ids.

Can you provide a reference to the DX10 API for doing that? I'm not very familiar with DX10, and a quick search on MSDN didn't turn up anything relevant.

Does the VMWare DirectX state tracker use non-sequential semantic indices in the implementation of such a feature?
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On Fri, Jan 29, 2010 at 12:31 PM, Brian Paul wrote:
> Luca,
>
> Let me make sure I understand the problem here.
>
> Are you specifically concerned about the GENERIC[x] semantic labels/indexes that are attached to VS outputs and FS inputs?
>
> I hacked a Mesa GLSL demo to use texcoords and varying vars and saw something like this:
>
>     VERT
>     DCL IN[0]
>     DCL OUT[0], POSITION
>     DCL OUT[1], GENERIC[0]
>     DCL OUT[2], GENERIC[10]
>     ...
>
>     FRAG
>     DCL IN[0], GENERIC[0], PERSPECTIVE
>     DCL IN[1], GENERIC[10], PERSPECTIVE
>     DCL OUT[0], COLOR
>     ...
>
> We use the semantic names/labels GENERIC[0] and GENERIC[10] but note that the actual inputs/outputs are in consecutive slots.
>
> This is as intended. The semantic indexes are used to match up inputs/outputs logically but they should not affect which hardware interpolation slots are used.

FWIW, I think DX10 required or at least encouraged semantic mapping support in hardware. R6xx+ radeons support this and r3xx-r5xx hardware does to a lesser degree. You can use arbitrary, driver specific ids and the hardware will match up inputs and outputs based on those ids.

Alex

> Prior to Keith's commit 07fafc7c9346aa260829603bf3188596481e9e62 the generic semantic indexes were consecutive, BTW.
>
> -Brian
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On Fri, Jan 29, 2010 at 9:49 AM, Brian Paul wrote:
> Luca, I'm OK with this change in principle but I need a bit more time to review the problem and your patch...

Luca,

Let me make sure I understand the problem here.

Are you specifically concerned about the GENERIC[x] semantic labels/indexes that are attached to VS outputs and FS inputs?

I hacked a Mesa GLSL demo to use texcoords and varying vars and saw something like this:

    VERT
    DCL IN[0]
    DCL OUT[0], POSITION
    DCL OUT[1], GENERIC[0]
    DCL OUT[2], GENERIC[10]
    ...

    FRAG
    DCL IN[0], GENERIC[0], PERSPECTIVE
    DCL IN[1], GENERIC[10], PERSPECTIVE
    DCL OUT[0], COLOR
    ...

We use the semantic names/labels GENERIC[0] and GENERIC[10] but note that the actual inputs/outputs are in consecutive slots.

This is as intended. The semantic indexes are used to match up inputs/outputs logically but they should not affect which hardware interpolation slots are used.

Prior to Keith's commit 07fafc7c9346aa260829603bf3188596481e9e62 the generic semantic indexes were consecutive, BTW.

-Brian
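To make the distinction between register slots and semantic labels concrete, here is a small sketch of matching the declarations from the listing by semantic rather than by slot. The `struct decl` layout and the `sem_name` enum are simplified stand-ins invented for this example; they are not the real TGSI declaration structures.

```c
#include <assert.h>

/* Hypothetical stand-ins for the TGSI semantic names used in the listing. */
enum sem_name { SEM_POSITION, SEM_COLOR, SEM_GENERIC };

struct decl {
    unsigned reg;        /* register slot: the N in OUT[N] or IN[N] */
    enum sem_name name;  /* semantic name, e.g. GENERIC */
    unsigned index;      /* semantic index: the x in GENERIC[x] */
};

/* Register of the vertex shader output that carries the same semantic
 * (name and index) as the fragment shader input `in`, or -1 if the
 * vertex shader does not write it. */
static int match_vs_output(const struct decl *vs_out, int n,
                           const struct decl *in)
{
    assert(vs_out != 0 && in != 0 && n >= 0);
    for (int i = 0; i < n; i++)
        if (vs_out[i].name == in->name && vs_out[i].index == in->index)
            return (int)vs_out[i].reg;
    return -1;
}
```

Applied to the listing, FS IN[0] (GENERIC[0]) resolves to VS OUT[1] and FS IN[1] (GENERIC[10]) resolves to VS OUT[2]: the semantic index 10 only names the link, and never has to appear as a hardware slot number.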
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On Fri, Jan 29, 2010 at 12:48 AM, Luca Barbieri wrote:
> I'd like to have some more definitive review comments on this patch (sending to Brian and Keith for this).
>
> Right now GLSL is the *only* Gallium user that does not use sequential indexes starting from 0 for vertex shader outputs and fragment shader inputs. This causes problems for some drivers such as nv30/nv40 that don't remap the indexes right now.
>
> This can be addressed in three ways:
> 1. Don't require Gallium users to use sequential indices, and require vertex shader outputs and fragment shader inputs to match perfectly
> 2. Don't require Gallium users to use sequential indices, and change nv30/nv40 and possibly other drivers to remap indices
> 3. Fix the only problematic user, GLSL, to use sequential indices
>
> (1) will break the Mesa state tracker in a very hard to fix way.
> (2) is complex and means that nv30/nv40 and maybe other drivers can no longer compile vertex and fragment shaders independently.
> (3) is a simple fix, provided by this patch.
>
> I feel that (3), implemented by this patch, is the best solution, since driver simplicity is one of the Gallium design goals, and I don't see any significant advantages in supporting discontiguous vertex shader output / fragment shader input values.
>
> OpenGL and DirectX 9/10 don't seem to allow arbitrary numbers for vertex shader outputs and fragment shader inputs, and instead require 0-7, 0-15 or 0-31 depending on feature level.
>
> If this is wrong, please correct me.
>
> I propose that Gallium should also require 0-x indices and not arbitrary values. Thus, GLSL should be fixed to respect that.
>
> Note that this change cannot be done in the state tracker because it requires seeing both the fragment and vertex shaders at once, which only happens in the GLSL linker. Thus, while the change has been discussed with Gallium in mind, it is done at the Mesa program level, and it actually results in Mesa programs with contiguous indices. This also potentially benefits non-Gallium drivers.
>
> What do you think?

Luca, I'm OK with this change in principle but I need a bit more time to review the problem and your patch...

-Brian
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
I'd like to have some more definitive review comments on this patch (sending to Brian and Keith for this).

Right now GLSL is the *only* Gallium user that does not use sequential indexes starting from 0 for vertex shader outputs and fragment shader inputs. This causes problems for some drivers such as nv30/nv40 that don't remap the indexes right now.

This can be addressed in three ways:
1. Don't require Gallium users to use sequential indices, and require vertex shader outputs and fragment shader inputs to match perfectly
2. Don't require Gallium users to use sequential indices, and change nv30/nv40 and possibly other drivers to remap indices
3. Fix the only problematic user, GLSL, to use sequential indices

(1) will break the Mesa state tracker in a very hard to fix way.
(2) is complex and means that nv30/nv40 and maybe other drivers can no longer compile vertex and fragment shaders independently.
(3) is a simple fix, provided by this patch.

I feel that (3), implemented by this patch, is the best solution, since driver simplicity is one of the Gallium design goals, and I don't see any significant advantages in supporting discontiguous vertex shader output / fragment shader input values.

OpenGL and DirectX 9/10 don't seem to allow arbitrary numbers for vertex shader outputs and fragment shader inputs, and instead require 0-7, 0-15 or 0-31 depending on feature level.

If this is wrong, please correct me.

I propose that Gallium should also require 0-x indices and not arbitrary values. Thus, GLSL should be fixed to respect that.

Note that this change cannot be done in the state tracker because it requires seeing both the fragment and vertex shaders at once, which only happens in the GLSL linker. Thus, while the change has been discussed with Gallium in mind, it is done at the Mesa program level, and it actually results in Mesa programs with contiguous indices. This also potentially benefits non-Gallium drivers.

What do you think?
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On Mon, Jan 18, 2010 at 09:06:01PM +0100, Luca Barbieri wrote:
> As for REing the tables, it may not be possible.
> This is the code that apparently sets them up right now:
>
>     /* vtxprog output routing */
>     so_method(so, screen->curie, 0x1fc4, 1);
>     so_data  (so, 0x06144321);
>     so_method(so, screen->curie, 0x1fc8, 2);
>     so_data  (so, 0xedcba987);
>     so_data  (so, 0x0021);
>     so_method(so, screen->curie, 0x1fd0, 1);
>     so_data  (so, 0x00171615);
>     so_method(so, screen->curie, 0x1fd4, 1);
>     so_data  (so, 0x001b1a19);
>
> This makes me think that only 4 bits might be used for the values (look at the arithmetic progressions of 4-bit values), so that there is a limit of 16 vertex output/fragment inputs.

Looking at the so_data values, 1fcx is 4 bits, 1fdx is 8 bits. It's also a little strange to have 24 slots for 16 possible values.

> If GLSL starts at index 10, we are still in trouble because fewer than 8 varyings will be available.

Shader model 4 requires at least 32 interpolated values. Subsequent versions will want more and more. Any "start index" is going to blow up someday unless it's a very large value (which 10 isn't).

OG.
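The bit-width reading of the so_data values can be checked by unpacking the words into fixed-width fields. The sketch below does only that arithmetic; `field` and `unpack` are hypothetical helpers, the low-to-high field order is an assumption, and nothing here claims to know what the hardware does with the values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field number `i` (0 = least significant) of width `bits` in `word`. */
static unsigned field(uint32_t word, unsigned bits, unsigned i)
{
    assert(bits > 0 && bits < 32 && (i + 1) * bits <= 32);
    return (word >> (i * bits)) & ((1u << bits) - 1u);
}

/* Unpack every `bits`-wide field of `word` into out[], least significant
 * first. Returns the number of fields written (32 / bits). */
static size_t unpack(uint32_t word, unsigned bits, unsigned *out)
{
    size_t n = 32 / bits;
    for (size_t i = 0; i < n; i++)
        out[i] = field(word, bits, (unsigned)i);
    return n;
}
```

Read as 4-bit fields, 0x06144321 gives 1, 2, 3, 4, 4, 1, 6, 0 and 0xedcba987 gives 7 through 0xe in order, which is the arithmetic progression Luca points at. Read as 8-bit fields, 0x00171615 gives 0x15, 0x16, 0x17, 0x00. If the last so_method argument is a word count, the 1fc4 and 1fc8 methods carry three 32-bit words, that is 24 four-bit fields, which would be the "24 slots for 16 possible values".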
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
> If you get this patch in, then you'll still have to fight with every
> other state tracker that doesn't prettify their TGSI. It would be a
> much better approach to attempt to RE the routing tables.

I don't think there are any users of the Gallium interface that need more than 8 vertex outputs/fragment inputs and don't use sequential values starting at 0, except the GLSL linker without this patch.

ARB_fragment_program and ARB_vertex_program are limited to texcoord slots, and Mesa should advertise only 8 of them.

Also, users of this interface will likely only use as many as they need, sequentially. Vega and xorg seem to use only up to 2 slots, g3dvl up to 8 (starting from 0, of course).

Cards with fewer than 8 slots may sometimes still have problems, but such cards will probably be DX8 cards that don't work anyway.

Furthermore, even if you can route things, using vertex outputs and fragment inputs with lower indices may be more efficient anyway.

As for REing the tables, it may not be possible.
This is the code that apparently sets them up right now:
    /* vtxprog output routing */
    so_method(so, screen->curie, 0x1fc4, 1);
    so_data  (so, 0x06144321);
    so_method(so, screen->curie, 0x1fc8, 2);
    so_data  (so, 0xedcba987);
    so_data  (so, 0x0021);
    so_method(so, screen->curie, 0x1fd0, 1);
    so_data  (so, 0x00171615);
    so_method(so, screen->curie, 0x1fd4, 1);
    so_data  (so, 0x001b1a19);

This makes me think that only 4 bits might be used for the values (look at the arithmetic progressions of 4-bit values), so that there is a limit of 16 vertex output/fragment inputs.

If GLSL starts at index 10, we are still in trouble because fewer than 8 varyings will be available.

Also, leaving vertex outputs/fragment inputs unused by starting at high values may be bad for performance even if supported, as it may lead to a bigger register file and thus fewer GPU threads running simultaneously.

In other words, having GLSL start at index 10 is easily avoided, and causes problems nothing else causes, so why not just stop doing that?
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
Actually, we don't even bother worrying about the rasterizer's routing table until we've bound a pair of shaders and start drawing. Right before the draw call, we re-generate, among other things, routing tables for the vert shader and the rasterizer.

This is *incredibly* powerful, because it means we only have to compile the shaders once, and load the rasterizer tables based on those shaders. I even baked up a CSO to cache the tables, but it turned out to be an overall slowdown.

If you get this patch in, then you'll still have to fight with every other state tracker that doesn't prettify their TGSI. It would be a much better approach to attempt to RE the routing tables.

Also FYI the r300-r500 rasterizer can only handle, off the top of my head, 16 sets of vectors total (8 colors, 8 texcoords) so you're not the only ones with this kind of limitation. The situation gets better for r600 and nv50.

~ C.

On Mon, Jan 18, 2010 at 8:27 AM, Luca Barbieri wrote:
> So, basically, you allocate the rasterizer units according to the
> vertex shader, and when the fragment shader comes up, you say "write
> rasterizer output 4 to fragment input 100"?
>
> The current nouveau drivers can't do this.
> There are "routing" registers in hardware, but I think the nVidia
> proprietary driver (at least without GLSL) leaves them unaltered after
> initialization and I don't think we really know how they would work.
> They are also very likely limited to at most 256 values (maybe even
> less, such as 16), even if they can actually be made to work.
>
> The way the current pre-nv50 driver works is that there are 8 slots,
> each of which has an interpolator and a fixed associated vertex shader
> output and fixed fragment input. This seems a rather obvious way to
> design hardware, and so shouldn't be uncommon.
>
> Thus, the inputs/outputs can't be packed, because that will break if
> the fragment shader doesn't use a vertex output.
> And there is no way to correct that when the fragment program comes
> up, other than recompiling the vertex shader, which would be very
> desirable to avoid having to do.
>
> Non-GLSL programs can only use the 8 texcoords, so there is no problem
> there since hardware supports 8 slots.
>
> Thus, I think my proposed solution is the simplest and most efficient
> approach.
> Any other solution would require much more, and slower, code in the
> Gallium drivers for nv30, nv40, and maybe Intel too.

--
Only fools are easily impressed by what is only barely beyond their reach. ~ Unknown

Corbin Simpson
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
So, basically, you allocate the rasterizer units according to the vertex shader, and when the fragment shader comes up, you say "write rasterizer output 4 to fragment input 100"?

The current nouveau drivers can't do this.
There are "routing" registers in hardware, but I think the nVidia proprietary driver (at least without GLSL) leaves them unaltered after initialization and I don't think we really know how they would work.
They are also very likely limited to at most 256 values (maybe even less, such as 16), even if they can actually be made to work.

The way the current pre-nv50 driver works is that there are 8 slots, each of which has an interpolator and a fixed associated vertex shader output and fixed fragment input. This seems a rather obvious way to design hardware, and so shouldn't be uncommon.

Thus, the inputs/outputs can't be packed, because that will break if the fragment shader doesn't use a vertex output.
And there is no way to correct that when the fragment program comes up, other than recompiling the vertex shader, which would be very desirable to avoid having to do.

Non-GLSL programs can only use the 8 texcoords, so there is no problem there since hardware supports 8 slots.

Thus, I think my proposed solution is the simplest and most efficient approach.
Any other solution would require much more, and slower, code in the Gallium drivers for nv30, nv40, and maybe Intel too.
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On Mon, Jan 18, 2010 at 10:52 AM, Marek Olšák wrote:
> On Mon, Jan 18, 2010 at 3:22 PM, Luca Barbieri wrote:
>>
>> > I think this is not necessary and fixing the rasterizer setup in the
>> > driver would be better than fixing the state tracker.
>> >
>> > In r300g, we dynamically allocate rasterizer units based on vertex
>> > shader outputs. If the vertex shader uses slots 1, 5, 20, 100, the
>> > driver maps them to units 1,2,3,4.
>>
>> But what if the fragment shader has inputs 1, 2, 5, 20, 100?
>> If you remap the fragment shader to 1, 2, 3, 4, 5, then they will
>> mismatch.
>>
>> You would need to either:
>> 1. Generate shaders in the driver for the fragment/vertex combination
>>    instead of each one separately
>> 2. Require that vertex shader outputs match fragment shader inputs exactly
>>
>> (1) makes the driver much more complex and slow. I think we should try
>> to make it possible to avoid this, unless the hardware absolutely
>> requires it.
>> (2) will probably break the existing fixed pipeline and ARB_fp/vp
>> support, and also make the driver more complex than necessary.
>>
>> Does r300g compile both fragment and vertex shader together?
>>
>> Also note that all Gallium-capable hardware should support 8 varying
>> slots, so anything that uses only "texture coordinates" should not
>> need any remapping.
>
> I was talking about the rasterizer (interpolator) units, which, on r300, are
> quite flexible and can read an arbitrary vertex shader output and write it
> to an arbitrary fragment shader input (= register address). Given this
> flexibility, fragment and vertex shaders are compiled separately in r300g
> and semantic indices don't matter, just the total number of varyings.

I think DX9 required this flexibility when mapping VS to PS, so it's likely most DX9 hw supports this.

Alex
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
On Mon, Jan 18, 2010 at 3:22 PM, Luca Barbieri wrote:
> > I think this is not necessary and fixing the rasterizer setup in the
> > driver would be better than fixing the state tracker.
> >
> > In r300g, we dynamically allocate rasterizer units based on vertex shader
> > outputs. If the vertex shader uses slots 1, 5, 20, 100, the driver maps
> > them to units 1,2,3,4.
>
> But what if the fragment shader has inputs 1, 2, 5, 20, 100?
> If you remap the fragment shader to 1, 2, 3, 4, 5, then they will mismatch.
>
> You would need to either:
> 1. Generate shaders in the driver for the fragment/vertex combination
>    instead of each one separately
> 2. Require that vertex shader outputs match fragment shader inputs exactly
>
> (1) makes the driver much more complex and slow. I think we should try
> to make it possible to avoid this, unless the hardware absolutely
> requires it.
> (2) will probably break the existing fixed pipeline and ARB_fp/vp
> support, and also make the driver more complex than necessary.
>
> Does r300g compile both fragment and vertex shader together?
>
> Also note that all Gallium-capable hardware should support 8 varying
> slots, so anything that uses only "texture coordinates" should not
> need any remapping.

I was talking about the rasterizer (interpolator) units, which, on r300, are quite flexible and can read an arbitrary vertex shader output and write it to an arbitrary fragment shader input (= register address). Given this flexibility, fragment and vertex shaders are compiled separately in r300g and semantic indices don't matter, just the total number of varyings.
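The routing Marek describes (any vertex shader output can feed any fragment shader input) can be sketched as a small table built at draw time. This is only an illustration of the idea under simplifying assumptions, not r300g code: `build_routing`, `MAX_UNITS` and `UNROUTED` are hypothetical names, units are numbered from 0 rather than 1, and each varying is identified by its semantic index alone.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNITS 16    /* hypothetical upper bound on rasterizer units */
#define UNROUTED  (-1)  /* fragment input that no vertex output feeds */

/* For each fragment input k, store in route[k] the position of the vertex
 * output with the same semantic index, or UNROUTED if the vertex shader
 * never writes it. Returns the number of unrouted fragment inputs. */
static size_t build_routing(const unsigned *vs_out, size_t n_vs,
                            const unsigned *fs_in, size_t n_fs,
                            int *route)
{
    size_t unmatched = 0;
    assert(n_vs <= MAX_UNITS && n_fs <= MAX_UNITS);
    for (size_t k = 0; k < n_fs; k++) {
        route[k] = UNROUTED;
        for (size_t u = 0; u < n_vs; u++) {
            if (vs_out[u] == fs_in[k]) {
                route[k] = (int)u;
                break;
            }
        }
        if (route[k] == UNROUTED)
            unmatched++;
    }
    return unmatched;
}
```

With vertex outputs {1, 5, 20, 100} and fragment inputs {1, 2, 5, 20, 100}, the table routes units 0, 1, 2, 3 to fragment inputs 0, 2, 3, 4 and leaves fragment input 1 (semantic 2) unrouted. Because the table is separate from both shaders, neither has to be recompiled when the other changes.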
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
> I think this is not necessary and fixing the rasterizer setup in the driver
> would be better than fixing the state tracker.
>
> In r300g, we dynamically allocate rasterizer units based on vertex shader
> outputs. If the vertex shader uses slots 1, 5, 20, 100, the driver maps
> them to units 1,2,3,4.

But what if the fragment shader has inputs 1, 2, 5, 20, 100?
If you remap the fragment shader to 1, 2, 3, 4, 5, then they will mismatch.

You would need to either:
1. Generate shaders in the driver for the fragment/vertex combination instead of each one separately
2. Require that vertex shader outputs match fragment shader inputs exactly

(1) makes the driver much more complex and slow. I think we should try to make it possible to avoid this, unless the hardware absolutely requires it.
(2) will probably break the existing fixed pipeline and ARB_fp/vp support, and also make the driver more complex than necessary.

Does r300g compile both fragment and vertex shader together?

Also note that all Gallium-capable hardware should support 8 varying slots, so anything that uses only "texture coordinates" should not need any remapping.
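The mismatch in question is easy to reproduce on paper. The sketch below models a driver that compacts each shader's varyings on its own, so a varying's slot is simply its position in that shader's list. The function names are hypothetical and slots are counted from 0, so the numbers are one lower than in the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Slot a shader gets for `semantic` when its varyings are compacted
 * independently: the position in that shader's own list, or -1 if the
 * shader does not use the semantic at all. */
static int compacted_slot(const unsigned *semantics, size_t n, unsigned semantic)
{
    assert(semantics != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (semantics[i] == semantic)
            return (int)i;
    return -1;
}

/* Returns 1 when both shaders use `semantic` but place it in different slots. */
static int slots_disagree(const unsigned *vs, size_t n_vs,
                          const unsigned *fs, size_t n_fs, unsigned semantic)
{
    int v = compacted_slot(vs, n_vs, semantic);
    int f = compacted_slot(fs, n_fs, semantic);
    return v >= 0 && f >= 0 && v != f;
}
```

For vertex outputs {1, 5, 20, 100} and fragment inputs {1, 2, 5, 20, 100}, semantic 1 lands in slot 0 on both sides, but semantic 5 lands in slot 1 for the vertex shader and slot 2 for the fragment shader, and 20 and 100 are shifted the same way. Hardware with fixed slot-to-slot wiring cannot absorb that shift.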
Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
Hi Luca,

I think this is not necessary and fixing the rasterizer setup in the driver would be better than fixing the state tracker.

In r300g, we dynamically allocate rasterizer units based on vertex shader outputs. If the vertex shader uses slots 1, 5, 20, 100, the driver maps them to units 1,2,3,4.

Marek

On Sun, Jan 17, 2010 at 8:05 PM, Luca Barbieri wrote:
> The current GLSL linker puts varyings in slots starting from *_VAR0,
> leaving the *_TEX slots used only for gl_TexCoord[i].
>
> This results in TGSI programs that start using generic input/outputs
> with index 10.
>
> Unfortunately, some drivers (e.g. pre-nv50 nouveau) support only 8
> vertex program outputs, and this causes GLSL to not work at all.
> On other cards, GLSL works, but 8 varying slots are lost.
>
> This patch solves the problem by modifying the GLSL linker to allocate
> varyings in texcoord slots that neither vertex nor fragment shader uses.
>
> Note that the GLSL linker is the only place where this can be done,
> because it is the only place that sees both the vertex and fragment
> programs at once.
>
> The only known issue is that if the GLSL program has an indirect
> reference to gl_TexCoord[i], no varyings will be put in texcoord slots.
> This may or may not be desirable.
>
> This makes (a subset of) GLSL work on NV30/NV40 and improves the
> chances of complex programs working on other cards.
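The allocation the patch description outlines (varyings go into texcoord slots that neither shader uses, then spill into the generic varying range) can be sketched as a standalone function. This is a simplified restatement under assumptions, not the code from slang_link.c: the function names are hypothetical, and the `first_tex` / `first_var` arguments stand in for the *_TEX0 and *_VAR0 register indices, whose real values are not given in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TEXCOORDS 8  /* matches the "& 0xff" texcoord mask in the patch */

/* Fill slots[0..count-1] with logical slot numbers, skipping every slot i
 * whose bit is set in texcoord_mask (gl_TexCoord[i] is in use). Slots at
 * or above NUM_TEXCOORDS are never masked. */
static void allocate_varying_slots(uint32_t texcoord_mask,
                                   uint8_t *slots, unsigned count)
{
    unsigned next = 0;
    assert((texcoord_mask >> NUM_TEXCOORDS) == 0);
    for (unsigned i = 0; i < count; i++) {
        while (next < NUM_TEXCOORDS && ((UINT32_C(1) << next) & texcoord_mask))
            next++;
        slots[i] = (uint8_t)next++;
    }
}

/* Translate a logical slot into a register index: the first 8 logical
 * slots map onto texcoord registers, the rest onto generic varyings. */
static unsigned slot_to_register(unsigned slot, unsigned first_tex,
                                 unsigned first_var)
{
    return slot < NUM_TEXCOORDS ? first_tex + slot
                                : first_var + (slot - NUM_TEXCOORDS);
}
```

For example, if the shaders use gl_TexCoord[0] and gl_TexCoord[2] (mask 0x05), the first four varyings get logical slots 1, 3, 4 and 5, all still inside the texcoord range. Only once the eight texcoord slots are exhausted do varyings fall through to the generic range.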
[Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots
The current GLSL linker puts varyings in slots starting from *_VAR0, leaving the *_TEX slots used only for gl_TexCoord[i].

This results in TGSI programs that start using generic input/outputs with index 10.

Unfortunately, some drivers (e.g. pre-nv50 nouveau) support only 8 vertex program outputs, and this causes GLSL to not work at all.
On other cards, GLSL works, but 8 varying slots are lost.

This patch solves the problem by modifying the GLSL linker to allocate varyings in texcoord slots that neither vertex nor fragment shader uses.

Note that the GLSL linker is the only place where this can be done, because it is the only place that sees both the vertex and fragment programs at once.

The only known issue is that if the GLSL program has an indirect reference to gl_TexCoord[i], no varyings will be put in texcoord slots.
This may or may not be desirable.

This makes (a subset of) GLSL work on NV30/NV40 and improves the chances of complex programs working on other cards.

Signed-off-by: Luca Barbieri
---
 src/mesa/shader/slang/slang_link.c | 62 ++-
 1 files changed, 46 insertions(+), 16 deletions(-)

diff --git a/src/mesa/shader/slang/slang_link.c b/src/mesa/shader/slang/slang_link.c
index ed27821..889a811 100644
--- a/src/mesa/shader/slang/slang_link.c
+++ b/src/mesa/shader/slang/slang_link.c
@@ -99,9 +99,9 @@ bits_agree(GLbitfield flags1, GLbitfield flags2, GLbitfield bit)
  */
 static GLboolean
 link_varying_vars(GLcontext *ctx,
-                  struct gl_shader_program *shProg, struct gl_program *prog)
+                  struct gl_shader_program *shProg, struct gl_program *prog, GLbyte* varying_slots)
 {
-   GLuint *map, i, firstVarying, newFile;
+   GLuint *map, i, firstTex, firstVarying, newFile;
    GLbitfield *inOutFlags;

    map = (GLuint *) _mesa_malloc(prog->Varying->NumParameters * sizeof(GLuint));
@@ -114,13 +114,15 @@ link_varying_vars(GLcontext *ctx,
     * Also, replace File=PROGRAM_VARYING with File=PROGRAM_INPUT/OUTPUT.
     */
    if (prog->Target == GL_VERTEX_PROGRAM_ARB) {
-      firstVarying = VERT_RESULT_VAR0;
+      firstTex = VERT_RESULT_TEX0;
+      firstVarying = VERT_RESULT_VAR0 - 8;
       newFile = PROGRAM_OUTPUT;
       inOutFlags = prog->OutputFlags;
    }
    else {
       assert(prog->Target == GL_FRAGMENT_PROGRAM_ARB);
-      firstVarying = FRAG_ATTRIB_VAR0;
+      firstTex = FRAG_ATTRIB_TEX0;
+      firstVarying = FRAG_ATTRIB_VAR0 - 8;
       newFile = PROGRAM_INPUT;
       inOutFlags = prog->InputFlags;
    }
@@ -173,9 +175,12 @@ link_varying_vars(GLcontext *ctx,
       {
          GLint sz = var->Size;
          while (sz > 0) {
-            inOutFlags[firstVarying + j] = var->Flags;
+            int v = varying_slots[j];
+            v += ((v < 8) ? firstTex : firstVarying);
+            inOutFlags[v] = var->Flags;
             /*printf("Link varying from %d to %d\n", i, j);*/
-            map[i++] = j++;
+            map[i++] = v;
+            ++j;
             sz -= 4;
          }
          i--; /* go back one */
@@ -192,13 +197,13 @@ link_varying_vars(GLcontext *ctx,

       if (inst->DstReg.File == PROGRAM_VARYING) {
          inst->DstReg.File = newFile;
-         inst->DstReg.Index = map[ inst->DstReg.Index ] + firstVarying;
+         inst->DstReg.Index = map[ inst->DstReg.Index ];
       }

       for (j = 0; j < 3; j++) {
          if (inst->SrcReg[j].File == PROGRAM_VARYING) {
             inst->SrcReg[j].File = newFile;
-            inst->SrcReg[j].Index = map[ inst->SrcReg[j].Index ] + firstVarying;
+            inst->SrcReg[j].Index = map[ inst->SrcReg[j].Index ];
          }
       }
    }
@@ -790,14 +795,39 @@ _slang_link(GLcontext *ctx,
       ASSERT(shProg->FragmentProgram->Base.RefCount == 1);
    }

-   /* link varying vars */
-   if (shProg->VertexProgram) {
-      if (!link_varying_vars(ctx, shProg, &shProg->VertexProgram->Base))
-         return;
-   }
-   if (shProg->FragmentProgram) {
-      if (!link_varying_vars(ctx, shProg, &shProg->FragmentProgram->Base))
-         return;
+   {
+      GLuint texcoord_mask = 0;
+      GLbyte varying_slots[MAX_VARYING];
+      GLuint next_varying = 0;
+
+      if(shProg->VertexProgram)
+      {
+         _slang_update_inputs_outputs(&shProg->VertexProgram->Base);
+         texcoord_mask |= (shProg->VertexProgram->Base.OutputsWritten >> VERT_RESULT_TEX0) & 0xff;
+      }
+      if(shProg->FragmentProgram)
+      {
+         _slang_update_inputs_outputs(&shProg->FragmentProgram->Base);
+         texcoord_mask |= (shProg->FragmentProgram->Base.InputsRead >> FRAG_ATTRIB_TEX0) & 0xff;
+      }
+
+      /* Allocate varying slots, excluding slot i if gl_TexCoord[i] is used by the shaders */
+      for(i = 0; i < MAX_VARYING; ++i)
+      {
+         for(; (1 << next_varying) & texcoord_mask; ++next_varying)
+         {}
+         varying_slots[i] = next_varying++;
+      }
+
+      /* link varying vars */
+      if (shProg->VertexProgram) {
+         if