Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
I tested this on Windows, using nVidia driver 195 on nv40, and it seems we are all partially wrong.

SM3 does indeed allow semantics unrelated to hardware resources. However, the semantic indices for any semantic type must be in the range 0-15, or D3DX will report a compiler error during shader compilation:

error X2000: syntax error : unexpected token 'dcl_texcoord16'

This is further confirmed by the following lines in d3d9types.h:

#define MAXD3DDECLUSAGE D3DDECLUSAGE_SAMPLE
#define MAXD3DDECLUSAGEINDEX 15

I would guess that these two 4-bit values are combined into an 8-bit value that is then passed directly to hardware like r600, which supports 8-bit semantic indices in hardware.

Is this the case on Radeon? Is the 8-bit semantic table a feature of r300 too or only of r600+?

In light of this, it may make sense to do some range limitation ourselves too. In particular, a good plan could be limiting all semantic indices to 0-15, except GENERIC, which could support a 0-127 range.

This would allow us both to take direct advantage of Radeon hardware and to let drivers that need to remap in software do so with a direct lookup in a small array.

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
___
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
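To make the "two 4-bit values in one 8-bit value" guess and the "direct lookup in a small array" idea concrete, here is a minimal sketch. All names (`sem_pack`, `sem_remap`, the `SEM_*` constants) are hypothetical and exist in neither d3d9types.h nor Gallium; the sketch only covers the 0-15 case and does not attempt the proposed 0-127 GENERIC range, which would not fit in 4 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits modelled on the proposal above, not on a real header. */
#define SEM_INDEX_BITS 4u
#define SEM_INDEX_MAX  15u    /* indices 0-15 */
#define SEM_USAGE_MAX  15u    /* assumes at most 16 usage types */
#define SEM_UNMAPPED   0xffu  /* marker for "no hardware slot assigned" */

/* Combine a 4-bit usage and a 4-bit index into one 8-bit semantic id. */
static uint8_t sem_pack(unsigned usage, unsigned index)
{
    assert(usage <= SEM_USAGE_MAX && index <= SEM_INDEX_MAX);
    return (uint8_t)((usage << SEM_INDEX_BITS) | index);
}

static unsigned sem_usage(uint8_t id) { return (unsigned)id >> SEM_INDEX_BITS; }
static unsigned sem_index(uint8_t id) { return (unsigned)id & SEM_INDEX_MAX; }

/* Software remap: a 256-entry table indexed directly by the packed id. */
struct sem_remap {
    uint8_t hw_slot[256];
};

static void sem_remap_init(struct sem_remap *r)
{
    for (unsigned i = 0; i < 256; i++)
        r->hw_slot[i] = SEM_UNMAPPED;
}

static void sem_remap_set(struct sem_remap *r, uint8_t id, uint8_t slot)
{
    r->hw_slot[id] = slot;
}

static uint8_t sem_remap_get(const struct sem_remap *r, uint8_t id)
{
    return r->hw_slot[id];
}
```

With bounded indices the remap costs one array read per semantic, which is the point of the range limitation: an unbounded index would need a search or a hash instead.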
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
On Wed, 2010-02-03 at 01:42 -0800, Luca Barbieri wrote:
> I tested this on Windows, using nVidia driver 195 on nv40, and it seems we are all partially wrong.
>
> SM3 does indeed allow semantics unrelated to hardware resources. However, the semantic indices for any semantic type must be in the range 0-15, or D3DX will report a compiler error during shader compilation:
>
> error X2000: syntax error : unexpected token 'dcl_texcoord16'
>
> This is further confirmed by the following lines in d3d9types.h:
>
> #define MAXD3DDECLUSAGE D3DDECLUSAGE_SAMPLE
> #define MAXD3DDECLUSAGEINDEX 15
>
> I would guess that these two 4-bit values are combined into an 8-bit value that is then passed directly to hardware like r600, which supports 8-bit semantic indices in hardware.

Further down that file they define the binary shader tokens for DX9, which match your guess:

// For dcl info tokens requiring a semantic (usage + index)
#define D3DSP_DCL_USAGE_SHIFT 0
#define D3DSP_DCL_USAGE_MASK 0x000f
#define D3DSP_DCL_USAGEINDEX_SHIFT 16
#define D3DSP_DCL_USAGEINDEX_MASK 0x000f

Not for the first time, hardware capabilities directly match what was required to implement the DX version of the era.

> Is this the case on Radeon? Is the 8-bit semantic table a feature of r300 too or only of r600+?

At some point this would have been introduced to hardware to remove the headache from software of dealing with the DX9 semantic scheme. I don't know at what point in the hardware/driver evolution it became important enough to devote silicon to.

> In light of this, it may make sense to do some range limitation ourselves too. In particular, a good plan could be limiting all semantic indices to 0-15, except GENERIC, which could support a 0-127 range.
>
> This would allow us both to take direct advantage of Radeon hardware and to let drivers that need to remap in software do so with a direct lookup in a small array.

This level of restriction is fine with me. It seems to allow us to capture all the important APIs - GL and DX9 clearly, and DX10 seems to match outputs to inputs by position, without needing to examine semantics.

Also, we've been proliferating semantic names, one each for various system values. It sounds like we might want to consolidate them down to designated indices within a single name.

Thanks for looking into this Luca,

Keith
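As an illustration of how a usage/index pair would be pulled out of a dcl token with shift and mask constants like the ones quoted in this message, here is a small sketch. The `EX_` constants are local stand-ins, not the real header. In particular, the index mask is written here as an in-place mask (0x000f0000) applied before the shift, which is an assumption about how the header is meant to be used; the message above shows 0x000f for both masks.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for illustration only; see the assumption in the text. */
#define EX_DCL_USAGE_SHIFT      0
#define EX_DCL_USAGE_MASK       0x0000000fu
#define EX_DCL_USAGEINDEX_SHIFT 16
#define EX_DCL_USAGEINDEX_MASK  0x000f0000u  /* assumed in-place mask */

struct ex_semantic {
    unsigned usage;  /* 4-bit usage type */
    unsigned index;  /* 4-bit usage index, 0-15 */
};

/* Build a dcl info token from a usage and an index (other bits left zero). */
static uint32_t ex_encode_dcl(unsigned usage, unsigned index)
{
    assert(usage <= 15 && index <= 15);
    return ((uint32_t)usage << EX_DCL_USAGE_SHIFT) |
           ((uint32_t)index << EX_DCL_USAGEINDEX_SHIFT);
}

/* Extract usage and index, ignoring any unrelated bits in the token. */
static struct ex_semantic ex_decode_dcl(uint32_t token)
{
    struct ex_semantic s;
    s.usage = (token & EX_DCL_USAGE_MASK) >> EX_DCL_USAGE_SHIFT;
    s.index = (token & EX_DCL_USAGEINDEX_MASK) >> EX_DCL_USAGEINDEX_SHIFT;
    return s;
}
```

Both fields are 4 bits wide, so an index of 16 cannot be represented in the token, which is consistent with the `dcl_texcoord16` compile error reported earlier in the thread.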
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
On Wed, Feb 3, 2010 at 4:42 AM, Luca Barbieri l...@luca-barbieri.com wrote:
> I tested this on Windows, using nVidia driver 195 on nv40, and it seems we are all partially wrong.
>
> SM3 does indeed allow semantics unrelated to hardware resources. However, the semantic indices for any semantic type must be in the range 0-15, or D3DX will report a compiler error during shader compilation:
>
> error X2000: syntax error : unexpected token 'dcl_texcoord16'
>
> This is further confirmed by the following lines in d3d9types.h:
>
> #define MAXD3DDECLUSAGE D3DDECLUSAGE_SAMPLE
> #define MAXD3DDECLUSAGEINDEX 15
>
> I would guess that these two 4-bit values are combined into an 8-bit value that is then passed directly to hardware like r600, which supports 8-bit semantic indices in hardware.
>
> Is this the case on Radeon? Is the 8-bit semantic table a feature of r300 too or only of r600+?

Only r600+. r3xx-r5xx is more basic. You basically set up a table based on the inputs and outputs. Order doesn't matter as long as the table is correct for the vs and ps you are using. See pages 258-261 for the vertex fetch setup and pages 197-199 for the vs to ps routing of the r5xx accel guide:
http://www.x.org/docs/AMD/R5xx_Acceleration_v1.4.pdf

Alex

> In light of this, it may make sense to do some range limitation ourselves too. In particular, a good plan could be limiting all semantic indices to 0-15, except GENERIC, which could support a 0-127 range.
>
> This would allow us both to take direct advantage of Radeon hardware and to let drivers that need to remap in software do so with a direct lookup in a small array.
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
Luca Barbieri wrote on 2010-02-01 21:42:
> 1. All the semantic indices in OpenGL are limited, according to the ARB specification
> 2. All the semantic indices in DirectX 9/10 are limited, according to http://msdn.microsoft.com/en-us/library/ee418355%28VS.85%29.aspx

At least for SM3.0, one can specify a vertex shader output semantic like COLOR15 and have it running as long as one has also a pixel shader with a matching input semantic.

Though I agree with you that we don't really want to go this route and should have something more sensible.

We could, for example, limit COLOR and BCOLOR indices to [0, 1], remove FOG and NORMAL names, and have a well-defined limit on GENERIC index value. After all, we only need non-generic semantics to communicate with the fixed-function part of the pipeline, that is, the rasteriser.

name        index range
POSITION    no limit?
COLOR       0..1, explicit clamp?
BCOLOR      0..1, explicit clamp?
FOG         remove?
PSIZE       0
GENERIC     0..max generics
NORMAL      remove
FACE        0
EDGEFLAG    0
PRIMID      0
INSTANCEID  0

As for the routing table thing, I am not really convinced. The GLSL mechanism to link shaders based on varying names is GL-specific and thus should stay inside the Mesa state tracker. In fact, the D3D10 runtime is doing exactly the same thing and generating shader variants on the fly as they are mixed and matched by the application.
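The table of proposed limits above could be expressed as a per-name maximum index with a single validity check. This is only a sketch: the enum, the table and `ex_semantic_valid` are hypothetical names, not Gallium definitions. FOG and NORMAL are left out because the proposal removes them, "no limit?" for POSITION is modelled as `UINT_MAX`, and the GENERIC maximum of 127 is borrowed from the 0-127 range suggested elsewhere in the thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical semantic names, following the proposed table. */
enum ex_sem_name {
    EX_POSITION,
    EX_COLOR,
    EX_BCOLOR,
    EX_PSIZE,
    EX_GENERIC,
    EX_FACE,
    EX_EDGEFLAG,
    EX_PRIMID,
    EX_INSTANCEID,
    EX_SEM_COUNT
};

#define EX_MAX_GENERIC_INDEX 127u  /* assumed value for "max generics" */

/* Highest permitted index for each semantic name. */
static const unsigned ex_max_index[EX_SEM_COUNT] = {
    [EX_POSITION]   = UINT_MAX,  /* "no limit?" in the table */
    [EX_COLOR]      = 1,
    [EX_BCOLOR]     = 1,
    [EX_PSIZE]      = 0,
    [EX_GENERIC]    = EX_MAX_GENERIC_INDEX,
    [EX_FACE]       = 0,
    [EX_EDGEFLAG]   = 0,
    [EX_PRIMID]     = 0,
    [EX_INSTANCEID] = 0,
};

/* Return true if (name, index) falls inside the proposed range. */
static bool ex_semantic_valid(enum ex_sem_name name, unsigned index)
{
    assert((unsigned)name < EX_SEM_COUNT);
    return index <= ex_max_index[name];
}
```

A state tracker or a debug build of a driver could run such a check on every declaration, which is one way to give the limits a single well-defined home.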
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
On Tue, Feb 2, 2010 at 3:54 PM, michal mic...@vmware.com wrote:
> Luca Barbieri wrote on 2010-02-01 21:42:
>> 1. All the semantic indices in OpenGL are limited, according to the ARB specification
>> 2. All the semantic indices in DirectX 9/10 are limited, according to http://msdn.microsoft.com/en-us/library/ee418355%28VS.85%29.aspx
>
> At least for SM3.0, one can specify a vertex shader output semantic like COLOR15 and have it running as long as one has also a pixel shader with a matching input semantic.
>
> Though I agree with you that we don't really want to go this route and should have something more sensible.

So translating COLOR15 away in a DX9 state tracker would mean that it would have to examine pairs of vertex and fragment shaders together and re-translate to generate variants that use the same set of remapped semantics, right?

That sounds like extra work a DX9 state tracker could avoid with the current rules.

I'm not opposed to doing more work in the state-trackers, but as I keep saying, if we're going to do this type of hand-holding in the state trackers, we should make sure we do enough to fix the re-translation problem in all drivers, not just a couple.

Keith
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
> At least for SM3.0, one can specify a vertex shader output semantic like COLOR15 and have it running as long as one has also a pixel shader with a matching input semantic.
>
> Though I agree with you that we don't really want to go this route and should have something more sensible.

Do you know of any official Microsoft documentation that clearly indicates that COLOR (and other) semantic indices are not limited?

The documentation I found seems to support the opposite statement, as the following line:

"n is an optional integer between 0 and the number of resources supported. For example, POSITION0, TEXCOOR1, etc."

in Semantics (DirectX HLSL) on MSDN seems to indicate that if only 2 COLORs are supported, they are denoted by COLOR0 and COLOR1, and that COLOR15 being valid would imply support for simultaneously using at least 16 COLOR semantics.

As I understand it, the difference between SM2 and SM3 is that SM2 programs essentially directly use the semantics in instructions, because they have c## registers for colors, t## registers for texcoords, etc. SM3 programs instead use generic i## or o## input/output registers, which are associated with semantics by a declaration.

Note that this difference is orthogonal to the issue of whether semantic indices are limited or not.

> As for the routing table thing, I am not really convinced. The GLSL mechanism to link shaders based on varying names is GL-specific and thus should stay inside the Mesa state tracker.

Surely. However, if we want to support compiling the shaders separately, variable foo may have been assigned output #2 in the vertex shader, but input #1 in the fragment shader. Thus, we need a way for Mesa to tell Gallium to map output #2 to input #1.

Of course, deciding to map #2 to #1 by consulting the GLSL shader compiler symbol tables should be the state tracker's job.

Otherwise, we will need to recompile either of the shaders at link time, so that foo is assigned the same slot in both shaders. This is what we do now in GLSL linking, but it is somewhat inefficient and in particular can lead to compilation time growing quadratically in the number of shaders, and to slower shader switching.
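The "map output #2 to input #1" case can be sketched as a link step that compares varying names from two separately compiled shaders and produces only a routing table, leaving both shaders untouched. Everything here (`ex_varying`, `ex_build_routing`, the 16-slot limit) is a hypothetical illustration, not an existing Mesa or Gallium interface.

```c
#include <assert.h>
#include <string.h>

#define EX_MAX_SLOTS 16   /* assumed slot limit for the sketch */
#define EX_NO_ROUTE  (-1)

/* A varying as seen by one separately compiled shader. */
struct ex_varying {
    const char *name;  /* GLSL varying name from the symbol table */
    int slot;          /* slot assigned when that shader was compiled */
};

/*
 * For every fragment shader input, find the vertex shader output with the
 * same name and record route[fs input slot] = vs output slot.
 * Returns the number of fragment inputs that had no matching output.
 */
static int ex_build_routing(const struct ex_varying *vs_out, int n_out,
                            const struct ex_varying *fs_in, int n_in,
                            int route[EX_MAX_SLOTS])
{
    int unmatched = 0;

    for (int i = 0; i < EX_MAX_SLOTS; i++)
        route[i] = EX_NO_ROUTE;

    for (int i = 0; i < n_in; i++) {
        int found = EX_NO_ROUTE;

        assert(fs_in[i].slot >= 0 && fs_in[i].slot < EX_MAX_SLOTS);
        for (int j = 0; j < n_out; j++) {
            if (strcmp(fs_in[i].name, vs_out[j].name) == 0) {
                found = vs_out[j].slot;
                break;
            }
        }
        if (found == EX_NO_ROUTE)
            unmatched++;
        route[fs_in[i].slot] = found;
    }
    return unmatched;
}
```

In this sketch the name comparison happens once per link, and the result is a small integer table that the state tracker could hand to the driver, so neither shader needs to be recompiled for a new pairing.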
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
> Personally I'm going to take a break from this thread, spend a couple of days looking at i965, etc, to see what can be done to improve things there, and maybe come back with an alternate proposal.

Yes, I think that the most important step is to precisely determine how both hardware (and especially the newer cards you mentioned) works and how shader APIs (especially DirectX) are defined.

Once the workings of both are known and agreed upon, the best solution should hopefully be clear.

In addition to looking at hardware such as i965, it would be awesome to find some clear and unambiguous documentation on DirectX shader semantics, and agree on its interpretation.
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
On Tue, Feb 02, 2010 at 07:09:12PM +0100, Luca Barbieri wrote:
> Otherwise, we will need to recompile either of the shaders at link time, so that foo is assigned the same slot in both shaders, which is what we do now in GLSL linking, but is somewhat inefficient and in particular can lead to compilation time growing quadratically in the number of shaders, and slower shader switching.

Slower shader switching is what caches are for. And if you have n VS and m FS, and a large subset of the n*m combinations (that's where your quadratic comes from, right?) are actually used, then it's rather obvious that inter-shader constant propagation and dead code removal is going to be a must. Incidentally, you can multiply by the number of geometry shaders while you're at it.

As for link-by-name, it's pretty obvious it's going to become the norm and not the exception. Numbers are opaque, names aren't, and shaders are a bitch to write and debug. In addition, color and texture coords is way too specific and is pretty sure to morph into int and float, or even float only, given HDR, and how much easier it is hardware-wise and shader-compiler-wise to just have large-n parallel float interpolation units. That with link-time shared types.

So you'd better ensure your approach is ready for a more dynamic world where you can't decide a lot of things until link time.

OG.
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
On Tue, Feb 2, 2010 at 1:16 PM, Luca Barbieri l...@luca-barbieri.com wrote:
>> Personally I'm going to take a break from this thread, spend a couple of days looking at i965, etc, to see what can be done to improve things there, and maybe come back with an alternate proposal.
>
> Yes, I think that the most important step is to precisely determine how both hardware (and especially the newer cards you mentioned) works and how shader APIs (especially DirectX) are defined.

On AMD r6xx and newer asics, the hardware provides 8 bit semantic ids that are used for vertex fetches and shader to shader routing. See:
http://www.x.org/docs/AMD/R6xx_R7xx_3D.pdf
pages 10-11, 16-17

The driver can define the ids to whatever it wants and then data will be routed based on those ids. E.g.,

#define POSITION 1
#define COLOR0 2
#define TEXCOORD0 3
etc.

then in your fetch shader:
vfetch POSITION
vfetch TEXCOORD0
vfetch COLOR0

and in your vs output:
export COLOR0
export TEXCOORD0
export POSITION

and in your ps inputs:
input TEXCOORD0
input COLOR0
etc.

The ordering doesn't matter; all routing is done by semantic id. There's no need to recompile your vertex shaders or pixel shaders, you just adjust the fetch shader and semantic exports/imports state accordingly. The current r600 classic mesa driver just uses a hardcoded mapping right now, but it would make sense to use semantic ids in the gallium driver.

Alex

> Once the workings of both are known and agreed upon, the best solution should hopefully be clear.
>
> In addition to looking at hardware such as i965, it would be awesome to find some clear and unambiguous documentation on DirectX shader semantics, and agree on its interpretation.
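The claim that ordering does not matter when routing is done by semantic id can be modelled in software as follows. This is a rough sketch of the idea only, not of how r600 hardware or any Mesa driver implements it; the `EX_` ids mirror the example values in the message and `ex_route_by_id` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Driver-chosen semantic ids, mirroring the example values in the message. */
#define EX_POSITION  1
#define EX_COLOR0    2
#define EX_TEXCOORD0 3

#define EX_NOT_EXPORTED (-1)

/*
 * Resolve each pixel shader input to the vertex shader export position that
 * carries the same semantic id. The result does not depend on the order in
 * which either shader lists its semantics.
 */
static void ex_route_by_id(const uint8_t *vs_exports, int n_exports,
                           const uint8_t *ps_inputs, int n_inputs,
                           int *export_for_input)
{
    int slot_of_id[256];  /* one entry per possible 8-bit semantic id */

    assert(n_exports >= 0 && n_inputs >= 0);

    for (int i = 0; i < 256; i++)
        slot_of_id[i] = EX_NOT_EXPORTED;
    for (int i = 0; i < n_exports; i++)
        slot_of_id[vs_exports[i]] = i;
    for (int i = 0; i < n_inputs; i++)
        export_for_input[i] = slot_of_id[ps_inputs[i]];
}
```

Because the lookup is keyed on the id alone, swapping the order of the exports changes the resulting slot numbers but never which data reaches which input, so neither shader has to be recompiled for a new pairing.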
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
On Tue, Feb 2, 2010 at 7:38 PM, Olivier Galibert galib...@pobox.com wrote:
> On Tue, Feb 02, 2010 at 07:09:12PM +0100, Luca Barbieri wrote:
>> Otherwise, we will need to recompile either of the shaders at link time, so that foo is assigned the same slot in both shaders, which is what we do now in GLSL linking, but is somewhat inefficient and in particular can lead to compilation time growing quadratically in the number of shaders, and slower shader switching.
>
> Slower shader switching is what caches are for. And if you have n VS and m FS, and a large subset of the n*m combinations (that's where your quadratic comes from, right?) are actually used, then it's rather obvious that inter-shader constant propagation and dead code removal is going to be a must. Incidentally, you can multiply by the number of geometry shaders while you're at it.
>
> As for link-by-name, it's pretty obvious it's going to become the norm and not the exception.

Exactly, and that's why we should be able to support it efficiently. The current Gallium architecture doesn't do that because semantic indices are integers, and there is no way to specify linking without creating different shaders.

Of course we could also decide we don't care about separate shaders, and just have a Gallium CSO for a whole complete vertex+geometry+fragment pipeline. This makes things much simpler, but I'm afraid that some applications could suffer catastrophic performance degradation.

A good place to find inspiration for this choice could be the nVidia and ATI proprietary drivers. Since app/game developers test with those, if they always recompile for each (fs, vs) pair, then we can safely do so too.

On the nVidia front, they on one hand patented inter-shader optimization (patent 7426724, filed in 2004), but on the other hand wrote GL_EXT_separate_shader_objects (written in 2009), which doesn't provide significant benefits if the driver always does inter-shader optimization. Thus it seems they may be using a mix of the two techniques, possibly depending on the driver version and on the shader API being used.

What do you think? Is separate compilation important, and we should thus try to fully support it, or can we just drop it?

In the first case, we would want a Gallium interface to support specifying routing separately from shaders. In the second case, we would be better off dropping the separate VS/GS/FS shader CSOs and just having a CSO for the whole shader pipeline.
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
SM3 usages are arbitrary. For example, you could have some data with a blendweight 5 semantic, where there is no hardware to support it and no meaningful limit to the number.

From: Luca Barbieri [l...@luca-barbieri.com]
Sent: Tuesday, February 02, 2010 10:09 AM
To: Michal Krol
Cc: mesa3d-dev@lists.sourceforge.net
Subject: Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots

The documentation I found seems to support the opposite statement, as the following line:

"n is an optional integer between 0 and the number of resources supported. For example, POSITION0, TEXCOOR1, etc."

in Semantics (DirectX HLSL) on MSDN seems to indicate that if only 2 COLORs are supported, they are denoted by COLOR0 and COLOR1, and that COLOR15 being valid would imply support for simultaneously using at least 16 COLOR semantics.

As I understand it, the difference between SM2 and SM3 is that SM2 programs essentially directly use the semantics in instructions, because they have c## registers for colors, t## registers for texcoords, etc. SM3 programs instead use generic i## or o## input/output registers, which are associated with semantics by a declaration.

Note that this difference is orthogonal to the issue of whether semantic indices are limited or not.
[Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
Christoph, Luca,

Twoside lighting is a bit of a special-case GL-ism. On a lot of hardware we end up implementing it by passing both front and back colors to the fragment shader and selecting between them using the FACE variable.

If we removed the implicit fixed-function support for two-side lighting in the rasterizer, it would solve the issue of how this is represented in any routing table. How does that sit with your drivers?

Keith

From: luca.barbi...@gmail.com [luca.barbi...@gmail.com] On Behalf Of Luca Barbieri [l...@luca-barbieri.com]
Sent: Monday, February 01, 2010 7:29 AM
To: Christoph Bumiller
Cc: Keith Whitwell; mesa3d-dev@lists.sourceforge.net
Subject: Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

> I can't really use a routing table state to produce a cso, because the hw routing table I generate depends on rasterizer state, e.g. I must not put in back face colour (we have a 2 to 1 mapping here) if twoside is disabled.
>
> Also, I'm routing based on the scalar *components* the FP reads, not whole TGSI pseudo vec4 registers (NUM_INTERPOLATORS will thus be inaccurate) - set_routing_table will have to pass me the respective programs too.
>
> Well, I can still use the cso and insert it into the rest of the routing table that still needs to be assembled on the fly, I did that before the 1:1 mapping between FP and VP regs was removed.

You are right, the routing table CSO needs to contain the fragment and vertex shader handles, and ideally light_twoside should be moved to the vertex-fragment routing table, since it is really an attribute of that and not of polygon rasterization/setup.

You can then just look at your internal data structure and construct a scalar routing table from the vec4 one provided by Gallium.

We could also, as a further extension, support scalar routing tables directly in Gallium. Note however that radeon hardware presumably only supports vector ones, so we would need all 3 options with caps. A further intermediate step could be vector routing tables with swizzling.
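Constructing a scalar routing table from a vec4 one, using the components the fragment program actually reads, might look roughly like the sketch below. The structure and function names are hypothetical; real drivers keep this information in their own internal form, and the encoding of a scalar source as register*4 + component is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical vec4-level route for one FP input register: which VP output
 * register feeds it, and which components the FP reads
 * (bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w).
 */
struct ex_vec4_route {
    uint8_t vp_out;
    uint8_t read_mask;
};

/*
 * Emit one scalar source per component that is actually read, encoded as
 * vp_out * 4 + component. Returns the number of scalar entries written.
 */
static int ex_expand_to_scalar(const struct ex_vec4_route *routes, int n,
                               uint8_t *scalar_src, int capacity)
{
    int count = 0;

    for (int i = 0; i < n; i++) {
        assert(routes[i].vp_out < 64);  /* keeps vp_out * 4 + 3 within 8 bits */
        for (int c = 0; c < 4; c++) {
            if (routes[i].read_mask & (1u << c)) {
                assert(count < capacity);
                scalar_src[count++] = (uint8_t)(routes[i].vp_out * 4 + c);
            }
        }
    }
    return count;
}
```

The returned count, rather than the number of vec4 registers, is what a component-routing driver would care about, which matches the remark quoted above that NUM_INTERPOLATORS would be inaccurate for such hardware.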
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
On Mon, Feb 1, 2010 at 5:31 PM, Keith Whitwell kei...@vmware.com wrote:
> Christoph, Luca,
>
> Twoside lighting is a bit of a special-case GL-ism. On a lot of hardware we end up implementing it by passing both front and back colors to the fragment shader and selecting between them using the FACE variable.
>
> If we removed the implicit fixed-function support for two-side lighting in the rasterizer, it would solve the issue of how this is represented in any routing table. How does that sit with your drivers?

nv40 (and perhaps r300 too?) appears to have 2 hardware back color registers in the vertex shader that are automatically routed, so it would probably be best to leave it that way.

Of course, a generic face-dependent routing table could be yet another optional feature. Does any API expose such a thing, perhaps in the form of unlimited rather than 2 front/back colors? (other than by using FACE)
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
DX9 semantic indexes are apparently unlimited, and you can definitely specify COLOR 0..3; I haven't tried to go further.

Keith

From: luca.barbi...@gmail.com [luca.barbi...@gmail.com] On Behalf Of Luca Barbieri [l...@luca-barbieri.com]
Sent: Monday, February 01, 2010 8:44 AM
To: Keith Whitwell
Cc: Christoph Bumiller; mesa3d-dev@lists.sourceforge.net
Subject: Re: light_twoside RE: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

On Mon, Feb 1, 2010 at 5:31 PM, Keith Whitwell kei...@vmware.com wrote:
> Christoph, Luca,
>
> Twoside lighting is a bit of a special-case GL-ism. On a lot of hardware we end up implementing it by passing both front and back colors to the fragment shader and selecting between them using the FACE variable.
>
> If we removed the implicit fixed-function support for two-side lighting in the rasterizer, it would solve the issue of how this is represented in any routing table. How does that sit with your drivers?

nv40 (and perhaps r300 too?) appears to have 2 hardware back color registers in the vertex shader that are automatically routed, so it would probably be best to leave it that way.

Of course, a generic face-dependent routing table could be yet another optional feature. Does any API expose such a thing, perhaps in the form of unlimited rather than 2 front/back colors? (other than by using FACE)
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
> DX9 semantic indexes are apparently unlimited

According to http://msdn.microsoft.com/en-us/library/ee418355%28VS.85%29.aspx, this is not the case.

Here is the relevant text:

"These semantics have meaning when attached to a vertex-shader parameter. These semantics are supported in both Direct3D 9 and Direct3D 10."
[...]
"n is an optional integer between 0 and the number of resources supported. For example, POSITION0, TEXCOOR1, etc."
[...]
"These semantics have meaning when attached to a pixel-shader input parameter. These semantics are supported in both Direct3D 9 and Direct3D 10."
[...]
"n is an optional integer between 0 and the number of resources supported. For example, PSIZE0, COLOR1, etc."

Thus, both DX9 and DX10 do not need arbitrary indices. OpenGL also doesn't, as fragment.texcoord[i] has i < GL_MAX_TEXTURE_COORDS_ARB.

It seems to make sense to follow those APIs in the design of Gallium semantics.
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
Luca, I haven't tried to probe crazy high numbers, but within reason, my experience is that the numbers are unconstrained. Certainly, within the range you're suggesting for gallium, there is no constraint in DX9. No doubt where there is a system-interpreted meaning attached to a semantic, that meaning will impose an interpretation on the index and that will imply a limit on the semantic index. For instance, in pixel shader outputs, COLOR[n] means a particular output is destined to be written to colorbuffer n. Nobody is saying there isn't a limit on the number of bound colorbuffers. By implication, the same limit already exists in gallium. Now, your particular hardware has a additional limitation which is fairly unique, and you're pushing a change to gallium which would mimic the restrictions of your hardware. I'm not actually interested in adjusting gallium to the constraints of one particular driver, but *am* quite interested in finding a way to improve linkage issues across the hardware we support. If you take a look at i965, I think you'll see that the change you're suggesting does nothing to avoid retranslating vertex shaders on that platform. Likewise the software rasterizers and any driver relying on the draw module are currently jumping through hoops to emulate a routing table, which wouldn't be improved by your change. But your change does dramatically alter the meaning of one part of gallium and introduces a new raft of hardware capabilities we'd have to be checking and respecting in every state tracker. If we are going to adjust gallium, lets figure out a way to improve linkage generally. Adding a per-driver, per-semantic maximum index query just for the benefit of one driver doesn't strike me as a good trade-off. 
Keith From: luca.barbi...@gmail.com [luca.barbi...@gmail.com] On Behalf Of Luca Barbieri [l...@luca-barbieri.com] Sent: Monday, February 01, 2010 9:15 AM To: Keith Whitwell Cc: mesa3d-dev@lists.sourceforge.net Subject: Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
On 01.02.2010 17:31, Keith Whitwell wrote:
> Christoph, Luca,
>
> Twoside lighting is a bit of a special case GL-ism. On a lot of hardware we end up implementing it by passing both front and back colors to the fragment shader and selecting between them using the FACE variable.
>
> If we removed the implicit fixed-function support for two-side lighting in the rasterizer, it would solve the issue of how this is represented in any routing table.
>
> How does that sit with your drivers?
>
> Keith

It would work, if the COLOR semantic is completely ignored, i.e. I would appreciate the insertion of clamping instructions on the st side (I suspect earlier cards will not have 4 front color registers so clamping will go away for their back colors too ...). I can only select 2 x 8 consecutive scalar values in the routing table to be clamped, and only 1 x 8 will get through to the fragment shader.

I'll not be happy to insert clamping manually, but I can do it if it turns out to be the best solution to not have the st do it.

It's a bit of a waste not to use that hw cap though ... otoh not many apps will use two sided lighting nowadays I suppose.

From: luca.barbi...@gmail.com [luca.barbi...@gmail.com] On Behalf Of Luca Barbieri [l...@luca-barbieri.com]
Sent: Monday, February 01, 2010 7:29 AM
To: Christoph Bumiller
Cc: Keith Whitwell; mesa3d-dev@lists.sourceforge.net
Subject: Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

I can't really use a routing table state to produce a cso, because the hw routing table I generate depends on rasterizer state, e.g. I must not put in back face colour (we have a 2 to 1 mapping here) if twoside is disabled. Also, I'm routing based on the scalar *components* the FP reads, not whole TGSI pseudo vec4 registers (NUM_INTERPOLATORS will thus be inaccurate) - set_routing_table will have to pass me the respective programs too.

Well, I can still use the cso and insert it into the rest of the routing table that still needs to be assembled on the fly, I did that before the 1:1 mapping between FP and VP regs was removed.

You are right, the routing table CSO needs to contain the fragment and vertex shader handles, and ideally light_twoside should be moved to the vertex-fragment routing table since it is really an attribute of that and not polygon rasterization/setup.

You can then just look at your internal data structure and construct a scalar routing table from the vec4 one provided by Gallium.

We could also, as a further extension, support scalar routing tables directly in Gallium. Note however that radeon hardware presumably only supports vector ones, so we would need all 3 options with caps. A further intermediate step could be vector routing tables with swizzling.
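To make the "construct a scalar routing table from the vec4 one" step concrete, here is a rough sketch in C. Everything in it is hypothetical: Gallium has no `vec4_routing` structure or `build_scalar_routing` function, and the slot limit and the per-input read mask are assumptions made only for illustration. It expands each vec4 routing entry into one scalar entry per component the fragment program actually reads.

```c
#include <stdint.h>

#define MAX_VEC4_SLOTS 16   /* hypothetical limit on vec4 varyings */
#define UNROUTED 0xff       /* hypothetical marker for "no source" */

/* Hypothetical vec4 routing table: entry i names the vertex shader
 * output slot that feeds fragment shader input i. */
struct vec4_routing {
    uint8_t vs_output_for_fs_input[MAX_VEC4_SLOTS];
    unsigned num_fs_inputs;
};

/* Expand the vec4 table into scalar entries. fs_read_mask[i] has bit c
 * set when the fragment program reads component c (x=0 .. w=3) of
 * input i. Each emitted entry is the scalar offset vs_slot * 4 + c.
 * Returns the number of scalar entries written to scalar_out. */
unsigned build_scalar_routing(const struct vec4_routing *rt,
                              const uint8_t fs_read_mask[MAX_VEC4_SLOTS],
                              uint8_t scalar_out[MAX_VEC4_SLOTS * 4])
{
    unsigned n = 0;

    for (unsigned i = 0; i < rt->num_fs_inputs && i < MAX_VEC4_SLOTS; i++) {
        uint8_t vs_slot = rt->vs_output_for_fs_input[i];

        if (vs_slot == UNROUTED)
            continue;

        for (unsigned c = 0; c < 4; c++) {
            if (fs_read_mask[i] & (1u << c))
                scalar_out[n++] = (uint8_t)(vs_slot * 4 + c);
        }
    }
    return n;
}
```

A driver that routes per component would presumably derive the read mask from its own compiled fragment program; a vector-only driver could skip this step and use the vec4 table directly.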
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
> I haven't tried to probe crazy high numbers, but within reason, my experience is that the numbers are unconstrained.

No, according to that document if you use TEXCOORD[n] then n < NUM_TEXCOORDS.

    TEXCOORD[n]    Texture coordinates    float4
    [...]
    n is an optional integer between 0 and the number of resources supported. For example, POSITION0, TEXCOOR1, etc.

Also look at the spec for ARB_fragment_program:

    fragAttribItem ::= color optColorType | texcoord optTexCoordNum [...]
    optTexCoordNum ::= | [ texCoordNum ]
    optColorType ::= | . primary | . secondary
    texCoordNum ::= integer from 0 to MAX_TEXTURE_COORDS_ARB-1

fragment.texcoord has the index limited by MAX_TEXTURE_COORDS_ARB.

It seems to me pretty clear from the above references that *all* 3D APIs (i.e. DX9, DX10 and GL) have semantic indices in the range 0...N-1 where N is the limit appropriate for the specific semantic.

I think these references contradict your hypothesis that there is no constraint in DX9. Am I misunderstanding something completely? Do you disagree with the fact that those references clearly show that semantic indices are limited by hardware resources?

> Now, your particular hardware has a additional limitation which is fairly unique, and you're pushing a change to gallium which would mimic the restrictions of your hardware. I'm not actually interested in adjusting gallium to the constraints of one particular driver, but *am* quite interested in finding a way to improve linkage issues across the hardware we support.

No, it is a limitation that any hardware that does a direct implementation of OpenGL has. For instance, I'd guess that the VMWare driver works around that problem somewhere since it ultimately uses the host OpenGL implementation. With my proposal, it could just convert GENERIC[0] into fragment.texcoord[0] and likewise for others.
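As an illustration of the direct GENERIC[i] to fragment.texcoord[i] conversion suggested above, a minimal sketch in C follows. The function name and its parameters are hypothetical and not taken from any existing driver; the only thing assumed from the ARB_fragment_program spec is the `fragment.texcoord[n]` attribute syntax and the bound on n.

```c
#include <stdio.h>

/* Hypothetical helper: write the ARB_fragment_program attribute name
 * for GENERIC[generic_index] into buf. Returns 0 on success, or -1 if
 * the index is not below max_texture_coords (the value an
 * implementation would report for GL_MAX_TEXTURE_COORDS_ARB) or if
 * the buffer is too small. */
int generic_to_arb_texcoord(unsigned generic_index,
                            unsigned max_texture_coords,
                            char *buf, size_t buf_size)
{
    if (generic_index >= max_texture_coords)
        return -1;

    int n = snprintf(buf, buf_size, "fragment.texcoord[%u]", generic_index);

    return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}
```

With unbounded GENERIC indices the first check would fail for large values, and the caller would need a remapping pass instead of this identity conversion.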
> But your change does dramatically alter the meaning of one part of gallium and introduces a new raft of hardware capabilities we'd have to be checking and respecting in every state tracker.

Not at all. All code except the GLSL linker will work optimally as is, since it uses indices sequentially starting from 0 (or implements an API that does). I provided a patch to fix the GLSL linker.

> If we are going to adjust gallium, lets figure out a way to improve linkage generally. Adding a per-driver, per-semantic maximum index query just for the benefit of one driver doesn't strike me as a good trade-off.

It is already necessary to have that to implement glGet of GL_MAX_TEXTURE_COORDS_ARB, for the TEXCOORD capability.
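For readers who have not seen the patch, the idea of using indices sequentially starting from 0 can be sketched roughly as below. This is not the actual GLSL linker patch; the `varying` structure and `assign_generic_indices` function are hypothetical names used only to show the allocation policy, with `max_generic` standing in for whatever limit the driver would report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one linked varying. */
struct varying {
    const char *name;
    bool used;          /* false if neither shader stage references it */
    int generic_index;  /* assigned GENERIC index, or -1 if none */
};

/* Give each used varying the next free GENERIC index, starting at 0.
 * Unused varyings reserve nothing. Returns the number of indices
 * assigned, or -1 if more than max_generic would be needed. */
int assign_generic_indices(struct varying *v, size_t count, int max_generic)
{
    int next = 0;

    for (size_t i = 0; i < count; i++) {
        v[i].generic_index = -1;
        if (!v[i].used)
            continue;
        if (next >= max_generic)
            return -1;
        v[i].generic_index = next++;
    }
    return next;
}
```

The point of the policy is that the highest index in use never exceeds the number of varyings actually consumed, so a bounded index range costs nothing for shaders that fit the hardware.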
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
Luca, Where the semantic indicates some relationship to actual system resources, I agree that the number is constrained by the number of those system resources. In the case of the gallium GENERIC semantic, there is explicitly no system resource that semantic is referring to and hence no limit on the index. I feel like we're going in circles here. We agree that we want to improve linkage, you have a patch that helps your driver, but please accept that it doesn't solve the wider problem. Keith From: luca.barbi...@gmail.com [luca.barbi...@gmail.com] On Behalf Of Luca Barbieri [l...@luca-barbieri.com] Sent: Monday, February 01, 2010 10:50 AM To: Keith Whitwell Cc: mesa3d-dev@lists.sourceforge.net Subject: Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
> Where the semantic indicates some relationship to actual system resources, I agree that the number is constrained by the number of those system resources. In the case of the gallium GENERIC semantic, there is explicitly no system resource that semantic is referring to and hence no limit on the index.

GENERIC[i] refers to a slot in the output register file of the vertex shader or a slot in the input register file of the fragment shader. It also refers to the interpolator unit that interpolates data between those two registers. Since interpolators are usually available in a finite number and register files also are usually physically limited, there should be a limit on the index.

> I feel like we're going in circles here. We agree that we want to improve linkage, you have a patch that helps your driver, but please accept that it doesn't solve the wider problem.

Yes. In the following I try to write out my reasoning step by step, in the hope of making it clearer and making it easier to both establish which facts we agree are true and pinpoint what we may disagree on.

This is a list of steps that are leading me to the conclusion that it is best to change the Gallium rules so that semantic indices must be in the range 0..N-1, where N is the maximum number of simultaneously available registers with that semantic, apply my GLSL patch to fix GLSL, and after doing that, consider extending Gallium by letting the user specify a routing table to link these limited index semantics with something other than an identity mapping.

Please tell me which points you find incorrect, or why any deduction does not follow from the antecedents.

1. All the semantic indices in OpenGL are limited, according to the ARB specification
2. All the semantic indices in DirectX 9/10 are limited, according to http://msdn.microsoft.com/en-us/library/ee418355%28VS.85%29.aspx
3. In the OpenGL/DirectX 9/10 model, there are a fixed number of interpolators, numbered from 0 to N - 1. Interpolator K reads from vertex shader output register K, interpolates and writes to fragment shader input register K.
4. Some cards (e.g. r300), but not all, allow configuring the vertex shader output register and fragment shader input register that interpolator K reads and writes.
5. Such register inputs are usually offsets in a physical register file, and thus are limited to the physical size of that register file
6. No API exposes the functionality in (4.) and all expose the more rigid model in (3.)
7. Gallium GENERIC is equivalent to OpenGL texcoord and DirectX 9/10 TEXCOORD semantics
8. texcoord is called this way because of historical reasons, since fixed pipelines could use the values only for texture sampling. GENERIC is called GENERIC instead of TEXCOORD because Gallium was designed with a programmable pipeline in mind.
9. The current Mesa implementation of ARB_fp/vp translates texcoord[i] to GENERIC[i]
10. fragment.texcoord[K] has K limited by GL_MAX_TEXTURE_COORDS_ARB
11. Because of (9.) and (10.), the current Mesa implementation of ARB_vp/fp uses GENERIC indices limited by GL_MAX_TEXTURE_COORDS_ARB (perhaps plus a very small constant)
12. Because of (2.), a straightforward Gallium DirectX state tracker would also use GENERIC indices limited by the number of interpolators
13. If GLSL did not reserve semantic indices for unused gl_TexCoord[] varyings, but allocated varyings sequentially, then it would use semantic indices sequentially starting from 0
14. My patch implements (13.)
15. The xorg, vega and g3dvl state trackers use GENERIC indices starting from 0 up to 1, 1 and 7 respectively
16. Because of (11.), (12.), (13.), (14.) and (15.), after applying my patch, limiting the value of GENERIC semantic indices to the number of interpolators would not adversely affect Mesa/Gallium functionality in any way, probably including the VMware DirectX state tracker
17. Driver code would be simplified by not having to worry about any register semantic remapping. It will be possible to separately compile fragment and vertex shaders on all hardware. The CPU usage of all drivers will be reduced, especially when switching shaders (a fast path!)
18. Thus, (16.) is a net gain for Gallium, and should go forward [Note: my current nv40 tree does exactly (16.) and this does not seem to be a source of any problem]

Points that lead me to propose a routing table CSO *IN ADDITION* to applying my GLSL patch:

19. Some current 3D APIs (ARB_fp/vp, DX PS, EXT_separate_shader_objects) link vertex and fragment shaders by matching physical register file offset, limited to index N - 1 where N is the maximum number of usage variables (see (3.))
20. Other 3D APIs (GLSL) link by matching variable name. This forces the requirement, in unextended GLSL, to provide both the fragment and vertex shaders at once in the link step
21. No API links by matching abstract unlimited variable number, except some Gallium driver interfaces such as r300
22. It would be
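The contrast between the rigid model of point (3.) and the configurable model of point (4.) can be sketched as follows. This is only an illustration under stated assumptions: `NUM_INTERPOLATORS`, `struct vec4` and `route_varyings` are hypothetical, and the code models only which vertex shader output register feeds which fragment shader input register for one vertex, not the interpolation itself.

```c
#include <stddef.h>

#define NUM_INTERPOLATORS 4   /* hypothetical hardware limit N */

struct vec4 { float v[4]; };

/* Copy vertex shader outputs to fragment shader inputs.
 * routing == NULL models (3.): interpolator k reads VS output k.
 * Otherwise routing[k] names the VS output register feeding
 * interpolator k, as in the configurable hardware of (4.).
 * Returns 0 on success, -1 if a source register does not exist. */
int route_varyings(const struct vec4 *vs_out, size_t num_vs_out,
                   const unsigned char *routing, struct vec4 *fs_in)
{
    for (size_t k = 0; k < NUM_INTERPOLATORS; k++) {
        size_t src = routing ? routing[k] : k;

        if (src >= num_vs_out)
            return -1;
        fs_in[k] = vs_out[src];
    }
    return 0;
}
```

In the identity case both shaders can be compiled separately because each side only needs to know its own index K; the table variant is what a routing table CSO would supply on hardware that supports it.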
Re: [Mesa3d-dev] light_twoside RE: [PATCH] glsl: put varyings in texcoord slots
A possible limitation of this scheme is that it doesn't readily map to hardware that can configure its own interpolators to behave either as GENERIC, COLOR (or some other semantic) dynamically. However, it seems to me that at least ARB_fragment_program only requires and supports 2 COLOR registers (primary and secondary), 1 FOG register and 1 PSIZE register. I'm not sure if any API can support more than 2 COLOR, 1 FOG and 1 PSIZE register as vertex shader outputs/fragment shader inputs (note that this is totally different from COLOR as a fragment shader output, where each COLOR semantic maps to a different render target), and thus I'm not sure if such functionality is useful. A driver with that functionality wishing to let an application use more than 2 COLORs is however free to do remapping in the driver even under my proposal. It just doesn't _have_ to do it.
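A per-semantic limit of the kind discussed here could be expressed as a small table with a range check, roughly as below. The enum, the structure and the function are hypothetical and are not Gallium's actual semantic names; the COLOR, FOG and PSIZE counts follow the numbers given in the message, while the GENERIC count of 8 is an arbitrary placeholder for the number of interpolators a driver would report.

```c
#include <stdbool.h>

/* Hypothetical semantic names for vertex-to-fragment varyings. */
enum varying_semantic {
    SEM_COLOR,
    SEM_FOG,
    SEM_PSIZE,
    SEM_GENERIC,
    SEM_COUNT
};

/* Number of registers available for each semantic. */
struct semantic_limits {
    unsigned count[SEM_COUNT];
};

/* Example limits: 2 COLOR, 1 FOG, 1 PSIZE as stated in the message;
 * 8 GENERIC is an assumed value for illustration only. */
static const struct semantic_limits example_limits = {
    .count = {
        [SEM_COLOR]   = 2,
        [SEM_FOG]     = 1,
        [SEM_PSIZE]   = 1,
        [SEM_GENERIC] = 8,
    },
};

/* True if index lies in 0..N-1 for the given semantic. */
bool semantic_index_in_range(const struct semantic_limits *lim,
                             enum varying_semantic sem, unsigned index)
{
    if ((unsigned)sem >= SEM_COUNT)
        return false;
    return index < lim->count[sem];
}
```

A driver that can repurpose interpolators dynamically could report larger counts and remap internally, which matches the closing remark that it is free to do so but does not have to.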