On Mon, 2010-12-13 at 12:01 -0800, Christoph Bumiller wrote:
> I want to warm this up again, adding nvc0 and
> GL_ARB_separate_shader_objects to the picture.
>
> The latter extends GL_EXT_separate_shader_objects to support
> user-defined varyings, and guarantees well-defined behaviour only if
> - varyings are declared inside the gl_PerVertex/gl_PerFragment block and the
>   blocks match exactly in name, type, qualification, and (most
>   significantly) declaration order.
> - varyings are assigned matching location qualifiers,
>   like: layout(location = 3) in vec4 normal
>   "The number of input locations available to a shader is limited."
>
> So, I propose to (loosely) identify GENERIC semantic indices with these
> location qualifiers and let the pipe driver set a limit on the allowed
> maximum (e.g. PIPE_SHADER_CAP_MAX_INPUTS), rather than demanding support
> for at least 219 of them (nvc0 offers 0x200 bytes for generic inputs/outputs).
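The following is a minimal C sketch of the state-tracker side of this proposal. It assumes that a varying's `layout(location = N)` is used directly as GENERIC index N and that the driver reports a slot limit. The byte-budget conversion also assumes each generic slot is one vec4 of 32-bit floats (16 bytes). Under that assumption, nvc0's 0x200 bytes would give 32 slots. `max_generic_slots` and `location_to_generic` are hypothetical helpers, not Gallium or Mesa functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: one generic varying slot is a vec4 of 32-bit floats. */
#define BYTES_PER_GENERIC_SLOT 16u

/* Hypothetical helper: how many generic slots fit in a hardware byte
 * budget, such as the 0x200 bytes mentioned for nvc0. */
static unsigned max_generic_slots(unsigned budget_bytes)
{
    return budget_bytes / BYTES_PER_GENERIC_SLOT;
}

/* Hypothetical helper: identify layout(location = N) with GENERIC index N,
 * refusing locations at or beyond the driver-reported slot limit. */
static bool location_to_generic(unsigned location, unsigned max_slots,
                                unsigned *generic_index)
{
    assert(generic_index != NULL);
    if (location >= max_slots)
        return false; /* presumably reported to the application as a link error */
    *generic_index = location;
    return true;
}
```

For example, `location_to_generic(3, max_generic_slots(0x200), &idx)` would accept the `layout(location = 3) in vec4 normal` declaration quoted above and yield GENERIC index 3. A location of 40 would be rejected.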
This sounds fine actually. We kicked this around before & I was basically
ok with the last iteration of the proposal, but this seems ok too.

As far as I can tell from a gallium perspective, you're really just
proposing a new pipe cap _MAX_INPUTS (actually _MAX_GENERIC_INDEX would be
clearer), which the state tracker thereafter has to respect? That would be
fine with me.

> My motivation is mostly that the hardware routing table for shader
> varyings that was present on nv50 has been removed with nvc0 (Fermi).
> And I'm glad, because filling 4 routing tables (since we have 5 shader
> types now) is somewhat annoying. And so is applying relocations to shaders
> - it can be done, it's probably not too time consuming, but it's just
> plain *unnecessary* (and thus stupid) for OpenGL.
>
> Now about d3d9 ...
> 1. don't care, I don't see a d3d9 state tracker
> 2. http://msdn.microsoft.com/en-us/library/bb509647%28v=VS.85%29.aspx
> says "n is an optional integer between 0 and the number of resources
> supported" - what "supported" means here isn't clear to me, but I
> didn't find any example where someone used something OpenGL doesn't have
> (like COLOR2).
> 3.
> http://msdn.microsoft.com/en-us/library/bb944006%28v=vs.85%29.aspx#Varying_Shader_Inputs_and_Semantics
> says "Input semantics are similar to the values in the D3DDECLUSAGE."
> and
> DECLUSAGE sounds like you're limited to sane values.

I think you're on the right track with (1)... It's fairly pointless trying
to discuss code here which isn't public, & I don't think people need to be
worrying about what may or may not be important for code they can't see.

I know this idea previously got tied up with speculation about what a DX9
state tracker might or might not require, but in retrospect I wish I'd been
able to steer the conversation away from that. The work on closed
components may drive a lot of the feature development and new interfaces,
but there's usually enough flexibility that this sort of cleanup isn't a
big deal.
Keith

> Not sure if anyone wants to think about this issue at this time (since
> implementation of ARB_separate_shader_objects is probably far in the GL4
> future), but I'd be happy about any comments.
>
> Regards,
> Christoph
>
> On 04/13/2010 12:55 PM, Luca Barbieri wrote:
> > This patch series is intended to resolve the issue of semantic-based shader
> > linkage in Gallium.
> > It can also be found in the RFC-gallium-semantics branch.
> >
> > It does not change the current Gallium design, but rather formalizes some
> > limitations to it, and provides infrastructure to implement this model more
> > easily in drivers, along with a full nv30/nv40 implementation.
> >
> > These limitations are added to allow an efficient implementation both for
> > hardware lacking special support and for hardware that has support but also
> > special constraints.
> >
> > Note that this does NOT resolve all issues, and there are quite a few left
> > for future refinement.
> >
> > In particular, the following issues are still open:
> > 1. COLOR clamping (and floating point framebuffers)
> > 2. A linkage table CSO allowing non-identity linkage to be specified
> > 3. BCOLOR/FACE-related issues
> > 4. Adding a cap to inform the state tracker that more than 219 generic
> >    indices are provided
> >
> > This topic has already been discussed very extensively.
> > See
> > http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg10865.html
> > for some early inconclusive discussion around an early implementation that
> > modified the GLSL linker (which is NOT being proposed here).
> > See
> > http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg12016.html
> > for some more discussion that seemed to mostly reach a consensus over the
> > approach proposed here.
> > See in particular
> > http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg12041.html .
> >
> > That said, I'm going to try to repeat all the information here, partially by
> > copy&pasting from earlier messages.
> > This message should probably be adapted into gallium/docs if/when this is
> > accepted.
> >
> > Here is the short summary; the long rationale follows after it.
> >
> > The proposal here is to add the following limitations to Gallium, for the
> > intermediate semantics:
> > 1. TGSI_SEMANTIC_NORMAL is removed, using a commit by Michal Krol that was
> >    never merged
> > 2. Every semantic except GENERIC, COLOR and BCOLOR can only be used with
> >    semantic index 0
> > 3. COLOR and BCOLOR can only be used with semantic indices 0-1 (note that
> >    this doesn't apply to fragment outputs)
> > 4. GENERIC can be used with semantic indices 0-218 on any driver, if BCOLOR
> >    is not used
> > 5. GENERIC can be used with semantic indices 0-216 on any driver, if BCOLOR
> >    IS used
> > 6. GENERIC can be used with semantic indices 0-255 on almost all drivers
> >    (those that don't need the 0-218 limitation)
> > 7. Some drivers may also choose to support GENERIC with arbitrary indices,
> >    but that should generally not happen
> >
> > The reason for this, in short, is that this maps directly to DirectX 9 SM3,
> > which is the most problematic interface of all.
> >
> > The peculiar problem we have here is that we have two competing constraints
> > that force us into choosing the exact SM3 value:
> > 1. The VMware SVGA driver must deal with an SM3 host interface and would
> >    ideally want to feed the Gallium semantics directly to the host
> > 2. A hypothetical DirectX 9 state tracker needs to support SM3 and would
> >    ideally want to feed the SM3 semantics directly to Gallium
> >
> > Note that this is not a reference to the VMware DirectX 9 state tracker,
> > since its authors haven't provided details about its handling of shader
> > semantics.
> >
> > SM3 ends up supporting 219 generic indices: 16 indices in 14 classes, minus
> > POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0, which are the only ones that
> > wouldn't be mapped to GENERIC (14 * 16 - 5 = 219).
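The 219 figure can be checked mechanically. The C sketch below walks all 14 x 16 SM3 (usage, usage index) pairs. It skips the five pairs that map to non-GENERIC semantics and assigns the rest to GENERIC indices in order. It assumes the usage numbering follows D3DDECLUSAGE (POSITION = 0, PSIZE = 4, COLOR = 10, FOG = 11). It also packs each pair into a byte as `usage << 4 | index`. That layout and the function names are illustrative only. They are not necessarily what the "tgsi: add support for packing semantics in SM3 byte values" patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed D3DDECLUSAGE values for the usages that matter here. */
enum { SM3_POSITION = 0, SM3_PSIZE = 4, SM3_COLOR = 10, SM3_FOG = 11 };

#define SM3_NUM_USAGES 14  /* usages 0..13 */
#define SM3_NUM_INDICES 16 /* usage indices 0..15 */

/* True for the five pairs that map to non-GENERIC Gallium semantics:
 * POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0. */
static bool sm3_is_special(unsigned usage, unsigned index)
{
    if (usage == SM3_COLOR)
        return index <= 1;
    return index == 0 &&
           (usage == SM3_POSITION || usage == SM3_PSIZE || usage == SM3_FOG);
}

/* Fills table[g] with the (hypothetically) packed SM3 byte for GENERIC
 * index g and returns how many GENERIC indices exist. */
static unsigned build_generic_to_sm3(uint8_t table[SM3_NUM_USAGES * SM3_NUM_INDICES])
{
    unsigned n = 0;
    for (unsigned usage = 0; usage < SM3_NUM_USAGES; usage++) {
        for (unsigned index = 0; index < SM3_NUM_INDICES; index++) {
            if (sm3_is_special(usage, index))
                continue;
            assert(n < SM3_NUM_USAGES * SM3_NUM_INDICES);
            table[n++] = (uint8_t)(usage << 4 | index);
        }
    }
    return n;
}
```

Under these assumptions, GENERIC 0 lands on POSITION1 (byte 0x01), because POSITION0 is skipped. The last index, GENERIC 218, lands on usage 13, index 15 (byte 0xDF).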
> > However, Gallium drivers that aren't subject to specific constraints
> > (such as those of svga and r600) are supposed to support 256 indices, and my
> > nv30/nv40 work does that.
> >
> > The expected implementation, if no hardware support exists, is to build a
> > list of relocations to apply to either the fragment or the vertex shader,
> > and patch one of them at validation time to match the other.
> > Data structures are provided in gallium/auxiliary to ease this, and they try
> > to minimize the number of times this needs to be performed.
> >
> > Let's now proceed to the discussion and detailed rationale, mostly
> > constructed by copy&pasting older messages.
> >
> > ===============
> > Michal Krol's proposal
> > ===============
> >
> > First of all, see Michal Krol's proposal at
> > http://www.opensource-archive.org/showthread.php?t=148573, and in
> > particular:
> > <<
> > name          index range
> > ----------------------------
> > POSITION      no limit?
> > COLOR         0..1, explicit clamp?
> > BCOLOR        0..1, explicit clamp?
> > FOG           remove?
> > PSIZE         0
> > GENERIC       0..<max generics>
> > NORMAL        remove
> > FACE          0
> > EDGEFLAG      0
> > PRIMID        0
> > INSTANCEID    0
> > >>
> >
> > My proposal follows this, except for limiting POSITION to 0 too.
> > Not sure why Michal thought "no limit" could make sense: POSITION is
> > fundamentally a singleton, since it is the input to the rasterizer unit.
> >
> >
> > ======================
> > An overview of hardware support
> > ======================
> >
> > Hardware with no capabilities:
> > - nv30 does not support any mapping. However, we already need to patch
> >   fragment programs to insert constants, so we can patch input register
> >   numbers as well. The current driver only supports 0-7 generic indices,
> >   but I already implemented support for 0-255 indices with in-driver
> >   linkage and patching. Note that nv30 lacks control flow in fragment
> >   programs.
> > - nv40 is like nv30, but supports fp control flow, and may have some
> >   configurable mapping support, with unknown behavior
> >
> > Hardware with capabilities that must be configured for each fp/vp pair:
> > - nv40 might have this, but the nVidia OpenGL driver does not use it
> > - nv50 has configurable vp->gp and gp->fp mappings with 64 entries.
> >   The current Gallium driver seems to support arbitrary 32-bit indices
> >   (0 to 2^32-1), but uses an inefficient O(n^2) algorithm to do that
> >
> > - r300 appears to have a configurable vp->fp mapping. The current
> >   driver only supports 0-15 generic indices, but redefining
> >   ATTR_GENERIC_COUNT could be enough to have it support larger numbers.
> >
> > Hardware with automatic linkage when semantics match:
> > - VMware svga appears to support 14 * 16 semantics, but the current
> >   driver only supports 0-15 generic indices. This could be fixed by
> >   mapping GENERIC into all non-special SM3 semantics.
> >
> > Hardware that can do both configurable mappings and automatic linkage:
> > - r600 supports linkage in hardware between matching semantic IDs, which
> >   are apparently byte-sized
> >
> > Other hardware:
> > - i915 has no hardware vertex shading.
> >   The current driver is broken and only supports 0-7 indices: this seems
> >   easy to fix though
> > - Not sure about i965
> >
> > ===================
> > An overview of software APIs
> > ===================
> >
> > 1. DirectX 9 SM3 supports indices in the 0-15 range associated with
> >    semantics in the 0-13 range.
> >
> >    A few of the name/index pairs have special meanings, but the others
> >    are just cosmetic as long as the fixed pipeline is not used.
> >
> >    Thus, SM3 wants to use 14 * 16 indices overall.
> >
> >    Of these, POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 map to non-GENERIC
> >    semantics, leaving 219 semantics handled by GENERIC.
> >
> > 2. SM2 and non-GLSL OpenGL just want to use as many indices as the
> >    hardware interpolator count, sometimes limiting that further.
> >
> >    These are the easiest and most straightforward cases.
> >
> > 3. DirectX 10 seems to only require a 0-31 range.
> >
> >    In particular, the fxc.exe compiler allows specifying arbitrary _strings_
> >    and 32-bit indices.
> >
> >    However, this information is encoded as metadata in the output file, and
> >    the shader bytecode itself uses integers in the 0-31 range to refer to the
> >    metadata.
> >
> >    It seems that the metadata is resolved by the Microsoft DirectX 10 runtime,
> >    and the driver only sees 0-31 indices at the DDI.
> >
> >    However, this is a bit unclear: confirmation or correction would be
> >    appreciated.
> >
> > 4. GLSL requires both shaders to be provided at link time, and thus does
> >    not constrain the implementation in any way.
> >
> >    However, it may be possible to mix GLSL with other shaders, leading to
> >    the need to reserve the texcoord slots.
> >
> >    In that case, GLSL will need about 8 more slots than the number of
> >    semantics actually used.
> >
> >    This is the case with the current Mesa/Gallium implementation.
> >
> > 5. GLSL with EXT_separate_shader_objects does not add requirements
> >    because only gl_TexCoord and other builtin varyings are supported.
> >    User-defined varyings are not supported.
> >
> >    See in particular the following text from the extension:
> >    <<
> >    It is undesirable from a performance standpoint to attempt to
> >    support "rendezvous by name" for arbitrary separate shaders
> >    because the separate shaders won't be naturally compiled to
> >    match their varying inputs and outputs of the same name without
> >    a special link step. Such a special link would introduce an
> >    extra validation overhead to binding separate shaders.
> >    The link itself would have to be deferred until glBegin time since separate
> >    shaders won't match when transitioning from one set of consistent
> >    shaders to another. This special link would still create errors
> >    or undefined behavior when the names of input and output varyings
> >    matched but their types did not match.
> >    >>
> >
> > 6. A hypothetical version of EXT_separate_shader_objects extended to
> >    support user-defined varyings would want either arbitrary 32-bit
> >    generic indices (by interning strings to generate the indices) or the
> >    ability to specify a custom mapping between shader indices.
> >
> > 7. A hypothetical "no-op" implementation of the GLSL linker would have
> >    the same requirement.
> >
> >
> > ====================
> > About non-GENERIC semantics
> > ====================
> >
> > Also note that non-GENERIC semantics have peculiar properties.
> >
> > For COLOR and BCOLOR:
> > 1. SM3, and OpenGL with glClampColor appropriately set, want it to
> >    _not_ be clamped to [0, 1]
> > 2. SM2 and normal OpenGL apparently want it to be clamped to [0, 1]
> >    (sometimes for fixed point targets only) and may also allow using
> >    U8_UNORM precision for it instead of FP32
> > 3. OpenGL allows enabling two-sided lighting, in which case COLOR in
> >    the fragment shader is automagically set to BCOLOR for back faces
> > 4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING.
> >    Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware.
> >    The latest hardware probably supports FACING only.
> >
> > Any API that requires special semantics for COLOR and BCOLOR (i.e.
> > non-SM3) seems to only want 0-1 indices.
> >
> > Note that SM3 does *not* include BCOLOR, so basically the limits for
> > generic indices would need to be conditional on whether BCOLOR is present
> > (e.g. if it is present, we must reserve two semantic slots in svga for it).
> >
> > POSITION0 is obviously special.
> > PSIZE0 is also special for points.
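The C sketch below puts the proposed limits together. Most semantics are restricted to index 0, and COLOR/BCOLOR to indices 0-1. The GENERIC ceiling drops by two when BCOLOR is in use, because svga must reserve two slots for it. The enum and function names are hypothetical stand-ins, not real TGSI identifiers. The check covers only the guaranteed minimum (0-218, or 0-216 with BCOLOR), not the 0-255 range most drivers are expected to accept. Fragment outputs, where the COLOR limit does not apply, are also left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TGSI semantic names discussed above. */
enum sem_name { SEM_POSITION, SEM_COLOR, SEM_BCOLOR, SEM_FOG, SEM_PSIZE,
                SEM_GENERIC, SEM_FACE, SEM_EDGEFLAG, SEM_PRIMID, SEM_INSTANCEID };

#define SM3_GENERIC_SLOTS 219u    /* 14 * 16 SM3 pairs minus 5 non-GENERIC ones */
#define BCOLOR_RESERVED_SLOTS 2u  /* svga would reserve slots for BCOLOR0/1 */

/* Guaranteed number of GENERIC indices on any driver. */
static unsigned max_generic_count(bool uses_bcolor)
{
    return SM3_GENERIC_SLOTS - (uses_bcolor ? BCOLOR_RESERVED_SLOTS : 0u);
}

/* Checks a (name, index) pair against the proposed minimum guarantees
 * for shader inputs/outputs other than fragment outputs. */
static bool semantic_index_valid(enum sem_name name, unsigned index,
                                 bool uses_bcolor)
{
    assert(name <= SEM_INSTANCEID);
    switch (name) {
    case SEM_GENERIC:
        return index < max_generic_count(uses_bcolor);
    case SEM_COLOR:
    case SEM_BCOLOR:
        return index <= 1;
    default:
        return index == 0; /* POSITION, PSIZE, FOG, FACE, ... are singletons */
    }
}
```

With BCOLOR present, GENERIC 216 is the highest index guaranteed to work. Without it, GENERIC 218 is.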
> >
> > FOG0 seems right now to just be a GENERIC with a single component.
> > Gallium could be extended to support fixed-function fog, which most
> > DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal
> > to the semantic issue.
> >
> > ==============
> > Current Gallium users
> > ==============
> >
> > Right now no open-source users of Gallium fundamentally require arbitrary
> > indices.
> > In particular:
> > 1. GLSL and anything with similar link-by-name can of course be modified to
> >    use sequential indices
> > 2. ARB fragment program and vertex program use index-limited texcoord slots
> > 3. g3dvl needs and uses 8 texcoord slots, indices 0-7
> > 4. vega and xorg use indices 0-1
> > 5. DX10 seems to restrict semantics to a 0-N range, if I'm not mistaken
> > 6. The GL_EXT_separate_shader_objects extension does not provide
> >    arbitrary index matching for GLSL, but merely lets it use a model
> >    similar to ARB fp/vp
> >
> > However, the GLSL linker needs arbitrary indices in its current form, and
> > the capability can be generally useful anyway.
> >
> > ===================
> > Discussion of possible options
> > ===================
> >
> > [Options from Keith Whitwell, see
> > http://www.opensource-archive.org/showthread.php?p=180719]
> > a) Picking a lower number like 128, which an SM3 state tracker would
> > usually be able to translate incoming semantics into directly, but which
> > would force it to renumber under rare circumstances. This would make
> > life easier for the open drivers at the expense of the closed code.
> >
> > b) Picking 256 to make life easier for some closed-source SM3 state
> > tracker, but harder for open drivers.
> >
> > c) Picking 219 (or some other magic number) that happens to work with
> > the current set of constraints, but makes gallium fragile in the face of
> > new constraints.
> >
> > d) Abandoning the current gallium linkage rules and coming up with
> > something new, for instance forcing the state trackers to always
> > renumber and making life trivial for the drivers...
> >
> > [Options from me]
> >
> > (e) Allow arbitrary 32-bit indices. This requires slightly more
> > complicated data structures in some cases, and will require svga and
> > r600 to fall back to software linkage if numbers are too high.
> >
> > (f) Limit semantic indices to hardware interpolators _and_ introduce
> > an interface to let the user specify an
> >
> > Personally I think the simplest idea for now could be to have all
> > drivers support 256 indices or, in the case of r600 and svga, the
> > maximum value supported by the hardware, and expose that as a cap (as
> > well as another cap for the number of different semantic values
> > supported at once).
> > The minimum guaranteed value would be set to the lowest hardware
> > constraint, which would be svga with 219 indices (assuming BCOLOR is not
> > used).
> > If some new constraints pop up, we just lower it and change SM3 state
> > trackers to check for it and fall back otherwise.
> >
> > This should just require simple fixes to svga and r300, and
> > significant code for nv30/nv40, which is however already implemented.
> >
> > Luca Barbieri (5):
> >       tgsi: formalize limits on semantic indices
> >       tgsi: add support for packing semantics in SM3 byte values
> >       gallium/auxiliary: add semantic linkage utility code
> >       nvfx: support proper shader linkage - adds glsl support
> >       nvfx: expose GLSL
> >
> > Michal Krol (1):
> >       gallium: Remove TGSI_SEMANTIC_NORMAL.
> >
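The cover letter describes a fallback for hardware without routing support. The driver records where the fragment shader reads each GENERIC input, then patches those register numbers at validation time to match the vertex shader's outputs. The C sketch below illustrates that idea. The structures and names are hypothetical and much simpler than the gallium/auxiliary utility code in the series. For simplicity it also assumes each register number occupies a single byte of shader code. It resolves relocations through a 256-entry table indexed by semantic index, so each lookup takes constant time rather than an O(n^2) search like the one mentioned for nv50.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_GENERIC 256 /* GENERIC indices 0-255 */
#define NO_SLOT 0xffu   /* marks a GENERIC output the vertex shader does not write */

/* Hypothetical: hardware output slot the vertex shader uses for each GENERIC index. */
struct vs_outputs {
    uint8_t slot_of_generic[MAX_GENERIC];
};

/* Hypothetical relocation: one byte of fragment shader code holding an
 * input register number, plus the GENERIC index that register reads. */
struct fs_reloc {
    size_t offset;
    uint8_t generic_index;
};

static void vs_outputs_init(struct vs_outputs *vs)
{
    memset(vs->slot_of_generic, NO_SLOT, sizeof vs->slot_of_generic);
}

/* Patches each recorded input register so it reads the slot where the
 * vertex shader wrote that GENERIC index. Returns false if the fragment
 * shader reads an index the vertex shader never writes; fs_code may then
 * already be partially patched. A real driver might instead redirect such
 * inputs to an unused slot. */
static bool link_fs_to_vs(uint8_t *fs_code, size_t code_size,
                          const struct fs_reloc *relocs, size_t num_relocs,
                          const struct vs_outputs *vs)
{
    for (size_t i = 0; i < num_relocs; i++) {
        uint8_t slot = vs->slot_of_generic[relocs[i].generic_index];
        if (slot == NO_SLOT)
            return false;
        assert(relocs[i].offset < code_size);
        fs_code[relocs[i].offset] = slot;
    }
    return true;
}
```

Each relocation costs a single table lookup, so redoing the patch whenever the fp/vp pair changes is cheap. The series' auxiliary code additionally tries to avoid repeating it at all.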
> > _______________________________________________
> > Mesa3d-dev mailing list
> > Mesa3d-dev@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/mesa3d-dev