On Mon, 2010-12-13 at 12:01 -0800, Christoph Bumiller wrote:
> I want to warm this up again adding nvc0 and
> GL_ARB_separate_shader_objects to the picture.
> 
> The latter extends GL_EXT_separate_shader_objects to support user
> defined varyings and guarantees well defined behaviour only if
> - varyings are declared inside the gl_PerVertex/gl_PerFragment block and the
> blocks match exactly in name, type, qualification, and (most
> significantly) declaration order.
> - varyings are assigned matching location qualifiers:
> like: layout(location = 3) in vec4 normal
> "The number of input locations available to a shader is limited."
> 
> So, I propose to (loosely) identify GENERIC semantic indices with these
> location qualifiers and let the pipe driver set a limit on the allowed
> maximum (e.g. PIPE_SHADER_CAP_MAX_INPUTS), rather than demanding that at
> least 219 of them be supported - nvc0 offers 0x200 bytes for generic
> inputs/outputs.

This sounds fine actually.  We kicked this around before & I was
basically ok with the last iteration of the proposal, but this seems ok
too.

As far as I can tell from a gallium perspective you're really just
proposing a new pipe cap _MAX_INPUTS (actually _MAX_GENERIC_INDEX would
be clearer), which the state tracker thereafter has to respect?

That would be fine with me.
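
A minimal sketch of what respecting such a cap could look like on the state tracker side. The names sketch_driver_caps, max_generic_index, generic_index_ok and location_to_generic are made up for illustration and are not existing Gallium interfaces; the limit used in any real driver would come from the driver itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-driver limits; not an existing Gallium structure. */
struct sketch_driver_caps {
   unsigned max_generic_index;   /* highest GENERIC semantic index accepted */
};

/* Returns true if the state tracker may emit GENERIC[index] directly. */
static bool
generic_index_ok(const struct sketch_driver_caps *caps, unsigned index)
{
   assert(caps != NULL);
   return index <= caps->max_generic_index;
}

/* Map a GLSL location qualifier onto a GENERIC index, or return -1 if the
 * driver cannot take it and the state tracker would have to renumber. */
static int
location_to_generic(const struct sketch_driver_caps *caps, unsigned location)
{
   return generic_index_ok(caps, location) ? (int)location : -1;
}
```

With a cap of 31, for example, "layout(location = 3)" would map straight to GENERIC[3], while location 40 would be rejected and left to the state tracker to handle.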

> My motivation is mostly that the hardware routing table for shader
> varyings that was present on nv50 has been removed with nvc0 (Fermi).
> And I'm glad, because filling 4 routing tables (since we have 5 shader
> types now) is somewhat annoying. And so applying relocations to shaders
> - it can be done, it's probably not too time consuming, but it's just
> plain *unnecessary* (and thus stupid) for OpenGL.
> 
> Now about d3d9 ...
> 1. don't care, I don't see a d3d9 state tracker
> 2. http://msdn.microsoft.com/en-us/library/bb509647%28v=VS.85%29.aspx
> says "n is an optional integer between 0 and the number of resources
> supported" - what "supported" means here isn't clear to me, but, I
> didn't find any example where someone used something OpenGL doesn't have
> (like COLOR2).
> 3.
> http://msdn.microsoft.com/en-us/library/bb944006%28v=vs.85%29.aspx#Varying_Shader_Inputs_and_Semantics
> says "Input semantics are similar to the values in the D3DDECLUSAGE."
> and
> DECLUSAGE sounds like you're limited to sane values.

I think you're on the right track with (1)...  It's fairly pointless
trying to discuss code here which isn't public & I don't think people
need to be worrying about what may or may not be important for code they
can't see.

I know this idea previously got tied up with speculation about what a
DX9 state tracker might or might not require, but in retrospect I wish
I'd been able to steer conversation away from that.

The work on closed components may drive a lot of the feature development
and new interfaces, but there's usually enough flexibility that this
sort of cleanup isn't a big deal.


Keith

> Not sure if anyone wants to think about this issue at this time (since
> implementation of ARB_separate_shader_objects is probably far in the GL4
> future), but I'd be happy about any comments.
> 
> Regards,
> Christoph
> 
> On 04/13/2010 12:55 PM, Luca Barbieri wrote:
> > This patch series is intended to resolve the issue of semantic-based shader 
> > linkage in Gallium.
> > It can also be found in the RFC-gallium-semantics branch.
> >
> > It does not change the current Gallium design, but rather formalizes some 
> > limitations to it, and provides infrastructure to implement this model more 
> > easily in drivers, along with a full nv30/nv40 implementation.
> >
> > These limitations are added to allow an efficient implementation for both 
> > hardware lacking special support and hardware having support but also 
> > special constraints.
> >
> > Note that this does NOT resolve all issues, and quite a bit is left 
> > to future refinement.
> >
> > In particular, the following issues are still open:
> > 1. COLOR clamping (and floating point framebuffers)
> > 2. A linkage table CSO that allows specifying non-identity linkage
> > 3. BCOLOR/FACE-related issues
> > 4. Adding a cap to inform the state tracker that more than 219 generic 
> > indices are provided
> >
> > This topic was already very extensively discussed.
> > See 
> > http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg10865.html 
> > for some early inconclusive discussion around an early implementation that 
> > modified the GLSL linker (which is NOT being proposed here)
> > See 
> > http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg12016.html 
> > for some more discussion that seemed to mostly reach a consensus over the 
> > approach proposed here.
> > See in particular 
> > http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg12041.html .
> >
> > That said, I'm going to try to repeat all information here, partially by 
> > copy&pasting from earlier messages.
> > This message should probably be adapted into gallium/docs if/when this is 
> > accepted.
> >
> > Here is the short summary; the long rationale follows after it.
> >
> > The proposal here is to add the following limitations to Gallium, for the 
> > intermediate semantics:
> > 1. TGSI_SEMANTIC_NORMAL is removed, using a commit by Michal Krol that was 
> > never merged
> > 2. Every semantic except GENERIC, COLOR and BCOLOR can only be used with 
> > semantic index 0
> > 3. COLOR and BCOLOR can only be used with semantic index 0-1 (note that 
> > this doesn't apply to fragment outputs)
> > 4. GENERIC can be used with semantic indices 0-218 on any driver, if BCOLOR 
> > is not used
> > 5. GENERIC can be used with semantic indices 0-216 on any driver, if BCOLOR 
> > IS used
> > 6. GENERIC can be used with semantic indices 0-255 on almost all drivers 
> > (those that don't need the 0-218 limitation)
> > 7. Some drivers may also choose to support GENERIC with arbitrary indices, 
> > but that should generally not happen
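
The limits in rules 2-5 above can be sketched as a small validity check. The enum below is an illustrative stand-in for the TGSI_SEMANTIC_* names, not the real definitions, and the function names are hypothetical. Rule 3's exception for fragment outputs is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the semantic names discussed above. */
enum sketch_semantic {
   SKETCH_POSITION,
   SKETCH_COLOR,
   SKETCH_BCOLOR,
   SKETCH_FOG,
   SKETCH_PSIZE,
   SKETCH_GENERIC,
   SKETCH_FACE
};

/* Highest semantic index that every driver would have to accept. */
static unsigned
max_semantic_index(enum sketch_semantic name, bool bcolor_used)
{
   switch (name) {
   case SKETCH_COLOR:
   case SKETCH_BCOLOR:
      return 1;                        /* rule 3: indices 0-1 */
   case SKETCH_GENERIC:
      return bcolor_used ? 216 : 218;  /* rules 5 and 4 */
   default:
      return 0;                        /* rule 2: index 0 only */
   }
}

static bool
semantic_index_valid(enum sketch_semantic name, unsigned index,
                     bool bcolor_used)
{
   return index <= max_semantic_index(name, bcolor_used);
}
```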
> >
> > The reason for this, in short, is that this maps directly to DirectX 9 SM3, 
> > which is the most problematic interface of all.
> >
> > The peculiar problem we have here is that we have two competing constraints 
> > that force us into choosing the exact SM3 value:
> > 1. The VMware SVGA driver must deal with an SM3 host interface and would 
> > ideally want to directly feed the Gallium semantics to the host
> > 2. A hypothetical DirectX 9 state tracker needs to support SM3 and would 
> > ideally want to directly feed the SM3 semantics to Gallium
> >
> > Note that this is not a reference to the VMware DirectX 9 state tracker, 
> > since its authors haven't provided details about its handling of shader 
> > semantics.
> >
> > SM3 ends up supporting 219 generic indices: 16 indices in 14 classes, minus 
> > POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 which are the only ones that 
> > wouldn't be mapped to GENERIC.
> > However, Gallium drivers that, unlike svga and r600, don't benefit from 
> > specific constraints are supposed to support 256 indices, and my nv30/nv40 
> > work does that.
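
The arithmetic behind the 219 figure, together with one possible way of packing a (class, index) pair into a byte, can be sketched as follows. The layout (class in the high bits, index in the low four bits) and all names are assumptions made for illustration; the patch series may pack the values differently.

```c
#include <assert.h>

#define SM3_NUM_USAGES   14   /* semantic classes 0-13 */
#define SM3_NUM_INDICES  16   /* indices 0-15 per class */
#define SM3_NUM_SPECIAL   5   /* POSITION0, PSIZE0, COLOR0, COLOR1, FOG0 */

/* Pack a (usage, index) pair into a single byte-sized value. */
static unsigned
sm3_pack(unsigned usage, unsigned index)
{
   assert(usage < SM3_NUM_USAGES && index < SM3_NUM_INDICES);
   return usage * SM3_NUM_INDICES + index;
}

static unsigned sm3_usage(unsigned packed) { return packed / SM3_NUM_INDICES; }
static unsigned sm3_index(unsigned packed) { return packed % SM3_NUM_INDICES; }

/* Number of (usage, index) pairs left over for GENERIC: 14 * 16 - 5. */
static unsigned
sm3_generic_count(void)
{
   return SM3_NUM_USAGES * SM3_NUM_INDICES - SM3_NUM_SPECIAL;
}
```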
> >
> > The expected implementation, if no hardware support exists, is to build a 
> > list of relocations to apply to either the fragment or the vertex shader, 
> > and patch one of them at validation time to match the other.
> > Data structures are provided in gallium/auxiliary to ease this, and try to 
> > minimize the number of times where this needs to be performed.
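
The relocation approach described above can be sketched roughly as below. This is a simplification under stated assumptions: the fragment program is treated as an array of bytes with one whole byte per input register field, whereas real hardware uses bitfields inside wider instruction words. All names are hypothetical and do not correspond to the gallium/auxiliary code in the series.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_GENERIC 256
#define SKETCH_UNUSED      0xff

/* One spot in the fragment program that reads an input register. */
struct sketch_reloc {
   size_t   word;      /* offset of the register field in the program */
   unsigned generic;   /* GENERIC semantic index read at that spot */
};

/* vp_slot[g] is the hardware slot the vertex shader writes GENERIC[g] to,
 * or SKETCH_UNUSED if it does not write it.  Each relocated field in fp is
 * rewritten to the matching slot.  Returns the number of fragment inputs
 * that the vertex shader does not provide. */
static unsigned
sketch_link(unsigned char *fp, const struct sketch_reloc *relocs,
            size_t num_relocs,
            const unsigned char vp_slot[SKETCH_MAX_GENERIC])
{
   unsigned missing = 0;
   size_t i;

   for (i = 0; i < num_relocs; i++) {
      unsigned char slot;

      assert(relocs[i].generic < SKETCH_MAX_GENERIC);
      slot = vp_slot[relocs[i].generic];
      if (slot == SKETCH_UNUSED)
         missing++;
      else
         fp[relocs[i].word] = slot;
   }
   return missing;
}
```

Because the lookup is a direct table index, linking costs one pass over the relocation list rather than a pairwise comparison of inputs against outputs.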
> >
> > Let's now proceed to the discussion and detailed rationale, mostly 
> > constructed by copy&pasting older messages.
> >
> > ===============
> > Michal Krol's proposal
> > ===============
> >
> > First of all, see Michal Krol's proposal at 
> > http://www.opensource-archive.org/showthread.php?t=148573, and in 
> > particular:
> > <<
> > name        index range
> > ----------------------------
> > POSITION    no limit?
> > COLOR       0..1, explicit clamp?
> > BCOLOR      0..1, explicit clamp?
> > FOG         remove?
> > PSIZE       0
> > GENERIC     0..<max generics>
> > NORMAL      remove
> > FACE        0
> > EDGEFLAG    0
> > PRIMID      0
> > INSTANCEID  0
> >>>
> >
> > My proposal follows this, except for limiting POSITION to 0 too.
> > Not sure why Michal thought "no limit" could make sense: the POSITION is 
> > fundamentally a singleton, since it is the input to the rasterizer unit.
> >
> >
> > ======================
> > An overview of hardware support
> > ======================
> >
> > Hardware with no capabilities.
> > - nv30 does not support any mapping. However, we already need to patch
> > fragment programs to insert constants, so we can patch input register
> > numbers as well. The current driver only supports 0-7 generic indices,
> > but I already implemented support for 0-255 indices with in-driver
> > linkage and patching. Note that nv30 lacks control flow in fragment
> > programs.
> > - nv40 is like nv30, but supports fp control flow, and may have some
> > configurable mapping support, with unknown behavior
> >
> > Hardware with capabilities that must be configured for each fp/vp pair.
> > - nv40 might have this but the nVidia OpenGL driver does not use it
> > - nv50 has configurable vp->gp and gp->fp mappings with 64 entries.
> > The current Gallium driver seems to support arbitrary 0-2^32 indices, but 
> > uses an inefficient O(n^2) algorithm to be able to do that
> >
> > - r300 appears to have a configurable vp->fp mapping. The current
> > driver only supports 0-15 generic indices, but redefining
> > ATTR_GENERIC_COUNT could be enough to have it support larger numbers.
> >
> > Hardware with automatic linkage when semantics match:
> > - VMware svga appears to support 14 * 16 semantics, but the current
> > driver only supports 0-15 generic indices. This could be fixed by
> > mapping GENERIC into all non-special SM3 semantics.
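
One way the mapping of GENERIC onto the non-special SM3 semantics could work is to number the 14 * 16 slots in order and skip the five special ones. The sketch below assumes the usage numbers POSITION = 0, PSIZE = 4, COLOR = 10 and FOG = 11 from the D3DDECLUSAGE enumeration and a slot numbering of usage * 16 + index; the function names are hypothetical and the svga driver is not known to do it this way.

```c
#include <assert.h>
#include <stdbool.h>

#define SM3_SLOTS 224u   /* 14 usages * 16 indices */

/* Assumed usage numbers: POSITION = 0, PSIZE = 4, COLOR = 10, FOG = 11. */
static bool
sm3_slot_is_special(unsigned slot)
{
   return slot == 0 * 16 + 0 ||    /* POSITION0 */
          slot == 4 * 16 + 0 ||    /* PSIZE0 */
          slot == 10 * 16 + 0 ||   /* COLOR0 */
          slot == 10 * 16 + 1 ||   /* COLOR1 */
          slot == 11 * 16 + 0;     /* FOG0 */
}

/* Return the slot (usage * 16 + index) used for GENERIC[generic],
 * or -1 if generic is beyond the 219 available slots. */
static int
generic_to_sm3_slot(unsigned generic)
{
   unsigned slot, seen = 0;

   for (slot = 0; slot < SM3_SLOTS; slot++) {
      if (sm3_slot_is_special(slot))
         continue;
      if (seen == generic)
         return (int)slot;
      seen++;
   }
   return -1;
}
```

A driver would more likely precompute this as a 219-entry table; the loop is kept here only to make the skipping explicit.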
> >
> > Hardware that can do both configurable mappings and automatic linkage:
> > - r600 supports linkage in hardware between matching semantic ids,
> > which appear to be byte-sized
> >
> > Other hardware:
> > - i915 has no hardware vertex shading
> > The current driver is broken and only supports 0-7 indices: this seems
> > easy to fix though
> > - Not sure about i965
> >
> > ===================
> > An overview of software APIs
> > ===================
> >
> > 1. DirectX 9 SM3 supports indices in the 0-15 range associated with
> > semantics in the 0-13 range.
> >
> > A few of the name/index pairs have special meanings, but the others
> > are just cosmetic as long as the fixed pipeline is not used.
> >
> > Thus, SM3 wants to use 14 * 16 indices overall.
> >
> > Of these, POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 map to non-GENERIC
> > semantics, leaving 219 semantics handled by GENERIC
> >
> > 2. SM2 and non-GLSL OpenGL just want to use as many indices as the
> > hardware interpolator count, sometimes limiting that further
> >
> > They are the easiest and most straightforward ones.
> >
> > 3. DirectX 10 seems to only require a 0-31 range.
> >
> > In particular, the fxc.exe compiler allows specifying arbitrary _strings_
> > and 32-bit indices.
> >
> > However, this information is encoded as metadata in the output file, and
> > the shader bytecode itself uses integers in the 0-31 range to refer to the
> > metadata.
> >
> > It seems that the metadata is resolved by the Microsoft DirectX 10 runtime,
> > and the driver only sees 0-31 indices on the DDI interface.
> >
> > However, this is a bit unclear: confirmation or correction would be
> > appreciated.
> >
> > 4. GLSL requires both shaders to be provided at link time, and thus does
> > not constrain the implementation in any way.
> >
> > However, it may be possible to mix GLSL with other shaders, leading to
> > the need to reserve the texcoord slots.
> >
> > In that case, GLSL will need about 8 more slots than the number of
> > effectively used semantics.
> >
> > This is the case with the current Mesa/Gallium implementation
> >
> > 5. GLSL with EXT_separate_shader_objects does not add requirements
> > because only gl_TexCoord and other builtin varyings are supported.
> > User-defined varyings are not supported
> >
> > See in particular the following text from the extension:
> > <<
> >         It is undesirable from a performance standpoint to attempt to
> >         support "rendezvous by name" for arbitrary separate shaders
> >         because the separate shaders won't be naturally compiled to
> >         match their varying inputs and outputs of the same name without
> >         a special link step.  Such a special link would introduce an
> >         extra validation overhead to binding separate shaders.  The link
> >         itself would have to be deferred until glBegin time since separate
> >         shaders won't match when transitioning from one set of consistent
> >         shaders to another.  This special link would still create errors
> >         or undefined behavior when the names of input and output varyings
> >         matched but their types did not match.
> >>>
> >
> > 6. A hypothetical version of EXT_separate_shader_objects extended to
> > support user-defined varyings would either want arbitrary 32-bit
> > generic indices (by interning strings to generate the indices) or the
> > ability to specify a custom mapping between shader indices
> >
> > 7. A hypothetical "no-op" implementation of the GLSL linker would have
> > the same requirement
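
Interning strings to generate indices, as mentioned in item 6, could look roughly like this. It is only a sketch: the table is fixed-size and searched linearly, and all names are hypothetical. The table would have to be shared by every shader that may later be linked together, so that the same varying name always yields the same index.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_NAMES 64

/* Varying names seen so far; a name's position is its generic index. */
struct sketch_intern {
   const char *names[SKETCH_MAX_NAMES];
   unsigned count;
};

/* Return the index for name, adding it on first use, or -1 when the table
 * is full.  The caller keeps the strings alive. */
static int
sketch_intern_name(struct sketch_intern *t, const char *name)
{
   unsigned i;

   assert(t != NULL && name != NULL);
   for (i = 0; i < t->count; i++)
      if (strcmp(t->names[i], name) == 0)
         return (int)i;
   if (t->count == SKETCH_MAX_NAMES)
      return -1;
   t->names[t->count] = name;
   return (int)t->count++;
}
```

Indices handed out this way grow with the number of distinct names ever seen, which is why this approach pushes towards arbitrary rather than small bounded generic indices.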
> >
> >
> > ====================
> > About non-GENERIC semantics
> > ====================
> >
> > Also note that non-GENERIC semantics have peculiar properties.
> >
> > For COLOR and BCOLOR:
> > 1. SM3 and OpenGL with glClampColor appropriately set want it to
> > _not_ be clamped to [0, 1]
> > 2. SM2 and normal OpenGL apparently want it to be clamped to [0, 1]
> > (sometimes for fixed point targets only) and may also allow using
> > U8_UNORM precision for it instead of FP32
> > 3. OpenGL allows enabling two-sided lighting, in which case COLOR in
> > the fragment shader is automagically set to BCOLOR for back faces
> > 4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING.
> > Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware.
> > The latest hardware probably supports FACING only.
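
For hardware that only provides FACING, the behaviour in items 1-3 would have to be emulated. A minimal sketch of the selection and the optional clamp, written as plain C rather than shader code, with hypothetical names throughout:

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_color { float r, g, b, a; };

static float
clamp01(float v)
{
   return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Pick BCOLOR for back faces when two-sided lighting is on, then clamp to
 * [0, 1] only if the API asks for it (items 1 and 2 above). */
static struct sketch_color
sketch_fragment_color(struct sketch_color color, struct sketch_color bcolor,
                      bool two_sided, bool front_facing, bool clamp)
{
   struct sketch_color c = (two_sided && !front_facing) ? bcolor : color;

   if (clamp) {
      c.r = clamp01(c.r);
      c.g = clamp01(c.g);
      c.b = clamp01(c.b);
      c.a = clamp01(c.a);
   }
   return c;
}
```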
> >
> > Any API that requires special semantics for COLOR and BCOLOR (i.e.
> > non-SM3) seems to only want 0-1 indices.
> >
> > Note that SM3 does *not* include BCOLOR, so basically the limits for
> > generic indices would need to be conditional on BCOLOR being present
> > or not (e.g. if it is present, we must reserve two semantic slots in
> > svga for it).
> >
> > POSITION0 is obviously special.
> > PSIZE0 is also special for points.
> >
> > FOG0 seems right now to just be a GENERIC with a single component.
> > Gallium could be extended to support fixed function fog, which most
> > DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal
> > to the semantic issue.
> >
> > ==============
> > Current Gallium users
> > ==============
> >
> > Right now no open-source users of Gallium fundamentally require arbitrary 
> > indices.
> > In particular:
> > 1. GLSL and anything with similar link-by-name can of course be modified to 
> > use sequential indices
> > 2. ARB fragment program and vertex program use index-limited texcoord slots
> > 3. g3dvl needs and uses 8 texcoord slots, indices 0-7
> > 4. vega and xorg use indices 0-1
> > 5. DX10 seems to restrict semantics to 0-N range, if I'm not mistaken
> > 6. The GL_EXT_separate_shader_objects extension does not provide
> > arbitrary index matching for GLSL, but merely lets it use a model
> > similar to ARB fp/vp
> >
> > However, the GLSL linker needs them in its current form, and the capability 
> > can be generally useful anyway.
> >
> > ===================
> > Discussion of possible options
> > ===================
> >
> > [Options from Keith Whitwell, see 
> > http://www.opensource-archive.org/showthread.php?p=180719]
> > a) Picking a lower number like 128, that an SM3 state tracker could
> > usually be able to directly translate incoming semantics into, but which
> > would force it to renumber under rare circumstances. This would make
> > life easier for the open drivers at the expense of the closed code.
> >
> > b) Picking 256 to make life easier for some closed-source SM3 state
> > tracker, but harder for open drivers.
> >
> > c) Picking 219 (or some other magic number) that happens to work with
> > the current set of constraints, but makes gallium fragile in the face of
> > new constraints.
> >
> > d) Abandoning the current gallium linkage rules and coming up with
> > something new, for instance forcing the state trackers to renumber
> > always and making life trivial for the drivers...
> >
> > [Options from me]
> >
> > (e) Allow arbitrary 32-bit indices. This requires slightly more
> > complicated data structures in some cases, and will require svga and
> > r600 to fallback to software linkage if numbers are too high.
> >
> > (f) Limit semantic indices to hardware interpolators _and_ introduce
> > an interface to let the user specify an
> >
> > Personally I think the simplest idea for now could be to have all
> > drivers support 256 indices or, in the case of r600 and svga, the
> > maximum value supported by the hardware, and expose that as a cap (as
> > well as another cap for the number of different semantic values
> > supported at once).
> > The minimum guaranteed value is set to the lowest hardware constraint,
> > which would be svga with 219 indices (assuming no bcolor is used).
> > If some new constraints pop up, we just lower it and change SM3 state
> > trackers to check for it and fallback otherwise.
> >
> > This should just require simple fixes to svga and r300, and
> > significant code for nv30/nv40, which is however already implemented.
> >
> > Luca Barbieri (5):
> >   tgsi: formalize limits on semantic indices
> >   tgsi: add support for packing semantics in SM3 byte values
> >   gallium/auxiliary: add semantic linkage utility code
> >   nvfx: support proper shader linkage - adds glsl support
> >   nvfx: expose GLSL
> >
> > Michal Krol (1):
> >   gallium: Remove TGSI_SEMANTIC_NORMAL.
> >
> >
> > _______________________________________________
> > Mesa3d-dev mailing list
> > Mesa3d-dev@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/mesa3d-dev