Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-02-01 Thread Corbin Simpson
I propose that the following language be added to the spec:

"""
Gallium has no explicit mechanism for linking shaders. Shaders are
implicitly linked in a pipeline at render time. Thus, linking must not
fail and the pipe driver is permitted to change shader semantics to
preserve linking.

If a shader uses non-consecutive semantics, the pipe driver may remap
them to be consecutive, within the bounds of other currently bound
state. For example, if a shader samples from samplers 3 and 5, but the
only samplers bound are 0 and 1, the driver may adjust the shader to
sample from samplers 0 and 1 instead.

If the bound shader pipeline does not have consistent outputs and
inputs, the driver is permitted to remap them to be consistent through
a driver-specific mechanism. For example, if a vertex shader writes
generic 1, a fragment shader references generic 0, and there is no
geometry shader bound, the pipe driver may change the vertex shader to
write generic 0, or it may use a driver-specific internal routing
table to route generic 1 to generic 0.

If the total count of each semantic attribute written by the vertex
shader does not match the counts read by the fragment shader, the
driver may discard extraneous attributes and the values of missing
attributes are undefined. If a geometry shader is present, this
adjustment may happen on the vertex to geometry linking and geometry
to fragment linking.
"""
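As a minimal sketch of the sampler remapping described in the proposed language, the following C function packs the sampler units a shader uses into consecutive slots. `MAX_SAMPLERS` and `remap_samplers` are hypothetical names chosen for illustration, not part of the Gallium interface, and a real driver would also have to respect other currently bound state.

```c
#include <stddef.h>

#define MAX_SAMPLERS 16  /* hypothetical limit, not a Gallium constant */

/*
 * Build a remap table that packs the sampler units a shader uses into
 * consecutive slots starting at 0. used[i] is nonzero if the shader
 * samples from unit i. On return, remap[i] holds the new unit for each
 * used unit and -1 for each unused one. Returns the number of units used.
 */
static int remap_samplers(const int used[MAX_SAMPLERS], int remap[MAX_SAMPLERS])
{
    int next = 0;
    for (size_t i = 0; i < MAX_SAMPLERS; i++)
        remap[i] = used[i] ? next++ : -1;
    return next;
}
```

With units 3 and 5 marked as used, the table sends 3 to 0 and 5 to 1, which matches the example in the proposed text.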

I think this about covers it. This language seems *very* reasonable to
me, while not totally ignoring the problems faced by nv50.

~ C.

On Mon, Feb 1, 2010 at 7:29 AM, Luca Barbieri wrote:
>> I can't really use a routing table state to produce a cso, because the hw
>> routing table I generate depends on rasterizer state, e.g. I must not
>> put in back face colour (we have a 2 to 1 mapping here) if twoside
>> is disabled.
>>
>> Also, I'm routing based on the scalar *components* the FP reads,
>> not whole TGSI pseudo vec4 registers (NUM_INTERPOLATORS will
>> thus be inaccurate) - set_routing_table will have to pass me the
>> respective programs too.
>> Well, I can still use the cso and insert it into the rest of the routing
>> table that still needs to be assembled on the fly; I did that before the
>> 1:1 mapping between FP and VP regs was removed.
>
> You are right, the routing table CSO needs to contain the fragment and
> vertex shader handles, and ideally light_twoside should be moved to
> the vertex->fragment routing table since it is really an attribute of
> that and not polygon rasterization/setup.
>
> You can then just look at your internal data structure and construct a
> scalar routing table from the vec4 one provided by Gallium.
>
> We could also, as a further extension, support scalar routing tables
> directly in Gallium.
> Note however that radeon hardware presumably only supports vector
> ones, so we would need all 3 options with caps.
> A further intermediate step could be vector routing tables with swizzling.
>
> --
> The Planet: dedicated and managed hosting, cloud storage, colocation
> Stay online with enterprise data centers and the best network in the business
> Choose flexible plans and management services without long-term contracts
> Personal 24x7 support from experience hosting pros just a phone call away.
> http://p.sf.net/sfu/theplanet-com
> ___
> Mesa3d-dev mailing list
> Mesa3d-dev@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
>



-- 
Only fools are easily impressed by what is only
barely beyond their reach. ~ Unknown

Corbin Simpson




Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-02-01 Thread Luca Barbieri
> I can't really use a routing table state to produce a cso, because the hw
> routing table I generate depends on rasterizer state, e.g. I must not
> put in back face colour (we have a 2 to 1 mapping here) if twoside
> is disabled.
>
> Also, I'm routing based on the scalar *components* the FP reads,
> not whole TGSI pseudo vec4 registers (NUM_INTERPOLATORS will
> thus be inaccurate) - set_routing_table will have to pass me the
> respective programs too.
> Well, I can still use the cso and insert it into the rest of the routing
> table that still needs to be assembled on the fly; I did that before the
> 1:1 mapping between FP and VP regs was removed.

You are right, the routing table CSO needs to contain the fragment and
vertex shader handles, and ideally light_twoside should be moved to
the vertex->fragment routing table since it is really an attribute of
that and not polygon rasterization/setup.

You can then just look at your internal data structure and construct a
scalar routing table from the vec4 one provided by Gallium.

We could also, as a further extension, support scalar routing tables
directly in Gallium.
Note however that radeon hardware presumably only supports vector
ones, so we would need all 3 options with caps.
A further intermediate step could be vector routing tables with swizzling.
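A rough sketch of how a driver might construct a scalar routing table from a vec4 one, assuming it knows which components of each input the fragment shader reads. The struct layouts, the `fs_read_mask` array and the function name are all hypothetical; nothing here is an existing Gallium interface.

```c
#include <stddef.h>

/* One vec4-level link: VS output register -> FS input register. */
struct vec4_route {
    unsigned vs_output;
    unsigned fs_input;
};

/* One scalar-level link between components (0 = x ... 3 = w). */
struct scalar_route {
    unsigned vs_output, vs_comp;
    unsigned fs_input, fs_comp;
};

/*
 * Expand vec4 routes into scalar routes, keeping only the components
 * the fragment shader actually reads. fs_read_mask[r] has bit c set
 * when the FS reads component c of input r. out must have room for
 * 4 * n entries. Returns the number of scalar routes written.
 */
static size_t expand_routes(const struct vec4_route *in, size_t n,
                            const unsigned char *fs_read_mask,
                            struct scalar_route *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (unsigned c = 0; c < 4; c++) {
            if (!(fs_read_mask[in[i].fs_input] & (1u << c)))
                continue;
            out[count].vs_output = in[i].vs_output;
            out[count].vs_comp = c;
            out[count].fs_input = in[i].fs_input;
            out[count].fs_comp = c;
            count++;
        }
    }
    return count;
}
```

This version keeps components in place (x to x, y to y); a swizzling variant would carry a separate source component per entry.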



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-02-01 Thread Christoph Bumiller
On 01.02.2010 15:32, Luca Barbieri wrote:
> An overview of the possible options.
> Let's call vertex shader outputs "v" and fragment shader inputs "f"
> Let v -> f mean that v connects to f.
> NUM_INTERPOLATORS is the number of available interpolators. It is
> usually between 8 and 32.
>
> 1. Current Gallium
> v -> f if and only if v == f
> Any values of v and f are legal
>
> 2. My proposal, basic version
> v -> f if and only if v == f
> v >= NUM_INTERPOLATORS and f >= NUM_INTERPOLATORS are illegal
>
> 3. My proposal plus routing tables (Corbin Simpson's idea)
> v -> f if and only if (v, f) is an entry in the routing table
> v >= NUM_INTERPOLATORS and f >= NUM_INTERPOLATORS are illegal
> If no routing table is set, (2) is used instead
> The routing table is set via a new ->set_routing_table() entry point,
> taking the processor linkage affected, and an array of pairs of
> registers to link.
> This may be made a CSO (it is constant for each pair of state tracker 
> shaders).
>
>   

I can't really use a routing table state to produce a cso, because the hw
routing table I generate depends on rasterizer state, e.g. I must not
put in back face colour (we have a 2 to 1 mapping here) if twoside
is disabled.

Also, I'm routing based on the scalar *components* the FP reads,
not whole TGSI pseudo vec4 registers (NUM_INTERPOLATORS will
thus be inaccurate) - set_routing_table will have to pass me the
respective programs too.
Well, I can still use the cso and insert it into the rest of the routing
table that still needs to be assembled on the fly; I did that before the
1:1 mapping between FP and VP regs was removed.
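To illustrate why the hardware table cannot be derived from the shaders alone, here is a small sketch of the 2-to-1 colour mapping, assuming a hypothetical `color_route` entry that names both a front and a back source slot. The types and names are invented for illustration and do not describe the actual nv50 code.

```c
#include <stdbool.h>

/* Hypothetical hardware routing entry for one FS colour input. */
struct color_route {
    int front_output;  /* VS output slot holding the front-face colour */
    int back_output;   /* VS output slot for the back-face colour, or -1 */
};

/*
 * Build the route for one colour input. The back-face colour is only
 * routed when two-sided lighting is enabled in the rasterizer state,
 * so the result changes whenever that state changes, even if both
 * shaders stay the same.
 */
static struct color_route route_color(int front_slot, int back_slot,
                                      bool light_twoside)
{
    struct color_route r;
    r.front_output = front_slot;
    r.back_output = light_twoside ? back_slot : -1;
    return r;
}
```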

On a side note:
if a VP output is not written, I map it to 0, or to 1 for the 4th
component. FOG has all 4 components written in the VP, so yzw don't
become 0 or 1 in the FP as expected; I can adjust the write mask
myself though ...

Christoph
> The problem with (1) is that if the hardware has no routing support,
> there is no way to implement it except by recompiling either the
> vertex or fragment shader when the other changes.
>
> Furthermore, no known user of Gallium actually needs (1) except the
> current GLSL linker, but only due to the particular implementation and
> not due to the OpenGL/GLSL API itself.
>
> Only (2) is actually needed by the APIs, and it is also supported
> trivially by all hardware.
>
>
> Additionally there is no way to implement "bind by name" (i.e. having
> GLSL varyings with the same name be linked) with either (1) or (2)
> without recompiling one shader in response to changing the other.
>
> With (3) instead, the state tracker can look at the symbol tables of
> both programs, build a routing table, and set it independently of the
> shaders. Functionality provided by (1) can also be emulated in this
> way.
>
>
> Thus, I propose replacing (1) with (2) and later expand to (3) if we
> desire to do so.
>
> I think that in particular, the "bind by name" argument is decisive,
> since binding by name is what GLSL and high level languages really
> want, and the added flexibility of (1) is useless for that, while
> coming at significant driver complexity and performance cost.
>
> What do you think?
>
>   




Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-02-01 Thread Luca Barbieri
On Mon, Feb 1, 2010 at 3:38 PM, Keith Whitwell wrote:
> This seems like a very different idea of semantics.  These aren't intended to 
> be hardware resources, and there is no concept of querying the driver to 
> figure out how many the hardware supports.  Further, the indices for 
> different semantic names are considered to be disjoint, permitting FOG[0], 
> COLOR[0] and GENERIC[0], each of which would potentially consume an 
> interpolator on some hardware, subverting the idea of a maximum semantic 
> index.

There would be a maximum semantic index for each semantic type.
Note this is exactly like the existing OpenGL limit on
fragment.texcoord[i] and how ARBfp/vp work.

All APIs as far as I know have such limits, simply because they indeed
refer to hardware resources.

Why shouldn't Gallium semantic indices refer to hardware resources too?

What is the advantage of using abstract identifiers that the driver
needs to map, when no API needs those, and when there is no use for
them? (except for slightly simplifying the GLSL implementation at the
expense of greater complexity in all drivers)

> I think if you want to improve linkage semantics, some of your other 
> suggestions are more promising.  I'd like to dig into those a little more if 
> that's ok.

Yes, sure.
Are you referring to the routing tables idea? (actually initially
suggested by Corbin Simpson)



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-02-01 Thread Keith Whitwell
This seems like a very different idea of semantics.  These aren't intended to 
be hardware resources, and there is no concept of querying the driver to figure 
out how many the hardware supports.  Further, the indices for different 
semantic names are considered to be disjoint, permitting FOG[0], COLOR[0] and 
GENERIC[0], each of which would potentially consume an interpolator on some 
hardware, subverting the idea of a maximum semantic index.  

There's currently no concept of driver-specified maximum semantic indexes, and 
I don't think it's the right direction to be taking things.  If we are going to 
have a limit, it would be on the total number of inputs/outputs from a given 
stage, not the way that individual ones are labelled.

I think if you want to improve linkage semantics, some of your other 
suggestions are more promising.  I'd like to dig into those a little more if 
that's ok.

Keith

From: luca.barbi...@gmail.com [luca.barbi...@gmail.com] On Behalf Of Luca 
Barbieri [l...@luca-barbieri.com]
Sent: Monday, February 01, 2010 6:11 AM
To: Keith Whitwell
Cc: Brian Paul; mesa3d-dev@lists.sourceforge.net
Subject: Re: [PATCH] glsl: put varyings in texcoord slots

> In GL, there doesn't seem to be a requirement for sequential usage - an app 
> using ARB_vp/fp could explicitly pass TEXCOORD[10] and ignore 0..9 if it 
> wanted to.   In ARB_vp, that effectively means the shader would be using 
> discontiguous register numbers, ie OUTPUT[0], OUTPUT[10], etc.
Yes, but TEXCOORD[10] will only work if the hardware supports 11 textures.

>
> In DX9, there is also no requirement for sequential usage - an app can use 
> TEXCOORD[1] without [0], or COLOR[0] and [2] without [1] or [3].   I would 
> expect any DX9 gallium state tracker would also end up using non-sequential 
> indices if it directly translated DX9 semantics to gallium.

I think there is some terminology confusion.
I'm not proposing to require indices to be "sequential" in the way you
are describing.

By "sequential" I mean that if the hardware supports 8 interpolators,
then the available semantic indices should run from 0 to 7.
Thus, the *available* semantic indices are sequential, but the user is
free to use any subset of them.

So you can start with OUTPUT[10], but only if at least 11
interpolators are supported (so that the used set {10} is a subset of
the available set [0-10]).



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-02-01 Thread Luca Barbieri
An overview of the possible options.
Let's call vertex shader outputs "v" and fragment shader inputs "f"
Let v -> f mean that v connects to f.
NUM_INTERPOLATORS is the number of available interpolators. It is
usually between 8 and 32.

1. Current Gallium
v -> f if and only if v == f
Any values of v and f are legal

2. My proposal, basic version
v -> f if and only if v == f
v >= NUM_INTERPOLATORS and f >= NUM_INTERPOLATORS are illegal

3. My proposal plus routing tables (Corbin Simpson's idea)
v -> f if and only if (v, f) is an entry in the routing table
v >= NUM_INTERPOLATORS and f >= NUM_INTERPOLATORS are illegal
If no routing table is set, (2) is used instead
The routing table is set via a new ->set_routing_table() entry point,
taking the processor linkage affected, and an array of pairs of
registers to link.
This may be made a CSO (it is constant for each pair of state tracker shaders).
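The three options can be restated as predicates, as in the sketch below. `NUM_INTERPOLATORS` is fixed at 8 purely as an assumption, and the `route` struct and function names are hypothetical; the proposed `->set_routing_table()` entry point is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_INTERPOLATORS 8  /* assumed; the text says 8 to 32 is usual */

struct route { unsigned v, f; };  /* VS output index, FS input index */

/* Option 1: v -> f iff v == f; any index is legal. */
static bool linked_current(unsigned v, unsigned f)
{
    return v == f;
}

/* Option 2: as option 1, but indices must be below NUM_INTERPOLATORS. */
static bool linked_bounded(unsigned v, unsigned f)
{
    return v < NUM_INTERPOLATORS && f < NUM_INTERPOLATORS && v == f;
}

/*
 * Option 3: v -> f iff (v, f) is an entry in the routing table.
 * A NULL table means none was set, so option 2 applies instead.
 */
static bool linked_routed(const struct route *table, size_t n,
                          unsigned v, unsigned f)
{
    if (v >= NUM_INTERPOLATORS || f >= NUM_INTERPOLATORS)
        return false;
    if (!table)
        return linked_bounded(v, f);
    for (size_t i = 0; i < n; i++)
        if (table[i].v == v && table[i].f == f)
            return true;
    return false;
}
```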

The problem with (1) is that if the hardware has no routing support,
there is no way to implement it except by recompiling either the
vertex or fragment shader when the other changes.

Furthermore, no known user of Gallium actually needs (1) except the
current GLSL linker, but only due to the particular implementation and
not due to the OpenGL/GLSL API itself.

Only (2) is actually needed by the APIs, and it is also supported
trivially by all hardware.


Additionally there is no way to implement "bind by name" (i.e. having
GLSL varyings with the same name be linked) with either (1) or (2)
without recompiling one shader in response to changing the other.

With (3) instead, the state tracker can look at the symbol tables of
both programs, build a routing table, and set it independently of the
shaders. Functionality provided by (1) can also be emulated in this
way.
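A sketch of how a state tracker could build such a table from the two symbol tables, matching varyings by name. The `varying` and `link_pair` structs and the function name are hypothetical stand-ins for whatever the state tracker actually stores.

```c
#include <stddef.h>
#include <string.h>

/* A named varying and the register index its shader uses for it. */
struct varying {
    const char *name;
    unsigned index;
};

/* One routing table entry: VS output register -> FS input register. */
struct link_pair {
    unsigned vs_output;
    unsigned fs_input;
};

/*
 * For every FS input, find the VS output with the same name and emit
 * a routing entry. FS inputs with no matching VS output get no entry.
 * out must have room for n_fs entries. Returns the number of entries.
 */
static size_t build_routes_by_name(const struct varying *vs_out, size_t n_vs,
                                   const struct varying *fs_in, size_t n_fs,
                                   struct link_pair *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n_fs; i++) {
        for (size_t j = 0; j < n_vs; j++) {
            if (strcmp(fs_in[i].name, vs_out[j].name) == 0) {
                out[count].vs_output = vs_out[j].index;
                out[count].fs_input = fs_in[i].index;
                count++;
                break;
            }
        }
    }
    return count;
}
```

Neither shader is recompiled here: only the table changes when a different pair is bound.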


Thus, I propose replacing (1) with (2) and later expand to (3) if we
desire to do so.

I think that in particular, the "bind by name" argument is decisive,
since binding by name is what GLSL and high level languages really
want, and the added flexibility of (1) is useless for that, while
coming at significant driver complexity and performance cost.

What do you think?



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-02-01 Thread Luca Barbieri
> In GL, there doesn't seem to be a requirement for sequential usage - an app 
> using ARB_vp/fp could explicitly pass TEXCOORD[10] and ignore 0..9 if it 
> wanted to.   In ARB_vp, that effectively means the shader would be using 
> discontiguous register numbers, ie OUTPUT[0], OUTPUT[10], etc.
Yes, but TEXCOORD[10] will only work if the hardware supports 11 textures.

>
> In DX9, there is also no requirement for sequential usage - an app can use 
> TEXCOORD[1] without [0], or COLOR[0] and [2] without [1] or [3].   I would 
> expect any DX9 gallium state tracker would also end up using non-sequential 
> indices if it directly translated DX9 semantics to gallium.

I think there is some terminology confusion.
I'm not proposing to require indices to be "sequential" in the way you
are describing.

By "sequential" I mean that if the hardware supports 8 interpolators,
then the available semantic indices should run from 0 to 7.
Thus, the *available* semantic indices are sequential, but the user is
free to use any subset of them.

So you can start with OUTPUT[10], but only if at least 11
interpolators are supported (so that the used set {10} is a subset of
the available set [0-10]).
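The subset rule can be written as a one-loop check. This is only a sketch; `indices_available` is a hypothetical helper, and how the interpolator count would be queried from a driver is left open.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when every semantic index a shader uses lies inside the
 * available range [0, num_interpolators). The used indices may be any
 * subset of that range and need not be contiguous.
 */
static bool indices_available(const unsigned *used, size_t n,
                              unsigned num_interpolators)
{
    for (size_t i = 0; i < n; i++)
        if (used[i] >= num_interpolators)
            return false;
    return true;
}
```

A shader using only index 10 passes with 11 interpolators and fails with 8, while one using indices 1 and 3 passes with 8.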

> With quote:
> --
> In Direct3D 10, adjacent stages effectively share a register array, where the 
> output shader (or pipeline stage) writes data to specific locations in the 
> register array and the input shader must read from the same locations. The 
> API uses shader signatures to bind shader outputs with inputs without the 
> overhead of semantic resolution.
> --
>
> So right now, we seem to actually be close to the DX9 and GL models.  If we 
> were going to change, I'd suggest moving closer to DX10 rather than coming 
> up with some new way of doing things.
According to the document you quoted, it seems the difference is that
in DX10 vertex and fragment shaders must match exactly, i.e. no vertex
shader output may be left unused and no fragment shader input may be
default zero/undefined.

We can't go with the DX10 model because it will break GL.
We should however decide whether fragment inputs not written by the
vertex shader are zero or undefined, and make sure all drivers
correctly implement this.

What I'm proposing, excluding the routing table stuff, is exactly the
idea of having output shader and input shader read/write from a
(conceptual) register array.

If you have 8 interpolators, you use vertex outputs 0-7 and fragment
inputs 0-7 with identity mapping.

If we want to add routing support, I suggested adding the ability to
explicitly set a mapping from shader outputs to shader inputs as a set
of pairs.
Unlike the current method, this allows layering GLSL-like match-by-name
semantics over it, which is impossible with the current method, since
you can't map strings to integers, and you can't specify routing
separately from shaders.

> Ultimately, it shouldn't be all that important as everybody should be running 
> these shaders through a proper optimizer for their hardware, which should 
> have no trouble translating this stuff into any format it likes.

This is only true if the optimizer works on both the vertex and
fragment shaders together, which is something that would be nice to
avoid (at least, avoid _having_ to do it, it might possibly be
beneficial sometimes).
If they are optimized/compiled separately, the routing and semantic
index choice is fundamental.



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-02-01 Thread Keith Whitwell
Luca,

In GL, there doesn't seem to be a requirement for sequential usage - an app 
using ARB_vp/fp could explicitly pass TEXCOORD[10] and ignore 0..9 if it wanted 
to.   In ARB_vp, that effectively means the shader would be using discontiguous 
register numbers, ie OUTPUT[0], OUTPUT[10], etc.

In DX9, there is also no requirement for sequential usage - an app can use 
TEXCOORD[1] without [0], or COLOR[0] and [2] without [1] or [3].   I would 
expect any DX9 gallium state tracker would also end up using non-sequential 
indices if it directly translated DX9 semantics to gallium.

In DX10, it seems to be the same, but there are additional changes to simplify 
things.

Here's a link to some msdn about DX9 and DX10 semantics:
   
http://msdn.microsoft.com/en-us/library/ee415668%28VS.85%29.aspx#Porting_Shaders

With quote:
--
A semantic is a string attached to a shader input or output that conveys 
information about the intended use of a parameter. Semantics are required on 
all variables passed between shader stages. The syntax for adding a semantic to 
a shader variable is shown here (Variable Syntax (DirectX HLSL)).

In general, data passed between pipeline stages is completely generic and is 
not uniquely interpreted by the system; arbitrary semantics are allowed which 
have no special meaning. Parameters (in Direct3D 10) which contain these 
special semantics are referred to as System-Value Semantics.
--

This is fairly close to gallium.  It's hard to tell what goes on at the 
assembly level in DX10, as HLSL is the public interface.  But it seems that 
DX10 goes further than gallium, and also matches based on register number, ie 
PS.input[0] always corresponds to VS.output[0], though I'm not sure how that 
works in the face of things like the "face" register which appears for the 
first time in the PS.  

Here's a link:
   http://msdn.microsoft.com/en-us/library/ee418358%28VS.85%29.aspx

With quote:
--
In Direct3D 10, adjacent stages effectively share a register array, where the 
output shader (or pipeline stage) writes data to specific locations in the 
register array and the input shader must read from the same locations. The API 
uses shader signatures to bind shader outputs with inputs without the overhead 
of semantic resolution.
--

So right now, we seem to actually be close to the DX9 and GL models.  If we 
were going to change, I'd suggest moving closer to DX10 rather than coming up 
with some new way of doing things.

Ultimately, it shouldn't be all that important as everybody should be running 
these shaders through a proper optimizer for their hardware, which should have 
no trouble translating this stuff into any format it likes.

Keith



From: luca.barbi...@gmail.com [luca.barbi...@gmail.com] On Behalf Of Luca 
Barbieri [l...@luca-barbieri.com]
Sent: Thursday, January 28, 2010 11:48 PM
To: Brian Paul; Keith Whitwell
Cc: Luca Barbieri; mesa3d-dev@lists.sourceforge.net
Subject: Re: [PATCH] glsl: put varyings in texcoord slots

I'd like to have some more definitive review comments on this patch
(sending to Brian and Keith for this).

Right now GLSL is the *only* Gallium user that does not use sequential
indexes starting from 0 for vertex shader outputs and fragment shader
inputs.
This causes problems for some drivers such as nv30/nv40 that don't
remap the indexes right now.

This can be addressed in three ways:
1. Don't require Gallium users to use sequential indices, and require
vertex shader outputs and fragment shader inputs to match perfectly
2. Don't require Gallium users to use sequential indices, and change
nv30/nv40 and possibly other drivers to remap indices
3. Fix the only problematic user, GLSL, to use sequential indices

(1) will break the Mesa state tracker in a very hard to fix way.
(2) is complex and means that nv30/nv40 and maybe other drivers can no
longer compile vertex and fragment shaders independently.
(3) is a simple fix, provided by this patch.

I feel that (3), implemented by this patch, is the best solution,
since driver simplicity is one of the Gallium design goals, and I
don't see any significant advantages in supporting discontiguous
vertex shader output / fragment shader input values.

OpenGL and DirectX 9/10 don't seem to allow arbitrary numbers for
vertex shader outputs and fragment shader inputs, and instead require
0-7, 0-15 or 0-31 depending on feature level.

If this is wrong, please correct me.

I propose that Gallium should also require 0-x indices and not arbitrary values.
Thus, GLSL should be fixed to respect that.

Note that this change cannot be done in the state tracker because it
requires seeing both the fragment and vertex shaders at once, which
only happens in the GLSL linker.
Thus, while the change has been discussed with Gallium in mind, it is
done at the Mesa program level, and it actually results in Mesa
programs with contiguous indices.
This also

Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Luca Barbieri
On Fri, Jan 29, 2010 at 11:09 PM, Corbin Simpson wrote:
> I would say that the routing table really needs to be handled by the
> driver implicitly. When you're told to draw things, you do your shader
> routing/linking before you draw.
>
> If the routing table really does depend on the shaders and only the
> shaders, then I wouldn't be opposed to some CSO, but I don't think
> it's horribly necessary.

The problem with implicit handling is that you can't match by name.
You can have the driver automatically bind vs output 34134 to fs input
34134, but that isn't really helpful if you want to bind vs output
variable "foo" to fs input variable "foo".
And this is what GLSL and HLSL really want.

Of course one could replace the semantic index integer with a string,
but at this point hardware clearly won't accelerate that, so it seems
better to have the state tracker send an explicit list of (vs output,
fs input) pairs.

It could be a CSO that the state tracker keeps indexed by (vs, fs) pair.
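A minimal sketch of a state-tracker-side cache keyed by the (vs, fs) pair, assuming shaders and the routing CSO are represented by opaque pointers. The fixed-size linear table, `CACHE_SIZE`, and all names are hypothetical choices for illustration.

```c
#include <stddef.h>

#define CACHE_SIZE 8  /* arbitrary small capacity for the sketch */

/* One cached routing object for a given pair of shader handles. */
struct cache_entry {
    const void *vs;
    const void *fs;
    void *cso;
};

struct routing_cache {
    struct cache_entry entries[CACHE_SIZE];
    size_t count;
};

/* Return the cached routing object for (vs, fs), or NULL if absent. */
static void *routing_cache_lookup(const struct routing_cache *c,
                                  const void *vs, const void *fs)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->entries[i].vs == vs && c->entries[i].fs == fs)
            return c->entries[i].cso;
    return NULL;
}

/* Remember a routing object for (vs, fs). Returns -1 when full. */
static int routing_cache_insert(struct routing_cache *c,
                                const void *vs, const void *fs, void *cso)
{
    if (c->count == CACHE_SIZE)
        return -1;
    c->entries[c->count++] = (struct cache_entry){ vs, fs, cso };
    return 0;
}
```

With geometry shaders in the pipeline, the same structure would be keyed by (vs, gs) and (gs, fs) instead.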

However note that, unless I'm mistaken, no state tracker _needs_ to
expose this functionality right now.
We could however expose it as an additional GLSL extension based on
GL_EXT_separate_shader_objects, which would allow linking by
user-defined varying values too.
We could also use it to make GLSL linking a no-op on hardware
supporting routing tables.

In other words, it seems to me that this approach could both fix
nv30/nv40, simplify drivers and provide additional, useful,
functionality.

[As already mentioned, we have geometry shaders too so probably it's
actually going to be (vs, gs) and (gs, fs)]



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Corbin Simpson
I would say that the routing table really needs to be handled by the
driver implicitly. When you're told to draw things, you do your shader
routing/linking before you draw.

If the routing table really does depend on the shaders and only the
shaders, then I wouldn't be opposed to some CSO, but I don't think
it's horribly necessary.

And no, pre-optimizing TGSI is stupid. Every single backend does
something different with its shaders. You can't deliver something
besides TGSI to drivers, either; it will suck badly for out-of-tree
drivers and compat. Just write a real compiler for your HW.

~ C.

On Fri, Jan 29, 2010 at 12:47 PM, Luca Barbieri wrote:
> On Fri, Jan 29, 2010 at 8:49 PM, Keith Whitwell wrote:
>> So the nv40 code is doing the wrong thing...  :)
>>
>> The rule currently is that the generic tags are just tags and are used only 
>> to establish mapping between fragment shader and vertex shader.  Additionally 
>> the vertex shader should always supply at least the inputs required by the 
>> fragment shader.  Taking a guess, without specific hardware support for 
>> routing, nv40 should translate the fragment shader inputs into successive 
>> SRC_TC slots, and build a mapping from semantic-index -> SRC_TC slot that 
>> can be applied when translating the vertex shader.
>>
>> You might complain that in this world a vertex shader will require multiple 
>> translations depending on which fragment shader is bound, as one taking 
>> fewer inputs will have different numberings.  That's true, but you also get 
>> the opportunity to specialize the vertex shader as a result, eliminating 
>> dead code in each case - which may be worth the effort.
>>
>> Anyway, it sounds like what you really want is to change the linkage rules.  
>> This is possible, of course, but we should discuss it in those terms.  And 
>> then if we do revisit those rules, we should take a deeper look at linkage 
>> and figure out if anything else can be changed.
>
> Yes, this is what I'm proposing.
>
> Actually, those rules do not seem to be documented, so I was not sure
> whether they were a requirement or were just a "bonus" feature of some
> drivers.
>
> Right now it seems to me that no users of Gallium require this functionality.
> In particular:
> 1. GLSL and anything with similar link-by-name can of course be
> modified to use sequential indices
> 2. ARB fragment program and vertex program use index-limited texcoord slots
> 3. g3dvl needs and uses 8 texcoord slots, indices 0-7
> 4. vega and xorg use indices 0-1
> 5. DX10 seems to restrict semantics to 0-N range, if I'm not mistaken
> 6. The GL_EXT_separate_shader_objects extension does not provide
> arbitrary index matching for GLSL, but merely lets it use a model
> similar to ARB fp/vp
>
> Thus, unless I am mistaken, no current API needs this feature.
>
> In addition, note that current rules do *not* allow binding inputs and
> outputs by name, since the semantic index is an integer and not a
> string.
> However, the hardware implementation could probably allow that, since
> it most likely works by specifying the fragment shader input a given
> vertex shader output maps to.
>
> Thus, I propose to:
> 1. Change the rules to require semantic indices to be in a 0-N range.
> Document this appropriately.
> 2. Fix GLSL to respect that using the patch I posted. Everything else
> should be already fine.
> 3. If necessary, add a ->set_routing_table function that takes pairs
> of vertex shader outputs and fragment shader inputs to associate
>
> This allows the state tracker to support seamless "link by variable
> name string" by packing the indices in sequential numbers, and then
> using a dictionary kept in the state tracker to build an appropriate
> routing table.
>
> There are also geometry programs, so both vs->gs and gs->fs routing
> tables are probably needed.
>
> What do you think?
>
>
> As an aside, regarding shader optimization, I would suggest a
> driver-independent LLVM-based middle layer that takes TGSI as input
> and can provide either optimized LLVM bytecode or optimized TGSI to
> the driver (letting the driver choose with appropriate caps).
>
> --
> The Planet: dedicated and managed hosting, cloud storage, colocation
> Stay online with enterprise data centers and the best network in the business
> Choose flexible plans and management services without long-term contracts
> Personal 24x7 support from experience hosting pros just a phone call away.
> http://p.sf.net/sfu/theplanet-com
> ___
> Mesa3d-dev mailing list
> Mesa3d-dev@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
>



-- 
Only fools are easily impressed by what is only
barely beyond their reach. ~ Unknown

Corbin Simpson



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Luca Barbieri
On Fri, Jan 29, 2010 at 8:49 PM, Keith Whitwell  wrote:
> So the nv40 code is doing the wrong thing...  :)
>
> The rule currently is that the generic tags are just tags and are used only 
> to establish mapping between fragment shader and vertex shader.  Additionally 
> the vertex shader should always supply at least the inputs required by the 
> fragment shader.  Taking a guess, without specific hardware support for 
> routing, nv40 should translate the fragment shader inputs into successive 
> SRC_TC slots, and build a mapping from semantic-index -> SRC_TC slot that can 
> be applied when translating the vertex shader.
>
> You might complain that in this world a vertex shader will require multiple 
> translations depending on which fragment shader is bound, as one taking fewer 
> inputs will have different numberings.  That's true, but you also get the 
> opportunity to specialize the vertex shader as a result, eliminating dead 
> code in each case - which may be worth the effort.
>
> Anyway, it sounds like what you really want is to change the linkage rules.  
> This is possible, of course, but we should discuss it in those terms.  And 
> then if we do revisit those rules, we should take a deeper look at linkage 
> and figure out if anything else can be changed.

Yes, this is what I'm proposing.

Actually, those rules do not seem to be documented, so I was not sure
whether they were a requirement or were just a "bonus" feature of some
drivers.

Right now it seems to me that no users of Gallium require this functionality.
In particular:
1. GLSL and anything with similar link-by-name can of course be
modified to use sequential indices
2. ARB fragment program and vertex program use index-limited texcoord slots
3. g3dvl needs and uses 8 texcoord slots, indices 0-7
4. vega and xorg use indices 0-1
5. DX10 seems to restrict semantics to 0-N range, if I'm not mistaken
6. The GL_EXT_separate_shader_objects extension does not provide
arbitrary index matching for GLSL, but merely lets it use a model
similar to ARB fp/vp

Thus, unless I am mistaken, no current API needs this feature.

In addition, note that the current rules do *not* allow binding inputs and
outputs by name, since the semantic index is an integer and not a
string.
However, the hardware implementation could probably allow that, since
it most likely works by specifying the fragment shader input a given
vertex shader output maps to.

Thus, I propose to:
1. Change the rules to require semantic indices to be in a 0-N range.
Document this appropriately.
2. Fix GLSL to respect that using the patch I posted. Everything else
should be already fine.
3. If necessary, add a ->set_routing_table function that takes pairs
of vertex shader outputs and fragment shader inputs to associate

This allows the state tracker to support seamless "link by variable
name string" by packing the indices in sequential numbers, and then
using a dictionary kept in the state tracker to build an appropriate
routing table.
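
A minimal sketch of how such a dictionary-driven routing table could be
built, assuming each shader keeps an array of varying names in which the
array position is the packed sequential index. The names routing_pair and
build_routing_table are hypothetical and are not part of Gallium; the
proposed ->set_routing_table hook does not exist yet, so it is not called
here.

```c
#include <string.h>

/* Hypothetical pair type: one vertex shader output feeding one
 * fragment shader input, both given as packed sequential indices. */
struct routing_pair {
    unsigned vs_output;
    unsigned fs_input;
};

/* vs_names[i] is the variable name of vertex shader output i, and
 * fs_names[j] the name of fragment shader input j.  Writes one pair per
 * fragment shader input that has a same-named vertex shader output and
 * returns the number of pairs.  'pairs' must have room for num_fs
 * entries. */
unsigned build_routing_table(const char *const *vs_names, unsigned num_vs,
                             const char *const *fs_names, unsigned num_fs,
                             struct routing_pair *pairs)
{
    unsigned n = 0;
    unsigned i, j;

    for (j = 0; j < num_fs; j++) {
        for (i = 0; i < num_vs; i++) {
            if (strcmp(vs_names[i], fs_names[j]) == 0) {
                pairs[n].vs_output = i;
                pairs[n].fs_input = j;
                n++;
                break; /* first match wins */
            }
        }
    }
    return n;
}
```

With a vs->gs and a gs->fs stage, the same routine could presumably be run
once per stage boundary.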

There are also geometry programs, so both vs->gs and gs->fs routing
tables are probably needed.

What do you think?


As an aside, regarding shader optimization, I would suggest a
driver-independent LLVM-based middle layer that takes TGSI as input
and can provide either optimized LLVM bytecode or optimized TGSI to
the driver (letting the driver choose with appropriate caps).



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Luca Barbieri
I just read the extension, and it seems to me that it clearly
indicates that routing is *not* used by OpenGL.

In particular, varyings with the same name are not linked together,
and instead the builtin varyings must be used.
As far as I know, the builtin varyings are gl_TexCoord[i] where i <
MaxTextureCoords.

Am I missing anything?
Does GLSL, or any extension of it, specify any builtin varyings with
arbitrary indices?

Note that the extension spec even explicitly says the following:
<<
It is undesirable from a performance standpoint to attempt to
support "rendezvous by name" for arbitrary separate shaders
because the separate shaders won't be naturally compiled to
match their varying inputs and outputs of the same name without
a special link step.  Such a special link would introduce an
extra validation overhead to binding separate shaders.  The link
itself would have to be deferred until glBegin time since separate
shaders won't match when transitioning from one set of consistent
shaders to another.  This special link would still create errors
or undefined behavior when the names of input and output varyings
matched but their types did not match.
>>

It explicitly rules out doing a special link step/routing.



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Brian Paul
On Fri, Jan 29, 2010 at 12:49 PM, Keith Whitwell  wrote:
> So the nv40 code is doing the wrong thing...  :)
>
> The rule currently is that the generic tags are just tags and are used only 
> to establish mapping between fragment shader and vertex shader.  Additionally 
> the vertex shader should always supply at least the inputs required by the 
> fragment shader.  Taking a guess, without specific hardware support for 
> routing, nv40 should translate the fragment shader inputs into successive 
> SRC_TC slots, and build a mapping from semantic-index -> SRC_TC slot that can 
> be applied when translating the vertex shader.
>
> You might complain that in this world a vertex shader will require multiple 
> translations depending on which fragment shader is bound, as one taking fewer 
> inputs will have different numberings.  That's true, but you also get the 
> opportunity to specialize the vertex shader as a result, eliminating dead 
> code in each case - which may be worth the effort.
>
> Anyway, it sounds like what you really want is to change the linkage rules.  
> This is possible, of course, but we should discuss it in those terms.  And 
> then if we do revisit those rules, we should take a deeper look at linkage 
> and figure out if anything else can be changed.


Just a heads-up:

The new GL_EXT_separate_shader_objects extension
(http://www.opengl.org/registry/specs/EXT/separate_shader_objects.txt)
allows binding/using vertex shaders and fragment shaders without an
explicit linking step.

Using the semantic info to match inputs to outputs will be key for
that, I believe.

-Brian



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Keith Whitwell
So the nv40 code is doing the wrong thing...  :)

The rule currently is that the generic tags are just tags and are used only to 
establish mapping between fragment shader and vertex shader.  Additionally the 
vertex shader should always supply at least the inputs required by the fragment 
shader.  Taking a guess, without specific hardware support for routing, nv40 
should translate the fragment shader inputs into successive SRC_TC slots, and 
build a mapping from semantic-index -> SRC_TC slot that can be applied when 
translating the vertex shader.
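
A rough sketch of the mapping described above, assuming 8 texcoord slots
(matching the "<= 7" check in the nv40 code) and an arbitrary upper bound
on semantic indices. The names tc_slot_map, tc_slot_map_build and
tc_slot_map_lookup are hypothetical and do not exist in the driver.

```c
#define MAX_GENERIC_INDEX 256  /* assumed bound on semantic indices */
#define NUM_TC_SLOTS      8    /* assumed number of SRC_TC slots */
#define SLOT_UNUSED       (-1)

/* Hypothetical table: GENERIC semantic index -> SRC_TC slot. */
struct tc_slot_map {
    int slot[MAX_GENERIC_INDEX];
    int num_slots;
};

/* Give each fragment shader GENERIC input the next free slot, in
 * declaration order.  Returns 0 on success, or -1 if an index is out of
 * range or the shader needs more slots than are available. */
int tc_slot_map_build(struct tc_slot_map *map,
                      const unsigned *fs_generic_inputs, unsigned count)
{
    unsigned i;

    for (i = 0; i < MAX_GENERIC_INDEX; i++)
        map->slot[i] = SLOT_UNUSED;
    map->num_slots = 0;

    for (i = 0; i < count; i++) {
        unsigned sem = fs_generic_inputs[i];

        if (sem >= MAX_GENERIC_INDEX)
            return -1;
        if (map->slot[sem] != SLOT_UNUSED)
            continue; /* already assigned */
        if (map->num_slots >= NUM_TC_SLOTS)
            return -1;
        map->slot[sem] = map->num_slots++;
    }
    return 0;
}

/* Used while translating the vertex shader: returns the slot for an
 * output, or SLOT_UNUSED if the fragment shader never reads it (in
 * which case the write could be dropped as dead code). */
int tc_slot_map_lookup(const struct tc_slot_map *map, unsigned sem)
{
    return sem < MAX_GENERIC_INDEX ? map->slot[sem] : SLOT_UNUSED;
}
```

Because the table is derived from the bound fragment shader, the vertex
shader translation depends on it, which is the multiple-translation cost
discussed in the next paragraph.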

You might complain that in this world a vertex shader will require multiple 
translations depending on which fragment shader is bound, as one taking fewer 
inputs will have different numberings.  That's true, but you also get the 
opportunity to specialize the vertex shader as a result, eliminating dead code 
in each case - which may be worth the effort.

Anyway, it sounds like what you really want is to change the linkage rules.  
This is possible, of course, but we should discuss it in those terms.  And then 
if we do revisit those rules, we should take a deeper look at linkage and 
figure out if anything else can be changed.

Keith





From: luca.barbi...@gmail.com [luca.barbi...@gmail.com] On Behalf Of Luca 
Barbieri [l...@luca-barbieri.com]
Sent: Friday, January 29, 2010 10:15 AM
To: Brian Paul
Cc: Keith Whitwell; mesa3d-dev@lists.sourceforge.net
Subject: Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

As a concrete example, the current nv40 code does this during fragment
program translation.
        case TGSI_SEMANTIC_GENERIC:
                if (fdec->Semantic.Index <= 7) {
                        hw = NV40_FP_OP_INPUT_SRC_TC(fdec->Semantic.Index);
                } else {
                        NOUVEAU_ERR("bad generic semantic index\n");
                        return FALSE;
                }
                break;

As a result, GLSL completely fails to work on nv30 and nv40.

nv50 instead seems to have hardware routing support, which is used by
the driver.



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Alex Deucher
On Fri, Jan 29, 2010 at 12:55 PM, Luca Barbieri  wrote:
>> FWIW, I think DX10 required or at least encouraged semantic mapping
>> support in hardware.  R6xx+ radeons support this and r3xx-r5xx
>> hardware do to a lesser degree.  You can use arbitrary, driver
>> specific ids and the hardware will match up inputs and outputs based
>> on those ids.
>
> Can you provide a reference to the DX10 API for doing that?
> I'm not very familiar with DX10, and a quick search on MSDN didn't
> turn up anything relevant.
>

I'm not sure off hand.  That's the answer I got internally as to why
we have semantic mapping support in hw.

Alex

> Does the VMWare DirectX state tracker use non-sequential semantic
> indices in the implementation of such a feature?
>



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Corbin Simpson

Look harder. There *should* be routing tables of some sort on there.
If there really aren't, then you might have to actually start in on
your shader compilers instead of just translating opcodes
one-at-a-time, but this isn't a bad thing as it's what everybody needs
to do for performance.


Also, as stated before, it's a bad idea to fix GLSL and pretend that
everything's kosher, because it'll break again next time somebody
provides a shader generator of non-trivial complexity. HLSL is the
obvious candidate, but there might be GPGPU languages that do the same
thing. While I have no serious objection to changing Gallium's TGSI
semantics and permanently forbidding this, I kind of wish the drivers
could just be more robust.

~ C.

On Fri, Jan 29, 2010 at 10:15 AM, Luca Barbieri  wrote:
> As a concrete example, the current nv40 code does this during fragment
> program translation.
>        case TGSI_SEMANTIC_GENERIC:
>                if (fdec->Semantic.Index <= 7) {
>                        hw = NV40_FP_OP_INPUT_SRC_TC(fdec->Semantic.
>                                                     Index);
>                } else {
>                        NOUVEAU_ERR("bad generic semantic index\n");
>                        return FALSE;
>                }
>                break;
>
> As a result, GLSL completely fails to work on nv30 and nv40.
>
> nv50 instead seems to have hardware routing support, which is used by
> the driver.
>
>



-- 
Only fools are easily impressed by what is only
barely beyond their reach. ~ Unknown

Corbin Simpson




Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Luca Barbieri
As a concrete example, the current nv40 code does this during fragment
program translation.
        case TGSI_SEMANTIC_GENERIC:
                if (fdec->Semantic.Index <= 7) {
                        hw = NV40_FP_OP_INPUT_SRC_TC(fdec->Semantic.Index);
                } else {
                        NOUVEAU_ERR("bad generic semantic index\n");
                        return FALSE;
                }
                break;

As a result, GLSL completely fails to work on nv30 and nv40.

nv50 instead seems to have hardware routing support, which is used by
the driver.



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Luca Barbieri
> Luca,
>
> Let me make sure I understand the problem here.
>
> Are you specifically concerned about the GENERIC[x] semantic
> labels/indexes that are attached to VS outputs and FS inputs?
Yes.

> This is as intended.  The semantic indexes are used to match up
> inputs/outputs logically but they should not affect which hardware
> interpolation slots are used.

If remapping hardware is not available (or it is inefficient), then
they must also select the interpolation slots, unless the driver is
changed to recompile both shaders every time one of them changes.

I'd really like to avoid having to do that in drivers if possible.
If I'm not mistaken, the current GLSL linker is the only thing that
currently relies on the capability to use semantic indices N and
larger, where N is the number of interpolators, so fixing that seems
to be enough.
For instance, users of ARB fragment/vertex program will use matched
texcoord slots, up to the implementation limit.

The situation might change if DirectX 10 provides a remapping API, or
if there is an OpenGL extension doing that.



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Luca Barbieri
> FWIW, I think DX10 required or at least encouraged semantic mapping
> support in hardware.  R6xx+ radeons support this and r3xx-r5xx
> hardware do to a lesser degree.  You can use arbitrary, driver
> specific ids and the hardware will match up inputs and outputs based
> on those ids.

Can you provide a reference to the DX10 API for doing that?
I'm not very familiar with DX10, and a quick search on MSDN didn't
turn up anything relevant.

Does the VMWare DirectX state tracker use non-sequential semantic
indices in the implementation of such a feature?



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Alex Deucher
On Fri, Jan 29, 2010 at 12:31 PM, Brian Paul  wrote:
> On Fri, Jan 29, 2010 at 9:49 AM, Brian Paul  wrote:
>> On Fri, Jan 29, 2010 at 12:48 AM, Luca Barbieri  
>> wrote:
>>> I'd like to have some more definitive review comments on this patch
>>> (sending to Brian and Keith for this).
>>>
>>> Right now GLSL is the *only* Gallium user that does not use sequential
>>> indexes starting from 0 for vertex shader outputs and fragment shader
>>> inputs.
>>> This causes problems for some drivers such as nv30/nv40 that don't
>>> remap the indexes right now.
>>>
>>> This can be addressed in three ways:
>>> 1. Don't require Gallium users to use sequential indices, and require
>>> vertex shader outputs and fragment shader inputs to match perfectly
>>> 2. Don't require Gallium users to use sequential indices, and change
>>> nv30/nv40 and possibly other drivers to remap indices
>>> 3. Fix the only problematic user, GLSL, to use sequential indices
>>>
>>> (1) will break the Mesa state tracker in a very hard to fix way.
>>> (2) is complex and means that nv30/nv40 and maybe other drivers can no
>>> longer compile vertex and fragment shaders independently.
>>> (3) is a simple fix, provided by this patch.
>>>
>>> I feel that (3), implemented by this patch, is the best solution,
>>> since driver simplicity is one of the Gallium design goals, and I
>>> don't see any significant advantages in supporting discontiguous
>>> vertex shader output / fragment shader input values.
>>>
>>> OpenGL and DirectX 9/10 don't seem to allow arbitrary numbers for
>>> vertex shader outputs and fragment shader inputs, and instead require
>>> 0-7, 0-15 or 0-31 depending on feature level.
>>>
>>> If this is wrong, please correct me.
>>>
>>> I propose that Gallium should also require 0-x indices and not arbitrary 
>>> values.
>>> Thus, GLSL should be fixed to respect that.
>>>
>>> Note that this change cannot be done in the state tracker because it
>>> requires to see both the fragment and vertex shaders at once, which
>>> only happens in the GLSL linker.
>>> Thus, while the change has been discussed with Gallium in mind, it is
>>> done at the Mesa program level, and it actually results in Mesa
>>> programs with contiguous indices.
>>> This also potentially benefits non-Gallium drivers.
>>>
>>> What do you think?
>>
>> Luca, I'm OK with this change in principle but I need a bit more time
>> to review the problem and your patch...
>
> Luca,
>
> Let me make sure I understand the problem here.
>
> Are you specifically concerned about the GENERIC[x] semantic
> labels/indexes that are attached to VS outputs and FS inputs?
>
> I hacked a Mesa GLSL demo to use texcoords and varying vars and saw
> something like this:
>
> VERT
> DCL IN[0]
> DCL OUT[0], POSITION
> DCL OUT[1], GENERIC[0]
> DCL OUT[2], GENERIC[10]
> ...
>
> FRAG
> DCL IN[0], GENERIC[0], PERSPECTIVE
> DCL IN[1], GENERIC[10], PERSPECTIVE
> DCL OUT[0], COLOR
> ...
>
>
> We use the semantic names/labels GENERIC[0] and GENERIC[10] but note
> that the actual inputs/outputs are in consecutive slots.
>
> This is as intended.  The semantic indexes are used to match up
> inputs/outputs logically but they should not affect which hardware
> interpolation slots are used.
>

FWIW, I think DX10 required or at least encouraged semantic mapping
support in hardware.  R6xx+ radeons support this and r3xx-r5xx
hardware does to a lesser degree.  You can use arbitrary, driver
specific ids and the hardware will match up inputs and outputs based
on those ids.

Alex

> Prior to Keith's commit 07fafc7c9346aa260829603bf3188596481e9e62 the
> generic semantic indexes were consecutive, BTW.
>
> -Brian
>
>



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Brian Paul
On Fri, Jan 29, 2010 at 9:49 AM, Brian Paul  wrote:
> On Fri, Jan 29, 2010 at 12:48 AM, Luca Barbieri  
> wrote:
>> I'd like to have some more definitive review comments on this patch
>> (sending to Brian and Keith for this).
>>
>> Right now GLSL is the *only* Gallium user that does not use sequential
>> indexes starting from 0 for vertex shader outputs and fragment shader
>> inputs.
>> This causes problems for some drivers such as nv30/nv40 that don't
>> remap the indexes right now.
>>
>> This can be addressed in three ways:
>> 1. Don't require Gallium users to use sequential indices, and require
>> vertex shader outputs and fragment shader inputs to match perfectly
>> 2. Don't require Gallium users to use sequential indices, and change
>> nv30/nv40 and possibly other drivers to remap indices
>> 3. Fix the only problematic user, GLSL, to use sequential indices
>>
>> (1) will break the Mesa state tracker in a very hard to fix way.
>> (2) is complex and means that nv30/nv40 and maybe other drivers can no
>> longer compile vertex and fragment shaders independently.
>> (3) is a simple fix, provided by this patch.
>>
>> I feel that (3), implemented by this patch, is the best solution,
>> since driver simplicity is one of the Gallium design goals, and I
>> don't see any significant advantages in supporting discontiguous
>> vertex shader output / fragment shader input values.
>>
>> OpenGL and DirectX 9/10 don't seem to allow arbitrary numbers for
>> vertex shader outputs and fragment shader inputs, and instead require
>> 0-7, 0-15 or 0-31 depending on feature level.
>>
>> If this is wrong, please correct me.
>>
>> I propose that Gallium should also require 0-x indices and not arbitrary 
>> values.
>> Thus, GLSL should be fixed to respect that.
>>
>> Note that this change cannot be done in the state tracker because it
>> requires to see both the fragment and vertex shaders at once, which
>> only happens in the GLSL linker.
>> Thus, while the change has been discussed with Gallium in mind, it is
>> done at the Mesa program level, and it actually results in Mesa
>> programs with contiguous indices.
>> This also potentially benefits non-Gallium drivers.
>>
>> What do you think?
>
> Luca, I'm OK with this change in principle but I need a bit more time
> to review the problem and your patch...

Luca,

Let me make sure I understand the problem here.

Are you specifically concerned about the GENERIC[x] semantic
labels/indexes that are attached to VS outputs and FS inputs?

I hacked a Mesa GLSL demo to use texcoords and varying vars and saw
something like this:

VERT
DCL IN[0]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[0]
DCL OUT[2], GENERIC[10]
...

FRAG
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL IN[1], GENERIC[10], PERSPECTIVE
DCL OUT[0], COLOR
...


We use the semantic names/labels GENERIC[0] and GENERIC[10] but note
that the actual inputs/outputs are in consecutive slots.

This is as intended.  The semantic indexes are used to match up
inputs/outputs logically but they should not affect which hardware
interpolation slots are used.
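
To illustrate the matching rule with the declarations above, here is a
small sketch that looks up which vertex shader OUT register feeds a given
fragment shader input, using only the GENERIC semantic index. The type
generic_decl and the function find_vs_output_reg are hypothetical and are
not taken from the TGSI code.

```c
/* Hypothetical, simplified declaration record: the register slot
 * (the N in OUT[N] or IN[N]) and the GENERIC semantic index. */
struct generic_decl {
    unsigned reg;
    unsigned sem_index;
};

/* Returns the vertex shader OUT register tagged GENERIC[sem_index],
 * or -1 if the vertex shader does not write that semantic. */
int find_vs_output_reg(const struct generic_decl *vs_outs, unsigned count,
                       unsigned sem_index)
{
    unsigned i;

    for (i = 0; i < count; i++) {
        if (vs_outs[i].sem_index == sem_index)
            return (int)vs_outs[i].reg;
    }
    return -1;
}
```

For the example above, GENERIC[10] resolves to OUT[2] even though the
semantic index itself is 10.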

Prior to Keith's commit 07fafc7c9346aa260829603bf3188596481e9e62 the
generic semantic indexes were consecutive, BTW.

-Brian



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-29 Thread Brian Paul
On Fri, Jan 29, 2010 at 12:48 AM, Luca Barbieri  wrote:
> I'd like to have some more definitive review comments on this patch
> (sending to Brian and Keith for this).
>
> Right now GLSL is the *only* Gallium user that does not use sequential
> indexes starting from 0 for vertex shader outputs and fragment shader
> inputs.
> This causes problems for some drivers such as nv30/nv40 that don't
> remap the indexes right now.
>
> This can be addressed in three ways:
> 1. Don't require Gallium users to use sequential indices, and require
> vertex shader outputs and fragment shader inputs to match perfectly
> 2. Don't require Gallium users to use sequential indices, and change
> nv30/nv40 and possibly other drivers to remap indices
> 3. Fix the only problematic user, GLSL, to use sequential indices
>
> (1) will break the Mesa state tracker in a very hard to fix way.
> (2) is complex and means that nv30/nv40 and maybe other drivers can no
> longer compile vertex and fragment shaders independently.
> (3) is a simple fix, provided by this patch.
>
> I feel that (3), implemented by this patch, is the best solution,
> since driver simplicity is one of the Gallium design goals, and I
> don't see any significant advantages in supporting discontiguous
> vertex shader output / fragment shader input values.
>
> OpenGL and DirectX 9/10 don't seem to allow arbitrary numbers for
> vertex shader outputs and fragment shader inputs, and instead require
> 0-7, 0-15 or 0-31 depending on feature level.
>
> If this is wrong, please correct me.
>
> I propose that Gallium should also require 0-x indices and not arbitrary 
> values.
> Thus, GLSL should be fixed to respect that.
>
> Note that this change cannot be done in the state tracker because it
> requires to see both the fragment and vertex shaders at once, which
> only happens in the GLSL linker.
> Thus, while the change has been discussed with Gallium in mind, it is
> done at the Mesa program level, and it actually results in Mesa
> programs with contiguous indices.
> This also potentially benefits non-Gallium drivers.
>
> What do you think?

Luca, I'm OK with this change in principle but I need a bit more time
to review the problem and your patch...

-Brian



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-28 Thread Luca Barbieri
I'd like to have some more definitive review comments on this patch
(sending to Brian and Keith for this).

Right now GLSL is the *only* Gallium user that does not use sequential
indexes starting from 0 for vertex shader outputs and fragment shader
inputs.
This causes problems for some drivers such as nv30/nv40 that don't
remap the indexes right now.

This can be addressed in three ways:
1. Don't require Gallium users to use sequential indices, and require
vertex shader outputs and fragment shader inputs to match perfectly
2. Don't require Gallium users to use sequential indices, and change
nv30/nv40 and possibly other drivers to remap indices
3. Fix the only problematic user, GLSL, to use sequential indices

(1) will break the Mesa state tracker in a very hard to fix way.
(2) is complex and means that nv30/nv40 and maybe other drivers can no
longer compile vertex and fragment shaders independently.
(3) is a simple fix, provided by this patch.

I feel that (3), implemented by this patch, is the best solution,
since driver simplicity is one of the Gallium design goals, and I
don't see any significant advantages in supporting discontiguous
vertex shader output / fragment shader input values.

OpenGL and DirectX 9/10 don't seem to allow arbitrary numbers for
vertex shader outputs and fragment shader inputs, and instead require
0-7, 0-15 or 0-31 depending on feature level.

If this is wrong, please correct me.

I propose that Gallium should also require 0-x indices and not arbitrary values.
Thus, GLSL should be fixed to respect that.
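
As a rough illustration of what a "0-x" requirement would mean, here is
a minimal sketch in C. The helper name and the cap of 32 indices (taken
from the largest 0-31 range mentioned above) are assumptions made for
this example; this is not existing Gallium or Mesa code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the n generic indices in idx[]
 * are exactly 0..n-1 in some order. */
bool indices_are_sequential(const unsigned idx[], size_t n)
{
   bool seen[32] = { false };

   assert(n <= 32);   /* assumed upper bound, see the 0-31 range above */
   for (size_t i = 0; i < n; i++) {
      /* n distinct values that are all below n must be 0..n-1 */
      if (idx[i] >= n || seen[idx[i]])
         return false;
      seen[idx[i]] = true;
   }
   return true;
}
```

With such a check, the index set {0, 1, 2} would pass, while {10, 11},
which is the kind of numbering GLSL produces today, would not.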

Note that this change cannot be done in the state tracker because it
requires seeing both the fragment and vertex shaders at once, which
only happens in the GLSL linker.
Thus, while the change has been discussed with Gallium in mind, it is
done at the Mesa program level, and it actually results in Mesa
programs with contiguous indices.
This also potentially benefits non-Gallium drivers.

What do you think?



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-18 Thread Olivier Galibert
On Mon, Jan 18, 2010 at 09:06:01PM +0100, Luca Barbieri wrote:
> As for REing the tables, it may not be possible.
> This is the code that apparently sets them up right now:
>   /* vtxprog output routing */
>   so_method(so, screen->curie, 0x1fc4, 1);
>   so_data  (so, 0x06144321);
>   so_method(so, screen->curie, 0x1fc8, 2);
>   so_data  (so, 0xedcba987);
>   so_data  (so, 0x0021);
>   so_method(so, screen->curie, 0x1fd0, 1);
>   so_data  (so, 0x00171615);
>   so_method(so, screen->curie, 0x1fd4, 1);
>   so_data  (so, 0x001b1a19);
> 
> This makes me think that only 4 bits might be used for the values
> (look at the arithmetic progressions of 4-bit values), so that there
> is a limit of 16 vertex output/fragment inputs.

Looking at the so_data values, 1fcx is 4 bits, 1fdx is 8 bits.  It's
also a little strange to have 24 slots for 16 possible values.
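
To make the field widths concrete, here is a small sketch that splits
the quoted words into fixed-width fields. It assumes the fields are
packed least significant first, which is a guess and not something
known about the hardware; the function names are made up for the
example.

```c
#include <assert.h>
#include <stdint.h>

/* Return field number idx (0 = least significant) of a 32-bit word that
 * is assumed to be packed as consecutive fields of `bits` bits each. */
unsigned routing_field(uint32_t word, unsigned bits, unsigned idx)
{
   assert(bits > 0 && bits < 32 && bits * (idx + 1) <= 32);
   return (word >> (bits * idx)) & ((1u << bits) - 1u);
}

/* Unpack every field of a word into out[]; returns the field count. */
unsigned routing_unpack(uint32_t word, unsigned bits, unsigned out[])
{
   unsigned n;

   assert(bits > 0 && bits < 32);
   n = 32 / bits;
   for (unsigned i = 0; i < n; i++)
      out[i] = routing_field(word, bits, i);
   return n;
}
```

Read this way, 0xedcba987 gives the 4-bit values 7 through 14, and
0x00171615 gives the 8-bit values 0x15, 0x16, 0x17. The three words
written at 0x1fc4 and 0x1fc8 hold 8 nibbles each, which is where the
count of 24 slots comes from.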


> If GLSL starts at index 10, we are still in trouble because less than
> 8 varyings will be available.

Shader model 4 requires at least 32 interpolated values.  Subsequent
versions will want more and more.  Any "start index" is going to blow
up someday unless it's a very large value (which 10 isn't).

  OG.


--
Throughout its 18-year history, RSA Conference consistently attracts the
world's best and brightest in the field, creating opportunities for Conference
attendees to learn about information security's most important issues through
interactions with peers, luminaries and emerging and established companies.
http://p.sf.net/sfu/rsaconf-dev2dev
___
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev


Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-18 Thread Luca Barbieri
> If you get this patch in, then you'll still have to fight with every
> other state tracker that doesn't prettify their TGSI. It would be a
> much better approach to attempt to RE the routing tables.

I don't think there are any users of the Gallium interface that need more
than 8 vertex outputs/fragment inputs and don't use sequential values
starting at 0, except the GLSL linker without this patch.

ARB_fragment_program and ARB_vertex_program are limited to texcoord
slots, and Mesa should advertise only 8 of them.
Also users of this interface will likely only use as many as they
need, sequentially.

Vega, xorg seem to only use up to 2 slots.
g3dvl up to 8 (starting from 0, of course).

Cards with fewer than 8 slots may sometimes still have problems, but
such cards will probably be DX8 cards that don't work anyway.

Furthermore, even if you can route things, using vertex outputs and
fragment inputs with lower indices may be more efficient anyway.

As for REing the tables, it may not be possible.
This is the code that apparently sets them up right now:
/* vtxprog output routing */
so_method(so, screen->curie, 0x1fc4, 1);
so_data  (so, 0x06144321);
so_method(so, screen->curie, 0x1fc8, 2);
so_data  (so, 0xedcba987);
so_data  (so, 0x0021);
so_method(so, screen->curie, 0x1fd0, 1);
so_data  (so, 0x00171615);
so_method(so, screen->curie, 0x1fd4, 1);
so_data  (so, 0x001b1a19);

This makes me think that only 4 bits might be used for the values
(look at the arithmetic progressions of 4-bit values), so that there
is a limit of 16 vertex output/fragment inputs.
If GLSL starts at index 10, we are still in trouble because fewer than
8 varyings will be available.
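
The arithmetic behind that worry, as a tiny sketch. The 16-entry limit
is only the guess made above, and usable_varyings is a name made up for
this example.

```c
#include <assert.h>

/* Number of varyings still addressable when the routing fields can name
 * `limit` distinct slots and the first generic index used is `first`. */
unsigned usable_varyings(unsigned limit, unsigned first)
{
   assert(limit > 0);   /* a routing field must name at least one slot */
   return first < limit ? limit - first : 0;
}
```

With a limit of 16 and a first index of 10 this leaves 6 varyings,
compared with 16 if numbering started at 0.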

Also leaving vertex outputs/fragment inputs unused by starting at high
values may be bad for performance even if supported, as it may lead to
a bigger register file and thus fewer simultaneous GPU threads running.

In other words, having GLSL start at index 10 is easily avoided, and
causes problems nothing else causes, so why not just stop doing that?



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-18 Thread Corbin Simpson
Actually, we don't even bother worrying about the rasterizer's routing
table until we've bound a pair of shaders and start drawing. Right
before the draw call, we re-generate, among other things, routing
tables for the vert shader and the rasterizer.

This is *incredibly* powerful, because it means we only have to
compile the shaders once, and load the rasterizer tables based on
those shaders. I even baked up a CSO to cache the tables, but it
turned out to be an overall slowdown.
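
Roughly, the draw-time step looks like the sketch below. This is a
simplified illustration with invented names and a linear search, not
the actual r300g code; it only shows that the routing can be derived
from two shaders that were compiled independently.

```c
#include <assert.h>
#include <stddef.h>

#define ROUTE_UNDEFINED (-1)

/* Hypothetical draw-time routing step.
 * vs_out[i] is the semantic index written to vertex shader output i;
 * fs_in[j] is the semantic index read from fragment shader input j.
 * On return, route[j] is the VS output feeding FS input j, or
 * ROUTE_UNDEFINED if no VS output carries that semantic index. */
void build_routing_table(const unsigned vs_out[], size_t num_vs_out,
                         const unsigned fs_in[], size_t num_fs_in,
                         int route[])
{
   assert(route != NULL || num_fs_in == 0);
   for (size_t j = 0; j < num_fs_in; j++) {
      route[j] = ROUTE_UNDEFINED;
      for (size_t i = 0; i < num_vs_out; i++) {
         if (vs_out[i] == fs_in[j]) {
            route[j] = (int)i;
            break;
         }
      }
   }
}
```

Neither shader needs to be recompiled when the other one changes; only
the table is rebuilt.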

If you get this patch in, then you'll still have to fight with every
other state tracker that doesn't prettify their TGSI. It would be a
much better approach to attempt to RE the routing tables.

Also FYI the r300-r500 rasterizer can only handle, off the top of my
head, 16 sets of vectors total (8 colors, 8 texcoords) so you're not
the only ones with this kind of limitation. The situation gets better
for r600 and nv50.

~ C.

On Mon, Jan 18, 2010 at 8:27 AM, Luca Barbieri  wrote:
> So, basically, you allocate the rasterizer units according to the
> vertex shader, and when the fragment shader comes up, you say "write
> rasterizer output 4 to fragment input 100"?
>
> The current nouveau drivers can't do this.
> There are "routing" registers in hardware, but I think the nVidia
> proprietary driver (at least without GLSL) leaves them unaltered after
> initialization and I don't think we really know how they would work.
> They are also very likely limited to at most 256 values (maybe even
> less, such as 16), even if they can actually be made to work.
>
> The way the current pre-nv50 driver works is that there are 8 slots,
> each of which has an interpolator and a fixed associated vertex shader
> output and fixed fragment input. This seems a rather obvious way to
> design hardware, and so shouldn't be uncommon.
>
> Thus, the inputs/outputs can't be packed, because that will break if
> the fragment shader doesn't use a vertex output.
> And there is no way to correct that when the fragment program comes
> up, other than recompiling the vertex shader, which would be very
> desirable to avoid having to do.
>
> Non-GLSL programs can only use the 8 texcoords, so there is no problem
> there since hardware supports 8 slots.
>
> Thus, I think my proposed solution is the simplest and most efficient 
> approach.
> Any other solution would require much more, and slower, code in the
> Gallium drivers for nv30, nv40, and maybe Intel too.
>



-- 
Only fools are easily impressed by what is only
barely beyond their reach. ~ Unknown

Corbin Simpson




Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-18 Thread Luca Barbieri
So, basically, you allocate the rasterizer units according to the
vertex shader, and when the fragment shader comes up, you say "write
rasterizer output 4 to fragment input 100"?

The current nouveau drivers can't do this.
There are "routing" registers in hardware, but I think the nVidia
proprietary driver (at least without GLSL) leaves them unaltered after
initialization and I don't think we really know how they would work.
They are also very likely limited to at most 256 values (maybe even
less, such as 16), even if they can actually be made to work.

The way the current pre-nv50 driver works is that there are 8 slots,
each of which has an interpolator and a fixed associated vertex shader
output and fixed fragment input. This seems a rather obvious way to
design hardware, and so shouldn't be uncommon.

Thus, the inputs/outputs can't be packed, because that will break if
the fragment shader doesn't use a vertex output.
And there is no way to correct that when the fragment program comes
up, other than recompiling the vertex shader, which would be very
desirable to avoid having to do.

Non-GLSL programs can only use the 8 texcoords, so there is no problem
there since hardware supports 8 slots.

Thus, I think my proposed solution is the simplest and most efficient approach.
Any other solution would require much more, and slower, code in the
Gallium drivers for nv30, nv40, and maybe Intel too.



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-18 Thread Alex Deucher
On Mon, Jan 18, 2010 at 10:52 AM, Marek Olšák  wrote:
> On Mon, Jan 18, 2010 at 3:22 PM, Luca Barbieri wrote:
>>
>> > I think this is not necessary and fixing the rasterizer setup in the
>> > driver would be better than fixing the state tracker.
>> >
>> > In r300g, we dynamically allocate rasterizer units based on vertex
>> > shader outputs. If the vertex shader uses slots 1, 5, 20, 100, the
>> > driver maps them to units 1,2,3,4.
>>
>> But what if the fragment shader has inputs 1, 2, 5, 20, 100?
>> If you remap the fragment shader to 1, 2, 3, 4, 5, then they will
>> mismatch.
>>
>> You would need to either:
>> 1. Generate shaders in the driver for the fragment/vertex combination
>> instead of each one separately
>> 2. Require that vertex shader outputs match fragment shader inputs exactly
>>
>> (1) makes the driver much more complex and slow. I think we should try
>> to make it possible to avoid this, unless the hardware absolutely
>> requires it.
>> (2) will probably break the existing fixed pipeline and ARB_fp/vp
>> support, and also make the driver more complex than necessary.
>>
>> Does r300g compile both fragment and vertex shader together?
>>
>> Also note that all Gallium-capable hardware should support 8 varying
>> slots, so anything that uses only "texture coordinates" should not
>> need any remapping.
>
> I was talking about the rasterizer (interpolator) units, which, on r300, are
> quite flexible and can read an arbitrary vertex shader output and write it
> to an arbitrary fragment shader input (= register address). Given this
> flexibility, fragment and vertex shaders are compiled separately in r300g
> and semantic indices don't matter, just the total number of varyings.

I think DX9 required this flexibility when mapping VS to PS, so it's
likely most DX9 hw supports this.

Alex



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-18 Thread Marek Olšák
On Mon, Jan 18, 2010 at 3:22 PM, Luca Barbieri wrote:

> > I think this is not necessary and fixing the rasterizer setup in the
> > driver would be better than fixing the state tracker.
> >
> > In r300g, we dynamically allocate rasterizer units based on vertex shader
> > outputs. If the vertex shader uses slots 1, 5, 20, 100, the driver
> > maps them to units 1,2,3,4.
>
> But what if the fragment shader has inputs 1, 2, 5, 20, 100?
> If you remap the fragment shader to 1, 2, 3, 4, 5, then they will mismatch.
>
> You would need to either:
> 1. Generate shaders in the driver for the fragment/vertex combination
> instead of each one separately
> 2. Require that vertex shader outputs match fragment shader inputs exactly
>
> (1) makes the driver much more complex and slow. I think we should try
> to make it possible to avoid this, unless the hardware absolutely
> requires it.
> (2) will probably break the existing fixed pipeline and ARB_fp/vp
> support, and also make the driver more complex than necessary.
>
> Does r300g compile both fragment and vertex shader together?
>
> Also note that all Gallium-capable hardware should support 8 varying
> slots, so anything that uses only "texture coordinates" should not
> need any remapping.
>

I was talking about the rasterizer (interpolator) units, which, on r300, are
quite flexible and can read an arbitrary vertex shader output and write it
to an arbitrary fragment shader input (= register address). Given this
flexibility, fragment and vertex shaders are compiled separately in r300g
and semantic indices don't matter, just the total number of varyings.


Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-18 Thread Luca Barbieri
> I think this is not necessary and fixing the rasterizer setup in the driver
> would be better than fixing the state tracker.
>
> In r300g, we dynamically allocate rasterizer units based on vertex shader
> outputs. If the vertex shader uses slots 1, 5, 20, 100, the driver maps
> them to units 1,2,3,4.

But what if the fragment shader has inputs 1, 2, 5, 20, 100?
If you remap the fragment shader to 1, 2, 3, 4, 5, then they will mismatch.
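
To spell out the mismatch, here is a small sketch (hypothetical helper
names; units are numbered from 1 as in the example above) that packs
each shader's index list independently and compares where a given
semantic index ends up.

```c
#include <assert.h>
#include <stddef.h>

/* Unit assigned to semantic index `sem` when the list idx[] is packed
 * into consecutive units numbered from 1, or 0 if `sem` is not listed. */
unsigned packed_unit(const unsigned idx[], size_t n, unsigned sem)
{
   assert(idx != NULL || n == 0);
   for (size_t i = 0; i < n; i++)
      if (idx[i] == sem)
         return (unsigned)i + 1;
   return 0;
}

/* Returns 1 if packing both lists independently puts every semantic
 * index read by the fragment shader on the same unit on both sides. */
int packing_agrees(const unsigned vs_out[], size_t num_vs,
                   const unsigned fs_in[], size_t num_fs)
{
   for (size_t j = 0; j < num_fs; j++)
      if (packed_unit(vs_out, num_vs, fs_in[j]) != (unsigned)j + 1)
         return 0;
   return 1;
}
```

For vertex outputs 1, 5, 20, 100 and fragment inputs 1, 2, 5, 20, 100,
semantic index 5 lands on unit 2 on the vertex side but on unit 3 on
the fragment side, so the check fails.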

You would need to either:
1. Generate shaders in the driver for the fragment/vertex combination
instead of each one separately
2. Require that vertex shader outputs match fragment shader inputs exactly

(1) makes the driver much more complex and slow. I think we should try
to make it possible to avoid this, unless the hardware absolutely
requires it.
(2) will probably break the existing fixed pipeline and ARB_fp/vp
support, and also make the driver more complex than necessary.

Does r300g compile both fragment and vertex shader together?

Also note that all Gallium-capable hardware should support 8 varying
slots, so anything that uses only "texture coordinates" should not
need any remapping.



Re: [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-17 Thread Marek Olšák
Hi Luca,

I think this is not necessary and fixing the rasterizer setup in the driver
would be better than fixing the state tracker.

In r300g, we dynamically allocate rasterizer units based on vertex shader
outputs. If the vertex shader uses slots 1, 5, 20, 100, the driver maps
them to units 1,2,3,4.

Marek

On Sun, Jan 17, 2010 at 8:05 PM, Luca Barbieri wrote:

> The current GLSL linker puts varyings in slots starting from *_VAR0,
> leaving the *_TEX slots used only for gl_TexCoord[i].
>
> This results in TGSI programs that start using generic input/outputs
> with index 10.
>
> Unfortunately, some drivers (e.g. pre-nv50 nouveau) support only 8
> vertex program outputs, and this causes GLSL to not work at all.
> On other cards, GLSL works, but 8 varying slots are lost.
>
> This patch solves the problem by modifying the GLSL linker to allocate
> varyings in texcoord slots that neither vertex nor fragment shader uses.
>
> Note that the GLSL linker is the only place where this can be done,
> because it is the only place that sees both the vertex and fragment
> programs at once.
>
> The only known issue is that if the GLSL program has an indirect
> reference to gl_TexCoord[i], no varyings will be put in texcoord slots.
> This may or may not be desirable.
>
> This makes (a subset of) GLSL work on NV30/NV40 and improves the
> chances of complex programs working on other cards.
>
> Signed-off-by: Luca Barbieri 
> ---
>  src/mesa/shader/slang/slang_link.c |   62
> ++-
>  1 files changed, 46 insertions(+), 16 deletions(-)
>
> diff --git a/src/mesa/shader/slang/slang_link.c
> b/src/mesa/shader/slang/slang_link.c
> index ed27821..889a811 100644
> --- a/src/mesa/shader/slang/slang_link.c
> +++ b/src/mesa/shader/slang/slang_link.c
> @@ -99,9 +99,9 @@ bits_agree(GLbitfield flags1, GLbitfield flags2,
> GLbitfield bit)
>  */
>  static GLboolean
>  link_varying_vars(GLcontext *ctx,
> -  struct gl_shader_program *shProg, struct gl_program
> *prog)
> +  struct gl_shader_program *shProg, struct gl_program
> *prog, GLbyte* varying_slots)
>  {
> -   GLuint *map, i, firstVarying, newFile;
> +   GLuint *map, i, firstTex, firstVarying, newFile;
>GLbitfield *inOutFlags;
>
>map = (GLuint *) _mesa_malloc(prog->Varying->NumParameters *
> sizeof(GLuint));
> @@ -114,13 +114,15 @@ link_varying_vars(GLcontext *ctx,
> * Also, replace File=PROGRAM_VARYING with File=PROGRAM_INPUT/OUTPUT.
> */
>if (prog->Target == GL_VERTEX_PROGRAM_ARB) {
> -  firstVarying = VERT_RESULT_VAR0;
> +  firstTex = VERT_RESULT_TEX0;
> +  firstVarying = VERT_RESULT_VAR0 - 8;
>   newFile = PROGRAM_OUTPUT;
>   inOutFlags = prog->OutputFlags;
>}
>else {
>   assert(prog->Target == GL_FRAGMENT_PROGRAM_ARB);
> -  firstVarying = FRAG_ATTRIB_VAR0;
> +  firstTex = FRAG_ATTRIB_TEX0;
> +  firstVarying = FRAG_ATTRIB_VAR0 - 8;
>   newFile = PROGRAM_INPUT;
>   inOutFlags = prog->InputFlags;
>}
> @@ -173,9 +175,12 @@ link_varying_vars(GLcontext *ctx,
>   {
>  GLint sz = var->Size;
>  while (sz > 0) {
> -inOutFlags[firstVarying + j] = var->Flags;
> +int v = varying_slots[j];
> +v += ((v < 8) ? firstTex : firstVarying);
> +inOutFlags[v] = var->Flags;
> /*printf("Link varying from %d to %d\n", i, j);*/
> -map[i++] = j++;
> +map[i++] = v;
> +++j;
> sz -= 4;
>  }
>  i--; /* go back one */
> @@ -192,13 +197,13 @@ link_varying_vars(GLcontext *ctx,
>
>   if (inst->DstReg.File == PROGRAM_VARYING) {
>  inst->DstReg.File = newFile;
> - inst->DstReg.Index = map[ inst->DstReg.Index ] + firstVarying;
> + inst->DstReg.Index = map[ inst->DstReg.Index ];
>   }
>
>   for (j = 0; j < 3; j++) {
>  if (inst->SrcReg[j].File == PROGRAM_VARYING) {
> inst->SrcReg[j].File = newFile;
> -inst->SrcReg[j].Index = map[ inst->SrcReg[j].Index ] +
> firstVarying;
> +inst->SrcReg[j].Index = map[ inst->SrcReg[j].Index ];
>  }
>   }
>}
> @@ -790,14 +795,39 @@ _slang_link(GLcontext *ctx,
>   ASSERT(shProg->FragmentProgram->Base.RefCount == 1);
>}
>
> -   /* link varying vars */
> -   if (shProg->VertexProgram) {
> -  if (!link_varying_vars(ctx, shProg, &shProg->VertexProgram->Base))
> - return;
> -   }
> -   if (shProg->FragmentProgram) {
> -  if (!link_varying_vars(ctx, shProg, &shProg->FragmentProgram->Base))
> - return;
> +   {
> +  GLuint texcoord_mask = 0;
> +  GLbyte varying_slots[MAX_VARYING];
> +  GLuint next_varying = 0;
> +
> +  if(shProg->VertexProgram)
> +  {
> +_slang_update_inputs_outputs(&shProg->VertexProgram->Base);
> +texcoord_mask |= (shProg->VertexProgram->Base.OutputsWritten
> >> VERT_RESULT_TEX0) & 0xff;
> +  }
> +  if(shProg->FragmentPr

[Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-17 Thread Luca Barbieri
The current GLSL linker puts varyings in slots starting from *_VAR0,
leaving the *_TEX slots used only for gl_TexCoord[i].

This results in TGSI programs that start using generic input/outputs
with index 10.

Unfortunately, some drivers (e.g. pre-nv50 nouveau) support only 8
vertex program outputs, and this causes GLSL to not work at all.
On other cards, GLSL works, but 8 varying slots are lost.

This patch solves the problem by modifying the GLSL linker to allocate
varyings in texcoord slots that neither vertex nor fragment shader uses.

Note that the GLSL linker is the only place where this can be done,
because it is the only place that sees both the vertex and fragment
programs at once.

The only known issue is that if the GLSL program has an indirect
reference to gl_TexCoord[i], no varyings will be put in texcoord slots.
This may or may not be desirable.

This makes (a subset of) GLSL work on NV30/NV40 and improves the
chances of complex programs working on other cards.
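
The allocation can be sketched outside of Mesa as follows. This is a
standalone approximation of what the patch does, not the patch code
itself: the function names and NUM_TEXCOORDS are made up here, and the
indirect gl_TexCoord[i] case mentioned above is not handled.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TEXCOORDS 8   /* texcoord slots, as in the patch's "v < 8" */

/* Fill slots[0..count-1] with the lowest slot numbers whose bit is
 * clear in texcoord_mask (bit i set means gl_TexCoord[i] is used by
 * either shader). Numbers below NUM_TEXCOORDS are texcoord slots;
 * higher numbers spill over into the generic varying range. */
void allocate_varying_slots(uint32_t texcoord_mask, unsigned count,
                            unsigned char slots[])
{
   unsigned next = 0;

   assert((texcoord_mask >> NUM_TEXCOORDS) == 0);
   for (unsigned i = 0; i < count; i++) {
      while (next < 32 && ((1u << next) & texcoord_mask))
         next++;
      slots[i] = (unsigned char)next++;
   }
}

/* Translate a slot number to a register index. first_tex stands for
 * e.g. VERT_RESULT_TEX0 and first_var for e.g. VERT_RESULT_VAR0. */
unsigned slot_to_register(unsigned slot, unsigned first_tex,
                          unsigned first_var)
{
   return slot < NUM_TEXCOORDS ? first_tex + slot
                               : first_var + (slot - NUM_TEXCOORDS);
}
```

For example, if the shaders use gl_TexCoord[0] and gl_TexCoord[2]
(mask 0x05), the first four varyings go to slots 1, 3, 4 and 5.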

Signed-off-by: Luca Barbieri 
---
 src/mesa/shader/slang/slang_link.c |   62 ++-
 1 files changed, 46 insertions(+), 16 deletions(-)

diff --git a/src/mesa/shader/slang/slang_link.c b/src/mesa/shader/slang/slang_link.c
index ed27821..889a811 100644
--- a/src/mesa/shader/slang/slang_link.c
+++ b/src/mesa/shader/slang/slang_link.c
@@ -99,9 +99,9 @@ bits_agree(GLbitfield flags1, GLbitfield flags2, GLbitfield bit)
  */
 static GLboolean
 link_varying_vars(GLcontext *ctx,
-  struct gl_shader_program *shProg, struct gl_program *prog)
+  struct gl_shader_program *shProg, struct gl_program *prog, GLbyte* varying_slots)
 {
-   GLuint *map, i, firstVarying, newFile;
+   GLuint *map, i, firstTex, firstVarying, newFile;
GLbitfield *inOutFlags;
 
map = (GLuint *) _mesa_malloc(prog->Varying->NumParameters * sizeof(GLuint));
@@ -114,13 +114,15 @@ link_varying_vars(GLcontext *ctx,
 * Also, replace File=PROGRAM_VARYING with File=PROGRAM_INPUT/OUTPUT.
 */
if (prog->Target == GL_VERTEX_PROGRAM_ARB) {
-  firstVarying = VERT_RESULT_VAR0;
+  firstTex = VERT_RESULT_TEX0;
+  firstVarying = VERT_RESULT_VAR0 - 8;
   newFile = PROGRAM_OUTPUT;
   inOutFlags = prog->OutputFlags;
}
else {
   assert(prog->Target == GL_FRAGMENT_PROGRAM_ARB);
-  firstVarying = FRAG_ATTRIB_VAR0;
+  firstTex = FRAG_ATTRIB_TEX0;
+  firstVarying = FRAG_ATTRIB_VAR0 - 8;
   newFile = PROGRAM_INPUT;
   inOutFlags = prog->InputFlags;
}
@@ -173,9 +175,12 @@ link_varying_vars(GLcontext *ctx,
   {
  GLint sz = var->Size;
  while (sz > 0) {
-inOutFlags[firstVarying + j] = var->Flags;
+int v = varying_slots[j];
+v += ((v < 8) ? firstTex : firstVarying);
+inOutFlags[v] = var->Flags;
 /*printf("Link varying from %d to %d\n", i, j);*/
-map[i++] = j++;
+map[i++] = v;
+++j;
 sz -= 4;
  }
  i--; /* go back one */
@@ -192,13 +197,13 @@ link_varying_vars(GLcontext *ctx,
 
   if (inst->DstReg.File == PROGRAM_VARYING) {
  inst->DstReg.File = newFile;
- inst->DstReg.Index = map[ inst->DstReg.Index ] + firstVarying;
+ inst->DstReg.Index = map[ inst->DstReg.Index ];
   }
 
   for (j = 0; j < 3; j++) {
  if (inst->SrcReg[j].File == PROGRAM_VARYING) {
 inst->SrcReg[j].File = newFile;
-inst->SrcReg[j].Index = map[ inst->SrcReg[j].Index ] + firstVarying;
+inst->SrcReg[j].Index = map[ inst->SrcReg[j].Index ];
  }
   }
}
@@ -790,14 +795,39 @@ _slang_link(GLcontext *ctx,
   ASSERT(shProg->FragmentProgram->Base.RefCount == 1);
}
 
-   /* link varying vars */
-   if (shProg->VertexProgram) {
-  if (!link_varying_vars(ctx, shProg, &shProg->VertexProgram->Base))
- return;
-   }
-   if (shProg->FragmentProgram) {
-  if (!link_varying_vars(ctx, shProg, &shProg->FragmentProgram->Base))
- return;
+   {
+  GLuint texcoord_mask = 0;
+  GLbyte varying_slots[MAX_VARYING];
+  GLuint next_varying = 0;
+
+  if(shProg->VertexProgram)
+  {
+_slang_update_inputs_outputs(&shProg->VertexProgram->Base);
+texcoord_mask |= (shProg->VertexProgram->Base.OutputsWritten >> VERT_RESULT_TEX0) & 0xff;
+  }
+  if(shProg->FragmentProgram)
+  {
+_slang_update_inputs_outputs(&shProg->FragmentProgram->Base);
+texcoord_mask |= (shProg->FragmentProgram->Base.InputsRead >> FRAG_ATTRIB_TEX0) & 0xff;
+  }
+
+  /* Allocate varying slots, excluding slot i if gl_TexCoord[i] is used by the shaders */
+  for(i = 0; i < MAX_VARYING; ++i)
+  {
+ for(; (1 << next_varying) & texcoord_mask; ++next_varying)
+ {}
+ varying_slots[i] = next_varying++;
+  }
+
+  /* link varying vars */
+  if (shProg->VertexProgram) {
+ if