Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers

2016-05-11 Thread Kenneth Graunke
On Wednesday, May 11, 2016 1:19:01 PM PDT Juan A. Suarez Romero wrote:
> On Fri, 2016-04-29 at 14:23 +0200, Juan A. Suarez Romero wrote:
> > On Fri, 2016-04-29 at 11:15 +0200, Ian Romanick wrote:
> > > 
> > > The driver supports up to 16 vertex attributes.
> > > > 
> > > > ARB_vertex_attrib_64bit
> > > > states that attribute variables of type dvec3, dvec4, dmat2x3,
> > > > dmat2x4,
> > > > dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice
> > > > as
> > > > many attributes as equivalent single-precision types.
> > > > 
> > > > 
> > > > I highlight the may, because it is not mandatory. If we count
> > > > those
> > > > types as consuming the same as a single-precision type (which is
> > > > what
> > > > is happening in Mesa), we are consuming 15 attributes, so we are
> > > > under 
> > > > the limit.
> > > This is the thing we need to fix.  Bailing from deep inside the
> > > driver
> > > code generation (which may happen long, long after linking) is not
> > > allowed.  If a shader is not going to work, we are required to
> > > generate
> > > the error in glLinkProgram.
> > > 
> > I'm not sure if I am following you. In which cases there can be code
> > generation after the linking?
> > 
> > On the other side, the error is generated when calling
> > glLinkProgram():
> > it happens inside the stack that follows the call, when the NIR code
> > is
> > transformed in the intermediate BRW code.
> > 
> > So from user pov, the error happens when calling glLinkProgram(), not
> > afterwards.
> > 
> > 
> > > 
> > > > 
> > > > > 
> > > > > But I see couple of drawbacks with this approach:
> > > > >  
> > > > >  
> > > > > - There are tests that under the same conditions (less than the
> > > > limit
> > > > > 
> > > > > if you count those types as occupying the same as single-
> > > > precision, but
> > > > > 
> > > > > beyond the limit if those types are considered as consuming
> > > > twice) they
> > > > > 
> > > > > still works. An example is the attached shader2 test: it
> > > > > requires
> > > > 13
> > > > > 
> > > > > attributes (or 19 counting as twice the mentioned types) and it
> > > > works
> > > > > 
> > > > > fine.
> > > > > 
> > > > - This check affects to all the backends. And there could be some
> > > > backend that works perfectly fine with the current
> > > > implementation,
> > > > which is less conservative. In fact, we have an example: the same
> > > > driver running in vec4 mode (SIMD4x2) works perfectly fine.
> > > I think we can handle this by having a per-type (double, dvec2,
> > > dvec3,
> > > and dvec4) flag to select the double or don't-double behavior.
> > > 
> > Do you mean only dvec3 and dvec4 (and types based on those: dmat2x3,
> > dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4)?
> > 
> > Because only restrict to those types to count them as twice when
> > checking if we bypass the max attributes limit.
> > 
> > Also, do we really need a flag per type? Wouldn't it be enough to
> > have
> > a single flag stating if all of those types are counted twice or not?
> > 
> > Finally, I understand the idea is that the flag would be like a
> > boolean
> > with true (count twice) or false (don't count twice), contained in
> > each
> > backend/driver (like the max attribs limit, for instance).
> > 
> > 
> > Anyway, this would fix the second problem, but not the former: if our
> > gen8 backend counts doubles as twice, it will prevent to run some
> > shaders that otherway would work fine (like the shader2). If this is
> > an
> > acceptable solution (shaders that work fine now will be rejected
> > because they use too much vertex attribs), then it's fine.
> > 
> > 
> > J.A.
> 
> 
> CCing Kenneth as this is also related to patch 12.
> 
> 
> Summing up, we can:
> 
> -  Check registers exhaustion and URB read length and abort linking if
> we reach the limit. This is what patches 11 and 12 do.
> 
> - Or, we can count dvec3, dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4,
> dmat4x3, and dmat4 vertex attributes as consuming twice the equivalent
> single-precision types. Right now we are counting them as if they were
> single-precision (spec allows both options[1]). This would require to
> add some kind of flags to allow drivers to decide if they count them as
> one or two. But if we follow this option, some shaders in i965 that
> currently work because they do not exhaust registers neither reach the
> URB read length limit, will fail to link if the count those mentioned
> double types twice.
> 
> So what to do?

I feel like the best approach is the second one (double counting
dvec[34] and dmat*x[34] types).  Although they only take up a single
VERTEX_ELEMENT structure, they do take up twice as much URB space
(256 bits instead of 128 bits).

As you said in your earlier email, we should only get into trouble
for shaders that use a lot of double attributes.  I doubt that we'll
see many of those in the real world.  Plus, we are specifically allowed
to do that, so conformance tests can't assume things 

Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers

2016-05-11 Thread Juan A. Suarez Romero
On Fri, 2016-04-29 at 14:23 +0200, Juan A. Suarez Romero wrote:
> On Fri, 2016-04-29 at 11:15 +0200, Ian Romanick wrote:
> > 
> > The driver supports up to 16 vertex attributes.
> > > 
> > > ARB_vertex_attrib_64bit
> > > states that attribute variables of type dvec3, dvec4, dmat2x3,
> > > dmat2x4,
> > > dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice
> > > as
> > > many attributes as equivalent single-precision types.
> > > 
> > > 
> > > I highlight the may, because it is not mandatory. If we count
> > > those
> > > types as consuming the same as a single-precision type (which is
> > > what
> > > is happening in Mesa), we are consuming 15 attributes, so we are
> > > under 
> > > the limit.
> > This is the thing we need to fix.  Bailing from deep inside the
> > driver
> > code generation (which may happen long, long after linking) is not
> > allowed.  If a shader is not going to work, we are required to
> > generate
> > the error in glLinkProgram.
> > 
> I'm not sure if I am following you. In which cases there can be code
> generation after the linking?
> 
> On the other side, the error is generated when calling
> glLinkProgram():
> it happens inside the stack that follows the call, when the NIR code
> is
> transformed in the intermediate BRW code.
> 
> So from user pov, the error happens when calling glLinkProgram(), not
> afterwards.
> 
> 
> > 
> > > 
> > > > 
> > > > But I see couple of drawbacks with this approach:
> > > >  
> > > >  
> > > > - There are tests that under the same conditions (less than the
> > > limit
> > > > 
> > > > if you count those types as occupying the same as single-
> > > precision, but
> > > > 
> > > > beyond the limit if those types are considered as consuming
> > > twice) they
> > > > 
> > > > still works. An example is the attached shader2 test: it
> > > > requires
> > > 13
> > > > 
> > > > attributes (or 19 counting as twice the mentioned types) and it
> > > works
> > > > 
> > > > fine.
> > > > 
> > > - This check affects to all the backends. And there could be some
> > > backend that works perfectly fine with the current
> > > implementation,
> > > which is less conservative. In fact, we have an example: the same
> > > driver running in vec4 mode (SIMD4x2) works perfectly fine.
> > I think we can handle this by having a per-type (double, dvec2,
> > dvec3,
> > and dvec4) flag to select the double or don't-double behavior.
> > 
> Do you mean only dvec3 and dvec4 (and types based on those: dmat2x3,
> dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4)?
> 
> Because only restrict to those types to count them as twice when
> checking if we bypass the max attributes limit.
> 
> Also, do we really need a flag per type? Wouldn't it be enough to
> have
> a single flag stating if all of those types are counted twice or not?
> 
> Finally, I understand the idea is that the flag would be like a
> boolean
> with true (count twice) or false (don't count twice), contained in
> each
> backend/driver (like the max attribs limit, for instance).
> 
> 
> Anyway, this would fix the second problem, but not the former: if our
> gen8 backend counts doubles as twice, it will prevent to run some
> shaders that otherway would work fine (like the shader2). If this is
> an
> acceptable solution (shaders that work fine now will be rejected
> because they use too much vertex attribs), then it's fine.
> 
> 
>   J.A.


CCing Kenneth as this is also related to patch 12.


Summing up, we can:

-  Check registers exhaustion and URB read length and abort linking if
we reach the limit. This is what patches 11 and 12 do.

- Or, we can count dvec3, dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4,
dmat4x3, and dmat4 vertex attributes as consuming twice the equivalent
single-precision types. Right now we are counting them as if they were
single-precision (spec allows both options[1]). This would require to
add some kind of flags to allow drivers to decide if they count them as
one or two. But if we follow this option, some shaders in i965 that
currently work because they do not exhaust registers neither reach the
URB read length limit, will fail to link if the count those mentioned
double types twice.

So what to do?


[1] "(modify the second paragraph, p. 87) ... exceeds
MAX_VERTEX_ATTRIBS.  For
the purposes of this comparison, attribute variables of the type
dvec3,
dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4 may
count as
consuming twice as many attributes as equivalent single-precision
types.
While these types use the same number of generic attributes as
their
single-precision equivalents, implementations are permitted to
consume two
single-precision vectors of internal storage for each three- or
four-component double-precision vector."



J.A.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers

2016-04-29 Thread Juan A. Suarez Romero
On Fri, 2016-04-29 at 11:15 +0200, Ian Romanick wrote:
> The driver supports up to 16 vertex attributes.
> > ARB_vertex_attrib_64bit
> > states that attribute variables of type dvec3, dvec4, dmat2x3,
> > dmat2x4,
> > dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice
> > as
> > many attributes as equivalent single-precision types.
> > 
> > 
> > I highlight the may, because it is not mandatory. If we count those
> > types as consuming the same as a single-precision type (which is
> > what
> > is happening in Mesa), we are consuming 15 attributes, so we are
> > under 
> > the limit.
> This is the thing we need to fix.  Bailing from deep inside the
> driver
> code generation (which may happen long, long after linking) is not
> allowed.  If a shader is not going to work, we are required to
> generate
> the error in glLinkProgram.
> 

I'm not sure if I am following you. In which cases there can be code
generation after the linking?

On the other side, the error is generated when calling glLinkProgram():
it happens inside the stack that follows the call, when the NIR code is
transformed in the intermediate BRW code.

So from user pov, the error happens when calling glLinkProgram(), not
afterwards.


> > > But I see couple of drawbacks with this approach:
> > > 
> > > 
> > > - There are tests that under the same conditions (less than the
> > limit
> > > if you count those types as occupying the same as single-
> > precision, but
> > > beyond the limit if those types are considered as consuming
> > twice) they
> > > still works. An example is the attached shader2 test: it requires
> > 13
> > > attributes (or 19 counting as twice the mentioned types) and it
> > works
> > > fine.
> > >
> > - This check affects to all the backends. And there could be some
> > backend that works perfectly fine with the current implementation,
> > which is less conservative. In fact, we have an example: the same
> > driver running in vec4 mode (SIMD4x2) works perfectly fine.
> I think we can handle this by having a per-type (double, dvec2,
> dvec3,
> and dvec4) flag to select the double or don't-double behavior.
> 

Do you mean only dvec3 and dvec4 (and types based on those: dmat2x3,
dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4)?

Because only restrict to those types to count them as twice when
checking if we bypass the max attributes limit.

Also, do we really need a flag per type? Wouldn't it be enough to have
a single flag stating if all of those types are counted twice or not?

Finally, I understand the idea is that the flag would be like a boolean
with true (count twice) or false (don't count twice), contained in each
backend/driver (like the max attribs limit, for instance).


Anyway, this would fix the second problem, but not the former: if our
gen8 backend counts doubles as twice, it will prevent to run some
shaders that otherway would work fine (like the shader2). If this is an
acceptable solution (shaders that work fine now will be rejected
because they use too much vertex attribs), then it's fine.


J.A.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers

2016-04-29 Thread Juan A. Suarez Romero
On Fri, 2016-04-29 at 11:15 +0200, Ian Romanick wrote:
> I don't see where you get 19.  I get 3 array elements * 2 matrix
> columns
> * 2 for value0, 2 array elements * 3 matrix columns * 2 for value1,
> and
> 1 for piglit_vertex.  That's 25.
> 
> This overcounts because by naive doubling the dmat2 counts each
> column
> as 2 slots, but we only actually need 1. By doubling only when it's
> necessary, that shader would need (3 * 2) + (2 * 3 * 2) + 1 = 19.


I get 19 because for value0, I count 3 array elements * 2 matrix
columns. I don't multiply by 2 again because dmat2 is not in the list
of the types that my count as consuming twice.


In any case, either counting 25 or 19, the value is above the limit.


J.A.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers

2016-04-29 Thread Juan A. Suarez Romero
On Thu, 2016-04-28 at 15:29 +0200, Ian Romanick wrote:
> On 04/28/2016 01:40 PM, Antia Puentes wrote:
> > 
> > From: "Juan A. Suarez Romero" 
> > 
> > Even when the number of vertex attributes is under the limit, for
> > shaders that use a high number of them, we can quickly exhaust the
> > number of hardware registers.
> Were you able to construct a case where this actually occurs?  Limits
> exposed by the driver and enforced by the GLSL linker should prevent
> this.
> 

(Re-sending, because the original email was too big).


Yes. See the attached shader1 test that exposes this problem.


The driver supports up to 16 vertex attributes. ARB_vertex_attrib_64bit
states that attribute variables of type dvec3, dvec4, dmat2x3, dmat2x4,
dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice as
many attributes as equivalent single-precision types.


I highlight the may, because it is not mandatory. If we count those
types as consuming the same as a single-precision type (which is what
is happening in Mesa), we are consuming 15 attributes, so we are under 
the limit.


The issue is that in scalar mode (SIMD8), for each vec4 attribute we
require 4 registers (or 8 per each dvec4 attribute), so it is easy to
reach a huge number of registers. Which is the problem the test is
exposing.


If we were working on SIMD4x2, this wouldn't happen, as we would 
require only 1 register per vec4 attribute (or 2 per each dvec4).


So the problem is a combination of using a high number of attributes
and SIMD8 mode.


One of the first approaches we took was precisely to consider the
previous types to consume two attributes, instead of one. In this case,
the shader1 test would be consuming 29 attributes, so the limit would
be reached.


But I see couple of drawbacks with this approach:


- There are tests that under the same conditions (less than the limit
if you count those types as occupying the same as single-precision, but
beyond the limit if those types are considered as consuming twice) they
still works. An example is the attached shader2 test: it requires 13
attributes (or 19 counting as twice the mentioned types) and it works
fine.

- This check affects to all the backends. And there could be some
backend that works perfectly fine with the current implementation,
which is less conservative. In fact, we have an example: the same
driver running in vec4 mode (SIMD4x2) works perfectly fine.


So all in all, the best way we found is to keep how we count vertex
attributes, and just abort if we exhaust the available registers.

Ideally, the best approach would be to switch to vec4 mode. But this
would require to support gen8+vec4 (we are right now working on support
for gen7, which uses vec4), and also to improve switching from scalar
mode to vec4 when compiling the shader.


J.A.



shader1.shader_test.gz
Description: application/gzip


shader2.shader_test.gz
Description: application/gzip
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers

2016-04-29 Thread Ian Romanick
On 04/29/2016 09:32 AM, Juan A. Suarez Romero wrote:
> On Thu, 2016-04-28 at 15:29 +0200, Ian Romanick wrote:
>> On 04/28/2016 01:40 PM, Antia Puentes wrote:
>>>
>>> From: "Juan A. Suarez Romero" 
>>>
>>> Even when the number of vertex attributes is under the limit, for
>>> shaders that use a high number of them, we can quickly exhaust the
>>> number of hardware registers.
>> Were you able to construct a case where this actually occurs?  Limits
>> exposed by the driver and enforced by the GLSL linker should prevent
>> this.
>>
> 
> Yes. See the attached shader1 test that exposes this problem.
> 
> 
> The driver supports up to 16 vertex attributes. ARB_vertex_attrib_64bit
> states that attribute variables of type dvec3, dvec4, dmat2x3, dmat2x4,
> dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice as
> many attributes as equivalent single-precision types.
> 
> 
> I highlight the may, because it is not mandatory. If we count those
> types as consuming the same as a single-precision type (which is what
> is happening in Mesa), we are consuming 15 attributes, so we are under 
> the limit.

This is the thing we need to fix.  Bailing from deep inside the driver
code generation (which may happen long, long after linking) is not
allowed.  If a shader is not going to work, we are required to generate
the error in glLinkProgram.

> The issue is that in scalar mode (SIMD8), for each vec4 attribute we
> require 4 registers (or 8 per each dvec4 attribute), so it is easy to
> reach a huge number of registers. Which is the problem the test is
> exposing.
> 
> 
> If we were working on SIMD4x2, this wouldn't happen, as we would 
> require only 1 register per vec4 attribute (or 2 per each dvec4).
> 
> 
> So the problem is a combination of using a high number of attributes
> and SIMD8 mode.
> 
> 
> One of the first approaches we took was precisely to consider the
> previous types to consume two attributes, instead of one. In this case,
> the shader1 test would be consuming 29 attributes, so the limit would
> be reached.
> 
> 
> But I see couple of drawbacks with this approach:
> 
> 
> - There are tests that under the same conditions (less than the limit
> if you count those types as occupying the same as single-precision, but
> beyond the limit if those types are considered as consuming twice) they
> still works. An example is the attached shader2 test: it requires 13
> attributes (or 19 counting as twice the mentioned types) and it works
> fine.

I don't see where you get 19.  I get 3 array elements * 2 matrix columns
* 2 for value0, 2 array elements * 3 matrix columns * 2 for value1, and
1 for piglit_vertex.  That's 25.

This overcounts because by naive doubling the dmat2 counts each column
as 2 slots, but we only actually need 1. By doubling only when it's
necessary, that shader would need (3 * 2) + (2 * 3 * 2) + 1 = 19.

> - This check affects to all the backends. And there could be some
> backend that works perfectly fine with the current implementation,
> which is less conservative. In fact, we have an example: the same
> driver running in vec4 mode (SIMD4x2) works perfectly fine.

I think we can handle this by having a per-type (double, dvec2, dvec3,
and dvec4) flag to select the double or don't-double behavior.

> So all in all, the best way we found is to keep how we count vertex
> attributes, and just abort if we exhaust the available registers.
> 
> Ideally, the best approach would be to switch to vec4 mode. But this
> would require to support gen8+vec4 (we are right now working on support
> for gen7, which uses vec4), and also to improve switching from scalar
> mode to vec4 when compiling the shader.
> 
> 
> J.A.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers

2016-04-28 Thread Ian Romanick
On 04/28/2016 01:40 PM, Antia Puentes wrote:
> From: "Juan A. Suarez Romero" 
> 
> Even when the number of vertex attributes is under the limit, for
> shaders that use a high number of them, we can quickly exhaust the
> number of hardware registers.

Were you able to construct a case where this actually occurs?  Limits
exposed by the driver and enforced by the GLSL linker should prevent this.

> In this case, just abort the linking.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 4b8835d..387a266 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1857,6 +1857,12 @@ fs_visitor::convert_attr_sources_to_hw_regs(fs_inst 
> *inst)
> inst->src[i].nr +
> inst->src[i].reg_offset;
>  
> + if (grf >= 128) {
> +fail("Failure to register allocate.  Reduce the number of "
> + "vertex input attributes to avoid this.");
> +return;
> + }
> +
>   unsigned exec_size;
>   /* As explained at brw_reg_from_fs_reg, From the Haswell PRM:
>*
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev