Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers
On Wednesday, May 11, 2016 1:19:01 PM PDT Juan A. Suarez Romero wrote: > On Fri, 2016-04-29 at 14:23 +0200, Juan A. Suarez Romero wrote: > > On Fri, 2016-04-29 at 11:15 +0200, Ian Romanick wrote: > > > > > > The driver supports up to 16 vertex attributes. > > > > > > > > ARB_vertex_attrib_64bit > > > > states that attribute variables of type dvec3, dvec4, dmat2x3, > > > > dmat2x4, > > > > dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice > > > > as > > > > many attributes as equivalent single-precision types. > > > > > > > > > > > > I highlight the may, because it is not mandatory. If we count > > > > those > > > > types as consuming the same as a single-precision type (which is > > > > what > > > > is happening in Mesa), we are consuming 15 attributes, so we are > > > > under > > > > the limit. > > > This is the thing we need to fix. Bailing from deep inside the > > > driver > > > code generation (which may happen long, long after linking) is not > > > allowed. If a shader is not going to work, we are required to > > > generate > > > the error in glLinkProgram. > > > > > I'm not sure if I am following you. In which cases there can be code > > generation after the linking? > > > > On the other side, the error is generated when calling > > glLinkProgram(): > > it happens inside the stack that follows the call, when the NIR code > > is > > transformed in the intermediate BRW code. > > > > So from user pov, the error happens when calling glLinkProgram(), not > > afterwards. > > > > > > > > > > > > > > > > > > > > > But I see couple of drawbacks with this approach: > > > > > > > > > > > > > > > - There are tests that under the same conditions (less than the > > > > limit > > > > > > > > > > if you count those types as occupying the same as single- > > > > precision, but > > > > > > > > > > beyond the limit if those types are considered as consuming > > > > twice) they > > > > > > > > > > still works. An example is the attached shader2 test: it > > > > > requires > > > > 13 > > > > > > > > > > attributes (or 19 counting as twice the mentioned types) and it > > > > works > > > > > > > > > > fine. > > > > > > > > > - This check affects to all the backends. And there could be some > > > > backend that works perfectly fine with the current > > > > implementation, > > > > which is less conservative. In fact, we have an example: the same > > > > driver running in vec4 mode (SIMD4x2) works perfectly fine. > > > I think we can handle this by having a per-type (double, dvec2, > > > dvec3, > > > and dvec4) flag to select the double or don't-double behavior. > > > > > Do you mean only dvec3 and dvec4 (and types based on those: dmat2x3, > > dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4)? > > > > Because only restrict to those types to count them as twice when > > checking if we bypass the max attributes limit. > > > > Also, do we really need a flag per type? Wouldn't it be enough to > > have > > a single flag stating if all of those types are counted twice or not? > > > > Finally, I understand the idea is that the flag would be like a > > boolean > > with true (count twice) or false (don't count twice), contained in > > each > > backend/driver (like the max attribs limit, for instance). > > > > > > Anyway, this would fix the second problem, but not the former: if our > > gen8 backend counts doubles as twice, it will prevent to run some > > shaders that otherway would work fine (like the shader2). If this is > > an > > acceptable solution (shaders that work fine now will be rejected > > because they use too much vertex attribs), then it's fine. > > > > > > J.A. > > > CCing Kenneth as this is also related to patch 12. > > > Summing up, we can: > > - Check registers exhaustion and URB read length and abort linking if > we reach the limit. This is what patches 11 and 12 do. > > - Or, we can count dvec3, dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4, > dmat4x3, and dmat4 vertex attributes as consuming twice the equivalent > single-precision types. Right now we are counting them as if they were > single-precision (spec allows both options[1]). This would require to > add some kind of flags to allow drivers to decide if they count them as > one or two. But if we follow this option, some shaders in i965 that > currently work because they do not exhaust registers neither reach the > URB read length limit, will fail to link if the count those mentioned > double types twice. > > So what to do? I feel like the best approach is the second one (double counting dvec[34] and dmat*x[34] types). Although they only take up a single VERTEX_ELEMENT structure, they do take up twice as much URB space (256 bits instead of 128 bits). As you said in your earlier email, we should only get into trouble for shaders that use a lot of double attributes. I doubt that we'll see many of those in the real world. Plus, we are specifically allowed to do that, so conformance tests can't assume things
Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers
On Fri, 2016-04-29 at 14:23 +0200, Juan A. Suarez Romero wrote: > On Fri, 2016-04-29 at 11:15 +0200, Ian Romanick wrote: > > > > The driver supports up to 16 vertex attributes. > > > > > > ARB_vertex_attrib_64bit > > > states that attribute variables of type dvec3, dvec4, dmat2x3, > > > dmat2x4, > > > dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice > > > as > > > many attributes as equivalent single-precision types. > > > > > > > > > I highlight the may, because it is not mandatory. If we count > > > those > > > types as consuming the same as a single-precision type (which is > > > what > > > is happening in Mesa), we are consuming 15 attributes, so we are > > > under > > > the limit. > > This is the thing we need to fix. Bailing from deep inside the > > driver > > code generation (which may happen long, long after linking) is not > > allowed. If a shader is not going to work, we are required to > > generate > > the error in glLinkProgram. > > > I'm not sure if I am following you. In which cases there can be code > generation after the linking? > > On the other side, the error is generated when calling > glLinkProgram(): > it happens inside the stack that follows the call, when the NIR code > is > transformed in the intermediate BRW code. > > So from user pov, the error happens when calling glLinkProgram(), not > afterwards. > > > > > > > > > > > > > > > But I see couple of drawbacks with this approach: > > > > > > > > > > > > - There are tests that under the same conditions (less than the > > > limit > > > > > > > > if you count those types as occupying the same as single- > > > precision, but > > > > > > > > beyond the limit if those types are considered as consuming > > > twice) they > > > > > > > > still works. An example is the attached shader2 test: it > > > > requires > > > 13 > > > > > > > > attributes (or 19 counting as twice the mentioned types) and it > > > works > > > > > > > > fine. > > > > > > > - This check affects to all the backends. And there could be some > > > backend that works perfectly fine with the current > > > implementation, > > > which is less conservative. In fact, we have an example: the same > > > driver running in vec4 mode (SIMD4x2) works perfectly fine. > > I think we can handle this by having a per-type (double, dvec2, > > dvec3, > > and dvec4) flag to select the double or don't-double behavior. > > > Do you mean only dvec3 and dvec4 (and types based on those: dmat2x3, > dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4)? > > Because only restrict to those types to count them as twice when > checking if we bypass the max attributes limit. > > Also, do we really need a flag per type? Wouldn't it be enough to > have > a single flag stating if all of those types are counted twice or not? > > Finally, I understand the idea is that the flag would be like a > boolean > with true (count twice) or false (don't count twice), contained in > each > backend/driver (like the max attribs limit, for instance). > > > Anyway, this would fix the second problem, but not the former: if our > gen8 backend counts doubles as twice, it will prevent to run some > shaders that otherway would work fine (like the shader2). If this is > an > acceptable solution (shaders that work fine now will be rejected > because they use too much vertex attribs), then it's fine. > > > J.A. CCing Kenneth as this is also related to patch 12. Summing up, we can: - Check registers exhaustion and URB read length and abort linking if we reach the limit. This is what patches 11 and 12 do. - Or, we can count dvec3, dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4 vertex attributes as consuming twice the equivalent single-precision types. Right now we are counting them as if they were single-precision (spec allows both options[1]). This would require to add some kind of flags to allow drivers to decide if they count them as one or two. But if we follow this option, some shaders in i965 that currently work because they do not exhaust registers neither reach the URB read length limit, will fail to link if the count those mentioned double types twice. So what to do? [1] "(modify the second paragraph, p. 87) ... exceeds MAX_VERTEX_ATTRIBS. For the purposes of this comparison, attribute variables of the type dvec3, dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4 may count as consuming twice as many attributes as equivalent single-precision types. While these types use the same number of generic attributes as their single-precision equivalents, implementations are permitted to consume two single-precision vectors of internal storage for each three- or four-component double-precision vector." J.A. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers
On Fri, 2016-04-29 at 11:15 +0200, Ian Romanick wrote: > The driver supports up to 16 vertex attributes. > > ARB_vertex_attrib_64bit > > states that attribute variables of type dvec3, dvec4, dmat2x3, > > dmat2x4, > > dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice > > as > > many attributes as equivalent single-precision types. > > > > > > I highlight the may, because it is not mandatory. If we count those > > types as consuming the same as a single-precision type (which is > > what > > is happening in Mesa), we are consuming 15 attributes, so we are > > under > > the limit. > This is the thing we need to fix. Bailing from deep inside the > driver > code generation (which may happen long, long after linking) is not > allowed. If a shader is not going to work, we are required to > generate > the error in glLinkProgram. > I'm not sure if I am following you. In which cases there can be code generation after the linking? On the other side, the error is generated when calling glLinkProgram(): it happens inside the stack that follows the call, when the NIR code is transformed in the intermediate BRW code. So from user pov, the error happens when calling glLinkProgram(), not afterwards. > > > But I see couple of drawbacks with this approach: > > > > > > > > > - There are tests that under the same conditions (less than the > > limit > > > if you count those types as occupying the same as single- > > precision, but > > > beyond the limit if those types are considered as consuming > > twice) they > > > still works. An example is the attached shader2 test: it requires > > 13 > > > attributes (or 19 counting as twice the mentioned types) and it > > works > > > fine. > > > > > - This check affects to all the backends. And there could be some > > backend that works perfectly fine with the current implementation, > > which is less conservative. In fact, we have an example: the same > > driver running in vec4 mode (SIMD4x2) works perfectly fine. > I think we can handle this by having a per-type (double, dvec2, > dvec3, > and dvec4) flag to select the double or don't-double behavior. > Do you mean only dvec3 and dvec4 (and types based on those: dmat2x3, dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4)? Because only restrict to those types to count them as twice when checking if we bypass the max attributes limit. Also, do we really need a flag per type? Wouldn't it be enough to have a single flag stating if all of those types are counted twice or not? Finally, I understand the idea is that the flag would be like a boolean with true (count twice) or false (don't count twice), contained in each backend/driver (like the max attribs limit, for instance). Anyway, this would fix the second problem, but not the former: if our gen8 backend counts doubles as twice, it will prevent to run some shaders that otherway would work fine (like the shader2). If this is an acceptable solution (shaders that work fine now will be rejected because they use too much vertex attribs), then it's fine. J.A. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers
On Fri, 2016-04-29 at 11:15 +0200, Ian Romanick wrote: > I don't see where you get 19. I get 3 array elements * 2 matrix > columns > * 2 for value0, 2 array elements * 3 matrix columns * 2 for value1, > and > 1 for piglit_vertex. That's 25. > > This overcounts because by naive doubling the dmat2 counts each > column > as 2 slots, but we only actually need 1. By doubling only when it's > necessary, that shader would need (3 * 2) + (2 * 3 * 2) + 1 = 19. I get 19 because for value0, I count 3 array elements * 2 matrix columns. I don't multiply by 2 again because dmat2 is not in the list of the types that my count as consuming twice. In any case, either counting 25 or 19, the value is above the limit. J.A. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers
On Thu, 2016-04-28 at 15:29 +0200, Ian Romanick wrote: > On 04/28/2016 01:40 PM, Antia Puentes wrote: > > > > From: "Juan A. Suarez Romero"> > > > Even when the number of vertex attributes is under the limit, for > > shaders that use a high number of them, we can quickly exhaust the > > number of hardware registers. > Were you able to construct a case where this actually occurs? Limits > exposed by the driver and enforced by the GLSL linker should prevent > this. > (Re-sending, because the original email was too big). Yes. See the attached shader1 test that exposes this problem. The driver supports up to 16 vertex attributes. ARB_vertex_attrib_64bit states that attribute variables of type dvec3, dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice as many attributes as equivalent single-precision types. I highlight the may, because it is not mandatory. If we count those types as consuming the same as a single-precision type (which is what is happening in Mesa), we are consuming 15 attributes, so we are under the limit. The issue is that in scalar mode (SIMD8), for each vec4 attribute we require 4 registers (or 8 per each dvec4 attribute), so it is easy to reach a huge number of registers. Which is the problem the test is exposing. If we were working on SIMD4x2, this wouldn't happen, as we would require only 1 register per vec4 attribute (or 2 per each dvec4). So the problem is a combination of using a high number of attributes and SIMD8 mode. One of the first approaches we took was precisely to consider the previous types to consume two attributes, instead of one. In this case, the shader1 test would be consuming 29 attributes, so the limit would be reached. But I see couple of drawbacks with this approach: - There are tests that under the same conditions (less than the limit if you count those types as occupying the same as single-precision, but beyond the limit if those types are considered as consuming twice) they still works. An example is the attached shader2 test: it requires 13 attributes (or 19 counting as twice the mentioned types) and it works fine. - This check affects to all the backends. And there could be some backend that works perfectly fine with the current implementation, which is less conservative. In fact, we have an example: the same driver running in vec4 mode (SIMD4x2) works perfectly fine. So all in all, the best way we found is to keep how we count vertex attributes, and just abort if we exhaust the available registers. Ideally, the best approach would be to switch to vec4 mode. But this would require to support gen8+vec4 (we are right now working on support for gen7, which uses vec4), and also to improve switching from scalar mode to vec4 when compiling the shader. J.A. shader1.shader_test.gz Description: application/gzip shader2.shader_test.gz Description: application/gzip ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers
On 04/29/2016 09:32 AM, Juan A. Suarez Romero wrote: > On Thu, 2016-04-28 at 15:29 +0200, Ian Romanick wrote: >> On 04/28/2016 01:40 PM, Antia Puentes wrote: >>> >>> From: "Juan A. Suarez Romero">>> >>> Even when the number of vertex attributes is under the limit, for >>> shaders that use a high number of them, we can quickly exhaust the >>> number of hardware registers. >> Were you able to construct a case where this actually occurs? Limits >> exposed by the driver and enforced by the GLSL linker should prevent >> this. >> > > Yes. See the attached shader1 test that exposes this problem. > > > The driver supports up to 16 vertex attributes. ARB_vertex_attrib_64bit > states that attribute variables of type dvec3, dvec4, dmat2x3, dmat2x4, > dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice as > many attributes as equivalent single-precision types. > > > I highlight the may, because it is not mandatory. If we count those > types as consuming the same as a single-precision type (which is what > is happening in Mesa), we are consuming 15 attributes, so we are under > the limit. This is the thing we need to fix. Bailing from deep inside the driver code generation (which may happen long, long after linking) is not allowed. If a shader is not going to work, we are required to generate the error in glLinkProgram. > The issue is that in scalar mode (SIMD8), for each vec4 attribute we > require 4 registers (or 8 per each dvec4 attribute), so it is easy to > reach a huge number of registers. Which is the problem the test is > exposing. > > > If we were working on SIMD4x2, this wouldn't happen, as we would > require only 1 register per vec4 attribute (or 2 per each dvec4). > > > So the problem is a combination of using a high number of attributes > and SIMD8 mode. > > > One of the first approaches we took was precisely to consider the > previous types to consume two attributes, instead of one. In this case, > the shader1 test would be consuming 29 attributes, so the limit would > be reached. > > > But I see couple of drawbacks with this approach: > > > - There are tests that under the same conditions (less than the limit > if you count those types as occupying the same as single-precision, but > beyond the limit if those types are considered as consuming twice) they > still works. An example is the attached shader2 test: it requires 13 > attributes (or 19 counting as twice the mentioned types) and it works > fine. I don't see where you get 19. I get 3 array elements * 2 matrix columns * 2 for value0, 2 array elements * 3 matrix columns * 2 for value1, and 1 for piglit_vertex. That's 25. This overcounts because by naive doubling the dmat2 counts each column as 2 slots, but we only actually need 1. By doubling only when it's necessary, that shader would need (3 * 2) + (2 * 3 * 2) + 1 = 19. > - This check affects to all the backends. And there could be some > backend that works perfectly fine with the current implementation, > which is less conservative. In fact, we have an example: the same > driver running in vec4 mode (SIMD4x2) works perfectly fine. I think we can handle this by having a per-type (double, dvec2, dvec3, and dvec4) flag to select the double or don't-double behavior. > So all in all, the best way we found is to keep how we count vertex > attributes, and just abort if we exhaust the available registers. > > Ideally, the best approach would be to switch to vec4 mode. But this > would require to support gen8+vec4 (we are right now working on support > for gen7, which uses vec4), and also to improve switching from scalar > mode to vec4 when compiling the shader. > > > J.A. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers
On 04/28/2016 01:40 PM, Antia Puentes wrote: > From: "Juan A. Suarez Romero"> > Even when the number of vertex attributes is under the limit, for > shaders that use a high number of them, we can quickly exhaust the > number of hardware registers. Were you able to construct a case where this actually occurs? Limits exposed by the driver and enforced by the GLSL linker should prevent this. > In this case, just abort the linking. > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index 4b8835d..387a266 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -1857,6 +1857,12 @@ fs_visitor::convert_attr_sources_to_hw_regs(fs_inst > *inst) > inst->src[i].nr + > inst->src[i].reg_offset; > > + if (grf >= 128) { > +fail("Failure to register allocate. Reduce the number of " > + "vertex input attributes to avoid this."); > +return; > + } > + > unsigned exec_size; > /* As explained at brw_reg_from_fs_reg, From the Haswell PRM: >* > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev