Re: [Mesa-dev] [PATCH] intel/fs: Use a pure vertical stride for large register strides

2017-11-09 Thread Jason Ekstrand
On Thu, Nov 9, 2017 at 2:23 PM, Matt Turner  wrote:

> On Thu, Nov 2, 2017 at 3:54 PM, Jason Ekstrand 
> wrote:
> > Register strides higher than 4 are uncommon but they can happen.  For
> > instance, if you have a 64-bit extract_u8 operation, we turn that into
> > UB -> UQ MOV with a source stride of 8.  Our previous calculation would
> > try to generate a stride of <32;8,8>:ub which is invalid because the
> > maximum horizontal stride is 4.  To solve this problem, we instead use a
> > stride of <8;1,0>.  As noted in the comment, this does not work as a
> > destination but that's ok as very few things actually generate that
> > stride.
>
> Please put the tests you fixed in the commit message. It's not okay to
> leave that out for all the reasons that I'm sure you know.
>

I didn't because the test passes before and after the patch.  I guess I
could have included that information though.


> Looks like this doesn't work on CHV, BXT, GLK :(
>
> KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks now fails on CHV,
> BXT, GLK with:
>
> mov(8)  g21<1>UQg19<8,1,0>UB{ align1
> 1Q };
> ERROR: Source and destination horizontal stride must equal and
> a multiple of a qword when the execution type is 64-bit
> ERROR: Vstride must be Width * Hstride when the execution type is
> 64-bit
>
> Modulo the typo in the first error, I think both of these are correct.
> I don't think we can extract_u8 to a 64-bit type on Atom :(
>

That's unfortunate...  Quickly racking my brain, I don't see a slick way to
implement that opcode.  How would you feel about some late opt_algebraic
lowering?


> This is filed as https://bugs.freedesktop.org/show_bug.cgi?id=103628
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Use a pure vertical stride for large register strides

2017-11-09 Thread Matt Turner
On Thu, Nov 2, 2017 at 3:54 PM, Jason Ekstrand  wrote:
> Register strides higher than 4 are uncommon but they can happen.  For
> instance, if you have a 64-bit extract_u8 operation, we turn that into
> UB -> UQ MOV with a source stride of 8.  Our previous calculation would
> try to generate a stride of <32;8,8>:ub which is invalid because the
> maximum horizontal stride is 4.  To solve this problem, we instead use a
> stride of <8;1,0>.  As noted in the comment, this does not work as a
> destination but that's ok as very few things actually generate that
> stride.

Please put the tests you fixed in the commit message. It's not okay to
leave that out for all the reasons that I'm sure you know.

Looks like this doesn't work on CHV, BXT, GLK :(

KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks now fails on CHV,
BXT, GLK with:

mov(8)  g21<1>UQg19<8,1,0>UB{ align1 1Q };
ERROR: Source and destination horizontal stride must equal and
a multiple of a qword when the execution type is 64-bit
ERROR: Vstride must be Width * Hstride when the execution type is 64-bit

Modulo the typo in the first error, I think both of these are correct.
I don't think we can extract_u8 to a 64-bit type on Atom :(

This is filed as https://bugs.freedesktop.org/show_bug.cgi?id=103628
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Use a pure vertical stride for large register strides

2017-11-06 Thread Jason Ekstrand
On Mon, Nov 6, 2017 at 4:10 AM, Samuel Iglesias Gonsálvez <
sigles...@igalia.com> wrote:

>
> On Thu, 2017-11-02 at 15:54 -0700, Jason Ekstrand wrote:
> > Register strides higher than 4 are uncommon but they can happen.  For
> > instance, if you have a 64-bit extract_u8 operation, we turn that
> > into
> > UB -> UQ MOV with a source stride of 8.  Our previous calculation
> > would
> > try to generate a stride of <32;8,8>:ub which is invalid because the
> > maximum horizontal stride is 4.  To solve this problem, we instead
> > use a
> > stride of <8;1,0>.  As noted in the comment, this does not work as a
> > destination but that's ok as very few things actually generate that
> > stride.
> >
>
> Great!
>
> > Cc: mesa-sta...@lists.freedesktop.org
> > ---
> >  src/intel/compiler/brw_fs_generator.cpp | 15 ---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_fs_generator.cpp
> > b/src/intel/compiler/brw_fs_generator.cpp
> > index 46f9a33..a557f80 100644
> > --- a/src/intel/compiler/brw_fs_generator.cpp
> > +++ b/src/intel/compiler/brw_fs_generator.cpp
> > @@ -90,9 +90,18 @@ brw_reg_from_fs_reg(const struct gen_device_info
> > *devinfo, fs_inst *inst,
> >*   different execution size when the number of
> > components
> >*   written to each destination GRF is not the same.
> >*/
> > - const unsigned width = MIN2(reg_width, phys_width);
> > - brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg-
> > >nr, 0);
> > - brw_reg = stride(brw_reg, width * reg->stride, width, reg-
> > >stride);
> > + if (reg->stride > 4) {
> > +/* For registers with an exceptionally large stride, we
> > use a
> > + * width of 1 and only use the vertical stride.  This
> > only works
> > + * for sources since destinations require hstride == 1.
> > + */
> > +brw_reg = brw_vec1_reg(brw_file_from_reg(reg), reg->nr,
> > 0);
> > +brw_reg = stride(brw_reg, reg->stride, 1, 0);
>
> I think it is a good idea to add an assert like:
>
>assert(reg != >dst)
>
> in order to avoid applying this to dst.
>

We already have an assert that triggers, but this is more direct.  I'll add
it.


> With or without that change,
>
> Reviewed-by: Samuel Iglesias Gonsálvez 
>

Thanks!


> Sam
>
> > + } else {
> > +const unsigned width = MIN2(reg_width, phys_width);
> > +brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg),
> > reg->nr, 0);
> > +brw_reg = stride(brw_reg, width * reg->stride, width,
> > reg->stride);
> > + }
> >
> >   if (devinfo->gen == 7 && !devinfo->is_haswell) {
> >  /* From the IvyBridge PRM (EU Changes by Processor
> > Generation, page 13):
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Use a pure vertical stride for large register strides

2017-11-06 Thread Samuel Iglesias Gonsálvez

On Thu, 2017-11-02 at 15:54 -0700, Jason Ekstrand wrote:
> Register strides higher than 4 are uncommon but they can happen.  For
> instance, if you have a 64-bit extract_u8 operation, we turn that
> into
> UB -> UQ MOV with a source stride of 8.  Our previous calculation
> would
> try to generate a stride of <32;8,8>:ub which is invalid because the
> maximum horizontal stride is 4.  To solve this problem, we instead
> use a
> stride of <8;1,0>.  As noted in the comment, this does not work as a
> destination but that's ok as very few things actually generate that
> stride.
> 

Great!

> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/compiler/brw_fs_generator.cpp | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs_generator.cpp
> b/src/intel/compiler/brw_fs_generator.cpp
> index 46f9a33..a557f80 100644
> --- a/src/intel/compiler/brw_fs_generator.cpp
> +++ b/src/intel/compiler/brw_fs_generator.cpp
> @@ -90,9 +90,18 @@ brw_reg_from_fs_reg(const struct gen_device_info
> *devinfo, fs_inst *inst,
>*   different execution size when the number of
> components
>*   written to each destination GRF is not the same.
>*/
> - const unsigned width = MIN2(reg_width, phys_width);
> - brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg-
> >nr, 0);
> - brw_reg = stride(brw_reg, width * reg->stride, width, reg-
> >stride);
> + if (reg->stride > 4) {
> +/* For registers with an exceptionally large stride, we
> use a
> + * width of 1 and only use the vertical stride.  This
> only works
> + * for sources since destinations require hstride == 1.
> + */
> +brw_reg = brw_vec1_reg(brw_file_from_reg(reg), reg->nr,
> 0);
> +brw_reg = stride(brw_reg, reg->stride, 1, 0);

I think it is a good idea to add an assert like:

   assert(reg != >dst)

in order to avoid applying this to dst.

With or without that change,

Reviewed-by: Samuel Iglesias Gonsálvez 

Sam

> + } else {
> +const unsigned width = MIN2(reg_width, phys_width);
> +brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg),
> reg->nr, 0);
> +brw_reg = stride(brw_reg, width * reg->stride, width,
> reg->stride);
> + }
>  
>   if (devinfo->gen == 7 && !devinfo->is_haswell) {
>  /* From the IvyBridge PRM (EU Changes by Processor
> Generation, page 13):

signature.asc
Description: This is a digitally signed message part
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] intel/fs: Use a pure vertical stride for large register strides

2017-11-02 Thread Jason Ekstrand
Register strides higher than 4 are uncommon but they can happen.  For
instance, if you have a 64-bit extract_u8 operation, we turn that into
UB -> UQ MOV with a source stride of 8.  Our previous calculation would
try to generate a stride of <32;8,8>:ub which is invalid because the
maximum horizontal stride is 4.  To solve this problem, we instead use a
stride of <8;1,0>.  As noted in the comment, this does not work as a
destination but that's ok as very few things actually generate that
stride.

Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/compiler/brw_fs_generator.cpp | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_fs_generator.cpp 
b/src/intel/compiler/brw_fs_generator.cpp
index 46f9a33..a557f80 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -90,9 +90,18 @@ brw_reg_from_fs_reg(const struct gen_device_info *devinfo, 
fs_inst *inst,
   *   different execution size when the number of components
   *   written to each destination GRF is not the same.
   */
- const unsigned width = MIN2(reg_width, phys_width);
- brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
- brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
+ if (reg->stride > 4) {
+/* For registers with an exceptionally large stride, we use a
+ * width of 1 and only use the vertical stride.  This only works
+ * for sources since destinations require hstride == 1.
+ */
+brw_reg = brw_vec1_reg(brw_file_from_reg(reg), reg->nr, 0);
+brw_reg = stride(brw_reg, reg->stride, 1, 0);
+ } else {
+const unsigned width = MIN2(reg_width, phys_width);
+brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
+brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
+ }
 
  if (devinfo->gen == 7 && !devinfo->is_haswell) {
 /* From the IvyBridge PRM (EU Changes by Processor Generation, 
page 13):
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev