Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2017-01-03 Thread Samuel Iglesias Gonsálvez
Hello Matt,

We have pushed all the patches except the last one...

> * i965/gen7: expose OpenGL 4.0 on Haswell
> 
>   We are currently discussing it with Curro :-)
> 

We plan to send another patch series with the needed changes to enable
OpenGL 4.0 on Haswell and all the suggestions we got from Kenneth and
Curro.

Thanks for the reviews! :-)

Sam
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-22 Thread Samuel Iglesias Gonsálvez
On Thu, 2016-12-22 at 12:13 -0600, Matt Turner wrote:
> On Tue, Dec 13, 2016 at 2:01 AM, Samuel Iglesias Gonsálvez
>  wrote:
> > On Mon, 2016-12-05 at 15:21 -0800, Matt Turner wrote:
> > > i965/vec4: add a helper function to create double immediates
> > > 
> > >   Can leave for later: Shouldn't we use the DIM instruction (on
> > >   HSW)?
> > > 
> > >   I'm not sure if this should be fixed now or later, but shouldn't
> > >   we use NibCtrl on these two instructions instead of
> > >   force_writemask_all? I think this is a case where NibCtrl is
> > >   useful.
> > > 
> > 
> > Yes, we can use DIM instruction here. We are going to write a
> > follow-up patch for it.
> 
> I noticed a bug in "i965/fs: emit DIM instruction to load 64-bit
> immediates in HSW"
> 
> You want
> 
> -  const fs_builder ubld = bld.exec_all();
> +  const fs_builder ubld = bld.exec_all().group(1, 0);
> 
> otherwise, DIM instructions will be emitted with the default exec size
> -- dim(16) in some cases, which is not legal.
> 

Oh, good catch. I am going to write the patch.

Thanks!

Sam



Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-22 Thread Matt Turner
On Tue, Dec 13, 2016 at 2:01 AM, Samuel Iglesias Gonsálvez
 wrote:
> On Mon, 2016-12-05 at 15:21 -0800, Matt Turner wrote:
>> i965/vec4: add a helper function to create double immediates
>>
>>   Can leave for later: Shouldn't we use the DIM instruction (on
>>   HSW)?
>>
>>   I'm not sure if this should be fixed now or later, but shouldn't
>>   we use NibCtrl on these two instructions instead of
>>   force_writemask_all? I think this is a case where NibCtrl is
>>   useful.
>>
>
> Yes, we can use DIM instruction here. We are going to write a follow-up
> patch for it.

I noticed a bug in "i965/fs: emit DIM instruction to load 64-bit
immediates in HSW"

You want

-  const fs_builder ubld = bld.exec_all();
+  const fs_builder ubld = bld.exec_all().group(1, 0);

otherwise, DIM instructions will be emitted with the default exec size
-- dim(16) in some cases, which is not legal.
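
As a rough illustration of why the extra group(1, 0) matters, here is a
toy model of a builder in which exec_all() keeps the current dispatch
width while group(n, i) narrows it. ToyBuilder, its fields and
emitted_exec_size() are made up for this sketch; they only mimic the
shape of the calls in the diff above and are not Mesa's fs_builder.

```cpp
#include <cassert>

// Hypothetical stand-in for an IR builder; not Mesa's fs_builder.
struct ToyBuilder {
    unsigned dispatch_width;  // channels an emitted instruction would cover
    unsigned first_channel;   // first channel of the current group
    bool force_all;           // true once exec_all() was requested

    // Ignore the execution mask, but keep the current width.
    ToyBuilder exec_all() const {
        return ToyBuilder{dispatch_width, first_channel, true};
    }

    // Select the i-th group of n channels inside the current group.
    ToyBuilder group(unsigned n, unsigned i) const {
        assert(n > 0 && n * (i + 1) <= dispatch_width);
        return ToyBuilder{n, first_channel + n * i, force_all};
    }
};

// Width a DIM-like instruction would be emitted with by this builder.
unsigned emitted_exec_size(const ToyBuilder &bld) {
    return bld.dispatch_width;
}
```

In this model, a SIMD16 builder followed only by exec_all() still emits
16-wide instructions, while adding group(1, 0) brings the width down to 1.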


Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-21 Thread Matt Turner
On Wed, Dec 21, 2016 at 10:01 AM, Matt Turner  wrote:
> On Tue, Oct 11, 2016 at 4:01 AM, Iago Toral Quiroga  wrote:
>>   i965/disasm: fix subreg for dst in Align16 mode
>
> I just noticed that this commit has a rebase mistake. Tim changed the
> code in July to use PRIu64, but this patch reverts back to %u.

Sorry, disregard that. I see that the type of the expression actually
changed, so %u is correct.
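
For readers not familiar with the PRIu64 vs %u point: the format
specifier has to follow the type of the expression being printed. The
sketch below is a generic example and is not taken from the disassembler;
format_u64() and format_uint() are names invented here.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdio>
#include <string>

// Format a 64-bit unsigned value; PRIu64 expands to the matching specifier.
std::string format_u64(std::uint64_t value) {
    char buf[32];
    int n = std::snprintf(buf, sizeof(buf), "%" PRIu64, value);
    assert(n > 0 && n < static_cast<int>(sizeof(buf)));
    (void)n;
    return buf;
}

// Format a plain unsigned value; %u is the matching specifier here.
std::string format_uint(unsigned value) {
    char buf[16];
    int n = std::snprintf(buf, sizeof(buf), "%u", value);
    assert(n > 0 && n < static_cast<int>(sizeof(buf)));
    (void)n;
    return buf;
}
```

If the type of an expression changes from uint64_t to unsigned, as
happened in the patch under review, the specifier has to change with it.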


Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-21 Thread Matt Turner
On Tue, Oct 11, 2016 at 4:01 AM, Iago Toral Quiroga  wrote:
>   i965/disasm: fix subreg for dst in Align16 mode

I just noticed that this commit has a rebase mistake. Tim changed the
code in July to use PRIu64, but this patch reverts back to %u.


Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-19 Thread Samuel Iglesias Gonsálvez
On Mon, 2016-12-19 at 11:31 -0600, Matt Turner wrote:
> On Mon, Dec 19, 2016 at 2:00 AM, Samuel Iglesias Gonsálvez
>  wrote:
> > Hello Matt,
> > 
> > We have done most of the suggestions you made to our patches.
> > However, we have replied to some of your questions/suggestions and
> > we are waiting for a reply before marking them as R-b or not.
> 
> Thank you guys so much.
> 
> > You can clone the new version of the patch series by running this
> > command:
> > 
> > $ git clone -b i965-fp64-gen7-scalar-vec4-rc3 https://github.com/Igalia/mesa.git
> > 
> > Below is the list of patches that need a R-b (they are marked as
> > UNREVIEWED in the branch).
> > 
> > * i965/vec4: implement hardware workaround for align16 double to
> > float conversion
> > > 
> > >   This always seemed like a really strange hardware bug, and one
> > >   that no one should ever hit.
> > > 
> > >   I'd prefer that, instead of loading an immediate double and then
> > >   performing a conversion to float, that we just convert the
> > >   double to float in the compiler and emit an instruction to load
> > >   that.
> > > 
> > 
> >   We have done this. Does this change get your R-b?
> 
> Yes!
> 
> > 
> > * i965/vec4: fix optimize predicate for doubles
> > 
> >   We have replied here [0].
> 
> Sounds good to me.
> 
> > 
> > * i965/vec4: handle 32 and 64 bit channels in liveness analysis
> > 
> >   It is still unreviewed. Maybe Curro can take a look at it.
> 
> I've also pinged Curro to ask if he'll review it.
> 
> > * i965/vec4: add a SIMD lowering pass
> > 
> >   Replied here [1].
> 
> Silly messy hardware. :)
> 
> > * i965/vec4: Prevent copy propagation from violating pre-gen8
> > restrictions
> > 
> >   Replied here [1].
> > 
> > * i965/vec4: run scalarize_df() after spilling
> > 
> >   Replied here [1].
> 
> Makes sense.
> 
> Yes, all of those should be
> 
> Reviewed-by: Matt Turner 
> 
> Again, thank you so much. This was a large amount of work, and the
> way you guys handled it was extremely impressive. I'm only sorry that
> the review of your work wasn't executed as well as your actual work!
> 

Thanks to you for the review! :-)

Sam



Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-19 Thread Matt Turner
On Mon, Dec 19, 2016 at 2:00 AM, Samuel Iglesias Gonsálvez
 wrote:
> Hello Matt,
>
> We have done most of the suggestions you made to our patches. However,
> we have replied to some of your questions/suggestions and we are
> waiting for a reply before marking them as R-b or not.

Thank you guys so much.

> You can clone the new version of the patch series by running this
> command:
>
> $ git clone -b i965-fp64-gen7-scalar-vec4-rc3 https://github.com/Igalia/mesa.git
>
> Below is the list of patches that need a R-b (they are marked as
> UNREVIEWED in the branch).
>
> * i965/vec4: implement hardware workaround for align16 double to float
> conversion
>>
>>   This always seemed like a really strange hardware bug, and one
>>   that no one should ever hit.
>>
>>   I'd prefer that, instead of loading an immediate double and then
>>   performing a conversion to float, that we just convert the
>>   double to float in the compiler and emit an instruction to load
>>   that.
>>
>
>   We have done this. Does this change get your R-b?

Yes!
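
For reference, the idea of converting the immediate in the compiler can
be sketched as below. fold_double_imm_to_float_bits() is a hypothetical
helper, not the code in the branch, and it assumes the host conversion
rounds the same way the hardware would.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fold a double immediate into the bit pattern of the
// float that the conversion would produce, so that only a 32-bit immediate
// load has to be emitted.
std::uint32_t fold_double_imm_to_float_bits(double imm) {
    float f = static_cast<float>(imm);  // conversion done at compile time
    std::uint32_t bits;
    assert(sizeof(bits) == sizeof(f));  // sanity check: float is 32 bits
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}
```

With this approach no double-to-float conversion instruction is emitted
for immediates, so the workaround is not needed for that case.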

>
> * i965/vec4: fix optimize predicate for doubles
>
>   We have replied here [0].

Sounds good to me.

>
> * i965/vec4: handle 32 and 64 bit channels in liveness analysis
>
>   It is still unreviewed. Maybe Curro can take a look at it.

I've also pinged Curro to ask if he'll review it.

> * i965/vec4: add a SIMD lowering pass
>
>   Replied here [1].

Silly messy hardware. :)

> * i965/vec4: Prevent copy propagation from violating pre-gen8
> restrictions
>
>   Replied here [1].
>
> * i965/vec4: run scalarize_df() after spilling
>
>   Replied here [1].

Makes sense.

Yes, all of those should be

Reviewed-by: Matt Turner 

Again, thank you so much. This was a large amount of work, and the way
you guys handled it was extremely impressive. I'm only sorry that the
review of your work wasn't executed as well as your actual work!


Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-19 Thread Samuel Iglesias Gonsálvez
Hello Matt,

We have applied most of the suggestions you made to our patches. For
some of your questions/suggestions, however, we have replied and are
waiting for your answer before marking those patches as R-b or not.

You can clone the new version of the patch series by running this
command:

$ git clone -b i965-fp64-gen7-scalar-vec4-rc3 https://github.com/Igalia/mesa.git

Below is the list of patches that need a R-b (they are marked as
UNREVIEWED in the branch).

* i965/vec4: implement hardware workaround for align16 double to float
conversion
> 
>   This always seemed like a really strange hardware bug, and one
>   that no one should ever hit.
> 
>   I'd prefer that, instead of loading an immediate double and then
>   performing a conversion to float, that we just convert the
>   double to float in the compiler and emit an instruction to load
>   that.
> 

  We have done this. Does this change get your R-b?

* i965/vec4: fix optimize predicate for doubles

  We have replied here [0].

* i965/vec4: handle 32 and 64 bit channels in liveness analysis

  It is still unreviewed. Maybe Curro can take a look at it.

* i965/vec4: add a SIMD lowering pass

  Replied here [1].

* i965/vec4: Prevent copy propagation from violating pre-gen8
restrictions

  Replied here [1].

* i965/vec4: run scalarize_df() after spilling

  Replied here [1].

* i965/gen7: expose OpenGL 4.0 on Haswell

  We are currently discussing it with Curro :-)

Thanks!

Sam

[0] https://lists.freedesktop.org/archives/mesa-dev/2016-December/138151.html
[1] https://lists.freedesktop.org/archives/mesa-dev/2016-December/138160.html



Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-13 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> On Tue, 2016-12-13 at 09:01 +0100, Samuel Iglesias Gonsálvez wrote:
>> 
> [...]
>> > i965/vec4/nir: implement double comparisons
>> > 
>> >Trivial: A newline before the if() would be nice.
>> > 
>> >I have a memory of Curro telling me that the hardware maps each
>> >32-bit chunk in the dst to a single bit in the flag register.
>> >Maybe that's only on IVB, and maybe I'm misremembering. I'm
>> >concerned that while the PICK_LOW+MOV will properly handle the
>> >result that is written to the destination, the result written to
>> >the flag register might be incorrect.
>> > 
>> >My commit d9b09f8a30 fixed some problems that seems similar in
>> >my mind.
>> > 
>> 
>> As far as we know that is not what happens, and the flag register has
>> one bit for each logical channel (so each 64-bit chunk for DF
>> instructions). If that were not the case, I'd expect a lot of the
>> tests for doubles to fail or at least non-uniform control-flow
>> scenarios to fail, for which we have specific tests that are passing
>> just fine in both haswell and ivybridge. We will try to double-check
>> with Curro just in case though.
>> 
>
> We have just found an old email from Curro saying that it works as we
> think (one bit per logical channel). Maybe Curro wants to confirm it (I
> added him on Cc).
>

The only case I can recall where a DF instruction will interpret the
flag register incorrectly (as two bits per channel instead of one bit
per channel) only affects the decompression logic of the SEL
instruction, and only in Align16 mode (sigh...) -- I don't think your
double float comparison code will be affected.
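
To make the "one bit per logical channel" model concrete, here is a small
sketch: a comparison over four DF channels produces four flag bits, not
eight. cmp_less_flags() is a name invented for this example and models
only the behaviour described in this thread, not the hardware itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: compare a < b per logical channel and return a flag
// mask with one bit per channel, even though each DF channel is 64 bits wide.
template <std::size_t N>
std::uint32_t cmp_less_flags(const std::array<double, N> &a,
                             const std::array<double, N> &b) {
    static_assert(N <= 32, "flag mask in this sketch holds at most 32 channels");
    std::uint32_t flags = 0;
    for (std::size_t ch = 0; ch < N; ++ch) {
        if (a[ch] < b[ch])
            flags |= std::uint32_t{1} << ch;  // bit index == channel index
    }
    return flags;
}
```

The Align16 SEL case mentioned above would correspond to reading this
mask as if it held two bits per channel, which this sketch does not model.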

> Sam




Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-13 Thread Iago Toral
On Tue, 2016-12-13 at 13:22 +0100, Samuel Iglesias Gonsálvez wrote:
> On Sun, 2016-12-11 at 15:00 -0800, Matt Turner wrote:
> > 
> > i965/vec4: handle 32 and 64 bit channels in liveness analysis
> > 
> > Please indent the returned multiline expressions in
> > var_from_reg() like we do elsewhere, so that the second line
> > begins on the same column as the first line.
> > 
> > */ goes on its own line.
> > 
> > I'm having a hard time reviewing this one. The logic is rather
> > complex. I'll ask someone to help me review it on Monday at the
> > office.
> > 
> OK, no problem!

I think for this one you want to ping Curro, he reviewed parts of it
when Juan wrote it and he suggested changes so he is familiar with the
solution.

Iago

> > 
> > i965/vec4: add a horiz_offset() helper
> > i965: move the group field from fs_inst to backend_instruction.
> > i965/vec4: add a SIMD lowering pass
> > 
> > In the commit message, you say
> > 
> > > For now the pass only handles the gen7 restriction where any
> > > instruction that writes 2 registers also needs to read 2
> > > registers.  This affects double-precision instructions
> > > reading uniforms, for example. Later patches will extend the
> > > lowering pass adding a few more cases.
> > 
> > But the rule about if-writing-two-regs, must-read-two-regs
> > says that scalar sources are an exception:
> > 
> > "When source is scalar, the source registers are not
> >  incremented."
> > 
> > I don't see any code that allows us to avoid splitting an
> > instruction if it's writing two registers but sourcing a scalar
> > uniform. Maybe this doesn't apply because we have to use a non
> > scalar swizzle (.xy) to access a single fp64 component?
> > 
> Right, however, this is necessary in align16 because uniforms are
> vectors (hstride != 0) so they are not scalars.
> 
> > 
> > i965/vec4: make the generator set correct NibCtrl for SIMD4 DF
> > instructions
> > i965/vec4: dump NibCtrl for instructions with execsize != 8
> > i965/disasm: print NibCtrl for instructions with execsize < 8
> > i965/vec4: teach CSE about exec_size, group and doubles
> > i965/vec4: teach cmod propagation about different execution sizes
> > i965/vec4: split double-precision bcsel
> > 
> > bcsel is the NIR opcode. I'd change references to bcsel to SEL.
> > 
> > Very interesting find...
> > 
> > i965/vec4: add a scalarization pass for double-precision
> > instructions
> > 
> > Don't indent case inside a switch.
> > 
> > i965/vec4: translate 64-bit swizzles to 32-bit
> > i965/vec4: implement access to DF source components Z/W
> > 
> > Wow, bien hecho!
> > 
> > i965/disasm: fix subreg for dst in Align16 mode
> > i965/vec4: teach register coalescing about 64-bit
> > i965/vec4: fix pack_uniform_registers for doubles
> > i965/vec4: fix indentation in pack_uniform_registers
> > i965/vec4: Skip swizzle to subnr in 3src instructions with DF
> > operands
> > 
> > s/need/needs/ in the comment.
> > 
> > i965/vec4/nir: do not emit 64-bit MAD
> > i965/vec4: do not emit 64-bit MAD
> > 
> > I might change the name of this commit to "i965/vec4: Lower
> > 64-bit MAD" or "i965/vec4: Lower DF MAD"
> > 
> > I think I'd change the name of the function as well, maybe to
> > lower_64bit_mad[_to_mul_add] or something.
> > 
> OK, we will do the rename.
> 
> > 
> > i965/vec4: support multiple dispatch widths and groups in the IR
> > builder.
> > i965/vec4: Add a shuffle_64bit_data helper
> > 
> > > I was initially confused by r0.0:DF/r0.1:DF, thinking that .1 in
> > > r0.1:DF was a subreg offset. But I think it's actually the
> > > register offset (i.e., .offset)?
> > 
> > If that's the case, I think it would be clearer just to
> > increment the register number in the example:
> > 
> > r0.0:DF  x0 y0 z0 w0
> > r1.0:DF  x1 y1 z1 w1
> > 
> > s/opperation/operation/ in the comment.
> > 
> > On the multiline bld.group(...), I think Curro's style is to
> > align with the '.'. For instance,
> > 
> > > inst = bld.group(4, for_write ? 1 : 0)
> > >           .MOV(writemask(dst, WRITEMASK_ZW),
> > >                swizzle(byte_offset(src, REG_SIZE), BRW_SWIZZLE_XYXY));
> > 
> > so that group and MOV align, with the '.' on the same line as
> > the MOV.
> > 
> Regarding the example, yes, you are right. We are going to fix it.
> Thanks for the rest of suggestions, we will do them too :-)
> 
> 
> > 
> > i965/vec4: Fix UBO loads for 64-bit data
> > i965/vec4: Fix SSBO loads for 64-bit data
> > i965/vec4: Fix SSBO stores for 64-bit data
> > i965/vec4: don't constant propagate 64-bit immediates
> > > i965/vec4: prevent copy-propagation from values with a different
> > > type size
> > i965/vec4: Prevent copy propagation from violating pre-gen8
> > restrictions
> > 
> > Similar comment as before about being allowed to write 

Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-13 Thread Samuel Iglesias Gonsálvez
On Sun, 2016-12-11 at 15:00 -0800, Matt Turner wrote:
> i965/vec4: handle 32 and 64 bit channels in liveness analysis
> 
>   Please indent the returned multiline expressions in
>   var_from_reg() like we do elsewhere, so that the second line
>   begins on the same column as the first line.
> 
>   */ goes on its own line.
> 
>   I'm having a hard time reviewing this one. The logic is rather
>   complex. I'll ask someone to help me review it on Monday at the
>   office.
> 

OK, no problem!

> i965/vec4: add a horiz_offset() helper
> i965: move the group field from fs_inst to backend_instruction.
> i965/vec4: add a SIMD lowering pass
> 
>   In the commit message, you say
> 
>   For now the pass only handles the gen7 restriction where any
>   instruction that writes 2 registers also needs to read 2
>   registers.  This affects double-precision instructions
>   reading uniforms, for example. Later patches will extend the
>   lowering pass adding a few more cases.
> 
>   But the rule about if-writing-two-regs, must-read-two-regs
>   says that scalar sources are an exception:
> 
>   "When source is scalar, the source registers are not
>    incremented."
> 
>   I don't see any code that allows us to avoid splitting an
>   instruction if it's writing two registers but sourcing a scalar
>   uniform. Maybe this doesn't apply because we have to use a non
>   scalar swizzle (.xy) to access a single fp64 component?
> 

Right. However, the split is still necessary in align16 because
uniforms are accessed as vectors (hstride != 0), so they are not scalar
sources.
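
As a sketch of the distinction: a source only counts as scalar when its
region is <0;1,0>. The Region struct and is_scalar_region() below are
hypothetical, and using <0;4,1> to stand for an align16 uniform access is
an assumption of this example.

```cpp
#include <cassert>

// Hypothetical description of a source region <vstride;width,hstride>.
struct Region {
    unsigned vstride;
    unsigned width;
    unsigned hstride;
};

// A region is scalar when every channel reads the same element: <0;1,0>.
bool is_scalar_region(const Region &r) {
    assert(r.width > 0);
    return r.vstride == 0 && r.width == 1 && r.hstride == 0;
}
```

Under that assumption, is_scalar_region(Region{0, 4, 1}) is false, so the
scalar-source exception in the PRM does not apply.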

> i965/vec4: make the generator set correct NibCtrl for SIMD4 DF
> instructions
> i965/vec4: dump NibCtrl for instructions with execsize != 8
> i965/disasm: print NibCtrl for instructions with execsize < 8
> i965/vec4: teach CSE about exec_size, group and doubles
> i965/vec4: teach cmod propagation about different execution sizes
> i965/vec4: split double-precision bcsel
> 
>   bcsel is the NIR opcode. I'd change references to bcsel to SEL.
> 
>   Very interesting find...
> 
> i965/vec4: add a scalarization pass for double-precision instructions
> 
>   Don't indent case inside a switch.
> 
> i965/vec4: translate 64-bit swizzles to 32-bit
> i965/vec4: implement access to DF source components Z/W
> 
>   Wow, bien hecho!
> 
> i965/disasm: fix subreg for dst in Align16 mode
> i965/vec4: teach register coalescing about 64-bit
> i965/vec4: fix pack_uniform_registers for doubles
> i965/vec4: fix indentation in pack_uniform_registers
> i965/vec4: Skip swizzle to subnr in 3src instructions with DF
> operands
> 
>   s/need/needs/ in the comment.
> 
> i965/vec4/nir: do not emit 64-bit MAD
> i965/vec4: do not emit 64-bit MAD
> 
>   I might change the name of this commit to "i965/vec4: Lower
>   64-bit MAD" or "i965/vec4: Lower DF MAD"
> 
>   I think I'd change the name of the function as well, maybe to
>   lower_64bit_mad[_to_mul_add] or something.
> 

OK, we will do the rename.

> i965/vec4: support multiple dispatch widths and groups in the IR
> builder.
> i965/vec4: Add a shuffle_64bit_data helper
> 
>   I was initially confused by r0.0:DF/r0.1:DF, thinking that .1 in
>   r0.1:DF was a subreg offset. But I think it's actually the
>   register offset (i.e., .offset)?
> 
>   If that's the case, I think it would be clearer just to
>   increment the register number in the example:
> 
>   r0.0:DF  x0 y0 z0 w0
>   r1.0:DF  x1 y1 z1 w1
> 
>   s/opperation/operation/ in the comment.
> 
>   On the multiline bld.group(...), I think Curro's style is to
>   align with the '.'. For instance,
> 
>   inst = bld.group(4, for_write ? 1 : 0)
>             .MOV(writemask(dst, WRITEMASK_ZW),
>                  swizzle(byte_offset(src, REG_SIZE), BRW_SWIZZLE_XYXY));
> 
>   so that group and MOV align, with the '.' on the same line as
>   the MOV.
> 

Regarding the example, yes, you are right. We are going to fix it.
Thanks for the rest of the suggestions, we will apply them too :-)


> i965/vec4: Fix UBO loads for 64-bit data
> i965/vec4: Fix SSBO loads for 64-bit data
> i965/vec4: Fix SSBO stores for 64-bit data
> i965/vec4: don't constant propagate 64-bit immediates
> i965/vec4: prevent copy-propagation from values with a different type
> size
> i965/vec4: Prevent copy propagation from violating pre-gen8
> restrictions
> 
>   Similar comment as before about being allowed to write two
>   registers while sourcing a scalar. Maybe doesn't apply because
>   of the double swizzle.
> 

Same reply as to the other question.

> i965/vec4: don't propagate single-precision uniforms into 4-wide
> instructions
> i965/vec4: don't copy propagate misaligned registers
> i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8
> platforms
> 
>   I don't see this in the BDW PRMs, 

Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-13 Thread Samuel Iglesias Gonsálvez
On Tue, 2016-12-13 at 09:01 +0100, Samuel Iglesias Gonsálvez wrote:
> 
[...]
> > i965/vec4/nir: implement double comparisons
> > 
> > Trivial: A newline before the if() would be nice.
> > 
> > I have a memory of Curro telling me that the hardware maps each
> > 32-bit chunk in the dst to a single bit in the flag register.
> > Maybe that's only on IVB, and maybe I'm misremembering. I'm
> > concerned that while the PICK_LOW+MOV will properly handle the
> > > result that is written to the destination, the result written to
> > > the flag register might be incorrect.
> > 
> > My commit d9b09f8a30 fixed some problems that seems similar in
> > my mind.
> > 
> 
> As far as we know that is not what happens, and the flag register has
> one bit for each logical channel (so each 64-bit chunk for DF
> instructions). If that were not the case, I'd expect a lot of the
> tests for doubles to fail or at least non-uniform control-flow
> scenarios to fail, for which we have specific tests that are passing
> just fine in both haswell and ivybridge. We will try to double-check
> with Curro just in case though.
> 

We have just found an old email from Curro saying that it works as we
think (one bit per logical channel). Maybe Curro wants to confirm it (I
added him on Cc).

Sam



Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-13 Thread Samuel Iglesias Gonsálvez
Hello Matt!

On Mon, 2016-12-05 at 15:21 -0800, Matt Turner wrote:
> On 10/11, Iago Toral Quiroga wrote:
> > It's been some time since
> 
> ... anyone has reviewed your patches. Sorry. :(
> 
> I'm going to review from your rebased i965-fp64-gen7-scalar-vec4-rc2
> branch. There have probably been some reorderings or other changes
> due to rebasing since the patches were sent, so I'm going to paste
> the list of patches below and then attempt to list any review
> comments after the patch name.
> 

Thanks a lot for the review!

> 
> A couple of patches have an extra newline in the commit message
> between *-by: tags. Would be nice to make a pass through and fix that.
> 
> 
> i965/nir: double/dvec2 uniforms only need to be padded to a single
> vec4 slot
> i965/vec4/nir: simplify glsl_type_for_nir_alu_type()
> i965/vec4/nir: allocate two registers for dvec3/dvec4
> i965/vec4/nir: Add bit-size information to types
> i965/vec4/nir: support doubles in ALU operations
> i965/vec4/nir: set the right type for 64-bit registers
> i965/vec4/nir: fix emitting 64-bit immediates
> i965/vec4: add support for printing DF immediates
> i965/vec4: add double/float conversion pseudo-opcodes
> 
>   I wonder if we should allow MOV F/DF and DF/F operations in the
>   IR and then have a lowering pass that "legalizes" them. I'm
>   happy to leave that experiment for after this series lands.
> 

That is another way of doing it, but we did not go that route because
we have not seen a clear advantage to that approach.

> i965/vec4: translate d2f/f2d
> i965: add brw_vecn_grf()
> i965/vec4: set correct register regions for 32-bit and 64-bit
> i965/disasm: align16 DF source regions have a width of 2
> 
>   It's actually kind of weird to print width and horizontal stride
>   for align16 sources, since they don't exist in the instruction
>   word. We should probably print only the vertical stride. I don't
>   care if that's fixed a part of this series.
> 

Right. We only updated it because it was already being printed.
We will make this change later in a follow-up patch.

> i965/vec4: We only support 32-bit integer ALU operations for now
> i965/vec4: add dst_null_df()
> i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes
> i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes
> 
>   If I understand correctly, these opcodes map to instructions
>   like
> 
>   mov(XXX) dst<1>UD  src<8,4,2>:UD
> 
>   Is the exec_size 4? I ask, because if it's 8 (and the source
>   region spans two registers and the dest region spans one) that's
>   not a legal instruction. If it's 4, then it's legal.
> 

The execsize is 8 for the specific case you mention, which is emitted
for VEC4_OPCODE_PICK_{HIGH,LOW}_32BIT. When the source region spans two
registers, the destination region is allowed to span a single register,
but only in specific cases:

See HSW's PRM doc, Volume 7: 3D Media GPGPU Engine (Haswell), page
948: 


A. Region Alignment Rules for Direct Register Addressing

[...]

When an instruction has a source region spanning two registers and a
destination region contained in one register the number of elements
must be the same between two sources and one of the following must be
true:
a. The destination region is entirely contained in the lower OWord of
   a register.
b. The destination region is entirely contained in the upper OWord of
   a register.
c. The destination elements are evenly split between the two OWords of
   a register.

That mov is legal because of rule c).
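
The three cases quoted from the PRM can be checked mechanically. The
sketch below is only a model of the rule as quoted: it assumes 32-byte
registers and 16-byte OWords, and dst_rule_for_two_reg_source() is a name
invented for this example.

```cpp
#include <cassert>

// Assumed sizes for this sketch: 32-byte registers, 16-byte OWords.
constexpr unsigned REG_BYTES = 32;
constexpr unsigned OWORD_BYTES = 16;

// Returns 'a', 'b' or 'c' for the rule a destination region satisfies, or
// '-' if it satisfies none. subreg_bytes is the byte offset of the first
// element inside the register; stride is in elements.
char dst_rule_for_two_reg_source(unsigned exec_size, unsigned subreg_bytes,
                                 unsigned stride, unsigned type_bytes) {
    assert(exec_size > 0 && type_bytes > 0);
    unsigned lower = 0, upper = 0;
    for (unsigned i = 0; i < exec_size; ++i) {
        unsigned offset = subreg_bytes + i * stride * type_bytes;
        if (offset + type_bytes > REG_BYTES)
            return '-';  // destination is not contained in one register
        if (offset < OWORD_BYTES)
            ++lower;
        else
            ++upper;
    }
    if (upper == 0) return 'a';
    if (lower == 0) return 'b';
    return lower == upper ? 'c' : '-';
}
```

For the mov discussed here (exec size 8, dst<1>UD starting at offset 0),
four elements land in each OWord, so the function reports case c.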

However, for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes the emitted code
is slightly different:

mov(4)  dst<2>UD   src<4,4,1>UD   { align1 1N };

Here both dst and src span one register because exec_size = 4. The
exec_size is set to 4 by the SIMD lowering pass because it detects a
dst that spans two registers and a source that only spans one, which
is not allowed.
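
A simplified model of that lowering decision follows. It assumes 32-byte
registers and regions that start at a register boundary; regs_spanned()
and lowered_exec_size() are hypothetical helpers, not the functions used
by the pass.

```cpp
#include <cassert>

constexpr unsigned GRF_BYTES = 32;  // assumed register size for this sketch

// Number of registers touched by exec_size elements of type_bytes each,
// placed stride elements apart and starting at a register boundary.
unsigned regs_spanned(unsigned exec_size, unsigned stride, unsigned type_bytes) {
    assert(exec_size > 0 && stride > 0 && type_bytes > 0);
    unsigned last_byte = (exec_size - 1) * stride * type_bytes + type_bytes - 1;
    return last_byte / GRF_BYTES + 1;
}

// Halve the execution size until the "dst spans two registers, src spans
// one" combination no longer occurs.
unsigned lowered_exec_size(unsigned exec_size, unsigned dst_stride,
                           unsigned src_stride, unsigned type_bytes) {
    while (exec_size > 1 &&
           regs_spanned(exec_size, dst_stride, type_bytes) == 2 &&
           regs_spanned(exec_size, src_stride, type_bytes) == 1) {
        exec_size /= 2;
    }
    return exec_size;
}
```

For dst<2>UD and a packed UD source at exec size 8, the destination spans
two registers and the source one, so the model settles on exec size 4.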

> i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT
> i965/vec4: don't copy propagate vector opcodes that operate in align1
> mode
> i965/vec4: implement double unpacking
> 
>   This emits 
>   
>   MOV        dvec4_tmp, op[0]
>   PICK_LO/HI uvec4_tmp, dvec4_tmp
>   MOV        dst, uvec4_tmp
>   
>   I'm confused about the purpose of the MOVs. It seems like op[0]
>   should already be a dvec and dst should already be a uvec.
> 

The opcodes used for this operate in align1 mode, so they can't handle
swizzles in the src or writemasks in the dst. That's why we need the
MOVs: the first one handles the swizzle in the src and the second one
handles the writemask in the dst.
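
The middle step of that sequence can be pictured as follows. This is a
host-side sketch of what picking the low or high 32 bits of each double
means; Halves and pick_low_high_32bit() are invented names, and the
swizzle and writemask handling done by the two MOVs is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of splitting each double of a dvec4 into 32-bit halves.
struct Halves {
    std::array<std::uint32_t, 4> low;
    std::array<std::uint32_t, 4> high;
};

Halves pick_low_high_32bit(const std::array<double, 4> &src) {
    Halves out{};
    for (std::size_t i = 0; i < src.size(); ++i) {
        std::uint64_t bits;
        assert(sizeof(bits) == sizeof(src[i]));
        std::memcpy(&bits, &src[i], sizeof(bits));  // reinterpret, no conversion
        out.low[i] = static_cast<std::uint32_t>(bits & 0xffffffffu);
        out.high[i] = static_cast<std::uint32_t>(bits >> 32);
    }
    return out;
}
```

Because the sketch takes an already swizzled source and always writes all
four components, it mirrors the restriction described above.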

> i965/vec4: implement double packing
> 
>   More or less the same thing here. Looks like we don't need all
>   of the MOVs.
> 

Same as before.

> i965/vec4/nir: implement double comparisons
> 
>   Trivial: A newline before the if() would be 

Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-11 Thread Matt Turner

i965/vec4: handle 32 and 64 bit channels in liveness analysis

Please indent the returned multiline expressions in
var_from_reg() like we do elsewhere, so that the second line
begins on the same column as the first line.

*/ goes on its own line.

I'm having a hard time reviewing this one. The logic is rather
complex. I'll ask someone to help me review it on Monday at the
office.

i965/vec4: add a horiz_offset() helper
i965: move the group field from fs_inst to backend_instruction.
i965/vec4: add a SIMD lowering pass

In the commit message, you say

For now the pass only handles the gen7 restriction where any
instruction that writes 2 registers also needs to read 2
registers.  This affects double-precision instructions
reading uniforms, for example. Later patches will extend the
lowering pass adding a few more cases.

But the rule about if-writing-two-regs, must-read-two-regs
says that scalar sources are an exception:

"When source is scalar, the source registers are not
 incremented."

I don't see any code that allows us to avoid splitting an
instruction if it's writing two registers but sourcing a scalar
uniform. Maybe this doesn't apply because we have to use a non
scalar swizzle (.xy) to access a single fp64 component?

i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions
i965/vec4: dump NibCtrl for instructions with execsize != 8
i965/disasm: print NibCtrl for instructions with execsize < 8
i965/vec4: teach CSE about exec_size, group and doubles
i965/vec4: teach cmod propagation about different execution sizes
i965/vec4: split double-precision bcsel

bcsel is the NIR opcode. I'd change references to bcsel to SEL.

Very interesting find...

i965/vec4: add a scalarization pass for double-precision instructions

Don't indent case inside a switch.

i965/vec4: translate 64-bit swizzles to 32-bit
i965/vec4: implement access to DF source components Z/W

Wow, bien hecho!

i965/disasm: fix subreg for dst in Align16 mode
i965/vec4: teach register coalescing about 64-bit
i965/vec4: fix pack_uniform_registers for doubles
i965/vec4: fix indentation in pack_uniform_registers
i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands

s/need/needs/ in the comment.

i965/vec4/nir: do not emit 64-bit MAD
i965/vec4: do not emit 64-bit MAD

I might change the name of this commit to "i965/vec4: Lower
64-bit MAD" or "i965/vec4: Lower DF MAD"

I think I'd change the name of the function as well, maybe to
lower_64bit_mad[_to_mul_add] or something.

i965/vec4: support multiple dispatch widths and groups in the IR builder.
i965/vec4: Add a shuffle_64bit_data helper

I was initially confused by r0.0:DF/r0.1:DF, thinking that .1 in
r0.1:DF was a subreg offset. But I think it's actually the
register offset (i.e., .offset)?

If that's the case, I think it would be clearer just to
increment the register number in the example:

r0.0:DF  x0 y0 z0 w0
r1.0:DF  x1 y1 z1 w1

s/opperation/operation/ in the comment.

On the multiline bld.group(...), I think Curro's style is to
align with the '.'. For instance,

inst = bld.group(4, for_write ? 1 : 0)
          .MOV(writemask(dst, WRITEMASK_ZW),
               swizzle(byte_offset(src, REG_SIZE), BRW_SWIZZLE_XYXY));

so that group and MOV align, with the '.' on the same line as
the MOV.

i965/vec4: Fix UBO loads for 64-bit data
i965/vec4: Fix SSBO loads for 64-bit data
i965/vec4: Fix SSBO stores for 64-bit data
i965/vec4: don't constant propagate 64-bit immediates
i965/vec4: prevent copy-propagation from values with a different type size
i965/vec4: Prevent copy propagation from violating pre-gen8 restrictions

Similar comment as before about being allowed to write two
registers while sourcing a scalar. Maybe doesn't apply because
of the double swizzle.

i965/vec4: don't propagate single-precision uniforms into 4-wide instructions
i965/vec4: don't copy propagate misaligned registers
i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms

I don't see this in the BDW PRMs, and the internal documentation
says that it applies to "CHV, BXT".

I suggest dropping this patch (or replacing it with one that
adds || devinfo->is_broxton).

i965/vec4: Do not use DepCtrl with 64-bit instructions
i965/vec4: do not split scratch read/write opcodes
i965/vec4: fix scratch offset for 64bit data
i965/vec4: fix scratch reads for 64bit data
i965/vec4: fix scratch writes for 64bit data
i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-bit data
i965/vec4: fix indentation in 

Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-07 Thread Iago Toral
On Mon, 2016-12-05 at 15:21 -0800, Matt Turner wrote:
> On 10/11, Iago Toral Quiroga wrote:
> > 
> > It's been some time since
> ... anyone has reviewed your patches. Sorry. :(
> 
> I'm going to review from your rebased i965-fp64-gen7-scalar-vec4-rc2
> branch. There have probably been some reorderings or other changes due
> to rebasing since the patches were sent, so I'm going to paste the list
> of patches below and then attempt to list any review comments after the
> patch name.

Hey Matt, thanks for reviewing this! I am on holidays this week but
I'll go through all your comments starting Monday next week.

Iago

> 
> A couple of patches have an extra newline in the commit message between
> *-by: tags. Would be nice to make a pass through and fix that.
> 
> 
> i965/nir: double/dvec2 uniforms only need to be padded to a single
> vec4 slot
> i965/vec4/nir: simplify glsl_type_for_nir_alu_type()
> i965/vec4/nir: allocate two registers for dvec3/dvec4
> i965/vec4/nir: Add bit-size information to types
> i965/vec4/nir: support doubles in ALU operations
> i965/vec4/nir: set the right type for 64-bit registers
> i965/vec4/nir: fix emitting 64-bit immediates
> i965/vec4: add support for printing DF immediates
> i965/vec4: add double/float conversion pseudo-opcodes
> 
>   I wonder if we should allow MOV F/DF and DF/F operations in the
>   IR and then have a lowering pass that "legalizes" them. I'm
>   happy to leave that experiment for after this series lands.
> 
> i965/vec4: translate d2f/f2d
> i965: add brw_vecn_grf()
> i965/vec4: set correct register regions for 32-bit and 64-bit
> i965/disasm: align16 DF source regions have a width of 2
> 
>   It's actually kind of weird to print width and horizontal stride
>   for align16 sources, since they don't exist in the instruction
>   word. We should probably print only the vertical stride. I don't
>   care if that's fixed as part of this series.
> 
> i965/vec4: We only support 32-bit integer ALU operations for now
> i965/vec4: add dst_null_df()
> i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes
> i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes
> 
>   If I understand correctly, these opcodes map to instructions
>   like
> 
>   mov(XXX) dst<1>:UD  src<8,4,2>:UD
> 
>   Is the exec_size 4? I ask, because if it's 8 (and the source
>   region spans two registers and the dest region spans one) that's
>   not a legal instruction. If it's 4, then it's legal.
> 
> i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT
> i965/vec4: don't copy propagate vector opcodes that operate in align1
> mode
> i965/vec4: implement double unpacking
> 
>   This emits
> 
>   MOV        dvec4_tmp, op[0]
>   PICK_LO/HI uvec4_tmp, dvec4_tmp
>   MOV        dst, uvec4_tmp
> 
>   I'm confused about the purpose of the MOVs. It seems like op[0]
>   should already be a dvec and dst should already be a uvec.
> 
> i965/vec4: implement double packing
> 
>   More or less the same thing here. Looks like we don't need all
>   of the MOVs.
> 
> i965/vec4/nir: implement double comparisons
> 
>   Trivial: A newline before the if() would be nice.
> 
>   I have a memory of Curro telling me that the hardware maps each
>   32-bit chunk in the dst to a single bit in the flag register.
>   Maybe that's only on IVB, and maybe I'm misremembering. I'm
>   concerned that while the PICK_LOW+MOV will properly handle the
>   result that is written to the destination, the result written to
>   the flag register might be incorrect.
> 
>   My commit d9b09f8a30 fixed some problems that seem similar in
>   my mind.
> 
> i965/vec4: fix indentation in get_nir_src()
> i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations
> i965/vec4: make opt_vector_float ignore doubles
> i965/vec4: fix register allocation for 64-bit undef sources
> i965/vec4: Rename DF to/from F generator opcodes
> 
>   I'm not sure replacing "float" with "single" implies that the
>   opcodes can handle other 32-bit (integer) types, since "single"
>   is actually the name of the "float" type in some other
>   programming languages.
> 
>   Maybe call them VEC4_OPCODE_TO_DOUBLE and
>   VEC4_OPCODE_FROM_DOUBLE?
> 
> i965/vec4: add helpers for conversions to/from doubles
> 
>   Same thing here.
> 
>   Also, same confusion about the purpose of the MOVs.
> 
> i965/vec4: implement hardware workaround for align16 double to float
> conversion
> 
>   This always seemed like a really strange hardware bug, and one
>   that no one should ever hit.
> 
>   I'd prefer that, instead of loading an immediate double and then
>   performing a conversion to float, we just convert the double to
>   float in the compiler and emit an instruction to load that.
> 
> i965/vec4: implement d2i, d2u, i2d and u2d
> 

Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-05 Thread Matt Turner

On 10/11, Iago Toral Quiroga wrote:
> It's been some time since

... anyone has reviewed your patches. Sorry. :(

I'm going to review from your rebased i965-fp64-gen7-scalar-vec4-rc2
branch. There have probably been some reorderings or other changes due
to rebasing since the patches were sent, so I'm going to paste the list
of patches below and then attempt to list any review comments after the
patch name.


A couple of patches have an extra newline in the commit message between
*-by: tags. Would be nice to make a pass through and fix that.


i965/nir: double/dvec2 uniforms only need to be padded to a single vec4 slot
i965/vec4/nir: simplify glsl_type_for_nir_alu_type()
i965/vec4/nir: allocate two registers for dvec3/dvec4
i965/vec4/nir: Add bit-size information to types
i965/vec4/nir: support doubles in ALU operations
i965/vec4/nir: set the right type for 64-bit registers
i965/vec4/nir: fix emitting 64-bit immediates
i965/vec4: add support for printing DF immediates
i965/vec4: add double/float conversion pseudo-opcodes

I wonder if we should allow MOV F/DF and DF/F operations in the
IR and then have a lowering pass that "legalizes" them. I'm
happy to leave that experiment for after this series lands.

i965/vec4: translate d2f/f2d
i965: add brw_vecn_grf()
i965/vec4: set correct register regions for 32-bit and 64-bit
i965/disasm: align16 DF source regions have a width of 2

It's actually kind of weird to print width and horizontal stride
for align16 sources, since they don't exist in the instruction
word. We should probably print only the vertical stride. I don't
care if that's fixed as part of this series.

i965/vec4: We only support 32-bit integer ALU operations for now
i965/vec4: add dst_null_df()
i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes
i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes

If I understand correctly, these opcodes map to instructions
like

mov(XXX) dst<1>:UD  src<8,4,2>:UD

Is the exec_size 4? I ask, because if it's 8 (and the source
region spans two registers and the dest region spans one) that's
not a legal instruction. If it's 4, then it's legal.
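
To make the arithmetic behind that question concrete, here is a rough sketch that counts how many registers a region touches. It assumes 32-byte GRFs, a region that starts at a register boundary, strides given in elements, and an exec_size that is a multiple of the width. The helper names are made up for illustration and are not part of the driver:

```cpp
#include <cassert>

// Assumption: a GRF register is 32 bytes wide.
constexpr unsigned GRF_SIZE = 32;

// Offset of the last byte touched by the final channel of a region
// <vstride, width, hstride>, with strides in elements of type_size bytes.
constexpr unsigned last_byte(unsigned exec_size, unsigned vstride,
                             unsigned width, unsigned hstride,
                             unsigned type_size)
{
    return (((exec_size - 1) / width) * vstride +
            ((exec_size - 1) % width) * hstride) * type_size +
           type_size - 1;
}

// Number of registers touched by a region starting at a register boundary.
constexpr unsigned regs_spanned(unsigned exec_size, unsigned vstride,
                                unsigned width, unsigned hstride,
                                unsigned type_size)
{
    return last_byte(exec_size, vstride, width, hstride, type_size) /
           GRF_SIZE + 1;
}

// src<8,4,2>:UD with exec_size 8 reads bytes 0..59: two registers.
static_assert(regs_spanned(8, 8, 4, 2, 4) == 2, "exec_size 8 source");
// The same region with exec_size 4 reads bytes 0..27: one register.
static_assert(regs_spanned(4, 8, 4, 2, 4) == 1, "exec_size 4 source");
// dst<1>:UD with exec_size 8, modeled here as a contiguous <8,8,1> region.
static_assert(regs_spanned(8, 8, 8, 1, 4) == 1, "exec_size 8 destination");
```

With exec_size 8 this gives a two-register source and a one-register destination, which is the combination described above as not legal. With exec_size 4 the source fits in a single register.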

i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT
i965/vec4: don't copy propagate vector opcodes that operate in align1 mode
i965/vec4: implement double unpacking

This emits

MOV        dvec4_tmp, op[0]
PICK_LO/HI uvec4_tmp, dvec4_tmp
MOV        dst, uvec4_tmp

I'm confused about the purpose of the MOVs. It seems like op[0]
should already be a dvec and dst should already be a uvec.

i965/vec4: implement double packing

More or less the same thing here. Looks like we don't need all
of the MOVs.

i965/vec4/nir: implement double comparisons

Trivial: A newline before the if() would be nice.

I have a memory of Curro telling me that the hardware maps each
32-bit chunk in the dst to a single bit in the flag register.
Maybe that's only on IVB, and maybe I'm misremembering. I'm
concerned that while the PICK_LOW+MOV will properly handle the
result that is written to the destination, the result written to
the flag register might be incorrect.

My commit d9b09f8a30 fixed some problems that seem similar in
my mind.

i965/vec4: fix indentation in get_nir_src()
i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations
i965/vec4: make opt_vector_float ignore doubles
i965/vec4: fix register allocation for 64-bit undef sources
i965/vec4: Rename DF to/from F generator opcodes

I'm not sure replacing "float" with "single" implies that the
opcodes can handle other 32-bit (integer) types, since "single"
is actually the name of the "float" type in some other
programming languages.

Maybe call them VEC4_OPCODE_TO_DOUBLE and
VEC4_OPCODE_FROM_DOUBLE?

i965/vec4: add helpers for conversions to/from doubles

Same thing here.

Also, same confusion about the purpose of the MOVs.

i965/vec4: implement hardware workaround for align16 double to float conversion

This always seemed like a really strange hardware bug, and one
that no one should ever hit.

I'd prefer that, instead of loading an immediate double and then
performing a conversion to float, we just convert the double to
float in the compiler and emit an instruction to load that.
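
A minimal sketch of what converting in the compiler could look like: do the narrowing on the host and keep only the 32-bit pattern for a float immediate. The function name is hypothetical, and the sketch assumes the host's default rounding (round-to-nearest-even) is the behaviour wanted for the immediate:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: convert a double immediate to float at compile time
// (on the host) and return the bits that a 32-bit float immediate would carry.
inline uint32_t fold_d2f_immediate(double d)
{
    const float f = static_cast<float>(d); // host-side narrowing conversion
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));  // type-pun without aliasing issues
    return bits;
}
```

For example, fold_d2f_immediate(1.0) yields 0x3f800000, which could then be loaded with a single float-immediate MOV rather than a DF load followed by a conversion.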

i965/vec4: implement d2i, d2u, i2d and u2d
i965/vec4: implement d2b

Trivial: s/Curo/Curro/ in commit message.

Trivial: The comment says "predicated MOV", but it's actually a
MOV with conditional_mod.

i965/vec4: implement fsign() for doubles

Trivial: v2 comment and comment in code say "predicated MOV",
like the previous patch.

i965/vec4: fix optimize predicate for doubles


[Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-10-11 Thread Iago Toral Quiroga
It's been some time since we sent the first version of the patches, so here is
a v2, which adds:

1. Feedback from Curro to v1. I think the only thing missing is the suggestion
to change the semantics of the offset() helper in vec4 to match those in the
scalar backend. I sent this as a separate series [1] that is still awaiting
review. Once that is good to land we should adapt this series accordingly.

2. Adaptations to the sub-register offsets work done by Curro in master.

3. Some rudimentary support for 64-bit spilling. This is quite limited at the
moment, since it skips spilling of fp64 data in a number of cases where it
is not safe to do it at present. I guess we can look for ways to improve this
going forward, but I'd rather do that after we land the bulk of fp64, since the
series is already quite big as it is.

4. Avoid scalarizing a number of swizzle combinations that we can support
natively.

5. Many other small clean-ups and fixes.

The series is available for testing in the 'i965-fp64-gen7-scalar-vec4-rc2'
branch of our github repository [2].

This series implements the bulk of the fp64 align16 backend support and creates
the infrastructure to implement vertex attrib 64bit as well, so once this lands
in master we plan to send additional series that add VA64 for Haswell, and then
fp64 and VA64 for IvyBridge.

[1] https://lists.freedesktop.org/archives/mesa-dev/2016-October/130459.html
[2] https://github.com/Igalia/mesa/tree/i965-fp64-gen7-scalar-vec4-rc2

Connor Abbott (6):
  i965/vec4/nir: simplify glsl_type_for_nir_alu_type()
  i965/vec4/nir: allocate two registers for dvec3/dvec4
  i965/vec4/nir: set the right type for 64-bit registers
  i965/vec4: add support for printing DF immediates
  i965: add brw_vecn_grf()
  i965/vec4: don't constant propagate 64-bit immediates

Iago Toral Quiroga (92):
  i965/vec4/nir: Add bit-size information to types
  i965/vec4/nir: support doubles in ALU operations
  i965/vec4/nir: fix emitting 64-bit immediates
  i965/vec4: add double/float conversion pseudo-opcodes
  i965/vec4: translate d2f/f2d
  i965: fix subnr overflow in suboffset()
  i965/vec4: set correct register regions for 32-bit and 64-bit
  i965/disasm: align16 DF source regions have a width of 2
  i965/vec4: We only support 32-bit integer ALU operations for now
  i965/vec4: add dst_null_df()
  i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes
  i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes
  i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT
  i965/vec4: don't copy propagate vector opcodes that operate in align1
mode
  i965/vec4: implement double unpacking
  i965/vec4: implement double packing
  i965/vec4/nir: implement double comparisons
  i965/vec4: fix base offset for nir_registers with doubles
  i965/vec4: fix indentation in get_nir_src()
  i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations
  i965/vec4: make opt_vector_float ignore doubles
  i965/vec4: fix register allocation for 64-bit undef sources
  i965/vec4: Rename DF to/from F generator opcodes
  i965/vec4: add helpers for conversions to/from doubles
  i965/vec4: implement hardware workaround for align16 double to float
conversion
  i965/vec4: implement d2i, d2u, i2d and u2d
  i965/vec4: implement d2b
  i965/vec4: implement fsign() for doubles
  i965/vec4: fix optimize predicate for doubles
  i965/vec4: add a helper function to create double immediates
  i965: move exec_size from fs_instruction to backend_instruction
  i965/vec4: fix size_written for doubles
  i965/vec4: fix regs_read() for doubles
  i965/vec4: use the IR's execution size
  i965/vec4: dump the instruction execution size
  i965/vec4: add a horiz_offset() helper
  i965: move the group field from fs_inst to backend_instruction.
  i965/vec4: add a SIMD lowering pass
  i965/vec4: make the generator set correct NibCtrl for SIMD4 DF
instructions
  i965/vec4: dump NibCtrl for instructions with execsize != 8
  i965/disasm: print NibCtrl for instructions with execsize < 8
  i965/vec4: teach CSE about exec_size, group and doubles
  i965/vec4: teach cmod propagation about different execution sizes
  i965/vec4: split double-precision bcsel
  i965/vec4: add a scalarization pass for double-precision instructions
  i965/vec4: translate 64-bit swizzles to 32-bit
  i965/vec4: implement access to DF source components Z/W
  i965/disasm: fix subreg for dst in Align16 mode
  i965/vec4: teach register coalescing about 64-bit
  i965/vec4: fix pack_uniform_registers for doubles
  i965/vec4: fix indentation in pack_uniform_registers
  i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands
  i965/vec4/nir: do not emit 64-bit MAD
  i965/vec4: do not emit 64-bit MAD
  i965/vec4: support multiple dispatch widths and groups in the IR
builder.
  i965/vec4: Add a shuffle_64bit_data helper
  i965/vec4: Fix UBO loads for 64-bit data
  i965/vec4: Fix SSBO loads for 64-bit data
  i965/vec4: Fix SSBO stores for 64-bit data
  i965/vec4: prevent