Re: [Mesa-dev] [PATCH 2/9] i965/fs: Register allocator shoudn't use grf127 for sends dest

2018-07-11 Thread Chema Casanova
Including mesa-dev in my previous reply.

El 11/07/18 a las 01:08, Caio Marcelo de Oliveira Filho escribió:
>> Since Gen8+ Intel PRM states that "r127 must not be used for return
>> address when there is a src and dest overlap in send instruction."
> 
> The previous patch, that verifies the condition above is causing
> 
> tests/spec/arb_compute_shader/linker/bug-93840.shader_test
> 
> to crash with
> 
> shader_runner: ../src/intel/compiler/brw_fs_generator.cpp:2455: int 
> fs_generator::generate_code(const cfg_t*, int): Assertion `validated' failed.
> 
> I also could reproduce the crash locally. It happens even with this
> patch (which adds the hack) applied.

I've seen it in Jenkins, but couldn't reproduce it so I thought it
wasn't related. Now I've realized that I was using a release build at
that moment.

The good thing is that the validator rule has detected that the
generated instruction was incorrect.


>> This patch implements this restriction creating new grf127_send_hack_node
>> at the register allocator. This node has a fixed assignation to grf127.
>>
>> For vgrf that are used as destination of send messages we create node
>> interfereces with the grf127_send_hack_node. So the register allocator
>> will never assign to these vgrf a register that involves grf127.
>>
>> If dispatch_width > 8 we don't create these interferences to the because
>> all instructions have node interferences between sources and destination.
>> That is enough to avoid the r127 restriction.
> 
> I think for both widths will not be enough. The instruction that fails
> the validation is:
> 
> mov(8)  g126<1>UD   g0<8,8,1>UD { align1 
> WE_all 1Q };
> mov(1)  g126.2<1>UD 0x0090UD{ align1 
> WE_all 1N };
> send(16)g126<1>UW   g126<8,8,1>UD
> data ( DC OWORD block read, 253, 3) mlen 1 rlen 2 
> { align1 WE_all 1H };
> ERROR: r127 must not be used for return address when there is a src 
> and dest overlap
> 
> Which if I understood correctly comes from the scratch reading being
> created by the spilling logic. In brw_oword_block_read_scratch() we
> see
> 
>if (p->devinfo->gen >= 7) {
>   /* On gen 7 and above, we no longer have message registers and we can
>* send from any register we want.  By using the destination register
>* for the message, we guarantee that the implied message write won't
>* accidentally overwrite anything.  This has been a problem because
>* the MRF registers and source for the final FB write are both fixed
>* and may overlap.
>*/
>   mrf = retype(dest, BRW_REGISTER_TYPE_UD);
>} else {
>   mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
>}
>dest = retype(dest, BRW_REGISTER_TYPE_UW);
> 
> It seems to me we'll have to handle r127 there as well.

Yes, as in this case source and destination are coded to be the same
vgrf, we don't have a source/destination interference on SIMD16.

I'm doing some extra testing but something like next code at
assigns_regs seems to fix the issue:


  if (spilled_any_registers) {
 foreach_block_and_inst(block, fs_inst, inst, cfg) {
if (inst->opcode == SHADER_OPCODE_GEN7_SCRATCH_READ ||
inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ) {
   ra_add_node_interference(g, inst->dst.nr,
grf127_send_hack_node);
}
 }

Thanks Caio for digging into the problem. I'm sending today a patch to
deal with this case.

Chema


>>
>> This fixes CTS tests that raised this issue as they were executed as SIMD8:
>>
>> dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom
>>
>> Shader-db results on Skylake:
>>total instructions in shared programs: 7686798 -> 7686797 (<.01%)
>>instructions in affected programs: 301 -> 300 (-0.33%)
>>helped: 1
>>HURT: 0
>>
>>total cycles in shared programs: 337092322 -> 337091919 (<.01%)
>>cycles in affected programs: 22420415 -> 22420012 (<.01%)
>>helped: 712
>>HURT: 588
>>
>> Shader-db results on Broadwell:
>>
>>total instructions in shared programs: 7658574 -> 7658625 (<.01%)
>>instructions in affected programs: 19610 -> 19661 (0.26%)
>>helped: 3
>>HURT: 4
>>
>>total cycles in shared programs: 340694553 -> 340676378 (<.01%)
>>cycles in affected programs: 24724915 -> 24706740 (-0.07%)
>>helped: 998
>>HURT: 916
>>
>>total spills in shared programs: 4300 -> 4311 (0.26%)
>>spills in affected programs: 333 -> 344 (3.30%)
>>helped: 1
>>HURT: 3
>>
>>total fills in shared programs: 5370 -> 5378 (0.15%)
>>fills in affected programs: 274 -> 282 (2.92%)
>>helped: 1
>>HURT: 3
>>
>> v2: Avoid duplicating register classes without grf127. Let's use a node
>> with a fixed assignation to grf127 and create interferences to send
>> message vgrf destinations. (Eric Anholt)
>> v3: 

Re: [Mesa-dev] [PATCH 2/9] i965/fs: Register allocator shoudn't use grf127 for sends dest

2018-07-10 Thread Caio Marcelo de Oliveira Filho
> Since Gen8+ Intel PRM states that "r127 must not be used for return
> address when there is a src and dest overlap in send instruction."

The previous patch, that verifies the condition above is causing

tests/spec/arb_compute_shader/linker/bug-93840.shader_test

to crash with

shader_runner: ../src/intel/compiler/brw_fs_generator.cpp:2455: int 
fs_generator::generate_code(const cfg_t*, int): Assertion `validated' failed.

I also could reproduce the crash locally. It happens even with this
patch (which adds the hack) applied.


> This patch implements this restriction creating new grf127_send_hack_node
> at the register allocator. This node has a fixed assignation to grf127.
> 
> For vgrf that are used as destination of send messages we create node
> interfereces with the grf127_send_hack_node. So the register allocator
> will never assign to these vgrf a register that involves grf127.
> 
> If dispatch_width > 8 we don't create these interferences to the because
> all instructions have node interferences between sources and destination.
> That is enough to avoid the r127 restriction.

I think for both widths will not be enough. The instruction that fails
the validation is:

mov(8)  g126<1>UD   g0<8,8,1>UD { align1 WE_all 
1Q };
mov(1)  g126.2<1>UD 0x0090UD{ align1 WE_all 
1N };
send(16)g126<1>UW   g126<8,8,1>UD
data ( DC OWORD block read, 253, 3) mlen 1 rlen 2 { 
align1 WE_all 1H };
ERROR: r127 must not be used for return address when there is a src and 
dest overlap

Which if I understood correctly comes from the scratch reading being
created by the spilling logic. In brw_oword_block_read_scratch() we
see

   if (p->devinfo->gen >= 7) {
  /* On gen 7 and above, we no longer have message registers and we can
   * send from any register we want.  By using the destination register
   * for the message, we guarantee that the implied message write won't
   * accidentally overwrite anything.  This has been a problem because
   * the MRF registers and source for the final FB write are both fixed
   * and may overlap.
   */
  mrf = retype(dest, BRW_REGISTER_TYPE_UD);
   } else {
  mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
   }
   dest = retype(dest, BRW_REGISTER_TYPE_UW);

It seems to me we'll have to handle r127 there as well.


Thanks,
Caio



> 
> This fixes CTS tests that raised this issue as they were executed as SIMD8:
> 
> dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom
> 
> Shader-db results on Skylake:
>total instructions in shared programs: 7686798 -> 7686797 (<.01%)
>instructions in affected programs: 301 -> 300 (-0.33%)
>helped: 1
>HURT: 0
> 
>total cycles in shared programs: 337092322 -> 337091919 (<.01%)
>cycles in affected programs: 22420415 -> 22420012 (<.01%)
>helped: 712
>HURT: 588
> 
> Shader-db results on Broadwell:
> 
>total instructions in shared programs: 7658574 -> 7658625 (<.01%)
>instructions in affected programs: 19610 -> 19661 (0.26%)
>helped: 3
>HURT: 4
> 
>total cycles in shared programs: 340694553 -> 340676378 (<.01%)
>cycles in affected programs: 24724915 -> 24706740 (-0.07%)
>helped: 998
>HURT: 916
> 
>total spills in shared programs: 4300 -> 4311 (0.26%)
>spills in affected programs: 333 -> 344 (3.30%)
>helped: 1
>HURT: 3
> 
>total fills in shared programs: 5370 -> 5378 (0.15%)
>fills in affected programs: 274 -> 282 (2.92%)
>helped: 1
>HURT: 3
> 
> v2: Avoid duplicating register classes without grf127. Let's use a node
> with a fixed assignation to grf127 and create interferences to send
> message vgrf destinations. (Eric Anholt)
> v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
> (Jose Maria Casanova)
> ---
>  src/intel/compiler/brw_fs_reg_allocate.cpp | 25 ++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
> b/src/intel/compiler/brw_fs_reg_allocate.cpp
> index ec8e116cb38..59e047483c0 100644
> --- a/src/intel/compiler/brw_fs_reg_allocate.cpp
> +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
> @@ -548,6 +548,9 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
> spill_all)
> int first_mrf_hack_node = node_count;
> if (devinfo->gen >= 7)
>node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START;
> +   int grf127_send_hack_node = node_count;
> +   if (devinfo->gen >= 8 && dispatch_width == 8)
> +  node_count ++;
> struct ra_graph *g =
>ra_alloc_interference_graph(compiler->fs_reg_sets[rsi].regs, 
> node_count);
>  
> @@ -653,6 +656,28 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
> spill_all)
>}
> }
>  
> +   if (devinfo->gen >= 8 && dispatch_width == 8) {
> +  /* At Intel Broadwell PRM, vol 07, section "Instruction Set 

Re: [Mesa-dev] [PATCH 2/9] i965/fs: Register allocator shoudn't use grf127 for sends dest

2018-07-08 Thread Jason Ekstrand
This is a very clever solution to the problem.  I like it. :-)

Reviewed-by: Jason Ekstrand 

I think that's all of them.  I've pushed the XML bump so you should be able
to land at will.

--Jason

On Sun, Jul 8, 2018 at 5:29 PM Jose Maria Casanova Crespo <
jmcasan...@igalia.com> wrote:

> Since Gen8+ Intel PRM states that "r127 must not be used for return
> address when there is a src and dest overlap in send instruction."
>
> This patch implements this restriction creating new grf127_send_hack_node
> at the register allocator. This node has a fixed assignation to grf127.
>
> For vgrf that are used as destination of send messages we create node
> interfereces with the grf127_send_hack_node. So the register allocator
> will never assign to these vgrf a register that involves grf127.
>
> If dispatch_width > 8 we don't create these interferences to the because
> all instructions have node interferences between sources and destination.
> That is enough to avoid the r127 restriction.
>
> This fixes CTS tests that raised this issue as they were executed as SIMD8:
>
>
> dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom
>
> Shader-db results on Skylake:
>total instructions in shared programs: 7686798 -> 7686797 (<.01%)
>instructions in affected programs: 301 -> 300 (-0.33%)
>helped: 1
>HURT: 0
>
>total cycles in shared programs: 337092322 -> 337091919 (<.01%)
>cycles in affected programs: 22420415 -> 22420012 (<.01%)
>helped: 712
>HURT: 588
>
> Shader-db results on Broadwell:
>
>total instructions in shared programs: 7658574 -> 7658625 (<.01%)
>instructions in affected programs: 19610 -> 19661 (0.26%)
>helped: 3
>HURT: 4
>
>total cycles in shared programs: 340694553 -> 340676378 (<.01%)
>cycles in affected programs: 24724915 -> 24706740 (-0.07%)
>helped: 998
>HURT: 916
>
>total spills in shared programs: 4300 -> 4311 (0.26%)
>spills in affected programs: 333 -> 344 (3.30%)
>helped: 1
>HURT: 3
>
>total fills in shared programs: 5370 -> 5378 (0.15%)
>fills in affected programs: 274 -> 282 (2.92%)
>helped: 1
>HURT: 3
>
> v2: Avoid duplicating register classes without grf127. Let's use a node
> with a fixed assignation to grf127 and create interferences to send
> message vgrf destinations. (Eric Anholt)
> v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
> (Jose Maria Casanova)
> ---
>  src/intel/compiler/brw_fs_reg_allocate.cpp | 25 ++
>  1 file changed, 25 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp
> b/src/intel/compiler/brw_fs_reg_allocate.cpp
> index ec8e116cb38..59e047483c0 100644
> --- a/src/intel/compiler/brw_fs_reg_allocate.cpp
> +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
> @@ -548,6 +548,9 @@ fs_visitor::assign_regs(bool allow_spilling, bool
> spill_all)
> int first_mrf_hack_node = node_count;
> if (devinfo->gen >= 7)
>node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START;
> +   int grf127_send_hack_node = node_count;
> +   if (devinfo->gen >= 8 && dispatch_width == 8)
> +  node_count ++;
> struct ra_graph *g =
>ra_alloc_interference_graph(compiler->fs_reg_sets[rsi].regs,
> node_count);
>
> @@ -653,6 +656,28 @@ fs_visitor::assign_regs(bool allow_spilling, bool
> spill_all)
>}
> }
>
> +   if (devinfo->gen >= 8 && dispatch_width == 8) {
> +  /* At Intel Broadwell PRM, vol 07, section "Instruction Set
> Reference",
> +   * subsection "EUISA Instructions", Send Message (page 990):
> +   *
> +   * "r127 must not be used for return address when there is a src and
> +   * dest overlap in send instruction."
> +   *
> +   * We are avoiding using grf127 as part of the destination of send
> +   * messages adding a node interference to the grf127_send_hack_node.
> +   * This node has a fixed asignment to grf127.
> +   *
> +   * We don't apply it to SIMD16 because previous code avoids any
> register
> +   * overlap between sources and destination.
> +   */
> +  ra_set_node_reg(g, grf127_send_hack_node, 127);
> +  foreach_block_and_inst(block, fs_inst, inst, cfg) {
> + if (inst->is_send_from_grf() && inst->dst.file == VGRF) {
> +ra_add_node_interference(g, inst->dst.nr,
> grf127_send_hack_node);
> + }
> +  }
> +   }
> +
> /* Debug of register spilling: Go spill everything. */
> if (unlikely(spill_all)) {
>int reg = choose_spill_reg(g);
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/9] i965/fs: Register allocator shoudn't use grf127 for sends dest

2018-07-08 Thread Jose Maria Casanova Crespo
Since Gen8+ Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."

This patch implements this restriction creating new grf127_send_hack_node
at the register allocator. This node has a fixed assignation to grf127.

For vgrf that are used as destination of send messages we create node
interfereces with the grf127_send_hack_node. So the register allocator
will never assign to these vgrf a register that involves grf127.

If dispatch_width > 8 we don't create these interferences to the because
all instructions have node interferences between sources and destination.
That is enough to avoid the r127 restriction.

This fixes CTS tests that raised this issue as they were executed as SIMD8:

dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom

Shader-db results on Skylake:
   total instructions in shared programs: 7686798 -> 7686797 (<.01%)
   instructions in affected programs: 301 -> 300 (-0.33%)
   helped: 1
   HURT: 0

   total cycles in shared programs: 337092322 -> 337091919 (<.01%)
   cycles in affected programs: 22420415 -> 22420012 (<.01%)
   helped: 712
   HURT: 588

Shader-db results on Broadwell:

   total instructions in shared programs: 7658574 -> 7658625 (<.01%)
   instructions in affected programs: 19610 -> 19661 (0.26%)
   helped: 3
   HURT: 4

   total cycles in shared programs: 340694553 -> 340676378 (<.01%)
   cycles in affected programs: 24724915 -> 24706740 (-0.07%)
   helped: 998
   HURT: 916

   total spills in shared programs: 4300 -> 4311 (0.26%)
   spills in affected programs: 333 -> 344 (3.30%)
   helped: 1
   HURT: 3

   total fills in shared programs: 5370 -> 5378 (0.15%)
   fills in affected programs: 274 -> 282 (2.92%)
   helped: 1
   HURT: 3

v2: Avoid duplicating register classes without grf127. Let's use a node
with a fixed assignation to grf127 and create interferences to send
message vgrf destinations. (Eric Anholt)
v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
(Jose Maria Casanova)
---
 src/intel/compiler/brw_fs_reg_allocate.cpp | 25 ++
 1 file changed, 25 insertions(+)

diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
b/src/intel/compiler/brw_fs_reg_allocate.cpp
index ec8e116cb38..59e047483c0 100644
--- a/src/intel/compiler/brw_fs_reg_allocate.cpp
+++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
@@ -548,6 +548,9 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
int first_mrf_hack_node = node_count;
if (devinfo->gen >= 7)
   node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START;
+   int grf127_send_hack_node = node_count;
+   if (devinfo->gen >= 8 && dispatch_width == 8)
+  node_count ++;
struct ra_graph *g =
   ra_alloc_interference_graph(compiler->fs_reg_sets[rsi].regs, node_count);
 
@@ -653,6 +656,28 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
spill_all)
   }
}
 
+   if (devinfo->gen >= 8 && dispatch_width == 8) {
+  /* At Intel Broadwell PRM, vol 07, section "Instruction Set Reference",
+   * subsection "EUISA Instructions", Send Message (page 990):
+   *
+   * "r127 must not be used for return address when there is a src and
+   * dest overlap in send instruction."
+   *
+   * We are avoiding using grf127 as part of the destination of send
+   * messages adding a node interference to the grf127_send_hack_node.
+   * This node has a fixed asignment to grf127.
+   *
+   * We don't apply it to SIMD16 because previous code avoids any register
+   * overlap between sources and destination.
+   */
+  ra_set_node_reg(g, grf127_send_hack_node, 127);
+  foreach_block_and_inst(block, fs_inst, inst, cfg) {
+ if (inst->is_send_from_grf() && inst->dst.file == VGRF) {
+ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
+ }
+  }
+   }
+
/* Debug of register spilling: Go spill everything. */
if (unlikely(spill_all)) {
   int reg = choose_spill_reg(g);
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev