Including mesa-dev in my previous reply. El 11/07/18 a las 01:08, Caio Marcelo de Oliveira Filho escribió: >> Since Gen8+ Intel PRM states that "r127 must not be used for return >> address when there is a src and dest overlap in send instruction." > > The previous patch, that verifies the condition above is causing > > tests/spec/arb_compute_shader/linker/bug-93840.shader_test > > to crash with > > shader_runner: ../src/intel/compiler/brw_fs_generator.cpp:2455: int > fs_generator::generate_code(const cfg_t*, int): Assertion `validated' failed. > > I also could reproduce the crash locally. It happens even with this > patch (which adds the hack) applied.
I've seen it in Jenkins, but couldn't reproduce it so I thought it wasn't related. Now I've realized that I was using a release build at that moment. The good thing is that the validator rule has detected that the generated instruction was incorrect. >> This patch implements this restriction creating new grf127_send_hack_node >> at the register allocator. This node has a fixed assignation to grf127. >> >> For vgrf that are used as destination of send messages we create node >> interfereces with the grf127_send_hack_node. So the register allocator >> will never assign to these vgrf a register that involves grf127. >> >> If dispatch_width > 8 we don't create these interferences to the because >> all instructions have node interferences between sources and destination. >> That is enough to avoid the r127 restriction. > > I think for both widths will not be enough. The instruction that fails > the validation is: > > mov(8) g126<1>UD g0<8,8,1>UD { align1 > WE_all 1Q }; > mov(1) g126.2<1>UD 0x00000090UD { align1 > WE_all 1N }; > send(16) g126<1>UW g126<8,8,1>UD > data ( DC OWORD block read, 253, 3) mlen 1 rlen 2 > { align1 WE_all 1H }; > ERROR: r127 must not be used for return address when there is a src > and dest overlap > > Which if I understood correctly comes from the scratch reading being > created by the spilling logic. In brw_oword_block_read_scratch() we > see > > if (p->devinfo->gen >= 7) { > /* On gen 7 and above, we no longer have message registers and we can > * send from any register we want. By using the destination register > * for the message, we guarantee that the implied message write won't > * accidentally overwrite anything. This has been a problem because > * the MRF registers and source for the final FB write are both fixed > * and may overlap. > */ > mrf = retype(dest, BRW_REGISTER_TYPE_UD); > } else { > mrf = retype(mrf, BRW_REGISTER_TYPE_UD); > } > dest = retype(dest, BRW_REGISTER_TYPE_UW); > > It seems to me we'll have to handle r127 there as well. Yes, as in this case source and destination are coded to be the same vgrf, we don't have a source/destination interference on SIMD16. I'm doing some extra testing but something like next code at assigns_regs seems to fix the issue: if (spilled_any_registers) { foreach_block_and_inst(block, fs_inst, inst, cfg) { if (inst->opcode == SHADER_OPCODE_GEN7_SCRATCH_READ || inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ) { ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node); } } Thanks Caio for digging into the problem. I'm sending today a patch to deal with this case. Chema >> >> This fixes CTS tests that raised this issue as they were executed as SIMD8: >> >> dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom >> >> Shader-db results on Skylake: >> total instructions in shared programs: 7686798 -> 7686797 (<.01%) >> instructions in affected programs: 301 -> 300 (-0.33%) >> helped: 1 >> HURT: 0 >> >> total cycles in shared programs: 337092322 -> 337091919 (<.01%) >> cycles in affected programs: 22420415 -> 22420012 (<.01%) >> helped: 712 >> HURT: 588 >> >> Shader-db results on Broadwell: >> >> total instructions in shared programs: 7658574 -> 7658625 (<.01%) >> instructions in affected programs: 19610 -> 19661 (0.26%) >> helped: 3 >> HURT: 4 >> >> total cycles in shared programs: 340694553 -> 340676378 (<.01%) >> cycles in affected programs: 24724915 -> 24706740 (-0.07%) >> helped: 998 >> HURT: 916 >> >> total spills in shared programs: 4300 -> 4311 (0.26%) >> spills in affected programs: 333 -> 344 (3.30%) >> helped: 1 >> HURT: 3 >> >> total fills in shared programs: 5370 -> 5378 (0.15%) >> fills in affected programs: 274 -> 282 (2.92%) >> helped: 1 >> HURT: 3 >> >> v2: Avoid duplicating register classes without grf127. Let's use a node >> with a fixed assignation to grf127 and create interferences to send >> message vgrf destinations. (Eric Anholt) >> v3: Update reference to CTS VK_KHR_8bit_storage failing tests. >> (Jose Maria Casanova) >> --- >> src/intel/compiler/brw_fs_reg_allocate.cpp | 25 ++++++++++++++++++++++ >> 1 file changed, 25 insertions(+) >> >> diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp >> b/src/intel/compiler/brw_fs_reg_allocate.cpp >> index ec8e116cb38..59e047483c0 100644 >> --- a/src/intel/compiler/brw_fs_reg_allocate.cpp >> +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp >> @@ -548,6 +548,9 @@ fs_visitor::assign_regs(bool allow_spilling, bool >> spill_all) >> int first_mrf_hack_node = node_count; >> if (devinfo->gen >= 7) >> node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START; >> + int grf127_send_hack_node = node_count; >> + if (devinfo->gen >= 8 && dispatch_width == 8) >> + node_count ++; >> struct ra_graph *g = >> ra_alloc_interference_graph(compiler->fs_reg_sets[rsi].regs, >> node_count); >> >> @@ -653,6 +656,28 @@ fs_visitor::assign_regs(bool allow_spilling, bool >> spill_all) >> } >> } >> >> + if (devinfo->gen >= 8 && dispatch_width == 8) { >> + /* At Intel Broadwell PRM, vol 07, section "Instruction Set >> Reference", >> + * subsection "EUISA Instructions", Send Message (page 990): >> + * >> + * "r127 must not be used for return address when there is a src and >> + * dest overlap in send instruction." >> + * >> + * We are avoiding using grf127 as part of the destination of send >> + * messages adding a node interference to the grf127_send_hack_node. >> + * This node has a fixed asignment to grf127. >> + * >> + * We don't apply it to SIMD16 because previous code avoids any >> register >> + * overlap between sources and destination. >> + */ >> + ra_set_node_reg(g, grf127_send_hack_node, 127); >> + foreach_block_and_inst(block, fs_inst, inst, cfg) { >> + if (inst->is_send_from_grf() && inst->dst.file == VGRF) { >> + ra_add_node_interference(g, inst->dst.nr, >> grf127_send_hack_node); >> + } >> + } >> + } >> + >> /* Debug of register spilling: Go spill everything. */ >> if (unlikely(spill_all)) { >> int reg = choose_spill_reg(g); >> -- >> 2.17.1 >> >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev