On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott <cwabbo...@gmail.com> wrote: > On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner <matts...@gmail.com> wrote: >> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga <ito...@igalia.com> >> wrote: >>> From: Connor Abbott <connor.w.abb...@intel.com> >>> >>> It appears that not only math instructions, but also MOV_BYTES or >>> any instruction that uses Align1 mode cannot be in the middle >>> of a dependency control sequence or the GPU will hang (at least on my >>> BDW). This fixes GPU hangs in some fp64 tests. >> >> I'm pretty surprised by this assessment. Doubtful even. >> >>> Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> >>> --- >>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 ++++++++++++----- >>> 1 file changed, 12 insertions(+), 5 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> index 3bcd5cb..bc0a33b 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const >>> vec4_instruction *inst) >>> } >>> >>> /* >>> + * Instructions that use Align1 mode cause the GPU to hang when inserted >>> + * between a NoDDClr and NoDDChk in Align16 mode. Discovered >>> empirically. >>> + */ >>> + >>> + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || >>> + inst->opcode == VEC4_OPCODE_MOV_BYTES || >> >> PACK_BYTES sets depctrl itself in the generator, and at the time I >> added it I made a test that did >> >> vec4 foo = vec4(packUnorm4x8(...), >> packUnorm4x8(...), >> packUnorm4x8(...), >> packUnorm4x8(...)) >> >> and confirmed that it set depctrl properly on the whole sequence. >> There could of course be bugs somewhere, but the "hardware doesn't >> work if you mix align1 and align16 depctrl" seems unlikely. >> >> Do you know of a test that this affects? > > This only affects FP64 tests, since there we use an align1 mov to do > double-to-float and float-to-double. However, I tried commenting out > emit_nir_code() and just doing essentially: > > emit(MOV(...))->force_writemask_all = true; > emit(VEC4_OPCODE_PACK_BYTES, ...); > emit(MOV(...))->force_writemask_all = true; > > and on my BDW it hanged. In case it's not clear: this isn't about > setting depctrl on the instruction itself, it just can't be inside of > a depctrl sequence (which we were already disallowing for math > instructions anyways).
Very weird. I'll take a look. So I understand, are the MOV instructions writing different channels of the same register? And VEC4_OPCODE_PACK_BYTES is writing to a different or the same register as the MOVs? (I saw your fixup reply) By the way, the math code is too heavy handed as far as I know. The BDW+ docs say that the MATH instruction itself cannot take dependency control hints (and empirically earlier platforms seem to have problems with this as well, see tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a math instruction being in the middle of a NoDDC* block. The person who implemented the math did the minimal amount of work to fix the problem. The PRM also says: """ Instructions other than send, may use this control as long as operations that have different pipeline latencies are not mixed. The operations that have longer latencies are: Opcodes pln, lrp, dp*. Operations involving double precision computation. Integer DW multiplication where both source operands are DWs. """ I would say that mixing a double-precision operation and something else might cause problems, but that seems like we have a different problem thus far. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev