[Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions
From: Connor AbbottIt appears that not only math instructions, but also MOV_BYTES or any instruction that uses Align1 mode cannot be in the middle of a dependency control sequence or the GPU will hang (at least on my BDW). This fixes GPU hangs in some fp64 tests. Reviewed-by: Iago Toral Quiroga --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 3bcd5cb..bc0a33b 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction *inst) } /* +* Instructions that use Align1 mode cause the GPU to hang when inserted +* between a NoDDClr and NoDDChk in Align16 mode. Discovered empirically. +*/ + + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || + inst->opcode == VEC4_OPCODE_MOV_BYTES || + inst->is_math()) + return true; + + + /* * mlen: * In the presence of send messages, totally interrupt dependency * control. They're long enough that the chance of dependency @@ -851,12 +862,8 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction *inst) * enable of the last instruction, the optimization must be avoided. This is * to avoid instructions being shot down the pipeline when no writes are * required. -* -* math: -* Dependency control does not work well over math instructions. -* NB: Discovered empirically */ - return (inst->mlen || inst->predicate || inst->is_math()); + return (inst->mlen || inst->predicate); } /** -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions
On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbottwrote: > On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner wrote: >> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga >> wrote: >>> From: Connor Abbott >>> >>> It appears that not only math instructions, but also MOV_BYTES or >>> any instruction that uses Align1 mode cannot be in the middle >>> of a dependency control sequence or the GPU will hang (at least on my >>> BDW). This fixes GPU hangs in some fp64 tests. >> >> I'm pretty surprised by this assessment. Doubtful even. >> >>> Reviewed-by: Iago Toral Quiroga >>> --- >>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 - >>> 1 file changed, 12 insertions(+), 5 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> index 3bcd5cb..bc0a33b 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const >>> vec4_instruction *inst) >>> } >>> >>> /* >>> +* Instructions that use Align1 mode cause the GPU to hang when inserted >>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered >>> empirically. >>> +*/ >>> + >>> + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || >>> + inst->opcode == VEC4_OPCODE_MOV_BYTES || >> >> PACK_BYTES sets depctrl itself in the generator, and at the time I >> added it I made a test that did >> >> vec4 foo = vec4(packUnorm4x8(...), >> packUnorm4x8(...), >> packUnorm4x8(...), >> packUnorm4x8(...)) >> >> and confirmed that it set depctrl properly on the whole sequence. >> There could of course be bugs somewhere, but the "hardware doesn't >> work if you mix align1 and align16 depctrl" seems unlikely. >> >> Do you know of a test that this affects? > > This only affects FP64 tests, since there we use an align1 mov to do > double-to-float and float-to-double. However, I tried commenting out > emit_nir_code() and just doing essentially: > > emit(MOV(...))->force_writemask_all = true; > emit(VEC4_OPCODE_PACK_BYTES, ...); > emit(MOV(...))->force_writemask_all = true; > > and on my BDW it hanged. In case it's not clear: this isn't about > setting depctrl on the instruction itself, it just can't be inside of > a depctrl sequence (which we were already disallowing for math > instructions anyways). Very weird. I'll take a look. So I understand, are the MOV instructions writing different channels of the same register? And VEC4_OPCODE_PACK_BYTES is writing to a different or the same register as the MOVs? (I saw your fixup reply) By the way, the math code is too heavy handed as far as I know. The BDW+ docs say that the MATH instruction itself cannot take dependency control hints (and empirically earlier platforms seem to have problems with this as well, see tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a math instruction being in the middle of a NoDDC* block. The person who implemented the math did the minimal amount of work to fix the problem. The PRM also says: """ Instructions other than send, may use this control as long as operations that have different pipeline latencies are not mixed. The operations that have longer latencies are: Opcodes pln, lrp, dp*. Operations involving double precision computation. Integer DW multiplication where both source operands are DWs. """ I would say that mixing a double-precision operation and something else might cause problems, but that seems like we have a different problem thus far. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions
On Thu, Nov 19, 2015 at 1:54 PM, Matt Turnerwrote: > On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott wrote: >> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner wrote: >>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga >>> wrote: From: Connor Abbott It appears that not only math instructions, but also MOV_BYTES or any instruction that uses Align1 mode cannot be in the middle of a dependency control sequence or the GPU will hang (at least on my BDW). This fixes GPU hangs in some fp64 tests. >>> >>> I'm pretty surprised by this assessment. Doubtful even. >>> Reviewed-by: Iago Toral Quiroga --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 3bcd5cb..bc0a33b 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction *inst) } /* +* Instructions that use Align1 mode cause the GPU to hang when inserted +* between a NoDDClr and NoDDChk in Align16 mode. Discovered empirically. +*/ + + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || + inst->opcode == VEC4_OPCODE_MOV_BYTES || >>> >>> PACK_BYTES sets depctrl itself in the generator, and at the time I >>> added it I made a test that did >>> >>> vec4 foo = vec4(packUnorm4x8(...), >>> packUnorm4x8(...), >>> packUnorm4x8(...), >>> packUnorm4x8(...)) >>> >>> and confirmed that it set depctrl properly on the whole sequence. >>> There could of course be bugs somewhere, but the "hardware doesn't >>> work if you mix align1 and align16 depctrl" seems unlikely. >>> >>> Do you know of a test that this affects? >> >> This only affects FP64 tests, since there we use an align1 mov to do >> double-to-float and float-to-double. However, I tried commenting out >> emit_nir_code() and just doing essentially: >> >> emit(MOV(...))->force_writemask_all = true; >> emit(VEC4_OPCODE_PACK_BYTES, ...); >> emit(MOV(...))->force_writemask_all = true; >> >> and on my BDW it hanged. In case it's not clear: this isn't about >> setting depctrl on the instruction itself, it just can't be inside of >> a depctrl sequence (which we were already disallowing for math >> instructions anyways). > > Very weird. I'll take a look. So I understand, are the MOV > instructions writing different channels of the same register? And > VEC4_OPCODE_PACK_BYTES is writing to a different or the same register > as the MOVs? (I saw your fixup reply) Actually, I had them writing the same thing so the second overwrote the first one. The PACK_BYTES/MOV_BYTES/F2D/D2F (I think I tested all of them, not sure) were operating on completely different registers, and in the FP64 test that actually hung the GPU they were as well. Using d2f since it's simpler and I remember what the operands are (it's just an align1 mov with a dest stride of 2), the test code was something like: mov g50, g51 { no_dd_clear } d2f g52, g54 mov g50, g53 { no_dd_check } and changing the d2f to a normal align16 mov or commenting it out prevented the hang. It would be interesting to see if a math instruction instead of d2f also hangs. > > By the way, the math code is too heavy handed as far as I know. The > BDW+ docs say that the MATH instruction itself cannot take dependency > control hints (and empirically earlier platforms seem to have problems > with this as well, see > tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a > math instruction being in the middle of a NoDDC* block. The person who > implemented the math did the minimal amount of work to fix the > problem. > > The PRM also says: > > """ > Instructions other than send, may use this control as long as > operations that have different pipeline latencies are not mixed. The > operations that have longer latencies are: > > Opcodes pln, lrp, dp*. > Operations involving double precision computation. > Integer DW multiplication where both source operands are DWs. > """ > > I would say that mixing a double-precision operation and something > else might cause problems, but that seems like we have a different > problem thus far. Yeah, these are all just mov's so I would expect that section to apply. It still seems like we're not taking it into account, though... ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions
On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbottwrote: > On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner wrote: >> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott wrote: >>> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner wrote: On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga wrote: > From: Connor Abbott > > It appears that not only math instructions, but also MOV_BYTES or > any instruction that uses Align1 mode cannot be in the middle > of a dependency control sequence or the GPU will hang (at least on my > BDW). This fixes GPU hangs in some fp64 tests. I'm pretty surprised by this assessment. Doubtful even. > Reviewed-by: Iago Toral Quiroga > --- > src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 - > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp > b/src/mesa/drivers/dri/i965/brw_vec4.cpp > index 3bcd5cb..bc0a33b 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp > @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const > vec4_instruction *inst) > } > > /* > +* Instructions that use Align1 mode cause the GPU to hang when > inserted > +* between a NoDDClr and NoDDChk in Align16 mode. Discovered > empirically. > +*/ > + > + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || > + inst->opcode == VEC4_OPCODE_MOV_BYTES || PACK_BYTES sets depctrl itself in the generator, and at the time I added it I made a test that did vec4 foo = vec4(packUnorm4x8(...), packUnorm4x8(...), packUnorm4x8(...), packUnorm4x8(...)) and confirmed that it set depctrl properly on the whole sequence. There could of course be bugs somewhere, but the "hardware doesn't work if you mix align1 and align16 depctrl" seems unlikely. Do you know of a test that this affects? >>> >>> This only affects FP64 tests, since there we use an align1 mov to do >>> double-to-float and float-to-double. However, I tried commenting out >>> emit_nir_code() and just doing essentially: >>> >>> emit(MOV(...))->force_writemask_all = true; >>> emit(VEC4_OPCODE_PACK_BYTES, ...); >>> emit(MOV(...))->force_writemask_all = true; >>> >>> and on my BDW it hanged. In case it's not clear: this isn't about >>> setting depctrl on the instruction itself, it just can't be inside of >>> a depctrl sequence (which we were already disallowing for math >>> instructions anyways). >> >> Very weird. I'll take a look. So I understand, are the MOV >> instructions writing different channels of the same register? And >> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register >> as the MOVs? (I saw your fixup reply) > > Actually, I had them writing the same thing so the second overwrote > the first one. Err, just to be clear... in the actual test that led to me discovering this, the instructions (not mov's but a bfi and a mov IIRC) were writing different components of the same register. In my hacked-up test to see if align1 in between a depctrl sequence was the problem, I just had them both write to the same thing since it was easier. In any case, I don't think it should matter too much. > The PACK_BYTES/MOV_BYTES/F2D/D2F (I think I tested all > of them, not sure) were operating on completely different registers, > and in the FP64 test that actually hung the GPU they were as well. > Using d2f since it's simpler and I remember what the operands are > (it's just an align1 mov with a dest stride of 2), the test code was > something like: > > mov g50, g51 { no_dd_clear } > d2f g52, g54 > mov g50, g53 { no_dd_check } > > and changing the d2f to a normal align16 mov or commenting it out > prevented the hang. It would be interesting to see if a math > instruction instead of d2f also hangs. > >> >> By the way, the math code is too heavy handed as far as I know. The >> BDW+ docs say that the MATH instruction itself cannot take dependency >> control hints (and empirically earlier platforms seem to have problems >> with this as well, see >> tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a >> math instruction being in the middle of a NoDDC* block. The person who >> implemented the math did the minimal amount of work to fix the >> problem. >> >> The PRM also says: >> >> """ >> Instructions other than send, may use this control as long as >> operations that have different pipeline latencies are not mixed. The >> operations that have longer latencies are: >> >> Opcodes pln, lrp, dp*. >> Operations involving double precision computation. >> Integer DW multiplication where both source operands are DWs. >> """ >>
Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions
On Thu, Nov 19, 2015 at 11:35 AM, Connor Abbottwrote: > On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott wrote: >> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner wrote: >>> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott wrote: On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner wrote: > On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga > wrote: >> From: Connor Abbott >> >> It appears that not only math instructions, but also MOV_BYTES or >> any instruction that uses Align1 mode cannot be in the middle >> of a dependency control sequence or the GPU will hang (at least on my >> BDW). This fixes GPU hangs in some fp64 tests. > > I'm pretty surprised by this assessment. Doubtful even. > >> Reviewed-by: Iago Toral Quiroga >> --- >> src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 - >> 1 file changed, 12 insertions(+), 5 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> index 3bcd5cb..bc0a33b 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const >> vec4_instruction *inst) >> } >> >> /* >> +* Instructions that use Align1 mode cause the GPU to hang when >> inserted >> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered >> empirically. >> +*/ >> + >> + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || >> + inst->opcode == VEC4_OPCODE_MOV_BYTES || > > PACK_BYTES sets depctrl itself in the generator, and at the time I > added it I made a test that did > > vec4 foo = vec4(packUnorm4x8(...), > packUnorm4x8(...), > packUnorm4x8(...), > packUnorm4x8(...)) > > and confirmed that it set depctrl properly on the whole sequence. > There could of course be bugs somewhere, but the "hardware doesn't > work if you mix align1 and align16 depctrl" seems unlikely. > > Do you know of a test that this affects? This only affects FP64 tests, since there we use an align1 mov to do double-to-float and float-to-double. However, I tried commenting out emit_nir_code() and just doing essentially: emit(MOV(...))->force_writemask_all = true; emit(VEC4_OPCODE_PACK_BYTES, ...); emit(MOV(...))->force_writemask_all = true; and on my BDW it hanged. In case it's not clear: this isn't about setting depctrl on the instruction itself, it just can't be inside of a depctrl sequence (which we were already disallowing for math instructions anyways). >>> >>> Very weird. I'll take a look. So I understand, are the MOV >>> instructions writing different channels of the same register? And >>> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register >>> as the MOVs? (I saw your fixup reply) >> >> Actually, I had them writing the same thing so the second overwrote >> the first one. > > Err, just to be clear... in the actual test that led to me discovering > this, the instructions (not mov's but a bfi and a mov IIRC) were > writing different components of the same register. In my hacked-up > test to see if align1 in between a depctrl sequence was the problem, I > just had them both write to the same thing since it was easier. In any > case, I don't think it should matter too much. Interesting. If it was a BFI and a MOV writing different components of the same register, I can see that failing. Though I haven't measured, 3-src instructions usually have 2+ extra cycles of latency and with a 2-cycle issue time, a BFI followed immediately by a MOV would mean that both of them retired in the same cycle. That would seem bad if one of them was supposed to signal the scoreboard. × ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions
On Thu, Nov 19, 2015 at 3:11 PM, Matt Turnerwrote: > On Thu, Nov 19, 2015 at 11:35 AM, Connor Abbott wrote: >> On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott wrote: >>> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner wrote: On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott wrote: > On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner wrote: >> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga >> wrote: >>> From: Connor Abbott >>> >>> It appears that not only math instructions, but also MOV_BYTES or >>> any instruction that uses Align1 mode cannot be in the middle >>> of a dependency control sequence or the GPU will hang (at least on my >>> BDW). This fixes GPU hangs in some fp64 tests. >> >> I'm pretty surprised by this assessment. Doubtful even. >> >>> Reviewed-by: Iago Toral Quiroga >>> --- >>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 - >>> 1 file changed, 12 insertions(+), 5 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> index 3bcd5cb..bc0a33b 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const >>> vec4_instruction *inst) >>> } >>> >>> /* >>> +* Instructions that use Align1 mode cause the GPU to hang when >>> inserted >>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered >>> empirically. >>> +*/ >>> + >>> + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || >>> + inst->opcode == VEC4_OPCODE_MOV_BYTES || >> >> PACK_BYTES sets depctrl itself in the generator, and at the time I >> added it I made a test that did >> >> vec4 foo = vec4(packUnorm4x8(...), >> packUnorm4x8(...), >> packUnorm4x8(...), >> packUnorm4x8(...)) >> >> and confirmed that it set depctrl properly on the whole sequence. >> There could of course be bugs somewhere, but the "hardware doesn't >> work if you mix align1 and align16 depctrl" seems unlikely. >> >> Do you know of a test that this affects? > > This only affects FP64 tests, since there we use an align1 mov to do > double-to-float and float-to-double. However, I tried commenting out > emit_nir_code() and just doing essentially: > > emit(MOV(...))->force_writemask_all = true; > emit(VEC4_OPCODE_PACK_BYTES, ...); > emit(MOV(...))->force_writemask_all = true; > > and on my BDW it hanged. In case it's not clear: this isn't about > setting depctrl on the instruction itself, it just can't be inside of > a depctrl sequence (which we were already disallowing for math > instructions anyways). Very weird. I'll take a look. So I understand, are the MOV instructions writing different channels of the same register? And VEC4_OPCODE_PACK_BYTES is writing to a different or the same register as the MOVs? (I saw your fixup reply) >>> >>> Actually, I had them writing the same thing so the second overwrote >>> the first one. >> >> Err, just to be clear... in the actual test that led to me discovering >> this, the instructions (not mov's but a bfi and a mov IIRC) were >> writing different components of the same register. In my hacked-up >> test to see if align1 in between a depctrl sequence was the problem, I >> just had them both write to the same thing since it was easier. In any >> case, I don't think it should matter too much. > > Interesting. If it was a BFI and a MOV writing different components of > the same register, I can see that failing. Though I haven't measured, > 3-src instructions usually have 2+ extra cycles of latency and with a > 2-cycle issue time, a BFI followed immediately by a MOV would mean > that both of them retired in the same cycle. That would seem bad if > one of them was supposed to signal the scoreboard. > × They weren't right next to each other, though -- there was a sequence of 5-6 instructions between them, one of which happens to be an align1 mov which (according to my testing) hangs the GPU in between any depctrl sequence. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions
On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quirogawrote: > From: Connor Abbott > > It appears that not only math instructions, but also MOV_BYTES or > any instruction that uses Align1 mode cannot be in the middle > of a dependency control sequence or the GPU will hang (at least on my > BDW). This fixes GPU hangs in some fp64 tests. I'm pretty surprised by this assessment. Doubtful even. > Reviewed-by: Iago Toral Quiroga > --- > src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 - > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp > b/src/mesa/drivers/dri/i965/brw_vec4.cpp > index 3bcd5cb..bc0a33b 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp > @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction > *inst) > } > > /* > +* Instructions that use Align1 mode cause the GPU to hang when inserted > +* between a NoDDClr and NoDDChk in Align16 mode. Discovered empirically. > +*/ > + > + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || > + inst->opcode == VEC4_OPCODE_MOV_BYTES || PACK_BYTES sets depctrl itself in the generator, and at the time I added it I made a test that did vec4 foo = vec4(packUnorm4x8(...), packUnorm4x8(...), packUnorm4x8(...), packUnorm4x8(...)) and confirmed that it set depctrl properly on the whole sequence. There could of course be bugs somewhere, but the "hardware doesn't work if you mix align1 and align16 depctrl" seems unlikely. Do you know of a test that this affects? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions
On Thu, Nov 19, 2015 at 6:40 AM, Matt Turnerwrote: > On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga wrote: >> From: Connor Abbott >> >> It appears that not only math instructions, but also MOV_BYTES or >> any instruction that uses Align1 mode cannot be in the middle >> of a dependency control sequence or the GPU will hang (at least on my >> BDW). This fixes GPU hangs in some fp64 tests. > > I'm pretty surprised by this assessment. Doubtful even. > >> Reviewed-by: Iago Toral Quiroga >> --- >> src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 - >> 1 file changed, 12 insertions(+), 5 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> index 3bcd5cb..bc0a33b 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction >> *inst) >> } >> >> /* >> +* Instructions that use Align1 mode cause the GPU to hang when inserted >> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered empirically. >> +*/ >> + >> + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || >> + inst->opcode == VEC4_OPCODE_MOV_BYTES || > > PACK_BYTES sets depctrl itself in the generator, and at the time I > added it I made a test that did > > vec4 foo = vec4(packUnorm4x8(...), > packUnorm4x8(...), > packUnorm4x8(...), > packUnorm4x8(...)) > > and confirmed that it set depctrl properly on the whole sequence. > There could of course be bugs somewhere, but the "hardware doesn't > work if you mix align1 and align16 depctrl" seems unlikely. > > Do you know of a test that this affects? This only affects FP64 tests, since there we use an align1 mov to do double-to-float and float-to-double. However, I tried commenting out emit_nir_code() and just doing essentially: emit(MOV(...))->force_writemask_all = true; emit(VEC4_OPCODE_PACK_BYTES, ...); emit(MOV(...))->force_writemask_all = true; and on my BDW it hanged. In case it's not clear: this isn't about setting depctrl on the instruction itself, it just can't be inside of a depctrl sequence (which we were already disallowing for math instructions anyways). ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions
On Thu, Nov 19, 2015 at 10:31 AM, Connor Abbottwrote: > On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner wrote: >> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga >> wrote: >>> From: Connor Abbott >>> >>> It appears that not only math instructions, but also MOV_BYTES or >>> any instruction that uses Align1 mode cannot be in the middle >>> of a dependency control sequence or the GPU will hang (at least on my >>> BDW). This fixes GPU hangs in some fp64 tests. >> >> I'm pretty surprised by this assessment. Doubtful even. >> >>> Reviewed-by: Iago Toral Quiroga >>> --- >>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 - >>> 1 file changed, 12 insertions(+), 5 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> index 3bcd5cb..bc0a33b 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const >>> vec4_instruction *inst) >>> } >>> >>> /* >>> +* Instructions that use Align1 mode cause the GPU to hang when inserted >>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered >>> empirically. >>> +*/ >>> + >>> + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || >>> + inst->opcode == VEC4_OPCODE_MOV_BYTES || >> >> PACK_BYTES sets depctrl itself in the generator, and at the time I >> added it I made a test that did >> >> vec4 foo = vec4(packUnorm4x8(...), >> packUnorm4x8(...), >> packUnorm4x8(...), >> packUnorm4x8(...)) >> >> and confirmed that it set depctrl properly on the whole sequence. >> There could of course be bugs somewhere, but the "hardware doesn't >> work if you mix align1 and align16 depctrl" seems unlikely. >> >> Do you know of a test that this affects? > > This only affects FP64 tests, since there we use an align1 mov to do > double-to-float and float-to-double. However, I tried commenting out > emit_nir_code() and just doing essentially: > > emit(MOV(...))->force_writemask_all = true; > emit(VEC4_OPCODE_PACK_BYTES, ...); > emit(MOV(...))->force_writemask_all = true; Err, I meant: emit(MOV(...))->no_dd_clear = true; emit(VEC4_OPCODE_PACK_BYTES, ...); emit(MOV(...))->no_dd_check = true; (this also hangs with my DOUBLE_TO_FLOAT and FLOAT_TO_DOUBLE opcodes, which don't use the data dependency bits at all). > > and on my BDW it hanged. In case it's not clear: this isn't about > setting depctrl on the instruction itself, it just can't be inside of > a depctrl sequence (which we were already disallowing for math > instructions anyways). ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev