Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>> wrote:
>>> From: Connor Abbott 
>>>
>>> It appears that not only math instructions, but also MOV_BYTES or
>>> any instruction that uses Align1 mode cannot be in the middle
>>> of a dependency control sequence or the GPU will hang (at least on my
>>> BDW). This fixes GPU hangs in some fp64 tests.
>>
>> I'm pretty surprised by this assessment. Doubtful even.
>>
>>> Reviewed-by: Iago Toral Quiroga 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index 3bcd5cb..bc0a33b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>>> vec4_instruction *inst)
>>> }
>>>
>>> /*
>>> +* Instructions that use Align1 mode cause the GPU to hang when inserted
>>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>>> empirically.
>>> +*/
>>> +
>>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>
>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>> added it I made a test that did
>>
>>   vec4 foo = vec4(packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...))
>>
>> and confirmed that it set depctrl properly on the whole sequence.
>> There could of course be bugs somewhere, but the "hardware doesn't
>> work if you mix align1 and align16 depctrl" seems unlikely.
>>
>> Do you know of a test that this affects?
>
> This only affects FP64 tests, since there we use an align1 mov to do
> double-to-float and float-to-double. However, I tried commenting out
> emit_nir_code() and just doing essentially:
>
> emit(MOV(...))->force_writemask_all = true;
> emit(VEC4_OPCODE_PACK_BYTES, ...);
> emit(MOV(...))->force_writemask_all = true;
>
> and on my BDW it hanged. In case it's not clear: this isn't about
> setting depctrl on the instruction itself, it just can't be inside of
> a depctrl sequence (which we were already disallowing for math
> instructions anyways).

Very weird. I'll take a look. So I understand, are the MOV
instructions writing different channels of the same register? And
VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
as the MOVs? (I saw your fixup reply)

By the way, the math code is too heavy handed as far as I know. The
BDW+ docs say that the MATH instruction itself cannot take dependency
control hints (and empirically earlier platforms seem to have problems
with this as well, see
tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a
math instruction being in the middle of a NoDDC* block. The person who
implemented the math did the minimal amount of work to fix the
problem.

The PRM also says:

"""
Instructions other than send, may use this control as long as
operations that have different pipeline latencies are not mixed. The
operations that have longer latencies are:

Opcodes pln, lrp, dp*.
Operations involving double precision computation.
Integer DW multiplication where both source operands are DWs.
"""

I would say that mixing a double-precision operation and something
else might cause problems, but that seems like we have a different
problem thus far.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
>> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
>>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>>> wrote:
 From: Connor Abbott 

 It appears that not only math instructions, but also MOV_BYTES or
 any instruction that uses Align1 mode cannot be in the middle
 of a dependency control sequence or the GPU will hang (at least on my
 BDW). This fixes GPU hangs in some fp64 tests.
>>>
>>> I'm pretty surprised by this assessment. Doubtful even.
>>>
 Reviewed-by: Iago Toral Quiroga 
 ---
  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
 b/src/mesa/drivers/dri/i965/brw_vec4.cpp
 index 3bcd5cb..bc0a33b 100644
 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
 @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
 vec4_instruction *inst)
 }

 /*
 +* Instructions that use Align1 mode cause the GPU to hang when 
 inserted
 +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
 empirically.
 +*/
 +
 +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
 +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>>
>>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>>> added it I made a test that did
>>>
>>>   vec4 foo = vec4(packUnorm4x8(...),
>>>   packUnorm4x8(...),
>>>   packUnorm4x8(...),
>>>   packUnorm4x8(...))
>>>
>>> and confirmed that it set depctrl properly on the whole sequence.
>>> There could of course be bugs somewhere, but the "hardware doesn't
>>> work if you mix align1 and align16 depctrl" seems unlikely.
>>>
>>> Do you know of a test that this affects?
>>
>> This only affects FP64 tests, since there we use an align1 mov to do
>> double-to-float and float-to-double. However, I tried commenting out
>> emit_nir_code() and just doing essentially:
>>
>> emit(MOV(...))->force_writemask_all = true;
>> emit(VEC4_OPCODE_PACK_BYTES, ...);
>> emit(MOV(...))->force_writemask_all = true;
>>
>> and on my BDW it hanged. In case it's not clear: this isn't about
>> setting depctrl on the instruction itself, it just can't be inside of
>> a depctrl sequence (which we were already disallowing for math
>> instructions anyways).
>
> Very weird. I'll take a look. So I understand, are the MOV
> instructions writing different channels of the same register? And
> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
> as the MOVs? (I saw your fixup reply)

Actually, I had them writing the same thing so the second overwrote
the first one. The PACK_BYTES/MOV_BYTES/F2D/D2F (I think I tested all
of them, not sure) were operating on completely different registers,
and in the FP64 test that actually hung the GPU they were as well.
Using d2f since it's simpler and I remember what the operands are
(it's just an align1 mov with a dest stride of 2), the test code was
something like:

mov g50, g51 { no_dd_clear }
d2f g52, g54
mov g50, g53 { no_dd_check }

and changing the d2f to a normal align16 mov or commenting it out
prevented the hang. It would be interesting to see if a math
instruction instead of d2f also hangs.

>
> By the way, the math code is too heavy handed as far as I know. The
> BDW+ docs say that the MATH instruction itself cannot take dependency
> control hints (and empirically earlier platforms seem to have problems
> with this as well, see
> tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a
> math instruction being in the middle of a NoDDC* block. The person who
> implemented the math did the minimal amount of work to fix the
> problem.
>
> The PRM also says:
>
> """
> Instructions other than send, may use this control as long as
> operations that have different pipeline latencies are not mixed. The
> operations that have longer latencies are:
>
> Opcodes pln, lrp, dp*.
> Operations involving double precision computation.
> Integer DW multiplication where both source operands are DWs.
> """
>
> I would say that mixing a double-precision operation and something
> else might cause problems, but that seems like we have a different
> problem thus far.

Yeah, these are all just mov's so I would expect that section to
apply. It still seems like we're not taking it into account, though...
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
>>> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
 On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
 wrote:
> From: Connor Abbott 
>
> It appears that not only math instructions, but also MOV_BYTES or
> any instruction that uses Align1 mode cannot be in the middle
> of a dependency control sequence or the GPU will hang (at least on my
> BDW). This fixes GPU hangs in some fp64 tests.

 I'm pretty surprised by this assessment. Doubtful even.

> Reviewed-by: Iago Toral Quiroga 
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 3bcd5cb..bc0a33b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
> vec4_instruction *inst)
> }
>
> /*
> +* Instructions that use Align1 mode cause the GPU to hang when 
> inserted
> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
> empirically.
> +*/
> +
> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||

 PACK_BYTES sets depctrl itself in the generator, and at the time I
 added it I made a test that did

   vec4 foo = vec4(packUnorm4x8(...),
   packUnorm4x8(...),
   packUnorm4x8(...),
   packUnorm4x8(...))

 and confirmed that it set depctrl properly on the whole sequence.
 There could of course be bugs somewhere, but the "hardware doesn't
 work if you mix align1 and align16 depctrl" seems unlikely.

 Do you know of a test that this affects?
>>>
>>> This only affects FP64 tests, since there we use an align1 mov to do
>>> double-to-float and float-to-double. However, I tried commenting out
>>> emit_nir_code() and just doing essentially:
>>>
>>> emit(MOV(...))->force_writemask_all = true;
>>> emit(VEC4_OPCODE_PACK_BYTES, ...);
>>> emit(MOV(...))->force_writemask_all = true;
>>>
>>> and on my BDW it hanged. In case it's not clear: this isn't about
>>> setting depctrl on the instruction itself, it just can't be inside of
>>> a depctrl sequence (which we were already disallowing for math
>>> instructions anyways).
>>
>> Very weird. I'll take a look. So I understand, are the MOV
>> instructions writing different channels of the same register? And
>> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
>> as the MOVs? (I saw your fixup reply)
>
> Actually, I had them writing the same thing so the second overwrote
> the first one.

Err, just to be clear... in the actual test that led to me discovering
this, the instructions (not mov's but a bfi and a mov IIRC) were
writing different components of the same register. In my hacked-up
test to see if align1 in between a depctrl sequence was the problem, I
just had them both write to the same thing since it was easier. In any
case, I don't think it should matter too much.

> The PACK_BYTES/MOV_BYTES/F2D/D2F (I think I tested all
> of them, not sure) were operating on completely different registers,
> and in the FP64 test that actually hung the GPU they were as well.
> Using d2f since it's simpler and I remember what the operands are
> (it's just an align1 mov with a dest stride of 2), the test code was
> something like:
>
> mov g50, g51 { no_dd_clear }
> d2f g52, g54
> mov g50, g53 { no_dd_check }
>
> and changing the d2f to a normal align16 mov or commenting it out
> prevented the hang. It would be interesting to see if a math
> instruction instead of d2f also hangs.
>
>>
>> By the way, the math code is too heavy handed as far as I know. The
>> BDW+ docs say that the MATH instruction itself cannot take dependency
>> control hints (and empirically earlier platforms seem to have problems
>> with this as well, see
>> tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a
>> math instruction being in the middle of a NoDDC* block. The person who
>> implemented the math did the minimal amount of work to fix the
>> problem.
>>
>> The PRM also says:
>>
>> """
>> Instructions other than send, may use this control as long as
>> operations that have different pipeline latencies are not mixed. The
>> operations that have longer latencies are:
>>
>> Opcodes pln, lrp, dp*.
>> Operations involving double precision computation.
>> Integer DW multiplication where both source operands are DWs.
>> """
>>

Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 11:35 AM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott  wrote:
>> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
>>> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
 On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
> wrote:
>> From: Connor Abbott 
>>
>> It appears that not only math instructions, but also MOV_BYTES or
>> any instruction that uses Align1 mode cannot be in the middle
>> of a dependency control sequence or the GPU will hang (at least on my
>> BDW). This fixes GPU hangs in some fp64 tests.
>
> I'm pretty surprised by this assessment. Doubtful even.
>
>> Reviewed-by: Iago Toral Quiroga 
>> ---
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index 3bcd5cb..bc0a33b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>> vec4_instruction *inst)
>> }
>>
>> /*
>> +* Instructions that use Align1 mode cause the GPU to hang when 
>> inserted
>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>> empirically.
>> +*/
>> +
>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>
> PACK_BYTES sets depctrl itself in the generator, and at the time I
> added it I made a test that did
>
>   vec4 foo = vec4(packUnorm4x8(...),
>   packUnorm4x8(...),
>   packUnorm4x8(...),
>   packUnorm4x8(...))
>
> and confirmed that it set depctrl properly on the whole sequence.
> There could of course be bugs somewhere, but the "hardware doesn't
> work if you mix align1 and align16 depctrl" seems unlikely.
>
> Do you know of a test that this affects?

 This only affects FP64 tests, since there we use an align1 mov to do
 double-to-float and float-to-double. However, I tried commenting out
 emit_nir_code() and just doing essentially:

 emit(MOV(...))->force_writemask_all = true;
 emit(VEC4_OPCODE_PACK_BYTES, ...);
 emit(MOV(...))->force_writemask_all = true;

 and on my BDW it hanged. In case it's not clear: this isn't about
 setting depctrl on the instruction itself, it just can't be inside of
 a depctrl sequence (which we were already disallowing for math
 instructions anyways).
>>>
>>> Very weird. I'll take a look. So I understand, are the MOV
>>> instructions writing different channels of the same register? And
>>> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
>>> as the MOVs? (I saw your fixup reply)
>>
>> Actually, I had them writing the same thing so the second overwrote
>> the first one.
>
> Err, just to be clear... in the actual test that led to me discovering
> this, the instructions (not mov's but a bfi and a mov IIRC) were
> writing different components of the same register. In my hacked-up
> test to see if align1 in between a depctrl sequence was the problem, I
> just had them both write to the same thing since it was easier. In any
> case, I don't think it should matter too much.

Interesting. If it was a BFI and a MOV writing different components of
the same register, I can see that failing. Though I haven't measured,
3-src instructions usually have 2+ extra cycles of latency and with a
2-cycle issue time, a BFI followed immediately by a MOV would mean
that both of them retired in the same cycle. That would seem bad if
one of them was supposed to signal the scoreboard.
×
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 3:11 PM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 11:35 AM, Connor Abbott  wrote:
>> On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott  wrote:
>>> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
 On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>> wrote:
>>> From: Connor Abbott 
>>>
>>> It appears that not only math instructions, but also MOV_BYTES or
>>> any instruction that uses Align1 mode cannot be in the middle
>>> of a dependency control sequence or the GPU will hang (at least on my
>>> BDW). This fixes GPU hangs in some fp64 tests.
>>
>> I'm pretty surprised by this assessment. Doubtful even.
>>
>>> Reviewed-by: Iago Toral Quiroga 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index 3bcd5cb..bc0a33b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>>> vec4_instruction *inst)
>>> }
>>>
>>> /*
>>> +* Instructions that use Align1 mode cause the GPU to hang when 
>>> inserted
>>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>>> empirically.
>>> +*/
>>> +
>>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>
>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>> added it I made a test that did
>>
>>   vec4 foo = vec4(packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...))
>>
>> and confirmed that it set depctrl properly on the whole sequence.
>> There could of course be bugs somewhere, but the "hardware doesn't
>> work if you mix align1 and align16 depctrl" seems unlikely.
>>
>> Do you know of a test that this affects?
>
> This only affects FP64 tests, since there we use an align1 mov to do
> double-to-float and float-to-double. However, I tried commenting out
> emit_nir_code() and just doing essentially:
>
> emit(MOV(...))->force_writemask_all = true;
> emit(VEC4_OPCODE_PACK_BYTES, ...);
> emit(MOV(...))->force_writemask_all = true;
>
> and on my BDW it hanged. In case it's not clear: this isn't about
> setting depctrl on the instruction itself, it just can't be inside of
> a depctrl sequence (which we were already disallowing for math
> instructions anyways).

 Very weird. I'll take a look. So I understand, are the MOV
 instructions writing different channels of the same register? And
 VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
 as the MOVs? (I saw your fixup reply)
>>>
>>> Actually, I had them writing the same thing so the second overwrote
>>> the first one.
>>
>> Err, just to be clear... in the actual test that led to me discovering
>> this, the instructions (not mov's but a bfi and a mov IIRC) were
>> writing different components of the same register. In my hacked-up
>> test to see if align1 in between a depctrl sequence was the problem, I
>> just had them both write to the same thing since it was easier. In any
>> case, I don't think it should matter too much.
>
> Interesting. If it was a BFI and a MOV writing different components of
> the same register, I can see that failing. Though I haven't measured,
> 3-src instructions usually have 2+ extra cycles of latency and with a
> 2-cycle issue time, a BFI followed immediately by a MOV would mean
> that both of them retired in the same cycle. That would seem bad if
> one of them was supposed to signal the scoreboard.
> ×

They weren't right next to each other, though -- there was a sequence
of 5-6 instructions between them, one of which happens to be an align1
mov which (according to my testing) hangs the GPU in between any
depctrl sequence.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
> From: Connor Abbott 
>
> It appears that not only math instructions, but also MOV_BYTES or
> any instruction that uses Align1 mode cannot be in the middle
> of a dependency control sequence or the GPU will hang (at least on my
> BDW). This fixes GPU hangs in some fp64 tests.

I'm pretty surprised by this assessment. Doubtful even.

> Reviewed-by: Iago Toral Quiroga 
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 3bcd5cb..bc0a33b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction 
> *inst)
> }
>
> /*
> +* Instructions that use Align1 mode cause the GPU to hang when inserted
> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered empirically.
> +*/
> +
> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||

PACK_BYTES sets depctrl itself in the generator, and at the time I
added it I made a test that did

  vec4 foo = vec4(packUnorm4x8(...),
  packUnorm4x8(...),
  packUnorm4x8(...),
  packUnorm4x8(...))

and confirmed that it set depctrl properly on the whole sequence.
There could of course be bugs somewhere, but the "hardware doesn't
work if you mix align1 and align16 depctrl" seems unlikely.

Do you know of a test that this affects?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
>> From: Connor Abbott 
>>
>> It appears that not only math instructions, but also MOV_BYTES or
>> any instruction that uses Align1 mode cannot be in the middle
>> of a dependency control sequence or the GPU will hang (at least on my
>> BDW). This fixes GPU hangs in some fp64 tests.
>
> I'm pretty surprised by this assessment. Doubtful even.
>
>> Reviewed-by: Iago Toral Quiroga 
>> ---
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index 3bcd5cb..bc0a33b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction 
>> *inst)
>> }
>>
>> /*
>> +* Instructions that use Align1 mode cause the GPU to hang when inserted
>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered empirically.
>> +*/
>> +
>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>
> PACK_BYTES sets depctrl itself in the generator, and at the time I
> added it I made a test that did
>
>   vec4 foo = vec4(packUnorm4x8(...),
>   packUnorm4x8(...),
>   packUnorm4x8(...),
>   packUnorm4x8(...))
>
> and confirmed that it set depctrl properly on the whole sequence.
> There could of course be bugs somewhere, but the "hardware doesn't
> work if you mix align1 and align16 depctrl" seems unlikely.
>
> Do you know of a test that this affects?

This only affects FP64 tests, since there we use an align1 mov to do
double-to-float and float-to-double. However, I tried commenting out
emit_nir_code() and just doing essentially:

emit(MOV(...))->force_writemask_all = true;
emit(VEC4_OPCODE_PACK_BYTES, ...);
emit(MOV(...))->force_writemask_all = true;

and on my BDW it hanged. In case it's not clear: this isn't about
setting depctrl on the instruction itself, it just can't be inside of
a depctrl sequence (which we were already disallowing for math
instructions anyways).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 10:31 AM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>> wrote:
>>> From: Connor Abbott 
>>>
>>> It appears that not only math instructions, but also MOV_BYTES or
>>> any instruction that uses Align1 mode cannot be in the middle
>>> of a dependency control sequence or the GPU will hang (at least on my
>>> BDW). This fixes GPU hangs in some fp64 tests.
>>
>> I'm pretty surprised by this assessment. Doubtful even.
>>
>>> Reviewed-by: Iago Toral Quiroga 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index 3bcd5cb..bc0a33b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>>> vec4_instruction *inst)
>>> }
>>>
>>> /*
>>> +* Instructions that use Align1 mode cause the GPU to hang when inserted
>>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>>> empirically.
>>> +*/
>>> +
>>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>
>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>> added it I made a test that did
>>
>>   vec4 foo = vec4(packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...))
>>
>> and confirmed that it set depctrl properly on the whole sequence.
>> There could of course be bugs somewhere, but the "hardware doesn't
>> work if you mix align1 and align16 depctrl" seems unlikely.
>>
>> Do you know of a test that this affects?
>
> This only affects FP64 tests, since there we use an align1 mov to do
> double-to-float and float-to-double. However, I tried commenting out
> emit_nir_code() and just doing essentially:
>
> emit(MOV(...))->force_writemask_all = true;
> emit(VEC4_OPCODE_PACK_BYTES, ...);
> emit(MOV(...))->force_writemask_all = true;

Err, I meant:

emit(MOV(...))->no_dd_clear = true;
emit(VEC4_OPCODE_PACK_BYTES, ...);
emit(MOV(...))->no_dd_check = true;

(this also hangs with my DOUBLE_TO_FLOAT and FLOAT_TO_DOUBLE opcodes,
which don't use the data dependency bits at all).

>
> and on my BDW it hanged. In case it's not clear: this isn't about
> setting depctrl on the instruction itself, it just can't be inside of
> a depctrl sequence (which we were already disallowing for math
> instructions anyways).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev