On 08/26/2018 05:21 AM, Jason Ekstrand wrote:
> On August 24, 2018 22:49:14 "Ian Romanick" <i...@freedesktop.org> wrote:
> 
>> From: Ian Romanick <ian.d.roman...@intel.com>
>>
>> Funny story... a single shader was hurt for instructions, spills, fills.
>> That same shader was also the most helped for cycles.  #GPUsAreWeird
>>
>> No changes on any other Intel platform.
>>
>> Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
>> total instructions in shared programs: 14304116 -> 14304261 (<.01%)
>> instructions in affected programs: 12776 -> 12921 (1.13%)
>> helped: 19
>> HURT: 1
>> helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1
>> helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55%
>> HURT stats (abs)   min: 189 max: 189 x̄: 189.00 x̃: 189
>> HURT stats (rel)   min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87%
>> 95% mean confidence interval for instructions value: -12.83 27.33
>> 95% mean confidence interval for instructions %-change: -1.57% 0.31%
>> Inconclusive result (value mean confidence interval includes 0).
>>
>> total cycles in shared programs: 527552861 -> 527531226 (<.01%)
>> cycles in affected programs: 1459195 -> 1437560 (-1.48%)
>> helped: 16
>> HURT: 2
>> helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6
>> helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03%
>> HURT stats (abs)   min: 12 max: 12 x̄: 12.00 x̃: 12
>> HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
>> 95% mean confidence interval for cycles value: -3699.81 1295.92
>> 95% mean confidence interval for cycles %-change: -0.94% 0.30%
>> Inconclusive result (value mean confidence interval includes 0).
>>
>> total spills in shared programs: 8025 -> 8033 (0.10%)
>> spills in affected programs: 208 -> 216 (3.85%)
>> helped: 1
>> HURT: 1
>>
>> total fills in shared programs: 10989 -> 11040 (0.46%)
>> fills in affected programs: 444 -> 495 (11.49%)
>> helped: 1
>> HURT: 1
>>
>> Ivy Bridge
>> total instructions in shared programs: 11709181 -> 11709153 (<.01%)
>> instructions in affected programs: 3505 -> 3477 (-0.80%)
>> helped: 3
>> HURT: 0
>> helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4
>> helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61%
>>
>> total cycles in shared programs: 254741126 -> 254738801 (<.01%)
>> cycles in affected programs: 919067 -> 916742 (-0.25%)
>> helped: 3
>> HURT: 0
>> helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160
>> helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03%
>>
>> total spills in shared programs: 4536 -> 4533 (-0.07%)
>> spills in affected programs: 40 -> 37 (-7.50%)
>> helped: 1
>> HURT: 0
>>
>> total fills in shared programs: 4819 -> 4813 (-0.12%)
>> fills in affected programs: 94 -> 88 (-6.38%)
>> helped: 1
>> HURT: 0
>>
>> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com>
>> ---
>> src/intel/compiler/brw_fs_nir.cpp | 38
>> ++++++++++++++++++++++++++++++++------
>> 1 file changed, 32 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_fs_nir.cpp
>> b/src/intel/compiler/brw_fs_nir.cpp
>> index 9c9df5ac09f..a2c3d715380 100644
>> --- a/src/intel/compiler/brw_fs_nir.cpp
>> +++ b/src/intel/compiler/brw_fs_nir.cpp
>> @@ -3659,9 +3659,20 @@ fs_visitor::nir_emit_cs_intrinsic(const
>> fs_builder &bld,
>>       break;
>>    }
>>
>> -   case nir_intrinsic_shared_atomic_add:
>> -      nir_emit_shared_atomic(bld, BRW_AOP_ADD, instr);
>> +   case nir_intrinsic_shared_atomic_add: {
>> +      int op = BRW_AOP_ADD;
>> +      const nir_const_value *const val =
>> nir_src_as_const_value(instr->src[1]);
>> +
>> +      if (val != NULL) {
>> +         if (val->i32[0] == 1)
>> +            op = BRW_AOP_INC;
>> +         else if (val->i32[0] == -1)
>> +            op = BRW_AOP_DEC;
>> +      }
> 
> Given the repeated pattern in this and the other patches, would it make
> sense to add a little brw_atomic_add_op helper that takes a nir_src? 
> I'm not sure exactly how that would work with adjusting the number of
> surfed everywhere though.  Feel free to disregard if it doesn't make sense.

I had thought of that, then dismissed it... but that was when I was
thinking there was only one place to do this.  Now that it has grown to
3, I think this is a good idea.

> Another thought (more of an FYI, really) is that I had considered doing
> this as a NIR lowering pass at one point. Not sure if anyone else has
> hardware inc/dec operations though.  You don't need to do anything with
> that information; I'm just throwing it out there.

I thought about that too... I don't know.  It would be more work to
achieve basically the same result.  It might be possible to have it
detect cases like atomicAdd(a, -1) - 1 to generate BRW_AOP_PREDEC.

> --Jason
> 
>> +
>> +      nir_emit_shared_atomic(bld, op, instr);
>>       break;
>> +   }
>>    case nir_intrinsic_shared_atomic_imin:
>>       nir_emit_shared_atomic(bld, BRW_AOP_IMIN, instr);
>>       break;
>> @@ -4377,9 +4388,20 @@ fs_visitor::nir_emit_intrinsic(const fs_builder
>> &bld, nir_intrinsic_instr *instr
>>       break;
>>    }
>>
>> -   case nir_intrinsic_ssbo_atomic_add:
>> -      nir_emit_ssbo_atomic(bld, BRW_AOP_ADD, instr);
>> +   case nir_intrinsic_ssbo_atomic_add: {
>> +      int op = BRW_AOP_ADD;
>> +      const nir_const_value *const val =
>> nir_src_as_const_value(instr->src[2]);
>> +
>> +      if (val != NULL) {
>> +         if (val->i32[0] == 1)
>> +            op = BRW_AOP_INC;
>> +         else if (val->i32[0] == -1)
>> +            op = BRW_AOP_DEC;
>> +      }
>> +
>> +      nir_emit_ssbo_atomic(bld, op, instr);
>>       break;
>> +   }
>>    case nir_intrinsic_ssbo_atomic_imin:
>>       nir_emit_ssbo_atomic(bld, BRW_AOP_IMIN, instr);
>>       break;
>> @@ -4888,7 +4910,9 @@ fs_visitor::nir_emit_ssbo_atomic(const
>> fs_builder &bld,
>>    }
>>
>>    fs_reg offset = get_nir_src(instr->src[1]);
>> -   fs_reg data1 = get_nir_src(instr->src[2]);
>> +   fs_reg data1;
>> +   if (op != BRW_AOP_INC && op != BRW_AOP_DEC && op != BRW_AOP_PREDEC)
>> +      data1 = get_nir_src(instr->src[2]);
>>    fs_reg data2;
>>    if (op == BRW_AOP_CMPWR)
>>       data2 = get_nir_src(instr->src[3]);
>> @@ -4962,7 +4986,9 @@ fs_visitor::nir_emit_shared_atomic(const
>> fs_builder &bld,
>>
>>    fs_reg surface = brw_imm_ud(GEN7_BTI_SLM);
>>    fs_reg offset;
>> -   fs_reg data1 = get_nir_src(instr->src[1]);
>> +   fs_reg data1;
>> +   if (op != BRW_AOP_INC && op != BRW_AOP_DEC && op != BRW_AOP_PREDEC)
>> +      data1 = get_nir_src(instr->src[1]);
>>    fs_reg data2;
>>    if (op == BRW_AOP_CMPWR)
>>       data2 = get_nir_src(instr->src[2]);
>> -- 
>> 2.14.4
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to