It does not help Silvermont; only Haswell and Sandy Bridge are affected.
I don't use a splitter since (1) it deletes the zeroing of the dest reg;
(2) the scheduler can hoist the zeroing instructions up. I will try the
r16/r32 variants and tell you later.
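
For context, here is a minimal sketch of the kind of loop the PR is
about. The function name and the assembly shown in the comment are
illustrative assumptions, not output from the patch; the actual
registers and addressing depend on the compiler and options.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum the population counts of a buffer.
 *
 * When built with popcnt enabled, the builtin below may become a single
 * popcnt instruction.  On the affected chips popcnt has a false
 * dependency on the old value of its destination register, so the idea
 * of the fix is to emit a zeroing idiom first, roughly:
 *
 *     xorl    %eax, %eax        # breaks the dependency on old %rax
 *     popcntq (%rdi), %rax
 *
 * The register names here are assumed for illustration only.
 */
uint64_t sum_popcount(const uint64_t *buf, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += (uint64_t)__builtin_popcountll(buf[i]);

    return total;
}
```

Comparing the generated assembly for such a loop with and without the
tuning flag should show whether the zeroing XOR is emitted.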

2014-08-14 19:18 GMT+04:00 H.J. Lu <hjl.to...@gmail.com>:
> On Thu, Aug 14, 2014 at 4:50 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Hi All,
>>
>> Here is a fix for PR 62011: it removes the false dependency for unary
>> bit-manipulation instructions on the latest BigCore chips (Sandy Bridge
>> and Haswell) by emitting an instruction that zeroes the destination
>> register before the BMI instruction in the assembly file. I checked
>> that performance is restored for the popcnt, lzcnt and tzcnt
>> instructions.
>>
>> Bootstrap and regression testing did not show any new failures.
>>
>> Is it OK for trunk?
>>
>> gcc/ChangeLog
>> 2014-08-14  Yuri Rumyantsev  <ysrum...@gmail.com>
>>
>> PR target/62011
>> * config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function
>>  prototype.
>> * config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function.
>> * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM): New macro.
>> * config/i386/i386.md (ctz<mode>2, clz<mode>2_lzcnt, popcount<mode>2,
>>  *popcount<mode>2_cmp, *popcountsi2_cmp_zext): Output zeroing
>>  destination register for unary bit-manipulation instructions
>>  if required.
>
> Why don't you use a splitter to generate the XOR?
>
>> * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New.
>
> Is this needed for r16 and r32?  The original report says that only
> r64 is affected:
>
> http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance
>
> Have you tried this on Silvermont?  Does it help Silvermont?
>
> --
> H.J.
