I checked that zeroing the destination operand of unary bit-manipulation
instructions helps only for 64- and 32-bit operands, not 16-bit ones, so I
changed the patch accordingly.
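A minimal sketch of the idiom under discussion, written as GCC extended inline asm rather than as the i386.md change in the patch. It XOR-zeroes the destination register before POPCNT so the instruction no longer waits on that register's previous value, in both a 64-bit and a 32-bit variant. The helper names popcnt64_nodep and popcnt32_nodep are hypothetical. When the code is built without POPCNT support (no __POPCNT__), the helpers fall back to __builtin_popcountll and __builtin_popcount.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit POPCNT with the destination zeroed first.
   XOR of the 32-bit register also clears the upper 32 bits, and the CPU
   treats it as a zeroing idiom, so POPCNT no longer depends on the old value. */
static inline uint64_t popcnt64_nodep(uint64_t x)
{
#if defined(__x86_64__) && defined(__POPCNT__)
    uint64_t r;
    __asm__("xorl %k0, %k0\n\t"   /* break the false dependency on r */
            "popcntq %1, %0"
            : "=&r"(r)            /* early clobber: r is written before x is read */
            : "rm"(x)
            : "cc");              /* both instructions modify the flags */
    return r;
#else
    return (uint64_t)__builtin_popcountll(x);
#endif
}

/* Hypothetical helper: the same idiom for a 32-bit operand. */
static inline uint32_t popcnt32_nodep(uint32_t x)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__POPCNT__)
    uint32_t r;
    __asm__("xorl %0, %0\n\t"
            "popcntl %1, %0"
            : "=&r"(r)
            : "rm"(x)
            : "cc");
    return r;
#else
    return (uint32_t)__builtin_popcount(x);
#endif
}
```

Both helpers are called like ordinary functions. For example, popcnt64_nodep(0xFFull) returns 8. The early-clobber constraint `=&r` keeps the compiler from assigning x to the same register as r. Without it, the XOR would destroy the input before POPCNT reads it.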

Is it OK for trunk?

gcc/ChangeLog
2014-08-15  Yuri Rumyantsev  <ysrum...@gmail.com>

PR target/62011
* config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function
 prototype.
* config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function.
* config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM): New macro.
* config/i386/i386.md (ctz<mode>2, clz<mode>2_lzcnt, popcount<mode>2,
 *popcount<mode>2_cmp, *popcountsi2_cmp_zext): Output zeroing
 destination register for unary bit-manipulation instructions
 if required.
* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New.

2014-08-14 19:39 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>:
> It does not help Silvermont, so only Haswell and Sandy Bridge are affected.
> I don't use a splitter because (1) it deletes the zeroing of the destination
> register, and (2) the scheduler can hoist the zeroing instructions up. I will
> try the r16/r32 variants and let you know later.
>
> 2014-08-14 19:18 GMT+04:00 H.J. Lu <hjl.to...@gmail.com>:
>> On Thu, Aug 14, 2014 at 4:50 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Hi All,
>>>
>>> Here is a fix for PR 62011. It removes the false dependency of unary
>>> bit-manipulation instructions on their destination register for the latest
>>> BigCore chips (Sandy Bridge and Haswell) by emitting an instruction that
>>> zeroes the destination register before the bit-manipulation instruction.
>>> I checked that this restores performance for popcnt, lzcnt and tzcnt.
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>>
>>> Is it OK for trunk?
>>>
>>> gcc/ChangeLog
>>> 2014-08-14  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>
>>> PR target/62011
>>> * config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function
>>>  prototype.
>>> * config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function.
>>> * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM): New macro.
>>> * config/i386/i386.md (ctz<mode>2, clz<mode>2_lzcnt, popcount<mode>2,
>>>  *popcount<mode>2_cmp, *popcountsi2_cmp_zext): Output zeroing
>>>  destination register for unary bit-manipulation instructions
>>>  if required.
>>
>> Why don't you use a splitter to generate the XOR?
>>
>>> * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New.
>>
>> Is this needed for r16 and r32?  The original report says that only
>> r64 is affected:
>>
>> http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance
>>
>> Have you tried this on Silvermont?  Does it help Silvermont?
>>
>> --
>> H.J.

Attachment: patch1
Description: Binary data
