It does not help Silvermont; only Haswell and SandyBridge are affected. I don't use a splitter because (1) the zeroing of the dest reg gets deleted, and (2) the scheduler can hoist the zeroing instructions up. I will try the r16/r32 variants and tell you later.
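
For context, here is a minimal C sketch of the kind of loop PR 62011 is about. The function name popcount_sum is hypothetical and not part of the patch; __builtin_popcountll is the existing GCC builtin. When built with -mpopcnt, GCC would typically emit one popcnt per iteration, and the proposed tuning would put an xor of the destination register in front of it. The exact assembly depends on compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum the population counts of n 64-bit words.
   On Sandybridge/Haswell each popcnt has a false dependency on its
   destination register, so back-to-back iterations can serialize
   unless the destination is zeroed (xor reg, reg) first. */
uint64_t popcount_sum(const uint64_t *buf, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)__builtin_popcountll(buf[i]);
    return total;
}
```

Compiling this with and without the tuning and comparing the loop body in the generated assembly is one way to check whether the zeroing survives up to the popcnt.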

2014-08-14 19:18 GMT+04:00 H.J. Lu <hjl.to...@gmail.com>:
> On Thu, Aug 14, 2014 at 4:50 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Hi All,
>>
>> Here is a fix for PR 62011 - remove false dependency for unary
>> bit-manipulation instructions for latest BigCore chips (Sandybridge
>> and Haswell) by outputting in assembly file zeroing destination
>> register before bmi instruction. I checked that performance restored
>> for popcnt, lzcnt and tzcnt instructions.
>>
>> Bootstrap and regression testing did not show any new failures.
>>
>> Is it OK for trunk?
>>
>> gcc/ChangeLog
>> 2014-08-14 Yuri Rumyantsev <ysrum...@gmail.com>
>>
>> PR target/62011
>> * config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function
>> prototype.
>> * config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function.
>> * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM) New macros.
>> * config/i386/i386.md (ctz<mode>2, clz<mode>2_lzcnt, popcount<mode>2,
>> *popcount<mode>2_cmp, *popcountsi2_cmp_zext): Output zeroing
>> destination register for unary bit-manipulation instructions
>> if required.
>
> Why don't you use splitter to to generate XOR?
>
>> * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New.
>
> Is this needed for r16 and r32? The original report says that only
> r64 is affected:
>
> http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance
>
> Have you tried this on Silvermont? Does it help Silvermont?
>
> --
> H.J.