I checked that zeroing the destination operand for unary bit-manipulation instructions helps in 64- and 32-bit modes only, so the patch was changed accordingly.
Is it OK for trunk?

gcc/ChangeLog
2014-08-15 Yuri Rumyantsev <ysrum...@gmail.com>

PR target/62011
* config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function
prototype.
* config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function.
* config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM) New macros.
* config/i386/i386.md (ctz<mode>2, clz<mode>2_lzcnt, popcount<mode>2,
*popcount<mode>2_cmp, *popcountsi2_cmp_zext): Output zeroing
destination register for unary bit-manipulation instructions
if required.
* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New.

2014-08-14 19:39 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>:
> It does not help Silvermont, i.e. only Haswell and SandyBridge are affected.
> I don't use splitter since (1) it deletes zeroing of dest reg; (2)
> scheduler can hoist them up . I will try r16/r32 variants and tell you
> later.
>
> 2014-08-14 19:18 GMT+04:00 H.J. Lu <hjl.to...@gmail.com>:
>> On Thu, Aug 14, 2014 at 4:50 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Hi All,
>>>
>>> Here is a fix for PR 62011 - remove false dependency for unary
>>> bit-manipulation instructions for latest BigCore chips (Sandybridge
>>> and Haswell) by outputting in assembly file zeroing destination
>>> register before bmi instruction. I checked that performance restored
>>> for popcnt, lzcnt and tzcnt instructions.
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>>
>>> Is it OK for trunk?
>>>
>>> gcc/ChangeLog
>>> 2014-08-14 Yuri Rumyantsev <ysrum...@gmail.com>
>>>
>>> PR target/62011
>>> * config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function
>>> prototype.
>>> * config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function.
>>> * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM) New macros.
>>> * config/i386/i386.md (ctz<mode>2, clz<mode>2_lzcnt, popcount<mode>2,
>>> *popcount<mode>2_cmp, *popcountsi2_cmp_zext): Output zeroing
>>> destination register for unary bit-manipulation instructions
>>> if required.
>>
>> Why don't you use splitter to to generate XOR?
>>
>>> * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New.
>>
>> Is this needed for r16 and r32? The original report says that only
>> r64 is affected:
>>
>> http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance
>>
>> Have you tried this on Silvermont? Does it help Silvermont?
>>
>> --
>> H.J.
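
To illustrate the idea discussed in the thread above, here is a minimal C sketch with inline asm. It is not code from the patch: the function names popcnt_plain and popcnt_zeroed are hypothetical, and the patch itself, per the ChangeLog, emits the zeroing from the i386.md output templates rather than from user code. The sketch only shows the instruction sequence in question, an xor of the destination register placed immediately before popcnt, so that the instruction no longer waits on the old value of its destination. On non-x86-64 targets it falls back to a GCC builtin so that it still compiles.

```c
#include <stdint.h>

/* Plain popcnt. On the cores named in the thread the destination
   register is reported to act as a (false) input dependency.  */
static inline uint64_t popcnt_plain(uint64_t x)
{
#if defined(__x86_64__)
    uint64_t r;
    __asm__ ("popcnt %1, %0" : "=r" (r) : "r" (x) : "cc");
    return r;
#else
    /* Fallback for other targets (assumption: GCC-compatible compiler).  */
    return (uint64_t) __builtin_popcountll(x);
#endif
}

/* Same operation, but the destination is zeroed first.  The early
   clobber ("=&r") keeps the output in a different register from the
   input, since the xor writes it before the input is read.  */
static inline uint64_t popcnt_zeroed(uint64_t x)
{
#if defined(__x86_64__)
    uint64_t r;
    __asm__ ("xor %k0, %k0\n\t"   /* 32-bit xor clears the full 64-bit reg */
             "popcnt %1, %0"
             : "=&r" (r) : "r" (x) : "cc");
    return r;
#else
    return (uint64_t) __builtin_popcountll(x);
#endif
}
```

Both functions return the same result; any difference would show up only in timing, for example when several independent popcnt results are accumulated in a tight loop on an affected core.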