Re: [PATCH] Atom: Enabling unroll at O2 optimization level
On Thu, Apr 12, 2012 at 3:16 PM, Richard Guenther wrote: > On Thu, Apr 12, 2012 at 1:05 PM, Igor Zamyatin wrote: >> On Wed, Apr 11, 2012 at 12:39 PM, Richard Guenther >> wrote: >>> On Tue, Apr 10, 2012 at 8:43 PM, Igor Zamyatin wrote: Hi All! Here is a patch that enables unroll at O2 for Atom. This gives good performance boost on EEMBC 2.0 (~+8% in Geomean for 32 bits) with quite moderate code size increase (~5% for EEMBC2.0, 32 bits). >>> >>> 5% is not moderate. Your patch does enable unrolling at -O2 but not -O3, >>> why? Why do you disable register renaming? check_imull requires a function >>> comment. >> >> Sure, enabling unroll for O3 could be the next step. >> We can't avoid code size increase with unroll - what number do you >> think will be appropriate? >> Register renaming was the reason of several degradations during tuning >> process >> Comment for check_imull was added >> >>> >>> This completely looks like a hack for EEMBC2.0, so it's definitely not ok. >> >> Why? EEMBC was measured and result provided here just because this >> benchmark considers to be very relevant for Atom > > I'd say that SPEC INT (2000 / 2006) is more relevant for Atom (SPEC FP > would be irrelevant OTOH). Similar code size for, say, Mozilla Firefox > or GCC itself would be important. > >>> -O2 is not supposed to give best benchmark results. >> >> O2 is wide-used so performance improvement could be important for users. > > But not at a 5% size cost. Please also always check the compile-time effect > which is important for -O2 as well. What would be an acceptable number of size cost/compile-time increase for O2 and O3 on EEMBC, SPEC INT 2000 and Mozilla? Is it possible in common to put Atom-specific unroll heuristics under some option which could be mentioned in GCC docs? > > Richard. > >>> >>> Thanks, >>> Richard. >>> Tested for i386 and x86-64, ok for trunk? >> >> Updated patch attached >> Thanks, Igor ChangeLog: 2012-04-10 Yakovlev Vladimir * gcc/config/i386/i386.c (check_imul): New routine. (ix86_loop_unroll_adjust): New target hook. (ix86_option_override_internal): Enable unrolling on Atom at -O2. (TARGET_LOOP_UNROLL_ADJUST): New define. Thanks, Igor
Re: [PATCH] Atom: Enabling unroll at O2 optimization level
> > So would need much more benchmarking on macro workloads first at least. > > Like what, for example? I believe in this case everything also > strongly depends on test usage model (e.g. it usually compiled with Os > not O2) and, let's say, internal test structure - whether there are > hot loops that suitable for unroll. Normally the compiler doesn't know if a loop is hot unless you use profile feedback. So worst case on a big code base you may end up with a lot of unnecessary unrolling. On cold code it's just wasted bytes, but there could be already icache limited code where it would be worse. How about just a compiler bootstrap on Atom as a "worst case"? For the benchmark can you use profile feedback? BTW I know some loops are unrolled at -O3 by default at tree level because the vectorizer likes it. I actually have an older patch to dial this down for some common cases. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH] Atom: Enabling unroll at O2 optimization level
On Thu, Apr 12, 2012 at 1:05 PM, Igor Zamyatin wrote: > On Wed, Apr 11, 2012 at 12:39 PM, Richard Guenther > wrote: >> On Tue, Apr 10, 2012 at 8:43 PM, Igor Zamyatin wrote: >>> Hi All! >>> >>> Here is a patch that enables unroll at O2 for Atom. >>> >>> This gives good performance boost on EEMBC 2.0 (~+8% in Geomean for 32 >>> bits) with quite moderate code size increase (~5% for EEMBC2.0, 32 >>> bits). >> >> 5% is not moderate. Your patch does enable unrolling at -O2 but not -O3, >> why? Why do you disable register renaming? check_imull requires a function >> comment. > > Sure, enabling unroll for O3 could be the next step. > We can't avoid code size increase with unroll - what number do you > think will be appropriate? > Register renaming was the reason of several degradations during tuning process > Comment for check_imull was added > >> >> This completely looks like a hack for EEMBC2.0, so it's definitely not ok. > > Why? EEMBC was measured and result provided here just because this > benchmark considers to be very relevant for Atom I'd say that SPEC INT (2000 / 2006) is more relevant for Atom (SPEC FP would be irrelevant OTOH). Similar code size for, say, Mozilla Firefox or GCC itself would be important. >> -O2 is not supposed to give best benchmark results. > > O2 is wide-used so performance improvement could be important for users. But not at a 5% size cost. Please also always check the compile-time effect which is important for -O2 as well. Richard. >> >> Thanks, >> Richard. >> >>> >>> Tested for i386 and x86-64, ok for trunk? > > Updated patch attached > >>> >>> Thanks, >>> Igor >>> >>> ChangeLog: >>> >>> 2012-04-10 Yakovlev Vladimir >>> >>> * gcc/config/i386/i386.c (check_imul): New routine. >>> (ix86_loop_unroll_adjust): New target hook. >>> (ix86_option_override_internal): Enable unrolling on Atom at -O2. >>> (TARGET_LOOP_UNROLL_ADJUST): New define.
Re: [PATCH] Atom: Enabling unroll at O2 optimization level
On Wed, Apr 11, 2012 at 12:39 PM, Richard Guenther wrote: > On Tue, Apr 10, 2012 at 8:43 PM, Igor Zamyatin wrote: >> Hi All! >> >> Here is a patch that enables unroll at O2 for Atom. >> >> This gives good performance boost on EEMBC 2.0 (~+8% in Geomean for 32 >> bits) with quite moderate code size increase (~5% for EEMBC2.0, 32 >> bits). > > 5% is not moderate. Your patch does enable unrolling at -O2 but not -O3, > why? Why do you disable register renaming? check_imull requires a function > comment. Sure, enabling unroll for O3 could be the next step. We can't avoid code size increase with unroll - what number do you think will be appropriate? Register renaming was the reason of several degradations during tuning process Comment for check_imull was added > > This completely looks like a hack for EEMBC2.0, so it's definitely not ok. Why? EEMBC was measured and result provided here just because this benchmark considers to be very relevant for Atom > > -O2 is not supposed to give best benchmark results. O2 is wide-used so performance improvement could be important for users. > > Thanks, > Richard. > >> >> Tested for i386 and x86-64, ok for trunk? Updated patch attached >> >> Thanks, >> Igor >> >> ChangeLog: >> >> 2012-04-10 Yakovlev Vladimir >> >> * gcc/config/i386/i386.c (check_imul): New routine. >> (ix86_loop_unroll_adjust): New target hook. >> (ix86_option_override_internal): Enable unrolling on Atom at -O2. >> (TARGET_LOOP_UNROLL_ADJUST): New define. unroll1.patch Description: Binary data
Re: [PATCH] Atom: Enabling unroll at O2 optimization level
On Wed, Apr 11, 2012 at 5:34 PM, Andi Kleen wrote: > Richard Guenther writes: >> >> 5% is not moderate. Your patch does enable unrolling at -O2 but not -O3, >> why? Why do you disable register renaming? check_imull requires a function >> comment. >> >> This completely looks like a hack for EEMBC2.0, so it's definitely not ok. >> >> -O2 is not supposed to give best benchmark results. > > Besides it is against the Intel Optimization Manual recommendation > to prefer small code on Atom to avoid falling out of the predecode hints > in the cache. Yes, this is well-known concern for Atom. But in the same time unroll could help a lot for inorder machines because it could provide more opportunities to a compiler scheduler. And experiments showed that unroll could really help. > > So would need much more benchmarking on macro workloads first at least. Like what, for example? I believe in this case everything also strongly depends on test usage model (e.g. it usually compiled with Os not O2) and, let's say, internal test structure - whether there are hot loops that suitable for unroll. > > -Andi > > -- > a...@linux.intel.com -- Speaking for myself only
Re: [PATCH] Atom: Enabling unroll at O2 optimization level
Richard Guenther writes: > > 5% is not moderate. Your patch does enable unrolling at -O2 but not -O3, > why? Why do you disable register renaming? check_imull requires a function > comment. > > This completely looks like a hack for EEMBC2.0, so it's definitely not ok. > > -O2 is not supposed to give best benchmark results. Besides it is against the Intel Optimization Manual recommendation to prefer small code on Atom to avoid falling out of the predecode hints in the cache. So would need much more benchmarking on macro workloads first at least. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [PATCH] Atom: Enabling unroll at O2 optimization level
On Tue, Apr 10, 2012 at 8:43 PM, Igor Zamyatin wrote: > Hi All! > > Here is a patch that enables unroll at O2 for Atom. > > This gives good performance boost on EEMBC 2.0 (~+8% in Geomean for 32 > bits) with quite moderate code size increase (~5% for EEMBC2.0, 32 > bits). 5% is not moderate. Your patch does enable unrolling at -O2 but not -O3, why? Why do you disable register renaming? check_imull requires a function comment. This completely looks like a hack for EEMBC2.0, so it's definitely not ok. -O2 is not supposed to give best benchmark results. Thanks, Richard. > > Tested for i386 and x86-64, ok for trunk? > > Thanks, > Igor > > ChangeLog: > > 2012-04-10 Yakovlev Vladimir > > * gcc/config/i386/i386.c (check_imull): New routine. > (ix86_loop_unroll_adjust): New target hook. > (ix86_option_override_internal): Enable unrolling on Atom at -O2. > (TARGET_LOOP_UNROLL_ADJUST): New define.
[PATCH] Atom: Enabling unroll at O2 optimization level
Hi All! Here is a patch that enables unroll at O2 for Atom. This gives good performance boost on EEMBC 2.0 (~+8% in Geomean for 32 bits) with quite moderate code size increase (~5% for EEMBC2.0, 32 bits). Tested for i386 and x86-64, ok for trunk? Thanks, Igor ChangeLog: 2012-04-10 Yakovlev Vladimir * gcc/config/i386/i386.c (check_imull): New routine. (ix86_loop_unroll_adjust): New target hook. (ix86_option_override_internal): Enable unrolling on Atom at -O2. (TARGET_LOOP_UNROLL_ADJUST): New define. unroll.patch Description: Binary data