On Fri, Sep 27, 2013 at 1:56 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
> Hi,
> this is the second part of the generic tuning changes sanitizing the tuning flags.
> This patch again is supposed to deal with the "obvious" part only.
> I will send a separate patch for more changes.
>
> The flags changed agree on all CPUs considered for generic (and their
> optimization manuals) + amdfam10, core2 and Atom SLM.
>
> I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning, since it
> seems like an obvious omission (after double checking in the optimization manual) and
> dropped X86_TUNE_FOUR_JUMP_LIMIT for bulldozer cores.  Implementation of this
> feature was always a bit weird and its main purpose was to avoid terrible branch
> predictor degeneration on the older AMD branch predictors.  I benchmarked both
> spec2k and 2k6 to verify there are no regressions.
>
> Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice improvements
> in specfp benchmarks.
>
> Bootstrapped/regtested x86_64-linux, will wait for comments and commit it
> during the weekend.  I will be happy to revisit any of the generic tuning if
> regressions pop up.
>
> Overall this patch also brings small code size improvements for smaller
> loads/stores and less padding at -O2.  Differences are below 0.1%, however.
>
> Honza
>
>         * x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for
>         generic.
>         (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
>         (X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer.
>         (X86_TUNE_PAD_RETURNS): Drop for newer AMD chips.

Can we drop generic on X86_TUNE_PAD_RETURNS?

>         (X86_TUNE_AVOID_VECTOR_DECODE): Drop for generic.
>         (X86_TUNE_REASSOC_FP_TO_PARALLEL): Enable for generic.

--
H.J.