On Fri, Sep 27, 2013 at 1:56 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
> Hi,
> this is the second part of the generic tuning changes, sanitizing the tuning flags.
> This patch again is supposed to deal with the "obvious" part only.
> I will send a separate patch for more changes.
>
> The flags changed agree on all CPUs considered for generic (and their
> optimization manuals) + amdfam10, core2 and Atom SLM.
>
> I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning, since it
> seems like an obvious omission (after double checking in the optimization
> manual), and dropped X86_TUNE_FOUR_JUMP_LIMIT for bulldozer cores.  The
> implementation of this feature was always a bit weird and its main purpose was
> to avoid terrible branch predictor degeneration on the older AMD branch
> predictors. I benchmarked both spec2k and 2k6 to verify there are no
> regressions.
>
> Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice improvements
> in specfp benchmarks.
>
> Bootstrapped/regtested x86_64-linux, will wait for comments and commit it
> during the weekend.  I will be happy to revisit any of the generic tuning if
> regressions pop up.
>
> Overall this patch also brings small code size improvements for smaller
> loads/stores and less padding at -O2. The differences are below 0.1%, however.
>
> Honza
>         * x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for generic.
>         (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
>         (X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer.
>         (X86_TUNE_PAD_RETURNS): Drop for newer AMD chips.

Can we drop generic from X86_TUNE_PAD_RETURNS?

>         (X86_TUNE_AVOID_VECTOR_DECODE): Drop for generic.
>         (X86_TUNE_REASSOC_FP_TO_PARALLEL): Enable for generic.


-- 
H.J.
