On Wed, Sep 18, 2013 at 1:39 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
> Hi,
> when the generic model was introduced, 32-bit-only CPUs were still common on
> the market.  It would have been stupid to tune 64-bit code for CPUs that will
> never run it.
> We thus introduced two models: generic32, which considered the needs
> of 32-bit CPUs (Centrinos in particular), and generic64, which didn't.
>
>  /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
>     Athlon and K8.  */
>  /* Generic64 should produce code tuned for Nocona and K8.  */
>
> These were the original definitions, and they are still in the source.
>
> Today the 32-bit-only CPUs are no longer important.  This patch thus
> drops 32-bit generic.  This has the effect of enabling the following flags
> for generic at -m32:
>  use_leave, avoid_vector_decode, slow_imul_imm32_mem, slow_imul_imm8
> that are currently enabled for generic64 only.  This was to accommodate
> earlier AMD chips, which are no longer relevant either.
>
> I also updated the comment:
> ! /* Generic64 should produce code tuned for Nocona and K8.  */
> to:
> ! /* Generic should produce code tuned for Core-i7 (and newer chips)
> !    and btver1 (and newer chips).  */
> This is what I think generic represents today (it also fares well on earlier
> cores and amdfam10, but we probably don't want to get too limited by these
> anymore).

This sounds good to me.

> I would like to proceed with the modernization of generic64, in particular
> to switch it to a 4-issue scheduling model and revisit individual flags
> incrementally.
>
> Bootstrapped/regtested x86_64-linux, will commit it tomorrow if there
> are no complaints.
>
> Honza
>
>         * i386.h (TARGET_GENERIC32, TARGET_GENERIC64): Remove.
>         (TARGET_GENERIC): Use PROCESSOR_GENERIC.
>         (enum processor_type): Unify generic32 and 64.
>         * i386.md (cpu): Likewise.
>         * x86-tune.def (use_leave): Enable for generic32.
>         (avoid_vector_decode, slow_imul_imm32_mem, slow_imul_imm8): Likewise.
>         * athlon.md: Change generic64 to generic in all occurrences.
>         * i386-c.c (ix86_target_macros_internal): Unify generic64 and 32.
>         (ix86_target_macros_internal): Likewise.
>         * driver-i386.c (host_detect_local_cpu): Likewise.
>         * i386.c (generic64_memcpy, generic64_memset, generic64_cost): Rename
>         to ...
>         (generic_memcpy, generic_memset, generic_cost): This one.
>         (generic32_memcpy, generic32_memset, generic32_cost): Remove.
>         (m_GENERIC32, m_GENERIC64): Remove.
>         (m_GENERIC): Turn into one flag.
>         (processor_target): Unify generic tunings.
>         (ix86_option_override_internal): Replace generic32/64 by generic.
>         (ix86_issue_rate): Likewise.
>         (ix86_adjust_cost): Likewise.


> *************** static const struct ptt processor_target
> *** 2384,2391 ****
>     {&core_cost, 16, 10, 16, 10, 16},
>     /* Core avx2  */
>     {&core_cost, 16, 10, 16, 10, 16},
> !   {&generic32_cost, 16, 7, 16, 7, 16},
> !   {&generic64_cost, 16, 10, 16, 10, 16},
>     {&amdfam10_cost, 32, 24, 32, 7, 32},
>     {&bdver1_cost, 16, 10, 16, 7, 11},
>     {&bdver2_cost, 16, 10, 16, 7, 11},
> --- 2303,2309 ----
>     {&core_cost, 16, 10, 16, 10, 16},
>     /* Core avx2  */
>     {&core_cost, 16, 10, 16, 10, 16},
> !   {&generic_cost, 16, 10, 16, 10, 16},
>     {&amdfam10_cost, 32, 24, 32, 7, 32},
>     {&bdver1_cost, 16, 10, 16, 7, 11},
>     {&bdver2_cost, 16, 10, 16, 7, 11},

I did some experiments with code alignment.  I found that
-fno-align-loops -fno-align-functions -fno-align-jumps
had no negative performance impact on current
Intel processors while reducing code size by 1-2%.
Should we use

{&generic_cost, 0, 0, 0, 0, 0},

instead?
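
For readers unfamiliar with the table, here is a minimal sketch of what the five numbers in each entry stand for.  It assumes the entry layout is: cost table pointer, loop alignment, loop max skip, jump alignment, jump max skip, function alignment.  The names ptt_sketch and effective_align are hypothetical and exist only for illustration, and the fallback logic is a simplified assumption about how an unset -falign-* option picks up the table default, not a copy of the code in i386.c.

```c
#include <assert.h>

/* Stand-in for GCC's cost table type; its contents do not matter here.  */
struct processor_costs { int unused; };

/* Assumed layout of one processor_target entry (hypothetical name).  */
struct ptt_sketch
{
  const struct processor_costs *cost;
  int align_loop;
  int align_loop_max_skip;
  int align_jump;
  int align_jump_max_skip;
  int align_func;
};

static const struct processor_costs generic_cost = { 0 };

/* The entry from the patch and the all-zero entry proposed above.  */
static const struct ptt_sketch generic_current
  = {&generic_cost, 16, 10, 16, 10, 16};
static const struct ptt_sketch generic_proposed
  = {&generic_cost, 0, 0, 0, 0, 0};

/* Hypothetical helper: a nonzero user value wins, otherwise the table
   default is used, and a remaining zero is treated as alignment 1,
   which means no padding is inserted.  */
static int
effective_align (int user_value, int table_default)
{
  assert (user_value >= 0 && table_default >= 0);
  int v = user_value != 0 ? user_value : table_default;
  return v == 0 ? 1 : v;
}
```

Under these assumptions, effective_align (0, generic_current.align_loop) gives 16, while the all-zero entry gives 1, which is the same result as passing the -fno-align-* options explicitly.  An explicit -falign-loops=N from the user still takes precedence in both cases.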

Thanks.

-- 
H.J.
