https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #16 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kewen Lin <[email protected]>:

https://gcc.gnu.org/g:c776dcd5f868a16661b86842916493b531988d1e

commit r17-258-gc776dcd5f868a16661b86842916493b531988d1e
Author: Kewen Lin <[email protected]>
Date:   Fri May 1 13:50:57 2026 +0000

    i386: Adjust some c86-4g*.md modeling to reduce build time

    Commit r17-203 caused a significant increase in GCC build
    time in several environments, as folks reported, mainly due
    to the excessively long execution time of genautomata.

    As Alexander pointed out, the current division modeling in
    c86-4g*.md can cause a combinatorial explosion in the
    automaton, which in turn leads to a significant build time
    increase.
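
The explosion mentioned above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how genautomata actually builds its tables: each unit is reduced to a countdown of remaining busy cycles, and the unit latencies (three 1-cycle pipes and one 40-cycle divider) are hypothetical values chosen for illustration, not the real c86-4g numbers. The function name `reachable_states` is likewise made up for this example.

```python
def reachable_states(latencies):
    """Count reachable states of a toy automaton.

    A state is a tuple holding the remaining busy cycles of each unit.
    This is a simplification for illustration, not genautomata's model.
    """
    start = tuple(0 for _ in latencies)
    seen = {start}
    frontier = [start]
    while frontier:
        state = frontier.pop()
        successors = []
        # Issue: start an instruction on any single free unit.
        for i, busy in enumerate(state):
            if busy == 0:
                successors.append(state[:i] + (latencies[i],) + state[i + 1:])
        # Advance cycle: every busy unit counts down by one.
        successors.append(tuple(max(0, b - 1) for b in state))
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


# Hypothetical latencies: three short pipes and one long-latency divider.
pipes = (1, 1, 1)
divider = (40,)

# One automaton holding every unit: state count grows like a product.
joint = reachable_states(pipes + divider)
# Divider in its own automaton: state counts only add up.
split = reachable_states(pipes) + reachable_states(divider)

print("joint automaton states:", joint)
print("split automata states: ", split)
```

In this toy model the joint automaton's state count is the product of the per-unit counts, while giving the long-latency unit its own automaton makes the totals add instead, which is the intuition behind moving the dividers into dedicated automatons.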

    Following Alexander's suggestion, this patch introduces
    dedicated automatons and cpu_units for idiv and fdiv, and
    uses them to update the integer division, floating point
    division and square root modeling for now.  Some evaluated
    statistics are listed below.

    With r17-202:

        *Tested stage-1 i686 build -j 32: 255 seconds*

        $ nm -CS -t d --defined-only gcc/insn-automata.o \
              | sed 's/^[0-9]* 0*//' \
              | sort -n | tail -20
            13896 r slm_transitions
            15360 r znver4_fp_store_transitions
            16760 r znver4_ieu_transitions
            17776 r bdver1_ieu_transitions
            20068 r bdver1_fp_check
            20068 r bdver1_fp_transitions
            20983 t internal_state_transition(int, DFA_chip*)
            22270 t internal_min_issue_delay(int, DFA_chip*)
            26208 r slm_min_issue_delay
            27244 r bdver1_fp_min_issue_delay
            28518 r glm_check
            28518 r glm_transitions
            33690 r geode_min_issue_delay
            45436 r znver4_fpu_min_issue_delay
            46980 r bdver3_fp_min_issue_delay
            49428 r glm_min_issue_delay
            53730 r btver2_fp_min_issue_delay
            53760 r znver1_fp_transitions
            93960 r bdver3_fp_transitions
            181744 r znver4_fpu_transitions

    With culprit commit r17-203:

        *Tested stage-1 i686 build -j 32: 949 seconds*

        $ nm -CS -t d --defined-only gcc/insn-automata.o \
              | sed 's/^[0-9]* 0*//' \
              | sort -n | tail -20
            28518 r glm_check
            28518 r glm_transitions
            33690 r geode_min_issue_delay
            45436 r znver4_fpu_min_issue_delay
            46980 r bdver3_fp_min_issue_delay
            49428 r glm_min_issue_delay
            53730 r btver2_fp_min_issue_delay
            53760 r znver1_fp_transitions
            68160 r c86_4g_ieu_min_issue_delay
            93960 r bdver3_fp_transitions
            110080 r c86_4g_fp_min_issue_delay
            136320 r c86_4g_ieu_transitions
            181744 r znver4_fpu_transitions
            220160 r c86_4g_fp_transitions
            262988 r c86_4g_m7_fpu_base
            475225 r c86_4g_m7_ieu_min_issue_delay
            950450 r c86_4g_m7_ieu_transitions
            4010567 r c86_4g_m7_fpu_min_issue_delay
            5496908 r c86_4g_m7_fpu_check
            5496908 r c86_4g_m7_fpu_transitions

    With this patch:

        *Tested stage-1 i686 build -j 32: 257 seconds*

        $ nm -CS -t d --defined-only gcc/insn-automata.o \
              | sed 's/^[0-9]* 0*//' \
              | sort -n | tail -20
            20068 r bdver1_fp_transitions
            22354 r c86_4g_m7_ieu_min_issue_delay
            25705 t internal_state_transition(int, DFA_chip*)
            26208 r slm_min_issue_delay
            27164 t internal_min_issue_delay(int, DFA_chip*)
            27244 r bdver1_fp_min_issue_delay
            28518 r glm_check
            28518 r glm_transitions
            33690 r geode_min_issue_delay
            33728 r c86_4g_fp_transitions
            45436 r znver4_fpu_min_issue_delay
            46980 r bdver3_fp_min_issue_delay
            49428 r glm_min_issue_delay
            53730 r btver2_fp_min_issue_delay
            53760 r znver1_fp_transitions
            89414 r c86_4g_m7_ieu_transitions
            93960 r bdver3_fp_transitions
            181744 r znver4_fpu_transitions
            326322 r c86_4g_m7_fpu_min_issue_delay
            1305288 r c86_4g_m7_fpu_transitions
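
To put the listings above in proportion, the sizes can be compared programmatically. The following is a minimal sketch, assuming the `SIZE TYPE NAME` layout that the `nm | sed` pipeline prints; the helper name `parse_nm_sizes` is hypothetical, and only a few rows are copied from the listings.

```python
def parse_nm_sizes(text):
    """Map symbol name -> size (bytes) from 'SIZE TYPE NAME' lines."""
    sizes = {}
    for line in text.splitlines():
        # Split into at most three fields so names containing spaces,
        # such as demangled function signatures, stay intact.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit():
            sizes[parts[2]] = int(parts[0])
    return sizes


# Rows copied from the r17-203 listing.
with_r17_203 = """
    950450 r c86_4g_m7_ieu_transitions
    4010567 r c86_4g_m7_fpu_min_issue_delay
    5496908 r c86_4g_m7_fpu_transitions
"""

# The same symbols from the listing with this patch applied.
with_patch = """
    89414 r c86_4g_m7_ieu_transitions
    326322 r c86_4g_m7_fpu_min_issue_delay
    1305288 r c86_4g_m7_fpu_transitions
"""

before = parse_nm_sizes(with_r17_203)
after = parse_nm_sizes(with_patch)

for name in sorted(before):
    ratio = before[name] / after[name]
    print(f"{name}: {before[name]} -> {after[name]} bytes "
          f"({ratio:.1f}x smaller)")
```

Feeding the full `nm` output through the same helper would allow the comparison to cover every table rather than the three rows picked here.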

    I noticed that the size of c86_4g_m7_fpu_transitions is
    still large, but this patch addresses the build time issue.
    To avoid impacting folks' daily builds and regular testing,
    I'd like to land this patch first if possible.  We can then
    further refine the c86-4g modeling and investigate the large
    transition count as part of the follow-up work, potentially
    even as part of PR 87832.

    gcc/ChangeLog:

            * config/i386/c86-4g-m7.md (c86_4g_m7_idiv): New automaton.
            (c86_4g_m7_fdiv): Ditto.
            (c86-4g-m7-idiv): New unit.
            (c86-4g-m7-fdiv): Ditto.
            (c86_4g_m7_idiv_DI): Adjust unit in the reservation.
            (c86_4g_m7_idiv_SI): Ditto.
            (c86_4g_m7_idiv_HI): Ditto.
            (c86_4g_m7_idiv_QI): Ditto.
            (c86_4g_m7_idiv_DI_load): Ditto.
            (c86_4g_m7_idiv_SI_load): Ditto.
            (c86_4g_m7_idiv_HI_load): Ditto.
            (c86_4g_m7_idiv_QI_load): Ditto.
            (c86_4g_m7_fp_div): Ditto.
            (c86_4g_m7_fp_div_load): Ditto.
            (c86_4g_m7_fp_idiv_load): Ditto.
            (c86_4g_m7_avx512_ssediv): Ditto.
            (c86_4g_m7_avx512_ssediv_mem): Ditto.
            (c86_4g_m7_avx512_ssediv_z): Ditto.
            (c86_4g_m7_avx512_ssediv_zmem): Ditto.
            (c86_4g_m7_avx512_sse_sqrt): Ditto.
            (c86_4g_m7_avx512_sse_sqrt_load): Ditto.
            (c86_4g_m7_fp_sqrt): Ditto.  Rename from ...
            (c86_4g_m7fp_sqrt): ... here.
            * config/i386/c86-4g.md (c86_4g_idiv): New automaton.
            (c86_4g_fdiv): Ditto.
            (c86-4g-idiv): New unit.
            (c86-4g-fdiv): Ditto.
            (c86_4g_idiv_DI): Adjust unit in the reservation.
            (c86_4g_idiv_SI): Ditto.
            (c86_4g_idiv_HI): Ditto.
            (c86_4g_idiv_QI): Ditto.
            (c86_4g_idiv_mem_DI): Ditto.
            (c86_4g_idiv_mem_SI): Ditto.
            (c86_4g_idiv_mem_HI): Ditto.
            (c86_4g_idiv_mem_QI): Ditto.
            (c86_4g_fp_sqrt): Ditto.
            (c86_4g_sse_sqrt_sf): Ditto.
            (c86_4g_sse_sqrt_sf_mem): Ditto.
            (c86_4g_sse_sqrt_df): Ditto.
            (c86_4g_sse_sqrt_df_mem): Ditto.
            (c86_4g_fp_op_div): Ditto.
            (c86_4g_fp_op_div_load): Ditto.
            (c86_4g_fp_op_idiv_load): Ditto.
            (c86_4g_ssediv_ss_ps): Ditto.
            (c86_4g_ssediv_ss_ps_load): Ditto.
            (c86_4g_ssediv_ss_pd): Ditto.
            (c86_4g_ssediv_ss_pd_load): Ditto.
            (c86_4g_ssediv_avx256_ps): Ditto.
            (c86_4g_ssediv_avx256_ps_load): Ditto.
            (c86_4g_ssediv_avx256_pd): Ditto.
            (c86_4g_ssediv_avx256_pd_load): Ditto.

    Signed-off-by: Kewen Lin <[email protected]>
