Re: [patch] Split parts of cse_insn out to a few new functions
On Mon, Mar 26, 2012 at 9:02 PM, Richard Sandiford wrote:
>> 	* cse.c (cse_canonicalized_basic_blocks): New simple bitmap to
>> 	tag basic blocks that have already been traversed at least once,
>> 	so that all insns have been canonicalized.
>> 	(cse_insn): Call canonicalize_insn only if the basic block that
>> 	contains insn is visited for the first time.
>> 	(cse_extended_basic_block): After visiting all insns in a basic
>> 	block, mark the block in cse_canonicalized_basic_blocks.
>> 	(cse_main): Set up and destroy cse_canonicalized_basic_blocks.
>
> OK, thanks (without the microoptimisation, as you say).

Thanks, will commit!

> Out of curiosity, do you still see this bit as useful:
>
>   /* We potentially will process this insn many times.  Therefore,
>      drop the REG_EQUAL note if it is equal to the SET_SRC of the
>      unique set in INSN.
>
>      Do not do so if the REG_EQUAL note is for a STRICT_LOW_PART,
>      because cse_insn handles those specially.  */
>
> ?  Does "many times" mean in CSE, or later?

It means "in CSE", yes.  But with the patch to canonicalize only once,
I suppose this can go away again.

Having said that, I do believe it would be good if we avoid having
REG_EQUAL notes that are rtx_equal_p to the SET_SRC.  This happens
quite frequently after CSE.  I'm not sure how to clean them up.

Ciao!
Steven
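The cleanup Steven mentions could look something like the following
minimal sketch, assuming cse.c's existing note helpers (single_set,
find_reg_note, rtx_equal_p, remove_note); the helper's name and where
it would be called from are hypothetical:

/* Hypothetical helper: drop a REG_EQUAL note that merely duplicates
   the SET_SRC of the insn's unique set, mirroring the comment quoted
   above, including its STRICT_LOW_PART exception.  */
static void
maybe_drop_redundant_reg_equal (rtx insn)
{
  rtx set, note;

  set = single_set (insn);
  if (set == NULL_RTX || GET_CODE (SET_DEST (set)) == STRICT_LOW_PART)
    return;

  note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
  if (note != NULL_RTX && rtx_equal_p (XEXP (note, 0), SET_SRC (set)))
    remove_note (insn, note);
}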
Re: [patch] Split parts of cse_insn out to a few new functions
Steven Bosscher writes:
> On Wed, Mar 21, 2012 at 1:13 AM, Ian Lance Taylor wrote:
>> On Tue, Mar 20, 2012 at 2:06 PM, Steven Bosscher wrote:
>>>
>>> This patch splits a couple of pieces of cse_insn out to new functions.
>>> There are no functional changes, and no code generation differences as
>>> far as I could tell on x86_64 (-m64 and -m32).
>
> Likewise for the attached patch.
>
>>> The purpose of the patch is to hopefully make cse_insn easier
>>> to understand.  In a follow-up patch, I will make canonicalize_insn run
>>> only once per insn (it currently, i.e. before and after this patch,
>>> runs multiple times for CSE on extended basic blocks if a block is in
>>> multiple extended basic blocks).
>
> That is what the attached patch does.
>
> Bootstrapped & tested on x86_64-unknown-linux-gnu.
> OK for trunk?
>
> Ciao!
> Steven
>
> 	* cse.c (cse_canonicalized_basic_blocks): New simple bitmap to
> 	tag basic blocks that have already been traversed at least once,
> 	so that all insns have been canonicalized.
> 	(cse_insn): Call canonicalize_insn only if the basic block that
> 	contains insn is visited for the first time.
> 	(cse_extended_basic_block): After visiting all insns in a basic
> 	block, mark the block in cse_canonicalized_basic_blocks.
> 	(cse_main): Set up and destroy cse_canonicalized_basic_blocks.

OK, thanks (without the microoptimisation, as you say).

Out of curiosity, do you still see this bit as useful:

  /* We potentially will process this insn many times.  Therefore,
     drop the REG_EQUAL note if it is equal to the SET_SRC of the
     unique set in INSN.

     Do not do so if the REG_EQUAL note is for a STRICT_LOW_PART,
     because cse_insn handles those specially.  */

?  Does "many times" mean in CSE, or later?

Richard
Re: [patch] Split parts of cse_insn out to a few new functions
On Thu, Mar 22, 2012 at 12:09 AM, Steven Bosscher wrote:
> 	(cse_find_path): Micro-optimization, reorder one condition to
> 	avoid a reference to cfun.

Ah, and please ignore this bit. I don't know what I was thinking...
Re: [patch] Split parts of cse_insn out to a few new functions
On Wed, Mar 21, 2012 at 1:13 AM, Ian Lance Taylor wrote:
> On Tue, Mar 20, 2012 at 2:06 PM, Steven Bosscher wrote:
>>
>> This patch splits a couple of pieces of cse_insn out to new functions.
>> There are no functional changes, and no code generation differences as
>> far as I could tell on x86_64 (-m64 and -m32).

Likewise for the attached patch.

>> The purpose of the patch is to hopefully make cse_insn easier
>> to understand.  In a follow-up patch, I will make canonicalize_insn run
>> only once per insn (it currently, i.e. before and after this patch,
>> runs multiple times for CSE on extended basic blocks if a block is in
>> multiple extended basic blocks).

That is what the attached patch does.

Bootstrapped & tested on x86_64-unknown-linux-gnu.
OK for trunk?

Ciao!
Steven

	* cse.c (cse_canonicalized_basic_blocks): New simple bitmap to
	tag basic blocks that have already been traversed at least once,
	so that all insns have been canonicalized.
	(cse_insn): Call canonicalize_insn only if the basic block that
	contains insn is visited for the first time.
	(cse_extended_basic_block): After visiting all insns in a basic
	block, mark the block in cse_canonicalized_basic_blocks.
	(cse_main): Set up and destroy cse_canonicalized_basic_blocks.
	(cse_find_path): Micro-optimization, reorder one condition to
	avoid a reference to cfun.

Index: cse.c
===================================================================
--- cse.c	(revision 185622)
+++ cse.c	(working copy)
@@ -551,6 +551,10 @@ static bitmap cse_ebb_live_in, cse_ebb_l
    already as part of an already processed extended basic block.  */
 static sbitmap cse_visited_basic_blocks;
 
+/* A simple bitmap to track for which basic blocks all insns have been
+   canonicalized already.  */
+static sbitmap cse_canonicalized_basic_blocks;
+
 static bool fixed_base_plus_p (rtx x);
 static int notreg_cost (rtx, enum rtx_code, int);
 static int approx_reg_cost_1 (rtx *, void *);
@@ -4492,8 +4496,10 @@ cse_insn (rtx insn)
   /* Record all the SETs in this instruction.  */
   n_sets = find_sets_in_insn (insn, &sets);
 
-  /* Substitute the canonical register where possible.  */
-  canonicalize_insn (insn, &sets, n_sets);
+  /* If we have not visited this block before (as part of another extended
+     basic block), substitute the canonical register where possible.  */
+  if (!TEST_BIT (cse_canonicalized_basic_blocks, BLOCK_FOR_INSN (insn)->index))
+    canonicalize_insn (insn, &sets, n_sets);
 
   /* If this insn has a REG_EQUAL note, store the equivalent value in SRC_EQV,
      if different, or if the DEST is a STRICT_LOW_PART.  The latter condition
@@ -6254,10 +6260,9 @@ cse_find_path (basic_block first_bb, str
 	  else
 	    e = NULL;
 
-	  if (e
-	      && !((e->flags & EDGE_ABNORMAL_CALL) && cfun->has_nonlocal_label)
-	      && e->dest != EXIT_BLOCK_PTR
+	  if (e && e->dest != EXIT_BLOCK_PTR
 	      && single_pred_p (e->dest)
+	      && !((e->flags & EDGE_ABNORMAL_CALL) && cfun->has_nonlocal_label)
 	      /* Avoid visiting basic blocks twice.  The large comment
 		 above explains why this can happen.  */
 	      && !TEST_BIT (cse_visited_basic_blocks, e->dest->index))
@@ -6452,6 +6457,9 @@ cse_extended_basic_block (struct cse_bas
 	}
     }
 
+  /* We have now canonicalized all insns in this basic block.  */
+  SET_BIT (cse_canonicalized_basic_blocks, bb->index);
+
   /* With non-call exceptions, we are not always able to update
      the CFG properly inside cse_insn.  So clean up possibly
      redundant EH edges here.  */
@@ -6555,6 +6563,10 @@ cse_main (rtx f ATTRIBUTE_UNUSED, int nr
   cse_visited_basic_blocks = sbitmap_alloc (last_basic_block);
   sbitmap_zero (cse_visited_basic_blocks);
 
+  /* Set up the table of already canonicalized basic blocks.  */
+  cse_canonicalized_basic_blocks = sbitmap_alloc (last_basic_block);
+  sbitmap_zero (cse_canonicalized_basic_blocks);
+
   /* Loop over basic blocks in reverse completion order (RPO),
      excluding the ENTRY and EXIT blocks.  */
   n_blocks = pre_and_rev_post_order_compute (NULL, rc_order, false);
@@ -6598,6 +6610,7 @@ cse_main (rtx f ATTRIBUTE_UNUSED, int nr
   free (reg_eqv_table);
   free (ebb_data.path);
   sbitmap_free (cse_visited_basic_blocks);
+  sbitmap_free (cse_canonicalized_basic_blocks);
   free (rc_order);
 
   rtl_hooks = general_rtl_hooks;
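For readers following the patch, this is the lifecycle of the new
bitmap, collected from the hunks above into one place; the elided
parts stand for the surrounding code of cse_main, cse_insn, and
cse_extended_basic_block:

/* In cse_main: allocate one bit per basic block, all initially clear.  */
cse_canonicalized_basic_blocks = sbitmap_alloc (last_basic_block);
sbitmap_zero (cse_canonicalized_basic_blocks);

/* In cse_insn: canonicalize only on the first visit to this block, so a
   block shared by several extended-basic-block paths is processed once.  */
if (!TEST_BIT (cse_canonicalized_basic_blocks, BLOCK_FOR_INSN (insn)->index))
  canonicalize_insn (insn, &sets, n_sets);

/* In cse_extended_basic_block: mark the block once all its insns are done.  */
SET_BIT (cse_canonicalized_basic_blocks, bb->index);

/* In cse_main, on teardown: release the bitmap.  */
sbitmap_free (cse_canonicalized_basic_blocks);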
Re: [patch] Split parts of cse_insn out to a few new functions
On Tue, Mar 20, 2012 at 2:06 PM, Steven Bosscher wrote:
>
> This patch splits a couple of pieces of cse_insn out to new functions.
> There are no functional changes, and no code generation differences as
> far as I could tell on x86_64 (-m64 and -m32).
>
> The purpose of the patch is to hopefully make cse_insn easier
> to understand.  In a follow-up patch, I will make canonicalize_insn run
> only once per insn (it currently, i.e. before and after this patch,
> runs multiple times for CSE on extended basic blocks if a block is in
> multiple extended basic blocks).

This is OK.

Thanks.

Ian
[patch] Split parts of cse_insn out to a few new functions
Hello,

This patch splits a couple of pieces of cse_insn out to new functions.
There are no functional changes, and no code generation differences as
far as I could tell on x86_64 (-m64 and -m32).

The purpose of the patch is to hopefully make cse_insn easier
to understand.  In a follow-up patch, I will make canonicalize_insn run
only once per insn (it currently, i.e. before and after this patch,
runs multiple times for CSE on extended basic blocks if a block is in
multiple extended basic blocks).

Bootstrapped & tested on x86_64-unknown-linux-gnu.
OK for trunk?

Ciao!
Steven

	* cse.c (invalidate_from_sets_and_clobbers, try_back_substitute_reg,
	find_sets_in_insn, canonicalize_insn): Split out from ...
	(cse_insn): ... here.
	(invalidate_from_clobbers): Take an insn instead of the pattern.

Index: cse.c
===================================================================
--- cse.c	(revision 185515)
+++ cse.c	(working copy)
@@ -597,6 +597,7 @@ static void record_jump_cond (enum rtx_c
 static void cse_insn (rtx);
 static void cse_prescan_path (struct cse_basic_block_data *);
 static void invalidate_from_clobbers (rtx);
+static void invalidate_from_sets_and_clobbers (rtx);
 static rtx cse_process_notes (rtx, rtx, bool *);
 static void cse_extended_basic_block (struct cse_basic_block_data *);
 static void count_reg_usage (rtx, int *, rtx, int);
@@ -4089,10 +4090,22 @@ record_jump_cond (enum rtx_code code, en
 }
 
 /* CSE processing for one instruction.
-   First simplify sources and addresses of all assignments
-   in the instruction, using previously-computed equivalents values.
-   Then install the new sources and destinations in the table
-   of available values.  */
+
+   Most "true" common subexpressions are mostly optimized away in GIMPLE,
+   but the few that "leak through" are cleaned up by cse_insn, and complex
+   addressing modes are often formed here.
+
+   The main function is cse_insn, and between here and that function
+   a couple of helper functions is defined to keep the size of cse_insn
+   within reasonable proportions.
+
+   Data is shared between the main and helper functions via STRUCT SET,
+   that contains all data related for every set in the instruction that
+   is being processed.
+
+   Note that cse_main processes all sets in the instruction.  Most
+   passes in GCC only process simple SET insns or single_set insns, but
+   CSE processes insns with multiple sets as well.  */
 
 /* Data on one SET contained in the instruction.  */
 
@@ -4128,50 +4141,93 @@ struct set
   /* Table entry for the destination address.  */
   struct table_elt *dest_addr_elt;
 };
+
+/* Special handling for (set REG0 REG1) where REG0 is the
+   "cheapest", cheaper than REG1.  After cse, REG1 will probably not
+   be used in the sequel, so (if easily done) change this insn to
+   (set REG1 REG0) and replace REG1 with REG0 in the previous insn
+   that computed their value.  Then REG1 will become a dead store
+   and won't cloud the situation for later optimizations.
+
+   Do not make this change if REG1 is a hard register, because it will
+   then be used in the sequel and we may be changing a two-operand insn
+   into a three-operand insn.
+
+   This is the last transformation that cse_insn will try to do.  */
 
 static void
-cse_insn (rtx insn)
+try_back_substitute_reg (rtx set, rtx insn)
 {
-  rtx x = PATTERN (insn);
-  int i;
-  rtx tem;
-  int n_sets = 0;
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
 
-  rtx src_eqv = 0;
-  struct table_elt *src_eqv_elt = 0;
-  int src_eqv_volatile = 0;
-  int src_eqv_in_memory = 0;
-  unsigned src_eqv_hash = 0;
+  if (REG_P (dest)
+      && REG_P (src) && ! HARD_REGISTER_P (src)
+      && REGNO_QTY_VALID_P (REGNO (src)))
+    {
+      int src_q = REG_QTY (REGNO (src));
+      struct qty_table_elem *src_ent = &qty_table[src_q];
 
-  struct set *sets = (struct set *) 0;
+      if (src_ent->first_reg == REGNO (dest))
+	{
+	  /* Scan for the previous nonnote insn, but stop at a basic
+	     block boundary.  */
+	  rtx prev = insn;
+	  rtx bb_head = BB_HEAD (BLOCK_FOR_INSN (insn));
+	  do
+	    {
+	      prev = PREV_INSN (prev);
+	    }
+	  while (prev != bb_head && (NOTE_P (prev) || DEBUG_INSN_P (prev)));
 
-  this_insn = insn;
-#ifdef HAVE_cc0
-  /* Records what this insn does to set CC0.  */
-  this_insn_cc0 = 0;
-  this_insn_cc0_mode = VOIDmode;
-#endif
+	  /* Do not swap the registers around if the previous instruction
+	     attaches a REG_EQUIV note to REG1.
 
-  /* Find all the SETs and CLOBBERs in this instruction.
-     Record all the SETs in the array `set' and count them.
-     Also determine whether there is a CLOBBER that invalidates
-     all memory references, or all references at varying addresses.  */
+	     ??? It's not entirely clear whether we can transfer a REG_EQUIV
+	     from the pseudo that originally
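To illustrate the transformation try_back_substitute_reg performs, here
is a hypothetical before/after with made-up pseudo registers, where
(reg 100) plays REG0 (the cheaper register) and (reg 101) plays REG1:

;; Before: the previous insn computes REG1, then REG0 is copied from it.
(set (reg 101) (plus (reg 102) (reg 103)))
(set (reg 100) (reg 101))

;; After: the computation is redirected to REG0 and the copy reversed,
;; leaving (set (reg 101) ...) as a dead store for later passes to remove.
(set (reg 100) (plus (reg 102) (reg 103)))
(set (reg 101) (reg 100))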