On Tue, Jun 23, 2026 at 10:14 AM Ciprian Arbone <[email protected]> wrote: > > On Wed, 17 Jun 2026 at 14:34, Richard Biener <[email protected]> > wrote: > > > > On Wed, Jun 17, 2026 at 12:21 PM Ciprian Arbone <[email protected]> > > wrote: > > > > > > On Wed, 17 Jun 2026 at 11:50, Richard Biener <[email protected]> > > > wrote: > > > > > > > > On Wed, Jun 17, 2026 at 10:48 AM Ciprian Arbone > > > > <[email protected]> wrote: > > > > > > > > > > On Wed, 17 Jun 2026 at 11:36, Richard Biener > > > > > <[email protected]> wrote: > > > > > > > > > > > > On Wed, Jun 17, 2026 at 10:21 AM Ciprian Arbone > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > On Tue, 16 Jun 2026 at 16:48, Richard Biener > > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > > > On Fri, Jun 12, 2026 at 3:31 PM Ciprian Arbone via Gcc > > > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > We recently enabled LDMIA/STMIA instructions for Thumb-1 > > > > > > > > > (Cortex-M0+) by > > > > > > > > > modifying ARM_AUTOINC_VALID_FOR_MODE_P to allow > > > > > > > > > auto-increment addressing > > > > > > > > > for THUMB1 targets. However, we've discovered that IVOPTs > > > > > > > > > generates > > > > > > > > > suboptimal code for simple loops due to incorrect addressing > > > > > > > > > mode selection. > > > > > > > > > > > > > > > > > > Consider this test case: > > > > > > > > > > > > > > > > > > void test(int *a, int *b, int size) > > > > > > > > > { > > > > > > > > > for (int i = 0; i < size; i++) > > > > > > > > > { > > > > > > > > > a[i] = b[i] * a[i]; > > > > > > > > > } > > > > > > > > > } > > > > > > > > > > > > > > > > > > GCC currently generates: > > > > > > > > > > > > > > > > > > ldmia r0!, {r4} > > > > > > > > > ldmia r1!, {r6} > > > > > > > > > subs r5, r0, #4 > > > > > > > > > ... > > > > > > > > > str r4, [r5, #0] > > > > > > > > > > > > > > > > > > The issue occurs because IVOPTs selects a candidate with the > > > > > > > > > lowest cost that > > > > > > > > > has the following structure: > > > > > > > > > > > > > > > > > > Candidate xxx: > > > > > > > > > Incr POS: after use 0 > > > > > > > > > IV struct: > > > > > > > > > Type: unsigned int > > > > > > > > > Base: (unsigned int) a_13(D) > > > > > > > > > Step: 4 > > > > > > > > > > > > > > > > > > This results in the following loop structure: > > > > > > > > > > > > > > > > > > loop-preheader: > > > > > > > > > r0 = a > > > > > > > > > jump loop-exiting > > > > > > > > > > > > > > > > > > loop-header: > > > > > > > > > load-from [r0] > > > > > > > > > increment r0 > > > > > > > > > store-to [r0, #-4] > > > > > > > > > > > > > > > > > > loop-exiting: > > > > > > > > > jump loop-header > > > > > > > > > > > > > > > > > > **Issue 1:** IVOPTs recognizes both patterns as valid > > > > > > > > > post-increment with > > > > > > > > > offset zero: > > > > > > > > > - "load-from [r0]; increment r0" → recognized as post-inc > > > > > > > > > from offset 0 > > > > > > > > > - "increment r0; store-to [r0, #-4]" → also recognized as > > > > > > > > > post-inc from > > > > > > > > > offset 0 > > > > > > > > > > > > > > > > > > The code in tree-ssa-loop-ivopts.cc:get_address_cost() > > > > > > > > > applies the adjustment: > > > > > > > > > > > > > > > > > > if (stmt_after_increment (data->current_loop, cand, > > > > > > > > > use->stmt)) > > > > > > > > > ainc_offset += ainc_step; > > > > > > > > > cost = get_address_cost_ainc (ainc_step, ainc_offset, > > > > > > > > > addr_mode, mem_mode, as, > > > > > > > > > speed); > > > > > > > > > > > > > > > > > > However, Thumb-1 does not support negative immediate offsets > > > > > > > > > in addressing > > > > > > > > > modes. The pattern "increment r0; store-to [r0, #-4]" can > > > > > > > > > never be realized > > > > > > > > > as a post-increment store on Thumb-1, yet IVOPTs assigns it a > > > > > > > > > low cost. > > > > > > > > > > > > > > > > > > **Question 1:** Should get_address_cost() verify that an > > > > > > > > > addressing mode is > > > > > > > > > actually valid on the target before assigning auto-increment > > > > > > > > > cost? Currently, > > > > > > > > > it appears to assume validity without checking target > > > > > > > > > constraints. > > > > > > > > > > > > > > > > It should end up calling the legitimize_address_p target hook > > > > > > > > to verify > > > > > > > > validity. > > > > > > > > > > > > > > The legitimize_address_p target hook is invoked, but relatively > > > > > > > late—during the GIMPLE-to-RTL conversion. > > > > > > > At that point, after ivopts, the GIMPLE already contains code > > > > > > > that may > > > > > > > not be optimal. > > > > > > > > > > > > IVOPTs checks valid_mem_ref_p which calls > > > > > > memory_address_addr_space_p > > > > > > with artificially generated RTL and that calls legitimate_address_p. > > > > > > > > > > Yes, it does: but only if get_address_cost_ainc returns infinite_cost, > > > > > meaning that cand and use > > > > > cannot be paired to form an auto-modifying address expression. > > > > > > > > I don't quite understand, but ainc forming does not check validity > > > > that's > > > > what needs to be fixed. > > > > > > > > > > My concern is specifically about the cost model in ivopts. In a loop like: > > > > > > for (;;) > > > a[i] = a[i] * <expr>; > > > > > > ivopts may consider alternatives such as: > > > > > > // r0 = a > > > load-from [r0, #0] > > > add r0, #4 > > > store-to [r0, #-4] > > > > > > and: > > > > > > // r0 = a > > > load-from [r0, #0] > > > store-to [r0, #0] > > > add r0, #4 > > > > > > These appear to be treated as having equal cost. However, in the first > > > case, performing the increment after the load can lead to an invalid > > > addressing mode for the store ([r0, #-4]), which then has to be fixed > > > up later by splitting it into multiple instructions. In practice, this > > > makes the first alternative more expensive. > > > > As said, IVOPTs is supposed to see that it's using a -4 constant offset > > and reject that if this is not valid. If that doesn't work for whatever > > reason then that's a bug. > > > > Richard. > > > > Hi Richard, > > I believe IVOPTS is not recognizing that a constant offset of -4 is > being used, because once it is combined with the increment step, it > results in a valid effective increment offset of zero: > > if (stmt_after_increment (data->current_loop, cand, use->stmt)) > ainc_offset += ainc_step; > cost = get_address_cost_ainc (ainc_step, ainc_offset, > addr_mode, mem_mode, as, speed); > > I tried to work around this by: > > +static bool > +autoinc_possible_for_pair (struct ivopts_data *data, struct iv_use *use, > + struct iv_cand *cand, aff_tree *aff_inv) > +{ > + /* For IP_NORMAL position, verify the increment and use are in the same > basic > + * block */ > + if (cand->pos == IP_NORMAL) > + { > + gimple_stmt_iterator incr_pos > + = gsi_last_bb (ip_normal_pos (data->current_loop)); > + if (gimple_bb (use->stmt)->index != gsi_bb (incr_pos)->index) > + return false; > + } > + > + if (!aff_combination_zero_p (aff_inv)) > + { > + struct mem_address addr > + = {.symbol = NULL_TREE, > + .base = integer_one_node, > + .index = NULL_TREE, > + .step = NULL_TREE, > + .offset = wide_int_to_tree (sizetype, aff_inv->offset)}; > + machine_mode mem_mode = TYPE_MODE (use->mem_type); > + addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (use->iv->base)); > + return valid_mem_ref_p (mem_mode, as, &addr); > + } > + > + return true; > +} > + > /* Return cost of computing USE's address expression by using CAND. > AFF_INV and AFF_VAR represent invariant and variant parts of the > address expression, respectively. If AFF_INV is simple, store > @@ -4743,6 +4773,7 @@ get_address_cost (struct ivopts_data *data, > struct iv_use *use, > { > poly_int64 ainc_step; > if (can_autoinc > + && autoinc_possible_for_pair (data, use, cand, aff_inv) > && ratio == 1 > && ptrdiff_tree_p (cand->iv->step, &ainc_step)) > { > > However, I am not entirely certain that this is the correct approach.
Hmm, not exactly sure. But the place you are patching at least shows that we do not verify mem-ref validity for it. Is this tracked with a bugzilla report and a testcase btw? Thanks, Richard. > Thanks, > Ciprian. > > > > > > > > > > > > > > > > > > > > > Richard. > > > > > > > > > > > > > > > > > > > > _24 = (void *) ivtmp.15_20; > > > > > > > _6 = MEM[(int *)_24]; > > > > > > > ivtmp.15_21 = ivtmp.15_20 + 4; > > > > > > > > > > > > > > . . . > > > > > > > > > > > > > > _25 = (void *) ivtmp.15_21; > > > > > > > MEM[(int *)_25 + 4294967292 * 1] = _7; <------ > > > > > > > > > > > > > > Would it make sense to check whether the addressing mode is valid > > > > > > > at this point > > > > > > > (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4732), > > > > > > > similar to the validation performed later > > > > > > > (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4753)? > > > > > > > > > > > > > > > > > > > > > > > > **Issue 2:** IVOPTs also assigns low cost to another > > > > > > > > > candidate: > > > > > > > > > > > > > > > > > > Candidate yyy: > > > > > > > > > Incr POS: before exit test > > > > > > > > > IV struct: > > > > > > > > > Type: unsigned int > > > > > > > > > Base: (unsigned int) a_13(D) > > > > > > > > > Step: 4 > > > > > > > > > > > > > > > > > > This produces: > > > > > > > > > > > > > > > > > > loop-preheader: > > > > > > > > > r0 = &a[0] > > > > > > > > > jump loop-exiting > > > > > > > > > > > > > > > > > > loop-header: > > > > > > > > > load-from [r0, #-4] > > > > > > > > > store-to [r0] > > > > > > > > > > > > > > > > > > loop-exiting: > > > > > > > > > increment r0 > > > > > > > > > jump loop-header > > > > > > > > > > > > > > > > > > IVOPTs considers that the increment in the loop-exiting block > > > > > > > > > can be paired > > > > > > > > > with "load-from [r0, #-4]" in the loop-header block, despite > > > > > > > > > them being in > > > > > > > > > different basic blocks. > > > > > > > > > > > > > > > > > > **Question 2:** Should get_address_cost() verify that the > > > > > > > > > candidate increment > > > > > > > > > and use->stmt are in the same basic block when cand->pos == > > > > > > > > > IP_NORMAL? > > > > > > > > > Cross-block pairing seems problematic for post-increment > > > > > > > > > addressing mode > > > > > > > > > costing. > > > > > > > > > > > > > > > > If the pairing is wrong it should be fixed, where's that > > > > > > > > pairing done? > > > > > > > > > > > > > > > > > > > > > > The pairing takes place at this point > > > > > > > (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4739), > > > > > > > where the candidate (the `increment r0` statement from the > > > > > > > loop‑exiting basic block) is combined with the use > > > > > > > (the `load-from [r0, #-4]` statement from the loop‑header basic > > > > > > > block), resulting in can_autoinc being set to true. > > > > > > > > > > > > > > loop-header: > > > > > > > load-from [r0, #-4] > > > > > > > store-to [r0] > > > > > > > > > > > > > > loop-exiting: > > > > > > > increment r0 > > > > > > > bcc loop-header > > > > > > > > > > > > > > > > > > > > > > > > > Both issues suggest that IVOPTs may need additional > > > > > > > > > validation to ensure: > > > > > > > > > 1. The selected addressing mode is actually supported by the > > > > > > > > > target > > > > > > > > > 2. The increment and memory operation are properly co-located > > > > > > > > > for IP_NORMAL > > > > > > > > > candidates > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > Ciprian Arbone
