On Tue, Jun 23, 2026 at 10:14 AM Ciprian Arbone <[email protected]> wrote:
>
> On Wed, 17 Jun 2026 at 14:34, Richard Biener <[email protected]> 
> wrote:
> >
> > On Wed, Jun 17, 2026 at 12:21 PM Ciprian Arbone <[email protected]> 
> > wrote:
> > >
> > > On Wed, 17 Jun 2026 at 11:50, Richard Biener <[email protected]> 
> > > wrote:
> > > >
> > > > On Wed, Jun 17, 2026 at 10:48 AM Ciprian Arbone 
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Wed, 17 Jun 2026 at 11:36, Richard Biener 
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Jun 17, 2026 at 10:21 AM Ciprian Arbone 
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > On Tue, 16 Jun 2026 at 16:48, Richard Biener 
> > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Fri, Jun 12, 2026 at 3:31 PM Ciprian Arbone via Gcc 
> > > > > > > > <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > We recently enabled LDMIA/STMIA instructions for Thumb-1 
> > > > > > > > > (Cortex-M0+) by
> > > > > > > > > modifying ARM_AUTOINC_VALID_FOR_MODE_P to allow 
> > > > > > > > > auto-increment addressing
> > > > > > > > > for THUMB1 targets. However, we've discovered that IVOPTs 
> > > > > > > > > generates
> > > > > > > > > suboptimal code for simple loops due to incorrect addressing 
> > > > > > > > > mode selection.
> > > > > > > > >
> > > > > > > > > Consider this test case:
> > > > > > > > >
> > > > > > > > > void test(int *a, int *b, int size)
> > > > > > > > > {
> > > > > > > > >     for (int i = 0; i < size; i++)
> > > > > > > > >     {
> > > > > > > > >         a[i] = b[i] * a[i];
> > > > > > > > >     }
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > GCC currently generates:
> > > > > > > > >
> > > > > > > > >     ldmia   r0!, {r4}
> > > > > > > > >     ldmia   r1!, {r6}
> > > > > > > > >     subs    r5, r0, #4
> > > > > > > > >     ...
> > > > > > > > >     str     r4, [r5, #0]
> > > > > > > > >
> > > > > > > > > The issue occurs because IVOPTs selects a candidate with the 
> > > > > > > > > lowest cost that
> > > > > > > > > has the following structure:
> > > > > > > > >
> > > > > > > > > Candidate xxx:
> > > > > > > > >   Incr POS: after use 0
> > > > > > > > >   IV struct:
> > > > > > > > >     Type:       unsigned int
> > > > > > > > >     Base:       (unsigned int) a_13(D)
> > > > > > > > >     Step:       4
> > > > > > > > >
> > > > > > > > > This results in the following loop structure:
> > > > > > > > >
> > > > > > > > > loop-preheader:
> > > > > > > > >     r0 = a
> > > > > > > > >     jump loop-exiting
> > > > > > > > >
> > > > > > > > > loop-header:
> > > > > > > > >     load-from  [r0]
> > > > > > > > >     increment  r0
> > > > > > > > >     store-to   [r0, #-4]
> > > > > > > > >
> > > > > > > > > loop-exiting:
> > > > > > > > >     jump loop-header
> > > > > > > > >
> > > > > > > > > **Issue 1:** IVOPTs recognizes both patterns as valid 
> > > > > > > > > post-increment with
> > > > > > > > > offset zero:
> > > > > > > > >   - "load-from [r0]; increment r0" → recognized as post-inc 
> > > > > > > > > from offset 0
> > > > > > > > >   - "increment r0; store-to [r0, #-4]" → also recognized as 
> > > > > > > > > post-inc from
> > > > > > > > >     offset 0
> > > > > > > > >
> > > > > > > > > The code in tree-ssa-loop-ivopts.cc:get_address_cost() 
> > > > > > > > > applies the adjustment:
> > > > > > > > >
> > > > > > > > >     if (stmt_after_increment (data->current_loop, cand, 
> > > > > > > > > use->stmt))
> > > > > > > > >         ainc_offset += ainc_step;
> > > > > > > > >     cost = get_address_cost_ainc (ainc_step, ainc_offset,
> > > > > > > > >                                   addr_mode, mem_mode, as, 
> > > > > > > > > speed);
> > > > > > > > >
> > > > > > > > > However, Thumb-1 does not support negative immediate offsets 
> > > > > > > > > in addressing
> > > > > > > > > modes. The pattern "increment r0; store-to [r0, #-4]" can 
> > > > > > > > > never be realized
> > > > > > > > > as a post-increment store on Thumb-1, yet IVOPTs assigns it a 
> > > > > > > > > low cost.
> > > > > > > > >
> > > > > > > > > **Question 1:** Should get_address_cost() verify that an 
> > > > > > > > > addressing mode is
> > > > > > > > > actually valid on the target before assigning auto-increment 
> > > > > > > > > cost? Currently,
> > > > > > > > > it appears to assume validity without checking target 
> > > > > > > > > constraints.
> > > > > > > >
> > > > > > > > It should end up calling the legitimize_address_p target hook 
> > > > > > > > to verify
> > > > > > > > validity.
> > > > > > >
> > > > > > > The legitimize_address_p target hook is invoked, but relatively
> > > > > > > late—during the GIMPLE-to-RTL conversion.
> > > > > > > At that point, after ivopts, the GIMPLE already contains code 
> > > > > > > that may
> > > > > > > not be optimal.
> > > > > >
> > > > > > IVOPTs checks valid_mem_ref_p which calls 
> > > > > > memory_address_addr_space_p
> > > > > > with artificially generated RTL and that calls legitimate_address_p.
> > > > >
> > > > > Yes, it does: but only if get_address_cost_ainc returns infinite_cost,
> > > > > meaning that cand and use
> > > > > cannot be paired to form an auto-modifying address expression.
> > > >
> > > > I don't quite understand, but ainc forming does not check validity 
> > > > that's
> > > > what needs to be fixed.
> > > >
> > >
> > > My concern is specifically about the cost model in ivopts. In a loop like:
> > >
> > > for (;;)
> > > a[i] = a[i] * <expr>;
> > >
> > > ivopts may consider alternatives such as:
> > >
> > > // r0 = a
> > > load-from [r0, #0]
> > > add r0, #4
> > > store-to [r0, #-4]
> > >
> > > and:
> > >
> > > // r0 = a
> > > load-from [r0, #0]
> > > store-to [r0, #0]
> > > add r0, #4
> > >
> > > These appear to be treated as having equal cost. However, in the first
> > > case, performing the increment after the load can lead to an invalid
> > > addressing mode for the store ([r0, #-4]), which then has to be fixed
> > > up later by splitting it into multiple instructions. In practice, this
> > > makes the first alternative more expensive.
> >
> > As said, IVOPTs is supposed to see that it's using a -4 constant offset
> > and reject that if this is not valid.  If that doesn't work for whatever
> > reason then that's a bug.
> >
> > Richard.
> >
>
> Hi Richard,
>
> I believe IVOPTS is not recognizing that a constant offset of -4 is
> being used, because once it is combined with the increment step, it
> results in a valid effective increment offset of zero:
>
> if (stmt_after_increment (data->current_loop, cand, use->stmt))
> ainc_offset += ainc_step;
> cost = get_address_cost_ainc (ainc_step, ainc_offset,
> addr_mode, mem_mode, as, speed);
>
> I tried to work around this by:
>
> +static bool
> +autoinc_possible_for_pair (struct ivopts_data *data, struct iv_use *use,
> + struct iv_cand *cand, aff_tree *aff_inv)
> +{
> + /* For IP_NORMAL position, verify the increment and use are in the same 
> basic
> + * block */
> + if (cand->pos == IP_NORMAL)
> + {
> + gimple_stmt_iterator incr_pos
> + = gsi_last_bb (ip_normal_pos (data->current_loop));
> + if (gimple_bb (use->stmt)->index != gsi_bb (incr_pos)->index)
> + return false;
> + }
> +
> + if (!aff_combination_zero_p (aff_inv))
> + {
> + struct mem_address addr
> + = {.symbol = NULL_TREE,
> + .base = integer_one_node,
> + .index = NULL_TREE,
> + .step = NULL_TREE,
> + .offset = wide_int_to_tree (sizetype, aff_inv->offset)};
> + machine_mode mem_mode = TYPE_MODE (use->mem_type);
> + addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (use->iv->base));
> + return valid_mem_ref_p (mem_mode, as, &addr);
> + }
> +
> + return true;
> +}
> +
> /* Return cost of computing USE's address expression by using CAND.
> AFF_INV and AFF_VAR represent invariant and variant parts of the
> address expression, respectively. If AFF_INV is simple, store
> @@ -4743,6 +4773,7 @@ get_address_cost (struct ivopts_data *data,
> struct iv_use *use,
> {
> poly_int64 ainc_step;
> if (can_autoinc
> + && autoinc_possible_for_pair (data, use, cand, aff_inv)
> && ratio == 1
> && ptrdiff_tree_p (cand->iv->step, &ainc_step))
> {
>
> However, I am not entirely certain that this is the correct approach.

Hmm, not exactly sure.  But the place you are patching at least shows
that we do not verify mem-ref validity for it.

Is this tracked with a bugzilla report and a testcase btw?

Thanks,
Richard.

> Thanks,
> Ciprian.
>
>
> > >
> > > > >
> > > > >
> > > > > > Richard.
> > > > > >
> > > > > > >
> > > > > > > _24 = (void *) ivtmp.15_20;
> > > > > > > _6 = MEM[(int *)_24];
> > > > > > > ivtmp.15_21 = ivtmp.15_20 + 4;
> > > > > > >
> > > > > > > . . .
> > > > > > >
> > > > > > > _25 = (void *) ivtmp.15_21;
> > > > > > > MEM[(int *)_25 + 4294967292 * 1] = _7; <------
> > > > > > >
> > > > > > > Would it make sense to check whether the addressing mode is valid 
> > > > > > > at this point
> > > > > > > (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4732),
> > > > > > > similar to the validation performed later
> > > > > > > (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4753)?
> > > > > > >
> > > > > > > >
> > > > > > > > > **Issue 2:** IVOPTs also assigns low cost to another 
> > > > > > > > > candidate:
> > > > > > > > >
> > > > > > > > > Candidate yyy:
> > > > > > > > >   Incr POS: before exit test
> > > > > > > > >   IV struct:
> > > > > > > > >     Type:       unsigned int
> > > > > > > > >     Base:       (unsigned int) a_13(D)
> > > > > > > > >     Step:       4
> > > > > > > > >
> > > > > > > > > This produces:
> > > > > > > > >
> > > > > > > > > loop-preheader:
> > > > > > > > >     r0 = &a[0]
> > > > > > > > >     jump loop-exiting
> > > > > > > > >
> > > > > > > > > loop-header:
> > > > > > > > >     load-from  [r0, #-4]
> > > > > > > > >     store-to   [r0]
> > > > > > > > >
> > > > > > > > > loop-exiting:
> > > > > > > > >     increment  r0
> > > > > > > > >     jump loop-header
> > > > > > > > >
> > > > > > > > > IVOPTs considers that the increment in the loop-exiting block 
> > > > > > > > > can be paired
> > > > > > > > > with "load-from [r0, #-4]" in the loop-header block, despite 
> > > > > > > > > them being in
> > > > > > > > > different basic blocks.
> > > > > > > > >
> > > > > > > > > **Question 2:** Should get_address_cost() verify that the 
> > > > > > > > > candidate increment
> > > > > > > > > and use->stmt are in the same basic block when cand->pos == 
> > > > > > > > > IP_NORMAL?
> > > > > > > > > Cross-block pairing seems problematic for post-increment 
> > > > > > > > > addressing mode
> > > > > > > > > costing.
> > > > > > > >
> > > > > > > > If the pairing is wrong it should be fixed, where's that 
> > > > > > > > pairing done?
> > > > > > > >
> > > > > > >
> > > > > > > The pairing takes place at this point
> > > > > > > (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4739),
> > > > > > > where the candidate (the `increment r0` statement from the
> > > > > > > loop‑exiting basic block) is combined with the use
> > > > > > > (the `load-from [r0, #-4]` statement from the loop‑header basic
> > > > > > > block), resulting in can_autoinc being set to true.
> > > > > > >
> > > > > > > loop-header:
> > > > > > >     load-from [r0, #-4]
> > > > > > >     store-to [r0]
> > > > > > >
> > > > > > > loop-exiting:
> > > > > > >     increment r0
> > > > > > >     bcc loop-header
> > > > > > >
> > > > > > > > >
> > > > > > > > > Both issues suggest that IVOPTs may need additional 
> > > > > > > > > validation to ensure:
> > > > > > > > > 1. The selected addressing mode is actually supported by the 
> > > > > > > > > target
> > > > > > > > > 2. The increment and memory operation are properly co-located 
> > > > > > > > > for IP_NORMAL
> > > > > > > > >    candidates
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Ciprian Arbone

Reply via email to