On Wed, Dec 3, 2025 at 4:19 PM Aleksandar Rakic <[email protected]> wrote: > > Hi Richard, > > Thank you for your feedback on the patch. I’d like to better understand > your point about this being considered an “unrelated change.” > > Could you please explain why distributing the invariant to > `parts.offset` in the case of `!(ok_with_ratio_p || ok_without_ratio_p)` > is seen as unrelated? From my understanding, this condition only checks > support for `base + index [<< scale]`, not for `base + offset`. For > example, in commit 45f3501bda (around line 4754 in > tree-ssa-loop-ivopts.cc), the code still validates `base + offset` even > when `base + index` is not supported. > > If possible, could you share an example or scenario where distributing > to `parts.offset` would not be valid? This would help clarify the design > intent and ensure the patch aligns with expectations.
I think it's on your side to show this change is required and positive, not mine to prove it is not. Sorry, but every time I look at this patch it takes me more than an hour to familiarize myself again with this code, this feels like wasted time, esp. since last time I did this I could not reproduce any change from your patch on the testcase(s). Richard. > > Thanks for your time and insights. > > Best regards, > Aleksandar > > > ________________________________________ > From: Richard Biener <[email protected]> > Sent: Wednesday, January 8, 2025 3:30 PM > To: Aleksandar Rakic > Cc: [email protected]; Djordje Todorovic; Jovan Dmitrovic > Subject: Re: [Bug tree-optimization/109429] [PATCH v2] ivopts: fixed > complexities > > [Some people who received this message don't often get email from > [email protected]. Learn why this is important at > https://aka.ms/LearnAboutSenderIdentification ] > > CAUTION: This email originated from outside of the organization. Do not click > links or open attachments unless you recognize the sender and know the > content is safe. > > > On Wed, Dec 18, 2024 at 2:55 PM Aleksandar Rakic > <[email protected]> wrote: > > > > Hi, > > > > > > > > > > Hi, > > > > > > > > I think I managed to fix indentation from the previous version. > > > > > > > > When comparing the tables showing the candidates for the group 1 > > > > before and after applying this patch, it can be observed that > > > > complexities for the candidates where the computation depends on the > > > > invariant expressions or the invariant variables should be at least > > > > one, which aligns with the approach used in the commit c2b64ce. > > > > > > Commit c2b64ce is > > > > > > +2017-05-11 Bin Cheng <[email protected]> > > > + > > > + * tree-ssa-address.c (struct mem_address): Move to header file. > > > + (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global. > > > + * tree-ssa-address.h (struct mem_address): Move from C file. > > > + (valid_mem_ref_p, move_fixed_address_to_symbol): Declare. > > > > > > that hasn't any "approach" to complexity, it just makes a function > > > global. > > > > In the commit c2b64ce, which is the parent of the commit f9f69dd, which > > introduced the bug, in the function get_address_cost the complexity is > > incremented by one if var_present != 0, so that's the reason why I've > > mentioned that commit. > > OK, so that was confusing - you should have mentioned the commit > _changing_ the behavior. I see that change is very large though. > > > > > > > > > ===== Before this patch ===== > > > > Group 1: > > > > cand cost compl. inv.expr. inv.vars > > > > 1 11 0 5; NIL; > > > > 2 11 0 6; NIL; > > > > 4 8 0 7; NIL; > > > > 5 9 0 8; NIL; > > > > 6 1 0 NIL; NIL; > > > > 7 1 1 NIL; NIL; > > > > 9 7 0 5; NIL; > > > > ===== Before this patch ===== > > > > ===== After this patch ===== > > > > Group 1: > > > > cand cost compl. inv.expr. inv.vars > > > > 1 11 2 4; NIL; > > > > > > why does complexity go up to 2 from 0 here? > > > > The complexity for certain candidates equals 2 in the figure because two > > conditions are met: > > 1) parts.offset != NULL_TREE && !integer_zerop (parts.offset), > > 2) (inv_vars && *inv_vars && !bitmap_empty_p (*inv_vars)) || > > (inv_expr && *inv_expr != NULL). > > > > > > 2 11 1 4; NIL; > > > > 4 8 1 5; NIL; > > > > 5 8 2 6; NIL; > > > > > > Likewise, and why does cost change? > > > > > > > 6 1 0 NIL; NIL; > > > > 7 1 1 NIL; NIL; > > > > 9 7 2 4; NIL; > > > > > > Likewise. > > > > The cost for a candidate in the figure changes because parts.offset and > > aff_inv->offset change, which lead to different cost. > > > > > This comparison is probably not very useful without showing the actual > > > candidate and its uses? > > > > The candidates and the use are attached. For example, the candidate 1 > > has a base 0 and a step 1. Use 1 has a base vector_27(D) + > > ((sizetype) _11 + 1) * 4 and a step 4. Use can be presented as ubase + > > ratio * (candidate - cbase), where ratio is ustep/cstep. It equals > > vector_27(D) + ((sizetype) _11 + 1) * 4 + 4 * candidate1, ie. aff_inv > > equals vector_27(D) + (sizetype) _11 * 4 + 4 and aff_var equals > > 4 * candidate1. Then, parts.offset becomes 4 (from aff_inv) and the > > invariant expression 4 equals vector_27(D) + (sizetype) _11 * 4. > > > > > The above before/after figures do not match the testcase > > > ontop of trunk. > > > > > > > ===== After this patch ===== > > > > > > > > Hence, if the invariant expressions or the invariant variables are > > > > used when representing use with candidate, the complexity should be > > > > larger for more complex expressions, so it is incremented by one. I > > > > am not sure whether inv_present could be expressed as parts. > > > > > > The testcase looks mips specific - it has just scan-tree-dump-not > > > which is probably easily confused to PASS when it regressed. Can you > > > instead add a gcc.target/mips/ testcase that scans for actual > > > assembler features? If the testcase relies on inlining daxpy then > > > declaring that static helps that you just get dgefa in the final > > > assembly. If you want to scan both functions I suggest to split the > > > testcase into two to make it more reliable. > > > > I positioned on the commit a4954130d4 (c++: define __cpp_pack_indexing > > [PR113798]), applied the patch, and got the same before/after figures. > > Also, both scan-tree-dump-not tests passed. I declared daxpy static in > > the new testcase. I am not sure what features to scan in assembly. > > A scan-tree-dump-not easily passes (as said, it passes before/after the > patch for me). > > > > I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu > > > generates for the innermost loop of dgefa > > > > > > .L12: > > > addu $3,$9,$2 > > > addu $3,$3,$8 > > > lwc1 $f1,0($3) > > > lwc1 $f0,0($2) > > > addiu $7,$7,1 > > > mul.s $f1,$f2,$f1 > > > addiu $2,$2,4 > > > slt $3,$7,$10 > > > add.s $f0,$f0,$f1 > > > .set noreorder > > > .set nomacro > > > bne $3,$0,.L12 > > > > > > and with the patch > > > > > > .L12: > > > addu $3,$9,$2 > > > addu $3,$3,$8 > > > lwc1 $f1,0($3) > > > lwc1 $f0,0($2) > > > addiu $7,$7,1 > > > mul.s $f1,$f2,$f1 > > > addiu $2,$2,4 > > > slt $3,$7,$10 > > > add.s $f0,$f0,$f1 > > > .set noreorder > > > .set nomacro > > > bne $3,$0,.L12 > > > > > > that's suspiciously identical?! In fact the whole testcase generates > > > identical code. > > > > Could you please elaborate what do you mean by r15-5650-gd9c908b7503965? > > That's the revision as gcc-descr encodes it, git hash d9c908b750 which is > a few days earlier as your one above. > > > We managed to generate different assembly code with a patch. We used the > > following command to achieve that: > > > > $PREFIX/bin/mips64-r6-linux-gnu-gcc test.c -O2 \ > > -fdump-tree-ivopts-details -S -o test.s, > > > > where $PREFIX is an install directory. > > > > Could you please share the command which you used to generate identical > > code? As I mentioned, we positioned on the commit a4954130d4 and applied > > the patch. > > Same as yours, I mentioned the GCC configuration which might be the > difference. > Note I've not installed the mips cross but only built all-gcc and > invoked gcc/cc1 > directly. > > > > So besides not being able to see the actual problem (maybe I need some > > > -march/-mtune?) the actual issue I have with the patch is that aff_inv > > > is tried to be distributed to other components and for parts that fail > > > to be distributed we cost it via > > > > > > if (comp_inv != NULL_TREE) > > > cost = force_var_cost (data, comp_inv, inv_vars); > > > > > > simply ensuring the complexity is at least one would have been to > > > change that to > > > > > > if (comp_inv != NULL_TREE) > > > { > > > cost = force_var_cost (data, comp_inv, inv_vars); > > > /* Ensure complexity is at least one. */ > > > cost.complexity = MAX (1, cost.complexity); > > > } > > > > > > or alternatively just do that for the if (comp_inv && inv_expr && > > > !simple_inv) case > > > > The patch you posted related to complexity seems to be OK, but I would > > slightly modify it as follows: > > > > if (comp_inv != NULL_TREE) > > { > > cost = force_var_cost (data, comp_inv, inv_vars); > > cost.complexity += 1; > > } > > > > I said 'the complexity should be at least one' because it was zero in > > the before figure, but it is just one special case. > > > > > (it's a bit odd we adjust cost there only for 'inv_expr != NULL'). > > > > Could you please elaborate this sentence? My condition for complexity > > incrementing was: > > > > if ((inv_vars && *inv_vars && !bitmap_empty_p (*inv_vars)) || \ > > (inv_expr && *inv_expr != NULL)). > > > > > The patch you posted instead of just adjusting complexity seems to > > > change the way we distribute the invariant - in particular we now > > > distribute it to parts.offset even when that is not supported > > > (!(ok_with_ratio_p || ok_without_ratio_p)), that's an odd change. > > > > Could you please elaborate this because I think I don't understand you. > > The condition !(ok_with_ratio_p || ok_without_ratio_p) checks whether > > base + index [<< scale] is supported or not. It doesn't check whether > > base + offset is supported. For example, the code on the commit > > a4954130d4 at line 4767 of tree-ssa-loop-ivopts.cc checks addressing > > mode base + offset even if base + index is not checked. I think that > > base + index is not required condition for base + offset. > > It looks like an unrelated change to me. If you are happy with > > > if (comp_inv != NULL_TREE) > > { > > cost = force_var_cost (data, comp_inv, inv_vars); > > cost.complexity += 1; > > } > > can you please give that full testing and re-post such patch? I'd also > appreciate if you'd move the testcase to be in gcc.target/mips, scanning > for expected assembly, rather than scanning for unexpected things not > appearing. > > Please also adjust the commit message to indicate the revision > introducing the regression rather than using the one right before. > > Richard, > > > Kind regards, > > Aleksandar > > > > ________________________________________ > > From: Richard Biener <[email protected]> > > Sent: Thursday, November 28, 2024 1:45 PM > > To: Aleksandar Rakic > > Cc: [email protected]; Djordje Todorovic; Jovan Dmitrovic > > Subject: Re: [Bug tree-optimization/109429] [PATCH v2] ivopts: fixed > > complexities > > > > > > > > On Wed, Sep 25, 2024 at 5:32 PM Aleksandar Rakic > > <[email protected]> wrote: > > > > > > Hi, > > > > > > I think I managed to fix indentation from the previous version. > > > > > > When comparing the tables showing the candidates for the group 1 before > > > and after applying this patch, it can be observed that complexities for > > > the candidates where the computation depends on the invariant > > > expressions or the invariant variables should be at least one, which > > > aligns with the approach used in the commit c2b64ce. > > > > Commit c2b64ce is > > > > +2017-05-11 Bin Cheng <[email protected]> > > + > > + * tree-ssa-address.c (struct mem_address): Move to header file. > > + (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global. > > + * tree-ssa-address.h (struct mem_address): Move from C file. > > + (valid_mem_ref_p, move_fixed_address_to_symbol): Declare. > > > > that hasn't any "approach" to complexity, it just makes a function global. > > > > > > > > ===== Before this patch ===== > > > Group 1: > > > cand cost compl. inv.expr. inv.vars > > > 1 11 0 5; NIL; > > > 2 11 0 6; NIL; > > > 4 8 0 7; NIL; > > > 5 9 0 8; NIL; > > > 6 1 0 NIL; NIL; > > > 7 1 1 NIL; NIL; > > > 9 7 0 5; NIL; > > > ===== Before this patch ===== > > > ===== After this patch ===== > > > Group 1: > > > cand cost compl. inv.expr. inv.vars > > > 1 11 2 4; NIL; > > > > why does complexity go up to 2 from 0 here? > > > > > 2 11 1 4; NIL; > > > 4 8 1 5; NIL; > > > 5 8 2 6; NIL; > > > > Likewise, and why does cost change? > > > > > 6 1 0 NIL; NIL; > > > 7 1 1 NIL; NIL; > > > 9 7 2 4; NIL; > > > > Likewise. > > > > This comparison is probably not very useful without showing the actual > > candidate > > and its uses? The above before/after figures do not match the testcase > > ontop of trunk. > > > > > ===== After this patch ===== > > > > > > Hence, if the invariant expressions or the invariant variables are used > > > when representing use with candidate, the complexity should be larger > > > for more complex expressions, so it is incremented by one. I am not sure > > > whether inv_present could be expressed as parts. > > > > The testcase looks mips specific - it has just scan-tree-dump-not which > > is probably easily confused to PASS when it regressed. Can you instead > > add a gcc.target/mips/ testcase that scans for actual assembler features? > > If the testcase relies on inlining daxpy then declaring that static helps > > that > > you just get dgefa in the final assembly. If you want to scan both > > functions > > I suggest to split the testcase into two to make it more reliable. > > > > I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu generates > > for the innermost loop of dgefa > > > > .L12: > > addu $3,$9,$2 > > addu $3,$3,$8 > > lwc1 $f1,0($3) > > lwc1 $f0,0($2) > > addiu $7,$7,1 > > mul.s $f1,$f2,$f1 > > addiu $2,$2,4 > > slt $3,$7,$10 > > add.s $f0,$f0,$f1 > > .set noreorder > > .set nomacro > > bne $3,$0,.L12 > > > > and with the patch > > > > .L12: > > addu $3,$9,$2 > > addu $3,$3,$8 > > lwc1 $f1,0($3) > > lwc1 $f0,0($2) > > addiu $7,$7,1 > > mul.s $f1,$f2,$f1 > > addiu $2,$2,4 > > slt $3,$7,$10 > > add.s $f0,$f0,$f1 > > .set noreorder > > .set nomacro > > bne $3,$0,.L12 > > > > that's suspiciously identical?! In fact the whole testcase generates > > identical code. > > > > So besides not being able to see the actual problem (maybe I need some > > -march/-mtune?) > > the actual issue I have with the patch is that aff_inv is tried to be > > distributed to other > > components and for parts that fail to be distributed we cost it via > > > > if (comp_inv != NULL_TREE) > > cost = force_var_cost (data, comp_inv, inv_vars); > > > > simply ensuring the complexity is at least one would have been to change > > that to > > > > if (comp_inv != NULL_TREE) > > { > > cost = force_var_cost (data, comp_inv, inv_vars); > > /* Ensure complexity is at least one. */ > > cost.complexity = MAX (1, cost.complexity); > > } > > > > or alternatively just do that for the if (comp_inv && inv_expr && > > !simple_inv) case > > (it's a bit odd we adjust cost there only for 'inv_expr != NULL'). > > > > The patch you posted instead of just adjusting complexity seems to change > > the way we distribute the invariant - in particular we now distribute it to > > parts.offset even when that is not supported (!(ok_with_ratio_p || > > ok_without_ratio_p)), > > that's an odd change. > > > > complexity is supposed to be a tie-breaker, so I think that having > > bigger complexity > > for when we can't move it fully to index is OK - in the end any part > > that cannot be > > moved will end up being applied to base I think (that we have > > essentially two functions > > for this, one for costing and one for actual code emission, is a bit > > unfortunate). > > > > In the end I can't ack this patch as I cannot reproduce an effect on > > the testcase > > you added and because of the odd change part of the patch which is clearly > > not > > only doing what it's description says. > > > > Thanks, > > Richard. > > > > > > > Regards, > > > Aleksandar
