[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 --- Comment #14 from Jaydeep Chauhan --- (In reply to Jaydeep Chauhan from comment #10) I have tried solve this case using define_split or define_insn_and_split but i am facing is some issue as per below: 1.It is not possible to combine below four instruction addq$1, %rax salq$4, %rax addqp1(%rip), %rax movl(%rax), %eax into the this shlq$4, %rcx movl16(%rax,%rcx), %eax using define_split or define_insn_and_split To add offset "16" and "p1" it is creating problem because "p1" is symbol ref. 2.Also to optimize addqp1(%rip), %rax movl(%rax), %eax into a single instruction it should need to be seperate define_split or define_insn_and_split. So it should need to be seperate 3.I have also tried this case with peephole but i am facing same problem with this also (define_peephole2 [(parallel [(set (match_operand:DI 0 "register_operand") (plus:DI (match_dup 0) (match_operand:DI 1 "const_int_operand"))) (clobber (reg:CC FLAGS_REG))]) (parallel [(set (match_dup 0) (ashift:DI (match_dup 0) (match_operand 2 "const_int_operand"))) (clobber (reg:CC FLAGS_REG))]) (parallel [(set (match_dup 0) (plus:DI (match_operand:DI 4 "nonimmediate_operand") (mem:DI (match_operand:SDWIM 5 "" (clobber (reg:CC FLAGS_REG))])] "" [(parallel [(set (match_dup 0) (ashift:DI (match_dup 0)(match_dup 2))) (clobber (reg:CC FLAGS_REG))]) (parallel [(set (match_dup 0) (plus:DI (match_dup 4) (mem:DI (match_dup 5 (clobber (reg:CC FLAGS_REG))])] { int scale = 8 << INTVAL (operands[1]); operands[4] = gen_rtx_PLUS (DImode,operands[4], GEN_INT (scale)); }) Please share your comment/suggestion on this.
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 --- Comment #13 from Segher Boessenkool --- Yeah, that's not going to happen. Will it help to do some define_split or define_insn_and_split for this?
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 --- Comment #12 from Jakub Jelinek --- (In reply to Jakub Jelinek from comment #11) > Unless the combiner grows the possibility to split into 3 functions, I'm I mean 3 instructions when trying to combine 4. > afraid this would need to be solved in machine reorg or something similar.
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 Jakub Jelinek changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #11 from Jakub Jelinek --- That is because e.g. on: struct S { int a, b, c, d; }; struct T { struct S a; struct S b[256]; } *v; int foo (unsigned char *x) { return v->b[*x].a; } we have: (insn 6 3 7 2 (set (reg/f:DI 88 [ v ]) (mem/f/c:DI (symbol_ref:DI ("v") [flags 0x2] ) [1 v+0 S8 A64])) "pr54589-3.c":7:18 66 {*movdi_internal} (nil)) (insn 7 6 8 2 (set (reg:SI 89 [ *x_5(D) ]) (zero_extend:SI (mem:QI (reg/v/f:DI 86 [ x ]) [0 *x_5(D)+0 S1 A8]))) "pr54589-3.c":7:15 119 {*zero_extendqisi2} (expr_list:REG_DEAD (reg/v/f:DI 86 [ x ]) (nil))) (insn 8 7 9 2 (set (reg:DI 90 [ *x_5(D) ]) (sign_extend:DI (reg:SI 89 [ *x_5(D) ]))) "pr54589-3.c":7:18 128 {*extendsidi2_rex64} (expr_list:REG_DEAD (reg:SI 89 [ *x_5(D) ]) (nil))) (insn 9 8 10 2 (parallel [ (set (reg:DI 91) (plus:DI (reg:DI 90 [ *x_5(D) ]) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ]) "pr54589-3.c":7:18 191 {*adddi_1} (expr_list:REG_DEAD (reg:DI 90 [ *x_5(D) ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil (insn 10 9 11 2 (parallel [ (set (reg:DI 92) (ashift:DI (reg:DI 91) (const_int 4 [0x4]))) (clobber (reg:CC 17 flags)) ]) "pr54589-3.c":7:18 518 {*ashldi3_1} (expr_list:REG_DEAD (reg:DI 91) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil (insn 11 10 12 2 (parallel [ (set (reg/f:DI 93) (plus:DI (reg/f:DI 88 [ v ]) (reg:DI 92))) (clobber (reg:CC 17 flags)) ]) "pr54589-3.c":7:18 191 {*adddi_1} (expr_list:REG_DEAD (reg:DI 92) (expr_list:REG_DEAD (reg/f:DI 88 [ v ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil) (insn 12 11 17 2 (set (reg:SI 94 [ v.0_1->b[_3].a ]) (mem:SI (reg/f:DI 93) [2 v.0_1->b[_3].a+0 S4 A32])) "pr54589-3.c":7:18 67 {*movsi_internal} (expr_list:REG_DEAD (reg/f:DI 93) (nil))) and combine would handle that fine if it tried to combine 9, 10, 11 -> 12 first, but unfortunately for this case it tries 6 -> 11 first and, if the combination of everything doesn't simplify, we try to split it only into two parts, which in this case isn't enough, we'd need to split those 4 instructions into 3 (one to do the shift, one to do the v load and last to do the mem read with the 16(reg1, reg2) addressing. Unless the combiner grows the possibility to split into 3 functions, I'm afraid this would need to be solved in machine reorg or something similar.
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 Jaydeep Chauhan changed: What|Removed |Added CC||jaydeepchauhan1494 at gmail dot co ||m --- Comment #10 from Jaydeep Chauhan --- Hi, For the below case fix is not working. #include struct param { int a, b, c, d; __m128i array[256]; }; struct param *p1; void func(struct param *p, unsigned char *src, int *dst) { __m128i x = p1->array[*src]; *dst = _mm_cvtsi128_si32(x); } gcc generates for x86-64 with -O2: func: movzbl (%rsi), %eax addq$1, %rax salq$4, %rax addqp1(%rip), %rax movl(%rax), %eax movl%eax, (%rdx) ret clang generates for x86_64 with -O2: func: movqp1(%rip), %rax movzbl (%rsi), %ecx shlq$4, %rcx movl16(%rax,%rcx), %eax movl%eax, (%rdx) retq
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 Jakub Jelinek changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #9 from Jakub Jelinek --- Fixed for GCC9 and later.
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 --- Comment #8 from Jakub Jelinek --- Author: jakub Date: Sat Dec 1 07:27:58 2018 New Revision: 266707 URL: https://gcc.gnu.org/viewcvs?rev=266707&root=gcc&view=rev Log: PR target/54589 * combine.c (find_split_point): For invalid memory address nonobj + obj + const, if reg + obj + const is valid addressing mode, split at nonobj. Use if rather than else if for the fallback. Comment fixes. * gcc.target/i386/pr54589.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr54589.c Modified: trunk/gcc/ChangeLog trunk/gcc/combine.c trunk/gcc/testsuite/ChangeLog
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 Jakub Jelinek changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek --- Created attachment 45125 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45125&action=edit gcc9-pr54589.patch Untested fix.
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 --- Comment #6 from Jakub Jelinek --- --- gcc/combine.c.jj2018-11-21 19:57:26.229726485 +0100 +++ gcc/combine.c 2018-11-29 17:57:48.069423874 +0100 @@ -4945,7 +4945,7 @@ find_split_point (rtx *loc, rtx_insn *in } /* If we have a PLUS whose second operand is a constant and the -address is not valid, perhaps will can split it up using +address is not valid, perhaps we can split it up using the machine-specific way to split large constants. We use the first pseudo-reg (one of the virtual regs) as a placeholder; it will not remain in the result. */ @@ -4960,7 +4960,7 @@ find_split_point (rtx *loc, rtx_insn *in /* This should have produced two insns, each of which sets our placeholder. If the source of the second is a valid address, -we can make put both sources together and make a split point +we can put both sources together and make a split point in the middle. */ if (seq @@ -5001,14 +5001,45 @@ find_split_point (rtx *loc, rtx_insn *in } } + /* If that didn't work and we have a nested plus, like: +((REG1 * CONST1) + REG2) + CONST2 and (REG1 + REG2) + CONST2 +is valid address, try to split (REG1 * CONST1). */ + if (GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS + && !OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 0)) + && OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 1))) + { + rtx tem = XEXP (XEXP (XEXP (x, 0), 0), 0); + XEXP (XEXP (XEXP (x, 0), 0), 0) = reg; + if (memory_address_addr_space_p (GET_MODE (x), XEXP (x, 0), + MEM_ADDR_SPACE (x))) + { + XEXP (XEXP (XEXP (x, 0), 0), 0) = tem; + return &XEXP (XEXP (XEXP (x, 0), 0), 0); + } + XEXP (XEXP (XEXP (x, 0), 0), 0) = tem; + } + else if (GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS + && OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 0)) + && !OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 1))) + { + rtx tem = XEXP (XEXP (XEXP (x, 0), 0), 1); + XEXP (XEXP (XEXP (x, 0), 0), 1) = reg; + if (memory_address_addr_space_p (GET_MODE (x), XEXP (x, 0), + MEM_ADDR_SPACE (x))) + { + XEXP (XEXP (XEXP (x, 0), 0), 1) = tem; + return &XEXP (XEXP (XEXP (x, 0), 0), 1); + } + XEXP (XEXP (XEXP (x, 0), 0), 1) = tem; + } + /* If that didn't work, perhaps the first operand is complex and needs to be computed separately, so make a split point there. This will occur on machines that just support REG + CONST and have a constant moved through some previous computation. */ - - else if (!OBJECT_P (XEXP (XEXP (x, 0), 0)) - && ! (GET_CODE (XEXP (XEXP (x, 0), 0)) == SUBREG -&& OBJECT_P (SUBREG_REG (XEXP (XEXP (x, 0), 0) + if (!OBJECT_P (XEXP (XEXP (x, 0), 0)) + && ! (GET_CODE (XEXP (XEXP (x, 0), 0)) == SUBREG + && OBJECT_P (SUBREG_REG (XEXP (XEXP (x, 0), 0) return &XEXP (XEXP (x, 0), 0); } fixes this for me.
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||segher at gcc dot gnu.org --- Comment #5 from Jakub Jelinek --- Note, in other cases combine is sucessful. E.g. consider: struct S { int a, b, c, d; }; struct T { struct S e[16]; struct S f[1024]; } t; int foo (unsigned long x) { return t.f[x + 5].a; } int bar (struct T *x, unsigned long y) { return x->f[y + 5].b; } On x86_64-linux with -O2, we have before combine in both cases (idx + 21) << 4 and in foo combine says: Trying 7, 8 -> 10: 7: {r87:DI=r91:DI+0x15;clobber flags:CC;} REG_DEAD r91:DI REG_UNUSED flags:CC 8: {r88:DI=r87:DI<<0x4;clobber flags:CC;} REG_DEAD r87:DI REG_UNUSED flags:CC 10: r90:SI=[r88:DI+`t'] REG_DEAD r88:DI Failed to match this instruction: (set (reg:SI 90 [ t.f[_1].a ]) (mem:SI (plus:DI (mult:DI (reg:DI 91) (const_int 16 [0x10])) (const:DI (plus:DI (symbol_ref:DI ("t") [flags 0x2] ) (const_int 336 [0x150] [1 t.f[_1].a+0 S4 A32])) Successfully matched this instruction: (set (reg:DI 88) (ashift:DI (reg:DI 91) (const_int 4 [0x4]))) Successfully matched this instruction: (set (reg:SI 90 [ t.f[_1].a ]) (mem:SI (plus:DI (reg:DI 88) (const:DI (plus:DI (symbol_ref:DI ("t") [flags 0x2] ) (const_int 336 [0x150] [1 t.f[_1].a+0 S4 A32])) and it is merged the way we want: salq$4, %rdi movlt+336(%rdi), %eax while in bar: Failed to match this instruction: (set (reg:SI 91 [ x_4(D)->f[_1].b ]) (mem:SI (plus:DI (plus:DI (mult:DI (reg:DI 93) (const_int 16 [0x10])) (reg:DI 92)) (const_int 340 [0x154])) [1 x_4(D)->f[_1].b+0 S4 A32])) Failed to match this instruction: (set (reg:DI 88) (plus:DI (ashift:DI (reg:DI 93) (const_int 4 [0x4])) (reg:DI 92))) and it is not merged: addq$21, %rsi salq$4, %rsi movl4(%rsi,%rdi), %eax when it could be: salq$4, %rsi movl340(%rsi,%rdi), %eax It isn't x86 specific though, e.g. on powerpc64-linux we end up with: addi 4,4,21 sldi 4,4,4 add 4,3,4 lwa 3,4(4) for bar, where I guess: sldi 4,4,4 add 4,3,4 lwa 3,340(4) would work too.
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 --- Comment #4 from Uroš Bizjak --- clang generates for x86_64: movzbl (%rsi), %eax shlq$4, %rax movl16(%rdi,%rax), %eax movl%eax, (%rdx) retq and for i?86: movl8(%esp), %edx movl4(%esp), %ecx movl12(%esp), %eax movzbl (%edx), %edx shll$4, %edx movl16(%ecx,%edx), %ecx movl%ecx, (%eax) retl for the later case (-m32), gcc generates: movl8(%esp), %eax movzbl (%eax), %eax addl$1, %eax sall$4, %eax addl4(%esp), %eax movl(%eax), %edx movl12(%esp), %eax movl%edx, (%eax) ret so, two extra additions.
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 --- Comment #3 from sgunderson at bigfoot dot com --- Still there in GCC 7.2.1 (exact same assembler output), and in 8.0 snapshot 20171017.
[Bug target/54589] struct offset add should be folded into address calculation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 --- Comment #2 from sgunderson at bigfoot dot com 2012-09-17 09:18:16 UTC --- FWIW, in my original code, func() is a part of a loop body (it keeps reading values from src in a loop). It doesn't really change anything in the generated code, though.
[Bug target/54589] struct offset add should be folded into address calculation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 Richard Guenther changed: What|Removed |Added Target||x86_64-*-*, i?86-*-* Status|UNCONFIRMED |NEW Keywords||missed-optimization Last reconfirmed||2012-09-17 Component|tree-optimization |target Ever Confirmed|0 |1 Summary|[missed-optimization] |struct offset add should be |struct offset add should be |folded into address |folded into address |calculation |calculation | Known to fail||4.6.3, 4.7.2, 4.8.0 --- Comment #1 from Richard Guenther 2012-09-17 08:49:25 UTC --- As there is no loop involved it is RTL fwprop or combines job to eventually do this. I see func: .LFB517: .cfi_startproc movzbl (%rsi), %eax addq$1, %rax salq$4, %rax movl(%rdi,%rax), %eax movl%eax, (%rdx) ret for both 4.7 and 4.8. i?86 is even worse.