[Bug target/54589] struct offset add should be folded into address calculation

2018-12-17 Thread jaydeepchauhan1494 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #14 from Jaydeep Chauhan  ---
(In reply to Jaydeep Chauhan from comment #10)


I have tried solve this case using define_split or define_insn_and_split 
  but i am facing is some issue as per below:

1.It is not possible to combine below four instruction 

addq$1, %rax
salq$4, %rax
addqp1(%rip), %rax
movl(%rax), %eax

into the this
shlq$4, %rcx
movl16(%rax,%rcx), %eax 

using define_split or define_insn_and_split

To add offset "16" and "p1"  it is creating problem because "p1" is symbol ref.

2.Also to optimize 
addqp1(%rip), %rax
movl(%rax), %eax
into a single instruction it should need to be seperate define_split or
define_insn_and_split.
So it should need to be seperate 

3.I have also tried this case with peephole but i am facing same problem with
this also


(define_peephole2
  [(parallel [(set (match_operand:DI 0 "register_operand")
 (plus:DI (match_dup 0)
 (match_operand:DI 1 "const_int_operand")))
 (clobber (reg:CC FLAGS_REG))])
   (parallel [(set (match_dup 0)
 (ashift:DI (match_dup 0)
 (match_operand 2 "const_int_operand")))
 (clobber (reg:CC FLAGS_REG))])
   (parallel [(set (match_dup 0)
(plus:DI (match_operand:DI 4 "nonimmediate_operand")
 (mem:DI (match_operand:SDWIM 5 ""
 (clobber (reg:CC FLAGS_REG))])]
  ""
  [(parallel [(set (match_dup 0)
   (ashift:DI (match_dup 0)(match_dup 2)))
   (clobber (reg:CC FLAGS_REG))])
   (parallel [(set (match_dup 0)
   (plus:DI (match_dup 4)
(mem:DI (match_dup 5
(clobber (reg:CC FLAGS_REG))])]
{
   int scale = 8 << INTVAL (operands[1]);
   operands[4] = gen_rtx_PLUS (DImode,operands[4], GEN_INT (scale));
})


Please share your comment/suggestion on this.

[Bug target/54589] struct offset add should be folded into address calculation

2018-12-07 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #13 from Segher Boessenkool  ---
Yeah, that's not going to happen.

Will it help to do some define_split or define_insn_and_split for this?

[Bug target/54589] struct offset add should be folded into address calculation

2018-12-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #12 from Jakub Jelinek  ---
(In reply to Jakub Jelinek from comment #11)
> Unless the combiner grows the possibility to split into 3 functions, I'm

I mean 3 instructions when trying to combine 4.

> afraid this would need to be solved in machine reorg or something similar.

[Bug target/54589] struct offset add should be folded into address calculation

2018-12-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Jakub Jelinek  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #11 from Jakub Jelinek  ---
That is because e.g. on:
struct S { int a, b, c, d; };
struct T { struct S a; struct S b[256]; } *v;

int
foo (unsigned char *x)
{
  return v->b[*x].a;
}
we have:
(insn 6 3 7 2 (set (reg/f:DI 88 [ v ])
(mem/f/c:DI (symbol_ref:DI ("v") [flags 0x2]  ) [1 v+0 S8 A64])) "pr54589-3.c":7:18 66 {*movdi_internal}
 (nil))
(insn 7 6 8 2 (set (reg:SI 89 [ *x_5(D) ])
(zero_extend:SI (mem:QI (reg/v/f:DI 86 [ x ]) [0 *x_5(D)+0 S1 A8])))
"pr54589-3.c":7:15 119 {*zero_extendqisi2}
 (expr_list:REG_DEAD (reg/v/f:DI 86 [ x ])
(nil)))
(insn 8 7 9 2 (set (reg:DI 90 [ *x_5(D) ])
(sign_extend:DI (reg:SI 89 [ *x_5(D) ]))) "pr54589-3.c":7:18 128
{*extendsidi2_rex64}
 (expr_list:REG_DEAD (reg:SI 89 [ *x_5(D) ])
(nil)))
(insn 9 8 10 2 (parallel [
(set (reg:DI 91)
(plus:DI (reg:DI 90 [ *x_5(D) ])
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) "pr54589-3.c":7:18 191 {*adddi_1}
 (expr_list:REG_DEAD (reg:DI 90 [ *x_5(D) ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil
(insn 10 9 11 2 (parallel [
(set (reg:DI 92)
(ashift:DI (reg:DI 91)
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) "pr54589-3.c":7:18 518 {*ashldi3_1}
 (expr_list:REG_DEAD (reg:DI 91)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil
(insn 11 10 12 2 (parallel [
(set (reg/f:DI 93)
(plus:DI (reg/f:DI 88 [ v ])
(reg:DI 92)))
(clobber (reg:CC 17 flags))
]) "pr54589-3.c":7:18 191 {*adddi_1}
 (expr_list:REG_DEAD (reg:DI 92)
(expr_list:REG_DEAD (reg/f:DI 88 [ v ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)
(insn 12 11 17 2 (set (reg:SI 94 [ v.0_1->b[_3].a ])
(mem:SI (reg/f:DI 93) [2 v.0_1->b[_3].a+0 S4 A32])) "pr54589-3.c":7:18
67 {*movsi_internal}
 (expr_list:REG_DEAD (reg/f:DI 93)
(nil)))
and combine would handle that fine if it tried to combine 9, 10, 11 -> 12
first, but unfortunately for this case it tries 6 -> 11 first and, if the
combination of everything doesn't simplify, we try to split it only into two
parts, which in this case isn't enough, we'd need to split those 4 instructions
into 3 (one to do the shift, one to do the v load and last to do the mem read
with the 16(reg1, reg2) addressing.

Unless the combiner grows the possibility to split into 3 functions, I'm afraid
this would need to be solved in machine reorg or something similar.

[Bug target/54589] struct offset add should be folded into address calculation

2018-12-07 Thread jaydeepchauhan1494 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Jaydeep Chauhan  changed:

   What|Removed |Added

 CC||jaydeepchauhan1494 at gmail 
dot co
   ||m

--- Comment #10 from Jaydeep Chauhan  ---
 Hi,

 For the below case fix is not working.

 #include 

  struct param {
  int a, b, c, d;
  __m128i array[256];
  };

  struct param *p1;
  void func(struct param *p, unsigned char *src, int *dst)
  {
  __m128i x = p1->array[*src];
  *dst = _mm_cvtsi128_si32(x);
  }

  gcc generates for x86-64 with -O2:

  func:
movzbl  (%rsi), %eax
addq$1, %rax
salq$4, %rax
addqp1(%rip), %rax
movl(%rax), %eax
movl%eax, (%rdx)
ret

  clang generates for x86_64 with -O2:  

  func:  
movqp1(%rip), %rax
movzbl  (%rsi), %ecx
shlq$4, %rcx
movl16(%rax,%rcx), %eax
movl%eax, (%rdx)
retq

[Bug target/54589] struct offset add should be folded into address calculation

2018-11-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Jakub Jelinek  ---
Fixed for GCC9 and later.

[Bug target/54589] struct offset add should be folded into address calculation

2018-11-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #8 from Jakub Jelinek  ---
Author: jakub
Date: Sat Dec  1 07:27:58 2018
New Revision: 266707

URL: https://gcc.gnu.org/viewcvs?rev=266707&root=gcc&view=rev
Log:
PR target/54589
* combine.c (find_split_point): For invalid memory address
nonobj + obj + const, if reg + obj + const is valid addressing
mode, split at nonobj.  Use if rather than else if for the
fallback.  Comment fixes.

* gcc.target/i386/pr54589.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr54589.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/combine.c
trunk/gcc/testsuite/ChangeLog

[Bug target/54589] struct offset add should be folded into address calculation

2018-11-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
Created attachment 45125
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45125&action=edit
gcc9-pr54589.patch

Untested fix.

[Bug target/54589] struct offset add should be folded into address calculation

2018-11-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #6 from Jakub Jelinek  ---
--- gcc/combine.c.jj2018-11-21 19:57:26.229726485 +0100
+++ gcc/combine.c   2018-11-29 17:57:48.069423874 +0100
@@ -4945,7 +4945,7 @@ find_split_point (rtx *loc, rtx_insn *in
}

   /* If we have a PLUS whose second operand is a constant and the
-address is not valid, perhaps will can split it up using
+address is not valid, perhaps we can split it up using
 the machine-specific way to split large constants.  We use
 the first pseudo-reg (one of the virtual regs) as a placeholder;
 it will not remain in the result.  */
@@ -4960,7 +4960,7 @@ find_split_point (rtx *loc, rtx_insn *in

  /* This should have produced two insns, each of which sets our
 placeholder.  If the source of the second is a valid address,
-we can make put both sources together and make a split point
+we can put both sources together and make a split point
 in the middle.  */

  if (seq
@@ -5001,14 +5001,45 @@ find_split_point (rtx *loc, rtx_insn *in
}
}

+ /* If that didn't work and we have a nested plus, like:
+((REG1 * CONST1) + REG2) + CONST2 and (REG1 + REG2) + CONST2
+is valid address, try to split (REG1 * CONST1).  */
+ if (GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS
+ && !OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 0))
+ && OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 1)))
+   {
+ rtx tem = XEXP (XEXP (XEXP (x, 0), 0), 0);
+ XEXP (XEXP (XEXP (x, 0), 0), 0) = reg;
+ if (memory_address_addr_space_p (GET_MODE (x), XEXP (x, 0),
+  MEM_ADDR_SPACE (x)))
+   {
+ XEXP (XEXP (XEXP (x, 0), 0), 0) = tem;
+ return &XEXP (XEXP (XEXP (x, 0), 0), 0);
+   }
+ XEXP (XEXP (XEXP (x, 0), 0), 0) = tem;
+   }
+ else if (GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS
+  && OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 0))
+  && !OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 1)))
+   {
+ rtx tem = XEXP (XEXP (XEXP (x, 0), 0), 1);
+ XEXP (XEXP (XEXP (x, 0), 0), 1) = reg;
+ if (memory_address_addr_space_p (GET_MODE (x), XEXP (x, 0),
+  MEM_ADDR_SPACE (x)))
+   {
+ XEXP (XEXP (XEXP (x, 0), 0), 1) = tem;
+ return &XEXP (XEXP (XEXP (x, 0), 0), 1);
+   }
+ XEXP (XEXP (XEXP (x, 0), 0), 1) = tem;
+   }
+
  /* If that didn't work, perhaps the first operand is complex and
 needs to be computed separately, so make a split point there.
 This will occur on machines that just support REG + CONST
 and have a constant moved through some previous computation.  */
-
- else if (!OBJECT_P (XEXP (XEXP (x, 0), 0))
-  && ! (GET_CODE (XEXP (XEXP (x, 0), 0)) == SUBREG
-&& OBJECT_P (SUBREG_REG (XEXP (XEXP (x, 0), 0)
+ if (!OBJECT_P (XEXP (XEXP (x, 0), 0))
+ && ! (GET_CODE (XEXP (XEXP (x, 0), 0)) == SUBREG
+   && OBJECT_P (SUBREG_REG (XEXP (XEXP (x, 0), 0)
return &XEXP (XEXP (x, 0), 0);
}


fixes this for me.

[Bug target/54589] struct offset add should be folded into address calculation

2018-11-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||segher at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
Note, in other cases combine is sucessful.  E.g. consider:
struct S { int a, b, c, d; };
struct T { struct S e[16]; struct S f[1024]; } t;

int
foo (unsigned long x)
{
  return t.f[x + 5].a;
}

int
bar (struct T *x, unsigned long y)
{
  return x->f[y + 5].b;
}

On x86_64-linux with -O2, we have before combine in both cases
(idx + 21) << 4 and in foo combine says:
Trying 7, 8 -> 10:
7: {r87:DI=r91:DI+0x15;clobber flags:CC;}
  REG_DEAD r91:DI
  REG_UNUSED flags:CC
8: {r88:DI=r87:DI<<0x4;clobber flags:CC;}
  REG_DEAD r87:DI
  REG_UNUSED flags:CC
   10: r90:SI=[r88:DI+`t']
  REG_DEAD r88:DI
Failed to match this instruction:
(set (reg:SI 90 [ t.f[_1].a ])
(mem:SI (plus:DI (mult:DI (reg:DI 91)
(const_int 16 [0x10]))
(const:DI (plus:DI (symbol_ref:DI ("t") [flags 0x2]  )
(const_int 336 [0x150] [1 t.f[_1].a+0 S4 A32]))
Successfully matched this instruction:
(set (reg:DI 88)
(ashift:DI (reg:DI 91)
(const_int 4 [0x4])))
Successfully matched this instruction:
(set (reg:SI 90 [ t.f[_1].a ])
(mem:SI (plus:DI (reg:DI 88)
(const:DI (plus:DI (symbol_ref:DI ("t") [flags 0x2]  )
(const_int 336 [0x150] [1 t.f[_1].a+0 S4 A32]))
and it is merged the way we want:
salq$4, %rdi
movlt+336(%rdi), %eax
while in bar:
Failed to match this instruction:
(set (reg:SI 91 [ x_4(D)->f[_1].b ])
(mem:SI (plus:DI (plus:DI (mult:DI (reg:DI 93)
(const_int 16 [0x10]))
(reg:DI 92))
(const_int 340 [0x154])) [1 x_4(D)->f[_1].b+0 S4 A32]))
Failed to match this instruction:
(set (reg:DI 88)
(plus:DI (ashift:DI (reg:DI 93)
(const_int 4 [0x4]))
(reg:DI 92)))
and it is not merged:
addq$21, %rsi
salq$4, %rsi
movl4(%rsi,%rdi), %eax
when it could be:
salq$4, %rsi
movl340(%rsi,%rdi), %eax

It isn't x86 specific though, e.g. on powerpc64-linux we end up with:
addi 4,4,21
sldi 4,4,4
add 4,3,4
lwa 3,4(4)
for bar, where I guess:
sldi 4,4,4
add 4,3,4
lwa 3,340(4)
would work too.

[Bug target/54589] struct offset add should be folded into address calculation

2018-08-22 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #4 from Uroš Bizjak  ---
clang generates for x86_64:

movzbl  (%rsi), %eax
shlq$4, %rax
movl16(%rdi,%rax), %eax
movl%eax, (%rdx)
retq

and for i?86:

movl8(%esp), %edx
movl4(%esp), %ecx
movl12(%esp), %eax
movzbl  (%edx), %edx
shll$4, %edx
movl16(%ecx,%edx), %ecx
movl%ecx, (%eax)
retl

for the later case (-m32), gcc generates:

movl8(%esp), %eax
movzbl  (%eax), %eax
addl$1, %eax
sall$4, %eax
addl4(%esp), %eax
movl(%eax), %edx
movl12(%esp), %eax
movl%edx, (%eax)
ret

so, two extra additions.

[Bug target/54589] struct offset add should be folded into address calculation

2017-11-03 Thread sgunderson at bigfoot dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #3 from sgunderson at bigfoot dot com ---
Still there in GCC 7.2.1 (exact same assembler output), and in 8.0 snapshot
20171017.

[Bug target/54589] struct offset add should be folded into address calculation

2012-09-17 Thread sgunderson at bigfoot dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #2 from sgunderson at bigfoot dot com 2012-09-17 09:18:16 UTC ---
FWIW, in my original code, func() is a part of a loop body (it keeps reading
values from src in a loop). It doesn't really change anything in the generated
code, though.


[Bug target/54589] struct offset add should be folded into address calculation

2012-09-17 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Richard Guenther  changed:

   What|Removed |Added

 Target||x86_64-*-*, i?86-*-*
 Status|UNCONFIRMED |NEW
   Keywords||missed-optimization
   Last reconfirmed||2012-09-17
  Component|tree-optimization   |target
 Ever Confirmed|0   |1
Summary|[missed-optimization]   |struct offset add should be
   |struct offset add should be |folded into address
   |folded into address |calculation
   |calculation |
  Known to fail||4.6.3, 4.7.2, 4.8.0

--- Comment #1 from Richard Guenther  2012-09-17 
08:49:25 UTC ---
As there is no loop involved it is RTL fwprop or combines job to eventually
do this.  I see

func:
.LFB517:
.cfi_startproc
movzbl  (%rsi), %eax
addq$1, %rax
salq$4, %rax
movl(%rdi,%rax), %eax
movl%eax, (%rdx)
ret

for both 4.7 and 4.8.  i?86 is even worse.