[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-10-18 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #23 from rsandifo at gcc dot gnu.org ---
Fixed.

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-06-17 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #22 from rsandifo at gcc dot gnu.org ---
(In reply to kugan from comment #21)
> (In reply to Christophe Lyon from comment #20)
> > Hi Kugan,
> > 
> > The new test fails with -mabi=ilp32:
> > FAIL: gcc.target/aarch64/pr88834.c scan-assembler-times \\tld2w\\t{z[0-9]+.s
> > - z[0-9]+.s}, p[0-7]/z, \\[x[0-9]+, x[0-9]+, lsl 2\\]\\n 2
> > FAIL: gcc.target/aarch64/pr88834.c scan-assembler-times \\tst2w\\t{z[0-9]+.s
> > - z[0-9]+.s}, p[0-7], \\[x[0-9]+, x[0-9]+, lsl 2\\]\\n 1
> 
> Thanks Christophe. In the back-end, when we use ILP32, we don't accept
> SImode ops if like:
> 
> (plus:SI (mult:SI (reg:SI 91)
> (const_int 4 [0x4]))
> (reg:SI 90))
> 
> While we would accept Pmode. My question is, should we care about ILP32 for
> SVE? If so we need to fix this. Otherwise, we can run the test for LP64.

We care, but I don't think anyone's actively working on it.

I agree it's a generic ILP32 problem though.  We already disable
scan-assembler matching in sve/ for similar reasons, so I think the
easiest fix is to move both pr88834.c and pr88838.c from aarch64/ to
aarch64/sve/.  Sorry for not noticing that they were in the "wrong"
directory during the reviews.

Note that after moving, the dg-options for both tests should just
be "-O3", without "-S" or "-march=...".
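
As a rough sketch of what the relocated test could look like under
gcc.target/aarch64/sve/: the loop body below is reconstructed from the
assembly quoted in this PR and is an assumption, not a copy of the committed
pr88834.c. The dg-do line is also an assumption. The scan-assembler-times
patterns are the ones shown in the FAIL lines from comment #20.

```c
/* { dg-do compile } */
/* { dg-options "-O3" } */

/* Hypothetical reconstruction of the test loop: an interleaved add/sub over
   pairs of ints, which the vectorizer can implement with LD2W and ST2W.
   n is assumed to be even so that x[i + 1] stays in bounds.  */
void
f (int *restrict x, int *restrict y, int *restrict z, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      x[i] = y[i] + z[i];
      x[i + 1] = y[i + 1] - z[i + 1];
    }
}

/* { dg-final { scan-assembler-times {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+, x[0-9]+, lsl 2\]\n} 1 } } */
```

Dropping "-S" and "-march=..." leaves the choice of output kind and target
flags to the directory's test driver rather than to the individual test.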

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-06-17 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #21 from kugan at gcc dot gnu.org ---
(In reply to Christophe Lyon from comment #20)
> Hi Kugan,
> 
> The new test fails with -mabi=ilp32:
> FAIL: gcc.target/aarch64/pr88834.c scan-assembler-times \\tld2w\\t{z[0-9]+.s
> - z[0-9]+.s}, p[0-7]/z, \\[x[0-9]+, x[0-9]+, lsl 2\\]\\n 2
> FAIL: gcc.target/aarch64/pr88834.c scan-assembler-times \\tst2w\\t{z[0-9]+.s
> - z[0-9]+.s}, p[0-7], \\[x[0-9]+, x[0-9]+, lsl 2\\]\\n 1

Thanks Christophe. In the back-end, when we use ILP32, we don't accept SImode
ops like:

(plus:SI (mult:SI (reg:SI 91)
(const_int 4 [0x4]))
(reg:SI 90))

While we would accept Pmode. My question is, should we care about ILP32 for
SVE? If so we need to fix this. Otherwise, we can run the test for LP64.

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-06-13 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #20 from Christophe Lyon  ---
Hi Kugan,

The new test fails with -mabi=ilp32:
FAIL: gcc.target/aarch64/pr88834.c scan-assembler-times \\tld2w\\t{z[0-9]+.s -
z[0-9]+.s}, p[0-7]/z, \\[x[0-9]+, x[0-9]+, lsl 2\\]\\n 2
FAIL: gcc.target/aarch64/pr88834.c scan-assembler-times \\tst2w\\t{z[0-9]+.s -
z[0-9]+.s}, p[0-7], \\[x[0-9]+, x[0-9]+, lsl 2\\]\\n 1

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-06-12 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #19 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Thu Jun 13 03:18:54 2019
New Revision: 272232

URL: https://gcc.gnu.org/viewcvs?rev=272232&root=gcc&view=rev
Log:

gcc/ChangeLog:

2019-06-13  Kugan Vivekanandarajah  

PR target/88834
* tree-ssa-loop-ivopts.c (get_mem_type_for_internal_fn): Handle
IFN_MASK_LOAD_LANES and IFN_MASK_STORE_LANES.
(get_alias_ptr_type_for_ptr_address): Likewise.
(add_iv_candidate_for_use): Add scaled index candidate if useful.
* tree-ssa-address.c (preferred_mem_scale_factor): New.
* config/aarch64/aarch64.c (aarch64_classify_address): Relax
allow_reg_index_p.

gcc/testsuite/ChangeLog:

2019-06-13  Kugan Vivekanandarajah  

PR target/88834
* gcc.target/aarch64/pr88834.c: New test.
* gcc.target/aarch64/sve/struct_vect_1.c: Adjust.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_7.c: Likewise.


Added:
trunk/gcc/testsuite/gcc.target/aarch64/pr88834.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_1.c
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_14.c
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_15.c
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_16.c
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_17.c
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_7.c
trunk/gcc/tree-ssa-address.c
trunk/gcc/tree-ssa-address.h
trunk/gcc/tree-ssa-loop-ivopts.c

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-09 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #18 from rsandifo at gcc dot gnu.org ---
(In reply to kugan from comment #12)
> (In reply to rsand...@gcc.gnu.org from comment #10)
> > (In reply to kugan from comment #9)
> > > Created attachment 46040 [details]
> > > patch
> > 
> > Wasn't sure whether this patch was WIP or the final version
> > for review, but we need to do something more generic than
> > dividing by 4.  I think the test will still fail with "int"
> > changed to "short" for example.
> > 
> > I also don't think the new candidate should be tied to the
> > mask/load store functions.  Maybe one approach would be to
> > check when adding a zero-based candidate for a use in:
> > 
> >   /* Record common candidate with initial value zero.  */
> >   basetype = TREE_TYPE (iv->base);
> >   if (POINTER_TYPE_P (basetype))
> > >     basetype = sizetype;
> >   record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
> > 
> > whether the use actually benefits from this unscaled iv.
> > If the use is USE_REF_ADDRESS, we could compare the cost
> > of an address with an unscaled index with the cost of an address
> > with a scaled index.  I think the natural scale value to try
> > would be GET_MODE_INNER (TYPE_MODE (mem_type)).
> 
> Thanks for the comments. I agree this is the right place. But I am not sure
> if checking the cost at this point is what IV opt generally does. In
> general, IV-opt adds candidates which can be helpful and later decides the
> optimal set. 

But I was talking about comparing the cost of the address rather
than the cost of the iv.  Like you say, the idea is to add candidates
that might be useful, and what we want to know here is whether the
byte offset is likely to be a useful candidate for this use.

Another way of deciding whether to go for a scaled candidate would
be to test for a legitimate address directly (rather than via
address costs) if you prefer that.  I just thought using address
costs might be easier.

We could also keep the unscaled candidate in addition to the
new scaled one if we have evidence that having both is useful.
The danger is that if we add too many, we'll trip the iv limit,
so I think we'd need positive evidence for keeping both.

> If we are to use get_computation_cost to see the costs, we have to create
> iv_cand and then discard. Since we are adding only one candidate and that
> too for SVE like targets, I am thinking that it is OK. If you still prefer
> to check the cost, I will change that.

IMO it's a generic concept that just happens to apply to SVE.
If an architecture is going to support just one "reg+reg" addressing
mode, the two obvious choices are for the offset register to be unscaled
(bytes) or scaled by the element or access size (indices).  SVE chose
the latter.  In that case, the most useful candidate is likely to be
the index rather than the byte offset.

This applies to single-vector loads and stores as well as
LOAD/STORE_LANES.  The reason we usually get good iv choices
for single vectors is that the index usually exists as a candidate
already, in the form of the loop control iv.  (This is of course the
main benefit to base+scaled addressing over base+unscaled addressing.)
But it's probably possible to construct examples in which the
index candidate doesn't already exist even for single vectors.
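
To make the two candidate forms concrete, here is a minimal scalar sketch in
C. It is only a model of the idea, not ivopts code, and the function names
are made up for illustration. The first loop uses an unscaled (byte offset)
iv, the second a scaled (index) iv that also serves as the loop counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Unscaled candidate: the iv is a byte offset and the address is
   base + off.  A separate counter would be needed if the loop bound were
   expressed in elements.  */
int32_t
sum_byte_offset_iv (const int32_t *base, size_t n)
{
  int32_t sum = 0;
  for (size_t off = 0; off < n * sizeof (int32_t); off += sizeof (int32_t))
    sum += *(const int32_t *) ((const char *) base + off);
  return sum;
}

/* Scaled candidate: the iv is an element index and the address is
   base + (idx << 2), i.e. the "[xN, xM, lsl 2]" form.  The same iv
   controls the loop.  */
int32_t
sum_index_iv (const int32_t *base, size_t n)
{
  int32_t sum = 0;
  for (size_t idx = 0; idx < n; idx++)
    sum += base[idx];
  return sum;
}
```

Both functions compute the same result. On a target whose only reg+reg mode
is scaled, the second form needs no extra address arithmetic in the loop.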

> Attached patch (only the ivopt changes) and testcase

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-09 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #17 from kugan at gcc dot gnu.org ---
(In reply to Wilco from comment #16)
> (In reply to kugan from comment #15)
> > (In reply to Wilco from comment #11)
> > > There is also something odd with the way the loop iterates, this doesn't
> > > look right:
> > > 
> > > whilelo p0.s, x3, x4
> > > > incw    x3
> > > ptest   p1, p0.b
> > > bne .L3
> > 
> > I am not sure I understand this. I tried with qemu using an execution
> > testcase and It seems to work.
> > 
> > whilelo p0.s, x4, x5
> > incw    x4
> > ptest   p1, p0.b
> > bne .L3
> > In my case I have the above (register allocation difference only) incw is
> > correct considering two vector word registers? Am I missing something here?
> 
> I'm talking about the completely redundant ptest, where does that come from?

It is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88836

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #16 from Wilco  ---
(In reply to kugan from comment #15)
> (In reply to Wilco from comment #11)
> > There is also something odd with the way the loop iterates, this doesn't
> > look right:
> > 
> > whilelo p0.s, x3, x4
> > incw    x3
> > ptest   p1, p0.b
> > bne .L3
> 
> I am not sure I understand this. I tried with qemu using an execution
> testcase and It seems to work.
> 
> whilelo   p0.s, x4, x5
>   incw    x4
>   ptest   p1, p0.b
>   bne .L3
> In my case I have the above (register allocation difference only) incw is
> correct considering two vector word registers? Am I missing something here?

I'm talking about the completely redundant ptest, where does that come from?

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-08 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #15 from kugan at gcc dot gnu.org ---
(In reply to Wilco from comment #11)
> There is also something odd with the way the loop iterates, this doesn't
> look right:
> 
> whilelo p0.s, x3, x4
> incw    x3
> ptest   p1, p0.b
> bne .L3

I am not sure I understand this. I tried with qemu using an execution testcase
and it seems to work.

whilelo p0.s, x4, x5
incw    x4
ptest   p1, p0.b
bne .L3

In my case I have the above (register allocation difference only). Isn't incw
correct, considering two vector word registers? Am I missing something here?

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-08 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #14 from kugan at gcc dot gnu.org ---
Created attachment 46104
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46104&action=edit
testcase

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-08 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

kugan at gcc dot gnu.org changed:

   What|Removed |Added

  Attachment #46040|0   |1
is obsolete||

--- Comment #13 from kugan at gcc dot gnu.org ---
Created attachment 46103
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46103&action=edit
ivopt changes alone

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-08 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #12 from kugan at gcc dot gnu.org ---
(In reply to rsand...@gcc.gnu.org from comment #10)
> (In reply to kugan from comment #9)
> > Created attachment 46040 [details]
> > patch
> 
> Wasn't sure whether this patch was WIP or the final version
> for review, but we need to do something more generic than
> dividing by 4.  I think the test will still fail with "int"
> changed to "short" for example.
> 
> I also don't think the new candidate should be tied to the
> mask/load store functions.  Maybe one approach would be to
> check when adding a zero-based candidate for a use in:
> 
>   /* Record common candidate with initial value zero.  */
>   basetype = TREE_TYPE (iv->base);
>   if (POINTER_TYPE_P (basetype))
>     basetype = sizetype;
>   record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
> 
> whether the use actually benefits from this unscaled iv.
> If the use is USE_REF_ADDRESS, we could compare the cost
> of an address with an unscaled index with the cost of an address
> with a scaled index.  I think the natural scale value to try
> would be GET_MODE_INNER (TYPE_MODE (mem_type)).

Thanks for the comments. I agree this is the right place. But I am not sure if
checking the cost at this point is what IV opt generally does. In general,
IV-opt adds candidates which can be helpful and later decides the optimal set. 

If we are to use get_computation_cost to see the costs, we have to create an
iv_cand and then discard it. Since we are adding only one candidate, and only
for SVE-like targets, I think that is OK. If you still prefer to check the
cost, I will change that.

Attached are the patch (only the ivopt changes) and the testcase.

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-03-28 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

Wilco  changed:

   What|Removed |Added

 CC||wilco at gcc dot gnu.org

--- Comment #11 from Wilco  ---
There is also something odd with the way the loop iterates, this doesn't look
right:

whilelo p0.s, x3, x4
incw    x3
ptest   p1, p0.b
bne .L3

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-03-28 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #10 from rsandifo at gcc dot gnu.org ---
(In reply to kugan from comment #9)
> Created attachment 46040 [details]
> patch

Wasn't sure whether this patch was WIP or the final version
for review, but we need to do something more generic than
dividing by 4.  I think the test will still fail with "int"
changed to "short" for example.

I also don't think the new candidate should be tied to the
mask/load store functions.  Maybe one approach would be to
check when adding a zero-based candidate for a use in:

  /* Record common candidate with initial value zero.  */
  basetype = TREE_TYPE (iv->base);
  if (POINTER_TYPE_P (basetype))
    basetype = sizetype;
  record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);

whether the use actually benefits from this unscaled iv.
If the use is USE_REF_ADDRESS, we could compare the cost
of an address with an unscaled index with the cost of an address
with a scaled index.  I think the natural scale value to try
would be GET_MODE_INNER (TYPE_MODE (mem_type)).
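
As a small illustration of "more generic than dividing by 4", the sketch
below derives the index step from the element size instead of a hard-coded
constant. The helper name and signature are hypothetical and are not part of
GCC; elem_size stands for the size in bytes of the inner mode suggested
above.

```c
#include <stddef.h>

/* Hypothetical helper, not a GCC function.  Converts a byte step into an
   index step for elements of ELEM_SIZE bytes.  Returns 1 and stores the
   result in *INDEX_STEP when the byte step is an exact multiple of the
   element size, and returns 0 otherwise (no scaled candidate).  */
int
scaled_index_step (size_t byte_step, size_t elem_size, size_t *index_step)
{
  if (elem_size == 0 || byte_step % elem_size != 0)
    return 0;
  *index_step = byte_step / elem_size;
  return 1;
}
```

With "int" elements a byte step of 8 gives an index step of 2, and with
"short" elements a byte step of 4 also gives 2, which is the case a fixed
division by 4 would get wrong.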

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-03-27 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

kugan at gcc dot gnu.org changed:

   What|Removed |Added

  Attachment #45686|0   |1
is obsolete||

--- Comment #9 from kugan at gcc dot gnu.org ---
Created attachment 46040
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46040&action=edit
patch

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-03-27 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #8 from kugan at gcc dot gnu.org ---
(In reply to rsand...@gcc.gnu.org from comment #7)
> Thanks for looking at this.
> 
> (In reply to kugan from comment #6)
> > cmp w3, 0
> > ble .L1
> > sub w3, w3, #1
> > mov x4, 0
> > cntw    x5
> > ptrue   p1.s, all
> > lsr w3, w3, 1
> > add w3, w3, 1
> > whilelo p0.s, xzr, x3
> > .p2align 3,,7
> > .L3:
> > ld2w    {z4.s - z5.s}, p0/z, [x1, x4, lsl 2]
> > ld2w    {z2.s - z3.s}, p0/z, [x2, x4, lsl 2]
> > add z0.s, z4.s, z2.s
> > sub z1.s, z5.s, z3.s
> > st2w    {z0.s - z1.s}, p0, [x0, x4, lsl 2]
> > whilelo p0.s, x5, x3
> > incb    x4, all, mul #2
> > incw    x5
> > ptest   p1, p0.b
> > bne .L3
> > .L1:
> > ret
> > .cfi_endproc
> 
> This doesn't look right.  x4 is an index, so it should be
> incremented by the number of words in two vectors, rather than
> the number of bytes in two vectors.

Thanks for the comments. Fixed it; with the attached patch, it generates:

f:
.LFB0:
.cfi_startproc
cmp w3, 0
ble .L1
sub w5, w3, #1
cntw    x4
mov x3, 0
ptrue   p1.s, all
lsr w5, w5, 1
add w5, w5, 1
whilelo p0.s, xzr, x5
.p2align 3,,7
.L3:
ld2w    {z4.s - z5.s}, p0/z, [x1, x3, lsl 2]
ld2w    {z2.s - z3.s}, p0/z, [x2, x3, lsl 2]
add z0.s, z4.s, z2.s
sub z1.s, z5.s, z3.s
st2w    {z0.s - z1.s}, p0, [x0, x3, lsl 2]
whilelo p0.s, x4, x5
inch    x3
incw    x4
ptest   p1, p0.b
bne .L3
.L1:
ret
.cfi_endproc

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-02-27 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #7 from rsandifo at gcc dot gnu.org ---
Thanks for looking at this.

(In reply to kugan from comment #6)
>   cmp w3, 0
>   ble .L1
>   sub w3, w3, #1
>   mov x4, 0
>   cntw    x5
>   ptrue   p1.s, all
>   lsr w3, w3, 1
>   add w3, w3, 1
>   whilelo p0.s, xzr, x3
>   .p2align 3,,7
> .L3:
>   ld2w    {z4.s - z5.s}, p0/z, [x1, x4, lsl 2]
>   ld2w    {z2.s - z3.s}, p0/z, [x2, x4, lsl 2]
>   add z0.s, z4.s, z2.s
>   sub z1.s, z5.s, z3.s
>   st2w    {z0.s - z1.s}, p0, [x0, x4, lsl 2]
>   whilelo p0.s, x5, x3
>   incb    x4, all, mul #2
>   incw    x5
>   ptest   p1, p0.b
>   bne .L3
> .L1:
>   ret
>   .cfi_endproc

This doesn't look right.  x4 is an index, so it should be
incremented by the number of words in two vectors, rather than
the number of bytes in two vectors.
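
A worked version of that arithmetic, as a sketch in C. The function names
are made up for illustration, and vl_bytes stands for the SVE vector length
in bytes. "incb xN, all, mul #2" adds the number of bytes in two vectors,
whereas the index needs the number of 32-bit words in two vectors, which is
what "inch xN" (halfwords in one vector) adds.

```c
#include <stddef.h>

/* Step added by "incb xN, all, mul #2": bytes in two vectors.  */
size_t
incb_mul2_step (size_t vl_bytes)
{
  return 2 * vl_bytes;
}

/* Step added by "inch xN": 16-bit halfwords in one vector.  */
size_t
inch_step (size_t vl_bytes)
{
  return vl_bytes / 2;
}

/* What a word index must advance by when each iteration consumes two
   vectors of 32-bit elements.  */
size_t
words_in_two_vectors (size_t vl_bytes)
{
  return 2 * (vl_bytes / 4);
}
```

For a 256-bit vector (32 bytes) the byte step is 64 but the required index
step is 16, so with "lsl 2" scaling the incb form would advance the address
four times too far.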

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-02-12 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #6 from kugan at gcc dot gnu.org ---

> 
> Note the difference in mode for aarch64_classify_address. Not sure if this
> is because of the way my patch changes ivopt.

Yes, it was my mistake in iv-use. With the attached patch, I now get:
cmp w3, 0
ble .L1
sub w3, w3, #1
mov x4, 0
cntw    x5
ptrue   p1.s, all
lsr w3, w3, 1
add w3, w3, 1
whilelo p0.s, xzr, x3
.p2align 3,,7
.L3:
ld2w    {z4.s - z5.s}, p0/z, [x1, x4, lsl 2]
ld2w    {z2.s - z3.s}, p0/z, [x2, x4, lsl 2]
add z0.s, z4.s, z2.s
sub z1.s, z5.s, z3.s
st2w    {z0.s - z1.s}, p0, [x0, x4, lsl 2]
whilelo p0.s, x5, x3
incb    x4, all, mul #2
incw    x5
ptest   p1, p0.b
bne .L3
.L1:
ret
.cfi_endproc

I will post the patch for review after stage-1 opens. In the meantime, any
review is appreciated, especially of the part where iv-use is set up and of
get_alias_ptr_type_for_ptr_address.

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-02-12 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

kugan at gcc dot gnu.org changed:

   What|Removed |Added

  Attachment #45661|0   |1
is obsolete||

--- Comment #5 from kugan at gcc dot gnu.org ---
Created attachment 45686
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45686&action=edit
ivopt patch v2

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-02-11 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #4 from kugan at gcc dot gnu.org ---
Created attachment 45661
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45661&action=edit
ivopt patch v1

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-02-11 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #3 from kugan at gcc dot gnu.org ---
I added iv-use for MASKED_LOAD_LANE and the result is
cmp w3, 0
ble .L1
sub w5, w3, #1
mov x4, 0
lsr w5, w5, 1
add w5, w5, 1
whilelo p0.s, xzr, x5
.p2align 3,,7
.L3:
lsl x3, x4, 3
incw    x4
add x7, x1, x3
add x6, x2, x3
ld2w    {z4.s - z5.s}, p0/z, [x7]
ld2w    {z2.s - z3.s}, p0/z, [x6]
add x3, x0, x3
add z0.s, z4.s, z2.s
sub z1.s, z5.s, z3.s
st2w    {z0.s - z1.s}, p0, [x3]
whilelo p0.s, x4, x5
bne .L3
.L1:
ret

There is no base plus scaled index addressing mode. This is because the
address is classified differently in ivopt and in cfgexpand.

When called from ivopt:
Breakpoint 4, aarch64_classify_address (info=0x7fffcba0, x=0x76c44f30,
mode=E_DImode, strict_p=false, type=ADDR_QUERY_M)
at /home/kugan/work/abe/snapshots/gcc.git~origin~aarch64~sve-acle-branch/gcc/config/aarch64/aarch64.c:5689
5689 {
(gdb) p debug_rtx (x)
(plus:DI (mult:DI (reg:DI 91)
(const_int 8 [0x8]))
(reg:DI 90))

it accepts it.

When in cfgexpand:
Breakpoint 5, aarch64_classify_address (info=0x7fffcca0, x=0x76c5b840,
mode=E_VNx8SImode, strict_p=false, type=ADDR_QUERY_M)
at /home/kugan/work/abe/snapshots/gcc.git~origin~aarch64~sve-acle-branch/gcc/config/aarch64/aarch64.c:5689
5689 {
(gdb) p debug_rtx (x)
(plus:DI (mult:DI (reg:DI 92 [ ivtmp_28 ])
(const_int 8 [0x8]))
(reg/v/f:DI 110 [ y ]))


This is not accepted because of aarch64_classify_index (info, op1, mode,
strict_p) failing (as it should).

Note the difference in mode for aarch64_classify_address. Not sure if this is
because of the way my patch changes ivopt.

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-02-03 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kugan at gcc dot gnu.org

--- Comment #2 from kugan at gcc dot gnu.org ---
I'll assign it to myself unless it is being looked at by someone else.

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-01-14 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||missed-optimization
 Target||aarch64
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-14
 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.