[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics

2021-01-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
      Known to work|                            |11.0

--- Comment #8 from ktkachov at gcc dot gnu.org ---
The issue in this bug report is that the "get low lane" operation should just
be a move rather than a vec_select so that it can be optimised away.
After g:e140f5fd3e235c5a37dc99b79f37a5ad4dc59064, GCC 11 does the right thing
for all testcases in this PR.

So marking this as fixed.
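
For readers without the PR's testcases at hand (they are not reproduced in this thread), the following is a minimal sketch of the kind of code the bug title refers to: a widening multiply-accumulate fed by the low halves of two 128-bit vectors. The function names `mlal_low` and `mlal_low_check` are hypothetical, and the scalar branch is only there so the sketch also builds on non-AArch64 hosts. Whether a redundant low-lane select shows up in the generated code depends on the compiler version and the surrounding code.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical reduction (not the PR's attached testcase): widening
   multiply-accumulate over the low four lanes of two 8-lane vectors. */
void mlal_low(int32_t acc[4], const int16_t a[8], const int16_t b[8])
{
#if defined(__aarch64__)
  int16x8_t va = vld1q_s16(a);
  int16x8_t vb = vld1q_s16(b);
  /* vget_low_s16 is the "get low lane" operation discussed above. */
  int32x4_t r = vmlal_s16(vld1q_s32(acc), vget_low_s16(va), vget_low_s16(vb));
  vst1q_s32(acc, r);
#else
  /* Scalar fallback with the same result, for non-AArch64 hosts. */
  for (int i = 0; i < 4; i++)
    acc[i] += (int32_t)a[i] * (int32_t)b[i];
#endif
}

/* Small self-check; aborts if the result is wrong. */
void mlal_low_check(void)
{
  int16_t a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  int16_t b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
  int32_t acc[4] = {1, 1, 1, 1};
  mlal_low(acc, a, b);
  assert(acc[0] == 11 && acc[1] == 41 && acc[2] == 91 && acc[3] == 161);
}
```

Compiling this for aarch64 at -O2 and inspecting the assembly is one way to see whether the low-half selection costs any extra instructions on a given GCC release.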

[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics

2020-03-31 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

--- Comment #7 from Sebastian Pop ---
Hi Andrew, have you committed the fix for this?

[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics

2020-01-25 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #47356|0                           |1
        is obsolete|                            |

--- Comment #6 from Andrew Pinski ---
Created attachment 47706
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47706&action=edit
latest patch

I will submit this tomorrow.  I just need to "xfail" two of the testcases for
big-endian.  Big-endian does something weird sometimes.

[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics

2020-01-24 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

--- Comment #5 from Andrew Pinski ---
Note that some of the moves were removed by
g:8c8952918b75f4fa6adbbe44cd641d5fd0bb55e3

But that is not a general solution; it just "splits" the case where the
destination and source are the same register.  My patch (to which I have a few
improvements, and which I am fixing up for GCC 10) handles the case where the
destination and source registers are different, which is what is happening here.
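
To make the same-register versus different-register distinction concrete, here is a small sketch. It assumes the GCC/Clang generic vector extension (the typedef `v2df` and the function names are hypothetical, not taken from the patch), and it mirrors the `b[0]` style of lane access so that it builds on any host. Under the AArch64 procedure call standard the first vector argument arrives in v0, the second in v1, and a double result is returned in d0.

```c
#include <assert.h>

/* Assumption: GCC/Clang generic vector extension; on AArch64 a 16-byte
   vector like this is passed in one 128-bit SIMD register. */
typedef double v2df __attribute__((vector_size(16)));

/* Source and destination coincide: 'a' arrives in v0 and the result
   leaves in d0, so ideally no move is needed at all. */
double low_same_reg(v2df a)
{
  return a[0];
}

/* Source and destination differ: 'b' arrives in v1 but the result must
   leave in d0, so a single register move is the best one can expect. */
double low_diff_reg(v2df a, v2df b)
{
  (void)a; /* unused; present only to push 'b' into the second register */
  return b[0];
}

/* Small self-check; aborts if either function returns the wrong lane. */
void low_reg_check(void)
{
  v2df a = {1.5, 2.5};
  v2df b = {3.5, 4.5};
  assert(low_same_reg(a) == 1.5);
  assert(low_diff_reg(a, b) == 3.5);
}
```

The second function is the shape this PR is about; the first is the shape the earlier commit is described as handling.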

[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics

2020-01-12 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics

2019-11-25 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

--- Comment #4 from Andrew Pinski ---
(In reply to Wilco from comment #3)
> I think it's because many intrinsics in arm_neon.h still use asm which
> inhibits most optimizations.

No, in this case it is not.

Take:
#include "arm_neon.h"

float64x1_t fun(float64x2_t a, float64x2_t b) {
  return vget_low_f64(b);
}
double fun1(float64x2_t a, float64x2_t b) {
  return b[0];
}

---- CUT ----
Both of these should be optimized to just
fmov d0, d1
ret

Even worse, take:
#include "arm_neon.h"

float64x1_t fun(float64x2_t a, float64x2_t b) {
  return vget_low_f64(b) + vget_high_f64(b);
}
double fun1(float64x2_t a, float64x2_t b) {
  return b[0] + b[1];
}

---- CUT ----

[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics

2019-11-25 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

Wilco changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #3 from Wilco ---
(In reply to Andrew Pinski from comment #2)
> Created attachment 47356 [details]
> Patch which I wrote for GCC 7.3
> 
> I have to double-check whether it applies directly, as I had other patches in
> this area, but this is the patch which I had written for GCC 7.3.

I think it's because many intrinsics in arm_neon.h still use asm which inhibits
most optimizations.

[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics

2019-11-25 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

--- Comment #2 from Andrew Pinski ---
Created attachment 47356
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47356&action=edit
Patch which I wrote for GCC 7.3

I have to double-check whether it applies directly, as I had other patches in
this area, but this is the patch which I had written for GCC 7.3.

[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics

2019-11-25 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64-linux-gnu
             Status|UNCONFIRMED                 |NEW
            Version|unknown                     |10.0
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2019-11-25
          Component|rtl-optimization            |target
           Assignee|unassigned at gcc dot gnu.org|pinskia at gcc dot gnu.org
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Andrew Pinski ---
Confirmed.
I have a patch (or two) that will optimize this.
I was going to be submitting them in the next few days.