[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
      Known to work|                            |11.0

--- Comment #8 from ktkachov at gcc dot gnu.org ---
The issue in this bug report is that the "get low lane" operation should just
be a move rather than a vec_select, so that it can be optimised away.
After g:e140f5fd3e235c5a37dc99b79f37a5ad4dc59064 GCC 11 does the right thing
for all testcases in this PR, so I am marking this as fixed.
[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

--- Comment #7 from Sebastian Pop ---
Hi Andrew, have you committed the fix for this?
[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #47356|0                           |1
        is obsolete|                            |

--- Comment #6 from Andrew Pinski ---
Created attachment 47706
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47706&action=edit
latest patch

I will submit this tomorrow. I just need to "xfail" two of the testcases for
big-endian. Big-endian does something weird sometimes.
[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

--- Comment #5 from Andrew Pinski ---
Note that some of the moves were removed with
g:8c8952918b75f4fa6adbbe44cd641d5fd0bb55e3

But that is not a general solution; it just "splits" the case where dst and
source have the same register. My patch (for which I have a few improvements,
and which I am fixing up for GCC 10) handles the case where the dst and source
registers are different, which is what is happening in this case.
[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

--- Comment #4 from Andrew Pinski ---
(In reply to Wilco from comment #3)
> I think it's because many intrinsics in arm_neon.h still use asm which
> inhibits most optimizations.

No, in this case it is not.
Take:
#include "arm_neon.h"

float64x1_t fun(float64x2_t a, float64x2_t b)
{
  return vget_low_f64(b);
}
double fun1(float64x2_t a, float64x2_t b)
{
  return b[0];
}
CUT
Both of these should be optimized to just
        fmov    d0, d1
        ret

Even worse take:
#include "arm_neon.h"

float64x1_t fun(float64x2_t a, float64x2_t b)
{
  return vget_low_f64(b) + vget_high_f64(b);
}
double fun1(float64x2_t a, float64x2_t b)
{
  return b[0] + b[1];
}
CUT ---
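
For readers without an AArch64 toolchain, the lane semantics of the two fun1
testcases above can be sketched with the generic GCC/Clang vector extension.
This is only an illustration under stated assumptions: the type v2df and the
function names low_lane and sum_lanes are hypothetical stand-ins for
float64x2_t and fun1, and the sketch says nothing about the code GCC
generates on any target.

```c
/* Two-lane double vector built with the GCC/Clang vector extension.
   Assumption: v2df stands in for float64x2_t so the sketch also
   compiles on non-AArch64 hosts. */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical name; same shape as fun1 in the first testcase:
   the unused first argument forces b into the second vector
   argument register, and only lane 0 of b is returned. */
double low_lane(v2df a, v2df b)
{
    (void)a;
    return b[0];
}

/* Hypothetical name; same shape as fun1 in the second testcase:
   adds the low lane and the high lane of b. */
double sum_lanes(v2df a, v2df b)
{
    (void)a;
    return b[0] + b[1];
}
```

Calling low_lane with b = {1.5, 2.25} returns 1.5, and sum_lanes returns 3.75.
On AArch64 the interesting part is the assembly, which can be inspected by
compiling the original testcases with -O2 -S.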
[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

Wilco changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #3 from Wilco ---
(In reply to Andrew Pinski from comment #2)
> Created attachment 47356 [details]
> Patch which I wrote for GCC 7.3
>
> I have to double check if it applies directly as I had other patches in this
> area but this is the patch which I had written for GCC 7.3.

I think it's because many intrinsics in arm_neon.h still use asm, which
inhibits most optimizations.
[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

--- Comment #2 from Andrew Pinski ---
Created attachment 47356
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47356&action=edit
Patch which I wrote for GCC 7.3

I have to double check if it applies directly, as I had other patches in this
area, but this is the patch which I had written for GCC 7.3.
[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64-linux-gnu
             Status|UNCONFIRMED                 |NEW
            Version|unknown                     |10.0
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2019-11-25
          Component|rtl-optimization            |target
           Assignee|unassigned at gcc dot gnu.org      |pinskia at gcc dot gnu.org
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Andrew Pinski ---
Confirmed. I have a patch (or two) that will optimize this. I was going to
submit them in the next few days.