https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Wilco from comment #3)
> I think it's because many intrinsics in arm_neon.h still use asm which
> inhibits most optimizations.

NO in this case it is not.  Take:
#include "arm_neon.h"

float64x1_t fun(float64x2_t a, float64x2_t b)
{
  return vget_low_f64(b);
}

double fun1(float64x2_t a, float64x2_t b)
{
  return b[0];
}
---- CUT ----

Both of these should be optimized to just
        fmov    d0, d1
        ret

Even worse take:
#include "arm_neon.h"

float64x1_t fun(float64x2_t a, float64x2_t b)
{
  return vget_low_f64(b) + vget_high_f64(b);
}

double fun1(float64x2_t a, float64x2_t b)
{
  return b[0] + b[1];
}
---- CUT ---