https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103568

            Bug ID: 103568
           Summary: sub-optimal vector construction with two loaded doubles on
                    Power10
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

For the test case:

vector double test(double *a, double *b) {
  return (vector double) { *a, *b };
}

On Power10, we currently generate code like:

        ld 10,0(3)
        ld 9,0(4)
        mtvsrdd 34,9,10

According to the Power10 latency table, we can get better code by using xxlor,
like:

        lxsd 0,0(4)
        lxvrdx 1,0,3
        xxlor 34,1,32

As for the prerequisite "if we can assume the doubleword 1 of a vsx register
after an lfd is zero", Segher pointed out that "ISA 3.1 section 7.1.1.1 says
this already".

SPEC2017 510.parest_r may be a good benchmark for evaluating the effect (with
vectorization turned on).
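
To illustrate why the two sequences should build the same register image, here
is a minimal portable C sketch that models a VSX register as two doublewords.
It is not GCC code and does not execute any Power instructions. The type
vsr_model and the model_* helpers are hypothetical names introduced only for
this illustration, and the zeroing of doubleword 1 by lxsd is the assumption
quoted in the report, not something the sketch verifies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit VSX register, using the ISA's
   big-endian numbering: dw[0] is doubleword 0, dw[1] is doubleword 1. */
typedef struct { uint64_t dw[2]; } vsr_model;

/* Reinterpret a double as its 64-bit pattern without aliasing issues. */
static uint64_t double_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* mtvsrdd XT,RA,RB: RA goes to doubleword 0, RB to doubleword 1. */
static vsr_model model_mtvsrdd(uint64_t ra, uint64_t rb) {
    vsr_model v = { { ra, rb } };
    return v;
}

/* lxsd: loaded value lands in doubleword 0.
   ASSUMPTION (from the report): doubleword 1 is zero afterwards. */
static vsr_model model_lxsd(const double *p) {
    vsr_model v = { { double_bits(*p), 0 } };
    return v;
}

/* lxvrdx: loaded value is right-justified in doubleword 1,
   the rest of the register is zero. */
static vsr_model model_lxvrdx(const double *p) {
    vsr_model v = { { 0, double_bits(*p) } };
    return v;
}

/* xxlor: bitwise OR of the two source registers. */
static vsr_model model_xxlor(vsr_model x, vsr_model y) {
    vsr_model v = { { x.dw[0] | y.dw[0], x.dw[1] | y.dw[1] } };
    return v;
}

/* Returns 1 when both sequences produce the same modelled register. */
static int sequences_agree(const double *a, const double *b) {
    /* Current code: ld 10,0(3); ld 9,0(4); mtvsrdd 34,9,10 */
    vsr_model cur = model_mtvsrdd(double_bits(*b), double_bits(*a));
    /* Proposed code: lxsd 0,0(4); lxvrdx 1,0,3; xxlor 34,1,32 */
    vsr_model prop = model_xxlor(model_lxvrdx(a), model_lxsd(b));
    return cur.dw[0] == prop.dw[0] && cur.dw[1] == prop.dw[1];
}

/* Small self-check over one pair of sample values. */
static void self_check(void) {
    double a = 1.5, b = -2.25;
    assert(sequences_agree(&a, &b));
}
```

In this model the OR is lossless because each source contributes a value to
one doubleword and zero to the other, which is exactly why the lxsd
prerequisite matters. If doubleword 1 were left undefined after the scalar
load, the OR could corrupt the element loaded by lxvrdx.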