https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111697
            Bug ID: 111697
           Summary: Sub optimal code gen for initialising vector using loop
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: prathamesh3492 at gcc dot gnu.org
  Target Milestone: ---

Hi,
For the following test-case:

typedef int v4si __attribute__((vector_size (sizeof (int) * 4)));

v4si f(int x)
{
  v4si v;
  for (int i = 0; i < 4; i++)
    v[i] = x;
  return v;
}

Compiling with -O2 results in the following .optimized dump:

v4si f (int x)
{
  v4si v;

  <bb 2> [local count: 214748368]:
  v_16 = BIT_INSERT_EXPR <v_12(D), x_6(D), 0 (32 bits)>;
  v_20 = BIT_INSERT_EXPR <v_16, x_6(D), 32 (32 bits)>;
  v_24 = BIT_INSERT_EXPR <v_20, x_6(D), 64 (32 bits)>;
  v_2 = BIT_INSERT_EXPR <v_24, x_6(D), 96 (32 bits)>;
  return v_2;
}

and the following code-gen on aarch64:

f:
        movi    v0.4s, 0
        fmov    s31, w0
        ins     v0.s[0], v31.s[0]
        ins     v0.s[1], v31.s[0]
        ins     v0.s[2], v31.s[0]
        ins     v0.s[3], v31.s[0]
        ret

which could instead be a single dup instruction:

f:
        dup     v0.4s, w0
        ret

Similarly, code-gen on x86_64:

f:
        movd    %edi, %xmm0
        movd    %edi, %xmm1
        pshufd  $225, %xmm0, %xmm0
        movss   %xmm1, %xmm0
        pshufd  $225, %xmm0, %xmm0
        pshufd  $198, %xmm0, %xmm0
        movss   %xmm1, %xmm0
        pshufd  $198, %xmm0, %xmm0
        pshufd  $39, %xmm0, %xmm0
        movss   %xmm1, %xmm0
        pshufd  $39, %xmm0, %xmm0
        ret
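
For comparison, here is a minimal sketch of the same splat written with a brace initializer instead of the loop. It is not part of the original report. The names splat_loop, splat_init and v4si_equal are hypothetical and chosen only for illustration. The assumption, which is worth checking against the actual -O2 output for your target, is that the initializer form gives the compiler a single vector constructor rather than four BIT_INSERT_EXPRs, making a broadcast such as dup easier to select. Both functions should return the same value; only the generated code is expected to differ.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(sizeof(int) * 4)));

/* Loop form from the report: one element store per iteration,
   which the .optimized dump shows as four BIT_INSERT_EXPRs. */
v4si splat_loop(int x)
{
    v4si v;
    for (int i = 0; i < 4; i++)
        v[i] = x;
    return v;
}

/* Initializer form (illustrative workaround, an assumption rather than
   a fix for the bug): all four lanes are given in one constructor. */
v4si splat_init(int x)
{
    return (v4si){ x, x, x, x };
}

/* Hypothetical helper: returns 1 when both vectors hold the same bytes. */
int v4si_equal(v4si a, v4si b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Comparing the assembly of splat_loop and splat_init at -O2 (for example with -S) is one way to see whether the missed optimization is specific to the loop form on a given target.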