https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71805

            Bug ID: 71805
           Summary: incorrect code for test pr45752.c with -mcpu=power9
           Product: gcc
           Version: 6.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acsawdey at gcc dot gnu.org
                CC: bergner at gcc dot gnu.org, meissner at gcc dot gnu.org,
                    wschmidt at gcc dot gnu.org
  Target Milestone: ---
            Target: powerpc64le-linux

Created attachment 38859
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38859&action=edit
objdump of generated binary plus my annotations which are abstracted in the
note above

testsuite/gcc.dg/vect/pr45752.c produces code in which a needed register value
appears to be overwritten.

Compile flags:

/home/sawdey/src/gcc/gcc-6-branch/build/gcc/xgcc
-B/home/sawdey/src/gcc/gcc-6-branch/build/gcc/
/home/sawdey/src/gcc/gcc-6-branch/gcc/gcc/testsuite/gcc.dg/vect/pr45752.c
-mcpu=power9 -Wl,-rpath=/tmp/lib64 -fno-diagnostics-show-caret
-fdiagnostics-color=never -flto -ffat-lto-objects -maltivec -mpower9-vector
-ftree-vectorize -fno-vect-cost-model -fno-common -O2
-fdump-tree-vect-details --param tree-reassoc-width=1 -lm -o ./pr45752.exe

The compiler is gcc-6-branch 238072 plus bergner's p9 VMX ICE patch and
kelvin's vpermr fix.

The 4th group of 4 results is incorrect:

(gdb) p check_results
$24 = {3208, 1334, 28764, 35679, 2789, 13028, 4754, 168364, 91254, 12399,
  22848, 8174, 307964, 146829, 22009, 32668, 11594, 447564, 202404, 31619}
(gdb) p output
$25 = {3208, 1334, 28764, 35679, 2789, 13028, 4754, 168364, 91254, 12399,
  22848, 8174, 310424, 178137, 26529, 31036, 11594, 447564, 202404, 31619}

This is my extraction of the dataflow for the incorrect vector:

10000788: 09 00 e9 f5  lxv     vs47,0(r9)      << set vs47/v15 from load
100007b8: 09 00 87 f6  lxv     vs52,0(r7)      << set vs52/v20 from load
10000898: 09 01 81 f4  lxv     vs36,256(r1)    << set vs36/v4 from load
100008f8: 99 01 61 f7  lxv     vs59,400(r1)    << set vs59 from load
10000900: 89 01 01 f4  lxv     vs32,384(r1)    << set vs32 from load
10000918: 01 00 e7 f7  lxv     vs31,0(r7)      << set vs31 from load
1000094c: 01 00 49 f4  lxv     vs2,0(r9)       << set vs2 from load
10000950: 01 00 a7 f5  lxv     vs13,0(r7)      << set vs13 from load
10000958: eb 03 fb 11  vperm   v15,v27,v0,v15  << set v15/vs47 from v27, v0, v15
10000988: 8c 22 81 11  vspltw  v12,v4,1        << set v12/vs44 from v4/vs36
10000994: 01 00 29 f4  lxv     vs1,0(r9)       << set vs1 from load
100009a0: 96 64 ac f2  xxlor   vs21,vs44,vs44  << set vs21 from vs44/v12
100009a4: 8c 22 83 11  vspltw  v12,v4,3        << set v12/vs44 from v4/vs36
100009c0: 96 64 8c f0  xxlor   vs4,vs44,vs44   << set vs4 from vs44/v12
100009cc: 91 ac b5 f1  xxlor   vs45,vs21,vs21  << set vs45/v13 from vs21
100009d0: 91 fc df f1  xxlor   vs46,vs31,vs31  << set vs46/v14 from vs31
100009f0: 96 7c af f0  xxlor   vs5,vs47,vs47   << set vs5 from vs47/v15
100009fc: 89 70 ed 10  vmuluwm v7,v13,v14      << set v7/vs39 from v13, v14
10000a08: 91 14 a2 f1  xxlor   vs45,vs2,vs2    << set vs45/v13 from vs2
10000a28: 91 24 84 f1  xxlor   vs44,vs4,vs4    << set v12/vs44 from vs4
10000a2c: 89 68 8c 11  vmuluwm v12,v12,v13     << set v12/vs44 from v12, v13
10000a3c: 96 64 8c f0  xxlor   vs4,vs44,vs44   << set vs4 from vs44/v12
10000a40: f9 00 81 f5  lxv     vs44,240(r1)    << set vs44/v12 from load
10000a44: 8c 62 c0 11  vspltw  v14,v12,0       << set v14/vs46 from v12/vs44
10000aa0: d4 68 5a f1  xxperm  vs10,vs58,vs13  << set vs10 from vs58, vs13
10000aa4: 8c 22 40 13  vspltw  v26,v4,0        << set v26/vs58 from v4/vs36
10000acc: 01 00 c7 f7  lxv     vs30,0(r7)      << set vs30 from load
10000b08: 91 0c 81 f1  xxlor   vs44,vs1,vs1    << set vs44/v12 from vs1
10000b0c: 89 60 ce 11  vmuluwm v14,v14,v12     << set v14/vs46 from v14, v12
10000b20: 8c 22 a2 11  vspltw  v13,v4,2        << set v13/vs45 from v4/vs36
10000b24: 96 6c 8d f3  xxlor   vs28,vs45,vs45  << set vs28 from vs45/v13
10000b40: 91 2c 85 f1  xxlor   vs44,vs5,vs5    << set vs44/v12 from vs5
10000b44: 89 a0 8c 12  vmuluwm v20,v12,v20     << set v20/vs52 from v12 and v20
10000b5c: 01 00 a9 f5  lxv     vs13,0(r9)      << set vs13 from load
10000b94: 89 68 ac 11  vmuluwm v13,v12,v13     << v13/vs45 set here to be written over?
10000b98: 91 e4 9c f1  xxlor   vs44,vs28,vs28  << set vs44/v12 from vs28
10000ba0: 91 fc bf f1  xxlor   vs45,vs31,vs31  << set vs45/v13 from vs31
10000ba4: 89 68 8c 12  vmuluwm v20,v12,v13     << set v20 from v12 and v13
10000bcc: 91 24 24 f3  xxlor   vs57,vs4,vs4    << set vs57/v25 from vs4
10000be0: 80 c8 e7 10  vadduwm v7,v7,v25       << set v7 from v7 and v25
10000c00: 80 70 e7 10  vadduwm v7,v7,v14       << set v7 from v7 and v14
10000c10: 91 6c ed f3  xxlor   vs63,vs13,vs13  << set vs63 from vs13
10000c28: 89 f8 5a 13  vmuluwm v26,v26,v31     << set v26 from v26 and v31
10000c44: 80 a0 07 11  vadduwm v8,v7,v20       << set v8/vs40 from v7 and v20
10000c68: 80 d0 08 11  vadduwm v8,v8,v26       << set v8/vs40 from v8 and v26

The punchline is at 10000b94/10000ba0, which both set v13/vs45, and I don't
think that is what was intended.
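
The suspicion about 10000b94/10000ba0 can be phrased as a mechanical check:
look for a register that is written twice with no read in between. The
following is a minimal Python sketch of that idea, not something taken from
the attachment. The names `canon` and `overwritten_before_read` are made up
for this illustration, and the instruction tuples are transcribed by hand from
the five extracted instructions between 10000b5c and 10000ba4. It relies on
vs32..vs63 being the same registers as v0..v31, which is how the annotations
pair names such as vs45/v13. The extract skips instructions, so a hit is only
a hint: an omitted instruction could still read the value.

```python
import re

def canon(reg):
    """Map vs32..vs63 to their Altivec aliases v0..v31; leave other names alone."""
    m = re.fullmatch(r"vs(\d+)", reg)
    if m and int(m.group(1)) >= 32:
        return "v%d" % (int(m.group(1)) - 32)
    return reg

# (address, destination, vector sources), hand-transcribed from the extract.
# GPR base registers such as r9 are ignored on purpose.
insns = [
    ("10000b5c", "vs13", []),              # lxv     vs13,0(r9)
    ("10000b94", "v13", ["v12", "v13"]),   # vmuluwm v13,v12,v13
    ("10000b98", "vs44", ["vs28"]),        # xxlor   vs44,vs28,vs28
    ("10000ba0", "vs45", ["vs31"]),        # xxlor   vs45,vs31,vs31
    ("10000ba4", "v20", ["v12", "v13"]),   # vmuluwm v20,v12,v13
]

def overwritten_before_read(insns):
    """Return (first_write, second_write, register) for each unread overwrite."""
    pending = {}  # register -> address of a write that has not been read yet
    hits = []
    for addr, dest, srcs in insns:
        # An instruction reads its sources before it writes its destination.
        for src in srcs:
            pending.pop(canon(src), None)
        d = canon(dest)
        if d in pending:
            hits.append((pending[d], addr, d))
        pending[d] = addr
    return hits

print(overwritten_before_read(insns))
# prints [('10000b94', '10000ba0', 'v13')]
```

On this subset the scan reports only the v13 pair named in the punchline.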
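
The statement that the 4th group of 4 results is incorrect can be checked by
comparing the two arrays from the gdb session in this report element by
element. This is a small Python sketch for that purpose only; the helper name
`mismatched_indices` is made up for this illustration, and the grouping by 4
assumes four results per vector, as the report's wording suggests.

```python
# Values copied verbatim from the gdb output in this report.
check_results = [3208, 1334, 28764, 35679, 2789, 13028, 4754, 168364, 91254,
                 12399, 22848, 8174, 307964, 146829, 22009, 32668, 11594,
                 447564, 202404, 31619]
output = [3208, 1334, 28764, 35679, 2789, 13028, 4754, 168364, 91254,
          12399, 22848, 8174, 310424, 178137, 26529, 31036, 11594,
          447564, 202404, 31619]

def mismatched_indices(expected, actual):
    """Return the indices at which the two sequences differ."""
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]

bad = mismatched_indices(check_results, output)
print(bad)                            # prints [12, 13, 14, 15]
print(sorted({i // 4 for i in bad}))  # prints [3], the 4th group (0-based 3)
```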