[Bug target/69493] Poor code generation for return of struct containing vectors on PPC64LE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

Sam James changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
             Blocks|                            |101926

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101926
[Bug 101926] [meta-bug] struct/complex/other argument passing and return should be improved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

--- Comment #11 from Segher Boessenkool ---
Why does our unpack expander use UNSPEC_UNPACK_128BIT at all? Why can it not simply generate plain code (without unspecs) directly? (The same goes for "pack".)
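For context, here is a minimal model of what the unpack and pack operations compute. It assumes the IBM 128-bit long double (double-double) format, where a TF value is a pair of doubles whose sum is the represented value. On PowerPC, GCC exposes these operations to users as `__builtin_unpack_longdouble` and `__builtin_pack_longdouble`. The struct type and function names below are hypothetical and exist only so the sketch runs on any host. Semantically, unpacking just selects one half of the register pair and involves no arithmetic, which is the intuition behind asking for a plain move instead of an unspec.

```c
#include <assert.h>

/* Hypothetical model of an IBM 128-bit long double (TFmode): the value is
   hi + lo, held in an FPR pair such as f1/f2. Not the real type. */
typedef struct {
    double hi;  /* element 0: the first (high-order) double */
    double lo;  /* element 1: the second (low-order) double */
} ibm128_model;

/* "unpack": select one half of the pair; nothing is computed. */
static double unpack_ibm128(ibm128_model x, int idx)
{
    return idx == 0 ? x.hi : x.lo;
}

/* "pack": place two doubles into the pair, again without arithmetic. */
static ibm128_model pack_ibm128(double hi, double lo)
{
    ibm128_model x = { hi, lo };
    return x;
}
```
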
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

--- Comment #10 from luoxhu at gcc dot gnu.org ---
At expand time, Power8 emits two register permute instructions (via rs6000_emit_le_vsx_move) to byte-swap the contents.

P9:
    5: NOTE_INSN_BASIC_BLOCK 2
    2: r129:TF=%1:TF
    3: r130:TF=%3:TF
    4: NOTE_INSN_FUNCTION_BEG
    7: r117:DF=unspec[r129:TF,0] 70
    8: r131:V2DF=r121:V2DF
    9: r133:DF=vec_select(r131:V2DF,parallel)
   10: r131:V2DF=vec_concat(r117:DF,r133:DF)
   11: r122:V2DF=r131:V2DF
   12: r118:DF=unspec[r129:TF,0x1] 70
   13: r119:DF=unspec[r130:TF,0] 70
   14: r134:V2DF=r124:V2DF
   15: r136:DF=vec_select(r134:V2DF,parallel)
   16: r134:V2DF=vec_concat(r119:DF,r136:DF)
   17: r125:V2DF=r134:V2DF
   18: r120:DF=unspec[r130:TF,0x1] 70
   19: r137:V2DF=r122:V2DF
   20: r139:DF=vec_select(r137:V2DF,parallel)
   21: r137:V2DF=vec_concat(r139:DF,r118:DF)
   22: [r112:DI]=r137:V2DF
   23: r140:V2DF=r125:V2DF
   24: r142:DF=vec_select(r140:V2DF,parallel)
   25: r140:V2DF=vec_concat(r142:DF,r120:DF)
   26: [r112:DI+0x10]=r140:V2DF
   27: r143:V4SI=[r112:DI]
   28: r144:V4SI=[r112:DI+0x10]
   29: r127:V4SI=r143:V4SI
   30: r128:V4SI=r144:V4SI
   34: %2:V4SI=r127:V4SI
   35: %3:V4SI=r128:V4SI
   36: use %2:V4SI
   37: use %3:V4SI

P8:
    5: NOTE_INSN_BASIC_BLOCK 2
    2: r129:TF=%1:TF
    3: r130:TF=%3:TF
    4: NOTE_INSN_FUNCTION_BEG
    7: r117:DF=unspec[r129:TF,0] 70
    8: r131:V2DF=r121:V2DF
    9: r133:DF=vec_select(r131:V2DF,parallel)
   10: r131:V2DF=vec_concat(r117:DF,r133:DF)
   11: r122:V2DF=r131:V2DF
   12: r118:DF=unspec[r129:TF,0x1] 70
   13: r119:DF=unspec[r130:TF,0] 70
   14: r134:V2DF=r124:V2DF
   15: r136:DF=vec_select(r134:V2DF,parallel)
   16: r134:V2DF=vec_concat(r119:DF,r136:DF)
   17: r125:V2DF=r134:V2DF
   18: r120:DF=unspec[r130:TF,0x1] 70
   19: r137:V2DF=r122:V2DF
   20: r139:DF=vec_select(r137:V2DF,parallel)
   21: r137:V2DF=vec_concat(r139:DF,r118:DF)
   22: r140:V2DF=vec_select(r137:V2DF,parallel)
   23: [r112:DI]=vec_select(r140:V2DF,parallel)
   24: r141:V2DF=r125:V2DF
   25: r143:DF=vec_select(r141:V2DF,parallel)
   26: r141:V2DF=vec_concat(r143:DF,r120:DF)
   27: r144:V2DF=vec_select(r141:V2DF,parallel)
   28: [r112:DI+0x10]=vec_select(r144:V2DF,parallel)
   29: r146:V4SI=vec_select([r112:DI],parallel)
   30: r145:V4SI=vec_select(r146:V4SI,parallel)
   31: r148:V4SI=vec_select([r112:DI+0x10],parallel)
   32: r147:V4SI=vec_select(r148:V4SI,parallel)
   33: r127:V4SI=r145:V4SI
   34: r128:V4SI=r147:V4SI
   38: %2:V4SI=r127:V4SI
   39: %3:V4SI=r128:V4SI
   40: use %2:V4SI
   41: use %3:V4SI

The difference starts at insn 22. Power8 emits two vec_select operations for the stack store/load operations, whereas Power9 needs only one.
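To see why the paired permutes are pure overhead, here is a simplified C model of the per-vector sequence in Comment 6's POWER8 listing (xxpermdi ...,2, stxvd2x, lxvd2x, xxpermdi ...,2). It assumes the following ISA semantics: xxpermdi with DM=2 and identical sources swaps the two doublewords, and stxvd2x/lxvd2x put doubleword 0 at the lower address regardless of endianness. The `vsr_t` type and the function names are illustrative only, and byte order within each doubleword is not modelled. Because the slot is written and read back by matching instructions, the whole sequence is an identity on the register. That is consistent with POWER9 (and POWER8 BE) emitting only `blr`.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 128-bit VSX register: dw[0] is ISA doubleword 0. */
typedef struct { uint64_t dw[2]; } vsr_t;

/* xxpermdi XT,XA,XB,DM: doubleword 0 comes from XA (high DM bit picks
   which half), doubleword 1 comes from XB (low DM bit picks which half). */
static vsr_t xxpermdi(vsr_t xa, vsr_t xb, unsigned dm)
{
    vsr_t xt;
    xt.dw[0] = xa.dw[(dm >> 1) & 1];
    xt.dw[1] = xb.dw[dm & 1];
    return xt;
}

/* stxvd2x: doubleword 0 at the lower address, doubleword 1 at +8. */
static void stxvd2x(vsr_t xs, uint64_t mem[2])
{
    mem[0] = xs.dw[0];
    mem[1] = xs.dw[1];
}

/* lxvd2x: the inverse of stxvd2x. */
static vsr_t lxvd2x(const uint64_t mem[2])
{
    vsr_t xt = { { mem[0], mem[1] } };
    return xt;
}

/* One vector's path through the POWER8 LE listing: swap, spill, reload, swap. */
static vsr_t p8_le_spill_reload(vsr_t v)
{
    uint64_t slot[2];
    v = xxpermdi(v, v, 2);   /* xxpermdi 34,34,34,2 */
    stxvd2x(v, slot);        /* stxvd2x 34,8,10 */
    v = lxvd2x(slot);        /* lxvd2x 34,8,10 */
    return xxpermdi(v, v, 2);/* xxpermdi 34,34,34,2 */
}
```
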
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

luoxhu at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luoxhu at gcc dot gnu.org

--- Comment #9 from luoxhu at gcc dot gnu.org ---
No loads or stores on Power9:

cat pr69493.s
        .file   "pr69493.c"
        .abiversion 2
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl test_big_double
        .type   test_big_double, @function
test_big_double:
.LFB0:
        .cfi_startproc
        mfvsrd 7,1
        mfvsrd 10,2
        mfvsrd 8,3
        mfvsrd 9,4
        mtvsrdd 34,10,7
        mtvsrdd 35,9,8
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0
        .cfi_endproc
.LFE0:
        .size   test_big_double,.-test_big_double
        .ident  "GCC: (GNU) 9.2.1 20191023 (Advance-Toolchain 13.0-1) [aba1f4e8b6ac]"
        .gnu_attribute 4, 5
        .section        .note.GNU-stack,"",@progbits
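A companion model of the POWER9 listing follows. It assumes, from the RTL in Comment 10 where the arguments arrive as %1:TF and %3:TF, that test_big_double receives two IBM long double arguments in f1/f2 and f3/f4. It also assumes these instruction semantics:

- mfvsrd copies an FPR's 64-bit image into a GPR.
- mtvsrdd XT,RA,RB puts RA in doubleword 0 and RB in doubleword 1.
- Under little-endian element numbering, two-double vector element i lives in doubleword 1 - i.

Viewed as two doubles, v2 then holds {f1, f2} and v3 holds {f3, f4}, built entirely in registers. The type and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified 128-bit VSX register: dw[0] is ISA doubleword 0. */
typedef struct { uint64_t dw[2]; } vsr_t;

/* mfvsrd RA,FRS: copy the FPR's 64-bit image into a GPR. */
static uint64_t mfvsrd(double frs)
{
    uint64_t ra;
    memcpy(&ra, &frs, sizeof ra);
    return ra;
}

/* mtvsrdd XT,RA,RB: doubleword 0 <- RA, doubleword 1 <- RB.
   (The ISA reads RA = r0 as zero; the listing never uses r0.) */
static vsr_t mtvsrdd(uint64_t ra, uint64_t rb)
{
    vsr_t xt = { { ra, rb } };
    return xt;
}

/* Little-endian element i of a two-double vector sits in doubleword 1 - i. */
static double le_double_element(vsr_t v, int i)
{
    double d;
    memcpy(&d, &v.dw[1 - i], sizeof d);
    return d;
}

/* The POWER9 test_big_double body, one statement per instruction. */
static void p9_test_big_double(double f1, double f2, double f3, double f4,
                               vsr_t *v2, vsr_t *v3)
{
    uint64_t r7 = mfvsrd(f1);    /* mfvsrd 7,1 */
    uint64_t r10 = mfvsrd(f2);   /* mfvsrd 10,2 */
    uint64_t r8 = mfvsrd(f3);    /* mfvsrd 8,3 */
    uint64_t r9 = mfvsrd(f4);    /* mfvsrd 9,4 */
    *v2 = mtvsrdd(r10, r7);      /* mtvsrdd 34,10,7 (VSR 34 is v2) */
    *v3 = mtvsrdd(r9, r8);       /* mtvsrdd 35,9,8  (VSR 35 is v3) */
}
```
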
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

--- Comment #8 from Peter Bergner ---
I'll note that Kelvin's r256656 commit fixed the test case in Comment 6. We know the loads and stores there are sufficiently aligned, and when the address is aligned there are load and store instructions that perform the correct byte swap in LE mode. However, we still produce poor code for the first test case.
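The comment does not name the instructions. The sketch below assumes they are lvx/stvx, which on POWER8 in LE mode need no doubleword swap but silently ignore the low four bits of the effective address. Under that assumption, the alignment requirement looks like this. `can_use_lvx` is a hypothetical helper for the condition a compiler would have to prove; it is not a GCC function.

```c
#include <assert.h>
#include <stdint.h>

/* lvx/stvx clear the low four bits of the effective address, so they
   access the intended 16 bytes only when EA is already 16-byte aligned. */
static uintptr_t lvx_effective_address(uintptr_t ea)
{
    return ea & ~(uintptr_t)15;
}

/* Hypothetical helper: true when an lvx/stvx at EA touches exactly the
   bytes that an alignment-tolerant access such as lxvd2x would touch. */
static int can_use_lvx(uintptr_t ea)
{
    return lvx_effective_address(ea) == ea;
}
```

For example, the spill slots in Comment 6's POWER8 listing are at r1-96+32 and r1-96+48, that is r1-64 and r1-48. Both pass this check whenever r1 is 16-byte aligned, as the ELFv2 ABI requires.
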
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

--- Comment #7 from Peter Bergner ---
(In reply to Peter Bergner from comment #6)
> When compiling for POWER9, we get the code we want/expect:

FYI, we also get optimal code (i.e., just a blr) when compiling for POWER8 BE.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

Peter Bergner changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bergner at gcc dot gnu.org

--- Comment #6 from Peter Bergner ---
A simpler test case that shows the same problem when compiling for POWER8.
When compiling for POWER9, we get the code we want/expect:

bergner@pike:~/gcc/BUGS/PR70053$ cat pr69493-2.c
typedef struct {
  __vector double vx0;
  __vector double vx1;
} vec_t;

vec_t
foo (__vector double a, __vector double b)
{
  vec_t result;
  result.vx0 = a;
  result.vx1 = b;
  return result;
}
bergner@pike:~/gcc/BUGS/PR70053$ /home/bergner/gcc/build/gcc-fsf-mainline-pr70053-debug/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-pr70053-debug/gcc -S -O2 -mcpu=power8 pr69493-2.c
bergner@pike:~/gcc/BUGS/PR70053$ cat pr69493-2.s
...
foo:
        addi 8,1,-96
        li 10,32
        xxpermdi 34,34,34,2
        xxpermdi 35,35,35,2
        li 9,48
        stxvd2x 34,8,10
        stxvd2x 35,8,9
        lxvd2x 34,8,10
        lxvd2x 35,8,9
        xxpermdi 34,34,34,2
        xxpermdi 35,35,35,2
        blr
bergner@pike:~/gcc/BUGS/PR70053$ /home/bergner/gcc/build/gcc-fsf-mainline-pr70053-debug/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-pr70053-debug/gcc -S -O2 -mcpu=power9 pr69493-2.c
bergner@pike:~/gcc/BUGS/PR70053$ cat pr69493-2.s
...
foo:
        blr
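For experimenting on hosts without AltiVec, here is a portable variant of pr69493-2.c. It replaces `__vector double` with the GCC generic `vector_size(16)` extension, which Clang also accepts. Treat it as a stand-in under assumptions. On PowerPC with VSX enabled the generic type is expected to have the same 16-byte layout. However, the code generation issue itself only shows up when targeting POWER8 little-endian (for example `-O2 -mcpu=power8` with a ppc64le compiler). On other targets the sketch only checks that the struct carries the arguments through unchanged.

```c
#include <assert.h>

/* Stand-in for __vector double: two doubles in one 16-byte vector
   (GCC/Clang generic vector extension). */
typedef double v2df __attribute__((vector_size(16)));

typedef struct {
    v2df vx0;
    v2df vx1;
} vec_t;

/* Same shape as pr69493-2.c: the returned struct just carries the two
   arguments, so ideal code leaves them in place and returns. */
vec_t foo(v2df a, v2df b)
{
    vec_t result;
    result.vx0 = a;
    result.vx1 = b;
    return result;
}
```
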
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

--- Comment #5 from Segher Boessenkool ---
Ah, needs -mlittle, not just -mabi=elfv2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

--- Comment #4 from Bill Schmidt ---
I still see the problem with:
GCC: (GNU) 6.0.0 20160309 (experimental) [trunk revision 234085]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

--- Comment #3 from Bill Schmidt ---
That's interesting. We have some other examples of similar issues we should check as well before closing this. I'll take a look in a bit.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

--- Comment #2 from Segher Boessenkool ---
This seems fixed on current trunk (dse1 removes the reload from mem)?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-01-26
                 CC|                            |segher at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Segher Boessenkool ---
Confirmed. At expand time it already goes via memory.