https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80697
Michael Meissner <meissner at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-05-11
     Ever confirmed|0                           |1

--- Comment #2 from Michael Meissner <meissner at gcc dot gnu.org> ---
I compared against older benchmark runs made on the same machine. On April 21, 2016, I did a benchmark run with Subversion revision 235167, and milc's speed was roughly the same as with GCC 6.3. On May 12, 2016, I did a run with Subversion revision 236136, and milc's speed was roughly the same as with GCC 7.1. So the slowdown came in between those two revisions.

Here are the instruction counts for the function that seems to be causing the performance issues (diff is gcc6 minus gcc7):

Instructions       | gcc7 | gcc6 | diff | Class
================== | ==== | ==== | ==== | =====
fadd, xsadddp      |   12 |    0 |  -12 | DF add
fmadd, xsmadd*dp   |   20 |   28 |    8 | DF multiply and add
fmsub, xsmsub*dp   |    4 |    0 |   -4 | DF multiply and subtract
fmul, xsmuldp      |   24 |    8 |  -16 | DF multiply
fnmsub, xsnmsub*dp |    0 |   12 |   12 | DF negate, multiply and subtract
fsub, xssubdp      |    4 |    0 |   -4 | DF subtract
ld                 |    5 |    0 |   -5 | load doubleword offset
lfd                |   48 |   53 |    5 | load DF offset
mtvsrd             |    5 |    0 |   -5 | move to vsr doubleword
xvadddp            |    3 |    0 |   -3 | V2DF add
xvmadd*dp          |    5 |    7 |    2 | V2DF multiply and add
xvmuldp            |    6 |    2 |   -4 | V2DF multiply
xvnmsub*dp         |    1 |    3 |    2 | V2DF negate, multiply and subtract
xvsubdp            |    1 |    0 |   -1 | V2DF subtract

If I had to guess, two PowerPC changes that went in during that period are responsible.

The first is a rather massive patch that I put in to add ISA 3.0 d-form (register+offset) support. It looks like it causes the register allocator to load values into GPRs and then do direct moves when it wants to put a scalar DFmode value in a traditional Altivec register (which, prior to ISA 3.0, did not have d-form support). This accounts for the LD instructions replacing LFD, and for the MTVSRD instructions. While a direct move is better than a store and a load, it is fairly slow on power8 systems.
I ran into a similar problem with PR 68163; fixing it involved tuning the constraints for the moves (SFmode in the case of PR 68163, DFmode here).

The second is Aaron Sawdey's patch tuning the reassociation width, which also went in during this period. It likely affects when we can merge adds and multiplies into the PowerPC fma instructions.

2016-05-04  Aaron Sawdey  <acsaw...@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_reassociation_width): Add
	function for TARGET_SCHED_REASSOCIATION_WIDTH to enable parallel
	reassociation for power8 and forward.

2016-05-11  Michael Meissner  <meiss...@linux.vnet.ibm.com>

	* config/rs6000/predicates.md (quad_memory_operand): Move most of
	the code into quad_address_p and call it to share code with
	vsx_quad_dform_memory_operand.
	(vsx_quad_dform_memory_operand): New predicate for ISA 3.0 vector
	d-form support.
	* config/rs6000/rs6000.opt (-mlra): Switch to being an option mask
	bit instead of being a separate word. Split -mpower9-dform into
	two switches, -mpower9-dform-scalar and -mpower9-dform-vector.
	* config/rs6000/rs6000.c (RELOAD_REG_QUAD_OFFSET): New addr_mask
	for the register class supporting 128-bit quad word memory
	offsets.
	(mode_supports_vsx_dform_quad): Helper function to return if the
	register class uses quad word memory offsets.
	(rs6000_debug_addr_mask): Add support for quad word memory
	offsets.
	(rs6000_debug_reg_global): Always print if we are using LRA or
	not.
	(rs6000_setup_reg_addr_masks): If ISA 3.0 vector d-form
	instructions are enabled, set up the appropriate addr_masks for
	128-bit types.
	(rs6000_init_hard_regno_mode_ok): wb constraint is now based on
	-mpower9-dform-scalar, instead of -mpower9-dform.
	(rs6000_option_override_internal): Split -mpower9-dform into two
	switches, -mpower9-dform-scalar and -mpower9-dform-vector. The
	-mpower9-dform switch sets or clears both. If we are not using
	the LRA register allocator, do not enable -mpower9-dform-vector by
	default. If we are using LRA, enable -mpower9-dform-vector and
	-mvsx-timode if it is appropriate. Issue a warning if either
	-mpower9-dform-vector or -mvsx-timode are explicitly used without
	enabling LRA.
	(quad_address_offset_p): New helper function to return if the
	offset is legal for quad word memory instructions.
	(quad_address_p): New function to determine if GPR or vector
	register quad word memory addresses are legal.
	(mem_operand_gpr): Validate quad word address offsets.
	(reg_offset_addressing_ok_p): Add support for ISA 3.0 vector
	d-form (register + offset) instructions.
	(offsettable_ok_by_alignment): Likewise.
	(rs6000_legitimate_offset_address_p): Likewise.
	(legitimate_lo_sum_address_p): Likewise.
	(rs6000_legitimize_address): Likewise.
	(rs6000_legitimize_reload_address): Add more debug statements for
	-mdebug=addr.
	(rs6000_legitimate_address_p): Add support for ISA 3.0 vector
	d-form instructions.
	(rs6000_secondary_reload_memory): Add support for ISA 3.0 vector
	d-form instructions. Distinguish different cases in debug
	output.
	(rs6000_secondary_reload_inner): Add support for ISA 3.0 vector
	d-form instructions.
	(rs6000_preferred_reload_class): Likewise.
	(rs6000_output_move_128bit): Add support for ISA 3.0 d-form
	instructions. If ISA 3.0 is available, generate lxvx/stxvx
	instead of the ISA 2.06 indexed memory instructions.
	(rs6000_emit_prologue): If we have ISA 3.0 d-form instructions,
	use them to save/restore the saved vector registers instead of
	using Altivec instructions.
	(rs6000_emit_epilogue): Likewise.
	(rs6000_lra_p): Use TARGET_LRA instead of the old option word.
	(rs6000_opt_masks): Split -mpower9-dform into
	-mpower9-dform-scalar and -mpower9-dform-vector.
	(rs6000_print_options_internal): Print -mno-<switch> if <switch>
	was not selected.
	* config/rs6000/vsx.md (p9_vecload_<mode>): Delete hack to emit
	ISA 3.0 vector indexed memory instructions, and fold the code into
	the normal mov<mode> patterns.
	(p9_vecstore_<mode>): Likewise.
	(vsx_mov<mode>): Add support for ISA 3.0 vector d-form
	instructions.
	(vsx_movti_64bit): Likewise.
	(vsx_movti_32bit): Likewise.
	* config/rs6000/constraints.md (wO constraint): New constraint for
	ISA 3.0 vector d-form support.
	* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Use
	-mpower9-dform-scalar instead of -mpower9-dform. Add note not to
	include -mpower9-dform-vector until we switch over to LRA.
	(POWERPC_MASKS): Add -mlra. Split -mpower9-dform into two
	switches, -mpower9-dform-scalar and -mpower9-dform-vector.
	* config/rs6000/rs6000-protos.h (quad_address_p): Add declaration.
	* doc/invoke.texi (RS/6000 and PowerPC Options): Add documentation
	for -mpower9-dform and -mlra.
	* doc/md.texi (wO constraint): Document wO constraint.