[Bug target/70873] [7 Regressio] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873 --- Comment #23 from H.J. Lu --- (In reply to Uroš Bizjak from comment #22) > Created attachment 38412 [details] > Proposed patch > > This patch moves all TARGET_SSE_PARTIAL_REG_DEPENDENCY FP conversion > splitters to a later split pass. Plus, the patch substantially cleans these > and related patterns. > > The functionality of post-reload conversion splitters goes this way: > > - process FP conversions for TARGET_USE_VECTOR_FP_CONVERTS in an early > post-reload splitter. This pass will rewrite FP conversions to vector insns > and is thus incompatible with the next two passes. AMDFAM10 processors > depend on this transformation. > > - process FP conversions for TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS in a > peephole2 pass. This will transform mem->reg insns to reg->reg insns, and > these insn could be processed by the next pass. Some Intel processors depend > on this transformation. > > - process FP conversions for TARGET_SSE_PARTIAL_REG_DEPENDENCY in a late > post-reload splitter, when allocated registers are stable. AMD and Intel > processors depend on this pass, so it is part of generic tuning. We need to move those special SSE SF->DF splitters before (define_split [(set (match_operand 0 "any_fp_register_operand") (float_extend (match_operand 1 "memory_operand")))] "reload_completed && (GET_MODE (operands[0]) == TFmode || GET_MODE (operands[0]) == XFmode || GET_MODE (operands[0]) == DFmode)" [(set (match_dup 0) (match_dup 2))] { operands[2] = find_constant_src (curr_insn); if (operands[2] == NULL_RTX || (SSE_REGNO_P (REGNO (operands[0])) && standard_sse_constant_p (operands[2], GET_MODE (operands[0])) != 1) || (STACK_REGNO_P (REGNO (operands[0])) && standard_80387_constant_p (operands[2]) < 1)) FAIL; }) Otherwise, they may not be used on memory operand since the general SSE (In reply to Uroš Bizjak from comment #22) > Created attachment 38412 [details] > Proposed patch > > This patch moves all TARGET_SSE_PARTIAL_REG_DEPENDENCY FP conversion > splitters to a later split pass. Plus, the patch substantially cleans these > and related patterns. > > The functionality of post-reload conversion splitters goes this way: > > - process FP conversions for TARGET_USE_VECTOR_FP_CONVERTS in an early > post-reload splitter. This pass will rewrite FP conversions to vector insns > and is thus incompatible with the next two passes. AMDFAM10 processors > depend on this transformation. > > - process FP conversions for TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS in a > peephole2 pass. This will transform mem->reg insns to reg->reg insns, and > these insn could be processed by the next pass. Some Intel processors depend > on this transformation. > > - process FP conversions for TARGET_SSE_PARTIAL_REG_DEPENDENCY in a late > post-reload splitter, when allocated registers are stable. AMD and Intel > processors depend on this pass, so it is part of generic tuning. We need to move those special SSE SF->DF splitters before (define_split [(set (match_operand 0 "any_fp_register_operand") (float_extend (match_operand 1 "memory_operand")))] "reload_completed && (GET_MODE (operands[0]) == TFmode || GET_MODE (operands[0]) == XFmode || GET_MODE (operands[0]) == DFmode)" [(set (match_dup 0) (match_dup 2))] { operands[2] = find_constant_src (curr_insn); if (operands[2] == NULL_RTX || (SSE_REGNO_P (REGNO (operands[0])) && standard_sse_constant_p (operands[2], GET_MODE (operands[0])) != 1) || (STACK_REGNO_P (REGNO (operands[0])) && standard_80387_constant_p (operands[2]) < 1)) FAIL; }) Otherwise, they may not be used on memory operand since the general SSE float_extend splitter on memory operand will be used.
[Bug target/70873] [7 Regressio] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873 --- Comment #24 from H.J. Lu --- Created attachment 38414 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38414&action=edit A patch to move special SSE splitters before general SSE float_extend splitter
[Bug target/70873] [7 Regressio] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873 --- Comment #25 from Uroš Bizjak --- (In reply to H.J. Lu from comment #23) > We need to move those special SSE SF->DF splitters before No, this splitter will fail if the transformation doesn't result in a constant. So, we actually want this splitter first, to try to transform a memory load to a constant load, and moving others before this one would be harmful. > (define_split > [(set (match_operand 0 "any_fp_register_operand") > (float_extend (match_operand 1 "memory_operand")))] > "reload_completed >&& (GET_MODE (operands[0]) == TFmode >|| GET_MODE (operands[0]) == XFmode >|| GET_MODE (operands[0]) == DFmode)" > [(set (match_dup 0) (match_dup 2))] > { > operands[2] = find_constant_src (curr_insn); > > if (operands[2] == NULL_RTX > || (SSE_REGNO_P (REGNO (operands[0])) > && standard_sse_constant_p (operands[2], > GET_MODE (operands[0])) != 1) > || (STACK_REGNO_P (REGNO (operands[0])) >&& standard_80387_constant_p (operands[2]) < 1)) > FAIL; > })
[Bug target/70873] [7 Regressio] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873 --- Comment #26 from H.J. Lu --- (In reply to Uroš Bizjak from comment #25) > (In reply to H.J. Lu from comment #23) > > > We need to move those special SSE SF->DF splitters before > > No, this splitter will fail if the transformation doesn't result in a > constant. So, we actually want this splitter first, to try to transform a > memory load to a constant load, and moving others before this one would be > harmful. > > > (define_split > > [(set (match_operand 0 "any_fp_register_operand") > > (float_extend (match_operand 1 "memory_operand")))] > > "reload_completed > >&& (GET_MODE (operands[0]) == TFmode > >|| GET_MODE (operands[0]) == XFmode > >|| GET_MODE (operands[0]) == DFmode)" > > [(set (match_dup 0) (match_dup 2))] > > { > > operands[2] = find_constant_src (curr_insn); > > > > if (operands[2] == NULL_RTX > > || (SSE_REGNO_P (REGNO (operands[0])) > > && standard_sse_constant_p (operands[2], > > GET_MODE (operands[0])) != 1) > > || (STACK_REGNO_P (REGNO (operands[0])) > >&& standard_80387_constant_p (operands[2]) < 1)) > > FAIL; > > }) But when this splitter fails, no other splitters will be tried.
[Bug target/70873] [7 Regressio] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873 --- Comment #27 from Uroš Bizjak --- (In reply to H.J. Lu from comment #26) > But when this splitter fails, no other splitters will be tried. Bah. This is clearly an implementation bug in the split pass. I don't think we have to work around it, the infrastructure bug should be fixed instead. Can you please open a new bug about it? I propose to go ahead and commit my patch, which is already accumulating several semi-related fixes and improvements. The patch will probably expose an optimization problem in the splitting pass, so we will have a testcase to illustrate the problem.
[Bug target/70873] [7 Regressio] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873 --- Comment #28 from uros at gcc dot gnu.org --- Author: uros Date: Wed May 4 21:13:13 2016 New Revision: 235906 URL: https://gcc.gnu.org/viewcvs?rev=235906&root=gcc&view=rev Log: PR target/70873 * config/i386/i386.md (TARGET_SSE_PARTIAL_REG_DEPENDENCY float_extend sf->df peephole2): Change to post-epilogue_completed late splitter. Use sse_reg_operand as operand 0 predicate. (TARGET_SSE_PARTIAL_REG_DEPENDENCY float_truncate df->sf peephole2): Ditto. (TARGET_SSE_PARTIAL_REG_DEPENDENCY float {si,di}->{sf,df} peephole2): Ditto. Emit the pattern using RTX. (TARGET_USE_VECTOR_FP_CONVERTS float_extend sf->df splitter): Use sse_reg_opreand as operand 0 predicate. Do not use true_regnum in the post-reload splitter. Use lowpart_subreg instead of gen_rtx_REG. (TARGET_USE_VECTOR_FP_CONVERTS float_truncate df->sf splitter): Ditto. (TARGET_USE_VECTOR_CONVERTS float si->{sf,df} splitter): Use sse_reg_operand as operand 0 predicate. (TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS float_extend sf->df peephole2): Use sse_reg_opreand as operand 0 predicate. Use lowpart_subreg instead of gen_rtx_REG. (TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS float_truncate sf->df peephole2): Ditto. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md
[Bug target/70873] [7 Regressio] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873 --- Comment #29 from Uroš Bizjak --- Created attachment 38419 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38419&action=edit Patch to fix problematic splitters I'm testing the attached patch that introduces ix86_standard_x87sse_constant_load_p function, and uses it to limit problematic splitters only to cases, where we can actually transform the insn to a constant load.
[Bug target/70873] [7 Regressio] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873 --- Comment #30 from uros at gcc dot gnu.org --- Author: uros Date: Thu May 5 22:48:29 2016 New Revision: 235936 URL: https://gcc.gnu.org/viewcvs?rev=235936&root=gcc&view=rev Log: PR target/70873 * config/i386/i386-protos.h (ix86_standard_x87sse_constant_load_p): New prototype. * config/i386/i386.c (ix86_standard_x87sse_constant_load_p): New. * config/i386/i386.md (push mem splitter): Use find_constant_src in the splitter condition. (FP load splitter): Use ix86_standard_x87sse_constant_load_p in the splitter condition. (FP float_extend load splitter): Ditto. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-protos.h trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.md
[Bug target/70873] [7 Regressio] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #31 from Uroš Bizjak --- Fixed.