Re: [PATCH, Atom] Improve AGU stalls avoidance optimization
On Tue, Sep 6, 2011 at 10:54 AM, Ilya Enkovich enkovich@gmail.com wrote:
> 2011/9/6 Uros Bizjak ubiz...@gmail.com:
>> Please merge your new splitters with corresponding LEA patterns.
>>
>> OK with this change.
>>
>> Thanks,
>> Uros.
>
> Fixed. Could someone please check it in if it's OK now?
>
> Thanks,
> Ilya
> ---
> gcc/
>
> 2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com
>
>         * config/i386/i386-protos.h (ix86_lea_outperforms): New.
>         (ix86_avoid_lea_for_add): Likewise.
>         (ix86_avoid_lea_for_addr): Likewise.
>         (ix86_split_lea_for_addr): Likewise.
>
>         * config/i386/i386.c (LEA_MAX_STALL): New.
>         (increase_distance): Likewise.
>         (insn_defines_reg): Likewise.
>         (insn_uses_reg_mem): Likewise.
>         (distance_non_agu_define_in_bb): Likewise.
>         (distance_agu_use_in_bb): Likewise.
>         (ix86_lea_outperforms): Likewise.
>         (ix86_ok_to_clobber_flags): Likewise.
>         (ix86_avoid_lea_for_add): Likewise.
>         (ix86_avoid_lea_for_addr): Likewise.
>         (ix86_split_lea_for_addr): Likewise.
>         (distance_non_agu_define): Search in pred BBs added.
>         (distance_agu_use): Search in succ BBs added.
>         (IX86_LEA_PRIORITY): Value changed from 2 to 0.
>         (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
>         (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make
>         decision.
>
>         * config/i386/i386.md: Split added to transform non
>         destructive add into move and add.
>         (lea_1): transformed into insn_and_split to avoid AGU stalls.
>         (leamode_2): Likewise.

I checked it into trunk for you.

Thanks.

--
H.J.
Re: [PATCH, Atom] Improve AGU stalls avoidance optimization
Hello,

Thanks for the review!

2011/9/3 Uros Bizjak ubiz...@gmail.com:
> Did you also test on x32? H.J.'s x32 page [1] currently says that
> Atom LEA optimization is disabled on x32 for some reason.

No. I did not try to cover x32. That will be separate work.

>> +bool
>> +ix86_avoid_lea_for_addr (rtx insn, rtx operands[])
>> +{
>> +  unsigned int regno0 = true_regnum (operands[0]) ;
>> +  unsigned int regno1 = -1;
>> +  unsigned int regno2 = -1;
>
> Use INVALID_REGNUM here.

Fixed. Also used INVALID_REGNUM in other places where -1 was used as
an invalid register number.

>> +extern void
>> +ix86_split_lea_for_addr (rtx operands[], enum machine_mode mode)
>> +{
>
> Missing comment.

Fixed.

>> +;; Split non destructive adds if we cannot use lea.
>> +(define_split
>> +  [(set (match_operand:SWI48 0 "register_operand" "")
>> +       (plus:SWI48 (match_operand:SWI48 1 "register_operand" "")
>> +              (match_operand:SWI48 2 "nonmemory_operand" "")))
>> +   (clobber (reg:CC FLAGS_REG))]
>> +  "reload_completed && ix86_avoid_lea_for_add (insn, operands)"
>> +  [(set (match_dup 0) (match_dup 1))
>> +   (parallel [(set (match_dup 0) (plus:<MODE> (match_dup 0) (match_dup 2)))
>> +             (clobber (reg:CC FLAGS_REG))])
>> +  ]
>> +)
>
> Put all closing braces on one line:

Fixed.

>> +;; Split lea into one or more ALU instructions if profitable.
>> +(define_split
>> +  [(set (match_operand:SI 0 "register_operand" "")
>> +       (subreg:SI (match_operand:DI 1 "lea_address_operand" "") 0))]
>> +  "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
>> +  [(const_int 0)]
>> +{
>> +  ix86_split_lea_for_addr (operands, SImode);
>> +  DONE;
>> +})
>
> This is valid only for TARGET_64BIT.

Fixed.

> Please note that x32 adds quite a few different LEA patterns (see
> i386.md, line 5466+). I suggest you merge your splitters with these
> define_insn patterns into define_insn_and_split, adding
> "reload_completed && ix86_avoid_lea_for_addr (insn, operands)" as a
> split condition.

Thanks for the note. I'll look at the new patterns when we enable lea
optimization for x32.

> Uros.

Is the fixed version OK?

Thanks,
Ilya
---
gcc/

2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com

        * config/i386/i386-protos.h (ix86_lea_outperforms): New.
        (ix86_avoid_lea_for_add): Likewise.
        (ix86_avoid_lea_for_addr): Likewise.
        (ix86_split_lea_for_addr): Likewise.

        * config/i386/i386.c (LEA_MAX_STALL): New.
        (increase_distance): Likewise.
        (insn_defines_reg): Likewise.
        (insn_uses_reg_mem): Likewise.
        (distance_non_agu_define_in_bb): Likewise.
        (distance_agu_use_in_bb): Likewise.
        (ix86_lea_outperforms): Likewise.
        (ix86_ok_to_clobber_flags): Likewise.
        (ix86_avoid_lea_for_add): Likewise.
        (ix86_avoid_lea_for_addr): Likewise.
        (ix86_split_lea_for_addr): Likewise.
        (distance_non_agu_define): Search in pred BBs added.
        (distance_agu_use): Search in succ BBs added.
        (IX86_LEA_PRIORITY): Value changed from 2 to 0.
        (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
        (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make
        decision.

        * config/i386/i386.md: Splits added to transform lea into a
        sequence of instructions.

Attachment: lea.diff (binary data)
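For readers following the INVALID_REGNUM nit in this message: a minimal standalone sketch, not code from the patch, of why a named sentinel is preferable to -1 stored in an unsigned variable. GCC's rtl.h defines INVALID_REGNUM as (~(unsigned int) 0); it is redefined locally here so the fragment compiles outside GCC. The helpers pick_regno and regno_is_valid are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Local stand-in so the sketch compiles outside GCC; GCC's rtl.h
   defines INVALID_REGNUM the same way.  */
#define INVALID_REGNUM (~(unsigned int) 0)

/* Hypothetical helper: yield the register number when the operand is
   a register, and the named sentinel otherwise.  */
static unsigned int
pick_regno (bool is_reg, unsigned int regno)
{
  return is_reg ? regno : INVALID_REGNUM;
}

/* Hypothetical helper: a comparison against the sentinel states the
   intent directly, where "regno != -1" relies on an implicit
   signed-to-unsigned conversion.  */
static bool
regno_is_valid (unsigned int regno)
{
  return regno != INVALID_REGNUM;
}
```

Both spellings produce the same bit pattern, since converting -1 to unsigned int yields the maximum value, so the change requested in review is about readability and consistency with the rest of the backend rather than behavior.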
Re: [PATCH, Atom] Improve AGU stalls avoidance optimization
On Tue, Sep 6, 2011 at 2:26 PM, Ilya Enkovich enkovich@gmail.com wrote:

> Is the fixed version OK?
>
> Thanks,
> Ilya
> ---
> gcc/
>
> 2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com
>
>         * config/i386/i386-protos.h (ix86_lea_outperforms): New.
>         (ix86_avoid_lea_for_add): Likewise.
>         (ix86_avoid_lea_for_addr): Likewise.
>         (ix86_split_lea_for_addr): Likewise.
>
>         * config/i386/i386.c (LEA_MAX_STALL): New.
>         (increase_distance): Likewise.
>         (insn_defines_reg): Likewise.
>         (insn_uses_reg_mem): Likewise.
>         (distance_non_agu_define_in_bb): Likewise.
>         (distance_agu_use_in_bb): Likewise.
>         (ix86_lea_outperforms): Likewise.
>         (ix86_ok_to_clobber_flags): Likewise.
>         (ix86_avoid_lea_for_add): Likewise.
>         (ix86_avoid_lea_for_addr): Likewise.
>         (ix86_split_lea_for_addr): Likewise.
>         (distance_non_agu_define): Search in pred BBs added.
>         (distance_agu_use): Search in succ BBs added.
>         (IX86_LEA_PRIORITY): Value changed from 2 to 0.
>         (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
>         (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make
>         decision.
>
>         * config/i386/i386.md: Splits added to transform lea into a
>         sequence of instructions.

Please merge your new splitters with corresponding LEA patterns.

OK with this change.

Thanks,
Uros.
Re: [PATCH, Atom] Improve AGU stalls avoidance optimization
2011/9/6 Uros Bizjak ubiz...@gmail.com:
> Please merge your new splitters with corresponding LEA patterns.
>
> OK with this change.
>
> Thanks,
> Uros.

Fixed. Could someone please check it in if it's OK now?

Thanks,
Ilya
---
gcc/

2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com

        * config/i386/i386-protos.h (ix86_lea_outperforms): New.
        (ix86_avoid_lea_for_add): Likewise.
        (ix86_avoid_lea_for_addr): Likewise.
        (ix86_split_lea_for_addr): Likewise.

        * config/i386/i386.c (LEA_MAX_STALL): New.
        (increase_distance): Likewise.
        (insn_defines_reg): Likewise.
        (insn_uses_reg_mem): Likewise.
        (distance_non_agu_define_in_bb): Likewise.
        (distance_agu_use_in_bb): Likewise.
        (ix86_lea_outperforms): Likewise.
        (ix86_ok_to_clobber_flags): Likewise.
        (ix86_avoid_lea_for_add): Likewise.
        (ix86_avoid_lea_for_addr): Likewise.
        (ix86_split_lea_for_addr): Likewise.
        (distance_non_agu_define): Search in pred BBs added.
        (distance_agu_use): Search in succ BBs added.
        (IX86_LEA_PRIORITY): Value changed from 2 to 0.
        (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
        (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make
        decision.

        * config/i386/i386.md: Split added to transform non
        destructive add into move and add.
        (lea_1): transformed into insn_and_split to avoid AGU stalls.
        (leamode_2): Likewise.

Attachment: lea.diff (binary data)
Re: [PATCH, Atom] Improve AGU stalls avoidance optimization
... Sent again, with correct Cc and subject line ...

Hello!

> Here is a patch which adds a few more splits for AGU stalls avoidance
> on Atom. It also fixes the cost model and detects AGU stalls more
> efficiently.
>
> Bootstrapped and checked on x86_64-linux.
>
> 2011-09-02  Enkovich Ilya  ilya.enkov...@intel.com
>
>         * config/i386/i386-protos.h (ix86_lea_outperforms): New.
>         (ix86_avoid_lea_for_add): Likewise.
>         (ix86_avoid_lea_for_addr): Likewise.
>         (ix86_split_lea_for_addr): Likewise.
>
>         * config/i386/i386.c (LEA_MAX_STALL): New.
>         (increase_distance): Likewise.
>         (insn_defines_reg): Likewise.
>         (insn_uses_reg_mem): Likewise.
>         (distance_non_agu_define_in_bb): Likewise.
>         (distance_agu_use_in_bb): Likewise.
>         (ix86_lea_outperforms): Likewise.
>         (ix86_ok_to_clobber_flags): Likewise.
>         (ix86_avoid_lea_for_add): Likewise.
>         (ix86_avoid_lea_for_addr): Likewise.
>         (ix86_split_lea_for_addr): Likewise.
>         (distance_non_agu_define): Search in pred BBs added.
>         (distance_agu_use): Search in succ BBs added.
>         (IX86_LEA_PRIORITY): Value changed from 2 to 0.
>         (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
>         (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make
>         decision.
>
>         * config/i386/i386.md: Splits added to transform lea into a
>         sequence of instructions.

Did you also test on x32? H.J.'s x32 page [1] currently says that
Atom LEA optimization is disabled on x32 for some reason.

The patch looks OK to me, with a few nits below.

[1] https://sites.google.com/site/x32abi/

> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h

> +bool
> +ix86_avoid_lea_for_addr (rtx insn, rtx operands[])
> +{
> +  unsigned int regno0 = true_regnum (operands[0]) ;
> +  unsigned int regno1 = -1;
> +  unsigned int regno2 = -1;

Use INVALID_REGNUM here.

> +extern void
> +ix86_split_lea_for_addr (rtx operands[], enum machine_mode mode)
> +{

Missing comment.

> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -5777,6 +5777,41 @@
>               (const_string "none")))
>     (set_attr "mode" "QI")])
>
> +;; Split non destructive adds if we cannot use lea.
> +(define_split
> +  [(set (match_operand:SWI48 0 "register_operand" "")
> +       (plus:SWI48 (match_operand:SWI48 1 "register_operand" "")
> +              (match_operand:SWI48 2 "nonmemory_operand" "")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "reload_completed && ix86_avoid_lea_for_add (insn, operands)"
> +  [(set (match_dup 0) (match_dup 1))
> +   (parallel [(set (match_dup 0) (plus:<MODE> (match_dup 0) (match_dup 2)))
> +             (clobber (reg:CC FLAGS_REG))])
> +  ]
> +)

Put all closing braces on one line:

             (clobber (reg:CC FLAGS_REG))])])

> +;; Split lea into one or more ALU instructions if profitable.
> +(define_split
> +  [(set (match_operand:SI 0 "register_operand" "")
> +       (subreg:SI (match_operand:DI 1 "lea_address_operand" "") 0))]
> +  "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
> +  [(const_int 0)]
> +{
> +  ix86_split_lea_for_addr (operands, SImode);
> +  DONE;
> +})

This is valid only for TARGET_64BIT.

Please note that x32 adds quite a few different LEA patterns (see
i386.md, line 5466+). I suggest you merge your splitters with these
define_insn patterns into define_insn_and_split, adding
"reload_completed && ix86_avoid_lea_for_addr (insn, operands)" as a
split condition.

Uros.
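As an illustration of the first splitter discussed in this review, here is a toy C model, not GCC RTL and not part of the patch, of what "split a non destructive add into move and add" means: a three-operand add (what a lea would compute) becomes a mov followed by a two-operand add. The register file type regs_t and the helpers emit_lea, emit_mov_add and split_is_safe are hypothetical names. The sketch assumes the destination differs from both sources, which is how "non destructive" is read here; if the destination aliased the second source, the mov would overwrite it before the add.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy register file; purely illustrative.  */
typedef struct { int64_t r[4]; } regs_t;

/* Three-operand form: dst = src1 + src2, sources left intact.
   This is the result a single lea would produce.  */
static void
emit_lea (regs_t *s, int dst, int src1, int src2)
{
  s->r[dst] = s->r[src1] + s->r[src2];
}

/* Assumption of this sketch: the split is only applied when dst is
   distinct from both sources.  */
static bool
split_is_safe (int dst, int src1, int src2)
{
  return dst != src1 && dst != src2;
}

/* Split form: mov dst, src1 followed by add dst, src2.  On real
   hardware the add also clobbers the flags register, which is why
   the pattern in the patch carries a FLAGS_REG clobber.  */
static void
emit_mov_add (regs_t *s, int dst, int src1, int src2)
{
  s->r[dst] = s->r[src1];   /* mov */
  s->r[dst] += s->r[src2];  /* add */
}
```

Under that assumption both forms leave the same value in the destination; which one is faster on Atom is the question the patch's cost model (ix86_lea_outperforms) is meant to answer, and this sketch does not model it.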