Re: [PATCH, Atom] Improve AGU stalls avoidance optimization

2011-09-08 Thread H.J. Lu
On Tue, Sep 6, 2011 at 10:54 AM, Ilya Enkovich enkovich@gmail.com wrote:
 2011/9/6 Uros Bizjak ubiz...@gmail.com:

 Please merge your new splitters with corresponding LEA patterns.

 OK with this change.

 Thanks,
 Uros.


 Fixed. Could someone please check it in if it's OK now?

 Thanks,
 Ilya
 ---
 gcc/

 2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com

        * config/i386/i386-protos.h (ix86_lea_outperforms): New.
        (ix86_avoid_lea_for_add): Likewise.
        (ix86_avoid_lea_for_addr): Likewise.
        (ix86_split_lea_for_addr): Likewise.

        * config/i386/i386.c (LEA_MAX_STALL): New.
        (increase_distance): Likewise.
        (insn_defines_reg): Likewise.
        (insn_uses_reg_mem): Likewise.
        (distance_non_agu_define_in_bb): Likewise.
        (distance_agu_use_in_bb): Likewise.
        (ix86_lea_outperforms): Likewise.
        (ix86_ok_to_clobber_flags): Likewise.
        (ix86_avoid_lea_for_add): Likewise.
        (ix86_avoid_lea_for_addr): Likewise.
        (ix86_split_lea_for_addr): Likewise.
        (distance_non_agu_define): Search in pred BBs added.
        (distance_agu_use): Search in succ BBs added.
        (IX86_LEA_PRIORITY): Value changed from 2 to 0.
        (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
        (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision.

        * config/i386/i386.md: Split added to transform non destructive
        add into move and add.
        (lea_1): transformed into insn_and_split to avoid AGU stalls.
        (lea<mode>_2): Likewise.


I checked it into trunk for you.

Thanks.

-- 
H.J.


Re: [PATCH, Atom] Improve AGU stalls avoidance optimization

2011-09-06 Thread Ilya Enkovich
Hello,

Thanks for review!

2011/9/3 Uros Bizjak ubiz...@gmail.com:
 Did you also test on x32? H.J.'s x32 page [1] currently says that
 Atom LEA optimization is disabled on x32 for some reason.
No, I did not try to cover x32; that will be separate work.

 +bool
 +ix86_avoid_lea_for_addr (rtx insn, rtx operands[])
 +{
 +  unsigned int regno0 = true_regnum (operands[0]) ;
 +  unsigned int regno1 = -1;
 +  unsigned int regno2 = -1;

 Use INVALID_REGNUM here.
Fixed. Also used INVALID_REGNUM in other places where -1 was used as an
invalid register number.


 +extern void
 +ix86_split_lea_for_addr (rtx operands[], enum machine_mode mode)
 +{

 Missing comment.
Fixed.

 +;; Split non destructive adds if we cannot use lea.
 +(define_split
 +  [(set (match_operand:SWI48 0 "register_operand" "")
 +       (plus:SWI48 (match_operand:SWI48 1 "register_operand" "")
 +              (match_operand:SWI48 2 "nonmemory_operand" "")))
 +   (clobber (reg:CC FLAGS_REG))]
 +  "reload_completed && ix86_avoid_lea_for_add (insn, operands)"
 +  [(set (match_dup 0) (match_dup 1))
 +   (parallel [(set (match_dup 0) (plus:<MODE> (match_dup 0) (match_dup 2)))
 +             (clobber (reg:CC FLAGS_REG))])
 +  ]
 +)

 Put all closing braces on one line:
Fixed.

 +;; Split lea into one or more ALU instructions if profitable.
 +(define_split
 +  [(set (match_operand:SI 0 "register_operand" "")
 +       (subreg:SI (match_operand:DI 1 "lea_address_operand" "") 0))]
 +  "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
 +  [(const_int 0)]
 +{
 +  ix86_split_lea_for_addr (operands, SImode);
 +  DONE;
 +})

 This is valid only for TARGET_64BIT.
Fixed.

 Please note that x32 adds quite a few different LEA patterns (see
 i386.md, line 5466+). I suggest you merge your splitters with these
 define_insn patterns into define_insn_and_split, adding
 "reload_completed && ix86_avoid_lea_for_addr (insn, operands)" as a
 split condition.
Thanks for the note. I'll look at the new patterns when we enable lea
optimization for x32.


 Uros.


Is the fixed version OK?

Thanks,
Ilya
---
gcc/

2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com

* config/i386/i386-protos.h (ix86_lea_outperforms): New.
(ix86_avoid_lea_for_add): Likewise.
(ix86_avoid_lea_for_addr): Likewise.
(ix86_split_lea_for_addr): Likewise.

* config/i386/i386.c (LEA_MAX_STALL): New.
(increase_distance): Likewise.
(insn_defines_reg): Likewise.
(insn_uses_reg_mem): Likewise.
(distance_non_agu_define_in_bb): Likewise.
(distance_agu_use_in_bb): Likewise.
(ix86_lea_outperforms): Likewise.
(ix86_ok_to_clobber_flags): Likewise.
(ix86_avoid_lea_for_add): Likewise.
(ix86_avoid_lea_for_addr): Likewise.
(ix86_split_lea_for_addr): Likewise.
(distance_non_agu_define): Search in pred BBs added.
(distance_agu_use): Search in succ BBs added.
(IX86_LEA_PRIORITY): Value changed from 2 to 0.
(LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
(ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision.

* config/i386/i386.md: Splits added to transform lea into a
sequence of instructions.


lea.diff
Description: Binary data


Re: [PATCH, Atom] Improve AGU stalls avoidance optimization

2011-09-06 Thread Uros Bizjak
On Tue, Sep 6, 2011 at 2:26 PM, Ilya Enkovich enkovich@gmail.com wrote:


 Is the fixed version OK?

 Thanks,
 Ilya
 ---
 gcc/

 2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com

        * config/i386/i386-protos.h (ix86_lea_outperforms): New.
        (ix86_avoid_lea_for_add): Likewise.
        (ix86_avoid_lea_for_addr): Likewise.
        (ix86_split_lea_for_addr): Likewise.

        * config/i386/i386.c (LEA_MAX_STALL): New.
        (increase_distance): Likewise.
        (insn_defines_reg): Likewise.
        (insn_uses_reg_mem): Likewise.
        (distance_non_agu_define_in_bb): Likewise.
        (distance_agu_use_in_bb): Likewise.
        (ix86_lea_outperforms): Likewise.
        (ix86_ok_to_clobber_flags): Likewise.
        (ix86_avoid_lea_for_add): Likewise.
        (ix86_avoid_lea_for_addr): Likewise.
        (ix86_split_lea_for_addr): Likewise.
        (distance_non_agu_define): Search in pred BBs added.
        (distance_agu_use): Search in succ BBs added.
        (IX86_LEA_PRIORITY): Value changed from 2 to 0.
        (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
        (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision.

        * config/i386/i386.md: Splits added to transform lea into a
        sequence of instructions.

Please merge your new splitters with corresponding LEA patterns.

OK with this change.

Thanks,
Uros.


Re: [PATCH, Atom] Improve AGU stalls avoidance optimization

2011-09-06 Thread Ilya Enkovich
2011/9/6 Uros Bizjak ubiz...@gmail.com:

 Please merge your new splitters with corresponding LEA patterns.

 OK with this change.

 Thanks,
 Uros.


Fixed. Could someone please check it in if it's OK now?

Thanks,
Ilya
---
gcc/

2011-09-06  Enkovich Ilya  ilya.enkov...@intel.com

* config/i386/i386-protos.h (ix86_lea_outperforms): New.
(ix86_avoid_lea_for_add): Likewise.
(ix86_avoid_lea_for_addr): Likewise.
(ix86_split_lea_for_addr): Likewise.

* config/i386/i386.c (LEA_MAX_STALL): New.
(increase_distance): Likewise.
(insn_defines_reg): Likewise.
(insn_uses_reg_mem): Likewise.
(distance_non_agu_define_in_bb): Likewise.
(distance_agu_use_in_bb): Likewise.
(ix86_lea_outperforms): Likewise.
(ix86_ok_to_clobber_flags): Likewise.
(ix86_avoid_lea_for_add): Likewise.
(ix86_avoid_lea_for_addr): Likewise.
(ix86_split_lea_for_addr): Likewise.
(distance_non_agu_define): Search in pred BBs added.
(distance_agu_use): Search in succ BBs added.
(IX86_LEA_PRIORITY): Value changed from 2 to 0.
(LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
(ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision.

* config/i386/i386.md: Split added to transform non destructive
add into move and add.
(lea_1): transformed into insn_and_split to avoid AGU stalls.
(lea<mode>_2): Likewise.


lea.diff
Description: Binary data


Re: [PATCH, Atom] Improve AGU stalls avoidance optimization

2011-09-03 Thread Uros Bizjak
... Sent again, with correct Cc and subject line ...

Hello!

 Here is a patch which adds a few more splits for AGU stall avoidance on
 Atom. It also fixes the cost model and detects AGU stalls more
 efficiently.

 Bootstrapped and checked on x86_64-linux.

 2011-09-02  Enkovich Ilya  ilya.enkov...@intel.com

       * config/i386/i386-protos.h (ix86_lea_outperforms): New.
       (ix86_avoid_lea_for_add): Likewise.
       (ix86_avoid_lea_for_addr): Likewise.
       (ix86_split_lea_for_addr): Likewise.

       * config/i386/i386.c (LEA_MAX_STALL): New.
       (increase_distance): Likewise.
       (insn_defines_reg): Likewise.
       (insn_uses_reg_mem): Likewise.
       (distance_non_agu_define_in_bb): Likewise.
       (distance_agu_use_in_bb): Likewise.
       (ix86_lea_outperforms): Likewise.
       (ix86_ok_to_clobber_flags): Likewise.
       (ix86_avoid_lea_for_add): Likewise.
       (ix86_avoid_lea_for_addr): Likewise.
       (ix86_split_lea_for_addr): Likewise.
       (distance_non_agu_define): Search in pred BBs added.
       (distance_agu_use): Search in succ BBs added.
       (IX86_LEA_PRIORITY): Value changed from 2 to 0.
       (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
       (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision.

       * config/i386/i386.md: Splits added to transform lea into a
       sequence of instructions.

Did you also test on x32? H.J.'s x32 page [1] currently says that
Atom LEA optimization is disabled on x32 for some reason.

The patch looks OK to me, with a few nits below.

[1] https://sites.google.com/site/x32abi/

--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h

+bool
+ix86_avoid_lea_for_addr (rtx insn, rtx operands[])
+{
+  unsigned int regno0 = true_regnum (operands[0]) ;
+  unsigned int regno1 = -1;
+  unsigned int regno2 = -1;

Use INVALID_REGNUM here.

+extern void
+ix86_split_lea_for_addr (rtx operands[], enum machine_mode mode)
+{

Missing comment.

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -5777,6 +5777,41 @@
        (const_string "none")))
   (set_attr "mode" "QI")])

+;; Split non destructive adds if we cannot use lea.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand" "")
+       (plus:SWI48 (match_operand:SWI48 1 "register_operand" "")
+              (match_operand:SWI48 2 "nonmemory_operand" "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "reload_completed && ix86_avoid_lea_for_add (insn, operands)"
+  [(set (match_dup 0) (match_dup 1))
+   (parallel [(set (match_dup 0) (plus:<MODE> (match_dup 0) (match_dup 2)))
+             (clobber (reg:CC FLAGS_REG))])
+  ]
+)

Put all closing braces on one line:

             (clobber (reg:CC FLAGS_REG))])])
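As background on why this split helps: a sketch of the transformation at
the assembly level (registers and the constant are hypothetical; exact
behavior depends on the surrounding instruction schedule):

```asm
; Before: a non-destructive add (b = a + 4) emitted as a single LEA.
; On Atom, LEA executes in the AGU; if its input was just produced by
; an ALU instruction, the AGU can stall waiting for the value.
        lea     0x4(%rax), %rbx

; After the split: a register move plus a destructive add. Both run on
; the ALU, avoiding the stall. The add clobbers FLAGS, which is why the
; pattern above carries (clobber (reg:CC FLAGS_REG)).
        mov     %rax, %rbx
        add     $0x4, %rbx
```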

+;; Split lea into one or more ALU instructions if profitable.
+(define_split
+  [(set (match_operand:SI 0 "register_operand" "")
+       (subreg:SI (match_operand:DI 1 "lea_address_operand" "") 0))]
+  "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
+  [(const_int 0)]
+{
+  ix86_split_lea_for_addr (operands, SImode);
+  DONE;
+})

This is valid only for TARGET_64BIT.

Please note that x32 adds quite a few different LEA patterns (see
i386.md, line 5466+). I suggest you merge your splitters with these
define_insn patterns into define_insn_and_split, adding
"reload_completed && ix86_avoid_lea_for_addr (insn, operands)" as a
split condition.
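
A rough sketch of the suggested merged shape, assembled from the
snippets quoted in this thread (the pattern name, operand predicates,
constraints, and output template here are illustrative assumptions, not
taken from the actual patch or from i386.md):

```lisp
;; Hypothetical merge of a LEA define_insn with the splitter: emit LEA
;; as before, but let the post-reload split pass break it up whenever
;; ix86_avoid_lea_for_addr decides the LEA would stall the AGU.
(define_insn_and_split "*lea<mode>_sketch"
  [(set (match_operand:SWI48 0 "register_operand" "=r")
	(match_operand:SWI48 1 "lea_address_operand" "p"))]
  ""
  "lea{<imodesuffix>}\t{%a1, %0|%0, %a1}"
  "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
  [(const_int 0)]
{
  ix86_split_lea_for_addr (operands, <MODE>mode);
  DONE;
})
```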

Uros.