Hello!

Apparently, reload does not like matching register alternatives in
*avx_unpcklpd256. Attached patch removes this problematic alternative
and generates vunpcklpd insn instead, since the later instruction has
the same length, latency, uops, etc  as vmovddup while giving much
more freedom to reload.

I would kindly ask someone fluent in fortran to construct a
compile-time testcase from the PR [1], that will correctly exercise
the fix on x86 with "-O3 -mavx" options and immortalize the testcase
in gfortran testsuite.

2011-10-27  Uros Bizjak  <ubiz...@gmail.com>

        PR target/50875
        * config/i386/sse.md (*avx2_unpcklpd256): Remove extra insn
        constraints.  Change alternative 1 to "x,m,1".

Patch was bootstrapped and regresion tested on
x86_64-pc-linux-gnu{,-m32} with --with-fpmath=avx configure option.
Also, the fortran testcase from the PR was checked manually that it
compiles and works OK.

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50875

Thanks,
Uros.
Index: ChangeLog
===================================================================
--- ChangeLog   (revision 180528)
+++ ChangeLog   (working copy)
@@ -1,5 +1,9 @@
 2011-10-26  Eric Botcazou  <ebotca...@adacore.com>
 
+<<<<<<< .mine
+       * input.c (expand_location): Rewrite using linemap_resolve_location
+       and linemap_expand_location.  Add a comment.
+=======
        * reload.c (reload_inner_reg_of_subreg): Change type of return value
        and type of OUTPUT parameter to bool and adjust.  Document MODE and
        OUTPUT parameters.  Use HARD_REGISTER_P.  Reorder final condition
@@ -197,6 +201,7 @@
        * input.c (expand_location): Rewrite using
        linemap_resolve_location and linemap_expand_location.  Add a
        comment.
+>>>>>>> .r180528
 
 2011-10-25  Jakub Jelinek  <ja...@redhat.com>
 
@@ -315,7 +320,7 @@
 2011-10-24  Julian Brown  <jul...@codesourcery.com>
 
        * config/m68k/m68k.c (notice_update_cc): Tighten condition for
-       setting CC_REVERSED for FP comparisons.  
+       setting CC_REVERSED for FP comparisons.
 
 2011-10-24  Richard Guenther  <rguent...@suse.de>
 
@@ -368,14 +373,12 @@
        float and integer regs.
        (sparc_register_move_cost): Adjust to account for VIS3 moves.
        (sparc_preferred_reload_class): On 32-bit with VIS3 when moving an
-       integer reg to a class containing EXTRA_FP_REGS, constrain to
-       FP_REGS.
+       integer reg to a class containing EXTRA_FP_REGS, constrain to FP_REGS.
        (sparc_secondary_reload): On 32-bit with VIS3 when moving between
        float and integer regs we sometimes need a FP_REGS class
        intermediate move to satisfy the reload.  When this happens
        specify an extra cost of 2.
-       (*movsi_insn): Rename to have "_novis3" suffix and add !VIS3
-       guard.
+       (*movsi_insn): Rename to have "_novis3" suffix and add !VIS3 guard.
        (*movdi_insn_sp32_v9): Likewise.
        (*movdi_insn_sp64): Likewise.
        (*movsf_insn): Likewise.
@@ -399,10 +402,9 @@
        (*mov<VM32:mode>_insn_vis3): New insn.
        (*mov<VM64:mode>_insn_sp64_vis3): New insn.
        (*mov<VM64:mode>_insn_sp32_vis3): New insn.
-       (VM64 reg<-->reg split): New spliiter for 32-bit.
+       (VM64 reg<-->reg split): New splitter for 32-bit.
 
-       * config/sparc/sparc.c (sparc_split_regreg_legitimate): New
-       function.
+       * config/sparc/sparc.c (sparc_split_regreg_legitimate): New function.
        * config/sparc/sparc-protos.h (sparc_split_regreg_legitimate):
        Declare it.
        * config/sparc/sparc.md (DImode reg/reg split): Use it.
@@ -478,22 +480,23 @@
 2011-10-23  Tom de Vries  <t...@codesourcery.com>
 
        PR tree-optimization/50763
-       * tree-ssa-tail-merge.c (same_succ_flush_bb): New function, factored out
-       of ...
+       * tree-ssa-tail-merge.c (same_succ_flush_bb): New function, factored
+       out of ...
        (same_succ_flush_bbs): Use same_succ_flush_bb.
        (purge_bbs): Remove argument.  Remove calls to same_succ_flush_bbs,
        release_last_vdef and delete_basic_block.
        (unlink_virtual_phi): New function.
        (update_vuses): Add and use vuse1_phi_args argument.  Set var to
-       SSA_NAME_VAR of vuse1 or vuse2, and use var.  Handle case that def_stmt2
-       is NULL.  Use phi result as phi arg in case vuse1 or vuse2 is NULL_TREE.
-       Replace uses of vuse1 if vuse2 is NULL_TREE.  Fix code to limit
-       replacement of uses.  Propagate phi argument for phis with a single
-       argument.
+       SSA_NAME_VAR of vuse1 or vuse2, and use var.  Handle case that
+       def_stmt2 is NULL.  Use phi result as phi arg in case vuse1 or vuse2
+       is NULL_TREE.  Replace uses of vuse1 if vuse2 is NULL_TREE.  Fix code
+       to limit replacement of uses.  Propagate phi argument for phis with a
+       single argument.
        (replace_block_by): Update vops if phi_vuse1 or phi_vuse2 is NULL_TREE.
-       Set vuse1_phi_args if vuse1 is a phi defined in bb1.  Add vuse1_phi_args
-       as argument to call to update_vuses.  Call release_last_vdef,
-       same_succ_flush_bb, delete_basic_block.  Update CDI_DOMINATORS info.
+       Set vuse1_phi_args if vuse1 is a phi defined in bb1.  Add
+       vuse1_phi_args as argument to call to update_vuses.  Call
+       release_last_vdef, same_succ_flush_bb, delete_basic_block.  Update
+       CDI_DOMINATORS info.
        (tail_merge_optimize): Remove argument in call to purge_bbs.  Remove
        call to free_dominance_info.  Only call calculate_dominance_info once.
 
Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md  (revision 180546)
+++ config/i386/sse.md  (working copy)
@@ -4231,15 +4231,14 @@
   [(set (match_operand:V4DF 0 "register_operand"         "=x,x")
        (vec_select:V4DF
          (vec_concat:V8DF
-           (match_operand:V4DF 1 "nonimmediate_operand" "xm,x")
-           (match_operand:V4DF 2 "nonimmediate_operand" " 1,xm"))
+           (match_operand:V4DF 1 "nonimmediate_operand" " x,m")
+           (match_operand:V4DF 2 "nonimmediate_operand" "xm,1"))
          (parallel [(const_int 0) (const_int 4)
                     (const_int 2) (const_int 6)])))]
-  "TARGET_AVX
-   && (!MEM_P (operands[1]) || rtx_equal_p (operands[1], operands[2]))"
+  "TARGET_AVX"
   "@
-   vmovddup\t{%1, %0|%0, %1}
-   vunpcklpd\t{%2, %1, %0|%0, %1, %2}"
+   vunpcklpd\t{%2, %1, %0|%0, %1, %2}
+   vmovddup\t{%1, %0|%0, %1}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V4DF")])

Reply via email to