Re: [patch] Support vectorization of widening shifts

2011-10-18 Thread Jakub Jelinek
On Tue, Oct 18, 2011 at 11:39:22AM +0200, Ira Rosen wrote:
 On 2 October 2011 10:30, Ira Rosen ira.ro...@linaro.org wrote:
  On 29 September 2011 17:30, Ramana Radhakrishnan
  ramana.radhakrish...@linaro.org wrote:
  On 19 September 2011 08:54, Ira Rosen ira.ro...@linaro.org wrote:
 
 
  Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
  and arm-linux-gnueabi
  OK for mainline?
 
  Sorry I missed this patch. Is there any reason why we need unspecs in
  this case? Can't this be represented by subregs and zero/sign
  extensions in RTL without the UNSPECs?
 
 I committed the attached patch with Ramana's solution for testing

 +/* Detect widening shift pattern:
  
 +   type a_t;
 +   TYPE a_T, res_T;
 +
 +   S1 a_t = ;
 +   S2 a_T = (TYPE) a_t;
 +   S3 res_T = a_T << CONST;
 +
 +  where type 'TYPE' is at least double the size of type 'type'.
 +
 +  Also detect unsgigned cases:

unsigned

Jakub


Re: [patch] Support vectorization of widening shifts

2011-10-02 Thread Ira Rosen
On 29 September 2011 17:30, Ramana Radhakrishnan
ramana.radhakrish...@linaro.org wrote:
 On 19 September 2011 08:54, Ira Rosen ira.ro...@linaro.org wrote:


 Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
 and arm-linux-gnueabi
 OK for mainline?

 Sorry I missed this patch. Is there any reason why we need unspecs in
 this case? Can't this be represented by subregs and zero/sign
 extensions in RTL without the UNSPECs?

Like this:

Index: config/arm/neon.md
===
--- config/arm/neon.md  (revision 178942)
+++ config/arm/neon.md  (working copy)
@@ -5550,6 +5550,46 @@
  }
 )

+(define_insn "neon_vec_<US>shiftl_<mode>"
+ [(set (match_operand:<V_widen> 0 "register_operand" "=w")
+   (SE:<V_widen> (match_operand:VW 1 "register_operand" "w")))
+   (match_operand:SI 2 "immediate_operand" "i")]
+  "TARGET_NEON"
+{
+  /* The boundaries are: 0 < imm <= size.  */
+  neon_const_bounds (operands[2], 0, neon_element_bits (<MODE>mode) + 1);
+  return "vshll.<US><V_sz_elem> %q0, %P1, %2";
+}
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_widen_<US>shiftl_lo_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
+ {
+  emit_insn (gen_neon_vec_<US>shiftl_<V_half> (operands[0],
+   simplify_gen_subreg (<V_HALF>mode, operands[1], <MODE>mode, 0),
+   operands[2]));
+   DONE;
+ }
+)
+
+(define_expand "vec_widen_<US>shiftl_hi_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
+ {
+  emit_insn (gen_neon_vec_<US>shiftl_<V_half> (operands[0],
+simplify_gen_subreg (<V_HALF>mode, operands[1], <MODE>mode,
+GET_MODE_SIZE (<V_HALF>mode)),
+operands[2]));
+   DONE;
+ }
+)
+
 ;; Vectorize for non-neon-quad case
 (define_insn "neon_unpack<US>_<mode>"
  [(set (match_operand:<V_widen> 0 "register_operand" "=w")
@@ -5626,6 +5666,34 @@
  }
 )

+(define_expand "vec_widen_<US>shiftl_hi_<mode>"
+ [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON"
+ {
+   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+   emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1], operands[2]));
+   emit_insn (gen_neon_vget_high<V_widen_l> (operands[0], tmpreg));
+
+   DONE;
+ }
+)
+
+(define_expand "vec_widen_<US>shiftl_lo_<mode>"
+  [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON"
+ {
+   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+   emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1], operands[2]));
+   emit_insn (gen_neon_vget_low<V_widen_l> (operands[0], tmpreg));
+
+   DONE;
+ }
+)
+
 ; FIXME: These instruction patterns can't be used safely in big-endian mode
 ; because the ordering of vector elements in Q registers is different from what
 ; the semantics of the instructions require.

?
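
The lo/hi expanders for the quad-register case differ only in the byte
offset of the half vector they select (0 for lo, the size of the half
vector for hi). A rough scalar model of that selection in C, assuming a
little-endian lane layout (the patterns are guarded by !BYTES_BIG_ENDIAN)
and unsigned 8-bit elements; widen_ushiftl_half, LANES and HALF are
names made up for this sketch and are not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 16, HALF = LANES / 2 };

/* Widen and shift the eight 8-bit lanes that start at byte OFFSET of a
   16-byte input.  OFFSET 0 stands in for the "lo" expander, OFFSET HALF
   (the size of the half vector) for the "hi" expander.  */
void
widen_ushiftl_half (const uint8_t in[LANES], size_t offset,
                    uint16_t out[HALF], unsigned shift)
{
  uint8_t half[HALF];

  assert (offset == 0 || offset == HALF);
  /* Same bound as the comment in the insn: 0 < imm <= size.  */
  assert (shift > 0 && shift <= 8);

  /* Plays the role of taking a subreg of the wide input.  */
  memcpy (half, in + offset, HALF);

  for (int i = 0; i < HALF; i++)
    /* Zero-extend to 16 bits, then shift; the result always fits.  */
    out[i] = (uint16_t) (half[i] << shift);
}
```

Calling it once with offset 0 and once with offset HALF yields the two
result vectors that the lo and hi patterns would produce between them.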

Thanks,
Ira



 cheers
 Ramana


 Thanks,
 Ira


Re: [patch] Support vectorization of widening shifts

2011-09-29 Thread Ramana Radhakrishnan
On 19 September 2011 08:54, Ira Rosen ira.ro...@linaro.org wrote:


 Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
 and arm-linux-gnueabi
 OK for mainline?

Sorry I missed this patch. Is there any reason why we need unspecs in
this case? Can't this be represented by subregs and zero/sign
extensions in RTL without the UNSPECs?

cheers
Ramana


 Thanks,
 Ira

 ChangeLog:

        * doc/md.texi (vec_widen_ushiftl_hi, vec_widen_ushiftl_lo,
        vec_widen_sshiftl_hi, vec_widen_sshiftl_lo): Document.
        * tree-pretty-print.c (dump_generic_node): Handle
        WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR and
        VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
        (op_code_prio): Likewise.
        (op_symbol_code): Handle WIDEN_SHIFT_LEFT_EXPR.
        * optabs.c (optab_for_tree_code): Handle
        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
        (init_optabs): Initialize optab codes for vec_widen_u/sshiftl_hi/lo.
        * optabs.h (enum optab_index): Add OTI_vec_widen_u/sshiftl_hi/lo.
        * genopinit.c (optabs): Initialize the new optabs.
        * expr.c (expand_expr_real_2): Handle
        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
        * gimple-pretty-print.c (dump_binary_rhs): Likewise.
        * tree-vectorizer.h (NUM_PATTERNS): Increase to 6.
        * tree.def (WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR,
        VEC_WIDEN_SHIFT_LEFT_LO_EXPR): New.
        * cfgexpand.c (expand_debug_expr):  Handle new tree codes.
        * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add
        vect_recog_widen_shift_pattern.
        (vect_handle_widen_mult_by_const): Rename...
        (vect_handle_widen_op_by_const): ...to this.  Handle shifts.
        Add a new argument, update documentation.
        (vect_recog_widen_mult_pattern): Assume that only second
        operand can be constant.  Update call to
        vect_handle_widen_op_by_const.
        (vect_operation_fits_smaller_type): Add the already existing
        def stmt to the list of pattern statements.
        (vect_recog_widen_shift_pattern): New.
        * tree-vect-stmts.c (vectorizable_type_promotion): Handle
        widening shifts.
        (supportable_widening_operation): Likewise.
        * tree-inline.c (estimate_operator_cost): Handle new tree codes.
        * tree-vect-generic.c (expand_vector_operations_1): Likewise.
        * tree-cfg.c (verify_gimple_assign_binary): Likewise.
        * config/arm/neon.md (neon_vec_USshiftl_lo_mode): New.
        (vec_widen_USshiftl_lo_mode, neon_vec_USshiftl_hi_mode,
        vec_widen_USshiftl_hi_mode, neon_vec_USshift_left_mode):
        Likewise.
        * tree-vect-slp.c (vect_build_slp_tree): Require same shift operand
        for widening shift.

 testsuite/ChangeLog:

       * gcc.dg/vect/vect-widen-shift-s16.c: New.
       * gcc.dg/vect/vect-widen-shift-s8.c: New.
       * gcc.dg/vect/vect-widen-shift-u16.c: New.
       * gcc.dg/vect/vect-widen-shift-u8.c: New.



Re: [patch] Support vectorization of widening shifts

2011-09-27 Thread Ira Rosen
On 26 September 2011 17:12, Richard Guenther richard.guent...@gmail.com wrote:
 On Mon, Sep 19, 2011 at 9:54 AM, Ira Rosen ira.ro...@linaro.org wrote:
 Hi,

 This patch adds support for widening shift left. The following
 pattern is detected:

 type a_t;
 TYPE a_T, res_T;

 a_t = ;
 a_T = (TYPE) a_t;
 res_T = a_T << CONST;

 ('TYPE' is at least 2 times bigger than 'type', and CONST is at most
 the size of 'type')

 and a pattern stmt using the new tree code WIDEN_SHIFT_LEFT_EXPR is created for it:

 a_t = ;
 a_T = (TYPE) a_t;
 res_T = a_T << CONST;
    -->  res_T = a_t w<< CONST;

 which is later transformed into:

 va_t = ;
 vres_T0 = WIDEN_SHIFT_LEFT_LO_EXPR <va_t, CONST>;
 vres_T1 = WIDEN_SHIFT_LEFT_HI_EXPR <va_t, CONST>;

 This patch also supports unsigned types, and cases when 'TYPE' is 4
 times bigger than 'type'.
 This feature is similar to vectorization of widening multiplication.

 Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
 and arm-linux-gnueabi
 OK for mainline?

 Hmm, it doesn't look like arm has real widening shift instructions.

It does: vshll. The implementation may look awkward because we don't
support multiple vector sizes in the same operation (vshll takes a
64-bit vector and produces a 128-bit vector), but the resulting code
is just the instruction itself.
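
A minimal scalar sketch of what such an instruction computes, assuming
the signed 8-bit to 16-bit case: eight narrow lanes (64 bits) go in and
eight lanes of twice the width (128 bits) come out. The function name
widen_shift_left_s8 is hypothetical, and the shift bound mirrors the
"0 < imm <= size" comment from the patch rather than any particular
instruction encoding:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a signed widening shift left on eight 8-bit lanes:
   each lane is sign-extended to 16 bits and then shifted left.  */
void
widen_shift_left_s8 (const int8_t in[8], int16_t out[8], unsigned shift)
{
  assert (shift > 0 && shift <= 8);

  for (int i = 0; i < 8; i++)
    /* Multiply by a power of two instead of using << so that negative
       lanes stay well defined in C; the product always fits in 16 bits
       for shifts up to 8.  */
    out[i] = (int16_t) (in[i] * (1 << shift));
}
```

Because the input is widened before the shift, no bits are lost even
when the shift amount equals the narrow element size.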

 So why not split this into the widening, shift parts in the vectorizer?

What do you mean? (We of course already support widening first and
then shifting the widened value).
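
For reference, a source loop of the widen-then-shift shape discussed
here could look like the following sketch. It uses the unsigned 8-bit
to 16-bit case; N, SHIFT and widen_shift_loop are illustrative names
and values, not taken from the patch or its testsuite, and whether a
given compiler turns the loop into a widening shift depends on the
target and options:

```c
#include <assert.h>
#include <stdint.h>

enum { N = 64, SHIFT = 7 };

/* Each iteration loads a narrow value (S1), converts it to a type of
   twice the width (S2), and shifts the wide value left by a constant
   (S3).  */
void
widen_shift_loop (const uint8_t *a, uint16_t *res)
{
  /* The constant must not exceed the size of the narrow type.  */
  assert (SHIFT > 0 && SHIFT <= 8);

  for (int i = 0; i < N; i++)
    {
      uint8_t a_t = a[i];                  /* S1 */
      uint16_t a_T = (uint16_t) a_t;       /* S2 */
      res[i] = (uint16_t) (a_T << SHIFT);  /* S3 */
    }
}
```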

Thanks,
Ira

 That
 way you wouldn't need new tree codes and all architectures that can
 do widening conversions would benefit?

 Thanks,
 Richard.



Re: [patch] Support vectorization of widening shifts

2011-09-27 Thread Richard Guenther
On Tue, Sep 27, 2011 at 8:32 AM, Ira Rosen ira.ro...@linaro.org wrote:
 On 26 September 2011 17:12, Richard Guenther richard.guent...@gmail.com 
 wrote:
 On Mon, Sep 19, 2011 at 9:54 AM, Ira Rosen ira.ro...@linaro.org wrote:
 Hi,

 This patch adds support for widening shift left. The following
 pattern is detected:

 type a_t;
 TYPE a_T, res_T;

 a_t = ;
 a_T = (TYPE) a_t;
 res_T = a_T << CONST;

 ('TYPE' is at least 2 times bigger than 'type', and CONST is at most
 the size of 'type')

 and a pattern stmt using the new tree code WIDEN_SHIFT_LEFT_EXPR is created for it:

 a_t = ;
 a_T = (TYPE) a_t;
 res_T = a_T << CONST;
    -->  res_T = a_t w<< CONST;

 which is later transformed into:

 va_t = ;
 vres_T0 = WIDEN_SHIFT_LEFT_LO_EXPR <va_t, CONST>;
 vres_T1 = WIDEN_SHIFT_LEFT_HI_EXPR <va_t, CONST>;

 This patch also supports unsigned types, and cases when 'TYPE' is 4
 times bigger than 'type'.
 This feature is similar to vectorization of widening multiplication.

 Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
 and arm-linux-gnueabi
 OK for mainline?

 Hmm, it doesn't look like arm has real widening shift instructions.

 It does: vshll. The implementation may look awkward because we don't
 support multiple vector sizes in the same operation (vshll takes a
 64-bit vector and produces a 128-bit vector), but the resulting code
 is just the instruction itself.

Ah, ok.  Can you please do s/SHIFT_LEFT/LSHIFT/ on the new tree
code names for consistency with LSHIFT_EXPR?  Also please implement
some gimple verification for the new codes instead of sticking them
to the /* FIXME */ case in tree-cfg.c.

 So why not split this into the widening, shift parts in the vectorizer?

 What do you mean? (We of course already support widening first and
 then shifting the widened value).

Of course.

Ok with the above changes.

Thanks,
Richard.

 Thanks,
 Ira

 That
 way you wouldn't need new tree codes and all architectures that can
 do widening conversions would benefit?

 Thanks,
 Richard.




Re: [patch] Support vectorization of widening shifts

2011-09-26 Thread Richard Guenther
On Mon, Sep 19, 2011 at 9:54 AM, Ira Rosen ira.ro...@linaro.org wrote:
 Hi,

 This patch adds support for widening shift left. The following
 pattern is detected:

 type a_t;
 TYPE a_T, res_T;

 a_t = ;
 a_T = (TYPE) a_t;
 res_T = a_T << CONST;

 ('TYPE' is at least 2 times bigger than 'type', and CONST is at most
 the size of 'type')

 and a pattern stmt using the new tree code WIDEN_SHIFT_LEFT_EXPR is created for it:

 a_t = ;
 a_T = (TYPE) a_t;
 res_T = a_T << CONST;
    -->  res_T = a_t w<< CONST;

 which is later transformed into:

 va_t = ;
 vres_T0 = WIDEN_SHIFT_LEFT_LO_EXPR <va_t, CONST>;
 vres_T1 = WIDEN_SHIFT_LEFT_HI_EXPR <va_t, CONST>;

 This patch also supports unsigned types, and cases when 'TYPE' is 4
 times bigger than 'type'.
 This feature is similar to vectorization of widening multiplication.

 Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
 and arm-linux-gnueabi
 OK for mainline?

Hmm, it doesn't look like arm has real widening shift instructions.  So
why not split this into the widening, shift parts in the vectorizer?  That
way you wouldn't need new tree codes and all architectures that can
do widening conversions would benefit?

Thanks,
Richard.

 Thanks,
 Ira

 ChangeLog:

        * doc/md.texi (vec_widen_ushiftl_hi, vec_widen_ushiftl_lo,
        vec_widen_sshiftl_hi, vec_widen_sshiftl_lo): Document.
        * tree-pretty-print.c (dump_generic_node): Handle
        WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR and
        VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
        (op_code_prio): Likewise.
        (op_symbol_code): Handle WIDEN_SHIFT_LEFT_EXPR.
        * optabs.c (optab_for_tree_code): Handle
        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
        (init_optabs): Initialize optab codes for vec_widen_u/sshiftl_hi/lo.
        * optabs.h (enum optab_index): Add OTI_vec_widen_u/sshiftl_hi/lo.
        * genopinit.c (optabs): Initialize the new optabs.
        * expr.c (expand_expr_real_2): Handle
        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
        * gimple-pretty-print.c (dump_binary_rhs): Likewise.
        * tree-vectorizer.h (NUM_PATTERNS): Increase to 6.
        * tree.def (WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR,
        VEC_WIDEN_SHIFT_LEFT_LO_EXPR): New.
        * cfgexpand.c (expand_debug_expr):  Handle new tree codes.
        * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add
        vect_recog_widen_shift_pattern.
        (vect_handle_widen_mult_by_const): Rename...
        (vect_handle_widen_op_by_const): ...to this.  Handle shifts.
        Add a new argument, update documentation.
        (vect_recog_widen_mult_pattern): Assume that only second
        operand can be constant.  Update call to
        vect_handle_widen_op_by_const.
        (vect_operation_fits_smaller_type): Add the already existing
        def stmt to the list of pattern statements.
        (vect_recog_widen_shift_pattern): New.
        * tree-vect-stmts.c (vectorizable_type_promotion): Handle
        widening shifts.
        (supportable_widening_operation): Likewise.
        * tree-inline.c (estimate_operator_cost): Handle new tree codes.
        * tree-vect-generic.c (expand_vector_operations_1): Likewise.
        * tree-cfg.c (verify_gimple_assign_binary): Likewise.
        * config/arm/neon.md (neon_vec_USshiftl_lo_mode): New.
        (vec_widen_USshiftl_lo_mode, neon_vec_USshiftl_hi_mode,
        vec_widen_USshiftl_hi_mode, neon_vec_USshift_left_mode):
        Likewise.
        * tree-vect-slp.c (vect_build_slp_tree): Require same shift operand
        for widening shift.

 testsuite/ChangeLog:

       * gcc.dg/vect/vect-widen-shift-s16.c: New.
       * gcc.dg/vect/vect-widen-shift-s8.c: New.
       * gcc.dg/vect/vect-widen-shift-u16.c: New.
       * gcc.dg/vect/vect-widen-shift-u8.c: New.