Re: [patch] Support vectorization of widening shifts
On Tue, Oct 18, 2011 at 11:39:22AM +0200, Ira Rosen wrote:
> On 2 October 2011 10:30, Ira Rosen ira.ro...@linaro.org wrote:
> > On 29 September 2011 17:30, Ramana Radhakrishnan ramana.radhakrish...@linaro.org wrote:
> > > On 19 September 2011 08:54, Ira Rosen ira.ro...@linaro.org wrote:
> > > > Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux and arm-linux-gnueabi.
> > > > OK for mainline?
> > >
> > > Sorry I missed this patch. Is there any reason why we need unspecs in this case? Can't this be represented by subregs and zero/sign extensions in RTL without the UNSPECs?
>
> I committed the attached patch with Ramana's solution for testing
>
> +/* Detect widening shift pattern:
> +     type a_t;
> +     TYPE a_T, res_T;
> +
> +     S1 a_t = ;
> +     S2 a_T = (TYPE) a_t;
> +     S3 res_T = a_T << CONST;
> +
> +   where type 'TYPE' is at least double the size of type 'type'.
> +
> +   Also detect unsgigned cases:

unsigned

Jakub
Re: [patch] Support vectorization of widening shifts
On 29 September 2011 17:30, Ramana Radhakrishnan ramana.radhakrish...@linaro.org wrote:
> On 19 September 2011 08:54, Ira Rosen ira.ro...@linaro.org wrote:
> > Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux and arm-linux-gnueabi.
> > OK for mainline?
>
> Sorry I missed this patch. Is there any reason why we need unspecs in this case? Can't this be represented by subregs and zero/sign extensions in RTL without the UNSPECs?

Like this:

Index: config/arm/neon.md
===
--- config/arm/neon.md (revision 178942)
+++ config/arm/neon.md (working copy)
@@ -5550,6 +5550,46 @@
   }
 )
+(define_insn "neon_vec_<US>shiftl_<mode>"
+ [(set (match_operand:<V_widen> 0 "register_operand" "=w")
+       (SE:<V_widen> (match_operand:VW 1 "register_operand" "w")))
+  (match_operand:SI 2 "immediate_operand" "i")]
+  "TARGET_NEON"
+{
+  /* The boundaries are: 0 < imm <= size.  */
+  neon_const_bounds (operands[2], 0, neon_element_bits (<MODE>mode) + 1);
+  return "vshll.<US><V_sz_elem> %q0, %P1, %2";
+}
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_widen_<US>shiftl_lo_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
+  {
+    emit_insn (gen_neon_vec_<US>shiftl_<V_half> (operands[0],
+               simplify_gen_subreg (<V_HALF>mode, operands[1], <MODE>mode, 0),
+               operands[2]));
+    DONE;
+  }
+)
+
+(define_expand "vec_widen_<US>shiftl_hi_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
+  {
+    emit_insn (gen_neon_vec_<US>shiftl_<V_half> (operands[0],
+               simplify_gen_subreg (<V_HALF>mode, operands[1], <MODE>mode,
+                                    GET_MODE_SIZE (<V_HALF>mode)),
+               operands[2]));
+    DONE;
+  }
+)
+
 ;; Vectorize for non-neon-quad case
 (define_insn "neon_unpack<US>_<mode>"
  [(set (match_operand:<V_widen> 0 "register_operand" "=w")
@@ -5626,6 +5666,34 @@
   }
 )
+(define_expand "vec_widen_<US>shiftl_hi_<mode>"
+  [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+  "TARGET_NEON"
+  {
+    rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+    emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1], operands[2]));
+    emit_insn (gen_neon_vget_high<V_widen_l> (operands[0], tmpreg));
+
+    DONE;
+  }
+)
+
+(define_expand "vec_widen_<US>shiftl_lo_<mode>"
+  [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+  "TARGET_NEON"
+  {
+    rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+    emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1], operands[2]));
+    emit_insn (gen_neon_vget_low<V_widen_l> (operands[0], tmpreg));
+
+    DONE;
+  }
+)
+
 ; FIXME: These instruction patterns can't be used safely in big-endian mode
 ; because the ordering of vector elements in Q registers is different from what
 ; the semantics of the instructions require.

?

Thanks,
Ira

> cheers
> Ramana
>
> > Thanks,
> > Ira
> >
> > ChangeLog:
> >
> > * doc/md.texi (vec_widen_ushiftl_hi, vec_widen_ushiftl_lo, vec_widen_sshiftl_hi, vec_widen_sshiftl_lo): Document.
> > * tree-pretty-print.c (dump_generic_node): Handle WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
> >   (op_code_prio): Likewise.
> >   (op_symbol_code): Handle WIDEN_SHIFT_LEFT_EXPR.
> > * optabs.c (optab_for_tree_code): Handle VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
> >   (init_optabs): Initialize optab codes for vec_widen_u/sshiftl_hi/lo.
> > * optabs.h (enum optab_index): Add OTI_vec_widen_u/sshiftl_hi/lo.
> > * genopinit.c (optabs): Initialize the new optabs.
> > * expr.c (expand_expr_real_2): Handle VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
> > * gimple-pretty-print.c (dump_binary_rhs): Likewise.
> > * tree-vectorizer.h (NUM_PATTERNS): Increase to 6.
> > * tree.def (WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR, VEC_WIDEN_SHIFT_LEFT_LO_EXPR): New.
> > * cfgexpand.c (expand_debug_expr): Handle new tree codes.
> > * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add vect_recog_widen_shift_pattern.
> >   (vect_handle_widen_mult_by_const): Rename...
> >   (vect_handle_widen_op_by_const): ...to this. Handle shifts. Add a new argument, update documentation.
> >   (vect_recog_widen_mult_pattern): Assume that only second operand can be constant. Update call to vect_handle_widen_op_by_const.
> >   (vect_operation_fits_smaller_type): Add the already existing def stmt to the list of pattern statements.
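The second pair of expanders in the patch above (the ones taking a VDI operand) computes the full widened result into a temporary and then keeps only its low or high half. A minimal scalar sketch of that idea follows, assuming 16-bit unsigned input lanes and little-endian lane numbering (lanes 0..1 are "low"). The function names are hypothetical and exist only for this illustration; they are not part of GCC or of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vshll.u16 on a 64-bit input vector:
   four 16-bit lanes in, four 32-bit lanes out, each shifted left by imm.  */
static void vshll_u16_model(const uint16_t in[4], uint32_t out[4], unsigned imm)
{
    assert(imm <= 16);                  /* shift is bounded by the element size */
    for (int i = 0; i < 4; i++)
        out[i] = (uint32_t)in[i] << imm;
}

/* Model of the "lo" expander: widen and shift everything, keep lanes 0..1.
   Assumes little-endian lane order.  */
void widen_shiftl_lo_model(const uint16_t in[4], uint32_t out[2], unsigned imm)
{
    uint32_t tmp[4];                    /* plays the role of tmpreg */
    vshll_u16_model(in, tmp, imm);
    out[0] = tmp[0];
    out[1] = tmp[1];
}

/* Model of the "hi" expander: same computation, keep lanes 2..3.  */
void widen_shiftl_hi_model(const uint16_t in[4], uint32_t out[2], unsigned imm)
{
    uint32_t tmp[4];
    vshll_u16_model(in, tmp, imm);
    out[0] = tmp[2];
    out[1] = tmp[3];
}
```

In this model both halves recompute the whole widened vector, which mirrors how each expander in the patch emits its own vshll before extracting a half.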
Re: [patch] Support vectorization of widening shifts
On 19 September 2011 08:54, Ira Rosen ira.ro...@linaro.org wrote:
> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux and arm-linux-gnueabi.
> OK for mainline?

Sorry I missed this patch. Is there any reason why we need unspecs in this case? Can't this be represented by subregs and zero/sign extensions in RTL without the UNSPECs?

cheers
Ramana

> Thanks,
> Ira
>
> ChangeLog:
>
> * doc/md.texi (vec_widen_ushiftl_hi, vec_widen_ushiftl_lo, vec_widen_sshiftl_hi, vec_widen_sshiftl_lo): Document.
> * tree-pretty-print.c (dump_generic_node): Handle WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>   (op_code_prio): Likewise.
>   (op_symbol_code): Handle WIDEN_SHIFT_LEFT_EXPR.
> * optabs.c (optab_for_tree_code): Handle VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>   (init_optabs): Initialize optab codes for vec_widen_u/sshiftl_hi/lo.
> * optabs.h (enum optab_index): Add OTI_vec_widen_u/sshiftl_hi/lo.
> * genopinit.c (optabs): Initialize the new optabs.
> * expr.c (expand_expr_real_2): Handle VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
> * gimple-pretty-print.c (dump_binary_rhs): Likewise.
> * tree-vectorizer.h (NUM_PATTERNS): Increase to 6.
> * tree.def (WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR, VEC_WIDEN_SHIFT_LEFT_LO_EXPR): New.
> * cfgexpand.c (expand_debug_expr): Handle new tree codes.
> * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add vect_recog_widen_shift_pattern.
>   (vect_handle_widen_mult_by_const): Rename...
>   (vect_handle_widen_op_by_const): ...to this. Handle shifts. Add a new argument, update documentation.
>   (vect_recog_widen_mult_pattern): Assume that only second operand can be constant. Update call to vect_handle_widen_op_by_const.
>   (vect_operation_fits_smaller_type): Add the already existing def stmt to the list of pattern statements.
>   (vect_recog_widen_shift_pattern): New.
> * tree-vect-stmts.c (vectorizable_type_promotion): Handle widening shifts.
>   (supportable_widening_operation): Likewise.
> * tree-inline.c (estimate_operator_cost): Handle new tree codes.
> * tree-vect-generic.c (expand_vector_operations_1): Likewise.
> * tree-cfg.c (verify_gimple_assign_binary): Likewise.
> * config/arm/neon.md (neon_vec_<US>shiftl_lo_<mode>): New.
>   (vec_widen_<US>shiftl_lo_<mode>, neon_vec_<US>shiftl_hi_<mode>, vec_widen_<US>shiftl_hi_<mode>, neon_vec_<US>shift_left_<mode>): Likewise.
> * tree-vect-slp.c (vect_build_slp_tree): Require same shift operand for widening shift.
>
> testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-widen-shift-s16.c: New.
> * gcc.dg/vect/vect-widen-shift-s8.c: New.
> * gcc.dg/vect/vect-widen-shift-u16.c: New.
> * gcc.dg/vect/vect-widen-shift-u8.c: New.
Re: [patch] Support vectorization of widening shifts
On 26 September 2011 17:12, Richard Guenther richard.guent...@gmail.com wrote:
> On Mon, Sep 19, 2011 at 9:54 AM, Ira Rosen ira.ro...@linaro.org wrote:
> > Hi,
> >
> > This patch adds support for widening shift left. The following pattern is detected:
> >
> >   type a_t;
> >   TYPE a_T, res_T;
> >
> >   a_t = ;
> >   a_T = (TYPE) a_t;
> >   res_T = a_T << CONST;
> >
> > ('TYPE' is at least 2 times bigger than 'type', and CONST is at most the size of 'type')
> >
> > and a pattern stmt using the new tree code WIDEN_SHIFT_LEFT_EXPR is created for it:
> >
> >   a_t = ;
> >   a_T = (TYPE) a_t;
> >   res_T = a_T << CONST;
> >   -->
> >   res_T = a_t w<< CONST;
> >
> > which is later transformed into:
> >
> >   va_t = ;
> >   vres_T0 = WIDEN_SHIFT_LEFT_LO_EXPR <va_t, CONST>;
> >   vres_T1 = WIDEN_SHIFT_LEFT_HI_EXPR <va_t, CONST>;
> >
> > This patch also supports unsigned types, and cases when 'TYPE' is 4 times bigger than 'type'. This feature is similar to vectorization of widening multiplication.
> >
> > Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux and arm-linux-gnueabi.
> > OK for mainline?
>
> Hmm, it doesn't look like arm has real widening shift instructions.

It does: vshll. The implementation may look awkward because we don't support multiple vector sizes in the same operation (vshll takes a 64-bit vector and produces a 128-bit vector), but the resulting code is just the instruction itself.

> So why not split this into the widening, shift parts in the vectorizer?

What do you mean? (We of course already support widening first and then shifting the widened value).

Thanks,
Ira

> That way you wouldn't need new tree codes and all architectures that can do widening conversions would benefit?
>
> Thanks,
> Richard.
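As an illustration of the point about vector sizes, here is a minimal scalar sketch, assuming unsigned 8-bit elements widened to 16 bits and little-endian lane numbering. It shows the scalar loop that matches the pattern, and a model in which one 16-lane input vector is processed as two 8-lane halves, each standing in for a single vshll (64 bits in, 128 bits out). All function names are hypothetical and are not taken from the patch or from GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The scalar form of the pattern: widen each element, then shift it left
   by a constant that is at most the width of the narrow type.  */
void widen_shift_scalar(const uint8_t *src, uint16_t *dst, size_t n, unsigned shift)
{
    assert(shift <= 8);
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t)((unsigned)src[i] << shift);
}

/* Hypothetical model of one vshll.u8: eight 8-bit lanes in,
   eight 16-bit lanes out.  */
static void vshll_u8_model(const uint8_t in[8], uint16_t out[8], unsigned shift)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)((unsigned)in[i] << shift);
}

/* Model of the vectorized form for one 16-lane input vector: the LO and HI
   operations each consume half of the input and fill one full-width result.
   Assumes lanes 0..7 form the "lo" half (little-endian lane order).  */
void widen_shift_vector_model(const uint8_t in[16], uint16_t out[16], unsigned shift)
{
    assert(shift <= 8);
    vshll_u8_model(in, out, shift);             /* LO half -> vres_T0 */
    vshll_u8_model(in + 8, out + 8, shift);     /* HI half -> vres_T1 */
}
```

Both functions should produce identical results for any 16-element input, which is the property the LO/HI split relies on.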
Re: [patch] Support vectorization of widening shifts
On Tue, Sep 27, 2011 at 8:32 AM, Ira Rosen ira.ro...@linaro.org wrote:
> On 26 September 2011 17:12, Richard Guenther richard.guent...@gmail.com wrote:
> > On Mon, Sep 19, 2011 at 9:54 AM, Ira Rosen ira.ro...@linaro.org wrote:
> > > Hi,
> > >
> > > This patch adds support for widening shift left. The following pattern is detected:
> > >
> > >   type a_t;
> > >   TYPE a_T, res_T;
> > >
> > >   a_t = ;
> > >   a_T = (TYPE) a_t;
> > >   res_T = a_T << CONST;
> > >
> > > ('TYPE' is at least 2 times bigger than 'type', and CONST is at most the size of 'type')
> > >
> > > and a pattern stmt using the new tree code WIDEN_SHIFT_LEFT_EXPR is created for it:
> > >
> > >   a_t = ;
> > >   a_T = (TYPE) a_t;
> > >   res_T = a_T << CONST;
> > >   -->
> > >   res_T = a_t w<< CONST;
> > >
> > > which is later transformed into:
> > >
> > >   va_t = ;
> > >   vres_T0 = WIDEN_SHIFT_LEFT_LO_EXPR <va_t, CONST>;
> > >   vres_T1 = WIDEN_SHIFT_LEFT_HI_EXPR <va_t, CONST>;
> > >
> > > This patch also supports unsigned types, and cases when 'TYPE' is 4 times bigger than 'type'. This feature is similar to vectorization of widening multiplication.
> > >
> > > Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux and arm-linux-gnueabi.
> > > OK for mainline?
> >
> > Hmm, it doesn't look like arm has real widening shift instructions.
>
> It does: vshll. The implementation may look awkward because we don't support multiple vector sizes in the same operation (vshll takes a 64-bit vector and produces a 128-bit vector), but the resulting code is just the instruction itself.

Ah, ok. Can you please do s/SHIFT_LEFT/LSHIFT/ on the new tree code names for consistency with LSHIFT_EXPR? Also please implement some gimple verification for the new codes instead of sticking them to the /* FIXME */ case in tree-cfg.c.

> > So why not split this into the widening, shift parts in the vectorizer?
>
> What do you mean? (We of course already support widening first and then shifting the widened value).

Of course. Ok with the above changes.

Thanks,
Richard.

> Thanks,
> Ira
>
> > That way you wouldn't need new tree codes and all architectures that can do widening conversions would benefit?
> >
> > Thanks,
> > Richard.
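For readers wondering what a verification of the new codes might check, the two constraints stated in the original patch description ('TYPE' at least twice the size of 'type', CONST at most the size of 'type') can be written as a small predicate. This is only a sketch under those stated constraints: the function name and its bit-width parameters are hypothetical, and it is not the check that ended up in tree-cfg.c.

```c
#include <stdbool.h>

/* Hypothetical predicate over plain bit widths (not GCC tree types):
   a widening shift is well formed when the result type is at least twice
   as wide as the operand type and the constant shift amount does not
   exceed the operand width.  */
bool widen_shift_is_valid(unsigned narrow_bits, unsigned wide_bits, unsigned shift)
{
    return narrow_bits > 0
        && wide_bits >= 2 * narrow_bits
        && shift <= narrow_bits;
}
```

For example, shifting an 8-bit value into a 16-bit or 32-bit result by 8 passes this predicate, while a 16-bit to 16-bit "widening" or a shift by 9 on an 8-bit operand does not.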
Re: [patch] Support vectorization of widening shifts
On Mon, Sep 19, 2011 at 9:54 AM, Ira Rosen ira.ro...@linaro.org wrote:
> Hi,
>
> This patch adds support for widening shift left. The following pattern is detected:
>
>   type a_t;
>   TYPE a_T, res_T;
>
>   a_t = ;
>   a_T = (TYPE) a_t;
>   res_T = a_T << CONST;
>
> ('TYPE' is at least 2 times bigger than 'type', and CONST is at most the size of 'type')
>
> and a pattern stmt using the new tree code WIDEN_SHIFT_LEFT_EXPR is created for it:
>
>   a_t = ;
>   a_T = (TYPE) a_t;
>   res_T = a_T << CONST;
>   -->
>   res_T = a_t w<< CONST;
>
> which is later transformed into:
>
>   va_t = ;
>   vres_T0 = WIDEN_SHIFT_LEFT_LO_EXPR <va_t, CONST>;
>   vres_T1 = WIDEN_SHIFT_LEFT_HI_EXPR <va_t, CONST>;
>
> This patch also supports unsigned types, and cases when 'TYPE' is 4 times bigger than 'type'. This feature is similar to vectorization of widening multiplication.
>
> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux and arm-linux-gnueabi.
> OK for mainline?

Hmm, it doesn't look like arm has real widening shift instructions. So why not split this into the widening, shift parts in the vectorizer? That way you wouldn't need new tree codes and all architectures that can do widening conversions would benefit?

Thanks,
Richard.

> Thanks,
> Ira
>
> ChangeLog:
>
> * doc/md.texi (vec_widen_ushiftl_hi, vec_widen_ushiftl_lo, vec_widen_sshiftl_hi, vec_widen_sshiftl_lo): Document.
> * tree-pretty-print.c (dump_generic_node): Handle WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>   (op_code_prio): Likewise.
>   (op_symbol_code): Handle WIDEN_SHIFT_LEFT_EXPR.
> * optabs.c (optab_for_tree_code): Handle VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>   (init_optabs): Initialize optab codes for vec_widen_u/sshiftl_hi/lo.
> * optabs.h (enum optab_index): Add OTI_vec_widen_u/sshiftl_hi/lo.
> * genopinit.c (optabs): Initialize the new optabs.
> * expr.c (expand_expr_real_2): Handle VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
> * gimple-pretty-print.c (dump_binary_rhs): Likewise.
> * tree-vectorizer.h (NUM_PATTERNS): Increase to 6.
> * tree.def (WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR, VEC_WIDEN_SHIFT_LEFT_LO_EXPR): New.
> * cfgexpand.c (expand_debug_expr): Handle new tree codes.
> * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add vect_recog_widen_shift_pattern.
>   (vect_handle_widen_mult_by_const): Rename...
>   (vect_handle_widen_op_by_const): ...to this. Handle shifts. Add a new argument, update documentation.
>   (vect_recog_widen_mult_pattern): Assume that only second operand can be constant. Update call to vect_handle_widen_op_by_const.
>   (vect_operation_fits_smaller_type): Add the already existing def stmt to the list of pattern statements.
>   (vect_recog_widen_shift_pattern): New.
> * tree-vect-stmts.c (vectorizable_type_promotion): Handle widening shifts.
>   (supportable_widening_operation): Likewise.
> * tree-inline.c (estimate_operator_cost): Handle new tree codes.
> * tree-vect-generic.c (expand_vector_operations_1): Likewise.
> * tree-cfg.c (verify_gimple_assign_binary): Likewise.
> * config/arm/neon.md (neon_vec_<US>shiftl_lo_<mode>): New.
>   (vec_widen_<US>shiftl_lo_<mode>, neon_vec_<US>shiftl_hi_<mode>, vec_widen_<US>shiftl_hi_<mode>, neon_vec_<US>shift_left_<mode>): Likewise.
> * tree-vect-slp.c (vect_build_slp_tree): Require same shift operand for widening shift.
>
> testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-widen-shift-s16.c: New.
> * gcc.dg/vect/vect-widen-shift-s8.c: New.
> * gcc.dg/vect/vect-widen-shift-u16.c: New.
> * gcc.dg/vect/vect-widen-shift-u8.c: New.