Re: [PATCH] Add MULT_HIGHPART_EXPR

2012-06-29 Thread Jakub Jelinek
On Fri, Jun 29, 2012 at 11:00:14AM +0200, Richard Guenther wrote:
 Indeed - the lack of cross-sub-128bit-word operations makes it very
 expensive for some vectorizations.  Initially we added the patterns for
 vectorization of the hi/lo and interleave stuff because we didn't want
 regressions for vectorizing with 256bit vectors vs. 128bit vectors in
 the vectorizer testsuite.
 But now that we have support for vectorizing with both sizes we could
 consider not advertising the instructions that don't really exist for
 256bit vectors.  Or at least properly model their cost.

The code generated for pr51581-3.c (f2) is only shorter with -O3 -mavx
when using hi/lo over even/odd; with -O3 -mavx2 the even/odd sequence is
shorter than hi/lo.
$ ~/timing ./pr51581-3-evenodd
Strip out best and worst realtime result
minimum: 0.110145575 sec real / 0.71177 sec CPU
maximum: 0.134790162 sec real / 0.000140234 sec CPU
average: 0.113982306 sec real / 0.000113236 sec CPU
stdev  : 0.002545680 sec real / 0.09365 sec CPU
$ ~/timing ./pr51581-3-hilo
Strip out best and worst realtime result
minimum: 0.098651474 sec real / 0.69318 sec CPU
maximum: 0.102126514 sec real / 0.000129507 sec CPU
average: 0.100120802 sec real / 0.000104589 sec CPU
stdev  : 0.001008010 sec real / 0.13241 sec CPU
Can't benchmark -mavx2 though...

Jakub


Re: [PATCH] Add MULT_HIGHPART_EXPR

2012-06-28 Thread Jakub Jelinek
On Wed, Jun 27, 2012 at 02:37:08PM -0700, Richard Henderson wrote:
 
 I was sitting on this patch until I got around to fixing up Jakub's
 existing vector divmod code to use it.  But seeing as how he's adding
 more uses, I think it's better to get it in earlier.
 
 Tested via a patch sent under separate cover that changes
 __builtin_alpha_umulh to immediately fold to MULT_HIGHPART_EXPR.

Thanks.  Here is an incremental patch on top of my patch from yesterday
which expands some of the vector divisions/modulos using MULT_HIGHPART_EXPR
instead of VEC_WIDEN_MULT_*_EXPR + VEC_PERM_EXPR if the backend supports it.
It improves the code generated for ushort or short / or % on i?86 (slightly
complicated by the fact that unfortunately even -mavx2 doesn't support
vector-by-vector shifts for V{8,16}HImode (nor V{16,32}QImode); XOP does,
though).
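
For reference, the per-lane transformation boils down to the classic
divide-by-invariant multiplication sequence.  A minimal scalar sketch of the
unsigned ushort case (the "case 0" path in expand_vector_divmod; the magic
multiplier and shift counts below are illustrative parameters, not GCC
identifiers):

#include <stdint.h>

/* Sketch only: unsigned division by a constant via a highpart multiply,
   mirroring "t1 = oprnd0 >> pre_shift; t2 = t1 h* ml; q = t2 >> post_shift"
   from the patch below.  */
static inline uint16_t
udiv_by_const (uint16_t x, uint16_t ml, int pre_shift, int post_shift)
{
  uint16_t t1 = x >> pre_shift;
  uint16_t t2 = (uint16_t) (((uint32_t) t1 * ml) >> 16);  /* h* = highpart */
  return (uint16_t) (t2 >> post_shift);
}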

Ok for trunk?

I'll look at using MULT_HIGHPART_EXPR in the pattern recognizer and
vectorizing it as either of the sequences next.

2012-06-28  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/53645
* tree-vect-generic.c (expand_vector_divmod): Use MULT_HIGHPART_EXPR
instead of VEC_WIDEN_MULT_{HI,LO}_EXPR followed by VEC_PERM_EXPR
if possible.

* gcc.c-torture/execute/pr53645-2.c: New test.

--- gcc/tree-vect-generic.c.jj  2012-06-28 08:32:50.0 +0200
+++ gcc/tree-vect-generic.c 2012-06-28 09:10:51.436748834 +0200
@@ -455,7 +455,7 @@ expand_vector_divmod (gimple_stmt_iterat
   unsigned HOST_WIDE_INT mask = GET_MODE_MASK (TYPE_MODE (TREE_TYPE (type)));
   optab op;
   tree *vec;
-  unsigned char *sel;
+  unsigned char *sel = NULL;
   tree cur_op, mhi, mlo, mulcst, perm_mask, wider_type, tem;
 
   if (prec > HOST_BITS_PER_WIDE_INT)
@@ -744,26 +744,34 @@ expand_vector_divmod (gimple_stmt_iterat
   if (mode == -2 || BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
 return NULL_TREE;
 
-  op = optab_for_tree_code (VEC_WIDEN_MULT_LO_EXPR, type, optab_default);
-  if (op == NULL
-  || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing)
-return NULL_TREE;
-  op = optab_for_tree_code (VEC_WIDEN_MULT_HI_EXPR, type, optab_default);
-  if (op == NULL
-  || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing)
-return NULL_TREE;
-  sel = XALLOCAVEC (unsigned char, nunits);
-  for (i = 0; i < nunits; i++)
-sel[i] = 2 * i + (BYTES_BIG_ENDIAN ? 0 : 1);
-  if (!can_vec_perm_p (TYPE_MODE (type), false, sel))
-return NULL_TREE;
-  wider_type
-= build_vector_type (build_nonstandard_integer_type (prec * 2, unsignedp),
-nunits / 2);
-  if (GET_MODE_CLASS (TYPE_MODE (wider_type)) != MODE_VECTOR_INT
-  || GET_MODE_BITSIZE (TYPE_MODE (wider_type))
-!= GET_MODE_BITSIZE (TYPE_MODE (type)))
-return NULL_TREE;
+  op = optab_for_tree_code (MULT_HIGHPART_EXPR, type, optab_default);
+  if (op != NULL
+      && optab_handler (op, TYPE_MODE (type)) != CODE_FOR_nothing)
+wider_type = NULL_TREE;
+  else
+{
+  op = optab_for_tree_code (VEC_WIDEN_MULT_LO_EXPR, type, optab_default);
+  if (op == NULL
+ || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing)
+   return NULL_TREE;
+  op = optab_for_tree_code (VEC_WIDEN_MULT_HI_EXPR, type, optab_default);
+  if (op == NULL
+ || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing)
+   return NULL_TREE;
+  sel = XALLOCAVEC (unsigned char, nunits);
+      for (i = 0; i < nunits; i++)
+   sel[i] = 2 * i + (BYTES_BIG_ENDIAN ? 0 : 1);
+  if (!can_vec_perm_p (TYPE_MODE (type), false, sel))
+   return NULL_TREE;
+  wider_type
+   = build_vector_type (build_nonstandard_integer_type (prec * 2,
+unsignedp),
+nunits / 2);
+  if (GET_MODE_CLASS (TYPE_MODE (wider_type)) != MODE_VECTOR_INT
+ || GET_MODE_BITSIZE (TYPE_MODE (wider_type))
+!= GET_MODE_BITSIZE (TYPE_MODE (type)))
+   return NULL_TREE;
+}
 
   cur_op = op0;
 
@@ -772,7 +780,7 @@ expand_vector_divmod (gimple_stmt_iterat
 case 0:
   gcc_assert (unsignedp);
   /* t1 = oprnd0 >> pre_shift;
-	 t2 = (type) (t1 w* ml >> prec);
+	 t2 = t1 h* ml;
	 q = t2 >> post_shift;  */
   cur_op = add_rshift (gsi, type, cur_op, pre_shifts);
   if (cur_op == NULL_TREE)
@@ -801,30 +809,37 @@ expand_vector_divmod (gimple_stmt_iterat
   for (i = 0; i < nunits; i++)
 vec[i] = build_int_cst (TREE_TYPE (type), mulc[i]);
   mulcst = build_vector (type, vec);
-  for (i = 0; i < nunits; i++)
-vec[i] = build_int_cst (TREE_TYPE (type), sel[i]);
-  perm_mask = build_vector (type, vec);
-  mhi = gimplify_build2 (gsi, VEC_WIDEN_MULT_HI_EXPR, wider_type,
-cur_op, mulcst);
-  mhi = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, mhi);
-  mlo = gimplify_build2 (gsi, VEC_WIDEN_MULT_LO_EXPR, wider_type,
-cur_op, mulcst);
-  mlo = gimplify_build1 (gsi, 

Re: [PATCH] Add MULT_HIGHPART_EXPR

2012-06-28 Thread Jakub Jelinek
On Thu, Jun 28, 2012 at 09:17:55AM +0200, Jakub Jelinek wrote:
 I'll look at using MULT_HIGHPART_EXPR in the pattern recognizer and
 vectorizing it as either of the sequences next.

And here is corresponding pattern recognizer and vectorizer patch.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Unfortunately the addition of the builtin_mul_widen_* hooks on i?86 seems
to pessimize the generated code for gcc.dg/vect/pr51581-3.c
testcase (at least with -O3 -mavx) compared to when the hooks aren't
present, because i?86 has more natural support for widen mult lo/hi
compared to widen mult even/odd, but I assume that on powerpc it is the
other way around.  So, how should I find out, if both VEC_WIDEN_MULT_*_EXPR
and builtin_mul_widen_* are possible for the particular vectype, which one
will be cheaper?

2012-06-28  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/51581
* tree-vect-stmts.c (permute_vec_elements): Add forward decl.
(vectorizable_operation): Handle vectorization of MULT_HIGHPART_EXPR
also using VEC_WIDEN_MULT_*_EXPR or builtin_mul_widen_* plus
VEC_PERM_EXPR if vector MULT_HIGHPART_EXPR isn't supported.
* tree-vect-patterns.c (vect_recog_divmod_pattern): Use
MULT_HIGHPART_EXPR instead of VEC_WIDEN_MULT_*_EXPR and shifts.

* gcc.dg/vect/pr51581-4.c: New test.

--- gcc/tree-vect-stmts.c.jj2012-06-26 11:38:28.0 +0200
+++ gcc/tree-vect-stmts.c   2012-06-28 13:27:50.475158271 +0200
@@ -3288,6 +3288,10 @@ vectorizable_shift (gimple stmt, gimple_
 }
 
 
+static tree permute_vec_elements (tree, tree, tree, gimple,
+ gimple_stmt_iterator *);
+
+
 /* Function vectorizable_operation.
 
Check if STMT performs a binary, unary or ternary operation that can
@@ -3300,17 +3304,18 @@ static bool
 vectorizable_operation (gimple stmt, gimple_stmt_iterator *gsi,
gimple *vec_stmt, slp_tree slp_node)
 {
-  tree vec_dest;
+  tree vec_dest, vec_dest2 = NULL_TREE;
+  tree vec_dest3 = NULL_TREE, vec_dest4 = NULL_TREE;
   tree scalar_dest;
   tree op0, op1 = NULL_TREE, op2 = NULL_TREE;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-  tree vectype;
+  tree vectype, wide_vectype = NULL_TREE;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   enum tree_code code;
   enum machine_mode vec_mode;
   tree new_temp;
   int op_type;
-  optab optab;
+  optab optab, optab2 = NULL;
   int icode;
   tree def;
   gimple def_stmt;
@@ -3327,6 +3332,8 @@ vectorizable_operation (gimple stmt, gim
   tree vop0, vop1, vop2;
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   int vf;
+  unsigned char *sel = NULL;
+  tree decl1 = NULL_TREE, decl2 = NULL_TREE, perm_mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
 return false;
@@ -3451,31 +3458,97 @@ vectorizable_operation (gimple stmt, gim
   optab = optab_for_tree_code (code, vectype, optab_default);
 
   /* Supportable by target?  */
-  if (!optab)
+  if (!optab && code != MULT_HIGHPART_EXPR)
 {
   if (vect_print_dump_info (REPORT_DETAILS))
	fprintf (vect_dump, "no optab.");
   return false;
 }
   vec_mode = TYPE_MODE (vectype);
-  icode = (int) optab_handler (optab, vec_mode);
+  icode = optab ? (int) optab_handler (optab, vec_mode) : CODE_FOR_nothing;
+
+  if (icode == CODE_FOR_nothing
+      && code == MULT_HIGHPART_EXPR
+      && VECTOR_MODE_P (vec_mode)
+      && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN)
+{
+  /* If MULT_HIGHPART_EXPR isn't supported by the backend, see
+if we can emit VEC_WIDEN_MULT_{LO,HI}_EXPR followed by VEC_PERM_EXPR
+or builtin_mul_widen_{even,odd} followed by VEC_PERM_EXPR.  */
+  unsigned int prec = TYPE_PRECISION (TREE_TYPE (scalar_dest));
+  unsigned int unsignedp = TYPE_UNSIGNED (TREE_TYPE (scalar_dest));
+  tree wide_type
+   = build_nonstandard_integer_type (prec * 2, unsignedp);
+  wide_vectype
+= get_same_sized_vectype (wide_type, vectype);
+
+  sel = XALLOCAVEC (unsigned char, nunits_in);
+  if (VECTOR_MODE_P (TYPE_MODE (wide_vectype))
+      && GET_MODE_SIZE (TYPE_MODE (wide_vectype))
+== GET_MODE_SIZE (vec_mode))
+   {
+ if (targetm.vectorize.builtin_mul_widen_even
+	  && (decl1 = targetm.vectorize.builtin_mul_widen_even (vectype))
+	  && targetm.vectorize.builtin_mul_widen_odd
+	  && (decl2 = targetm.vectorize.builtin_mul_widen_odd (vectype))
+	  && TYPE_MODE (TREE_TYPE (TREE_TYPE (decl1)))
+== TYPE_MODE (wide_vectype))
+   {
+	  for (i = 0; i < nunits_in; i++)
+	    sel[i] = !BYTES_BIG_ENDIAN + (i & ~1)
+		     + ((i & 1) ? nunits_in : 0);
+	  if (0 && can_vec_perm_p (vec_mode, false, sel))
+   icode = 0;
+   }
+ if (icode == CODE_FOR_nothing)
+   {
+ decl1 = NULL_TREE;
+ decl2 = 

Re: [PATCH] Add MULT_HIGHPART_EXPR

2012-06-28 Thread Richard Henderson
On 2012-06-28 07:05, Jakub Jelinek wrote:
 Unfortunately the addition of the builtin_mul_widen_* hooks on i?86 seems
 to pessimize the generated code for gcc.dg/vect/pr51581-3.c
 testcase (at least with -O3 -mavx) compared to when the hooks aren't
 present, because i?86 has more natural support for widen mult lo/hi
 compared to widen mult even/odd, but I assume that on powerpc it is the
 other way around.  So, how should I find out if both VEC_WIDEN_MULT_*_EXPR
 and builtin_mul_widen_* are possible for the particular vectype which one
 will be cheaper?

I would assume that if the builtin exists, then it is cheaper.

I disagree that x86 has more natural support for hi/lo.  The basic sse2
multiplication is even.  One shift per input is needed to generate odd.  On
the other hand, one interleave per input is required for both hi and lo.  So
4 setup insns for hi/lo, and 2 setup insns for even/odd.  And on top of all
that, XOP includes multiply odd, at least for signed V4SI.
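
For concreteness, a rough SSE2 intrinsics sketch of the two setups described
above, for the unsigned 32-bit element case only (illustrative; the
vectorizer of course emits RTL patterns, not intrinsics):

#include <emmintrin.h>

/* even/odd: pmuludq multiplies the even 32-bit lanes directly; one shift
   per input moves the odd lanes into even position (2 setup insns).  */
static inline void
widen_mul_even_odd (__m128i a, __m128i b, __m128i *even, __m128i *odd)
{
  *even = _mm_mul_epu32 (a, b);
  *odd  = _mm_mul_epu32 (_mm_srli_epi64 (a, 32), _mm_srli_epi64 (b, 32));
}

/* hi/lo: one interleave per input for the low half plus one per input for
   the high half (4 setup insns), then pmuludq on each interleaved pair.  */
static inline void
widen_mul_lo_hi (__m128i a, __m128i b, __m128i *lo, __m128i *hi)
{
  *lo = _mm_mul_epu32 (_mm_unpacklo_epi32 (a, a), _mm_unpacklo_epi32 (b, b));
  *hi = _mm_mul_epu32 (_mm_unpackhi_epi32 (a, a), _mm_unpackhi_epi32 (b, b));
}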

I'll have a look at the test case you mention while I re-look at the patches...


r~


Re: [PATCH] Add MULT_HIGHPART_EXPR

2012-06-28 Thread Jakub Jelinek
On Thu, Jun 28, 2012 at 08:57:23AM -0700, Richard Henderson wrote:
 On 2012-06-28 07:05, Jakub Jelinek wrote:
  Unfortunately the addition of the builtin_mul_widen_* hooks on i?86 seems
  to pessimize the generated code for gcc.dg/vect/pr51581-3.c
  testcase (at least with -O3 -mavx) compared to when the hooks aren't
  present, because i?86 has more natural support for widen mult lo/hi
  compared to widen mult even/odd, but I assume that on powerpc it is the
  other way around.  So, how should I find out if both VEC_WIDEN_MULT_*_EXPR
  and builtin_mul_widen_* are possible for the particular vectype which one
  will be cheaper?
 
 I would assume that if the builtin exists, then it is cheaper.
 
 I disagree that x86 has more natural support for hi/lo.  The basic sse2
 multiplication is even.  One shift per input is needed to generate odd. 
 On the other hand, one interleave per input is required for both hi/lo. 
 So 4 setup insns for hi/lo, and 2 setup insns for even/odd.  And on top of
 all that, XOP includes multiply odd at least for signed V4SI.

Perhaps the problem is then that the permutation is much more expensive
for even/odd.  With even/odd the f2 routine is:
vmovdqa d(%rip), %xmm2
vmovdqa .LC1(%rip), %xmm0
vpsrlq  $32, %xmm2, %xmm4
vmovdqa d+16(%rip), %xmm1
vpmuludq  %xmm0, %xmm2, %xmm5
vpsrlq  $32, %xmm0, %xmm3
vpmuludq  %xmm3, %xmm4, %xmm4
vpmuludq  %xmm0, %xmm1, %xmm0
vmovdqa .LC2(%rip), %xmm2
vpsrlq  $32, %xmm1, %xmm1
vpmuludq  %xmm3, %xmm1, %xmm3
vmovdqa .LC3(%rip), %xmm1
vpshufb %xmm2, %xmm5, %xmm5
vpshufb %xmm1, %xmm4, %xmm4
vpshufb %xmm2, %xmm0, %xmm2
vpshufb %xmm1, %xmm3, %xmm1
vpor    %xmm4, %xmm5, %xmm4
vpor    %xmm1, %xmm2, %xmm1
vpsrld  $1, %xmm4, %xmm4
vmovdqa %xmm4, c(%rip)
vpsrld  $1, %xmm1, %xmm1
vmovdqa %xmm1, c+16(%rip)
ret
and with lo/hi it is:
vmovdqa d(%rip), %xmm2
vpunpckhdq  %xmm2, %xmm2, %xmm3
vpunpckldq  %xmm2, %xmm2, %xmm2
vmovdqa .LC1(%rip), %xmm0
vpmuludq  %xmm0, %xmm3, %xmm3
vmovdqa d+16(%rip), %xmm1
vpmuludq  %xmm0, %xmm2, %xmm2
vshufps $221, %xmm2, %xmm3, %xmm2
vpsrld  $1, %xmm2, %xmm2
vmovdqa %xmm2, c(%rip)
vpunpckhdq  %xmm1, %xmm1, %xmm2
vpunpckldq  %xmm1, %xmm1, %xmm1
vpmuludq  %xmm0, %xmm2, %xmm2
vpmuludq  %xmm0, %xmm1, %xmm0
vshufps $221, %xmm0, %xmm2, %xmm0
vpsrld  $1, %xmm0, %xmm0
vmovdqa %xmm0, c+16(%rip)
ret

Jakub


Re: [PATCH] Add MULT_HIGHPART_EXPR

2012-06-28 Thread H.J. Lu
On Thu, Jun 28, 2012 at 8:57 AM, Richard Henderson r...@redhat.com wrote:
 On 2012-06-28 07:05, Jakub Jelinek wrote:
 Unfortunately the addition of the builtin_mul_widen_* hooks on i?86 seems
 to pessimize the generated code for gcc.dg/vect/pr51581-3.c
 testcase (at least with -O3 -mavx) compared to when the hooks aren't
 present, because i?86 has more natural support for widen mult lo/hi
 compared to widen mult even/odd, but I assume that on powerpc it is the
 other way around.  So, how should I find out if both VEC_WIDEN_MULT_*_EXPR
 and builtin_mul_widen_* are possible for the particular vectype which one
 will be cheaper?

 I would assume that if the builtin exists, then it is cheaper.

 I disagree that x86 has more natural support for hi/lo.  The basic sse2
 multiplication is even.  One shift per input is needed to generate odd.  On 
 the other hand, one interleave per input is required for both hi/lo.  So 4 
 setup insns for hi/lo, and 2 setup insns for even/odd.  And on top of all 
 that, XOP includes multiply odd at least for signed V4SI.

 I'll have a look at the test case you mention while I re-look at the 
 patches...


The upper 128-bit lane of 256-bit AVX instructions isn't a good fit for the
current vectorizer infrastructure.


-- 
H.J.


Re: [PATCH] Add MULT_HIGHPART_EXPR

2012-06-28 Thread Richard Henderson
On 2012-06-28 09:20, Jakub Jelinek wrote:
 Perhaps the problem is then that the permutation is much more expensive
 for even/odd.  With even/odd the f2 routine is:
...
 vpshufb %xmm2, %xmm5, %xmm5
 vpshufb %xmm1, %xmm4, %xmm4
 vpor    %xmm4, %xmm5, %xmm4
...
 and with lo/hi it is:
 vshufps $221, %xmm2, %xmm3, %xmm2

Hmm.  That second has a reformatting delay.

Last week when I pulled the mulv4si3 routine out to i386.c,
I experimented with a few different options, including that
interleave+shufps sequence seen here for lo/hi.  See the 
comment there discussing options and timing.

This also shows a deficiency in our vec_perm logic:

0L 0H 2L 2H 1L 1H 3L 3H
0H 2H 0H 2H 1H 3H 1H 3H 2*pshufd
0H 1H 2H 3H punpckldq

without the permutation constants in memory.
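
In intrinsics terms that constant-free sequence would be roughly as follows
(prod_even/prod_odd stand for the two pmuludq results; the names are
illustrative):

#include <emmintrin.h>

/* prod_even = [0L 0H 2L 2H], prod_odd = [1L 1H 3L 3H] as 32-bit lanes.
   Two pshufd with immediate selectors plus one punpckldq produce the high
   parts in order, with no permutation constant loaded from memory.  */
static inline __m128i
mulhi_perm (__m128i prod_even, __m128i prod_odd)
{
  __m128i he = _mm_shuffle_epi32 (prod_even, _MM_SHUFFLE (3, 1, 3, 1)); /* 0H 2H 0H 2H */
  __m128i ho = _mm_shuffle_epi32 (prod_odd,  _MM_SHUFFLE (3, 1, 3, 1)); /* 1H 3H 1H 3H */
  return _mm_unpacklo_epi32 (he, ho);                                   /* 0H 1H 2H 3H */
}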


r~


Re: [PATCH] Add MULT_HIGHPART_EXPR

2012-06-28 Thread Richard Henderson
On 2012-06-28 07:05, Jakub Jelinek wrote:
   PR tree-optimization/51581
   * tree-vect-stmts.c (permute_vec_elements): Add forward decl.
   (vectorizable_operation): Handle vectorization of MULT_HIGHPART_EXPR
   also using VEC_WIDEN_MULT_*_EXPR or builtin_mul_widen_* plus
   VEC_PERM_EXPR if vector MULT_HIGHPART_EXPR isn't supported.
   * tree-vect-patterns.c (vect_recog_divmod_pattern): Use
   MULT_HIGHPART_EXPR instead of VEC_WIDEN_MULT_*_EXPR and shifts.
 
   * gcc.dg/vect/pr51581-4.c: New test.

Ok, except,

 +   if (0 && can_vec_perm_p (vec_mode, false, sel))
 + icode = 0;

Testing hack left in.


r~


Re: [PATCH] Add MULT_HIGHPART_EXPR

2012-06-28 Thread Jakub Jelinek
On Fri, Jun 29, 2012 at 12:00:10AM +0200, Bernhard Reutner-Fischer wrote:
 Really both HI? If so optab2 could be removed from that fn altogether..

Of course, thanks for pointing that out.  I've additionally added a result
mode check (similar to what supportable_widening_operation does).
The reason for not using supportable_widening_operation is that it only
tests even/odd calls for reductions, while we can use them everywhere.

Committed as obvious.

2012-06-29  Jakub Jelinek  ja...@redhat.com

* tree-vect-stmts.c (vectorizable_operation): Check both
VEC_WIDEN_MULT_LO_EXPR and VEC_WIDEN_MULT_HI_EXPR optabs.
Verify that operand[0]'s mode is TYPE_MODE (wide_vectype).

--- gcc/tree-vect-stmts.c   (revision 189053)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -3504,14 +3504,19 @@ vectorizable_operation (gimple stmt, gim
{
  decl1 = NULL_TREE;
  decl2 = NULL_TREE;
- optab = optab_for_tree_code (VEC_WIDEN_MULT_HI_EXPR,
+ optab = optab_for_tree_code (VEC_WIDEN_MULT_LO_EXPR,
   vectype, optab_default);
  optab2 = optab_for_tree_code (VEC_WIDEN_MULT_HI_EXPR,
vectype, optab_default);
 	  if (optab != NULL
 	      && optab2 != NULL
 	      && optab_handler (optab, vec_mode) != CODE_FOR_nothing
-	      && optab_handler (optab2, vec_mode) != CODE_FOR_nothing)
+	      && optab_handler (optab2, vec_mode) != CODE_FOR_nothing
+	      && insn_data[optab_handler (optab, vec_mode)].operand[0].mode
+		 == TYPE_MODE (wide_vectype)
+	      && insn_data[optab_handler (optab2,
+					  vec_mode)].operand[0].mode
+		 == TYPE_MODE (wide_vectype))
{
 	      for (i = 0; i < nunits_in; i++)
sel[i] = !BYTES_BIG_ENDIAN + 2 * i;


Jakub


Re: [PATCH] Add MULT_HIGHPART_EXPR

2012-06-27 Thread Steven Bosscher
On Wed, Jun 27, 2012 at 11:37 PM, Richard Henderson r...@redhat.com wrote:
        * tree.def (MULT_HIGHPART_EXPR): New.
        * cfgexpand.c (expand_debug_expr): Ignore it.
        * expr.c (expand_expr_real_2): Handle it.
        * fold-const.c (int_const_binop_1): Likewise.
        * optabs.c (optab_for_tree_code): Likewise.
        * tree-cfg.c (verify_gimple_assign_binary): Likewise.
        * tree-inline.c (estimate_operator_cost): Likewise.
        * tree-pretty-print.c (dump_generic_node): Likewise.
        (op_code_prio, op_symbol_code): Likewise.
        * tree.c (commutative_tree_code): Likewise.  Also handle
        WIDEN_MULT_EXPR, VEC_WIDEN_MULT_HI_EXPR, VEC_WIDEN_MULT_LO_EXPR.

Maybe also a bit in doc/generic.texi? Or is this not supposed to be
exposed to the front ends?

Ciao!
Steven


Re: [PATCH] Add MULT_HIGHPART_EXPR

2012-06-27 Thread Richard Henderson
On 06/27/2012 02:42 PM, Steven Bosscher wrote:
 Maybe also a bit in doc/generic.texi? Or is this not supposed to be
 exposed to the front ends?

I can't imagine it being terribly useful to front ends, but there's
certainly nothing that ought to prevent it.  How's this?


r~
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index e99366f..c48b663 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1235,6 +1235,7 @@ the byte offset of the field, but should not be used directly; call
 @tindex PLUS_EXPR
 @tindex MINUS_EXPR
 @tindex MULT_EXPR
+@tindex MULT_HIGHPART_EXPR
 @tindex RDIV_EXPR
 @tindex TRUNC_DIV_EXPR
 @tindex FLOOR_DIV_EXPR
@@ -1433,6 +1434,11 @@ one operand is of floating type and the other is of integral type.
 The behavior of these operations on signed arithmetic overflow is
 controlled by the @code{flag_wrapv} and @code{flag_trapv} variables.
 
+@item MULT_HIGHPART_EXPR
+This node represents the ``high-part'' of a widening multiplication.
+For an integral type with @var{b} bits of precision, the result is
+the most significant @var{b} bits of the full @math{2@var{b}} product.
+
 @item RDIV_EXPR
 This node represents a floating point division operation.
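
As a scalar illustration of the semantics documented above (not part of the
patch), MULT_HIGHPART_EXPR on 32-bit operands corresponds to:

#include <stdint.h>

/* Most significant 32 bits of the full 64-bit product.  */
static inline int32_t
smulh32 (int32_t a, int32_t b)
{
  return (int32_t) (((int64_t) a * (int64_t) b) >> 32);
}

static inline uint32_t
umulh32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * (uint64_t) b) >> 32);
}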