Tweak partitioning for less streaming

2013-08-27 Thread Jan Hubicka
Hi,
this is something we noticed during experiments with Martin Liska.  The
streaming works a lot better if partitioning is based on the original source
order instead of reverse postorder, and both orders seem to work similarly
badly code-quality-wise.

This reduces streaming needed by firefox to 1/2.

Bootstrapped/regtested ppc64-linux, committed.

* lto-partition.c (lto_balanced_map): Always base order on 
source file order.
Index: lto-partition.c
===
--- lto-partition.c (revision 202000)
+++ lto-partition.c (working copy)
@@ -449,11 +449,9 @@ lto_balanced_map (void)
 {
   int n_nodes = 0;
   int n_varpool_nodes = 0, varpool_pos = 0, best_varpool_pos = 0;
-  struct cgraph_node **postorder =
-XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
   struct cgraph_node **order = XNEWVEC (struct cgraph_node *, cgraph_max_uid);
   struct varpool_node **varpool_order = NULL;
-  int i, postorder_len;
+  int i;
   struct cgraph_node *node;
   int total_size = 0, best_total_size = 0;
   int partition_size;
@@ -468,24 +466,20 @@ lto_balanced_map (void)
 
   FOR_EACH_VARIABLE (vnode)
 gcc_assert (!vnode->symbol.aux);
-  /* Until we have better ordering facility, use toplogical order.
- Include only nodes we will partition and compute estimate of program
- size.  Note that since nodes that are not partitioned might be put into
- multiple partitions, this is just an estimate of real size.  This is why
- we keep partition_size updated after every partition is finalized.  */
-  postorder_len = ipa_reverse_postorder (postorder);
 
-  for (i = 0; i < postorder_len; i++)
-{
-  node = postorder[i];
-  if (get_symbol_class ((symtab_node) node) == SYMBOL_PARTITION)
-   {
- order[n_nodes++] = node;
-  total_size += inline_summary (node)->size;
-   }
-}
-  free (postorder);
+  FOR_EACH_DEFINED_FUNCTION (node)
+if (get_symbol_class ((symtab_node) node) == SYMBOL_PARTITION)
+  {
+   order[n_nodes++] = node;
+   total_size += inline_summary (node)->size;
+  }
 
+  /* Streaming works best when the source units do not cross partition
+ boundaries much.  This is because importing a function from a source
+ unit tends to import a lot of global trees defined there.  We should
+ get better about minimizing the function boundary, but until then
+ things work more smoothly if we order in source order.  */
+  qsort (order, n_nodes, sizeof (struct cgraph_node *), node_cmp);
   if (!flag_toplevel_reorder)
 {
   qsort (order, n_nodes, sizeof (struct cgraph_node *), node_cmp);
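For context, node_cmp (used by the new qsort call but not shown in this
hunk) orders symbols by their position in the original source.  A minimal
sketch, assuming each symtab node records that position in symbol.order:

static int
node_cmp (const void *pa, const void *pb)
{
  const struct cgraph_node *a = *(const struct cgraph_node * const *) pa;
  const struct cgraph_node *b = *(const struct cgraph_node * const *) pb;
  return a->symbol.order - b->symbol.order;
}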


Re: [PATCH][2/n] 2nd try: Re-organize -fvect-cost-model, enable basic vectorization at -O2

2013-08-27 Thread Xinliang David Li
Richard, I have some comments about the patch.

>   -ftree-vectorizer-verbose=This switch is deprecated. Use 
> -fopt-info instead.
>
>   ftree-slp-vectorize
> ! Common Report Var(flag_tree_slp_vectorize) Optimization
>   Enable basic block vectorization (SLP) on trees

The code dealing with the interactions between -ftree-vectorize, -O3,
etc. is complicated and hard to understand. Would it be better to change
the meaning of -ftree-vectorize to mean -floop-vectorize only, and make it
independent of -fslp-vectorize?


>
> + fvect-cost-model=
> + Common Joined RejectNegative Enum(vect_cost_model) 
> Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT)
> + Specifies the cost model for vectorization
> +
> + Enum
> + Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown 
> vectorizer cost model %qs)
> +
> + EnumValue
> + Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)
> +
> + EnumValue
> + Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC)
> +
> + EnumValue
> + Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP)

Introducing the cheap model is a great change.

> +

> *** 173,179 
>   {
> struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>
> !   if ((unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
>   return false;
>
> if (dump_enabled_p ())
> --- 173,180 
>   {
> struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>
> !   if (loop_vinfo->cost_model == VECT_COST_MODEL_CHEAP
> !   || (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) 
> == 0)
>   return false;
>

When the cost_model == cheap, the alignment peeling should also be
disabled -- there will still be loops that are beneficial to vectorize
without peeling, perhaps at a reduced net runtime gain.



>   struct gimple_opt_pass pass_slp_vectorize =
> --- 206,220 
>   static bool
>   gate_vect_slp (void)
>   {
> !   /* Apply SLP either according to whether the user specified whether to
> !  run SLP or not, or according to whether the user specified whether
> !  to do vectorization or not.  */
> !   if (global_options_set.x_flag_tree_slp_vectorize)
> ! return flag_tree_slp_vectorize != 0;
> !   if (global_options_set.x_flag_tree_vectorize)
> ! return flag_tree_vectorize != 0;
> !   /* And if vectorization was enabled by default run SLP only at -O3.  */
> !   return flag_tree_vectorize != 0 && optimize == 3;
>   }

The logic can be greatly simplified if the SLP vectorizer is controlled
independently -- easier for users to understand too.
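A hedged sketch of what that simplification could look like (not part of
the posted patch):

static bool
gate_vect_slp (void)
{
  /* With -ftree-slp-vectorize independent of -ftree-vectorize, the
     default can be decided once when options are processed, and the
     gate reduces to the flag itself.  */
  return flag_tree_slp_vectorize != 0;
}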


> ! @item -fvect-cost-model=@var{model}
>   @opindex fvect-cost-model
> ! Alter the cost model used for vectorization.  The @var{model} argument
> ! should be one of @code{unlimited}, @code{dynamic} or @code{cheap}.
> ! With the @code{unlimited} model the vectorized code-path is assumed
> ! to be profitable while with the @code{dynamic} model a runtime check
> ! will guard the vectorized code-path to enable it only for iteration
> ! counts that will likely execute faster than when executing the original
> ! scalar loop.  The @code{cheap} model will disable vectorization of
> ! loops where doing so would be cost prohibitive for example due to
> ! required runtime checks for data dependence or alignment but otherwise
> ! is equal to the @code{dynamic} model.
> ! The default cost model depends on other optimization flags and is
> ! either @code{dynamic} or @code{cheap}.
>

The vectorizer in theory will only vectorize a loop with a net runtime gain,
so the 'cost' here should only mean code size and compile time cost.

Cheap model: with this model, the compiler will vectorize loops that
are considered beneficial for runtime performance with minimal code
size increase and compile time cost;
Unlimited model: the compiler will vectorize loops to maximize runtime
gain without considering compile time cost and impact on code size;
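As a concrete illustration of where the models should diverge (a hedged
example, not from the patch), consider a loop that needs a runtime alias
check before the vector path is safe:

void
scale (float *a, float *b, int n)
{
  int i;
  /* a and b may overlap, so vectorizing needs a runtime alias check.  */
  for (i = 0; i < n; i++)
    a[i] = 2.0f * b[i];
}

With =unlimited the vector path is assumed profitable; with =dynamic it is
guarded by a runtime overlap test; with the proposed =cheap model the alias
versioning (and, per the comment above, ideally the alignment peeling too)
is skipped and the loop stays scalar.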


thanks,

David


COMDAT missing profile handling (was Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition)

2013-08-27 Thread Teresa Johnson
On Wed, Aug 21, 2013 at 8:25 AM, Jan Hubicka  wrote:
>> >
>> > Because an offline COMDAT function will be produced for every COMDAT used, I 
>> > think
>> > it is bad to produce any COMDAT (or any function reachable via calls with 
>> > non-0
>> > count) that has an empty profile (either because it got lost by COMDAT merging
>> > or because of a reading mismatch).
>>
>> The approach this patch takes is to simply treat those functions the
>> same as we would if we didn't feed back profile data in the first
>> place, by using the frequencies. This is sufficient except when one is
>> inlined, which is why I have the special handling in the inliner
>> itself.
>
> Yes, my original plan was to have a per-function profile_status that
> specify if profile is read, guessed or absent and do function local
> decision sanely with each setting.
>
> Here we read the function, we set profile to READ (with all counts being 0).
> We should drop it to GUESSED when we see that there are non-0 count edges
> calling the function in question, and probably we should see if it is obviously
> hot (i.e. reachable by a hot call) and promote its function profile to HOT
> then, to make code placement less bad...
>> >
>> > Since new direct calls can be discovered later, the inliner may want to do that
>> > again each time it inlines a non-0 count call of a COMDAT with 0 count...
>>
>> How about an approach like this:
>> - Invoke init_and_estimate_bb_frequencies as I am doing to guess the
>> profiles at profile read time for functions with 0 counts.
>
> I see, here we are out of sync.
> We always used to go with estimated frequencies for functions with 0 counts,
> but it seems that this code broke when prediction was moved before profiling.
> (we also should keep edge probabilities from predict.c in that case)
>
> The estimated profile is already there before reading the profile in, so we
> just do not want to overwrite it.  Does the following work for you?

It does get the estimated frequencies on the bbs.

>
> Index: tree-profile.c
> ===
> --- tree-profile.c  (revision 201838)
> +++ tree-profile.c  (working copy)
> @@ -604,6 +604,34 @@
>
>pop_cfun ();
>  }
> +  /* See if 0 count function has non-0 count callers.  In this case we
> + lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +{
> +  struct cgraph_edge *e;
> +  bool called = false;
> +  if (node->count)
> +   continue;
> +  for (e = node->callers; e; e = e->next_caller)
> +   {
> + if (e->count)
> +   called = true;
> + if (cgraph_maybe_hot_edge_p (e))
> +   break;
> +   }
> +  if (e || called
> + && profile_status_for_function
> + (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
> +   {
> + if (dump_file)
> +   fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on 
> calls.\n",
> +cgraph_node_name (node), node->symbol.order,
> +e ? "function is hot" : "function is normal");
> + profile_status_for_function (DECL_STRUCT_FUNCTION 
> (node->symbol.decl))
> +   = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> + node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> +   }
> +}
>
>del_node_map();
>return 0;
> Index: predict.c
> ===
> --- predict.c   (revision 201838)
> +++ predict.c   (working copy)
> @@ -2715,6 +2715,9 @@
>gcov_type count_max, true_count_max = 0;
>basic_block bb;
>
> +  if (!ENTRY_BLOCK_PTR->count)
> +return 0;
> +
>FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>  true_count_max = MAX (bb->count, true_count_max);
>
>
>> - At inline time, invoke some kind of freqs_to_counts routine for any
>> 0-count routine that is reached by non-zero call edges. It would take
>
> We should not need that since frequencies ought to be there.
>
>> the sum of all incoming call edge counts and synthesize counts for the
>> bbs using the guessed profile frequencies applied earlier by
>> init_and_estimate_bb_frequencies. Then the inliner can do its normal
>> bb count scaling.
>
> Yes, I guess we should go this way.  Still we will need to watch overly large 
> values.
> Recursive inlining can probably easily produce quite some nonsense here.
>
> We will also need to solve the problem that in this case cgraph edges will have 0 
> profile.
> We probably want to play the game there and just do the scaling for edge 
> count,
> since IPA passes probably do not want to care about partial profiles.

The problem I am having is that my new freqs_to_counts routine takes
the sum of the incoming edge counts and computes the bb counts by
scaling by the estimated bb frequency. But in the case where the
caller was also missing profile data the edge count will have 0
profile 
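A minimal sketch of the freqs_to_counts idea under discussion (names and
scaling are illustrative, not the final patch; overflow guarding per
Honza's note above is omitted):

static void
freqs_to_counts (struct cgraph_node *node)
{
  gcov_type call_count = 0;
  struct cgraph_edge *e;
  basic_block bb;
  struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);

  /* Sum the counts of all incoming call edges.  */
  for (e = node->callers; e; e = e->next_caller)
    call_count += e->count;

  /* Scale the guessed frequencies into synthetic bb counts.  */
  FOR_EACH_BB_FN (bb, fn)
    bb->count = call_count * bb->frequency / BB_FREQ_MAX;
}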

Re: [PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)

2013-08-27 Thread Ryan Hill
On Tue, 27 Aug 2013 19:41:09 +0200
Uros Bizjak  wrote:

> On Tue, Aug 27, 2013 at 7:36 PM, H.J. Lu  wrote:
> 
> >>> As reported in [1] the host processor detection has not yet been updated
> >>> to recognize Intel Ivy Bridge and Haswell processors.
> >>> This small patch adds the detection of these processors and assumes
> >>> core-avx2 as march for unknown processors of the PENTIUMPRO family that
> >>> support AVX2.
> >>
> >> I have committed a slightly improved (attached) patch that uses
> >> core-avx-i for IvyBridge and adds another IvyBridge model number.
> >> While there, I also reordered a bunch of statements.
> >>
> >> Thanks,
> >> Uros.
> >
> > Page C-3 in the Intel optimization guide shows:
> >
> > 06_3CH, 06_45H, 06_46H Intel microarchitecture Haswell
> > 06_3AH, 06_3EH Intel microarchitecture Ivy Bridge
> > 06_2AH, 06_2DH Intel microarchitecture Sandy Bridge
> > 06_25H, 06_2CH, 06_2FH Intel microarchitecture Westmere
> > 06_1AH, 06_1EH, 06_1FH, Intel microarchitecture Nehalem
> > 06_2EH
> > 06_17H, 06_1DH Enhanced Intel Core microarchitecture
> > 06_0FH Intel Core microarchitecture
> >
> > At least, we should add 0x45 and 0x46 to Haswell.
> 
> OK, the patch is pre-approved.

Can these changes be backported to 4.8 and the Ivy Bridge parts to 4.7 as
well? We've had a couple reports of bad -march=native results for Haswell on
4.7.  I can file PRs if you're interested.
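For reference, host_detect_local_cpu keys off the family-6 model byte from
CPUID leaf 1 (model in bits 7:4, extended model in bits 19:16).  A hedged
sketch of the decoding these additions feed into:

  unsigned model = ((eax >> 4) & 0x0f) + ((eax >> 12) & 0xf0);
  switch (model)
    {
    case 0x3c: case 0x45: case 0x46:   /* Haswell */
      cpu = "core-avx2";
      break;
    case 0x3a: case 0x3e:              /* Ivy Bridge */
      cpu = "core-avx-i";
      break;
    }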


-- 
Ryan Hill                          psn: dirtyepic_sk
   gcc-porting/toolchain/wxwidgets @ gentoo.org

47C3 6D62 4864 0E49 8E9E  7F92 ED38 BD49 957A 8463




Re: v3 of GDB hooks for debugging GCC

2013-08-27 Thread David Malcolm
On Tue, 2013-08-27 at 09:58 -0600, Jeff Law wrote:
> On 08/26/2013 12:42 PM, David Malcolm wrote:
> > The patch also adds me as a maintainer of gdbhooks.py in the MAINTAINERS
> > file.  (There doesn't seem to be any sort order to the maintainer part
> > of that file, should there be?)
> >
> > Finally, I added a copyright header to the new file ("part of GCC", FSF
> > assignee, GPLv3 or later).
> >
> > OK for trunk?
> Yes.  This is good.

Thanks; committed to trunk as r202040.



Re: [PATCH 3/3] Support dumping type bindings in lambda diagnostics.

2013-08-27 Thread Jason Merrill

On 08/27/2013 02:46 PM, Adam Butcher wrote:

Okay.  As it stands, it means that you get an additional 'const' in
diagnostics for lambdas not declared 'mutable'.


Hmm, I guess it would be preferable to use 'mutable' or nothing when 
printing the lambda just like when declaring one.


Jason



Re: RFA: Consider int and same-size short as equivalent vector components

2013-08-27 Thread Paolo Carlini

Hi,

On 08/28/2013 12:29 AM, Joern Rennecke wrote:

* c-common.h: (same_scalar_type_ignoring_signedness): Delete prototype.
 (vector_types_compatible_elements_p): Prototype.
Sorry for nitpicking, but since we are now using C++ I think "declaration" 
is more correct than "prototype".


Paolo.


Re: RFA: Consider int and same-size short as equivalent vector components

2013-08-27 Thread Joern Rennecke

Quoting Marc Glisse :


On Mon, 26 Aug 2013, Joern Rennecke wrote:


Quoting Marc Glisse :


The issue seems larger than just short/int. On x86, (l

I don't understand what you mean here.  Could you post the actual   
code sample?


typedef long vl __attribute__((vector_size(16)));

void f(vl l){
  (l

Checking what tree.h has to say about the meaning of TYPE_VECTOR_OPAQUE,
I found it says:
/* Nonzero in a VECTOR_TYPE if the frontends should not emit warnings
   about missing conversions to other vector types of the same size.  */

So, presumably, the front ends should accept any combination of an opaque
vector with another vector (opaque or not) if the operation can be performed.
I don't want to go into all the ways a conversion might be used to make an
operation work at this time, but at least we should accept combinations
of vectors that are essentially the same.

I found that for our test cases, this not only requires replacing the two
extant uses of same_scalar_type_ignoring_signedness, but also
replacing/augmenting three equality comparisons.

For the C frontend, these were two simple equality comparisons, which
could be replaced straightforwardly.  For the C++ frontend, I'm not sure
whether the same_type_ignoring_top_level_qualifiers_p check becomes
redundant, and even less sure whether it would stay redundant in the future.
The safe option seemed to be to have both tests.  I added the C++ front end
to the CC list in case they have some insight / opinion to offer on this.

I have successfully bootstrapped / regtested the attached patch on
i686-pc-linux-gnu.
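A hedged example of the kind of mix the new predicate is intended to
accept (in the spirit of the new opaque-vector.c test):

typedef long vl __attribute__ ((vector_size (16)));
typedef unsigned long vul __attribute__ ((vector_size (16)));

vl f (vl a, vul b)
{
  return a < b;   /* element types differ only in signedness */
}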
2013-08-27  Joern Rennecke  

gcc/c-family:
* c-common.c (same_scalar_type_ignoring_signedness): Delete.
(vector_types_compatible_elements_p): New function.
* c-common.h: (same_scalar_type_ignoring_signedness): Delete prototype.
(vector_types_compatible_elements_p): Prototype.
gcc/c:
* c-typeck.c (build_binary_op): Use vector_types_compatible_elements_p.
gcc/cp:
* typeck.c (cp_build_binary_op): Use vector_types_compatible_elements_p.
gcc/testsuite:
* c-c++-common/opaque-vector.c: New test.

Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 202008)
+++ c-family/c-common.c (working copy)
@@ -10690,20 +10690,36 @@ resolve_overloaded_builtin (location_t l
 }
 }
 
-/* Ignoring their sign, return true if two scalar types are the same.  */
+/* True if vector types T1 and T2 can be inputs to the same binary
+   operator without conversion.
+   We don't check the overall vector size here because some of our callers
+   want to give different error messages when the vectors are compatible
+   except for the element count.  */
+
 bool
-same_scalar_type_ignoring_signedness (tree t1, tree t2)
+vector_types_compatible_elements_p (tree t1, tree t2)
 {
+  bool opaque = TYPE_VECTOR_OPAQUE (t1) || TYPE_VECTOR_OPAQUE (t2);
+  t1 = TREE_TYPE (t1);
+  t2 = TREE_TYPE (t2);
+
   enum tree_code c1 = TREE_CODE (t1), c2 = TREE_CODE (t2);
 
   gcc_assert ((c1 == INTEGER_TYPE || c1 == REAL_TYPE || c1 == FIXED_POINT_TYPE)
  && (c2 == INTEGER_TYPE || c2 == REAL_TYPE
  || c2 == FIXED_POINT_TYPE));
 
+  t1 = c_common_signed_type (t1);
+  t2 = c_common_signed_type (t2);
   /* Equality works here because c_common_signed_type uses
  TYPE_MAIN_VARIANT.  */
-  return c_common_signed_type (t1)
-== c_common_signed_type (t2);
+  if (t1 == t2)
+return true;
+  if (opaque && c1 == c2
+  && (c1 == INTEGER_TYPE || c1 == REAL_TYPE)
+  && TYPE_PRECISION (t1) == TYPE_PRECISION (t2))
+return true;
+  return false;
 }
 
 /* Check for missing format attributes on function pointers.  LTYPE is
Index: c-family/c-common.h
===
--- c-family/c-common.h (revision 202008)
+++ c-family/c-common.h (working copy)
@@ -766,7 +766,7 @@ extern void warn_logical_operator (locat
   enum tree_code, tree, enum tree_code, tree);
 extern void check_main_parameter_types (tree decl);
 extern bool c_determine_visibility (tree);
-extern bool same_scalar_type_ignoring_signedness (tree, tree);
+extern bool vector_types_compatible_elements_p (tree, tree);
 extern void mark_valid_location_for_stdc_pragma (bool);
 extern bool valid_location_for_stdc_pragma_p (void);
 extern void set_float_const_decimal64 (void);
Index: c/c-typeck.c
===
--- c/c-typeck.c(revision 202008)
+++ c/c-typeck.c(working copy)
@@ -9973,7 +9973,7 @@ build_binary_op (location_t location, en
   if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
 {
   tree intt;
-  if (TREE_TYPE (type0) != TREE_TYPE (type1))
+ if (!vector_types_compatible_elements_p (type0, type1))
 {
   error_at (location, "comparing vectors with different "
   

[wwwdocs] Use https for twitter.com link

2013-08-27 Thread Gerald Pfeifer
No cookie, Gerald.  I should have caught this permanent redirect
(verified with wget among others).

Change applied.

Gerald

Index: style.mhtml
===
RCS file: /cvs/gcc/wwwdocs/htdocs/style.mhtml,v
retrieving revision 1.121
diff -u -3 -p -r1.121 style.mhtml
--- style.mhtml 13 Aug 2013 00:46:33 -  1.121
+++ style.mhtml 27 Aug 2013 20:03:57 -
@@ -158,7 +158,7 @@
   Mailing lists
   http://gcc.gnu.org/onlinedocs/gcc/Contributors.html";>Contributors
   Steering Committee
-  <a href="http://twitter.com/gnutools">
+  <a href="https://twitter.com/gnutools">
 @gnutools


Re: opt-info message change for vectorizer

2013-08-27 Thread Xinliang David Li
If this is the convention, we should probably have another patch to
fix all the existing opt-info messages.

thanks,

David

On Tue, Aug 27, 2013 at 1:23 PM, Mike Stump  wrote:
> On Aug 27, 2013, at 11:22 AM, Xinliang David Li  wrote:
>> Does this one look ok?
>
> We don't capitalize text after error:, warning: or note:.
>
>> thanks,
>>
>> David
>>
>> On Thu, Aug 22, 2013 at 4:20 PM, Xinliang David Li  
>> wrote:
>>> Hi, In this patch, loop alignment peeling and loop versioning
>>> transformation will be reported via -fopt-info by default. This will
>>> help vectorizer size tuning.
>>>
>>> It also enhances the opt-info dump to include current function name as
>>> context. Otherwise, we may see duplicate messages from inline/cloned
>>> instances.
>>>
>>> An example opt report:
>>>
>>>
>>>
>>> b.c:16:A::foo: note: Loop is vectorized
>>>
>>> b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization
>>>
>>> b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization
>>>
>>> b.c:16:A::foo: note: Completely unroll loop 6 times
>>>
>>> b.c:12:A::foo: note: Completely unroll loop 6 times
>>>
>>>
>>> Ok after testing?
>>>
>>> thanks,
>>>
>>> David
>


wide-int branch: cleaned up comparison functions.

2013-08-27 Thread Kenneth Zadeck
Removed the redundant implementations of several comparison functions by 
just forwarding the OO version to the static version.  Added static 
versions of cmp, cmpu and cmps.


kenny
Index: gcc/wide-int.h
===
--- gcc/wide-int.h	(revision 202033)
+++ gcc/wide-int.h	(working copy)
@@ -443,12 +443,18 @@ public:
 
   template 
   int cmp (const T &, signop) const;
+  template 
+  static int cmp (const T1 &, const T2 &, signop);
 
   template 
   int cmps (const T &) const;
+  template 
+  static int cmps (const T1 &, const T2 &);
 
   template 
   int cmpu (const T &) const;
+  template 
+  static int cmpu (const T1 &, const T2 &);
 
   bool only_sign_bit_p (unsigned int) const;
   bool only_sign_bit_p () const;
@@ -1137,38 +1143,7 @@ template 
 inline bool
 wide_int_ro::operator == (const T &c) const
 {
-  bool result;
-  HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS];
-  const HOST_WIDE_INT *s;
-  unsigned int cl;
-  unsigned int p1, p2;
-
-  p1 = precision;
-
-  s = to_shwi1 (ws, &cl, &p2, c);
-  check_precision (&p1, &p2, true, false);
-
-  if (p1 == 0)
-/* There are prec 0 types and we need to do this to check their
-   min and max values.  */
-result = (len == cl) && (val[0] == s[0]);
-  else if (p1 < HOST_BITS_PER_WIDE_INT)
-{
-  unsigned HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << p1) - 1;
-  result = (val[0] & mask) == (s[0] & mask);
-}
-  else if (p1 == HOST_BITS_PER_WIDE_INT)
-result = val[0] == s[0];
-  else
-result = eq_p_large (val, len, p1, s, cl);
-
-  if (result)
-gcc_assert (len == cl);
-
-#ifdef DEBUG_WIDE_INT
-  debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2);
-#endif
-  return result;
+  return wide_int_ro::eq_p (*this, c);
 }
 
 /* Return true if C1 == C2.  If both parameters have nonzero precisions,
@@ -1219,31 +1194,7 @@ template 
 inline bool
 wide_int_ro::lts_p (const T &c) const
 {
-  bool result;
-  HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS];
-  const HOST_WIDE_INT *s;
-  unsigned int cl;
-  unsigned int p1, p2;
-
-  p1 = precision;
-  s = to_shwi1 (ws, &cl, &p2, c);
-  check_precision (&p1, &p2, false, true);
-
-  if (p1 <= HOST_BITS_PER_WIDE_INT
-  && p2 <= HOST_BITS_PER_WIDE_INT)
-{
-  gcc_assert (cl != 0);
-  HOST_WIDE_INT x0 = sext_hwi (val[0], p1);
-  HOST_WIDE_INT x1 = sext_hwi (s[0], p2);
-  result = x0 < x1;
-}
-  else
-result = lts_p_large (val, len, p1, s, cl, p2);
-
-#ifdef DEBUG_WIDE_INT
-  debug_vwa ("wide_int_ro:: %d = (%s lts_p %s\n", result, *this, s, cl, p2);
-#endif
-  return result;
+  return wide_int_ro::lts_p (*this, c);
 }
 
 /* Return true if C1 < C2 using signed comparisons.  */
@@ -1283,30 +1234,7 @@ template 
 inline bool
 wide_int_ro::ltu_p (const T &c) const
 {
-  bool result;
-  HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS];
-  const HOST_WIDE_INT *s;
-  unsigned int cl;
-  unsigned int p1, p2;
-
-  p1 = precision;
-  s = to_shwi1 (ws, &cl, &p2, c);
-  check_precision (&p1, &p2, false, true);
-
-  if (p1 <= HOST_BITS_PER_WIDE_INT
-  && p2 <= HOST_BITS_PER_WIDE_INT)
-{
-  unsigned HOST_WIDE_INT x0 = zext_hwi (val[0], p1);
-  unsigned HOST_WIDE_INT x1 = zext_hwi (s[0], p2);
-  result = x0 < x1;
-}
-  else
-result = ltu_p_large (val, len, p1, s, cl, p2);
-
-#ifdef DEBUG_WIDE_INT
-  debug_vwa ("wide_int_ro:: %d = (%s ltu_p %s)\n", result, *this, s, cl, p2);
-#endif
-  return result;
+  return wide_int_ro::ltu_p (*this, c); 
 }
 
 /* Return true if C1 < C2 using unsigned comparisons.  */
@@ -1524,27 +1452,37 @@ wide_int_ro::ge_p (const T1 &c1, const T
 return geu_p (c1, c2);
 }
 
-/* Returns -1 if THIS < C, 0 if THIS == C and 1 if A > C using
+/* Returns -1 if THIS < C, 0 if THIS == C and 1 if THIS > C using
signed compares.  */
 template 
 inline int
 wide_int_ro::cmps (const T &c) const
 {
+  return wide_int_ro::cmps (*this, c);
+}
+
+/* Returns -1 if C1 < C2, 0 if C1 == C2 and 1 if C1 > C2 using
+   signed compares.  */
+template 
+inline int
+wide_int_ro::cmps (const T1 &c1, const T2 &c2)
+{
   int result;
-  HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS];
-  const HOST_WIDE_INT *s;
-  unsigned int cl;
-  unsigned int prec;
+  HOST_WIDE_INT ws1[WIDE_INT_MAX_ELTS];
+  HOST_WIDE_INT ws2[WIDE_INT_MAX_ELTS];
+  const HOST_WIDE_INT *s1, *s2;  /* Returned data */
+  unsigned int cl1, cl2; /* array lengths  */
+  unsigned int p1, p2;   /* precisions */
 
-  s = to_shwi1 (ws, &cl, &prec, c);
-  if (prec == 0)
-prec = precision;
+  s1 = to_shwi1 (ws1, &cl1, &p1, c1);
+  s2 = to_shwi1 (ws2, &cl2, &p2, c2);
+  check_precision (&p1, &p2, false, true);
 
-  if (precision <= HOST_BITS_PER_WIDE_INT
-  && prec <= HOST_BITS_PER_WIDE_INT)
+  if (p1 <= HOST_BITS_PER_WIDE_INT
+  && p2 <= HOST_BITS_PER_WIDE_INT)
 {
-  HOST_WIDE_INT x0 = sext_hwi (val[0], precision);
-  HOST_WIDE_INT x1 = sext_hwi (s[0], prec);
+  HOST_WIDE_INT x0 = sext_hwi (s1[0], p1);
+  HOST_WIDE_INT x1 = 

Re: opt-info message change for vectorizer

2013-08-27 Thread Mike Stump
On Aug 27, 2013, at 11:22 AM, Xinliang David Li  wrote:
> Does this one look ok?

We don't capitalize text after error:, warning: or note:.

> thanks,
> 
> David
> 
> On Thu, Aug 22, 2013 at 4:20 PM, Xinliang David Li  wrote:
>> Hi, In this patch, loop alignment peeling and loop versioning
>> transformation will be reported via -fopt-info by default. This will
>> help vectorizer size tuning.
>> 
>> It also enhances the opt-info dump to include current function name as
>> context. Otherwise, we may see duplicate messages from inline/cloned
>> instances.
>> 
>> An example opt report:
>> 
>> 
>> 
>> b.c:16:A::foo: note: Loop is vectorized
>> 
>> b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization
>> 
>> b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization
>> 
>> b.c:16:A::foo: note: Completely unroll loop 6 times
>> 
>> b.c:12:A::foo: note: Completely unroll loop 6 times
>> 
>> 
>> Ok after testing?
>> 
>> thanks,
>> 
>> David



[wide-int] fix mips build error

2013-08-27 Thread Mike Stump
Since we use these in a cpp #if, we have to use #define for them.
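The underlying issue, sketched (identifiers illustrative): the preprocessor
never sees C++ const variables, and an unknown identifier in a cpp
conditional silently evaluates to 0.

const int addr_max_precision = 128;  /* invisible to the preprocessor */

#if addr_max_precision != MAX_BITSIZE_MODE_ANY_INT
/* cpp evaluates this as 0 != MAX_BITSIZE_MODE_ANY_INT, which is not
   the comparison that was intended.  */
#endif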

Index: wide-int.h
===
--- wide-int.h  (revision 202032)
+++ wide-int.h  (working copy)
@@ -249,15 +249,15 @@ along with GCC; see the file COPYING3.  
on any platform is 64 bits.  When that changes, then it is likely
that a target hook should be defined so that targets can make this
value larger for those targets.  */
-const int addr_max_bitsize = 64;
+#define ADDR_MAX_BITSIZE 64
 
 /* This is the internal precision used when doing any address
arithmetic.  The '4' is really 3 + 1.  Three of the bits are for
the number of extra bits needed to do bit addresses and single bit is
allow everything to be signed without loosing any precision.  Then
everything is rounded up to the next HWI for efficiency.  */
-const int addr_max_precision
-  = ((addr_max_bitsize + 4 + HOST_BITS_PER_WIDE_INT - 1) & 
~(HOST_BITS_PER_WIDE_INT - 1));
+#define ADDR_MAX_PRECISION \
+  ((ADDR_MAX_BITSIZE + 4 + HOST_BITS_PER_WIDE_INT - 1) & 
~(HOST_BITS_PER_WIDE_INT - 1))
 
 /* This is used to bundle an rtx and a mode together so that the pair
can be used as the second operand of a wide int expression.  If we
@@ -4092,7 +4092,7 @@ fixed_wide_int ::operator -= (c
 
 /* A wide_int_ro that has a large enough precision to do any address math
on the target.  */
-typedef fixed_wide_int  addr_wide_int;
+typedef fixed_wide_int  addr_wide_int;
 /* A wide_int_ro that has a large enough precision to do any math on the
target.  */
 typedef fixed_wide_int  max_wide_int;
@@ -4132,14 +4132,14 @@ template <>
 inline const HOST_WIDE_INT*
 wide_int_ro::to_shwi1 (HOST_WIDE_INT *s ATTRIBUTE_UNUSED,
   unsigned int *l, unsigned int *p,
-  const fixed_wide_int  &y)
+  const fixed_wide_int  &y)
 {
   *p = y.get_precision ();
   *l = y.get_len ();
   return y.get_val ();
 }
 
-#if addr_max_precision != MAX_BITSIZE_MODE_ANY_INT
+#if ADDR_MAX_PRECISION != MAX_BITSIZE_MODE_ANY_INT
 template <>
 inline const HOST_WIDE_INT*
 wide_int_ro::to_shwi1 (HOST_WIDE_INT *s ATTRIBUTE_UNUSED,



Re: [PATCH i386 2/8] [AVX512] Add mask registers.

2013-08-27 Thread Richard Henderson
On 08/27/2013 11:11 AM, Kirill Yukhin wrote:
>> > What happened to the bmi andn alternative we discussed?
> BMI andn is only supported for 4- and 8-byte integers, while
> kandw works on HI/QI
> 

We're talking about values in registers.  Ignoring the high bits of the andn
result still produces the correct results.
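A hedged illustration (not the proposed pattern itself): when only the low
16 or 8 bits of the result are consumed, the wider BMI andn computes them
correctly.

unsigned short
andn_hi (unsigned short a, unsigned short b)
{
  /* Computed as a 32-bit ~a & b; bits 16..31 are garbage but unused.  */
  return ~a & b;
}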


r~


[PATCH 4/4] Support using 'auto' in a function parameter list to introduce an implicit template parameter.

2013-08-27 Thread Adam Butcher
* cp-tree.h (type_uses_auto_or_concept): Declare.
(is_auto_or_concept): Declare.
* decl.c (grokdeclarator): Allow 'auto' parameters with -std=gnu++1y or
-std=c++1y.
* type-utils.h: New header defining ...
(find_type_usage): ... this new template based on pt.c (type_uses_auto)
for searching a type tree given a predicate.
* pt.c (type_uses_auto): Reimplement via type-utils.h (find_type_usage).
(is_auto_or_concept): New function.
(type_uses_auto_or_concept): New function.
* parser.h (struct cp_parser): Add fully_implicit_function_template_p.
* parser.c (cp_parser_new): Initialize 
fully_implicit_function_template_p.
(cp_parser_lambda_expression): Copy and restore value of
fully_implicit_function_template_p as per other parser fields.
(cp_parser_parameter_declaration_list): Count generic
parameters and call ...
(add_implicit_template_parms): ... this new function to synthesize them
with help from type-utils.h (find_type_usage), ...
(tree_type_is_auto_or_concept): ... this new static function and ...
(make_generic_type_name): ... this new static function.
(cp_parser_direct_declarator): Account for implicit template parameters.
(cp_parser_lambda_declarator_opt): Finish fully implicit template if
necessary by calling ...
(finish_fully_implicit_template): ... this new function.
(cp_parser_member_declaration): Likewise.
(cp_parser_function_definition_after_declarator): Likewise.
---
 gcc/cp/cp-tree.h|   2 +
 gcc/cp/decl.c   |   7 +-
 gcc/cp/parser.c | 182 ++--
 gcc/cp/parser.h |   6 ++
 gcc/cp/pt.c |  35 +-
 gcc/cp/type-utils.h |  56 
 6 files changed, 264 insertions(+), 24 deletions(-)
 create mode 100644 gcc/cp/type-utils.h

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 876a72a..8d4ac94 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5453,10 +5453,12 @@ extern tree make_auto   (void);
 extern tree make_decltype_auto (void);
 extern tree do_auto_deduction  (tree, tree, tree);
 extern tree type_uses_auto (tree);
+extern tree type_uses_auto_or_concept  (tree);
 extern void append_type_to_template_for_access_check (tree, tree, tree,
  location_t);
 extern tree splice_late_return_type(tree, tree);
 extern bool is_auto(const_tree);
+extern bool is_auto_or_concept (const_tree);
 extern tree process_template_parm  (tree, location_t, tree, 
 bool, bool);
 extern tree end_template_parm_list (tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 4076a24..df44a6e 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10325,9 +10325,12 @@ grokdeclarator (const cp_declarator *declarator,
   if (ctype || in_namespace)
error ("cannot use %<::%> in parameter declaration");
 
-  if (type_uses_auto (type))
+  if (type_uses_auto (type) && cxx_dialect < cxx1y)
{
- error ("parameter declared %");
+ pedwarn (location_of (type), 0,
+  "use of % in parameter declaration "
+  "only available with "
+  "-std=c++1y or -std=gnu++1y");
  type = error_mark_node;
}
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1f0c2c2..7147bfa 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "plugin.h"
 #include "tree-pretty-print.h"
 #include "parser.h"
+#include "type-utils.h"
 
 
 /* The lexer.  */
@@ -2063,6 +2064,11 @@ static vec 
*cp_parser_initializer_list
 static bool cp_parser_ctor_initializer_opt_and_function_body
   (cp_parser *, bool);
 
+static tree add_implicit_template_parms
+  (cp_parser *, size_t, tree);
+static tree finish_fully_implicit_template
+  (cp_parser *, tree);
+
 /* Classes [gram.class] */
 
 static tree cp_parser_class_name
@@ -3385,6 +3391,9 @@ cp_parser_new (void)
   /* No template parameters apply.  */
   parser->num_template_parameter_lists = 0;
 
+  /* Not declaring an implicit function template.  */
+  parser->fully_implicit_function_template_p = false;
+
   return parser;
 }
 
@@ -8549,10 +8558,12 @@ cp_parser_lambda_expression (cp_parser* parser)
 = parser->num_template_parameter_lists;
 unsigned char in_statement = parser->in_statement;
 bool in_switch_statement_p = parser->in_switch_statement_p;
+bool fully_implicit_function_template_p = 
parser->fully_implicit_function_template_p;
 
 parser->num_template_parameter_lists = 0;
 parser->in_statement = 0;
 parser->in_switch_statement_p = false;
+parser->fully_implicit_func

[PATCH 3/4] Support dumping type bindings in lambda diagnostics.

2013-08-27 Thread Adam Butcher
* error.c (dump_function_decl): Use standard diagnostic flow to dump a
lambda diagnostic, albeit without stating the function name or
duplicating the parameter spec (which is dumped as part of the type).
---
 gcc/cp/error.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index c82a0ce..27ff962 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1380,14 +1380,7 @@ dump_function_decl (cxx_pretty_printer *pp, tree t, int 
flags)
   int do_outer_scope = ! (flags & TFF_UNQUALIFIED_NAME);
   tree exceptions;
   vec *typenames = NULL;
-
-  if (DECL_NAME (t) && LAMBDA_FUNCTION_P (t))
-{
-  /* A lambda's signature is essentially its "type", so defer.  */
-  gcc_assert (LAMBDA_TYPE_P (DECL_CONTEXT (t)));
-  dump_type (pp, DECL_CONTEXT (t), flags);
-  return;
-}
+  bool lambda_p = false;
 
   flags &= ~(TFF_UNQUALIFIED_NAME | TFF_TEMPLATE_NAME);
   if (TREE_CODE (t) == TEMPLATE_DECL)
@@ -1449,16 +1442,23 @@ dump_function_decl (cxx_pretty_printer *pp, tree t, int 
flags)
   else if (cname)
 {
   dump_type (pp, cname, flags);
-  pp_cxx_colon_colon (pp);
+  if (LAMBDA_TYPE_P (cname))
+   lambda_p = true;
+  else
+   pp_cxx_colon_colon (pp);
 }
   else
 dump_scope (pp, CP_DECL_CONTEXT (t), flags);
 
-  dump_function_name (pp, t, flags);
+  /* A lambda's signature is essentially its "type", which has already been
+ dumped.  */
+  if (!lambda_p)
+dump_function_name (pp, t, flags);
 
   if (!(flags & TFF_NO_FUNCTION_ARGUMENTS))
 {
-  dump_parameters (pp, parmtypes, flags);
+  if (!lambda_p)
+   dump_parameters (pp, parmtypes, flags);
 
   if (TREE_CODE (fntype) == METHOD_TYPE)
{
-- 
1.8.4



[PATCH 2/4] Don't generate lambda conversion op if arglist has parameter pack.

2013-08-27 Thread Adam Butcher
* lambda.c (maybe_add_lambda_conv_op): Optimize argvec building and
early out if CALLOP contains a function parameter pack.
---
 gcc/cp/lambda.c | 60 ++---
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index e9bc7c5..4d76f82 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -770,8 +770,51 @@ maybe_add_lambda_conv_op (tree type)
   return;
 }
 
+  argvec = make_tree_vector ();
+
+  /* Non-template conversion operators are defined directly.  Templates are
+ deferred.  In the non-template case, the nullptr instance of the stateless
+ lambda type is added to ARGVEC for build_call_a.  In the template case it
+ is bound via build_min.  */
+  if (!generic_lambda_p)
+{
+  arg = build1 (NOP_EXPR, TREE_TYPE (DECL_ARGUMENTS (callop)),
+   null_pointer_node);
+  argvec->quick_push (arg);
+}
+
+  /* Copy CALLOP's argument list (as per 'copy_list') as FN_ARGS in order to
+ declare the static member function "_FUN" below.  For each arg append to
+ ARGVEC (converting from reference in the template call op case).  Early 
out
+ if a parameter pack is found; conversion to function pointer is not
+ supported in this case.  */
+  tree fn_args = NULL_TREE;
+  {
+tree src = DECL_CHAIN (DECL_ARGUMENTS (callop));
+tree tgt;
+
+while (src)
+  {
+   if (FUNCTION_PARAMETER_PACK_P (src))
+ return;
+
+   if (!fn_args)
+ fn_args = tgt = copy_node (src);
+   else
+ {
+   TREE_CHAIN (tgt) = copy_node (src);
+   tgt = TREE_CHAIN (tgt);
+ }
+
+   mark_exp_read (tgt);
+   vec_safe_push (argvec,
+  generic_lambda_p ? convert_from_reference (tgt) : tgt);
+
+   src = TREE_CHAIN (src);
+  }
+  }
+
   tree fn_result = TREE_TYPE (TREE_TYPE (callop));
-  tree fn_args = copy_list (DECL_CHAIN (DECL_ARGUMENTS (callop)));
 
   if (generic_lambda_p)
 {
@@ -780,12 +823,6 @@ maybe_add_lambda_conv_op (tree type)
 implementation of the conversion operator.  */
 
   tree instance = build_nop (type, null_pointer_node);
-  argvec = make_tree_vector ();
-  for (arg = fn_args; arg; arg = DECL_CHAIN (arg))
-   {
- mark_exp_read (arg);
- vec_safe_push (argvec, convert_from_reference (arg));
-   }
 
   tree objfn = build_min (COMPONENT_REF, NULL_TREE,
  instance, DECL_NAME (callop), NULL_TREE);
@@ -802,15 +839,6 @@ maybe_add_lambda_conv_op (tree type)
 }
   else
 {
-  arg = build1 (NOP_EXPR, TREE_TYPE (DECL_ARGUMENTS (callop)),
-   null_pointer_node);
-  argvec = make_tree_vector ();
-  argvec->quick_push (arg);
-  for (arg = fn_args; arg; arg = DECL_CHAIN (arg))
-   {
- mark_exp_read (arg);
- vec_safe_push (argvec, arg);
-   }
   call = build_call_a (callop, argvec->length (), argvec->address ());
   CALL_FROM_THUNK_P (call) = 1;
   if (MAYBE_CLASS_TYPE_P (TREE_TYPE (call)))
-- 
1.8.4



Re: Lambda templates and implicit function templates.

2013-08-27 Thread Adam Butcher
Hi Jason,

Here's an updated patch set.  The fully_implicit_function_template_p
field has been moved into cp_parser and the other comments addressed.

I've done some testing with parameter packs also.  They work okay with
the explicit template parameter syntax for lambdas.

Unfortunately, due to errors being thrown 'early' in grokdeclarator, I
haven't been able to get 'auto...' (or reference/qualified variants)
working yet.  I think I need to defer processing the parameter pack
internals of grokdeclarator until I have the synthesized template
parameter (or generate one on the fly in-place --- but that's
returning to the old 'on-demand' implementation which we moved away
from).

I don't know if it's the correct thing to do but the implementation
currently omits the conversion to function pointer operator if the
argument list contains a parameter pack.

One other thing, assuming the 'auto...' syntax can be made to work,
bug 41933 needs to be resolved for the expansion returned by the
generic lambda in N3690 5.1.2.5 to compile.  Currently (transforming
the 'auto&&...' to an explicit ' T&&...') appears to
yield the bug.

In particular I get:
  error: expansion pattern ‘ts’ contains no argument packs
  sorry, unimplemented: use of ‘type_pack_expansion’ in template

   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41933#c8

You look to have done some work on it.  Any direction as to where to
go from there?  If the solution requires reworking variadics somehow
might that perhaps alleviate my problems implementing 'auto...'?
Wishful thinking probably!
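For reference, the generic lambda from N3690 5.1.2/5 mentioned above is
essentially (abridged):

#include <utility>

auto vglambda = [] (auto printer)
{
  return [=] (auto&&... ts)   // 'ts' is a function parameter pack
  {
    printer (std::forward<decltype (ts)> (ts)...);
  };
};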

Cheers,
Adam


Patch summary (4):
  Support lambda templates.
  Don't generate lambda conversion op if arglist has parameter pack.
  Support dumping type bindings in lambda diagnostics.
  Support using 'auto' in a function parameter list to introduce an
implicit template parameter.

 gcc/cp/cp-tree.h|   2 +
 gcc/cp/decl.c   |   7 +-
 gcc/cp/decl2.c  |   5 +-
 gcc/cp/error.c  |  22 +++---
 gcc/cp/lambda.c | 115 ++-
 gcc/cp/parser.c | 222 +---
 gcc/cp/parser.h |   6 ++
 gcc/cp/pt.c |  39 +
 gcc/cp/type-utils.h |  56 +
 9 files changed, 413 insertions(+), 61 deletions(-)
 create mode 100644 gcc/cp/type-utils.h

-- 
1.8.4



[PATCH 1/4] Support lambda templates.

2013-08-27 Thread Adam Butcher
* parser.c (cp_parser_lambda_declarator_opt): Accept template parameter
list with std=c++1y or std=gnu++1y.
(cp_parser_lambda_body): Don't call 'expand_or_defer_fn' for lambda call
operator template to avoid adding template result to symbol table.
* lambda.c (lambda_function): Return template result if call operator is
a template.
(maybe_add_lambda_conv_op): Support conversion of a non-capturing lambda
template to a function pointer.
* decl2.c (check_member_template): Don't reject lambda call operator
template in local [lambda] class.
* pt.c (instantiate_class_template_1): Don't instantiate lambda call
operator template when instantiating lambda class.
---
 gcc/cp/decl2.c  |  5 ++--
 gcc/cp/lambda.c | 87 -
 gcc/cp/parser.c | 40 --
 gcc/cp/pt.c |  4 ++-
 4 files changed, 110 insertions(+), 26 deletions(-)

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index d5d2912..ac9dbd7 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -507,8 +507,9 @@ check_member_template (tree tmpl)
   || (TREE_CODE (decl) == TYPE_DECL
  && MAYBE_CLASS_TYPE_P (TREE_TYPE (decl
 {
-  /* The parser rejects template declarations in local classes.  */
-  gcc_assert (!current_function_decl);
+  /* The parser rejects template declarations in local classes
+(with the exception of generic lambdas).  */
+  gcc_assert (!current_function_decl || LAMBDA_FUNCTION_P (decl));
   /* The parser rejects any use of virtual in a function template.  */
   gcc_assert (!(TREE_CODE (decl) == FUNCTION_DECL
&& DECL_VIRTUAL_P (decl)));
diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index a53e692..e9bc7c5 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -196,7 +196,7 @@ lambda_function (tree lambda)
  /*protect=*/0, /*want_type=*/false,
  tf_warning_or_error);
   if (lambda)
-lambda = BASELINK_FUNCTIONS (lambda);
+lambda = STRIP_TEMPLATE (get_first_fn (lambda));
   return lambda;
 }
 
@@ -759,6 +759,10 @@ maybe_add_lambda_conv_op (tree type)
   if (processing_template_decl)
 return;
 
+  bool generic_lambda_p
+= (DECL_TEMPLATE_INFO (callop)
+&& DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (callop)) == callop);
+
   if (DECL_INITIAL (callop) == NULL_TREE)
 {
   /* If the op() wasn't instantiated due to errors, give up.  */
@@ -766,7 +770,54 @@ maybe_add_lambda_conv_op (tree type)
   return;
 }
 
-  stattype = build_function_type (TREE_TYPE (TREE_TYPE (callop)),
+  tree fn_result = TREE_TYPE (TREE_TYPE (callop));
+  tree fn_args = copy_list (DECL_CHAIN (DECL_ARGUMENTS (callop)));
+
+  if (generic_lambda_p)
+{
+  /* Construct the dependent member call for the static member function
+'_FUN' and remove 'auto' from its return type to allow for simple
+implementation of the conversion operator.  */
+
+  tree instance = build_nop (type, null_pointer_node);
+  argvec = make_tree_vector ();
+  for (arg = fn_args; arg; arg = DECL_CHAIN (arg))
+   {
+ mark_exp_read (arg);
+ vec_safe_push (argvec, convert_from_reference (arg));
+   }
+
+  tree objfn = build_min (COMPONENT_REF, NULL_TREE,
+ instance, DECL_NAME (callop), NULL_TREE);
+  call = build_nt_call_vec (objfn, argvec);
+
+  if (type_uses_auto (fn_result))
+   {
+ ++processing_template_decl;
+ fn_result = finish_decltype_type
+   (call, /*id_expression_or_member_access_p=*/false,
+tf_warning_or_error);
+ --processing_template_decl;
+   }
+}
+  else
+{
+  arg = build1 (NOP_EXPR, TREE_TYPE (DECL_ARGUMENTS (callop)),
+   null_pointer_node);
+  argvec = make_tree_vector ();
+  argvec->quick_push (arg);
+  for (arg = fn_args; arg; arg = DECL_CHAIN (arg))
+   {
+ mark_exp_read (arg);
+ vec_safe_push (argvec, arg);
+   }
+  call = build_call_a (callop, argvec->length (), argvec->address ());
+  CALL_FROM_THUNK_P (call) = 1;
+  if (MAYBE_CLASS_TYPE_P (TREE_TYPE (call)))
+   call = build_cplus_new (TREE_TYPE (call), call, tf_warning_or_error);
+}
+
+  stattype = build_function_type (fn_result,
  FUNCTION_ARG_CHAIN (callop));
 
   /* First build up the conversion op.  */
@@ -794,6 +845,9 @@ maybe_add_lambda_conv_op (tree type)
   if (nested)
 DECL_INTERFACE_KNOWN (fn) = 1;
 
+  if (generic_lambda_p)
+fn = add_inherited_template_parms (fn, DECL_TI_TEMPLATE (callop));
+
   add_method (type, fn, NULL_TREE);
 
   /* Generic thunk code fails for varargs; we'll complain in mark_used if
@@ -820,8 +874,8 @@ maybe_add_lambda_conv_op (tree type)
   DECL_NOT_REALLY_EXTERN (fn) = 1;
   DECL_DECLARED_INLINE_P (fn) = 1;
   DECL_STATIC_FUNCTION

Re: [PATCH 3/3] Support dumping type bindings in lambda diagnostics.

2013-08-27 Thread Adam Butcher

Hi Jason,

Was just about to compose a mail to gcc-patches...  Been busy, then 
ill, now just about ready to submit a new set of diffs.


On 27.08.2013 17:47, Jason Merrill wrote:


This patch doesn't seem to depend on the others; go ahead and apply 
it.


Okay.  As it stands, it means that you get an additional 'const' in 
diagnostics for lambdas not declared 'mutable'.  I didn't think this was 
necessarily a bad thing (distinguishing between 'mutable' and 'plain' 
lambdas and pointing at the cause of a possible user error attempting to 
write to by-value captures) but it's probably contentious.  Do you want 
me to ditch the 'const' output prior to pushing?


Cheers,
Adam



Re: PING: Re: [patch] implement simd loops in trunk (OMP_SIMD)

2013-08-27 Thread Aldy Hernandez

On 08/19/13 13:18, Aldy Hernandez wrote:

Well, apparently this last revision was approved and I didn't even know 
about it :).


Tested one last time against current trunk and committed as revision 202029.

My apologies to Jakub for the merge problems he will inherit on the 
gomp-4_0-branch.


Aldy


On 08/19/13 04:35, Jakub Jelinek wrote:

On Thu, Aug 01, 2013 at 09:38:31AM -1000, Richard Henderson wrote:

+  if (simd
+ /*
+ || (fd->sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
+ && !fd->have_ordered)*/)


Debugging leftovers or what?


gomp-4_0-branch contains also the || there, but for the trunk merge
I think Aldy can leave that comment out, I'll deal with it when
merging the
trunk changes back to gomp-4_0-branch.


Removed comment and the enclosed code altogether.



+  /* Enforce simdlen 1 in simd loops with data sharing clauses
referencing
+ variable sized vars.  That is unnecessarily hard to support
and very
+ unlikely to result in vectorized code anyway.  */
+  if (is_simd)
+for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
+  switch (OMP_CLAUSE_CODE (c))
+   {
+   case OMP_CLAUSE_REDUCTION:
+ if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+   max_vf = 1;
+ /* FALLTHRU */
+   case OMP_CLAUSE_PRIVATE:
+   case OMP_CLAUSE_FIRSTPRIVATE:
+   case OMP_CLAUSE_LASTPRIVATE:
+   case OMP_CLAUSE_LINEAR:
+ if (is_variable_sized (OMP_CLAUSE_DECL (c)))
+   max_vf = 1;
+ break;
+   default:
+ continue;
+   }


Comment is wrong, code is right, adjusting max_vf not simdlen.


max_vf = 1 causes
   c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN);
   OMP_CLAUSE_SAFELEN_EXPR (c) = build_int_cst
(integer_type_node,
max_vf);
so I think the comment should just be changed s/simdlen/safelen/,
always get
those 2 wrong...  simdlen is for elemental functions, safelen for simd
loops.


Adjusted comment to document both things-- that max_vf is being set, and
that this later triggers a safelen of 1:

+  /* Set max_vf=1 (which will later enforce safelen=1) in simd loops
+ with data sharing clauses referencing variable sized vars.  That
+ is unnecessarily hard to support and very unlikely to result in
+ vectorized code anyway.  */




+  /* If max_vf is non-NULL, then we can use only vectorization factor
+ up to the max_vf we chose.  So stick it into safelen clause.  */
+  if (max_vf)
+{
+  tree c = find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
+   OMP_CLAUSE_SAFELEN);


First, non-zero, not non-null -- max_vf is not a pointer.


Ack.


Fixed.




Second, I don't understand why we're adjusting safelen.  The VF we
choose
for optimizing the loop (even 1), does not change the fact that there
are
no dependencies between loop iterations larger than that.


No, it adds dependencies.  We choose some size of the per-simd lane
arrays,
and we must not allow larger vectorization factor than that, as every
simd
lane should have different object.  The size is normally the maximum
possible value for a target, so it should be big enough, but we really
need
it to be recorded, so that we don't generate invalid code.  max_vf = 1
is for the cases where we punt on handling it properly, those loops
for now
can't be vectorized just because they had safelen clause (normal
vectorization rules will probably also punt on those).  In the future if
e.g. privatized per-SIMD lane VLAs are common we can improve that, but
right
now I somehow doubt that would be the case.


I'll let you two sort this out.  Nothing done.




Did you want to be adding a _max_vf_ attribute to record stuff that we
learned while examining the omp loop?  E.g. the max_vf=1 reduction
above?

Given the only adjustment we make to max_vf is to disable vectorization,
did we in fact want to discard the simd attribute instead, making this a
"regular" openmp loop?


See above, it is not only about disabling it, we record the sizes of the
arrays there.  And, for SAFELEN clause with 1, we actually don't set
loop->force_vect and set loop->safelen to 0 (i.e. normal loop).


Similarly, you two can sort this out.
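For readers following along, the safelen guarantee is easiest to see on a
plain simd loop (a hedged example, not from the patch):

void
f (int n, float *a, float *b, float *c)
{
  int i;
  /* safelen(8): no iteration depends on any of the 8 preceding ones,
     so any vectorization factor up to 8 is safe.  A per-SIMD-lane
     private array would additionally cap the factor at the storage
     allocated per lane -- the max_vf being discussed.  */
#pragma omp simd safelen(8)
  for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}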




+   if (__builtin_expect (N32 cond3 N31, 0)) goto ZERO_ITER_BB;
+   if (cond3 is <)
+ adj = STEP3 - 1;
+   else
+ adj = STEP3 + 1;
+   count3 = (adj + N32 - N31) / STEP3;
+   if (__builtin_expect (N22 cond2 N21, 0)) goto ZERO_ITER_BB;
+   if (cond2 is <)
+ adj = STEP2 - 1;
+   else
+ adj = STEP2 + 1;
+   count2 = (adj + N22 - N21) / STEP2;
+   if (__builtin_expect (N12 cond1 N11, 0)) goto ZERO_ITER_BB;


CEIL_DIV_EXPR, instead of TRUNC_DIV_EXPR and adjusting by hand?
Unless we
can't generate the same code in the end because generically we don't
know that
the values involved must be negative for GT?


This is just moving code around (and merging multiple copies o

Re: opt-info message change for vectorizer

2013-08-27 Thread Xinliang David Li
yes -- the long unmangled names can be annoying -- that is why I chose
to dump the short form of the function names -- combined with line
numbers, it should be enough to get the full context.

David

On Tue, Aug 27, 2013 at 11:36 AM, Teresa Johnson  wrote:
> My only concern is whether the dump messages will get too long with
> the full function name on the same line. The infrastructure that emits
> inform() notes ensures that the function name is printed before each
> block of messages related to that function (via an "In function foo:"
> type message), but I had found before that the new dump infrastructure
> doesn't do that. OTOH, your approach will make it much easier to grep
> the output of a large build. Personally, I use grep on this type of
> output enough to make the longer lines worth it. Either that or the
> new dump infrastructure needs to be fixed to emit the function name
> before each block of messages, a la inform().
>
> Thanks,
> Teresa
>
> On Tue, Aug 27, 2013 at 11:22 AM, Xinliang David Li  
> wrote:
>> Does this one look ok?
>>
>> thanks,
>>
>> David
>>
>> On Thu, Aug 22, 2013 at 4:20 PM, Xinliang David Li  
>> wrote:
>>> Hi, In this patch, loop alignment peeling and loop versioning
>>> transformation will be reported via -fopt-info by default. This will
>>> help vectorizer size tuning.
>>>
>>> It also enhances the opt-info dump to include current function name as
>> context. Otherwise, we may see duplicate messages from inline/cloned
>>> instances.
>>>
>>> An example opt report:
>>>
>>>
>>>
>>> b.c:16:A::foo: note: Loop is vectorized
>>>
>>> b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization
>>>
>>> b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization
>>>
>>> b.c:16:A::foo: note: Completely unroll loop 6 times
>>>
>>> b.c:12:A::foo: note: Completely unroll loop 6 times
>>>
>>>
>>> Ok after testing?
>>>
>>> thanks,
>>>
>>> David
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH i386 2/8] [AVX512] Add mask registers.

2013-08-27 Thread Kirill Yukhin
On 27 Aug 22:11, Kirill Yukhin wrote:
Hello, while pasting the patch I accidentally put in an
extra brace.
Please ignore it.
> +(define_insn "kxnor"
> +  [(set (match_operand:SWI12 0 "register_operand" "=r,!k")
> + (not:SWI12
> +   (xor:SWI12
> + (match_operand:SWI12 1 "register_operand" "0,k")) ;; <--- Extra ')'
> + (match_operand:SWI12 2 "register_operand" "r,k")))]
> +  "TARGET_AVX512F"

--
Thanks, K


Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.

2013-08-27 Thread Kirill Yukhin
Hello,

> This patch is still far too large.
> 
> I think you should split it up based on every single mode iterator that
> you need to add or change.

The problem is that some iterators depend on each other, so the patches
are not going to be tiny.

Here is the first one. It extends the VF iterator - the biggest impact, I believe.

Is it Ok?

Testing:
  1. Bootstrap pass.
  2. make check shows no regressions.
  3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f 
option.
  4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f 
option.


--
Thanks, K

PS. If it is, I am going to strip the ChangeLog lines out of the big patch.

---
 gcc/config/i386/i386.c  |  62 +--
 gcc/config/i386/i386.md |   1 +
 gcc/config/i386/sse.md  | 283 +++-
 3 files changed, 241 insertions(+), 105 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8325919..5f50533 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -16538,8 +16538,8 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx 
op1)
   gcc_unreachable ();
 case V32QImode:
   extract = gen_avx_vextractf128v32qi;
-  load_unaligned = gen_avx_loaddqu256;
-  store_unaligned = gen_avx_storedqu256;
+  load_unaligned = gen_avx_loaddquv32qi;
+  store_unaligned = gen_avx_storedquv32qi;
   mode = V16QImode;
   break;
 case V8SFmode:
@@ -16642,10 +16642,56 @@ void
 ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
 {
   rtx op0, op1, m;
+  rtx (*load_unaligned) (rtx, rtx);
+  rtx (*store_unaligned) (rtx, rtx);
 
   op0 = operands[0];
   op1 = operands[1];
 
+  if (GET_MODE_SIZE (mode) == 64)
+{
+  switch (GET_MODE_CLASS (mode))
+   {
+   case MODE_VECTOR_INT:
+   case MODE_INT:
+ op0 = gen_lowpart (V16SImode, op0);
+ op1 = gen_lowpart (V16SImode, op1);
+ /* FALLTHRU */
+
+   case MODE_VECTOR_FLOAT:
+ switch (GET_MODE (op0))
+   {
+   default:
+ gcc_unreachable ();
+   case V16SImode:
+ load_unaligned = gen_avx512f_loaddquv16si;
+ store_unaligned = gen_avx512f_storedquv16si;
+ break;
+   case V16SFmode:
+ load_unaligned = gen_avx512f_loadups512;
+ store_unaligned = gen_avx512f_storeups512;
+ break;
+   case V8DFmode:
+ load_unaligned = gen_avx512f_loadupd512;
+ store_unaligned = gen_avx512f_storeupd512;
+ break;
+   }
+
+ if (MEM_P (op1))
+   emit_insn (load_unaligned (op0, op1));
+ else if (MEM_P (op0))
+   emit_insn (store_unaligned (op0, op1));
+ else
+   gcc_unreachable ();
+ break;
+
+   default:
+ gcc_unreachable ();
+   }
+
+  return;
+}
+
   if (TARGET_AVX
   && GET_MODE_SIZE (mode) == 32)
 {
@@ -16678,7 +16724,7 @@ ix86_expand_vector_move_misalign (enum machine_mode 
mode, rtx operands[])
  op0 = gen_lowpart (V16QImode, op0);
  op1 = gen_lowpart (V16QImode, op1);
  /* We will eventually emit movups based on insn attributes.  */
- emit_insn (gen_sse2_loaddqu (op0, op1));
+ emit_insn (gen_sse2_loaddquv16qi (op0, op1));
}
   else if (TARGET_SSE2 && mode == V2DFmode)
 {
@@ -16753,7 +16799,7 @@ ix86_expand_vector_move_misalign (enum machine_mode 
mode, rtx operands[])
  op0 = gen_lowpart (V16QImode, op0);
  op1 = gen_lowpart (V16QImode, op1);
  /* We will eventually emit movups based on insn attributes.  */
- emit_insn (gen_sse2_storedqu (op0, op1));
+ emit_insn (gen_sse2_storedquv16qi (op0, op1));
}
   else if (TARGET_SSE2 && mode == V2DFmode)
{
@@ -27473,13 +27519,13 @@ static const struct builtin_description 
bdesc_special_args[] =
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_lfence, "__builtin_ia32_lfence", 
IX86_BUILTIN_LFENCE, UNKNOWN, (int) VOID_FTYPE_VOID },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_mfence, 0, IX86_BUILTIN_MFENCE, 
UNKNOWN, (int) VOID_FTYPE_VOID },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storeupd, "__builtin_ia32_storeupd", 
IX86_BUILTIN_STOREUPD, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V2DF },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storedqu, "__builtin_ia32_storedqu", 
IX86_BUILTIN_STOREDQU, UNKNOWN, (int) VOID_FTYPE_PCHAR_V16QI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storedquv16qi, 
"__builtin_ia32_storedqu", IX86_BUILTIN_STOREDQU, UNKNOWN, (int) 
VOID_FTYPE_PCHAR_V16QI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntv2df, "__builtin_ia32_movntpd", 
IX86_BUILTIN_MOVNTPD, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V2DF },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntv2di, "__builtin_ia32_movntdq", 
IX86_BUILTIN_MOVNTDQ, UNKNOWN, (int) VOID_FTYPE_PV2DI_V2DI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntisi, "__builtin_ia32_movnti", 
IX86_BUILTIN_MOVNTI, UNKNOWN

Re: wide-int branch updated

2013-08-27 Thread Kenneth Zadeck
You are about an hour behind in reading your email.  I had just
committed a patch that is very close to this.


On 08/27/2013 02:31 PM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

fixed fits_uhwi_p.

tested on x86-64.

kenny

Index: gcc/wide-int.h
===
--- gcc/wide-int.h  (revision 201985)
+++ gcc/wide-int.h  (working copy)
@@ -1650,7 +1650,7 @@ wide_int_ro::fits_shwi_p () const
  inline bool
  wide_int_ro::fits_uhwi_p () const
  {
-  return len == 1 || (len == 2 && val[1] == 0);
+  return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0);
  }

With upper bits being undefined, it doesn't seem safe to check
val[0] or val[1] like this.   I was thinking along the lines of:

inline bool
wide_int_ro::fits_uhwi_p () const
{
   if (precision <= HOST_BITS_PER_WIDE_INT)
 return true;
   if (len == 1)
 return val[0] >= 0;
   if (precision < HOST_BITS_PER_WIDE_INT * 2)
 return ((unsigned HOST_WIDE_INT) val[1]
<< (HOST_BITS_PER_WIDE_INT * 2 - precision)) == 0;
   return val[1] == 0;
}

Since we don't have a sign, everything HWI-sized or smaller fits in a
uhwi without loss of precision.

I've tested the above on x86_64-linux-gnu FWIW, in case it looks OK.

Thanks,
Richard




Re: opt-info message change for vectorizer

2013-08-27 Thread Teresa Johnson
My only concern is whether the dump messages will get too long with
the full function name on the same line. The infrastructure that emits
inform() notes ensures that the function name is printed before each
block of messages related to that function (via an "In function foo:"
type message), but I had found before that the new dump infrastructure
doesn't do that. OTOH, your approach will make it much easier to grep
the output of a large build. Personally, I use grep on this type of
output enough to make the longer lines worth it. Either that or the
new dump infrastructure needs to be fixed to emit the function name
before each block of messages, a la inform().

Thanks,
Teresa

On Tue, Aug 27, 2013 at 11:22 AM, Xinliang David Li  wrote:
> Does this one look ok?
>
> thanks,
>
> David
>
> On Thu, Aug 22, 2013 at 4:20 PM, Xinliang David Li  wrote:
>> Hi, In this patch, loop alignment peeling and loop versioning
>> transformation will be reported via -fopt-info by default. This will
>> help vectorizer size tuning.
>>
>> It also enhances the opt-info dump to include current function name as
>> context. Otherwise, we may see duplicate messages from inline/cloned
>> instances.
>>
>> An example opt report:
>>
>>
>>
>> b.c:16:A::foo: note: Loop is vectorized
>>
>> b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization
>>
>> b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization
>>
>> b.c:16:A::foo: note: Completely unroll loop 6 times
>>
>> b.c:12:A::foo: note: Completely unroll loop 6 times
>>
>>
>> Ok after testing?
>>
>> thanks,
>>
>> David



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: wide-int branch updated

2013-08-27 Thread Richard Sandiford
Kenneth Zadeck  writes:
> fixed fits_uhwi_p.
>
> tested on x86-64.
>
> kenny
>
> Index: gcc/wide-int.h
> ===
> --- gcc/wide-int.h(revision 201985)
> +++ gcc/wide-int.h(working copy)
> @@ -1650,7 +1650,7 @@ wide_int_ro::fits_shwi_p () const
>  inline bool
>  wide_int_ro::fits_uhwi_p () const
>  {
> -  return len == 1 || (len == 2 && val[1] == 0);
> +  return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0);
>  }

With upper bits being undefined, it doesn't seem safe to check
val[0] or val[1] like this.   I was thinking along the lines of:

inline bool
wide_int_ro::fits_uhwi_p () const
{
  if (precision <= HOST_BITS_PER_WIDE_INT)
return true;
  if (len == 1)
return val[0] >= 0;
  if (precision < HOST_BITS_PER_WIDE_INT * 2)
return ((unsigned HOST_WIDE_INT) val[1]
<< (HOST_BITS_PER_WIDE_INT * 2 - precision)) == 0;
  return val[1] == 0;
}

Since we don't have a sign, everything HWI-sized or smaller fits in a
uhwi without loss of precision.

I've tested the above on x86_64-linux-gnu FWIW, in case it looks OK.
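
To make the masking step concrete, here is a standalone sketch
(invented values, not wide-int code) of why val[1] must be shifted
before the zero test when the precision lies strictly between one and
two HWIs:

  /* With precision 65 and len 2 on a 64-bit-HWI host, only bit 0 of
     val[1] is defined; higher bits are noise and must be shifted out
     before comparing with zero.  */
  #include <cstdio>

  int main ()
  {
    const int host_bits = 64;   /* stands in for HOST_BITS_PER_WIDE_INT */
    int precision = 65;
    long long val1 = 0x2;       /* undefined bit 1 set, defined bit 0 clear */
    unsigned long long masked
      = (unsigned long long) val1 << (host_bits * 2 - precision);
    std::printf ("fits: %d\n", masked == 0);   /* prints fits: 1 */
    return 0;
  }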

Thanks,
Richard


Re: opt-info message change for vectorizer

2013-08-27 Thread Xinliang David Li
Does this one look ok?

thanks,

David

On Thu, Aug 22, 2013 at 4:20 PM, Xinliang David Li  wrote:
> Hi, In this patch, loop alignment peeling and loop versioning
> transformation will be reported via -fopt-info by default. This will
> help vectorizer size tuning.
>
> It also enhances the opt-info dump to include current function name as
> context. Otherwise, we may see duplicate messages from inline/cloned
> instances.
>
> An example opt report:
>
>
>
> b.c:16:A::foo: note: Loop is vectorized
>
> b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization
>
> b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization
>
> b.c:16:A::foo: note: Completely unroll loop 6 times
>
> b.c:12:A::foo: note: Completely unroll loop 6 times
>
>
> Ok after testing?
>
> thanks,
>
> David


Re: [PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)

2013-08-27 Thread H.J. Lu
On Tue, Aug 27, 2013 at 10:41 AM, Uros Bizjak  wrote:
> On Tue, Aug 27, 2013 at 7:36 PM, H.J. Lu  wrote:
>
 As reported in [1] the host processor detection has not yet been updated
 to recognize Intel Ivy Bridge and Haswell processors.
 This small patch adds the detection of these processors and assumes
 core-avx2 as march for unknown processors of the PENTIUMPRO family that
 support AVX2.
>>>
>>> I have committed slightly improved (attached) patch that uses
>>> core-avx-i for IvyBridge and adds another IvyBridge model number.
>>> While there, I also reordered a bunch of statements.
>>>
>>> Thanks,
>>> Uros.
>>
>> Page C-3 in the Intel optimization guide shows:
>>
>> 06_3CH, 06_45H, 06_46H Intel microarchitecture Haswell
>> 06_3AH, 06_3EH Intel microarchitecture Ivy Bridge
>> 06_2AH, 06_2DH Intel microarchitecture Sandy Bridge
>> 06_25H, 06_2CH, 06_2FH Intel microarchitecture Westmere
>> 06_1AH, 06_1EH, 06_1FH, Intel microarchitecture Nehalem
>> 06_2EH
>> 06_17H, 06_1DH Enhanced Intel Core microarchitecture
>> 06_0FH Intel Core microarchitecture
>>
>> At least, we should add 0x45 and 0x46 to Haswell.
>
> OK, the patch is pre-approved.
>

This is what I checked in.

Thanks.

-- 
H.J.

---
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9698dc9..54a4cf9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2013-08-27  H.J. Lu  
+
+* config/i386/driver-i386.c (host_detect_local_cpu): Update
+Haswell processor detection.
+
 2013-08-27  Christian Widmer  

 PR target/57927
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 001d12f..4cb9907 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -673,6 +673,8 @@ const char *host_detect_local_cpu (int argc, const
char **argv)
   cpu = "core-avx-i";
   break;
 case 0x3c:
+case 0x45:
+case 0x46:
   /* Haswell.  */
   cpu = "core-avx2";
   break;


Re: [PATCH i386 2/8] [AVX512] Add mask registers.

2013-08-27 Thread Kirill Yukhin
Hello Richard,

On 26 Aug 09:37, Richard Henderson wrote:
> On 08/26/2013 09:13 AM, Kirill Yukhin wrote:
> > +(define_split
> > +  [(set (match_operand:SWI12 0 "mask_reg_operand")
> > +   (any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
> > +(match_operand:SWI12 2 "mask_reg_operand")))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "TARGET_AVX512F && reload_completed"
> > +  [(set (match_operand:SWI12 0 "mask_reg_operand")
> > +   (any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
> > +(match_operand:SWI12 2 "mask_reg_operand")))])
> 
> You must you match_dup on the new pattern half of define_split.
> This pattern must never have triggered during your tests, since
> it should have resulted in garbage rtl, and an ICE.
Thanks, fixed.

> > + (match_operand:SWI12 2 "register_operand" "r,Yk")))]
> > +  "TARGET_AVX512F"
> > +  "@
> > +   #
> > +   kandnw\t{%2, %1, %0|%0, %1, %2}"
> > +  [(set_attr "type" "*,msklog")
> > +   (set_attr "prefix" "*,vex")
> > +   (set_attr "mode" "")])
> 
> What happened to the bmi andn alternative we discussed?
BMI andn is only supported for 4- and 8-byte integers, while
kandw handles HI/QI.

> > +   and{l}\t{%k2, %k0|%k0, %k2}
> > +   #"
> > +  [(set_attr "type" "alu,alu,alu,msklog")
> > +   (set_attr "mode" "QI,QI,SI,HI")])
> 
> Why force the split?  You can write the kand here...
Done.

> > +  {w}\t{%2, %0|%0, %2}
> > +  #"
> > +  [(set_attr "type" "alu,alu,msklog")
> > +   (set_attr "mode" "HI")])
> 
> Likewise.
Done.

> The point being that with optimization enabled, we will have run the split and
> gotten all of the performance benefit of eliding the clobber.  But with
> optimization disabled, we don't need the split for correctness.
> 
> > +(define_insn "kunpckhi"
> > +  [(set (match_operand:HI 0 "register_operand" "=Yk")
> > +   (ior:HI
> > + (ashift:HI
> > +   (match_operand:HI 1 "register_operand" "Yk")
> > +   (const_int 8))
> > + (zero_extend:HI (subreg:QI (match_operand:HI 2 "register_operand" 
> > "Yk") 0]
> > +  "TARGET_AVX512F"
> > +  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
> > +  [(set_attr "mode" "HI")
> > +   (set_attr "type" "msklog")
> > +   (set_attr "prefix" "vex")])
> 
> Don't write the subreg explicitly.  Instead, use a match_operand:QI, which 
> will
> match the whole (subreg (reg)) expression, and also something that the 
> combiner
> could simplify out of that.
Thanks, fixed.

> > +(define_insn "*one_cmplhi2_1"
> > +  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yk")
> > +   (not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
> > +  "ix86_unary_operator_ok (NOT, HImode, operands)"
> ...
> >  (define_insn "*one_cmplqi2_1"
> > -  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
> > -   (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
> > +  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,*Yk")
> > +   (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,*Yk")))]
> 
> Forgotten ! for Yk alternatives.
Thanks. Fixed.

> > +  "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])"
> ...
> > +;; Do not split instructions with mask registers.
> >  (define_split
> ...
> > +   && (! ANY_MASK_REG_P (operands[0])
> > +   || ! ANY_MASK_REG_P (operands[1])
> > +   || ! ANY_MASK_REG_P (operands[2]))"
> 
> This ugliness is why I suggested adding a general_reg_operand in our last
> conversation.
I've introduced a general_reg_operand predicate.

Testing:
  1. Bootstrap pass.
  2. make check shows no regressions.
  3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f 
option.
  4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f 
option.


Is it ok?

--
Thanks, K

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c |  34 +++--
 gcc/config/i386/i386.h |  40 --
 gcc/config/i386/i386.md| 280 ++---
 gcc/config/i386/predicates.md  |   9 ++
 5 files changed, 311 insertions(+), 60 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;; B H   T
-;;;   h jk
+;;;   h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gc

Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-27 Thread Teresa Johnson
On Mon, Aug 19, 2013 at 10:47 AM, Teresa Johnson  wrote:
> On Mon, Aug 19, 2013 at 8:09 AM, Jan Hubicka  wrote:
>>> Remember it isn't using dominance anymore. The latest patch was
>>> instead ensuring the most frequent path between hot blocks and the
>>> entry/exit are marked hot. That should be better than the dominance
>>> approach used in the earlier version.
>>
>> Indeed, that looks more resonable approach.
>> Can you point me to the last version of patch? Last one I remember still
>> walked dominators...
>
> I've included the latest patch below. I still use dominators in the
> post-cfg-optimization fixup (fixup_partitions), but not in the
> partition sanitizing done during the partitioning itself
> (sanitize_hot_paths). The former is looking for hot bbs newly
> dominated by cold bbs after cfg transformations.
>
>>>
>>> > We can commit it and work on better solution incrementally but it will
>>> > probably mean replacing it later.  If you think it makes things easier
>>> > to work on it incrementally, I think the patch is OK.
>>>
>>> Yes, I think this is a big step forward from what is there now for
>>> splitting, which does the splitting purely based on bb count in
>>> isolation. I don't have a much better solution in mind yet.
>>>

Ping on this patch. Honza, did the latest version I sent last week
look ok to you? I've included below a new version that adds the
partitioning insanity warnings we discussed (emitted to the dump only
since as I noted there are various optimization passes that provoke
this due to not fixing up profile data).

(Honza - I am also going to move the discussion we started in this
thread on the COMDAT missing profile handling patch to a different
thread. Should have an update on that shortly.)

Latest patch below retested with bootstrap and regression testing on
x86-64-unknown-linux-gnu, and with a profiledbootstrap and
-freorder-blocks-and-partition forced on. Ok for trunk?

Thanks,
Teresa

2013-08-27  Teresa Johnson  
Steven Bosscher  

* cfgrtl.c (fixup_new_cold_bb): New routine.
(commit_edge_insertions): Invoke fixup_partitions.
(find_partition_fixes): New routine.
(fixup_partitions): Ditto.
(verify_hot_cold_block_grouping): Update comments.
(rtl_verify_edges): Invoke find_partition_fixes.
(rtl_verify_bb_pointers): Update comments.
(rtl_verify_bb_layout): Ditto.
* basic-block.h (probably_never_executed_edge_p): Declare.
(fixup_partitions): Ditto.
* cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions.
* bb-reorder.c (sanitize_hot_paths): New function.
(find_rarely_executed_basic_blocks_and_crossing_edges): Invoke
sanitize_hot_paths.
* predict.c (probably_never_executed_edge_p): New routine.
* cfg.c (check_bb_profile): Add partition insanity warnings.

Index: cfgrtl.c
===
--- cfgrtl.c(revision 202021)
+++ cfgrtl.c(working copy)
@@ -1358,6 +1358,43 @@ fixup_partition_crossing (edge e)
 }
 }

+/* Called when block BB has been reassigned to the cold partition,
+   because it is now dominated by another cold block,
+   to ensure that the region crossing attributes are updated.  */
+
+static void
+fixup_new_cold_bb (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* This is called when a hot bb is found to now be dominated
+ by a cold bb and therefore needs to become cold. Therefore,
+ its preds will no longer be region crossing. Any non-dominating
+ preds that were previously hot would also have become cold
+ in the caller for the same region. Any preds that were previously
+ region-crossing will be adjusted in fixup_partition_crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+{
+  fixup_partition_crossing (e);
+}
+
+  /* Possibly need to make bb's successor edges region crossing,
+ or remove stale region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+{
+  /* We can't have fall-through edges across partition boundaries.
+ Note that force_nonfallthru will do any necessary partition
+ boundary fixup by calling fixup_partition_crossing itself.  */
+  if ((e->flags & EDGE_FALLTHRU)
+  && BB_PARTITION (bb) != BB_PARTITION (e->dest)
+  && e->dest != EXIT_BLOCK_PTR)
+force_nonfallthru (e);
+  else
+fixup_partition_crossing (e);
+}
+}
+
 /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
expense of adding new instructions or reordering basic blocks.

@@ -1996,6 +2033,14 @@ commit_edge_insertions (void)
 {
   basic_block bb;

+  /* Optimization passes that invoke this routine can cause hot blocks
+ previously reached by both hot and cold blocks to become dominated only
+ by cold blocks. This will cause the verification below to fail,
+ and lead to now cold code in the hot section. In some cases this
+ may only be vis
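
As a companion to the description above, a hedged sketch (an invented
helper, not the actual bb-reorder.c code) of the idea: from each hot
block, walk the most frequent incoming path back toward the entry
block and force every block on it into the hot partition:

  static void
  ensure_hot_path_to_entry (basic_block hot_bb)
  {
    basic_block bb = hot_bb;
    while (bb != ENTRY_BLOCK_PTR)
      {
        edge e, best = NULL;
        edge_iterator ei;
        /* Pick the most frequent incoming edge.  */
        FOR_EACH_EDGE (e, ei, bb->preds)
          if (!best || e->count > best->count)
            best = e;
        if (!best)
          break;
        bb = best->src;
        if (bb == ENTRY_BLOCK_PTR
            || BB_PARTITION (bb) == BB_HOT_PARTITION)
          break;  /* The rest of the path is already hot.  */
        BB_SET_PARTITION (bb, BB_HOT_PARTITION);
      }
  }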

Re: [PATCH] Convert more passes to new dump framework

2013-08-27 Thread Xinliang David Li
+ Honza

On Tue, Aug 27, 2013 at 10:56 AM, Teresa Johnson  wrote:
> Ping #3.
>
> Thanks,
> Teresa
>
> On Mon, Aug 19, 2013 at 11:33 AM, Teresa Johnson  wrote:
>> Ping.
>> Thanks,
>> Teresa
>>
>> On Mon, Aug 12, 2013 at 6:54 AM, Teresa Johnson  wrote:
>>> On Tue, Aug 6, 2013 at 10:23 PM, Teresa Johnson  
>>> wrote:
 On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson  
 wrote:
> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor  wrote:
>> Hi,
>>
>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor  wrote:
>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>> >> This patch ports messages to the new dump framework,
>>> >
>>> > It would be great this new framework was documented somewhere.  I lost
>>> > track of what was agreed it would be and from the uses in the
>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>
>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>> wiki or elsewhere?
>>
>> Thanks
>>
>>>
>>> >
>>> > I'd also like to point out two other minor things inline:
>>> >
>>> > [...]
>>> >
>>> >> 2013-08-06  Teresa Johnson  
>>> >> Dehao Chen  
>>> >>
>>> >> * dumpfile.c (dump_loc): Add column number to output, make 
>>> >> newlines
>>> >> consistent.
>>> >> * dumpfile.h (OPTGROUP_OTHER): Add and enable under 
>>> >> OPTGROUP_ALL.
>>> >> * ipa-inline-transform.c (clone_inlined_nodes):
>>> >> (cgraph_node_opt_info): New function.
>>> >> (cgraph_node_call_chain): Ditto.
>>> >> (dump_inline_decision): Ditto.
>>> >> (inline_call): Invoke dump_inline_decision.
>>> >> * doc/invoke.texi: Document optall -fopt-info flag.
>>> >> * profile.c (read_profile_edge_counts): Use new dump 
>>> >> framework.
>>> >> (compute_branch_probabilities): Ditto.
>>> >> * passes.c (pass_manager::register_one_dump_file): Use 
>>> >> OPTGROUP_OTHER
>>> >> when pass not in any opt group.
>>> >> * value-prof.c (check_counter): Use new dump framework.
>>> >> (find_func_by_funcdef_no): Ditto.
>>> >> (check_ic_target): Ditto.
>>> >> * coverage.c (get_coverage_counts): Ditto.
>>> >> (coverage_init): Setup new dump framework.
>>> >> * ipa-inline.c (inline_small_functions): Set 
>>> >> is_in_ipa_inline.
>>> >> * ipa-inline.h (is_in_ipa_inline): Declare.
>>> >>
>>> >> * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>> >> * testsuite/gcc.dg/pr26570.c: Ditto.
>>> >> * testsuite/gcc.dg/pr32773.c: Ditto.
>>> >> * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>> >>
>>> >
>>> > [...]
>>> >
>>> >> Index: ipa-inline-transform.c
>>> >> ===
>>> >> --- ipa-inline-transform.c  (revision 201461)
>>> >> +++ ipa-inline-transform.c  (working copy)
>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, 
>>> >> bool d
>>> >>  }
>>> >>
>>> >>
>>> >> +#define MAX_INT_LENGTH 20
>>> >> +
>>> >> +/* Return NODE's name and profile count, if available.  */
>>> >> +
>>> >> +static const char *
>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>> >> +{
>>> >> +  char *buf;
>>> >> +  size_t buf_size;
>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 
>>> >> 0);
>>> >> +
>>> >> +  if (!bfd_name)
>>> >> +bfd_name = "unknown";
>>> >> +
>>> >> +  buf_size = strlen (bfd_name) + 1;
>>> >> +  if (profile_info)
>>> >> +buf_size += (MAX_INT_LENGTH + 3);
>>> >> +
>>> >> +  buf = (char *) xmalloc (buf_size);
>>> >> +
>>> >> +  strcpy (buf, bfd_name);
>>> >> +
>>> >> +  if (profile_info)
>>> >> +sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, 
>>> >> node->count);
>>> >> +  return buf;
>>> >> +}
>>> >
>>> > I'm not sure if output of this function is aimed only at the user or
>>> > if it is supposed to be used by gcc developers as well.  If the
>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>> > too.  We usually dump it after "/" sign separating it from node name.
>>> > It is invaluable when examining decisions in C++ code where you can
>>> > have lots of clones of a node (and also because existing dumps print
>>> > it, it is easy to combine them).
>>>
>>> The output is useful for both power users doing performance tuning of
>>> their application, and by gcc developers. Adding the id is not so
>>> useful for the former, but I agree that it i

[PING][PATCH] Fix PR58139 by correctly initializing reg_raw_mode[]

2013-08-27 Thread Peter Bergner
I'd like to ping the following patch which fixes a wrong code bug
on powerpc64-linux due to a lost dependency within the scheduler:

Fix PR58139 by correctly initializing reg_raw_mode[]

http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00910.html

H.J., can you please try and SPEC test this patch on Intel to
make sure we don't regress you?  Thanks.

Peter





Re: [PATCH] Convert more passes to new dump framework

2013-08-27 Thread Teresa Johnson
Ping #3.

Thanks,
Teresa

On Mon, Aug 19, 2013 at 11:33 AM, Teresa Johnson  wrote:
> Ping.
> Thanks,
> Teresa
>
> On Mon, Aug 12, 2013 at 6:54 AM, Teresa Johnson  wrote:
>> On Tue, Aug 6, 2013 at 10:23 PM, Teresa Johnson  wrote:
>>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson  wrote:
 On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor  wrote:
> Hi,
>
> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor  wrote:
>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>> >> This patch ports messages to the new dump framework,
>> >
>> > It would be great this new framework was documented somewhere.  I lost
>> > track of what was agreed it would be and from the uses in the
>> > vectorizer I was never quite sure how to utilize it in other passes.
>>
>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>> wiki or elsewhere?
>
> Thanks
>
>>
>> >
>> > I'd also like to point out two other minor things inline:
>> >
>> > [...]
>> >
>> >> 2013-08-06  Teresa Johnson  
>> >> Dehao Chen  
>> >>
>> >> * dumpfile.c (dump_loc): Add column number to output, make 
>> >> newlines
>> >> consistent.
>> >> * dumpfile.h (OPTGROUP_OTHER): Add and enable under 
>> >> OPTGROUP_ALL.
>> >> * ipa-inline-transform.c (clone_inlined_nodes):
>> >> (cgraph_node_opt_info): New function.
>> >> (cgraph_node_call_chain): Ditto.
>> >> (dump_inline_decision): Ditto.
>> >> (inline_call): Invoke dump_inline_decision.
>> >> * doc/invoke.texi: Document optall -fopt-info flag.
>> >> * profile.c (read_profile_edge_counts): Use new dump 
>> >> framework.
>> >> (compute_branch_probabilities): Ditto.
>> >> * passes.c (pass_manager::register_one_dump_file): Use 
>> >> OPTGROUP_OTHER
>> >> when pass not in any opt group.
>> >> * value-prof.c (check_counter): Use new dump framework.
>> >> (find_func_by_funcdef_no): Ditto.
>> >> (check_ic_target): Ditto.
>> >> * coverage.c (get_coverage_counts): Ditto.
>> >> (coverage_init): Setup new dump framework.
>> >> * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>> >> * ipa-inline.h (is_in_ipa_inline): Declare.
>> >>
>> >> * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>> >> * testsuite/gcc.dg/pr26570.c: Ditto.
>> >> * testsuite/gcc.dg/pr32773.c: Ditto.
>> >> * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>> >>
>> >
>> > [...]
>> >
>> >> Index: ipa-inline-transform.c
>> >> ===
>> >> --- ipa-inline-transform.c  (revision 201461)
>> >> +++ ipa-inline-transform.c  (working copy)
>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, 
>> >> bool d
>> >>  }
>> >>
>> >>
>> >> +#define MAX_INT_LENGTH 20
>> >> +
>> >> +/* Return NODE's name and profile count, if available.  */
>> >> +
>> >> +static const char *
>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>> >> +{
>> >> +  char *buf;
>> >> +  size_t buf_size;
>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 
>> >> 0);
>> >> +
>> >> +  if (!bfd_name)
>> >> +bfd_name = "unknown";
>> >> +
>> >> +  buf_size = strlen (bfd_name) + 1;
>> >> +  if (profile_info)
>> >> +buf_size += (MAX_INT_LENGTH + 3);
>> >> +
>> >> +  buf = (char *) xmalloc (buf_size);
>> >> +
>> >> +  strcpy (buf, bfd_name);
>> >> +
>> >> +  if (profile_info)
>> >> +sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, 
>> >> node->count);
>> >> +  return buf;
>> >> +}
>> >
>> > I'm not sure if output of this function is aimed only at the user or
>> > if it is supposed to be used by gcc developers as well.  If the
>> > latter, an incredibly useful thing is to also dump node->symbol.order
>> > too.  We usually dump it after "/" sign separating it from node name.
>> > It is invaluable when examining decisions in C++ code where you can
>> > have lots of clones of a node (and also because existing dumps print
>> > it, it is easy to combine them).
>>
>> The output is useful for both power users doing performance tuning of
>> their application, and by gcc developers. Adding the id is not so
>> useful for the former, but I agree that it is very useful for compiler
>> developers. In fact, in the google branch version we emit more verbose
>> information (the lipo module id and the funcdef_no) to help uniquely
>> identify the routines and 

wide-int branch updated.

2013-08-27 Thread Kenneth Zadeck

Removed all knowledge of SHIFT_COUNT_TRUNCATED from wide-int;
both Richard Biener and Richard Sandiford had commented negatively
about this.

Also fixed a bug with wide-int::fits_uhwi_p.

kenny


Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 201985)
+++ gcc/fold-const.c	(working copy)
@@ -992,9 +992,9 @@ int_const_binop_1 (enum tree_code code,
 	/* It's unclear from the C standard whether shifts can overflow.
 	   The following code ignores overflow; perhaps a C standard
 	   interpretation ruling is needed.  */
-	res = op1.rshift (arg2, sign, GET_MODE_BITSIZE (TYPE_MODE (type)), TRUNC);
+	res = op1.rshift (arg2, sign, GET_MODE_BITSIZE (TYPE_MODE (type)));
   else
-	res = op1.lshift (arg2, GET_MODE_BITSIZE (TYPE_MODE (type)), TRUNC);
+	res = op1.lshift (arg2, GET_MODE_BITSIZE (TYPE_MODE (type)));
   break;
   
 case RROTATE_EXPR:
Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c	(revision 201985)
+++ gcc/simplify-rtx.c	(working copy)
@@ -3507,9 +3507,7 @@ rtx
 simplify_const_binary_operation (enum rtx_code code, enum machine_mode mode,
  rtx op0, rtx op1)
 {
-#if TARGET_SUPPORTS_WIDE_INT == 0
   unsigned int width = GET_MODE_PRECISION (mode);
-#endif
 
   if (VECTOR_MODE_P (mode)
   && code != VEC_CONCAT
@@ -3787,40 +3785,45 @@ simplify_const_binary_operation (enum rt
 	  break;
 
 	case LSHIFTRT:
-	  if (wide_int (std::make_pair (op1, mode)).neg_p ())
-	return NULL_RTX;
-
-	  result = wop0.rshiftu (pop1, bitsize, TRUNC);
-	  break;
-	  
 	case ASHIFTRT:
-	  if (wide_int (std::make_pair (op1, mode)).neg_p ())
-	return NULL_RTX;
-
-	  result = wop0.rshifts (pop1, bitsize, TRUNC);
-	  break;
-	  
 	case ASHIFT:
-	  if (wide_int (std::make_pair (op1, mode)).neg_p ())
-	return NULL_RTX;
-
-	  result = wop0.lshift (pop1, bitsize, TRUNC);
-	  break;
-	  
 	case ROTATE:
-	  if (wide_int (std::make_pair (op1, mode)).neg_p ())
-	return NULL_RTX;
-
-	  result = wop0.lrotate (pop1);
-	  break;
-	  
 	case ROTATERT:
-	  if (wide_int (std::make_pair (op1, mode)).neg_p ())
-	return NULL_RTX;
+	  {
+	wide_int wop1 = pop1;
+	if (wop1.neg_p ())
+	  return NULL_RTX;
 
-	  result = wop0.rrotate (pop1);
-	  break;
+	if (SHIFT_COUNT_TRUNCATED)
+	  wop1 = wop1.umod_trunc (width); 
 
+	switch (code)
+	  {
+	  case LSHIFTRT:
+		result = wop0.rshiftu (wop1, bitsize);
+		break;
+		
+	  case ASHIFTRT:
+		result = wop0.rshifts (wop1, bitsize);
+		break;
+		
+	  case ASHIFT:
+		result = wop0.lshift (wop1, bitsize);
+		break;
+		
+	  case ROTATE:
+		result = wop0.lrotate (wop1);
+		break;
+		
+	  case ROTATERT:
+		result = wop0.rrotate (wop1);
+		break;
+
+	  default:
+		gcc_unreachable ();
+	  }
+	break;
+	  }
 	default:
 	  return NULL_RTX;
 	}
Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc	(revision 201985)
+++ gcc/wide-int.cc	(working copy)
@@ -2404,7 +2404,12 @@ wide_int_ro::divmod_internal_2 (unsigned
 
 
 /* Do a truncating divide DIVISOR into DIVIDEND.  The result is the
-   same size as the operands.  SIGN is either SIGNED or UNSIGNED.  */
+   same size as the operands.  SIGN is either SIGNED or UNSIGNED.  If
+   COMPUTE_QUOTIENT is set, the quotient is computed and returned.  If
+   it is not set, the result is undefined.  If COMPUTE_REMAINDER is
+   set, the remaineder is returned in remainder.  If it is not set,
+   the remainder is undefined.  If OFLOW is not null, it is set to the
+   overflow value.  */
 wide_int_ro
 wide_int_ro::divmod_internal (bool compute_quotient,
 			  const HOST_WIDE_INT *dividend,
@@ -2470,6 +2475,10 @@ wide_int_ro::divmod_internal (bool compu
   quotient.precision = dividend_prec;
   remainder->precision = dividend_prec;
 
+  /* Initialize the incoming overflow if it has been provided.  */
+  if (oflow)
+*oflow = false;
+
   /* If overflow is set, just get out.  There will only be grief by
  continuing.  */
   if (overflow)
Index: gcc/wide-int.h
===
--- gcc/wide-int.h	(revision 201995)
+++ gcc/wide-int.h	(working copy)
@@ -259,19 +259,6 @@ const int addr_max_bitsize = 64;
 const int addr_max_precision
   = ((addr_max_bitsize + 4 + HOST_BITS_PER_WIDE_INT - 1) & ~(HOST_BITS_PER_WIDE_INT - 1));
 
-enum ShiftOp {
-  NONE,
-  /* There are two uses for the wide-int shifting functions.  The
- first use is as an emulation of the target hardware.  The
- second use is as service routines for other optimizations.  The
- first case needs to be identified by passing TRUNC as the value
- of ShiftOp so that shift amount is properly handled according to the
- SHIFT_COUNT_TRUNCATED flag.  For the second case, the shift
- amount is always truncated by the bytesize of the mode of
- THIS.  */
-  TRUNC
-};
-
 /*
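
For context, a hedged illustration (plain C, invented names) of what
SHIFT_COUNT_TRUNCATED semantics mean and what truncating the count via
umod_trunc achieves:

  /* On a SHIFT_COUNT_TRUNCATED target, the shift count is reduced
     modulo the operand width before shifting, so with a 32-bit width
     a count of 33 behaves like a count of 1.  */
  unsigned
  truncated_shl (unsigned x, unsigned count, unsigned width)
  {
    return x << (count % width);
  }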

Re: [PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)

2013-08-27 Thread Uros Bizjak
On Tue, Aug 27, 2013 at 7:36 PM, H.J. Lu  wrote:

>>> As reported in [1] the host processor detection has not yet been updated
>>> to recognize Intel Ivy Bridge and Haswell processors.
>>> This small patch adds the detection of these processors and assumes
>>> core-avx2 as march for unknown processors of the PENTIUMPRO family that
>>> support AVX2.
>>
>> I have committed slightly improved (attached) patch that uses
>> core-avx-i for IvyBridge and adds another IvyBridge model number.
>> While there, I also reordered a bunch of statements.
>>
>> Thanks,
>> Uros.
>
> Page C-3 in the Intel optimization guide shows:
>
> 06_3CH, 06_45H, 06_46H Intel microarchitecture Haswell
> 06_3AH, 06_3EH Intel microarchitecture Ivy Bridge
> 06_2AH, 06_2DH Intel microarchitecture Sandy Bridge
> 06_25H, 06_2CH, 06_2FH Intel microarchitecture Westmere
> 06_1AH, 06_1EH, 06_1FH, Intel microarchitecture Nehalem
> 06_2EH
> 06_17H, 06_1DH Enhanced Intel Core microarchitecture
> 06_0FH Intel Core microarchitecture
>
> At least, we should add 0x45 and 0x46 to Haswell.

OK, the patch is pre-approved.

Thanks,
Uros.


Re: [PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)

2013-08-27 Thread H.J. Lu
On Tue, Aug 27, 2013 at 10:28 AM, Uros Bizjak  wrote:
> Hello!
>
>> As reported in [1] the host processor detection has not yet been updated
>> to recognize Intel Ivy Bridge and Haswell processors.
>> This small patch adds the detection of these processors and assumes
>> core-avx2 as march for unknown processors of the PENTIUMPRO family that
>> support AVX2.
>
> I have committed slightly improved (attached) patch that uses
> core-avx-i for IvyBridge and adds another IvyBridge model number.
> While there, I also reordered a bunch of statements.
>
> Thanks,
> Uros.

Page C-3 in the Intel optimization guide shows:

06_3CH, 06_45H, 06_46H Intel microarchitecture Haswell
06_3AH, 06_3EH Intel microarchitecture Ivy Bridge
06_2AH, 06_2DH Intel microarchitecture Sandy Bridge
06_25H, 06_2CH, 06_2FH Intel microarchitecture Westmere
06_1AH, 06_1EH, 06_1FH, Intel microarchitecture Nehalem
06_2EH
06_17H, 06_1DH Enhanced Intel Core microarchitecture
06_0FH Intel Core microarchitecture

At least, we should add 0x45 and 0x46 to Haswell.
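
As background for those model numbers, a hedged sketch (a standalone
program, not the driver-i386.c code) of how the family/model pair is
decoded from the CPUID signature; on family 6 the extended-model field
supplies the high model bits, which is where values like 0x3c, 0x45
and 0x46 come from:

  #include <cpuid.h>
  #include <stdio.h>

  int main (void)
  {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
      return 1;
    unsigned family = (eax >> 8) & 0x0f;
    unsigned model = (eax >> 4) & 0x0f;
    if (family == 0x06)
      /* Family 6 encodes additional model bits in extended-model.  */
      model |= ((eax >> 16) & 0x0f) << 4;
    printf ("family 0x%02x model 0x%02x\n", family, model);
    return 0;
  }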

-- 
H.J.


Re: [PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)

2013-08-27 Thread Uros Bizjak
Hello!

> As reported in [1] the host processor detection has not yet been updated
> to recognize Intel Ivy Bridge and Haswell processors.
> This small patch adds the detection of these processors and assumes
> core-avx2 as march for unknown processors of the PENTIUMPRO family that
> support AVX2.

I have committed a slightly improved (attached) patch that uses
core-avx-i for IvyBridge and adds another IvyBridge model number.
While there, I also reordered a bunch of statements.

Thanks,
Uros.
Index: driver-i386.c
===
--- driver-i386.c   (revision 202024)
+++ driver-i386.c   (working copy)
@@ -644,13 +644,18 @@ const char *host_detect_local_cpu (int argc, const
  /* Atom.  */
  cpu = "atom";
  break;
+   case 0x0f:
+ /* Merom.  */
+   case 0x17:
+   case 0x1d:
+ /* Penryn.  */
+ cpu = "core2";
+ break;
case 0x1a:
case 0x1e:
case 0x1f:
case 0x2e:
  /* Nehalem.  */
- cpu = "corei7";
- break;
case 0x25:
case 0x2c:
case 0x2f:
@@ -662,20 +667,23 @@ const char *host_detect_local_cpu (int argc, const
  /* Sandy Bridge.  */
  cpu = "corei7-avx";
  break;
-   case 0x17:
-   case 0x1d:
- /* Penryn.  */
- cpu = "core2";
+   case 0x3a:
+   case 0x3e:
+ /* Ivy Bridge.  */
+ cpu = "core-avx-i";
  break;
-   case 0x0f:
- /* Merom.  */
- cpu = "core2";
+   case 0x3c:
+ /* Haswell.  */
+ cpu = "core-avx2";
  break;
default:
  if (arch)
{
  /* This is unknown family 0x6 CPU.  */
- if (has_avx)
+ if (has_avx2)
+   /* Assume Haswell.  */
+   cpu = "core-avx2";
+ else if (has_avx)
/* Assume Sandy Bridge.  */
cpu = "corei7-avx";
  else if (has_sse4_2)


Re: [PATCH 3/3] Support dumping type bindings in lambda diagnostics.

2013-08-27 Thread Jason Merrill

This patch doesn't seem to depend on the others; go ahead and apply it.

Jason


Re: RFA: fix some avr stdint issues

2013-08-27 Thread Georg-Johann Lay

Joern Rennecke wrote:
This patch fixes the gcc.dg/c99-stdint-5.c and gcc.dg/c99-stdint-6.c 
excess error failures.


FYI, some of the problems with the c99-stdint tests are related to the 
stdint.h implementation in use.  For example, some types in AVR-Libc's 
stdint.h are not defined in the way the GCC test suite expects them: 
AVR-Libc defines most C99 types by means of attribute mode.


https://savannah.nongnu.org/bugs/?34695#comment7

Johann



Re: [PING 3] [Patch RX] Add assembler option "-mcu" for generating assembler

2013-08-27 Thread nick clifton

Hi Sandeep,


gas/config:
2013-07-18  Sandeep Kumar Singh

 * rx.h: Add option -mcpu for target variants RX100 and RX200.



Approved - please apply.

Cheers
  Nick




Re: RFA: AVR: Support building AVR Linux targets

2013-08-27 Thread nick clifton

Hi Joseph,

   C. Draw up another patch that restricts the AVR patterns in config.gcc to
-none and -elf.


A and C - I think both changes should be applied.


OK - the patch for item A is already applied.  Here is a proposed patch 
for item C.  I have not applied the patch as obvious because I was not 
sure whether it was better to accept avr-*-elf or avr-*-elf*.  I went 
for the latter in case there were AVR ELF variants, but I defer to your 
superior knowledge.


OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2013-08-27  Nick Clifton  

* config.gcc (AVR): Restrict allowed targets to RTEMS, ELF and
NONE.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 202021)
+++ gcc/config.gcc  (working copy)
@@ -995,7 +995,7 @@
extra_gcc_objs="driver-avr.o avr-devices.o"
extra_objs="avr-devices.o avr-log.o"
;;
-avr-*-*)
+avr-*-elf* | avr-*-none)
 	tm_file="elfos.h avr/elf.h avr/avr-arch.h avr/avr.h dbxelf.h 
avr/avr-stdint.h"

if test x${with_avrlibc} != xno; then
tm_file="${tm_file} ${cpu_type}/avrlibc.h"



Re: v2 of GDB hooks for debugging GCC

2013-08-27 Thread Tom Tromey
> "David" == David Malcolm  writes:

David> Is there a precanned event provided by gdb that I can connect to for
David> when the underlying code has changed and my caches need to be
David> invalidated?

Maybe not :(

You could use the "exited" event as a decent approximation.

Also, and I think we're really getting into the obscure stuff here, if
you want to let users debug multiple versions of gcc at the same time,
then your caches have to be parameterized by the inferior.  I'm
reasonably confident that nobody actually does this.

Tom


Re: v3 of GDB hooks for debugging GCC

2013-08-27 Thread Jeff Law

On 08/26/2013 12:42 PM, David Malcolm wrote:

The patch also adds me as a maintainer of gdbhooks.py in the MAINTAINERS
file.  (There doesn't seem to be any sort order to the maintainer part
of that file; should there be?)

Finally, I added a copyright header to the new file ("part of GCC", FSF
assignee, GPLv3 or later).

OK for trunk?

Yes.  This is good.

Thanks,

jeff


Re: [GOOGLE] Update AutoFDO annotation

2013-08-27 Thread Xinliang David Li
Ok.

David

On Tue, Aug 27, 2013 at 7:36 AM, Dehao Chen  wrote:
> Patch updated.
>
> Thanks,
> Dehao
>
> On Mon, Aug 26, 2013 at 4:11 PM, Xinliang David Li  wrote:
>> Can you add missing documentation on functions like ...:get_count_info
>> -- documenting return value etc.  Also it might be better to avoid
>> using 'set' as the local variable name. Change it to something more
>> specific.
>>
>> thanks,
>>
>> David
>>
>> On Thu, Aug 22, 2013 at 3:56 PM, Dehao Chen  wrote:
>>> This patch has 3 changes:
>>>
>>> 1. Now that we have discriminator for inlined callsite, we don't need
>>> special handling for callsite location any more.
>>> 2. If a source line is mapped to multiple BBs, only the first BB will
>>> be annotated.
>>> 3. Before actual annotation, mark every BB/edge as not annotated.
>>>
>>> Bootstrapped and passed regression test.
>>>
>>> OK for google branch?
>>>
>>> Thanks,
>>> Dehao
>>>
>>> Index: gcc/auto-profile.c
>>> ===
>>> --- gcc/auto-profile.c (revision 201927)
>>> +++ gcc/auto-profile.c (working copy)
>>> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  #include "config.h"
>>>  #include "system.h"
>>> @@ -100,6 +101,10 @@ typedef std::map icall_target
>>> (execution_count, value_profile_histogram).  */
>>>  typedef std::pair count_info;
>>>
>>> +/* Set of inline_stack. Used to track if the profile is already used to
>>> +   annotate the program.  */
>>> +typedef std::set location_set;
>>> +
>>>  struct string_compare
>>>  {
>>>bool operator() (const char *a, const char *b) const
>>> @@ -202,7 +207,8 @@ class autofdo_source_profile {
>>>const function_instance *get_function_instance_by_decl (tree decl) const;
>>>/* Find profile info for a given gimple STMT. If found, store the profile
>>>   info in INFO, and return true; otherwise return false.  */
>>> -  bool get_count_info (gimple stmt, count_info *info) const;
>>> +  bool get_count_info (gimple stmt, count_info *info,
>>> +   const location_set *set) const;
>>>/* Find total count of the callee of EDGE.  */
>>>gcov_type get_callsite_total_count (struct cgraph_edge *edge) const;
>>>
>>> @@ -284,17 +290,13 @@ static const char *get_original_name (const char *
>>>
>>>  /* Return the combined location, which is a 32bit integer in which
>>> higher 16 bits stores the line offset of LOC to the start lineno
>>> -   of DECL, The lower 16 bits stores the discrimnator of LOC if
>>> -   USE_DISCR is true, otherwise 0.  */
>>> +   of DECL, The lower 16 bits stores the discrimnator.  */
>>>
>>>  static unsigned
>>> -get_combined_location (location_t loc, tree decl, bool use_discr)
>>> +get_combined_location (location_t loc, tree decl)
>>>  {
>>> -  if (use_discr)
>>> -return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16)
>>> -   | get_discriminator_from_locus (loc);
>>> -  else
>>> -return (LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16;
>>> +  return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16)
>>> + | get_discriminator_from_locus (loc);
>>>  }
>>>
>>>  /* Return the function decl of a given lexical BLOCK.  */
>>> @@ -316,7 +318,7 @@ get_function_decl_from_block (tree block)
>>>  }
>>>
>>>  static void
>>> -get_inline_stack (gimple stmt, bool use_discr, inline_stack *stack)
>>> +get_inline_stack (gimple stmt, inline_stack *stack)
>>>  {
>>>location_t locus = gimple_location (stmt);
>>>if (LOCATION_LOCUS (locus) == UNKNOWN_LOCATION)
>>> @@ -337,14 +339,13 @@ static void
>>>
>>>tree decl = get_function_decl_from_block (block);
>>>stack->push_back (std::make_pair (
>>> -  decl, get_combined_location (locus, decl, level == 0 && use_discr)));
>>> +  decl, get_combined_location (locus, decl)));
>>>locus = tmp_locus;
>>>level++;
>>>  }
>>>stack->push_back (std::make_pair (
>>>current_function_decl,
>>> -  get_combined_location (locus, current_function_decl,
>>> - level == 0 && use_discr)));
>>> +  get_combined_location (locus, current_function_decl)));
>>>  }
>>>
>>>
>>> @@ -523,14 +524,16 @@ const function_instance *autofdo_source_profile::g
>>>return ret == map_.end() ? NULL : ret->second;
>>>  }
>>>
>>> -bool autofdo_source_profile::get_count_info (gimple stmt,
>>> - count_info *info) const
>>> +bool autofdo_source_profile::get_count_info (gimple stmt, count_info *info,
>>> + const location_set *set) const
>>>  {
>>>if (LOCATION_LOCUS (gimple_location (stmt)) == cfun->function_end_locus)
>>>  return false;
>>>
>>>inline_stack stack;
>>> -  get_inline_stack (stmt, true, &stack);
>>> +  get_inline_stack (stmt, &stack);
>>> +  if (set && set->find(stack) != set->end())
>>> +return false;
>>>if (stack.size () == 0)
>>>  return false;
>>>const function_instance *s = get_function_instance_by_inline_stack 
>>> (stack);
>>> @@ -544,7 +

Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy

2013-08-27 Thread Mike Stump
On Aug 27, 2013, at 4:08 AM, Richard Biener  wrote:
>> and converts:
>>  struct GTY(()) cgraph_node
>> to:
>>  struct GTY((user)) cgraph_node : public symtab_node_base

GTY didn't like single inheritance for me in wide-int.h.  I extended GTY to
support it better.  See the wide-int branch if you need more beef here.  It
isn't flawless, but it did let me do enough to make it work nicely for my
purposes.  In my case, all my data is in the base class, the base class and
the derived class have the same address (single inheritance, no virtual
bases), and I have no embedded pointers in my data.  I don't use GTY((user));
it seemed less ideal to me.

I also had to put:

  void gt_ggc_mx(max_wide_int*) { }
  void gt_pch_nx(max_wide_int*,void (*)(void*, void*), void*) { }
  void gt_pch_nx(max_wide_int*) { }

in wide-int.cc and:

  extern void gt_ggc_mx(max_wide_int*);
  extern void gt_pch_nx(max_wide_int*,void (*)(void*, void*), void*);
  extern void gt_pch_nx(max_wide_int*);

into wide-int.h to get it to go.  These I think are merely because I'm using a 
typedef for a template with template args.  In the wide-int branch, it'd be 
nice if a GTY god could figure out how to improve things so I don't need to do
the above, and to review/approve/fix the GTY code to support single inheritance 
even better.

>> and:
>>  struct GTY(()) varpool_node
>> to:
>>  class GTY((user)) varpool_node : public symtab_node_base
>> 
>> dropping the symtab_node_def union.
>> 
>> Since gengtype is unable to cope with inheritance, we have to mark the
>> types with GTY((user)), and handcode the gty field-visiting functions.
>> Given the simple hierarchy, we don't need virtual functions for this.


Re: [GOOGLE] Update AutoFDO annotation

2013-08-27 Thread Dehao Chen
Patch updated.

Thanks,
Dehao

On Mon, Aug 26, 2013 at 4:11 PM, Xinliang David Li  wrote:
> Can you add missing documentation on functions like ...:get_count_info
> -- documenting return value etc.  Also it might be better to avoid
> using 'set' as the local variable name. Change it to something more
> specific.
>
> thanks,
>
> David
>
> On Thu, Aug 22, 2013 at 3:56 PM, Dehao Chen  wrote:
>> This patch has 3 changes:
>>
>> 1. Now that we have discriminator for inlined callsite, we don't need
>> special handling for callsite location any more.
>> 2. If a source line is mapped to multiple BBs, only the first BB will
>> be annotated.
>> 3. Before actual annotation, mark every BB/edge as not annotated.
>>
>> Bootstrapped and passed regression test.
>>
>> OK for google branch?
>>
>> Thanks,
>> Dehao
>>
>> Index: gcc/auto-profile.c
>> ===
>> --- gcc/auto-profile.c (revision 201927)
>> +++ gcc/auto-profile.c (working copy)
>> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include "config.h"
>>  #include "system.h"
>> @@ -100,6 +101,10 @@ typedef std::map icall_target
>> (execution_count, value_profile_histogram).  */
>>  typedef std::pair count_info;
>>
>> +/* Set of inline_stack. Used to track if the profile is already used to
>> +   annotate the program.  */
>> +typedef std::set location_set;
>> +
>>  struct string_compare
>>  {
>>bool operator() (const char *a, const char *b) const
>> @@ -202,7 +207,8 @@ class autofdo_source_profile {
>>const function_instance *get_function_instance_by_decl (tree decl) const;
>>/* Find profile info for a given gimple STMT. If found, store the profile
>>   info in INFO, and return true; otherwise return false.  */
>> -  bool get_count_info (gimple stmt, count_info *info) const;
>> +  bool get_count_info (gimple stmt, count_info *info,
>> +   const location_set *set) const;
>>/* Find total count of the callee of EDGE.  */
>>gcov_type get_callsite_total_count (struct cgraph_edge *edge) const;
>>
>> @@ -284,17 +290,13 @@ static const char *get_original_name (const char *
>>
>>  /* Return the combined location, which is a 32bit integer in which
>> higher 16 bits stores the line offset of LOC to the start lineno
>> -   of DECL, The lower 16 bits stores the discrimnator of LOC if
>> -   USE_DISCR is true, otherwise 0.  */
>> +   of DECL, The lower 16 bits stores the discrimnator.  */
>>
>>  static unsigned
>> -get_combined_location (location_t loc, tree decl, bool use_discr)
>> +get_combined_location (location_t loc, tree decl)
>>  {
>> -  if (use_discr)
>> -return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16)
>> -   | get_discriminator_from_locus (loc);
>> -  else
>> -return (LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16;
>> +  return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16)
>> + | get_discriminator_from_locus (loc);
>>  }
>>
>>  /* Return the function decl of a given lexical BLOCK.  */
>> @@ -316,7 +318,7 @@ get_function_decl_from_block (tree block)
>>  }
>>
>>  static void
>> -get_inline_stack (gimple stmt, bool use_discr, inline_stack *stack)
>> +get_inline_stack (gimple stmt, inline_stack *stack)
>>  {
>>location_t locus = gimple_location (stmt);
>>if (LOCATION_LOCUS (locus) == UNKNOWN_LOCATION)
>> @@ -337,14 +339,13 @@ static void
>>
>>tree decl = get_function_decl_from_block (block);
>>stack->push_back (std::make_pair (
>> -  decl, get_combined_location (locus, decl, level == 0 && use_discr)));
>> +  decl, get_combined_location (locus, decl)));
>>locus = tmp_locus;
>>level++;
>>  }
>>stack->push_back (std::make_pair (
>>current_function_decl,
>> -  get_combined_location (locus, current_function_decl,
>> - level == 0 && use_discr)));
>> +  get_combined_location (locus, current_function_decl)));
>>  }
>>
>>
>> @@ -523,14 +524,16 @@ const function_instance *autofdo_source_profile::g
>>return ret == map_.end() ? NULL : ret->second;
>>  }
>>
>> -bool autofdo_source_profile::get_count_info (gimple stmt,
>> - count_info *info) const
>> +bool autofdo_source_profile::get_count_info (gimple stmt, count_info *info,
>> + const location_set *set) const
>>  {
>>if (LOCATION_LOCUS (gimple_location (stmt)) == cfun->function_end_locus)
>>  return false;
>>
>>inline_stack stack;
>> -  get_inline_stack (stmt, true, &stack);
>> +  get_inline_stack (stmt, &stack);
>> +  if (set && set->find(stack) != set->end())
>> +return false;
>>if (stack.size () == 0)
>>  return false;
>>const function_instance *s = get_function_instance_by_inline_stack 
>> (stack);
>> @@ -544,7 +547,7 @@ gcov_type autofdo_source_profile::get_callsite_tot
>>  {
>>inline_stack stack;
>>stack.push_back (std::make_pair(edge->callee->symbol.decl, 0));
>> -  get_inline_stack (edge->call_stmt,

Re: Request to merge Undefined Behavior Sanitizer in (take 3)

2013-08-27 Thread Marek Polacek
On Thu, Aug 22, 2013 at 09:01:57PM +0200, Marek Polacek wrote:
> On Thu, Aug 22, 2013 at 07:51:07PM +0200, Marek Polacek wrote:
> > Ping.
> 
> I'm withdrawing the ping for now.  I'll have to deal with some bootstrap
> comparison failures first (ugh!).

Fixed with this patch:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01344.html
Also, I've changed the hash table to pointer map:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01582.html

Regtested, ran bootstrap-ubsan on x86_64-linux.

Ok to merge ubsan into trunk?

Marek


Re: [PING PATCH, PR 57748] Set mode of structures with zero sized arrays to be BLK

2013-08-27 Thread Jakub Jelinek
On Tue, Aug 27, 2013 at 04:03:42PM +0200, Martin Jambor wrote:
> On Fri, Aug 23, 2013 at 05:29:23PM +0200, Jakub Jelinek wrote:
> > On Fri, Aug 23, 2013 at 05:11:22PM +0200, Martin Jambor wrote:
> > > Hi Jakub and/or Joseph,
> > > 
> > > the reporter of this bug seems to be very anxious to have it fixed in
> > > the repository.  While Richi is away, do you think you could have a
> > > look?  It is very small.
> > 
> > Isn't this ABI incompatible change (at least potential on various targets)?
> > If so, then it shouldn't be applied to release branches, because it would
> > create (potential?) ABI incompatibility between 4.8.[01] and 4.8.2.
> > 
> 
> I don't know.  I did a few attempts to observe a change in the calling
> convention of a function accepting a zero sized array terminated
> structure by value (on x86_64) but I did not succeed.  On the other
> hand, I assume there are many other ways how a mode can influence ABI.
> So I'll try to have a look whether we can hack around this situation
> in 4.8's expr.c instead.

All I remember is that e.g. the number of regressions from PR20020 was big,
and any kind of TYPE_MODE changes are just extremely risky.
Perhaps x86_64 in the ABI decisions never uses mode, but we have tons of
other targets and it would surprise me if none of those were affected.

> Nevertheless, is this patch ok for trunk?

I'll defer that to Richard now that he is back ;)

Jakub


Re: [PING PATCH, PR 57748] Set mode of structures with zero sized arrays to be BLK

2013-08-27 Thread Martin Jambor
Hi,

On Fri, Aug 23, 2013 at 05:29:23PM +0200, Jakub Jelinek wrote:
> On Fri, Aug 23, 2013 at 05:11:22PM +0200, Martin Jambor wrote:
> > Hi Jakub and/or Joseph,
> > 
> > the reporter of this bug seems to be very anxious to have it fixed in
> > the repository.  While Richi is away, do you think you could have a
> > look?  It is very small.
> 
> Isn't this ABI incompatible change (at least potential on various targets)?
> If so, then it shouldn't be applied to release branches, because it would
> create (potential?) ABI incompatibility between 4.8.[01] and 4.8.2.
> 

I don't know.  I did a few attempts to observe a change in the calling
convention of a function accepting a zero sized array terminated
structure by value (on x86_64) but I did not succeed.  On the other
hand, I assume there are many other ways how a mode can influence ABI.
So I'll try to have a look whether we can hack around this situation
in 4.8's expr.c instead.
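
For reference, a minimal illustration (invented names, not from the
PR) of the kind of type under discussion, a structure terminated by a
zero-sized array, together with the by-value call whose ABI a mode
change could in principle affect:

  struct trailing
  {
    int len;
    char data[0];             /* GNU zero-length trailing array */
  };

  extern void consume (struct trailing t);   /* passed by value */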

Nevertheless, is this patch ok for trunk?

Thanks,

Martin


[C++ Patch] PR 13981

2013-08-27 Thread Paolo Carlini

Hi,

in this very old issue the submitter requested some sort of hint in the
diagnostic that the specific problem with the testcase has to do with
class type B being incomplete. I think an inform can do the job.


I feared that we would provide too many "false positives", in the sense 
that the inform would often not be that useful, depending on the target 
type, but in fact I tried and it *never* (X) triggers in our whole 
testsuite, thus I'm not that worried anymore ;)


Tested x86_64-linux.

Thanks,
Paolo.

(X) Without the class type check it would trigger 2 times in 
g++.old-deja for arrays of unknown bound. I think we don't want those.



/cp
2013-08-27  Paolo Carlini  

PR c++/13981
* typeck.c (convert_for_assignment): Provide an inform for pointers
to incomplete class types.

/testsuite
2013-08-27  Paolo Carlini  

PR c++/13981
* g++.dg/diagnostic/pr13981.C: New.
Index: cp/typeck.c
===
--- cp/typeck.c (revision 202020)
+++ cp/typeck.c (working copy)
@@ -7983,6 +7983,11 @@ convert_for_assignment (tree type, tree rhs,
default:
  gcc_unreachable();
  }
+ if (TYPE_PTR_P (rhstype)
+ && CLASS_TYPE_P (TREE_TYPE (rhstype))
+ && !COMPLETE_TYPE_P (TREE_TYPE (rhstype)))
+   inform (input_location, "class type %qT is incomplete",
+   TREE_TYPE (rhstype));
}
  return error_mark_node;
}
Index: testsuite/g++.dg/diagnostic/pr13981.C
===
--- testsuite/g++.dg/diagnostic/pr13981.C   (revision 0)
+++ testsuite/g++.dg/diagnostic/pr13981.C   (working copy)
@@ -0,0 +1,12 @@
+// PR c++/13981
+
+struct A {};
+struct B;
+
+void func( A *a );
+
+int main()
+{
+  B *b = 0;
+  func(b);  // { dg-error "cannot convert" }
+}  // { dg-message "is incomplete" "is incomplete" { target *-*-* } 11 }


[Patch, Fortran, F03] PR 55603: Memory leak with scalar allocatable function result

2013-08-27 Thread Janus Weil
Hi all,

here is a patch for PR 55603, which plugs a memory leak with scalar
allocatable function results.

To accomplish this, several things are done:
1) Allocatable scalar function results are passed as argument now and
returned by reference (just like array or character results, cf.
gfc_return_by_reference).
2) A temporary variable is inserted when calling a function with
allocatable scalar result (see gfc_conv_procedure_call).
3) The temporary is nullified before and freed after the call (which
is of course what plugs the memory leak in the end).

The first point requires some additional adjustments (see
create_function_arglist, gfc_trans_deferred_vars and
gfc_conv_variable).

In fact the patch is just a first step and does not handle more
advanced cases yet (like polymorphic allocatable scalar results,
finalization, etc). However, I wanted to get it out already, in order
to get some feedback on my approach before moving on. Also Tobias
mentioned that he has a draft patch for this PR, so he may have some
additions to it.

In any case, the patch fixes the original test case by Damian and
regtests cleanly on x86_64-unknown-linux-gnu. Ok for trunk?

Cheers,
Janus


2013-08-27  Janus Weil  

PR fortran/55603
* trans-decl.c (create_function_arglist, gfc_trans_deferred_vars): Pass
allocatable scalar results as argument.
* trans-expr.c (gfc_conv_variable): Extra dereferencing for allocatable
scalar results.
(gfc_conv_procedure_call): Pass allocatable scalar results as argument.
Create temporary, nullify and deallocate.
* trans-types.c (gfc_return_by_reference): Treat allocatable scalars.


2013-08-27  Janus Weil  

PR fortran/55603
* gfortran.dg/allocatable_scalar_result_1.f90: New.


pr55603.diff
Description: Binary data


allocatable_scalar_result_1.f90
Description: Binary data


Re: [ubsan] Introduce pointer_sized_int_node

2013-08-27 Thread Marek Polacek
On Tue, Aug 27, 2013 at 01:48:29PM +0200, Richard Biener wrote:
> > +  TI_POINTER_SIZED_TYPE,
> 
> I'd rather see TI_UINTPTR_TYPE and TI_INTPTR_TYPE (note they might
> not be exactly of POINTER_SIZE but larger).

We already have [u]intptr_type_node -- but only in c-family/, thus
ubsan.c/asan.c cannot use those nodes.  I can create both
TI_SIGNED_POINTER_SIZED_TYPE and TI_UNSIGNED_POINTER_SIZED_TYPE,
but we currently need only the latter...

> TI_POINTER_SIZED_TYPE is ambiguous, too - what's its signedness?

Unsigned.  But yeah, one can't tell by just looking at the name.

> All around the compiler we use sizetype and ssizetype to munge pointers
> (well, not strictly correct as targets may define sizetype to be 
> larger/smaller
> than actual pointers).

I see.

Marek


Re: ELF interposition and One Definition Rule

2013-08-27 Thread Nathan Sidwell

On 08/26/13 20:58, Jason Merrill wrote:


I would be happy with an even stronger default that optimizes on the assumption
that no interposition occurs; typically interposition is overriding a symbol
found in a dynamic library (i.e. malloc) rather than a symbol defined in the
same translation unit as the use.


I had thought that that case (overriding malloc etc) was what this patch was 
dealing with.  Perhaps I was confused.


There's nothing particularly special about ctors, dtors and virtual function 
implementations.  Their common feature, as Jan described, is that it's hard to 
take their address:

* ctors don't have names.
* dtors do have names, but you can't take their address.
* virtual function implementations can't have their instance address taken -- 
you get the ptr-to-memfn object describing the vtable slot etc.  There is (was?) 
a GNU extension to work around that, IIRC.
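
A short hedged illustration of the last point (mine, not from the thread):

  struct B { virtual void f () {} };

  void (B::*pmf) () = &B::f;  // pointer-to-member: a vtable-slot descriptor,
                              // not the address of B::f's implementation
  // void (*raw) () = &B::f;  // ill-formed: the implementation's address
                              // cannot be named this way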


So that stops a programmer calling these indirectly.  However, I don't really see 
the semantic difference between:


  fptr = &somefunc; ... fptr (...);
and
  somefunc (...);

So treating those 3 types of function specially because you can't use the former 
syntax with them seems odd.


nathan

--
Nathan Sidwell


Re: [PATCH] Change the badness computation to ensure no integer-underflow

2013-08-27 Thread Jan Hubicka
> >>
> >> I've updated the patch (as attached) to use sreal to compute badness.
>  +  badness = ((int)((double) edge->count / max_count
>  +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth;
>  +
> >>>
> >>> FP operations on the host are frowned upon if code generation depends
> >>> on their outcome.  They all should use sreal_* as predict already does.
> >>> Other than that I wonder why the final division is int (so we get 
> >>> truncation
> >>> and not rounding)?  That increases the effect of doing the multiply by
> >>> relbenefit in double.
Sorry, I missed this patch - I need to update my scripts marking files
interesting to me.

I originally duplicated the trick from global.c that used to do this for ages.
The rationale is that if you do only one FP operation, then you will not get
a difference between hosts doing 80-bit fp (x86) and 64-bit fp (others), and you
do the operation a lot faster than sreal does.  While this is on slippery ground,
controlled use of double should not imply host dependency.
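
For reference, a minimal self-contained illustration of the reordered
computation from the patch (the values are made up, and a plain variable
stands in for RELATIVE_TIME_BENEFIT_RANGE):

  #include <limits.h>
  #include <stdio.h>

  int
  main (void)
  {
    long long count = 1, max_count = 1000000;
    int relbenefit = 1, range = 512, growth = 10;

    /* For the expected value ranges the double intermediate stays within
       [INT_MIN / 2, 0], so the cast back to int cannot underflow; the old
       ordering multiplied an already-truncated int by relbenefit, which
       could wrap.  */
    int badness = (int) ((double) count / max_count
                         * relbenefit / range * INT_MIN / 2) / growth;
    printf ("%d\n", badness);
    return 0;
  }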

Badness computation is on the hot path of the inliner, which is the slowest WPA
optimization we have (well, because it only does something really useful).  I
suppose I can try to get Martin to benchmark the patch on Firefox in the
afternoon.

Honza
> >>>
> >>> Richard.
> >>>
> /* Be sure that insanity of the profile won't lead to increasing 
>  counts
>    in the scalling and thus to overflow in the computation above.  */
> gcc_assert (max_count >= edge->count);


Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy

2013-08-27 Thread Jan Hubicka
> >>
> >> Also all of the symbol table is reachable from the global symbol_table
> >> dynamic array which is a GC root.  So instead of walking ->next/previous
> >> and edges you should have a custom marker for the symbol_table global
> >> which does more efficient marking with loops.
> >
> > Indeed, good point!
> > All of the pointers linking the symbol table together (i.e. pointers to other
> > symbol table entries) can be ignored for the PCH mark (since we are explicitly
> > allocated, and all we need is to mark the trees/lto data and other stuff we
> > point to).
> >
> > These are annotated primarily to get PCH working, where we obviously need
> > to keep them.
> >
> > It would be nice to progress on this patch soon :)
> 
> Oh, and with a custom marker for the symbol_table GC root the actual
> symbol / edge objects need not be GC allocated at all!  Just make sure
> to properly mark the GC objects they refer to.

Let's play with this incrementally; first, nodes are streamed to PCH, so we need
to get them out of that mess first.  Second, even edges refer to trees that need
to be walked.  There will be some surprises.

My previous comment about not walking all node pointers was wrong; we do maintain
a list of removed cgraph nodes linked by ->next and not referred to from any
global array (or rather hash).
So I guess in a first iteration we can go with a marker that marks all
non-symtabs and walks the ->next linked chains.

Thanks,
Honza
> 
> Richard.
> 
> > Honza
> >>
> >> Richard.


[ubsan] Use pointer map instead of hash table.

2013-08-27 Thread Marek Polacek
It turned out that for tree -> tree mapping we don't need the hash
table at all; a pointer map is much more convenient.  So this patch
weeds the hash table out of ubsan and introduces a pointer map
instead.  Quite a lot of code could go away -- no need to set the
alloc pools up, etc.
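
For readers unfamiliar with pointer-set.h, a minimal sketch of the API the
patch switches to (this is the GCC-internal interface of that era, so the
snippet only compiles in-tree):

  #include "pointer-set.h"

  /* One create-or-find call replaces the hash-table setup; each slot is
     a plain void *.  */
  static void *
  get_or_insert (pointer_map_t *map, void *key, void *value)
  {
    void **slot = pointer_map_insert (map, key);  /* finds or creates slot */
    if (*slot == NULL)
      *slot = value;
    return *slot;
  }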

Regtested, ran bootstrap-ubsan on x86_64-linux.  Applying to the
ubsan branch.

2013-08-27  Marek Polacek  

* ubsan.c: Use pointer map instead of hash table.

--- gcc/ubsan.c.mp  2013-08-27 12:31:04.136213922 +0200
+++ gcc/ubsan.c 2013-08-27 12:31:10.361236456 +0200
@@ -22,109 +22,42 @@ along with GCC; see the file COPYING3.
 #include "system.h"
 #include "coretypes.h"
 #include "tree.h"
-#include "alloc-pool.h"
 #include "cgraph.h"
 #include "gimple.h"
-#include "hash-table.h"
+#include "pointer-set.h"
 #include "output.h"
 #include "toplev.h"
 #include "ubsan.h"
 #include "c-family/c-common.h"
 
-/* This type represents an entry in the hash table; this hash table
-   maps from a TYPE to a ubsan type descriptor VAR_DECL for that type.  */
-struct ubsan_typedesc
-{
-  /* This represents the type of a variable.  */
-  tree type;
-
-  /* This is the VAR_DECL of the type.  */
-  tree decl;
-};
-
-static alloc_pool ubsan_typedesc_alloc_pool;
-
-/* Hash table for type descriptors.  */
-struct ubsan_typedesc_hasher
-  : typed_noop_remove 
-{
-  typedef ubsan_typedesc value_type;
-  typedef ubsan_typedesc compare_type;
-
-  static inline hashval_t hash (const value_type *);
-  static inline bool equal (const value_type *, const compare_type *);
-};
-
-/* Hash a memory reference.  */
-
-inline hashval_t
-ubsan_typedesc_hasher::hash (const ubsan_typedesc *data)
-{
-  return iterative_hash_object (TYPE_UID (data->type), 0);
-}
-
-/* Compare two data types.  */
-
-inline bool
-ubsan_typedesc_hasher::equal (const ubsan_typedesc *d1,
- const ubsan_typedesc *d2)
-{
-  /* Here, the types should have identical __typekind,
- __typeinfo and __typename.  */
-  return d1->type == d2->type;
-}
-
-static hash_table  ubsan_typedesc_ht;
+/* Map a TYPE to an ubsan type descriptor VAR_DECL for that type.  */
+static pointer_map_t *typedesc_map;
 
-/* Initializes an instance of ubsan_typedesc.  */
+/* Insert DECL as the VAR_DECL for TYPE in the TYPEDESC_MAP.  */
 
 static void
-ubsan_typedesc_init (ubsan_typedesc *data, tree type, tree decl)
+insert_decl_for_type (tree decl, tree type)
 {
-  data->type = type;
-  data->decl = decl;
+  void **slot = pointer_map_insert (typedesc_map, type);
+  gcc_assert (*slot == NULL);
+  *slot = decl;
 }
 
-/* This creates the alloc pool used to store the instances of
-   ubsan_typedesc that are stored in the hash table ubsan_typedesc_ht.  */
+/* Find the VAR_DECL for TYPE in TYPEDESC_MAP.  If TYPE does not
+   exist in the map, return NULL_TREE, otherwise, return the VAR_DECL
+   we found.  */
 
-static alloc_pool
-ubsan_typedesc_get_alloc_pool ()
-{
-  if (ubsan_typedesc_alloc_pool == NULL)
-ubsan_typedesc_alloc_pool = create_alloc_pool ("ubsan_typedesc",
-  sizeof (ubsan_typedesc),
-  10);
-  return ubsan_typedesc_alloc_pool;
-}
-
-/* Returns a reference to the hash table containing data type.
-   This function ensures that the hash table is created.  */
-
-static hash_table  &
-get_typedesc_hash_table ()
-{
-  if (!ubsan_typedesc_ht.is_created ())
-ubsan_typedesc_ht.create (10);
-
-  return ubsan_typedesc_ht;
-}
-
-/* Allocates memory for an instance of ubsan_typedesc into the memory
-   pool returned by ubsan_typedesc_get_alloc_pool and initialize it.
-   TYPE describes a particular type, DECL is its VAR_DECL.  */
-
-static ubsan_typedesc *
-ubsan_typedesc_new (tree type, tree decl)
+static tree
+lookup_decl_for_type (tree type)
 {
-  ubsan_typedesc *desc =
-(ubsan_typedesc *) pool_alloc (ubsan_typedesc_get_alloc_pool ());
-
-  ubsan_typedesc_init (desc, type, decl);
-  return desc;
+  /* If the pointer map is not initialized yet, create it now.  */
+  if (typedesc_map == NULL)
+typedesc_map = pointer_map_create ();
+  void **slot = pointer_map_contains (typedesc_map, type);
+  return slot ? (tree) *slot : NULL_TREE;
 }
 
-/* Helper routine, which encodes a value in the uptr type.
+/* Helper routine, which encodes a value in the pointer_sized_int_node.
Arguments with precision <= POINTER_SIZE are passed directly,
the rest is passed by reference.  T is a value we are to encode.  */
 
@@ -281,24 +214,19 @@ get_ubsan_type_info_for_type (tree type)
 }
 
 /* Helper routine that returns ADDR_EXPR of a VAR_DECL of a type
-   descriptor.  It first looks into the hash table; if not found,
-   create the VAR_DECL, put it into the hash table and return the
+   descriptor.  It first looks into the pointer map; if not found,
+   create the VAR_DECL, put it into the pointer map and return the
ADDR_EXPR of it.  TYPE describes a particular type.  */
 
 tree
 ubsan_t

Re: [PATCH 1/n] Add conditional compare support

2013-08-27 Thread Richard Biener
On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw  wrote:
> On 27/08/13 12:10, Richard Biener wrote:
>> What's this for and what's the desired semantics? I don't like having
>> extra tree codes for this.  Is this for a specific instruction set
>> feature?
>
> The background is to support the conditional compare instructions in ARM
> (more effectively) and AArch64 at all.
>
> The current method used in ARM is to expand into a series of store-flag
> instructions and then hope that combine can optimize them away (though
> that fails far too often, particularly when the first instruction in the
> sequence is combined into another pattern).  To make it work at all the
> compiler has to lie about the costs of various store-flag type
> operations which overall risks producing worse code and means we also
> have to support many more complex multi-instruction patterns than is
> desirable.  I really don't want to go down the same route for AArch64.
>
> The idea behind all this is to capture potential conditional compare
> operations early enough in the mid end that we can keep track of them
> until RTL expand time and then to emit the correct logic on all targets
> depending on what is the best thing for that target.  The current method
> of lowering into store-flag sequences doesn't really cut it.

It seems to me that then the initial instruction selection process (aka RTL
expansion) needs to be improved.  As we are expanding with the CFG around,
it should be easy enough to detect AND/ORIF cases and do better here.
Yeah, I suppose this asks to turn existing jump expansion optimizations
upside-down to optimize with the GIMPLE CFG in mind.

The current way of LOGICAL_OP_NON_SHORT_CIRCUIT is certainly
bogus - fold-const.c is way too early to decide this.  Similar to the
ongoing work of expanding / building up switch expressions in a GIMPLE
pass (moving expand complexity up the pipeline), this asks for a GIMPLE
phase that moves this decision down closer to RTL expansion.
(We have tree-ssa-ifcombine.c that is a related GIMPLE transform pass)

Richard.

> R.
>


Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy

2013-08-27 Thread Richard Biener
On Tue, Aug 27, 2013 at 1:40 PM, Jan Hubicka  wrote:
>> > +  while (x != xlimit)
>> > +{
>> > +  /* Code common to all symtab nodes. */
>> > +  gt_ggc_m_9tree_node (x->decl);
>> > +  gt_ggc_mx_symtab_node_base (x->next);
>> > +  gt_ggc_mx_symtab_node_base (x->previous);
>> > +  gt_ggc_mx_symtab_node_base (x->next_sharing_asm_name);
>> > +  gt_ggc_mx_symtab_node_base (x->previous_sharing_asm_name);
>> > +  gt_ggc_mx_symtab_node_base (x->same_comdat_group);
>> > +  gt_ggc_m_20vec_ipa_ref_t_va_gc_ (x->ref_list.references);
>> > +  gt_ggc_m_9tree_node (x->alias_target);
>> > +  gt_ggc_m_18lto_file_decl_data (x->lto_file_data);
>>
>> Don't we have template specializations so we just can do
>>
>>  gt_ggc_mark (x->decl);
>>  gt_ggc_mark (x->next);
> Yep, that is what I hope for.
>> ...
>> etc?
>>
>> Also all of the symbol table is reachable from the global symbol_table
>> dynamic array which is a GC root.  So instead of walking ->next/previous
>> and edges you should have a custom marker for the symbol_table global
>> which does more efficient marking with loops.
>
> Indeed, good point!
> All of the pointers linking the symbol table together (i.e. pointers to other
> symbol table entries) can be ignored for the PCH mark (since we are explicitly
> allocated, and all we need is to mark the trees/lto data and other stuff we
> point to).
>
> These are annotated primarily to get PCH working, where we obviously need
> to keep them.
>
> It would be nice to progress on this patch soon :)

Oh, and with a custom marker for the symbol_table GC root the actual
symbol / edge objects need not be GC allocated at all!  Just make sure
to properly mark the GC objects they refer to.

Richard.

> Honza
>>
>> Richard.


Re: [PATCH] Change the badness computation to ensure no integer-underflow

2013-08-27 Thread Richard Biener
On Tue, Jun 25, 2013 at 12:45 AM, Dehao Chen  wrote:
> The original patch has some flaw. The new patch is attached.
> Bootstrapped and passed regression tests.

Ok.

Thanks,
Richard.

> Thanks,
> Dehao
>
> On Mon, Jun 24, 2013 at 9:44 AM, Dehao Chen  wrote:
>> Hi, Richard,
>>
>> I've updated the patch (as attached) to use sreal to compute badness.
>>
>> Thanks,
>> Dehao
>>
>> On Mon, Jun 24, 2013 at 5:41 AM, Richard Biener
>>  wrote:
>>> On Sat, Jun 22, 2013 at 2:59 AM, Dehao Chen  wrote:
 This patch prevents integer-underflow of badness computation in ipa-inline.

 Bootstrapped and passed regression tests.

 OK for trunk?

 Thanks,
 Dehao

 gcc/ChangeLog:
 2013-06-21  Dehao Chen  

 * ipa-inline.c (edge_badness): Fix integer underflow.

 Index: gcc/ipa-inline.c
 ===
 --- gcc/ipa-inline.c (revision 200326)
 +++ gcc/ipa-inline.c (working copy)
 @@ -888,11 +888,9 @@ edge_badness (struct cgraph_edge *edge, bool dump)
else if (max_count)
  {
int relbenefit = relative_time_benefit (callee_info, edge, 
 edge_time);
 -  badness =
 - ((int)
 - ((double) edge->count * INT_MIN / 2 / max_count /
 RELATIVE_TIME_BENEFIT_RANGE) *
 - relbenefit) / growth;
 -
 +  badness = ((int)((double) edge->count / max_count
 +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth;
 +
>>>
>>> FP operations on the host are frowned upon if code generation depends
>>> on their outcome.  They all should use sreal_* as predict already does.
>>> Other than that I wonder why the final division is int (so we get truncation
>>> and not rounding)?  That increases the effect of doing the multiply by
>>> relbenefit in double.
>>>
>>> Richard.
>>>
/* Be sure that insanity of the profile won't lead to increasing 
 counts
   in the scalling and thus to overflow in the computation above.  */
gcc_assert (max_count >= edge->count);


Re: Stream TYPE_FINAL_P and DECL_FINAL_P to LTO

2013-08-27 Thread Jan Hubicka
> > + v = iterative_hash_host_wide_int (TYPE_TRANSPARENT_AGGR (t), v);
> > + v = iterative_hash_host_wide_int (TYPE_FINAL_P (t), v);
> 
> please use | (TYPE_FINAL_P (t) << 1) to speed this up like in other cases.

Thank you, I missed that TYPE_TRANSPARENT_AGGR is also a flag.
I will update the patch, also adding the cxx_constructor/cxx_destructor flags
that landed on mainline in the meantime, and commit.  (Those go basically
wherever DECL_FINAL_P goes.)

Honza


Re: [PATCH 1/n] Add conditional compare support

2013-08-27 Thread Richard Earnshaw
On 27/08/13 12:10, Richard Biener wrote:
> What's this for and what's the desired semantics? I don't like having
> extra tree codes for this.  Is this for a specific instruction set
> feature?

The background is to support the conditional compare instructions in ARM
(more effectively) and AArch64 at all.

The current method used in ARM is to expand into a series of store-flag
instructions and then hope that combine can optimize them away (though
that fails far too often, particularly when the first instruction in the
sequence is combined into another pattern).  To make it work at all the
compiler has to lie about the costs of various store-flag type
operations which overall risks producing worse code and means we also
have to support many more complex multi-instruction patterns than is
desirable.  I really don't want to go down the same route for AArch64.

The idea behind all this is to capture potential conditional compare
operations early enough in the mid end that we can keep track of them
until RTL expand time and then to emit the correct logic on all targets
depending on what is the best thing for that target.  The current method
of lowering into store-flag sequences doesn't really cut it.
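
As a hedged illustration (mine, not from the patch series), this is the
canonical source pattern in question:

  /* With store-flag expansion each comparison becomes its own cset-style
     instruction plus an AND; with conditional compare AArch64 can emit
     roughly "cmp w0, w1; ccmp w2, w3, 4, gt; cset w0, gt".  */
  int
  both_greater (int a, int b, int c, int d)
  {
    return a > b && c > d;
  }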

R.



Re: [ubsan] Introduce pointer_sized_int_node

2013-08-27 Thread Richard Biener
On Mon, Aug 26, 2013 at 12:15 PM, Marek Polacek  wrote:
> I noticed I forgot to apply this old patch, already acked by Jason.
> It introduces new pointer_sized_int_node, thus we can get rid of
> uptr_type function in ubsan, and it allows us to do some clean-up
> in asan.c, too.
>
> Tested x86_64-linux, applying to ubsan branch.
>
> 2013-08-26  Marek Polacek  
>
> * tree.h (enum tree_index): Add TI_POINTER_SIZED_TYPE.
> (pointer_sized_int_node): Define.
> * tree.c (build_common_tree_nodes): Initialize
> pointer_sized_int_node.
> * ubsan.c (ubsan_encode_value): Use pointer_sized_int_node instead
> calling uptr_type.
> (uptr_type): Remove function.
> * asan.c (asan_global_struct): Use pointer_sized_int_node instead
> calling build_nonstandard_integer_type.
> (initialize_sanitizer_builtins): Likewise.
> (asan_finish_file): Likewise.
>
> --- gcc/tree.c.mp   2013-07-21 19:54:35.416986756 +0200
> +++ gcc/tree.c  2013-07-21 19:56:58.347562787 +0200
> @@ -9638,6 +9638,8 @@ build_common_tree_nodes (bool signed_cha
>  = build_pointer_type (build_type_variant (void_type_node, 1, 0));
>fileptr_type_node = ptr_type_node;
>
> +  pointer_sized_int_node = build_nonstandard_integer_type (POINTER_SIZE, 1);
> +
>float_type_node = make_node (REAL_TYPE);
>TYPE_PRECISION (float_type_node) = FLOAT_TYPE_SIZE;
>layout_type (float_type_node);
> --- gcc/ubsan.c.mp  2013-07-21 20:04:59.469653493 +0200
> +++ gcc/ubsan.c 2013-07-21 20:07:00.227178083 +0200
> @@ -123,14 +123,6 @@ ubsan_typedesc_new (tree type, tree decl
>return desc;
>  }
>
> -/* Build the ubsan uptr type.  */
> -
> -static tree
> -uptr_type (void)
> -{
> -  return build_nonstandard_integer_type (POINTER_SIZE, 1);
> -}
> -
>  /* Helper routine, which encodes a value in the uptr type.
> Arguments with precision <= POINTER_SIZE are passed directly,
> the rest is passed by reference.  T is a value we are to encode.  */
> @@ -143,7 +135,7 @@ ubsan_encode_value (tree t)
>  {
>  case INTEGER_TYPE:
>if (TYPE_PRECISION (type) <= POINTER_SIZE)
> -   return fold_build1 (NOP_EXPR, uptr_type (), t);
> +   return fold_build1 (NOP_EXPR, pointer_sized_int_node, t);
>else
> return build_fold_addr_expr (t);
>  case REAL_TYPE:
> @@ -153,7 +145,7 @@ ubsan_encode_value (tree t)
>   {
> tree itype = build_nonstandard_integer_type (bitsize, true);
> t = fold_build1 (VIEW_CONVERT_EXPR, itype, t);
> -   return fold_convert (uptr_type (), t);
> +   return fold_convert (pointer_sized_int_node, t);
>   }
> else
>   {
> --- gcc/tree.h.mp   2013-07-21 19:54:35.441986868 +0200
> +++ gcc/tree.h  2013-07-21 19:56:05.128353854 +0200
> @@ -4227,6 +4227,7 @@ enum tree_index
>TI_VA_LIST_FPR_COUNTER_FIELD,
>TI_BOOLEAN_TYPE,
>TI_FILEPTR_TYPE,
> +  TI_POINTER_SIZED_TYPE,

I'd rather see TI_UINTPTR_TYPE and TI_INTPTR_TYPE (note they might
not be exactly of POINTER_SIZE but larger).

TI_POINTER_SIZED_TYPE is ambiguous, too - what's its signedness?
All around the compiler we use sizetype and ssizetype to munge pointers
(well, not strictly correct as targets may define sizetype to be larger/smaller
than actual pointers).

Richard.

>
>TI_DFLOAT32_TYPE,
>TI_DFLOAT64_TYPE,
> @@ -4383,6 +4384,7 @@ extern GTY(()) tree global_trees[TI_MAX]
>  #define va_list_fpr_counter_field  
> global_trees[TI_VA_LIST_FPR_COUNTER_FIELD]
>  /* The C type `FILE *'.  */
>  #define fileptr_type_node  global_trees[TI_FILEPTR_TYPE]
> +#define pointer_sized_int_node global_trees[TI_POINTER_SIZED_TYPE]
>
>  #define boolean_type_node  global_trees[TI_BOOLEAN_TYPE]
>  #define boolean_false_node global_trees[TI_BOOLEAN_FALSE]
> --- gcc/asan.c.mp   2013-07-21 20:07:15.013237456 +0200
> +++ gcc/asan.c  2013-07-21 20:16:10.929376734 +0200
> @@ -1954,7 +1954,7 @@ asan_global_struct (void)
> = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
>   get_identifier (field_names[i]),
>   (i == 0 || i == 3) ? const_ptr_type_node
> - : build_nonstandard_integer_type (POINTER_SIZE, 1));
> + : pointer_sized_int_node);
>DECL_CONTEXT (fields[i]) = ret;
>if (i)
> DECL_CHAIN (fields[i - 1]) = fields[i];
> @@ -2039,8 +2039,7 @@ initialize_sanitizer_builtins (void)
> ptr_type_node, ptr_type_node, NULL_TREE);
>tree BT_FN_VOID_PTR_PTRMODE
>  = build_function_type_list (void_type_node, ptr_type_node,
> -   build_nonstandard_integer_type (POINTER_SIZE,
> -   1), 
> NULL_TREE);
> +   pointer_sized_int_node, NULL_TREE);
>tree BT_FN_VOID_INT
>  = build_function_type_list (void_type_node, integ

Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy

2013-08-27 Thread Jan Hubicka
> > +  while (x != xlimit)
> > +{
> > +  /* Code common to all symtab nodes. */
> > +  gt_ggc_m_9tree_node (x->decl);
> > +  gt_ggc_mx_symtab_node_base (x->next);
> > +  gt_ggc_mx_symtab_node_base (x->previous);
> > +  gt_ggc_mx_symtab_node_base (x->next_sharing_asm_name);
> > +  gt_ggc_mx_symtab_node_base (x->previous_sharing_asm_name);
> > +  gt_ggc_mx_symtab_node_base (x->same_comdat_group);
> > +  gt_ggc_m_20vec_ipa_ref_t_va_gc_ (x->ref_list.references);
> > +  gt_ggc_m_9tree_node (x->alias_target);
> > +  gt_ggc_m_18lto_file_decl_data (x->lto_file_data);
> 
> Don't we have template specializations so we just can do
> 
>  gt_ggc_mark (x->decl);
>  gt_ggc_mark (x->next);
Yep, that is what I hope for.
> ...
> etc?
> 
> Also all of the symbol table is reachable from the global symbol_table
> dynamic array which is a GC root.  So instead of walking ->next/previous
> and edges you should have a custom marker for the symbol_table global
> which does more efficient marking with loops.

Indeed, good point!
All of the pointers linking the symbol table together (i.e. pointers to other
symbol table entries) can be ignored for the PCH mark (since we are explicitly
allocated, and all we need is to mark the trees/lto data and other stuff we
point to).

These are annotated primarily to get PCH working, where we obviously need
to keep them.

It would be nice to progress on this patch soon :)

Honza
> 
> Richard.


Re: Stream TYPE_FINAL_P and DECL_FINAL_P to LTO

2013-08-27 Thread Richard Biener
On Sun, Aug 25, 2013 at 5:23 PM, Jan Hubicka  wrote:
> Hi,
> this patch adds code to stream DECL_FINAL_P/TYPE_FINAL_P into LTO so we can
> use it after the ipa-devirt code at LTO time is merged in.
> (http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01007.html)
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> Honza
>
> * lto-streamer-out.c (hash_tree): Add streaming of DECL_FINAL_P
> and TYPE_FINAL_P
> * lto/lto.c (compare_tree_sccs_1): Add TYPE_FINAL_P and DECL_FINAL_P
> * tree-streamer-out.c (pack_ts_decl_with_vis_value_fields):
> Add DECL_FINAL_P.
> (pack_ts_type_common_value_fields): Add TYPE_FINAL_P.
> * tree-streamer-in.c (unpack_ts_decl_with_vis_value_fields):
> Add DECL_FINAL_P.
> (unpack_ts_type_common_value_fields): Add TYPE_FINAL_P.
>
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c  (revision 201974)
> +++ lto-streamer-out.c  (working copy)
> @@ -798,6 +798,9 @@ hash_tree (struct streamer_tree_cache_d
> v);
>   v = iterative_hash_host_wide_int (DECL_TLS_MODEL (t), v);
> }
> +  if (TREE_CODE (t) == FUNCTION_DECL)
> +   v = iterative_hash_host_wide_int (DECL_FINAL_P (t),
> + v);
>if (VAR_OR_FUNCTION_DECL_P (t))
> v = iterative_hash_host_wide_int (DECL_INIT_PRIORITY (t), v);
>  }
> @@ -838,7 +841,10 @@ hash_tree (struct streamer_tree_cache_d
> | (TYPE_USER_ALIGN (t) << 5)
> | (TYPE_READONLY (t) << 6), v);
>if (RECORD_OR_UNION_TYPE_P (t))
> -   v = iterative_hash_host_wide_int (TYPE_TRANSPARENT_AGGR (t), v);
> +   {
> + v = iterative_hash_host_wide_int (TYPE_TRANSPARENT_AGGR (t), v);
> + v = iterative_hash_host_wide_int (TYPE_FINAL_P (t), v);

please use | (TYPE_FINAL_P (t) << 1) to speed this up like in other cases.
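
I.e., sketched (not the committed hunk):

  if (RECORD_OR_UNION_TYPE_P (t))
    v = iterative_hash_host_wide_int (TYPE_TRANSPARENT_AGGR (t)
                                      | (TYPE_FINAL_P (t) << 1), v);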

Otherwise ok.

Thanks,
Richard.

> +   }
>else if (code == ARRAY_TYPE)
> v = iterative_hash_host_wide_int (TYPE_NONALIASED_COMPONENT (t), v);
>v = iterative_hash_host_wide_int (TYPE_PRECISION (t), v);
> Index: lto/lto.c
> ===
> --- lto/lto.c   (revision 201974)
> +++ lto/lto.c   (working copy)
> @@ -1892,6 +1892,7 @@ compare_tree_sccs_1 (tree t1, tree t2, t
>compare_values (DECL_DISREGARD_INLINE_LIMITS);
>compare_values (DECL_PURE_P);
>compare_values (DECL_LOOPING_CONST_OR_PURE_P);
> +  compare_values (DECL_FINAL_P);
>if (DECL_BUILT_IN_CLASS (t1) != NOT_BUILT_IN)
> compare_values (DECL_FUNCTION_CODE);
>if (DECL_STATIC_DESTRUCTOR (t1))
> @@ -1905,7 +1906,10 @@ compare_tree_sccs_1 (tree t1, tree t2, t
>compare_values (TYPE_NO_FORCE_BLK);
>compare_values (TYPE_NEEDS_CONSTRUCTING);
>if (RECORD_OR_UNION_TYPE_P (t1))
> -   compare_values (TYPE_TRANSPARENT_AGGR);
> +   {
> + compare_values (TYPE_TRANSPARENT_AGGR);
> + compare_values (TYPE_FINAL_P);
> +   }
>else if (code == ARRAY_TYPE)
> compare_values (TYPE_NONALIASED_COMPONENT);
>compare_values (TYPE_PACKED);
> Index: tree-streamer-out.c
> ===
> --- tree-streamer-out.c (revision 201974)
> +++ tree-streamer-out.c (working copy)
> @@ -242,6 +242,8 @@ pack_ts_decl_with_vis_value_fields (stru
>bp_pack_value (bp, DECL_TLS_MODEL (expr),  3);
>  }
>
> +  if (TREE_CODE (expr) == FUNCTION_DECL)
> +bp_pack_value (bp, DECL_FINAL_P (expr), 1);
>if (VAR_OR_FUNCTION_DECL_P (expr))
>  bp_pack_var_len_unsigned (bp, DECL_INIT_PRIORITY (expr));
>  }
> @@ -293,7 +295,10 @@ pack_ts_type_common_value_fields (struct
>bp_pack_value (bp, TYPE_NO_FORCE_BLK (expr), 1);
>bp_pack_value (bp, TYPE_NEEDS_CONSTRUCTING (expr), 1);
>if (RECORD_OR_UNION_TYPE_P (expr))
> -bp_pack_value (bp, TYPE_TRANSPARENT_AGGR (expr), 1);
> +{
> +  bp_pack_value (bp, TYPE_TRANSPARENT_AGGR (expr), 1);
> +  bp_pack_value (bp, TYPE_FINAL_P (expr), 1);
> +}
>else if (TREE_CODE (expr) == ARRAY_TYPE)
>  bp_pack_value (bp, TYPE_NONALIASED_COMPONENT (expr), 1);
>bp_pack_value (bp, TYPE_PACKED (expr), 1);
> Index: tree-streamer-in.c
> ===
> --- tree-streamer-in.c  (revision 201974)
> +++ tree-streamer-in.c  (working copy)
> @@ -275,6 +275,8 @@ unpack_ts_decl_with_vis_value_fields (st
>DECL_TLS_MODEL (expr) = (enum tls_model) bp_unpack_value (bp,  3);
>  }
>
> +  if (TREE_CODE (expr) == FUNCTION_DECL)
> +DECL_FINAL_P (expr) = (unsigned) bp_unpack_value (bp, 1);
>if (VAR_OR_FUNCTION_DECL_P (expr))
>  {
>priority_type p;
> @@ -346,7 +348,10 @@ unpack_ts_type_common_value_fields (stru
>  

Re: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)

2013-08-27 Thread Richard Biener
On Tue, Jul 30, 2013 at 4:47 PM, Zoran Jovanovic
 wrote:
> Thank you for the reply.
> I am in the process of modifying the patch according to some comments received.
> Currently I am considering the usage of DECL_BIT_FIELD_REPRESENTATIVE.
> I see that it can be used during the analysis phase for deciding which accesses
> can be merged - only accesses with the same representative will be merged.
> I have more dilemmas with the usage of representatives for lowering.  If my
> understanding is correct, a bit field representative can only be associated with
> a field declaration, and not with a BIT_FIELD_REF node.  As a consequence the
> optimization must use COMPONENT_REF to model the new bit field access (which
> should be an equivalent of several merged accesses).  To use COMPONENT_REF a
> new field declaration with the appropriate bit size (equal to the sum of the
> bit sizes of all merged bit field accesses) must be created and then the
> corresponding bit field representative could be attached.
> Is my understanding correct?  Is creating a new field declaration for every
> set of merged bit field accesses acceptable?

You should just use the DECL_BIT_FIELD_REPRESENTATIVE FIELD_DECL, not
create any new FIELD_DECLs.  Also you can use BIT_FIELD_REFs but you need to
query the proper bit position / size from the DECL_BIT_FIELD_REPRESENTATIVE
FIELD_DECL.  Using the COMPONENT_REF is better though, I think.
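
As a hedged sketch of the representative idea (layout is target-dependent;
this is only an illustration):

  /* f1..f3 live in one underlying storage unit, so they share a single
     DECL_BIT_FIELD_REPRESENTATIVE - a synthetic FIELD_DECL covering that
     unit - which is what makes them candidates for one merged access.  */
  struct S
  {
    unsigned f1 : 5;
    unsigned f2 : 7;
    unsigned f3 : 7;
  };

  void
  copy_fields (struct S *p1, struct S *p2)
  {
    p2->f1 = p1->f1;  /* these three copies can become a single  */
    p2->f2 = p1->f2;  /* 19-bit load/store pair through the      */
    p2->f3 = p1->f3;  /* shared representative                   */
  }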

Full patch queued for review (but you might want to update it to not create new
FIELD_DECLs).

Richard.

>
> Regards,
> Zoran Jovanovic
>
> 
> From: Richard Biener [richard.guent...@gmail.com]
> Sent: Thursday, July 18, 2013 11:31 AM
> To: Zoran Jovanovic; gcc-patches@gcc.gnu.org
> Cc: Petar Jovanovic
> Subject: Re: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
>
> Zoran Jovanovic  wrote:
>
>>Hello,
>>This patch adds a new optimization pass that combines several adjacent
>>bit field accesses that copy values from one memory location to another
>>into a single bit field access.
>>
>>Example:
>>
>>Original code:
>>   D.1351;
>>   D.1350;
>>   D.1349;
>>  D.1349_2 = p1_1(D)->f1;
>>  p2_3(D)->f1 = D.1349_2;
>>  D.1350_4 = p1_1(D)->f2;
>>  p2_3(D)->f2 = D.1350_4;
>>  D.1351_5 = p1_1(D)->f3;
>>  p2_3(D)->f3 = D.1351_5;
>>
>>Optimized code:
>>   D.1358;
>>  D.1358_10 = BIT_FIELD_REF <*p1_1(D), 19, 13>;
>>  BIT_FIELD_REF <*p2_3(D), 19, 13> = D.1358_10;
>>
>>The algorithm works at the basic block level and consists of the following
>>3 major steps:
>>1. Go through the basic block statement list. If there are statement pairs
>>that implement a copy of bit field content from one memory location to
>>another, record statement pointers and other necessary data in a
>>corresponding data structure.
>>2. Identify records that represent adjacent bit field accesses and mark
>>them as merged.
>>3. Modify trees accordingly.
>
> All this should use BITFIELD_REPRESENTATIVE both to decide what accesses are 
> related and for the lowering. This makes sure to honor the appropriate memory 
> models.
>
> In theory only lowering is necessary, and FRE and DSE will do the job of 
> optimizing - also properly accounting for the alias issues that Joseph mentions. 
> The lowering and analysis are strongly related to SRA, so I don't believe we 
> want a new pass for this.
>
> Richard.
>
>


Re: [PATCH 1/n] Add conditional compare support

2013-08-27 Thread Richard Biener
On Wed, Aug 21, 2013 at 12:23 PM, Zhenqiang Chen  wrote:
> Hi,
>
> The attached patch is the basic support for conditional compare (CCMP). It
> adds
> a set of keywords on TREE to represent CCMP:
>
> DEFTREECODE (TRUTH_ANDIF_LT_EXPR, "truth_andif_lt_expr", tcc_ccomparison, 3)
> DEFTREECODE (TRUTH_ANDIF_LE_EXPR, "truth_andif_le_expr", tcc_ccomparison, 3)
> DEFTREECODE (TRUTH_ANDIF_GT_EXPR, "truth_andif_gt_expr", tcc_ccomparison, 3)
> DEFTREECODE (TRUTH_ANDIF_GE_EXPR, "truth_andif_ge_expr", tcc_ccomparison, 3)
> DEFTREECODE (TRUTH_ANDIF_EQ_EXPR, "truth_andif_eq_expr", tcc_ccomparison, 3)
> DEFTREECODE (TRUTH_ANDIF_NE_EXPR, "truth_andif_ne_expr", tcc_ccomparison, 3)
> DEFTREECODE (TRUTH_ORIF_LT_EXPR, "truth_orif_lt_expr", tcc_ccomparison, 3)
> DEFTREECODE (TRUTH_ORIF_LE_EXPR, "truth_orif_le_expr", tcc_ccomparison, 3)
> DEFTREECODE (TRUTH_ORIF_GT_EXPR, "truth_orif_gt_expr", tcc_ccomparison, 3)
> DEFTREECODE (TRUTH_ORIF_GE_EXPR, "truth_orif_ge_expr", tcc_ccomparison, 3)
> DEFTREECODE (TRUTH_ORIF_EQ_EXPR, "truth_orif_eq_expr", tcc_ccomparison, 3)
> DEFTREECODE (TRUTH_ORIF_NE_EXPR, "truth_orif_ne_expr", tcc_ccomparison, 3)
>
> To distinguish others, the patch dumps CCMP as
>
>   && to "?&&"
>   || to "?||"
>
> A CCMP operator has three operands: two are from the compare and the other
> is from the result of the previous compare. To reuse current code, the result
> of the previous compare is in TREE_OPERAND (ccmp, 2)/gimple_assign_rhs3, e.g.
>
>   r = (a > b) && (c > d)
>
> with CCMP, it will be
>
>   t1 = GT_EXPR (a, b)
>   r  = TRUTH_ANDIF_GT_EXPR (c, d, t1)
>
> The patch does not include ops to expand CCMP to RTL. It just rolls CCMP back
> to BIT operators. So with a dummy "conditional_compare", we can test
> the patch in any port.
> In the patch, dummy conditional_compare for i386 and arm are added for test.
>
> Bootstrap on x86-64 and ARM Chromebook.
> No ICE and runtime errors in regression tests.

What's this for and what's the desired semantics? I don't like having
extra tree codes for this.  Is this for a specific instruction set
feature?

Richard.

> Thanks!
> -Zhenqiang
>
> ChangeLog:
> 2013-08-21  Zhenqiang Chen  
>
> * tree.def: Add conditional compare ops: TRUTH_ANDIF_LT_EXPR,
> TRUTH_ANDIF_LE_EXPR, TRUTH_ANDIF_GT_EXPR, TRUTH_ANDIF_GE_EXPR,
> TRUTH_ANDIF_EQ_EXPR, TRUTH_ANDIF_NE_EXPR, TRUTH_ORIF_LT_EXPR,
> TRUTH_ORIF_LE_EXPR, TRUTH_ORIF_GT_EXPR, TRUTH_ORIF_GE_EXPR,
> TRUTH_ORIF_EQ_EXPR, TRUTH_ORIF_NE_EXPR.
> * tree.h: Add misc utils for conditional compare.
> * tree.c (tree_node_structure_for_code, tree_code_size,
> record_node_allocation_statistics, contains_placeholder_p,
> find_placeholder_in_expr, substitute_in_expr,
> substitute_placeholder_in_expr, stabilize_reference_1, build2_stat,
> simple_cst_equal): Handle conditional compare.
> (get_code_from_ccompare_expr): New added.
> (generate_ccompare_code ): New added.
> * c-family/c-pretty-print.c (pp_c_expression): Handle conditional
> compare.
> * cp/error.c (dump_expr): Likewise.
> * cp/semantics.c (cxx_eval_constant_expression): Likewise.
> (potential_constant_expression_1): Likewise.
> (cxx_eval_ccmp_expression): New added.
> * cfgexpand.c (expand_debug_expr): Handle conditional compare.
> * expr.c (safe_from_p, expand_expr_real_2): Likewise.
> (expand_ccmp_to_bitop): New added.
> * gimple-pretty-print.c (dump_ccomparison_rhs): New added.
> (dump_gimple_assign): Handle conditional compare.
> * print-tree.c (print_node): Likewise
> * tree-dump.c (dequeue_and_dump): Likewise.
> * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
> * gimple.c (recalculate_side_effects): Likewise.
> (get_gimple_rhs_num_ops): Likewise
> * gimplify.c (goa_stabilize_expr, gimplify_expr, gimple_boolify):
> Likewise.
> * tree-inline.c (estimate_operator_cost): Likewise.
> * tree-ssa-operands.c (get_expr_operands): Likewise.
> * tree-ssa-loop-niter.c (get_val_for): Likewise.
> * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> (verify_gimple_comparison_operands): New added.
> (verify_gimple_comparison): Call verify_gimple_comparison_operands.
> * fold-const.c (fold_truth_andor): Generate conditinal compare.
> * lra-constraints.c (remove_inheritance_pseudos):
> Initialize variable "set" to NULL_RTX.


Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy

2013-08-27 Thread Richard Biener
On Fri, Aug 16, 2013 at 2:57 AM, David Malcolm  wrote:
> This patch is the handwritten part of the conversion of these types
> to C++; it requires the followup patch, which is autogenerated.
>
> It converts:
>   struct GTY(()) symtab_node_base
> to:
>   class GTY((user)) symtab_node_base
>
> and converts:
>   struct GTY(()) cgraph_node
> to:
>   struct GTY((user)) cgraph_node : public symtab_node_base
>
> and:
>   struct GTY(()) varpool_node
> to:
>   class GTY((user)) varpool_node : public symtab_node_base
>
> dropping the symtab_node_def union.
>
> Since gengtype is unable to cope with inheritance, we have to mark the
> types with GTY((user)), and handcode the gty field-visiting functions.
> Given the simple hierarchy, we don't need virtual functions for this.
>
> Unfortunately doing so runs into various bugs in gengtype's handling
> of GTY((user)), so the patch also includes workarounds for these bugs.
>
> gengtype walks the graph of the *types* in the code, and produces
> functions in gtype-desc.[ch] for all types that are reachable from a
> GTY root.
>
> However, it ignores the contents of GTY((user)) types when walking
> this graph.
>
> Hence if you have a subgraph of types that are only reachable
> via fields in GTY((user)) types, gengtype won't generate helper code
> for those types.
>
> Ideally there would be some way to mark a GTY((user)) type to say
> which types it references, to avoid having to mark these types as
> GTY((user)).
>
> For now, work around this issue by providing explicit roots for the
> missing types, via dummy variables (see the bottom of cgraph.c).
>
> gcc/
>
> * cgraph.c (gt_ggc_mx): New.
> (gt_pch_nx (symtab_node_base *)): New.
> (gt_pch_nx (symtab_node_base *, gt_pointer_operator, void *)): New.
> (dummy_call_site_hash): New.
> (dummy_ipa_ref_list): New.
> (dummy_cgraph_clone_info): New.
> (dummy_ipa_replace_map_pointer): New.
> (dummy_varpool_node_ptr): New.
>
> * cgraph.h (symtab_node_base): Convert to a class;
> add GTY((user)).
> (gt_ggc_mx): New.
> (gt_pch_nx (symtab_node_base *p)): New.
> (gt_pch_nx (symtab_node_base *p, gt_pointer_operator op,
> void *cookie)): New.
> (cgraph_node): Inherit from symtab_node; convert to GTY((user)).
> (varpool_node): Likewise.
> (symtab_node_def): Remove.
> (is_a_helper ::test (symtab_node_def *)): Convert to...
> (is_a_helper ::test (symtab_node_base *)): ...this.
> (is_a_helper ::test (symtab_node_def *)): Convert to...
> (is_a_helper ::test (symtab_node_base *)): ...this.
>
> * ipa-ref.h (symtab_node_def): Drop.
> (symtab_node): Change underlying type from symtab_node_def to
> symtab_node_base.
> (const_symtab_node): Likwise.
>
> * is-a.h: Update examples in comment.
>
> * symtab.c (symtab_hash): Change symtab_node_def to symtab_node_base.
> (assembler_name_hash): Likewise.
> ---
>  gcc/cgraph.c  | 218 
> ++
>  gcc/cgraph.h  |  48 ++---
>  gcc/ipa-ref.h |   6 +-
>  gcc/is-a.h|   6 +-
>  gcc/symtab.c  |   4 +-
>  5 files changed, 247 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index 50d13ab..5cb6a31 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -3035,4 +3035,222 @@ cgraph_get_body (struct cgraph_node *node)
>return true;
>  }
>
> +/* GTY((user)) hooks for symtab_node_base (and its subclasses).
> +   We could use virtual functions for this, but given the presence of the
> +   "type" field and the trivial size of the class hierarchy, switches are
> +   perhaps simpler and faster.  */
> +
> +void gt_ggc_mx (symtab_node_base *x)
> +{
> +  /* Hand-written equivalent of the chain_next/chain_prev machinery, to
> + avoid deep call-stacks.
> +
> + Locate the neighbors of x (within the linked-list) that haven't been
> + marked yet, so that x through xlimit give a range suitable for marking.
> + Note that x (on entry) itself has already been marked by the
> + gtype-desc.c code, so we first try its successor.
> +  */
> +  symtab_node_base * xlimit = x ? x->next : NULL;
> +  while (ggc_test_and_set_mark (xlimit))
> +   xlimit = xlimit->next;
> +  if (x != xlimit)
> +for (;;)
> +  {
> +symtab_node_base * const xprev = x->previous;
> +if (xprev == NULL) break;
> +x = xprev;
> +(void) ggc_test_and_set_mark (xprev);
> +  }
> +  while (x != xlimit)
> +{
> +  /* Code common to all symtab nodes. */
> +  gt_ggc_m_9tree_node (x->decl);
> +  gt_ggc_mx_symtab_node_base (x->next);
> +  gt_ggc_mx_symtab_node_base (x->previous);
> +  gt_ggc_mx_symtab_node_base (x->next_sharing_asm_name);
> +  gt_ggc_mx_symtab_node_base (x->previous_sharing_asm_name);
> +  gt_ggc_mx_symtab_node_base (x->same_comdat_group);
> +  gt_ggc_m_20vec_ipa_ref

Re: [PATCH][4.8 backport] Fix PR57735

2013-08-27 Thread Richard Biener
On Mon, Jul 22, 2013 at 6:27 PM, Kyrylo Tkachov  wrote:
> Hi all,
>
> The fix for PR57735 is in current trunk (for a different issue I think), just
> needs a backport to 4.8.
> It is r198462 by Richard Sandiford:
>
> 2013-04-30 Richard Sandiford 
>
> * explow.c (plus_constant): Pass "mode" to immed_double_int_const.
> Use gen_int_mode rather than GEN_INT.
>
> Ok to backport to the 4.8 branch?
>
> I've attached the testcase that exposed the ICE. I think the ChangeLog would
> look like this:
>
>
> 2013-07-22  Kyrylo Tkachov  
>
> PR target/57735
> Backport from mainline
> 2013-04-30  Richard Sandiford  
>
> * explow.c (plus_constant): Pass "mode" to
> immed_double_int_const.
> Use gen_int_mode rather than GEN_INT.
>
> 2013-07-22  Kyrylo Tkachov  
>
> PR target/57735
> * g++.dg/ext/pr57735.C: New test.

Ok.  Please also add the testcase to the trunk.

Thanks,
Richard.

>
>
> Thanks,
> Kyrill


Re: Fix PR middle-end/57370

2013-08-27 Thread Richard Biener
On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman  wrote:
> I have a new patch that supersedes this. The new patch also fixes PR
> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
> no test regression on x86_64/linux. Ok for trunk?
>
> 2013-07-31  Easwaran Raman  
>
> PR middle-end/57370
> * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
> of debug statements that cause inconsistent IR.

Missing ChangeLog entry for the insert_stmt_after hunk, which I do not like
at all.  The other hunks are ok, but we need to work harder to preserve
debug stmts - simply removing them all is not going to fly.

Richard.

>
> testsuite/ChangeLog:
> 2013-07-31  Easwaran Raman  
>
> PR middle-end/57370
> PR tree-optimization/57393
> PR tree-optimization/58011
> * gfortran.dg/reassoc_12.f90: New testcase.
> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>
>
> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  wrote:
>> Ping.
>>
>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  wrote:
>>> A newly generated statement in the build_and_add_sum function of
>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>> statement. In one instance, it was assigned the wrong uid (of an
>>> earlier phi statement) which messed up the IR and caused the test
>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>> Ok for trunk?
>>>
>>> Thanks,
>>> Easwaran
>>>
>>>
>>> 2013-06-27  Easwaran Raman  
>>>
>>> PR middle-end/57370
>>> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a 
>>> phi
>>> node for a non-phi gimple statement.
>>>
>>> testsuite/ChangeLog:
>>> 2013-06-27  Easwaran Raman  
>>>
>>> PR middle-end/57370
>>> * gfortran.dg/reassoc_12.f90: New testcase.
>>>
>>>
>>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
>>> ===
>>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>> @@ -0,0 +1,74 @@
>>> +! { dg-do compile }
>>> +! { dg-options "-O2 -ffast-math" }
>>> +! PR middle-end/57370
>>> +
>>> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
>>> +   grad_deriv,npoints, sx)
>>> +IMPLICIT REAL*8 (t)
>>> +INTEGER, PARAMETER :: dp=8
>>> +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
>>> +   e_ndrho_ndrho_rho
>>> +  DO ii=1,npoints
>>> +  IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
>>> +t1425 = t233 * t557
>>> +t1429 = beta * t225
>>> +t1622 = t327 * t1621
>>> +t1626 = t327 * t1625
>>> +t1632 = t327 * t1631
>>> +t1685 = t105 * t1684
>>> +t2057 = t1636 + t8 * (t2635 + t3288)
>>> +  END IF
>>> +  IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
>>> +t5469 = t5440 - t5443 - t5446 - t5449 - &
>>> +t5451 - t5454 - t5456 + t5459  - &
>>> +t5462 + t5466 - t5468
>>> +t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
>>> +t5489 = 0.16e2_dp * t1429 * t1658
>>> +t5531 = 0.160e2_dp * t112 * t1626
>>> +t5533 = 0.160e2_dp * t112 * t1632
>>> +t5537 = 0.160e2_dp * t112 * t1622
>>> +t5541 = t5472 - t5478 - t5523 + t5525 + &
>>> +t5531 + t5533 + t5535 + t5537 + &
>>> +t5540
>>> +t5565 = t112 * t1685
>>> +t5575 = t5545 - t5548 + t5551 + t5553 - &
>>> +t5558 + t5560 - t5562 + t5564 - &
>>> +0.80e1_dp * t5565 + t5568 + t5572 + &
>>> +t5574
>>> +t5611 = t5579 - t5585 + t5590 - t5595 + &
>>> +t5597 - t5602 + t5604 + t5607 + &
>>> +t5610
>>> +t5613 = t5469 + t5541 + t5575 + t5611
>>> +t6223 = t6189 - &
>>> +0.36e0_dp  * t83 * t84 * t5613 + &
>>> +t6222
>>> +t6227 = - t8 * (t5305 + t6223)
>>> +e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + &
>>> + t6227 * sx
>>> +t6352 = t5440 - t5443 - t5446 - t5449 - &
>>> +t5451 - t5454 + &
>>> +0.40e1_dp * t102  * t327 * t2057 * t557 - &
>>> +t5456 + t5459 - t5462 + t5466 - &
>>> +t5468
>>> +t6363 = t5480 - t5489 + &
>>> +0.96e2_dp  * t1054 * t640 * t3679
>>> +t6367 = t5472 - t5474 - t5478 - t5523 + &
>>> +t5525 + t5531 + t5533 + t5535 + &
>>> +t5537 - 0.20e1_dp * t102 * t105 * t6363 + &
>>> +t5540
>>> +t6370 = t5545 - t5548 + t5551 + t5553 

Re: [PATCH] Fix PR57521

2013-08-27 Thread Richard Biener
On Mon, 24 Jun 2013, Richard Biener wrote:

> 
> The following fixes a long-standing bug in tree if-conversion.
> The transform phase relies on being able to extract edge predicates
> by simply using the predicate under which its source block is
> executed.  That obviously isn't the correct one if the source
> block ends in a condition itself.  Thus the following patch
> verifies that each predecessor edge of blocks we will if-convert
> is non-critical.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

... to pick up where I left.

That didn't work out; we need to handle the case of only one edge being
non-critical so as not to regress.  I've dug into the history of where all the
strange existing tests come from - and they are all for bugs
exhibiting this very same problem, they just paper over it in various
interesting ways.

So the following removes all of them, re-doing the above fix in
a less conservative way.
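
For context, a minimal example (mine, not from the PR) of the transform whose
edge predicates this is about:

  /* If-conversion turns the guarded store into a conditional assignment,
     which requires knowing the exact predicate on each incoming edge -
     hence the new requirement that at least one such edge be non-critical.  */
  void
  f (int *a, const int *b, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      if (b[i] > 0)
        a[i] = b[i];  /* becomes: a[i] = b[i] > 0 ? b[i] : a[i]; */
  }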

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2013-08-27  Richard Biener  

PR tree-optimization/57521
* tree-if-conv.c (if_convertible_bb_p): Verify that at least
one edge is non-critical.
(find_phi_replacement_condition): Make sure to use a non-critical
edge.  Cleanup and remove old bug workarounds.
(bb_postdominates_preds): Remove.
(if_convertible_loop_p_1): Do not compute post-dominators.
(combine_blocks): Do not free post-dominators.
(main_tree_if_conversion): Likewise.
(pass_data_if_conversion): Add TODO_verify_ssa.

* gcc.dg/torture/pr57521.c: New testcase.

Index: gcc/tree-if-conv.c
===
*** gcc/tree-if-conv.c  (revision 202017)
--- gcc/tree-if-conv.c  (working copy)
*** if_convertible_stmt_p (gimple stmt, vec<
*** 797,816 
return true;
  }
  
- /* Return true when BB post-dominates all its predecessors.  */
- 
- static bool
- bb_postdominates_preds (basic_block bb)
- {
-   unsigned i;
- 
-   for (i = 0; i < EDGE_COUNT (bb->preds); i++)
- if (!dominated_by_p (CDI_POST_DOMINATORS, EDGE_PRED (bb, i)->src, bb))
-   return false;
- 
-   return true;
- }
- 
  /* Return true when BB is if-convertible.  This routine does not check
 basic block's statements and phis.
  
--- 797,802 
*** if_convertible_bb_p (struct loop *loop,
*** 868,877 
return false;
}
  
!   if (EDGE_COUNT (bb->preds) == 2
!   && bb != loop->header
!   && !bb_postdominates_preds (bb))
! return false;
  
return true;
  }
--- 854,876 
return false;
}
  
!   /* At least one incoming edge has to be non-critical as otherwise edge
!  predicates are not equal to basic-block predicates of the edge
!  source.  */
!   if (EDGE_COUNT (bb->preds) > 1
!   && bb != loop->header)
! {
!   bool found = false;
!   FOR_EACH_EDGE (e, ei, bb->preds)
!   if (EDGE_COUNT (e->src->succs) == 1)
! found = true;
!   if (!found)
!   {
! if (dump_file && (dump_flags & TDF_DETAILS))
!   fprintf (dump_file, "only critical predecessors\n");
! return false;
!   }
! }
  
return true;
  }
*** if_convertible_loop_p_1 (struct loop *lo
*** 1084,1090 
  return false;
  
calculate_dominance_info (CDI_DOMINATORS);
-   calculate_dominance_info (CDI_POST_DOMINATORS);
  
/* Allow statements that can be handled during if-conversion.  */
ifc_bbs = get_loop_body_in_if_conv_order (loop);
--- 1083,1088 
*** if_convertible_loop_p (struct loop *loop
*** 1220,1227 
 if-conversion.  */
  
  static basic_block
! find_phi_replacement_condition (struct loop *loop,
!   basic_block bb, tree *cond,
gimple_stmt_iterator *gsi)
  {
edge first_edge, second_edge;
--- 1218,1224 
 if-conversion.  */
  
  static basic_block
! find_phi_replacement_condition (basic_block bb, tree *cond,
gimple_stmt_iterator *gsi)
  {
edge first_edge, second_edge;
*** find_phi_replacement_condition (struct l
*** 1231,1264 
first_edge = EDGE_PRED (bb, 0);
second_edge = EDGE_PRED (bb, 1);
  
!   /* Use condition based on following criteria:
!  1)
!S1: x = !c ? a : b;
! 
!S2: x = c ? b : a;
! 
!S2 is preferred over S1. Make 'b' first_bb and use its condition.
! 
!  2) Do not make loop header first_bb.
! 
!  3)
!S1: x = !(c == d)? a : b;
! 
!S21: t1 = c == d;
!S22: x = t1 ? b : a;
! 
!S3: x = (c == d) ? b : a;
! 
!S3 is preferred over S1 and S2*, Make 'b' first_bb and use
!its condition.
! 
!  4) If  pred B is dominated by pred A then use pred B's condition.
! See PR23115.  */
! 
!   /* Select condition that is not TRUTH_NOT_EXPR.  */
tmp_cond = bb_predicate (first_

Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed

2013-08-27 Thread Richard Biener
On Sat, Aug 3, 2013 at 2:21 AM, Richard Henderson  wrote:
> On 08/02/2013 11:53 AM, David Malcolm wrote:
>> FWIW I had a go at avoiding templates by attempting to tell gengtype to
>> write out functions for all GTY((user)) types, regardless of whether it
>> thinks they're referenced, with this:
>> @@ -3697,7 +3697,8 @@ write_types (outf_p output_header, type_p structures, 
>> type_p param_structs,
>>/* At last we emit the functions code.  */
>>oprintf (output_header, "\n/* functions code */\n");
>>for (s = structures; s; s = s->next)
>> -if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO)
>> +if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO
>> +|| s->kind == TYPE_USER_STRUCT)
>>{
>> options_p opt;
>>
>> but I ran into various "incomplete structure" errors due to gengtype's
>> lack of a full C/C++ parser.
>>
>> Hence templates seem the sanest option.
>
> I was actually contemplating going the other direction -- have gengtype
> write out _less_, relying on the templates more.  I hadn't gone very far
> down that road yet...

That was the original idea, btw (well, the original idea was to get rid of
gengtype completely of course).

Btw, I don't like moving the pass pipeline and pass objects into GC space.
It doesn't make much sense, as pass execution is quite constrained, so
passes should not allocate objects that outlive the pass.

Richard.


Re: [PATCH] Fix illegal cast to rtx (*insn_gen_fn) (rtx, ...)

2013-08-27 Thread Stefan Kristiansson
On Tue, Aug 27, 2013 at 11:03:32AM +0200, Richard Biener wrote:
> On Wed, Jul 10, 2013 at 3:14 AM, Stefan Kristiansson
>  wrote:
> > The (static arg) generator functions are cast to a vararg
> > function pointer, making the assumption that the ABI for passing
> > the arguments will be the same as for static arguments.
> > This isn't a valid assumption on all architectures; varargs might for
> > example be passed on the stack, even if function argument
> > registers are still available.
> >
> > There exists a bugreport for the problem here:
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12081
> >
> > This patch is taken from the suggestion by Rask Ingemann Lambertsen
> > and updated to the current svn tip of trunk.
> >
> > gcc/Changelog:
> >
> > 2013-07-10  Stefan Kristiansson  
> >
> > PR target/12081
> > * recog.h (rtx (*insn_gen_fn) (rtx, ...)): Remove typedef.
> > (genfun): Replace var arg function pointer with static argument
> > function pointers.
> > * optabs.h (GEN_FCN): Replace single define with ones of the
> > form GEN_FCN?, where ? is the number of arguments.
> > * reload1.c: Use GEN_FCN? instead of GEN_FCN
> 
> Can't we use a template for all this to avoid the union and *[12345] mess?
> 

Oleg Endo proposed a cleaner approach using functors here:
http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01315.html
and AFAIK that (or something close to it) is already applied.
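
For reference, a hedged sketch of that functor idea (illustrative only, not
the committed recog.h code):

  struct rtx_def;
  typedef struct rtx_def *rtx;

  struct insn_gen_fn
  {
    typedef void (*stored_fn) (void);  /* generic storage type */
    typedef rtx (*f1) (rtx);
    typedef rtx (*f2) (rtx, rtx);

    insn_gen_fn (f1 f) : fn ((stored_fn) f) {}
    insn_gen_fn (f2 f) : fn ((stored_fn) f) {}

    /* Each overload casts back to the exact type the pointer was stored
       from, so calls go through the normal (non-vararg) ABI; the caller
       must of course supply the arity the generator actually has.  */
    rtx operator() (rtx a) const { return ((f1) fn) (a); }
    rtx operator() (rtx a, rtx b) const { return ((f2) fn) (a, b); }

    stored_fn fn;
  };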

Stefan


Re: [Patch, Fortran, OOP] PR 57305: ICE when calling SIZEOF on an unlimited polymorphic variable

2013-08-27 Thread Mikael Morin
On 26/08/2013 16:16, Janus Weil wrote:
 I'm slightly inclined to kindly invite the user to switch to
 STORAGE_SIZE+SIZE instead.  Any other opinion?
>>>
>>> Since the SIZEOF intrinsic has been around for some time in gfortran
>>> (before STORAGE_SIZE was available), I would say we should at least
>>> continue to support it for backward compatibility. And I guess we
>>> should also make it behave reasonably for all inputs. However, it
>>> might be worth to add a note in the documentation that STORAGE_SIZE
>>> and SIZE should be used instead in standard-conforming code.
>>>
>> I thought about it again, and what I'm actually in favor of is
>> diagnosing by default _all_ extensions having a standard replacement.
> 
> By 'diagnosing' you mean to give a warning? This can already be
> accomplished by compiling with "-std=f2008 -Wintrinsics-std", I guess
> (not by default, though).
> 
Yes, it would boil down to using that by default.

> Using only -std=f2008 currently results in errors like:
> 
> undefined reference to `sizeof_'
> 
... which is the best we can do, as sizeof could be a user-defined
external function.

Mikael


[committed] Add testcases from PR578{60,61,75,76,77}

2013-08-27 Thread Jakub Jelinek
Hi!

These issues were caused by the LRA optional reloads support that got
reverted later on, but the testcases look to be useful for the testsuite,
so I went ahead and checked them all in.

2013-08-27  Jakub Jelinek  

PR rtl-optimization/57860
PR rtl-optimization/57861
PR rtl-optimization/57875
PR rtl-optimization/57876
PR rtl-optimization/57877
* gcc.c-torture/execute/pr57860.c: New test.
* gcc.c-torture/execute/pr57861.c: New test.
* gcc.c-torture/execute/pr57875.c: New test.
* gcc.c-torture/execute/pr57876.c: New test.
* gcc.c-torture/execute/pr57877.c: New test.

--- gcc/testsuite/gcc.c-torture/execute/pr57860.c.jj	2013-08-27 11:16:33.262457070 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57860.c	2013-08-27 11:15:19.0 +0200
@@ -0,0 +1,25 @@
+/* PR rtl-optimization/57860 */
+
+extern void abort (void);
+int a, *b = &a, c, d, e, *f = &e, g, *h = &d, k[1] = { 1 };
+
+int
+foo (int p)
+{
+  for (;; g++)
+{
+  for (; c; c--);
+  *f = *h = p > ((0x1LL ^ a) & *b);
+  if (k[g])
+   return 0;
+}
+}
+
+int
+main ()
+{
+  foo (1);
+  if (d != 1)
+abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57861.c.jj	2013-08-27 11:12:46.882684096 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57861.c	2013-08-27 11:12:07.0 +0200
@@ -0,0 +1,33 @@
+/* PR rtl-optimization/57861 */
+
+extern void abort (void);
+short a = 1, f;
+int b, c, d, *g = &b, h, i, j;
+unsigned int e;
+
+static int
+foo (char p)
+{
+  int k;
+  for (c = 0; c < 2; c++)
+{
+  i = (j = 0) || p;
+  k = i * p;
+  if (e < k)
+   {
+ short *l = &f;
+ a = d && h;
+ *l = 0;
+   }
+}
+  return 0;
+}
+
+int
+main ()
+{
+  *g = foo (a);
+  if (a != 0)
+abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57875.c.jj	2013-08-27 11:06:09.256786405 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57875.c	2013-08-27 11:04:58.0 +0200
@@ -0,0 +1,21 @@
+/* PR rtl-optimization/57875 */
+
+extern void abort (void);
+int a[1], b, c, d, f, i;
+char e[1];
+
+int
+main ()
+{
+  for (; i < 1; i++)
+    if (!d)
+      {
+        if (!c)
+          f = 2;
+        e[0] &= f ^= 0;
+      }
+  b = a[e[0] >> 1 & 1];
+  if (b != 0)
+    abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57876.c.jj	2013-08-27 11:06:12.224775684 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57876.c	2013-08-27 11:04:20.0 +0200
@@ -0,0 +1,27 @@
+/* PR rtl-optimization/57876 */
+
+extern void abort (void);
+int a, b = 1, c, *d = &c, f, *g, h, j;
+static int e;
+
+int
+main ()
+{
+  int i;
+  for (i = 0; i < 2; i++)
+    {
+      long long k = b;
+      int l;
+      for (f = 0; f < 8; f++)
+        {
+          int *m = &e;
+          j = *d;
+          h = a * j - 1;
+          *m = (h == 0) < k;
+          g = &l;
+        }
+    }
+  if (e != 1)
+    abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57877.c.jj	2013-08-27 11:06:15.310759088 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57877.c	2013-08-27 11:03:56.0 +0200
@@ -0,0 +1,28 @@
+/* PR rtl-optimization/57877 */
+
+extern void abort (void);
+int a, b, *c = &b, e, f = 6, g, h;
+short d;
+
+static unsigned char
+foo (unsigned long long p1, int *p2)
+{
+  for (; g <= 0; g++)
+    {
+      short *i = &d;
+      int *j = &e;
+      h = *c;
+      *i = h;
+      *j = (*i == *p2) < p1;
+    }
+  return 0;
+}
+
+int
+main ()
+{
+  foo (f, &a);
+  if (e != 1)
+    abort ();
+  return 0;
+}

Jakub


Re: [AArch64] Rewrite the vdup_lane intrinsics in C

2013-08-27 Thread Marcus Shawcroft
On 9 August 2013 10:48, James Greenhalgh  wrote:

> ---
> gcc/
>
> 2013-08-09  James Greenhalgh  
>
> * config/aarch64/aarch64-simd-builtins.def
> (dup_lane_scalar): Remove.
> * config/aarch64/aarch64-simd.md
> (aarch64_simd_dup): Add 'w->w' alternative.
> (aarch64_dup_lane): Allow for VALL.
> (aarch64_dup_lane_scalar): Remove.
> (aarch64_dup_lane_): New.
> (aarch64_get_lane_signed): Add w->w alternative.
> (aarch64_get_lane_unsigned): Likewise.
> (aarch64_get_lane): Likewise.
> * config/aarch64/aarch64.c (aarch64_evpc_dup): New.
> (aarch64_expand_vec_perm_const_1): Use aarch64_evpc_dup.
> * config/aarch64/iterators.md (VSWAP_WIDTH): New.
> (VCON): Change container of V2SF.
> (vswap_width_name): Likewise.
> * config/aarch64/arm_neon.h
> (__aarch64_vdup_lane_any): New.
> (__aarch64_vdup_lane_<8,16,32,64>): Likewise.
> (vdup_n_<8,16,32,64>): Convert to C implementation.
> (vdup_lane_<8,16,32,64>): Likewise.
>
> gcc/testsuite/
>
> 2013-08-09  James Greenhalgh  
>
> * gcc.target/aarch64/scalar_intrinsics.c
> (vdup_lane<8,16,32,64>): Force values to SIMD registers.

OK
/Marcus
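
As an illustration of the conversion, one way a vdup_lane intrinsic can
be written in C on top of other intrinsics (a hedged sketch with a
hypothetical helper name; the actual arm_neon.h macros differ):

/* Duplicate lane __b of __a across all lanes, without inline asm,
   by composing vget_lane and vdup_n.  */
#include <arm_neon.h>

__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
vdup_lane_s32_sketch (int32x2_t __a, const int __b)
{
  return vdup_n_s32 (vget_lane_s32 (__a, __b));
}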


Re: SSA identifiers

2013-08-27 Thread Richard Biener
On Thu, Jun 27, 2013 at 8:52 PM, Andrew MacLeod  wrote:
> On 06/27/2013 02:39 PM, Jakub Jelinek wrote:
>>
>>
>> in tree-ssanames.c:release_ssa_name ():
>>
>> if (! SSA_NAME_IN_FREE_LIST (var))
>>  {
>>tree saved_ssa_name_var = SSA_NAME_VAR (var);
>>int saved_ssa_name_version = SSA_NAME_VERSION (var);
>>use_operand_p imm = &(SSA_NAME_IMM_USE_NODE (var));
>> <..>
>> /* Hopefully this can go away once we have the new incremental
>>   SSA updating code installed.  */
>> SET_SSA_NAME_VAR_OR_IDENTIFIER (var, saved_ssa_name_var);
>> I don't see a big issue with this, sure, you could
>> tree saved_ssa_name_identifier = saved_ssa_name_var ? saved_ssa_name_var :
>> SSA_NAME_IDENTIFIER (var);
>> and use that instead in SET_SSA_NAME_VAR_OR_IDENTIFIER.
>
>
> Yeah, I wasn't too concerned about this one; the out-of-ssa case looked like
> more of a possible issue.  Maybe neither is; they just popped out as
> inconsistent uses.

Restoring SSA_NAME_VAR_OR_IDENTIFIER is only for debugging.  Yes,
it probably should save SSA_NAME_VAR_OR_IDENTIFIER instead of SSA_NAME_VAR.

Richard.

> Andrew
>


Re: [PATCH] Improve jump threading using VRP information

2013-08-27 Thread Richard Biener
On Thu, Jun 27, 2013 at 6:29 AM, Jeff Law  wrote:
>
> Just something else I saw while analyzing dumps from an unrelated set of
> changes.
>
> It's relatively common to see sequences like this:
>
>   # parent_1 = PHI 
>   _11 = single_tree_10(D) != 0;
>   _12 = parent_1 == 0B;
>   _13 = _11 & _12;
>   if (_13 != 0)
> goto ;
>   else
> goto ;
>
> Where VRP can deduce that parent_1 has a nonzero value on one
> (or more) of the paths reaching this block (because those paths dereference
> parent_1).
>
> Obviously, when VRP knows parent_1 is nonzero, we know the current block
> will always transfer control to BB 7.
>
> Or something like this:
>
>  :
>   # prephitmp_49 = PHI <1(4), pretmp_48(5)>
>   prev_line.31_6 = prev_line;
>   _8 = prev_line.31_6 != line_7(D);
>   line_differs_9 = (unsigned char) _8;
>   _10 = prephitmp_49 | line_differs_9;
>   if (_10 != 0)
> goto ;
>   else
> goto ;
>
>
> We may not know the exact value of _10, but VRP can determine that when BB6
> is reached from BB4, BB6 will always transfer control to BB8.
>
>
> I never coded up the bits to utilize VRP information to simplify statements
> for threading except for COND_EXPRs.  It wasn't clear how often it would be
> useful, so I took the easy way out.  This picks up a hundred or so
> additional threading opportunities in my testfiles.  Not huge, but worth it
> IMHO.
>
> Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
>
> OK for trunk?

Ok if it still applies.

Thanks,
Richard.

>
>
>
> * tree-vrp.c (simplify_stmt_for_jump_threading): Try to
> simplify assignments too.  If the RHS collapses to a singleton
> range, then return the value for the range.
>
> * gcc.dg/tree-ssa/ssa-vrp-thread-1.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
> new file mode 100644
> index 000..9d9473e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
> +
> +
> +struct basic_block_def;
> +typedef struct basic_block_def *basic_block;
> +enum gimple_code
> +{
> +  LAST_AND_UNUSED_GIMPLE_CODE
> +};
> +struct omp_region
> +{
> +  struct omp_region *outer;
> +  basic_block cont;
> +};
> +void
> +build_omp_regions_1 (basic_block bb, struct omp_region *parent,
> +unsigned char single_tree, enum gimple_code code)
> +{
> +  if (code == 25)
> +parent = parent->outer;
> +  else if (code == 42)
> +parent->cont = bb;
> +  if (single_tree && !parent)
> +return;
> +  oof ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Threaded" 1 "vrp1" }  } */
> +/* { dg-final { cleanup-tree-dump "vrp1" } } */
> +
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index ec7ef8f..683a3db 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -9109,15 +9109,27 @@ static vec equiv_stack;
>  static tree
>  simplify_stmt_for_jump_threading (gimple stmt, gimple within_stmt)
>  {
> -  /* We only use VRP information to simplify conditionals.  This is
> - overly conservative, but it's unclear if doing more would be
> - worth the compile time cost.  */
> -  if (gimple_code (stmt) != GIMPLE_COND)
> -return NULL;
> +  if (gimple_code (stmt) == GIMPLE_COND)
> +return vrp_evaluate_conditional (gimple_cond_code (stmt),
> +gimple_cond_lhs (stmt),
> +gimple_cond_rhs (stmt), within_stmt);
> +
> +  if (gimple_code (stmt) == GIMPLE_ASSIGN)
> +{
> +  value_range_t new_vr = VR_INITIALIZER;
> +  tree lhs = gimple_assign_lhs (stmt);
> +
> +  if (TREE_CODE (lhs) == SSA_NAME
> + && (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> + || POINTER_TYPE_P (TREE_TYPE (lhs))))
> +   {
> + extract_range_from_assignment (&new_vr, stmt);
> + if (range_int_cst_singleton_p (&new_vr))
> +   return new_vr.min;
> +   }
> +}
>
> -  return vrp_evaluate_conditional (gimple_cond_code (stmt),
> -  gimple_cond_lhs (stmt),
> -  gimple_cond_rhs (stmt), within_stmt);
> +  return NULL_TREE;
>  }
>
>  /* Blocks which have more than one predecessor and more than
>


Re: Document __builtin_isinf_sign more precisely

2013-08-27 Thread Richard Biener
On Wed, Jun 26, 2013 at 4:16 PM, Marc Glisse  wrote:
> Ping http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00475.html

Ok.

Thanks,
Richard.

>
> On Sun, 9 Jun 2013, Marc Glisse wrote:
>
>> Hello,
>>
>> this patch documents that __builtin_isinf_sign returns +-1 for +-Inf. This
>> builtin was created so it could be used by a libc that has a stronger
>> guarantee than the standard, and for glibc that means returning +-1, which
>> is what the code already does.
>>
>> 2013-06-10  Marc Glisse  
>>
>> PR middle-end/57219
>> * doc/extend.texi (__builtin_isinf_sign): Restrict the return
>> values to -1, 0 and 1.
>
>
> --
> Marc Glisse
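
For reference, a minimal usage sketch of the documented guarantee (an
illustration, not part of the patch):

/* Prints "1 -1 0": +-1 for +-Inf, 0 for anything else.  */
#include <stdio.h>

int
main (void)
{
  double pinf = __builtin_inf ();
  printf ("%d %d %d\n",
          __builtin_isinf_sign (pinf),    /* +Inf -> 1 */
          __builtin_isinf_sign (-pinf),   /* -Inf -> -1 */
          __builtin_isinf_sign (0.0));    /* finite -> 0 */
  return 0;
}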


Re: [patch] Fix error recovery issue with alias

2013-08-27 Thread Richard Biener
On Tue, Jun 25, 2013 at 9:23 AM, Eric Botcazou  wrote:
> Hi,
>
> this fixes a segfault on a malformed alias declaration, which is correctly
> flagged as an error by handle_alias_pairs:
>
> error: 'Linker_Alias.Var' aliased to undefined symbol 'var2'
>
> but is nevertheless later processed as a valid alias by the cgraph machinery.
>
> Bootstrapped/regtested on x86_64-suse-linux, OK for the mainline?

Ok if it still applies.

Thanks,
Richard.

>
> 2013-06-25  Eric Botcazou  
>
> * cgraphunit.c (handle_alias_pairs): Reset the alias flag after the
> error message is issued for an alias to an undefined symbol.
>
>
> 2013-06-25  Eric Botcazou  
>
> * gnat.dg/specs/linker_alias.ads: New test.
>
>
> --
> Eric Botcazou
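
For illustration, a C analogue of the malformed-alias scenario (a
hypothetical example, not the Ada testcase from the patch):

/* "var2" is declared but never defined in this translation unit, so
   handle_alias_pairs flags the alias as an error; the fix makes sure
   the cgraph machinery does not keep treating it as a valid alias.  */
extern int var2;
int var __attribute__ ((alias ("var2")));  /* error: 'var' aliased to
                                              undefined symbol 'var2' */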


[PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)

2013-08-27 Thread Christian Widmer
Hello!

As reported in [1] the host processor detection has not yet been updated
to recognize Intel Ivy Bridge and Haswell processors.
This small patch adds detection of these processors and assumes
core-avx2 as the -march value for unknown PENTIUMPRO-family processors
that support AVX2.


Best regards

Christian Widmer


[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57927
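
For background, a small standalone sketch (hypothetical code, not the
driver itself) showing where the family/model numbers used below come
from:

/* Extract the CPU family and model roughly the way the driver does:
   base family from bits 8-11 of CPUID leaf 1, model from bits 4-7
   extended by bits 16-19.  Per the patch, family 6 model 0x3a is Ivy
   Bridge (corei7-avx) and model 0x3c is Haswell (core-avx2).  */
#include <cpuid.h>
#include <stdio.h>

int
main (void)
{
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 1;
  unsigned int family = (eax >> 8) & 0x0f;
  unsigned int model = ((eax >> 4) & 0x0f) | ((eax >> 12) & 0xf0);
  printf ("family 0x%x, model 0x%x\n", family, model);
  return 0;
}
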
2013-08-27  Christian Widmer  

PR target/57927
* config/i386/driver-i386.c (host_detect_local_cpu): Add detection of
Ivy Bridge and Haswell processors and assume core-avx2 for unknown
AVX2-capable processors.

Index: gcc/config/i386/driver-i386.c
===
*** gcc/config/i386/driver-i386.c	(revision 202013)
--- gcc/config/i386/driver-i386.c	(working copy)
*** const char *host_detect_local_cpu (int a
*** 662,667 
--- 662,675 
  	  /* Sandy Bridge.  */
  	  cpu = "corei7-avx";
  	  break;
+ 	case 0x3a:
+ 	  /* Ivy Bridge.  */
+ 	  cpu = "corei7-avx";
+ 	  break;
+ 	case 0x3c:
+ 	  /* Haswell.  */
+ 	  cpu = "core-avx2";
+ 	  break;
  	case 0x17:
  	case 0x1d:
  	  /* Penryn.  */
*** const char *host_detect_local_cpu (int a
*** 675,682 
  	  if (arch)
  	{
  	  /* This is unknown family 0x6 CPU.  */
! 	  if (has_avx)
! 		/* Assume Sandy Bridge.  */
  		cpu = "corei7-avx";
  	  else if (has_sse4_2)
  		{
--- 683,693 
  	  if (arch)
  	{
  	  /* This is unknown family 0x6 CPU.  */
! 	  if (has_avx2)
! 		/* Assume Haswell.  */
! 		cpu = "core-avx2";
! 	  else if (has_avx)
! 		/* Assume Sandy/Ivy Bridge.  */
  		cpu = "corei7-avx";
  	  else if (has_sse4_2)
  		{



Re: [RFC Patch, Aarch64] : Macros for profile code generation to enable gprof support

2013-08-27 Thread Marcus Shawcroft
Hi Venkat,

On 3 August 2013 19:01, Venkataramanan Kumar wrote:

> This patch adds macros to support gprof on AArch64. The difference
> from the previous patch is that the compiler, while generating the
> call to the "mcount" routine for an instrumented function, also passes
> the return address as an argument.
>
> The "mcount" routine in glibc will be modified as follows.
>
> (-Snip-)
>  #define MCOUNT \
> -void __mcount (void)                                                 \
> +void __mcount (void* frompc)                                         \
>  {                                                                    \
> -  mcount_internal ((u_long) RETURN_ADDRESS (1), (u_long) RETURN_ADDRESS (0)); \
> +  mcount_internal ((u_long) frompc, (u_long) RETURN_ADDRESS (0));    \
>  }
> (-Snip-)


> If this is Ok I will send the patch to glibc as well.

> 2013-08-02  Venkataramanan Kumar  
>
> * config/aarch64/aarch64.h (MCOUNT_NAME): Define.
> (NO_PROFILE_COUNTERS): Likewise.
> (PROFILE_HOOK): Likewise.
> (FUNCTION_PROFILER): Likewise.
> * config/aarch64/aarch64.c (aarch64_function_profiler): Remove.
>
> regards,
> Venkat.

+  emit_library_call (fun, LCT_NORMAL, VOIDmode, 1,lr,Pmode); \
+}

GNU coding style requires spaces after the commas, but otherwise I
have no further comments on this patch. Post the glibc patch please.
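
That is, the quoted call would become:

  emit_library_call (fun, LCT_NORMAL, VOIDmode, 1, lr, Pmode); \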

Thanks
/Marcus


Re: [Patch,AArch64] Support SISD Shifts (SHL/USHR/SSHL/USHL/SSHR)

2013-08-27 Thread Marcus Shawcroft
On 20 August 2013 16:04, Vidya Praveen  wrote:

>> 2013-08-20  Vidya Praveen  
>>
>> * config/aarch64/aarch64.md (unspec): Add UNSPEC_SISD_SSHL,
>> UNSPEC_SISD_USHL, UNSPEC_USHL_2S, UNSPEC_SSHL_2S, UNSPEC_SISD_NEG.
>> (3_insn): Remove.
>> (aarch64_ashl_sisd_or_int_3): New Pattern.
>> (aarch64_lshr_sisd_or_int_3): Likewise.
>> (aarch64_ashr_sisd_or_int_3): Likewise.
>> (define_split for aarch64_lshr_sisd_or_int_di3): Likewise.
>> (define_split for aarch64_lshr_sisd_or_int_si3): Likewise.
>> (define_split for aarch64_ashr_sisd_or_int_di3): Likewise.
>> (define_split for aarch64_ashr_sisd_or_int_si3): Likewise.
>> (aarch64_sisd_ushl, aarch64_sisd_sshl): Likewise.
>> (aarch64_ushl_2s, aarch64_sshl_2s, aarch64_sisd_neg_qi): Likewise.
>> (ror3_insn): Likewise.
>> * config/aarch64/predicates.md (aarch64_simd_register): New.
>>
>> gcc/testsuite/ChangeLog
>>
>> 2013-08-20  Vidya Praveen  
>>
>> * gcc.target/aarch64/scalar_shift_1.c: New.
>>


OK
/Marcus


Re: [Patch, AArch64] Optimized implementation of vget_low_* in arm_neon.h.

2013-08-27 Thread Marcus Shawcroft
On 20 August 2013 12:21, Tejas Belagod  wrote:
> Hi,
>
> This patch replaces all inline asm implementations of vget_low_* in
> arm_neon.h with optimized implementations using other neon intrinsics.
>
> Tested with aarch64-none-elf.
>
> OK?

This is OK. /Marcus
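
For readers without the patch at hand, one plausible shape for such an
implementation (a sketch using vcreate/vgetq_lane/vreinterpret; the
committed arm_neon.h code may differ):

/* Move the low 64 bits of a 128-bit vector into a 64-bit vector
   without inline asm, going through a uint64 reinterpretation.  */
#include <arm_neon.h>

__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
vget_low_s32_sketch (int32x4_t __a)
{
  uint64x2_t __tmp = vreinterpretq_u64_s32 (__a);
  uint64x1_t __lo = vcreate_u64 (vgetq_lane_u64 (__tmp, 0));
  return vreinterpret_s32_u64 (__lo);
}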