Re: [PATCH] asan unit tests from llvm lit-test incremental changes

2012-12-12 Thread Konstantin Serebryany
On Thu, Dec 13, 2012 at 1:30 AM, Jakub Jelinek  wrote:
> On Wed, Dec 12, 2012 at 10:16:49PM +0100, Dodji Seketeli wrote:
>> Independently of this review, I think it's be interesting to hear
>> Kostya's voice on:
>>
>> Jakub Jelinek  writes:
>>
>> > 2) In large-func-test-1.C, I had to stop matching the backtrace after
>> > _Znw[jm], because libasan is using the fast but inaccurate backtrace,
>> > and while the tests can be easily tweaked to compile with
>> > -fno-omit-frame-pointer, we definitely can't rely on libstdc++.so to be
>> > built with that option.  Most likely it isn't.

The tests should be built with -fno-omit-frame-pointer and we don't
need that for libstdc++.so.
This is how it works in LLVM.
asan's interceptors are written in such a way that they don't care
if libstdc++.so or the asan run-time have frame pointers.

>>> I repeat that I think
>> > that at least for Linux libasan should use the _Unwind* based backtrace
>> > at least for the fatal functions (__asan_report* etc.),

We are discussing it from time to time.
Sometimes, if e.g. an error happens inside a qsort callback,
the fp-based unwinder fails to unwind through libc, while _Unwind would work.

I was opposed to this sometime ago because _Unwind often produced
buggy stack traces on Ubuntu Lucid (the version we cared about).
I also vaguely remember some problems with _Unwind* depending on
malloc (or maybe that's something else?)
Now we mostly care about Ubuntu Precise and we need to test whether
_Unwind produces good enough results there before switching.

unwinding on malloc/free should keep using the fp-based unwinder, at
least by default.

We'll be tracking the issue in
https://code.google.com/p/address-sanitizer/issues/detail?id=137

>>  and perhaps for
>> > these malloc wrappers like ::operator new, ::operator new[] and their
>> > const std::nothrow_t& variants libasan could intercept them, call
>> > malloc and if that returns NULL, call the original corresponding function
>> > so that it deals with exceptions, new handler etc.

Hmm.. Why's that?
Calling libc's malloc or libstdc++'s operator new in asan run-time is
really a bad idea.
asan's allocator should never return 0 anyway, it should simply crash.
I don't think we want to support new handler at all.

>
> Yeah, I'd appreciate that too.





>
>> and on:
>>
>> > 3) deep-thread-stack-1.C fails for me right now with some libasan 
>> > assertion,
>> > Kostya, can you please look at that?
>> >   AsanThread *t = asanThreadRegistry().GetCurrent();
>> >   CHECK(t);
>> > where it failed on the CHECK, because t was NULL.  I've skipped the test 
>> > for
>> > now.
>>
>> [...]
>
> This one is for the testcase solved right now already by the -lasan -lpthread
> linking instead of just -lpthread (and driver adding -lasan afterwards).
> We'll need to think about how to tweak the driver to add -lasan early on the
> command line, before user passed -l* options.
>>
>> > --- gcc/testsuite/g++.dg/asan/deep-tail-call-1.C.jj 2012-12-04 
>> > 20:24:10.0 +0100
>> > +++ gcc/testsuite/g++.dg/asan/deep-tail-call-1.C2012-12-05 
>> > 11:01:48.600443834 +0100
>> > @@ -1,21 +1,22 @@
>> > -// { dg-do run }
>> > +// { dg-do run }
>> >  // { dg-options "-fno-omit-frame-pointer -fno-optimize-sibling-calls" }
>> >  // { dg-additional-options "-mno-omit-leaf-frame-pointer" { target { 
>> > i?86-*-* x86_64-*-* } } }
>> > -// { dg-shouldfail "asan" }
>> > +// { dg-shouldfail "asan" }
>> >
>> >  int global[10];
>> >  void __attribute__((noinline)) call4(int i) { global[i+10]++; }
>> >  void __attribute__((noinline)) call3(int i) { call4(i); }
>> >  void __attribute__((noinline)) call2(int i) { call3(i); }
>> >  void __attribute__((noinline)) call1(int i) { call2(i); }
>> > -int main(int argc, char **argv) {
>> > -  call1(argc);
>> > +volatile int one = 1;
>>
>> Just curious, why do we need this variable to be volatile, especially
>> since the test is compiled without optimization?
>
> asan.exp tests are torture tests, they iterate over several -O* options,
> unless explicitly dg-skip-if skipped.  It could be non-volatile with
> asm volatile ("" : : : "memory");
> or asm volatile ("" : "+m" (one)); or similar too, sure.
> I just don't want to rely on argc being one, and the compiler shouldn't know
> that one is 1 in the test.
>
>> [...]
>>
>> The patch looks OK to me in any case.
>
> Thanks.
>
> Jakub


--kcc


Re: PATCH: Remove AM_MAKEFLAGS from libsanitizer

2012-12-12 Thread Paolo Bonzini
Il 13/12/2012 00:23, H.J. Lu ha scritto:
> On Wed, Dec 12, 2012 at 3:01 PM, H.J. Lu  wrote:
>> On Wed, Dec 12, 2012 at 2:13 PM, Paolo Bonzini  wrote:
>>> Il 12/12/2012 19:11, H.J. Lu ha scritto:

 in RAW_CXX_TARGET_EXPORTS.  There is no need to do anything.
>>
>> Nope, if you remove this you get the wrong definition of CC and CXX from
>> EXTRA_TARGET_FLAGS.  Instead, you need to add RAW_CXX_FOR_TARGET to
>> AM_MAKEFLAGS and EXTRA_TARGET_FLAGS.
>>
 We have set CXX and CXX_FOR_TARGET before EXTRA_TARGET_FLAGS
 is used:

 RAW_CXX_TARGET_EXPORTS = \
 $(BASE_TARGET_EXPORTS) \
 CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
 CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
 export CXX;

 Shouldn't we the right CXX and CXX_FOR_TARGET?
>>>
>>> The purpose of EXTRA_TARGET_FLAGS is to forward any *_FOR_TARGET
>>> variable passed on the command line to the toplevel Makefile.
>>>
>>
>> We don't pass down RAW_CXX_FOR_TARGET from toplevel
>> Makefile.  If one wants to change RAW_CXX_FOR_TARGET
>> from command line, he/she should do
>>
>> # make RAW_CXX_FOR_TARGET=xxx
>>
>> and RAW_CXX_TARGET_EXPORTS should handle it properly.

NORMAL_TARGET_EXPORTS and RAW_CXX_TARGET_EXPORTS is (mostly) for
configure time.  Makefiles do not use the environment variables, so you
need EXTRA_TARGET_FLAGS (and more generally the args argument to the
"all" macro) instead.

> 
> Also there are:
> 
> # --
> # Special directives to GNU Make
> # --
> 
> # Don't pass command-line variables to submakes.
> .NOEXPORT:
> MAKEOVERRIDES=
> 
> in toplevel Makefile.  How does
> 
> # make *_FOR_TARGET=
> 
> work at toplevel?

That's what I'm trying to say: the purpose of EXTRA_TARGET_FLAGS is to
forward any *_FOR_TARGET>> variable passed on the command line to the
toplevel Makefile.

Paolo


Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
Try the following one. 1) -minline-all-stringops
-mstringop-strategy=rep_8byte -O2 vs 1) -mstringop_strategy=libcall
-O2.

David


#include 
#include 
#include 
#ifndef LEN
#define LEN 16
#endif

void copy(char* s1, char* s2,int len) __attribute__((noinline));
void copy(char* s1, char* s2,int len)
{
   memcpy(s2,s1,len);
}


int main() {

  char* s1 = (char*) malloc(LEN  +10);
  char* s2 = (char*) malloc(LEN  +10);
  int i = 0;
  for (i =  0; i < 10; i++)
  {
copy(s1+1,s2+3,LEN);
  }
}

On Wed, Dec 12, 2012 at 10:21 PM, Jakub Jelinek  wrote:
> On Wed, Dec 12, 2012 at 10:09:14PM -0800, Xinliang David Li wrote:
>> On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka  wrote:
>> >> > libcall is not faster up to 8KB to rep sequence that is better for 
>> >> > regalloc/code
>> >> > cache than fully blowin function call.
>> >>
>> >> Be careful with this. My recollection is that REP sequence is good for
>> >> any size -- for smaller size, the REP initial set up cost is too high
>> >> (10s of cycles), while for large size copy, it is less efficient
>> >> compared with library version.
>> >
>> > Well this is based on the data from the memtest script.
>> > Core has good REP implementation - it is a win from rather small blocks (16
>> > bytes if I recall) and it does not need alignment.
>> > Library version starts to be interesting with caching hints, but I think 
>> > till 80KB
>> > it is still not a win for my setup (glibc-2.15)
>>
>> A simple test shows that -mstringop-strategy=libcall always beats
>> -mstringop-strategy=rep_8byte (on core2 and corei7) except for size
>> smaller than 8 where the rep_8byte strategy simply bypasses REP movs.
>> Can you share your memtest ?
>
> I can't believe that say 16 byte or 32 byte memcpy can be ever faster using a
> libcall.  The PLT call overhead is simply too high.
>
> Jakub


Re: [PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)

2012-12-12 Thread Dmitry Vyukov
On Thu, Dec 13, 2012 at 1:58 AM, Richard Henderson  wrote:
> On 12/12/2012 11:50 AM, Jakub Jelinek wrote:
>> 2012-12-12  Jakub Jelinek  
>>
>>   PR sanitizer/55508
>>   * builtin-attrs.def (ATTR_TMPURE_NOTHROW_LEAF_LIST,
>>   ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): New.
>>   * asan.c (ATTR_TMPURE_NOTHROW_LEAF_LIST,
>>   ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): Define.
>>   * sanitizer.def: Make __asan_report_* and __asan_handle_no_return
>>   builtins tm pure.
>
> Ok.
>
> Agreed about we need another solution for tsan + tm.


What type of bugs do you expect tsan catch in transactional setting?
Are we talking about data races between transactional and
non-transactional code?

Does atomic transactions permitted by gcc? I mean can I use mutexes
and atomics inside of transactions to synchronize with
non-transactional code?


Re: [PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)

2012-12-12 Thread Dmitry Vyukov
On Wed, Dec 12, 2012 at 11:50 PM, Jakub Jelinek  wrote:
> Hi!
>
> Various TM tests ICE when built with -fgnu-tm -fsanitizer=address.
> The problem is that asan.c pass adds calls to builtins that weren't there
> before and TM is upset about it.  The __asan_report* are all like
> abort, in correctly written program they shouldn't have a user visible
> effect, in bad program they will terminate the process, but in any case
> it doesn't matter how many times they are retried as part of a transaction,
> there is no state to roll back on transaction cancellation.
> __asan_handle_no_return, while not being noreturn, just marks the stack as
> unprotected, so again in correctly written application no effect, in bad app
> might result in some issues being undetected, but still, it can be done many
> times and isn't irreversible.

Hi!

I was only loosely following tm-languages discussions. Does gcc tm
model guarantees strong consistency for all memory accesses? I mean
there are tm implementations that allow transient inconsistencies,
than are detected later and trx is restarted. Can't asan trigger false
positives in this case?
Also, what is the order of instrumentation in tm+asan setting? I mean
that neither tm must instrument asan instrumentation, nor asan must
instrument tm instrumentation. Is it the case? There also can be
conflicts related to ordering of instrumentation in the following
case:
asan_check();
speculative_load();
tm_check();
In this case tm hopes to detect inconsistent speculative load
afterwards, but asan will crash the program before tm has a chance to
abort and retry (it is related to the first point).

Sorry, I don't know how gcc tm works. But I just suspect that it can
be very tricky.


> The following patch fixes the ICEs by making all of these transaction pure.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> As for TSAN, no idea what to do, TSAN diagnostics could be in a similar
> category, there is nothing to reverse, but as the library doesn't understand
> transactions, perhaps better would be not to tsan instrument anything inside
> of transactions, until the library is made TM aware.
>
> 2012-12-12  Jakub Jelinek  
>
> PR sanitizer/55508
> * builtin-attrs.def (ATTR_TMPURE_NOTHROW_LEAF_LIST,
> ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): New.
> * asan.c (ATTR_TMPURE_NOTHROW_LEAF_LIST,
> ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): Define.
> * sanitizer.def: Make __asan_report_* and __asan_handle_no_return
> builtins tm pure.
>
> --- gcc/builtin-attrs.def.jj2012-10-22 08:42:23.0 +0200
> +++ gcc/builtin-attrs.def   2012-12-12 11:56:54.942938879 +0100
> @@ -263,6 +263,11 @@ DEF_ATTR_TREE_LIST (ATTR_TMPURE_MALLOC_N
>  DEF_ATTR_TREE_LIST (ATTR_TMPURE_NOTHROW_LIST,
>ATTR_TM_TMPURE, ATTR_NULL, ATTR_NOTHROW_LIST)
>
> +DEF_ATTR_TREE_LIST (ATTR_TMPURE_NOTHROW_LEAF_LIST,
> +   ATTR_TM_TMPURE, ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
> +DEF_ATTR_TREE_LIST (ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST,
> +   ATTR_TM_TMPURE, ATTR_NULL, 
> ATTR_NORETURN_NOTHROW_LEAF_LIST)
> +
>  /* Construct a tree for a format_arg attribute.  */
>  #define DEF_FORMAT_ARG_ATTRIBUTE(FA)   \
>DEF_ATTR_TREE_LIST (ATTR_FORMAT_ARG_##FA, ATTR_FORMAT_ARG,   \
> --- gcc/asan.c.jj   2012-12-11 13:05:36.0 +0100
> +++ gcc/asan.c  2012-12-12 11:57:59.626550534 +0100
> @@ -1611,8 +1611,13 @@ initialize_sanitizer_builtins (void)
>  #define BT_FN_VOID_VPTR_I16_INT BT_FN_VOID_VPTR_IX_INT[4]
>  #undef ATTR_NOTHROW_LEAF_LIST
>  #define ATTR_NOTHROW_LEAF_LIST ECF_NOTHROW | ECF_LEAF
> +#undef ATTR_TMPURE_NOTHROW_LEAF_LIST
> +#define ATTR_TMPURE_NOTHROW_LEAF_LIST ECF_TM_PURE | ATTR_NOTHROW_LEAF_LIST
>  #undef ATTR_NORETURN_NOTHROW_LEAF_LIST
>  #define ATTR_NORETURN_NOTHROW_LEAF_LIST ECF_NORETURN | ATTR_NOTHROW_LEAF_LIST
> +#undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST
> +#define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \
> +  ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST
>  #undef DEF_SANITIZER_BUILTIN
>  #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM,  \
> --- gcc/sanitizer.def.jj2012-12-11 11:28:10.0 +0100
> +++ gcc/sanitizer.def   2012-12-12 11:57:12.714833945 +0100
> @@ -32,25 +32,25 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_INIT
>  /* Do not reorder the BUILT_IN_ASAN_REPORT* builtins, e.g. cfgcleanup.c
> relies on this order.  */
>  DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD1, "__asan_report_load1",
> - BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
> + BT_FN_VOID_PTR, ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST)
>  DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD2, "__asan_report_load2",
> - BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
> + BT_FN_VOID_PTR, ATTR_TMPURE_NORET

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Jakub Jelinek
On Wed, Dec 12, 2012 at 10:09:14PM -0800, Xinliang David Li wrote:
> On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka  wrote:
> >> > libcall is not faster up to 8KB to rep sequence that is better for 
> >> > regalloc/code
> >> > cache than fully blowin function call.
> >>
> >> Be careful with this. My recollection is that REP sequence is good for
> >> any size -- for smaller size, the REP initial set up cost is too high
> >> (10s of cycles), while for large size copy, it is less efficient
> >> compared with library version.
> >
> > Well this is based on the data from the memtest script.
> > Core has good REP implementation - it is a win from rather small blocks (16
> > bytes if I recall) and it does not need alignment.
> > Library version starts to be interesting with caching hints, but I think 
> > till 80KB
> > it is still not a win for my setup (glibc-2.15)
> 
> A simple test shows that -mstringop-strategy=libcall always beats
> -mstringop-strategy=rep_8byte (on core2 and corei7) except for size
> smaller than 8 where the rep_8byte strategy simply bypasses REP movs.
> Can you share your memtest ?

I can't believe that say 16 byte or 32 byte memcpy can be ever faster using a
libcall.  The PLT call overhead is simply too high.

Jakub


Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka  wrote:
>> > libcall is not faster up to 8KB to rep sequence that is better for 
>> > regalloc/code
>> > cache than fully blowin function call.
>>
>> Be careful with this. My recollection is that REP sequence is good for
>> any size -- for smaller size, the REP initial set up cost is too high
>> (10s of cycles), while for large size copy, it is less efficient
>> compared with library version.
>
> Well this is based on the data from the memtest script.
> Core has good REP implementation - it is a win from rather small blocks (16
> bytes if I recall) and it does not need alignment.
> Library version starts to be interesting with caching hints, but I think till 
> 80KB
> it is still not a win for my setup (glibc-2.15)

A simple test shows that -mstringop-strategy=libcall always beats
-mstringop-strategy=rep_8byte (on core2 and corei7) except for size
smaller than 8 where the rep_8byte strategy simply bypasses REP movs.
Can you share your memtest ?

thanks,

David

>> >> >
>> >> >/* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix 
>> >> > stall
>> >> > * on 16-bit immediate moves into memory on Core2 and Corei7.  */
>> >> > @@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
>> >> >m_K6,
>> >> >
>> >> >/* X86_TUNE_USE_CLTD */
>> >> > -  ~(m_PENT | m_ATOM | m_K6),
>> >> > +  ~(m_PENT | m_ATOM | m_K6 | m_GENERIC),
>>
>> My change was to enable CLTD for generic. Is your change intended to
>> revert that?
>
> No, it is merge conflict, sorry.  I will update it in my tree.
>> > Skipping inc/dec is to avoid partial flag stall happening on P4 only.
>> >> >
>>
>>
>> K8 and K10 partitions the flags into groups. References to flags to
>> the same group can still cause the stall -- not sure how that can be
>> handled.
>
> I  belive the stalls happends only in quite special cases where compare 
> instruction
> combines flags from multiple instructions.  GCC don't generate this type of 
> code, so
> we should be safe.
>
> Honza


[PATCH][Cilkplus] Check invalid gotos, increments and report errors

2012-12-12 Thread Iyer, Balaji V
Hello Everyone,
This patch is for Cilk Plus branch affecting mainly the C compiler. It 
checks to see if the gotos and increment expressions inside cilk_for are valid 
and report errors otherwise.

Thanks,

Balaji V. Iyer.
Index: gcc/c-family/c-cilk.c
===
--- gcc/c-family/c-cilk.c   (revision 194366)
+++ gcc/c-family/c-cilk.c   (working copy)
@@ -1292,8 +1292,9 @@
 them in code that spawns. */
   if ((TREE_CODE (arg) == VAR_DECL) && DECL_HARD_REGISTER (arg))
{
- error ("explicit register variable %qD may not be modified in spawn",
-arg);
+ error_at (EXPR_LOCATION (arg),
+   "explicit register variable %qD may not be modified in "
+   "spawn", arg);
  arg = null_pointer_node;
}
   else
@@ -1464,7 +1465,8 @@
   return copy_decl_no_change (decl, id);
 
 case LABEL_DECL:
-  error ("Invalid use of label %q+D in spawn", decl);
+  error_at (EXPR_LOCATION (decl),
+   "Invalid use of label %q+D in spawn", decl);
   return error_mark_node;
 
 case RESULT_DECL:
@@ -1544,7 +1546,8 @@
   if (! (flags & ECF_NOTHROW) && flag_exceptions)
 *throws = true;
   if (flags & ECF_RETURNS_TWICE)
-error ("Can not spawn call to function that returns twice");
+error_at (EXPR_LOCATION (t),
+ "Can not spawn call to function that returns twice");
   return 0;
 }
 
@@ -1761,6 +1764,7 @@
   tree cond = FOR_COND (loop);
   tree init = CILK_FOR_INIT (loop);
   tree incr = cilk_simplify_tree (FOR_EXPR (loop));
+  tree orig_incr = cilk_simplify_tree (FOR_EXPR (loop)); /* Copy for LOC.  */ 
   tree grain = CILK_FOR_GRAIN (loop);
   tree body = FOR_BODY (loop);
   
@@ -1937,9 +1941,12 @@
  exactly_one = incr_direction == -1;
}
   else
-   gcc_unreachable ();
+   {
+ error_at (EXPR_LOCATION (orig_incr),
+   "Invalid loop increment operation.");
+ cfd->invalid = true;
+   }
   break;
-
 default:
   gcc_unreachable ();
 }
@@ -2865,3 +2872,6 @@
 }
   return decl;
 }
+
+
+  
Index: gcc/c-family/ChangeLog.cilkplus
===
--- gcc/c-family/ChangeLog.cilkplus (revision 194366)
+++ gcc/c-family/ChangeLog.cilkplus (working copy)
@@ -1,3 +1,11 @@
+2012-12-12  Balaji V. Iyer  
+
+   * c-typeck.c (c_finish_cilk_loop) Added location for error reporting.
+   * c-cilk.c (wrapper_parm_cb): Likewise.
+   (copy_decl_for_cilk): Likewise.
+   (check_outlined_calls): Likewise.
+   (extract_for_fields): Reported an error instead of gcc_unreachable ().
+
 2012-12-10  Balaji V. Iyer  
 
* c-cpp-elem-function.c: New file.
Index: gcc/c/c-typeck.c
===
--- gcc/c/c-typeck.c(revision 194366)
+++ gcc/c/c-typeck.c(working copy)
@@ -10982,19 +10982,19 @@
 
   if (!cond)
 {
-  error ("cilk_for missing condition");
+  error ("_Cilk_for missing condition");
   return;
 }
   if (!incr)
 {
-  error ("cilk_for missing increment");
+  error ("_Cilk_for missing increment");
   return;
 }
 
   /* If the condition is zero don't generate a loop construct.  */
   if (TREE_CONSTANT (cond))
 {
-  error ("cilk_for has constant condition");
+  error_at (EXPR_LOCATION (cond), "_Cilk_for has constant condition");
   return;
 }
   if (!cvar)
@@ -11004,7 +11004,7 @@
 }
   if (clab)
 {
-  error ("cilk_for has continue");
+  error_at (EXPR_LOCATION (clab), "_Cilk_for has continue");
   return;
 }
 
Index: gcc/c/c-objc-common.h
===
--- gcc/c/c-objc-common.h   (revision 194366)
+++ gcc/c/c-objc-common.h   (working copy)
@@ -120,4 +120,7 @@
 
 #undef LANG_HOOKS_ELEM_FN_CREATE_FN
 #define LANG_HOOKS_ELEM_FN_CREATE_FN elem_fn_create_fn
+
+#undef LANG_HOOKS_CILK_CHECK_CTRL_FLOW
+#define LANG_HOOKS_CILK_CHECK_CTRL_FLOW cilk_check_ctrl_flow
 #endif /* GCC_C_OBJC_COMMON */
Index: 
gcc/testsuite/gcc.dg/cilk-plus/cilk_keywords_test/errors/cilk_for_incr_errors.c
===
--- 
gcc/testsuite/gcc.dg/cilk-plus/cilk_keywords_test/errors/cilk_for_incr_errors.c 
(revision 0)
+++ 
gcc/testsuite/gcc.dg/cilk-plus/cilk_keywords_test/errors/cilk_for_incr_errors.c 
(revision 0)
@@ -0,0 +1,22 @@
+#define ARRAY_SIZE 10
+
+int main(void)
+{
+  int ii = 0;
+  int array[ARRAY_SIZE];
+  
+  _Cilk_for (ii = 0; ii < 10; ii <<= 2) /* { dg-error "Invalid loop increment 
operation." } */
+array[ii] = 5;
+
+  _Cilk_for (ii = 0; ii < 10; ii *= 2) /* { dg-error "Invalid loop increment 
operation." } */
+array[ii] = 5;
+
+  _Cilk_for (ii = 0; ii < 10; ii /= 2) /* { dg-error "Invalid loop increment 
operation." } */
+

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Jan Hubicka
> > libcall is not faster up to 8KB to rep sequence that is better for 
> > regalloc/code
> > cache than fully blowin function call.
> 
> Be careful with this. My recollection is that REP sequence is good for
> any size -- for smaller size, the REP initial set up cost is too high
> (10s of cycles), while for large size copy, it is less efficient
> compared with library version.

Well this is based on the data from the memtest script.  
Core has good REP implementation - it is a win from rather small blocks (16
bytes if I recall) and it does not need alignment.
Library version starts to be interesting with caching hints, but I think till 
80KB
it is still not a win for my setup (glibc-2.15)
> >> >
> >> >/* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
> >> > * on 16-bit immediate moves into memory on Core2 and Corei7.  */
> >> > @@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
> >> >m_K6,
> >> >
> >> >/* X86_TUNE_USE_CLTD */
> >> > -  ~(m_PENT | m_ATOM | m_K6),
> >> > +  ~(m_PENT | m_ATOM | m_K6 | m_GENERIC),
> 
> My change was to enable CLTD for generic. Is your change intended to
> revert that?

No, it is merge conflict, sorry.  I will update it in my tree.
> > Skipping inc/dec is to avoid partial flag stall happening on P4 only.
> >> >
> 
> 
> K8 and K10 partitions the flags into groups. References to flags to
> the same group can still cause the stall -- not sure how that can be
> handled.

I  belive the stalls happends only in quite special cases where compare 
instruction
combines flags from multiple instructions.  GCC don't generate this type of 
code, so
we should be safe.

Honza


Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
On Wed, Dec 12, 2012 at 4:16 PM, Xinliang David Li  wrote:
> On Wed, Dec 12, 2012 at 10:30 AM, Jan Hubicka  wrote:
>> Concerning 1push per cycle, I think it is same as K7 hardware did, so move
>> prologue should be a win.
>>> > Index: config/i386/i386.c
>>> > ===
>>> > --- config/i386/i386.c  (revision 194452)
>>> > +++ config/i386/i386.c  (working copy)
>>> > @@ -1620,14 +1620,14 @@ struct processor_costs core_cost = {
>>> >COSTS_N_INSNS (8),   /* cost of FABS instruction.  */
>>> >COSTS_N_INSNS (8),   /* cost of FCHS instruction.  */
>>> >COSTS_N_INSNS (40),  /* cost of FSQRT instruction.  */
>>> > -  {{libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
>>> > -   {libcall, {{24, loop, true}, {128, rep_prefix_8_byte, true},
>>> > +  {{libcall, {{8192, rep_prefix_4_byte, true}, {-1, libcall, false}}},
>>> > +   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},
>>> >{-1, libcall, false,
>>> >{{libcall, {{6, loop_1_byte, true},
>>> >{24, loop, true},
>>> >{8192, rep_prefix_4_byte, true},
>>> >{-1, libcall, false}}},
>>> > -   {libcall, {{24, loop, true}, {512, rep_prefix_8_byte, true},
>>> > +   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},
>>
>> libcall is not faster up to 8KB to rep sequence that is better for 
>> regalloc/code
>> cache than fully blowin function call.
>
> Be careful with this. My recollection is that REP sequence is good for


s/good/not good/


David

> any size -- for smaller size, the REP initial set up cost is too high
> (10s of cycles), while for large size copy, it is less efficient
> compared with library version.
>
>
>>> > @@ -1806,7 +1806,7 @@ static unsigned int initial_ix86_tune_fe
>>> >m_PPRO,
>>> >
>>> >/* X86_TUNE_PARTIAL_FLAG_REG_STALL */
>>> > -  m_CORE2I7 | m_GENERIC,
>>> > +  m_GENERIC | m_CORE2,
>>
>> This disable shifts that store just some flags. Acroding to Agner's manual 
>> I7 handle
>> this well.
>>
>
> ok.
>
>> Partial flags stall
>> The Sandy Bridge uses the method of an extra Âľop to join partial registers 
>> not only for
>> general purpose registers but also for the flags register, unlike previous 
>> processors which
>> used this method only for general purpose registers. This occurs when a 
>> write to a part of
>> the flags register is followed by a read from a larger part of the flags 
>> register. The partial
>> flags stall of previous processors (See page 75) is therefore replaced by an 
>> extra Âľop. The
>> Sandy Bridge also generates an extra Âľop when reading the flags after a 
>> rotate instruction.
>>
>> This is cheaper than the 7 cycle delay on Core this flags is trying to avoid.
>
> ok.
>
>>> >
>>> >/* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
>>> > * on 16-bit immediate moves into memory on Core2 and Corei7.  */
>>> > @@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
>>> >m_K6,
>>> >
>>> >/* X86_TUNE_USE_CLTD */
>>> > -  ~(m_PENT | m_ATOM | m_K6),
>>> > +  ~(m_PENT | m_ATOM | m_K6 | m_GENERIC),
>
> My change was to enable CLTD for generic. Is your change intended to
> revert that?
>
>>
>> None of CPUs that generic care about are !USE_CLTD now after your change.
>>> > @@ -1910,10 +1910,10 @@ static unsigned int initial_ix86_tune_fe
>>> >m_ATHLON_K8,
>>> >
>>> >/* X86_TUNE_SSE_TYPELESS_STORES */
>>> > -  m_AMD_MULTIPLE,
>>> > +  m_AMD_MULTIPLE | m_CORE2I7, /**/
>>
>> Hmm, I can not seem to find this in manual now, but I believe that stores 
>> also do not type,
>> so movaps is preferred over movapd store because it is shorter.  If not, 
>> this change should
>> produce a lot of slowdowns.
>>> >
>>> >/* X86_TUNE_SSE_LOAD0_BY_PXOR */
>>> > -  m_PPRO | m_P4_NOCONA,
>>> > +  m_PPRO | m_P4_NOCONA | m_CORE2I7, /**/
>>
>> Agner:
>> A common way of setting a register to zero is XOR EAX,EAX or SUB EBX,EBX. The
>> Core2 and Nehalem processors recognize that certain instructions are 
>> independent of the
>> prior value of the register if the source and destination registers are the 
>> same.
>>
>> This applies to all of the following instructions: XOR, SUB, PXOR, XORPS, 
>> XORPD, and all
>> variants of PSUBxxx and PCMPxxx except PCMPEQQ.
>>> >
>>> >/* X86_TUNE_MEMORY_MISMATCH_STALL */
>>> >m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>>> > @@ -1938,7 +1938,7 @@ static unsigned int initial_ix86_tune_fe
>>> >
>>> >/* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict 
>>> > more
>>> >   than 4 branch instructions in the 16 byte window.  */
>>> > -  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>>> > +  m_PPRO | m_P4_NOCONA | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>>
>> This is special passs to handle limitations of AMD's K7/K8/K10 branch 
>> prediction.
>> Intel never h

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
On Wed, Dec 12, 2012 at 10:30 AM, Jan Hubicka  wrote:
> Concerning 1push per cycle, I think it is same as K7 hardware did, so move
> prologue should be a win.
>> > Index: config/i386/i386.c
>> > ===
>> > --- config/i386/i386.c  (revision 194452)
>> > +++ config/i386/i386.c  (working copy)
>> > @@ -1620,14 +1620,14 @@ struct processor_costs core_cost = {
>> >COSTS_N_INSNS (8),   /* cost of FABS instruction.  */
>> >COSTS_N_INSNS (8),   /* cost of FCHS instruction.  */
>> >COSTS_N_INSNS (40),  /* cost of FSQRT instruction.  */
>> > -  {{libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
>> > -   {libcall, {{24, loop, true}, {128, rep_prefix_8_byte, true},
>> > +  {{libcall, {{8192, rep_prefix_4_byte, true}, {-1, libcall, false}}},
>> > +   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},
>> >{-1, libcall, false,
>> >{{libcall, {{6, loop_1_byte, true},
>> >{24, loop, true},
>> >{8192, rep_prefix_4_byte, true},
>> >{-1, libcall, false}}},
>> > -   {libcall, {{24, loop, true}, {512, rep_prefix_8_byte, true},
>> > +   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},
>
> libcall is not faster up to 8KB to rep sequence that is better for 
> regalloc/code
> cache than fully blowin function call.

Be careful with this. My recollection is that REP sequence is good for
any size -- for smaller size, the REP initial set up cost is too high
(10s of cycles), while for large size copy, it is less efficient
compared with library version.


>> > @@ -1806,7 +1806,7 @@ static unsigned int initial_ix86_tune_fe
>> >m_PPRO,
>> >
>> >/* X86_TUNE_PARTIAL_FLAG_REG_STALL */
>> > -  m_CORE2I7 | m_GENERIC,
>> > +  m_GENERIC | m_CORE2,
>
> This disable shifts that store just some flags. Acroding to Agner's manual I7 
> handle
> this well.
>

ok.

> Partial flags stall
> The Sandy Bridge uses the method of an extra Âľop to join partial registers 
> not only for
> general purpose registers but also for the flags register, unlike previous 
> processors which
> used this method only for general purpose registers. This occurs when a write 
> to a part of
> the flags register is followed by a read from a larger part of the flags 
> register. The partial
> flags stall of previous processors (See page 75) is therefore replaced by an 
> extra Âľop. The
> Sandy Bridge also generates an extra Âľop when reading the flags after a 
> rotate instruction.
>
> This is cheaper than the 7 cycle delay on Core this flags is trying to avoid.

ok.

>> >
>> >/* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
>> > * on 16-bit immediate moves into memory on Core2 and Corei7.  */
>> > @@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
>> >m_K6,
>> >
>> >/* X86_TUNE_USE_CLTD */
>> > -  ~(m_PENT | m_ATOM | m_K6),
>> > +  ~(m_PENT | m_ATOM | m_K6 | m_GENERIC),

My change was to enable CLTD for generic. Is your change intended to
revert that?

>
> None of CPUs that generic care about are !USE_CLTD now after your change.
>> > @@ -1910,10 +1910,10 @@ static unsigned int initial_ix86_tune_fe
>> >m_ATHLON_K8,
>> >
>> >/* X86_TUNE_SSE_TYPELESS_STORES */
>> > -  m_AMD_MULTIPLE,
>> > +  m_AMD_MULTIPLE | m_CORE2I7, /**/
>
> Hmm, I can not seem to find this in manual now, but I believe that stores 
> also do not type,
> so movaps is preferred over movapd store because it is shorter.  If not, this 
> change should
> produce a lot of slowdowns.
>> >
>> >/* X86_TUNE_SSE_LOAD0_BY_PXOR */
>> > -  m_PPRO | m_P4_NOCONA,
>> > +  m_PPRO | m_P4_NOCONA | m_CORE2I7, /**/
>
> Agner:
> A common way of setting a register to zero is XOR EAX,EAX or SUB EBX,EBX. The
> Core2 and Nehalem processors recognize that certain instructions are 
> independent of the
> prior value of the register if the source and destination registers are the 
> same.
>
> This applies to all of the following instructions: XOR, SUB, PXOR, XORPS, 
> XORPD, and all
> variants of PSUBxxx and PCMPxxx except PCMPEQQ.
>> >
>> >/* X86_TUNE_MEMORY_MISMATCH_STALL */
>> >m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>> > @@ -1938,7 +1938,7 @@ static unsigned int initial_ix86_tune_fe
>> >
>> >/* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
>> >   than 4 branch instructions in the 16 byte window.  */
>> > -  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>> > +  m_PPRO | m_P4_NOCONA | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>
> This is special passs to handle limitations of AMD's K7/K8/K10 branch 
> prediction.
> Intel never had similar design, so this flag is pointless.

I noticed that too, but Andi has a better answer to it.

>
> We apparently ought to disable it for K10, at least per Agner's manual.
>> >
>> >/* X86_TUNE_SCHEDULE */
>> >  

Re: [cxx-conversion] Convert tree-sra.c'candidates to hash_table

2012-12-12 Thread Jakub Jelinek
On Tue, Dec 11, 2012 at 02:44:41PM -0800, Lawrence Crowl wrote:
> +/* Hash a tree in a uid_decl_map.  */
> +
> +inline hashval_t
> +uid_decl_hasher::hash (const value_type *item)
> +{
> +  return item->decl_minimal.uid;

Ugh, why aren't you using DECL_UID here?  The fact that DECL_UID is right
now defined to (DECL_MINIMAL_CHECK (NODE)->decl_minimal.uid) might change,
we really don't want to update thousands of locations should we need to
change that.

Jakub


Re: [v3] docbook vs. texlive > 2007

2012-12-12 Thread Benjamin De Kosnik

Fixup for a typo, on trunk and 4.7

-benjamin2012-12-12  Benjamin Kosnik  

	* doc/xml/manual/documentation_hacking.xml: Fix validation issue.


diff --git a/libstdc++-v3/doc/xml/manual/documentation_hacking.xml b/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
index 91d16dd..05c05a6 100644
--- a/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
+++ b/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
@@ -383,6 +383,8 @@
 
   
refman.out
+  
+
   
 A log of the compilation of the converted LaTeX form to pdf. This
 is a linear list, from the beginning of the
@@ -394,10 +396,20 @@
 incorrect, or will have clues at the end of the file with the dump
 of the memory usage of LaTeX.
   
-  
 
 
 
+	
+	  If the error at hand is not obvious after examination, a
+	  fall-back strategy is to start commenting out the doxygen
+	  input sources, which can be found in
+	  doc/doxygen/user.cfg.in, look for the
+	  INPUT tag. Start by commenting out whole
+	  directories of header files, until the offending header is
+	  identified. Then, read the latex log files to try and find
+	  surround text, and look for that in the offending header.
+	
+
  
 
 Markup
@@ -872,7 +884,7 @@ make XSL_STYLE_DIR="/usr/share/xml/docbook/stylesheet/nwalsh"
 
 
 	
-	  If the issue is not obvious after examination, or if one
+	  If the error at hand is not obvious after examination, or if one
 	  encounters the inscruitable Incomplete
 	  \ifmmode error, a fall-back strategy is to start
 	  commenting out parts of the XML document (regardless of what


libgo patch committed: Delete from a nil map is now a no-op

2012-12-12 Thread Ian Lance Taylor
The Go spec was changed slightly: calling delete on a nil map is now a
no-op, rather than causing a runtime panic.  This patch implements that
in gccgo.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

Index: libgo/runtime/go-map-delete.c
===
--- libgo/runtime/go-map-delete.c	(revision 194274)
+++ libgo/runtime/go-map-delete.c	(working copy)
@@ -27,7 +27,7 @@ __go_map_delete (struct __go_map *map, c
   void **pentry;
 
   if (map == NULL)
-runtime_panicstring ("deletion of entry in nil map");
+return;
 
   descriptor = map->__descriptor;
 
Index: gcc/testsuite/go.test/test/nil.go
===
--- gcc/testsuite/go.test/test/nil.go	(revision 193595)
+++ gcc/testsuite/go.test/test/nil.go	(working copy)
@@ -151,9 +151,8 @@ func maptest() {
 	shouldPanic(func() {
 		m[2] = 3
 	})
-	shouldPanic(func() {
-		delete(m, 2)
-	})
+	// can delete (non-existent) entries
+	delete(m, 2)
 }
 
 // nil slice


Re: PATCH: Remove AM_MAKEFLAGS from libsanitizer

2012-12-12 Thread H.J. Lu
On Wed, Dec 12, 2012 at 3:01 PM, H.J. Lu  wrote:
> On Wed, Dec 12, 2012 at 2:13 PM, Paolo Bonzini  wrote:
>> Il 12/12/2012 19:11, H.J. Lu ha scritto:
> >>
> >> in RAW_CXX_TARGET_EXPORTS.  There is no need to do anything.
 >
 > Nope, if you remove this you get the wrong definition of CC and CXX from
 > EXTRA_TARGET_FLAGS.  Instead, you need to add RAW_CXX_FOR_TARGET to
 > AM_MAKEFLAGS and EXTRA_TARGET_FLAGS.
 >
>>> We have set CXX and CXX_FOR_TARGET before EXTRA_TARGET_FLAGS
>>> is used:
>>>
>>> RAW_CXX_TARGET_EXPORTS = \
>>> $(BASE_TARGET_EXPORTS) \
>>> CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
>>> CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
>>> export CXX;
>>>
>>> Shouldn't we the right CXX and CXX_FOR_TARGET?
>>
>> The purpose of EXTRA_TARGET_FLAGS is to forward any *_FOR_TARGET
>> variable passed on the command line to the toplevel Makefile.
>>
>
> We don't pass down RAW_CXX_FOR_TARGET from toplevel
> Makefile.  If one wants to change RAW_CXX_FOR_TARGET
> from command line, he/she should do
>
> # make RAW_CXX_FOR_TARGET=xxx
>
> and RAW_CXX_TARGET_EXPORTS should handle it properly.

Also there are:

# --
# Special directives to GNU Make
# --

# Don't pass command-line variables to submakes.
.NOEXPORT:
MAKEOVERRIDES=

in toplevel Makefile.  How does

# make *_FOR_TARGET=

work at toplevel?


-- 
H.J.


Re: [PATCH] Fix nonlocalized vars regression caused by vec changes (PR debug/55665)

2012-12-12 Thread Diego Novillo
On Wed, Dec 12, 2012 at 2:58 PM, Jakub Jelinek  wrote:

> 2012-12-12  Jakub Jelinek  
>
> PR debug/55665
> * tree-inline.c (remap_decls): Change nonlocalized_list
> to pointer to pointer to vector from pointer to vector.
> (remap_block): Pass address of BLOCK_NONLOCALIZED_VARS.
>
> * g++.dg/guality/pr55665.C: New test.

OK.


Diego.


libgo patch committed: Update to current library

2012-12-12 Thread Ian Lance Taylor
I've updated libgo to the current master Go library sources.  As usual
this e-mail only includes patches to files that are substantially
specific to gccgo.  The bulk of the patch is available from the
repository or from the master Go repository.  Bootstrapped and ran Go
testsuite on x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

diff -r ddfa689e6405 libgo/MERGE
--- a/libgo/MERGE	Wed Dec 05 20:09:59 2012 -0800
+++ b/libgo/MERGE	Wed Dec 12 15:03:42 2012 -0800
@@ -1,4 +1,4 @@
-a070de932857
+c031aa767edf
 
 The first line of this file holds the Mercurial revision number of the
 last merge done from the master library sources.
diff -r ddfa689e6405 libgo/Makefile.am
--- a/libgo/Makefile.am	Wed Dec 05 20:09:59 2012 -0800
+++ b/libgo/Makefile.am	Wed Dec 12 15:03:42 2012 -0800
@@ -221,6 +221,7 @@
 toolexeclibgoexpdir = $(toolexeclibgodir)/exp
 
 toolexeclibgoexp_DATA = \
+	exp/cookiejar.gox \
 	exp/ebnf.gox \
 	exp/html.gox \
 	$(exp_inotify_gox) \
@@ -251,6 +252,7 @@
 	go/ast.gox \
 	go/build.gox \
 	go/doc.gox \
+	go/format.gox \
 	go/parser.gox \
 	go/printer.gox \
 	go/scanner.gox \
@@ -1194,6 +1196,9 @@
 	go/encoding/xml/typeinfo.go \
 	go/encoding/xml/xml.go
 
+go_exp_cookiejar_files = \
+	go/exp/cookiejar/jar.go \
+	go/exp/cookiejar/storage.go
 go_exp_ebnf_files = \
 	go/exp/ebnf/ebnf.go \
 	go/exp/ebnf/parser.go
@@ -1284,6 +1289,8 @@
 	go/go/doc/filter.go \
 	go/go/doc/reader.go \
 	go/go/doc/synopsis.go
+go_go_format_files = \
+	go/go/format/format.go
 go_go_parser_files = \
 	go/go/parser/interface.go \
 	go/go/parser/parser.go
@@ -1384,6 +1391,7 @@
 go_mime_multipart_files = \
 	go/mime/multipart/formdata.go \
 	go/mime/multipart/multipart.go \
+	go/mime/multipart/quotedprintable.go \
 	go/mime/multipart/writer.go
 
 go_net_http_files = \
@@ -1456,6 +1464,7 @@
 
 go_os_user_files = \
 	go/os/user/user.go \
+	go/os/user/lookup.go \
 	go/os/user/lookup_unix.go
 
 go_path_filepath_files = \
@@ -1822,6 +1831,7 @@
 	encoding/json.lo \
 	encoding/pem.lo \
 	encoding/xml.lo \
+	exp/cookiejar.lo \
 	exp/ebnf.lo \
 	exp/html.lo \
 	exp/html/atom.lo \
@@ -1836,6 +1846,7 @@
 	go/ast.lo \
 	go/build.lo \
 	go/doc.lo \
+	go/format.lo \
 	go/parser.lo \
 	go/printer.lo \
 	go/scanner.lo \
@@ -2658,6 +2669,15 @@
 	@$(CHECK)
 .PHONY: encoding/xml/check
 
+@go_include@ exp/cookiejar.lo.dep
+exp/cookiejar.lo.dep: $(go_exp_cookiejar_files)
+	$(BUILDDEPS)
+exp/cookiejar.lo: $(go_exp_cookiejar_files)
+	$(BUILDPACKAGE)
+exp/cookiejar/check: $(CHECK_DEPS)
+	@$(CHECK)
+.PHONY: exp/cookiejar/check
+
 @go_include@ exp/ebnf.lo.dep
 exp/ebnf.lo.dep: $(go_exp_ebnf_files)
 	$(BUILDDEPS)
@@ -2802,6 +2822,15 @@
 	@$(CHECK)
 .PHONY: go/doc/check
 
+@go_include@ go/format.lo.dep
+go/format.lo.dep: $(go_go_format_files)
+	$(BUILDDEPS)
+go/format.lo: $(go_go_format_files)
+	$(BUILDPACKAGE)
+go/format/check: $(CHECK_DEPS)
+	@$(CHECK)
+.PHONY: go/format/check
+
 @go_include@ go/parser.lo.dep
 go/parser.lo.dep: $(go_go_parser_files)
 	$(BUILDDEPS)
@@ -3450,6 +3479,8 @@
 encoding/xml.gox: encoding/xml.lo
 	$(BUILDGOX)
 
+exp/cookiejar.gox: exp/cookiejar.lo
+	$(BUILDGOX)
 exp/ebnf.gox: exp/ebnf.lo
 	$(BUILDGOX)
 exp/html.gox: exp/html.lo
@@ -3482,6 +3513,8 @@
 	$(BUILDGOX)
 go/doc.gox: go/doc.lo
 	$(BUILDGOX)
+go/format.gox: go/format.lo
+	$(BUILDGOX)
 go/parser.gox: go/parser.lo
 	$(BUILDGOX)
 go/printer.gox: go/printer.lo
@@ -3681,6 +3714,7 @@
 	encoding/json/check \
 	encoding/pem/check \
 	encoding/xml/check \
+	exp/cookiejar/check \
 	exp/ebnf/check \
 	exp/html/check \
 	exp/html/atom/check \
@@ -3696,6 +3730,7 @@
 	go/ast/check \
 	$(go_build_check_omitted_since_it_calls_6g) \
 	go/doc/check \
+	go/format/check \
 	go/parser/check \
 	go/printer/check \
 	go/scanner/check \
diff -r ddfa689e6405 libgo/runtime/chan.c
--- a/libgo/runtime/chan.c	Wed Dec 05 20:09:59 2012 -0800
+++ b/libgo/runtime/chan.c	Wed Dec 12 15:03:42 2012 -0800
@@ -197,7 +197,7 @@
 	runtime_lock(c);
 	// TODO(dvyukov): add similar instrumentation to select.
 	if(raceenabled)
-		runtime_racereadpc(c, pc);
+		runtime_racereadpc(c, pc, runtime_chansend);
 	if(c->closed)
 		goto closed;
 
@@ -1271,7 +1271,7 @@
 	}
 
 	if(raceenabled) {
-		runtime_racewritepc(c, runtime_getcallerpc(&c));
+		runtime_racewritepc(c, runtime_getcallerpc(&c), runtime_closechan);
 		runtime_racerelease(c);
 	}
 
diff -r ddfa689e6405 libgo/runtime/mgc0.c
--- a/libgo/runtime/mgc0.c	Wed Dec 05 20:09:59 2012 -0800
+++ b/libgo/runtime/mgc0.c	Wed Dec 12 15:03:42 2012 -0800
@@ -949,6 +949,7 @@
 		dumpspan(spanidx);
 	}
 }
+
 void
 runtime_gchelper(void)
 {
@@ -1025,16 +1026,21 @@
 	mstats.stacks_sys = stacks_sys;
 }
 
+// Structure of arguments passed to function gc().
+// This allows the arguments to be passed via reflect_call.
+struct gc_args
+{
+	int32 force;
+};
+
+static void gc(struct gc_args *args);
+
 void
 runtime_gc(int32 force)
 {
 	M *m;
-	int64 t0, t1, t2, t3;
-	uint64 heap0, heap1, obj0, obj1;
 	const byte *p;
-	GCStats stats;
-	M *m1;
-	uint32 i;
+	struct gc_args a, *ap;
 
 	// Th

Re: [patch] fix libstdc++/55631

2012-12-12 Thread Jonathan Wakely
On 11 December 2012 22:31, Jonathan Wakely wrote:
> PR libstdc++/55631
> * include/ext/alloc_traits.h: Include missing header.
> * include/ext/pointer.h: Likewise.
> * include/ext/string_conversions.h: Require C++11.
> * libsupc++/initializer_list: Reindent.
>
> Tested x86_64-linux, committed to trunk.

Also committed to the 4.7 branch, the  error is a
regression from 4.6


Re: PATCH: Remove AM_MAKEFLAGS from libsanitizer

2012-12-12 Thread H.J. Lu
On Wed, Dec 12, 2012 at 2:13 PM, Paolo Bonzini  wrote:
> Il 12/12/2012 19:11, H.J. Lu ha scritto:
 >>
 >> in RAW_CXX_TARGET_EXPORTS.  There is no need to do anything.
>>> >
>>> > Nope, if you remove this you get the wrong definition of CC and CXX from
>>> > EXTRA_TARGET_FLAGS.  Instead, you need to add RAW_CXX_FOR_TARGET to
>>> > AM_MAKEFLAGS and EXTRA_TARGET_FLAGS.
>>> >
>> We have set CXX and CXX_FOR_TARGET before EXTRA_TARGET_FLAGS
>> is used:
>>
>> RAW_CXX_TARGET_EXPORTS = \
>> $(BASE_TARGET_EXPORTS) \
>> CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
>> CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
>> export CXX;
>>
>> Shouldn't we the right CXX and CXX_FOR_TARGET?
>
> The purpose of EXTRA_TARGET_FLAGS is to forward any *_FOR_TARGET
> variable passed on the command line to the toplevel Makefile.
>

We don't pass down RAW_CXX_FOR_TARGET from toplevel
Makefile.  If one wants to change RAW_CXX_FOR_TARGET
from command line, he/she should do

# make RAW_CXX_FOR_TARGET=xxx

and RAW_CXX_TARGET_EXPORTS should handle it properly.

-- 
H.J.


Re: [cxx-conversion] Convert tree-sra.c'candidates to hash_table

2012-12-12 Thread Lawrence Crowl
On 12/12/12, Diego Novillo  wrote:
> On Dec 11, 2012 Lawrence Crowl  wrote:
> > Convert tree-sra.c'candidates from htab_t to hash_table.
> >
> > Fold uid_decl_map_hash and uid_decl_map_eq into new struct
> > uid_decl_hasher.  This change moves the definitions from tree-ssa.c
> > into tree-sra.c and removes the declarations from tree-flow.h
> >
> > Update dependent calls and types to hash_table.
> >
> > Tested on x86_64.
> >
> > Okay for branch?
> >
> >
> > Index: gcc/tree-sra.c
> > +/* Candidate ashtable helpers.  */
>
> s/ashtable/hash table/
>
> > +struct uid_decl_hasher : typed_noop_remove 
> > +{
> > +  typedef tree_node value_type;
> > +  typedef tree_node compare_type;
> > +  static inline hashval_t hash (const value_type *);
> > +  static inline bool equal (const value_type *, const compare_type *);
> > +};
> > +
> > +/* Hash a tree in a uid_decl_map.  */
> > +
> > +inline hashval_t
> > +uid_decl_hasher::hash (const value_type *item)
> > +{
> > +  return item->decl_minimal.uid;
> > +}
> > +
> > +/* Return true if the DECL_UID in both trees are equal.  */
> > +
> > +inline bool
> > +uid_decl_hasher::equal (const value_type *a, const compare_type *b)
> > +{
> > +  return (a->decl_minimal.uid == b->decl_minimal.uid);
> > +}
>
> ISTR other places where we hash decls.  I wonder if we shouldn't
> move this to some common spot.  Maybe later.

Decls are hashed in different ways to match the equality function.
There probably are duplicates in there, but I'm sure they aren't
all duplicates.

> The patch is OK.

-- 
Lawrence Crowl


Re: [patch, mips] Patch to allow madd, etc without 64 bit floating point

2012-12-12 Thread Steve Ellcey
On Wed, 2012-12-12 at 22:21 +, Maciej W. Rozycki wrote:

>  Given that, how about we coordinate these submissions -- I'll dig out the 
> most recent version of the madd change and push it through testing before 
> I post it and you do the reciprocals?
> 
>   Maciej

That sounds good to me.

Steve Ellcey
sell...@mips.com




Re: [patch, mips] Patch to allow madd, etc without 64 bit floating point

2012-12-12 Thread Maciej W. Rozycki
Steve,

 Thanks for raising my attention, I've got a proposal below.

> The MIPS32 architecture supports madd, msub, nmadd, and nmsub instructions
> as well as rsqrt and recip instructions even when it doesn't have a 64 bit
> floating point unit.  This was not clear in the original MIPS32 architecture
> document (MD00086) but was clarified in 2011 (version 3.0.2, March 21, 2011).
> This patch updates the ISA_HAS_FP4 and ISA_HAS_NMADD4_NMSUB4 macros and the
> recip_condition mode attribute to allow the use of these instructions for all
> MIPS32 architectures (with or without FLOAT64).
> 
> Testing showed one new test failure (gcc.target/mips/rsqrt-4.c) which I 
> updated to reflect that we can now generate and use a rsqrt.d instruction.
> 
> I put Chao-ying and Maciej's names on the patch since they did the
> original changes in GCC but those changes were never checked in upstream.

 I've got the changes scheduled for upstream submission, but testing 
against upstream GCC used to be problematic for various reasons and then 
disappeared from the radar owing to other stuff; in any case apologies for 
the delay incurred.

 However I believe I have a more complete version of the change, including 
in particular a number of code simplifications as well as a documentation 
update, although no update for the reciprocals (that I think should be 
handled separately; I reckon there was some peculiarity there).

 Given that, how about we coordinate these submissions -- I'll dig out the 
most recent version of the madd change and push it through testing before 
I post it and you do the reciprocals?

  Maciej


Re: PATCH: Remove AM_MAKEFLAGS from libsanitizer

2012-12-12 Thread Paolo Bonzini
Il 12/12/2012 19:11, H.J. Lu ha scritto:
>>> >>
>>> >> in RAW_CXX_TARGET_EXPORTS.  There is no need to do anything.
>> >
>> > Nope, if you remove this you get the wrong definition of CC and CXX from
>> > EXTRA_TARGET_FLAGS.  Instead, you need to add RAW_CXX_FOR_TARGET to
>> > AM_MAKEFLAGS and EXTRA_TARGET_FLAGS.
>> >
> We have set CXX and CXX_FOR_TARGET before EXTRA_TARGET_FLAGS
> is used:
> 
> RAW_CXX_TARGET_EXPORTS = \
> $(BASE_TARGET_EXPORTS) \
> CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
> CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
> export CXX;
> 
> Shouldn't we the right CXX and CXX_FOR_TARGET?

The purpose of EXTRA_TARGET_FLAGS is to forward any *_FOR_TARGET
variable passed on the command line to the toplevel Makefile.

Paolo


[patch, mips] Patch to allow madd, etc without 64 bit floating point

2012-12-12 Thread Steve Ellcey
The MIPS32 architecture supports madd, msub, nmadd, and nmsub instructions
as well as rsqrt and recip instructions even when it doesn't have a 64 bit
floating point unit.  This was not clear in the original MIPS32 architecture
document (MD00086) but was clarified in 2011 (version 3.0.2, March 21, 2011).
This patch updates the ISA_HAS_FP4 and ISA_HAS_NMADD4_NMSUB4 macros and the
recip_condition mode attribute to allow the use of these instructions for all
MIPS32 architectures (with or without FLOAT64).

Testing showed one new test failure (gcc.target/mips/rsqrt-4.c) which I 
updated to reflect that we can now generate and use a rsqrt.d instruction.

I put Chao-ying and Maciej's names on the patch since they did the
original changes in GCC but those changes were never checked in upstream.

OK to checkin?

Steve Ellcey
sell...@mips.com



In the spec MD00086, page 347 Revision History, 3.0.2, March 21, 2011, RECIP, R
SQRT instructions do not require 64-bit FPU.



2012-12-12  Steve Ellcey  
Chao-ying Fu  
Maciej W. Rozycki  

* config/mips/mips.h (ISA_HAS_FP4): Allow madd, etc. on all MIPS32R2.
(ISA_HAS_NMADD4_NMSUB4): Ditto.
* config/mips/mips.md (recip_condition): Allow recip and rsqrt on
all MIPS32R2.

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 60b26cb..44874e9 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -858,7 +858,7 @@ struct mips_cpu_info {
FP madd and msub instructions, and the FP recip and recip sqrt
instructions.  */
 #define ISA_HAS_FP4((ISA_MIPS4 \
- || (ISA_MIPS32R2 && TARGET_FLOAT64)   \
+ || ISA_MIPS32R2   \
  || ISA_MIPS64 \
  || ISA_MIPS64R2)  \
 && !TARGET_MIPS16)
@@ -889,10 +889,7 @@ struct mips_cpu_info {
 /* ISA has floating-point nmadd and nmsub instructions
'd = -((a * b) [+-] c)'.  */
 #define ISA_HAS_NMADD4_NMSUB4(MODE)\
-   ((ISA_MIPS4 \
- || (ISA_MIPS32R2 && (MODE) == V2SFmode) \
- || ISA_MIPS64 \
- || ISA_MIPS64R2)  \
+(ISA_HAS_FP4   \
 && (!TARGET_MIPS5400 || TARGET_MAD)\
 && !TARGET_MIPS16)
 
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index a3aa922..796597e 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -858,12 +858,10 @@
   [(SF "!ISA_MIPS1") (DF "!ISA_MIPS1") (V2SF "TARGET_SB1")])
 
 ;; This attribute gives the conditions under which RECIP.fmt and RSQRT.fmt
-;; instructions can be used.  The MIPS32 and MIPS64 ISAs say that RECIP.D
-;; and RSQRT.D are unpredictable when doubles are stored in pairs of FPRs,
-;; so for safety's sake, we apply this restriction to all targets.
+;; instructions can be used.
 (define_mode_attr recip_condition
   [(SF "ISA_HAS_FP4")
-   (DF "ISA_HAS_FP4 && TARGET_FLOAT64")
+   (DF "ISA_HAS_FP4")
(V2SF "TARGET_SB1")])
 
 ;; This code iterator allows signed and unsigned widening multiplications

= Test change ===

2012-12-12  Steve Ellcey  

* gcc.target/mips/rsqrt-4.c: Update to show use of sqrt.d.


diff --git a/gcc/testsuite/gcc.target/mips/rsqrt-4.c 
b/gcc/testsuite/gcc.target/mips/rsqrt-4.c
index 6b6577e..471dd28 100644
--- a/gcc/testsuite/gcc.target/mips/rsqrt-4.c
+++ b/gcc/testsuite/gcc.target/mips/rsqrt-4.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-ffast-math -mips64 -mhard-float -mgp32" } */
 /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
-/* { dg-final { scan-assembler-not "\trsqrt.d\t" } } */
+/* { dg-final { scan-assembler-times "\trsqrt.d\t" 2 } } */
 /* { dg-final { scan-assembler-times "\trsqrt.s\t" 2 } } */
 
 extern double sqrt(double);


Re: [PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)

2012-12-12 Thread Richard Henderson
On 12/12/2012 11:50 AM, Jakub Jelinek wrote:
> 2012-12-12  Jakub Jelinek  
> 
>   PR sanitizer/55508
>   * builtin-attrs.def (ATTR_TMPURE_NOTHROW_LEAF_LIST,
>   ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): New.
>   * asan.c (ATTR_TMPURE_NOTHROW_LEAF_LIST,
>   ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): Define.
>   * sanitizer.def: Make __asan_report_* and __asan_handle_no_return
>   builtins tm pure.

Ok.

Agreed about we need another solution for tsan + tm.


r~


Re: [PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)

2012-12-12 Thread Jack Howarth
On Wed, Dec 12, 2012 at 08:50:33PM +0100, Jakub Jelinek wrote:
> Hi!
> 
> Various TM tests ICE when built with -fgnu-tm -fsanitizer=address.
> The problem is that asan.c pass adds calls to builtins that weren't there
> before and TM is upset about it.  The __asan_report* are all like
> abort, in correctly written program they shouldn't have a user visible
> effect, in bad program they will terminate the process, but in any case
> it doesn't matter how many times they are retried as part of a transaction,
> there is no state to roll back on transaction cancellation.
> __asan_handle_no_return, while not being noreturn, just marks the stack as
> unprotected, so again in correctly written application no effect, in bad app
> might result in some issues being undetected, but still, it can be done many
> times and isn't irreversible.
> 
> The following patch fixes the ICEs by making all of these transaction pure.

Jakub,
   Are the tm.exp scan-tree-dump-times failures with...

make -k check RUNTESTFLAGS="tm.exp --target_board=unix'{-fsanitize=address}'"

expected? I see...

FAIL: gcc.dg/tm/memopt-3.c scan-tree-dump-times tmmark "logging: 
lala.x\\[i_4\\]" 1
FAIL: gcc.dg/tm/memopt-4.c scan-tree-dump-times tmedge "tm_save.[0-9_]+ = 
lala.x\\[55\\]" 1
FAIL: gcc.dg/tm/memopt-4.c scan-tree-dump-times tmedge "lala.x\\[55\\] = 
tm_save" 1
FAIL: gcc.dg/tm/memopt-5.c scan-tree-dump-times tmedge "ITM_LU[0-9] 
\\(&lala.x\\[55\\]" 1
FAIL: gcc.dg/tm/memopt-7.c scan-tree-dump-times tmedge "tm_save.[0-9_]+ = lala" 
1
FAIL: gcc.dg/tm/memopt-7.c scan-tree-dump-times tmedge "lala = tm_save" 1
FAIL: g++.dg/tm/noexcept-5.C scan-tree-dump-times tmmark "ITM_RU" 1

on x86_64-apple-darwin12 with your patch applied to current gcc trunk.
   Jack

> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> As for TSAN, no idea what to do, TSAN diagnostics could be in a similar
> category, there is nothing to reverse, but as the library doesn't understand
> transactions, perhaps better would be not to tsan instrument anything inside
> of transactions, until the library is made TM aware.
> 
> 2012-12-12  Jakub Jelinek  
> 
>   PR sanitizer/55508
>   * builtin-attrs.def (ATTR_TMPURE_NOTHROW_LEAF_LIST,
>   ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): New.
>   * asan.c (ATTR_TMPURE_NOTHROW_LEAF_LIST,
>   ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): Define.
>   * sanitizer.def: Make __asan_report_* and __asan_handle_no_return
>   builtins tm pure.
> 
> --- gcc/builtin-attrs.def.jj  2012-10-22 08:42:23.0 +0200
> +++ gcc/builtin-attrs.def 2012-12-12 11:56:54.942938879 +0100
> @@ -263,6 +263,11 @@ DEF_ATTR_TREE_LIST (ATTR_TMPURE_MALLOC_N
>  DEF_ATTR_TREE_LIST (ATTR_TMPURE_NOTHROW_LIST,
>  ATTR_TM_TMPURE, ATTR_NULL, ATTR_NOTHROW_LIST)
>  
> +DEF_ATTR_TREE_LIST (ATTR_TMPURE_NOTHROW_LEAF_LIST,
> + ATTR_TM_TMPURE, ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
> +DEF_ATTR_TREE_LIST (ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST,
> + ATTR_TM_TMPURE, ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
> +
>  /* Construct a tree for a format_arg attribute.  */
>  #define DEF_FORMAT_ARG_ATTRIBUTE(FA) \
>DEF_ATTR_TREE_LIST (ATTR_FORMAT_ARG_##FA, ATTR_FORMAT_ARG, \
> --- gcc/asan.c.jj 2012-12-11 13:05:36.0 +0100
> +++ gcc/asan.c2012-12-12 11:57:59.626550534 +0100
> @@ -1611,8 +1611,13 @@ initialize_sanitizer_builtins (void)
>  #define BT_FN_VOID_VPTR_I16_INT BT_FN_VOID_VPTR_IX_INT[4]
>  #undef ATTR_NOTHROW_LEAF_LIST
>  #define ATTR_NOTHROW_LEAF_LIST ECF_NOTHROW | ECF_LEAF
> +#undef ATTR_TMPURE_NOTHROW_LEAF_LIST
> +#define ATTR_TMPURE_NOTHROW_LEAF_LIST ECF_TM_PURE | ATTR_NOTHROW_LEAF_LIST
>  #undef ATTR_NORETURN_NOTHROW_LEAF_LIST
>  #define ATTR_NORETURN_NOTHROW_LEAF_LIST ECF_NORETURN | ATTR_NOTHROW_LEAF_LIST
> +#undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST
> +#define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \
> +  ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST
>  #undef DEF_SANITIZER_BUILTIN
>  #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM,
> \
> --- gcc/sanitizer.def.jj  2012-12-11 11:28:10.0 +0100
> +++ gcc/sanitizer.def 2012-12-12 11:57:12.714833945 +0100
> @@ -32,25 +32,25 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_INIT
>  /* Do not reorder the BUILT_IN_ASAN_REPORT* builtins, e.g. cfgcleanup.c
> relies on this order.  */
>  DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD1, "__asan_report_load1",
> -   BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
> +   BT_FN_VOID_PTR, ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST)
>  DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD2, "__asan_report_load2",
> -   BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
> +   BT_FN_VOID_PTR, ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST)
>  DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD4, 

Re: x86-64 medium memory model

2012-12-12 Thread Leif Ekblad
The small memory model will not do since I want to put data at other 
distinct addresses above 4G. I also want to place the heap at yet another 
address interval. This way it becomes easy to separate out code, data and 
heap references, and making sure that pointers are valid. The primary reason 
for not using below 2G or the last 2G, is because such numbers are formed 
naturally when doing 32-bit arithmetics, and thus could be executed by 
chance from corrupt data.


If I understand it correctly, the PIE option is very similar to the PIC 
option, and will not make it possible to use any address for both code and 
data.


Additionally, when I tried the small memory model with a start address of 
text above 4G, the linker complains about 32-bit fixups overflowing.


Leif Ekblad


- Original Message - 
From: "H.J. Lu" 

To: "Leif Ekblad" 
Cc: "GCC Patches" ; "GCC Mailing List" 


Sent: Wednesday, December 12, 2012 9:59 PM
Subject: Re: x86-64 medium memory model



On Wed, Dec 12, 2012 at 12:56 PM, Leif Ekblad  wrote:
I'm working on OS-adaptations for an OS that would use x86-64 
applications

that are located above 4G, but not in the upper area. Binutils provide a
function to be able to set the start of text to above 4G, but there are
problems with GCC when using this memory model.



Have you tried PIE with small model?  You can place your
binaries above 4G with better performance.

--
H.J. 




Re: [PATCH] asan unit tests from llvm lit-test incremental changes

2012-12-12 Thread Dodji Seketeli
Hello,

Independently of this review, I think it's be interesting to hear
Kostya's voice on:

Jakub Jelinek  writes:

> 2) In large-func-test-1.C, I had to stop matching the backtrace after
> _Znw[jm], because libasan is using the fast but inaccurate backtrace,
> and while the tests can be easily tweaked to compile with
> -fno-omit-frame-pointer, we definitely can't rely on libstdc++.so to be
> built with that option.  Most likely it isn't.  I repeat that I think
> that at least for Linux libasan should use the _Unwind* based backtrace
> at least for the fatal functions (__asan_report* etc.), and perhaps for
> these malloc wrappers like ::operator new, ::operator new[] and their
> const std::nothrow_t& variants libasan could intercept them, call
> malloc and if that returns NULL, call the original corresponding function
> so that it deals with exceptions, new handler etc.

and on:

> 3) deep-thread-stack-1.C fails for me right now with some libasan assertion,
> Kostya, can you please look at that?
>   AsanThread *t = asanThreadRegistry().GetCurrent();
>   CHECK(t);
> where it failed on the CHECK, because t was NULL.  I've skipped the test for
> now.

[...]

> --- gcc/testsuite/g++.dg/asan/deep-tail-call-1.C.jj   2012-12-04 
> 20:24:10.0 +0100
> +++ gcc/testsuite/g++.dg/asan/deep-tail-call-1.C  2012-12-05 
> 11:01:48.600443834 +0100
> @@ -1,21 +1,22 @@
> -// { dg-do run } 
> +// { dg-do run }
>  // { dg-options "-fno-omit-frame-pointer -fno-optimize-sibling-calls" }
>  // { dg-additional-options "-mno-omit-leaf-frame-pointer" { target { 
> i?86-*-* x86_64-*-* } } }
> -// { dg-shouldfail "asan" } 
> +// { dg-shouldfail "asan" }
>  
>  int global[10];
>  void __attribute__((noinline)) call4(int i) { global[i+10]++; }
>  void __attribute__((noinline)) call3(int i) { call4(i); }
>  void __attribute__((noinline)) call2(int i) { call3(i); }
>  void __attribute__((noinline)) call1(int i) { call2(i); }
> -int main(int argc, char **argv) {
> -  call1(argc);
> +volatile int one = 1;

Just curious, why do we need this variable to be volatile, especially
since the test is compiled without optimization?

> +int main() {
> +  call1(one);
>return global[0];
>  }

[...]

The patch looks OK to me in any case.

Thanks.

-- 
Dodji


Re: [PATCH] asan unit tests from llvm lit-test incremental changes

2012-12-12 Thread Jakub Jelinek
On Wed, Dec 12, 2012 at 10:16:49PM +0100, Dodji Seketeli wrote:
> Independently of this review, I think it's be interesting to hear
> Kostya's voice on:
> 
> Jakub Jelinek  writes:
> 
> > 2) In large-func-test-1.C, I had to stop matching the backtrace after
> > _Znw[jm], because libasan is using the fast but inaccurate backtrace,
> > and while the tests can be easily tweaked to compile with
> > -fno-omit-frame-pointer, we definitely can't rely on libstdc++.so to be
> > built with that option.  Most likely it isn't.  I repeat that I think
> > that at least for Linux libasan should use the _Unwind* based backtrace
> > at least for the fatal functions (__asan_report* etc.), and perhaps for
> > these malloc wrappers like ::operator new, ::operator new[] and their
> > const std::nothrow_t& variants libasan could intercept them, call
> > malloc and if that returns NULL, call the original corresponding function
> > so that it deals with exceptions, new handler etc.

Yeah, I'd appreciate that too.

> and on:
> 
> > 3) deep-thread-stack-1.C fails for me right now with some libasan assertion,
> > Kostya, can you please look at that?
> >   AsanThread *t = asanThreadRegistry().GetCurrent();
> >   CHECK(t);
> > where it failed on the CHECK, because t was NULL.  I've skipped the test for
> > now.
> 
> [...]

This one is for the testcase solved right now already by the -lasan -lpthread 
linking instead of just -lpthread (and driver adding -lasan afterwards).
We'll need to think about how to tweak the driver to add -lasan early on the
command line, before user passed -l* options.
> 
> > --- gcc/testsuite/g++.dg/asan/deep-tail-call-1.C.jj 2012-12-04 
> > 20:24:10.0 +0100
> > +++ gcc/testsuite/g++.dg/asan/deep-tail-call-1.C2012-12-05 
> > 11:01:48.600443834 +0100
> > @@ -1,21 +1,22 @@
> > -// { dg-do run } 
> > +// { dg-do run }
> >  // { dg-options "-fno-omit-frame-pointer -fno-optimize-sibling-calls" }
> >  // { dg-additional-options "-mno-omit-leaf-frame-pointer" { target { 
> > i?86-*-* x86_64-*-* } } }
> > -// { dg-shouldfail "asan" } 
> > +// { dg-shouldfail "asan" }
> >  
> >  int global[10];
> >  void __attribute__((noinline)) call4(int i) { global[i+10]++; }
> >  void __attribute__((noinline)) call3(int i) { call4(i); }
> >  void __attribute__((noinline)) call2(int i) { call3(i); }
> >  void __attribute__((noinline)) call1(int i) { call2(i); }
> > -int main(int argc, char **argv) {
> > -  call1(argc);
> > +volatile int one = 1;
> 
> Just curious, why do we need this variable to be volatile, especially
> since the test is compiled without optimization?

asan.exp tests are torture tests, they iterate over several -O* options,
unless explicitly dg-skip-if skipped.  It could be non-volatile with
asm volatile ("" : : : "memory");
or asm volatile ("" : "+m" (one)); or similar too, sure.
I just don't want to rely on argc being one, and the compiler shouldn't know
that one is 1 in the test.

> [...]
> 
> The patch looks OK to me in any case.

Thanks.

Jakub


Re: [PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)

2012-12-12 Thread Dodji Seketeli
Jakub Jelinek  writes:

>   PR sanitizer/55508
>   * builtin-attrs.def (ATTR_TMPURE_NOTHROW_LEAF_LIST,
>   ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): New.
>   * asan.c (ATTR_TMPURE_NOTHROW_LEAF_LIST,
>   ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): Define.
>   * sanitizer.def: Make __asan_report_* and __asan_handle_no_return
>   builtins tm pure.

This looks OK to me, if no one else objects.

-- 
Dodji


Re: x86-64 medium memory model

2012-12-12 Thread H.J. Lu
On Wed, Dec 12, 2012 at 12:56 PM, Leif Ekblad  wrote:
> I'm working on OS-adaptations for an OS that would use x86-64 applications
> that are located above 4G, but not in the upper area. Binutils provide a
> function to be able to set the start of text to above 4G, but there are
> problems with GCC when using this memory model.
>

Have you tried PIE with small model?  You can place your
binaries above 4G with better performance.

-- 
H.J.


x86-64 medium memory model

2012-12-12 Thread Leif Ekblad
I'm working on OS-adaptations for an OS that would use x86-64 applications 
that are located above 4G, but not in the upper area. Binutils provide a 
function to be able to set the start of text to above 4G, but there are 
problems with GCC when using this memory model.


The first issue has to do with creating a cross-compiler that defaults to 
medium memory model using PIC. While there are switches to achieve this on 
command line (-mcmodel=medium -fpic), this is inconvinient since everything 
ported must be changed to add these switches, including libgcc and newlib. 
The cross-compiler instead should default to this memory model.


One possibility to achieve this is to add a new .h-file in the 
gcc/config/i386 directory. However, further inspection of the source 
indicates there is no macro that can be redefined to achieve this. One 
possibility is to add such a macro in gcc/config/i386.h, and implement it in 
gcc/config/i386.c.


Here is a simple patch to do this:

diff -u -r -N gcc-4.8-20121202/gcc/config/i386/i386.h 
gcc-work/gcc/config/i386/i386.h
--- gcc-4.8-20121202/gcc/config/i386/i386.h 2012-11-23 17:02:10.0 
+0100

+++ gcc-work/gcc/config/i386/i386.h 2012-12-08 12:17:40.0 +0100
@@ -86,6 +86,8 @@
#define TARGET_LP64 TARGET_ABI_64
#define TARGET_X32 TARGET_ABI_X32

+#define TARGET_MEDIUM_PIC   0
+
/* SSE4.1 defines round instructions */
#define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1
#define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0)

diff -u -r -N gcc-4.8-20121202/gcc/config/i386/i386.c 
gcc-work/gcc/config/i386/i386.c
--- gcc-4.8-20121202/gcc/config/i386/i386.c 2012-12-02 00:43:52.0 
+0100

+++ gcc-work/gcc/config/i386/i386.c 2012-12-11 21:43:48.0 +0100
@@ -3235,6 +3235,8 @@
  DLL, and is essentially just as efficient as direct addressing.  */
  if (TARGET_64BIT && DEFAULT_ABI == MS_ABI)
 ix86_cmodel = CM_SMALL_PIC, flag_pic = 1;
+  else if (TARGET_64BIT && TARGET_MEDIUM_PIC)
+ix86_cmodel = CM_MEDIUM_PIC, flag_pic = 1;
  else if (TARGET_64BIT)
 ix86_cmodel = flag_pic ? CM_SMALL_PIC : CM_SMALL;
  else

It can be used like this:

+#undef TARGET_MEDIUM_PIC
+#define TARGET_MEDIUM_PIC 1

Next, with this issue fixed, there is still another problem in libgcc when 
using a cross-compiler compiled with this memory model:


../../gcc-work/libgcc/. -I../../../gcc-work/libgcc/../gcc -I../../../gcc-work/libgcc/../include 
-DHAVE_CC_TLS -o cpuinfo.o -MT cpuinfo.o -MD -MP -MF cpuinfo.dep -c 
../../../gcc-work/libgcc/config/i386/cpuinfo.c -fvisibility=hidden -DHIDE_EXPORTS

In file included from ../../../gcc-work/libgcc/config/i386/cpuinfo.c:21:0:
../../../gcc-work/libgcc/config/i386/cpuinfo.c: In function 
'get_available_features':
../../../gcc-work/libgcc/config/i386/cpuinfo.c:236:7: error: inconsistent 
operand constraints in an 'asm'

__cpuid_count (7, 0, eax, ebx, ecx, edx);
^
../../../gcc-work/libgcc/static-object.mk:17: recipe for target `cpuinfo.o' 
failed

make[1]: *** [cpuinfo.o] Error 1
make[1]: Lämnar katalogen "/usr/src/build-gcc-noheader/rdos/libgcc"
Makefile:10619: recipe for target `all-target-libgcc' failed
make: *** [all-target-libgcc] Error 2

My guess is that __cpuid_count uses 32-bit addressing when it should be 
using 64-bit addressing. I have no patch for this as I don't understand what 
is going on here well enough, but __cpuid_count is defined in 
gcc/config/i386/cpuid.h.


In order to be able to continue to test the medium memory model I'd need 
patches to be applied to fix these issues.


Regards,
Leif Ekblad
RDOS Development



[PATCH] AIX native TLS support

2012-12-12 Thread David Edelsohn
This is the next set of patches for native TLS support on AIX.  This
mainly adds support for BSS symbols. This should use local common and
I created a tbss section name, but I cannot figure out how to convince
the AIX assembler to match the TOC symbols reference to the lcomm
symbol.  Disassembly of XLC output shows that it uses COMMON symbols,
so I am punting to that as well.

This patch also removes the section anchor flag from native TLS
symbols if it was set.

This patch uses the logic from bss_initializer_p(). Should that
function in varasm.c be made non-static?

Bootstrapped on powerpc-ibm-aix7.1.0.0, although this still does not
pass libgomp testsuite.

Thanks, David

* xcoffout.c (xcoff_tbss_section_name): Define.
* xcoffout.h (xcoff_tbss_section_name): Declare.
* config/rs6000/xcoff.h (TARGET_ENCODE_SECTION_INFO): Define.
(ASM_OUTPUT_TLS_COMMON): Merge strings.
* config/rs6000/rs6000.c (tls_private_dat_section): New.
(output_toc): Only output CSECT decoration for TLS.
Output appropriate CSECT for data or bss.
(rs6000_xcoff_asm_init_sections) Define tls_private_data_section.
(rs6000_xcoff_select_section): Handle TLS bss and private data.
(rs6000_xcoff_file_start): Generate xcoff_tbss_section_name.
(rs6000_xcoff_encode_section_info): Strip SYMBOL_FLAG_HAS_BLOCK_INFO
from native TLS symbols.

Index: xcoffout.c
===
--- xcoffout.c  (revision 194435)
+++ xcoffout.c  (working copy)
@@ -67,6 +67,7 @@
 char *xcoff_bss_section_name;
 char *xcoff_private_data_section_name;
 char *xcoff_tls_data_section_name;
+char *xcoff_tbss_section_name;
 char *xcoff_read_only_section_name;

 /* Last source file name mentioned in a NOTE insn.  */
Index: xcoffout.h
===
--- xcoffout.h  (revision 194435)
+++ xcoffout.h  (working copy)
@@ -127,6 +127,7 @@
 extern char *xcoff_bss_section_name;
 extern char *xcoff_private_data_section_name;
 extern char *xcoff_tls_data_section_name;
+extern char *xcoff_tbss_section_name;
 extern char *xcoff_read_only_section_name;

 /* Last source file name mentioned in a NOTE insn.  */
Index: config/rs6000/xcoff.h
===
--- config/rs6000/xcoff.h   (revision 194435)
+++ config/rs6000/xcoff.h   (working copy)
@@ -98,6 +98,7 @@
 #define TARGET_ASM_FUNCTION_RODATA_SECTION default_no_function_rodata_section
 #define TARGET_STRIP_NAME_ENCODING  rs6000_xcoff_strip_name_encoding
 #define TARGET_SECTION_TYPE_FLAGS  rs6000_xcoff_section_type_flags
+#define TARGET_ENCODE_SECTION_INFO rs6000_xcoff_encode_section_info

 /* FP save and restore routines.  */
 #defineSAVE_FP_PREFIX "._savef"
@@ -308,8 +309,8 @@
 #define ASM_OUTPUT_TLS_COMMON(FILE, DECL, NAME, SIZE)  \
   do { fputs(COMMON_ASM_OP, (FILE));   \
RS6000_OUTPUT_BASENAME ((FILE), (NAME));\
-   fputs("[UL]", (FILE));  \
-   fprintf ((FILE), ","HOST_WIDE_INT_PRINT_UNSIGNED"\n", (SIZE)); \
+   fprintf ((FILE), "[UL],"HOST_WIDE_INT_PRINT_UNSIGNED"\n", \
+   (SIZE));\
   } while (0)
 #endif

Index: config/rs6000/rs6000.c
===
--- config/rs6000/rs6000.c  (revision 194435)
+++ config/rs6000/rs6000.c  (working copy)
@@ -209,6 +209,7 @@ static short cached_can_issue_more;
 static GTY(()) section *read_only_data_section;
 static GTY(()) section *private_data_section;
 static GTY(()) section *tls_data_section;
+static GTY(()) section *tls_private_data_section;
 static GTY(()) section *read_only_private_data_section;
 static GTY(()) section *sdata2_section;
 static GTY(()) section *toc_section;
@@ -22316,23 +22317,39 @@ output_toc (FILE *file, rtx x, int labelno, enum m
 output_addr_const (file, x);

 #if HAVE_AS_TLS
-  if (TARGET_XCOFF && GET_CODE (base) == SYMBOL_REF)
+  if (TARGET_XCOFF && GET_CODE (base) == SYMBOL_REF
+  && SYMBOL_REF_TLS_MODEL (base) != 0)
 {
+  tree decl = SYMBOL_REF_DECL (base);
+  if (DECL_INITIAL (decl) == NULL_TREE
+  || DECL_INITIAL (decl) == error_mark_node
+  || (flag_zero_initialized_in_bss
+  /* Leave constant zeroes in .rodata so they
+ can be shared.  */
+  && !TREE_READONLY (decl)
+  && initializer_zerop (DECL_INITIAL (decl
+   fputs ("[UL]", file);
+  else
+   fputs ("[TL]", file);
+
   if (SYMBOL_REF_TLS_MODEL (base) == TLS_MODEL_LOCAL_EXEC)
-   fputs ("[TL]@le", file);
+   fputs ("@le", file);
   else if (SYMBOL_REF_TLS_MODEL (base) == TLS_MODEL_INITIAL_EXEC)
-   fputs ("[TL]@ie", file);
+   fputs ("@ie", file);
   /* Use global-dynamic for local-dy

[PATCH] Fix nonlocalized vars regression caused by vec changes (PR debug/55665)

2012-12-12 Thread Jakub Jelinek
Hi!

Before http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193595
aka the big vec.h changes, nonlocalized_list parameter to remap_decls
used to be **, but the changes changed it to *.  That is a problem,
because then there is no difference between the case where we don't
want to push anything to nonlocalized_list (caller passes NULL) and
calling it with BLOCK_NONLOCALIZED_VARS, which is empty.
That resulted in vec_safe_push not being called, because nonlocalized_list
test was false.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2012-12-12  Jakub Jelinek  

PR debug/55665
* tree-inline.c (remap_decls): Change nonlocalized_list
to pointer to pointer to vector from pointer to vector.
(remap_block): Pass address of BLOCK_NONLOCALIZED_VARS.

* g++.dg/guality/pr55665.C: New test.

--- gcc/tree-inline.c.jj2012-11-19 14:41:26.0 +0100
+++ gcc/tree-inline.c   2012-12-12 18:23:31.937008538 +0100
@@ -536,7 +536,7 @@ can_be_nonlocal (tree decl, copy_body_da
 }
 
 static tree
-remap_decls (tree decls, vec *nonlocalized_list,
+remap_decls (tree decls, vec **nonlocalized_list,
 copy_body_data *id)
 {
   tree old_var;
@@ -557,7 +557,7 @@ remap_decls (tree decls, vec DINFO_LEVEL_TERSE)
  && !DECL_IGNORED_P (old_var)
  && nonlocalized_list)
-   vec_safe_push (nonlocalized_list, old_var);
+   vec_safe_push (*nonlocalized_list, old_var);
  continue;
}
 
@@ -575,7 +575,7 @@ remap_decls (tree decls, vec DINFO_LEVEL_TERSE)
  && !DECL_IGNORED_P (old_var)
  && nonlocalized_list)
-   vec_safe_push (nonlocalized_list, old_var);
+   vec_safe_push (*nonlocalized_list, old_var);
}
   else
{
@@ -622,7 +622,7 @@ remap_block (tree *block, copy_body_data
 
   /* Remap its variables.  */
   BLOCK_VARS (new_block) = remap_decls (BLOCK_VARS (old_block),
-   BLOCK_NONLOCALIZED_VARS (new_block),
+   &BLOCK_NONLOCALIZED_VARS (new_block),
id);
 
   if (id->transform_lang_insert_block)
--- gcc/testsuite/g++.dg/guality/pr55665.C.jj   2012-12-12 18:37:11.829241332 
+0100
+++ gcc/testsuite/g++.dg/guality/pr55665.C  2012-12-12 18:34:59.0 
+0100
@@ -0,0 +1,33 @@
+// PR debug/55655
+// { dg-do run }
+// { dg-options "-g" }
+
+extern "C" void abort ();
+struct A { A (int); int a; };
+
+__attribute__((noinline, noclone)) int
+bar (void)
+{
+  return 40;
+}
+
+__attribute__((noinline, noclone)) void
+foo (int x)
+{
+  __asm volatile ("" : : "r" (x) : "memory");
+}
+
+A::A (int x)
+{
+  static int p = bar ();
+  foo (p); // { dg-final { gdb-test 23 "p" "40" } }
+  a = ++p;
+}
+
+int
+main ()
+{
+  A a (42);
+  if (a.a != 41)
+abort ();
+}

Jakub


[C PATCH] Don't perform function array conversions on inline asm "m" constrainted inputs (PR c++/55619)

2012-12-12 Thread Jakub Jelinek
Hi!

This is something I've changed recently on the C++ side, but C has similar
issue as the testcase shows.  For "m" or other constraints that allow memory
and don't allow registers, we want to mark the operand addressable (thus
handle it quite lvalue-ish, except that we know it won't be written to),
but default_function_array_conversion can be a problem for that.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2012-12-12  Jakub Jelinek  

PR c++/55619
* c-parser.c (c_parser_asm_operands): Remove CONVERT_P
argument, don't call default_function_array_conversion
nor c_fully_fold here.
(c_parser_asm_statement): Adjust callers.
* c-typeck.c (build_asm_expr): Call c_fully_fold on inputs
and outputs here, and call default_function_array_conversion
on inputs that don't need to be addressable.

* c-c++-common/pr55619.c: New test.

--- gcc/c/c-parser.c.jj 2012-11-19 14:41:12.0 +0100
+++ gcc/c/c-parser.c2012-12-12 14:02:37.062065983 +0100
@@ -1154,7 +1154,7 @@ static void c_parser_while_statement (c_
 static void c_parser_do_statement (c_parser *);
 static void c_parser_for_statement (c_parser *);
 static tree c_parser_asm_statement (c_parser *);
-static tree c_parser_asm_operands (c_parser *, bool);
+static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
 static tree c_parser_asm_clobbers (c_parser *);
 static struct c_expr c_parser_expr_no_commas (c_parser *, struct c_expr *);
@@ -5150,10 +5150,10 @@ c_parser_asm_statement (c_parser *parser
/* For asm goto, we don't allow output operands, but reserve
   the slot for a future extension that does allow them.  */
if (!is_goto)
- outputs = c_parser_asm_operands (parser, false);
+ outputs = c_parser_asm_operands (parser);
break;
  case 1:
-   inputs = c_parser_asm_operands (parser, true);
+   inputs = c_parser_asm_operands (parser);
break;
  case 2:
clobbers = c_parser_asm_clobbers (parser);
@@ -5191,9 +5191,7 @@ c_parser_asm_statement (c_parser *parser
   goto error;
 }
 
-/* Parse asm operands, a GNU extension.  If CONVERT_P (for inputs but
-   not outputs), apply the default conversion of functions and arrays
-   to pointers.
+/* Parse asm operands, a GNU extension.
 
asm-operands:
  asm-operand
@@ -5205,10 +5203,9 @@ c_parser_asm_statement (c_parser *parser
 */
 
 static tree
-c_parser_asm_operands (c_parser *parser, bool convert_p)
+c_parser_asm_operands (c_parser *parser)
 {
   tree list = NULL_TREE;
-  location_t loc;
   while (true)
 {
   tree name, str;
@@ -5243,12 +5240,8 @@ c_parser_asm_operands (c_parser *parser,
  parser->lex_untranslated_string = true;
  return NULL_TREE;
}
-  loc = c_parser_peek_token (parser)->location;
   expr = c_parser_expression (parser);
   mark_exp_read (expr.value);
-  if (convert_p)
-   expr = default_function_array_conversion (loc, expr);
-  expr.value = c_fully_fold (expr.value, false, NULL);
   parser->lex_untranslated_string = true;
   if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
{
--- gcc/c/c-typeck.c.jj 2012-12-10 08:42:35.0 +0100
+++ gcc/c/c-typeck.c2012-12-12 14:11:55.487236911 +0100
@@ -8500,6 +8500,8 @@ build_asm_expr (location_t loc, tree str
 {
   tree output = TREE_VALUE (tail);
 
+  output = c_fully_fold (output, false, NULL);
+
   /* ??? Really, this should not be here.  Users should be using a
 proper lvalue, dammit.  But there's a long history of using casts
 in the output operands.  In cases like longlong.h, this becomes a
@@ -8557,16 +8559,27 @@ build_asm_expr (location_t loc, tree str
 mark it addressable.  */
  if (!allows_reg && allows_mem)
{
+ input = c_fully_fold (input, false, NULL);
+
  /* Strip the nops as we allow this case.  FIXME, this really
 should be rejected or made deprecated.  */
  STRIP_NOPS (input);
  if (!c_mark_addressable (input))
input = error_mark_node;
}
- else if (input != error_mark_node && VOID_TYPE_P (TREE_TYPE (input)))
+ else
{
- error_at (loc, "invalid use of void expression");
- input = error_mark_node;
+ struct c_expr expr;
+ memset (&expr, 0, sizeof (expr));
+ expr.value = input;
+ expr = default_function_array_conversion (loc, expr);
+ input = c_fully_fold (expr.value, false, NULL);
+
+ if (input != error_mark_node && VOID_TYPE_P (TREE_TYPE (input)))
+   {
+ error_at (loc, "invalid use of void expression");
+ input = error_mark_node;
+   }
  

[PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)

2012-12-12 Thread Jakub Jelinek
Hi!

Various TM tests ICE when built with -fgnu-tm -fsanitizer=address.
The problem is that asan.c pass adds calls to builtins that weren't there
before and TM is upset about it.  The __asan_report* are all like
abort, in correctly written program they shouldn't have a user visible
effect, in bad program they will terminate the process, but in any case
it doesn't matter how many times they are retried as part of a transaction,
there is no state to roll back on transaction cancellation.
__asan_handle_no_return, while not being noreturn, just marks the stack as
unprotected, so again in correctly written application no effect, in bad app
might result in some issues being undetected, but still, it can be done many
times and isn't irreversible.

The following patch fixes the ICEs by making all of these transaction pure.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

As for TSAN, no idea what to do, TSAN diagnostics could be in a similar
category, there is nothing to reverse, but as the library doesn't understand
transactions, perhaps better would be not to tsan instrument anything inside
of transactions, until the library is made TM aware.

2012-12-12  Jakub Jelinek  

PR sanitizer/55508
* builtin-attrs.def (ATTR_TMPURE_NOTHROW_LEAF_LIST,
ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): New.
* asan.c (ATTR_TMPURE_NOTHROW_LEAF_LIST,
ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): Define.
* sanitizer.def: Make __asan_report_* and __asan_handle_no_return
builtins tm pure.

--- gcc/builtin-attrs.def.jj2012-10-22 08:42:23.0 +0200
+++ gcc/builtin-attrs.def   2012-12-12 11:56:54.942938879 +0100
@@ -263,6 +263,11 @@ DEF_ATTR_TREE_LIST (ATTR_TMPURE_MALLOC_N
 DEF_ATTR_TREE_LIST (ATTR_TMPURE_NOTHROW_LIST,
   ATTR_TM_TMPURE, ATTR_NULL, ATTR_NOTHROW_LIST)
 
+DEF_ATTR_TREE_LIST (ATTR_TMPURE_NOTHROW_LEAF_LIST,
+   ATTR_TM_TMPURE, ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST,
+   ATTR_TM_TMPURE, ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
+
 /* Construct a tree for a format_arg attribute.  */
 #define DEF_FORMAT_ARG_ATTRIBUTE(FA)   \
   DEF_ATTR_TREE_LIST (ATTR_FORMAT_ARG_##FA, ATTR_FORMAT_ARG,   \
--- gcc/asan.c.jj   2012-12-11 13:05:36.0 +0100
+++ gcc/asan.c  2012-12-12 11:57:59.626550534 +0100
@@ -1611,8 +1611,13 @@ initialize_sanitizer_builtins (void)
 #define BT_FN_VOID_VPTR_I16_INT BT_FN_VOID_VPTR_IX_INT[4]
 #undef ATTR_NOTHROW_LEAF_LIST
 #define ATTR_NOTHROW_LEAF_LIST ECF_NOTHROW | ECF_LEAF
+#undef ATTR_TMPURE_NOTHROW_LEAF_LIST
+#define ATTR_TMPURE_NOTHROW_LEAF_LIST ECF_TM_PURE | ATTR_NOTHROW_LEAF_LIST
 #undef ATTR_NORETURN_NOTHROW_LEAF_LIST
 #define ATTR_NORETURN_NOTHROW_LEAF_LIST ECF_NORETURN | ATTR_NOTHROW_LEAF_LIST
+#undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST
+#define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \
+  ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST
 #undef DEF_SANITIZER_BUILTIN
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM,  \
--- gcc/sanitizer.def.jj2012-12-11 11:28:10.0 +0100
+++ gcc/sanitizer.def   2012-12-12 11:57:12.714833945 +0100
@@ -32,25 +32,25 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_INIT
 /* Do not reorder the BUILT_IN_ASAN_REPORT* builtins, e.g. cfgcleanup.c
relies on this order.  */
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD1, "__asan_report_load1",
- BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
+ BT_FN_VOID_PTR, ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD2, "__asan_report_load2",
- BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
+ BT_FN_VOID_PTR, ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD4, "__asan_report_load4",
- BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
+ BT_FN_VOID_PTR, ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD8, "__asan_report_load8",
- BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
+ BT_FN_VOID_PTR, ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD16, "__asan_report_load16",
- BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
+ BT_FN_VOID_PTR, ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_STORE1, "__asan_report_store1",
- BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
+ BT_FN_VOID_PTR, ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_STORE2, "__asan_report_store2",
- BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
+ BT_F

[C++ PATCH] Fix ICE in merge_exception_specifiers (PR c++/55652)

2012-12-12 Thread Jakub Jelinek
Hi!

This is a follow-up fix for my earlier commit, in merge_exception_specifiers
can be NULL, noex == boolean_true_node comparison then was false, but
operand_equal_p crashes on it.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2012-12-12  Jakub Jelinek  

PR c++/55652
* typeck2.c (merge_exception_specifiers): Don't call operand_equal_p
if noex is NULL.

* g++.dg/cpp0x/noexcept19.C: New test.

--- gcc/cp/typeck2.c.jj 2012-12-06 19:54:56.0 +0100
+++ gcc/cp/typeck2.c2012-12-12 10:12:33.468377515 +0100
@@ -1871,7 +1871,7 @@ merge_exception_specifiers (tree list, t
   /* If ADD is a deferred noexcept, we must have been called from
 process_subob_fn.  For implicitly declared functions, we build up
 a list of functions to consider at instantiation time.  */
-  if (operand_equal_p (noex, boolean_true_node, 0))
+  if (noex && operand_equal_p (noex, boolean_true_node, 0))
noex = NULL_TREE;
   gcc_assert (fn && (!noex || is_overloaded_fn (noex)));
   noex = build_overload (fn, noex);
--- gcc/testsuite/g++.dg/cpp0x/noexcept19.C.jj  2012-12-12 10:11:20.182816929 
+0100
+++ gcc/testsuite/g++.dg/cpp0x/noexcept19.C 2012-12-12 10:10:50.0 
+0100
@@ -0,0 +1,29 @@
+// PR c++/55652
+// { dg-do compile }
+// { dg-options "-std=c++11" }
+
+template 
+struct A
+{
+  static const bool a = false;
+};
+
+template >
+struct B
+{
+  B () noexcept (A ::a) {}
+};
+
+template 
+struct C
+{
+  X x;
+  Y y;
+};
+
+struct D
+{
+  D () throw (int);
+};
+
+C > c;

Jakub


[PATCH] Fix extremely large LPBX arrays (PR gcov-profile/55650)

2012-12-12 Thread Jakub Jelinek
Hi!

On the attached testcase prg_ctr_mask is non-zero, presumably set
while there still were some functions in the TU, but later on none of them
are being emitted.  This leads to n_functions in coverage_obj_finish being
0, and the array thus containing 0x1 elements.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2012-12-12  Jakub Jelinek  

PR gcov-profile/55650
* coverage.c (coverage_obj_init): Return false if no functions
are being emitted.

* g++.dg/other/pr55650.C: New test.
* g++.dg/other/pr55650.cc: New file.

--- gcc/coverage.c.jj   2012-11-19 14:41:24.0 +0100
+++ gcc/coverage.c  2012-12-12 08:54:35.005180211 +0100
@@ -999,6 +999,9 @@ coverage_obj_init (void)
   /* The function is not being emitted, remove from list.  */
   *fn_prev = fn->next;
 
+  if (functions_head == NULL)
+return false;
+
   for (ix = 0; ix != GCOV_COUNTERS; ix++)
 if ((1u << ix) & prg_ctr_mask)
   n_counters++;
--- gcc/testsuite/g++.dg/other/pr55650.C.jj 2012-12-12 09:03:53.342876593 
+0100
+++ gcc/testsuite/g++.dg/other/pr55650.C2012-12-12 09:03:11.0 
+0100
@@ -0,0 +1,21 @@
+// PR gcov-profile/55650
+// { dg-do link }
+// { dg-options "-O2 -fprofile-generate" }
+// { dg-additional-sources "pr55650.cc" }
+
+struct A
+{
+  virtual void foo ();
+};
+
+struct B : public A
+{
+  B ();
+  void foo () {}
+};
+
+inline A *
+bar ()
+{
+  return new B;
+}
--- gcc/testsuite/g++.dg/other/pr55650.cc.jj2012-12-12 09:03:56.329858741 
+0100
+++ gcc/testsuite/g++.dg/other/pr55650.cc   2012-12-12 09:03:48.982900718 
+0100
@@ -0,0 +1,4 @@
+int
+main ()
+{
+}

Jakub


Re: [patch c/c++]: Fix for PR c/52991 issue about ignored packed-attribute for ms-structure-layout

2012-12-12 Thread Kai Tietz
Hi,

I added two testcases  to this patch.  So that we might detect
regressions about this issue later more easily.

2012-12-12  Kai Tietz

PR c/52991
* stor-layout.c (start_record_layout): Handle
packed-attribute for ms-structure-layout.
(update_alignment_for_field): Likewise.
(place_field): Likewise.

2012-12-12  Kai Tietz

PR c/52991
* gcc.dg/attr-ms_struct-packed2.c: New file.
* gcc.dg/attr-ms_struct-packed3.c: New file.

Ok for apply?

Regards,
Kai

Index: gcc/gcc/stor-layout.c
===
--- gcc.orig/gcc/stor-layout.c
+++ gcc/gcc/stor-layout.c
@@ -756,7 +756,10 @@ start_record_layout (tree t)
   /* If the type has a minimum specified alignment (via an attribute
  declaration, for example) use it -- otherwise, start with a
  one-byte alignment.  */
-  rli->record_align = MAX (BITS_PER_UNIT, TYPE_ALIGN (t));
+  if (TYPE_PACKED (t))
+rli->record_align = BITS_PER_UNIT;
+  else
+rli->record_align = MAX (BITS_PER_UNIT, TYPE_ALIGN (t));
   rli->unpacked_align = rli->record_align;
   rli->offset_align = MAX (rli->record_align, BIGGEST_ALIGNMENT);

@@ -952,15 +955,20 @@ update_alignment_for_field (record_layou
  meaningless.  */
   if (targetm.ms_bitfield_layout_p (rli->t))
 {
+  if (rli->t && TYPE_PACKED (rli->t)
+  && (is_bitfield || !DECL_PACKED (field)
+  || DECL_SIZE (field) == NULL_TREE
+  || !integer_zerop (DECL_SIZE (field
+desired_align = BITS_PER_UNIT;
   /* Here, the alignment of the underlying type of a bitfield can
 affect the alignment of a record; even a zero-sized field
 can do this.  The alignment should be to the alignment of
 the type, except that for zero-size bitfields this only
 applies if there was an immediately prior, nonzero-size
 bitfield.  (That's the way it is, experimentally.) */
-  if ((!is_bitfield && !DECL_PACKED (field))
- || ((DECL_SIZE (field) == NULL_TREE
-  || !integer_zerop (DECL_SIZE (field)))
+  else if ((!is_bitfield && !DECL_PACKED (field))
+  || ((DECL_SIZE (field) == NULL_TREE
+  || !integer_zerop (DECL_SIZE (field)))
  ? !DECL_PACKED (field)
  : (rli->prev_field
 && DECL_BIT_FIELD_TYPE (rli->prev_field)
@@ -1414,7 +1422,12 @@ place_field (record_layout_info rli, tre
}

  /* Now align (conventionally) for the new type.  */
- type_align = TYPE_ALIGN (TREE_TYPE (field));
+ if (!TYPE_PACKED (rli->t))
+   {
+ type_align = TYPE_ALIGN (TREE_TYPE (field));
+ if (DECL_PACKED (field))
+   type_align = MIN (type_align, BITS_PER_UNIT);
+   }

  if (maximum_field_alignment != 0)
type_align = MIN (type_align, maximum_field_alignment);
Index: gcc/gcc/testsuite/gcc.dg/attr-ms_struct-packed2.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.dg/attr-ms_struct-packed2.c
@@ -0,0 +1,31 @@
+/* Test for MS structure with packed attribute.  */
+/* { dg-do run { target *-*-interix* *-*-mingw* *-*-cygwin*
i?86-*-darwin* i?86-*-linux* x86_64-*-linux* } }
+/* { dg-options "-std=gnu99" } */
+
+extern void abort ();
+
+struct A {
+short s;
+struct { int i; } __attribute__((__ms_struct__));
+} __attribute__((__ms_struct__, __packed__));
+
+struct B {
+short s;
+struct { int i; } __attribute__((__ms_struct__, __packed__));
+} __attribute__((__ms_struct__, __packed__));
+
+struct C {
+struct { int i; } __attribute__((__ms_struct__));
+short s;
+} __attribute__((__ms_struct__, __packed__));
+
+int
+main (void)
+{
+  if (sizeof (struct C) != sizeof (struct B)
+  || sizeof (struct A) != sizeof (struct B)
+  || sizeof (struct B) != 6)
+abort ();
+
+  return 0;
+}


Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Henderson
On 12/12/2012 10:32 AM, Uros Bizjak wrote:
> Please check the attached patch, it implements this limitation in a correct 
> way:
> - keeps memory operands for -Os or cold parts of the executable
> - doesn't increase register pressure
> - handles all situations where memory operand can propagate into RTX

I agree this is the right way to attack this problem.


r~


Re: [patch, mips] Follow up to pr54061 patch, fix another abort.

2012-12-12 Thread Richard Sandiford
Steve Ellcey  writes:
> OK, you have convinced me.  Here is what I am testing, OK to checkin
> once I have run it through the testsuite?

Yeah, looks good, thanks.

Richard


Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Andi Kleen
Andi Kleen  writes:
>
>>> >/* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict 
>>> > more
>>> >   than 4 branch instructions in the 16 byte window.  */
>>> > -  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>>> > +  m_PPRO | m_P4_NOCONA | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>>
>> This is special passs to handle limitations of AMD's K7/K8/K10 branch 
>> prediction.
>> Intel never had similar design, so this flag is pointless.
>
> Actually the Sandy Bridge decoded icache has a limit of 3 jumps per
> 16 byte window.

Actually it's four per 32bytes, sorry.

Here's an old patch I had lying around to optimize for that.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1b871be..9b57316 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2713,6 +2713,7 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, const 
char *arch,
 { "-mavx256-split-unaligned-load", MASK_AVX256_SPLIT_UNALIGNED_LOAD},
 { "-mavx256-split-unaligned-store",
MASK_AVX256_SPLIT_UNALIGNED_STORE},
 { "-mprefer-avx128",   MASK_PREFER_AVX128},
+{ "-mjump-pad-32bytes",MASK_JUMP_PAD_32BYTES},
   };
 
   const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (flag_opts) + 6][2];
@@ -32182,6 +32183,7 @@ ix86_avoid_jump_mispredicts (void)
   rtx insn, start = get_insns ();
   int nbytes = 0, njumps = 0;
   int isjump = 0;
+  int jump_pad_window_size = TARGET_JUMP_PAD_32BYTES ? 32 : 16;
 
   /* Look for all minimal intervals of instructions containing 4 jumps.
  The intervals are bounded by START and INSN.  NBYTES is the total
@@ -32202,8 +32204,8 @@ ix86_avoid_jump_mispredicts (void)
  int align = label_to_alignment (insn);
  int max_skip = label_to_max_skip (insn);
 
- if (max_skip > 15)
-   max_skip = 15;
+ if (max_skip > jump_pad_window_size - 1)
+   max_skip = jump_pad_window_size - 1;
  /* If align > 3, only up to 16 - max_skip - 1 bytes can be
 already in the current 16 byte page, because otherwise
 ASM_OUTPUT_MAX_SKIP_ALIGN could skip max_skip or fewer
@@ -32216,7 +32218,7 @@ ix86_avoid_jump_mispredicts (void)
 INSN_UID (insn), max_skip);
  if (max_skip)
{
- while (nbytes + max_skip >= 16)
+ while (nbytes + max_skip >= jump_pad_window_size)
{
  start = NEXT_INSN (start);
  if ((JUMP_P (start)
@@ -32262,10 +32264,11 @@ ix86_avoid_jump_mispredicts (void)
 fprintf (dump_file, "Interval %i to %i has %i bytes\n",
 INSN_UID (start), INSN_UID (insn), nbytes);
 
-  if (njumps == 3 && isjump && nbytes < 16)
+  if (njumps == 3 && isjump && nbytes < jump_pad_window_size)
{
- int padsize = 15 - nbytes + min_insn_size (insn);
-
+ int padsize = jump_pad_window_size - 1 - nbytes + 
+   min_insn_size (insn);
+ 
  if (dump_file)
fprintf (dump_file, "Padding insn %i by %i bytes!\n",
 INSN_UID (insn), padsize);
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 6c516e7..b38d163 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -223,6 +223,10 @@ mintel-syntax
 Target Undocumented Alias(masm=, intel, att) Warn(%<-mintel-syntax%> and 
%<-mno-intel-syntax%> are deprecated; use %<-masm=intel%> and %<-masm=att%> 
instead)
 ;; Deprecated
 
+mjump-pad-32bytes
+Target RejectNegative Mask(JUMP_PAD_32BYTES) Save
+Avoid more than 4 jumps in each 32byte code window.
+
 mms-bitfields
 Target Report Mask(MS_BITFIELD_LAYOUT) Save
 Use native (MS) bitfield layout


-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Jan Hubicka
> Jan Hubicka  writes:
> >
> > libcall is not faster up to 8KB to rep sequence that is better for 
> > regalloc/code
> > cache than fully blowin function call.
> 
> I noticed btw that some of the generated string instructions are slower 
> than just calling the C library.
> 
> rep scasb etc. is rarely a win over an optimized library function,
> it's not very optimized. Perhaps those patterns should just be disabled.
> The way to optimize that on modern CPUs is to use PCMP*STR*, but that's
> quite a bit more complicated and has some constraints.

This is only about memset/memcpy expanding.  The other sequences are quite lame 
indeed...
> 
> 
> >> >/* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict 
> >> > more
> >> >   than 4 branch instructions in the 16 byte window.  */
> >> > -  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | 
> >> > m_GENERIC,
> >> > +  m_PPRO | m_P4_NOCONA | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
> >
> > This is special passs to handle limitations of AMD's K7/K8/K10 branch 
> > prediction.
> > Intel never had similar design, so this flag is pointless.
> 
> Actually the Sandy Bridge decoded icache has a limit of 3 jumps per
> 16 byte window. If you exceed that it falls back to running 
> the full decoder from the normal icache.
> 
> I don't have solid data, but it may be a win for frontend limited
> code (otherwise possibly more in power than performance)
> 
> I would revisit that for Sandy Bridge

We are not particularly good on avoiding the branches - basically the code 
inserts alignment
whenever it things the 4 consecutive branches fit in the window.
I can make patch to change this to 3 and we can see if it helps at all.
> 
> -Andi
> -- 
> a...@linux.intel.com -- Speaking for myself only


Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Andi Kleen
Jan Hubicka  writes:
>
> libcall is not faster up to 8KB to rep sequence that is better for 
> regalloc/code
> cache than fully blowin function call.

I noticed btw that some of the generated string instructions are slower 
than just calling the C library.

rep scasb etc. is rarely a win over an optimized library function,
it's not very optimized. Perhaps those patterns should just be disabled.
The way to optimize that on modern CPUs is to use PCMP*STR*, but that's
quite a bit more complicated and has some constraints.


>> >/* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
>> >   than 4 branch instructions in the 16 byte window.  */
>> > -  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>> > +  m_PPRO | m_P4_NOCONA | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>
> This is special passs to handle limitations of AMD's K7/K8/K10 branch 
> prediction.
> Intel never had similar design, so this flag is pointless.

Actually the Sandy Bridge decoded icache has a limit of 3 jumps per
16 byte window. If you exceed that it falls back to running 
the full decoder from the normal icache.

I don't have solid data, but it may be a win for frontend limited
code (otherwise possibly more in power than performance)

I would revisit that for Sandy Bridge

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Uros Bizjak
On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener
 wrote:

>> I assume that this is not right way for fixing such simple performance
>> anomaly since we need to do redundant work - combine load to
>> conditional and then split it back in peephole2? Does it look
>> reasonable? Why we should produce non-efficient instrucction that must
>> be splitted later?
>
> Well, either don't allow this instruction variant from the start, or allow
> the extra freedom for register allocation this creates.  It doesn't make
> sense to just reject it being generated by combine - that doesn't address
> when it materializes in another way.

Please check the attached patch, it implements this limitation in a correct way:
- keeps memory operands for -Os or cold parts of the executable
- doesn't increase register pressure
- handles all situations where memory operand can propagate into RTX

Yuri, can you please check if this patch fixes the performance problem for you?

BTW: I would really like to add some TARGET_USE_CMOVE_WITH_MEMOP
target macro and conditionalize new peephole2 patterns on it.

Uros.
Index: i386.md
===
--- i386.md (revision 194451)
+++ i386.md (working copy)
@@ -16122,6 +16122,31 @@
   operands[3] = gen_lowpart (SImode, operands[3]);
 })
 
+;; Don't do conditional moves with memory inputs
+(define_peephole2
+  [(match_scratch:SWI248 2 "r")
+   (set (match_operand:SWI248 0 "register_operand")
+   (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
+  [(reg FLAGS_REG) (const_int 0)])
+ (match_dup 0)
+ (match_operand:SWI248 3 "memory_operand")))]
+  "TARGET_CMOVE && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+   (if_then_else:SWI248 (match_dup 1) (match_dup 0) (match_dup 2)))])
+
+(define_peephole2
+  [(match_scratch:SWI248 2 "r")
+   (set (match_operand:SWI248 0 "register_operand")
+   (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
+  [(reg FLAGS_REG) (const_int 0)])
+ (match_operand:SWI248 3 "memory_operand")
+ (match_dup 0)))]
+  "TARGET_CMOVE && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+   (if_then_else:SWI248 (match_dup 1) (match_dup 2) (match_dup 0)))])
+
 (define_expand "movcc"
   [(set (match_operand:X87MODEF 0 "register_operand")
(if_then_else:X87MODEF
@@ -16209,6 +16234,35 @@
   [(set_attr "type" "fcmov,fcmov,icmov,icmov")
(set_attr "mode" "SF,SF,SI,SI")])
 
+;; Don't do conditional moves with memory inputs
+(define_peephole2
+  [(match_scratch:MODEF 2 "r")
+   (set (match_operand:MODEF 0 "register_and_not_any_fp_reg_operand")
+   (if_then_else:MODEF (match_operator 1 "fcmov_comparison_operator"
+ [(reg FLAGS_REG) (const_int 0)])
+ (match_dup 0)
+ (match_operand:MODEF 3 "memory_operand")))]
+  "(mode != DFmode || TARGET_64BIT)
+   && TARGET_80387 && TARGET_CMOVE
+   && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+   (if_then_else:MODEF (match_dup 1) (match_dup 0) (match_dup 2)))])
+
+(define_peephole2
+  [(match_scratch:MODEF 2 "r")
+   (set (match_operand:MODEF 0 "register_and_not_any_fp_reg_operand")
+   (if_then_else:MODEF (match_operator 1 "fcmov_comparison_operator"
+ [(reg FLAGS_REG) (const_int 0)])
+ (match_operand:MODEF 3 "memory_operand")
+ (match_dup 0)))]
+  "(mode != DFmode || TARGET_64BIT)
+   && TARGET_80387 && TARGET_CMOVE
+   && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+   (if_then_else:MODEF (match_dup 1) (match_dup 2) (match_dup 0)))])
+
 ;; All moves in XOP pcmov instructions are 128 bits and hence we restrict
 ;; the scalar versions to have only XMM registers as operands.
 


Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Jan Hubicka
Concerning 1push per cycle, I think it is same as K7 hardware did, so move
prologue should be a win.
> > Index: config/i386/i386.c
> > ===
> > --- config/i386/i386.c  (revision 194452)
> > +++ config/i386/i386.c  (working copy)
> > @@ -1620,14 +1620,14 @@ struct processor_costs core_cost = {
> >COSTS_N_INSNS (8),   /* cost of FABS instruction.  */
> >COSTS_N_INSNS (8),   /* cost of FCHS instruction.  */
> >COSTS_N_INSNS (40),  /* cost of FSQRT instruction.  */
> > -  {{libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
> > -   {libcall, {{24, loop, true}, {128, rep_prefix_8_byte, true},
> > +  {{libcall, {{8192, rep_prefix_4_byte, true}, {-1, libcall, false}}},
> > +   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},
> >{-1, libcall, false,
> >{{libcall, {{6, loop_1_byte, true},
> >{24, loop, true},
> >{8192, rep_prefix_4_byte, true},
> >{-1, libcall, false}}},
> > -   {libcall, {{24, loop, true}, {512, rep_prefix_8_byte, true},
> > +   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},

libcall is not faster up to 8KB to rep sequence that is better for regalloc/code
cache than fully blowin function call.
> > @@ -1806,7 +1806,7 @@ static unsigned int initial_ix86_tune_fe
> >m_PPRO,
> >
> >/* X86_TUNE_PARTIAL_FLAG_REG_STALL */
> > -  m_CORE2I7 | m_GENERIC,
> > +  m_GENERIC | m_CORE2,

This disable shifts that store just some flags. Acroding to Agner's manual I7 
handle
this well.

Partial flags stall
The Sandy Bridge uses the method of an extra Âľop to join partial registers not 
only for
general purpose registers but also for the flags register, unlike previous 
processors which
used this method only for general purpose registers. This occurs when a write 
to a part of
the flags register is followed by a read from a larger part of the flags 
register. The partial
flags stall of previous processors (See page 75) is therefore replaced by an 
extra Âľop. The
Sandy Bridge also generates an extra Âľop when reading the flags after a rotate 
instruction.

This is cheaper than the 7 cycle delay on Core this flags is trying to avoid.
> >
> >/* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
> > * on 16-bit immediate moves into memory on Core2 and Corei7.  */
> > @@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
> >m_K6,
> >
> >/* X86_TUNE_USE_CLTD */
> > -  ~(m_PENT | m_ATOM | m_K6),
> > +  ~(m_PENT | m_ATOM | m_K6 | m_GENERIC),

None of CPUs that generic care about are !USE_CLTD now after your change.
> > @@ -1910,10 +1910,10 @@ static unsigned int initial_ix86_tune_fe
> >m_ATHLON_K8,
> >
> >/* X86_TUNE_SSE_TYPELESS_STORES */
> > -  m_AMD_MULTIPLE,
> > +  m_AMD_MULTIPLE | m_CORE2I7, /**/

Hmm, I can not seem to find this in manual now, but I believe that stores also 
do not type,
so movaps is preferred over movapd store because it is shorter.  If not, this 
change should
produce a lot of slowdowns.
> >
> >/* X86_TUNE_SSE_LOAD0_BY_PXOR */
> > -  m_PPRO | m_P4_NOCONA,
> > +  m_PPRO | m_P4_NOCONA | m_CORE2I7, /**/

Agner:
A common way of setting a register to zero is XOR EAX,EAX or SUB EBX,EBX. The
Core2 and Nehalem processors recognize that certain instructions are 
independent of the
prior value of the register if the source and destination registers are the 
same.

This applies to all of the following instructions: XOR, SUB, PXOR, XORPS, 
XORPD, and all
variants of PSUBxxx and PCMPxxx except PCMPEQQ.
> >
> >/* X86_TUNE_MEMORY_MISMATCH_STALL */
> >m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
> > @@ -1938,7 +1938,7 @@ static unsigned int initial_ix86_tune_fe
> >
> >/* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
> >   than 4 branch instructions in the 16 byte window.  */
> > -  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
> > +  m_PPRO | m_P4_NOCONA | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,

This is special passs to handle limitations of AMD's K7/K8/K10 branch 
prediction.
Intel never had similar design, so this flag is pointless.

We apparently ought to disable it for K10, at least per Agner's manual.
> >
> >/* X86_TUNE_SCHEDULE */
> >m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | 
> > m_GENERIC,
> > @@ -1947,10 +1947,10 @@ static unsigned int initial_ix86_tune_fe
> >m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
> >
> >/* X86_TUNE_USE_INCDEC */
> > -  ~(m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GENERIC),
> > +  ~(m_P4_NOCONA | m_ATOM | m_GENERIC),

Skipping inc/dec is to avoid partial flag stall happening on P4 only.
> >
> >/* X86_TUNE_PAD_RETURNS */
> > -  m_CORE2I7 | m_AMD_MULTIPLE | m_GENERIC,
> > +  m_AMD_MULTIPLE | m_GENERIC,

Again this deals specifically with AMD K7/K8/K10 branch 

Re: [PATCH] DOT a function directly from a gdb session

2012-12-12 Thread Steven Bosscher
On Wed, Dec 12, 2012 at 1:53 PM, Richard Biener wrote:
> +   fputs ("digraph \"\" { overlap=false;\n", fp);

This is "start_graph_dump (fp);"


> +   fputs ("}\n", fp);

And this is "end_graph_dump (fp);"

I think it'd be good to call those functions instead of the fputs
calls. Keeps things consistent.

Ciao!
Steven


Re: PATCH: Remove AM_MAKEFLAGS from libsanitizer

2012-12-12 Thread H.J. Lu
On Wed, Dec 12, 2012 at 10:01 AM, Paolo Bonzini  wrote:
> Il 12/12/2012 18:30, H.J. Lu ha scritto:
>> On Wed, Dec 12, 2012 at 6:46 AM, Paolo Bonzini  wrote:
>>> Il 12/12/2012 15:41, H.J. Lu ha scritto:
 MAKEOVERRIDES is used for multilib.  I got

 /bin/sh ../libtool --tag=CXX   --mode=compile  -D_GNU_SOURCE -D_DEBUG
 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
 -DASAN_HAS_EXCEPTIONS=1 -DASAN_FLEXIBLE_MAPPING_AND_OFFSET=0
 -DASAN_NEEDS_SEGV=1  -I.
 -I/export/gnu/import/git/gcc/libsanitizer/asan  -I
 /export/gnu/import/git/gcc/libsanitizer/include -I
 /export/gnu/import/git/gcc/libsanitizer  -Wall -W
 -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC
 -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables
 -fvisibility=hidden -Wno-variadic-macros -Wno-c99-extensions
 -I../../libstdc++-v3/include
 -I../../libstdc++-v3/include/x86_64-unknown-linux-gnu
 -I/export/gnu/import/git/gcc/libsanitizer/../libstdc++-v3/libsupc++ -g
 -O2 -D_GNU_SOURCE  -m32 -MT asan_malloc_linux.lo -MD -MP -MF
 .deps/asan_malloc_linux.Tpo -c -o asan_malloc_linux.lo
 /export/gnu/import/git/gcc/libsanitizer/asan/asan_malloc_linux.cc
 libtool: compile: unrecognized option `-D_GNU_SOURCE'
 libtool: compile: Try `libtool --help' for more information.
 make[8]: *** [asan_allocator.lo] Error 1
 make[8]: *** Waiting for unfinished jobs
 libtool: compile: unrecognized option `-D_GNU_SOURCE'
 libtool: compile: Try `libtool --help' for more information

 I checked in this patch to restore MAKEOVERRIDES.
>>>
>>> This will break "make CFLAGS=-g".  Please revert the AM_MAKEFLAGS change
>>> fully.
>>
>> Done.
>>
>>> However, your patch that removed AM_MAKEFLAGS similarly broke "make
>>> CC=foo".  While it is much less useful, this nevertheless may be the
>>> sign of a bigger problem.  Why did you need to remove CC/CXX from
>>> AM_MAKEFLAGS in the first place?
>>>
>>
>> After further investigation, I found
>>
>> RAW_CXX_TARGET_EXPORTS = \
>> $(BASE_TARGET_EXPORTS) \
>> CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
>> CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
>> export CXX;
>> ...
>>
>> all-stage1-target-libsanitizer: configure-stage1-target-libsanitizer
>> @[ $(current_stage) = stage1 ] || $(MAKE) stage1-start
>> @r=`${PWD_COMMAND}`; export r; \
>> s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
>> TFLAGS="$(STAGE1_TFLAGS)"; \
>> $(RAW_CXX_TARGET_EXPORTS)  \
>> cd $(TARGET_SUBDIR)/libsanitizer && \
>> $(MAKE) $(BASE_FLAGS_TO_PASS) \
>> CFLAGS="$(CFLAGS_FOR_TARGET)" \
>> CXXFLAGS="$(CXXFLAGS_FOR_TARGET)" \
>> LIBCFLAGS="$(LIBCFLAGS_FOR_TARGET)" \
>> CFLAGS_FOR_TARGET="$(CFLAGS_FOR_TARGET)" \
>> CXXFLAGS_FOR_TARGET="$(CXXFLAGS_FOR_TARGET)" \
>> LIBCFLAGS_FOR_TARGET="$(LIBCFLAGS_FOR_TARGET)" \
>> $(EXTRA_TARGET_FLAGS) 'CXX=$$(RAW_CXX_FOR_TARGET)'
>> 'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'  \
>>   \
>> TFLAGS="$(STAGE1_TFLAGS)" \
>> $(TARGET-stage1-target-libsanitizer)
>>
>> The problem is
>>
>> CXX=$$(RAW_CXX_FOR_TARGET)' 'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'
>>
>> Those are bogus since
>>
>> 1. We never set RAW_CXX_FOR_TARGET.
>> 2. We have set
>>
>> CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
>> CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
>> export CXX;
>>
>> in RAW_CXX_TARGET_EXPORTS.  There is no need to do anything.
>
> Nope, if you remove this you get the wrong definition of CC and CXX from
> EXTRA_TARGET_FLAGS.  Instead, you need to add RAW_CXX_FOR_TARGET to
> AM_MAKEFLAGS and EXTRA_TARGET_FLAGS.
>

We have set CXX and CXX_FOR_TARGET before EXTRA_TARGET_FLAGS
is used:

RAW_CXX_TARGET_EXPORTS = \
$(BASE_TARGET_EXPORTS) \
CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
export CXX;

Shouldn't we the right CXX and CXX_FOR_TARGET?

-- 
H.J.


Re: [PATCH] DOT a function directly from a gdb session

2012-12-12 Thread Andi Kleen
Richard Biener  writes:
>
> Probably not suitable for trunk because I use popen/pclose/fileno
> which I don't know whether they are available on all host platforms.

Just add a ifdef HAVE_popen or somesuch around it?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: PATCH: Remove AM_MAKEFLAGS from libsanitizer

2012-12-12 Thread Paolo Bonzini
Il 12/12/2012 18:30, H.J. Lu ha scritto:
> On Wed, Dec 12, 2012 at 6:46 AM, Paolo Bonzini  wrote:
>> Il 12/12/2012 15:41, H.J. Lu ha scritto:
>>> MAKEOVERRIDES is used for multilib.  I got
>>>
>>> /bin/sh ../libtool --tag=CXX   --mode=compile  -D_GNU_SOURCE -D_DEBUG
>>> -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
>>> -DASAN_HAS_EXCEPTIONS=1 -DASAN_FLEXIBLE_MAPPING_AND_OFFSET=0
>>> -DASAN_NEEDS_SEGV=1  -I.
>>> -I/export/gnu/import/git/gcc/libsanitizer/asan  -I
>>> /export/gnu/import/git/gcc/libsanitizer/include -I
>>> /export/gnu/import/git/gcc/libsanitizer  -Wall -W
>>> -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC
>>> -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables
>>> -fvisibility=hidden -Wno-variadic-macros -Wno-c99-extensions
>>> -I../../libstdc++-v3/include
>>> -I../../libstdc++-v3/include/x86_64-unknown-linux-gnu
>>> -I/export/gnu/import/git/gcc/libsanitizer/../libstdc++-v3/libsupc++ -g
>>> -O2 -D_GNU_SOURCE  -m32 -MT asan_malloc_linux.lo -MD -MP -MF
>>> .deps/asan_malloc_linux.Tpo -c -o asan_malloc_linux.lo
>>> /export/gnu/import/git/gcc/libsanitizer/asan/asan_malloc_linux.cc
>>> libtool: compile: unrecognized option `-D_GNU_SOURCE'
>>> libtool: compile: Try `libtool --help' for more information.
>>> make[8]: *** [asan_allocator.lo] Error 1
>>> make[8]: *** Waiting for unfinished jobs
>>> libtool: compile: unrecognized option `-D_GNU_SOURCE'
>>> libtool: compile: Try `libtool --help' for more information
>>>
>>> I checked in this patch to restore MAKEOVERRIDES.
>>
>> This will break "make CFLAGS=-g".  Please revert the AM_MAKEFLAGS change
>> fully.
> 
> Done.
> 
>> However, your patch that removed AM_MAKEFLAGS similarly broke "make
>> CC=foo".  While it is much less useful, this nevertheless may be the
>> sign of a bigger problem.  Why did you need to remove CC/CXX from
>> AM_MAKEFLAGS in the first place?
>>
> 
> After further investigation, I found
> 
> RAW_CXX_TARGET_EXPORTS = \
> $(BASE_TARGET_EXPORTS) \
> CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
> CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
> export CXX;
> ...
> 
> all-stage1-target-libsanitizer: configure-stage1-target-libsanitizer
> @[ $(current_stage) = stage1 ] || $(MAKE) stage1-start
> @r=`${PWD_COMMAND}`; export r; \
> s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
> TFLAGS="$(STAGE1_TFLAGS)"; \
> $(RAW_CXX_TARGET_EXPORTS)  \
> cd $(TARGET_SUBDIR)/libsanitizer && \
> $(MAKE) $(BASE_FLAGS_TO_PASS) \
> CFLAGS="$(CFLAGS_FOR_TARGET)" \
> CXXFLAGS="$(CXXFLAGS_FOR_TARGET)" \
> LIBCFLAGS="$(LIBCFLAGS_FOR_TARGET)" \
> CFLAGS_FOR_TARGET="$(CFLAGS_FOR_TARGET)" \
> CXXFLAGS_FOR_TARGET="$(CXXFLAGS_FOR_TARGET)" \
> LIBCFLAGS_FOR_TARGET="$(LIBCFLAGS_FOR_TARGET)" \
> $(EXTRA_TARGET_FLAGS) 'CXX=$$(RAW_CXX_FOR_TARGET)'
> 'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'  \
>   \
> TFLAGS="$(STAGE1_TFLAGS)" \
> $(TARGET-stage1-target-libsanitizer)
> 
> The problem is
> 
> CXX=$$(RAW_CXX_FOR_TARGET)' 'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'
> 
> Those are bogus since
> 
> 1. We never set RAW_CXX_FOR_TARGET.
> 2. We have set
> 
> CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
> CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
> export CXX;
> 
> in RAW_CXX_TARGET_EXPORTS.  There is no need to do anything.

Nope, if you remove this you get the wrong definition of CC and CXX from
EXTRA_TARGET_FLAGS.  Instead, you need to add RAW_CXX_FOR_TARGET to
AM_MAKEFLAGS and EXTRA_TARGET_FLAGS.

Paolo

> As the result, we get empty CXX and CXX_FOR_TARGET for multilib
> libsanitizer build.  That is why removing CC/CXX from AM_MAKEFLAGS
> was needed.  I am testing this patch.  But we don't want to pass
> CC/CXX to multilib build since:
> 
> [hjl@gnu-mic-2 x86_64-unknown-linux-gnu]$ grep "^CXX ="
> libsanitizer/Makefile 32/libsanitizer/Makefile
> libsanitizer/Makefile:CXX =
> /export/build/gnu/gcc-asan/build-x86_64-linux/./gcc/xgcc
> -shared-libgcc -B/export/build/gnu/gcc-asan/build-x86_64-linux/./gcc
> -nostdinc++ 
> -L/export/build/gnu/gcc-asan/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/src
> -L/export/build/gnu/gcc-asan/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
> -B/usr/local/x86_64-unknown-linux-gnu/bin/
> -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem
> /usr/local/x86_64-unknown-linux-gnu/include -isystem
> /usr/local/x86_64-unknown-linux-gnu/sys-include
> 32/libsanitizer/Makefile:CXX =
> /export/build/gnu/gcc-asan/build-x86_64-linux/./gcc/xgcc
> -shared-libgcc -B/export/build/gnu/gcc-asan/build-x86_64-linux/./gcc
> -nostdinc++ 
> -L/export/build/gnu/gcc-asan/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/src
> -L/export/build/gnu/gc

PATCH: Remove 'CXX=$$(RAW_CXX_FOR_TARGET)' 'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'

2012-12-12 Thread H.J. Lu
On Wed, Dec 12, 2012 at 9:30 AM, H.J. Lu  wrote:
> After further investigation, I found
>
> RAW_CXX_TARGET_EXPORTS = \
> $(BASE_TARGET_EXPORTS) \
> CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
> CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
> export CXX;
> ...
>
> all-stage1-target-libsanitizer: configure-stage1-target-libsanitizer
> @[ $(current_stage) = stage1 ] || $(MAKE) stage1-start
> @r=`${PWD_COMMAND}`; export r; \
> s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
> TFLAGS="$(STAGE1_TFLAGS)"; \
> $(RAW_CXX_TARGET_EXPORTS)  \
> cd $(TARGET_SUBDIR)/libsanitizer && \
> $(MAKE) $(BASE_FLAGS_TO_PASS) \
> CFLAGS="$(CFLAGS_FOR_TARGET)" \
> CXXFLAGS="$(CXXFLAGS_FOR_TARGET)" \
> LIBCFLAGS="$(LIBCFLAGS_FOR_TARGET)" \
> CFLAGS_FOR_TARGET="$(CFLAGS_FOR_TARGET)" \
> CXXFLAGS_FOR_TARGET="$(CXXFLAGS_FOR_TARGET)" \
> LIBCFLAGS_FOR_TARGET="$(LIBCFLAGS_FOR_TARGET)" \
> $(EXTRA_TARGET_FLAGS) 'CXX=$$(RAW_CXX_FOR_TARGET)'
> 'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'  \
>   \
> TFLAGS="$(STAGE1_TFLAGS)" \
> $(TARGET-stage1-target-libsanitizer)
>
> The problem is
>
> CXX=$$(RAW_CXX_FOR_TARGET)' 'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'
>
> Those are bogus since
>
> 1. We never set RAW_CXX_FOR_TARGET.
> 2. We have set
>
> CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
> CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
> export CXX;
>
> in RAW_CXX_TARGET_EXPORTS.  There is no need to do anything.
>

Here is a patch to remove 'CXX=$$(RAW_CXX_FOR_TARGET)'
'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'.  Tested on
Linux/x86-64.  OK to install?

Thanks.

-- 
H.J.
---
2012-12-12  H.J. Lu  

* Makefile.tpl (target_modules): Remove 'CXX=$$(RAW_CXX_FOR_TARGET)'
'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'.
* Makefile.in: Regenerated.

diff --git a/Makefile.tpl b/Makefile.tpl
index 5cdc119..dbcd5c3 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -1281,7 +1281,7 @@ maybe-[+make_target+]-[+module+]:
[+make_target+]-[+module+]

 [+ all prefix="target-" subdir="$(TARGET_SUBDIR)"
exports="$(RAW_CXX_TARGET_EXPORTS)"
-   args="$(EXTRA_TARGET_FLAGS) 'CXX=$$(RAW_CXX_FOR_TARGET)'
'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'" +]
+   args="$(EXTRA_TARGET_FLAGS)" +]
 [+ ELSE +]
 [+ configure prefix="target-" subdir="$(TARGET_SUBDIR)"
 check_multilibs=true
@@ -1313,11 +1313,7 @@ ELSE normal_cxx +]
$(NORMAL_TARGET_EXPORTS) \[+
 ENDIF raw_cxx +]
(cd $(TARGET_SUBDIR)/[+module+] && \
- $(MAKE) $(TARGET_FLAGS_TO_PASS) [+
-   IF raw_cxx
- +] 'CXX=$$(RAW_CXX_FOR_TARGET)'
'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)' [+
-   ENDIF raw_cxx
- +] [+extra_make_flags+] check)
+ $(MAKE) $(TARGET_FLAGS_TO_PASS) [+extra_make_flags+] check)
 [+ ENDIF no_check +]
 @endif target-[+module+]


[patch, testsuite] Check for pic support on tests that use -fPIC

2012-12-12 Thread Steve Ellcey
I think this patch qualifies as obvious so I will check it in later today if
I don't get any objections.  These two tests use the -fPIC flag but do not
check for pic support.  Tested on mips-mti-elf.

Steve Ellcey
sell...@mips.com


2012-12-12  Steve Ellcey  

* gcc.dg/pr55150-2.c: Add pic support check.
* gcc.dg/lto/pr54709_0.c: Ditto.

diff --git a/gcc/testsuite/gcc.dg/lto/pr54709_0.c 
b/gcc/testsuite/gcc.dg/lto/pr54709_0.c
index 7e38bd4..f3db5dc 100644
--- a/gcc/testsuite/gcc.dg/lto/pr54709_0.c
+++ b/gcc/testsuite/gcc.dg/lto/pr54709_0.c
@@ -1,5 +1,6 @@
 /* { dg-lto-do link } */
 /* { dg-require-visibility "hidden" } */
+/* { dg-require-effective-target fpic } */
 /* { dg-extra-ld-options { -shared } } */
 /* { dg-lto-options { { -fPIC -fvisibility=hidden -flto } } } */
 
diff --git a/gcc/testsuite/gcc.dg/pr55150-2.c b/gcc/testsuite/gcc.dg/pr55150-2.c
index 48dbb53..1be769d 100644
--- a/gcc/testsuite/gcc.dg/pr55150-2.c
+++ b/gcc/testsuite/gcc.dg/pr55150-2.c
@@ -1,5 +1,6 @@
 /* PR middle-end/55150 */
 /* { dg-do compile } */
+/* { dg-require-effective-target fpic } */
 /* { dg-options "-Os -g -fPIC" } */
 
 typedef unsigned char DES_cblock[8];


[patch] Make bitmap_head smaller again

2012-12-12 Thread Steven Bosscher
Hello,

This patch fixes a regression in the size of bitmap_head that resulted
from the removal of all the #ifdef GATHER_STATISTICS tests.

Instead of a pointer to a descriptor, this patch gives each bitmap an
integer that is the index of the bitmap_descriptor.

Bootstrapped&tested on powerpc64-unknown-linux-gnu, and built on the
same platform with memory statistics gathering enabled.
OK for trunk?

Ciao!
Steven


smaller_bitmap_head.diff
Description: Binary data


Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
Honza, can you explain each change and point to the reference?

thanks,

David

On Wed, Dec 12, 2012 at 8:37 AM, Jan Hubicka  wrote:
>> I noticed in prologue/epilogue, GCC prefers to use MOVs followed by a
>> SP adjustment instead of a sequence of pushes/pops. The preference to
>> the MOVs are good for old CPU micro-architectures (before pentium-4,
>> K10), because it breaks the data dependency.  In modern
>> micro-architecture, push/pop is implemented using a mechanism called
>> stack engine. The data dependency is removed by the hardware, and
>> push/pop becomes very cheap (1 uOp, 1 cycle latency), and they are
>> smaller. There is no longer the need to avoid using them.   This is
>> also what ICC does.
>>
>> The following patch fixed the problem. It passes bootstrap/regression
>> test. OK to install?
>>
>> thanks,
>>
>> David
>>
>> Index: config/i386/i386.c
>> ===
>> --- config/i386/i386.c (revision 194324)
>> +++ config/i386/i386.c (working copy)
>> @@ -1919,10 +1919,10 @@ static unsigned int initial_ix86_tune_fe
>>m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>>
>>/* X86_TUNE_PROLOGUE_USING_MOVE */
>> -  m_PPRO | m_CORE2I7 | m_ATOM | m_ATHLON_K8 | m_GENERIC,
>> +  m_PPRO | m_ATHLON_K8,
>>
>>/* X86_TUNE_EPILOGUE_USING_MOVE */
>> -  m_PPRO | m_CORE2I7 | m_ATOM | m_ATHLON_K8 | m_GENERIC,
>> +  m_PPRO | m_ATHLON_K8,
>
> Push/pops wrt moves was always difficult to tune on old CPUs, so I am happy it
> is gone from generic (in fact I had similar patch pending).
> Are you sure about Atom having stack engine, too?
>
> Related thing is accumulate_outgoing_args. Igor is testing it on Core and I 
> will
> give it a try on K10.
>
> Honza
>
> I am attaching the changes for core costs I made if someone is interested in
> testing them.  If we can declare P4/PPRo and maybe K8 chips obsolette for
> generic, there is room for improvement in generic, too. Like using inc/dec
> again.
>
> Honza
>
> Index: config/i386/i386.c
> ===
> --- config/i386/i386.c  (revision 194452)
> +++ config/i386/i386.c  (working copy)
> @@ -1620,14 +1620,14 @@ struct processor_costs core_cost = {
>COSTS_N_INSNS (8),   /* cost of FABS instruction.  */
>COSTS_N_INSNS (8),   /* cost of FCHS instruction.  */
>COSTS_N_INSNS (40),  /* cost of FSQRT instruction.  */
> -  {{libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
> -   {libcall, {{24, loop, true}, {128, rep_prefix_8_byte, true},
> +  {{libcall, {{8192, rep_prefix_4_byte, true}, {-1, libcall, false}}},
> +   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},
>{-1, libcall, false,
>{{libcall, {{6, loop_1_byte, true},
>{24, loop, true},
>{8192, rep_prefix_4_byte, true},
>{-1, libcall, false}}},
> -   {libcall, {{24, loop, true}, {512, rep_prefix_8_byte, true},
> +   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},
>{-1, libcall, false,
>1,   /* scalar_stmt_cost.  */
>1,   /* scalar load_cost.  */
> @@ -1806,7 +1806,7 @@ static unsigned int initial_ix86_tune_fe
>m_PPRO,
>
>/* X86_TUNE_PARTIAL_FLAG_REG_STALL */
> -  m_CORE2I7 | m_GENERIC,
> +  m_GENERIC | m_CORE2,
>
>/* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
> * on 16-bit immediate moves into memory on Core2 and Corei7.  */
> @@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
>m_K6,
>
>/* X86_TUNE_USE_CLTD */
> -  ~(m_PENT | m_ATOM | m_K6),
> +  ~(m_PENT | m_ATOM | m_K6 | m_GENERIC),
>
>/* X86_TUNE_USE_XCHGB: Use xchgb %rh,%rl instead of rolw/rorw $8,rx.  */
>m_PENT4,
> @@ -1901,7 +1901,7 @@ static unsigned int initial_ix86_tune_fe
>m_COREI7 | m_BDVER,
>
>/* X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL */
> -  m_BDVER ,
> +  m_BDVER,
>
>/* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and 
> dependencies
>   are resolved on SSE register parts instead of whole registers, so we may
> @@ -1910,10 +1910,10 @@ static unsigned int initial_ix86_tune_fe
>m_ATHLON_K8,
>
>/* X86_TUNE_SSE_TYPELESS_STORES */
> -  m_AMD_MULTIPLE,
> +  m_AMD_MULTIPLE | m_CORE2I7, /**/
>
>/* X86_TUNE_SSE_LOAD0_BY_PXOR */
> -  m_PPRO | m_P4_NOCONA,
> +  m_PPRO | m_P4_NOCONA | m_CORE2I7, /**/
>
>/* X86_TUNE_MEMORY_MISMATCH_STALL */
>m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
> @@ -1938,7 +1938,7 @@ static unsigned int initial_ix86_tune_fe
>
>/* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
>   than 4 branch instructions in the 16 byte window.  */
> -  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
> +  m_PPRO | m_P4_NOCONA | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>
>/*

Re: PATCH: Remove AM_MAKEFLAGS from libsanitizer

2012-12-12 Thread H.J. Lu
On Wed, Dec 12, 2012 at 6:46 AM, Paolo Bonzini  wrote:
> Il 12/12/2012 15:41, H.J. Lu ha scritto:
>> MAKEOVERRIDES is used for multilib.  I got
>>
>> /bin/sh ../libtool --tag=CXX   --mode=compile  -D_GNU_SOURCE -D_DEBUG
>> -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
>> -DASAN_HAS_EXCEPTIONS=1 -DASAN_FLEXIBLE_MAPPING_AND_OFFSET=0
>> -DASAN_NEEDS_SEGV=1  -I.
>> -I/export/gnu/import/git/gcc/libsanitizer/asan  -I
>> /export/gnu/import/git/gcc/libsanitizer/include -I
>> /export/gnu/import/git/gcc/libsanitizer  -Wall -W
>> -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC
>> -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables
>> -fvisibility=hidden -Wno-variadic-macros -Wno-c99-extensions
>> -I../../libstdc++-v3/include
>> -I../../libstdc++-v3/include/x86_64-unknown-linux-gnu
>> -I/export/gnu/import/git/gcc/libsanitizer/../libstdc++-v3/libsupc++ -g
>> -O2 -D_GNU_SOURCE  -m32 -MT asan_malloc_linux.lo -MD -MP -MF
>> .deps/asan_malloc_linux.Tpo -c -o asan_malloc_linux.lo
>> /export/gnu/import/git/gcc/libsanitizer/asan/asan_malloc_linux.cc
>> libtool: compile: unrecognized option `-D_GNU_SOURCE'
>> libtool: compile: Try `libtool --help' for more information.
>> make[8]: *** [asan_allocator.lo] Error 1
>> make[8]: *** Waiting for unfinished jobs
>> libtool: compile: unrecognized option `-D_GNU_SOURCE'
>> libtool: compile: Try `libtool --help' for more information
>>
>> I checked in this patch to restore MAKEOVERRIDES.
>
> This will break "make CFLAGS=-g".  Please revert the AM_MAKEFLAGS change
> fully.

Done.

> However, your patch that removed AM_MAKEFLAGS similarly broke "make
> CC=foo".  While it is much less useful, this nevertheless may be the
> sign of a bigger problem.  Why did you need to remove CC/CXX from
> AM_MAKEFLAGS in the first place?
>

After further investigation, I found

RAW_CXX_TARGET_EXPORTS = \
$(BASE_TARGET_EXPORTS) \
CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
export CXX;
...

all-stage1-target-libsanitizer: configure-stage1-target-libsanitizer
@[ $(current_stage) = stage1 ] || $(MAKE) stage1-start
@r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
TFLAGS="$(STAGE1_TFLAGS)"; \
$(RAW_CXX_TARGET_EXPORTS)  \
cd $(TARGET_SUBDIR)/libsanitizer && \
$(MAKE) $(BASE_FLAGS_TO_PASS) \
CFLAGS="$(CFLAGS_FOR_TARGET)" \
CXXFLAGS="$(CXXFLAGS_FOR_TARGET)" \
LIBCFLAGS="$(LIBCFLAGS_FOR_TARGET)" \
CFLAGS_FOR_TARGET="$(CFLAGS_FOR_TARGET)" \
CXXFLAGS_FOR_TARGET="$(CXXFLAGS_FOR_TARGET)" \
LIBCFLAGS_FOR_TARGET="$(LIBCFLAGS_FOR_TARGET)" \
$(EXTRA_TARGET_FLAGS) 'CXX=$$(RAW_CXX_FOR_TARGET)'
'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'  \
  \
TFLAGS="$(STAGE1_TFLAGS)" \
$(TARGET-stage1-target-libsanitizer)

The problem is

CXX=$$(RAW_CXX_FOR_TARGET)' 'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)'

Those are bogus since

1. We never set RAW_CXX_FOR_TARGET.
2. We have set

CXX_FOR_TARGET="$(RAW_CXX_FOR_TARGET)"; export CXX_FOR_TARGET; \
CXX="$(RAW_CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS";
export CXX;

in RAW_CXX_TARGET_EXPORTS.  There is no need to do anything.

As the result, we get empty CXX and CXX_FOR_TARGET for multilib
libsanitizer build.  That is why removing CC/CXX from AM_MAKEFLAGS
was needed.  I am testing this patch.  But we don't want to pass
CC/CXX to multilib build since:

[hjl@gnu-mic-2 x86_64-unknown-linux-gnu]$ grep "^CXX ="
libsanitizer/Makefile 32/libsanitizer/Makefile
libsanitizer/Makefile:CXX =
/export/build/gnu/gcc-asan/build-x86_64-linux/./gcc/xgcc
-shared-libgcc -B/export/build/gnu/gcc-asan/build-x86_64-linux/./gcc
-nostdinc++ 
-L/export/build/gnu/gcc-asan/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/src
-L/export/build/gnu/gcc-asan/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
-B/usr/local/x86_64-unknown-linux-gnu/bin/
-B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem
/usr/local/x86_64-unknown-linux-gnu/include -isystem
/usr/local/x86_64-unknown-linux-gnu/sys-include
32/libsanitizer/Makefile:CXX =
/export/build/gnu/gcc-asan/build-x86_64-linux/./gcc/xgcc
-shared-libgcc -B/export/build/gnu/gcc-asan/build-x86_64-linux/./gcc
-nostdinc++ 
-L/export/build/gnu/gcc-asan/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/src
-L/export/build/gnu/gcc-asan/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs
-B/usr/local/x86_64-unknown-linux-gnu/bin/
-B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem
/usr/local/x86_64-unknown-linux-gnu/include -isystem
/usr/local/x86_64-unknown-linux-gnu/sys-include  -m32
[hjl@gnu-mic-2 x86_64-unknown-linux-gnu]$

As you can see, CXX are different for multilib.  If we CXX down:

all-multi:
 

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Xinliang David Li
On Wed, Dec 12, 2012 at 8:37 AM, Jan Hubicka  wrote:
>> I noticed in prologue/epilogue, GCC prefers to use MOVs followed by a
>> SP adjustment instead of a sequence of pushes/pops. The preference to
>> the MOVs are good for old CPU micro-architectures (before pentium-4,
>> K10), because it breaks the data dependency.  In modern
>> micro-architecture, push/pop is implemented using a mechanism called
>> stack engine. The data dependency is removed by the hardware, and
>> push/pop becomes very cheap (1 uOp, 1 cycle latency), and they are
>> smaller. There is no longer the need to avoid using them.   This is
>> also what ICC does.
>>
>> The following patch fixed the problem. It passes bootstrap/regression
>> test. OK to install?
>>
>> thanks,
>>
>> David
>>
>> Index: config/i386/i386.c
>> ===
>> --- config/i386/i386.c (revision 194324)
>> +++ config/i386/i386.c (working copy)
>> @@ -1919,10 +1919,10 @@ static unsigned int initial_ix86_tune_fe
>>m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
>>
>>/* X86_TUNE_PROLOGUE_USING_MOVE */
>> -  m_PPRO | m_CORE2I7 | m_ATOM | m_ATHLON_K8 | m_GENERIC,
>> +  m_PPRO | m_ATHLON_K8,
>>
>>/* X86_TUNE_EPILOGUE_USING_MOVE */
>> -  m_PPRO | m_CORE2I7 | m_ATOM | m_ATHLON_K8 | m_GENERIC,
>> +  m_PPRO | m_ATHLON_K8,
>
> Push/pops wrt moves was always difficult to tune on old CPUs, so I am happy it
> is gone from generic (in fact I had similar patch pending).
> Are you sure about Atom having stack engine, too?
>

Good question. The instruction latency table
(http://www.agner.org/optimize/instruction_tables.pdf) shows that for
Atom: push r has one 1uop, 1 cycle latency. However the instruction is
not pairable which will affect ILP. The guide here
http://www.agner.org/optimize/microarchitecture.pdf does not mention
Atom has stack engine either.

I will help collect some performance data on Atom.


thanks,

David


> Related thing is accumulate_outgoing_args. Igor is testing it on Core and I 
> will
> give it a try on K10.
>
> Honza
>
> I am attaching the changes for core costs I made if someone is interested in
> testing them.  If we can declare P4/PPRo and maybe K8 chips obsolette for
> generic, there is room for improvement in generic, too. Like using inc/dec
> again.
>
> Honza
>
> Index: config/i386/i386.c
> ===
> --- config/i386/i386.c  (revision 194452)
> +++ config/i386/i386.c  (working copy)
> @@ -1620,14 +1620,14 @@ struct processor_costs core_cost = {
>COSTS_N_INSNS (8),   /* cost of FABS instruction.  */
>COSTS_N_INSNS (8),   /* cost of FCHS instruction.  */
>COSTS_N_INSNS (40),  /* cost of FSQRT instruction.  */
> -  {{libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
> -   {libcall, {{24, loop, true}, {128, rep_prefix_8_byte, true},
> +  {{libcall, {{8192, rep_prefix_4_byte, true}, {-1, libcall, false}}},
> +   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},
>{-1, libcall, false,
>{{libcall, {{6, loop_1_byte, true},
>{24, loop, true},
>{8192, rep_prefix_4_byte, true},
>{-1, libcall, false}}},
> -   {libcall, {{24, loop, true}, {512, rep_prefix_8_byte, true},
> +   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},
>{-1, libcall, false,
>1,   /* scalar_stmt_cost.  */
>1,   /* scalar load_cost.  */
> @@ -1806,7 +1806,7 @@ static unsigned int initial_ix86_tune_fe
>m_PPRO,
>
>/* X86_TUNE_PARTIAL_FLAG_REG_STALL */
> -  m_CORE2I7 | m_GENERIC,
> +  m_GENERIC | m_CORE2,
>
>/* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
> * on 16-bit immediate moves into memory on Core2 and Corei7.  */
> @@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
>m_K6,
>
>/* X86_TUNE_USE_CLTD */
> -  ~(m_PENT | m_ATOM | m_K6),
> +  ~(m_PENT | m_ATOM | m_K6 | m_GENERIC),
>
>/* X86_TUNE_USE_XCHGB: Use xchgb %rh,%rl instead of rolw/rorw $8,rx.  */
>m_PENT4,
> @@ -1901,7 +1901,7 @@ static unsigned int initial_ix86_tune_fe
>m_COREI7 | m_BDVER,
>
>/* X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL */
> -  m_BDVER ,
> +  m_BDVER,
>
>/* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and 
> dependencies
>   are resolved on SSE register parts instead of whole registers, so we may
> @@ -1910,10 +1910,10 @@ static unsigned int initial_ix86_tune_fe
>m_ATHLON_K8,
>
>/* X86_TUNE_SSE_TYPELESS_STORES */
> -  m_AMD_MULTIPLE,
> +  m_AMD_MULTIPLE | m_CORE2I7, /**/
>
>/* X86_TUNE_SSE_LOAD0_BY_PXOR */
> -  m_PPRO | m_P4_NOCONA,
> +  m_PPRO | m_P4_NOCONA | m_CORE2I7, /**/
>
>/* X86_TUNE_MEMORY_MISMATCH_STALL */
>m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
> @@ -1938,7 +1938,7 @@ static

Re: [patch, mips] Follow up to pr54061 patch, fix another abort.

2012-12-12 Thread Steve Ellcey
On Wed, 2012-12-12 at 08:21 +, Richard Sandiford wrote:

> I'm pretty sure we'll need more eventually though.  A quick inspection
> shows that we don't set mips_dbx_regno for DSP_ACC_REGS or ST_REGS.
> DSP_ACC_REGS in paticular seems likely to hit, although you need to
> test with an -mdsp option to get coverage.  It's even conceivable
> that we could end up with info for GOT_VERSION_REGNUM.
> 
> I don't think we really gain much by trying to distinguish in mips_dbx_regno
> between INVALID_REGNUM (register isn't used at all by the compiler) and
> IGNORED_DWARF_REGNUM.  Especially since we don't distinguish between
> registers that have been disabled through -msoft-float, -mips3 and below
> (for ST_REGS other than $fcc0), -mno-dsp, etc.
> 
> Richard

OK, you have convinced me.  Here is what I am testing, OK to checkin
once I have run it through the testsuite?

Steve Ellcey
sell...@mips.com


2012-12-12  Steve Ellcey  

* config/mips/mips.c (mips_option_override): Set
mips_dbx_regno entries to IGNORED_DWARF_REGNUM by default.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 820b228..7ac8bb7 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -16742,7 +16742,7 @@ mips_option_override (void)
 
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 {
-  mips_dbx_regno[i] = INVALID_REGNUM;
+  mips_dbx_regno[i] = IGNORED_DWARF_REGNUM;
   if (GP_REG_P (i) || FP_REG_P (i) || ALL_COP_REG_P (i))
mips_dwarf_regno[i] = i;
   else
@@ -16757,9 +16757,6 @@ mips_option_override (void)
   for (i = FP_REG_FIRST; i <= FP_REG_LAST; i++)
 mips_dbx_regno[i] = i + start;
 
-  for (i = ALL_COP_REG_FIRST; i <= ALL_COP_REG_LAST; i++)
-mips_dbx_regno[i] = IGNORED_DWARF_REGNUM;
-
   /* Accumulator debug registers use big-endian ordering.  */
   mips_dbx_regno[HI_REGNUM] = MD_DBX_FIRST + 0;
   mips_dbx_regno[LO_REGNUM] = MD_DBX_FIRST + 1;




[Patch, Fortran] *ping* + PR55638 - elemental: VALUE w/o INTENT fix

2012-12-12 Thread Tobias Burnus

Dear all,

first, I like to ping two patches:

* MOVE_ALLOC: http://gcc.gnu.org/ml/fortran/2012-12/msg00058.html
* MODULE renaming: http://gcc.gnu.org/ml/fortran/2012-12/msg00022.html
  Note: The proper PR number is 55197.

 * * *

Secondly, the attached patch allows VALUE arguments to ELEMENTAL without 
requiring an INTENT(IN). [intent(out)/intent(inout) are not allowed with 
VALUE.]


Build and regtested on x86-64-gnu-linux.
OK for the trunk?

(In Fortran 2003, PURE required INTENT, which was relaxed in Fortran 
2008 such that no VALUE attribute is needed. However, due to IMPURE 
elemental, the intent constraint got lost and had to be re-added in 
F2008 Corr 1. With -std=f2003, an argument with VALUE and w/o intent(in) 
triggers the error message in PURE - also for ELEMENTAL, which should be 
okay.)


Tobias
2012-12-12  Tobias Burnus  

	PR fortran/55638
	* resolve.c (resolve_formal_arglist): Allow VALUE without
	INTENT for ELEMENTAL procedures.

2012-12-12  Tobias Burnus  

	PR fortran/55638
	* gfortran.dg/elemental_args_check_3.f90: Update dg-error.
	* gfortran.dg/elemental_args_check_7.f90: New.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 1c7b5fb..d4d5eb9 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -488,10 +488,12 @@ resolve_formal_arglist (gfc_symbol *proc)
 	  continue;
 	}
 
-	  if (sym->attr.intent == INTENT_UNKNOWN)
+	  /* Fortran 2008 Corrigendum 1, C1290a.  */
+	  if (sym->attr.intent == INTENT_UNKNOWN && !sym->attr.value)
 	{
 	  gfc_error ("Argument '%s' of elemental procedure '%s' at %L must "
-			 "have its INTENT specified", sym->name, proc->name,
+			 "have its INTENT specified or have the VALUE "
+			 "attribute", sym->name, proc->name,
 			 &sym->declared_at);
 	  continue;
 	}
diff --git a/gcc/testsuite/gfortran.dg/elemental_args_check_3.f90 b/gcc/testsuite/gfortran.dg/elemental_args_check_3.f90
index 77111f1..8d63874 100644
--- a/gcc/testsuite/gfortran.dg/elemental_args_check_3.f90
+++ b/gcc/testsuite/gfortran.dg/elemental_args_check_3.f90
@@ -13,7 +13,7 @@ CONTAINS
 (a, & ! { dg-error "must be scalar" }
  b, & ! { dg-error "POINTER attribute" }
  c, & ! { dg-error "ALLOCATABLE attribute" }
- d) ! { dg-error "INTENT specified" }
+ d) ! { dg-error "must have its INTENT specified or have the VALUE attribute" }
 INTEGER, INTENT(IN) :: a(:)
 INTEGER, POINTER, INTENT(IN) :: b
 INTEGER, ALLOCATABLE, INTENT(IN) :: c
diff --git a/gcc/testsuite/gfortran.dg/elemental_args_check_7.f90 b/gcc/testsuite/gfortran.dg/elemental_args_check_7.f90
new file mode 100644
index 000..7b5843b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/elemental_args_check_7.f90
@@ -0,0 +1,26 @@
+! { dg-do compile }
+!
+! PR fortran/55638
+!
+! Additionally, VALUE no INTENT is required (and only "intent(in)" allowed)
+!
+
+  elemental subroutine foo(x, y, z)
+integer, intent(inout) :: x
+integer, VALUE :: y
+integer, VALUE, intent(in) :: z
+x = y
+  end subroutine foo
+
+  impure elemental subroutine foo2(x, y, z) ! { dg-error "Argument 'x' of elemental procedure 'foo2' at .1. must have its INTENT specified or have the VALUE attribute" }
+integer :: x 
+integer, VALUE :: y
+integer, VALUE :: z
+x = y
+  end subroutine foo2
+
+  subroutine foo3(x, y, z)
+integer, VALUE, intent(in) :: x
+integer, VALUE, intent(inout) :: y ! { dg-error "VALUE attribute conflicts with INTENT.INOUT. attribute" }
+integer, VALUE, intent(out) :: z ! { dg-error "VALUE attribute conflicts with INTENT.OUT. attribute" }
+  end subroutine foo3



Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

2012-12-12 Thread Jan Hubicka
> I noticed in prologue/epilogue, GCC prefers to use MOVs followed by a
> SP adjustment instead of a sequence of pushes/pops. The preference to
> the MOVs are good for old CPU micro-architectures (before pentium-4,
> K10), because it breaks the data dependency.  In modern
> micro-architecture, push/pop is implemented using a mechanism called
> stack engine. The data dependency is removed by the hardware, and
> push/pop becomes very cheap (1 uOp, 1 cycle latency), and they are
> smaller. There is no longer the need to avoid using them.   This is
> also what ICC does.
> 
> The following patch fixed the problem. It passes bootstrap/regression
> test. OK to install?
> 
> thanks,
> 
> David
> 
> Index: config/i386/i386.c
> ===
> --- config/i386/i386.c (revision 194324)
> +++ config/i386/i386.c (working copy)
> @@ -1919,10 +1919,10 @@ static unsigned int initial_ix86_tune_fe
>m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
> 
>/* X86_TUNE_PROLOGUE_USING_MOVE */
> -  m_PPRO | m_CORE2I7 | m_ATOM | m_ATHLON_K8 | m_GENERIC,
> +  m_PPRO | m_ATHLON_K8,
> 
>/* X86_TUNE_EPILOGUE_USING_MOVE */
> -  m_PPRO | m_CORE2I7 | m_ATOM | m_ATHLON_K8 | m_GENERIC,
> +  m_PPRO | m_ATHLON_K8,

Push/pops wrt moves was always difficult to tune on old CPUs, so I am happy it
is gone from generic (in fact I had similar patch pending).
Are you sure about Atom having stack engine, too?

Related thing is accumulate_outgoing_args. Igor is testing it on Core and I will
give it a try on K10.

Honza

I am attaching the changes for core costs I made if someone is interested in
testing them.  If we can declare P4/PPRo and maybe K8 chips obsolette for
generic, there is room for improvement in generic, too. Like using inc/dec
again.

Honza

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 194452)
+++ config/i386/i386.c  (working copy)
@@ -1620,14 +1620,14 @@ struct processor_costs core_cost = {
   COSTS_N_INSNS (8),   /* cost of FABS instruction.  */
   COSTS_N_INSNS (8),   /* cost of FCHS instruction.  */
   COSTS_N_INSNS (40),  /* cost of FSQRT instruction.  */
-  {{libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
-   {libcall, {{24, loop, true}, {128, rep_prefix_8_byte, true},
+  {{libcall, {{8192, rep_prefix_4_byte, true}, {-1, libcall, false}}},
+   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},
   {-1, libcall, false,
   {{libcall, {{6, loop_1_byte, true},
   {24, loop, true},
   {8192, rep_prefix_4_byte, true},
   {-1, libcall, false}}},
-   {libcall, {{24, loop, true}, {512, rep_prefix_8_byte, true},
+   {libcall, {{24, loop, true}, {8192, rep_prefix_8_byte, true},
   {-1, libcall, false,
   1,   /* scalar_stmt_cost.  */
   1,   /* scalar load_cost.  */
@@ -1806,7 +1806,7 @@ static unsigned int initial_ix86_tune_fe
   m_PPRO,
 
   /* X86_TUNE_PARTIAL_FLAG_REG_STALL */
-  m_CORE2I7 | m_GENERIC,
+  m_GENERIC | m_CORE2,
 
   /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
* on 16-bit immediate moves into memory on Core2 and Corei7.  */
@@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
   m_K6,
 
   /* X86_TUNE_USE_CLTD */
-  ~(m_PENT | m_ATOM | m_K6),
+  ~(m_PENT | m_ATOM | m_K6 | m_GENERIC),
 
   /* X86_TUNE_USE_XCHGB: Use xchgb %rh,%rl instead of rolw/rorw $8,rx.  */
   m_PENT4,
@@ -1901,7 +1901,7 @@ static unsigned int initial_ix86_tune_fe
   m_COREI7 | m_BDVER,
 
   /* X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL */
-  m_BDVER ,
+  m_BDVER,
 
   /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies
  are resolved on SSE register parts instead of whole registers, so we may
@@ -1910,10 +1910,10 @@ static unsigned int initial_ix86_tune_fe
   m_ATHLON_K8,
 
   /* X86_TUNE_SSE_TYPELESS_STORES */
-  m_AMD_MULTIPLE,
+  m_AMD_MULTIPLE | m_CORE2I7, /**/
 
   /* X86_TUNE_SSE_LOAD0_BY_PXOR */
-  m_PPRO | m_P4_NOCONA,
+  m_PPRO | m_P4_NOCONA | m_CORE2I7, /**/
 
   /* X86_TUNE_MEMORY_MISMATCH_STALL */
   m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
@@ -1938,7 +1938,7 @@ static unsigned int initial_ix86_tune_fe
 
   /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
  than 4 branch instructions in the 16 byte window.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_SCHEDULE */
   m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | 
m_GENERIC,
@@ -1947,10 +1947,10 @@ static unsigned int initial_ix86_tune_fe
   m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_USE_INCDEC */
-  ~(m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GENERIC),
+  ~

Re: [patch c/c++]: Fix for PR c/52991 issue about ignored packed-attribute for ms-structure-layout

2012-12-12 Thread Kai Tietz
2012/12/12 Richard Biener :
> On Wed, Dec 12, 2012 at 4:11 PM, Richard Henderson  wrote:
>> On 12/12/2012 02:57 AM, Richard Biener wrote:
>>> That looks wrong.  Having both TYPE_PACKED and TYPE_ALIGN != BITS_PER_UNIT
>>> is inconsistent, so this part of the patch should not be necessary.
>>
>> No, that is the only way to give a 4 byte int 2 byte alignment:
>> use both packed and aligned attributes.
>>
>> struct S {
>>   char x;
>>   int y;
>> };
>>
>> struct T {
>>   char x;
>>   int y __attribute__((aligned(2)));
>> };
>>
>> struct U {
>>   char x;
>>   int y __attribute__((packed, aligned(2)));
>> };
>>
>> int s_y = __builtin_offsetof(struct S, y);
>> int t_y = __builtin_offsetof(struct T, y);
>> int u_y = __builtin_offsetof(struct U, y);
>
> But the patch changes it for the RECORD_TYPE case and
>
> struct T __attribute__((packed,aligned(2))) {
>char x;
>short s;
>char y;
>short a;
> };
> struct T x;
>
> doesn't work as I would have expected (giving x.x and x.a 2-byte alignment).
>
> In fact, the type attribute docs for 'packed' only say that fields are packed,
> not that alignment of the type itself is affected (and 'aligned' is not 
> specifed
> in the docs for types at all it seems).
>
> Richard.

Hmm, I see the attribute aligned explicit mention for types.  See 5.33
Specifying Attributes of Types.
Well, the case to combine aligned and packed attribute seems indeed
not to be explicit mentioned.  Nevertheless documention tells for
packed-attribute for types "This attribute, attached to `struct' or
`union' type definition, specifies that each member of the structure
or union is placed to minimize the memory required.", which implies -
I might be wrong here - that an alignment of 1 is active by default.
So to put those two attributes wiithin one attribute doesn't make
sense, as either the aligned or the packed have to be interpreted.  To
specify within a packed-struct-type for a sepcific variable a special
alignment - as shown in Rth's testcase - makes sense and seems to be
covered by the docs.

Regards,
Kai


Re: [patch] cfglayout fixes

2012-12-12 Thread Jakub Jelinek
On Wed, Dec 12, 2012 at 01:41:49PM +0100, Steven Bosscher wrote:
> On Wed, Dec 12, 2012 at 1:17 PM, Steven Bosscher wrote:
> > I'll resubmit for GCC 4.9. Instead of a test case, I have a verifier
> > queued that asserts no BARRIERs appear in the insns stream when in
> > cfglayout mode. (This verifier currently breaks hot-cold partitioning
> > all over the place due to a bug in try_redirect_by_replacing_jump.)
> 
> BTW, should there be a meta-bug for patches for GCC 4.9?

Yes, we had one for 4.8 too (and earlier releases too I think).

Jakub


Re: [patch c/c++]: Fix for PR c/52991 issue about ignored packed-attribute for ms-structure-layout

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 4:11 PM, Richard Henderson  wrote:
> On 12/12/2012 02:57 AM, Richard Biener wrote:
>> That looks wrong.  Having both TYPE_PACKED and TYPE_ALIGN != BITS_PER_UNIT
>> is inconsistent, so this part of the patch should not be necessary.
>
> No, that is the only way to give a 4 byte int 2 byte alignment:
> use both packed and aligned attributes.
>
> struct S {
>   char x;
>   int y;
> };
>
> struct T {
>   char x;
>   int y __attribute__((aligned(2)));
> };
>
> struct U {
>   char x;
>   int y __attribute__((packed, aligned(2)));
> };
>
> int s_y = __builtin_offsetof(struct S, y);
> int t_y = __builtin_offsetof(struct T, y);
> int u_y = __builtin_offsetof(struct U, y);

But the patch changes it for the RECORD_TYPE case and

struct T __attribute__((packed,aligned(2))) {
   char x;
   short s;
   char y;
   short a;
};
struct T x;

doesn't work as I would have expected (giving x.x and x.a 2-byte alignment).

In fact, the type attribute docs for 'packed' only say that fields are packed,
not that alignment of the type itself is affected (and 'aligned' is not specifed
in the docs for types at all it seems).

Richard.

>
> r~


Re: [i386] scalar ops that preserve the high part of a vector

2012-12-12 Thread Richard Henderson
On 12/07/2012 04:02 PM, Marc Glisse wrote:
> 2012-12-08  Marc Glisse  
> 
> PR target/54855
> gcc/
> * config/i386/sse.md (_vm3): Rewrite
> pattern.
> (sse2_loadlpd, sse2_loadhpd): Use vec_merge.
> * config/i386/i386-builtin-types.def: New function types.
> * config/i386/i386.c (ix86_expand_args_builtin): Likewise.
> (bdesc_args) <__builtin_ia32_addss, __builtin_ia32_subss,
> __builtin_ia32_addsd, __builtin_ia32_subsd>: Change prototype.
> (ix86_expand_vector_set): Use vec_merge for V2DF.
> * config/i386/xmmintrin.h: Adapt to new builtin prototype.
> * config/i386/emmintrin.h: Likewise.
> * doc/extend.texi (X86 Built-in Functions): Document changed prototype.
> 
> testsuite/
> * gcc.target/i386/pr54855-1.c: New testcase.
> * gcc.target/i386/pr54855-2.c: New testcase.

This looks like the right approach.

I won't approve this for 4.8 because (1) this isn't a regression,
and (2) all of the other operations want to be handled similarly.

But I'll approve this for 4.9 as part 1 of a series.  (That need
not be developed and committed all at once, but will be all done
before the end of the next stage1.)


r~


Re: PR other/54324: allow bootstrapping with older compilers

2012-12-12 Thread Teresa Johnson
On Wed, Dec 12, 2012 at 7:25 AM, Richard Biener  wrote:
> On Wed, 12 Dec 2012, Teresa Johnson wrote:
>
>> On Wed, Dec 12, 2012 at 2:21 AM, Steven Bosscher  
>> wrote:
>> > On Wed, Dec 12, 2012 at 6:07 AM, Aldy Hernandez wrote:
>> >> I don't know how much of this is a fool's errand, and if we want to commit
>> >> to supporting < GCC 3.4, but your patch suggested c++98, and GCC 3.2 
>> >> claims
>> >> such.
>> >
>> > GCC 3.2 claims many things, but any GCC that has the old C++ parser
>> > has known non-conformances. IMHO we should only support GCC versions
>> > with the "new" C++ parser, i.e. GCC 3.4 and up.
>> >
>> >
>> >>Perhaps we could even deprecate ARG_UNUSED?
>> >
>> > +1
>> >
>> >
>> >> 2. gcov-io.c uses __builtin_popcountll and __builtin_clzll.  Older
>> >>GCC's do not have this.  For that matter, how does this even work on
>> >>non-GCC systems?
>> >
>> > This'd be a bug.
>> >
>> >
>> >> I really don't want to spend much more time on this, but at the same 
>> >> time, I
>> >> don't want to throw away a day's work, especially if it could conceivably
>> >> help us with (older) non-GCC bootstrap compilers.
>> >
>> > I think the work you've done here is great, but even if you'd fix
>> > these issues, there is very little that can be done to avoid the same
>> > mistakes, or new ones, creeping back in and breaking builds with older
>> > compilers.
>> >
>> >
>> >> What are your thoughts on this?
>> >
>> > At least: Thanks for uncovering the gcov-io.c bug :-)
>>
>> Yes, thanks for fixing this. The fix looks good to me.
>
> In fact it looks ugly ... can't we avoid this code duplication somehow?

If you prefer, I can simply inline the popcount/clz functionality into
gcov-io.c directly (or at least when not using recent versions of
GCC). But that in fact would be duplicating the code, when I thought
Aldy's solution was trying to avoid that by providing the more general
interfaces.

Teresa

>
> Richard.



--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: PR other/54324: allow bootstrapping with older compilers

2012-12-12 Thread Richard Biener
On Wed, 12 Dec 2012, Teresa Johnson wrote:

> On Wed, Dec 12, 2012 at 2:21 AM, Steven Bosscher  
> wrote:
> > On Wed, Dec 12, 2012 at 6:07 AM, Aldy Hernandez wrote:
> >> I don't know how much of this is a fool's errand, and if we want to commit
> >> to supporting < GCC 3.4, but your patch suggested c++98, and GCC 3.2 claims
> >> such.
> >
> > GCC 3.2 claims many things, but any GCC that has the old C++ parser
> > has known non-conformances. IMHO we should only support GCC versions
> > with the "new" C++ parser, i.e. GCC 3.4 and up.
> >
> >
> >>Perhaps we could even deprecate ARG_UNUSED?
> >
> > +1
> >
> >
> >> 2. gcov-io.c uses __builtin_popcountll and __builtin_clzll.  Older
> >>GCC's do not have this.  For that matter, how does this even work on
> >>non-GCC systems?
> >
> > This'd be a bug.
> >
> >
> >> I really don't want to spend much more time on this, but at the same time, 
> >> I
> >> don't want to throw away a day's work, especially if it could conceivably
> >> help us with (older) non-GCC bootstrap compilers.
> >
> > I think the work you've done here is great, but even if you'd fix
> > these issues, there is very little that can be done to avoid the same
> > mistakes, or new ones, creeping back in and breaking builds with older
> > compilers.
> >
> >
> >> What are your thoughts on this?
> >
> > At least: Thanks for uncovering the gcov-io.c bug :-)
> 
> Yes, thanks for fixing this. The fix looks good to me.

In fact it looks ugly ... can't we avoid this code duplication somehow?

Richard.


Re: [PATCH] DOT a function directly from a gdb session

2012-12-12 Thread Marek Polacek
On Wed, Dec 12, 2012 at 03:11:26PM +0100, Steven Bosscher wrote:
> There are several warts that need addressing for GCC 4.9. I'm aware of
> the following bugs. There may be others, if you find any please let me
> know.
 
Ok, in case I find something, I'll let you know.

> 1. dumping for debug_dot_cfg (struct function *fn), fn != cfun
> This doesn't work with non-loop dumping because
> pre_and_rev_post_order_compute only works on cfun. Much of the CFG
> infrastructure hasn't been fully converted yet to work on any
> arbitrary CFG. Historically, there would be only one CFG at a time,
> but with cgraph we now maintain CFGs (and even loops) for functions
> throughout the compilation process. I'm planning to fix most issues
> for GCC 4.9.

So ideally pre_and_rev_post_order_compute should have another
parameter of type struct function *, right?

> 2. dumping with unreachable blocks
> The CFG dumpers actually handle this (since my latest patch to
> graph.c) but pre_and_rev_post_order_compute does not. If you remove
> the assert there, it will work. Again, for GCC 4.9 we'll have to look
> into a better solution for this (e.g. only conditionally assert, on
> demand).

Yeah, I have unreachable blocks in CFG, so that's it.

> 3. dumping with loops but incorrect loop tree
> If the loop tree isn't up to date, get_loop_body* will not work
> properly and dumping the CFG will crash. I don't know of any means to
> check the state of the loop tree and fall back to non-loop dumping.
> 
> 4. dumping with some of the more exotic tree dumping flags
> Not sure if all of them work, I haven't tested all combinations (vops,
> stmtaddr, etc.).

Thanks,

Marek


Re: [patch c/c++]: Fix for PR c/52991 issue about ignored packed-attribute for ms-structure-layout

2012-12-12 Thread Richard Henderson
On 12/12/2012 02:57 AM, Richard Biener wrote:
> That looks wrong.  Having both TYPE_PACKED and TYPE_ALIGN != BITS_PER_UNIT
> is inconsistent, so this part of the patch should not be necessary.

No, that is the only way to give a 4 byte int 2 byte alignment:
use both packed and aligned attributes.

struct S {
  char x;
  int y;
};

struct T {
  char x;
  int y __attribute__((aligned(2)));
};

struct U {
  char x;
  int y __attribute__((packed, aligned(2)));
};

int s_y = __builtin_offsetof(struct S, y);
int t_y = __builtin_offsetof(struct T, y);
int u_y = __builtin_offsetof(struct U, y);


r~


Re: PR other/54324: allow bootstrapping with older compilers

2012-12-12 Thread Teresa Johnson
On Wed, Dec 12, 2012 at 2:21 AM, Steven Bosscher  wrote:
> On Wed, Dec 12, 2012 at 6:07 AM, Aldy Hernandez wrote:
>> I don't know how much of this is a fool's errand, and if we want to commit
>> to supporting < GCC 3.4, but your patch suggested c++98, and GCC 3.2 claims
>> such.
>
> GCC 3.2 claims many things, but any GCC that has the old C++ parser
> has known non-conformances. IMHO we should only support GCC versions
> with the "new" C++ parser, i.e. GCC 3.4 and up.
>
>
>>Perhaps we could even deprecate ARG_UNUSED?
>
> +1
>
>
>> 2. gcov-io.c uses __builtin_popcountll and __builtin_clzll.  Older
>>GCC's do not have this.  For that matter, how does this even work on
>>non-GCC systems?
>
> This'd be a bug.
>
>
>> I really don't want to spend much more time on this, but at the same time, I
>> don't want to throw away a day's work, especially if it could conceivably
>> help us with (older) non-GCC bootstrap compilers.
>
> I think the work you've done here is great, but even if you'd fix
> these issues, there is very little that can be done to avoid the same
> mistakes, or new ones, creeping back in and breaking builds with older
> compilers.
>
>
>> What are your thoughts on this?
>
> At least: Thanks for uncovering the gcov-io.c bug :-)

Yes, thanks for fixing this. The fix looks good to me.

Thanks,
Teresa

>
> Ciao!
> Steven



--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: PATCH: Remove AM_MAKEFLAGS from libsanitizer

2012-12-12 Thread Paolo Bonzini
Il 12/12/2012 15:41, H.J. Lu ha scritto:
> MAKEOVERRIDES is used for multilib.  I got
> 
> /bin/sh ../libtool --tag=CXX   --mode=compile  -D_GNU_SOURCE -D_DEBUG
> -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
> -DASAN_HAS_EXCEPTIONS=1 -DASAN_FLEXIBLE_MAPPING_AND_OFFSET=0
> -DASAN_NEEDS_SEGV=1  -I.
> -I/export/gnu/import/git/gcc/libsanitizer/asan  -I
> /export/gnu/import/git/gcc/libsanitizer/include -I
> /export/gnu/import/git/gcc/libsanitizer  -Wall -W
> -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC
> -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables
> -fvisibility=hidden -Wno-variadic-macros -Wno-c99-extensions
> -I../../libstdc++-v3/include
> -I../../libstdc++-v3/include/x86_64-unknown-linux-gnu
> -I/export/gnu/import/git/gcc/libsanitizer/../libstdc++-v3/libsupc++ -g
> -O2 -D_GNU_SOURCE  -m32 -MT asan_malloc_linux.lo -MD -MP -MF
> .deps/asan_malloc_linux.Tpo -c -o asan_malloc_linux.lo
> /export/gnu/import/git/gcc/libsanitizer/asan/asan_malloc_linux.cc
> libtool: compile: unrecognized option `-D_GNU_SOURCE'
> libtool: compile: Try `libtool --help' for more information.
> make[8]: *** [asan_allocator.lo] Error 1
> make[8]: *** Waiting for unfinished jobs
> libtool: compile: unrecognized option `-D_GNU_SOURCE'
> libtool: compile: Try `libtool --help' for more information
> 
> I checked in this patch to restore MAKEOVERRIDES.

This will break "make CFLAGS=-g".  Please revert the AM_MAKEFLAGS change
fully.

However, your patch that removed AM_MAKEFLAGS similarly broke "make
CC=foo".  While it is much less useful, this nevertheless may be the
sign of a bigger problem.  Why did you need to remove CC/CXX from
AM_MAKEFLAGS in the first place?

Paolo


Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 3:39 PM, Yuri Rumyantsev  wrote:
> Guys,
>
> I assume that this is not right way for fixing such simple performance
> anomaly since we need to do redundant work - combine load to
> conditional and then split it back in peephole2? Does it look
> reasonable? Why we should produce non-efficient instrucction that must
> be splitted later?

Well, either don't allow this instruction variant from the start, or allow
the extra freedom for register allocation this creates.  It doesn't make
sense to just reject it being generated by combine - that doesn't address
when it materializes in another way.

Just my 2 cents - I'm not a x86 target maintainer and it's definitely up to them
to decide.

Richard.

> Best regards.
> Yuri.
>
> 2012/12/12 Richard Biener :
>> On Wed, Dec 12, 2012 at 1:55 PM, Uros Bizjak  wrote:
>>> On Wed, Dec 12, 2012 at 12:44 PM, Richard Biener
>>>  wrote:
>>>
> This fix is aimed to remove performance degradation introduced by new
> LRA phase that in fact is combining problem. Gcc combiner does
> propagation of memory load to if-then-else gimple that was splitted
> back by old reload phase. LRA does not perform such splitting. To
> avoid performance slowdown on important benchmark (this is true for
> all x86 targets) we decided to enhance 'ix86_legitimate_combined_insn'
> with a check on such propagation and consider such conditional
> instruction with memory operand as illegal one from performance point
> of view.
>
> The fix was bootstrapped and regtested for x86-64.
> Is it OK for 4.8 and mainline?

 Isn't it a win for -Os though?  Thus, optimize_insn_for_size ()?  It can
 also increase register pressure, no?  So eventually this splitting should
 be done post-reload only.  Not sure what appropriate machinery there is,
 besides from mdreorg (or split itself).
>>>
>>> So, you are proposing to use peephole2 with (match_scratch)
>>> conditional temporary?
>>
>> Yes, if that works.  (sounds backward to me having a peephole split one
>> insn into two ... ;))
>>
>> Richard.
>>
>>> Uros.


Re: PATCH: Remove AM_MAKEFLAGS from libsanitizer

2012-12-12 Thread H.J. Lu
On Wed, Dec 12, 2012 at 12:49 AM, Paolo Bonzini  wrote:
> Il 11/12/2012 22:39, H.J. Lu ha scritto:
>> On Tue, Dec 11, 2012 at 6:36 AM, Paolo Bonzini  wrote:
>>> Il 11/12/2012 14:47, H.J. Lu ha scritto:
 On Thu, Dec 6, 2012 at 7:07 AM, H.J. Lu  wrote:
> On Thu, Nov 29, 2012 at 10:30 AM, H.J. Lu  wrote:
>> Hi,
>>
>> Since libsanitizer is used for bootstrap and compiled with raw_cxx,
>> we need to use explicit -I for libstdc++-v3 header files in
>> libsanitizer.  Otherwise, we will get
>>
>> libtool: compile: unrecognized option `-D_GNU_SOURCE'
>> libtool: compile: Try `libtool --help' for more information.
>>
>> This patch fixes it.  OK to install?
>>
>> Thanks.
>>
>>
>> H.J.
>> ---
>>  libsanitizer/Makefile.am  |  2 --
>>  libsanitizer/Makefile.in  |  6 +++---
>>  libsanitizer/aclocal.m4   |  1 +
>>  libsanitizer/asan/Makefile.am |  6 --
>>  libsanitizer/asan/Makefile.in | 14 ++
>>  libsanitizer/configure| 22 --
>>  libsanitizer/configure.ac |  1 +
>>  libsanitizer/interception/Makefile.am |  6 --
>>  libsanitizer/interception/Makefile.in | 14 ++
>>  libsanitizer/sanitizer_common/Makefile.am |  6 --
>>  libsanitizer/sanitizer_common/Makefile.in | 14 ++
>>  libsanitizer/tsan/Makefile.am |  6 --
>>  libsanitizer/tsan/Makefile.in | 13 +
>>  14 files changed, 97 insertions(+), 31 deletions(-)
>>  create mode 100644 libsanitizer/ChangeLog.asan
>>
>> 2012-11-22  H.J. Lu  
>>
>> * Makefile.am (AM_MAKEFLAGS): Remove CC and CXX.
>> * configure.ac (ACX_NONCANONICAL_TARGET): New.
>> * asan/Makefile.am (AM_CXXFLAGS): Add -I for libstdc++-v3 header
>> files.
>> (AM_MAKEFLAGS): Remove CC and CXX.
>> * interception/Makefile.am: Likewise.
>> * sanitizer_common/Makefile.am: Likewise.
>> * tsan/Makefile.am: Likewise.
>> * Makefile.in: Regenerated.
>> * aclocal.m4: Likewise.
>> * configure: Likewise.
>> * asan/Makefile.in: Likewise.
>> * interception/Makefile.in: Likewise.
>> * sanitizer_common/Makefile.in: Likewise.
>> * tsan/Makefile.in: Likewise.
>>
>> diff --git a/libsanitizer/Makefile.am b/libsanitizer/Makefile.am
>> index 64d3d2e..cd4e92d 100644
>> --- a/libsanitizer/Makefile.am
>> +++ b/libsanitizer/Makefile.am
>> @@ -37,8 +37,6 @@ AM_MAKEFLAGS = \
>> "includedir=$(includedir)" \
>> "AR=$(AR)" \
>> "AS=$(AS)" \
>> -   "CC=$(CC)" \
>> -   "CXX=$(CXX)" \
>> "LD=$(LD)" \
>> "LIBCFLAGS=$(LIBCFLAGS)" \
>> "NM=$(NM)" \
>>>
>>> As a followup please check if AM_MAKEFLAGS is needed at all.
>>>
>> diff --git a/libsanitizer/Makefile.in b/libsanitizer/Makefile.in
>> index 21c2711..53e0be9 100644
>> --- a/libsanitizer/Makefile.in
>> +++ b/libsanitizer/Makefile.in
>>>
>>> Please do not include regenerated files in the patch.
>>>
>>
>> Here is a patch to remove AM_MAKEFLAGS from
>> libsanitizer.  Tested on Linux/x86-64.  OK to install?
>>
>> Thanks.
>>
>>
>> H.J.
>> ---
>> 2012-12-11  H.J. Lu  
>>
>>   * Makefile.am (AM_MAKEFLAGS): Removed.
>>   * interception/Makefile.am: Likewise.
>>   * sanitizer_common/Makefile.am: Likewise.
>>   * tsan/Makefile.am: Likewise.
>>   * Makefile.in: Regenerated.
>>   * asan/Makefile.in: Likewise.
>>   * interception/Makefile.in: Likewise.
>>   * sanitizer_common/Makefile.in: Likewise.
>>   * tsan/Makefile.in: Likewise.
>>
>> diff --git a/libsanitizer/Makefile.am b/libsanitizer/Makefile.am
>> index 308d438..272a218 100644
>> --- a/libsanitizer/Makefile.am
>> +++ b/libsanitizer/Makefile.am
>> @@ -10,44 +10,6 @@ if USING_MAC_INTERPOSE
>>  SUBDIRS = sanitizer_common asan
>>  endif
>>
>> -# Work around what appears to be a GNU make bug handling MAKEFLAGS
>> -# values defined in terms of make variables, as is the case for CC and
>> -# friends when we are called from the top level Makefile.
>> -AM_MAKEFLAGS = \
>> - "AR_FLAGS=$(AR_FLAGS)" \
>> - "CC_FOR_BUILD=$(CC_FOR_BUILD)" \
>> - "CFLAGS=$(CFLAGS)" \
>> - "CXXFLAGS=$(CXXFLAGS)" \
>> - "CFLAGS_FOR_BUILD=$(CFLAGS_FOR_BUILD)" \
>> - "CFLAGS_FOR_TARGET=$(CFLAGS_FOR_TARGET)" \
>> - "INSTALL=$(INSTALL)" \
>> - "INSTALL_DATA=$(INSTALL_DATA)" \
>> - "INSTALL_PROGRAM=$(INSTALL_PROGRAM)" \
>> - "INSTALL_SCRIPT=$(INSTALL_SCRIPT)" \
>> - "JC1FLAGS=$(JC1FLAGS)" \
>> - "LDFLAGS=$(LDFLAGS)" \
>> - "LIBCFLAGS=$(LIBCFLAGS)" \
>> - "LIBCFLAGS_FOR_TARGET=$(LIBCFLAGS_FOR_TARGET)" \
>> - "MAKE=$(MAKE)" \
>> - "MAK

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Yuri Rumyantsev
Guys,

I assume that this is not right way for fixing such simple performance
anomaly since we need to do redundant work - combine load to
conditional and then split it back in peephole2? Does it look
reasonable? Why we should produce non-efficient instrucction that must
be splitted later?

Best regards.
Yuri.

2012/12/12 Richard Biener :
> On Wed, Dec 12, 2012 at 1:55 PM, Uros Bizjak  wrote:
>> On Wed, Dec 12, 2012 at 12:44 PM, Richard Biener
>>  wrote:
>>
 This fix is aimed to remove performance degradation introduced by new
 LRA phase that in fact is combining problem. Gcc combiner does
 propagation of memory load to if-then-else gimple that was splitted
 back by old reload phase. LRA does not perform such splitting. To
 avoid performance slowdown on important benchmark (this is true for
 all x86 targets) we decided to enhance 'ix86_legitimate_combined_insn'
 with a check on such propagation and consider such conditional
 instruction with memory operand as illegal one from performance point
 of view.

 The fix was bootstrapped and regtested for x86-64.
 Is it OK for 4.8 and mainline?
>>>
>>> Isn't it a win for -Os though?  Thus, optimize_insn_for_size ()?  It can
>>> also increase register pressure, no?  So eventually this splitting should
>>> be done post-reload only.  Not sure what appropriate machinery there is,
>>> besides from mdreorg (or split itself).
>>
>> So, you are proposing to use peephole2 with (match_scratch)
>> conditional temporary?
>
> Yes, if that works.  (sounds backward to me having a peephole split one
> insn into two ... ;))
>
> Richard.
>
>> Uros.


Re: [cxx-conversion] Convert tree-vectorizer.h'_loop_vec_info::peeling_htab to hash_table

2012-12-12 Thread Diego Novillo
On Tue, Dec 11, 2012 at 5:49 PM, Lawrence Crowl  wrote:
> Convert tree-vectorizer.h'_loop_vec_info::peeling_htab from htab_t
> to hash_table.
>
> * tree-vectorizer.h
>
> New struct peel_info_hasher.
>
> * tree-vect-loop.c
>
> Update dependent calls and types to match.
>
> * tree-vect-data-refs.c
>
> Fold vect_peeling_hash and vect_peeling_hash_eq into struct peel_info_hasher.
>
> Update dependent calls and types to match.
>
> Tested on x86_64.
>
> Okay for branch?

OK.


Diego.


Re: [cxx-conversion] Convert various htab_t tables to hash_table

2012-12-12 Thread Diego Novillo
On Tue, Dec 11, 2012 at 5:46 PM, Lawrence Crowl  wrote:
> Update various htab_t tables to hash_table.  Each file is independent.
> Update dependent calls and types.
>
> * tree-ssa-strlen.c'decl_to_stridxlist_htab
>
> Fold decl_to_stridxlist_hash into new struct stridxlist_hasher.
>
> * tree-ssa-loop-ivopts.c'ivopts_data::inv_expr_tab
>
> Fold htab_inv_expr_hash and htab_inv_expr_eq into new struct
> iv_inv_expr_hasher.
>
> * tree-ssa-uncprop.c'equiv
>
> Equiv renamed to val_ssa_equiv because of name ambiguity with local variables.
>
> Fold equiv_hash, equiv_eq and equiv_free into new struct val_ssa_equiv_hasher.
>
> Renamed variables equiv_hash_elt to an_equiv_elt because of name ambiguity
> with struct type.  Changed equiv_hash_elt_p to an_equiv_elt_p to match.
>
> * tree-ssa-phiopt.c'seen_ssa_names
>
> Fold name_to_bb_hash and name_to_bb_eq into new struct ssa_names_hasher.
>
> * tree-ssa-structalias.c'pointer_equiv_class_table
> * tree-ssa-structalias.c'location_equiv_class_table
>
> Fold equiv_class_label_hash and equiv_class_label_eq into new
> struct equiv_class_hasher.
>
> * tree-ssa-structalias.c'shared_bitmap_table
>
> Fold shared_bitmap_hash and shared_bitmap_eq into new struct
> shared_bitmap_hasher.
>
> * tree-ssa-live.c'var_map_base_init::tree_to_index
>
> New struct tree_int_map_hasher.
>
> * tree-ssa-reassoc.c'undistribute_ops_list::ctable
>
> Fold oecount_hash and oecount_eq into new struct oecount_hasher.
>
> * tree-ssa-loop-im.c'memory_accesses.refs
>
> Fold memref_hash and memref_eq into new struct mem_ref_hasher.
>
> Tested on x86_64.
>
> Okay for branch?


OK.


Diego.


Re: [cxx-conversion] Convert tree-sra.c'candidates to hash_table

2012-12-12 Thread Diego Novillo
On Tue, Dec 11, 2012 at 5:44 PM, Lawrence Crowl  wrote:
> Convert tree-sra.c'candidates from htab_t to hash_table.
>
> Fold uid_decl_map_hash and uid_decl_map_eq into new struct
> uid_decl_hasher.  This change moves the definitions from tree-ssa.c
> into tree-sra.c and removes the declarations from tree-flow.h
>
> Update dependent calls and types to hash_table.
>
> Tested on x86_64.
>
> Okay for branch?
>
>
> Index: gcc/tree-sra.c
> ===
> --- gcc/tree-sra.c  (revision 194381)
> +++ gcc/tree-sra.c  (working copy)
> @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> +#include "hash-table.h"
>  #include "alloc-pool.h"
>  #include "tm.h"
>  #include "tree.h"
> @@ -269,18 +270,44 @@ static alloc_pool link_pool;
>  /* Base (tree) -> Vector (vec *) map.  */
>  static struct pointer_map_t *base_access_vec;
>
> +/* Candidate ashtable helpers.  */

s/ashtable/hash table/

> +
> +struct uid_decl_hasher : typed_noop_remove 
> +{
> +  typedef tree_node value_type;
> +  typedef tree_node compare_type;
> +  static inline hashval_t hash (const value_type *);
> +  static inline bool equal (const value_type *, const compare_type *);
> +};
> +
> +/* Hash a tree in a uid_decl_map.  */
> +
> +inline hashval_t
> +uid_decl_hasher::hash (const value_type *item)
> +{
> +  return item->decl_minimal.uid;
> +}
> +
> +/* Return true if the DECL_UID in both trees are equal.  */
> +
> +inline bool
> +uid_decl_hasher::equal (const value_type *a, const compare_type *b)
> +{
> +  return (a->decl_minimal.uid == b->decl_minimal.uid);
> +}
> +

ISTR other places where we hash decls.  I wonder if we shouldn't move
this to some common spot.  Maybe later.

The patch is OK.


Diego.


Re: [PATCH] DOT a function directly from a gdb session

2012-12-12 Thread Richard Biener
On Wed, 12 Dec 2012, Steven Bosscher wrote:

> On Wed, Dec 12, 2012 at 3:08 PM, Richard Biener wrote:
> > Well, it now uses dominators - so steven, I suppose "fancy" dumping
> > should be disabled whenever they are not already computed?
> 
> Uh, it does? I tried to avoid that (I used get_loop_body_in_bfs_order
> for this reason). Do you have a test case where it use dominators?

Ah, no - I mis-pattern-matched the backtrace.  I suppose for the
case of dumping from gdb a flag to tell the function "don't do
anything fancy" as fallback would work for me.

Richard.


PING: [PATCH Filter out -fsanitize=address if not in combined tree in lto-plugin

2012-12-12 Thread H.J. Lu
On Thu, Nov 29, 2012 at 9:38 AM, H.J. Lu  wrote:
> Hi,
>
> When GCC is configured with --with-build-config="bootstrap-asan", all
> -flto tests will fail since -fsanitize=address is used to compile
> liblto_plugin.so and linker isn't compiled with -fsanitize=address.
> This patch filters out -fsanitize=address from CFLAGS if we aren't
> in a combined tree with binutils.  OK to install?
>
>
> H.J.
> ---
> ---
>  lto-plugin/Makefile.am|  7 +++
>  lto-plugin/Makefile.in|  5 +
>  lto-plugin/configure  | 18 +++---
>  lto-plugin/configure.ac   |  4 
>  5 files changed, 39 insertions(+), 3 deletions(-)
>  create mode 100644 lto-plugin/ChangeLog.asan
>
> 2012-11-21  H.J. Lu  
>
> * Makefile.am (CFLAGS): Filter out -fsanitize=address if not
> in a combined tree of GCC and binutils.
> (LDFLAGS): Likewise.
> * configure.ac (COMBINED_TREE): New AM_CONDITIONAL.
> * Makefile.in: Regenerated.
> * configure: Likewise.
>

PING.


-- 
H.J.


PING: [PATCH] Filter out -fsanitize=address if not in combined tree in libiberty

2012-12-12 Thread H.J. Lu
On Thu, Nov 29, 2012 at 9:40 AM, H.J. Lu  wrote:
> Hi,
>
> When GCC is configured with --with-build-config="bootstrap-asan", all
> -flto tests will fail since -fsanitize=address is used to compile host
> libiberty, which is used to create liblto_plugin.so, and linker isn't
> compiled with -fsanitize=address.  This patch filters out
> -fsanitize=address from CFLAGS if we aren't in a combined tree with
> binutils.  OK to install?
>
> Thanks.
>
> H.J.
> ---
> 2012-11-21  H.J. Lu  
>
> * Makefile.in (CFLAGS): Filter out -fsanitize=address if in GCC
> tree, but not in a combined tree with binutils.
> * configure.ac (COMBINED_TREE_FALSE): New AC_SUBST.
> * configure: Regenerated.
>

PING.

-- 
H.J.


Re: [PATCH] DOT a function directly from a gdb session

2012-12-12 Thread Steven Bosscher
On Wed, Dec 12, 2012 at 3:08 PM, Richard Biener wrote:
> Well, it now uses dominators - so steven, I suppose "fancy" dumping
> should be disabled whenever they are not already computed?

Uh, it does? I tried to avoid that (I used get_loop_body_in_bfs_order
for this reason). Do you have a test case where it use dominators?

Ciao!
Steven


Re: [PATCH] DOT a function directly from a gdb session

2012-12-12 Thread Steven Bosscher
On Wed, Dec 12, 2012 at 2:53 PM, Marek Polacek wrote:
> On Wed, Dec 12, 2012 at 01:53:53PM +0100, Richard Biener wrote:
>>
>> This adds the function print_graph_cfg that you can call from a
>> gdb session and directly pipes a dot representation of the
>> function to 'dot -Tx11'.  The only change needed to the now very
>> good dumping code is splitting out the actual worker without
>> the FILE handling.
>
> Thanks, I like it.  Just a little correction: from gdb session we want
> probably call debug_dot_cfg (cfun).
> Also, usually I get an ICE, e.g.:
> (gdb) b cleanup_cfg
> Breakpoint 1 at 0xe7f260: file /home/marek/src/gcc/gcc/cfgcleanup.c, line 
> 2946.
> (gdb) r
> Starting program: /home/marek/rh/x/trunk/gcc/cc1plus -O2 -ftracer 
> -fno-tree-dce -fno-tree-sra x.c -quiet
>
> Breakpoint 1, cleanup_cfg (mode=16) at 
> /home/marek/src/gcc/gcc/cfgcleanup.c:2946
> 2946{
> (gdb) call debug_dot_cfg(cfun)
> Detaching after fork from child process 2458.
> x.c: In function ‘int main()’:
> x.c:9:1: internal compiler error: in pre_and_rev_post_order_compute, at 
> cfganal.c:869
>  }
>  ^
> 0x78fb4c pre_and_rev_post_order_compute(int*, int*, bool)
> /home/marek/src/gcc/gcc/cfganal.c:869
> 0xebceb2 print_graph_cfg_1
> /home/marek/src/gcc/gcc/graph.c:186
> 0xebd2a1 debug_dot_cfg(function*)
> /home/marek/src/gcc/gcc/graph.c:259
>
> When I can call debug_dot_cfg and when not?

There are several warts that need addressing for GCC 4.9. I'm aware of
the following bugs. There may be others, if you find any please let me
know.

1. dumping for debug_dot_cfg (struct function *fn), fn != cfun
This doesn't work with non-loop dumping because
pre_and_rev_post_order_compute only works on cfun. Much of the CFG
infrastructure hasn't been fully converted yet to work on any
arbitrary CFG. Historically, there would be only one CFG at a time,
but with cgraph we now maintain CFGs (and even loops) for functions
throughout the compilation process. I'm planning to fix most issues
for GCC 4.9.

2. dumping with unreachable blocks
The CFG dumpers actually handle this (since my latest patch to
graph.c) but pre_and_rev_post_order_compute does not. If you remove
the assert there, it will work. Again, for GCC 4.9 we'll have to look
into a better solution for this (e.g. only conditionally assert, on
demand).

3. dumping with loops but incorrect loop tree
If the loop tree isn't up to date, get_loop_body* will not work
properly and dumping the CFG will crash. I don't know of any means to
check the state of the loop tree and fall back to non-loop dumping.

4. dumping with some of the more exotic tree dumping flags
Not sure if all of them work, I haven't tested all combinations (vops,
stmtaddr, etc.).

Ciao!
Steven


Re: [PATCH] DOT a function directly from a gdb session

2012-12-12 Thread Richard Biener
On Wed, 12 Dec 2012, Marek Polacek wrote:

> On Wed, Dec 12, 2012 at 01:53:53PM +0100, Richard Biener wrote:
> > 
> > This adds the function print_graph_cfg that you can call from a
> > gdb session and directly pipes a dot representation of the
> > function to 'dot -Tx11'.  The only change needed to the now very
> > good dumping code is splitting out the actual worker without
> > the FILE handling.
> 
> Thanks, I like it.  Just a little correction: from gdb session we want
> probably call debug_dot_cfg (cfun).
> Also, usually I get an ICE, e.g.:
> (gdb) b cleanup_cfg
> Breakpoint 1 at 0xe7f260: file /home/marek/src/gcc/gcc/cfgcleanup.c, line 
> 2946.
> (gdb) r
> Starting program: /home/marek/rh/x/trunk/gcc/cc1plus -O2 -ftracer 
> -fno-tree-dce -fno-tree-sra x.c -quiet
> 
> Breakpoint 1, cleanup_cfg (mode=16) at 
> /home/marek/src/gcc/gcc/cfgcleanup.c:2946
> 2946  {
> (gdb) call debug_dot_cfg(cfun)
> Detaching after fork from child process 2458.
> x.c: In function ?int main()?:
> x.c:9:1: internal compiler error: in pre_and_rev_post_order_compute, at 
> cfganal.c:869
>  }
>  ^
> 0x78fb4c pre_and_rev_post_order_compute(int*, int*, bool)
>   /home/marek/src/gcc/gcc/cfganal.c:869
> 0xebceb2 print_graph_cfg_1
>   /home/marek/src/gcc/gcc/graph.c:186
> 0xebd2a1 debug_dot_cfg(function*)
>   /home/marek/src/gcc/gcc/graph.c:259
> 
> When I can call debug_dot_cfg and when not?

Well, it now uses dominators - so steven, I suppose "fancy" dumping
should be disabled whenever they are not already computed?

Richard.
k


[committed] PATCH: Sync config.sub with src

2012-12-12 Thread H.J. Lu
Hi,

I checked in this patch to sync config.sub with src.


H.J.
---
Index: ChangeLog
===
--- ChangeLog   (revision 194447)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2012-12-12  Jan-Benedict Glaw  
+
+   * config.sub: Merge from config repo.
+
 2012-12-11  H.J. Lu  
 
* Makefile.def (target_modules): Add bootstrap=true and
Index: config.sub
===
--- config.sub  (revision 194447)
+++ config.sub  (working copy)
@@ -4,7 +4,7 @@
 #   2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
 #   2011, 2012 Free Software Foundation, Inc.
 
-timestamp='2012-04-18'
+timestamp='2012-12-06'
 
 # This file is (in principle) common to ALL GNU software.
 # The presence of a machine in this file suggests that SOME GNU software
@@ -123,7 +123,7 @@ esac
 maybe_os=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\2/'`
 case $maybe_os in
   nto-qnx* | linux-gnu* | linux-android* | linux-dietlibc | linux-newlib* | \
-  linux-uclibc* | uclinux-uclibc* | uclinux-gnu* | kfreebsd*-gnu* | \
+  linux-musl* | linux-uclibc* | uclinux-uclibc* | uclinux-gnu* | 
kfreebsd*-gnu* | \
   knetbsd*-gnu* | netbsd*-gnu* | \
   kopensolaris*-gnu* | \
   storm-chaos* | os2-emx* | rtmk-nova*)
@@ -156,7 +156,7 @@ case $os in
-convergent* | -ncr* | -news | -32* | -3600* | -3100* | -hitachi* |\
-c[123]* | -convex* | -sun | -crds | -omron* | -dg | -ultra | -tti* | \
-harris | -dolphin | -highlevel | -gould | -cbm | -ns | -masscomp | \
-   -apple | -axis | -knuth | -cray | -microblaze)
+   -apple | -axis | -knuth | -cray | -microblaze*)
os=
basic_machine=$1
;;
@@ -259,8 +259,10 @@ case $basic_machine in
| alpha | alphaev[4-8] | alphaev56 | alphaev6[78] | alphapca5[67] \
| alpha64 | alpha64ev[4-8] | alpha64ev56 | alpha64ev6[78] | 
alpha64pca5[67] \
| am33_2.0 \
-   | arc | arm | arm[bl]e | arme[lb] | armv[2345] | armv[345][lb] | avr | 
avr32 \
-| be32 | be64 \
+   | arc \
+   | arm | arm[bl]e | arme[lb] | armv[2-8] | armv[3-8][lb] | armv7[arm] \
+   | avr | avr32 \
+   | be32 | be64 \
| bfin \
| c4x | clipper \
| d10v | d30v | dlx | dsp16xx \
@@ -273,7 +275,7 @@ case $basic_machine in
| le32 | le64 \
| lm32 \
| m32c | m32r | m32rle | m68000 | m68k | m88k \
-   | maxq | mb | microblaze | mcore | mep | metag \
+   | maxq | mb | microblaze | microblazeel | mcore | mep | metag \
| mips | mipsbe | mipseb | mipsel | mipsle \
| mips16 \
| mips64 | mips64el \
@@ -389,7 +391,8 @@ case $basic_machine in
| lm32-* \
| m32c-* | m32r-* | m32rle-* \
| m68000-* | m680[012346]0-* | m68360-* | m683?2-* | m68k-* \
-   | m88110-* | m88k-* | maxq-* | mcore-* | metag-* | microblaze-* \
+   | m88110-* | m88k-* | maxq-* | mcore-* | metag-* \
+   | microblaze-* | microblazeel-* \
| mips-* | mipsbe-* | mipseb-* | mipsel-* | mipsle-* \
| mips16-* \
| mips64-* | mips64el-* \
@@ -788,9 +791,13 @@ case $basic_machine in
basic_machine=ns32k-utek
os=-sysv
;;
-   microblaze)
+   microblaze*)
basic_machine=microblaze-xilinx
;;
+   mingw64)
+   basic_machine=x86_64-pc
+   os=-mingw64
+   ;;
mingw32)
basic_machine=i386-pc
os=-mingw32
@@ -1019,7 +1026,11 @@ case $basic_machine in
basic_machine=i586-unknown
os=-pw32
;;
-   rdos)
+   rdos | rdos64)
+   basic_machine=x86_64-pc
+   os=-rdos
+   ;;
+   rdos32)
basic_machine=i386-pc
os=-rdos
;;
@@ -1352,15 +1363,15 @@ case $os in
  | -nindy* | -vxsim* | -vxworks* | -ebmon* | -hms* | -mvs* \
  | -clix* | -riscos* | -uniplus* | -iris* | -rtu* | -xenix* \
  | -hiux* | -386bsd* | -knetbsd* | -mirbsd* | -netbsd* \
- | -openbsd* | -solidbsd* \
+ | -bitrig* | -openbsd* | -solidbsd* \
  | -ekkobsd* | -kfreebsd* | -freebsd* | -riscix* | -lynxos* \
  | -bosx* | -nextstep* | -cxux* | -aout* | -elf* | -oabi* \
  | -ptx* | -coff* | -ecoff* | -winnt* | -domain* | -vsta* \
  | -udi* | -eabi* | -lites* | -ieee* | -go32* | -aux* \
  | -chorusos* | -chorusrdb* | -cegcc* \
  | -cygwin* | -msys* | -pe* | -psos* | -moss* | -proelf* | -rtems* 
\
- | -mingw32* | -linux-gnu* | -linux-android* \
- | -linux-newlib* | -linux-uclibc* \
+ | -mingw32* | -mingw64* | -linux-gnu* | -linux-android* \
+ | -linux-newlib* | -linux-musl* | -linux-uclibc* \
  | -uxpv* | -beos*

Re: [PATCH] DOT a function directly from a gdb session

2012-12-12 Thread Marek Polacek
On Wed, Dec 12, 2012 at 01:53:53PM +0100, Richard Biener wrote:
> 
> This adds the function print_graph_cfg that you can call from a
> gdb session and directly pipes a dot representation of the
> function to 'dot -Tx11'.  The only change needed to the now very
> good dumping code is splitting out the actual worker without
> the FILE handling.

Thanks, I like it.  Just a little correction: from gdb session we want
probably call debug_dot_cfg (cfun).
Also, usually I get an ICE, e.g.:
(gdb) b cleanup_cfg
Breakpoint 1 at 0xe7f260: file /home/marek/src/gcc/gcc/cfgcleanup.c, line 2946.
(gdb) r
Starting program: /home/marek/rh/x/trunk/gcc/cc1plus -O2 -ftracer -fno-tree-dce 
-fno-tree-sra x.c -quiet

Breakpoint 1, cleanup_cfg (mode=16) at /home/marek/src/gcc/gcc/cfgcleanup.c:2946
2946{
(gdb) call debug_dot_cfg(cfun)
Detaching after fork from child process 2458.
x.c: In function ‘int main()’:
x.c:9:1: internal compiler error: in pre_and_rev_post_order_compute, at 
cfganal.c:869
 }
 ^
0x78fb4c pre_and_rev_post_order_compute(int*, int*, bool)
/home/marek/src/gcc/gcc/cfganal.c:869
0xebceb2 print_graph_cfg_1
/home/marek/src/gcc/gcc/graph.c:186
0xebd2a1 debug_dot_cfg(function*)
/home/marek/src/gcc/gcc/graph.c:259

When I can call debug_dot_cfg and when not?

Marek


Re: [PATCH] DOT a function directly from a gdb session

2012-12-12 Thread Steven Bosscher
On Wed, Dec 12, 2012 at 1:53 PM, Richard Biener wrote:
>
> This adds the function print_graph_cfg that you can call from a
> gdb session and directly pipes a dot representation of the
> function to 'dot -Tx11'.  The only change needed to the now very
> good dumping code is splitting out the actual worker without
> the FILE handling.
>
> Probably not suitable for trunk because I use popen/pclose/fileno
> which I don't know whether they are available on all host platforms.
> So - any taker to transform this into a gdb python macro for .gdbinit
> instead?

This needs a generic solution, there are other places where calling a
dot viewer from the debugger would be helpful:

graphite-poly.c:  system ("dotty /tmp/lst.dot &");
graphite-scop-detection.c:  x = system ("dotty /tmp/allscops.dot &");
graphite-scop-detection.c:x = system ("dotty /tmp/allscops.dot &");
tree-data-ref.c:  system ("dotty /tmp/rdg.dot &");

I'd like to add some solution for this in GCC 4.9, in part by creating
some abstraction for graph dumping and then adding a direct viewing
capability somehow (maybe even via libgraphviz).

(BTW I can recommend xdot as a "better dot -Tx11", see
http://code.google.com/p/jrfonseca/wiki/XDot. You'll want a recent
version, older copies don't render dashed edges correctly.)

Ciao!
Steven


Re: [PATCH] PR c++/53609 - Wrong argument deduction for pack expansion in argument pack

2012-12-12 Thread Dodji Seketeli
Jason Merrill  writes:


> I'd also like to move the scan and PACK_EXPANSION_EXTRA_ARGS code back
> out of the loop.

Like this?

Tested on x86_64-unknown-linux-gnu against trunk.

gcc/cp/

* pt.c (argument_pack_element_is_expansion_p)
(make_argument_pack_select, scan_parm_packs_and_args)
(gen_elem_of_pack_expansion_instantiation): New static functions.
(has_bare_parameter_packs): Factorized out of ...
(check_for_bare_parameter_packs): ... here.
(tsubst): When looking through an ARGUMENT_PACK_SELECT tree node,
look through the possibly resulting pack expansion as well.
(tsubst_pack_expansion): Now this function is clearly organized in
two parts: a part that maps each parameter pack with its matching
argument pack, and a part that walks that map to build the result
of substituting each element of the argument packs into the
parameter packs.  Use gen_elem_of_pack_expansion_instantiation for
the latter part.
(arg_from_parm_pack_p): Remove this for it's not used by
tsubst_pack_expansion anymore.

gcc/testsuite/

* g++.dg/cpp0x/variadic139.C: New test.
* g++.dg/cpp0x/variadic140.C: Likewise.
* g++.dg/cpp0x/variadic141.C: Likewise.
---
 gcc/cp/pt.c  | 485 ---
 gcc/testsuite/g++.dg/cpp0x/variadic139.C |  14 +
 gcc/testsuite/g++.dg/cpp0x/variadic140.C |  22 ++
 gcc/testsuite/g++.dg/cpp0x/variadic141.C |  22 ++
 4 files changed, 366 insertions(+), 177 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic139.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic140.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic141.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index ecb013e..221323c 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -201,7 +201,6 @@ static void append_type_to_template_for_access_check_1 
(tree, tree, tree,
 static tree listify (tree);
 static tree listify_autos (tree, tree);
 static tree template_parm_to_arg (tree t);
-static bool arg_from_parm_pack_p (tree, tree);
 static tree current_template_args (void);
 static tree tsubst_template_parm (tree, tree, tsubst_flags_t);
 static tree instantiate_alias_template (tree, tree, tsubst_flags_t);
@@ -3308,6 +3307,29 @@ make_pack_expansion (tree arg)
   return result;
 }
 
+/* Return NULL_TREE iff T contains *NO* unexpanded parameter packs.
+   Return the TREE_LIST of unexpanded parameter packs otherwise.  */
+
+static tree
+has_bare_parameter_packs (tree t)
+{
+  tree parameter_packs = NULL_TREE;
+  struct find_parameter_pack_data ppd;
+
+  if (!processing_template_decl || !t || t == error_mark_node)
+return NULL_TREE;
+
+  if (TREE_CODE (t) == TYPE_DECL)
+t = TREE_TYPE (t);
+
+  ppd.parameter_packs = ¶meter_packs;
+  ppd.visited = pointer_set_create ();
+  cp_walk_tree (&t, &find_parameter_packs_r, &ppd, ppd.visited);
+  pointer_set_destroy (ppd.visited);
+
+  return parameter_packs;
+}
+
 /* Checks T for any "bare" parameter packs, which have not yet been
expanded, and issues an error if any are found. This operation can
only be done on full expressions or types (e.g., an expression
@@ -3325,19 +3347,7 @@ make_pack_expansion (tree arg)
 bool 
 check_for_bare_parameter_packs (tree t)
 {
-  tree parameter_packs = NULL_TREE;
-  struct find_parameter_pack_data ppd;
-
-  if (!processing_template_decl || !t || t == error_mark_node)
-return false;
-
-  if (TREE_CODE (t) == TYPE_DECL)
-t = TREE_TYPE (t);
-
-  ppd.parameter_packs = ¶meter_packs;
-  ppd.visited = pointer_set_create ();
-  cp_walk_tree (&t, &find_parameter_packs_r, &ppd, ppd.visited);
-  pointer_set_destroy (ppd.visited);
+  tree parameter_packs = has_bare_parameter_packs (t);
 
   if (parameter_packs) 
 {
@@ -3812,42 +3822,6 @@ template_parm_to_arg (tree t)
   return t;
 }
 
-/* This function returns TRUE if PARM_PACK is a template parameter
-   pack and if ARG_PACK is what template_parm_to_arg returned when
-   passed PARM_PACK.  */
-
-static bool
-arg_from_parm_pack_p (tree arg_pack, tree parm_pack)
-{
-  /* For clarity in the comments below let's use the representation
- argument_pack' to denote an argument pack and its
- elements.
-
- In the 'if' block below, we want to detect cases where
- ARG_PACK is argument_pack.  I.e, we want to
- check if ARG_PACK is an argument pack which sole element is
- the expansion of PARM_PACK.  That argument pack is typically
- created by template_parm_to_arg when passed a parameter
- pack.  */
-
-  if (arg_pack
-  && TREE_VEC_LENGTH (ARGUMENT_PACK_ARGS (arg_pack)) == 1
-  && PACK_EXPANSION_P (TREE_VEC_ELT (ARGUMENT_PACK_ARGS (arg_pack), 0)))
-{
-  tree expansion = TREE_VEC_ELT (ARGUMENT_PACK_ARGS (arg_pack), 0);
-  tree pattern = PACK_EXPANSION_PATTERN (expansion);
-  if ((TYPE_P (pattern) && same_type_p (pattern, parm_pack))
- || (!TYPE_P (pattern) && cp_tr

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 1:55 PM, Uros Bizjak  wrote:
> On Wed, Dec 12, 2012 at 12:44 PM, Richard Biener
>  wrote:
>
>>> This fix is aimed to remove performance degradation introduced by new
>>> LRA phase that in fact is combining problem. Gcc combiner does
>>> propagation of memory load to if-then-else gimple that was splitted
>>> back by old reload phase. LRA does not perform such splitting. To
>>> avoid performance slowdown on important benchmark (this is true for
>>> all x86 targets) we decided to enhance 'ix86_legitimate_combined_insn'
>>> with a check on such propagation and consider such conditional
>>> instruction with memory operand as illegal one from performance point
>>> of view.
>>>
>>> The fix was bootstrapped and regtested for x86-64.
>>> Is it OK for 4.8 and mainline?
>>
>> Isn't it a win for -Os though?  Thus, optimize_insn_for_size ()?  It can
>> also increase register pressure, no?  So eventually this splitting should
>> be done post-reload only.  Not sure what appropriate machinery there is,
>> besides from mdreorg (or split itself).
>
> So, you are proposing to use peephole2 with (match_scratch)
> conditional temporary?

Yes, if that works.  (sounds backward to me having a peephole split one
insn into two ... ;))

Richard.

> Uros.


[PATCH] Fix PR55481

2012-12-12 Thread Richard Biener

This patch from Zdenek removes special code handling the forced
rewrite of the original BIV definition.  The code was broken,
so simply fallback to the general rewriting code which works fine.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
(the issue is latent on the branches).

Thanks Zdenek for the fix,
Richard.

2012-12-12  Zdenek Dvorak  

PR tree-optimization/55481
* tree-ssa-loop-ivopts.c (rewrite_use_nonlinear_expr): Fall
back to general rewriting if we cannot leave an original biv
definition alone.

* gcc.dg/torture/pr55481.c: New testcase.

Index: gcc/tree-ssa-loop-ivopts.c
===
*** gcc/tree-ssa-loop-ivopts.c  (revision 194438)
--- gcc/tree-ssa-loop-ivopts.c  (working copy)
*** rewrite_use_nonlinear_expr (struct ivopt
*** 6088,6122 
if (cand->pos == IP_ORIGINAL
&& cand->incremented_at == use->stmt)
  {
!   tree step, ctype, utype;
!   enum tree_code incr_code = PLUS_EXPR, old_code;
  
gcc_assert (is_gimple_assign (use->stmt));
gcc_assert (gimple_assign_lhs (use->stmt) == cand->var_after);
  
-   step = cand->iv->step;
-   ctype = TREE_TYPE (step);
-   utype = TREE_TYPE (cand->var_after);
-   if (TREE_CODE (step) == NEGATE_EXPR)
-   {
- incr_code = MINUS_EXPR;
- step = TREE_OPERAND (step, 0);
-   }
- 
/* Check whether we may leave the computation unchanged.
 This is the case only if it does not rely on other
 computations in the loop -- otherwise, the computation
 we rely upon may be removed in remove_unused_ivs,
 thus leading to ICE.  */
!   old_code = gimple_assign_rhs_code (use->stmt);
!   if (old_code == PLUS_EXPR
! || old_code == MINUS_EXPR
! || old_code == POINTER_PLUS_EXPR)
{
  if (gimple_assign_rhs1 (use->stmt) == cand->var_before)
op = gimple_assign_rhs2 (use->stmt);
! else if (old_code != MINUS_EXPR
!  && gimple_assign_rhs2 (use->stmt) == cand->var_before)
op = gimple_assign_rhs1 (use->stmt);
  else
op = NULL_TREE;
--- 6088,6111 
if (cand->pos == IP_ORIGINAL
&& cand->incremented_at == use->stmt)
  {
!   enum tree_code stmt_code;
  
gcc_assert (is_gimple_assign (use->stmt));
gcc_assert (gimple_assign_lhs (use->stmt) == cand->var_after);
  
/* Check whether we may leave the computation unchanged.
 This is the case only if it does not rely on other
 computations in the loop -- otherwise, the computation
 we rely upon may be removed in remove_unused_ivs,
 thus leading to ICE.  */
!   stmt_code = gimple_assign_rhs_code (use->stmt);
!   if (stmt_code == PLUS_EXPR
! || stmt_code == MINUS_EXPR
! || stmt_code == POINTER_PLUS_EXPR)
{
  if (gimple_assign_rhs1 (use->stmt) == cand->var_before)
op = gimple_assign_rhs2 (use->stmt);
! else if (gimple_assign_rhs2 (use->stmt) == cand->var_before)
op = gimple_assign_rhs1 (use->stmt);
  else
op = NULL_TREE;
*** rewrite_use_nonlinear_expr (struct ivopt
*** 6124,6147 
else
op = NULL_TREE;
  
!   if (op
! && (TREE_CODE (op) == INTEGER_CST
! || operand_equal_p (op, step, 0)))
return;
- 
-   /* Otherwise, add the necessary computations to express
-the iv.  */
-   op = fold_convert (ctype, cand->var_before);
-   comp = fold_convert (utype,
-  build2 (incr_code, ctype, op,
-  unshare_expr (step)));
- }
-   else
- {
-   comp = get_computation (data->current_loop, use, cand);
-   gcc_assert (comp != NULL_TREE);
  }
  
switch (gimple_code (use->stmt))
  {
  case GIMPLE_PHI:
--- 6113,6125 
else
op = NULL_TREE;
  
!   if (op && expr_invariant_in_loop_p (data->current_loop, op))
return;
  }
  
+   comp = get_computation (data->current_loop, use, cand);
+   gcc_assert (comp != NULL_TREE);
+ 
switch (gimple_code (use->stmt))
  {
  case GIMPLE_PHI:
Index: gcc/testsuite/gcc.dg/torture/pr55481.c
===
*** gcc/testsuite/gcc.dg/torture/pr55481.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr55481.c  (revision 0)
***
*** 0 
--- 1,16 
+ /* { dg-do run } */
+ 
+ int main()
+ {
+   signed char result = 0;
+   int n;
+   for (n = 0; n < 13; ++n)
+ {
+   int tem = result;
+   tem = tem + 31;
+   result = tem;
+ }
+   if (result != -109)
+ __builtin_abort ();
+   return 0;
+ }


Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Uros Bizjak
On Wed, Dec 12, 2012 at 12:44 PM, Richard Biener
 wrote:

>> This fix is aimed to remove performance degradation introduced by new
>> LRA phase that in fact is combining problem. Gcc combiner does
>> propagation of memory load to if-then-else gimple that was splitted
>> back by old reload phase. LRA does not perform such splitting. To
>> avoid performance slowdown on important benchmark (this is true for
>> all x86 targets) we decided to enhance 'ix86_legitimate_combined_insn'
>> with a check on such propagation and consider such conditional
>> instruction with memory operand as illegal one from performance point
>> of view.
>>
>> The fix was bootstrapped and regtested for x86-64.
>> Is it OK for 4.8 and mainline?
>
> Isn't it a win for -Os though?  Thus, optimize_insn_for_size ()?  It can
> also increase register pressure, no?  So eventually this splitting should
> be done post-reload only.  Not sure what appropriate machinery there is,
> besides from mdreorg (or split itself).

So, you are proposing to use peephole2 with (match_scratch)
conditional temporary?

Uros.


[PATCH] DOT a function directly from a gdb session

2012-12-12 Thread Richard Biener

This adds the function print_graph_cfg that you can call from a
gdb session and directly pipes a dot representation of the
function to 'dot -Tx11'.  The only change needed to the now very
good dumping code is splitting out the actual worker without
the FILE handling.

Probably not suitable for trunk because I use popen/pclose/fileno
which I don't know whether they are available on all host platforms.
So - any taker to transform this into a gdb python macro for .gdbinit
instead?

Richard.

2012-12-12  Richard Biener  

* graph.c (print_graph_cfg_1): New function split out from
print_graph_cfg.
(debug_dot_cfg): New function.

Index: gcc/graph.c
===
*** gcc/graph.c (revision 194438)
--- gcc/graph.c (working copy)
*** draw_cfg_node_succ_edges (pretty_printer
*** 165,176 
  
  /* Print a graphical representation of the CFG of function FUN.  */
  
! void
! print_graph_cfg (const char *base, struct function *fun)
  {
const char *funcname = function_name (fun);
int funcdef_no = fun->funcdef_no;
-   FILE *fp = open_graph_file (base, "a");
int *rpo = XNEWVEC (int, n_basic_blocks);
basic_block bb;
int i, n;
--- 165,175 
  
  /* Print a graphical representation of the CFG of function FUN.  */
  
! static void
! print_graph_cfg_1 (FILE *fp, struct function *fun)
  {
const char *funcname = function_name (fun);
int funcdef_no = fun->funcdef_no;
int *rpo = XNEWVEC (int, n_basic_blocks);
basic_block bb;
int i, n;
*** print_graph_cfg (const char *base, struc
*** 202,207 
--- 201,215 
  
pp_printf (pp, "\t}\n");
pp_flush (pp);
+ }
+ 
+ /* Print a graphical representation of the CFG of function FUN.  */
+ 
+ void
+ print_graph_cfg (const char *base, struct function *fun)
+ {
+   FILE *fp = open_graph_file (base, "a");
+   print_graph_cfg_1 (fp, fun);
fclose (fp);
  }
  
*** finish_graph_dump_file (const char *base
*** 239,241 
--- 247,265 
end_graph_dump (fp);
fclose (fp);
  }
+ 
+ /* Debug the CFG of the function FN using the dot command.  */
+ void DEBUG_FUNCTION
+ debug_dot_cfg (struct function *fn)
+ {
+   FILE *fp = popen("dot -Tx11", "w");
+   if (!fp)
+ return;
+   fputs ("digraph \"\" { overlap=false;\n", fp);
+   print_graph_cfg_1 (fp, fn);
+   fputs ("}\n", fp);
+   fflush (fp);
+   /* ???  Close the pipe fd, that avoids waiting for the child in pclose.  */
+   close (fileno (fp));
+   pclose (fp);
+ }


Re: [patch] cfglayout fixes

2012-12-12 Thread Steven Bosscher
On Wed, Dec 12, 2012 at 1:17 PM, Steven Bosscher wrote:
> I'll resubmit for GCC 4.9. Instead of a test case, I have a verifier
> queued that asserts no BARRIERs appear in the insns stream when in
> cfglayout mode. (This verifier currently breaks hot-cold partitioning
> all over the place due to a bug in try_redirect_by_replacing_jump.)

BTW, should there be a meta-bug for patches for GCC 4.9?

Ciao!
Steven


Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Yuri Rumyantsev
Hi Richard,

I assume that this fix does not affect on code size since such pattern
happens very rare although I can add a check on it if you insist.
Register pressure is not a issue here since I assume that additional
fill won't affect on performance as cmove with memory operand. I
decided to not change machine description of cmov (prohibit memory
operand0 since it is very risky but only did a simple change to not
produce such instruction during forward substitution (aka combine)
phase.

Best regards.
Yuri.


2012/12/12 Richard Biener :
> On Wed, Dec 12, 2012 at 12:47 PM, Yuri Rumyantsev  wrote:
>> Hi Uros,
>>
>> This fix is for all x86 platforms, we tested it on core2/corei7,
>> atom/atom2 and AMD and got performance improvement +6% -- +11%. So I
>> don' think we need to introduce additioanl tune feature.
>>
>> Sorry for my typo with gcc version - I ment mainline only since 4.7
>> does not use LRA.
>
> Btw, if it's a performance improvement all over the board why not adjust
> the patterns itself to not allow memory operands?  Thus, why restrict it
> to combine not creating this instruction?  (My -Os comment still stands)
>
> Richard.
>
>> Thanks.
>> Yuri.
>>
>>
>>
>> 2012/12/12 Uros Bizjak :
>>> On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev  
>>> wrote:
>>>
 This fix is aimed to remove performance degradation introduced by new
 LRA phase that in fact is combining problem. Gcc combiner does
 propagation of memory load to if-then-else gimple that was splitted
 back by old reload phase. LRA does not perform such splitting. To
 avoid performance slowdown on important benchmark (this is true for
 all x86 targets) we decided to enhance 'ix86_legitimate_combined_insn'
 with a check on such propagation and consider such conditional
 instruction with memory operand as illegal one from performance point
 of view.
>>>
>>> Is this true for all x86 targets? I have no objections to the
>>> implementation, but these fine-tunings should be declared in
>>> ix86_tune_features[] array, and used as conditions involving
>>> TARGET_xxx in the code. Please see many examples in the i386 source
>>> dir.
>>>
 Is it OK for 4.8 and mainline?
>>>
>>> Hm, currently 4.8 _is_ mainline. Did you mean 4.7?
>>>
>>> Thanks,
>>> Uros.


Re: [patch] cluster loop bodies in CFG graph dumping

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 1:08 PM, Steven Bosscher  wrote:
> On Wed, Dec 12, 2012 at 11:46 AM, Richard Biener wrote:
>> On Tue, Dec 11, 2012 at 8:57 PM, Steven Bosscher wrote:
>> I'm not sure we want to give up this:
>>
>> @@ -863,12 +863,13 @@ get_loop_body_in_dom_order (const struct loop *loo
>>basic_block *tovisit;
>>int tv;
>>
>> +  if (loop->latch == EXIT_BLOCK_PTR)
>> +return get_loop_body (loop);
>> +
>>gcc_assert (loop->num_nodes);
>>
>>tovisit = XNEWVEC (basic_block, loop->num_nodes);
>>
>> -  gcc_assert (loop->latch != EXIT_BLOCK_PTR);
>> -
>>
>> assert.  It makes sure nobody treats the root loop node as real loop ...
>
> Well, as you can see, get_loop_body does handle this case already. So
> if you use plain get_loop_body, you get the body of the root node
> returned.

Yes, but with your "fix" get_loop_body_in_dom_order would actually
_not_ return the body in DOM order.  I wonder what it does with
a noreturn call inside a inner loop body.  Without fake exit/loop edges
things are going to break anyway I guess (so for 4.9 eventually just
remove the assert and require fake edges setup?)

>> So, can you special-case this in your dumping routines instead?
>
> Sure.

Thanks,
Richard.

>
>> That makes it a dump-only patch and thus ok.  (indeed much nicer to look
>> at loop preserve issues this way!)
>
> Ciao!
> Steven


Re: [patch] cfglayout fixes

2012-12-12 Thread Steven Bosscher
On Wed, Dec 12, 2012 at 11:48 AM, Jakub Jelinek wrote:
>> The fixup_reorder_chain patch fixes and ICE with hot-cold partitioning
>> and GCAC checking. A BARRIER is removed somewhere along the line, but
>> the BB_FOOTER pointer still points to it. I'm not sure how things go
>> bad from there, but we end up with BB_FOOTER pointing to a collected
>> barrier.
>
> Then a testcase should be added together with the patch and the change
> shouldn't be part of further unrelated changes.

You're kidding, right? A test case for a GCAC failure with profiling?
I wouldn't even know how to write a test case for it :-)

I'll resubmit for GCC 4.9. Instead of a test case, I have a verifier
queued that asserts no BARRIERs appear in the insns stream when in
cfglayout mode. (This verifier currently breaks hot-cold partitioning
all over the place due to a bug in try_redirect_by_replacing_jump.)

>> The combine fix is a fix for a real bug. Your patch makes combine look
>> across basic block boundaries and look for BARRIERs which can never be
>> there. Technically this is a regression.
>
> I don't see how.  If there is no barrier, but the prev note or insn doesn't
> belong to the current bb, then BLOCK_FOR_INSN (last_combined_insn) != 
> this_basic_block
> will be true, and if there is no insn at all, last_combined_insn will be
> NULL.  I view your combine.c change purely as a cleanup, thus IMHO stage 1
> material.

Whatever, also fine. I assume you'll take care of this, given that you
introduced this bug?

Ciao!
Steven


Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 12:47 PM, Yuri Rumyantsev  wrote:
> Hi Uros,
>
> This fix is for all x86 platforms, we tested it on core2/corei7,
> atom/atom2 and AMD and got performance improvement +6% -- +11%. So I
> don' think we need to introduce additioanl tune feature.
>
> Sorry for my typo with gcc version - I ment mainline only since 4.7
> does not use LRA.

Btw, if it's a performance improvement all over the board why not adjust
the patterns itself to not allow memory operands?  Thus, why restrict it
to combine not creating this instruction?  (My -Os comment still stands)

Richard.

> Thanks.
> Yuri.
>
>
>
> 2012/12/12 Uros Bizjak :
>> On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev  wrote:
>>
>>> This fix is aimed to remove performance degradation introduced by new
>>> LRA phase that in fact is combining problem. Gcc combiner does
>>> propagation of memory load to if-then-else gimple that was splitted
>>> back by old reload phase. LRA does not perform such splitting. To
>>> avoid performance slowdown on important benchmark (this is true for
>>> all x86 targets) we decided to enhance 'ix86_legitimate_combined_insn'
>>> with a check on such propagation and consider such conditional
>>> instruction with memory operand as illegal one from performance point
>>> of view.
>>
>> Is this true for all x86 targets? I have no objections to the
>> implementation, but these fine-tunings should be declared in
>> ix86_tune_features[] array, and used as conditions involving
>> TARGET_xxx in the code. Please see many examples in the i386 source
>> dir.
>>
>>> Is it OK for 4.8 and mainline?
>>
>> Hm, currently 4.8 _is_ mainline. Did you mean 4.7?
>>
>> Thanks,
>> Uros.


Re: [patch] cluster loop bodies in CFG graph dumping

2012-12-12 Thread Steven Bosscher
On Wed, Dec 12, 2012 at 11:46 AM, Richard Biener wrote:
> On Tue, Dec 11, 2012 at 8:57 PM, Steven Bosscher wrote:
> I'm not sure we want to give up this:
>
> @@ -863,12 +863,13 @@ get_loop_body_in_dom_order (const struct loop *loo
>basic_block *tovisit;
>int tv;
>
> +  if (loop->latch == EXIT_BLOCK_PTR)
> +return get_loop_body (loop);
> +
>gcc_assert (loop->num_nodes);
>
>tovisit = XNEWVEC (basic_block, loop->num_nodes);
>
> -  gcc_assert (loop->latch != EXIT_BLOCK_PTR);
> -
>
> assert.  It makes sure nobody treats the root loop node as real loop ...

Well, as you can see, get_loop_body does handle this case already. So
if you use plain get_loop_body, you get the body of the root node
returned.

> So, can you special-case this in your dumping routines instead?

Sure.


> That makes it a dump-only patch and thus ok.  (indeed much nicer to look
> at loop preserve issues this way!)

Ciao!
Steven


Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Yuri Rumyantsev
Hi Uros,

This fix is for all x86 platforms, we tested it on core2/corei7,
atom/atom2 and AMD and got performance improvement +6% -- +11%. So I
don' think we need to introduce additioanl tune feature.

Sorry for my typo with gcc version - I ment mainline only since 4.7
does not use LRA.

Thanks.
Yuri.



2012/12/12 Uros Bizjak :
> On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev  wrote:
>
>> This fix is aimed to remove performance degradation introduced by new
>> LRA phase that in fact is combining problem. Gcc combiner does
>> propagation of memory load to if-then-else gimple that was splitted
>> back by old reload phase. LRA does not perform such splitting. To
>> avoid performance slowdown on important benchmark (this is true for
>> all x86 targets) we decided to enhance 'ix86_legitimate_combined_insn'
>> with a check on such propagation and consider such conditional
>> instruction with memory operand as illegal one from performance point
>> of view.
>
> Is this true for all x86 targets? I have no objections to the
> implementation, but these fine-tunings should be declared in
> ix86_tune_features[] array, and used as conditions involving
> TARGET_xxx in the code. Please see many examples in the i386 source
> dir.
>
>> Is it OK for 4.8 and mainline?
>
> Hm, currently 4.8 _is_ mainline. Did you mean 4.7?
>
> Thanks,
> Uros.


Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> This fix is aimed to remove performance degradation introduced by new
> LRA phase that in fact is combining problem. Gcc combiner does
> propagation of memory load to if-then-else gimple that was splitted
> back by old reload phase. LRA does not perform such splitting. To
> avoid performance slowdown on important benchmark (this is true for
> all x86 targets) we decided to enhance 'ix86_legitimate_combined_insn'
> with a check on such propagation and consider such conditional
> instruction with memory operand as illegal one from performance point
> of view.
>
> The fix was bootstrapped and regtested for x86-64.
> Is it OK for 4.8 and mainline?

Isn't it a win for -Os though?  Thus, optimize_insn_for_size ()?  It can
also increase register pressure, no?  So eventually this splitting should
be done post-reload only.  Not sure what appropriate machinery there is,
besides from mdreorg (or split itself).

Richard.

> ChangeLog:
>
> 2012-12-12  Yuri Rumyantsev  
>
> * config/i386/i386.c (ix86_legitimate_combined_insn) : Avoid combining
> of load and if_then_else instructions.


Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Uros Bizjak
On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev  wrote:

> This fix is aimed to remove performance degradation introduced by new
> LRA phase that in fact is combining problem. Gcc combiner does
> propagation of memory load to if-then-else gimple that was splitted
> back by old reload phase. LRA does not perform such splitting. To
> avoid performance slowdown on important benchmark (this is true for
> all x86 targets) we decided to enhance 'ix86_legitimate_combined_insn'
> with a check on such propagation and consider such conditional
> instruction with memory operand as illegal one from performance point
> of view.

Is this true for all x86 targets? I have no objections to the
implementation, but these fine-tunings should be declared in
ix86_tune_features[] array, and used as conditions involving
TARGET_xxx in the code. Please see many examples in the i386 source
dir.

> Is it OK for 4.8 and mainline?

Hm, currently 4.8 _is_ mainline. Did you mean 4.7?

Thanks,
Uros.


[PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Yuri Rumyantsev
Hi All,

This fix is aimed to remove performance degradation introduced by new
LRA phase that in fact is combining problem. Gcc combiner does
propagation of memory load to if-then-else gimple that was splitted
back by old reload phase. LRA does not perform such splitting. To
avoid performance slowdown on important benchmark (this is true for
all x86 targets) we decided to enhance 'ix86_legitimate_combined_insn'
with a check on such propagation and consider such conditional
instruction with memory operand as illegal one from performance point
of view.

The fix was bootstrapped and regtested for x86-64.
Is it OK for 4.8 and mainline?

ChangeLog:

2012-12-12  Yuri Rumyantsev  

* config/i386/i386.c (ix86_legitimate_combined_insn) : Avoid combining
of load and if_then_else instructions.


x86-cmove-combine-fix.diff
Description: Binary data


Re: PATCH: pass sysroot to cc1plus

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 12:59 AM, Etienne Le Sueur  wrote:
> Hi Richard,
>
> Thanks for your reply.
>
> The sysroot of /dev/null is basically to force the user or build system to
> pass a valid --sysroot argument. This helps us to ensure that we only link
> against known libraries (that are in a specific location) and there isn't
> any leakage from the host system.
>
> I'm not sure if we're using sysroot the way it was intended, but it is
> certainly helping us to maintain some sanity in our build.
>
> It seems to me like the sysroot that the user passes in as an argument to
> gcc should definitely be forwarded to cc1plus.

Currently --sysroot is only accepted by the driver and passed on to the
compiler as -isysroot, -isystem combination.  So I suppose distcc does
not replicate this behavior properly?

Richard.

> Etienne
>
>
>
> On 4/12/12 1:21 AM, Richard Biener wrote:
>>
>> On Mon, Dec 3, 2012 at 9:03 PM, Etienne Le Sueur 
>> wrote:
>>>
>>> First ping... anyone?
>>
>> A sysroot of /dev/null does not sound like something that we should
>> support.
>> If we do the semantics of this setting should be documented somewhere.
>>
>> Richard.
>>
>>> On 28/11/12 1:21 PM, Etienne Le Sueur wrote:

 Hi,

 With a sysroot of /dev/null, passing a .i file to cc1plus causes it to
 attempt to open /dev/null/usr/include, which fails. This causes problems
 for
 ccache and distcc. There is an open bugzilla ticket at [1].

 The patch below applies on to 4.6.3, but it appears the bug is still
 present in 4.7.2.

 If this is not the correct way to solve this problem, please suggest a
 better approach.

 Regards,
 Etienne Le Sueur

 [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54560


 diff --git a/gcc-4.6.3/gcc/cp/lang-specs.h
 b/gcc-4.6.3/gcc/cp/lang-specs.h
 index a73aba3..873609a 100644
 --- a/gcc-4.6.3/gcc/cp/lang-specs.h
 +++ b/gcc-4.6.3/gcc/cp/lang-specs.h
 @@ -64,5 +64,5 @@ along with GCC; see the file COPYING3.  If not see
 {".ii", "@c++-cpp-output", 0, 0, 0},
 {"@c++-cpp-output",
  "%{!M:%{!MM:%{!E:\
 -cc1plus -fpreprocessed %i %(cc1_options) %2\
 +cc1plus -fpreprocessed %i %I %(cc1_options) %2\
   %{!fsyntax-only:%(invoke_as)", 0, 0, 0},
 diff --git a/gcc-4.6.3/gcc/gcc.c b/gcc-4.6.3/gcc/gcc.c
 index 75f522e..214ef29 100644
 --- a/gcc-4.6.3/gcc/gcc.c
 +++ b/gcc-4.6.3/gcc/gcc.c
 @@ -950,7 +950,7 @@ static const struct compiler default_compilers[] =
   %W{o*:--output-pch=%*}}%V}}", 0, 0, 0},
 {".i", "@cpp-output", 0, 0, 0},
 {"@cpp-output",
 -   "%{!M:%{!MM:%{!E:cc1 -fpreprocessed %i %(cc1_options)
 %{!fsyntax-only:%(invoke_as)", 0, 0, 0},
 +   "%{!M:%{!MM:%{!E:cc1 -fpreprocessed %i %I %(cc1_options)
 %{!fsyntax-only:%(invoke_as)", 0, 0, 0},
 {".s", "@assembler", 0, 0, 0},
 {"@assembler",
  "%{!M:%{!MM:%{!E:%{!S:as %(asm_debug) %(asm_options) %i %A ",
 0,
 0, 0},

>


  1   2   >