Re: [Patch, Fortran, OOP] PR 59493: ICE: Segfault on Class(*) pointer association

2013-12-15 Thread Tobias Burnus

Hi Janus,

Janus Weil wrote:

> here is a rather simple fix for a problem with the pointer assignment
> of an unlimited polymorphic variable. The patch regtests cleanly on
> x86_64-unknown-linux-gnu.
>
> Firstly, I would like to commit to trunk, of course. Ok?
>
> Secondly, the bug reporter asked me (privately) for the possibility of
> backporting the fix to 4.8 (which is the only release supporting
> unlimited polymorphism so far). It is important for his code at
> http://sourceforge.net/projects/permix/.


OK. Thanks for the patch!

Tobias


Re: wide-int more performance fixes for wide multiplication.

2013-12-15 Thread Richard Sandiford
Kenneth Zadeck  writes:
>>> The current world
>>> is actually structured so that we never ask about overflow for the two
>>> larger classes because the reason that you used those classes was that
>>> you never wanted to have this discussion. So if you never ask about
>>> overflow, then it really does not matter because we are not going to
>>> return enough bits for you to care what happened on the inside.  Of
>>> course that could change and someone could say that they wanted overflow
>>> on widest-int.   Then the comment makes sense, with revisions, unless
>>> your review of the code that wants overflow on widest int suggests that
>>> they are just being stupid.
>> But widest_int is now supposed to be at least 1 bit wider than widest
>> input type (unlike previously where it was double the widest input type).
>> So I can definitely see cases where we'd want to know whether a
>> widest_int * widest_int result overflows.
>>
>> My point is that the widest_int * widest_int would normally be a signed
>> multiplication rather than an unsigned multiplication, since the extra
>> 1 bit of precision allows every operation to be signed.  So it isn't
>> a case of whether the top bit of a widest_int will be set, but whether
>> we ever reach here for widest_int in the first place.  We should be
>> going down the sgn == SIGNED path rather than the sgn == UNSIGNED path.
>>
>> widest_int can represent an all-1s value, usually interpreted as -1.
>> If we do go down this sgn == UNSIGNED path for widest_int then we will
>> instead treat the all-1s value as the maximum unsigned number, just like
>> for any other kind of wide int.
>>
>> As far as this function goes there really is no difference between
>> wide_int, offset_int and widest_int.  Which is good, because offset_int
>> and widest_int should just be wide_ints that are optimised for a specific
>> and fixed precision.
>>
>> Thanks,
>> Richard
> I am now seriously regretting letting richi talk me into changing the 
> size of the wide int buffer from being 2x of the largest mode on the 
> machine.   It was a terrible mistake AND i would guess making it smaller 
> does not provide any real benefit.
>
> The problem is that when you use widest-int (and by analogy offset int) 
> it should NEVER EVER overflow.  Furthermore we need to change the 
> interfaces for these two so that you cannot even ask!!(i do not 
> believe that anyone does ask so the change would be small.)

offset_int * offset_int could overflow too, at least in the sense that
there are combinations of valid offset_ints whose product can't be
represented in an offset_int.  E.g. (1ULL << 67) * (1ULL << 67).
I think that was always the case.
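To make the signed/unsigned distinction and the product-overflow point concrete, here is a toy model at a small precision. It is a sketch only, not GCC's wi::mul: toy_mul, toy_value and enum sign_kind are hypothetical names, and the precision is limited to at most 31 bits so that an int64_t can hold every exact product. The function returns the exact product together with an overflow flag. The all-ones pattern reads as -1 when signed (so squaring it cannot overflow) but as the maximum value when unsigned (so squaring it does), and two operands of large enough magnitude give a product that no longer fits, which is the scaled-down analogue of (1ULL << 67) * (1ULL << 67) in an offset_int (assumed here to be 128 bits wide).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only, not GCC's wi::mul.  PREC is assumed to lie in 1..31,
   so every exact product of two PREC-bit values fits in int64_t.  */
enum sign_kind { TOY_SIGNED, TOY_UNSIGNED };

/* Read the low PREC bits of BITS as a signed or an unsigned value.  */
static int64_t
toy_value (uint32_t bits, unsigned prec, enum sign_kind sgn)
{
  uint64_t v = bits & ((UINT64_C (1) << prec) - 1);
  if (sgn == TOY_SIGNED && ((v >> (prec - 1)) & 1))
    return (int64_t) v - ((int64_t) 1 << prec);   /* sign-extend */
  return (int64_t) v;
}

/* Multiply two PREC-bit values and report in *OVERFLOW whether the exact
   product is representable in PREC bits with the given signedness.  */
static int64_t
toy_mul (uint32_t a, uint32_t b, unsigned prec, enum sign_kind sgn,
         bool *overflow)
{
  int64_t x = toy_value (a, prec, sgn);
  int64_t y = toy_value (b, prec, sgn);
  int64_t product = x * y;           /* exact, since |x|, |y| < 2^31 */
  int64_t lo = sgn == TOY_SIGNED ? -((int64_t) 1 << (prec - 1)) : 0;
  int64_t hi = sgn == TOY_SIGNED ? ((int64_t) 1 << (prec - 1)) - 1
                                 : ((int64_t) 1 << prec) - 1;
  *overflow = product < lo || product > hi;
  return product;
}
```

For example, with a precision of 8, toy_mul (0xFF, 0xFF, 8, TOY_SIGNED, &ov) returns 1 with ov false, while the TOY_UNSIGNED call returns 65025 with ov true.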

> There are a huge set of bugs on the trunk that are "fixed" with wide-int 
> because people wrote code for double-int thinking that it was infinite 
> precision. So they never tested the cases of what happens when the
> size of the variable needed two HWIs.   Most of those cases were 
> resolved by making passes like tree-vrp use wide-int and then being 
> explicit about the overflow on every operation, because with wide-int 
> the issue is in your face since things overflow even for 32 bit 
> numbers.  However, with the current widest-int, we will only be safe for 
> add and subtract by adding the extra bit.  In multiply we are exposed.   
> The perception is that widest-int is as good as infinite precision and no
> one will ever write the code to check if it overflowed because it only 
> rarely happens.

All operations can overflow.  We would need 2 extra bits rather than 1
extra bit to stop addition overflowing, because the 1 extra bit we already
have is to allow unsigned values to be treated as signed.  But 2 extra bits
is only good for one addition, not a chain of two additions.

That's why ignoring overflow seems dangerous to me.  The old wide-int
way might have allowed any x * y to be represented, but if nothing
checked whether x * y was bigger than expected then x * y + z could
overflow.
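The bit counting in the paragraph above can be checked with a small helper. This sketch (fits_signed and min_signed_bits are hypothetical names) takes unsigned P-bit inputs and asks how many two's-complement bits a result needs: one extra bit to hold the largest input as a signed value, a second extra bit for one addition, and a third for a chain of two additions. The last check shows a product needing roughly twice the input width.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if V fits in a two's-complement field of BITS bits
   (BITS assumed to be in 1..62).  */
static bool
fits_signed (int64_t v, unsigned bits)
{
  int64_t lo = -((int64_t) 1 << (bits - 1));
  int64_t hi = ((int64_t) 1 << (bits - 1)) - 1;
  return v >= lo && v <= hi;
}

/* Smallest two's-complement width that can hold V (|V| assumed < 2^61).  */
static unsigned
min_signed_bits (int64_t v)
{
  unsigned bits = 1;
  while (!fits_signed (v, bits))
    bits++;
  return bits;
}
```

With P = 8 the largest input is 255: min_signed_bits (255) is 9, min_signed_bits (255 + 255) is 10, min_signed_bits (3 * 255) is 11, and min_signed_bits (255 * 255) is 17.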

Thanks,
Richard




Re: wide-int more performance fixes for wide multiplication.

2013-12-15 Thread Richard Sandiford
Kenneth Zadeck  writes:
>>> +  vallen = canonize (val, (uvlen + 1) >> 1, prec);
>>> +
>>> +  /* Shift is not always safe to write over one of the
>>> +operands, so we must copy.  */
>>> +  HOST_WIDE_INT tval[2 * WIDE_INT_MAX_ELTS];
>>> +  memcpy (tval, val, vallen * CHAR_BIT / HOST_BITS_PER_WIDE_INT);
>
>
>> vallen * sizeof (HOST_WIDE_INT) would be more typical.
>> But why not unpack into tval directly and avoid the copy?
> I could special case this, but the old code was not correct for odd 
> precisions.
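The remark about the memcpy length can be checked arithmetically. memcpy takes a byte count, and with CHAR_BIT == 8 and a 64-bit HOST_WIDE_INT, vallen * CHAR_BIT / HOST_BITS_PER_WIDE_INT evaluates to vallen / 8 (rounded down), so the quoted line copies far fewer bytes than vallen elements occupy. The sketch uses hypothetical stand-ins (hwi_t, HWI_BITS) rather than GCC's own macros and assumes the common 64-bit case.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, assuming a 64-bit HOST_WIDE_INT.  */
typedef int64_t hwi_t;
#define HWI_BITS 64

/* Length expression from the quoted patch.  */
static size_t
patch_bytes (size_t vallen)
{
  return vallen * CHAR_BIT / HWI_BITS;   /* == vallen / 8 when CHAR_BIT == 8 */
}

/* Conventional byte count for VALLEN elements.  */
static size_t
typical_bytes (size_t vallen)
{
  return vallen * sizeof (hwi_t);
}

/* Copy VALLEN elements from SRC to DST.  */
static void
copy_elts (hwi_t *dst, const hwi_t *src, size_t vallen)
{
  memcpy (dst, src, typical_bytes (vallen));
}
```

For two elements, patch_bytes gives 0 bytes while typical_bytes gives 16.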

It's not really special-casing, since the pack is already local to this block.
I.e. the patch had:

  wi_pack ((unsigned HOST_WIDE_INT *) val,
   r, uvlen);
  vallen = canonize (val, (uvlen + 1) >> 1, prec);

  /* Shift is not always safe to write over one of the
 operands, so we must copy.  */ 
  HOST_WIDE_INT tval[2 * WIDE_INT_MAX_ELTS];
  memcpy (tval, val, vallen * CHAR_BIT / HOST_BITS_PER_WIDE_INT); 
  vallen = wi::lrshift_large (val, tval, vallen, prec*2, prec, prec);

and I think it should be:

  unsigned int tvallen = (uvlen + 1) >> 1;
  HOST_WIDE_INT *tval = XALLOCAVEC (HOST_WIDE_INT, tvallen);
  wi_pack ((unsigned HOST_WIDE_INT *) tval, r, tvallen);
  tvallen = canonize (tval, tvallen, prec);
  vallen = wi::lrshift_large (val, tval, tvallen, prec * 2, prec, prec);
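The length (uvlen + 1) >> 1 in the suggested code comes from packing two half-width digits into each HOST_WIDE_INT. A simplified stand-in for that packing step follows; pack_halves is a hypothetical helper that assumes 32-bit halves and 64-bit words and zero-extends an odd trailing half, which may not match every detail of GCC's wi_pack. Packing straight into a scratch buffer of that length leaves val free to receive the shifted result, so no separate copy is needed.

```c
#include <stdint.h>

/* Hypothetical, simplified packing step: combine 32-bit half digits
   (least significant first) into 64-bit words and return the number of
   words written, (IN_LEN + 1) >> 1.  An odd trailing half is
   zero-extended here.  */
static unsigned
pack_halves (uint64_t *out, const uint32_t *in, unsigned in_len)
{
  unsigned out_len = (in_len + 1) >> 1;
  for (unsigned j = 0; j < out_len; j++)
    {
      uint64_t lo = in[2 * j];
      uint64_t hi = 2 * j + 1 < in_len ? in[2 * j + 1] : 0;
      out[j] = lo | (hi << 32);
    }
  return out_len;
}
```

For the three halves 0x11111111, 0x22222222 and 0x33333333 it returns 2, producing 0x2222222211111111 and 0x0000000033333333.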

Thanks,
Richard


Re: [Patch, Fortran, OOP] PR 59493: ICE: Segfault on Class(*) pointer association

2013-12-15 Thread Janus Weil
Hi Tobias,

>> here is a rather simple fix for a problem with the pointer assignment
>> of an unlimited polymorphic variable. The patch regtests cleanly on
>> x86_64-unknown-linux-gnu.
>>
>> Firstly, I would like to commit to trunk, of course. Ok?
>>
>> Secondly, the bug reporter asked me (privately) for the possibility of
>> backporting the fix to 4.8 (which is the only release supporting
>> unlimited polymorphism so far). It is important for his code at
>> http://sourceforge.net/projects/permix/.
>
> OK. Thanks for the patch!

thanks for the review. Committed to trunk as r205997. Will do the
backport to 4.8 within a week.

Cheers,
Janus


Re: RFA: revert libstdc++ r205810: simulator workload increase caused regression

2013-12-15 Thread Jonathan Wakely
On Dec 15, 2013 6:57 AM, "Hans-Peter Nilsson" wrote:
>
> From the revision range 205803:205810 (excluding:including) and
> on, my autotester for cris-elf reports a regression:
>
> Running /tmp/hpautotest-gcc1/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ...
> WARNING: program timed out.
> FAIL: 20_util/hash/chi2_quality.cc execution test
>
> This appears to have come from revision r205810.  I can't find
> the discussion

http://gcc.gnu.org/ml/libstdc++/2013-10/msg00233.html

> or approval leading to that commit.

http://gcc.gnu.org/ml/libstdc++/2013-11/msg00098.html


>  It seems
> likely to have been a mistake, perhaps a commit intended for a
> local repository.  Why else increase the workload, without
> discussion with other target maintainers known to risk being
> affected by this change?  The somewhat obvious fix is to just
> revert revision r205810.
>
> CC to committer and author from ChangeLog.
>
> Ok to commit?
>
> libstdc++:
>         * testsuite/20_util/hash/chi2_quality.cc: Revert previous change.
>
> Index: testsuite/20_util/hash/chi2_quality.cc
> ===
> --- testsuite/20_util/hash/chi2_quality.cc      (revision 205810)
> +++ testsuite/20_util/hash/chi2_quality.cc      (revision 205809)
> @@ -1,7 +1,7 @@
>  // { dg-options "-std=gnu++0x" }
>
>  // Use smaller statistics when running on simulators, so it takes less time.
> -// { dg-options "-std=gnu++0x -DSAMPLES=3" { target simulator } }
> +// { dg-options "-std=gnu++0x -DSAMPLES=1" { target simulator } }
>
>  // Copyright (C) 2010-2013 Free Software Foundation, Inc.
>  //
>
> brgds, H-P


Re: PR c++/58567: Fix ICE on invalid code with -fopenmp in cp/pt.c

2013-12-15 Thread Tobias Burnus

*ping*
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00584.html

On December 6, 2013, Tobias Burnus wrote:
> A rather simple fix for an ICE on invalid bug (low-priority 4.8/4.9
> regression).
>
> Bootstrapped and regtested without new failure on x86-64-gnu-linux.
> OK for the trunk and 4.8?

Tobias




[PATCH, i386 testsuite]: Fix -mabi=ms related failures for -mtune=corei7

2013-12-15 Thread Uros Bizjak
Hello!

Corei7 tuning doesn't enable the -maccumulate-outgoing-args option by
default, which triggers various -mabi=ms related failures throughout the
testsuite [1]. The attached patch fixes these failures by explicitly adding
-maccumulate-outgoing-args, as documented in the ms-abi option
documentation.

2013-12-15  Uros Bizjak  

* gcc.target/i386/pr43662.c (dg-options):
Add -maccumulate-outgoing-args.
* gcc.target/i386/pr43869.c (dg-options): Ditto.
* gcc.target/i386/pr57003.c (dg-options): Ditto.
* gcc.target/i386/avx-vzeroupper-16.c (dg-options):
Remove -mtune=generic and add -maccumulate-outgoing-args instead.
* gcc.target/i386/avx-vzeroupper-17.c (dg-options): Ditto.
* gcc.target/i386/avx-vzeroupper-18.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/func-1.c (dg-options):
Add -maccumulate-outgoing-args.
* gcc.target/x86_64/abi/callabi/func-2a.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/func-2b.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/func-indirect.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/func-indirect-2a.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/func-indirect-2b.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/leaf-1.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/leaf-2.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/pr38891.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/vaarg-1.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/vaarg-2.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/vaarg-3.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/vaarg-4a.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/vaarg-4b.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/vaarg-5a.c (dg-options): Ditto.
* gcc.target/x86_64/abi/callabi/vaarg-5b.c (dg-options): Ditto.

The patch was tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline.

The patch will be backported to 4.8 branch.

[1] http://gcc.gnu.org/ml/gcc-testresults/2013-12/msg01417.html

Uros.
Index: gcc.target/i386/avx-vzeroupper-16.c
===
--- gcc.target/i386/avx-vzeroupper-16.c (revision 205996)
+++ gcc.target/i386/avx-vzeroupper-16.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target lp64 } } */
-/* { dg-options "-O2 -mavx -mabi=ms -mtune=generic -dp" } */
+/* { dg-options "-O2 -mavx -mabi=ms -maccumulate-outgoing-args -dp" } */
 
 typedef float __m256 __attribute__ ((__vector_size__ (32), __may_alias__));
 
Index: gcc.target/i386/avx-vzeroupper-17.c
===
--- gcc.target/i386/avx-vzeroupper-17.c (revision 205996)
+++ gcc.target/i386/avx-vzeroupper-17.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target lp64 } } */
-/* { dg-options "-O2 -mavx -mabi=ms -mtune=generic -dp" } */
+/* { dg-options "-O2 -mavx -mabi=ms -maccumulate-outgoing-args -dp" } */
 
 typedef float __m256 __attribute__ ((__vector_size__ (32), __may_alias__));
 
Index: gcc.target/i386/avx-vzeroupper-18.c
===
--- gcc.target/i386/avx-vzeroupper-18.c (revision 205996)
+++ gcc.target/i386/avx-vzeroupper-18.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target lp64 } } */
-/* { dg-options "-O0 -mavx -mabi=ms -mtune=generic -dp" } */
+/* { dg-options "-O0 -mavx -mabi=ms -maccumulate-outgoing-args -dp" } */
 
 typedef float __m256 __attribute__ ((__vector_size__ (32), __may_alias__));
 
Index: gcc.target/i386/pr43662.c
===
--- gcc.target/i386/pr43662.c   (revision 205996)
+++ gcc.target/i386/pr43662.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target lp64 } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -maccumulate-outgoing-args" } */
 
 void __attribute__ ((ms_abi)) foo (void)
 {
Index: gcc.target/i386/pr43869.c
===
--- gcc.target/i386/pr43869.c   (revision 205996)
+++ gcc.target/i386/pr43869.c   (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile { target lp64 } } */
+/* { dg-options "-maccumulate-outgoing-args" } */
 
 int __attribute__((__noinline__))
 bugged(float f1, float f2, float f3, float f4,
Index: gcc.target/i386/pr57003.c
===
--- gcc.target/i386/pr57003.c   (revision 205996)
+++ gcc.target/i386/pr57003.c   (working copy)
@@ -1,6 +1,6 @@
 /* PR rtl-optimization/57003 */
 /* { dg-do run } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -maccumulate-outgoing-args" } */
 
 #define N 2001
 unsigned short *b, *c, *d;
Index: gcc.target/x86_64/abi/callabi/func-1.c
===
--- gcc.target/x86_64/abi/callabi/func-1.c  (revision 205996)
+++ gcc.target/x86_64/abi/callabi/func-1.c  (working copy)
@@ -2,7 +2,7 

Re: [PATCH, i386 testsuite]: Fix -mabi=ms related failures for -mtune=corei7

2013-12-15 Thread Dominique Dhumieres
Hi Uros,

This patch fixes pr58630. However I still think that the tests
func-2a.c, func-indirect-2a.c, vaarg-4a.c, and vaarg-5a.c
should not be restricted to linux: they pass on darwin.

TIA

Dominique


Re: [PATCH, i386 testsuite]: Fix -mabi=ms related failures for -mtune=corei7

2013-12-15 Thread Uros Bizjak
On Sun, Dec 15, 2013 at 12:58 PM, Dominique Dhumieres wrote:

> This patch fixes pr58630. However I still think that the tests
> func-2a.c, func-indirect-2a.c, vaarg-4a.c, and vaarg-5a.c
> should not be restricted to linux: they pass on darwin.

Thanks for the pointer, I'll add PR reference to the ChangeLog entry.

OTOH, I can't test darwin properly, please provide the patch and I'll
commit it for you.

Thanks,
Uros.


Re: [PATCH, i386 testsuite]: Fix -mabi=ms related failures for -mtune=corei7

2013-12-15 Thread Dominique Dhumieres
> OTOH, I can't test darwin properly, please provide the patch and I'll
> commit it for you.

Basically, the patch I have had in my tree since the PR replaces 'linux' with '*'
(see below). Since I can only test darwin, there is no guarantee that the tests
pass on non-linux, non-darwin platforms. So if you apply the patch below as is,
it will be necessary to
watch out for fall-out.
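The effect of changing the selector from i?86-*-linux* x86_64-*-linux* to i?86-*-* x86_64-*-* can be approximated with POSIX fnmatch(). This is an analogy, not DejaGnu's implementation (DejaGnu is written in Tcl and uses Tcl's glob-style matching, assumed here to behave the same for these patterns), and target_matches is a hypothetical helper: a darwin triplet fails the linux-only patterns but matches the wildcard ones, as would any other x86 OS.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical helper: true if TRIPLET matches any of the N glob
   PATTERNS, in the spirit of a dg-do "target" selector list.  */
static bool
target_matches (const char *triplet, const char *const *patterns,
                unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    if (fnmatch (patterns[i], triplet, 0) == 0)
      return true;
  return false;
}
```

For instance, x86_64-apple-darwin13.0.0 is rejected by { "i?86-*-linux*", "x86_64-*-linux*" } but accepted by { "i?86-*-*", "x86_64-*-*" }.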

Dominique

diff -up ../_clean/gcc/testsuite/gcc.target/x86_64/abi/callabi/func-2a.c gcc/testsuite/gcc.target/x86_64/abi/callabi/func-2a.c
--- ../_clean/gcc/testsuite/gcc.target/x86_64/abi/callabi/func-2a.c   2013-12-15 12:51:02.0 +0100
+++ gcc/testsuite/gcc.target/x86_64/abi/callabi/func-2a.c   2013-12-15 12:58:49.0 +0100
@@ -1,5 +1,5 @@
 /* Test for cross x86_64<->w64 abi standard calls.  */
-/* { dg-do run { target i?86-*-linux* x86_64-*-linux* } } */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
 /* { dg-options "-O2 -mabi=ms -std=gnu99 -ffast-math -fno-builtin -maccumulate-outgoing-args" } */
 /* { dg-additional-sources "func-2b.c" } */
 
diff -up ../_clean/gcc/testsuite/gcc.target/x86_64/abi/callabi/func-indirect-2a.c gcc/testsuite/gcc.target/x86_64/abi/callabi/func-indirect-2a.c
--- ../_clean/gcc/testsuite/gcc.target/x86_64/abi/callabi/func-indirect-2a.c   2013-12-15 12:51:02.0 +0100
+++ gcc/testsuite/gcc.target/x86_64/abi/callabi/func-indirect-2a.c   2013-12-15 12:59:12.0 +0100
@@ -1,5 +1,5 @@
 /* Test for cross x86_64<->w64 abi standard calls via variable.  */
-/* { dg-do run { target i?86-*-linux* x86_64-*-linux* } } */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
 /* { dg-options "-O2 -mabi=ms -std=gnu99 -ffast-math -fno-builtin -maccumulate-outgoing-args" } */
 /* { dg-additional-sources "func-indirect-2b.c" } */
 
diff -up ../_clean/gcc/testsuite/gcc.target/x86_64/abi/callabi/vaarg-4a.c gcc/testsuite/gcc.target/x86_64/abi/callabi/vaarg-4a.c
--- ../_clean/gcc/testsuite/gcc.target/x86_64/abi/callabi/vaarg-4a.c   2013-12-15 12:51:02.0 +0100
+++ gcc/testsuite/gcc.target/x86_64/abi/callabi/vaarg-4a.c   2013-12-15 12:59:36.0 +0100
@@ -1,5 +1,5 @@
 /* Test for cross x86_64<->w64 abi va_list calls.  */
-/* { dg-do run { target i?86-*-linux* x86_64-*-linux* } } */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
 /* { dg-options "-O2 -mabi=ms -std=gnu99 -fno-builtin -maccumulate-outgoing-args" } */
 /* { dg-additional-sources "vaarg-4b.c" } */
 
diff -up ../_clean/gcc/testsuite/gcc.target/x86_64/abi/callabi/vaarg-5a.c gcc/testsuite/gcc.target/x86_64/abi/callabi/vaarg-5a.c
--- ../_clean/gcc/testsuite/gcc.target/x86_64/abi/callabi/vaarg-5a.c   2013-12-15 12:51:02.0 +0100
+++ gcc/testsuite/gcc.target/x86_64/abi/callabi/vaarg-5a.c   2013-12-15 13:00:00.0 +0100
@@ -1,5 +1,5 @@
 /* Test for cross x86_64<->w64 abi va_list calls.  */
-/* { dg-do run { target i?86-*-linux* x86_64-*-linux* } } */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
 /* { dg-options "-O2 -mabi=ms -std=gnu99 -fno-builtin -maccumulate-outgoing-args" } */
 /* { dg-additional-sources "vaarg-5b.c" } */
 


Re: [PATCH, i386 testsuite]: Fix -mabi=ms related failures for -mtune=corei7

2013-12-15 Thread Uros Bizjak
On Sun, Dec 15, 2013 at 1:14 PM, Dominique Dhumieres  wrote:
>> OTOH, I can't test darwin properly, please provide the patch and I'll
>> commit it for you.
>
> Basically, the patch I have had in my tree since the PR replaces 'linux' with '*'
> (see below). Since I can only test darwin, there is no guarantee that the tests
> pass on non-linux, non-darwin platforms. So if you apply the patch below as is,
> it will be necessary to watch out for fall-out.

Let's ask Rainer for help with x86 solaris.

Thanks,
Uros.


[PATCH, testsuite]: Fix FAIL: gcc.target/i386/pr57756.c (test for errors, line XX)

2013-12-15 Thread Uros Bizjak
Hello!

This testcase assumes a default SSE level lower than 4.2. Limit it to SSE2.

2013-12-15  Uros Bizjak  

* gcc.target/i386/pr57756.c (dg-options): Add -mno-sse3.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline.

Uros.

Index: gcc.target/i386/pr57756.c
===
--- gcc.target/i386/pr57756.c   (revision 205996)
+++ gcc.target/i386/pr57756.c   (working copy)
@@ -1,7 +1,7 @@
-/* callee cannot be inlined into caller because it has a higher
-   target ISA.  */
 /* { dg-do compile } */
+/* { dg-options "-mno-sse3" } */

+/* callee cannot be inlined into caller because it has a higher target ISA.  */
 __attribute__((always_inline,target("sse4.2")))
 __inline int callee () /* { dg-error "inlining failed in call to always_inline" }  */
 {
@@ -18,5 +18,3 @@
 {
   return caller();
 }
-/* callee cannot be inlined into caller because it has a higher
-   target ISA.  */


Re: RFA: revert libstdc++ r205810: simulator workload increase caused regression

2013-12-15 Thread Hans-Peter Nilsson
> From: Jonathan Wakely 
> Date: Sun, 15 Dec 2013 11:38:43 +0100

> On Dec 15, 2013 6:57 AM, "Hans-Peter Nilsson" wrote:
> >
> > From the revision range 205803:205810 (excluding:including) and
> > on, my autotester for cris-elf reports a regression:
> >
> > Running /tmp/hpautotest-gcc1/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ...
> > WARNING: program timed out.
> > FAIL: 20_util/hash/chi2_quality.cc execution test
> >
> > This appears to have come from revision r205810.  I can't find
> > the discussion
> 
> http://gcc.gnu.org/ml/libstdc++/2013-10/msg00233.html
> 
> > or approval leading to that commit.
> 
> http://gcc.gnu.org/ml/libstdc++/2013-11/msg00098.html

Aha, October and November; for gcc-patches I looked like a month
back but for that list alone I only looked at December, sorry.

IMHO it's an ill-formed test if the chi2 value is
math-implementation-dependent (certainly not obvious to
me), and it'd have been a service to others to mention this
oddness in the patch.

I'll take the statement "pass for all arm target linking with
glibc" at face value and not bring up soft-float variants on
newlib and slower host machines (oops).

The time-eater I noticed at the time I added the
SAMPLES-for-simulators thingy is actually in test_document_words.

Would it be ok to split up chi2_quality.cc into five, along the
test_*() functions?

Maybe with a comment or two about the sensitivity to the number
of iterations in test_uniform_random() and test_bit_flip_set().
That will, if not take out the simulator timeout issue
altogether, then at least make it specific to the part with
test_document_words().
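For readers unfamiliar with the check being discussed, VERIFY( chi2 < k*1.1 ) compares a chi-squared statistic over hash buckets against a bound scaled by k (presumably the number of buckets). The sketch below computes Pearson's statistic for observed bucket counts under a uniform expectation; chi2_uniform is a hypothetical helper, and the libstdc++ test's own computation may differ in detail. Perfectly even counts give 0, and a single heavily loaded bucket pushes the statistic far past 1.1 * k.

```c
#include <stddef.h>

/* Pearson's chi-squared statistic for COUNTS[0..K-1] against a uniform
   expectation of N / K per bucket, N being the total count.  Assumes
   K > 0 and N > 0.  */
static double
chi2_uniform (const unsigned long *counts, size_t k)
{
  double n = 0.0;
  for (size_t i = 0; i < k; i++)
    n += (double) counts[i];
  double expected = n / (double) k;
  double chi2 = 0.0;
  for (size_t i = 0; i < k; i++)
    {
      double d = (double) counts[i] - expected;
      chi2 += d * d / expected;
    }
  return chi2;
}
```

With four buckets, counts { 5, 5, 5, 5 } give 0, while { 10, 0, 0, 0 } give 30, well above 4 * 1.1.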

Before I thought of that, I started a run for arm-eabi (oops
again, not glibc) with the following patch, presented without a
ChangeLog entry for your disproval.

Index: libstdc++-v3/testsuite/20_util/hash/chi2_quality.cc
===
--- libstdc++-v3/testsuite/20_util/hash/chi2_quality.cc (revision 205997)
+++ libstdc++-v3/testsuite/20_util/hash/chi2_quality.cc (working copy)
@@ -1,7 +1,11 @@
 // { dg-options "-std=gnu++0x" }
 
 // Use smaller statistics when running on simulators, so it takes less time.
-// { dg-options "-std=gnu++0x -DSAMPLES=3" { target simulator } }
+// There's also an issue with "VERIFY( chi2 < k*1.1 )" failing for
+// test_uniform_random() and test_bit_flip_set() for SAMPLES=1
+// and some target library implementations, where 3 pass.
+// { dg-options "-std=gnu++0x -DSAMPLES=3" { target { { arm*-* } && simulator } } }
+// { dg-options "-std=gnu++0x -DSAMPLES=1" { target simulator } }
 
 // Copyright (C) 2010-2013 Free Software Foundation, Inc.
 //


brgds, H-P


Re: [Fortran] RFC / RFA patch for using build_predict_expr instead of builtin_expect / PR 58721

2013-12-15 Thread Jan Hubicka
Hi,
sorry for taking so long to get back to this.
> Pre-remark:
> 
> I had hoped that something like the following patch would work.
> However, it will lead to a bunch of run-time segfaults in the test
> suite - but the original dump seems to be fine and I also fail to
> spot the problem when looking at the patch. Thus, instead of posting
> a working patch, I have below a nonworking draft patch. Questions:
> 
> (a) Does the build_predict_expr() usage look in principle right - or
> do I do something obviously wrong?
> 
> (b) Is it possible to use build_predict_expr() for code like "a =
> (cond) ? (expr1) : (expr2)"? After gimplifying, there should be a BB
> [and a phi] - thus, adding the predict to the BB should be possible.
> But is it also possible on tree level with code like above? That
> mainly applies to trans-array.c but also in another case.
> 
> (c) How to handle:  if (cond1 || overflow_cond) {...}? Currently,
> "overflow_cond" is marked as unlikely but as build_predict_expr()
> applies to the branch/basic block and not to the condition, how
> should it be handled?
> 
> (d) And in general: Does the approach look right and the choice of
> PRED_ and their values?
> 
> * * *
> 
> gfortran uses internally builtin_expect [i.e. gfc_likely()/gfc_unlikely()].
> 
> On the other hand, builtin_expect is also used by (C/C++) users.
> Google did some tests and the empirical result was that unlikely is more
> likely than GCC's prediction handling expected. I believe the
> likelihood for likely() was reduced from a hit rate of 99 to 90.
> 
> For gfortran, the change leads to PR58721, where some code is
> regarded as less likely than before, which leads to no inlining. In many
> cases, the unlikely() is not necessary as those branches have a call to an
> error function, annotated with "noreturn". As an explicit user request
> overrides other heuristics, the builtin_expect causes the
> predictor to regard the gfc_unlikely() as *more* likely than without it
> (as "noreturn" is less likely).
> 
> 
> This patch tries to address this for the different cases:
> 
> a) If there is a "noreturn" function called, the gfc_(un)likely has
> been removed. [Unless I wrongly classified it under one of the next
> items; however, a "noreturn" will override another PRED_*, so it
> shouldn't matter.]
> 
> b) As OVERFLOWs should only occur extremely rarely, I used
> PRED_OVERFLOW with PROB_VERY_LIKELY (i.e. the branch is very likely
> to be NOT_TAKEN).
> 
> c) In many cases, GCC permits replacing run-time aborts for
> failures by asking for a status, e.g. "DEALLOCATE(var,
> stat=status)"; I have used PRED_FAIL with HITRATE(80) for those -
> but one can argue about the value.
> 
> d) gfortran supports a run-time warning. That's currently only used
> for copy-in/out. I think both branches can be approximately equally
> likely (with no-warning being more common in most codes). However,
> the warning is only shown once. I have set PRED_WARN_CALL's hitrate
> to 70 - but that's also an arbitrary value.
> 
> e) As in the trans_image_index: There one had:  "if (cond ||
> invalid_bound) {... } else {...}". The "invalid_bound" is marked as
> unlikely but PRED_* cannot be used as only part of the branch
> condition is unlikely. [I've removed the unlikely.]
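To see what numbers such as HITRATE(80) and HITRATE(70) turn into, and how two predictions on the same branch can interact, here is a small arithmetic sketch. It assumes REG_BR_PROB_BASE is 10000 and that HITRATE scales a percentage to that base, as in GCC's predict.h of this period (check the headers before relying on it); ds_combine is a hypothetical helper implementing the Dempster-Shafer combination rule that predict.c describes for merging some predictors, while others are resolved by a first-match rule. Combining two independent 80% predictions yields about 94%, while an 80% and a 20% prediction cancel to 50%.

```c
/* Assumed values mirroring GCC's predict.h of that era; not copied from
   the headers.  */
#define REG_BR_PROB_BASE 10000
#define HITRATE(VAL) ((int) ((VAL) * REG_BR_PROB_BASE + 50) / 100)

/* Combine two independent predictions for the same edge (probabilities
   scaled by REG_BR_PROB_BASE) with the Dempster-Shafer rule.  */
static int
ds_combine (int p1, int p2)
{
  double a = p1, b = p2, base = REG_BR_PROB_BASE;
  double d = a * b + (base - a) * (base - b);
  if (d == 0.0)
    return REG_BR_PROB_BASE / 2;       /* fully contradictory inputs */
  return (int) (a * b * base / d + 0.5);
}
```

Under these assumptions HITRATE (80) is 8000, ds_combine (8000, 8000) is 9412, and ds_combine (5000, p) returns p, since a 50% prediction carries no information.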
> 
> 
> Does the patch look sensible? Does the addition of the PRED_* make
> sense? How about their values?
> 
> 
> Tobias

>  predict.def               |   16 +++
>  fortran/trans-array.c     |   26 +++-
>  fortran/trans-expr.c      |   15 ---
>  fortran/trans-intrinsic.c |    3 -
>  fortran/trans-io.c        |    1 
>  fortran/trans-stmt.c      |   26 
>  fortran/trans.c           |   97 +++---
>  fortran/trans.h           |    4 -
>
>  8 files changed, 107 insertions(+), 81 deletions(-)
> 
> diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
> index 78b08d7..8f4db5a 100644
> --- a/gcc/fortran/trans-array.c
> +++ b/gcc/fortran/trans-array.c
> @@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "system.h"
>  #include "coretypes.h"
>  #include "tree.h"
> +#include "predict.h" /* For PRED_FAIL.  */
> @@ -5222,6 +5222,8 @@ gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree status, tree errmsg,
>overflow = integer_zero_node;
>  
>gfc_init_block (&set_descriptor_block);
> +  gfc_add_expr_to_block (&set_descriptor_block,
> +  build_predict_expr (PRED_FAIL, TAKEN));
>size = gfc_array_init_size (se->expr, ref->u.ar.as->rank,
> ref->u.ar.as->corank, &offset, lower, upper,
> &se->pre, &set_descriptor_block, &overflow,
> @@ -5248,6 +5250,8 @@ gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree status, tree errmsg,
> stmtblock_t set_status_block;
>  
> gfc_start_block (&set_status_block);
> +   gfc_add_expr_to_block (&set_status_block,
> +  build_predic

Re: wide-int more performance fixes for wide multiplication.

2013-12-15 Thread Kenneth Zadeck


On 12/15/2013 03:54 AM, Richard Sandiford wrote:

> Kenneth Zadeck  writes:
>
>>>> The current world
>>>> is actually structured so that we never ask about overflow for the two
>>>> larger classes because the reason that you used those classes was that
>>>> you never wanted to have this discussion. So if you never ask about
>>>> overflow, then it really does not matter because we are not going to
>>>> return enough bits for you to care what happened on the inside.  Of
>>>> course that could change and someone could say that they wanted overflow
>>>> on widest-int.   Then the comment makes sense, with revisions, unless
>>>> your review of the code that wants overflow on widest int suggests that
>>>> they are just being stupid.
>>> But widest_int is now supposed to be at least 1 bit wider than widest
>>> input type (unlike previously where it was double the widest input type).
>>> So I can definitely see cases where we'd want to know whether a
>>> widest_int * widest_int result overflows.
>>>
>>> My point is that the widest_int * widest_int would normally be a signed
>>> multiplication rather than an unsigned multiplication, since the extra
>>> 1 bit of precision allows every operation to be signed.  So it isn't
>>> a case of whether the top bit of a widest_int will be set, but whether
>>> we ever reach here for widest_int in the first place.  We should be
>>> going down the sgn == SIGNED path rather than the sgn == UNSIGNED path.
>>>
>>> widest_int can represent an all-1s value, usually interpreted as -1.
>>> If we do go down this sgn == UNSIGNED path for widest_int then we will
>>> instead treat the all-1s value as the maximum unsigned number, just like
>>> for any other kind of wide int.
>>>
>>> As far as this function goes there really is no difference between
>>> wide_int, offset_int and widest_int.  Which is good, because offset_int
>>> and widest_int should just be wide_ints that are optimised for a specific
>>> and fixed precision.
>>>
>>> Thanks,
>>> Richard
>> I am now seriously regretting letting richi talk me into changing the
>> size of the wide int buffer from being 2x of the largest mode on the
>> machine.   It was a terrible mistake AND i would guess making it smaller
>> does not provide any real benefit.
>>
>> The problem is that when you use widest-int (and by analogy offset int)
>> it should NEVER EVER overflow.  Furthermore we need to change the
>> interfaces for these two so that you cannot even ask!!(i do not
>> believe that anyone does ask so the change would be small.)
>
> offset_int * offset_int could overflow too, at least in the sense that
> there are combinations of valid offset_ints whose product can't be
> represented in an offset_int.  E.g. (1ULL << 67) * (1ULL << 67).
> I think that was always the case.

see answer below.

>> There are a huge set of bugs on the trunk that are "fixed" with wide-int
>> because people wrote code for double-int thinking that it was infinite
>> precision. So they never tested the cases of what happens when the
>> size of the variable needed two HWIs.   Most of those cases were
>> resolved by making passes like tree-vrp use wide-int and then being
>> explicit about the overflow on every operation, because with wide-int
>> the issue is in your face since things overflow even for 32 bit
>> numbers.  However, with the current widest-int, we will only be safe for
>> add and subtract by adding the extra bit.  In multiply we are exposed.
>> The perception is that widest-int is as good as infinite precision and no
>> one will ever write the code to check if it overflowed because it only
>> rarely happens.
>
> All operations can overflow.  We would need 2 extra bits rather than 1
> extra bit to stop addition overflowing, because the 1 extra bit we already
> have is to allow unsigned values to be treated as signed.  But 2 extra bits
> is only good for one addition, not a chain of two additions.
>
> That's why ignoring overflow seems dangerous to me.  The old wide-int
> way might have allowed any x * y to be represented, but if nothing
> checked whether x * y was bigger than expected then x * y + z could
> overflow.
>
> Thanks,
> Richard


it is certainly true that in order to do an unbounded set of operations, 
you would have to check on every operation.   so my suggestion that we 
should remove the checking from the infinite precision would not support 
this. but the reality is that there are currently no places in the 
compiler that do this.


Currently all of the uses of widest-int are one or two operations, and 
the style of code writing is that you do these and then you deal with 
the overflow at the time that you convert the widest-int to a tree.   I 
think that it is important to maintain the style of programming where
a small finite number of computations does not need to be checked until
the result is converted back.


The problem with making the buffer size so tight is that we do not have
adequate reserves to allow this style for any supportable type.
I personally think that 2x + some small n is what we need to have.



i am not as familiar with how this is used (or to be used when all of 
the offset math is converted to use wide-int), but there appear to be 
two

Re: wide-int more performance fixes for wide multiplication.

2013-12-15 Thread Richard Sandiford
Kenneth Zadeck  writes:
> It is certainly true that in order to do an unbounded set of operations,
> you would have to check on every operation.  So my suggestion that we
> should remove the checking from the infinite precision would not support
> this.  But the reality is that there are currently no places in the
> compiler that do this.
>
> Currently all of the uses of widest-int are one or two operations, and
> the style of code writing is that you do these and then you deal with
> the overflow at the time that you convert the widest-int to a tree.  I
> think that it is important to maintain the style of programming where
> a small, finite number of computations does not need to be checked until
> the result is converted back.
>
> The problem with making the buffer size so tight is that we do not have
> adequate reserves to allow this style for any supportable type.
> I personally think that 2x + some small n is what we need to have.
>
>
> I am not as familiar with how this is used (or is to be used once all of
> the offset math is converted to use wide-int), but there appear to be
> two uses of multiply.  One is the "harmless" mult by 3 and the other
> is where people are trying to compute the size of arrays.  These last
> operations do need to be checked for overflow.  The question here is:
> do you want to force those operations to check for overflow individually,
> or do you want to check when you convert out?  Again, I think 2x + some
> small number is what we might want to consider.

It's a fair question, but personally I think checking for overflow
on the operation is much more robust.  Checking on conversion doesn't
allow you to stop thinking about overflow, it just changes the way you
think about it: rather than handling explicit overflow flags, you have
to remember to ask "is the range of the unconverted result within the
range of widest_int", which I bet is something that would be easily
forgotten once widest_int & co. are part of the furniture.

E.g. the SPARC operation (picked only because I remember it):

  for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i)
    {
      tree e0 = VECTOR_CST_ELT (arg0, i);
      tree e1 = VECTOR_CST_ELT (arg1, i);

      bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf;

      tmp = wi::neg (e1, &neg1_ovf);
      tmp = wi::add (e0, tmp, SIGNED, &add1_ovf);
      if (wi::neg_p (tmp))
        tmp = wi::neg (tmp, &neg2_ovf);
      else
        neg2_ovf = false;
      result = wi::add (result, tmp, SIGNED, &add2_ovf);
      overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf;
    }

  gcc_assert (!overflow);

  return wide_int_to_tree (rtype, result);

seems pretty natural.  If instead it was modelled as a widest_int
chain without overflow then it would be less obviously correct.

Thanks,
Richard


[patch] Fix PR debug/59418

2013-12-15 Thread Eric Botcazou
Hi,

this is a latent bug exposed on the mainline for the ARM by:
  http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01755.html

The problem is that the CFI expressions for:

(insn/f:TI 102 11 103 (parallel [
(set (mem/c:BLK (pre_modify:SI (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 13 sp)
(const_int -8 [0xfff8]))) [2  A8])
(unspec:BLK [
(reg:DF 32 s16)
] UNSPEC_PUSH_MULT))
]) pr59418.c:5 728 {*push_multi_vfp}
 (expr_list:REG_DEAD (reg:DF 32 s16)
(expr_list:REG_FRAME_RELATED_EXPR (sequence [
(set/f (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 13 sp)
(const_int -8 [0xfff8])))
(set/f (mem/c:DF (reg/f:SI 13 sp) [2  S8 A64])
(reg:DF 32 s16))
])
(nil

and:

(insn/f:TI 113 128 20 (parallel [
(set/f (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 13 sp)
(const_int 8 [0x8])))
(set/f (reg:DF 32 s16)
(mem/c:DF (reg/f:SI 13 sp) [2  S8 A64]))
]) pr59418.c:28 347 {*vfp_pop_multiple_with_writeback}
 (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 13 sp)
(const_int 8 [0x8])))
(expr_list:REG_CFA_RESTORE (reg:DF 32 s16)
(nil

are treated differently: for the former, the source register (reg:DF 32 s16)
goes through TARGET_DWARF_REGISTER_SPAN but, for the latter, the same register
being restored does not, which results in a CFI mismatch that is caught by the
checking code.

Fixed by handling TARGET_DWARF_REGISTER_SPAN for REG_CFA_RESTORE, tested on 
arm-eabi, OK for the mainline?


2013-12-15  Eric Botcazou  

PR debug/59418
* dwarf2cfi.c (dwarf2out_frame_debug_cfa_offset): Fix comment and clean 
up implementation.
(dwarf2out_frame_debug_cfa_restore): Handle TARGET_DWARF_REGISTER_SPAN.
(dwarf2out_frame_debug_expr): Clean up implementation.


2013-12-15  Eric Botcazou  

* gcc.dg/pr59418.c: New test.


-- 
Eric Botcazou
Index: dwarf2cfi.c
===
--- dwarf2cfi.c	(revision 205982)
+++ dwarf2cfi.c	(working copy)
@@ -1149,18 +1149,14 @@ dwarf2out_frame_debug_cfa_offset (rtx se
   else
 {
   /* We have a PARALLEL describing where the contents of SRC live.
-   	 Queue register saves for each piece of the PARALLEL.  */
-  int par_index;
-  int limit;
+   	 Adjust the offset for each piece of the PARALLEL.  */
   HOST_WIDE_INT span_offset = offset;
 
   gcc_assert (GET_CODE (span) == PARALLEL);
 
-  limit = XVECLEN (span, 0);
-  for (par_index = 0; par_index < limit; par_index++)
+  for (int par_index = 0; par_index < XVECLEN (span, 0); par_index++)
 	{
 	  rtx elem = XVECEXP (span, 0, par_index);
-
 	  sregno = dwf_regno (src);
 	  reg_save (sregno, INVALID_REGNUM, span_offset);
 	  span_offset += GET_MODE_SIZE (GET_MODE (elem));
@@ -1229,10 +1225,30 @@ dwarf2out_frame_debug_cfa_expression (rt
 static void
 dwarf2out_frame_debug_cfa_restore (rtx reg)
 {
-  unsigned int regno = dwf_regno (reg);
+  gcc_assert (REG_P (reg));
+
+  rtx span = targetm.dwarf_register_span (reg);
+  if (!span)
+{
+  unsigned int regno = dwf_regno (reg);
+  add_cfi_restore (regno);
+  update_row_reg_save (cur_row, regno, NULL);
+}
+  else
+{
+  /* We have a PARALLEL describing where the contents of REG live.
+	 Restore the register for each piece of the PARALLEL.  */
+  gcc_assert (GET_CODE (span) == PARALLEL);
 
-  add_cfi_restore (regno);
-  update_row_reg_save (cur_row, regno, NULL);
+  for (int par_index = 0; par_index < XVECLEN (span, 0); par_index++)
+	{
+	  reg = XVECEXP (span, 0, par_index);
+	  gcc_assert (REG_P (reg));
+	  unsigned int regno = dwf_regno (reg);
+	  add_cfi_restore (regno);
+	  update_row_reg_save (cur_row, regno, NULL);
+	}
+}
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_WINDOW_SAVE.
@@ -1884,23 +1900,22 @@ dwarf2out_frame_debug_expr (rtx expr)
 	}
 	}
 
-  span = NULL;
   if (REG_P (src))
 	span = targetm.dwarf_register_span (src);
+  else
+	span = NULL;
+
   if (!span)
 	queue_reg_save (src, NULL_RTX, offset);
   else
 	{
 	  /* We have a PARALLEL describing where the contents of SRC live.
 	 Queue register saves for each piece of the PARALLEL.  */
-	  int par_index;
-	  int limit;
 	  HOST_WIDE_INT span_offset = offset;
 
 	  gcc_assert (GET_CODE (span) == PARALLEL);
 
-	  limit = XVECLEN (span, 0);
-	  for (par_index = 0; par_index < limit; par_index++)
+	  for (int par_index = 0; par_index < XVECLEN (span, 0); par_index++)
 	{
 	  rtx elem = XVECEXP (span, 0, par_index);
 	  queue_reg_save (elem, NULL_RTX, span_offset);
/* PR debug/59418 */
/* Reported by Ryan Mansf

[PATCH, i386 testsuite]: Fix gcc.dg/vect/vect-nop-move.c execution test for 32bit x86 targets

2013-12-15 Thread Uros Bizjak
Hello!

The attached patch emits the necessary emms instructions to fix the mixed
V2SFmode/SFmode runtime test on 32-bit x86 targets.  The patch also
reorders the function calls a bit to group together the MMX argument
handling, so the test becomes less fragile w.r.t. MMX/x87 interactions.

2013-12-15  Uros Bizjak  

* gcc.dg/vect/vect-nop-move.c (foo32x2_be): Call
__builtin_ia32_emms for 32bit x86 targets.
(foo32x2_le): Ditto.
(main): Reorder function calls.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline.

Uros.
Index: gcc.dg/vect/vect-nop-move.c
===
--- gcc.dg/vect/vect-nop-move.c (revision 205996)
+++ gcc.dg/vect/vect-nop-move.c (working copy)
@@ -30,12 +30,21 @@ bar (float a)
 NOINLINE float
 foo32x2_be (float32x2_t x)
 {
+#ifdef __i386__
+  /* ix86 passes float32x2 vector arguments in mmx registers.  We need to
+ emit emms to empty MMS state and reenable x87 stack before float value
+ can be loaded to and passed in x87 floating-point return register.  */
+  __builtin_ia32_emms ();
+#endif
   return bar (x[1]);
 }
 
 NOINLINE float
 foo32x2_le (float32x2_t x)
 {
+#ifdef __i386__
+  __builtin_ia32_emms ();
+#endif
   return bar (x[0]);
 }
 
@@ -45,16 +54,16 @@ main()
   float32x4_t a = { 0.0f, 1.0f, 2.0f, 3.0f };
   float32x2_t b = { 0.0f, 1.0f };
 
-  if (foo32x4_be (a) != 3.0f)
+  if (foo32x2_be (b) != 1.0f)
 abort ();
 
-  if (foo32x4_le (a) != 0.0f)
+  if (foo32x2_le (b) != 0.0f)
 abort ();
 
-  if (foo32x2_be (b) != 1.0f)
+  if (foo32x4_be (a) != 3.0f)
 abort ();
 
-  if (foo32x2_le (b) != 0.0f)
+  if (foo32x4_le (a) != 0.0f)
 abort ();
 
   return 0;


Re: wide-int more performance fixes for wide multiplication.

2013-12-15 Thread Kenneth Zadeck


On 12/15/2013 11:40 AM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

It is certainly true that in order to do an unbounded set of operations,
you would have to check on every operation.  So my suggestion that we
should remove the checking from the infinite precision would not support
this.  But the reality is that there are currently no places in the
compiler that do this.

Currently all of the uses of widest-int are one or two operations, and
the style of code writing is that you do these and then you deal with
the overflow at the time that you convert the widest-int to a tree.  I
think that it is important to maintain the style of programming where
a small, finite number of computations does not need to be checked until
the result is converted back.

The problem with making the buffer size so tight is that we do not have
adequate reserves to allow this style for any supportable type.
I personally think that 2x + some small n is what we need to have.


I am not as familiar with how this is used (or is to be used once all of
the offset math is converted to use wide-int), but there appear to be
two uses of multiply.  One is the "harmless" mult by 3 and the other
is where people are trying to compute the size of arrays.  These last
operations do need to be checked for overflow.  The question here is:
do you want to force those operations to check for overflow individually,
or do you want to check when you convert out?  Again, I think 2x + some
small number is what we might want to consider.

It's a fair question, but personally I think checking for overflow
on the operation is much more robust.  Checking on conversion doesn't
allow you to stop thinking about overflow, it just changes the way you
think about it: rather than handling explicit overflow flags, you have
to remember to ask "is the range of the unconverted result within the
range of widest_int", which I bet is something that would be easily
forgotten once widest_int & co. are part of the furniture.

E.g. the SPARC operation (picked only because I remember it):

  for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i)
    {
      tree e0 = VECTOR_CST_ELT (arg0, i);
      tree e1 = VECTOR_CST_ELT (arg1, i);

      bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf;

      tmp = wi::neg (e1, &neg1_ovf);
      tmp = wi::add (e0, tmp, SIGNED, &add1_ovf);
      if (wi::neg_p (tmp))
        tmp = wi::neg (tmp, &neg2_ovf);
      else
        neg2_ovf = false;
      result = wi::add (result, tmp, SIGNED, &add2_ovf);
      overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf;
    }

  gcc_assert (!overflow);

  return wide_int_to_tree (rtype, result);

seems pretty natural.  If instead it was modelled as a widest_int
chain without overflow then it would be less obviously correct.

Thanks,
Richard
Let us for the sake of argument assume that this was common code rather 
than code in a particular port, because code in a particular port can 
know more about the environment than common code is allowed to.


My main point is that this code is in wide-int, not widest-int, because at
this level the writer of this code actually wants to model what the
target wants to do.  So doing the adds in precision and testing
overflow is perfectly fine at every step.  But this loop CANNOT be
written in a style where you test the overflow at the end, because if
this is common code you cannot make any assumptions about the largest
mode on the machine.  If the buffer were 2x + n in size, then it would
be reasonably safe to assume that the number of elements in the vector
could be represented in an integer and so you could wait till the end.


I think that my point was (and I feel a little uncomfortable
putting words in richi's mouth, but I believe that this was his point
early on) that he thinks of widest int as an infinite-precision
representation.  He was the one who was pushing for the entire rep to
be done with a large internal (or perhaps unbounded) rep because he felt
that it was more natural not to have to think about overflow.  He
wanted you to be able to chain a mult and a divide and not see the
product get truncated before the divide was done.  The rep that we
have now really sucks with respect to this, because widest int truncates
if you are close to the largest precision on the machine and does not if
you are small with respect to that.


My other point is that while you think that the example above is nice, 
the experience with double-int is contrary to this.  People will say
(and test) the normal modes and anyone trying to use large modes will 
die a terrible death of a thousand cuts.






Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-15 Thread Allan Sandfeld Jensen
Hi again
On Wednesday 11 December 2013, Uros Bizjak wrote:
> Hello!
> 
> > PR gcc/59422
> > 
> > This patch extends the supported targets for function multiversioning to
> > also include Haswell, Silvermont, and the most recent AMD models. It
> > also prioritizes AVX2 versions over AMD specific pre-AVX2 versions.
> 
> Please add a ChangeLog entry and attach the complete patch. Please
> also state how you tested the patch, as outlined in the instructions
> [1].
> 
> [1] http://gcc.gnu.org/contribute.html
> 
Updated patch for better CPU model detection and added ChangeLog.

The patch has been tested with the attached test.cpp. I verified that it
doesn't build before the patch and that it builds after it, and that it
selects the correct versions at runtime based on either the CPU model or the
supported ISA (tested on 3 machines: SandyBridge, IvyBridge and Phenom II).

Btw, I couldn't find anything that corresponds to gcc's btver2 arch. Is that 
an old term for what has become the Jaguar architecture?

`Allan
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 205984)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2013-12-14  Allan Sandfeld Jensen 
+
+PR gcc/59422
+* config/i386/i386.c: Extend function multiversioning
+to better support recent Intel and AMD models.
+
 2013-12-14  Marek Polacek  
 
 	PR sanitizer/59503
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 205984)
+++ gcc/config/i386/i386.c	(working copy)
@@ -29962,9 +29962,14 @@
 P_PROC_SSE4_2,
 P_POPCNT,
 P_AVX,
+P_PROC_AVX,
+P_FMA4,
+P_XOP,
+P_PROC_XOP,
+P_FMA,
+P_PROC_FMA,
 P_AVX2,
-P_FMA,
-P_PROC_FMA
+P_PROC_AVX2
   };
 
  enum feature_priority priority = P_ZERO;
@@ -29983,11 +29988,15 @@
   {"sse", P_SSE},
   {"sse2", P_SSE2},
   {"sse3", P_SSE3},
+  {"sse4a", P_SSE4_a},
   {"ssse3", P_SSSE3},
   {"sse4.1", P_SSE4_1},
   {"sse4.2", P_SSE4_2},
   {"popcnt", P_POPCNT},
   {"avx", P_AVX},
+  {"fma4", P_FMA4},
+  {"xop", P_XOP},
+  {"fma", P_FMA},
   {"avx2", P_AVX2}
 };
 
@@ -30041,25 +30050,49 @@
 	  break;
 case PROCESSOR_COREI7_AVX:
   arg_str = "corei7-avx";
-  priority = P_PROC_SSE4_2;
+  priority = P_PROC_AVX;
   break;
+case PROCESSOR_HASWELL:
+  arg_str = "core-avx2";
+  priority = P_PROC_AVX2;
+  break;
 	case PROCESSOR_ATOM:
 	  arg_str = "atom";
 	  priority = P_PROC_SSSE3;
 	  break;
+case PROCESSOR_SLM:
+  arg_str = "slm";
+  priority = P_PROC_SSE4_2;
+  break;
 	case PROCESSOR_AMDFAM10:
 	  arg_str = "amdfam10h";
 	  priority = P_PROC_SSE4_a;
 	  break;
+case PROCESSOR_BTVER1:
+  arg_str = "btver1";
+  priority = P_PROC_SSE4_a;
+  break;
+case PROCESSOR_BTVER2:
+  arg_str = "btver2";
+  priority = P_PROC_SSE4_2;
+  break;
 	case PROCESSOR_BDVER1:
 	  arg_str = "bdver1";
-	  priority = P_PROC_FMA;
+	  priority = P_PROC_XOP;
 	  break;
 	case PROCESSOR_BDVER2:
 	  arg_str = "bdver2";
 	  priority = P_PROC_FMA;
 	  break;
-	}  
+case PROCESSOR_BDVER3:
+  arg_str = "bdver3";
+  priority = P_PROC_FMA;
+  break;
+case PROCESSOR_BDVER4:
+  arg_str = "bdver4";
+  priority = P_PROC_AVX2;
+  break;
+}  
 	}
 
   cl_target_option_restore (&global_options, &cur_target);
@@ -30919,9 +30952,13 @@
 F_SSE2,
 F_SSE3,
 F_SSSE3,
+F_SSE4_a,
 F_SSE4_1,
 F_SSE4_2,
 F_AVX,
+F_FMA4,
+F_XOP,
+F_FMA,
 F_AVX2,
 F_MAX
   };
@@ -30938,15 +30975,20 @@
 M_INTEL_CORE2,
 M_INTEL_COREI7,
 M_AMDFAM10H,
+M_AMDFAM14H,
 M_AMDFAM15H,
 M_INTEL_SLM,
 M_CPU_SUBTYPE_START,
 M_INTEL_COREI7_NEHALEM,
 M_INTEL_COREI7_WESTMERE,
 M_INTEL_COREI7_SANDYBRIDGE,
+M_INTEL_COREI7_IVYBRIDGE,
+M_INTEL_COREI7_HASWELL,
 M_AMDFAM10H_BARCELONA,
 M_AMDFAM10H_SHANGHAI,
 M_AMDFAM10H_ISTANBUL,
+M_AMDFAM14H_BTVER1,
+M_AMDFAM14H_BTVER2,
 M_AMDFAM15H_BDVER1,
 M_AMDFAM15H_BDVER2,
 M_AMDFAM15H_BDVER3,
@@ -30968,11 +31010,16 @@
   {"corei7", M_INTEL_COREI7},
   {"nehalem", M_INTEL_COREI7_NEHALEM},
   {"westmere", M_INTEL_COREI7_WESTMERE},
-  {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
+  {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
+  {"core-avx-i", M_INTEL_COREI7_IVYBRIDGE},
+  {"core-avx2", M_INTEL_COREI7_HASWELL},
   {"amdfam10h", M_AMDFAM10H},
   {"barcelona", M_AMDFAM10H_BARCELONA},
   {"shanghai", M_AMDFAM10H_SH

Re: [Fortran] RFC / RFA patch for using build_predict_expr instead of builtin_expect / PR 58721

2013-12-15 Thread Tobias Burnus

Hi Honza,

Jan Hubicka wrote:

But if you have something like

a=__builtin_expect (b?1:0,0)

and you produce

a=b?predict_expr not_taken, 0,0
...
if (a)
   unlikely path

We need to check how it goes down to gimple.


It seems as if something doesn't work in that case – at least I do not 
understand the failure in gfortran.dg/elemental_optional_args_5.f03. 
With the patch one has in gfc_conv_procedure_call:


+ gfc_init_block (&err_block);
+ gfc_add_expr_to_block (&err_block,
+ build_predict_expr (PRED_ZERO, NOT_TAKEN));
+ gfc_add_expr_to_block (&err_block,
+ fold_convert (TREE_TYPE (parmse.expr),
+ null_pointer_node));


tmp = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node,
descriptor_data,
fold_convert (TREE_TYPE (descriptor_data),
null_pointer_node));
parmse.expr
= fold_build3_loc (input_location, COND_EXPR,
+ TREE_TYPE (parmse.expr), tmp,
+ gfc_finish_block (&err_block), parmse.expr);


For some reason, that fails with -O0/-Os; it works with -O1 or
when one removes the PREDICT.


Any idea what goes wrong here? I thought predictions can only produce 
inefficient code but not wrong code.




So in short if we want the predictions to work reliably, we need to be sure
we produce them in a way backend can reliably consume them.


I fully concur.


   2) if there are other cases (i.e. the Fortran language allows the failure
  flag to be stored into a user variable and handled by the user later),
  I will add an internal-use-only argument to predict_expr and extend its handling.
  Again it won't be 100% reliable, since one conditional can then be handled
  by multiple predict_exprs, which leads to the generally unsolvable problem of
  how to combine probabilities.
  But hopefully those cases will mostly be simple - i.e. the user will have an
    if (failure_flag)
      make horrible death in controlled way
  statement somewhere after the allocation.
  We are smart enough to work out simple variants like
    if (failure_flag1 || failure_flag2)
      make horrible death in controlled way


I think that's the case for, e.g.
ALLOCATE(variable, stat=status)
which gives status /= 0 if an error occurs.


   3) make sure the runtime library calls are correctly annotated with 
noreturn/cold
  flags that may help back end to work out the cases it failed otherwise.


I think that's already the case - and is the most important case.


+/* Branch leading to an overflow are extremely unlikely.  */
+DEF_PREDICTOR (PRED_OVERFLOW, "overflow", PROB_VERY_LIKELY,
+  PRED_FLAG_FIRST_MATCH)

I would even go for PROB_ALWAYS here.


Fine with me - I was only fearing that it would regard it as always and 
remove the NOT_TAKEN path. But if it keeps it, using PROB_ALWAYS makes 
sense.



I would go for a bit more descriptive names, like "fortran alloc overflow"/"fortran
zero size alloc" etc.
Those names will hopefully be more obvious for a middle-end developer like myself.


Okay. When the issue with elemental_optional_args_5.f03 is understood, I 
will update the patch.


Tobias


Re: [Fortran] RFC / RFA patch for using build_predict_expr instead of builtin_expect / PR 58721

2013-12-15 Thread Jan Hubicka
> Hi Honza,
> 
> Jan Hubicka wrote:
> >But if you have something like
> >
> >a=__builtin_expect (b?1:0,0)
> >
> >and you produce
> >
> >a=b?predict_expr not_taken, 0,0
> >...
> >if (a)
> >   unlikely path
> >
> >We need to check how it goes down to gimple.
> 
> It seems as if something doesn't work in that case – at least I do
> not understand the failure in
> gfortran.dg/elemental_optional_args_5.f03. With the patch one has in
> gfc_conv_procedure_call:
> 
> + gfc_init_block (&err_block);
> + gfc_add_expr_to_block (&err_block,
> + build_predict_expr (PRED_ZERO, NOT_TAKEN));
> + gfc_add_expr_to_block (&err_block,
> + fold_convert (TREE_TYPE (parmse.expr),
> + null_pointer_node));
> 
> 
> tmp = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node,
> descriptor_data,
> fold_convert (TREE_TYPE (descriptor_data),
> null_pointer_node));
> parmse.expr
> = fold_build3_loc (input_location, COND_EXPR,
> + TREE_TYPE (parmse.expr), tmp,
> + gfc_finish_block (&err_block), parmse.expr);
> 
> 
> And for some reasons that will fail with -O0/-Os - it works with -O1
> or when one removes the PREDICT.
> 
> Any idea what goes wrong here? I thought predictions can only
> produce inefficient code but not wrong code.

Yep, they should not produce incorrect code. Isn't the problem
that you expect the whole expression to have a value and predict_expr
"returns" nothing?
Can you check what lands in gimple?
> 
> 
> >So in short if we want the predictions to work reliably, we need to be sure
> >we produce them in a way backend can reliably consume them.
> 
> I fully concur.
> 
> >   2) if there are other cases (i.e. the Fortran language allows the failure
> >  flag to be stored into a user variable and handled by the user later),
> >  I will add an internal-use-only argument to predict_expr and extend its
> > handling.
> >  Again it won't be 100% reliable, since one conditional can then be
> > handled
> >  by multiple predict_exprs, which leads to the generally unsolvable problem of
> >  how to combine probabilities.
> >  But hopefully those cases will mostly be simple - i.e. the user will have an
> >    if (failure_flag)
> >      make horrible death in controlled way
> >  statement somewhere after the allocation.
> >  We are smart enough to work out simple variants like
> >    if (failure_flag1 || failure_flag2)
> >      make horrible death in controlled way
> 
> I think that's the case for, e.g.
> ALLOCATE(variable, stat=status)
> which gives status /= 0 if an error occurs.

Yep, in that case I will make a patch extending the builtin_expect semantics.
> 
> >   3) make sure the runtime library calls are correctly annotated with 
> > noreturn/cold
> >  flags that may help back end to work out the cases it failed otherwise.
> 
> I think that's already the case - and is the most important case.
> 
> >>+/* Branch leading to an overflow are extremely unlikely.  */
> >>+DEF_PREDICTOR (PRED_OVERFLOW, "overflow", PROB_VERY_LIKELY,
> >>+  PRED_FLAG_FIRST_MATCH)
> >I would even go for PROB_ALWAYS here.
> 
> Fine with me - I was only fearing that it would regard it as always
> and remove the NOT_TAKEN path. But if it keeps it, using PROB_ALWAYS
> makes sense.

It is a prediction only, not a fact one can use for optimization.  Basically,
things that will never ever be on a time-critical path can be marked
as PROB_ALWAYS.

Here I think we can be pretty sure (except, of course, for artificially
constructed testcases).  We could also distinguish between the case where
the frontend autogenerates termination of the program and the case where
the user asks for the status, perhaps?
> 
> >I would go for a bit more descriptive names, like "fortran alloc
> >overflow"/"fortran zero size alloc" etc.
> >Those names will hopefully be more obvious for a middle-end developer like
> >myself.
> 
> Okay. When the issue with elemental_optional_args_5.f03 is
> understood, I will update the patch.
> 
> Tobias


Fix PR ipa/59265

2013-12-15 Thread Jan Hubicka
Hi,
the problem here is ipa-prop trying to analyze an indirect call that has
already been turned into a direct one.  While early opts should optimize this
call (in fact I have an approved patch to do so that I forgot to apply), we
should not ICE in this case.

Fixed thus,
bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

PR ipa/59265
* g++.dg/torture/pr59265.C: New testcase.
* ipa-prop.c (ipa_analyze_call_uses): Do not analyze indirect calls
that were already turned into direct calls.

Index: testsuite/g++.dg/torture/pr59265.C
===
--- testsuite/g++.dg/torture/pr59265.C  (revision 0)
+++ testsuite/g++.dg/torture/pr59265.C  (revision 0)
@@ -0,0 +1,22 @@
+// { dg-do compile }
+// { dg-options "-fprofile-use" }
+
+class A {
+  int m_fn1() const;
+  unsigned m_fn2() const;
+};
+class B {
+public:
+  virtual void m_fn1();
+};
+class C final : B {
+  C();
+  virtual void m_fn2() { m_fn1(); }
+};
+int a;
+unsigned A::m_fn2() const {
+  if (m_fn1())
+return 0;
+  a = m_fn2();
+}
+C::C() {}
Index: ipa-prop.c
===
--- ipa-prop.c  (revision 205993)
+++ ipa-prop.c  (working copy)
@@ -2024,8 +2024,17 @@ ipa_analyze_call_uses (struct cgraph_nod
   struct param_analysis_info *parms_ainfo, gimple call)
 {
   tree target = gimple_call_fn (call);
+  struct cgraph_edge *cs;
 
-  if (!target)
+  if (!target
+  || (TREE_CODE (target) != SSA_NAME
+  && !virtual_method_call_p (target)))
+return;
+
+  /* If we previously turned the call into a direct call, there is
+ no need to analyze.  */
+  cs = cgraph_edge (node, call);
+  if (cs && !cs->indirect_unknown_callee)
 return;
   if (TREE_CODE (target) == SSA_NAME)
 ipa_analyze_indirect_call_uses (node, info, parms_ainfo, call, target);


Re: [Fortran] RFC / RFA patch for using build_predict_expr instead of builtin_expect / PR 58721

2013-12-15 Thread Tobias Burnus

Jan Hubicka wrote:

Yep, they should not produce incorrect code. Isn't the problem
that you expect the whole expression to have a value and predict_expr
"returns" nothing?
Can you check what lands in gimple?


That could well be the case – I replace "0" by "{ built_predict; 0 }"
and I wouldn't be surprised if the built_predict causes a problem as it
returns 'nothing'. At least the basic block belonging to "else"
() is empty:


Original dump (4.8 and 4.9 with predict patch):
  sub1 (&v[S.0 + -1], (logical(kind=4)) __builtin_expect 
((integer(kind=8)) (D.1897 == 0B), 0) ? 0B : &(*D.1897)[(S.0 + D.1901) * 
D.1903 + D.1898]);

and
  sub1 (&v[S.0 + -1], D.1917 != 0B ? &(*D.1917)[(S.0 + D.1921) 
* D.1923 + D.1918] : (void) 0);


Gimple of 4.8:
  if (D.1916 != 0) goto ; else goto ;
  :
  iftmp.2 = 0B;
  goto ;
  :
...
  iftmp.2 = &*D.1897[D.1922];
  :
..
  sub1 (D.1924, iftmp.2);

gimple of 4.9 with patch:
  if (D.1917 != 0B) goto ; else goto ;
  :
  D.1935 = S.0 + D.1921;
  D.1936 = D.1935 * D.1923;
  D.1937 = D.1936 + D.1918;
  iftmp.2 = &*D.1917[D.1937];
  goto ;
  :
  :
  D.1939 = S.0 + -1;
  D.1940 = &v[D.1939];
  sub1 (D.1940, iftmp.2);

Tobias

PS: That's for the code:

  implicit none
  type t2
    integer, allocatable :: a
    integer, pointer :: p2(:) => null()
  end type t2
  type(t2), save :: x
  integer, save :: s, v(2)
  call sub1 (v, x%p2)
contains
  elemental subroutine sub1 (x, y)
    integer, intent(inout) :: x
    integer, intent(in), optional :: y
  end subroutine sub1
end


[4.8] PATCH: Update x32 baseline_symbols.txt

2013-12-15 Thread H.J. Lu
Hi,

I checked in this patch to update x32 baseline_symbols.txt on 4.8
branch.

H.J.
---
Index: ChangeLog
===
--- ChangeLog   (revision 206002)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2013-12-15  H.J. Lu  
+
+   * config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt: Update.
+
 2013-11-22  Jonathan Wakely  
 
* acinclude.m4 (libtool_VERSION): Bump.
Index: config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt
===
--- config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt   (revision 206002)
+++ config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt   (working copy)
@@ -403,6 +403,7 @@ FUNC:_ZNKSt15basic_streambufIwSt11char_t
 FUNC:_ZNKSt15basic_streambufIwSt11char_traitsIwEE6getlocEv@@GLIBCXX_3.4
 FUNC:_ZNKSt15basic_stringbufIcSt11char_traitsIcESaIcEE3strEv@@GLIBCXX_3.4
 FUNC:_ZNKSt15basic_stringbufIwSt11char_traitsIwESaIwEE3strEv@@GLIBCXX_3.4
+FUNC:_ZNKSt17bad_function_call4whatEv@@GLIBCXX_3.4.18
 FUNC:_ZNKSt18basic_stringstreamIcSt11char_traitsIcESaIcEE3strEv@@GLIBCXX_3.4
 FUNC:_ZNKSt18basic_stringstreamIcSt11char_traitsIcESaIcEE5rdbufEv@@GLIBCXX_3.4
 FUNC:_ZNKSt18basic_stringstreamIwSt11char_traitsIwESaIwEE3strEv@@GLIBCXX_3.4
@@ -590,6 +591,8 @@ FUNC:_ZNKSt7num_putIwSt19ostreambuf_iter
 
FUNC:_ZNKSt7num_putIwSt19ostreambuf_iteratorIwSt11char_traitsIwEEE6do_putES3_RSt8ios_basewm@@GLIBCXX_3.4
 
FUNC:_ZNKSt7num_putIwSt19ostreambuf_iteratorIwSt11char_traitsIwEEE6do_putES3_RSt8ios_basewx@@GLIBCXX_3.4
 
FUNC:_ZNKSt7num_putIwSt19ostreambuf_iteratorIwSt11char_traitsIwEEE6do_putES3_RSt8ios_basewy@@GLIBCXX_3.4
+FUNC:_ZNKSt8__detail20_Prime_rehash_policy11_M_next_bktEj@@GLIBCXX_3.4.18
+FUNC:_ZNKSt8__detail20_Prime_rehash_policy14_M_need_rehashEjjj@@GLIBCXX_3.4.18
 FUNC:_ZNKSt8bad_cast4whatEv@@GLIBCXX_3.4.9
 FUNC:_ZNKSt8ios_base7failure4whatEv@@GLIBCXX_3.4
 FUNC:_ZNKSt8messagesIcE18_M_convert_to_charERKSs@@GLIBCXX_3.4
@@ -1207,6 +1210,7 @@ FUNC:_ZNSt11range_errorD2Ev@@GLIBCXX_3.4
 FUNC:_ZNSt11regex_errorD0Ev@@GLIBCXX_3.4.15
 FUNC:_ZNSt11regex_errorD1Ev@@GLIBCXX_3.4.15
 FUNC:_ZNSt11regex_errorD2Ev@@GLIBCXX_3.4.15
+FUNC:_ZNSt11this_thread11__sleep_forENSt6chrono8durationIxSt5ratioILx1ELx1NS1_IxS2_ILx1ELx10@@GLIBCXX_3.4.18
 FUNC:_ZNSt12__basic_fileIcE2fdEv@@GLIBCXX_3.4
 FUNC:_ZNSt12__basic_fileIcE4fileEv@@GLIBCXX_3.4.1
 FUNC:_ZNSt12__basic_fileIcE4openEPKcSt13_Ios_Openmodei@@GLIBCXX_3.4
@@ -1485,6 +1489,11 @@ FUNC:_ZNSt13basic_ostreamIwSt11char_trai
 FUNC:_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEt@@GLIBCXX_3.4
 FUNC:_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEx@@GLIBCXX_3.4
 FUNC:_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEy@@GLIBCXX_3.4
+FUNC:_ZNSt13random_device14_M_init_pretr1ERKSs@@GLIBCXX_3.4.18
+FUNC:_ZNSt13random_device16_M_getval_pretr1Ev@@GLIBCXX_3.4.18
+FUNC:_ZNSt13random_device7_M_finiEv@@GLIBCXX_3.4.18
+FUNC:_ZNSt13random_device7_M_initERKSs@@GLIBCXX_3.4.18
+FUNC:_ZNSt13random_device9_M_getvalEv@@GLIBCXX_3.4.18
 FUNC:_ZNSt13runtime_errorC1ERKSs@@GLIBCXX_3.4
 FUNC:_ZNSt13runtime_errorC2ERKSs@@GLIBCXX_3.4
 FUNC:_ZNSt13runtime_errorD0Ev@@GLIBCXX_3.4
@@ -1929,6 +1938,8 @@ FUNC:_ZNSt6__norm15_List_node_base7rever
 FUNC:_ZNSt6__norm15_List_node_base8transferEPS0_S1_@@GLIBCXX_3.4.9
 FUNC:_ZNSt6__norm15_List_node_base9_M_unhookEv@@GLIBCXX_3.4.14
 FUNC:_ZNSt6chrono12system_clock3nowEv@@GLIBCXX_3.4.11
+FUNC:_ZNSt6chrono3_V212steady_clock3nowEv@@GLIBCXX_3.4.19
+FUNC:_ZNSt6chrono3_V212system_clock3nowEv@@GLIBCXX_3.4.19
 FUNC:_ZNSt6gslice8_IndexerC1EjRKSt8valarrayIjES4_@@GLIBCXX_3.4
 FUNC:_ZNSt6gslice8_IndexerC2EjRKSt8valarrayIjES4_@@GLIBCXX_3.4
 FUNC:_ZNSt6locale11_M_coalesceERKS_S1_i@@GLIBCXX_3.4
@@ -2467,6 +2478,7 @@ FUNC:__cxa_guard_acquire@@CXXABI_1.3
 FUNC:__cxa_guard_release@@CXXABI_1.3
 FUNC:__cxa_pure_virtual@@CXXABI_1.3
 FUNC:__cxa_rethrow@@CXXABI_1.3
+FUNC:__cxa_thread_atexit@@CXXABI_1.3.7
 FUNC:__cxa_throw@@CXXABI_1.3
 FUNC:__cxa_tm_cleanup@@CXXABI_TM_1
 FUNC:__cxa_vec_cctor@@CXXABI_1.3
@@ -2491,6 +2503,7 @@ OBJECT:0:CXXABI_1.3.3
 OBJECT:0:CXXABI_1.3.4
 OBJECT:0:CXXABI_1.3.5
 OBJECT:0:CXXABI_1.3.6
+OBJECT:0:CXXABI_1.3.7
 OBJECT:0:CXXABI_TM_1
 OBJECT:0:GLIBCXX_3.4
 OBJECT:0:GLIBCXX_3.4.1
@@ -2502,6 +2515,8 @@ OBJECT:0:GLIBCXX_3.4.14
 OBJECT:0:GLIBCXX_3.4.15
 OBJECT:0:GLIBCXX_3.4.16
 OBJECT:0:GLIBCXX_3.4.17
+OBJECT:0:GLIBCXX_3.4.18
+OBJECT:0:GLIBCXX_3.4.19
 OBJECT:0:GLIBCXX_3.4.2
 OBJECT:0:GLIBCXX_3.4.3
 OBJECT:0:GLIBCXX_3.4.4
@@ -3033,6 +3048,8 @@ OBJECT:1:_ZNSt21__numeric_limits_base9is
 OBJECT:1:_ZNSt21__numeric_limits_base9is_moduloE@@GLIBCXX_3.4
 OBJECT:1:_ZNSt21__numeric_limits_base9is_signedE@@GLIBCXX_3.4
 OBJECT:1:_ZNSt6chrono12system_clock12is_monotonicE@@GLIBCXX_3.4.11
+OBJECT:1:_ZNSt6chrono3_V212steady_clock9is_steadyE@@GLIBCXX_3.4.19
+OBJECT:1:_ZNSt6chrono3_V212system_clock9is_steadyE@@GLIBCXX_3.4.19
 OBJECT:1:_ZSt10adopt_lock@@GLIBCXX_3.4.11
 OBJECT:1:_ZSt10defer_lock@@GLIBCXX_3.4.11
 OBJECT:1:_ZSt11try_to_lock@@GLIBC

[patch, libgfortran] PR59419 Failing OPEN with FILE='xxx' and IOSTAT creates the file 'xxx'

2013-12-15 Thread Jerry DeLisle
Hi all,

The attached patch fixes the problem by properly exiting when an error has
occurred rather than falling through and creating the file.
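A minimal C sketch of the pattern follows. It assumes, as we understand libgfortran, that generate_error returns to its caller instead of terminating when the program supplied IOSTAT=, so anything after the call still runs. The names io_common, record_error, create_file, open_buggy and open_fixed are hypothetical. In the real st_open and st_rewind, the early return is also preceded by library_end ().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for libgfortran's
   st_parameter_common and generate_error (); the real ones differ.  */
struct io_common
{
  int error_seen;
  const char *message;
};

static int files_created = 0;

/* Records the error and returns, like generate_error is assumed to do
   when the program supplied IOSTAT=.  */
static void record_error (struct io_common *c, const char *msg)
{
  c->error_seen = 1;
  c->message = msg;
}

static void create_file (void)
{
  files_created++;
}

/* Buggy shape: the error is recorded, but control falls through.  */
static void open_buggy (struct io_common *c, int bad_unit)
{
  if (bad_unit)
    record_error (c, "Bad unit number in OPEN statement");
  create_file ();
}

/* Fixed shape: leave as soon as the error is recorded.  */
static void open_fixed (struct io_common *c, int bad_unit)
{
  if (bad_unit)
    {
      record_error (c, "Bad unit number in OPEN statement");
      return;
    }
  create_file ();
}
```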

The patch also fixes a few other places I found after auditing all calls to
generate_error in libgfortran/io.

I will conjure up a test case for this.

I have regression tested on X86-64 Linux.  OK for trunk?

Regards,

Jerry

2013-12-15  Jerry DeLisle  

PR libfortran/59419
* io/file_pos.c (st_rewind): Do proper return after
generate_error.
* io/open.c (edit_modes): Move action code inside block that
checks for library ok. (new_unit): Do cleanup after error.
(st_open): Do proper return after error.
* io/transfer.c (data_transfer_init): Likewise.
Index: file_pos.c
===
--- file_pos.c	(revision 205993)
+++ file_pos.c	(working copy)
@@ -410,7 +410,11 @@ st_rewind (st_parameter_filepos *fpp)
 	  u->last_record = 0;
 
 	  if (sseek (u->s, 0, SEEK_SET) < 0)
-	generate_error (&fpp->common, LIBERROR_OS, NULL);
+	{
+	  generate_error (&fpp->common, LIBERROR_OS, NULL);
+	  library_end ();
+	  return;
+	}
 
 	  /* Set this for compatibilty with g77 for /dev/null.  */
 	  if (ssize (u->s) == 0)
Index: open.c
===
--- open.c	(revision 205993)
+++ open.c	(working copy)
@@ -265,39 +265,39 @@ edit_modes (st_parameter_open *opp, gfc_unit * u,
 	u->flags.round = flags->round;
   if (flags->sign != SIGN_UNSPECIFIED)
 	u->flags.sign = flags->sign;
-}
 
-  /* Reposition the file if necessary.  */
-
-  switch (flags->position)
-{
-case POSITION_UNSPECIFIED:
-case POSITION_ASIS:
-  break;
-
-case POSITION_REWIND:
-  if (sseek (u->s, 0, SEEK_SET) != 0)
-	goto seek_error;
-
-  u->current_record = 0;
-  u->last_record = 0;
-
-  test_endfile (u);
-  break;
-
-case POSITION_APPEND:
-  if (sseek (u->s, 0, SEEK_END) < 0)
-	goto seek_error;
-
-  if (flags->access != ACCESS_STREAM)
-	u->current_record = 0;
-
-  u->endfile = AT_ENDFILE;	/* We are at the end.  */
-  break;
-
-seek_error:
-  generate_error (&opp->common, LIBERROR_OS, NULL);
-  break;
+  /* Reposition the file if necessary.  */
+
+  switch (flags->position)
+	{
+	case POSITION_UNSPECIFIED:
+	case POSITION_ASIS:
+	  break;
+
+	case POSITION_REWIND:
+	  if (sseek (u->s, 0, SEEK_SET) != 0)
+	goto seek_error;
+
+	  u->current_record = 0;
+	  u->last_record = 0;
+
+	  test_endfile (u);
+	  break;
+
+	case POSITION_APPEND:
+	  if (sseek (u->s, 0, SEEK_END) < 0)
+	goto seek_error;
+
+	  if (flags->access != ACCESS_STREAM)
+	u->current_record = 0;
+
+	  u->endfile = AT_ENDFILE;	/* We are at the end.  */
+	  break;
+
+	seek_error:
+	  generate_error (&opp->common, LIBERROR_OS, NULL);
+	  break;
+	}
 }
 
   unlock_unit (u);
@@ -562,7 +562,10 @@ new_unit (st_parameter_open *opp, gfc_unit *u, uni
   if (flags->position == POSITION_APPEND)
 {
   if (sseek (u->s, 0, SEEK_END) < 0)
-	generate_error (&opp->common, LIBERROR_OS, NULL);
+	{
+	  generate_error (&opp->common, LIBERROR_OS, NULL);
+	  goto cleanup;
+	}
   u->endfile = AT_ENDFILE;
 }
 
@@ -852,8 +855,12 @@ st_open (st_parameter_open *opp)
 	{
 	  u = find_unit (opp->common.unit);
 	  if (u == NULL) /* Negative unit and no NEWUNIT-created unit found.  */
-	generate_error (&opp->common, LIBERROR_BAD_OPTION,
-			"Bad unit number in OPEN statement");
+	{
+	  generate_error (&opp->common, LIBERROR_BAD_OPTION,
+			  "Bad unit number in OPEN statement");
+	  library_end ();
+	  return;
+	}
 	}
 
   if (u == NULL)
Index: transfer.c
===
--- transfer.c	(revision 205993)
+++ transfer.c	(working copy)
@@ -2490,14 +2490,18 @@ data_transfer_init (st_parameter_dt *dtp, int read
   if ((cf & IOPARM_DT_HAS_NAMELIST_NAME) != 0 && dtp->u.p.ionml != NULL)
  {
 	if ((cf & IOPARM_DT_HAS_FORMAT) != 0)
-	   generate_error (&dtp->common, LIBERROR_OPTION_CONFLICT,
-		"A format cannot be specified with a namelist");
+	  {
+	generate_error (&dtp->common, LIBERROR_OPTION_CONFLICT,
+			"A format cannot be specified with a namelist");
+	return;
+	  }
  }
   else if (dtp->u.p.current_unit->flags.form == FORM_FORMATTED &&
 	   !(cf & (IOPARM_DT_HAS_FORMAT | IOPARM_DT_LIST_FORMAT)))
 {
   generate_error (&dtp->common, LIBERROR_OPTION_CONFLICT,
 		  "Missing format for FORMATTED data transfer");
+  return;
 }
 
   if (is_internal_unit (dtp)


Re: [Fortran] RFC / RFA patch for using build_predict_expr instead of builtin_expect / PR 58721

2013-12-15 Thread Jan Hubicka
> Jan Hubicka wrote:
> >Yep, they should not produce incorrect code. Isn't the problem
> >that you expect the whole expression to have a value and predict_expr
> >"returns" nothing?
> >Can you check what lands in GIMPLE?
> 
> That could well be the case – I replace "0" with "{ built_predict; 0
> }" and I wouldn't be surprised if the built_predict causes problems

Yep, though this is also the case where you really want to predict
a value rather than a code path, so the extended builtin_expect is probably
the only reasonable approach here.

I am not really a GENERIC person, but perhaps it is the difference between
{ built_predict; 0 }, which is a statement with no value, and built_predict, 0,
which is an expression with a value?

I will look into the builtin_expect extension. Then we can probably
keep gfc_likely/unlikely with an extra argument specifying the predictor
for cases like this, and use predict_expr in cases where you
really produce a runtime conditional like
if (...)
  { builtin_predict; abort ();}

Honza
> as it returns 'nothing'. At least the basic block belonging to
> "else" () is empty:
> 
> Original dump (4.8 and 4.9 with predict patch):
>   sub1 (&v[S.0 + -1], (logical(kind=4)) __builtin_expect
> ((integer(kind=8)) (D.1897 == 0B), 0) ? 0B : &(*D.1897)[(S.0 +
> D.1901) * D.1903 + D.1898]);
> and
>   sub1 (&v[S.0 + -1], D.1917 != 0B ? &(*D.1917)[(S.0 +
> D.1921) * D.1923 + D.1918] : (void) 0);
> 
> Gimple of 4.8:
>   if (D.1916 != 0) goto ; else goto ;
>   :
>   iftmp.2 = 0B;
>   goto ;
>   :
> ...
>   iftmp.2 = &*D.1897[D.1922];
>   :
> ..
>   sub1 (D.1924, iftmp.2);
> 
> gimple of 4.9 with patch:
>   if (D.1917 != 0B) goto ; else goto ;
>   :
>   D.1935 = S.0 + D.1921;
>   D.1936 = D.1935 * D.1923;
>   D.1937 = D.1936 + D.1918;
>   iftmp.2 = &*D.1917[D.1937];
>   goto ;
>   :
>   :
>   D.1939 = S.0 + -1;
>   D.1940 = &v[D.1939];
>   sub1 (D.1940, iftmp.2);
> 
> Tobias
> 
> PS: That's for the code:
> 
>   implicit none
>   type t2
> integer, allocatable :: a
> integer, pointer :: p2(:) => null()
>   end type t2
>   type(t2), save :: x
>   integer, save :: s, v(2)
>   call sub1 (v, x%p2)
> contains
>   elemental subroutine sub1 (x, y)
> integer, intent(inout) :: x
> integer, intent(in), optional :: y
>   end subroutine sub1
> end
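The value-versus-path distinction can be illustrated at the C source level. This is an analogy only: the Fortran front end builds GENERIC trees directly, not C. __builtin_expect returns its first argument, so it can predict a value inside a conditional expression without changing that expression's type. That matters here because, in Tobias's 4.9 dump, the null arm of the conditional became (void) 0 instead of 0B. A cold path, by contrast, can be marked by calling a function with GCC's cold attribute. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Value prediction: __builtin_expect returns its first argument, so the
   hint sits inside the conditional and both arms keep type int *.  */
static int *element_or_null (int *arr, size_t idx)
{
  return __builtin_expect (arr == NULL, 0) ? NULL : &arr[idx];
}

/* Path prediction: calling a function declared cold marks the branch
   that reaches it as unlikely; no value is involved.  */
static __attribute__ ((cold, noreturn)) void bad_index (void)
{
  abort ();
}

static int checked_get (const int *arr, size_t n, size_t idx)
{
  if (idx >= n)
    bad_index ();
  return arr[idx];
}
```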


Re: [PATCH i386] Enable -freorder-blocks-and-partition

2013-12-15 Thread Martin Liška
On 15 December 2013 23:17, Martin Liška  wrote:
> Dear Jan and Teresa,
> Jan was right that I've been using changes which were committed by
> Teresa and live in trunk. So the graph with time profile presented
> in my previous post was really made with -freorder-blocks-and-partition
> enabled. I removed the hack in varasm.c and I use the classic section
> layout. Please open the following dump (it includes a PDF graph, an HTML
> report that shows functions with time profile located in the cold
> section, and all -fdump-ipa-all dumps):
>
> https://drive.google.com/file/d/0B0pisUJ80pO1YW1QWUFkZjdqME0/edit?usp=sharing
>
> Apart from that, I also created a PDF graph
> (https://drive.google.com/file/d/0B0pisUJ80pO1aHhPWW56dXpLVTQ/edit?usp=sharing)
> that shows that the time profile is almost perfect for GIMP. The only
> misses are a few examples that have no profile from the generate phase.
>
> I will merge current trunk and prepare final patch.
>
> Are there any other data that you want to be prepared?
>
> Martin
>
>
> On 13 December 2013 02:13, Jan Hubicka  wrote:
>>> On Wed, Dec 11, 2013 at 1:21 AM, Martin Liška  wrote:
>>> > Hello,
>>> >I prepared a collection of systemtap graphs for GIMP.
>>> >
>>> > 1) just my profile-based function reordering: 550 pages
>>> > 2) just -freorder-blocks-and-partitions: 646 pages
>>> > 3) just -fno-reorder-blocks-and-partitions: 638 pages
>>> >
>>> > Please see attached data.
>>>
>>> Thanks for the data. A few observations/questions:
>>>
>>> With both 1) (your (time-based?) reordering) and 2)
>>> (-freorder-blocks-and-partitions) there are a fair amount of accesses
>>> out of the cold section. I'm not seeing so many accesses out of the
>>> cold section in the apps I am looking at with splitting enabled. In
>>
>> I see you already committed the patch, so perhaps Martin's measurements assume
>> the pass is off by default?
>>
>> I rebuilt GCC with profiledbootstrap and with the linker script unmapping
>> .text.unlikely. I get an ICE in:
>> (gdb) bt
>> #0  diagnostic_set_caret_max_width(diagnostic_context*, int) () at ../../gcc/diagnostic.c:108
>> #1  0x00f68457 in diagnostic_initialize (context=0x18ae000 , n_opts=n_opts@entry=1290) at ../../gcc/diagnostic.c:135
>> #2  0x0100050e in general_init (argv0=) at ../../gcc/toplev.c:1110
>> #3  toplev_main(int, char**) () at ../../gcc/toplev.c:1922
>> #4  0x7774cbe5 in __libc_start_main () from /lib64/libc.so.6
>> #5  0x00f7898d in _start () at ../sysdeps/x86_64/start.S:122
>>
>> That is relatively early in the startup process. The function seems to be
>> inlined and it fails only on the second invocation. I did not have time to
>> investigate further yet; without -fprofile-use it starts...
>>
>> On our periodic testers I see an improvement above noise in crafty (2200->2300)
>> and a regression in Vortex (2900->2800), plus a code size increase.
>>
>> Honza


Re: [PATCH] Time profiler - phase 2

2013-12-15 Thread Martin Liška
Here is the latest version of the patch.
Could you please approve it?

Martin

On 5 December 2013 22:32, Jan Hubicka  wrote:
>> Hello,
>>thank you for the trick in ipa-split.c. It really helped! I
>
> Good!, this patch is pre-approved after testing.
>> prepared 2 tests for Inkscape: the first was just with my function
>> reordering pass, and for the second I also enabled
>> -freorder-blocks-and-partition (note: files ending with _blocks.xxx in the
>> attached tar).
>>
>> Touched pages:
>> just reorder functions: 1108
>> with -freorder-blocks-and-partition: 1120
>>
>> Please see dumps at:
>> https://drive.google.com/file/d/0B0pisUJ80pO1R19zSXR6U1Q4NmM/edit?usp=sharing
>> Note: I put all functions into a single section (for easier layout orientation).
>
> I think for -freorder-blocks-and-partition you will need to split the sections
> again, otherwise we will not see the effect of the pass. Can you, please, make
> one extra systemtap run with this (it would be good to have both
> -fno-reorder-blocks-and-partition and -freorder-blocks-and-partition so we can
> compare the size of both)?
> But overall it looks good!
>
> Honza
>>
>> If I'm correct, there is a small chunk of about 10 functions after the
>> boundary of untouched functions and a single miss at the end of the
>> binary: __libc_csu_init.
>> If you look at the graph line, it looks really good.
>>
>> I will prepare a patch for the mailing list as phase 2.
>>
>> Martin
>>
>> On 5 December 2013 14:38, Jan Hubicka  wrote:
>> >> Can you, please, send me the -flto systemtaps for gimp and/or inkscape so we can decide
>> >> on the patch? We should enable it earlier in stage3 rather than later.
>> >
>> > I see, the PDF was included within the tar file.  Was this with
>> > -freorder-blocks-and-partition?  If so, the patch is OK.
>> > I still think we should put cold parts of hot/normal functions into a subsection different
>> > from the unlikely section, but let's handle that incrementally.
>> >
>> > Thanks,
>> > Honza
>> >>
>> >> Honza
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 93e857df..d5a0ac8 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2013-12-15  Martin Liska  
+	Jan Hubicka  
+
+	* cgraphunit.c (node_cmp): New function.
+	(expand_all_functions): Function ordering added.
+	* common.opt: New profile based function reordering flag introduced.
+	* lto-partition.c: Support for time profile added.
+	* lto.c: Likewise.
+	* predict.c (handle_missing_profiles): Time profile handled in
+	  missing profiles.
+
 2013-12-14   Jan Hubicka  
 
 	PR middle-end/58477
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 44f3afd..2b66331 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1831,6 +1831,23 @@ expand_function (struct cgraph_node *node)
   ipa_remove_all_references (&node->ref_list);
 }
 
+/* Node comparer that is responsible for the order that corresponds
+   to time when a function was launched for the first time.  */
+
+static int
+node_cmp (const void *pa, const void *pb)
+{
+  const struct cgraph_node *a = *(const struct cgraph_node * const *) pa;
+  const struct cgraph_node *b = *(const struct cgraph_node * const *) pb;
+
+  /* Functions with time profile must be before these without profile.  */
+  if (!a->tp_first_run || !b->tp_first_run)
+return a->tp_first_run - b->tp_first_run;
+
+  return a->tp_first_run != b->tp_first_run
+	 ? b->tp_first_run - a->tp_first_run
+	 : b->order - a->order;
+}
 
 /* Expand all functions that must be output.
 
@@ -1842,11 +1859,14 @@ expand_function (struct cgraph_node *node)
to use subsections to make the output functions appear in top-down
order).  */
 
+
 static void
 expand_all_functions (void)
 {
   struct cgraph_node *node;
   struct cgraph_node **order = XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
+
+  unsigned int expanded_func_count = 0, profiled_func_count = 0;
   int order_pos, new_order_pos = 0;
   int i;
 
@@ -1859,20 +1879,39 @@ expand_all_functions (void)
 if (order[i]->process)
   order[new_order_pos++] = order[i];
 
+  if (flag_profile_reorder_functions)
+qsort (order, new_order_pos, sizeof (struct cgraph_node *), node_cmp);
+
   for (i = new_order_pos - 1; i >= 0; i--)
 {
   node = order[i];
+
   if (node->process)
 	{
+ expanded_func_count++;
+ if(node->tp_first_run)
+   profiled_func_count++;
+
+if (cgraph_dump_file)
+  fprintf (cgraph_dump_file, "Time profile order in expand_all_functions:%s:%d\n", node->asm_name (), node->tp_first_run);
+
 	  node->process = 0;
 	  expand_function (node);
 	}
 }
+
+if (in_lto_p && dump_file)
+  fprintf (dump_file, "Expanded functions with time profile (%s):%u/%u\n",
+   main_input_filename, profiled_func_count, expanded_func_count);
+
+  if (cgraph_dump_file && flag_profile_reorder_functions && in_lto_p)
+fprintf (cgraph_dump_file, "Expanded functions with time profile:%u/%u\n",
+ profiled_func_count, expanded_func_count);
+
   cgraph_process_
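
To see the ordering node_cmp produces, here is a standalone C sketch that copies its logic onto a hypothetical struct node carrying only tp_first_run (0 meaning "no time profile") and order. Because expand_all_functions walks the sorted array from the end, functions come out in increasing tp_first_run order, ties broken by increasing order, with unprofiled functions last. Two unprofiled nodes compare equal and qsort is not stable, so their relative order is unspecified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cgraph_node with only the fields
   node_cmp reads; tp_first_run == 0 means "no time profile".  */
struct node
{
  const char *name;
  int tp_first_run;
  int order;
};

/* Same logic as node_cmp in the patch.  */
static int node_cmp (const void *pa, const void *pb)
{
  const struct node *a = *(const struct node * const *) pa;
  const struct node *b = *(const struct node * const *) pb;

  /* Unprofiled nodes sort first, so the backwards walk expands them last.  */
  if (!a->tp_first_run || !b->tp_first_run)
    return a->tp_first_run - b->tp_first_run;

  /* Later first run (then higher order) sorts earlier in the array.  */
  return a->tp_first_run != b->tp_first_run
         ? b->tp_first_run - a->tp_first_run
         : b->order - a->order;
}

/* Sort, then walk the array backwards as expand_all_functions does,
   recording the resulting expansion order in OUT.  */
static void expansion_order (struct node **v, size_t n, struct node **out)
{
  qsort (v, n, sizeof *v, node_cmp);
  for (size_t i = 0; i < n; i++)
    out[i] = v[n - 1 - i];
}
```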

Re: [PATCH] Time profiler - phase 2

2013-12-15 Thread Jan Hubicka
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 93e857df..d5a0ac8 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,14 @@
> +2013-12-15  Martin Liska  
> + Jan Hubicka  
> +
> + * cgraphunit.c (node_cmp): New function.
> + (expand_all_functions): Function ordering added.
> + * common.opt: New profile based function reordering flag introduced.
> + * lto-partition.c: Support for time profile added.
> + * lto.c: Likewise.
> + * predict.c (handle_missing_profiles): Time profile handled in
> +   missing profiles.
> +

OK, thanks, with the changes below!
(I thought this patch was already in! Also please be careful about
applying the changes - it seems that in the previous commit you
omitted some.)
> @@ -1842,11 +1859,14 @@ expand_function (struct cgraph_node *node)
> to use subsections to make the output functions appear in top-down
> order).  */
>  
> +
Bogus whitespace
>  static void
>  expand_all_functions (void)
>  {
>struct cgraph_node *node;
>struct cgraph_node **order = XCNEWVEC (struct cgraph_node *, 
> cgraph_n_nodes);
> +
> +  unsigned int expanded_func_count = 0, profiled_func_count = 0;
>int order_pos, new_order_pos = 0;
>int i;
>  
> @@ -1859,20 +1879,39 @@ expand_all_functions (void)
>  if (order[i]->process)
>order[new_order_pos++] = order[i];
>  
> +  if (flag_profile_reorder_functions)
> +qsort (order, new_order_pos, sizeof (struct cgraph_node *), node_cmp);
> +
>for (i = new_order_pos - 1; i >= 0; i--)
>  {
>node = order[i];
> +
>if (node->process)
>   {
> + expanded_func_count++;
> + if(node->tp_first_run)
> +   profiled_func_count++;
> +
> +if (cgraph_dump_file)
> +  fprintf (cgraph_dump_file, "Time profile order in 
> expand_all_functions:%s:%d\n", node->asm_name (), node->tp_first_run);
> +
> node->process = 0;
> expand_function (node);
>   }
>  }
> +
> +if (in_lto_p && dump_file)
> +  fprintf (dump_file, "Expanded functions with time profile 
> (%s):%u/%u\n",
> +   main_input_filename, profiled_func_count, 
> expanded_func_count);
> +
> +  if (cgraph_dump_file && flag_profile_reorder_functions && in_lto_p)
> +fprintf (cgraph_dump_file, "Expanded functions with time 
> profile:%u/%u\n",
> + profiled_func_count, expanded_func_count);

Make the dumps unconditional; I do not see why they should depend on in_lto_p here.
> @@ -689,7 +713,6 @@ lto_balanced_map (void)
> best_i = i;
> best_n_nodes = lto_symtab_encoder_size (partition->encoder);
> best_total_size = total_size;
> -   best_varpool_pos = varpool_pos;
>   }
>if (cgraph_dump_file)
>   fprintf (cgraph_dump_file, "Step %i: added %s/%i, size %i, cost %i/%i "
> @@ -707,7 +730,6 @@ lto_balanced_map (void)
>   fprintf (cgraph_dump_file, "Unwinding %i insertions to step 
> %i\n",
>i - best_i, best_i);
> undo_partition (partition, best_n_nodes);
> -   varpool_pos = best_varpool_pos;
>   }
> i = best_i;
> /* When we are finished, avoid creating empty partition.  */

I already asked you to remove these changes - they revert earlier fix.

> diff --git a/gcc/predict.c b/gcc/predict.c
> index a5ad34f..1826a06 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -2839,12 +2839,24 @@ handle_missing_profiles (void)
>  {
>struct cgraph_edge *e;
>gcov_type call_count = 0;
> +  gcov_type max_tp_first_run = 0;
>struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>  
>if (node->count)
>  continue;
>for (e = node->callers; e; e = e->next_caller)
> +  {
>  call_count += e->count;
> +
> + if (e->caller->tp_first_run > max_tp_first_run)
> +   max_tp_first_run = e->caller->tp_first_run;
> +  }
> +
> +  /* If time profile is missing, let assign the maximum that comes from
> +  caller functions.  */
> +  if (!node->tp_first_run && max_tp_first_run)
> + node->tp_first_run = max_tp_first_run + 1;
> +

I believe you also need to minimize node->tp_first_run in ipa_merge_profiles.
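A minimal sketch of both rules, on a hypothetical struct fn that holds only tp_first_run (0 meaning no time profile). fill_missing mirrors the handle_missing_profiles hunk in this patch. merge_first_run is only one possible reading of the "minimize" suggestion for ipa_merge_profiles; it is not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node: 0 in tp_first_run means "no time
   profile recorded".  */
struct fn
{
  int tp_first_run;
};

/* Mirrors the handle_missing_profiles hunk: an unprofiled function
   inherits the latest first run among its callers, plus one.  */
static void fill_missing (struct fn *node, struct fn *const *callers,
                          size_t ncallers)
{
  int max_first_run = 0;
  for (size_t i = 0; i < ncallers; i++)
    if (callers[i]->tp_first_run > max_first_run)
      max_first_run = callers[i]->tp_first_run;
  if (!node->tp_first_run && max_first_run)
    node->tp_first_run = max_first_run + 1;
}

/* Keep the earlier of two first runs, ignoring a side with no profile.  */
static int merge_first_run (int a, int b)
{
  if (!a)
    return b;
  if (!b)
    return a;
  return a < b ? a : b;
}
```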
>if (call_count
>&& fn && fn->cfg
>&& (call_count * unlikely_count_fraction >= profile_info->runs))
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 5c5025a..f34946c 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -552,7 +552,14 @@ default_function_section (tree decl, enum node_frequency 
> freq,
>   unlikely executed (this happens especially with function splitting
>   where we can split away unnecessary parts of static constructors.  */
>if (startup && freq != NODE_FREQUENCY_UNLIKELY_EXECUTED)
> -return get_named_text_section (decl, ".text.startup", NULL);
> +  {
> +/* If we do have a profile or(and) LTO phase is executed, we do not need
> +these ELF section.  */
> +if (!in_lto_p || !flag_profile_values)
> +  

Re: [PATCH] Time profiler - phase 2

2013-12-15 Thread Martin Liška
Hello,
   here is an updated version of the patch.

Tested on x86_64 with bootstrap enabled.

Martin

On 16 December 2013 00:21, Jan Hubicka  wrote:
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 93e857df..d5a0ac8 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,14 @@
>> +2013-12-15  Martin Liska  
>> + Jan Hubicka  
>> +
>> + * cgraphunit.c (node_cmp): New function.
>> + (expand_all_functions): Function ordering added.
>> + * common.opt: New profile based function reordering flag introduced.
>> + * lto-partition.c: Support for time profile added.
>> + * lto.c: Likewise.
>> + * predict.c (handle_missing_profiles): Time profile handled in
>> +   missing profiles.
>> +
>
> OK, thanks, with the changes below!
> (I thought this patch was already in! Also please be careful about
> applying the changes - it seems that in the previous commit you
> omitted some.)
>> @@ -1842,11 +1859,14 @@ expand_function (struct cgraph_node *node)
>> to use subsections to make the output functions appear in top-down
>> order).  */
>>
>> +
> Bogus whitespace
>>  static void
>>  expand_all_functions (void)
>>  {
>>struct cgraph_node *node;
>>struct cgraph_node **order = XCNEWVEC (struct cgraph_node *, 
>> cgraph_n_nodes);
>> +
>> +  unsigned int expanded_func_count = 0, profiled_func_count = 0;
>>int order_pos, new_order_pos = 0;
>>int i;
>>
>> @@ -1859,20 +1879,39 @@ expand_all_functions (void)
>>  if (order[i]->process)
>>order[new_order_pos++] = order[i];
>>
>> +  if (flag_profile_reorder_functions)
>> +qsort (order, new_order_pos, sizeof (struct cgraph_node *), node_cmp);
>> +
>>for (i = new_order_pos - 1; i >= 0; i--)
>>  {
>>node = order[i];
>> +
>>if (node->process)
>>   {
>> + expanded_func_count++;
>> + if(node->tp_first_run)
>> +   profiled_func_count++;
>> +
>> +if (cgraph_dump_file)
>> +  fprintf (cgraph_dump_file, "Time profile order in 
>> expand_all_functions:%s:%d\n", node->asm_name (), node->tp_first_run);
>> +
>> node->process = 0;
>> expand_function (node);
>>   }
>>  }
>> +
>> +if (in_lto_p && dump_file)
>> +  fprintf (dump_file, "Expanded functions with time profile 
>> (%s):%u/%u\n",
>> +   main_input_filename, profiled_func_count, 
>> expanded_func_count);
>> +
>> +  if (cgraph_dump_file && flag_profile_reorder_functions && in_lto_p)
>> +fprintf (cgraph_dump_file, "Expanded functions with time 
>> profile:%u/%u\n",
>> + profiled_func_count, expanded_func_count);
>
> Make the dumps unconditional; I do not see why they should depend on in_lto_p here.
>> @@ -689,7 +713,6 @@ lto_balanced_map (void)
>> best_i = i;
>> best_n_nodes = lto_symtab_encoder_size (partition->encoder);
>> best_total_size = total_size;
>> -   best_varpool_pos = varpool_pos;
>>   }
>>if (cgraph_dump_file)
>>   fprintf (cgraph_dump_file, "Step %i: added %s/%i, size %i, cost %i/%i "
>> @@ -707,7 +730,6 @@ lto_balanced_map (void)
>>   fprintf (cgraph_dump_file, "Unwinding %i insertions to step 
>> %i\n",
>>i - best_i, best_i);
>> undo_partition (partition, best_n_nodes);
>> -   varpool_pos = best_varpool_pos;
>>   }
>> i = best_i;
>> /* When we are finished, avoid creating empty partition.  */
>
> I already asked you to remove these changes - they revert earlier fix.
>
>> diff --git a/gcc/predict.c b/gcc/predict.c
>> index a5ad34f..1826a06 100644
>> --- a/gcc/predict.c
>> +++ b/gcc/predict.c
>> @@ -2839,12 +2839,24 @@ handle_missing_profiles (void)
>>  {
>>struct cgraph_edge *e;
>>gcov_type call_count = 0;
>> +  gcov_type max_tp_first_run = 0;
>>struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>
>>if (node->count)
>>  continue;
>>for (e = node->callers; e; e = e->next_caller)
>> +  {
>>  call_count += e->count;
>> +
>> + if (e->caller->tp_first_run > max_tp_first_run)
>> +   max_tp_first_run = e->caller->tp_first_run;
>> +  }
>> +
>> +  /* If time profile is missing, let assign the maximum that comes from
>> +  caller functions.  */
>> +  if (!node->tp_first_run && max_tp_first_run)
>> + node->tp_first_run = max_tp_first_run + 1;
>> +
>
> I believe you also need to minimize node->tp_first_run in ipa_merge_profiles.
>>if (call_count
>>&& fn && fn->cfg
>>&& (call_count * unlikely_count_fraction >= profile_info->runs))
>> diff --git a/gcc/varasm.c b/gcc/varasm.c
>> index 5c5025a..f34946c 100644
>> --- a/gcc/varasm.c
>> +++ b/gcc/varasm.c
>> @@ -552,7 +552,14 @@ default_function_section (tree decl, enum 
>> node_frequency freq,
>>   unlikely executed (this happens especially with function splitting
>>   where we can split away unnecessary parts of static constructors.  */
>>if (startu

Re: RFA: revert libstdc++ r205810: simulator workload increase caused regression

2013-12-15 Thread Hans-Peter Nilsson
> From: Hans-Peter Nilsson 
> Date: Sun, 15 Dec 2013 15:20:48 +0100

> +// { dg-options "-std=gnu++0x -DSAMPLES=3" { target { { arm*-* } && simulator } } }
> +// { dg-options "-std=gnu++0x -DSAMPLES=1" { target simulator } }

JFTR, I managed to have two bugs here:
1 - the target tuple (unless being an "effective target") must match "*-*-*".
2 - the *last* matching line is used.
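
Applying both points, a corrected pair of directives might look like the following. This is a sketch, not a committed version. The generic simulator line comes first, so the ARM line, which is last, wins on ARM simulators, and the ARM selector uses a full arm*-*-* tuple.

```cpp
// { dg-options "-std=gnu++0x -DSAMPLES=1" { target simulator } }
// { dg-options "-std=gnu++0x -DSAMPLES=3" { target { { arm*-*-* } && simulator } } }
```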

But as mentioned, I'd prefer to split chi2_quality.cc into
(five) separate tests, if a maintainer would be ok with that.

brgds, H-P


RE: [GOMP4][PATCH] SIMD-enabled functions (formerly Elemental functions) for C++

2013-12-15 Thread Iyer, Balaji V
Hello Everyone,
The following changes mentioned in this thread
(http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01280.html) are also applicable
to the C++ patch, and the attached patch has been fixed accordingly:

1. Sharing the vectorlength parsing function between #pragma simd and SIMD-enabled functions
2. Renaming the function that parses the SIMD-enabled function attributes
3. Renaming "Cilk plus elementals" to "Cilk SIMD function" for the attribute name
4. Marking all the SIMD-enabled function attributes with both "omp declare simd" and "cilk simd function"
5. Renaming an error message from "..SIMD-enabled function" to "Cilk Plus SIMD-enabled function..."

So, is this patch OK for branch/trunk?

Here are the ChangeLog entries:

Gcc/cp/ChangeLog:
2013-12-16  Balaji V. Iyer  

* decl2.c (is_late_template_attribute): Added a check for SIMD-enabled
functions attribute.  If found, return true.
* parser.c (cp_parser_direct_declarator): When Cilk Plus is enabled
see if there is an attribute after function decl.  If so, then
parse them now.
(cp_parser_late_return_type_opt): Handle parsing of Cilk Plus SIMD
enabled function late parsing.
(cp_parser_gnu_attribute_list): Parse all the tokens for the vector
attribute for a SIMD-enabled function.
(cp_parser_omp_all_clauses): Skip parsing to the end of pragma when
the function is used by SIMD-enabled function (indicated by NULL
pragma token).
(cp_parser_cilk_simd_vectorlength): Modified this function to parse
vectorlength attribute in SIMD-enabled function and #pragma SIMD's
vectorlength clause.  Added a new parameter to pass in SIMD-enabled
function's info.
(cp_parser_cilk_simd_fn_vector_attrs): New function.
(cp_parser_late_parsing_elem_fn_info): Likewise.
* parser.h (cp_parser::elem_fn_info): New field.
* decl.c (grokfndecl): Added a check if Cilk Plus is enabled and
if so, adjust the Cilk Plus SIMD-enabled function attributes.

Gcc/testsuite/ChangeLog
2013-12-16  Balaji V. Iyer  

* g++.dg/cilk-plus/cilk-plus.exp: Called the C/C++ common tests for
SIMD enabled function.
* g++.dg/cilk-plus/ef_test.C: New test.


Thanks,

Balaji V. Iyer.

> -Original Message-
> From: Iyer, Balaji V
> Sent: Thursday, December 5, 2013 11:37 AM
> To: Jakub Jelinek
> Cc: Aldy Hernandez (al...@redhat.com); gcc-patches@gcc.gnu.org
> Subject: FW: [GOMP4][PATCH] SIMD-enabled functions (formerly Elemental
> functions) for C++
> 
> PING!
> 
> -Balaji V. Iyer.
> 
> > -Original Message-
> > From: Iyer, Balaji V
> > Sent: Saturday, November 30, 2013 11:53 PM
> > To: 'Jakub Jelinek'
> > Cc: Aldy Hernandez (al...@redhat.com); 'Jeff Law'; 'gcc-
> > patc...@gcc.gnu.org'
> > Subject: RE: [GOMP4][PATCH] SIMD-enabled functions (formerly Elemental
> > functions) for C++
> >
> > Hello Everyone,
> > The changes mentioned in http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html
> > are also applicable to my C++ patch. With this email,
> > I am attaching a fixed patch.
> >
> > Here are the ChangeLog entries:
> >
> > gcc/cp/ChangeLog
> > 2013-11-30  Balaji V. Iyer  
> >
> > * decl2.c (is_late_template_attribute): Added a check for SIMD-enabled
> > functions attribute.  If found, return true.
> > * parser.c (cp_parser_direct_declarator): When Cilk Plus is enabled
> > see if there is an attribute after function decl.  If so, then
> > parse them now.
> > (cp_parser_late_return_type_opt): Handle parsing of Cilk Plus SIMD
> > enabled function late parsing.
> > (cp_parser_gnu_attribute_list): Parse all the tokens for the vector
> > attribute for a SIMD-enabled function.
> > (cp_parser_omp_all_clauses): Skip parsing to the end of pragma when
> > the function is used by SIMD-enabled function (indicated by NULL
> > pragma token).
> > (cp_parser_elem_fn_vectorlength): New function.
> > (cp_parser_elem_fn_expr_list): Likewise.
> > (cp_parser_late_parsing_elem_fn_info): Likewise.
> > * parser.h (cp_parser::elem_fn_info): New field.
> > * decl.c (grokfndecl): Added a check if Cilk Plus is enabled and
> > if so, adjust the Cilk Plus SIMD-enabled function attributes.
> >
> >
> > gcc/testsuite/ChangeLog
> > 2013-11-30  Balaji V. Iyer  
> >
> > * g++.dg/cilk-plus/cilk-plus.exp: Called the C/C++ common tests for
> > SIMD enabled function.
> > * g++.dg/cilk-plus/ef_test.C: New test.
> >
> > Is this OK for branch?
> >
> > Thanks,
> >
> > Balaji V. Iyer.
> >
> > > -Original Message-
> > > From: Iyer, Balaji V
> > > Sent: Wednesday, November 20, 2013 6:19 PM
> > > To: Jakub Jelinek
> > > Cc: Aldy Hernandez (al...@redhat.com); Jeff Law;
> > > gcc-patches@gcc.gnu.org
> > > Subject: [GOMP4][PATCH] SIMD-enabled functions (former

Re: [patch, libgfortran] PR59419 Failing OPEN with FILE='xxx' and IOSTAT creates the file 'xxx'

2013-12-15 Thread Tobias Burnus

Am 15.12.2013 23:14, schrieb Jerry DeLisle:

The patch also fixes a few other places I found after auditing all calls to
generate error in libgfortran/io.
I will conjure up a test case for this.

Thanks for both.


I have regression tested on X86-64 Linux.  OK for trunk?


OK.

Tobias


2013-12-15  Jerry DeLisle  

PR libfortran/59419
* io/file_pos.c (st_rewind): Do proper return after
generate_error.
* io/open.c (edit_modes): Move action code inside block that
checks for library ok. (new_unit): Do cleanup after error.
(st_open): Do proper return after error.
* io/transfer.c (data_transfer_init): Likewise.




Re: GOMP_target: alignment (was: [gomp4] #pragma omp target* fixes)

2013-12-15 Thread Thomas Schwinge
Hi!

On Thu, 12 Dec 2013 10:53:02 +0100, I wrote:
> On Thu, 5 Sep 2013 18:11:05 +0200, Jakub Jelinek  wrote:
> > 3) I figured out we need to tell the runtime library not just
> > address, size and kind, but also alignment (we won't need that for
> > the #pragma omp declare target global vars though), so that the
> > runtime library can properly align it.  As TYPE_ALIGN/DECL_ALIGN
> > is in bits and is 32 bit wide, when that is in bytes and we only care
> > about power of twos, I've decided to encode it in the upper 5 bits
> > of the kind (lower 3 bits are used for OMP_CLAUSE_MAP_* kind).
> 
> Unfortunately, this scheme breaks down with OpenACC: we need an
> additional bit to codify a flag for present_or_* map clauses (meaning:
> only map the data (allocate/to/from/tofrom, as for OpenMP) if not already
> present on the device).
> 
> With five bits available for the OpenMP case, we can describe alignments
> up to 2 GiB, and I've empirically found on my development system that the
> largest possible alignment is MAX_OFILE_ALIGNMENT, 256 MiB for ELF
> systems, so that's fine.  But with only four bits available, we get to
> describe alignments up to 1 << ((1 << 4) - 1) = 32 KiB, which is too
> small -- even though it'd be fine for "normal" usage of __attribute__
> ((aligned (x))).
> 
> So it seems our options are to use a bigger datatype for the kinds array,
> to split off from the kinds array a new alignments array, or to generally
> switch to using an array of a struct containing hostaddr, size,
> alignment, kind.  The latter would require additional changes in the
> child_fn.
> 
> As it's an ABI change no matter what, would you like to see this limited
> to OpenACC?  Changing it for OpenMP's GOMP_target as well would have the
> advantage of keeping the two from diverging (especially on the generating side in
> omp-low.c's lowering functions), but I'm not sure whether such an ABI
> change would easily be possible now, with the OpenMP 4 support merged
> into trunk -- though, it is not yet part of a regular GCC release?
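
The bit budget in the quoted discussion can be made concrete. The C sketch below packs a map kind into the low 3 bits of a byte and log2 of the alignment (in bytes) into the upper 5 bits, following the layout described above. It also computes the largest alignment a given field width can express. The macro and function names are hypothetical; this is not libgomp's decoding code.

```c
#include <stdint.h>

/* Layout from the quoted mail: low 3 bits hold the map kind, upper
   5 bits hold log2 of the alignment in bytes.  Names are hypothetical.  */
#define KIND_BITS 3
#define KIND_MASK ((1u << KIND_BITS) - 1)

/* Pack a map kind and log2(alignment); log2_align must be <= 31.  */
static uint8_t
encode_kind (unsigned map_kind, unsigned log2_align)
{
  return (uint8_t) ((log2_align << KIND_BITS) | (map_kind & KIND_MASK));
}

/* Recover the map kind from the low bits.  */
static unsigned
decode_map_kind (uint8_t k)
{
  return k & KIND_MASK;
}

/* Recover the alignment in bytes from the upper bits.  */
static unsigned long long
decode_alignment (uint8_t k)
{
  return 1ull << (k >> KIND_BITS);
}

/* Largest alignment representable when ALIGN_BITS bits store log2.  */
static unsigned long long
max_alignment (unsigned align_bits)
{
  return 1ull << ((1u << align_bits) - 1);
}
```

`max_alignment (5)` is 2^31 bytes (2 GiB) and `max_alignment (4)` is 2^15 bytes (32 KiB). That is well below the 2^28-byte (256 MiB) MAX_OFILE_ALIGNMENT mentioned for ELF, which is why giving up one of the five bits for the present_or_* flag does not work.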

Here is the patch I propose for gomp-4_0-branch; OK?

commit ea56cdbd257b08421fefc8e30fd4a28d37d6e481
Author: Thomas Schwinge 
Date:   Sun Dec 15 11:03:47 2013 +0100

OpenACC memory mapping interface: Move alignments into their own array.

gcc/
* builtin-types.def
(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_PTR): New type.
gcc/fortran/
* types.def (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_PTR): New
type.
gcc/
* oacc-builtins.def (BUILT_IN_GOACC_PARALLEL): Use it.
* omp-low.c (expand_oacc_parallel, lower_oacc_parallel): Move
alignments into their own array.
libgomp/
* libgomp_g.h (GOACC_parallel): Add alignments array.
* oacc-parallel.c (GOACC_parallel): Likewise.
* testsuite/libgomp.oacc-c/goacc_parallel.c (main): Likewise.
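
With alignments in a separate array, a GOACC_parallel-style runtime can read one alignment entry per mapping alongside the sizes, instead of unpacking it from the kind bits. The C sketch below shows one way such a runtime might lay out several mapped objects in a single device block. `layout_mappings` is a hypothetical helper, and using `size_t` for the alignment entries is an assumption: the patch's actual element type and libgomp's mapping code are not shown here.

```c
#include <stddef.h>

/* Hypothetical layout helper: given parallel arrays of sizes and
   alignments (one entry per mapped object), compute each object's
   offset within a single device block and return the block's total
   size.  Every alignment must be a power of two.  */
static size_t
layout_mappings (size_t mapnum, const size_t *sizes,
                 const size_t *alignments, size_t *offsets)
{
  size_t total = 0;
  for (size_t i = 0; i < mapnum; i++)
    {
      size_t align = alignments[i];
      /* Round up to the next multiple of the power-of-two alignment.  */
      total = (total + align - 1) & ~(align - 1);
      offsets[i] = total;
      total += sizes[i];
    }
  return total;
}
```

For sizes {3, 8, 1} with alignments {1, 8, 4}, the offsets are 0, 8 and 16, and the block is 17 bytes.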

diff --git gcc/builtin-types.def gcc/builtin-types.def
index e7bfaf9..59660cd 100644
--- gcc/builtin-types.def
+++ gcc/builtin-types.def
@@ -529,6 +529,9 @@ DEF_FUNCTION_TYPE_8 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR,
 BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
 BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 BT_BOOL, BT_UINT, BT_PTR)
+DEF_FUNCTION_TYPE_8 (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_PTR,
+BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_PTR, BT_SIZE,
+BT_PTR, BT_PTR, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_VAR_0 (BT_FN_VOID_VAR, BT_VOID)
 DEF_FUNCTION_TYPE_VAR_0 (BT_FN_INT_VAR, BT_INT)
diff --git gcc/fortran/types.def gcc/fortran/types.def
index 9bbee35..9ec752a 100644
--- gcc/fortran/types.def
+++ gcc/fortran/types.def
@@ -213,5 +213,8 @@ DEF_FUNCTION_TYPE_8 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR,
 BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
 BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 BT_BOOL, BT_UINT, BT_PTR)
+DEF_FUNCTION_TYPE_8 (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_PTR,
+BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_PTR, BT_SIZE,
+BT_PTR, BT_PTR, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_VAR_0 (BT_FN_VOID_VAR, BT_VOID)
diff --git gcc/oacc-builtins.def gcc/oacc-builtins.def
index a75e42d..5057e13 100644
--- gcc/oacc-builtins.def
+++ gcc/oacc-builtins.def
@@ -28,4 +28,5 @@ along with GCC; see the file COPYING3.  If not see
See builtins.def for details.  */
 
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel",
-  BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+  BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_PTR,
+  ATTR_NOTHROW_LIST)
diff --git gcc/omp-low.c gcc/omp-low.c
index e0f7d1d..ce99835 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -4886,7 +4886,7 @@ expand_oacc_parallel (struct omp_region *region)
 }
 
   /* Emit a library call to launch CHILD_FN.  */
-  tree t1, t2, t3, t4, de