GCC 5 Branch is frozen

2016-05-26 Thread Richard Biener
Hi,

The GCC 5 branch is frozen now in preparation for a GCC 5.4 release candidate.  
All changes require release manager approval until after the GCC 5.4 release.

Thanks,
Richard.


Re: Dominance related breakage, was Re: [PATCH] PR71275 ira.c bb_loop_depth

2016-05-26 Thread Vladimir Makarov

On 05/26/2016 10:14 PM, Alan Modra wrote:

On Thu, May 26, 2016 at 10:12:14AM -0400, Vladimir Makarov wrote:

On 05/26/2016 07:02 AM, Alan Modra wrote:

This fixes lack of bb_loop_depth info in some of the early parts of
ira, which has been the case for quite some time.  All active branches
return 0 from bb_loop_depth() in update_equiv_regs, but whether that
actually causes mis-optimization anywhere but trunk is yet to be
determined.

I played a little with trying to consolidate this loop_optimizer_init
call with one that occurs a little later, but ran into ICEs.  (We now
have four calls to loop_optimizer_init in ira.c.)

Bootstrapped and regression tested powerpc64le-linux and x86_64-linux.
OK to apply?


Yes.  Thank you, Alan.

Hi Vlad,
Sorry to do this to you and others, but the patch (committed as
r236789) may be wrong.  I didn't see any problems on trunk but when
I backported to gcc-5, I hit an error in stage2 compiling
insn-recog.c "dominator of 10 status unknown" from if_after_reload.

On gcc-5, the error disappears by adding a call to
   free_dominance_info (CDI_DOMINATORS);
after the newly added call to loop_optimizer_finalize.

I'm not sure yet what is going on.  Does anyone know whether the
free_dominance_info call is needed on trunk?

That is ok.  It is always a discovery.  I am not sure but I think I saw 
this problem when I wrote IRA.


Looking at the dominance code, it seems to me that it can reuse the 
previous info if it was not cleared.  So I guess free_dominance_info is 
important.




Re: [PATCH], PR 71294, Fix -O3 -fstack-protector bug on PowerPC power8

2016-05-26 Thread Segher Boessenkool
On Thu, May 26, 2016 at 09:50:18PM -0400, Michael Meissner wrote:
> It might be argued that this is a reload bug (since it runs on LRA), but
> sometimes it is simpler to place a simpler work around in the machine dependent
> code.  If the maintainers decide that it should be fixed in reload instead of
> via this patch, that is fine.

It is either a bug in reload or in the rs6000 reload hooks (not because it
works with LRA, but because it crashes in reload ;-) )

I don't think using many lines of extra splitters to work around a missing
reload or two is such a great idea.  Someone needs to look deeper into
the problem, find what the actual problem is.

The splitters also hurt code quality in various cases, e.g. when splatting
the stack pointer (r1) or the hard frame pointer (r31); those work fine
without extra copy first.

> +(define_predicate "virtual_or_frame_reg_operand"
> +  (match_code "reg,subreg")
> +{
> +  HOST_WIDE_INT r;
> +  if (SUBREG_P (op))
> +op = SUBREG_REG (op);
> +
> +  if (!REG_P (op))
> +return 0;
> +
> +  r = REGNO (op);
> +  return REGNO_PTR_FRAME_P (r);
> +})

A regno is not a HOST_WIDE_INT but an unsigned int instead.  You can of
course get rid of "r" completely here.


Segher


Dominance related breakage, was Re: [PATCH] PR71275 ira.c bb_loop_depth

2016-05-26 Thread Alan Modra
On Thu, May 26, 2016 at 10:12:14AM -0400, Vladimir Makarov wrote:
> On 05/26/2016 07:02 AM, Alan Modra wrote:
> >This fixes lack of bb_loop_depth info in some of the early parts of
> >ira, which has been the case for quite some time.  All active branches
> >return 0 from bb_loop_depth() in update_equiv_regs, but whether that
> >actually causes mis-optimization anywhere but trunk is yet to be
> >determined.
> >
> >I played a little with trying to consolidate this loop_optimizer_init
> >call with one that occurs a little later, but ran into ICEs.  (We now
> >have four calls to loop_optimizer_init in ira.c.)
> >
> >Bootstrapped and regression tested powerpc64le-linux and x86_64-linux.
> >OK to apply?
> >
> Yes.  Thank you, Alan.

Hi Vlad,
Sorry to do this to you and others, but the patch (committed as
r236789) may be wrong.  I didn't see any problems on trunk but when
I backported to gcc-5, I hit an error in stage2 compiling
insn-recog.c "dominator of 10 status unknown" from if_after_reload.

On gcc-5, the error disappears by adding a call to
  free_dominance_info (CDI_DOMINATORS);
after the newly added call to loop_optimizer_finalize.

I'm not sure yet what is going on.  Does anyone know whether the
free_dominance_info call is needed on trunk?

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH], PR 71294, Fix -O3 -fstack-protector bug on PowerPC power8

2016-05-26 Thread Michael Meissner
It might be argued that this is a reload bug (since it runs on LRA), but
sometimes it is simpler to place a simpler work around in the machine dependent
code.  If the maintainers decide that it should be fixed in reload instead of
via this patch, that is fine.

PR 71294 involves vectorization where the compiler is forming a V2DI vector
from 2 DI elements.  The elements are the same value (a stack address), so the
register allocator copies the address over to the VSX register file, and does
an XXPERMDI.  Because of the -fstack-protector, frame addresses are modified,
and become an ADD operation, and the direct move fails.

I added a splitter for DImode so that if a virtual register or frame address
register was attempted to be splatted to the VSX register file, it would copy
the value to a pseudo register, and do a direct move on that.

I have done a bootstrap and regression test with these patches and there were
no regressions.  Are the patches ok to install in the trunk?

[gcc]
2016-05-26  Michael Meissner  

PR target/71294
* config/rs6000/predicates.md (virtual_or_frame_reg_operand): New
predicate to return true if the operand is a virtual or frame
register.
* config/rs6000/vsx.md (move splat splitters): Add splitters to
copy a frame related pointer into a new pseudo register during the
first split pass, so that we don't confuse the register allocator.

[gcc/testsuite]
2016-05-26  Michael Meissner  

PR target/71294
* g++.dg/pr71294.C: New test.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 236800)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1959,3 +1959,20 @@ (define_predicate "fusion_offsettable_me
 
   return offsettable_nonstrict_memref_p (op);
 })
+
+;; Return true if the operand is a virtual or frame register.  The register
+;; allocator gets confused if a virtual/frame register is used in a splat
+;; operation when -fstack-protector is used.
+(define_predicate "virtual_or_frame_reg_operand"
+  (match_code "reg,subreg")
+{
+  HOST_WIDE_INT r;
+  if (SUBREG_P (op))
+op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+return 0;
+
+  r = REGNO (op);
+  return REGNO_PTR_FRAME_P (r);
+})
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md (revision 236800)
+++ gcc/config/rs6000/vsx.md (working copy)
@@ -2397,6 +2397,20 @@ (define_insn "vsx_splat_"
lxvdsx %x0,%y1"
   [(set_attr "type" "vecperm,vecperm,vecload,vecperm,vecperm,vecload")])
 
+;; Virtual/frame registers cause problems because they are replaced by a PLUS
+;; operation which confuses RELOAD if -fstack-protector is used.  Add a
+;; splitter to copy such registers to a temporary
+(define_split
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "")
+   (vec_duplicate:V2DI
+(match_operand:DI 1 "virtual_or_frame_reg_operand" "")))]
+  "TARGET_VSX && TARGET_POWERPC64 && can_create_pseudo_p ()"
+  [(match_dup 2) (match_dup 1)
+   (match_dup 0) (vec_duplicate:VSX_D (match_dup 2))]
+{
+  operands[2] = gen_reg_rtx (DImode);
+})
+
 ;; V4SI splat (ISA 3.0)
 ;; When SI's are allowed in VSX registers, add XXSPLTW support
 (define_expand "vsx_splat_"
@@ -2411,6 +2425,17 @@ (define_expand "vsx_splat_"
 operands[1] = force_reg (mode, operands[1]);
 })
 
+(define_split
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "")
+   (vec_duplicate:V4SI
+(match_operand:SI 1 "virtual_or_frame_reg_operand" "")))]
+  "TARGET_P9_VECTOR && !TARGET_POWERPC64 && can_create_pseudo_p ()"
+  [(match_dup 2) (match_dup 1)
+   (match_dup 0) (vec_duplicate:VSX_D (match_dup 2))]
+{
+  operands[2] = gen_reg_rtx (SImode);
+})
+
 (define_insn "*vsx_splat_v4si_internal"
   [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
(vec_duplicate:V4SI
Index: gcc/testsuite/g++.dg/pr71294.C
===
--- gcc/testsuite/g++.dg/pr71294.C  (revision 0)
+++ gcc/testsuite/g++.dg/pr71294.C  (revision 0)
@@ -0,0 +1,56 @@
+/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O3 -fstack-protector" } */
+
+class A;
+template <typename _Tp, int m, int n> class B {
+public:
+  _Tp val[m * n];
+};
+class C {
+public:
+  C(A);
+};
+struct D {
+  D();
+  unsigned long &operator[](int);
+  unsigned long *p;
+};
+class A {
+public:
+  template <typename _Tp, int m, int n> A(const B<_Tp, m, n> &, bool);
+  int rows, cols;
+  unsigned char *data;
+  unsigned char *datastart;
+  unsigned char *dataend;
+  unsigned char *datalimit;
+  D step;

Re: [Patch] Disable text mode translation in ada for Cygwin

2016-05-26 Thread JonY
On 5/26/2016 21:55, Arnaud Charlet wrote:
>> Text mode translation should not be done for Cygwin, especially since it
>> does not support unicode setmode calls. This also fixes ada builds for Cygwin.
>>
>> OK for trunk?
> 
> OK, thanks.
> 

Can someone please commit this? I don't have SVN write access.

Thanks.






Re: [PATCH], Add PowerPC ISA 3.0 min/max support

2016-05-26 Thread Michael Meissner
On Thu, May 26, 2016 at 03:59:43PM -0500, Segher Boessenkool wrote:
> On Thu, May 26, 2016 at 01:04:59PM -0400, Michael Meissner wrote:
> > * config/rs6000/rs6000.md (SFDF2): New iterator to allow doing
> > conditional moves there the comparison type is different from move
> > type.
> 
> s/there/where/ ?
> 
> > 
> 
> Don't forget to delete this line ;-)

Whoops :-)

> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-options "-mcpu=power9 -O2 -mpower9-minmax -ffast-math" } */
> 
> Does it still need this require if you have these options?

Yes it needs -ffast-math.  Right now, you need -ffast-math to convert:

a = (b >= c) ? b : c;

into a max, but you don't for:

a = (b > c) ? b : c;

p9-minmax-1.c tests whether both cases are handled with -ffast-math, while
p9-minmax-2.c tests whether just the second case is handled without
-ffast-math.
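
For readers following along, the two source forms above can be written out as a small C sketch. This is an illustration only, not part of the patch or of p9-minmax-1.c / p9-minmax-2.c; the function names max_ge, max_gt and is_negative_zero are made up here. The two selects return different operands only when b and c compare equal, which is observable for +0.0 versus -0.0. Whether that is the exact reason the >= form currently needs -ffast-math is an assumption on my part, not something stated in this thread.

```c
#include <assert.h>

/* Select written with >=: when b and c compare equal, b is returned.  */
double
max_ge (double b, double c)
{
  return (b >= c) ? b : c;
}

/* Select written with >: when b and c compare equal, c is returned.  */
double
max_gt (double b, double c)
{
  return (b > c) ? b : c;
}

/* Return nonzero when X is a zero with the sign bit set.  Dividing by a
   negative zero gives negative infinity, so no <math.h> is needed.  */
int
is_negative_zero (double x)
{
  return x == 0.0 && 1.0 / x < 0.0;
}

/* For ordinary ordered inputs both forms agree.  */
void
check_ordered_inputs (void)
{
  assert (max_ge (1.0, 2.0) == 2.0);
  assert (max_gt (1.0, 2.0) == 2.0);
}
```

Compiled without -ffast-math, max_ge (0.0, -0.0) yields +0.0 while max_gt (0.0, -0.0) yields -0.0, so the two forms are not interchangeable under strict IEEE rules.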

> The patch is okay for trunk; okay for 6 later.  Thanks,
> 
> 
> Segher
> 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

2016-05-26 Thread Joseph Myers
On Thu, 26 May 2016, Jan Hubicka wrote:

> > > +ffp-int-builtin-inexact
> > > +Common Report Var(flag_fp_int_builtin_inexact) Optimization
> > > +Allow built-in functions ceil, floor, round, trunc to raise \"inexact\" exceptions.
> 
> When adding new codegen option which affects the correctness, it is also
> necessary to update can_inline_edge_p and inline_call.

This patch version adds handling for the new option in those places.
Other changes: the default for the option is corrected so that
-ffp-int-builtin-inexact really is in effect by default as intended;
md.texi documentation for the patterns in question is updated to
describe how they are affected by this option.


Add option for whether ceil etc. can raise "inexact", adjust x86 conditions.

In ISO C99/C11, the ceil, floor, round and trunc functions may or may
not raise the "inexact" exception for noninteger arguments.  Under TS
18661-1:2014, the C bindings for IEEE 754-2008, these functions are
prohibited from raising "inexact", in line with the general rule that
"inexact" is raised only when the mathematical infinite precision result of a
function differs from the result after rounding to the target type.

GCC has no option to select TS 18661 requirements for not raising
"inexact" when expanding built-in versions of these functions inline.
Furthermore, even given such requirements, the conditions on the x86
insn patterns for these functions are unnecessarily restrictive.  I'd
like to make the out-of-line glibc versions follow the TS 18661
requirements; in the cases where this slows them down (the cases using
x87 floating point), that makes it more important for inline versions
to be used when the user does not care about "inexact".

This patch fixes these issues.  A new option
-fno-fp-int-builtin-inexact is added to request TS 18661 rules for
these functions; the default -ffp-int-builtin-inexact reflects that
such exceptions are allowed by C99 and C11.  (The intention is that if
C2x incorporates TS 18661-1, then the default would change in C2x
mode.)

The x86 built-ins for rint (x87, SSE2 and SSE4.1) are made
unconditionally available (no longer depending on
-funsafe-math-optimizations or -fno-trapping-math); "inexact" is
correct for noninteger arguments to rint.  For floor, ceil and trunc,
the x87 and SSE2 built-ins are OK if -ffp-int-builtin-inexact or
-fno-trapping-math (they may raise "inexact" for noninteger
arguments); the SSE4.1 built-ins are made to use ROUND_NO_EXC so that
they do not raise "inexact" and so are OK unconditionally.

Now, while there was no semantic reason for depending on
-funsafe-math-optimizations, the insn patterns had such a dependence
because of use of gen_truncxf2_i387_noop to truncate back to
SFmode or DFmode after using frndint in XFmode.  In this case a no-op
truncation is safe because rounding to integer always produces an
exactly representable value (the same reason why IEEE semantics say it
shouldn't produce "inexact") - but of course that insn pattern isn't
safe because it would also match cases where the truncation is not in
fact a no-op.  To allow frndint to be used for SFmode and DFmode
without that unsafe pattern, the relevant frndint patterns are
extended to SFmode and DFmode or new SFmode and DFmode patterns added,
so that the frndint operation can be represented in RTL as an
operation acting directly on SFmode or DFmode without the extension
and the problematic truncation.

A generic test of the new option is added, as well as x86-specific
tests, both execution tests including the generic test with different
x86 options and scan-assembler tests verifying that functions that
should be inlined with different options are indeed inlined.

I think other architectures are OK for TS 18661-1 semantics already.
Considering those defining "ceil" patterns: aarch64, arm, rs6000, s390
use instructions that do not raise "inexact"; nvptx does not support
floating-point exceptions.  (This does mean the -f option in fact only
affects one architecture, but I think it should still be a -f option;
it's logically architecture-independent and is expected to be affected
by future -std options, so is similar to e.g. -fexcess-precision=,
which also does nothing on most architectures but is implied by -std
options.)

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  OK to
commit?

gcc:
2016-05-26  Joseph Myers  

PR target/71276
PR target/71277
* common.opt (ffp-int-builtin-inexact): New option.
* doc/invoke.texi (-fno-fp-int-builtin-inexact): Document.
* doc/md.texi (floor@var{m}2, btrunc@var{m}2, round@var{m}2)
(ceil@var{m}2): Document dependence on this option.
* ipa-inline-transform.c (inline_call): Handle
flag_fp_int_builtin_inexact.
* ipa-inline.c (can_inline_edge_p): Likewise.
* config/i386/i386.md (rintxf2): Do not test
flag_unsafe_math_optimizations.
(rint2_frndint): New define_insn.
(rint2): Do not test flag_

Re: [PATCH], Add PowerPC ISA 3.0 min/max support

2016-05-26 Thread Segher Boessenkool
On Thu, May 26, 2016 at 01:04:59PM -0400, Michael Meissner wrote:
>   * config/rs6000/rs6000.md (SFDF2): New iterator to allow doing
>   conditional moves there the comparison type is different from move
>   type.

s/there/where/ ?

>   

Don't forget to delete this line ;-)

> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-mcpu=power9 -O2 -mpower9-minmax -ffast-math" } */

Does it still need this require if you have these options?

The patch is okay for trunk; okay for 6 later.  Thanks,


Segher


Re: [PATCH] Improve *vec_concatv2si_sse4_1

2016-05-26 Thread Jakub Jelinek
On Thu, May 26, 2016 at 07:39:01PM +0200, Uros Bizjak wrote:
> On Thu, May 26, 2016 at 7:05 PM, Jakub Jelinek  wrote:
> > Hi!
> >
> > This patch adds an avx512dq alternative (EVEX vpinsrd requires that) and
> > enables EVEX vmovd and vpunpckldq.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2016-05-26  Jakub Jelinek  
> >
> > * config/i386/sse.md (*vec_concatv2si_sse4_1): Add avx512dq v=Yv,rm
> > alternative.  Change x=x,x alternative to v=Yv,Yv and x=rm,C
> > alternative to v=rm,C.
> >
> > * gcc.target/i386/avx512dq-concatv2si-1.c: New test.
> > * gcc.target/i386/avx512vl-concatv2si-1.c: New test.
> 
> Ouch, I have just changed these mega strings in attribute definitions
> to something more readable. Can you please redo the attribute part? It
> should be much more pleasant experience than counting all the
> commas...).

Here is an updated version of this patch (the other two pending sse.md patches
from me still apply cleanly):

2016-05-26  Jakub Jelinek  

* config/i386/sse.md (*vec_concatv2si_sse4_1): Add avx512dq v=Yv,rm
alternative.  Change x=x,x alternative to v=Yv,Yv and x=rm,C
alternative to v=rm,C.

* gcc.target/i386/avx512dq-concatv2si-1.c: New test.
* gcc.target/i386/avx512vl-concatv2si-1.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-26 10:44:25.0 +0200
+++ gcc/config/i386/sse.md  2016-05-26 14:22:26.819313220 +0200
@@ -13488,43 +13488,44 @@
 
 (define_insn "*vec_concatv2si_sse4_1"
   [(set (match_operand:V2SI 0 "register_operand"
- "=Yr,*x,x, Yr,*x,x, x, *y,*y")
+ "=Yr,*x, x, v,Yr,*x, v, v, *y,*y")
(vec_concat:V2SI
  (match_operand:SI 1 "nonimmediate_operand"
- "  0, 0,x,  0,0, x,rm,  0,rm")
+ "  0, 0, x,Yv, 0, 0,Yv,rm,  0,rm")
  (match_operand:SI 2 "vector_move_operand"
- " rm,rm,rm,Yr,*x,x, C,*ym, C")))]
+ " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))]
   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
pinsrd\t{$1, %2, %0|%0, %2, 1}
pinsrd\t{$1, %2, %0|%0, %2, 1}
vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
+   vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
punpckldq\t{%2, %0|%0, %2}
punpckldq\t{%2, %0|%0, %2}
vpunpckldq\t{%2, %1, %0|%0, %1, %2}
%vmovd\t{%1, %0|%0, %1}
punpckldq\t{%2, %0|%0, %2}
movd\t{%1, %0|%0, %1}"
-  [(set_attr "isa" "noavx,noavx,avx,noavx,noavx,avx,*,*,*")
+  [(set_attr "isa" "noavx,noavx,avx,avx512dq,noavx,noavx,avx,*,*,*")
(set (attr "type")
- (cond [(eq_attr "alternative" "6")
+ (cond [(eq_attr "alternative" "7")
  (const_string "ssemov")
-   (eq_attr "alternative" "7")
- (const_string "mmxcvt")
(eq_attr "alternative" "8")
+ (const_string "mmxcvt")
+   (eq_attr "alternative" "9")
  (const_string "mmxmov")
   ]
   (const_string "sselog")))
(set (attr "prefix_extra")
- (if_then_else (eq_attr "alternative" "0,1,2")
+ (if_then_else (eq_attr "alternative" "0,1,2,3")
   (const_string "1")
   (const_string "*")))
(set (attr "length_immediate")
- (if_then_else (eq_attr "alternative" "0,1,2")
+ (if_then_else (eq_attr "alternative" "0,1,2,3")
   (const_string "1")
   (const_string "*")))
-   (set_attr "prefix" "orig,orig,vex,orig,orig,vex,maybe_vex,orig,orig")
-   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,DI,DI")])
+   (set_attr "prefix" "orig,orig,vex,evex,orig,orig,maybe_evex,maybe_vex,orig,orig")
+   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,TI,DI,DI")])
 
 ;; ??? In theory we can match memory for the MMX alternative, but allowing
 ;; nonimmediate_operand for operand 2 and *not* allowing memory for the SSE
--- gcc/testsuite/gcc.target/i386/avx512dq-concatv2si-1.c.jj 2016-05-26 15:14:55.853786550 +0200
+++ gcc/testsuite/gcc.target/i386/avx512dq-concatv2si-1.c   2016-05-26 15:13:57.0 +0200
@@ -0,0 +1,43 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mavx512dq -masm=att" } */
+
+typedef int V __attribute__((vector_size (8)));
+
+void
+f1 (int x, int y)
+{
+  register int a __asm ("xmm16");
+  register int b __asm ("xmm17");
+  register V c __asm ("xmm3");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  c = (V) { a, b };
+  asm volatile ("" : "+v" (c));
+}
+
+/* { dg-final { scan-assembler "vpunpckldq\[^\n\r]*%xmm17\[^\n\r]*%xmm16\[^\n\r]*%xmm3" } } */
+
+void
+f2 (int x, int y)
+{
+  register int a __asm ("xmm16");
+  register V c __asm ("xmm3");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  c = (V) { a, y };
+  asm volatile ("" : "+v" (c));
+}
+
+void
+f3 (int x, int *y)
+{
+  register int a __asm ("xmm16");
+  register V c __asm ("xmm3");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  c = (V) { a, *y };
+  asm volatile ("" : "+v" (c));
+}
+
+/* { dg-final { scan-assembler-times 
"v

[committed] Warn about OpenMP schedule clause chunk size when proven not positive

2016-05-26 Thread Jakub Jelinek
Hi!

While porting the doacross-1.c testcase to Fortran, I discovered
that I had used schedule(static, 0) there, which is invalid (I meant
schedule(static) instead).  This patch adds a warning for this in all FEs.
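
For illustration, here is a minimal C sketch of the valid spellings (not taken from the testsuite; the function names sum_static, sum_static_chunk1 and check_sums are hypothetical). The chunk size, when given, has to be positive, so schedule(static) and schedule(static, 1) are fine while schedule(static, 0) is what the new warning is aimed at.

```c
#include <assert.h>

/* Sum the integers 0 .. n-1 in a work-sharing loop.  No chunk size is
   given, so the implementation chooses the static distribution itself.  */
long
sum_static (int n)
{
  long sum = 0;
  int i;

#pragma omp parallel for schedule(static) reduction(+:sum)
  for (i = 0; i < n; i++)
    sum += i;
  return sum;
}

/* Same loop with an explicit, positive chunk size of 1.  */
long
sum_static_chunk1 (int n)
{
  long sum = 0;
  int i;

#pragma omp parallel for schedule(static, 1) reduction(+:sum)
  for (i = 0; i < n; i++)
    sum += i;
  return sum;
}

/* Both schedules must produce the same result.  */
void
check_sums (void)
{
  assert (sum_static (10) == 45);
  assert (sum_static_chunk1 (10) == 45);
}
```

The code gives the same sums with or without -fopenmp, since the pragmas are simply ignored when OpenMP is not enabled.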

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2016-05-26  Jakub Jelinek  

* c-parser.c (c_parser_omp_clause_schedule): Warn if
OMP_CLAUSE_SCHEDULE_CHUNK_EXPR is known not to be positive.

* semantics.c (finish_omp_clauses) <case OMP_CLAUSE_SCHEDULE>: Warn
if OMP_CLAUSE_SCHEDULE_CHUNK_EXPR is known not to be positive.

* openmp.c (resolve_omp_clauses): Warn if chunk_size is known not to
be positive.

* c-c++-common/gomp/schedule-1.c: New test.
* gfortran.dg/gomp/schedule-1.f90: New test.

* testsuite/libgomp.c/doacross-1.c (main): Use schedule(static)
instead of invalid schedule(static, 0).
* testsuite/libgomp.c/doacross-2.c (main): Likewise.

--- gcc/c/c-parser.c.jj 2016-05-26 10:37:53.0 +0200
+++ gcc/c/c-parser.c 2016-05-26 13:36:21.443785799 +0200
@@ -12128,7 +12128,20 @@ c_parser_omp_clause_schedule (c_parser *
  "schedule %<auto%> does not take "
  "a %<chunk_size%> parameter");
   else if (TREE_CODE (TREE_TYPE (t)) == INTEGER_TYPE)
-   OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (c) = t;
+   {
+ /* Attempt to statically determine when the number isn't
+positive.  */
+ tree s = fold_build2_loc (loc, LE_EXPR, boolean_type_node, t,
+   build_int_cst (TREE_TYPE (t), 0));
+ protected_set_expr_location (s, loc);
+ if (s == boolean_true_node)
+   {
+ warning_at (loc, 0,
+ "chunk size value must be positive");
+ t = integer_one_node;
+   }
+ OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (c) = t;
+   }
   else
c_parser_error (parser, "expected integer expression");
 
--- gcc/cp/semantics.c.jj   2016-05-26 10:38:01.0 +0200
+++ gcc/cp/semantics.c  2016-05-26 19:09:16.989908218 +0200
@@ -6326,6 +6326,17 @@ finish_omp_clauses (tree clauses, enum c
  break;
}
}
+ else
+   {
+ t = maybe_constant_value (t);
+ if (TREE_CODE (t) == INTEGER_CST
+ && tree_int_cst_sgn (t) != 1)
+   {
+ warning_at (OMP_CLAUSE_LOCATION (c), 0,
+ "chunk size value must be positive");
+ t = integer_one_node;
+   }
+   }
  t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
}
  OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (c) = t;
--- gcc/fortran/openmp.c.jj 2016-05-05 15:30:29.0 +0200
+++ gcc/fortran/openmp.c 2016-05-26 13:27:49.865524212 +0200
@@ -3259,6 +3259,11 @@ resolve_omp_clauses (gfc_code *code, gfc
  || expr->ts.type != BT_INTEGER || expr->rank != 0)
gfc_error ("SCHEDULE clause's chunk_size at %L requires "
   "a scalar INTEGER expression", &expr->where);
+  else if (expr->expr_type == EXPR_CONSTANT
+  && expr->ts.type == BT_INTEGER
+  && mpz_sgn (expr->value.integer) <= 0)
+   gfc_warning (0, "INTEGER expression of SCHEDULE clause's chunk_size "
+"at %L must be positive", &expr->where);
 }
 
   /* Check that no symbol appears on multiple clauses, except that
--- gcc/testsuite/c-c++-common/gomp/schedule-1.c.jj 2016-05-26 13:32:04.566169067 +0200
+++ gcc/testsuite/c-c++-common/gomp/schedule-1.c 2016-05-26 13:37:30.638874653 +0200
@@ -0,0 +1,14 @@
+void
+foo (void)
+{
+  int i;
+  #pragma omp for schedule(static, 1)
+  for (i = 0; i < 10; i++)
+;
+  #pragma omp for schedule(static, 0)  /* { dg-warning "chunk size value must be positive" } */
+  for (i = 0; i < 10; i++)
+;
+  #pragma omp for schedule(static, -7) /* { dg-warning "chunk size value must be positive" } */
+  for (i = 0; i < 10; i++)
+;
+}
--- gcc/testsuite/gfortran.dg/gomp/schedule-1.f90.jj 2016-05-26 13:32:57.556471031 +0200
+++ gcc/testsuite/gfortran.dg/gomp/schedule-1.f90   2016-05-26 13:39:07.357601082 +0200
@@ -0,0 +1,11 @@
+  integer :: i
+  !$omp do schedule(static, 1)
+  do i = 1, 10
+  end do
+  !$omp do schedule(static, 0) ! { dg-warning "must be positive" }
+  do i = 1, 10
+  end do
+  !$omp do schedule(static, -7)! { dg-warning "must be positive" }
+  do i = 1, 10
+  end do
+end
--- libgomp/testsuite/libgomp.c/doacross-1.c.jj 2015-10-13 20:57:41.0 +0200
+++ libgomp/testsuite/libgomp.c/doacross-1.c 2016-05-26 13:40:09.698780187 +0200
@@ -36,7 +36,7 @@ main ()
#pragma omp atomic write
a[i] = 3;
   }
-#pragma omp for schedule(static, 0) ordered (3) nowai

Further refinement to -Wswitch-unreachable

2016-05-26 Thread Marek Polacek
Martin complained that -Wswitch-unreachable wouldn't warn on try-blocks,
either compiler-generated or user-written.  This patch, which looks into
GIMPLE_TRY's body, seems to DTRT for both.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-05-26  Marek Polacek  

* gimplify.c (gimplify_switch_expr): Also handle GIMPLE_TRY.

* c-c++-common/Wswitch-unreachable-3.c: New test.
* g++.dg/warn/Wswitch-unreachable-1.C: New test.

diff --git gcc/gimplify.c gcc/gimplify.c
index 8316bb8..8b7dddc 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -1609,10 +1609,17 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
  while (gimple_code (seq) == GIMPLE_BIND)
seq = gimple_bind_body (as_a <gbind *> (seq));
  gimple *stmt = gimple_seq_first_stmt (seq);
- enum gimple_code code = gimple_code (stmt);
- if (code != GIMPLE_LABEL && code != GIMPLE_TRY)
+ if (gimple_code (stmt) == GIMPLE_TRY)
{
- if (code == GIMPLE_GOTO
+ /* A compiler-generated cleanup or a user-written try block.
+Try to get the first statement in its try-block, for better
+location.  */
+ if ((seq = gimple_try_eval (stmt)))
+   stmt = gimple_seq_first_stmt (seq);
+   }
+ if (gimple_code (stmt) != GIMPLE_LABEL)
+   {
+ if (gimple_code (stmt) == GIMPLE_GOTO
  && TREE_CODE (gimple_goto_dest (stmt)) == LABEL_DECL
  && DECL_ARTIFICIAL (gimple_goto_dest (stmt)))
/* Don't warn for compiler-generated gotos.  These occur
diff --git gcc/testsuite/c-c++-common/Wswitch-unreachable-3.c gcc/testsuite/c-c++-common/Wswitch-unreachable-3.c
index e69de29..3748701 100644
--- gcc/testsuite/c-c++-common/Wswitch-unreachable-3.c
+++ gcc/testsuite/c-c++-common/Wswitch-unreachable-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+
+extern void f (int *, int);
+
+void
+g (int i)
+{
+  switch (i)
+{
+  int a[3];
+  __builtin_memset (a, 0, sizeof a); /* { dg-warning "statement will never be executed" } */
+
+default:
+  f (a, 3);
+}
+}
diff --git gcc/testsuite/g++.dg/warn/Wswitch-unreachable-1.C gcc/testsuite/g++.dg/warn/Wswitch-unreachable-1.C
index e69de29..99d9a83 100644
--- gcc/testsuite/g++.dg/warn/Wswitch-unreachable-1.C
+++ gcc/testsuite/g++.dg/warn/Wswitch-unreachable-1.C
@@ -0,0 +1,34 @@
+// { dg-do compile }
+
+extern int j;
+
+void
+f (int i)
+{
+  switch (i) // { dg-warning "statement will never be executed" }
+{
+  try
+  {
+  }
+  catch (...)
+  {
+  }
+case 1:;
+}
+}
+
+void
+g (int i)
+{
+  switch (i)
+{
+  try
+  {
+   j = 42;  // { dg-warning "statement will never be executed" }
+  }
+  catch (...)
+  {
+  }
+case 1:;
+}
+}

Marek


Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

2016-05-26 Thread Jan Hubicka
> > +ffp-int-builtin-inexact
> > +Common Report Var(flag_fp_int_builtin_inexact) Optimization
> > +Allow built-in functions ceil, floor, round, trunc to raise \"inexact\" exceptions.

When adding new codegen option which affects the correctness, it is also
necessary to update can_inline_edge_p and inline_call.

(In general it would be great if we had fewer such flags and more stuff
explicitly represented in IL. I am not sure how hard that would be here and
if it is worth the effort.)

Honza
> > +
> >  ; Nonzero means don't put addresses of constant functions in registers.
> >  ; Used for compiling the Unix kernel, where strange substitutions are
> >  ; done on the assembly output.
> > Index: gcc/config/i386/i386.md
> > ===
> > --- gcc/config/i386/i386.md (revision 236740)
> > +++ gcc/config/i386/i386.md (working copy)
> > @@ -15512,25 +15512,31 @@
> >[(set (match_operand:XF 0 "register_operand" "=f")
> > (unspec:XF [(match_operand:XF 1 "register_operand" "0")]
> >UNSPEC_FRNDINT))]
> > -  "TARGET_USE_FANCY_MATH_387
> > -   && flag_unsafe_math_optimizations"
> > +  "TARGET_USE_FANCY_MATH_387"
> >"frndint"
> >[(set_attr "type" "fpspc")
> > (set_attr "znver1_decode" "vector")
> > (set_attr "mode" "XF")])
> >
> > +(define_insn "rint2_frndint"
> > +  [(set (match_operand:MODEF 0 "register_operand" "=f")
> > +   (unspec:MODEF [(match_operand:MODEF 1 "register_operand" "0")]
> > + UNSPEC_FRNDINT))]
> > +  "TARGET_USE_FANCY_MATH_387"
> > +  "frndint"
> > +  [(set_attr "type" "fpspc")
> > +   (set_attr "znver1_decode" "vector")
> > +   (set_attr "mode" "")])
> > +
> >  (define_expand "rint2"
> >[(use (match_operand:MODEF 0 "register_operand"))
> > (use (match_operand:MODEF 1 "register_operand"))]
> >"(TARGET_USE_FANCY_MATH_387
> >  && (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> > -   || TARGET_MIX_SSE_I387)
> > -&& flag_unsafe_math_optimizations)
> > -   || (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
> > -   && !flag_trapping_math)"
> > +   || TARGET_MIX_SSE_I387))
> > +   || (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)"
> >  {
> > -  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
> > -  && !flag_trapping_math)
> > +  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> >  {
> >if (TARGET_ROUND)
> > emit_insn (gen_sse4_1_round2
> > @@ -15539,15 +15545,7 @@
> > ix86_expand_rint (operands[0], operands[1]);
> >  }
> >else
> > -{
> > -  rtx op0 = gen_reg_rtx (XFmode);
> > -  rtx op1 = gen_reg_rtx (XFmode);
> > -
> > -  emit_insn (gen_extendxf2 (op1, operands[1]));
> > -  emit_insn (gen_rintxf2 (op0, op1));
> > -
> > -  emit_insn (gen_truncxf2_i387_noop (operands[0], op0));
> > -}
> > +emit_insn (gen_rint2_frndint (operands[0], operands[1]));
> >DONE;
> >  })
> >
> > @@ -15770,13 +15768,13 @@
> >  (UNSPEC_FIST_CEIL "CEIL")])
> >
> >  ;; Rounding mode control word calculation could clobber FLAGS_REG.
> > -(define_insn_and_split "frndintxf2_"
> > -  [(set (match_operand:XF 0 "register_operand")
> > -   (unspec:XF [(match_operand:XF 1 "register_operand")]
> > +(define_insn_and_split "frndint2_"
> > +  [(set (match_operand:X87MODEF 0 "register_operand")
> > +   (unspec:X87MODEF [(match_operand:X87MODEF 1 "register_operand")]
> >FRNDINT_ROUNDING))
> > (clobber (reg:CC FLAGS_REG))]
> >"TARGET_USE_FANCY_MATH_387
> > -   && flag_unsafe_math_optimizations
> > +   && (flag_fp_int_builtin_inexact || !flag_trapping_math)
> > && can_create_pseudo_p ()"
> >"#"
> >"&& 1"
> > @@ -15787,26 +15785,26 @@
> >operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED);
> >operands[3] = assign_386_stack_local (HImode, SLOT_CW_);
> >
> > -  emit_insn (gen_frndintxf2__i387 (operands[0], operands[1],
> > -operands[2], operands[3]));
> > +  emit_insn (gen_frndint2__i387 (operands[0], operands[1],
> > +operands[2], operands[3]));
> >DONE;
> >  }
> >[(set_attr "type" "frndint")
> > (set_attr "i387_cw" "")
> > -   (set_attr "mode" "XF")])
> > +   (set_attr "mode" "")])
> >
> > -(define_insn "frndintxf2__i387"
> > -  [(set (match_operand:XF 0 "register_operand" "=f")
> > -   (unspec:XF [(match_operand:XF 1 "register_operand" "0")]
> > -  FRNDINT_ROUNDING))
> > +(define_insn "frndint2__i387"
> > +  [(set (match_operand:X87MODEF 0 "register_operand" "=f")
> > +   (unspec:X87MODEF [(match_operand:X87MODEF 1 "register_operand" "0")]
> > +FRNDINT_ROUNDING))
> > (use (match_operand:HI 2 "memory_operand" "m"))
> > (use (match_operand:HI 3 "memory_operand" "m"))]
> >"TARGET_USE_FANCY_MATH_387
> > -   && flag_unsafe_math_optimizations"
> > +   && (flag_fp_int_builtin_inexact || !

Re: tuple move constructor

2016-05-26 Thread Jonathan Wakely

On 26/05/16 19:49 +0200, Marc Glisse wrote:

On Thu, 26 May 2016, Jonathan Wakely wrote:


On 25/05/16 14:54 +0100, Jonathan Wakely wrote:

On 23/05/16 20:39 +0200, Marc Glisse wrote:

Ping

(re-attaching, I just added a one-line comment before the tag 
class as asked by Ville)


This is OK for trunk - thanks.


On second thoughts - does this change the passing conventions for
std::tuple if it gets a trivial move ctor?


Note that this part of the ABI is ill-defined
http://sourcerytools.com/pipermail/cxx-abi-dev/2016-February/002884.html

but yes, good catch, it does change the passing convention (by value), 
and not just for weirdo types, it even changes for tuple. It is 
clearly a change in the right direction, not passing tuple in a 
register is weird, but yeah, compatibility :-(


I don't even want to think of trying to fix this issue in C++11 while 
artificially preserving the non-triviality of tuple, the headache is 
not worth it. I guess I'll open an entry in bugzilla with the ABI tag 
and let it rot there...


Please do.


Maybe we could

#if __cpp_concepts >= 201500
the alternative discussed with Ville
#endif

but that won't fix the fact that tuple should be trivially move 
constructible...


We could add 
__attribute__(non_trivial_for_purpose_of_passing_convention), but I 
think abi_tag has already stretched enough the idea that gcc is 
following the itanium abi.


Bah, forget this patch. Thanks for noticing early, that spares me the 
trouble of reverting later.


I have a dream of resurrecting the gnu-versioned-namespace config (and
bumping from std::__7 / libstdc++.so.7 to std::__8 / libstdc++.so.8)
so that people who don't want backwards compatibility can use that
mode, which would enable various nice optimizations that can't be made
in the default mode because of compatibility.

This would be something we should change when that config is in use.
So hopefully if you open a bugzilla entry it won't rot forever, but
only until I realise my dream.



Re: tuple move constructor

2016-05-26 Thread Marc Glisse

On Thu, 26 May 2016, Jonathan Wakely wrote:


On 25/05/16 14:54 +0100, Jonathan Wakely wrote:

On 23/05/16 20:39 +0200, Marc Glisse wrote:

Ping

(re-attaching, I just added a one-line comment before the tag class as 
asked by Ville)


This is OK for trunk - thanks.


On second thoughts - does this change the passing conventions for
std::tuple if it gets a trivial move ctor?


Note that this part of the ABI is ill-defined
http://sourcerytools.com/pipermail/cxx-abi-dev/2016-February/002884.html

but yes, good catch, it does change the passing convention (by value), and 
not just for weirdo types, it even changes for tuple. It is clearly a 
change in the right direction, not passing tuple in a register is 
weird, but yeah, compatibility :-(


I don't even want to think of trying to fix this issue in C++11 while 
artificially preserving the non-triviality of tuple, the headache is not 
worth it. I guess I'll open an entry in bugzilla with the ABI tag and let 
it rot there...


Maybe we could

#if __cpp_concepts >= 201500
the alternative discussed with Ville
#endif

but that won't fix the fact that tuple should be trivially move 
constructible...


We could add __attribute__(non_trivial_for_purpose_of_passing_convention), 
but I think abi_tag has already stretched enough the idea that gcc is 
following the itanium abi.


Bah, forget this patch. Thanks for noticing early, that spares me the 
trouble of reverting later.


--
Marc Glisse


Re: [PATCH] Improve *vec_concatv2si_sse4_1

2016-05-26 Thread Uros Bizjak
On Thu, May 26, 2016 at 7:05 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch adds an avx512dq alternative (EVEX vpinsrd requires that) and
> enables EVEX vmovd and vpunpckldq.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-26  Jakub Jelinek  
>
> * config/i386/sse.md (*vec_concatv2si_sse4_1): Add avx512dq v=Yv,rm
> alternative.  Change x=x,x alternative to v=Yv,Yv and x=rm,C
> alternative to v=rm,C.
>
> * gcc.target/i386/avx512dq-concatv2si-1.c: New test.
> * gcc.target/i386/avx512vl-concatv2si-1.c: New test.

Ouch, I have just changed these mega strings in attribute definitions
to something more readable. Can you please redo the attribute part? It
should be a much more pleasant experience than counting all the
commas...

Uros.

> --- gcc/config/i386/sse.md.jj   2016-05-26 10:44:25.0 +0200
> +++ gcc/config/i386/sse.md  2016-05-26 14:22:26.819313220 +0200
> @@ -13339,29 +13339,30 @@ (define_split
>
>  (define_insn "*vec_concatv2si_sse4_1"
>[(set (match_operand:V2SI 0 "register_operand"
> - "=Yr,*x,x, Yr,*x,x, x, *y,*y")
> + "=Yr,*x, x, v,Yr,*x, v, v, *y,*y")
> (vec_concat:V2SI
>   (match_operand:SI 1 "nonimmediate_operand"
> - "  0, 0,x,  0,0, x,rm,  0,rm")
> + "  0, 0, x,Yv, 0, 0,Yv,rm,  0,rm")
>   (match_operand:SI 2 "vector_move_operand"
> - " rm,rm,rm,Yr,*x,x, C,*ym, C")))]
> + " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))]
>"TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>"@
> pinsrd\t{$1, %2, %0|%0, %2, 1}
> pinsrd\t{$1, %2, %0|%0, %2, 1}
> vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
> +   vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
> punpckldq\t{%2, %0|%0, %2}
> punpckldq\t{%2, %0|%0, %2}
> vpunpckldq\t{%2, %1, %0|%0, %1, %2}
> %vmovd\t{%1, %0|%0, %1}
> punpckldq\t{%2, %0|%0, %2}
> movd\t{%1, %0|%0, %1}"
> -  [(set_attr "isa" "noavx,noavx,avx,noavx,noavx,avx,*,*,*")
> -   (set_attr "type" 
> "sselog,sselog,sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
> -   (set_attr "prefix_extra" "1,1,1,*,*,*,*,*,*")
> -   (set_attr "length_immediate" "1,1,1,*,*,*,*,*,*")
> -   (set_attr "prefix" "orig,orig,vex,orig,orig,vex,maybe_vex,orig,orig")
> -   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,DI,DI")])
> +  [(set_attr "isa" "noavx,noavx,avx,avx512dq,noavx,noavx,avx,*,*,*")
> +   (set_attr "type" 
> "sselog,sselog,sselog,sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
> +   (set_attr "prefix_extra" "1,1,1,1,*,*,*,*,*,*")
> +   (set_attr "length_immediate" "1,1,1,1,*,*,*,*,*,*")
> +   (set_attr "prefix" 
> "orig,orig,vex,evex,orig,orig,maybe_evex,maybe_vex,orig,orig")
> +   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,TI,DI,DI")])
>
>  ;; ??? In theory we can match memory for the MMX alternative, but allowing
>  ;; nonimmediate_operand for operand 2 and *not* allowing memory for the SSE
> --- gcc/testsuite/gcc.target/i386/avx512dq-concatv2si-1.c.jj2016-05-26 
> 15:14:55.853786550 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512dq-concatv2si-1.c   2016-05-26 
> 15:13:57.0 +0200
> @@ -0,0 +1,43 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512vl -mavx512dq -masm=att" } */
> +
> +typedef int V __attribute__((vector_size (8)));
> +
> +void
> +f1 (int x, int y)
> +{
> +  register int a __asm ("xmm16");
> +  register int b __asm ("xmm17");
> +  register V c __asm ("xmm3");
> +  a = x;
> +  b = y;
> +  asm volatile ("" : "+v" (a), "+v" (b));
> +  c = (V) { a, b };
> +  asm volatile ("" : "+v" (c));
> +}
> +
> +/* { dg-final { scan-assembler 
> "vpunpckldq\[^\n\r]*%xmm17\[^\n\r]*%xmm16\[^\n\r]*%xmm3" } } */
> +
> +void
> +f2 (int x, int y)
> +{
> +  register int a __asm ("xmm16");
> +  register V c __asm ("xmm3");
> +  a = x;
> +  asm volatile ("" : "+v" (a));
> +  c = (V) { a, y };
> +  asm volatile ("" : "+v" (c));
> +}
> +
> +void
> +f3 (int x, int *y)
> +{
> +  register int a __asm ("xmm16");
> +  register V c __asm ("xmm3");
> +  a = x;
> +  asm volatile ("" : "+v" (a));
> +  c = (V) { a, *y };
> +  asm volatile ("" : "+v" (c));
> +}
> +
> +/* { dg-final { scan-assembler-times 
> "vpinsrd\[^\n\r]*\\\$1\[^\n\r]*%xmm16\[^\n\r]*%xmm3" 2 } } */
> --- gcc/testsuite/gcc.target/i386/avx512vl-concatv2si-1.c.jj2016-05-26 
> 15:15:11.921574803 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512vl-concatv2si-1.c   2016-05-26 
> 15:16:24.936612585 +0200
> @@ -0,0 +1,43 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512vl -mno-avx512dq -masm=att" } */
> +
> +typedef int V __attribute__((vector_size (8)));
> +
> +void
> +f1 (int x, int y)
> +{
> +  register int a __asm ("xmm16");
> +  register int b __asm ("xmm17");
> +  register V c __asm ("xmm3");
> +  a = x;
> +  b = y;
> +  asm volatile ("" : "+v" (a), "+v" (b));
> +  c = (V) { a, b };
> +  asm volatile ("" : "+v" (c));
> +}
> +
> +/* { dg-final { scan-assembler 
> "vpunpckldq\[^\n\r]*%

[PATCH, i386]: Use if_then_else or cond RTXes to calculate attribute value

2016-05-26 Thread Uros Bizjak
Hello!

Some of these strings went out of control. Use if_then_else or cond
RTXes to make things readable and maintainable again.

No functional changes.

2016-05-26  Uros Bizjak  

* config/i386/i386.md (*movqi_internal) : Use
if_then_else or cond RTXes to calculate attribute value.
* config/i386/mmx.md (*vec_extractv2sf_1) : Ditto.
: Ditto.
(*vec_extractv2sf_1) : Ditto.
* config/i386/sse.md (sse_loadlps) : Ditto.
(*vec_concatv2sf_sse4_1) : Ditto.
: Ditto.
: Ditto.
: Ditto.
: Ditto.
: Ditto.
(vec_set_0) : Ditto.
: Ditto.
: Ditto.
: Ditto.
(*vec_interleave_highv2df) : Ditto.
(*vec_interleave_lowv2df) : Ditto.
(sse2_storelpd) : Ditto.
(sse2_loadhpd) : Ditto.
(sse2_loadlpd) : Ditto.
: Ditto.
: Ditto.
(sse2_movsd) : Ditto.
: Ditto.
(vec_concatv2df)  : Ditto.
: Ditto.
(*vec_extractv4si) : Ditto.
(*vec_extractv2di_1) : Ditto.
: Ditto.
: Ditto.
: Ditto.
: Ditto.
(*vec_concatv2si_sse4_1) : Ditto.
: Ditto.
: Ditto.
(vec_concatv2di) : Ditto.
: Ditto.
: Ditto.
: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d20bbe4..6a2978e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2594,7 +2594,10 @@
 return "mov{b}\t{%1, %0|%0, %1}";
 }
 }
-  [(set_attr "isa" "*,*,*,*,*,*,*,*,*,*,avx512dq,avx512dq")
+  [(set (attr "isa")
+ (if_then_else (eq_attr "alternative" "10,11")
+   (const_string "avx512dq")
+   (const_string "*")))
(set (attr "type")
  (cond [(eq_attr "alternative" "7,8,9,10,11")
  (const_string "mskmov")
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 9a239c2f..65e8b46 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -610,8 +610,14 @@
#"
   [(set_attr "isa" "*,sse3,noavx,*,*,*,*")
(set_attr "type" "mmxcvt,sse,sseshuf1,mmxmov,ssemov,fmov,imov")
-   (set_attr "length_immediate" "*,*,1,*,*,*,*")
-   (set_attr "prefix_rep" "*,1,*,*,*,*,*")
+   (set (attr "length_immediate")
+ (if_then_else (eq_attr "alternative" "2")
+  (const_string "1")
+  (const_string "*")))
+   (set (attr "prefix_rep")
+ (if_then_else (eq_attr "alternative" "1")
+  (const_string "1")
+  (const_string "*")))
(set_attr "prefix" "orig,maybe_vex,orig,orig,orig,orig,orig")
(set_attr "mode" "DI,V4SF,V4SF,SF,SF,SF,SF")])
 
@@ -1297,7 +1303,10 @@
#"
   [(set_attr "isa" "*,sse2,noavx,*,*,*")
(set_attr "type" "mmxcvt,sseshuf1,sseshuf1,mmxmov,ssemov,imov")
-   (set_attr "length_immediate" "*,1,1,*,*,*")
+   (set (attr "length_immediate")
+ (if_then_else (eq_attr "alternative" "1,2")
+  (const_string "1")
+  (const_string "*")))
(set_attr "prefix" "orig,maybe_vex,orig,orig,orig,orig")
(set_attr "mode" "DI,TI,V4SF,SI,SI,SI")])
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2297ca2..ccd8173 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6510,7 +6510,10 @@
%vmovlps\t{%2, %0|%q0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
(set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")
-   (set_attr "length_immediate" "1,1,*,*,*")
+   (set (attr "length_immediate")
+ (if_then_else (eq_attr "alternative" "0,1")
+  (const_string "1")
+  (const_string "*")))
(set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
(set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
 
@@ -6586,12 +6589,41 @@
%vmovss\t{%1, %0|%0, %1}
punpckldq\t{%2, %0|%0, %2}
movd\t{%1, %0|%0, %1}"
-  [(set_attr "isa" "noavx,noavx,avx,noavx,noavx,avx,*,*,*")
-   (set_attr "type" 
"sselog,sselog,sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
-   (set_attr "prefix_data16" "*,*,*,1,1,*,*,*,*")
-   (set_attr "prefix_extra" "*,*,*,1,1,1,*,*,*")
-   (set_attr "length_immediate" "*,*,*,1,1,1,*,*,*")
-   (set_attr "prefix" 
"orig,orig,maybe_evex,orig,orig,maybe_evex,maybe_vex,orig,orig")
+  [(set (attr "isa")
+ (cond [(eq_attr "alternative" "0,1,3,4")
+ (const_string "noavx")
+   (eq_attr "alternative" "2,5")
+ (const_string "avx")
+  ]
+  (const_string "*")))
+   (set (attr "type")
+ (cond [(eq_attr "alternative" "6")
+ (const_string "ssemov")
+   (eq_attr "alternative" "7")
+ (const_string "mmxcvt")
+   (eq_attr "alternative" "8")
+ (const_string "mmxmov")
+  ]
+  (const_string "sselog")))
+   (set (attr "prefix_data16")
+ (if_then_else (eq_attr "alternative" "3,4")
+  (const_string "1")
+  (const_string "*")))
+   (set (attr "prefix_extra")
+ (if_then_else (eq_attr "alternative" "3,4,5")
+  (cons

Re: C PATCH for comptypes handling of TYPE_REF_CAN_ALIAS_ALL

2016-05-26 Thread Joseph Myers
On Thu, 26 May 2016, Marek Polacek wrote:

> The C++ FE has been changed, as a part of c++/50800, in such a way that it no
> longer considers types differing only in TYPE_REF_CAN_ALIAS_ALL
> incompatible.  But the C FE still rejects the following testcase, so this
> patch makes the C FE follow suit.  After all, the may_alias attribute is not
> considered as "affects_type_identity".  This TYPE_REF_CAN_ALIAS_ALL check was
> introduced back in 2004 (r90078), but since then we've gotten rid of them;
> only comptypes_internal retained it.  I suspect the TYPE_MODE check might go too,
> but I don't feel like changing that right now.
> 
> This arose when discussing struct sockaddr vs. may_alias issue in glibc.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

I'd expect you to need to do something about composite types, to ensure 
that the composite type has TYPE_REF_CAN_ALIAS_ALL set if either of the 
two types does - along with tests in such a case, where the two types are 
in either order, that the composite type produced really is treated as 
may_alias.  (The sort of cases I'm thinking of are

typedef int T __attribute__((may_alias));
extern T *p;
extern int *p;

with the declarations in either order, and then making sure that 
type-based aliasing handles references through this pointer properly.)
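
As a rough sketch of the kind of test described above (not taken from the
patch; the names via_p, via_q and may_alias_self_check are made up for
illustration), both declaration orders can go in one file, and the pointers
are then used to store into an object of another type.  It needs a compiler
that accepts the redeclarations at all.  The final definitions use T * so
that the sketch stays well defined even where the attribute is not merged
into the composite type; a real testsuite case would probably also want -O2
and a look at the generated code.

```c
#include <assert.h>

typedef int T __attribute__((may_alias));

/* The same pointer declared with both types, in either order.  */
extern T *p;
extern int *p;

extern int *q;
extern T *q;

/* Definitions (illustrative choice: spelled with the may_alias type).  */
T *p;
T *q;

/* Store an int zero into *F through P and read the float back.  With
   may_alias honoured, the load of *F must observe the store.  */
float
via_p (float *f)
{
  assert (sizeof (float) == sizeof (int));
  p = (T *) f;
  *p = 0;
  return *f;
}

/* Same check through Q, whose declarations came in the other order.  */
float
via_q (float *f)
{
  assert (sizeof (float) == sizeof (int));
  q = (T *) f;
  *q = 0;
  return *f;
}

/* Small usage example: an all-zero bit pattern reads back as 0.0f.  */
void
may_alias_self_check (void)
{
  float a = 1.5f, b = 2.5f;
  assert (via_p (&a) == 0.0f);
  assert (via_q (&b) == 0.0f);
}
```

Calling may_alias_self_check () from a test driver exercises both orders.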

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PTX] more tests annotations

2016-05-26 Thread Alexander Monakov
Hello,

On Thu, 26 May 2016, Nathan Sidwell wrote:

> Applied the attached to markup some more tests that PTX either crashes on or
> doesn't apply strict IEEE semantics to.  In one case it's about debug info
> that we don't emit.

> Index: gcc.dg/torture/c99-contract-1.c
> ===
> --- gcc.dg/torture/c99-contract-1.c   (revision 236702)
> +++ gcc.dg/torture/c99-contract-1.c   (working copy)
> @@ -2,6 +2,7 @@
> expressions.  */
>  /* { dg-do run } */
>  /* { dg-options "-std=c99 -pedantic-errors" } */
> +/* { dg-skip-if "ptx only loosely follows IEEE" { "nvptx-*-*" } { "*" } { "" } } */

AFAIK both PTX and the underlying hardware implementation have good support
for IEEE semantics.  Here, either GCC emits fused multiply-add directly (at
-Os23), or it emits mul/add instructions without explicit rounding modifiers,
which allows PTX translation to contract the operations. In both cases that is
a GCC bug, correctly exposed by the testcase.

> Index: c-c++-common/torture/complex-sign-mixed-add.c
> ===
> --- c-c++-common/torture/complex-sign-mixed-add.c (revision 236702)
> +++ c-c++-common/torture/complex-sign-mixed-add.c (working copy)
> @@ -2,6 +2,7 @@
> addition.  */
>  /* { dg-do run } */
>  /* { dg-options "-std=gnu99" { target c } } */
> +/* { dg-skip-if "ptx can elide zero additions" { "nvptx-*-*" } { "-O0" } { "" } } */

In light of the above I think this and the other similar case shouldn't be
disabled, not without deeper investigation.
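
For reference, the property that the "zero additions" skip reason is about,
as I read it, can be shown in a few lines.  This is only an illustration of
IEEE signed-zero behaviour under the default rounding mode, with made-up
helper names; it is not code from the testsuite.  Adding +0.0 to -0.0 gives
+0.0, so anything that drops the addition returns a zero of the wrong sign.

```c
#include <assert.h>

/* Nonzero iff X is a zero with its sign bit set: 1.0 / -0.0 is -infinity.  */
int
is_negative_zero (double x)
{
  return x == 0.0 && 1.0 / x < 0.0;
}

/* Add +0.0 to X.  The volatile keeps the addition from being folded away
   at compile time.  */
double
add_positive_zero (double x)
{
  volatile double zero = 0.0;
  return x + zero;
}

/* What is effectively computed if the addition is elided.  */
double
add_positive_zero_elided (double x)
{
  return x;
}

/* Small usage example contrasting the two results for x = -0.0.  */
void
signed_zero_self_check (void)
{
  assert (is_negative_zero (-0.0));
  assert (!is_negative_zero (add_positive_zero (-0.0)));
  assert (is_negative_zero (add_positive_zero_elided (-0.0)));
}
```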

Alexander


Re: [PATCH 1/3] Encapsulate comp_cost within a class with methods.

2016-05-26 Thread Martin Liška
On 05/19/2016 01:24 PM, Bin.Cheng wrote:
> On Thu, May 19, 2016 at 11:23 AM, Martin Liška  wrote:
>> On 05/16/2016 03:55 PM, Martin Liška wrote:
>>> On 05/16/2016 12:13 PM, Bin.Cheng wrote:
>>>> Hi Martin,
>>>> Could you please rebase this patch and the profiling one against
>>>> latest trunk?  The third patch was applied before these two now.
>>>>
>>>> Thanks,
>>>> bin
>>>
>>> Hello.
>>>
>>> Sending the rebased version of the patch.
>>>
>>> Martin
>>>
>>
>> Hello.
>>
>> As I've dramatically changed the 2/3 PATCH, a class encapsulation is not 
>> needed any longer.
>> Thus, I've reduced this patch just to usage of member function/operators 
>> that are useful
>> in my eyes. It's up to Bin whether to merge the patch.
> Yes, I think we want c++-ify such structures.
> 
>> +comp_cost
>> +operator- (comp_cost cost1, comp_cost cost2)
>> +{
>> +  if (cost1.infinite_cost_p () || cost2.infinite_cost_p ())
>> +return comp_cost::get_infinite ();
>> +
>> +  cost1.cost -= cost2.cost;
>> +  cost1.complexity -= cost2.complexity;
>> +
>> +  return cost1;
>> +}
> For subtraction, should we expect the second operand as infinite?
> Maybe add an assertion for it in case anything goes wrong here.

Hi.

Done.
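
To make the suggestion concrete, here is a minimal sketch of a subtraction
that asserts on an infinite second operand.  It is written as plain C
functions rather than the C++ operators in the patch, and the names
cost_sketch, SKETCH_INFTY, cost_infinite_p and cost_sub are stand-ins made
up for this illustration, so it should not be read as the code that was
installed.

```c
#include <assert.h>

/* Stand-in for the INFTY constant used by the patch.  */
#define SKETCH_INFTY 10000000

/* Cut-down stand-in for comp_cost.  */
struct cost_sketch
{
  int cost;
  unsigned complexity;
};

/* Nonzero iff C represents an infinite cost.  */
int
cost_infinite_p (struct cost_sketch c)
{
  return c.cost == SKETCH_INFTY;
}

/* Return A - B.  Subtracting an infinite cost has no sensible meaning,
   so that case is caught by an assertion.  */
struct cost_sketch
cost_sub (struct cost_sketch a, struct cost_sketch b)
{
  assert (!cost_infinite_p (b));

  /* Infinite minus finite stays infinite.  */
  if (cost_infinite_p (a))
    return a;

  a.cost -= b.cost;
  a.complexity -= b.complexity;
  return a;
}

/* Small usage example.  */
void
cost_sub_self_check (void)
{
  struct cost_sketch a = { 10, 3 };
  struct cost_sketch b = { 4, 1 };
  struct cost_sketch inf = { SKETCH_INFTY, SKETCH_INFTY };
  struct cost_sketch r = cost_sub (a, b);

  assert (r.cost == 6);
  assert (r.complexity == 2);
  assert (cost_infinite_p (cost_sub (inf, b)));
}
```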

> 
>> +comp_cost
>> +comp_cost::get_infinite ()
>> +{
>> +  return comp_cost (INFTY, INFTY);
>> +}
>> +
>> +comp_cost
>> +comp_cost::get_no_cost ()
>> +{
>> +  return comp_cost ();
>> +}
> I think we may keep the original global variables for
> no_cost&infinite_cost, and save these two methods.

Likewise.

>>
>> @@ -5982,11 +6083,11 @@ iv_ca_recount_cost (struct ivopts_data *data, struct 
>> iv_ca *ivs)
>>  {
>>comp_cost cost = ivs->cand_use_cost;
>>
>> -  cost.cost += ivs->cand_cost;
>> +  cost+= ivs->cand_cost;
> Space.

Likewise.

> 
> This is pure refactoring, could you please make sure there is no fallout
> by simply comparing SPEC code generation/disassembly?  I am asking
> since cost computation is sensitive, last time we didn't catch a "*"
> character typo in dump info improvement patch.

I've just verified that code generation for SPECv6 is unchanged and I'm going
to install the patch.

Thanks,
Martin


> 
> Okay with above changes, unless somebody else has comment on the C++
> part (which I know very little about).
> 
> Thanks,
> bin
>>
>> Martin

>From 6379f77c195ed128c4886c07747bf9b8b678c75c Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 17 May 2016 13:52:11 +0200
Subject: [PATCH] IVOPTS: make comp_cost in a more c++ fashion.

gcc/ChangeLog:

2016-05-17  Martin Liska  

	* tree-ssa-loop-ivopts.c (comp_cost::infinite_cost_p): New
	function.
	(operator+): Likewise.
	(operator-): Likewise.
	(comp_cost::operator+=): Likewise.
	(comp_cost::operator-=): Likewise.
	(comp_cost::operator/=): Likewise.
	(comp_cost::operator*=): Likewise.
	(operator<): Likewise.
	(operator==): Likewise.
	(operator<=): Likewise.
	(new_cost): Remove.
	(infinite_cost_p): Likewise.
	(add_costs): Likewise.
	(sub_costs): Likewise.
	(compare_costs): Likewise.
	(set_group_iv_cost): Use the newly introduced functions.
	(get_address_cost): Likewise.
	(get_shiftadd_cost): Likewise.
	(force_expr_to_var_cost): Likewise.
	(split_address_cost): Likewise.
	(ptr_difference_cost): Likewise.
	(difference_cost): Likewise.
	(get_computation_cost_at): Likewise.
	(determine_group_iv_cost_generic): Likewise.
	(determine_group_iv_cost_address): Likewise.
	(determine_group_iv_cost_cond): Likewise.
	(autoinc_possible_for_pair): Likewise.
	(determine_group_iv_costs): Likewise.
	(cheaper_cost_pair): Likewise.
	(iv_ca_recount_cost): Likewise.
	(iv_ca_set_no_cp): Likewise.
	(iv_ca_set_cp): Likewise.
	(iv_ca_cost): Likewise.
	(iv_ca_new): Likewise.
	(iv_ca_dump): Likewise.
	(iv_ca_narrow): Likewise.
	(iv_ca_prune): Likewise.
	(iv_ca_replace): Likewise.
	(try_add_cand_for): Likewise.
	(try_improve_iv_set): Likewise.
	(find_optimal_iv_set): Likewise.
---
 gcc/tree-ssa-loop-ivopts.c | 380 -
 1 file changed, 235 insertions(+), 145 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 9ce6b64..83b9aaf 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -173,16 +173,171 @@ enum use_type
 /* Cost of a computation.  */
 struct comp_cost
 {
+  comp_cost (): cost (0), complexity (0), scratch (0)
+  {}
+
+  comp_cost (int cost, unsigned complexity, int scratch = 0)
+: cost (cost), complexity (complexity), scratch (scratch)
+  {}
+
+  /* Returns true if COST is infinite.  */
+  bool infinite_cost_p ();
+
+  /* Adds costs COST1 and COST2.  */
+  friend comp_cost operator+ (comp_cost cost1, comp_cost cost2);
+
+  /* Adds COST to the comp_cost.  */
+  comp_cost operator+= (comp_cost cost);
+
+  /* Adds constant C to this comp_cost.  */
+  comp_cost operator+= (HOST_WIDE_INT c);
+
+  /* Subtracts constant C from this comp_cost.  */
+  comp_cost operator-= (HOST_WIDE_INT c);
+
+  /* Divide the comp_cost by constant C.  */
+  comp_cost operator/= (HOST_WIDE_INT c);
+
+ 

Re: [PTX] malloc/realloc/free

2016-05-26 Thread Alexander Monakov
Hello,

On Thu, 26 May 2016, Nathan Sidwell wrote:

> This patch removes the malloc/realloc/free wrappers from libgcc.  I've
> implemented them completely in C and put them in the ptx newlib port --
> where one expects such functions.

It appears that the new Newlib code doesn't free 'p' on 'realloc (p, 0)';
this is a regression from previous behavior.  How about the following fix?

Alexander

diff --git a/newlib/libc/machine/nvptx/realloc.c b/newlib/libc/machine/nvptx/realloc.c
index 634507f..5b6bb62 100644
--- a/newlib/libc/machine/nvptx/realloc.c
+++ b/newlib/libc/machine/nvptx/realloc.c
@@ -34,6 +34,12 @@
 void *
 realloc (void *old_ptr, size_t new_size)
 {
+  if (!new_size)
+{
+  free (old_ptr);
+  return 0;
+}
+
   void *new_ptr = malloc (new_size);
 
   if (old_ptr && new_ptr)
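
A standalone sketch of the behaviour the fix is after, for trying outside
of Newlib.  The function realloc_sketch and its explicit old_size parameter
are assumptions made for this illustration (the real port has its own way
of knowing the old block size); only the zero-size branch mirrors the hunk
above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Resize OLD_PTR (OLD_SIZE bytes) to NEW_SIZE bytes on top of
   malloc/free.  A NEW_SIZE of zero frees the block and returns NULL.  */
void *
realloc_sketch (void *old_ptr, size_t old_size, size_t new_size)
{
  if (!new_size)
    {
      free (old_ptr);		/* free (NULL) is a no-op.  */
      return NULL;
    }

  void *new_ptr = malloc (new_size);

  if (old_ptr && new_ptr)
    {
      /* Copy the part that fits, then release the old block.  */
      memcpy (new_ptr, old_ptr, old_size < new_size ? old_size : new_size);
      free (old_ptr);
    }

  /* On allocation failure the old block is left untouched.  */
  return new_ptr;
}

/* Small usage example: grow a block, then release it via size zero.  */
void
realloc_sketch_self_check (void)
{
  char *s = realloc_sketch (NULL, 0, 4);
  assert (s);
  memcpy (s, "abc", 4);

  s = realloc_sketch (s, 4, 8);
  assert (s && strcmp (s, "abc") == 0);

  assert (realloc_sketch (s, 8, 0) == NULL);
}
```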



[PATCH v4] gcov: Runtime configurable destination output

2016-05-26 Thread Aaron Conole
The previous gcov behavior was to always output errors on the stderr channel.
This is fine for most uses, but some programs will require stderr to be
untouched by libgcov for certain tests. This change allows configuring
the gcov output via an environment variable which will be used to open
the appropriate file.
---
 libgcc/libgcov-driver-system.c | 49 --
 libgcc/libgcov-driver.c|  8 ++-
 2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/libgcc/libgcov-driver-system.c b/libgcc/libgcov-driver-system.c
index 4e3b244..ff8a521 100644
--- a/libgcc/libgcov-driver-system.c
+++ b/libgcc/libgcov-driver-system.c
@@ -23,19 +23,64 @@ a copy of the GCC Runtime Library Exception along with this program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
-/* A utility function for outputing errors.  */
+/* Configured via the GCOV_ERROR_FILE environment variable;
+   it will either be stderr, or a file of the user's choosing.
+   Non-static to prevent multiple gcov-aware shared objects from
+   instantiating their own copies. */
+FILE *__gcov_error_file = NULL;
+
+/* A utility function to populate the __gcov_error_file pointer.
+   This should NOT be called outside of the gcov system driver code. */
+
+static FILE *
+get_gcov_error_file(void)
+{
+#if !IN_GCOV_TOOL
+  return stderr;
+#else
+  char *gcov_error_filename = getenv ("GCOV_ERROR_FILE");
+
+  if (gcov_error_filename)
+{
+  FILE *openfile = fopen (gcov_error_filename, "a");
+  if (openfile)
+__gcov_error_file = openfile;
+}
+  if (!__gcov_error_file)
+__gcov_error_file = stderr;
+  return __gcov_error_file;
+#endif
+}
+
+/* A utility function for outputting errors.  */
 
 static int __attribute__((format(printf, 1, 2)))
 gcov_error (const char *fmt, ...)
 {
   int ret;
   va_list argp;
+
+  if (!__gcov_error_file)
+__gcov_error_file = get_gcov_error_file ();
+
   va_start (argp, fmt);
-  ret = vfprintf (stderr, fmt, argp);
+  ret = vfprintf (__gcov_error_file, fmt, argp);
   va_end (argp);
   return ret;
 }
 
+#if !IN_GCOV_TOOL
+static void
+gcov_error_exit (void)
+{
+  if (__gcov_error_file && __gcov_error_file != stderr)
+{
+  fclose (__gcov_error_file);
+  __gcov_error_file = NULL;
+}
+}
+#endif
+
 /* Make sure path component of the given FILENAME exists, create
missing directories. FILENAME must be writable.
Returns zero on success, or -1 if an error occurred.  */
diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index 9c4eeca..d51397e 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -43,9 +43,13 @@ void __gcov_init (struct gcov_info *p __attribute__ ((unused))) {}
 
 #ifdef L_gcov
 
-/* A utility function for outputing errors.  */
+/* A utility function for outputting errors.  */
 static int gcov_error (const char *, ...);
 
+#if !IN_GCOV_TOOL
+static void gcov_error_exit (void);
+#endif
+
 #include "gcov-io.c"
 
 struct gcov_fn_buffer
@@ -878,6 +882,8 @@ gcov_exit (void)
 __gcov_root.prev->next = __gcov_root.next;
   else
 __gcov_master.root = __gcov_root.next;
+
+  gcov_error_exit ();
 }
 
 /* Add a new object file onto the bb chain.  Invoked automatically
-- 
2.5.5
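
Outside of libgcov, the general pattern the patch describes (pick an error
stream from an environment variable, fall back to stderr, close it at exit
only if it is not stderr) can be sketched as below.  The function names and
the environment variable used in the self check are made up for this
illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Return the stream error messages should go to: the file named by the
   environment variable ENV_NAME, opened for appending, or stderr if the
   variable is unset or the file cannot be opened.  */
FILE *
open_error_stream (const char *env_name)
{
  const char *path = getenv (env_name);

  if (path)
    {
      FILE *f = fopen (path, "a");
      if (f)
	return f;
    }
  return stderr;
}

/* Close F unless it is stderr, which stays open for the process.  */
void
close_error_stream (FILE *f)
{
  if (f && f != stderr)
    fclose (f);
}

/* Small usage example, assuming the variable below is not set.  */
void
error_stream_self_check (void)
{
  FILE *f = open_error_stream ("GCOV_SKETCH_UNSET_VARIABLE");
  assert (f == stderr);
  close_error_stream (f);
}
```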



Re: [PATCH][1/3][ARM] Keep ctz expressions together until after reload

2016-05-26 Thread Joseph Myers
On Thu, 26 May 2016, Kyrill Tkachov wrote:

> the early RTL optimisers.  This better expresses the semantics of the 
> operation as a whole, since the RBIT operation is represented as an 
> UNSPEC anyway and so will not see the benefits of combine,

This doesn't affect your patch, but I think it would make sense for RBIT 
not to be an UNSPEC but to have architecture-independent RTL and built-in 
function - see bug 50481.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Improve *vec_concatv4si

2016-05-26 Thread Jakub Jelinek
Hi!

Both vpunpcklqdq and vmovhps are available with XMM EVEX args in AVX512VL.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-26  Jakub Jelinek  

* config/i386/sse.md (*vec_concatv4si): Use v=v,v instead of
x=x,x and v=v,m instead of x=x,m.

* gcc.target/i386/avx512vl-concatv4si-1.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-26 14:22:26.0 +0200
+++ gcc/config/i386/sse.md  2016-05-26 15:37:40.029856077 +0200
@@ -13386,10 +13386,10 @@ (define_insn "*vec_concatv2si"
(set_attr "mode" "TI,TI,DI,V4SF,SF,DI,DI")])
 
 (define_insn "*vec_concatv4si"
-  [(set (match_operand:V4SI 0 "register_operand"   "=x,x,x,x,x")
+  [(set (match_operand:V4SI 0 "register_operand"   "=x,v,x,x,v")
(vec_concat:V4SI
- (match_operand:V2SI 1 "register_operand" " 0,x,0,0,x")
- (match_operand:V2SI 2 "nonimmediate_operand" " x,x,x,m,m")))]
+ (match_operand:V2SI 1 "register_operand" " 0,v,0,0,v")
+ (match_operand:V2SI 2 "nonimmediate_operand" " x,v,x,m,m")))]
   "TARGET_SSE"
   "@
punpcklqdq\t{%2, %0|%0, %2}
@@ -13399,7 +13399,7 @@ (define_insn "*vec_concatv4si"
vmovhps\t{%2, %1, %0|%0, %1, %q2}"
   [(set_attr "isa" "sse2_noavx,avx,noavx,noavx,avx")
(set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
-   (set_attr "prefix" "orig,vex,orig,orig,vex")
+   (set_attr "prefix" "orig,maybe_evex,orig,orig,maybe_evex")
(set_attr "mode" "TI,TI,V4SF,V2SF,V2SF")])
 
 ;; movd instead of movq is required to handle broken assemblers.
--- gcc/testsuite/gcc.target/i386/avx512vl-concatv4si-1.c.jj2016-05-26 
15:45:13.978880684 +0200
+++ gcc/testsuite/gcc.target/i386/avx512vl-concatv4si-1.c   2016-05-26 
15:46:27.643911021 +0200
@@ -0,0 +1,23 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl" } */
+
+typedef int V __attribute__((vector_size (8)));
+typedef int W __attribute__((vector_size (16)));
+
+void
+f1 (V x, V y)
+{
+  register W c __asm ("xmm16");
+  c = (W) { x[0], x[1], x[0], x[1] };
+  asm volatile ("" : "+v" (c));
+}
+
+void
+f2 (V x, V *y)
+{
+  register W c __asm ("xmm16");
+  c = (W) { x[0], x[1], (*y)[0], (*y)[1] };
+  asm volatile ("" : "+v" (c));
+}
+
+/* { dg-final { scan-assembler-times "vpunpcklqdq\[^\n\r]*xmm16" 2 } } */

Jakub


[PATCH] Improve *vec_concatv2si_sse4_1

2016-05-26 Thread Jakub Jelinek
Hi!

This patch adds an avx512dq alternative (EVEX vpinsrd requires that) and
enables EVEX vmovd and vpunpckldq.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-26  Jakub Jelinek  

* config/i386/sse.md (*vec_concatv2si_sse4_1): Add avx512dq v=Yv,rm
alternative.  Change x=x,x alternative to v=Yv,Yv and x=rm,C
alternative to v=rm,C.

* gcc.target/i386/avx512dq-concatv2si-1.c: New test.
* gcc.target/i386/avx512vl-concatv2si-1.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-26 10:44:25.0 +0200
+++ gcc/config/i386/sse.md  2016-05-26 14:22:26.819313220 +0200
@@ -13339,29 +13339,30 @@ (define_split
 
 (define_insn "*vec_concatv2si_sse4_1"
   [(set (match_operand:V2SI 0 "register_operand"
- "=Yr,*x,x, Yr,*x,x, x, *y,*y")
+ "=Yr,*x, x, v,Yr,*x, v, v, *y,*y")
(vec_concat:V2SI
  (match_operand:SI 1 "nonimmediate_operand"
- "  0, 0,x,  0,0, x,rm,  0,rm")
+ "  0, 0, x,Yv, 0, 0,Yv,rm,  0,rm")
  (match_operand:SI 2 "vector_move_operand"
- " rm,rm,rm,Yr,*x,x, C,*ym, C")))]
+ " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))]
   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
pinsrd\t{$1, %2, %0|%0, %2, 1}
pinsrd\t{$1, %2, %0|%0, %2, 1}
vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
+   vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
punpckldq\t{%2, %0|%0, %2}
punpckldq\t{%2, %0|%0, %2}
vpunpckldq\t{%2, %1, %0|%0, %1, %2}
%vmovd\t{%1, %0|%0, %1}
punpckldq\t{%2, %0|%0, %2}
movd\t{%1, %0|%0, %1}"
-  [(set_attr "isa" "noavx,noavx,avx,noavx,noavx,avx,*,*,*")
-   (set_attr "type" 
"sselog,sselog,sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
-   (set_attr "prefix_extra" "1,1,1,*,*,*,*,*,*")
-   (set_attr "length_immediate" "1,1,1,*,*,*,*,*,*")
-   (set_attr "prefix" "orig,orig,vex,orig,orig,vex,maybe_vex,orig,orig")
-   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,DI,DI")])
+  [(set_attr "isa" "noavx,noavx,avx,avx512dq,noavx,noavx,avx,*,*,*")
+   (set_attr "type" 
"sselog,sselog,sselog,sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
+   (set_attr "prefix_extra" "1,1,1,1,*,*,*,*,*,*")
+   (set_attr "length_immediate" "1,1,1,1,*,*,*,*,*,*")
+   (set_attr "prefix" 
"orig,orig,vex,evex,orig,orig,maybe_evex,maybe_vex,orig,orig")
+   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,TI,DI,DI")])
 
 ;; ??? In theory we can match memory for the MMX alternative, but allowing
 ;; nonimmediate_operand for operand 2 and *not* allowing memory for the SSE
--- gcc/testsuite/gcc.target/i386/avx512dq-concatv2si-1.c.jj2016-05-26 
15:14:55.853786550 +0200
+++ gcc/testsuite/gcc.target/i386/avx512dq-concatv2si-1.c   2016-05-26 
15:13:57.0 +0200
@@ -0,0 +1,43 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mavx512dq -masm=att" } */
+
+typedef int V __attribute__((vector_size (8)));
+
+void
+f1 (int x, int y)
+{
+  register int a __asm ("xmm16");
+  register int b __asm ("xmm17");
+  register V c __asm ("xmm3");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  c = (V) { a, b };
+  asm volatile ("" : "+v" (c));
+}
+
+/* { dg-final { scan-assembler 
"vpunpckldq\[^\n\r]*%xmm17\[^\n\r]*%xmm16\[^\n\r]*%xmm3" } } */
+
+void
+f2 (int x, int y)
+{
+  register int a __asm ("xmm16");
+  register V c __asm ("xmm3");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  c = (V) { a, y };
+  asm volatile ("" : "+v" (c));
+}
+
+void
+f3 (int x, int *y)
+{
+  register int a __asm ("xmm16");
+  register V c __asm ("xmm3");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  c = (V) { a, *y };
+  asm volatile ("" : "+v" (c));
+}
+
+/* { dg-final { scan-assembler-times "vpinsrd\[^\n\r]*\\\$1\[^\n\r]*%xmm16\[^\n\r]*%xmm3" 2 } } */
--- gcc/testsuite/gcc.target/i386/avx512vl-concatv2si-1.c.jj2016-05-26 
15:15:11.921574803 +0200
+++ gcc/testsuite/gcc.target/i386/avx512vl-concatv2si-1.c   2016-05-26 
15:16:24.936612585 +0200
@@ -0,0 +1,43 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mno-avx512dq -masm=att" } */
+
+typedef int V __attribute__((vector_size (8)));
+
+void
+f1 (int x, int y)
+{
+  register int a __asm ("xmm16");
+  register int b __asm ("xmm17");
+  register V c __asm ("xmm3");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  c = (V) { a, b };
+  asm volatile ("" : "+v" (c));
+}
+
+/* { dg-final { scan-assembler "vpunpckldq\[^\n\r]*%xmm17\[^\n\r]*%xmm16\[^\n\r]*%xmm3" } } */
+
+void
+f2 (int x, int y)
+{
+  register int a __asm ("xmm16");
+  register V c __asm ("xmm3");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  c = (V) { a, y };
+  asm volatile ("" : "+v" (c));
+}
+
+void
+f3 (int x, int *y)
+{
+  register int a __asm ("xmm16");
+  register V c __asm ("xmm3");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  c = (V) { a, *y };
+  asm volatile ("" : "+v" (c));
+}
+
+/* { dg-final { scan-assembler-not "vpinsrd\[^\n\r]*\\\$1\[^\n\r]*%xmm16\[^\n\r]*%xmm3" } } */


Re: [PATCH], Add PowerPC ISA 3.0 min/max support

2016-05-26 Thread Michael Meissner
On Mon, May 09, 2016 at 09:31:43AM -0500, Segher Boessenkool wrote:
> On Thu, May 05, 2016 at 03:18:39PM -0400, Michael Meissner wrote:
> > At the present time, the code does not support comparisons involving >= and
> > <= unless the -ffast-math option is used. I hope eventually to support
> > generating these instructions without having -ffast-math used.
> > 
> > The underlying reason is when fast math is not used, we change the condition
> > from:
> > 
> > (ge:SI (reg:CCFP ) (const_int 0))
> > 
> > to:
> > 
> > (ior:SI (gt:SI (reg:CCFP ) (const_int 0))
> > (eq:SI (reg:CCFP ) (const_int 0)))
> > 
> > The machine independent portion of the compiler does not recognize this when
> > trying to generate conditional moves.
> > 
> > I would imagine the 'fix' is to generate GE/LE all of the time, and then
> > have a splitter that converts it to IOR of GT/EQ if it is not a conditional
> > move with ISA 3.0 instructions.
> 
> That sounds like a plan :-)

Well, it is low on my list of priorities.  Hopefully I or
somebody else will be able to get to it by the time GCC 7 freezes.
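
To make the GE versus IOR-of-GT/EQ point above concrete, here is a minimal
sketch in plain C (not GCC code; every function name is made up for
illustration).  It assumes IEEE comparison semantics, i.e. no -ffast-math.
The split form agrees with a direct >= on every input, NaNs included, while
the shortcut "not less than" does not, which is presumably why the split is
needed when NaNs have to be honoured.

```c
#include <assert.h>

/* Hypothetical helpers, not GCC code: three spellings of "a >= b".  */

/* Direct comparison.  */
int ge_direct(double a, double b)
{
    return a >= b;
}

/* The split form from the message: (a > b) OR (a == b).  */
int ge_as_gt_or_eq(double a, double b)
{
    return (a > b) || (a == b);
}

/* The tempting shortcut, valid only when neither operand is a NaN.  */
int ge_as_not_lt(double a, double b)
{
    return !(a < b);
}

/* Build a NaN at run time without needing libm.  */
double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* Returns 1 if the split form matches the direct form for (a, b).  */
int split_form_agrees(double a, double b)
{
    return ge_direct(a, b) == ge_as_gt_or_eq(a, b);
}

/* Small self-check over ordered and unordered operands.  */
void ge_self_check(void)
{
    assert(split_form_agrees(1.0, 2.0));
    assert(split_form_agrees(2.0, 2.0));
    assert(split_form_agrees(make_nan(), 1.0));
}
```

With a NaN operand, ge_direct returns 0 but ge_as_not_lt returns 1, so only
the GT/EQ split preserves the meaning of >=.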

> 
> > -;; Return true if operand is MIN or MAX operator.
> > +;; Return true if operand is MIN or MAX operator.  Since this is only used to
> > +;; convert floating point MIN/MAX operations into FSEL on pre-vsx systems,
> > +;; don't include UMIN or UMAX.
> >  (define_predicate "min_max_operator"
> > -  (match_code "smin,smax,umin,umax"))
> > +  (match_code "smin,smax"))
> 
> Please name it signed_min_max_operator instead?

In this set of patches, I rewrote the define_split that called it to use
SMIN/SMAX code iterators, so I deleted the min_max_operator predicate.

> > --- gcc/config/rs6000/rs6000.c  (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)  (revision 235831)
> > +++ gcc/config/rs6000/rs6000.c  (.../gcc/config/rs6000) (working copy)
> > @@ -20534,6 +20534,12 @@ print_operand (FILE *file, rtx x, int co
> > "local dynamic TLS references");
> >return;
> >  
> > +case '@':
> > +  /* If -mpower9-minmax, use xsmaxcpdp instead of xsmaxdp.  */
> > +  if (TARGET_P9_MINMAX)
> > +   putc ('c', file);
> > +  return;
> 
> I don't think @ is very mnemonic, nor is this special enough for such
> a nice letter.

I just removed the %@ and instead did a C++ test for the appropriate string to
return.

> From looking at how it is used, it seems you can make it part of code_attr
> minmax (and give that a better name, minmax_fp or such)?

No, you can't use code attributes, because it is based on the target switches,
not on the insn (i.e. VSX with -ffast-math uses the same insn as p9 min/max
without -ffast-math).
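
A rough sketch of what "test the target switch and return the appropriate
string" can look like, in plain C rather than the actual rs6000 output code.
The function name and the p9_minmax parameter are hypothetical stand-ins for
the real target macro, and the mnemonic spellings (xsmaxcdp/xsmincdp for the
ISA 3.0 forms, xsmaxdp/xsmindp for the older VSX forms) are my reading of the
ISA rather than something taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not the code in the patch.  IS_MAX selects max
   over min; P9_MINMAX models the target switch, which in GCC would be a
   macro derived from the command-line options rather than a parameter.  */
const char *fp_minmax_mnemonic(int is_max, int p9_minmax)
{
    if (p9_minmax)
        return is_max ? "xsmaxcdp" : "xsmincdp";
    return is_max ? "xsmaxdp" : "xsmindp";
}

/* The same insn (is_max fixed) yields a different string per switch,
   which is why a per-code attribute cannot express the choice.  */
void mnemonic_self_check(void)
{
    assert(strcmp(fp_minmax_mnemonic(1, 0), fp_minmax_mnemonic(1, 1)) != 0);
}
```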

> > +  rs6000_emit_minmax (dest, (max_p) ? SMAX : SMIN, op0, op1);
> 
> Superfluous parentheses.

Ok.

> > +rs6000_emit_power9_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
> 
> Maybe put some "fp" in the name?  For "minmax" as well.
> 
> > +  if (swap_p)
> > +compare_rtx = gen_rtx_fmt_ee (code, CCFPmode, op1, op0);
> > +  else
> > +compare_rtx = gen_rtx_fmt_ee (code, CCFPmode, op0, op1);
> 
> if (swap_p)
>   std::swap (op0, op1);
> 
> and then just generate the one form?

I renamed the functions, and used std::swap earlier.

These patches have been bootstrapped on a big endian power7 system (both 32-bit
and 64-bit available) and little endian power8 system with no regressions.  Are
these patches ok to install in the trunk?  After a burn-in period, are they ok
to install on the GCC 6.2 branch?

[gcc]
2016-05-26  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_emit_p9_fp_minmax): New function
for ISA 3.0 min/max support.
(rs6000_emit_p9_fp_cmove): New function for ISA 3.0 floating point
conditional move support.
(rs6000_emit_cmove): Call rs6000_emit_p9_fp_minmax and
rs6000_emit_p9_fp_cmove if the ISA 3.0 instructions are
available.
* config/rs6000/rs6000.md (SFDF2): New iterator to allow doing
conditional moves where the comparison type is different from move
type.
(fp_minmax): New code iterator for smin/smax.
(minmax): New code attributes for min/max.
(SMINMAX): Likewise.
(smax3): Combine min, max insns into one insn using the
fp_minmax code iterator.  Add support for ISA 3.0 min/max
instructions that don't need -ffast-math.
(s3): Likewise.
(smax3_vsx): Likewise.
(smin3): Likewise.
(s3_vsx): Likewise.
(smin3_vsx): Likewise.
(pre-VSX min/max splitters): Likewise.
(s3_fpr): Likewise.
(movsfcc): Rewrite floating point conditional moves to combine
SFmode/DFmode into a single insn.
(movcc): Likewise.
(movdfcc): Likewise.
(fselsfsf4): Combine FSEL cases into a single insn, using SFDF and
SFDF2 iterators to handle all combinations.
(fseldfsf4): Lik

Re: RFA: Generate normal DWARF DW_LOC descriptors for non integer mode pointers

2016-05-26 Thread Nick Clifton
Hi Jeff,

>>> I may be missing something, but isn't it the transition to an FP
>>> relative address rather than a SP relative address that's the problem
>>> here?
>>
>> Yes, I believe so.
>>
>>> Where does that happen?

I think that it happens in dwarf2out.c:based_loc_descr()  which
detects the use of the frame pointer and works out that it is going 
to be eliminated to the stack pointer:

  /* We only use "frame base" when we're sure we're talking about the
 post-prologue local stack frame.  We do this by *not* running
 register elimination until this point, and recognizing the special
 argument pointer and soft frame pointer rtx's.  */
  if (reg == arg_pointer_rtx || reg == frame_pointer_rtx)
{
  rtx elim = (ira_use_lra_p
  ? lra_eliminate_regs (reg, VOIDmode, NULL_RTX)
  : eliminate_regs (reg, VOIDmode, NULL_RTX));

  if (elim != reg)
.

The problem, I believe, is that based_loc_descr() is only called
from mem_loc_descriptor when the mode of the rtl concerned is an
MODE_INT.  For example:

case REG:
  if (GET_MODE_CLASS (mode) != MODE_INT
 [...]
  else
  if (REGNO (rtl) < FIRST_PSEUDO_REGISTER)
mem_loc_result = based_loc_descr (rtl, 0, VAR_INIT_STATUS_INITIALIZED);

or, (this is another one that I found whilst investigating this 
problem further):

  case PLUS:
plus:
  if (is_based_loc (rtl)
  && (GET_MODE_SIZE (mode) <= DWARF2_ADDR_SIZE
  || XEXP (rtl, 0) == arg_pointer_rtx
  || XEXP (rtl, 0) == frame_pointer_rtx)
  && GET_MODE_CLASS (mode) == MODE_INT)
mem_loc_result = based_loc_descr (XEXP (rtl, 0),
  INTVAL (XEXP (rtl, 1)),
  VAR_INIT_STATUS_INITIALIZED);
  else


There are quite a few places in mem_loc_descriptor where the code checks
for the mode being in the MODE_INT class.  I am not exactly sure why.  I
think that it might be that the programmer thought that any expression that
does not involve integer based arithmetic cannot be expressed in DWARF CFA
notation and so would have to be handled specially.  If I am correct,
then it seems to me that the proper fix would be to use SCALAR_INT_MODE_P
instead.
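
As a toy illustration of the difference between the two tests (a sketch
only: the enum below is a made-up subset of GCC's mode classes, and the two
functions merely model GET_MODE_CLASS (mode) == MODE_INT and
SCALAR_INT_MODE_P).  My understanding is that SCALAR_INT_MODE_P accepts both
MODE_INT and MODE_PARTIAL_INT, and that the pointer mode at issue on MSP430
is a partial integer mode; treat both as assumptions.

```c
#include <assert.h>

/* Toy model of a few machine-mode classes.  The enumerator names mirror
   GCC's, but this enum is illustrative only.  */
enum mode_class { MODE_INT, MODE_PARTIAL_INT, MODE_FLOAT };

/* Models the existing test: only full integer modes qualify.  */
int is_mode_int(enum mode_class c)
{
    return c == MODE_INT;
}

/* Models the proposed test: full or partial integer modes qualify.  */
int is_scalar_int(enum mode_class c)
{
    return c == MODE_INT || c == MODE_PARTIAL_INT;
}

/* The two tests differ only for partial integer modes.  */
void mode_class_self_check(void)
{
    assert(is_mode_int(MODE_INT) == is_scalar_int(MODE_INT));
    assert(is_mode_int(MODE_FLOAT) == is_scalar_int(MODE_FLOAT));
    assert(is_mode_int(MODE_PARTIAL_INT) != is_scalar_int(MODE_PARTIAL_INT));
}
```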

I tried out the extended patch (attached) and it gave even better GDB 
results for the MSP430 and still no regressions (GCC or GDB) for MSP430 or
x86_64.

Is this enough justification ?

Cheers
  Nick



dwarf2out.c.patch.2


Re: tuple move constructor

2016-05-26 Thread Jonathan Wakely

On 25/05/16 14:54 +0100, Jonathan Wakely wrote:

On 23/05/16 20:39 +0200, Marc Glisse wrote:

Ping

(re-attaching, I just added a one-line comment before the tag class 
as asked by Ville)


This is OK for trunk - thanks.


On second thoughts - does this change the passing conventions for
std::tuple if it gets a trivial move ctor?




Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

2016-05-26 Thread Uros Bizjak
On Thu, May 26, 2016 at 1:46 AM, Joseph Myers  wrote:
> In ISO C99/C11, the ceil, floor, round and trunc functions may or may
> not raise the "inexact" exception for noninteger arguments.  Under TS
> 18661-1:2014, the C bindings for IEEE 754-2008, these functions are
> prohibited from raising "inexact", in line with the general rule that
> "inexact" is only raised when the mathematical infinite precision result of a
> function differs from the result after rounding to the target type.
>
> GCC has no option to select TS 18661 requirements for not raising
> "inexact" when expanding built-in versions of these functions inline.
> Furthermore, even given such requirements, the conditions on the x86
> insn patterns for these functions are unnecessarily restrictive.  I'd
> like to make the out-of-line glibc versions follow the TS 18661
> requirements; in the cases where this slows them down (the cases using
> x87 floating point), that makes it more important for inline versions
> to be used when the user does not care about "inexact".
>
> This patch fixes these issues.  A new option
> -fno-fp-int-builtin-inexact is added to request TS 18661 rules for
> these functions; the default -ffp-int-builtin-inexact reflects that
> such exceptions are allowed by C99 and C11.  (The intention is that if
> C2x incorporates TS 18661-1, then the default would change in C2x
> mode.)
>
> The x86 built-ins for rint (x87, SSE2 and SSE4.1) are made
> unconditionally available (no longer depending on
> -funsafe-math-optimizations or -fno-trapping-math); "inexact" is
> correct for noninteger arguments to rint.  For floor, ceil and trunc,
> the x87 and SSE2 built-ins are OK if -ffp-int-builtin-inexact or
> -fno-trapping-math (they may raise "inexact" for noninteger
> arguments); the SSE4.1 built-ins are made to use ROUND_NO_EXC so that
> they do not raise "inexact" and so are OK unconditionally.
>
> Now, while there was no semantic reason for depending on
> -funsafe-math-optimizations, the insn patterns had such a dependence
> because of use of gen_truncxf2_i387_noop to truncate back to
> SFmode or DFmode after using frndint in XFmode.  In this case a no-op
> truncation is safe because rounding to integer always produces an
> exactly representable value (the same reason why IEEE semantics say it
> shouldn't produce "inexact") - but of course that insn pattern isn't
> safe because it would also match cases where the truncation is not in
> fact a no-op.  To allow frndint to be used for SFmode and DFmode
> without that unsafe pattern, the relevant frndint patterns are
> extended to SFmode and DFmode or new SFmode and DFmode patterns added,
> so that the frndint operation can be represented in RTL as an
> operation acting directly on SFmode or DFmode without the extension
> and the problematic truncation.
>
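
The exactness argument quoted above can be checked with a small sketch in
plain C.  This is not the x87 code path: a cast to long long stands in for
frndint, so it rounds toward zero rather than in the current rounding mode
and assumes |x| < 2^63.  The function names are invented for the example.

```c
#include <assert.h>

/* Widen a float, round it to an integer in the wide type, and narrow
   back.  Returns 1 if the narrowing lost no information.  */
int trunc_narrowing_is_exact(float x)
{
    long double wide = (long double) x;
    long double truncated = (long double) (long long) wide;
    float narrow = (float) truncated;
    return (long double) narrow == truncated;
}

/* The same narrowing applied to an arbitrary wide value: returns 1 only
   if V happens to be representable as a float.  */
int arbitrary_narrowing_is_exact(long double v)
{
    float narrow = (float) v;
    return (long double) narrow == v;
}

/* Narrowing after rounding to integer is exact; narrowing in general
   is not.  */
void narrowing_self_check(void)
{
    assert(trunc_narrowing_is_exact(2.75f));
    assert(!arbitrary_narrowing_is_exact(0.1L));
}
```

The first function always returns 1 within its stated range, while the
second can return 0, which is the distinction between a truncation known to
be a no-op and a pattern that would also match lossy truncations.
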
> A generic test of the new option is added, as well as x86-specific
> tests, both execution tests including the generic test with different
> x86 options and scan-assembler tests verifying that functions that
> should be inlined with different options are indeed inlined.
>
> I think other architectures are OK for TS 18661-1 semantics already.
> Considering those defining "ceil" patterns: aarch64, arm, rs6000, s390
> use instructions that do not raise "inexact"; nvptx does not support
> floating-point exceptions.  (This does mean the -f option in fact only
> affects one architecture, but I think it should still be a -f option;
> it's logically architecture-independent and is expected to be affected
> by future -std options, so is similar to e.g. -fexcess-precision=,
> which also does nothing on most architectures but is implied by -std
> options.)
>
> Bootstrapped with no regressions on x86_64-pc-linux-gnu.  OK to
> commit?
>
> gcc:
> 2016-05-26  Joseph Myers  
>
> PR target/71276
> PR target/71277
> * common.opt (ffp-int-builtin-inexact): New option.
> * doc/invoke.texi (-fno-fp-int-builtin-inexact): Document.
> * config/i386/i386.md (rintxf2): Do not test
> flag_unsafe_math_optimizations.
> (rint2_frndint): New define_insn.
> (rint2): Do not test flag_unsafe_math_optimizations for 387
> or !flag_trapping_math for SSE.  Just use gen_rint2_frndint
> for 387 instead of extending and truncating.
> (frndintxf2_): Test flag_fp_int_builtin_inexact ||
> !flag_trapping_math instead of flag_unsafe_math_optimizations.
> Change to frndint2_.
> (frndintxf2__i387): Likewise.  Change to
> frndint2__i387.
> (xf2): Likewise.
> (2): Test flag_fp_int_builtin_inexact ||
> !flag_trapping_math instead of flag_unsafe_math_optimizations for
> x87.  Test TARGET_ROUND || !flag_trapping_math ||
> flag_fp_int_builtin_inexact instead of !flag_trapping_math for
> SSE.  Use ROUND_NO_EXC in constant operand of
> gen_sse4_1_round2.  Just use gen_frndint2_
> for 387 instead of extending and trun

Re: Fix ivopts estimates for internal functions

2016-05-26 Thread Richard Biener
On May 26, 2016 4:39:02 PM GMT+02:00, Richard Sandiford wrote:
>tree-ssa-loop-ivopts.c:loop_body_includes_call was treating internal
>calls such as IFN_SQRT as clobbering all caller-saved registers, which
>I don't think is appropriate for any current internal function.
>
>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Richard.

>Thanks,
>Richard
>
>
>gcc/
>   * tree-ssa-loop-ivopts.c (loop_body_includes_call): Don't assume
>   that internal functions will clobber all caller-saved registers.
>
>diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>index 9ce6b64..23c9886 100644
>--- a/gcc/tree-ssa-loop-ivopts.c
>+++ b/gcc/tree-ssa-loop-ivopts.c
>@@ -7643,6 +7643,7 @@ loop_body_includes_call (basic_block *body, unsigned num_nodes)
>   {
>   gimple *stmt = gsi_stmt (gsi);
>   if (is_gimple_call (stmt)
>+  && !gimple_call_internal_p (stmt)
>   && !is_inexpensive_builtin (gimple_call_fndecl (stmt)))
> return true;
>   }




Re: [v3 PATCH] PR libstdc++/66338

2016-05-26 Thread Jonathan Wakely

On 26/05/16 01:07 +0300, Ville Voutilainen wrote:

On 25 May 2016 at 16:55, Jonathan Wakely  wrote:

On 24/05/16 19:49 +0300, Ville Voutilainen wrote:


On 24 May 2016 at 19:35, Ville Voutilainen wrote:


Slight tweak. The avoidance of _NotSameTuple wasn't quite correct for
the templates that
take const tuple<_UElements...>& or  tuple<_UElements...>&& instead of
const _UElements&...
or _UElements&&...

This patch introduces a new helper alias to cover those cases and
takes it into use where appropriate.
All tests pass, but I don't have any sane tests to verify this tweak.




..and I don't need to be quite so round-about in the new helper, it
can just check !is_same
instead of doing a nested _TC call. Changelog the same as in the previous
one.



OK for trunk - thanks.



Ack, I will do the mechanics in the forthcoming days, but here's a
question: what do we want to do
about this patch for the gcc6-branch? I fully appreciate being careful
and not committing to the branch right now,
but presumably this patch is a candidate for backporting to gcc6.


Yes, I think so.

There's a trade-off between giving it more time on the trunk for any
issues to arise before we backport it, or putting it on the branch now
and giving more time for problems to show up there because more people
are testing the branch.

I think it shouldn't cause any regressions, so can go on the branch
sooner rather than later.



Re: [patch] libstdc++/69703 ignore endianness in codecvt_utf8

2016-05-26 Thread Jonathan Wakely

On 26/05/16 14:02 +0200, Christophe Lyon wrote:

I've seen you've backported the main patch to the gcc-6 branch, you
forgot to add the follow-up "Add dg-require-filesystem-ts directive to
test".


And likewise for the gcc-5 branch.


Both fixed now, sorry about that, again.



[PING] Re: Updated autofdo bootstrap and testing patches

2016-05-26 Thread Andi Kleen
Andi Kleen  writes:

Ping!

> Here's an updated version of the patchkit to enable autofdo bootstrap
> and testing. It also fixes some autofdo issues. The last patch is more of a
> workaround (to make autofdo bootstrap not ICE), but may need a better fix.
>
> The main motivation is to get better test coverage for autofdo
> and also a useful benchmark (speed of generated compiler) for it.
> If you want the absolutely fastest compiler, using profiledbootstrap
> is still the way to go.
>
> I addressed most of the earlier review comments. The python script
> is still python 2 for better compatibility with old systems.
>
> Ok to commit?
>
>


[PTX] more tests annotations

2016-05-26 Thread Nathan Sidwell
Applied the attached to markup some more tests that PTX either crashes on or 
doesn't apply strict IEEE semantics to.  In one case it's about debug info that 
we don't emit.


nathan
2016-05-26  Nathan Sidwell  

	* gcc.dg/20060410.c: Xfail on ptx.
	* gcc.dg/torture/c99-contract-1.c: Skip on ptx.
	* c-c++-common/torture/complex-sign-mixed-add.c: Skip on ptx -O0.
	* c-c++-common/torture/complex-sign-mixed-sub.c: Skip on ptx -O0.
	* gcc.c-torture/execute/pr68185.c: Skip on ptx -O0 and -Os.
	* gcc.c-torture/execute/20020529-1.c: Skip on ptx -O0.

Index: gcc.dg/20060410.c
===
--- gcc.dg/20060410.c	(revision 236702)
+++ gcc.dg/20060410.c	(working copy)
@@ -13,4 +13,4 @@ int bar (void)
 return ((struct foo *)0x1234)->i;
 }
 
-/* { dg-final { scan-assembler "foo" } } */
+/* { dg-final { scan-assembler "foo" { xfail nvptx-*-* } } } */
Index: gcc.dg/torture/c99-contract-1.c
===
--- gcc.dg/torture/c99-contract-1.c	(revision 236702)
+++ gcc.dg/torture/c99-contract-1.c	(working copy)
@@ -2,6 +2,7 @@
expressions.  */
 /* { dg-do run } */
 /* { dg-options "-std=c99 -pedantic-errors" } */
+/* { dg-skip-if "ptx only loosely follows IEEE" { "nvptx-*-*" } { "*" } { "" } } */
 
 extern void abort (void);
 extern void exit (int);
Index: c-c++-common/torture/complex-sign-mixed-add.c
===
--- c-c++-common/torture/complex-sign-mixed-add.c	(revision 236702)
+++ c-c++-common/torture/complex-sign-mixed-add.c	(working copy)
@@ -2,6 +2,7 @@
addition.  */
 /* { dg-do run } */
 /* { dg-options "-std=gnu99" { target c } } */
+/* { dg-skip-if "ptx can elide zero additions" { "nvptx-*-*" } { "-O0" } { "" } } */
 
 #include "complex-sign.h"
 
Index: c-c++-common/torture/complex-sign-mixed-sub.c
===
--- c-c++-common/torture/complex-sign-mixed-sub.c	(revision 236702)
+++ c-c++-common/torture/complex-sign-mixed-sub.c	(working copy)
@@ -2,6 +2,7 @@
subtraction.  */
 /* { dg-do run } */
 /* { dg-options "-std=gnu99" { target c } } */
+/* { dg-skip-if "ptx can elide zero additions" { "nvptx-*-*" } { "-O0" } { "" } } */
 
 #include "complex-sign.h"
 
Index: gcc.c-torture/execute/pr68185.c
===
--- gcc.c-torture/execute/pr68185.c	(revision 236702)
+++ gcc.c-torture/execute/pr68185.c	(working copy)
@@ -1,4 +1,4 @@
-/* { dg-xfail-if "ptxas crashes" { nvptx-*-* } { "-O0" } { "" } } */
+/* { dg-skip-if "ptxas crashes or executes incorrectly" { nvptx-*-* } { "-O0" "-Os" } { "" } } Reported 2015-11-20  */
 
 int a, b, d = 1, e, f, o, u, w = 1, z;
 short c, q, t;
Index: gcc.c-torture/execute/20020529-1.c
===
--- gcc.c-torture/execute/20020529-1.c	(revision 236702)
+++ gcc.c-torture/execute/20020529-1.c	(working copy)
@@ -12,6 +12,10 @@
forced a splitter through the output pattern "#", but there was no
matching splitter.  */
 
+/* The ptx assembler appears to clobber 'b' inside foo during the f1 call.
+   Reported to nvidia 2016-05-18.  */
+/* { dg-skip-if "PTX assembler bug" { nvptx-*-* } { "-O0" } { "" } } */
+
 struct xx
  {
int a;


[PATCH 1/3] config-list.mk: add KNOWN_BROKEN

2016-05-26 Thread David Malcolm
When using config-list.mk to build all configurations, it's useful
to filter out the configurations that are known to be broken.

This patch does so, adding a KNOWN_BROKEN variable.

contrib/ChangeLog:
* config-list.mk (LIST): Rename to...
(FULL_LIST): ...this.
(KNOWN_BROKEN): New variable.
(LIST): Redefine, in terms of FULL_LIST and KNOWN_BROKEN.
---
 contrib/config-list.mk | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index 8210352..edc3dc7 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -11,7 +11,7 @@ TEST=all-gcc
 # nohup nice make -j25 -l36 -f ../gcc/contrib/config-list.mk > make.out 2>&1 &
 #
 # v850e1-elf is rejected by config.sub
-LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
+FULL_LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   alpha-linux-gnu alpha-freebsd6 alpha-netbsd alpha-openbsd \
   alpha64-dec-vms alpha-dec-vms am33_2.0-linux \
   arc-elf32OPT-with-cpu=arc600 arc-elf32OPT-with-cpu=arc700 \
@@ -81,6 +81,14 @@ LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   xtensa-linux \
   i686-interix3OPT-enable-obsolete
 
+# Which of the above are known to currently not work?
+KNOWN_BROKEN=
+
+# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52551
+KNOWN_BROKEN += i686-interix3OPT-enable-obsolete
+
+LIST= $(filter-out $(KNOWN_BROKEN),$(FULL_LIST))
+
 LOGFILES = $(patsubst %,log/%-make.out,$(LIST))
 all: $(LOGFILES)
 config: $(LIST)
-- 
1.8.5.3



Re: [PTX] crt0

2016-05-26 Thread Nathan Sidwell

On 05/26/16 10:36, Nathan Sidwell wrote:


It should work now, just pushed a patch to newlib.  PTX appears to accept
'.extern .weak ...', but that has the same semantics as '.extern ...', which
IMHO is a bug.  '.extern .weak' doesn't mean anything special.   Working on a
GCC patch to stop us emitting it.


Pah, I'd misremembered what we emitted.  Still working on a patch to stop 
weakrefs ...


nathan


[PATCH 3/3] config-list.mk: add OPT-enable-obsolete to 4 targets

2016-05-26 Thread David Malcolm
r233165 marked three deprecated rtems targets as obsolete.
r233887 marked mep-elf as obsolete.

Update config-list.mk to add OPT-enable-obsolete to these 4
targets.

contrib/ChangeLog:
* config-list.mk (FULL_LIST): Add OPT-enable-obsolete to
avr-rtems, h8300-rtems, m32r-rtems, mep-elf.
---
 contrib/config-list.mk | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index 2e22b3c..6242c87 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -37,11 +37,12 @@ FULL_LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   arc-linux-uclibcOPT-with-cpu=arc700 arceb-linux-uclibcOPT-with-cpu=arc700 \
   arm-wrs-vxworks arm-netbsdelf \
   arm-linux-androideabi arm-uclinux_eabi arm-eabi arm-rtems \
-  arm-symbianelf avr-rtems avr-elf \
+  arm-symbianelf avr-rtemsOPT-enable-obsolete avr-elf \
   bfin-elf bfin-uclinux bfin-linux-uclibc bfin-rtems bfin-openbsd \
   c6x-elf c6x-uclinux cr16-elf cris-elf cris-linux crisv32-elf crisv32-linux \
   epiphany-elf epiphany-elfOPT-with-stack-offset=16 fido-elf \
-  fr30-elf frv-elf frv-linux ft32-elf h8300-elf h8300-rtems hppa-linux-gnu \
+  fr30-elf frv-elf frv-linux ft32-elf h8300-elf \
+  h8300-rtemsOPT-enable-obsolete hppa-linux-gnu \
   hppa-linux-gnuOPT-enable-sjlj-exceptions=yes hppa64-linux-gnu \
   hppa2.0-hpux10.1 hppa64-hpux11.3 \
   hppa64-hpux11.0OPT-enable-sjlj-exceptions=yes hppa2.0-hpux11.9 \
@@ -55,10 +56,11 @@ FULL_LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   i686-wrs-vxworksae \
   i686-cygwinOPT-enable-threads=yes i686-mingw32crt ia64-elf \
   ia64-freebsd6 ia64-linux ia64-hpux ia64-hp-vms iq2000-elf lm32-elf \
-  lm32-rtems lm32-uclinux m32c-rtems m32c-elf m32r-elf m32rle-elf m32r-rtems \
+  lm32-rtems lm32-uclinux m32c-rtems m32c-elf m32r-elf m32rle-elf \
+  m32r-rtemsOPT-enable-obsolete \
   m32r-linux m32rle-linux m68k-elf m68k-netbsdelf \
   m68k-openbsd m68k-uclinux m68k-linux m68k-rtems \
-  mcore-elf mep-elf microblaze-linux microblaze-elf \
+  mcore-elf mep-elfOPT-enable-obsolete microblaze-linux microblaze-elf \
   mips-netbsd \
   mips64el-st-linux-gnu mips64octeon-linux mipsisa64r2-linux \
   mipsisa32r2-linux-gnu mipsisa64r2-sde-elf mipsisa32-elfoabi \
-- 
1.8.5.3



[PATCH 2/3] config-list.mk: add GCC_SRC_DIR

2016-05-26 Thread David Malcolm
config-list.mk currently requires the pwd to be in a sibling directory
of the source tree.  However, building using config-list.mk can consume
over 400GB of disk space in this build directory (e.g. my machine
successfully built cc1 for 206 configurations last night, consuming
442GB of space).  I've found it useful to be able to run config-list.mk
in an arbitrary build location (i.e. one with plenty of free space),
so this patch adds a GCC_SRC_DIR variable which can be overridden.

contrib/ChangeLog:
* config-list.mk (GCC_SRC_DIR): New variable.
(make-log-dir): Use GCC_SRC_DIR.
($(LIST)): Likewise.
---
 contrib/config-list.mk | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index edc3dc7..2e22b3c 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -3,13 +3,32 @@ host_options='--with-mpc=/opt/cfarm/mpc' # gcc10
 TEST=all-gcc
 # Make sure you have a recent enough gcc (with ada support) in your path so
 # that --enable-werror-always will work.
-# To use, create a sibling directory to the gcc sources and cd into this.
+# To use, create a build directory with plenty of free disk space - a build of
+# all configurations can take 450GB.
+# By default, this file assumes the build directory is in a sibling directory
+# to the gcc sources, but you can override GCC_SRC_DIR to specify where to
+# find them.  GCC_SRC_DIR is used in the directory below the build directory,
+# hence the two ".." in the default value; if overriding it, it's easiest to
+# supply an absolute path.
+GCC_SRC_DIR=../../gcc
+
 # Use -j / -l make arguments and nice to assure a smooth resource-efficient
 # load on the build machine, e.g. for 24 cores:
 # svn co svn://gcc.gnu.org/svn/gcc/branches/foo-branch gcc
 # mkdir multi-mk; cd multi-mk
 # nohup nice make -j25 -l36 -f ../gcc/contrib/config-list.mk > make.out 2>&1 &
 #
+# Alternatively, if building against an existing gcc source tree:
+#
+#   cd /somewhere/with/plenty/of/disk/space
+#   mkdir multi-mk; cd multi-mk
+#   nohup nice make \
+# -j25 -l36 \
+# -f /path/to/contrib/config-list.mk \
+# GCC_SRC_DIR=/path/to/gcc/source/tree \
+# > make.out 2>&1 &
+#
+
 # v850e1-elf is rejected by config.sub
 FULL_LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   alpha-linux-gnu alpha-freebsd6 alpha-netbsd alpha-openbsd \
@@ -99,17 +118,17 @@ show:
 
 empty=
 
-#Check for the presence of the MAINTAINERS file to make sure we are in a
-#suitable current working directory.
-make-log-dir: ../gcc/MAINTAINERS
-   mkdir log
+#Check for the presence of the MAINTAINERS file to make sure we've located
+#the gcc sources.
+make-log-dir: $(GCC_SRC_DIR)/MAINTAINERS
+   -mkdir log
 
 $(LIST): make-log-dir
    -mkdir $@
    ( \
    cd $@ && \
    TGT=`echo $@ | awk 'BEGIN { FS = "OPT" }; { print $$1 }'` && \
-   TGT=`../../gcc/config.sub $$TGT` && \
+   TGT=`$(GCC_SRC_DIR)/config.sub $$TGT` && \
    case $$TGT in \
    *-*-darwin* | *-*-cygwin* | *-*-mingw* | *-*-aix*) \
    ADDITIONAL_LANGUAGES=""; \
\
@@ -118,7 +137,7 @@ $(LIST): make-log-dir
    ADDITIONAL_LANGUAGES=",go"; \
    ;; \
    esac && \
-   ../../gcc/configure \
+   $(GCC_SRC_DIR)/configure \
    --target=$(subst SCRIPTS,`pwd`/../scripts/,$(subst OPT,$(empty) -,$@)) \
    --enable-werror-always ${host_options} \
    --enable-languages=all,ada$$ADDITIONAL_LANGUAGES; \
-- 
1.8.5.3



Fix ivopts estimates for internal functions

2016-05-26 Thread Richard Sandiford
tree-ssa-loop-ivopts.c:loop_body_includes_call was treating internal
calls such as IFN_SQRT as clobbering all caller-saved registers, which
I don't think is appropriate for any current internal function.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-ssa-loop-ivopts.c (loop_body_includes_call): Don't assume
that internal functions will clobber all caller-saved registers.

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 9ce6b64..23c9886 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -7643,6 +7643,7 @@ loop_body_includes_call (basic_block *body, unsigned num_nodes)
   {
gimple *stmt = gsi_stmt (gsi);
if (is_gimple_call (stmt)
+   && !gimple_call_internal_p (stmt)
&& !is_inexpensive_builtin (gimple_call_fndecl (stmt)))
  return true;
   }
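
For readers who don't have the ivopts code to hand, the effect of the extra
condition can be modelled with a toy in plain C.  This is a sketch only: the
struct and its fields are hypothetical stand-ins for the GIMPLE predicates
named in the patch, not the real data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement record; each field models one predicate from the patch.  */
struct toy_stmt {
    int is_call;         /* models is_gimple_call */
    int is_internal;     /* models gimple_call_internal_p */
    int is_inexpensive;  /* models is_inexpensive_builtin */
};

/* Returns 1 if the body contains a call that should be costed as
   clobbering caller-saved registers: a call that is neither an internal
   function nor an inexpensive builtin.  */
int toy_body_includes_call(const struct toy_stmt *body, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (body[i].is_call
            && !body[i].is_internal
            && !body[i].is_inexpensive)
            return 1;
    return 0;
}

/* Convenience wrapper for a one-statement body.  */
int toy_single(int is_call, int is_internal, int is_inexpensive)
{
    struct toy_stmt s = { is_call, is_internal, is_inexpensive };
    return toy_body_includes_call(&s, 1);
}

/* An internal call no longer counts; an ordinary call still does.  */
void toy_ivopts_self_check(void)
{
    assert(toy_single(1, 1, 0) == 0);
    assert(toy_single(1, 0, 0) == 1);
}
```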



Re: [PTX] crt0

2016-05-26 Thread Nathan Sidwell

On 05/25/16 11:49, Alexander Monakov wrote:

On Wed, 25 May 2016, Nathan Sidwell wrote:



With today's trunk and newlib, if I run


unresolved symbol __exitval_ptr


Is should work now, just pushed a patch to newlib.  PTX appears to accept 
'.extern .weak ...', but that has the same semantics as '.extern ...', which 
IMHO is a bug.  '.extern .weak' doesn't mean anything special.   Working on a 
GCC patch to stop us emitting it.





It is possible, but it seems it's enough to set up soft stacks under #ifdef
__nvptx_softstack__ in __main. Is that fine?


That's fine -- I'd forgotten there was a  #define to check.  The whole point of 
reimplementing crt0 in C was to make that kind of thing easier!


nathan



Remove word_mode hack for split bitfields

2016-05-26 Thread Richard Sandiford
This patch is effectively reverting a change from 1994.  The reason
I think it's a hack is that store_bit_field_1 is creating a subreg
reference to one word of a field even though it has already proven that
the field spills into the following word.  We then rely on the special
SUBREG handling in store_split_bit_field to ignore the extent of op0 and
look inside the SUBREG_REG regardless.  I don't see any reason why we can't
pass the original op0 to store_split_bit_field instead.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* expmed.c (store_bit_field_1): Do not restrict a multiword op0
to one word if the field is known to overlap other words.
(extract_bit_field_1): Likewise.
(store_split_bit_field): Remove compensating code.
(extract_split_bit_field): Likewise.
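
The reordered test in the patch comes down to simple arithmetic, sketched
here in plain C.  The function names are invented, and the word size is
passed as a parameter instead of using the BITS_PER_WORD and UNITS_PER_WORD
target macros.

```c
#include <assert.h>

/* Returns 1 if a field of BITSIZE bits starting at bit BITNUM crosses a
   word boundary, for words of BITS_PER_WORD bits.  */
int field_spans_words(unsigned long bitnum, unsigned long bitsize,
                      unsigned long bits_per_word)
{
    return bitnum % bits_per_word + bitsize > bits_per_word;
}

/* Byte offset of the word containing bit BITNUM, as in the
   bitnum / BITS_PER_WORD * UNITS_PER_WORD expression.  */
unsigned long containing_word_byte_offset(unsigned long bitnum,
                                          unsigned long bits_per_word,
                                          unsigned long units_per_word)
{
    return bitnum / bits_per_word * units_per_word;
}

/* With 64-bit words: bits 60..67 straddle two words, bits 64..71 do not.  */
void bitfield_self_check(void)
{
    assert(field_spans_words(60, 8, 64));
    assert(!field_spans_words(64, 8, 64));
}
```

Only when field_spans_words is false is it meaningful to narrow op0 to the
single word at containing_word_byte_offset, which matches the order of
operations after the patch.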

diff --git a/gcc/expmed.c b/gcc/expmed.c
index ec968da..6645a53 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -967,11 +967,7 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
  If the region spans two words, defer to store_split_bit_field.  */
   if (!MEM_P (op0) && GET_MODE_SIZE (GET_MODE (op0)) > UNITS_PER_WORD)
 {
-  op0 = simplify_gen_subreg (word_mode, op0, GET_MODE (op0),
-bitnum / BITS_PER_WORD * UNITS_PER_WORD);
-  gcc_assert (op0);
-  bitnum %= BITS_PER_WORD;
-  if (bitnum + bitsize > BITS_PER_WORD)
+  if (bitnum % BITS_PER_WORD + bitsize > BITS_PER_WORD)
{
  if (!fallback_p)
return false;
@@ -980,6 +976,10 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 bitregion_end, value, reverse);
  return true;
}
+  op0 = simplify_gen_subreg (word_mode, op0, GET_MODE (op0),
+bitnum / BITS_PER_WORD * UNITS_PER_WORD);
+  gcc_assert (op0);
+  bitnum %= BITS_PER_WORD;
 }
 
   /* From here on we can assume that the field to be stored in fits
@@ -1383,25 +1383,8 @@ store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT 
bitsize,
bitsdone, NULL_RTX, 1, false);
}
 
-  /* If OP0 is a register, then handle OFFSET here.
-
-When handling multiword bitfields, extract_bit_field may pass
-down a word_mode SUBREG of a larger REG for a bitfield that actually
-crosses a word boundary.  Thus, for a SUBREG, we must find
-the current word starting from the base register.  */
-  if (GET_CODE (op0) == SUBREG)
-   {
- int word_offset = (SUBREG_BYTE (op0) / UNITS_PER_WORD)
-   + (offset * unit / BITS_PER_WORD);
- machine_mode sub_mode = GET_MODE (SUBREG_REG (op0));
- if (sub_mode != BLKmode && GET_MODE_SIZE (sub_mode) < UNITS_PER_WORD)
-   word = word_offset ? const0_rtx : op0;
- else
-   word = operand_subword_force (SUBREG_REG (op0), word_offset,
- GET_MODE (SUBREG_REG (op0)));
- offset &= BITS_PER_WORD / unit - 1;
-   }
-  else if (REG_P (op0))
+  /* If OP0 is a register, then handle OFFSET here.  */
+  if (SUBREG_P (op0) || REG_P (op0))
{
  machine_mode op0_mode = GET_MODE (op0);
  if (op0_mode != BLKmode && GET_MODE_SIZE (op0_mode) < UNITS_PER_WORD)
@@ -1787,10 +1770,7 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT 
bitsize,
  If the region spans two words, defer to extract_split_bit_field.  */
   if (!MEM_P (op0) && GET_MODE_SIZE (GET_MODE (op0)) > UNITS_PER_WORD)
 {
-  op0 = simplify_gen_subreg (word_mode, op0, GET_MODE (op0),
-bitnum / BITS_PER_WORD * UNITS_PER_WORD);
-  bitnum %= BITS_PER_WORD;
-  if (bitnum + bitsize > BITS_PER_WORD)
+  if (bitnum % BITS_PER_WORD + bitsize > BITS_PER_WORD)
{
  if (!fallback_p)
return NULL_RTX;
@@ -1798,6 +1778,9 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT 
bitsize,
reverse);
  return convert_extracted_bit_field (target, mode, tmode, unsignedp);
}
+  op0 = simplify_gen_subreg (word_mode, op0, GET_MODE (op0),
+bitnum / BITS_PER_WORD * UNITS_PER_WORD);
+  bitnum %= BITS_PER_WORD;
 }
 
   /* From here on we know the desired field is smaller than a word.
@@ -2109,20 +2092,8 @@ extract_split_bit_field (rtx op0, unsigned HOST_WIDE_INT 
bitsize,
   thissize = MIN (bitsize - bitsdone, BITS_PER_WORD);
   thissize = MIN (thissize, unit - thispos);
 
-  /* If OP0 is a register, then handle OFFSET here.
-
-When handling multiword bitfields, extract_bit_field may pass
-down a word_mode SUBREG of a larger REG for a bitfield that actually
-crosses a word boundary.  Thus, for a SUBREG, we must find
-the current word starting from the bas

[PTX] malloc/realloc/free

2016-05-26 Thread Nathan Sidwell
This patch removes the malloc/realloc/free wrappers from libgcc.  I've
implemented them completely in C and put them in the ptx newlib port -- where
one expects such functions.


applied to trunk.
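
For readers unfamiliar with the deleted wrappers: the PTX code in the diff
below allocates 8 extra bytes, stores the requested size there, and hands out
the address just past that header; the free wrapper steps back 8 bytes.  A
minimal C sketch of that scheme follows.  The sketch_* names are hypothetical,
the NULL check after malloc is an addition not present in the PTX, and the
realloc shown is only an assumption about how such a header can be used, not
the removed realloc.c or the new newlib code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bytes reserved in front of each block to record its requested size.  */
#define HEADER_BYTES 8

/* Allocate SIZE bytes plus a hidden header holding SIZE.  */
void *
sketch_malloc (size_t size)
{
  char *raw = malloc (size + HEADER_BYTES);
  uint64_t stored = size;

  if (raw == NULL)
    return NULL;
  memcpy (raw, &stored, sizeof stored);
  return raw + HEADER_BYTES;
}

/* Release a block obtained from sketch_malloc; NULL is ignored.  */
void
sketch_free (void *ptr)
{
  if (ptr != NULL)
    free ((char *) ptr - HEADER_BYTES);
}

/* Assumed realloc: read the old size from the header, allocate a new
   block, copy the smaller of the two sizes, and release the old block.  */
void *
sketch_realloc (void *ptr, size_t size)
{
  uint64_t old_size;
  void *fresh;

  if (ptr == NULL)
    return sketch_malloc (size);
  if (size == 0)
    {
      sketch_free (ptr);
      return NULL;
    }
  memcpy (&old_size, (char *) ptr - HEADER_BYTES, sizeof old_size);
  fresh = sketch_malloc (size);
  if (fresh == NULL)
    return NULL;
  memcpy (fresh, ptr, old_size < size ? (size_t) old_size : size);
  sketch_free (ptr);
  return fresh;
}
```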

nathan
2016-05-26  Nathan Sidwell  

	* config/nvptx/free.asm: Delete.
	* config/nvptx/malloc.asm: Delete.
	* config/nvptx/realloc.c: Delete.
	* t-nvptx: Update.

Index: libgcc/config/nvptx/free.asm
===
--- libgcc/config/nvptx/free.asm	(revision 236701)
+++ libgcc/config/nvptx/free.asm	(nonexistent)
@@ -1,50 +0,0 @@
-// A wrapper around free to enable a realloc implementation.
-
-// Copyright (C) 2014-2016 Free Software Foundation, Inc.
-
-// This file is free software; you can redistribute it and/or modify it
-// under the terms of the GNU General Public License as published by the
-// Free Software Foundation; either version 3, or (at your option) any
-// later version.
-
-// This file is distributed in the hope that it will be useful, but
-// WITHOUT ANY WARRANTY; without even the implied warranty of
-// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-// General Public License for more details.
-
-// Under Section 7 of GPL version 3, you are granted additional
-// permissions described in the GCC Runtime Library Exception, version
-// 3.1, as published by the Free Software Foundation.
-
-// You should have received a copy of the GNU General Public License and
-// a copy of the GCC Runtime Library Exception along with this program;
-// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-// .
-
-.version3.1
-.target sm_30
-.address_size 64
-
-.extern .func free(.param.u64 %in_ar1);
-
-// BEGIN GLOBAL FUNCTION DEF: __nvptx_free
-.visible .func __nvptx_free(.param.u64 %in_ar1)
-{
-	.reg.u64 %ar1;
-	.reg.u64 %hr10;
-	.reg.u64 %r23;
-	.reg.pred %r25;
-	.reg.u64 %r27;
-	ld.param.u64 %ar1, [%in_ar1];
-		mov.u64	%r23, %ar1;
-		setp.eq.u64 %r25,%r23,0;
-	@%r25	bra	$L1;
-		add.u64	%r27, %r23, -8;
-	{
-		.param.u64 %out_arg0;
-		st.param.u64 [%out_arg0], %r27;
-		call free, (%out_arg0);
-	}
-$L1:
-	ret;
-	}
Index: libgcc/config/nvptx/malloc.asm
===
--- libgcc/config/nvptx/malloc.asm	(revision 236701)
+++ libgcc/config/nvptx/malloc.asm	(nonexistent)
@@ -1,55 +0,0 @@
-// A wrapper around malloc to enable a realloc implementation.
-
-// Copyright (C) 2014-2016 Free Software Foundation, Inc.
-
-// This file is free software; you can redistribute it and/or modify it
-// under the terms of the GNU General Public License as published by the
-// Free Software Foundation; either version 3, or (at your option) any
-// later version.
-
-// This file is distributed in the hope that it will be useful, but
-// WITHOUT ANY WARRANTY; without even the implied warranty of
-// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-// General Public License for more details.
-
-// Under Section 7 of GPL version 3, you are granted additional
-// permissions described in the GCC Runtime Library Exception, version
-// 3.1, as published by the Free Software Foundation.
-
-// You should have received a copy of the GNU General Public License and
-// a copy of the GCC Runtime Library Exception along with this program;
-// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-// .
-
-.version3.1
-.target sm_30
-.address_size 64
-
-.extern .func (.param.u64 %out_retval) malloc(.param.u64 %in_ar1);
-
-// BEGIN GLOBAL FUNCTION DEF: __nvptx_malloc
-.visible .func (.param.u64 %out_retval) __nvptx_malloc(.param.u64 %in_ar1)
-{
-.reg.u64 %ar1;
-.reg.u64 %retval;
-.reg.u64 %hr10;
-.reg.u64 %r26;
-.reg.u64 %r28;
-.reg.u64 %r29;
-.reg.u64 %r31;
-ld.param.u64 %ar1, [%in_ar1];
-		mov.u64 %r26, %ar1;
-		add.u64 %r28, %r26, 8;
-{
-		.param.u64 %retval_in;
-		.param.u64 %out_arg0;
-		st.param.u64 [%out_arg0], %r28;
-		call (%retval_in), malloc, (%out_arg0);
-		ld.param.u64%r29, [%retval_in];
-}
-		st.u64  [%r29], %r26;
-		add.u64 %r31, %r29, 8;
-		mov.u64 %retval, %r31;
-		st.param.u64[%out_retval], %retval;
-		ret;
-}
Index: libgcc/config/nvptx/realloc.c
===
--- libgcc/config/nvptx/realloc.c	(revision 236701)
+++ libgcc/config/nvptx/realloc.c	(nonexistent)
@@ -1,50 +0,0 @@
-/* Implement realloc with the help of the malloc and free wrappers.
-
-   Copyright (C) 2014-2016 Free Software Foundation, Inc.
-
-   This file is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by the
-   Free Software Foundation; either version 3, or (at your option) any
-   later version.
-
-   This file is distributed in the hope that it will be useful, b

C PATCH for comptypes handling of TYPE_REF_CAN_ALIAS_ALL

2016-05-26 Thread Marek Polacek
The C++ FE has been changed, as a part of c++/50800, in such a way that it no
longer considers types differing only in TYPE_REF_CAN_ALIAS_ALL
incompatible.  But the C FE still rejects the following testcase, so this patch
makes the C FE follow suit.  After all, the may_alias attribute is not
considered as "affects_type_identity".  This TYPE_REF_CAN_ALIAS_ALL check was
introduced back in 2004 (r90078), but since then we've gotten rid of such
checks; only comptypes_internal retained it.  I suspect the TYPE_MODE check
might go too, but I don't feel like changing that right now.

This arose when discussing the struct sockaddr vs. may_alias issue in glibc.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-05-26  Marek Polacek  

* c-typeck.c (comptypes_internal): Don't check TYPE_REF_CAN_ALIAS_ALL.

* gcc.dg/attr-may-alias-2.c: New test.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 1520c20..02f2cf8 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -1106,9 +1106,8 @@ comptypes_internal (const_tree type1, const_tree type2, 
bool *enum_and_int_p,
   switch (TREE_CODE (t1))
 {
 case POINTER_TYPE:
-  /* Do not remove mode or aliasing information.  */
-  if (TYPE_MODE (t1) != TYPE_MODE (t2)
- || TYPE_REF_CAN_ALIAS_ALL (t1) != TYPE_REF_CAN_ALIAS_ALL (t2))
+  /* Do not remove mode information.  */
+  if (TYPE_MODE (t1) != TYPE_MODE (t2))
break;
   val = (TREE_TYPE (t1) == TREE_TYPE (t2)
 ? 1 : comptypes_internal (TREE_TYPE (t1), TREE_TYPE (t2),
diff --git gcc/testsuite/gcc.dg/attr-may-alias-2.c 
gcc/testsuite/gcc.dg/attr-may-alias-2.c
index e69de29..892748e 100644
--- gcc/testsuite/gcc.dg/attr-may-alias-2.c
+++ gcc/testsuite/gcc.dg/attr-may-alias-2.c
@@ -0,0 +1,13 @@
+/* We used to reject this because types differentiating only in
+   TYPE_REF_CAN_ALIAS_ALL were deemed incompatible.  */
+/* { dg-do compile } */
+
+struct sockaddr;
+struct sockaddr *f (void);
+
+struct __attribute__((may_alias)) sockaddr { int j; };
+struct sockaddr *
+f (void)
+{
+  return (void *) 0;
+}

Marek


Re: [PATCH] PR71275 ira.c bb_loop_depth

2016-05-26 Thread Vladimir Makarov

On 05/26/2016 07:02 AM, Alan Modra wrote:

This fixes lack of bb_loop_depth info in some of the early parts of
ira, which has been the case for quite some time.  All active branches
return 0 from bb_loop_depth() in update_equiv_regs, but whether that
actually causes mis-optimization anywhere but trunk is yet to be
determined.

I played a little with trying to consolidate this loop_optimizer_init
call with one that occurs a little later, but ran into ICEs.  (We now
have four calls to loop_optimizer_init in ira.c.)

Bootstrapped and regression tested powerpc64le-linux and x86_64-linux.
OK to apply?


Yes.  Thank you, Alan.



Re: [AArch64][2/4] PR63596, honor tree-stdarg analysis result to improve VAARG codegen

2016-05-26 Thread James Greenhalgh
On Fri, May 06, 2016 at 04:00:28PM +0100, Jiong Wang wrote:
> This patch fixes PR63596.
> 
> There is no need to push/pop all argument registers. We only need to
> push and pop the registers actually used. This usage info is calculated by a
> dedicated vaarg optimization tree pass, "tree-stdarg", and the backend should
> honor its analysis result.
> 
> For a simple testcase where vaarg is declared but not actually used:
> 
> int
> f (int a, ...)
> {
>   return a;
> }
> 
> before this patch, we are generating:
> 
> f:
> sub sp, sp, #192
> stp x1, x2, [sp, 136]
> stp x3, x4, [sp, 152]
>   stp x5, x6, [sp, 168]
> str x7, [sp, 184]
> str q0, [sp]
> str q1, [sp, 16]
> str q2, [sp, 32]
> str q3, [sp, 48]
>   str q4, [sp, 64]
> str q5, [sp, 80]
> str q6, [sp, 96]
> str q7, [sp, 112]
>   add sp, sp, 192
>   ret
> 
> after this patch, it's optimized into:
> 
> f:
>   ret

Can't argue with that! Nice!
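
To make the contrast concrete, here is a small sketch of the two situations
the tree-stdarg analysis distinguishes: a variadic function that never reads
its variable arguments (as in the quoted testcase) and one that does.  The
function names are hypothetical, and the remark about which registers get
saved for the second function is an expectation based on the description
above, not something measured here.

```c
#include <stdarg.h>

/* Variadic arguments are declared but never read, so no argument
   registers need to be saved on entry.  */
int
first_only (int a, ...)
{
  return a;
}

/* Variadic arguments are read with va_arg, so the registers that may
   hold them still have to be saved.  */
int
sum_ints (int count, ...)
{
  va_list ap;
  int total = 0;

  va_start (ap, count);
  for (int i = 0; i < count; i++)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}
```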

> OK for trunk?

OK.

Thanks,
James




Re: Fix for PR70909 in Libiberty Demangler (4)

2016-05-26 Thread Jason Merrill
It seems like in cases of malformed input we should return the input
again rather than produce garbage like "K".  Maybe catch this sort of situation in
d_lookup_template_parameter?

Jason
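
For reference, the ancestor check in the quoted patch below can be sketched in
isolation as follows.  This is a simplified model, not the libiberty code:
struct node stands in for struct demangle_component, struct stack_entry for
struct d_component_stack, and both function names are hypothetical.

```c
#include <stddef.h>

/* Stand-in for struct demangle_component.  */
struct node
{
  int id;
};

/* Stand-in for struct d_component_stack: one entry per component
   currently being printed, linked towards the root.  */
struct stack_entry
{
  const struct node *dc;
  const struct stack_entry *parent;
};

/* Count how many times DC already appears among the ancestors.  */
int
ancestor_count (const struct stack_entry *top, const struct node *dc)
{
  int n = 0;

  for (; top != NULL; top = top->parent)
    if (top->dc == dc)
      n++;
  return n;
}

/* Return nonzero if DC is already its own ancestor more than once,
   which is the condition under which the patch stops recursing.  */
int
is_cyclic_visit (const struct stack_entry *top, const struct node *dc)
{
  return dc != NULL && ancestor_count (top, dc) > 1;
}
```

A check of this kind stops the recursion but still leaves partial output
behind, which is the "garbage" concern raised above.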


On Mon, May 2, 2016 at 11:21 AM, Marcel Böhme  wrote:
> Hi,
>
> This fixes several stack overflows due to infinite recursion in d_print_comp 
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70909).
>
> The method d_print_comp in cp-demangle.c recursively constructs the 
> d_print_info dpi from the demangle_component dc. The method 
> d_print_comp_inner traverses dc as a graph. Now, dc can be a graph with 
> cycles leading to infinite recursion in several distinct cases. The patch 
> uses the component stack to find whether the current node dc has itself as 
> ancestor more than once.
>
> Bootstrapped and regression tested on x86_64-pc-linux-gnu. Test cases added 
> to libiberty/testsuite/demangler-expected and checked PR70909 and related 
> stack overflows are resolved.
>
> Best regards,
> - Marcel
>
>
>
> Index: ChangeLog
> ===
> --- ChangeLog   (revision 235760)
> +++ ChangeLog   (working copy)
> @@ -1,3 +1,19 @@
> +2016-05-02  Marcel Böhme  
> +
> +   PR c++/70909
> +   PR c++/61460
> +   PR c++/68700
> +   PR c++/67738
> +   PR c++/68383
> +   PR c++/70517
> +   PR c++/61805
> +   PR c++/62279
> +   PR c++/67264
> +   * cp-demangle.c: Prevent infinite recursion when traversing cyclic
> +   demangle component.
> +   (d_print_comp): Return when demangle component has itself as ancistor
> +   more than once.
> +
>  2016-04-30  Oleg Endo  
>
> * configure: Remove SH5 support.
> Index: cp-demangle.c
> ===
> --- cp-demangle.c   (revision 235760)
> +++ cp-demangle.c   (working copy)
> @@ -5436,6 +5436,24 @@ d_print_comp (struct d_print_info *dpi, int option
>  {
>struct d_component_stack self;
>
> +  self.parent = dpi->component_stack;
> +
> +  while (self.parent)
> +{
> +  self.dc = self.parent->dc;
> +  self.parent = self.parent->parent;
> +  if (dc != NULL && self.dc == dc)
> +   {
> + while (self.parent)
> +   {
> + self.dc = self.parent->dc;
> + self.parent = self.parent->parent;
> + if (self.dc == dc)
> +   return;
> +   }
> +   }
> +}
> +
>self.dc = dc;
>self.parent = dpi->component_stack;
>dpi->component_stack = &self;
> Index: testsuite/demangle-expected
> ===
> --- testsuite/demangle-expected (revision 235760)
> +++ testsuite/demangle-expected (working copy)
> @@ -4431,3 +4431,69 @@ _Q.__0
>
>  _Q10-__9cafebabe.
>  cafebabe.::-(void)
> +#
> +# Test demangler crash PR62279
> +
> +_ZN5Utils9transformIPN15ProjectExplorer13BuildStepListEZNKS1_18BuildConfiguration14knownStepListsEvEUlS3_E_EE5QListIDTclfp0_cvT__RKS6_IS7_ET0_
> +QList 
> Utils::transform ProjectExplorer::BuildConfiguration::knownStepLists() 
> const::{lambda(ProjectExplorer::BuildStepList*)#1}>(ProjectExplorer::BuildConfiguration::knownStepLists()
>  const::{lambda(ProjectExplorer::BuildStepList*)#1} const&, 
> ProjectExplorer::BuildConfiguration::knownStepLists() 
> const::{lambda(ProjectExplorer::BuildStepList*)#1})
> +#
> +
> +_ZSt7forwardIKSaINSt6thread5_ImplISt12_Bind_simpleIFZN6WIM_DL5Utils9AsyncTaskC4IMNS3_8Hardware12FpgaWatchdogEKFvvEIPS8_EEEibOT_DpOT0_EUlvE_vEESD_RNSt16remove_referenceISC_E4typeE
> +std::allocator  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, void 
> (WIM_DL::Hardware::FpgaWatchdog::*&&)() const, 
> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > const&& 
> std::forward  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, 
> std::allocator  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, void 
> (WIM_DL::Hardware::FpgaWatchdog::*&&)() const, 
> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > const&&, 
> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > 
> const>(std::remove_reference  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, void 
> (WIM_DL::Hardware::FpgaWatchdog::*&&)() const, 
> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > const>::type&)
> +#
> +# Test demangler crash PR61805
> +
> +_ZNK5niven5ColorIfLi4EEdvIfEENSt9enable_ifIXsrSt13is_arithmeticIT_E5valueEKNS0_IDTmlcvS5__Ecvf_EELi44typeES5_
> +std::enable_if::value, niven::Color (((float)())*((float)())), 4> const>::type niven::Color 4>::operator/(float) const
> +#
> +# Test recursion PR70517
> +
> +_ZSt4moveIRZN11tconcurrent6futureIvE4thenIZ5awaitIS2_EDaOT_EUlRKS6_E_EENS1_INSt5decayIDTclfp_defpTEEE4typeEEES7_EUlvE_EONSt16remove_referenceIS6_E4typeES7_
> +std::remove_reference ({

Re: [Patch] Disable text mode translation in ada for Cygwin

2016-05-26 Thread Arnaud Charlet
> Text mode translation should not be done for Cygwin, especially since it
> does not support unicode setmode calls. This also fixes ada builds for
> Cygwin.
> 
> OK for trunk?

OK, thanks.

> gcc/ada/ChangeLog:
>   * sysdep.c (__gnat_set_binary_mode, __gnat_set_text_mode,
>   __gnat_set_mode): Disable text mode translation, Cygwin should
>   follow *Nix behavior. This also fixes build failures on Cygwin.


Re: C PATCH to add -Wswitch-unreachable (PR c/49859)

2016-05-26 Thread Jason Merrill
On Thu, May 26, 2016 at 3:06 AM, Marek Polacek  wrote:
> On Wed, May 25, 2016 at 03:21:00PM -0600, Martin Sebor wrote:
>> I see.  Thanks for clarifying that.  No warning on a declaration
>> alone makes sense in the case above but it has the unfortunate
>> effect of suppressing the warning when the declaration is followed
>> by a statement, such as in:
>>
>>   void f (int*, int);
>>
>>   void g (int i)
>>   {
>> switch (i) {
>>   int a [3];
>>   memset (a, 0, sizeof a);
>>
>>   default:
>>   f (a, 3);
>> }
>>   }
>
> Ah, then I think we should probably look into GIMPLE_TRY, using
> gimple_try_eval, too.

It might also make sense to distinguish between gimple_try_kind of
GIMPLE_TRY_CATCH (a user-written try block) and GIMPLE_TRY_FINALLY (a
compiler-generated cleanup).

Jason
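
As a reminder of the basic situation the proposed warning targets, the sketch
below places a statement between the switch head and the first case label.
Control always jumps straight to a label, so that statement never runs.  The
function name is hypothetical and the example is independent of the GIMPLE_TRY
handling discussed above.

```c
/* The assignment before the first case label is never executed,
   because the switch jumps directly to a matching label.  */
int
classify (int i)
{
  int result = 0;

  switch (i)
    {
      result = -1;	/* Unreachable.  */
    case 1:
      result += 10;
      break;
    default:
      result += 20;
      break;
    }
  return result;
}
```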


[PATCH, libstdc++/testsuite] 29_atomics/atomic/65913.cc: require atomic-builtins rather than specific target

2016-05-26 Thread Thomas Preudhomme
[Sorry for the large recipient list, I wasn't sure who of C++ and x86 
maintainers should approve this]

Hi,

The 29_atomics/atomic/65913.cc test in libstdc++ is a runtime test that relies
only on atomic and gnu++11 support. Therefore I propose to require
atomic-builtins instead of an x86 (32 or 64 bits) target.

ChangeLog entry is as follows:

2016-05-19  Thomas Preud'homme  

* testsuite/29_atomics/atomic/65913.cc: Require atomic-builtins rather
than specific target.


Patch is in attachment.


Is this ok for trunk?

Best regards,

Thomas

diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/65913.cc b/libstdc++-v3/testsuite/29_atomics/atomic/65913.cc
index 713ef42d03cb9f7c1e691995df2d0943e24036c3..32a58ec991b41c74aafab84deed2c543d72505f5 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/65913.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/65913.cc
@@ -15,7 +15,8 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-do run { target x86_64-*-linux* powerpc*-*-linux* } }
+// { dg-do run }
+// { dg-require-atomic-builtins "" }
 // { dg-options "-std=gnu++11 -O0" }
 
 #include 


Re: [PATCH] Help PR70729, shuffle LIM and PRE

2016-05-26 Thread Christophe Lyon
On 18 May 2016 at 12:55, Richard Biener  wrote:
>
> The following patch moves LIM before PRE to allow it to cleanup CSE
> (and copyprop) opportunities LIM exposes.  It also moves the DCE done
> in loop before the loop pipeline as otherwise it is no longer executed
> unconditionally at this point (since we have the no_loop pipeline).
>
> The patch requires some testsuite adjustments such as cope with LIM now
> running before PRE and thus disabling the former and to adjust
> for better optimization we now do in the two testcases with redundant
> stores where store motion enables sinking to sink all interesting code
> out of the innermost loop.
>
> It also requires the LIM PHI hoisting cost adjustment patch I am
> testing separately.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu (with testsuite
> fallout resulting in the following adjustments).
>
> I'm going to re-test before committing.
>
> Richard.

Hi Richard,

I've noticed that this patch introduces a regression on aarch64/arm targets:
gcc.dg/tree-ssa/scev-4.c scan-tree-dump-times optimized "&a" 1

because '&a' now appears twice in the log.

Actually, this is the only regression on aarch64, but on arm I've also
noticed regressions on scev-5 and scev-3 (for armv5t for the latter)

Christophe.
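
One way to read the testsuite adjustments in the quoted patch below (changing
"=" to "+=" in the inner loops of outer-6.c and scop-18.c): with a plain
assignment every inner iteration overwrites the same location, so only the
last store matters and the loop body can be sunk out.  The sketch below shows
the difference in plain C; the function names and the fixed size N are
assumptions made for the illustration.

```c
#define N 4

/* Plain store: each iteration overwrites Y, so the result is simply
   x[i][N - 1] and the earlier stores are redundant.  */
int
last_store (int x[N][N], int i)
{
  int y = 0;

  for (int j = 0; j < N; j++)
    y = x[i][j];
  return y;
}

/* Accumulating store: every iteration contributes to the result, so
   the inner loop cannot be reduced to its last iteration.  */
int
accumulate (int x[N][N], int i)
{
  int y = 0;

  for (int j = 0; j < N; j++)
    y += x[i][j];
  return y;
}
```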


>
> 2016-05-18  Richard Biener  
>
> PR tree-optimization/70729
> * passes.def: Move LIM pass before PRE.  Remove no longer
> required copyprop and move first DCE out of the loop pipeline.
>
> * gcc.dg/autopar/outer-6.c: Adjust to avoid redundant store.
> * gcc.dg/graphite/scop-18.c: Likewise.
> * gcc.dg/pr41783.c: Disable LIM.
> * gcc.dg/tree-ssa/loadpre10.c: Likewise.
> * gcc.dg/tree-ssa/loadpre23.c: Likewise.
> * gcc.dg/tree-ssa/loadpre24.c: Likewise.
> * gcc.dg/tree-ssa/loadpre25.c: Likewise.
> * gcc.dg/tree-ssa/loadpre4.c: Likewise.
> * gcc.dg/tree-ssa/loadpre8.c: Likewise.
> * gcc.dg/tree-ssa/ssa-pre-16.c: Likewise.
> * gcc.dg/tree-ssa/ssa-pre-18.c: Likewise.
> * gcc.dg/tree-ssa/ssa-pre-20.c: Likewise.
> * gcc.dg/tree-ssa/ssa-pre-3.c: Likewise.
> * gfortran.dg/pr42108.f90: Likewise.
>
> Index: trunk/gcc/passes.def
> ===
> --- trunk.orig/gcc/passes.def   2016-05-18 11:46:56.518134310 +0200
> +++ trunk/gcc/passes.def2016-05-18 11:47:16.006355920 +0200
> @@ -243,12 +243,14 @@ along with GCC; see the file COPYING3.
>NEXT_PASS (pass_cse_sincos);
>NEXT_PASS (pass_optimize_bswap);
>NEXT_PASS (pass_laddress);
> +  NEXT_PASS (pass_lim);
>NEXT_PASS (pass_split_crit_edges);
>NEXT_PASS (pass_pre);
>NEXT_PASS (pass_sink_code);
>NEXT_PASS (pass_sancov);
>NEXT_PASS (pass_asan);
>NEXT_PASS (pass_tsan);
> +  NEXT_PASS (pass_dce);
>/* Pass group that runs when 1) enabled, 2) there are loops
>  in the function.  Make sure to run pass_fix_loops before
>  to discover/remove loops before running the gate function
> @@ -257,9 +259,6 @@ along with GCC; see the file COPYING3.
>NEXT_PASS (pass_tree_loop);
>PUSH_INSERT_PASSES_WITHIN (pass_tree_loop)
>   NEXT_PASS (pass_tree_loop_init);
> - NEXT_PASS (pass_lim);
> - NEXT_PASS (pass_copy_prop);
> - NEXT_PASS (pass_dce);
>   NEXT_PASS (pass_tree_unswitch);
>   NEXT_PASS (pass_scev_cprop);
>   NEXT_PASS (pass_record_bounds);
> Index: trunk/gcc/testsuite/gcc.dg/autopar/outer-6.c
> ===
> --- trunk.orig/gcc/testsuite/gcc.dg/autopar/outer-6.c   2016-01-20 
> 15:36:51.477802338 +0100
> +++ trunk/gcc/testsuite/gcc.dg/autopar/outer-6.c2016-05-18 
> 12:40:29.342665450 +0200
> @@ -24,7 +24,7 @@ void parloop (int N)
>for (i = 0; i < N; i++)
>{
>  for (j = 0; j < N; j++)
> -  y[i]=x[i][j];
> +  y[i]+=x[i][j];
>  sum += y[i];
>}
>g_sum = sum;
> Index: trunk/gcc/testsuite/gcc.dg/graphite/scop-18.c
> ===
> --- trunk.orig/gcc/testsuite/gcc.dg/graphite/scop-18.c  2015-09-14 
> 10:21:31.364089947 +0200
> +++ trunk/gcc/testsuite/gcc.dg/graphite/scop-18.c   2016-05-18 
> 12:38:35.673369299 +0200
> @@ -13,13 +13,13 @@ void test (void)
>for (i = 0; i < 24; i++)
>  for (j = 0; j < 24; j++)
>for (k = 0; k < 24; k++)
> -A[i][j] = B[i][k] * C[k][j];
> +A[i][j] += B[i][k] * C[k][j];
>
>/* These loops should still be strip mined.  */
>for (i = 0; i < 1000; i++)
>  for (j = 0; j < 1000; j++)
>for (k = 0; k < 1000; k++)
> -A[i][j] = B[i][k] * C[k][j];
> +A[i][j] += B[i][k] * C[k][j];
>  }
>
>  /* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
> Index: trunk/gcc/testsuite/gcc.dg/pr41783.c
> 

[Patch] Disable text mode translation in ada for Cygwin

2016-05-26 Thread JonY
Text mode translation should not be done for Cygwin, especially since it does
not support unicode setmode calls. This also fixes ada builds for Cygwin.

OK for trunk?

gcc/ada/ChangeLog:
* sysdep.c (__gnat_set_binary_mode, __gnat_set_text_mode,
__gnat_set_mode): Disable text mode translation, Cygwin should
follow *Nix behavior. This also fixes build failures on Cygwin.

diff --git a/gcc/ada/sysdep.c b/gcc/ada/sysdep.c
index 465007e..aeaed6d 100644
--- a/gcc/ada/sysdep.c
+++ b/gcc/ada/sysdep.c
@@ -128,15 +128,15 @@ extern struct tm *localtime_r(const time_t *, struct tm 
*);

 #if defined (WINNT) || defined (__CYGWIN__)

+#if defined (__CYGWIN__)
+const char __gnat_text_translation_required = 0;
+void __gnat_set_binary_mode (int handle) {}
+void __gnat_set_text_mode (int handle) {}
+void __gnat_set_mode(int handle, int mode) {}
+#else
 const char __gnat_text_translation_required = 1;

-#ifdef __CYGWIN__
-#define WIN_SETMODE setmode
-#include 
-#else
 #define WIN_SETMODE _setmode
-#endif
-
 void
 __gnat_set_binary_mode (int handle)
 {
@@ -172,6 +172,8 @@ __gnat_set_mode (int handle, int mode)
  }
 }

+#endif __CYGWIN__
+
 #ifdef __CYGWIN__

 char *





Re: [patch] libstdc++/69703 ignore endianness in codecvt_utf8

2016-05-26 Thread Christophe Lyon
On 26 May 2016 at 13:58, Christophe Lyon  wrote:
> On 5 May 2016 at 12:12, Jonathan Wakely  wrote:
>> On 04/05/16 17:19 +0100, Andre Vieira (lists) wrote:
>>>
>>> On 20/04/16 18:40, Jonathan Wakely wrote:

>>>> On 19/04/16 19:07 +0100, Jonathan Wakely wrote:
>>>>>
>>>>> This was reported as a bug in the Filesystem library, but it's
>>>>> actually a problem in the codecvt_utf8 facet that it uses.
>>>>
>>>>
>>>> The fix had a silly typo meaning it didn't work for big endian
>>>> targets, which was revealed by the improved tests I added.
>>>>
>>>> Tested x86_64-linux and powerpc64-linux, committed to trunk.


>>> Hi Jonathan,
>>>
>>> We are seeing experimental/filesystem/path/native/string.cc fail on
>>> baremetal targets. I'm guessing this is missing a
>>> 'dg-require-filesystem-ts', as seen on other tests like
>>> experimental/filesystem/path/modifiers/swap.cc.
>>>
>>> Cheers,
>>> Andre
>>
>>
>> Sorry about that, I've committed the missing directive.
>>
>>
> Hi,
>
> I've seen you've backported the main patch to the gcc-6 branch, but you
> forgot to add the follow-up "Add dg-require-filesystem-ts directive to
> test".
>
And likewise for the gcc-5 branch.

> Christophe


Re: [patch] libstdc++/69703 ignore endianness in codecvt_utf8

2016-05-26 Thread Christophe Lyon
On 5 May 2016 at 12:12, Jonathan Wakely  wrote:
> On 04/05/16 17:19 +0100, Andre Vieira (lists) wrote:
>>
>> On 20/04/16 18:40, Jonathan Wakely wrote:
>>>
>>> On 19/04/16 19:07 +0100, Jonathan Wakely wrote:

>>>> This was reported as a bug in the Filesystem library, but it's
>>>> actually a problem in the codecvt_utf8 facet that it uses.
>>>
>>>
>>> The fix had a silly typo meaning it didn't work for big endian
>>> targets, which was revealed by the improved tests I added.
>>>
>>> Tested x86_64-linux and powerpc64-linux, committed to trunk.
>>>
>>>
>> Hi Jonathan,
>>
>> We are seeing experimental/filesystem/path/native/string.cc fail on
>> baremetal targets. I'm guessing this is missing a
>> 'dg-require-filesystem-ts', as seen on other tests like
>> experimental/filesystem/path/modifiers/swap.cc.
>>
>> Cheers,
>> Andre
>
>
> Sorry about that, I've committed the missing directive.
>
>
Hi,

I've seen you've backported the main patch to the gcc-6 branch, but you
forgot to add the follow-up "Add dg-require-filesystem-ts directive to
test".

Christophe


Re: [PATCH][AArch64] Adjust SIMD integer preference

2016-05-26 Thread Wilco Dijkstra
James Greenhalgh wrote:
> I really don't like [1][2][3] this technique of attempting to work around
> register allocator issues using the disparaging mechanisms.

I don't see the issue as it is a standard mechanism to describe higher cost
to the register allocator. On the other hand the use of '*' is almost always
incorrect, leading to bad allocations and inefficient code.

> So doing this would be in line with other move operations, but is still
> a workaround to deeper issues.
>
> The patch is OK, on that justification, but I'd like not to set a
> precedent for using "?" rather than looking to find the underlying issue.

The underlying issue is well known - the register allocator cost code assumes
all alternatives in an instruction have equal cost.

If we used a distinct scalar vector type for scalar vectors (rather than
reusing SI/DI) then we could drop all of the '*' and likely most of the '?'
from the md description.

Wilco



Re: [AArch64][1/4] Enable tree-stdarg pass for AArch64 by defining counter fields

2016-05-26 Thread James Greenhalgh
On Fri, May 06, 2016 at 04:00:13PM +0100, Jiong Wang wrote:
> This patch initializes va_list_gpr_counter_field and
> va_list_fpr_counter_field properly for the AArch64 backend so that the
> tree-stdarg pass will be enabled.
> 
> The "required register" analysis is largely target independent, but the
> user might operate on the inner offset field in vaarg structure directly,
> for example:
> 
>   d = __builtin_va_arg (ap, int);
>   ap.__gr_offs += 0x20;
>   e = __builtin_va_arg (ap, int);
> 
> in which case tree-stdarg requires us to tell it which field inside the
> vaarg structure is the backend offset field, so that it can still figure
> out we actually need to save 6 general registers.
> 
> ok for upstream?

I have a small comment issue for you to fix, otherwise this is OK.
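
The "6 general registers" figure in the quoted example can be checked with a
little arithmetic, sketched below.  The structure layout follows the AArch64
procedure call standard as I understand it (the source only names __gr_offs
and __vr_offs), the field names are written without leading underscores to
avoid reserved identifiers, and gp_regs_needed is a hypothetical helper, not
part of the patch.

```c
/* Assumed shape of the AArch64 va_list; only the two offset fields are
   named in the patch.  */
struct sketch_va_list
{
  void *stack;		/* Next stacked argument.  */
  void *gr_top;		/* End of the general register save area.  */
  void *vr_top;		/* End of the FP/SIMD register save area.  */
  int gr_offs;		/* Offset from gr_top to the next general register.  */
  int vr_offs;		/* Offset from vr_top to the next FP/SIMD register.  */
};

/* Each general register occupies 8 bytes in the save area.  */
#define GP_REG_BYTES 8

/* Registers that must be saved: one per va_arg read of an integer
   argument, plus those skipped by manual adjustment of gr_offs.  */
int
gp_regs_needed (int va_arg_reads, int manual_offset_bytes)
{
  return va_arg_reads + manual_offset_bytes / GP_REG_BYTES;
}
```

With two va_arg reads and a manual adjustment of 0x20 bytes, this gives
2 + 0x20 / 8 = 6, matching the quoted description.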

> 2016-05-06  Jiong Wang  
> gcc/
>   * config/aarch64/aarch64.c (aarch64_build_builtin_va_list): Initialize
>   va_list_gpr_counter_field and va_list_fpr_counter_field.
> 
> gcc/testsuite/
>   * gcc.dg/tree-ssa/stdarg-2.c: Enable all testcases for AArch64.
>   * gcc.dg/tree-ssa/stdarg-3.c: Likewise.
>   * gcc.dg/tree-ssa/stdarg-4.c: Likewise.
>   * gcc.dg/tree-ssa/stdarg-5.c: Likewise.
>   * gcc.dg/tree-ssa/stdarg-6.c: Likewise.
> 
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 9995494..aff4a95 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -9463,6 +9463,13 @@ aarch64_build_builtin_va_list (void)
>   FIELD_DECL, get_identifier ("__vr_offs"),
>   integer_type_node);
>  
> +  /* Tell tree-stdarg pass what's our internal offset fields.

This doesn't read quite right, how about something like:

  "Tell tree-stdarg pass about our internal offset fields."

Thanks,
James



[PATCH] PR71275 ira.c bb_loop_depth

2016-05-26 Thread Alan Modra
This fixes lack of bb_loop_depth info in some of the early parts of
ira, which has been the case for quite some time.  All active branches
return 0 from bb_loop_depth() in update_equiv_regs, but whether that
actually causes mis-optimization anywhere but trunk is yet to be
determined.

I played a little with trying to consolidate this loop_optimizer_init
call with one that occurs a little later, but ran into ICEs.  (We now
have four calls to loop_optimizer_init in ira.c.)

Bootstrapped and regression tested powerpc64le-linux and x86_64-linux.
OK to apply?

PR rtl-optimization/71275
* ira.c (ira): Call loop_optimizer_init to set up bb_loop_depth
for update_equiv_regs and combine_and_move_insns.

diff --git a/gcc/ira.c b/gcc/ira.c
index 55b4bd7..1b269ea 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5171,6 +5171,7 @@ ira (FILE *f)
 ira_set_pseudo_classes (true, ira_dump_file);
 
   init_alias_analysis ();
+  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
   reg_equiv = XCNEWVEC (struct equivalence, max_reg_num ());
   update_equiv_regs ();
 
@@ -5186,6 +5187,7 @@ ira (FILE *f)
   if (optimize)
 add_store_equivs ();
 
+  loop_optimizer_finalize ();
   end_alias_analysis ();
   free (reg_equiv);
 

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH][AArch64] Adjust SIMD integer preference

2016-05-26 Thread James Greenhalgh
On Fri, Apr 22, 2016 at 03:35:42PM +, Wilco Dijkstra wrote:
> SIMD operations like combine prefer to have their operands in FP registers,
> so increase the cost of integer registers slightly to avoid unnecessary
> int<->FP moves. This improves register allocation of scalar SIMD operations.

I really don't like [1][2][3] this technique of attempting to work around
register allocator issues using the disparaging mechanisms.

If we take this, our set of patterns using disparaging becomes:

  aarch64_combinez
  aarch64_combinez_be
  aarch64_simd_mov
  movhf_aarch64
  movsf_aarch64
  movdf_aarch64
  movtf_aarch64
  xor_one_cmpl

So doing this would be in line with other move operations, but is still
a workaround to deeper issues.

The patch is OK, on that justification, but I'd like not to set a
precedent for using "?" rather than looking to find the underlying issue.

Thanks,
James

---
[1] Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.
https://gcc.gnu.org/ml/gcc-patches/2014-03/msg01627.html
[2] Re: [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01332.html
[3] Re: [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for 
BIC,ORN,EON
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01278.html

> 
> OK for trunk?
> 
> ChangeLog:
> 2016-04-22  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64-simd.md (aarch64_combinez):
>   Add ? to integer variant.
>   (aarch64_combinez_be): Likewise.
> 
> --
> 



Re: [PATCH][AArch64] Tie operand 1 to operand 0 in AESMC pattern when AES/AESMC fusion is enabled

2016-05-26 Thread James Greenhalgh
On Fri, May 20, 2016 at 11:04:32AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> The recent -frename-registers change exposed a deficiency in the way we fuse
> AESE/AESMC instruction pairs in aarch64.
> 
> Basically we want to enforce:
> AESE Vn, _
> AESMC Vn, Vn
> 
> to enable the fusion, but regrename comes along and renames the output Vn
> register in AESMC to something else, killing the fusion in the hardware.
> 
> The solution in this patch is to add an alternative that ties the input and
> output registers in the AESMC pattern and enable that alternative when the
> fusion is enabled.
> 
> With this patch I've confirmed that the above preferred register sequence is
> kept even with -frename-registers when tuning for a cpu that enables the
> fusion and that the chain is broken by regrename otherwise and have seen the
> appropriate improvement in a proprietary benchmark (that I cannot name) that
> exercises this sequence.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2016-05-20  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function.
> * config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare
> prototype.
> * config/aarch64/aarch64-simd.md (aarch64_crypto_aesv16qi):
> Add "=w,0" alternative.  Enable it when AES/AESMC fusion is enabled.

> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 21cf55b60f86024429ea36ead0d2d8ae4c94b579..f6da854fbaeeab34239a1f874edaedf8a01bf9c2 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx);
>  bool aarch64_expand_movmem (rtx *);
>  bool aarch64_float_const_zero_rtx_p (rtx);
>  bool aarch64_function_arg_regno_p (unsigned);
> +bool aarch64_fusion_enabled_p (unsigned int);

This argument type should be "enum aarch64_fusion_pairs".

>  bool aarch64_gen_movmemqi (rtx *);
>  bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
>  bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b93f961fc4ebd9eb3f50b0580741c80ab6eca427..815973ca6e764121f2669ad160918561450e6c50 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13359,6 +13359,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
>return false;
>  }
>  
> +/* Return true iff the instruction fusion described by OP is enabled.  */
> +
> +bool
> +aarch64_fusion_enabled_p (unsigned int op)
> +{
> +  return (aarch64_tune_params.fusible_ops & op) != 0;
> +}
> +

A follow-up patch fixing the uses in aarch_macro_fusion_pair_p to use your
new function would be nice.

OK with the change to argument type.
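
As an illustration of the requested signature change, here is a minimal
sketch.  The enum values and the tune_params struct are simplified,
hypothetical stand-ins rather than the real definitions in the AArch64
port; only the mask test mirrors the quoted patch.

```c
/* Hypothetical, simplified stand-ins for the backend's fusion flags and
   tuning structure; the real definitions live in the AArch64 port.  */
enum fusion_pairs
{
  FUSE_NOTHING = 0,
  FUSE_MOV_MOVK = 1 << 0,
  FUSE_AES_AESMC = 1 << 1
};

struct tune_params
{
  unsigned int fusible_ops;	/* Bitwise OR of enum fusion_pairs values.  */
};

/* Return nonzero iff the fusion described by OP is enabled in TUNE.
   Taking the enum rather than unsigned int documents which values are
   meaningful at the call site.  */
int
fusion_enabled_p (const struct tune_params *tune, enum fusion_pairs op)
{
  return (tune->fusible_ops & op) != 0;
}
```

With an enum parameter a caller writes fusion_enabled_p (&tune,
FUSE_AES_AESMC) and the intent is visible without looking up the mask.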

Thanks,
James



[PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out

2016-05-26 Thread Kyrill Tkachov

Hi all,

In this PR we want to optimise:
int foo (int i)
{
  return (i == 0) ? N : __builtin_clz (i);
}

on targets where CLZ is defined at zero to the constant 'N'.
This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO macro.
The obvious place to implement this would be in combine through simplify-rtx
where we'd recognise an IF_THEN_ELSE of the form:
(set (reg:SI r1)
 (if_then_else:SI (ne (reg:SI r2)
  (const_int 0 [0]))
   (clz:SI (reg:SI r2))
   (const_int 32)))

and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify it
into just (clz:SI (reg:SI r2)).
However, I found this doesn't quite happen for a couple of reasons:
1) This depends on ifcvt or some other pass to have created a conditional move
of the two branches that provide the IF_THEN_ELSE to propagate the const_int
and clz operation into.

2) Combine will refuse to propagate r2 from the above example into both the
condition and the CLZ at the same time, so the most we see is:
(set (reg:SI r1)
 (if_then_else:SI (ne (reg:CC cc)
(const_int 0))
   (clz:SI (reg:SI r2))
   (const_int 32)))

which is not enough information to perform the simplification.

This patch implements the optimisation in ce1 using the noce ifcvt framework.
During ifcvt noce_process_if_block can see that we're trying to optimise
something of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility
of all the information needed to perform the transformation.

The transformation is performed by adding a new noce_try* function that tries
to put the condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx
and tries to simplify that using the simplify-rtx machinery. That way, we can
implement the simplification logic in simplify-rtx.c where it belongs.

A similar transformation for CTZ is implemented as well.
So for code:
int foo (int i)
{
  return (i == 0) ? 32 : __builtin_clz (i);
}

On aarch64 we now emit:
foo:
clz w0, w0
ret

instead of:
foo:
mov w1, 32
clz w2, w0
cmp w0, 0
csel w0, w2, w1, ne
ret

and for arm similarly we generate:
foo:
clz r0, r0
bx  lr

instead of:
foo:
cmp r0, #0
clzne   r0, r0
moveq   r0, #32
bx  lr


and for x86_64 with -O2 -mlzcnt we generate:
foo:
xorl %eax, %eax
lzcntl  %edi, %eax
ret

instead of:
foo:
xorl %eax, %eax
movl $32, %edx
lzcntl  %edi, %eax
testl   %edi, %edi
cmove   %edx, %eax
ret
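
To make the source-level equivalence behind these listings concrete, here is
a minimal sketch in portable C.  The helper clz32_at_zero_32 is a
hypothetical software model of a count-leading-zeros operation that is
defined at zero and yields 32 there; it is not a GCC builtin and says
nothing about what __builtin_clz returns for a zero argument.

```c
#include <stdint.h>

/* Portable count-leading-zeros that is defined at zero and returns 32
   there, modelling a target where CLZ_DEFINED_VALUE_AT_ZERO yields 32.  */
unsigned int
clz32_at_zero_32 (uint32_t x)
{
  unsigned int n = 0;

  if (x == 0)
    return 32;
  while ((x & UINT32_C (0x80000000)) == 0)
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* The source-level form from the PR: an explicit guard for zero.  On the
   modelled target the guard is redundant, because the operation already
   produces 32 for a zero input.  */
unsigned int
guarded_clz (uint32_t i)
{
  return (i == 0) ? 32 : clz32_at_zero_32 (i);
}
```

Because both functions agree for every input, the conditional can be
dropped, which is the simplification the patch teaches the RTL optimisers.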


I tried getting this to work on other targets as well, but encountered
difficulties.
For example on powerpc the two arms of the condition seen during ifcvt are:

(insn 4 22 11 4 (set (reg:DI 156 [  ])
(const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
 (nil))
and
(insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [  ]) 0)
(clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
 (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
(nil)))

So the setup code in noce_process_if_block sees that the set destinations are
not the same
((reg:DI 156 [  ]) and (subreg/s/u:SI (reg:DI 156 [  ]) 0))
so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
I suppose that's a consequence of how SImode operations are represented in
early RTL on powerpc; I don't know what to do there. Perhaps that part of ifcvt
can be taught to handle destinations that are subregs of one another, but that
would be a separate patch.

Anyway, is this patch ok for trunk?

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu
and x86_64-pc-linux-gnu.

Thanks,
Kyrill

2016-05-26  Kyrylo Tkachov  

PR middle-end/37780
* ifcvt.c (noce_try_ifelse_collapse): New function.
Declare prototype.
(noce_process_if_block): Call noce_try_ifelse_collapse.
* simplify-rtx.c (simplify_cond_clz_ctz): New function.
(simplify_ternary_operation): Use the above to simplify
conditional CLZ/CTZ expressions.

2016-05-26  Kyrylo Tkachov  

PR middle-end/37780
* gcc.c-torture/execute/pr37780.c: New test.
* gcc.target/aarch64/pr37780_1.c: Likewise.
* gcc.target/arm/pr37780_1.c: Likewise.
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 4949965c9dc771bbd2f219fa72bdace3d40424da..80af4a84363192879cc49ea45f777fc987fda555 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -817,6 +817,7 @@ struct noce_if_info
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
 static int noce_try_move (struct noce_if_info *);
+static int noce_try_ifelse_collapse (struct noce_if_info *);
 static int noce_try_store_flag (struct noce_if_info *);
 static int noce_try_addcc (struct noce_if_info *);
 static int noce_try_store_flag_constants (struct noce_if_info *);
@@ -1120,6 +1121,37 @@ noce_try_move (struct noce_if_info *if_info)
   return FALSE;
 }
 
+/* Try forming an IF_THEN_ELSE (cond,

[PATCH][2/3][AArch64] Keep CTZ components together until after reload

2016-05-26 Thread Kyrill Tkachov

Hi all,

Following the same rationale as patch 1/3, this patch changes the AArch64
backend to keep the CTZ expression as a single RTX until after reload, when it
is split into an RBIT and a CLZ instruction.
This enables CTZ-specific optimisations in the pre-reload RTL optimisers.
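
The RBIT + CLZ decomposition can be sketched in portable C as follows.  The
soft_* helpers are hypothetical software models written for this
illustration, not the instructions or any GCC builtin, and the value 32
returned for a zero input is a property of this sketch only.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit value (software model of RBIT).  */
uint32_t
soft_rbit32 (uint32_t x)
{
  x = ((x >> 1) & UINT32_C (0x55555555)) | ((x & UINT32_C (0x55555555)) << 1);
  x = ((x >> 2) & UINT32_C (0x33333333)) | ((x & UINT32_C (0x33333333)) << 2);
  x = ((x >> 4) & UINT32_C (0x0F0F0F0F)) | ((x & UINT32_C (0x0F0F0F0F)) << 4);
  x = ((x >> 8) & UINT32_C (0x00FF00FF)) | ((x & UINT32_C (0x00FF00FF)) << 8);
  return (x >> 16) | (x << 16);
}

/* Count leading zeros, returning 32 for a zero input (software model).  */
unsigned int
soft_clz32 (uint32_t x)
{
  unsigned int n = 0;

  if (x == 0)
    return 32;
  while ((x & UINT32_C (0x80000000)) == 0)
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Count trailing zeros as CLZ of the bit-reversed input: the lowest set
   bit of X becomes the highest set bit of the reversed value.  */
unsigned int
soft_ctz32 (uint32_t x)
{
  return soft_clz32 (soft_rbit32 (x));
}
```

Keeping soft_ctz32 as one operation, rather than exposing the two halves
early, is the software analogue of keeping the CTZ rtx whole until after
reload.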

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-05-26  Kyrylo Tkachov  

PR middle-end/37780
* config/aarch64/aarch64.md (ctz2): Convert to
define_insn_and_split.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a9e811e9f70f650fb9292b6d9a96ef4b2dbbaec6..7b3e2cd13bdcc05defda1e3ff74bf003443fe70f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3790,16 +3790,23 @@ (define_insn "rbit2"
   [(set_attr "type" "rbit")]
 )
 
-(define_expand "ctz2"
-  [(match_operand:GPI 0 "register_operand")
-   (match_operand:GPI 1 "register_operand")]
+;; Split after reload into RBIT + CLZ.  Since RBIT is represented as an UNSPEC
+;; it is unlikely to fold with any other operation, so keep this as a CTZ
+;; expression and split after reload to enable scheduling them apart if
+;; needed.
+
+(define_insn_and_split "ctz2"
+ [(set (match_operand:GPI   0 "register_operand" "=r")
+   (ctz:GPI (match_operand:GPI  1 "register_operand" "r")))]
   ""
-  {
-emit_insn (gen_rbit2 (operands[0], operands[1]));
-emit_insn (gen_clz2 (operands[0], operands[0]));
-DONE;
-  }
-)
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+  "
+  emit_insn (gen_rbit2 (operands[0], operands[1]));
+  emit_insn (gen_clz2 (operands[0], operands[0]));
+  DONE;
+")
 
 (define_insn "*and_compare0"
   [(set (reg:CC_NZ CC_REGNUM)


[PATCH][1/3][ARM] Keep ctz expressions together until after reload

2016-05-26 Thread Kyrill Tkachov

Hi all,

On arm we don't have a dedicated instruction that corresponds to a CTZ rtx but
we synthesise it with an RBIT instruction followed by a CLZ. This is currently
done at expand time.
However, I'd like to push that step until after reload and keep the CTZ rtx as
a single whole in the early RTL optimisers.  This better expresses the
semantics of the operation as a whole, since the RBIT operation is represented
as an UNSPEC anyway and so will not see the benefits of combine, and a
CTZ-specific optimisation that is implemented in patch 3/3 of this series
won't be triggered if the expression is broken up into an UNSPEC and a CLZ.

Therefore this patch changes the expander to expand to a CTZ rtx and split it
after reload into an RBIT + CLZ to allow sched2 to schedule them apart if it
deems necessary.
This patch enables the optimisation in patch 3/3 where the appropriate test is
added.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-05-26  Kyrylo Tkachov  

PR middle-end/37780
* config/arm/arm.md (ctzsi2): Convert to define_insn_and_split.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0d491f7ea41e4fb5fb58bbb3047294abda541a73..fcb07e7629dc14b7cf8a0cd3d4d1a57ff33efe07 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -10784,19 +10784,22 @@ (define_insn "rbitsi2"
(set_attr "predicable_short_it" "no")
(set_attr "type" "clz")])
 
-(define_expand "ctzsi2"
- [(set (match_operand:SI   0 "s_register_operand" "")
-   (ctz:SI (match_operand:SI  1 "s_register_operand" "")))]
+;; Keep this as a CTZ expression until after reload and then split
+;; into RBIT + CLZ.  Since RBIT is represented as an UNSPEC it is unlikely
+;; to fold with any other expression.
+
+(define_insn_and_split "ctzsi2"
+ [(set (match_operand:SI   0 "s_register_operand" "=r")
+   (ctz:SI (match_operand:SI  1 "s_register_operand" "r")))]
   "TARGET_32BIT && arm_arch_thumb2"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
   "
-   {
- rtx tmp = gen_reg_rtx (SImode); 
- emit_insn (gen_rbitsi2 (tmp, operands[1]));
- emit_insn (gen_clzsi2 (operands[0], tmp));
-   }
-   DONE;
-  "
-)
+  emit_insn (gen_rbitsi2 (operands[0], operands[1]));
+  emit_insn (gen_clzsi2 (operands[0], operands[0]));
+  DONE;
+")
 
 ;; V5E instructions.
 


Re: [PATCH/AARCH64/ILP32] Fix unwinding (libgcc)

2016-05-26 Thread James Greenhalgh
On Wed, Apr 27, 2016 at 02:13:21PM -0700, Andrew Pinski wrote:
> Hi,
>   AARCH64 ILP32 is like x32 where UNITS_PER_WORD > sizeof(void*) so we
> need to define REG_VALUE_IN_UNWIND_CONTEXT for ILP32.  This fixes
> unwinding through the signal handler.  This is independent of the ABI
> which Linux kernel uses to store the registers.
> 
> OK?  Bootstrapped and tested on aarch64 with no regressions.

I've read back through the threads around this issue [1][2] and it looks
like most of the discussion was to do with the machinery and enabling
compatibility between unwinder libraries rather than the ABI implications
of turning on REG_VALUE_IN_UNWIND_CONTEXT.

You had a concern in the PR:

  "Does it matter for the propose of unwinding as all we care about
   is pointers to get back to frame before?  Your commit might change
   the ABI for some targets."

That basically sums up my concerns too, particularly as I'm not remotely
familiar with this area! I'm assuming you convinced yourself this wasn't
an issue for x32, and therefore won't be an issue for AArch64 ilp32?

Could you write a paragraph on why this is OK to set my mind at ease, then
give it another few days for Marcus and Richard to comment? I've got no
objections to the patch content if you can help me to understand why this
is correct and does not introduce an ABI break.

Thanks,
James

---
[1]: Bug 48007 - [x32] Unwind library doesn't work with
UNITS_PER_WORD > sizeof (void *)
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48007
[2]: PATCH [8/n]: Prepare x32: PR other/48007: Unwind library doesn't
work with UNITS_PER_WORD > sizeof (void *)
 https://gcc.gnu.org/ml/gcc-patches/2011-06/msg01913.html

> ChangeLog:
> * config/aarch64/value-unwind.h: New file.
> * config.host (aarch64*-*-*): Add aarch64/value-unwind.h to tm_file.



Re: [PR71252][PR71269] Fix trunk errors due to stmt_to_insert

2016-05-26 Thread Kugan Vivekanandarajah
Hi Jakub,


On 26 May 2016 at 18:18, Jakub Jelinek  wrote:
> On Thu, May 26, 2016 at 02:17:56PM +1000, Kugan Vivekanandarajah wrote:
>> --- a/gcc/tree-ssa-reassoc.c
>> +++ b/gcc/tree-ssa-reassoc.c
>> @@ -3767,8 +3767,10 @@ swap_ops_for_binary_stmt (vec ops,
>>operand_entry temp = *oe3;
>>oe3->op = oe1->op;
>>oe3->rank = oe1->rank;
>> +  oe3->stmt_to_insert = oe1->stmt_to_insert;
>>oe1->op = temp.op;
>>oe1->rank= temp.rank;
>> +  oe1->stmt_to_insert = temp.stmt_to_insert;
>
> If you want to swap those 3 fields (what about the others?), can't you write
>   std::swap (oe1->op, oe3->op);
>   std::swap (oe1->rank, oe3->rank);
>   std::swap (oe1->stmt_to_insert, oe3->stmt_to_insert);
> instead and drop operand_entry temp = *oe3; ?
>
>>  }
>>else if ((oe1->rank == oe3->rank
>>   && oe2->rank != oe3->rank)
>> @@ -3779,8 +3781,10 @@ swap_ops_for_binary_stmt (vec ops,
>>operand_entry temp = *oe2;
>>oe2->op = oe1->op;
>>oe2->rank = oe1->rank;
>> +  oe2->stmt_to_insert = oe1->stmt_to_insert;
>>oe1->op = temp.op;
>>oe1->rank = temp.rank;
>> +  oe1->stmt_to_insert = temp.stmt_to_insert;
>>  }
>
> Similarly.

Done. Revised patch attached.

Thanks,
Kugan
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c
index e69de29..4dceaaa 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c
@@ -0,0 +1,10 @@
+/* PR middle-end/71269 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+int a, b, c;
+void  fn2 (int);
+void fn1 ()
+{
+  fn2 (sizeof 0 + c + a + b + b);
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index c9ed679..db6ac6b 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -3764,11 +3764,9 @@ swap_ops_for_binary_stmt (vec ops,
  && !is_phi_for_stmt (stmt, oe1->op)
  && !is_phi_for_stmt (stmt, oe2->op)))
 {
-  operand_entry temp = *oe3;
-  oe3->op = oe1->op;
-  oe3->rank = oe1->rank;
-  oe1->op = temp.op;
-  oe1->rank= temp.rank;
+  std::swap (oe1->op, oe3->op);
+  std::swap (oe1->rank, oe3->rank);
+  std::swap (oe1->stmt_to_insert, oe3->stmt_to_insert);
 }
   else if ((oe1->rank == oe3->rank
&& oe2->rank != oe3->rank)
@@ -3776,11 +3774,9 @@ swap_ops_for_binary_stmt (vec ops,
   && !is_phi_for_stmt (stmt, oe1->op)
   && !is_phi_for_stmt (stmt, oe3->op)))
 {
-  operand_entry temp = *oe2;
-  oe2->op = oe1->op;
-  oe2->rank = oe1->rank;
-  oe1->op = temp.op;
-  oe1->rank = temp.rank;
+  std::swap (oe1->op, oe2->op);
+  std::swap (oe1->rank, oe2->rank);
+  std::swap (oe1->stmt_to_insert, oe2->stmt_to_insert);
 }
 }
 
@@ -3790,6 +3786,42 @@ swap_ops_for_binary_stmt (vec ops,
 static inline gimple *
 find_insert_point (gimple *stmt, tree rhs1, tree rhs2)
 {
+  /* If rhs1 is defined by stmt_to_insert, insert after its argument
+     definition stmt.  */
+  if (TREE_CODE (rhs1) == SSA_NAME
+  && !gimple_nop_p (SSA_NAME_DEF_STMT (rhs1))
+  && !gimple_bb (SSA_NAME_DEF_STMT (rhs1)))
+{
+  gimple *stmt1 = SSA_NAME_DEF_STMT (rhs1);
+  gcc_assert (is_gimple_assign (stmt1));
+  tree rhs11 = gimple_assign_rhs1 (stmt1);
+  tree rhs12 = gimple_assign_rhs2 (stmt1);
+  if (TREE_CODE (rhs11) == SSA_NAME
+ && reassoc_stmt_dominates_stmt_p (stmt, SSA_NAME_DEF_STMT (rhs11)))
+   stmt = SSA_NAME_DEF_STMT (rhs11);
+  if (TREE_CODE (rhs12) == SSA_NAME
+ && reassoc_stmt_dominates_stmt_p (stmt, SSA_NAME_DEF_STMT (rhs12)))
+   stmt = SSA_NAME_DEF_STMT (rhs12);
+}
+
+  /* If rhs2 is defined by stmt_to_insert, insert after its argument
+     definition stmt.  */
+  if (TREE_CODE (rhs2) == SSA_NAME
+  && !gimple_nop_p (SSA_NAME_DEF_STMT (rhs2))
+  && !gimple_bb (SSA_NAME_DEF_STMT (rhs2)))
+{
+  gimple *stmt1 = SSA_NAME_DEF_STMT (rhs2);
+  gcc_assert (is_gimple_assign (stmt1));
+  tree rhs11 = gimple_assign_rhs1 (stmt1);
+  tree rhs12 = gimple_assign_rhs2 (stmt1);
+  if (TREE_CODE (rhs11) == SSA_NAME
+ && reassoc_stmt_dominates_stmt_p (stmt, SSA_NAME_DEF_STMT (rhs11)))
+   stmt = SSA_NAME_DEF_STMT (rhs11);
+  if (TREE_CODE (rhs12) == SSA_NAME
+ && reassoc_stmt_dominates_stmt_p (stmt, SSA_NAME_DEF_STMT (rhs12)))
+   stmt = SSA_NAME_DEF_STMT (rhs12);
+}
+
   if (TREE_CODE (rhs1) == SSA_NAME
   && reassoc_stmt_dominates_stmt_p (stmt, SSA_NAME_DEF_STMT (rhs1)))
 stmt = SSA_NAME_DEF_STMT (rhs1);
@@ -3843,12 +3875,6 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
{
  gimple *insert_point
= find_insert_point (stmt, oe1->op, oe2->op);
- /* If the stmt that defines operand has to be inserted, insert it
-before the use.  */
- if (oe1->stmt_to_insert)
-   i

[committed] Add PR71280 testcase

2016-05-26 Thread Jakub Jelinek
Hi!

I came up with a simple C testcase for a PR, which apparently Martin
fixed yesterday in PR71239.

Thus, I've just committed following testcase as obvious and will close the
PR.

2016-05-26  Jakub Jelinek  

PR tree-optimization/71280
* gcc.dg/pr71280.c: New test.

--- gcc/testsuite/gcc.dg/pr71280.c.jj   2016-05-26 11:27:20.865844253 +0200
+++ gcc/testsuite/gcc.dg/pr71280.c  2016-05-26 11:26:52.0 +0200
@@ -0,0 +1,15 @@
+/* PR tree-optimization/71280 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern char v[];
+
+int
+foo ()
+{
+  int k = 0;
+  typedef char T[64];
+  for (int i = 0; i < 64; i++)
+k += (*(T *) &v[0])[i];
+  return k;
+}

Jakub


Re: [PATCH][ARM] PR target/71056: Don't use vectorized builtins when NEON is not available

2016-05-26 Thread Kyrill Tkachov

Hi Ramana,

On 19/05/16 14:36, Ramana Radhakrishnan wrote:

On 11/05/16 15:32, Kyrill Tkachov wrote:

Hi all,

In this PR a NEON builtin is introduced during SLP vectorisation even when NEON
is not available because arm_builtin_vectorized_function is missing an
appropriate check in the BSWAP handling code.

Then during expand when we try to expand the NEON builtin the code in
arm_expand_neon_builtin rightly throws an error telling the user to enable
NEON, even though the testcase doesn't use any intrinsics.

This patch fixes the bug by bailing out early if !TARGET_NEON. This allows us
to remove a redundant TARGET_NEON check further down in the function as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk?

This appears on GCC 6 as well.
On older branches the test failure doesn't trigger but the logic looks buggy
anyway.
Ok for the branches as well if testing is clean?

Thanks,
Kyrill

2016-05-11  Kyrylo Tkachov  

 PR target/71056
 * config/arm/arm-builtins.c (arm_builtin_vectorized_function): Return
 NULL_TREE early if NEON is not available.  Remove now redundant check
 in ARM_CHECK_BUILTIN_MODE.

2016-05-11  Kyrylo Tkachov  

 PR target/71056
 * gcc.target/arm/pr71056.c: New test.

OK. LGTM - please apply if no regressions and backport onto GCC 6 after the
auto-testers have let this bake on trunk for a little while.

I'd rather not apply it to the release branches unless we can trigger it there,
but it may be newer logic in the bswap pass that detects this.


Thanks, the patch has been in trunk for a week without any complaints.
I'll apply it to GCC 6 next week.
As for the other branches, the logic in arm_builtin_vectorized_function there
looks vulnerable as well, but I haven't been able to trigger this there.
I think this needs certain combinations of bswap and SLP vectorisation
improvements to trigger.
I've tried writing a few testcases manually to trigger this but was not able to.
I'm happy to not apply this to the other branches unless we get a bug report
about it.

Thanks for the review,
Kyrill




regards
Ramana




Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-26 Thread James Greenhalgh
On Mon, May 16, 2016 at 11:38:04AM +0100, Wilco Dijkstra wrote:
> GCC expands switch statements in a very simplistic way and tries to use a
> table expansion even when it is a bad idea for performance or code size.
> GCC typically emits extremely sparse tables that contain mostly default
> entries (something which currently cannot be tuned by backends).
> Additionally the computation of the minimum/maximum label offsets is too
> simplistic so the tables are often twice as large as necessary.
> 
> The cost of a table switch is significant due to the setup overhead, the table
> lookup (which due to being sparse and large adds unnecessary cache misses)
> and the hard-to-predict indirect jump.  Therefore it is best to avoid using a
> table unless there are many real case labels.
> 
> This patch fixes that by setting the default aarch64_case_values_threshold to
> 16 when the per-CPU tuning is not set.  On SPEC2006 this improves the switch
> heavy benchmarks GCC and perlbench both in performance (1-2%) as well as size
> (0.5-1% smaller).
> 
> OK for trunk?

I have a trivial request to change the comment on the function. Otherwise,
this is now OK for trunk.

> ChangeLog:
> 2016-04-22  Wilco Dijkstra  
> 
> gcc/
> * config/aarch64/aarch64.c (aarch64_case_values_threshold):
> Return a better case_values_threshold when optimizing.
> 
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0620f1e..a240635 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3546,7 +3546,12 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>return aarch64_tls_referenced_p (x);
>  }
> 
> -/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
> +/* Implement TARGET_CASE_VALUES_THRESHOLD.
> +   The expansion for a table switch is quite expensive due to the number
> +   of instructions, the table lookup and hard to predict indirect jump.
> +   When optimizing for speed, with -O3 use the per-core tuning if set,
> +   otherwise use tables for > 16 cases as a tradeoff between size and
> +   performance.  */

This comment doesn't cover the "optimize_size" case.
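
For illustration, the selection logic that the comment describes, plus the
size case the comment leaves out, might look like the sketch below.  This is
not the patch's code: every parameter is a hypothetical stand-in for compiler
state, the order of the checks is an assumption, and so is the idea that the
size case falls back to some generic default passed in by the caller.

```c
/* Hypothetical sketch of the selection logic described in the quoted
   comment.  All parameters are stand-ins for compiler state:
   OPTIMIZE_LEVEL for the -O level, OPTIMIZE_FOR_SIZE for -Os,
   PER_CORE_VALUE for the per-CPU tuning (0 meaning "not set") and
   SIZE_DEFAULT for whatever the generic size-oriented default is.  */
unsigned int
case_values_threshold (int optimize_level, int optimize_for_size,
		       unsigned int per_core_value,
		       unsigned int size_default)
{
  /* At -O3, prefer the per-core tuning when one is provided.  */
  if (optimize_level > 2 && per_core_value != 0)
    return per_core_value;

  /* The case the review asks the comment to mention: when optimizing
     for size, fall back to the generic default (an assumption here).  */
  if (optimize_for_size)
    return size_default;

  /* Otherwise use the fixed speed-oriented cutoff from the patch
     description.  */
  return 16;
}
```

A comment that walks through the three branches in this order would cover
the "optimize_size" case as well.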

Thanks,
James



Re: [AArch64, 2/4] Extend vector mutiply by element to all supported modes

2016-05-26 Thread James Greenhalgh
On Wed, May 18, 2016 at 02:13:53PM +0100, Jiong Wang wrote:
> Thanks for reporting this.
> 
> Yes, reproduced. I should force those res* local variables into
> memory so they can be in the same order as the expected result
> which is kept in memory.
> 
> The following patch fixes this.
> 
> vmul_elem_1 passes on both aarch64_be-none-elf and aarch64-linux.
> 
> OK for trunk?

OK.

Thanks,
James

> 
> gcc/testsuite/
> 
> 2016-05-18  Jiong Wang  
> 
> * gcc.target/aarch64/simd/vmul_elem_1.c: Force result variables to be
> kept in memory.
> 



Re: [PATCH, testsuite] Skip tail call tests on Thumb-1 targets

2016-05-26 Thread Kyrill Tkachov


On 26/05/16 09:24, Thomas Preudhomme wrote:

On Wednesday 25 May 2016 11:38:44 Mike Stump wrote:

On May 25, 2016, at 10:20 AM, Thomas Preudhomme wrote:

2016-05-24  Thomas Preud'homme  

* gcc.dg/plugin/plugin.exp: skip tail call tests for Thumb-1.

Is this ok for trunk?

Ok.  Normally I'd just punt to the arm folks. Better to Cc them on the
patch.  I watch all the changes to the .exp files, and will scream if
something seems to be going wrong.

Fair enough. I was not sure because it touches .exp files. What do ARM
maintainers think?

Is this ok for trunk?


From my perspective the check for tailcall availability on arm is
correct. I don't know whether the place you add it in is correct
as I'm not familiar with that .exp file, so if Mike has no objections
to its placement I'd say it's ok.

Thanks,
Kyrill


Best regards,

Thomas




[AArch64, testsuite] Fix vmul_elem_1.c on big-endian

2016-05-26 Thread Jiong Wang

On 18/05/16 14:13, Jiong Wang wrote:



On 18/05/16 09:17, Christophe Lyon wrote:



gcc/

2016-05-17  Jiong Wang  

 * config/aarch64/aarch64-simd.md (*aarch64_mul3_elt_to_128df): Extend
 to all supported modes.  Rename to...
 (*aarch64_mul3_elt_from_dup): ...this.

gcc/testsuite/

2016-05-17  Jiong Wang  

 * gcc.target/aarch64/simd/vmul_elem_1.c: New.

Otherwise, this patch is OK.


Hi Jiong,

The new testcase fails on aarch64_be, at execution time.

Christophe.


Thanks for reporting this.

Yes, reproduced. I should force those res* local variables into
memory so they can be in the same order as the expected result
which is kept in memory.

The following patch fixes this.

vmul_elem_1 passes on both aarch64_be-none-elf and aarch64-linux.

OK for trunk?

gcc/testsuite/

2016-05-18  Jiong Wang  

* gcc.target/aarch64/simd/vmul_elem_1.c: Force result variables to be
kept in memory.



Ping ~


Re: [PATCH] Improve vcvtps2ph

2016-05-26 Thread Kirill Yukhin
On 23 May 19:21, Jakub Jelinek wrote:
> Hi!
> 
> These insns are available in AVX512VL, so we can just use v instead of x.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.

--
Thanks, K


Re: [PATCH] Improve *ssse3_palignr_perm

2016-05-26 Thread Kirill Yukhin
On 23 May 19:17, Jakub Jelinek wrote:
> Hi!
> 
> This pattern is used to improve __builtin_shuffle in some cases;
> VPALIGNR is AVX512BW & AVX512VL.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.

--
Thanks, K


Re: [PATCH] Improve *avx_vperm_broadcast_*

2016-05-26 Thread Kirill Yukhin
Hi Jakub,
On 23 May 19:15, Jakub Jelinek wrote:
> Hi!
> 
> The vbroadcastss and vpermilps insns are already in AVX512F & AVX512VL,
> so can be used with v instead of x, the splitter case where we for AVX
> emit vpermilps plus vpermf128 is more problematic, because the latter
> insn isn't available in EVEX.  But, we can get the same effect with
> vshuff32x4 when both source operands are the same.
> Alternatively, we could replace the vpermilps and vshuff32x4 insns
> with the AVX512VL arbitrary permutations I think, the question is
> what is faster, because we'd need to load the mask from memory.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
patch is OK.

--
Thanks, K


Re: [PATCH, testsuite] Skip tail call tests on Thumb-1 targets

2016-05-26 Thread Thomas Preudhomme
On Wednesday 25 May 2016 11:38:44 Mike Stump wrote:
> On May 25, 2016, at 10:20 AM, Thomas Preudhomme wrote:
> > 2016-05-24  Thomas Preud'homme  
> > 
> >* gcc.dg/plugin/plugin.exp: skip tail call tests for Thumb-1.
> > 
> > Is this ok for trunk?
> 
> Ok.  Normally I'd just punt to the arm folks. Better to Cc them on the
> patch.  I watch all the changes to the .exp files, and will scream if
> something seems to be going wrong.

Fair enough. I was not sure because it touches .exp files. What do ARM
maintainers think?

Is this ok for trunk?

Best regards,

Thomas

diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index 62f6797..321b4ba 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -90,6 +90,12 @@ foreach plugin_test $plugin_test_list {
 if ![runtest_file_p $runtests $plugin_src] then {
 continue
 }
+# Skip tail call tests on targets that do not have sibcall_epilogue.
+if {[regexp ".*must_tail_call_plugin.c" $plugin_src]
+	&& [istarget arm*-*-*]
+	&& [check_effective_target_arm_thumb1]} then {
+	continue
+}
 set plugin_input_tests [lreplace $plugin_test 0 0]
 plugin-test-execute $plugin_src $plugin_input_tests
 }


Re: [PR71252][PR71269] Fix trunk errors due to stmt_to_insert

2016-05-26 Thread Jakub Jelinek
On Thu, May 26, 2016 at 02:17:56PM +1000, Kugan Vivekanandarajah wrote:
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -3767,8 +3767,10 @@ swap_ops_for_binary_stmt (vec ops,
>operand_entry temp = *oe3;
>oe3->op = oe1->op;
>oe3->rank = oe1->rank;
> +  oe3->stmt_to_insert = oe1->stmt_to_insert;
>oe1->op = temp.op;
>oe1->rank= temp.rank;
> +  oe1->stmt_to_insert = temp.stmt_to_insert;

If you want to swap those 3 fields (what about the others?), can't you write
  std::swap (oe1->op, oe3->op);
  std::swap (oe1->rank, oe3->rank);
  std::swap (oe1->stmt_to_insert, oe3->stmt_to_insert);
instead and drop operand_entry temp = *oe3; ?

>  }
>else if ((oe1->rank == oe3->rank
>   && oe2->rank != oe3->rank)
> @@ -3779,8 +3781,10 @@ swap_ops_for_binary_stmt (vec ops,
>operand_entry temp = *oe2;
>oe2->op = oe1->op;
>oe2->rank = oe1->rank;
> +  oe2->stmt_to_insert = oe1->stmt_to_insert;
>oe1->op = temp.op;
>oe1->rank = temp.rank;
> +  oe1->stmt_to_insert = temp.stmt_to_insert;
>  }

Similarly.

Jakub


Re: C PATCH to add -Wswitch-unreachable (PR c/49859)

2016-05-26 Thread Marek Polacek
On Wed, May 25, 2016 at 03:21:00PM -0600, Martin Sebor wrote:
> I see.  Thanks for clarifying that.  No warning on a declaration
> alone makes sense in the case above but it has the unfortunate
> effect of suppressing the warning when the declaration is followed
> by a statement, such as in:
> 
>   void f (int*, int);
> 
>   void g (int i)
>   {
> switch (i) {
>   int a [3];
>   memset (a, 0, sizeof a);
> 
>   default:
>   f (a, 3);
> }
>   }

Ah, then I think we should probably look into GIMPLE_TRY, using
gimple_try_eval, too.
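
Martin's case can be reduced to a self-contained sketch that shows why the
statement after the declaration deserves a diagnostic.  The names classify
and reached_before_label are made up for this illustration; the point is only
that control jumps straight to the label, so the statement placed before it
never runs.

```c
/* Set only by the statement that sits before the first case label.  */
int reached_before_label;

int
classify (int i)
{
  switch (i)
    {
      int a[3];			/* Declaration only: nothing to execute.  */
      reached_before_label = 1;	/* Never executed: control jumps past it.  */

    default:
      a[0] = i;			/* The storage for A is still usable here.  */
      return a[0] + 1;
    }
}
```

After any call to classify, reached_before_label is still zero, which is the
situation a warning on the statement (rather than on the declaration) would
point out.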

Marek


Re: Fix for PR70909 in Libiberty Demangler (4)

2016-05-26 Thread Marcel Böhme
Hi,

This patch is pending a careful review.

Best regards,
- Marcel

> On 2 May 2016, at 11:21 PM, Marcel Böhme  wrote:
> 
> Hi,
> 
> This fixes several stack overflows due to infinite recursion in d_print_comp 
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70909).
> 
> The method d_print_comp in cp-demangle.c recursively constructs the 
> d_print_info dpi from the demangle_component dc. The method 
> d_print_comp_inner traverses dc as a graph. Now, dc can be a graph with 
> cycles leading to infinite recursion in several distinct cases. The patch 
> uses the component stack to find whether the current node dc has itself as 
> ancestor more than once. 
> 
> Bootstrapped and regression tested on x86_64-pc-linux-gnu. Test cases added 
> to libiberty/testsuite/demangle-expected and checked PR70909 and related 
> stack overflows are resolved.
> 
> Best regards,
> - Marcel
> 
> 
> 
> Index: ChangeLog
> ===
> --- ChangeLog (revision 235760)
> +++ ChangeLog (working copy)
> @@ -1,3 +1,19 @@
> +2016-05-02  Marcel Böhme  
> +
> + PR c++/70909
> + PR c++/61460
> + PR c++/68700
> + PR c++/67738
> + PR c++/68383
> + PR c++/70517
> + PR c++/61805
> + PR c++/62279
> + PR c++/67264
> + * cp-demangle.c: Prevent infinite recursion when traversing cyclic
> + demangle component.
> + (d_print_comp): Return when demangle component has itself as ancestor
> + more than once.
> +
> 2016-04-30  Oleg Endo  
> 
>   * configure: Remove SH5 support.
> Index: cp-demangle.c
> ===
> --- cp-demangle.c (revision 235760)
> +++ cp-demangle.c (working copy)
> @@ -5436,6 +5436,24 @@ d_print_comp (struct d_print_info *dpi, int option
> {
>   struct d_component_stack self;
> 
> +  self.parent = dpi->component_stack;
> +
> +  while (self.parent)
> +{
> +  self.dc = self.parent->dc;
> +  self.parent = self.parent->parent;
> +  if (dc != NULL && self.dc == dc)
> + {
> +   while (self.parent)
> + {
> +   self.dc = self.parent->dc;
> +   self.parent = self.parent->parent;
> +   if (self.dc == dc)
> + return;
> + }
> + }
> +}
> +
>   self.dc = dc;
>   self.parent = dpi->component_stack;
>   dpi->component_stack = &self;
> Index: testsuite/demangle-expected
> ===
> --- testsuite/demangle-expected   (revision 235760)
> +++ testsuite/demangle-expected   (working copy)
> @@ -4431,3 +4431,69 @@ _Q.__0
> 
> _Q10-__9cafebabe.
> cafebabe.::-(void)
> +#
> +# Test demangler crash PR62279
> +
> +_ZN5Utils9transformIPN15ProjectExplorer13BuildStepListEZNKS1_18BuildConfiguration14knownStepListsEvEUlS3_E_EE5QListIDTclfp0_cvT__RKS6_IS7_ET0_
> +QList 
> Utils::transform ProjectExplorer::BuildConfiguration::knownStepLists() 
> const::{lambda(ProjectExplorer::BuildStepList*)#1}>(ProjectExplorer::BuildConfiguration::knownStepLists()
>  const::{lambda(ProjectExplorer::BuildStepList*)#1} const&, 
> ProjectExplorer::BuildConfiguration::knownStepLists() 
> const::{lambda(ProjectExplorer::BuildStepList*)#1})
> +#
> +
> +_ZSt7forwardIKSaINSt6thread5_ImplISt12_Bind_simpleIFZN6WIM_DL5Utils9AsyncTaskC4IMNS3_8Hardware12FpgaWatchdogEKFvvEIPS8_EEEibOT_DpOT0_EUlvE_vEESD_RNSt16remove_referenceISC_E4typeE
> +std::allocator  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, void 
> (WIM_DL::Hardware::FpgaWatchdog::*&&)() const, 
> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > const&& 
> std::forward  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, 
> std::allocator  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, void 
> (WIM_DL::Hardware::FpgaWatchdog::*&&)() const, 
> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > const&&, 
> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > 
> const>(std::remove_reference  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, void 
> (WIM_DL::Hardware::FpgaWatchdog::*&&)() const, 
> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > const>::type&)
> +#
> +# Test demangler crash PR61805
> +
> +_ZNK5niven5ColorIfLi4EEdvIfEENSt9enable_ifIXsrSt13is_arithmeticIT_E5valueEKNS0_IDTmlcvS5__Ecvf_EELi44typeES5_
> +std::enable_if::value, niven::Color (((float)())*((float)())), 4> const>::type niven::Color 4>::operator/(float) const
> +#
> +# Test recursion PR70517
> +
> +_ZSt4moveIRZN11tconcurrent6futureIvE4thenIZ5awaitIS2_EDaOT_EUlRKS6_E_EENS1_INSt5decayIDTclfp_defpTEEE4typeEEES7_EUlvE_EONSt16remove_referenceIS6_E4typeES7_
> +std::remove_reference ({parm#1}(*this))>::type> tconcurrent::future::then await 
> >(tconcurrent::future&&)::{lambda(tconcurrent::future  ({parm#1}(*this))>::type> tconcurrent::future

Re: Fix for PR70926 in Libiberty Demangler (5)

2016-05-26 Thread Marcel Böhme
Hi: Pending review.

Best - Marcel

> On 3 May 2016, at 10:40 PM, Marcel Böhme  wrote:
> 
> Hi,
> 
> This fixes four access violations 
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70926). 
> 
> Two of these first read the value of a length variable len from the mangled 
> string, then strncpy len characters from the mangled string, which can be 
> more than the string has left.
> The other two read the value of an array index n from the mangled string, 
> which can be negative due to an overflow.
> 
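
The two kinds of check can be sketched in isolation as follows. This is
only an illustration under assumptions: copy_counted and valid_backref are
hypothetical helpers, not functions in cplus-dem.c, and the output buffer
handling is invented for the example:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy LEN characters from MANGLED into OUT only if
   that many characters remain and OUT (of size OUT_SIZE) can hold them
   plus a terminating NUL.  Returns 0 on success, -1 otherwise.  */
int
copy_counted (const char *mangled, long len, char *out, size_t out_size)
{
  if (len < 0
      || (size_t) len > strlen (mangled)
      || (size_t) len >= out_size)
    return -1;

  memcpy (out, mangled, (size_t) len);
  out[len] = '\0';
  return 0;
}

/* Hypothetical helper: accept N as a back-reference index only if it
   lies in [0, COUNT).  A count that overflowed to a negative value is
   rejected here instead of being used as an array index.  */
int
valid_backref (int n, int count)
{
  return n >= 0 && n < count;
}
```

The first helper mirrors the added comparison against strlen (*mangled),
the second the added n < 0 test in do_type.
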
> Bootstrapped and regression tested on x86_64-pc-linux-gnu. Test cases added 
> to libiberty/testsuite/demangle-expected and checked PR70926 is resolved.
> 
> Best regards,
> - Marcel
> 
> Index: libiberty/ChangeLog
> ===
> --- libiberty/ChangeLog   (revision 235801)
> +++ libiberty/ChangeLog   (working copy)
> @@ -1,3 +1,12 @@
> +2016-05-03  Marcel Böhme  
> +
> + PR c++/70926
> + * cplus-dem.c: Handle large values and overflow when demangling
> + length variables. 
> + (demangle_template_value_parm): Read only until end of mangled string.
> + (do_hpacc_template_literal): Likewise.
> + (do_type): Handle overflow when demangling array indices.
> +
> 2016-05-02  Marcel Böhme  
> 
>   PR c++/70498
> Index: libiberty/cplus-dem.c
> ===
> --- libiberty/cplus-dem.c (revision 235801)
> +++ libiberty/cplus-dem.c (working copy)
> @@ -2051,7 +2051,8 @@ demangle_template_value_parm (struct work_stuff *w
>   else
>   {
> int symbol_len  = consume_count (mangled);
> -   if (symbol_len == -1)
> +   if (symbol_len == -1 
> +   || symbol_len > (long) strlen (*mangled))
>   return -1;
> if (symbol_len == 0)
>   string_appendn (s, "0", 1);
> @@ -3611,7 +3612,7 @@ do_type (struct work_stuff *work, const char **man
>   /* A back reference to a previously seen type */
>   case 'T':
> (*mangled)++;
> -   if (!get_count (mangled, &n) || n >= work -> ntypes)
> +   if (!get_count (mangled, &n) || n < 0 || n >= work -> ntypes)
>   {
> success = 0;
>   }
> @@ -3789,7 +3790,7 @@ do_type (struct work_stuff *work, const char **man
> /* A back reference to a previously seen squangled type */
> case 'B':
>   (*mangled)++;
> -  if (!get_count (mangled, &n) || n >= work -> numb)
> +  if (!get_count (mangled, &n) || n < 0 || n >= work -> numb)
>   success = 0;
>   else
>   string_append (result, work->btypevec[n]);
> @@ -4130,7 +4131,8 @@ do_hpacc_template_literal (struct work_stuff *work
> 
>   literal_len = consume_count (mangled);
> 
> -  if (literal_len <= 0)
> +  if (literal_len <= 0
> +  || literal_len > (long) strlen (*mangled))
> return 0;
> 
>   /* Literal parameters are names of arrays, functions, etc.  and the
> Index: libiberty/testsuite/demangle-expected
> ===
> --- libiberty/testsuite/demangle-expected (revision 235801)
> +++ libiberty/testsuite/demangle-expected (working copy)
> @@ -4441,3 +4441,16 @@ __vt_900cafebabe
> 
> _Z808
> _Z808
> +#
> +# Tests write access violation PR70926
> +
> +0__Ot2m02R5T50
> +0__Ot2m02R5T50
> +#
> +
> +0__GT500_
> +0__GT500_
> +#
> +
> +__t2m05B50_
> +__t2m05B50_
>