Re: [PATCH] avoid calling alloca(0)

2016-11-23 Thread Jakub Jelinek
On Wed, Nov 23, 2016 at 02:27:05PM -0700, Jeff Law wrote:
> I believe we should be warning on trying to allocation 0 bytes of memory via
> malloc, realloc or alloca, with the exception of a non-builtin alloca with
> no return value, but I think we've covered that elsewhere and Martin's code
> will DTRT more by accident than design.

But we aren't going to warn on all places which might call alloca at runtime
with 0 arguments, you would need runtime instrumentation for that (like
ubsan).  You are going to warn about explicit cases and random subset of the
other ones where the user is just unlucky enough that the compiler threaded
some jumps etc.  For the explicit ptr = alloca (0) it is easy to get rid of
it, for the implicit one one has to pessimize code if they want to avoid the
warning.

Jakub


Re: [PATCH] avoid calling alloca(0)

2016-11-23 Thread Jakub Jelinek
On Wed, Nov 23, 2016 at 06:15:11PM -0700, Martin Sebor wrote:
> >Can't we just
> >gcc_assert (x != 0) before the problematical calls?  That avoids
> >unnecessary over-allocation and gets us a clean ICE if we were to try
> >and call alloca with a problematical value.
> 
> gcc_assert works only in some instances (e.g., in c-ada-spec.c:191)
> but not in others because some actually do make the alloca(0) call
> at runtime: at a minimum, lto.c:3285, reg-stack.c:2008, and
> tree-ssa-threadedge.c:344 assert during bootstrap.
> 
> After reviewing a few more of the XALLOCAVEC calls in the affected
> files I suspect that those to alloca(0) pointed out by the warning
> may be just a subset that GCC happens to see thanks to constant
> propagation.  If that's so then changing this subset to
> alloca(N + !N) or some such is probably not the right approach
> because it will miss all the other calls where GCC doesn't see that
> N is zero.  I think the most robust solution is to do as Bernd
> suggests: change XALLOCAVEC as shown above.  That will let us
> prevent any and all unsafe assumptions about the result of
> alloca(0) being either non-null or distinct.  The other approach
> would be to change XALLOCAVEC to add 1 to the byte count.  That
> would be in line with what XMALLOC does.

I still fail to see why you want to change anything at least for
hosts where you know XALLOCAVEC is __builtin_alloca.
Show me one place which assumes the result of alloca (0) in gcc sources is
distinct, or non-NULL.  For 0 elements the pointer simply isn't used.
And whether for the malloc based alloca it GCs or not really doesn't matter
for us.

Jakub


Re: [Patch, testsuite] Fix bogus uninit-19.c failure for avr

2016-11-23 Thread Senthil Kumar Selvaraj

Jeff Law writes:

> On 11/23/2016 02:54 AM, Senthil Kumar Selvaraj wrote:
>> Hi,
>>
>>   The below patch fixes uninit-19.c for avr by adding
>>   -finline-small-functions for avr.
>>
>>   The test fails for avr because fn1 does not get inlined into fn2. Inlining
>>   occurs for x86_64 because fn1's computed size equals call_stmt_size. For
>>   avr, 32 bit memory moves are more expensive, and b[3] = p10[a] results in
>>   a bigger size for fn1, resulting in estimate_growth > 0 and no inlining.
>>
>>   Adding -finline-small-functions forces early inliner to inline fn1,
>>   making the test pass.
>>
>>   Committed to trunk.
>>
>> Regards
>> Senthil
>>
>> gcc/testsuite/ChangeLog
>>
>> 2016-11-23  Senthil Kumar Selvaraj  
>>
>> * gcc.dg/uninit-19.c: Add -finline-small-functions for avr.
> How about instead forcing inlining via the always_inline attribute?  If 
> inlining is indeed expected and necessary, that seems better than 
> -finline-small-functions.

I considered it, but a previous discussion in the mailing list backed
off from a similar proposal
(https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00517.html), and I wasn't
sure if always_inline would break the pic/non-pic distinction somehow.

Should I revert this and mark the function always_inline/static?

>
> Jeff



[tree-tailcall] Check if function returns it's argument

2016-11-23 Thread Prathamesh Kulkarni
Hi,
Consider following test-case:

void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  __builtin_memcpy (a1, a2, a3);
  return a1;
}

return a1 can be considered equivalent to return value of memcpy,
and the call could be emitted as a tail-call.
gcc doesn't emit the above call to memcpy as a tail-call,
but if it is changed to:

void *t1 = __builtin_memcpy (a1, a2, a3);
return t1;

Then memcpy is emitted as a tail-call.
The attached patch tries to handle the former case.

Bootstrapped+tested on x86_64-unknown-linux-gnu.
Cross tested on arm*-*-*, aarch64*-*-*
Does this patch look OK ?

Thanks,
Prathamesh
2016-11-24  Prathamesh Kulkarni  

* gimple.c (gimple_call_return_arg): New function.
* gimple.h (gimple_call_return_arg): Declare.
* tree-tailcall.c (find_tail_calls): Call gimple_call_return_arg.

testsuite/
* gcc.dg/tree-ssa/tailcall-8.c: New test.

diff --git a/gcc/gimple.c b/gcc/gimple.c
index 0a3dc72..ec460fc 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2561,6 +2561,23 @@ gimple_call_combined_fn (const gimple *stmt)
   return CFN_LAST;
 }
 
+/* Return arg, if function returns it's argument or NULL if it doesn't.  */
+tree
+gimple_call_return_arg (gcall *call_stmt)
+{
+  unsigned rf = gimple_call_return_flags (call_stmt);
+  if (rf & ERF_RETURNS_ARG)
+{
+  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+  if (argnum < gimple_call_num_args (call_stmt))
+   {
+ tree arg = gimple_call_arg (call_stmt, argnum);
+ return arg;
+   }
+}
+  return NULL_TREE;
+}
+
 /* Return true if STMT clobbers memory.  STMT is required to be a
GIMPLE_ASM.  */
 
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0d0296e..ebccbe1 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1520,6 +1520,7 @@ extern combined_fn gimple_call_combined_fn (const gimple 
*);
 extern bool gimple_call_builtin_p (const gimple *);
 extern bool gimple_call_builtin_p (const gimple *, enum built_in_class);
 extern bool gimple_call_builtin_p (const gimple *, enum built_in_function);
+extern tree gimple_call_return_arg (gcall *);
 extern bool gimple_asm_clobbers_memory_p (const gasm *);
 extern void dump_decl_set (FILE *, bitmap);
 extern bool nonfreeing_call_p (gimple *);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c 
b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c
new file mode 100644
index 000..b3fdc6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index f97541d..3396473 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -422,6 +422,8 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
{
  call = as_a  (stmt);
  ass_var = gimple_call_lhs (call);
+ if (!ass_var)
+   ass_var = gimple_call_return_arg (call);
  break;
}
 


Re: [patch, nios2] Fix PR78357, adjust sync builtin initialization

2016-11-23 Thread Sebastian Huber

Hello Jeff,

On 23/11/16 23:28, Jeff Law wrote:

On 11/16/2016 02:53 AM, Chung-Lin Tang wrote:

This patch adjusts the initialization of __sync built-in functions:
instead of conditionalizing on TARGET_LINUX_ABI, directly place the
target-hook #define in config/nios2/linux.h.  This appears to be in line
with other similar ports, e.g. m68k.

Sebastian, this should solve your issue of not wanting __sync_* libcalls
generated on RTEMS (which also uses TARGET_LINUX_ABI due to TLS 
support),

can you verify it works for you?

Chung-Lin

PR target/78357
* config/nios2/nios2.c (nios2_init_libfuncs): Remove 
TARGET_LINUX_ABI

condition.
(TARGET_INIT_LIBFUNCS): Delete definition and...
* config/nios2/linux.h (TARGET_INIT_LIBFUNCS): ...move to here, add
comments.


I fear you may have botched this;

g++ -fno-PIE -c   -g  -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE 
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall 
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. 
-I/home/law/gcc-testing/gcc/gcc -I/home/law/gcc-testing/gcc/gcc/. 
-I/home/law/gcc-testing/gcc/gcc/../include 
-I/home/law/gcc-testing/gcc/gcc/../libcpp/include 
-I/opt/cfarm/mpc/include 
-I/home/law/gcc-testing/gcc/gcc/../libdecnumber 
-I/home/law/gcc-testing/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/home/law/gcc-testing/gcc/gcc/../libbacktrace   -o nios2.o -MT 
nios2.o -MMD -MP -MF ./.deps/nios2.TPo 
/home/law/gcc-testing/gcc/gcc/config/nios2/nios2.c
/home/law/gcc-testing/gcc/gcc/config/nios2/nios2.c:3608:1: error: 
‘void nios2_init_libfuncs()’ defined but not used 
[-Werror=unused-function]

 nios2_init_libfuncs (void)


in my build this was only a warning. How do you enable the -Werror for 
the GCC build? Is this the normal way to build GCC?


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



[PATCH] PR fortran/78279 -- convert assert to internal error

2016-11-23 Thread Steve Kargl
The patch has passed regression testing on x86_64-*-freebsd.
It should be self-explanatory.  OK to commit?

2016-11-23  Steven G. Kargl  

PR fortran/78279
* dependency.c (identical_array_ref): Convert gcc_assert to conditional
and gfc_internal_error.

2016-11-23  Steven G. Kargl  

PR fortran/78279
* gfortran.dg/pr78279.f90: New test.

Index: gcc/fortran/dependency.c
===
--- gcc/fortran/dependency.c(revision 242789)
+++ gcc/fortran/dependency.c(working copy)
@@ -101,7 +101,9 @@ identical_array_ref (gfc_array_ref *a1, 
 
   if (a1->type == AR_ELEMENT && a2->type == AR_ELEMENT)
 {
-  gcc_assert (a1->dimen == a2->dimen);
+  if (a1->dimen != a2->dimen)
+   gfc_internal_error ("identical_array_ref(): inconsistent dimensions");
+
   for (i = 0; i < a1->dimen; i++)
{
  if (gfc_dep_compare_expr (a1->start[i], a2->start[i]) != 0)
Index: gcc/testsuite/gfortran.dg/pr78279.f90
===
--- gcc/testsuite/gfortran.dg/pr78279.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr78279.f90   (working copy)
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! { dg-options "-Ofast" }
+program p
+   integer :: i
+   real :: z(2,4)
+   z = 0.0
+   do i = 1, 3
+  if ( z(i) > z(1,i+1) ) print *, i   ! { dg-error "mismatch in array 
reference" }
+   end do
+end


-- 
Steve


Re: [PATCH, rs6000] Fix PR78458, LRA ICE building libgcc for powerpc-linux-gnuspe e500v2

2016-11-23 Thread Peter Bergner

On 11/23/16 12:33 PM, Segher Boessenkool wrote:

On Wed, Nov 23, 2016 at 12:13:23PM -0600, Peter Bergner wrote:
Please put parens around NREGS and that last MODE.

[snip]

You don't need the default arguments, FWIW.

Okay for trunk.  Thanks!


Ok, commited as revision 242818 with your suggestions.  Thanks.

Peter





Add release notes for new TS 18661-1 macros in headers provided by GCC

2016-11-23 Thread Joseph Myers
I've applied this patch to add release notes for various new TS 18661-1 
macros added to the headers GCC provides.

(The GCC 7 release notes are still extremely incomplete in general, 
especially as regards the many diagnostic improvements in GCC 7, but 
probably in lots of other areas as well.)

Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.30
diff -u -r1.30 changes.html
--- htdocs/gcc-7/changes.html   23 Nov 2016 07:38:58 -  1.30
+++ htdocs/gcc-7/changes.html   24 Nov 2016 01:21:37 -
@@ -182,6 +182,27 @@
  enum operation { add, count };
^
 
+The limits.h header provided by GCC defines
+  macros such as INT_WIDTH for the width in bits of
+  integer types, if __STDC_WANT_IEC_60559_BFP_EXT__ is
+  defined before the header is included.
+  The stdint.h header defines such macros
+  as SIZE_WIDTH and INTMAX_WIDTH for the
+  width of some standard typedef names for integer types,
+  again if __STDC_WANT_IEC_60559_BFP_EXT__ is defined
+  before the header is included; note that GCC's implementation of
+  this header is only used for freestanding compilations, not hosted
+  compilations, on most systems.  These macros come from ISO/IEC TS
+  18661-1:2014.
+The float.h header provided by GCC defines
+  the macro CR_DECIMAL_DIG, from ISO/IEC TS 18661-1:2014,
+  if __STDC_WANT_IEC_60559_BFP_EXT__ is defined before
+  the header is included.  This represents the number of decimal
+  digits for which conversions between decimal character strings and
+  binary formats, in both directions, are correctly rounded, and
+  currently has the value of UINTMAX_MAX on all systems,
+  reflecting that GCC's compile-time conversions are correctly rounded
+  for any number of digits.
 
 
 C

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] avoid calling alloca(0)

2016-11-23 Thread Martin Sebor

On 11/23/2016 01:57 PM, Jeff Law wrote:

On 11/20/2016 04:06 PM, Martin Sebor wrote:

On 11/20/2016 01:03 AM, Bernd Edlinger wrote:

On 11/20/16 00:43, Martin Sebor wrote:

As best I can tell the result isn't actually used (the code that
uses the result gets branched over).  GCC just doesn't see it.
I verified this by changing the XALLOCAVEC macro to

  #define XALLOCAVEC(T, N)  (N ? alloca (sizeof (T) * N) : 0)

and bootstrapping and running the regression test suite with
no apparent regressions.



I would like this solution with braces around N better than
allocating one element when actually zero were requested.

The disadvantage of N+1 or N+!N is that it will hide true
programming errors from the sanitizer.


I agree.  Let me post an updated patch with the above fix and
leave it to others to choose which approach to go with.

Just so I'm clear, this part of the discussions is talking about
mitigation strategies within GCC itself, right?


That's right.


Can't we just
gcc_assert (x != 0) before the problematical calls?  That avoids
unnecessary over-allocation and gets us a clean ICE if we were to try
and call alloca with a problematical value.


gcc_assert works only in some instances (e.g., in c-ada-spec.c:191)
but not in others because some actually do make the alloca(0) call
at runtime: at a minimum, lto.c:3285, reg-stack.c:2008, and
tree-ssa-threadedge.c:344 assert during bootstrap.

After reviewing a few more of the XALLOCAVEC calls in the affected
files I suspect that those to alloca(0) pointed out by the warning
may be just a subset that GCC happens to see thanks to constant
propagation.  If that's so then changing this subset to
alloca(N + !N) or some such is probably not the right approach
because it will miss all the other calls where GCC doesn't see that
N is zero.  I think the most robust solution is to do as Bernd
suggests: change XALLOCAVEC as shown above.  That will let us
prevent any and all unsafe assumptions about the result of
alloca(0) being either non-null or distinct.  The other approach
would be to change XALLOCAVEC to add 1 to the byte count.  That
would be in line with what XMALLOC does.


I'm not sure we need to break things up.  I haven't seen a good argument
for that yet.

Is there an updated patch to post?  Or has it already been posted?


Given the above I suggest going with one of the attached two patches.

Martin


include/ChangeLog:

	* libiberty.h (XALLOCAVEC): Return null for zero-size allocations.

Index: include/libiberty.h
===
--- include/libiberty.h	(revision 242768)
+++ include/libiberty.h	(working copy)
@@ -353,7 +353,8 @@ extern unsigned int xcrc32 (const unsigned char *,
 
 /* Array allocators.  */
 
-#define XALLOCAVEC(T, N)	((T *) alloca (sizeof (T) * (N)))
+#define XALLOCAVEC(T, N) \
+  (T *) (((N) != 0) ? alloca (sizeof (T) * (N)) : 0)
 #define XNEWVEC(T, N)		((T *) xmalloc (sizeof (T) * (N)))
 #define XCNEWVEC(T, N)		((T *) xcalloc ((N), sizeof (T)))
 #define XDUPVEC(T, P, N)	((T *) xmemdup ((P), sizeof (T) * (N), sizeof (T) * (N)))
include/ChangeLog:

	* libiberty.h (XALLOCAVEC): Avoid calling alloca with a zero argument.

Index: include/libiberty.h
===
--- include/libiberty.h	(revision 242768)
+++ include/libiberty.h	(working copy)
@@ -353,7 +353,7 @@ extern unsigned int xcrc32 (const unsigned char *,
 
 /* Array allocators.  */
 
-#define XALLOCAVEC(T, N)	((T *) alloca (sizeof (T) * (N)))
+#define XALLOCAVEC(T, N)	((T *) alloca (sizeof (T) * (N) + 1))
 #define XNEWVEC(T, N)		((T *) xmalloc (sizeof (T) * (N)))
 #define XCNEWVEC(T, N)		((T *) xcalloc ((N), sizeof (T)))
 #define XDUPVEC(T, P, N)	((T *) xmemdup ((P), sizeof (T) * (N), sizeof (T) * (N)))


Re: [PATCH] Turn on gnu-indirect-function by default on PowerPC servers

2016-11-23 Thread Joseph Myers
On Wed, 23 Nov 2016, Michael Meissner wrote:

> Since some of the embedded hosts use powerpc-*-linux, I only set the
> default if the target is a 64-bit PowerPC system.  I tested the compiler
> manually to verify that ifunc support was enabled, and it was.  I did a
> boostrap build/check cycle on a little endian power8 system and there were
> regressions.  Can I install the patch on the trunk?
> 
> Note, the documentation for the --enable-gnu-indirect-option does not list
> which systems current enable the option, so I did not modify the
> documentation.

I don't think this should have anything to do with embedded/server, or 
about whether it's configured for 32-bit or 64-bit default.  
powerpc-linux-gnu --enable-targets=all -m64 should behave the same as 
powerpc64-linux-gnu.

We have information about the target glibc version in configure.ac, 
determined from the --with-glibc-version option (in the case of 
bootstrapping a cross toolchain) or from target headers when available 
(native and non-bootstrap cross builds).  That's the right information to 
test (based on whatever version of glibc got the required support for the 
given architecture).  (The relevant configure.ac code is after config.gcc 
is processed.  Maybe have a config.gcc setting to say "enable if glibc 
>= this version" and then rearrange the configure.ac code so glibc 
versions are processed before enable-gnu-indirect-function?)

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Turn on gnu-indirect-function by default on PowerPC servers

2016-11-23 Thread Michael Meissner
This patch changes the default on 64-bit PowerPC Linux systems for the
--enable-gnu-indirect-function configuration option (i.e. support for the ifunc
attribute) to mirror the x86_64/i386/s390x systems that assume the use of
glibcs that support it.

Since some of the embedded hosts use powerpc-*-linux, I only set the
default if the target is a 64-bit PowerPC system.  I tested the compiler
manually to verify that ifunc support was enabled, and it was.  I did a
boostrap build/check cycle on a little endian power8 system and there were
regressions.  Can I install the patch on the trunk?

Note, the documentation for the --enable-gnu-indirect-option does not list
which systems current enable the option, so I did not modify the
documentation.

2016-11-23  Michael Meissner  

* config.gcc (powerpc*-*-linux*): Set gnu-indirect-function by
default on Power server linux systems.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 242797)
+++ gcc/config.gcc  (working copy)
@@ -2443,6 +2443,15 @@ powerpc*-*-linux*)
if test x${enable_secureplt} = xyes; then
tm_file="rs6000/secureplt.h ${tm_file}"
fi
+   # Assume modern glibc if not targeting Android nor uclibc and
+   # generating code for a Power server (64-bit) environment
+   case ${target} in
+   *-*-*android*|*-*-*uclibc*|*-*-*musl*)
+   ;;
+   powerpc64*)
+   default_gnu_indirect_function=yes
+   ;;
+   esac
;;
 powerpc-wrs-vxworks|powerpc-wrs-vxworksae|powerpc-wrs-vxworksmils)
tm_file="${tm_file} elfos.h freebsd-spec.h rs6000/sysv4.h"


[PATCH, rfc] combine: Make code after a new trap unreachable (PR78432)

2016-11-23 Thread Segher Boessenkool
Combine can turn a conditional trap into an unconditional trap.  If it
does that it should make the code after it unreachable (an unconditional
trap should be the last insn in its bb, and that bb has no successors).

This patch seems to work.  It is hard to be sure, this is very hard to
trigger.  Quite a few other passes look like they need something similar
as well, but I don't see anything else handling it yet either.

How does this look?  Any better ideas?


Segher

---
 gcc/combine.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index 18777f8..90397f58 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "target.h"
 #include "rtl.h"
 #include "tree.h"
+#include "cfghooks.h"
 #include "predict.h"
 #include "df.h"
 #include "memmodel.h"
@@ -4648,6 +4649,25 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   update_cfg_for_uncondjump (undobuf.other_insn);
 }
 
+  if (GET_CODE (PATTERN (i3)) == TRAP_IF
+  && XEXP (PATTERN (i3), 0) == const1_rtx)
+{
+  basic_block bb = BLOCK_FOR_INSN (i3);
+  gcc_assert (bb);
+  remove_edge (split_block (bb, i3));
+  *new_direct_jump_p = 1;
+}
+
+  if (undobuf.other_insn
+  && GET_CODE (PATTERN (undobuf.other_insn)) == TRAP_IF
+  && XEXP (PATTERN (undobuf.other_insn), 0) == const1_rtx)
+{
+  basic_block bb = BLOCK_FOR_INSN (undobuf.other_insn);
+  gcc_assert (bb);
+  remove_edge (split_block (bb, undobuf.other_insn));
+  *new_direct_jump_p = 1;
+}
+
   /* A noop might also need cleaning up of CFG, if it comes from the
  simplification of a jump.  */
   if (JUMP_P (i3)
-- 
1.9.3



[PATCH] PR fortran/78500 -- Yet another NULL pointer dereference

2016-11-23 Thread Steve Kargl
Regression tested on x86_64-*-freebsd.  OK to commit?

2016-11-23  Steven G. Kargl  

PR fortran/78500
* expr.c (gfc_check_vardef_contextm): YANPD
* interface.c (matching_typebound_op): Ditto.

2016-11-23  Steven G. Kargl  

PR fortran/78500
* gfortran.dg/pr78500.f90: New test.


Index: gcc/fortran/expr.c
===
--- gcc/fortran/expr.c  (revision 242789)
+++ gcc/fortran/expr.c  (working copy)
@@ -5291,7 +5291,8 @@ gfc_check_vardef_context (gfc_expr* e, b
  component.  Note that (normal) assignment to procedure pointers is not
  possible.  */
   check_intentin = !own_scope;
-  ptr_component = (sym->ts.type == BT_CLASS && CLASS_DATA (sym))
+  ptr_component = (sym->ts.type == BT_CLASS && sym->ts.u.derived
+  && CLASS_DATA (sym))
  ? CLASS_DATA (sym)->attr.class_pointer : sym->attr.pointer;
   for (ref = e->ref; ref && check_intentin; ref = ref->next)
 {
Index: gcc/fortran/interface.c
===
--- gcc/fortran/interface.c (revision 242789)
+++ gcc/fortran/interface.c (working copy)
@@ -3885,7 +3885,7 @@ matching_typebound_op (gfc_expr** tb_bas
 
if (base->expr->ts.type == BT_CLASS)
  {
-   if (CLASS_DATA (base->expr) == NULL
+   if (!base->expr->ts.u.derived || CLASS_DATA (base->expr) == NULL
|| !gfc_expr_attr (base->expr).class_ok)
  continue;
derived = CLASS_DATA (base->expr)->ts.u.derived;
Index: gcc/testsuite/gfortran.dg/pr78500.f90
===
--- gcc/testsuite/gfortran.dg/pr78500.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr78500.f90   (working copy)
@@ -0,0 +1,5 @@
+! { dg-do compile }
+class(t) function f() ! { dg-error "must be dummy, allocatable or pointer" }
+   f = 1  ! { dg-error "variable must not be polymorphic" }
+end
+

-- 
Steve


[PATCH] combine: Query can_change_dest_mode before changing dest mode

2016-11-23 Thread Segher Boessenkool
As reported in https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02388.html .

Changing the mode of a hard register can lead to problems, or at least
it can make worse code if the result will need reloads.

Tested on avr-elf on the test in that email, and bootstrapped and
regression checked on powerpc64-linux {-m32,-m64}.  Committing to trunk.


Segher


2016-11-23  Segher Boessenkool  

* combine.c (change_zero_ext): Only change the mode of a hard register
destination if can_change_dest_mode holds for that.

---
 gcc/combine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index eef917a..c8a71eb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11287,7 +11287,8 @@ change_zero_ext (rtx pat)
   else if (GET_CODE (x) == ZERO_EXTEND
   && SCALAR_INT_MODE_P (mode)
   && REG_P (XEXP (x, 0))
-  && HARD_REGISTER_P (XEXP (x, 0)))
+  && HARD_REGISTER_P (XEXP (x, 0))
+  && can_change_dest_mode (XEXP (x, 0), 0, mode))
{
  size = GET_MODE_PRECISION (GET_MODE (XEXP (x, 0)));
  x = gen_rtx_REG (mode, REGNO (XEXP (x, 0)));
-- 
1.9.3



Re: [patch] Fix PR rtl-optimization/78437

2016-11-23 Thread Eric Botcazou
> It would need strict_low_part unless we're dealing with subwords. I
> think the patch should maybe check for that in the !W_R_O case.

The code already does the check, it simply won't mess with strict_low_part.

> Does WORD_REGISTER_OPERATIONS really buy much on targets that use it?

Yes, it makes it possible for the combiner to eliminate 2 redundant extensions 
on Alpha for the reduced testcase provided by Uros (that's why there is the 
mode mismatch).  And it's even more helpful if the loads are not explicitly 
extended, like on SPARC (in combination with LOAD_EXTEND_OP of course).

-- 
Eric Botcazou


Re: Add another e500 subreg pattern

2016-11-23 Thread Joseph Myers
On Wed, 23 Nov 2016, Segher Boessenkool wrote:

> > --- gcc/testsuite/gcc.c-torture/compile/20161123-1.c(nonexistent)
> > +++ gcc/testsuite/gcc.c-torture/compile/20161123-1.c(working copy)
> > @@ -0,0 +1,7 @@
> > +double
> > +f (long double x)
> > +{
> > +  union { long double ld; double d[2]; } u;
> > +  u.ld = x;
> > +  return u.d[1];
> > +}
> 
> This is undefined behaviour with -mlong-double-64; do we care?

It's a compile test, which is clearly required to compile for all 
architectures and ABIs, including the cases where it would read 
uninitialized memory on execution.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Tweak LRA handling of shared spill slots

2016-11-23 Thread Jeff Law

On 11/15/2016 09:14 AM, Richard Sandiford wrote:

The previous code processed the users of a stack slot in order of
decreasing size and allocated the slot based on the first user.
This seems a bit dangerous, since the ordering is based on the
mode of the biggest reference while the allocation is based also
on the size of the register itself (which I think could be larger).

That scheme doesn't scale well to polynomial sizes, since there's
no guarantee that the order of the sizes is known at compile time.
This patch instead records an upper bound on the size required
by all users of a slot.  It also records the maximum alignment
requirement.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* function.h (spill_slot_alignment): Declare.
* function.c (spill_slot_alignment): New function.
* lra-spills.c (slot): Add align and size fields.
(assign_mem_slot): Use them in the call to assign_stack_local.
(add_pseudo_to_slot): Update the fields.
(assign_stack_slot_num_and_sort_pseudos): Initialise the fields.
OK.  I like that we can avoid the BYTES_BIG_ENDIAN un-correction by 
allocating a BLKmode slot.  Thankfully that code was something I was 
already familiar with by way of an H8 bugfix eons ago.


Jeff



Fix sprintf buffer warning in assemble_name

2016-11-23 Thread Jeff Law



Until we sort out a reasonable API and implementation for building up 
label strings, this will have to be sufficient.


As outlined earlier this month, when we build up label names, part of 
the name is hidden in the backend, part in the caller.  As a result 
there's no good way to get the length and size the array.


sparc uses more characters than any other in-tree port and as a result 
it could (with sufficient local labels) do an out-of-range write into 
the name array when building a local label.


This patch increases the buffer by one char which is sufficient to avoid 
the problem.


Installing on the trunk.

Jeff
commit c348c67f8c305df504e88659eecf0f5c9994618a
Author: law 
Date:   Wed Nov 23 22:48:45 2016 +

* varasm.c (assemble_name): Increase buffer size for name.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242810 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 23f2c16..46b023e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,7 @@
 2016-11-23  Jeff Law  
 
+   * varasm.c (assemble_name): Increase buffer size for name.
+
* config/spu/spu.md (floatunsdidf2): Remove unused local variable.
 
 2016-11-23  Jakub Kicinski  
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 2c238950..2a20f64 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -2547,7 +2547,7 @@ assemble_name (FILE *file, const char *name)
 rtx
 assemble_static_space (unsigned HOST_WIDE_INT size)
 {
-  char name[16];
+  char name[17];
   const char *namestring;
   rtx x;
 


Re: [patch, nios2] Fix PR78357, adjust sync builtin initialization

2016-11-23 Thread Jeff Law

On 11/16/2016 02:53 AM, Chung-Lin Tang wrote:

This patch adjusts the initialization of __sync built-in functions:
instead of conditionalizing on TARGET_LINUX_ABI, directly place the
target-hook #define in config/nios2/linux.h.  This appears to be in line
with other similar ports, e.g. m68k.

Sebastian, this should solve your issue of not wanting __sync_* libcalls
generated on RTEMS (which also uses TARGET_LINUX_ABI due to TLS support),
can you verify it works for you?

Chung-Lin

PR target/78357
* config/nios2/nios2.c (nios2_init_libfuncs): Remove TARGET_LINUX_ABI
condition.
(TARGET_INIT_LIBFUNCS): Delete definition and...
* config/nios2/linux.h (TARGET_INIT_LIBFUNCS): ...move to here, add
comments.


I fear you may have botched this;

g++ -fno-PIE -c   -g  -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE 
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall 
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. 
-I/home/law/gcc-testing/gcc/gcc -I/home/law/gcc-testing/gcc/gcc/. 
-I/home/law/gcc-testing/gcc/gcc/../include 
-I/home/law/gcc-testing/gcc/gcc/../libcpp/include 
-I/opt/cfarm/mpc/include 
-I/home/law/gcc-testing/gcc/gcc/../libdecnumber 
-I/home/law/gcc-testing/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/home/law/gcc-testing/gcc/gcc/../libbacktrace   -o nios2.o -MT nios2.o 
-MMD -MP -MF ./.deps/nios2.TPo 
/home/law/gcc-testing/gcc/gcc/config/nios2/nios2.c
/home/law/gcc-testing/gcc/gcc/config/nios2/nios2.c:3608:1: error: ‘void 
nios2_init_libfuncs()’ defined but not used [-Werror=unused-function]

 nios2_init_libfuncs (void)

nios2-elf configuration

jeff


Fix trivial spu goof

2016-11-23 Thread Jeff Law


spu.md's floatunsdidf2 pattern has an unused local variable that 
prevents the port from building with config-list.mk.


This patch removes the unused local variable.  Installing on the trunk.

Jeff
commit c418b1aeb420bbce4f4d767bc1adc83a6009398c
Author: law 
Date:   Wed Nov 23 22:17:29 2016 +

* config/spu/spu.md (floatunsdidf2): Remove unused local variable.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242807 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 40fec3d..23f2c16 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2016-11-23  Jeff Law  
+
+   * config/spu/spu.md (floatunsdidf2): Remove unused local variable.
+
 2016-11-23  Jakub Kicinski  
 
* doc/extend.texi: Constify first argument to __builtin_object_size.
diff --git a/gcc/config/spu/spu.md b/gcc/config/spu/spu.md
index 1061cb8..1337344 100644
--- a/gcc/config/spu/spu.md
+++ b/gcc/config/spu/spu.md
@@ -939,7 +939,7 @@
 (unsigned_float:DF (match_operand:DI 1 "register_operand"   "r")))]
   ""
   "{
-rtx value, insns;
+rtx value;
 rtx c0 = spu_const_from_ints (V16QImode, 0x02031011, 0x12138080, 
  0x06071415, 0x16178080);
 rtx c1 = spu_const_from_ints (V4SImode, 1023+63, 1023+31, 0, 0);


Re: PR78153

2016-11-23 Thread Rainer Orth
Hi Prathamesh,

> Thanks, I committed the attached patch as r242786 after
> bootstrap+test on x86_64-unknown-linux-gnu and
> cross-test on arm*-*-*, aarch64*-*-*.

this patch broke Ada bootstrap on Solaris.  I've filed PR middle-end/78501.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix print_node for CONSTRUCTORs

2016-11-23 Thread Jeff Law

On 11/10/2016 06:08 AM, Martin Liška wrote:

Hello.

Following patch fixes indentation of print_node when printing a constructor
that has some equal elements. Current implementation caches tree to prevent deep
debug outputs. Such behavior is undesired for ctor elements. Apart from that,
I switch to hash_set for a table that is used for tree node caching.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Before:
  
constant 1>
val  
constant 120>
idx  
constant 2> val 
idx  
constant 3> val 
idx  
constant 4> val 
idx  constant 
5> val >

After:
  
constant 1>
val  
constant 120>
idx  
constant 2>
val  
constant 120>
idx  
constant 3>
val  
constant 120>
idx  
constant 4>
val  
constant 120>
idx  
constant 5>
val  constant 
120>>

Ready to be installed?
Martin


0001-Fix-print_node-for-CONSTRUCTORs.patch


From 6d18ff00ec1d8e6a8a154fbb70af25b2dda8165e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 16:28:52 +0100
Subject: [PATCH] Fix print_node for CONSTRUCTORs

gcc/ChangeLog:

2016-11-10  Martin Liska  

* print-tree.c (struct bucket): Remove.
(print_node): Add new argument which drives whether a tree node
is printed briefly or not.
(debug_tree): Replace a custom hash table with hash_set.
* print-tree.h (print_node): Add the argument.

OK.

jeff



Re: Fix e500 offset handling for TImode

2016-11-23 Thread Segher Boessenkool
On Wed, Nov 23, 2016 at 09:16:33PM +, Joseph Myers wrote:
> 2016-11-23  Joseph Myers  
> 
>   * config/rs6000/rs6000.c (rs6000_legitimate_offset_address_p): For
>   TARGET_E500_DOUBLE. handle TDmode, TImode and PTImode the same as
>   TFmode, IFmode and KFmode.

This is fine.  Thanks,


Segher


Re: [PATCH] docs: constify argument to __builtin_object_size()

2016-11-23 Thread Jeff Law

On 11/11/2016 06:40 AM, Jakub Kicinski wrote:

It's OK to pass const pointers to __builtin_object_size(),
correct the documentation.

Signed-off-by: Jakub Kicinski 
---
 gcc/doc/extend.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

THanks.  Installed.

jeff


Re: Add another e500 subreg pattern

2016-11-23 Thread Segher Boessenkool
On Wed, Nov 23, 2016 at 09:14:44PM +, Joseph Myers wrote:
> This patch adds an insn pattern similar to various patterns already
> present to handle extracting such a subreg.  This allows the glibc
> build to get further, until it runs into an assembler error for which
> I have another patch.  OK to commit?
> 
> gcc:
> 2016-11-23  Joseph Myers  <jos...@codesourcery.com>
> 
>   * config/rs6000/spe.md (*frob__ti_8): New insn
>   pattern.
> 
> gcc/testsuite:
> 2016-11-23  Joseph Myers  <jos...@codesourcery.com>
> 
>   * gcc.c-torture/compile/20161123-1.c: New test.

Okay, thanks!  One question...

> --- gcc/testsuite/gcc.c-torture/compile/20161123-1.c  (nonexistent)
> +++ gcc/testsuite/gcc.c-torture/compile/20161123-1.c  (working copy)
> @@ -0,0 +1,7 @@
> +double
> +f (long double x)
> +{
> +  union { long double ld; double d[2]; } u;
> +  u.ld = x;
> +  return u.d[1];
> +}

This is undefined behaviour with -mlong-double-64; do we care?


Segher


[Committed] PR fortran/78297 -- NULL pointer dereference

2016-11-23 Thread Steve Kargl
I've committed the following patch, which avoids a NULL
pointer dereference.

2016-11-23  Steven G. Kargl  

PR fortran/78297
* trans-common.c (finish_equivalences): Do not dereference a NULL 
pointer.

2016-11-23  Steven G. Kargl  

PR fortran/78297
* gfortran.dg/pr78297.f90: New test.

Index: gcc/fortran/trans-common.c
===
--- gcc/fortran/trans-common.c  (revision 242638)
+++ gcc/fortran/trans-common.c  (working copy)
@@ -1246,8 +1246,12 @@ finish_equivalences (gfc_namespace *ns)
  {
c = gfc_get_common_head ();
/* We've lost the real location, so use the location of the
-  enclosing procedure.  */
-   c->where = ns->proc_name->declared_at;
+  enclosing procedure.  If we're in a BLOCK DATA block, then
+  use the location in the sym_root.  */
+   if (ns->proc_name)
+ c->where = ns->proc_name->declared_at;
+   else if (ns->is_block_data)
+ c->where = ns->sym_root->n.sym->declared_at;
strcpy (c->name, z->module);
  }
else

Index: gcc/testsuite/gfortran.dg/pr78297.f90
===
--- gcc/testsuite/gfortran.dg/pr78297.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr78297.f90   (working copy)
@@ -0,0 +1,11 @@
+! { dg-do compile }
+module m
+   real :: a(2), b(2)
+   real :: c(2), d(2)
+   equivalence (a, b)
+   equivalence (c, d)
+   common /xcom/ a
+end
+block data
+   use m
+end block data
-- 
Steve


Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386.

2016-11-23 Thread Uros Bizjak
On Wed, Nov 23, 2016 at 8:01 PM, Bernd Schmidt  wrote:
> On 11/23/2016 07:57 PM, Bernd Schmidt wrote:
>>
>> 2. The i386 backend mishandles SET rtxs. If you have a fairly plain
>> single-insn SET, you tend to get COSTS_N_INSNS (2) out of set_rtx_cost,
>> because rtx_costs has a default of COSTS_N_INSNS (1) for a SET, and you
>> get the cost of the src in addition to that.

Looks good to me.

Thanks,
Uros.


Re: [Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE

2016-11-23 Thread Jeff Law

On 11/11/2016 11:41 AM, Jiong Wang wrote:

On 24/10/16 16:22, Jeff Law wrote:


Asserting couldn't hurt.  I'd much rather have the compiler issue an
error, ICE or somesuch than silently not generate a call to the stack
protector fail routine.


Hi Jeff,

  I have just send out the other patch which accelerates
-fstack-protector on
  AArch64, more background information there at:

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01168.html

Confirms what I suspected :-)



  Previous, I was emptying three target insns/hooks, and was relying GCC to
  optimize all remaining SSP runtime stuff out.  I am thinking it's
better and
  safer that GCC allow one backend to disable the default SSP runtime
cleanly,
  so the backend does't need to rely on optimization level, and libssp
is not
  needed at all optimization level.

  In this new patch, I introduced a new target macro for SSP to allow one
  backend GCC's default SSP runtime generation be disabled.

  How does this looks to you?

  Thanks.

gcc/
2016-11-11  Jiong Wang  
* function.c (expand_function_end): Guard stack_protect_epilogue
with
ENABLE_DEFAULT_SSP_RUNTIME.
* cfgexpand.c (pass_expand::execute): Likewise guard for
stack_protect_prologue.
* defaults.h (ENABLE_DEFAULT_SSP_RUNTIME): New macro.  Default
set to 1.
* doc/tm.texi.in (Misc): Documents ENABLE_DEFAULT_SSP_RUNTIME.
* doc/tm.texi: Regenerate.

Like Joseph, I think this should be a hook rather than a new target 
macro.  I do think it's closer to the right track though (separation of 
access to the guard from the rest of the SSP runtime bits).


Jeff


Re: [Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE

2016-11-23 Thread Jeff Law

On 10/24/2016 10:29 AM, Jiong Wang wrote:

Right.  But your change could mask backend problems.  Specifically if

their expander for stack_protect_fail did fail and returned NULL_TREE.

That would cause it to silently ignore stack protector failures, which
seems inadvisable.

Is there another way you can re-use the analysis code without
resorting to something like this?


In my case, I only want the canary variable which is
"crtl->stack_protect_guard", then I don't want the current runtime
support which GCC will always generate once crl->stack_protect_guard is
initialized.
Presumably you're using this with the new AArch64 security features. 
ISTM we ought to be able to separate access to the guard from the rest 
of the runtime bits.


Jeff


Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Martin Sebor

On 11/23/2016 01:30 PM, Jeff Law wrote:

On 11/23/2016 01:09 PM, Martin Sebor wrote:


I hadn't thought of extending the gimple-ssa-sprintf pass to all
the memxxx and strxxx builtins.  The _chk functions are already
being handled in builtins.c so calling compute_builtin_object_size
for the non-checking ones there and detecting overflow in those
was an easy and, I had hoped, non-controversial enhancement to make.
In a discussion of bug 77784 (handled in the patch for bug 53562)
Jakub also expressed a preference for some of the diagnostics
staying in builtins.c.

I'm just trying to understand how the pieces fit together.  I wasn't
aware of Jakub's desire to keep them in builtins.c.


After thinking about it a bit it does seem that having all the size
and buffer overflow checking (though not necessarily BOS itself) in
the same place or pass would make sense.


I also suspect that the answer to your question is yes.  Range
information is pretty bad in the gimple-ssa-sprintf pass (it looks
like it runs after EVRP but before VRP).  Maybe the pass should run
after VRP?

Let's investigate this separately rather than draw in additional
potential issues.  But I do think this is worth investigation.


Sounds good.





That said, I defer to you on how to proceed here.  I'm prepared
to do the work(*) but I do worry about jeopardizing the chances
of this patch and the others making it into 7.0.

So would it make sense to just init/fini the b_o_s framework in your
pass and for builtin expansion?


I think that should work for the sprintf checking.  Let me test it.
We can deal with the memxxx and strxxx patch (53562) independently
if you prefer.

Thanks
Martin


Re: [PATCH] Dump probability for edges a frequency for BBs

2016-11-23 Thread Jeff Law

On 11/11/2016 06:59 AM, Martin Liška wrote:

Hello.

I spent quite time during this stage1 playing with predictors and we found
with Honza multiple situations where a prediction was oddly calculated.
Thus, we're suggesting to enhance default dump format to show BB frequencies
and edge probabilities, as follows:

main (int a)
{
  int _1;

   [100.0%]:
  if (a_2(D) == 123)
goto  (); [18.8%]
  else
goto  (sparta); [81.2%]

sparta [81.2%]:
  switch (a_2(D))  [33.3%], case 1 ... 2:  [66.7%]>

 [27.1%]:

  # _1 = PHI <2(2), 3(4), a_2(D)(3)>
 [100.0%]:
  return _1;

}

That would exhibit these numbers to people, which would eventually report
strange numbers seen in dump files.
I was quite surprised that the patch does not break many scanning tests.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Thoughts?
Martin


0001-Dump-probability-for-edges-a-frequency-for-BBs.patch


From 5b7d8393564a0111698b58989ac74b45cf019701 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 14:11:48 +0100
Subject: [PATCH] Dump probability for edges a frequency for BBs

gcc/ChangeLog:

2016-11-11  Martin Liska  

* gimple-pretty-print.c (dump_edge_probability): New function.
(dump_gimple_switch): Dump label edge probabilities.
(dump_gimple_cond): Likewise.
(dump_gimple_label): Dump
(dump_gimple_bb_header): Dump basic block frequency.
(pp_cfg_jump): Replace e->dest argument with e.
(dump_implicit_edges): Likewise.
* tree-ssa-loop-ivopts.c (get_scaled_computation_cost_at):
Use gimple_bb (at) instead of at->bb.

gcc/testsuite/ChangeLog:

2016-11-11  Martin Liska  

* gcc.dg/builtin-unreachable-6.c: Update test to not to scan
parts for frequencies/probabilities.
* gcc.dg/pr34027-1.c: Likewise.
* gcc.dg/strict-overflow-2.c: Likewise.
* gcc.dg/tree-ssa/20040703-1.c: Likewise.
* gcc.dg/tree-ssa/builtin-sprintf-2.c: Likewise.
* gcc.dg/tree-ssa/pr32044.c: Likewise.
* gcc.dg/tree-ssa/vector-3.c: Likewise.
* gcc.dg/tree-ssa/vrp101.c: Likewise.
* gcc.dg/tree-ssa/dump-2.c: New test.
LGTM.  While I've often found a way to get this stuff when looking at 
probability updating code, having it in the standard dumps seems like a 
good enhancement.


jeff



Re: [PATCH] avoid calling alloca(0)

2016-11-23 Thread Jeff Law

On 11/18/2016 10:14 AM, Martin Sebor wrote:


Most apps know what malloc (0) means and treat it correctly, they know
they
shouldn't dereference the pointer, because it is either NULL or holds an
array with 0 elements.  I fail to see why you would want to warn.


Because people make mistakes and warnings help us avoid them (isn't
that obvious?)  Just because we get it right most of the time doesn't
mean we get right all of the time.  The papers and exploits I pointed
to should provide ample evidence that zero allocations are a problem
that can have serious (and costly) consequences.  We (i.e., Red Hat)
recognize this risk and have made it our goal to help stem some of
these problems.
I suspect most applications don't ever do malloc (0) and that its 
appearance is more likely an indicator of a bug.  That's what I want to 
cater to -- finding bugs before they get out into the field.


For the oddball application that wants to malloc (0) and try to do 
something sensible, they can turn off the warning.


So I'm in agreement with Martin here.




But malloc(0) has also been known to result from unsigned overflow
which has led to vulnerabilities and exploits (famously by the JPG
COM vulnerability exploit, but there are plenty of others, including
for instance CVE-2016-2851).  Much academic research has been devoted
to this problem and to solutions to detect and prevent it.


How is it different from overflowing to 1 or 2 or 27?  What is special on
the value 0?


It's a case that, unlike the others, can be readily detected.  It
would be nice to detect the others as well but GCC can't do that
yet.  That doesn't mean we shouldn't try to detect at least the
small subset for now.  It doesn't have to be perfect for it to be
useful.
Right.  And as I know I've mentioned to some folks, I'd really like us 
to be pondering a "may overflow" warning for expressions  that feed into 
an allocator.  There's a lot of value in that, but I suspect a lot of 
noise as well.








In the absence of a policy or guidelines it's a matter of opinion
whether or not this warning belongs in either -Wall or -Wextra.


It IMHO doesn't belong to either of these.


You've made that quite clear.  I struggle to reconcile your
position in this case with the one in debate about the
-Wimplicit-fallthrough option where you insisted on the exact
opposite despite the overwhelming number of false positives
caused by it.  Why is it appropriate for that option to be in
-Wextra and not this one?
I disagree with Jakub on this.  I think the warning would be fine for 
Wextra.  It's a lot less invasive than other stuff that's gone in there.


jeff


Re: [PATCH] avoid calling alloca(0)

2016-11-23 Thread Jeff Law

On 11/18/2016 10:25 AM, Jakub Jelinek wrote:

On Fri, Nov 18, 2016 at 10:14:09AM -0700, Martin Sebor wrote:

Because people make mistakes and warnings help us avoid them (isn't
that obvious?)  Just because we get it right most of the time doesn't
mean we get right all of the time.  The papers and exploits I pointed
to should provide ample evidence that zero allocations are a problem
that can have serious (and costly) consequences.  We (i.e., Red Hat)
recognize this risk and have made it our goal to help stem some of
these problems.


Are you talking about cases where user writes malloc (0) explicitly, or
where user writes malloc (n); and the optimizers figure out n is 0,
either always, or e.g. because the compiler decided to duplicate the code
and and in one of the branches it proves it is 0, or you want to add a
runtime warning when malloc is called with 0?
E.g. I don't see how writing malloc (0) in the code should have anything
to do with any overflows.  The cases where jump threading creates cases
where we call functions with constant arguments and we then warn on them
is also problematic, often such code is even dead except the compiler is not
able to prove it.
I would strongly suggest against using explicit vs implicit for this 
kind of thing.  Various transformations can take an implicit and turn it 
into an explicit.  Various transformations may take what starts off as a 
complex expression and eventually through cse, cprop, etc the expression 
collapses down to a 0 which could be explicit or implicit.


I believe we should be warning on trying to allocation 0 bytes of memory 
via malloc, realloc or alloca, with the exception of a non-builtin 
alloca with no return value, but I think we've covered that elsewhere 
and Martin's code will DTRT more by accident than design.


Are there going to be false positives, probably.  But I think Martin has 
already shown the false positive rate to be far lower than many other 
warnings.


Are there cases where someone might explicitly do malloc (0) and do so 
in a safe, but potentially non-portable manner.  Absolutely.  But that 
doesn't mean we should cater to that particular case.






It IMHO doesn't belong to either of these.


You've made that quite clear.  I struggle to reconcile your
position in this case with the one in debate about the
-Wimplicit-fallthrough option where you insisted on the exact
opposite despite the overwhelming number of false positives
caused by it.  Why is it appropriate for that option to be in
-Wextra and not this one?


It also matters what one can do to tell the compiler the code is fine.
For e.g. -Wimplicit-fallthrough and many other warnings one can add
a comment, or attribute, etc. to get the warning out of the way.
But the patch you've posted for the alloca (0) stuff contained
changes of n to n + !n, so the warning basically asks people to slow
down their code for no reason, just to get the warning out of the way.
I'm not buying that as a valid argument.  I suspect the amount of 
computing power to shuttle around the messages on this topic already 
outweighs the computing power for all the + !ns that might get added to 
code to avoid this warning.  Note that folks routinely have put 
initializers in code to avoid false positives from our uninitialized 
variables.


IMHO, the biggest argument against the + !n is that it makes the code 
less clear.  I think a simple assertion is a better choice.  It clearly 
documents that zero is not expected, stops the program if it does indeed 
occur and can be optimized away just as effectively as +!n when n is 
known to be nonzero.


Jeff


Fix e500 offset handling for TImode

2016-11-23 Thread Joseph Myers
Given my previous fix for a missing insn pattern for e500, building
glibc runs into an assembler error "Error: operand out of range (256
is not between 0 and 248)".  This comes from an insn:

(insn 115 1209 1210 (set (reg:DF 27 27 [orig:294 _129 ] [294])
(subreg:DF (mem/c:TI (plus:SI (reg/f:SI 1 1)
(const_int 256 [0x100])) [14 %sfp+256 S16 A128]) 0)) 1909 
{*frob_df_ti}
 (nil))

This patch adjusts the offset handling for TImode - and TDmode and
PTImode in case such subregs can arise for them - to be the same as
for TFmode, so that proper SPE offset checks are made in the
TARGET_E500_DOUBLE case.

This allows the glibc build to complete.  Testing shows 372 FAILs
across the gcc, g++ and libstdc++ testsuites; more cleanup is
certainly needed, but this gets to the point where the toolchain at
least builds so it's possible to compare test results when fixing
bugs.

2016-11-23  Joseph Myers  

* config/rs6000/rs6000.c (rs6000_legitimate_offset_address_p): For
TARGET_E500_DOUBLE. handle TDmode, TImode and PTImode the same as
TFmode, IFmode and KFmode.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 242751)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -8443,14 +8443,13 @@ rs6000_legitimate_offset_address_p (machine_mode m
 case TFmode:
 case IFmode:
 case KFmode:
+case TDmode:
+case TImode:
+case PTImode:
   if (TARGET_E500_DOUBLE)
return (SPE_CONST_OFFSET_OK (offset)
&& SPE_CONST_OFFSET_OK (offset + 8));
-  /* fall through */
 
-case TDmode:
-case TImode:
-case PTImode:
   extra = 8;
   if (!worst_case)
break;

-- 
Joseph S. Myers
jos...@codesourcery.com


Add another e500 subreg pattern

2016-11-23 Thread Joseph Myers
Building glibc for powerpc-linux-gnuspe --enable-e500-double, given
the patch <https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02404.html>
applied, fails with errors such as:

../sysdeps/ieee754/ldbl-128ibm/s_modfl.c: In function '__modfl':
../sysdeps/ieee754/ldbl-128ibm/s_modfl.c:91:1: error: unrecognizable insn:
 }
 ^
(insn 31 30 32 2 (set (reg:DF 203)
(subreg:DF (reg:TI 202) 8)) 
"../sysdeps/ieee754/ldbl-128ibm/s_modfl.c":44 -1
 (nil))
../sysdeps/ieee754/ldbl-128ibm/s_modfl.c:91:1: internal compiler error: in 
extract_insn, at recog.c:2311

This patch adds an insn pattern similar to various patterns already
present to handle extracting such a subreg.  This allows the glibc
build to get further, until it runs into an assembler error for which
I have another patch.  OK to commit?

gcc:
2016-11-23  Joseph Myers  <jos...@codesourcery.com>

* config/rs6000/spe.md (*frob__ti_8): New insn
pattern.

gcc/testsuite:
2016-11-23  Joseph Myers  <jos...@codesourcery.com>

* gcc.c-torture/compile/20161123-1.c: New test.

Index: gcc/config/rs6000/spe.md
===
--- gcc/config/rs6000/spe.md(revision 242751)
+++ gcc/config/rs6000/spe.md(working copy)
@@ -2314,6 +2314,18 @@
 }
 })
 
+(define_insn "*frob__ti_8"
+  [(set (match_operand:SPE64 0 "nonimmediate_operand" "=r")
+(subreg:SPE64 (match_operand:TI 1 "input_operand" "r") 8))]
+  "(TARGET_E500_DOUBLE && mode == DFmode)
+   || (TARGET_SPE && mode != DFmode)"
+{
+  if (WORDS_BIG_ENDIAN)
+return "evmergelo %0,%Y1,%Z1";
+  else
+return "evmergelo %0,%Z1,%Y1";
+})
+
 (define_insn "*frob_tf_ti"
   [(set (match_operand:TF 0 "gpc_reg_operand" "=r")
 (subreg:TF (match_operand:TI 1 "gpc_reg_operand" "r") 0))]
Index: gcc/testsuite/gcc.c-torture/compile/20161123-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/20161123-1.c(nonexistent)
+++ gcc/testsuite/gcc.c-torture/compile/20161123-1.c(working copy)
@@ -0,0 +1,7 @@
+double
+f (long double x)
+{
+  union { long double ld; double d[2]; } u;
+  u.ld = x;
+  return u.d[1];
+}

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] avoid calling alloca(0)

2016-11-23 Thread Jeff Law

On 11/20/2016 04:06 PM, Martin Sebor wrote:

On 11/20/2016 01:03 AM, Bernd Edlinger wrote:

On 11/20/16 00:43, Martin Sebor wrote:

As best I can tell the result isn't actually used (the code that
uses the result gets branched over).  GCC just doesn't see it.
I verified this by changing the XALLOCAVEC macro to

  #define XALLOCAVEC(T, N)  (N ? alloca (sizeof (T) * N) : 0)

and bootstrapping and running the regression test suite with
no apparent regressions.



I would like this solution with braces around N better than
allocating one element when actually zero were requested.

The disadvantage of N+1 or N+!N is that it will hide true
programming errors from the sanitizer.


I agree.  Let me post an updated patch with the above fix and
leave it to others to choose which approach to go with.
Just so I'm clear, this part of the discussions is talking about 
mitigation strategies within GCC itself, right?   Can't we just
gcc_assert (x != 0) before the problematical calls?  That avoids 
unnecessary over-allocation and gets us a clean ICE if we were to try 
and call alloca with a problematical value.



ed in the first patch.  I'll post an updated patch soon.




I am a bit worried about that suggestion in libiberty/alloca.c:
"It is a good idea to use alloca(0) in
  your main control loop, etc. to force garbage collection."

Because:

#include 
#include 
int main()
{
   void* p;
   int i;
   for (i=0 ; i<100 ; i++)
   {
 p = alloca(0);
 printf("%p\n", p);
   }
}

... is fine, and allocates no memory.
But if I change that to alloca(1) the stack will blow up.


The suggestion to call alloca(0) in the main loop is to trigger
garbage collection when alloca is implemented using malloc (as
is apparently the case in some embedded environments).  In that
case, alloca is not a built-in but a library function (i.e.,
-fno-builtin-alloca must be set) and the warning doesn't trigger.

Correct.



This call to alloca(0) would also likely be made without using
the return value.  When __builtin_alloca is called without using
the return value, the call is eliminated and the warning doesn't
trigger either.

Also correct.


There is a -Walloca-larger-than that Aldy added earlier this
year.  The option also diagnoses alloca(0), and it also warns
on calls to alloca in loops, but it's disabled by default.
I feel fairly strongly that all zero-length allocations
should be diagnosed at least with -Wextra but if that's what
it takes to move forward I'm willing to break things up this
way.  If that's preferred then I would also expect to enable
-Walloca-larger-than=SIZE_MAX/2 (or smaller) by default, in
addition to having -Walloca-zero in -Wextra, analogously to
the corresponding warnings for the heap allocation functions.
(Ditto for -Wvla-larger-than=SIZE_MAX/2 and -Wvla-zero which
should probably also be added for consistency.)
I'm not sure we need to break things up.  I haven't seen a good argument 
for that yet.


Is there an updated patch to post?  Or has it already been posted?

jeff


Re: [PATCH 8/9] Introduce class function_reader (v4)

2016-11-23 Thread David Malcolm
On Wed, 2016-11-23 at 21:15 +0100, Bernd Schmidt wrote:
> On 11/11/2016 10:15 PM, David Malcolm wrote:
> 
> > +static void
> > +aarch64_test_loading_full_dump ()
> > +{
> > +  rtl_dump_test t (SELFTEST_LOCATION, locate_file ("aarch64/times
> > -two.rtl"));
> > +
> > +  ASSERT_STREQ ("times_two", IDENTIFIER_POINTER (DECL_NAME (cfun
> > ->decl)));
> > +
> > +  rtx_insn *insn_1 = get_insn_by_uid (1);
> > +  ASSERT_EQ (NOTE, GET_CODE (insn_1));
> > +
> > +  rtx_insn *insn_15 = get_insn_by_uid (15);
> > +  ASSERT_EQ (INSN, GET_CODE (insn_15));
> > +  ASSERT_EQ (USE, GET_CODE (PATTERN (insn_15)));
> > +
> > +  /* Verify crtl->return_rtx.  */
> > +  ASSERT_EQ (REG, GET_CODE (crtl->return_rtx));
> > +  ASSERT_EQ (0, REGNO (crtl->return_rtx));
> > +  ASSERT_EQ (SImode, GET_MODE (crtl->return_rtx));
> > +}
> 
> The problem I'm having with this patch, and why I've not really 
> commented on the latter parts of the series, is that I was hoping the
> tests would be more self-contained.

The tests in this patch (8/9) are more about verifying that the loader
is sane: they load a dump, and then assert various properties of it, to
ensure that we're loading things correctly.  I moved the example dumps
out of the .c files (using locate_file) to avoid having to embed them
in the .c file and escape things (which people objected to in an
earlier version of the kit).

> For transformations I had mentioned 
> the idea of having both before and after dumps.
> 
> Having a few test files with C code to verify them is probably not a
> big 
> deal, but I wonder whether this is going to be the norm rather than
> the 
> exception, and whether there are better ways of doing it.

These will be the exception; they're really just about verifying that
the loader is working correctly.

In a previous version of the kit there were indeed selftests that
loaded a dump and attempted to run parts of a pass on it.  I've deleted
all of those style of selftest.

> Maybe there 
> needs to be a way of annotating the input RTL to tell the compiler
> what 
> kinds of tests to run on it.

The "real" tests are in patch 9 of 9, and use DejaGnu directives
(scanning dumps, and "dg-do run" to actually run the generated code).


Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Jeff Law

On 11/23/2016 01:09 PM, Martin Sebor wrote:


I hadn't thought of extending the gimple-ssa-sprintf pass to all
the memxxx and strxxx builtins.  The _chk functions are already
being handled in builtins.c so calling compute_builtin_object_size
for the non-checking ones there and detecting overflow in those
was an easy and, I had hoped, non-controversial enhancement to make.
In a discussion of bug 77784 (handled in the patch for bug 53562)
Jakub also expressed a preference for some of the diagnostics
staying in builtins.c.
I'm just trying to understand how the pieces fit together.  I wasn't 
aware of Jakub's desire to keep them in builtins.c.




I also suspect that the answer to your question is yes.  Range
information is pretty bad in the gimple-ssa-sprintf pass (it looks
like it runs after EVRP but before VRP).  Maybe the pass should run
after VRP?
Let's investigate this separately rather than draw in additional 
potential issues.  But I do think this is worth investigation.




That said, I defer to you on how to proceed here.  I'm prepared
to do the work(*) but I do worry about jeopardizing the chances
of this patch and the others making it into 7.0.
So would it make sense to just init/fini the b_o_s framework in your 
pass and for builtin expansion?



PS If I understand what you are suggesting this would mean
extending the gimple-ssa-sprintf pass to the memxxx and strxxx
functions and running the pass later, after VRP.
That was my original thought, but I'm certainly not deeply vested in it 
-- it was primarily to avoid initializing b_o_s an extra time.


Jeff



Re: [PATCH][PPC] Fix ICE using power9 with soft-float

2016-11-23 Thread Michael Meissner
On Wed, Nov 23, 2016 at 11:52:01AM +, Andrew Stubbs wrote:
> On 16/11/16 17:05, Michael Meissner wrote:
> >I'm starting to test this patch right now (it's on LE power8 stage3 right 
> >now,
> >and I need to build BE power8 and BE power7 versions when I get into the 
> >office
> >shortly, and build spec 2017 with it for PR 78101):
> 
> Did the testing go OK?

Yes, the testing worked fine, and I checked in the fix.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH 1/2 v3] PR77822

2016-11-23 Thread Jeff Law

On 11/21/2016 04:03 AM, Dominik Vogt wrote:

On Fri, Nov 18, 2016 at 04:29:18PM +0100, Dominik Vogt wrote:

> On Fri, Nov 18, 2016 at 08:02:08AM -0600, Segher Boessenkool wrote:

> > On Fri, Nov 18, 2016 at 01:09:24PM +0100, Dominik Vogt wrote:

> > > IN_RANGE(POS...) makes sure that POS is a non-negative number
> > > smaller than UPPER, so (UPPER) - (POS) does not wrap.  Or is there
> > > some case that the new macro does not handle?

> >
> > I think it works fine.
> >
> > With a name like UPPER, you may want to have SIZE_POS_IN_RANGE work like
> > IN_RANGE, i.e. UPPER is inclusive then.  Dunno.

>
> Yeah, maybe rather call it RANGE to avoid too much similarity.
> Some name that is so vague that one has to read the documentation.
> I'll post an updated patch later.

Third version of the patch attached.  The only difference is that
the macro argument name has been changed back to RANGE.

Ciao

Dominik ^_^  ^_^

-- Dominik Vogt IBM Germany


0001-v3-ChangeLog


gcc/ChangeLog

PR target/77822
* system.h (SIZE_POS_IN_RANGE): New.
OK.  Though system.h seems like an unfortunate place.  Would rtl.h work 
better since this is really about verifying the arguments to things like 
{zero,sign}_extract which are RTL concepts.


jeff



Re: [PATCH 01/11] use rtx_insn * more places where it is obvious

2016-11-23 Thread Jeff Law

On 11/22/2016 03:10 AM, Andreas Schwab wrote:

../../gcc/config/ia64/ia64.c:7141:13: error: 'void ia64_emit_insn_before(rtx, 
rtx)' declared 'static' but never defined [-Werror=unused-function]
 static void ia64_emit_insn_before (rtx, rtx);
 ^

I fixed this and a similar instance in another port.

jeff


Re: [PATCH 8/9] Introduce class function_reader (v4)

2016-11-23 Thread Bernd Schmidt

On 11/11/2016 10:15 PM, David Malcolm wrote:


+static void
+aarch64_test_loading_full_dump ()
+{
+  rtl_dump_test t (SELFTEST_LOCATION, locate_file ("aarch64/times-two.rtl"));
+
+  ASSERT_STREQ ("times_two", IDENTIFIER_POINTER (DECL_NAME (cfun->decl)));
+
+  rtx_insn *insn_1 = get_insn_by_uid (1);
+  ASSERT_EQ (NOTE, GET_CODE (insn_1));
+
+  rtx_insn *insn_15 = get_insn_by_uid (15);
+  ASSERT_EQ (INSN, GET_CODE (insn_15));
+  ASSERT_EQ (USE, GET_CODE (PATTERN (insn_15)));
+
+  /* Verify crtl->return_rtx.  */
+  ASSERT_EQ (REG, GET_CODE (crtl->return_rtx));
+  ASSERT_EQ (0, REGNO (crtl->return_rtx));
+  ASSERT_EQ (SImode, GET_MODE (crtl->return_rtx));
+}


The problem I'm having with this patch, and why I've not really 
commented on the latter parts of the series, is that I was hoping the 
tests would be more self-contained. For transformations I had mentioned 
the idea of having both before and after dumps.


Having a few test files with C code to verify them is probably not a big 
deal, but I wonder whether this is going to be the norm rather than the 
exception, and whether there are better ways of doing it. Maybe there 
needs to be a way of annotating the input RTL to tell the compiler what 
kinds of tests to run on it.



Bernd



Re: [PATCH 3/9] Introduce emit_status::ensure_regno_capacity

2016-11-23 Thread Jeff Law

On 11/11/2016 02:15 PM, David Malcolm wrote:

Link to earlier version of the patch:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00278.html

gcc/ChangeLog:
* emit-rtl.c (gen_reg_rtx): Move regno_pointer_align and
regno_reg_rtx resizing logic to...
(emit_status::ensure_regno_capacity): ...this new method.
(init_emit): Allocate regno_reg_rtx using ggc_cleared_vec_alloc
rather than ggc_vec_alloc.
* function.h (emit_status::ensure_regno_capacity): New method.

OK.
jeff



Re: [PATCH 4/9] (approved) Add some functions for use by the RTL frontend.

2016-11-23 Thread Jeff Law

On 11/11/2016 02:15 PM, David Malcolm wrote:

An earlier version of this was approved by Bernd as:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00280.html
and the changes since then probably fall under the "obvious" rule.

gcc/ChangeLog:
* read-md.c (rtx_reader::require_char): New method.
(require_char_ws): Convert from function to...
(rtx_reader::require_char_ws): ...method.
(rtx_reader::require_word_ws): New method.
* read-md.h (rtx_reader::require_char): New method decl.
(require_char_ws): Remove global decl in favor of...
(rtx_reader::require_char_ws): ...new method decl.
(rtx_reader::require_word_ws): New method decl.
(rtx_reader::peek_char): New method decl.

OK as well.

jeff



Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Martin Sebor

On 11/23/2016 12:47 PM, Jeff Law wrote:

On 11/23/2016 12:32 PM, Martin Sebor wrote:


My worry here would be a hash collision.  Then we'd be using object
sizes from the wrong function.


Ah, right, that might explain the ICE I just noticed during Ada
bootstrap.  Is there some other way to uniquely identify a function?
A DECL_UID maybe?

DECL_UID would be your best bet.  But ISTM that trying to avoid
invocations by reusing data from prior passes is likely to be more
fragile than recomputing on a per-pass basis -- as long as the number of
times we need this stuff is small (as I suspect it is).





Isn't the goal here to be able to get format-length warnings when there
aren't explicit calls to _b_o_s in the IL?   Can't you initialize the
object-size framework at the start of your pass and tear it down when
your pass is complete?  You could do that by exporting the init/fini
routines and calling them directly, or by wrapping that in a class and
instantiating the class when you need it.

That would avoid having to worry about the GC system entirely since you
wouldn't have stuff living across passes.


Yes, that is the immediate goal of this patch.  Beyond it, though,
I would like to call this function from anywhere, including during
expansion (as is done in my patch for bug 53562 and related).

But why not detect the builtins during your pass and check there.  ie, I
don't see why we necessarily need to have checking and expansion
intertwined together.  Maybe I'm missing something.  Is there something
that makes it inherently easier or better to implement checking during
builtin expansion?


I hadn't thought of extending the gimple-ssa-sprintf pass to all
the memxxx and strxxx builtins.  The _chk functions are already
being handled in builtins.c so calling compute_builtin_object_size
for the non-checking ones there and detecting overflow in those
was an easy and, I had hoped, non-controversial enhancement to make.
In a discussion of bug 77784 (handled in the patch for bug 53562)
Jakub also expressed a preference for some of the diagnostics
staying in builtins.c.

I also suspect that the answer to your question is yes.  Range
information is pretty bad in the gimple-ssa-sprintf pass (it looks
like it runs after EVRP but before VRP).  Maybe the pass should run
after VRP?

That said, I defer to you on how to proceed here.  I'm prepared
to do the work(*) but I do worry about jeopardizing the chances
of this patch and the others making it into 7.0.

Martin

PS If I understand what you are suggesting this would mean
extending the gimple-ssa-sprintf pass to the memxxx and strxxx
functions and running the pass later, after VRP.


Re: [PATCH 2/9] (approved) Introduce rtl_data::init_stack_alignment

2016-11-23 Thread Jeff Law

On 11/11/2016 02:15 PM, David Malcolm wrote:

(approved by Bernd)

And me too :-0



Move this part of "expand"'s initialization of crtl into its own
method so that it can used by the RTL frontend when postprocessing
RTL dumps.

gcc/ChangeLog:
* cfgexpand.c (pass_expand::execute): Move stack initializations
to rtl_data::init_stack_alignment and call it.
* emit-rtl.c (rtl_data::init_stack_alignment): New method.
* emit-rtl.h (rtl_data::init_stack_alignment): New method.

OK.
jeff


Re: [PATCH] eliminate calls to snprintf(0, 0, ...) with known return value (pr78476)

2016-11-23 Thread Jeff Law

On 11/22/2016 08:02 PM, Martin Sebor wrote:

Calls to bounded functions like snprintf with a zero-size buffer
are special requests to compute the size of output without actually
writing any.  For example:

  int n = snprintf(0, 0, "%08x", rand ());

is a request to compute the number of bytes that the function would
format if it were passed a buffer of non-zero size.  In the example
above since the return value is known to be exactly 8, not only can
the snprintf return value be folded into a constant but the whole
call to snprintf can be eliminated.

The attached patch enables this optimization under the
-fprintf-return-value option.  The patch depends on the one for bug
78461 (posted earlier today) during the testing of which I noticed
that this optimization was missing from the gimple-ssa-sprintf pass.

Thanks
Martin

gcc-78476.diff


PR tree-optimization/78476 - snprintf(0, 0, ...) with known arguments not 
optimized away

gcc/testsuite/ChangeLog:

PR tree-optimization/78476
* gcc.dg/tree-ssa/builtin-sprintf-5.c: New test.

gcc/ChangeLog:

PR tree-optimization/78476
* gimple-ssa-sprintf.c (struct pass_sprintf_length::call_info):
Add a member.
(handle_gimple_call): Adjust signature.
(try_substitute_return_value): Remove calls to bounded functions
with zero buffer size whose result is known.
(pass_sprintf_length::execute): Adjust call to handle_gimple_call.

OK.

I was initially a little surprised you didn't just delete the call from 
the IL and emit the trivial constant assignment.  Then I realized you 
probably would need to update the vuse/vdefs that were on the original 
call.  Using update_call_from_tree seems to take care of that for you.


I didn't see a negative test -- ie one that you shouldn't transform. 
Could you extract the one from c#0 in the referenced bz and add it as a 
negative test.  You can do that as a separate patch which you should 
consider pre-approved.


Jeff


Re: [PATCH] Fix up testcase (PR tree-optimization/78482)

2016-11-23 Thread Jeff Law

On 11/23/2016 12:36 PM, Jakub Jelinek wrote:

Hi!

The testcase uses char and can't work properly with unsigned char (otherwise
c >= 0 is always true).

In addition to that I've noticed that the testcase as is actually doesn't
fail with the unfixed gcc, it will just print 2 on stdout.

The following has been tested on x86_64-linux and i686-linux with both
the broken compiler (FAILs) and fixed one (PASSes).

Ok for trunk?

2016-11-23  Jakub Jelinek  

PR tree-optimization/78482
* gcc.dg/torture/pr78482.c (c, d): Use signed char instead of char.
(bar): New function.
(main): Call bar instead of printf.

OK.

jeff



[PATCH, i386]: Fix *hi_1 operand 2 constraints.

2016-11-23 Thread Uros Bizjak
De-macroization artefact.

2016-11-23  Uros Bizjak  

* config/i386/i386.md (*hi_1): Fix operand 2 constraints.

Bootstrapped, regression test on x86_64-linux-gnu {,-m32} in process.

Will commit to mainline and release branches ASAP.

Uros.

Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 242791)
+++ config/i386/i386.md (working copy)
@@ -8806,7 +8806,7 @@
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!k")
(any_or:HI
 (match_operand:HI 1 "nonimmediate_operand" "%0,0,k")
-(match_operand:HI 2 "general_operand" ",r,k")))
+(match_operand:HI 2 "general_operand" "rmn,rn,k")))
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (, HImode, operands)"
   "@


Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Jeff Law

On 11/23/2016 12:32 PM, Martin Sebor wrote:


My worry here would be a hash collision.  Then we'd be using object
sizes from the wrong function.


Ah, right, that might explain the ICE I just noticed during Ada
bootstrap.  Is there some other way to uniquely identify a function?
A DECL_UID maybe?
DECL_UID would be your best bet.  But ISTM that trying to avoid 
invocations by reusing data from prior passes is likely to be more 
fragile than recomputing on a per-pass basis -- as long as the number of 
times we need this stuff is small (as I suspect it is).






Isn't the goal here to be able to get format-length warnings when there
aren't explicit calls to _b_o_s in the IL?   Can't you initialize the
object-size framework at the start of your pass and tear it down when
your pass is complete?  You could do that by exporting the init/fini
routines and calling them directly, or by wrapping that in a class and
instantiating the class when you need it.

That would avoid having to worry about the GC system entirely since you
wouldn't have stuff living across passes.


Yes, that is the immediate goal of this patch.  Beyond it, though,
I would like to call this function from anywhere, including during
expansion (as is done in my patch for bug 53562 and related).
But why not detect the builtins during your pass and check there.  ie, I 
don't see why we necessarily need to have checking and expansion 
intertwined together.  Maybe I'm missing something.  Is there something 
that makes it inherently easier or better to implement checking during 
builtin expansion?


Jeff



Re: [3/3] Fix PR78120, in ifcvt/rtlanal/i386.

2016-11-23 Thread Uros Bizjak
> 3. ifcvt computes the sum of costs for the involved blocks, but only
> makes a before/after comparison when optimizing for size. When
> optimizing for speed, it uses max_seq_cost, which is an estimate
> computed from BRANCH_COST, which in turn can be zero for predictable
> branches on x86.

Can you please correct the addition below? It makes me cry thinking
that buggy function will be immortalized in the gcc testsuite...

> +static inline u128 add_u128 (u128 a, u128 b)
> +{

  a.hi += b.hi;

> +  a.lo += b.lo;
> +  if (a.lo < b.lo)
> +a.hi++;
> +
> +  return a;

Uros.


Re: [PATCH] Fix up -fsanitize=undefined (PR sanitizer/69278)

2016-11-23 Thread Richard Biener
On November 23, 2016 8:33:54 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>As mentioned in the PR, the r240491 change broke -fsanitize=undefined,
>which no longer enables -fsanitize=unreachable or -fsanitize=return.
>That is undesirable change, we only want not to enable
>-fsanitize-recover=unreachable,return on -fsanitize-recover=undefined.
>
>Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
>for
>trunk?

OK.

Richard.

>2016-11-23  Jakub Jelinek  
>
>   PR sanitizer/69278
>   * opts.c (parse_sanitizer_options): For -fsanitize=undefined,
>   restore enabling also SANITIZE_UNREACHABLE and SANITIZE_RETURN.
>
>   * g++.dg/ubsan/return-7.C: New test.
>   * c-c++-common/ubsan/unreachable-4.c: New test.
>
>--- gcc/opts.c.jj  2016-11-23 16:47:34.0 +0100
>+++ gcc/opts.c 2016-11-23 18:25:05.400325912 +0100
>@@ -1558,7 +1558,8 @@ parse_sanitizer_options (const char *p,
>   /* Do not enable -fsanitize-recover=unreachable and
>  -fsanitize-recover=return if -fsanitize-recover=undefined
>  is selected.  */
>-  if (sanitizer_opts[i].flag == SANITIZE_UNDEFINED)
>+  if (code == OPT_fsanitize_recover_
>+  && sanitizer_opts[i].flag == SANITIZE_UNDEFINED)
> flags |= (SANITIZE_UNDEFINED
>   & ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN));
>   else
>--- gcc/testsuite/g++.dg/ubsan/return-7.C.jj   2016-11-23
>18:27:16.525687072 +0100
>+++ gcc/testsuite/g++.dg/ubsan/return-7.C  2016-11-23 18:27:21.854620469
>+0100
>@@ -0,0 +1,27 @@
>+// { dg-do run }
>+// { dg-options "-fsanitize=undefined" }
>+// { dg-shouldfail "ubsan" }
>+
>+struct S { S (); ~S (); };
>+
>+S::S () {}
>+S::~S () {}
>+
>+int
>+foo (int x)
>+{
>+  S a;
>+  {
>+S b;
>+if (x)
>+  return 1;
>+  }
>+}
>+
>+int
>+main ()
>+{
>+  foo (0);
>+}
>+
>+// { dg-output "execution reached the end of a value-returning
>function without returning a value" }
>--- gcc/testsuite/c-c++-common/ubsan/unreachable-4.c.jj2016-11-23
>18:25:57.692672348 +0100
>+++ gcc/testsuite/c-c++-common/ubsan/unreachable-4.c   2016-11-23
>18:26:05.961569001 +0100
>@@ -0,0 +1,10 @@
>+/* { dg-do run } */
>+/* { dg-options "-fsanitize=undefined" } */
>+/* { dg-shouldfail "ubsan" } */
>+
>+int
>+main (void)
>+{
>+  __builtin_unreachable ();
>+}
>+ /* { dg-output "execution reached a __builtin_unreachable\\(\\) call"
>} */
>
>   Jakub




Re: [PATCH] Fix up testcase (PR tree-optimization/78482)

2016-11-23 Thread Richard Biener
On November 23, 2016 8:36:11 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>The testcase uses char and can't work properly with unsigned char
>(otherwise
>c >= 0 is always true).
>
>In addition to that I've noticed that the testcase as is actually
>doesn't
>fail with the unfixed gcc, it will just print 2 on stdout.
>
>The following has been tested on x86_64-linux and i686-linux with both
>the broken compiler (FAILs) and fixed one (PASSes).
>
>Ok for trunk?

OK.  Sorry for the breakage.  I thought we'd get excess errors from the printf.

Richard.

>2016-11-23  Jakub Jelinek  
>
>   PR tree-optimization/78482
>   * gcc.dg/torture/pr78482.c (c, d): Use signed char instead of char.
>   (bar): New function.
>   (main): Call bar instead of printf.
>
>--- gcc/testsuite/gcc.dg/torture/pr78482.c.jj  2016-11-23
>16:47:32.0 +0100
>+++ gcc/testsuite/gcc.dg/torture/pr78482.c 2016-11-23
>19:59:31.0 +0100
>@@ -1,9 +1,9 @@
>+/* PR tree-optimization/78482 */
> /* { dg-do run } */
> 
>-int printf(const char*, ...);
> short a = 65531;
> int b = 3, f;
>-char c, d;
>+signed char c, d;
> static void fn1(int p1)
> {
>   short e;
>@@ -22,13 +22,22 @@ static void fn1(int p1)
> }
> }
> 
>+__attribute__((noinline, noclone))
>+int bar (const char *x, int y)
>+{
>+  asm volatile ("" : "+g" (x), "+g" (y) : : "memory");
>+  if (y == 2)
>+__builtin_abort ();
>+  return 0;
>+}
>+
> int main()
> {
>   for (; c >= 0; c--)
> {
>   if (!b)
>   {
>-printf("%d\n", 2);
>+bar("%d\n", 2);
> continue;
>   }
>   fn1(a);
>
>   Jakub




Re: [3/3] Fix PR78120, in ifcvt/rtlanal/i386.

2016-11-23 Thread Jeff Law

On 11/23/2016 12:02 PM, Bernd Schmidt wrote:

On 11/23/2016 07:57 PM, Bernd Schmidt wrote:

3. ifcvt computes the sum of costs for the involved blocks, but only
makes a before/after comparison when optimizing for size. When
optimizing for speed, it uses max_seq_cost, which is an estimate
computed from BRANCH_COST, which in turn can be zero for predictable
branches on x86.


This is the final patch and has the testcase. It also happens to be the
least risky of the series so it could be applied on its own (without the
test).


Bernd


71280-3.diff


PR rtl-optimization/78120
* ifcvt.c (noce_conversion_profitable_p): Check original cost in all
cases, and additionally test against max_seq_cost for speed
optimization.
(noce_process_if_block): Compute an estimate for the original cost when
optimizing for speed, using the minimum of then and else block costs.

PR rtl-optimization/78120
* gcc.target/i386/pr78120.c: New test.
Also OK.  Obviously Uros has the call on the x86 target change.  Stage 
the series in as you see fit given the dependencies.


jeff



[PATCH] Fix up testcase (PR tree-optimization/78482)

2016-11-23 Thread Jakub Jelinek
Hi!

The testcase uses char and can't work properly with unsigned char (otherwise
c >= 0 is always true).

In addition to that I've noticed that the testcase as is actually doesn't
fail with the unfixed gcc, it will just print 2 on stdout.

The following has been tested on x86_64-linux and i686-linux with both
the broken compiler (FAILs) and fixed one (PASSes).

Ok for trunk?

2016-11-23  Jakub Jelinek  

PR tree-optimization/78482
* gcc.dg/torture/pr78482.c (c, d): Use signed char instead of char.
(bar): New function.
(main): Call bar instead of printf.

--- gcc/testsuite/gcc.dg/torture/pr78482.c.jj   2016-11-23 16:47:32.0 
+0100
+++ gcc/testsuite/gcc.dg/torture/pr78482.c  2016-11-23 19:59:31.0 
+0100
@@ -1,9 +1,9 @@
+/* PR tree-optimization/78482 */
 /* { dg-do run } */
 
-int printf(const char*, ...);
 short a = 65531;
 int b = 3, f;
-char c, d;
+signed char c, d;
 static void fn1(int p1)
 {
   short e;
@@ -22,13 +22,22 @@ static void fn1(int p1)
 }
 }
 
+__attribute__((noinline, noclone))
+int bar (const char *x, int y)
+{
+  asm volatile ("" : "+g" (x), "+g" (y) : : "memory");
+  if (y == 2)
+__builtin_abort ();
+  return 0;
+}
+
 int main()
 {
   for (; c >= 0; c--)
 {
   if (!b)
{
- printf("%d\n", 2);
+ bar("%d\n", 2);
  continue;
}
   fn1(a);

Jakub


[PATCH] Fix up -fsanitize=undefined (PR sanitizer/69278)

2016-11-23 Thread Jakub Jelinek
Hi!

As mentioned in the PR, the r240491 change broke -fsanitize=undefined,
which no longer enables -fsanitize=unreachable or -fsanitize=return.
That is undesirable change, we only want not to enable
-fsanitize-recover=unreachable,return on -fsanitize-recover=undefined.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-11-23  Jakub Jelinek  

PR sanitizer/69278
* opts.c (parse_sanitizer_options): For -fsanitize=undefined,
restore enabling also SANITIZE_UNREACHABLE and SANITIZE_RETURN.

* g++.dg/ubsan/return-7.C: New test.
* c-c++-common/ubsan/unreachable-4.c: New test.

--- gcc/opts.c.jj   2016-11-23 16:47:34.0 +0100
+++ gcc/opts.c  2016-11-23 18:25:05.400325912 +0100
@@ -1558,7 +1558,8 @@ parse_sanitizer_options (const char *p,
/* Do not enable -fsanitize-recover=unreachable and
   -fsanitize-recover=return if -fsanitize-recover=undefined
   is selected.  */
-   if (sanitizer_opts[i].flag == SANITIZE_UNDEFINED)
+   if (code == OPT_fsanitize_recover_
+   && sanitizer_opts[i].flag == SANITIZE_UNDEFINED)
  flags |= (SANITIZE_UNDEFINED
& ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN));
else
--- gcc/testsuite/g++.dg/ubsan/return-7.C.jj2016-11-23 18:27:16.525687072 
+0100
+++ gcc/testsuite/g++.dg/ubsan/return-7.C   2016-11-23 18:27:21.854620469 
+0100
@@ -0,0 +1,27 @@
+// { dg-do run }
+// { dg-options "-fsanitize=undefined" }
+// { dg-shouldfail "ubsan" }
+
+struct S { S (); ~S (); };
+
+S::S () {}
+S::~S () {}
+
+int
+foo (int x)
+{
+  S a;
+  {
+S b;
+if (x)
+  return 1;
+  }
+}
+
+int
+main ()
+{
+  foo (0);
+}
+
+// { dg-output "execution reached the end of a value-returning function 
without returning a value" }
--- gcc/testsuite/c-c++-common/ubsan/unreachable-4.c.jj 2016-11-23 
18:25:57.692672348 +0100
+++ gcc/testsuite/c-c++-common/ubsan/unreachable-4.c2016-11-23 
18:26:05.961569001 +0100
@@ -0,0 +1,10 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=undefined" } */
+/* { dg-shouldfail "ubsan" } */
+
+int
+main (void)
+{
+  __builtin_unreachable ();
+}
+ /* { dg-output "execution reached a __builtin_unreachable\\(\\) call" } */

Jakub


Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Martin Sebor

On 11/23/2016 12:10 PM, Jeff Law wrote:

On 11/23/2016 11:26 AM, Martin Sebor wrote:

My only real concern here is that if we call compute_builtin_object_size
without having initialized the passes, then we initialize, compute, then
finalize.  Subsequent calls will go through the same process -- the key
being each one re-computes the internal state which might get expensive.

Wouldn't it just make more sense to pull up the init/fini calls, either
explicitly (which likely means externalizing the init/fini routines) or
by wrapping all this stuff in a class and instantiating a suitable
object?

I think the answer to your memory management question is that internal
state is likely not marked as a GC root and thus if you get a GC call
pieces of internal state are not seen as reachable, but you still can
reference them.  ie, you end up with dangling pointers.

Usually all you'd have to do is mark them so that gengtype will see
them.  Bitmaps, trees, rtl, are all good examples.  So marking the
bitmap would look like:

static GTY (()) bitmap computed[4];

Or something like that.

You might try --enable-checking=yes,extra,gc,gcac

That will be slow, but is often helpful for tracking down cases where
someone has an object expected to be live across passes, but it isn't
reachable because someone failed to register a GC root.


Thanks.  Attached is an updated patch that avoids flushing the computed
data until the current function changes, and tells the garbage collector
not to collect it.  I haven't finished bootstrapping and regtesting it
yet but running it through Valgrind doesn't show any errors.  Please
let me know if this is what you had in mind.

Thanks
Martin

gcc-78245.diff


PR middle-end/78245 - missing -Wformat-length on an overflow of a
dynamically allocated buffer

gcc/testsuite/ChangeLog:

PR middle-end/78245
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

PR middle-end/78245
* gimple-ssa-sprintf.c (object_sizes, computed): Declared GTY.
(get_destination_size): Call compute_object_size.
* tree-object-size.c (addr_object_size): Adjust.
(pass_through_call): Adjust.
(compute_object_size, internal_object_size): New functions.
(compute_builtin_object_size): Call internal_object_size.
(init_object_sizes): Initialize computed bitmap so the garbage
collector knows about it.
(fini_object_sizes): Clear the computed bitmap when non-null.
(pass_object_sizes::execute): Adjust.
* tree-object-size.h (compute_object_size): Declare.

diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 1317ad7..39c32e3 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -72,10 +74,10 @@ static void check_for_plus_in_loops_1 (struct
object_size_info *, tree,
the subobject (innermost array or field with address taken).
object_sizes[2] is lower bound for number of bytes till the end of
the object and object_sizes[3] lower bound for subobject.  */
-static vec object_sizes[4];
+static GTY (()) vec object_sizes[4];

I don't think this needs a GTY marker.


 /* Bitmaps what object sizes have been computed already.  */
-static bitmap computed[4];
+static GTY (()) bitmap computed[4];

This is the one you probably needed :-)



+/* Like compute_builtin_object_size but intended to be called
+   without a corresponding __builtin_object_size in the program.  */
+
+bool
+compute_object_size (tree ptr, int object_size_type,
+ unsigned HOST_WIDE_INT *psize)
+{
+  static unsigned lastfunchash;
+  unsigned curfunchash
+= IDENTIFIER_HASH_VALUE (DECL_NAME (current_function_decl));
+
+  /* Initialize the internal data structures for each new function
+ and keep the computed data around for any subsequent calls to
+ compute_object_size.  */
+  if (curfunchash != lastfunchash)

My worry here would be a hash collision.  Then we'd be using object
sizes from the wrong function.


Ah, right, that might explain the ICE I just noticed during Ada
bootstrap.  Is there some other way to uniquely identify a function?
A DECL_UID maybe?



Isn't the goal here to be able to get format-length warnings when there
aren't explicit calls to _b_o_s in the IL?   Can't you initialize the
object-size framework at the start of your pass and tear it down when
your pass is complete?  You could do that by exporting the init/fini
routines and calling them directly, or by wrapping that in a class and
instantiating the class when you need it.

That would avoid having to worry about the GC system entirely since you
wouldn't have stuff living across passes.


Yes, that is the immediate goal of this patch.  Beyond it, though,
I would like to call this function from anywhere, including during
expansion (as is done in my patch for bug 53562 and related).

Martin

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00896.html


Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386.

2016-11-23 Thread Bernd Schmidt

On 11/23/2016 08:30 PM, Jeff Law wrote:

On 11/23/2016 12:00 PM, Bernd Schmidt wrote:

Note that I misspelled the PR number in the 0/3 message :-/

On 11/23/2016 07:57 PM, Bernd Schmidt wrote:

1. I noticed comparisons between set_src_cost and set_rtx_cost seemed to
be invalid. There seems to be no good reason that insn_rtx_cost
shouldn't use the latter. It also makes the numbers comparable to the
ones you get from seq_cost.



Bernd

71280-1.diff


PR rtl-optimization/78120
* rtlanal.c (insn_rtx_cost): Use set_rtx_cost.

LGTM.   As a principle, if we have the full set, we ought to use
set_rtx_cost, and only use set_src_cost if we don't have the full set
expression.


Note that this cannot really be applied on its own, it needs patch 2/3 
so as to not make the x86 port do strange things. So I'll be holding off 
on this one until we have a consensus on the whole set.



Bernd



[committed] Fix OpenMP ICE with private allocatable on orphaned worksharing construct (PR middle-end/69183)

2016-11-23 Thread Jakub Jelinek
Hi!

allocatable privatized vars need access to the outer var for sizing,
but unlike e.g. firstprivate the standard allows them on orphaned
worksharing constructs or when the var is already private outside of
the worksharing construct.  Therefore, we should treat it similarly
to outer refs on simd construct.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2016-11-23  Jakub Jelinek  

PR middle-end/69183
* omp-low.c (build_outer_var_ref): Change lastprivate argument
to code, pass it recursively, adjust uses.  For OMP_CLAUSE_PRIVATE
on worksharing constructs, treat it like clauses on simd construct.
Formatting fix.
(lower_rec_input_clauses): For OMP_CLAUSE_PRIVATE_OUTER_REF pass
OMP_CLAUSE_PRIVATE as last argument to build_outer_var_ref.
(lower_lastprivate_clauses): Pass OMP_CLAUSE_LASTPRIVATE instead
of true as last argument to build_outer_var_ref.

* gfortran.dg/gomp/pr69183.f90: New test.

--- gcc/omp-low.c.jj2016-11-22 21:31:49.0 +0100
+++ gcc/omp-low.c   2016-11-23 16:20:31.830429649 +0100
@@ -1283,7 +1283,8 @@ build_receiver_ref (tree var, bool by_re
this is some variable.  */
 
 static tree
-build_outer_var_ref (tree var, omp_context *ctx, bool lastprivate = false)
+build_outer_var_ref (tree var, omp_context *ctx,
+enum omp_clause_code code = OMP_CLAUSE_ERROR)
 {
   tree x;
 
@@ -1292,7 +1293,7 @@ build_outer_var_ref (tree var, omp_conte
   else if (is_variable_sized (var))
 {
   x = TREE_OPERAND (DECL_VALUE_EXPR (var), 0);
-  x = build_outer_var_ref (x, ctx, lastprivate);
+  x = build_outer_var_ref (x, ctx, code);
   x = build_simple_mem_ref (x);
 }
   else if (is_taskreg_ctx (ctx))
@@ -1300,11 +1301,17 @@ build_outer_var_ref (tree var, omp_conte
   bool by_ref = use_pointer_for_field (var, NULL);
   x = build_receiver_ref (var, by_ref, ctx);
 }
-  else if (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
-  && gimple_omp_for_kind (ctx->stmt) & GF_OMP_FOR_SIMD)
-{
-  /* #pragma omp simd isn't a worksharing construct, and can reference even
-private vars in its linear etc. clauses.  */
+  else if ((gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
+   && gimple_omp_for_kind (ctx->stmt) & GF_OMP_FOR_SIMD)
+  || (code == OMP_CLAUSE_PRIVATE
+  && (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
+  || gimple_code (ctx->stmt) == GIMPLE_OMP_SECTIONS
+  || gimple_code (ctx->stmt) == GIMPLE_OMP_SINGLE)))
+{
+  /* #pragma omp simd isn't a worksharing construct, and can reference
+even private vars in its linear etc. clauses.
+Similarly for OMP_CLAUSE_PRIVATE with outer ref, that can refer
+to private vars in all worksharing constructs.  */
   x = NULL_TREE;
   if (ctx->outer && is_taskreg_ctx (ctx))
x = lookup_decl (var, ctx->outer);
@@ -1313,7 +1320,7 @@ build_outer_var_ref (tree var, omp_conte
   if (x == NULL_TREE)
x = var;
 }
-  else if (lastprivate && is_taskloop_ctx (ctx))
+  else if (code == OMP_CLAUSE_LASTPRIVATE && is_taskloop_ctx (ctx))
 {
   gcc_assert (ctx->outer);
   splay_tree_node n
@@ -1350,7 +1357,7 @@ build_outer_var_ref (tree var, omp_conte
  gcc_assert (outer
  && gimple_code (outer->stmt) != GIMPLE_OMP_GRID_BODY);
}
-   x = lookup_decl (var, outer);
+  x = lookup_decl (var, outer);
 }
   else if (is_reference (var))
 /* This can happen with orphaned constructs.  If var is reference, it is
@@ -5031,7 +5038,7 @@ lower_rec_input_clauses (tree clauses, g
  if (is_task_ctx (ctx))
x = build_receiver_ref (var, false, ctx);
  else
-   x = build_outer_var_ref (var, ctx);
+   x = build_outer_var_ref (var, ctx, OMP_CLAUSE_PRIVATE);
}
  else
x = NULL;
@@ -5687,7 +5694,7 @@ lower_lastprivate_clauses (tree clauses,
x = ovar;
}
  if (!x)
-   x = build_outer_var_ref (var, ctx, true);
+   x = build_outer_var_ref (var, ctx, OMP_CLAUSE_LASTPRIVATE);
  if (is_reference (var))
new_var = build_simple_mem_ref_loc (clause_loc, new_var);
  x = lang_hooks.decls.omp_clause_assign_op (c, x, new_var);
--- gcc/testsuite/gfortran.dg/gomp/pr69183.f90.jj   2016-11-23 
16:18:55.801635373 +0100
+++ gcc/testsuite/gfortran.dg/gomp/pr69183.f90  2016-11-23 16:18:41.0 
+0100
@@ -0,0 +1,11 @@
+! PR middle-end/69183
+! { dg-do compile }
+
+program pr69183
+  integer, allocatable :: z
+  integer :: i
+  !$omp do private(z)
+  do i = 1, 2
+z = i
+  end do
+end

Jakub


Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386.

2016-11-23 Thread Jeff Law

On 11/23/2016 12:00 PM, Bernd Schmidt wrote:

Note that I misspelled the PR number in the 0/3 message :-/

On 11/23/2016 07:57 PM, Bernd Schmidt wrote:

1. I noticed comparisons between set_src_cost and set_rtx_cost seemed to
be invalid. There seems to be no good reason that insn_rtx_cost
shouldn't use the latter. It also makes the numbers comparable to the
ones you get from seq_cost.



Bernd

71280-1.diff


PR rtl-optimization/78120
* rtlanal.c (insn_rtx_cost): Use set_rtx_cost.
LGTM.   As a principle, if we have the full set, we ought to use 
set_rtx_cost, and only use set_src_cost if we don't have the full set 
expression.


Jeff


Re: [PATCH, i386]: Improve mask op patterns a bit

2016-11-23 Thread Jakub Jelinek
On Wed, Nov 23, 2016 at 08:09:01PM +0100, Uros Bizjak wrote:
> Hello!
> 
> This patch cleans and improves mask op patterns a bit. The patch uses
> insn mode attribute to control emission of word-mode operations and
> macroizes a couple of patterns.
> 
> No functional changes.
> 
> 2016-11-23  Uros Bizjak  
> 
> * gcc.target/config/i386.md (*movqi_internal): Calculate mode

s,gcc.target/config,config/i386,

> attribute of alternatives 7,8,9 depending on TARGET_AVX512DQ.
> : Emit kmovw for MODE_HI insn mode attribute.
> (*k): Calculate mode attribute depending on
> TARGET_AVX512DQ.  Emit kw for MODE_HI insn mode attribute.
> (*andqi_1): Calculate mode attribute of alternative 3 depending
> on TARGET_AVX512DQ.  Emit kandw for MODE_HI insn mode attribute.
> (kandn): Calculate mode attribute of alternative 2 depending
> on TARGET_AVX512DQ.  Emit kandnw for MODE_HI insn mode attribute.
> (kxnor): Merge insn patterns using SWI1248_AVX512BW mode
> iterator.  Calculate mode attribute of alternative 1 depending
> on TARGET_AVX512DQ.  Emit kxnorw for MODE_HI insn mode attribute.
> (*one_cmplqi2_1): Calculate mode attribute of alternative 2 depending
> on TARGET_AVX512DQ.  Emit knotw for MODE_HI insn mode attribute.

Jakub


Re: [Patch, testsuite] Fix bogus uninit-19.c failure for avr

2016-11-23 Thread Jeff Law

On 11/23/2016 02:54 AM, Senthil Kumar Selvaraj wrote:

Hi,

  The below patch fixes uninit-19.c for avr by adding
  -finline-small-functions for avr.

  The test fails for avr because fn1 does not get inlined into fn2. Inlining
  occurs for x86_64 because fn1's computed size equals call_stmt_size. For
  avr, 32 bit memory moves are more expensive, and b[3] = p10[a] results in
  a bigger size for fn1, resulting in estimate_growth > 0 and no inlining.

  Adding -finline-small-functions forces early inliner to inline fn1,
  making the test pass.

  Committed to trunk.

Regards
Senthil

gcc/testsuite/ChangeLog

2016-11-23  Senthil Kumar Selvaraj  

* gcc.dg/uninit-19.c: Add -finline-small-functions for avr.
How about instead forcing inlining via the always_inline attribute?  If 
inlining is indeed expected and necessary, that seems better than 
-finline-small-functions.


Jeff



Re: [PATCH, GCC] Improve comment for struct symbolic_number in bswap pass

2016-11-23 Thread Jeff Law

On 11/23/2016 03:45 AM, Thomas Preudhomme wrote:

Hi,

The current comment for struct symbolic_number in the bswap pass code
(tree-ssa-math-opts.c) does not explain all of the fields in the
structure. It is also a bit unclear at times. This patch rewrites the
comment to fix those. Note: it depends on the fix for PR77673 to be
installed first.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-11-22  Thomas Preud'homme  

* tree-ssa-math-opts.c (struct symbolic_number): Improve comment.

OK once the patch for 77673 is installed.

Thanks for updating the comments.

jeff



Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Jeff Law

On 11/23/2016 11:26 AM, Martin Sebor wrote:

My only real concern here is that if we call compute_builtin_object_size
without having initialized the passes, then we initialize, compute, then
finalize.  Subsequent calls will go through the same process -- the key
being each one re-computes the internal state which might get expensive.

Wouldn't it just make more sense to pull up the init/fini calls, either
explicitly (which likely means externalizing the init/fini routines) or
by wrapping all this stuff in a class and instantiating a suitable
object?

I think the answer to your memory management question is that internal
state is likely not marked as a GC root and thus if you get a GC call
pieces of internal state are not seen as reachable, but you still can
reference them.  ie, you end up with dangling pointers.

Usually all you'd have to do is mark them so that gengtype will see
them.  Bitmaps, trees, rtl, are all good examples.  So marking the
bitmap would look like:

static GTY (()) bitmap computed[4];

Or something like that.

You might try --enable-checking=yes,extra,gc,gcac

That will be slow, but is often helpful for tracking down cases where
someone has an object expected to be live across passes, but it isn't
reachable because someone failed to register a GC root.


Thanks.  Attached is an updated patch that avoids flushing the computed
data until the current function changes, and tells the garbage collector
not to collect it.  I haven't finished bootstrapping and regtesting it
yet but running it through Valgrind doesn't show any errors.  Please
let me know if this is what you had in mind.

Thanks
Martin

gcc-78245.diff


PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically 
allocated buffer

gcc/testsuite/ChangeLog:

PR middle-end/78245
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

PR middle-end/78245
* gimple-ssa-sprintf.c (object_sizes, computed): Declared GTY.
(get_destination_size): Call compute_object_size.
* tree-object-size.c (addr_object_size): Adjust.
(pass_through_call): Adjust.
(compute_object_size, internal_object_size): New functions.
(compute_builtin_object_size): Call internal_object_size.
(init_object_sizes): Initialize computed bitmap so the garbage
collector knows about it.
(fini_object_sizes): Clear the computed bitmap when non-null.
(pass_object_sizes::execute): Adjust.
* tree-object-size.h (compute_object_size): Declare.

diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 1317ad7..39c32e3 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -72,10 +74,10 @@ static void check_for_plus_in_loops_1 (struct 
object_size_info *, tree,
the subobject (innermost array or field with address taken).
object_sizes[2] is lower bound for number of bytes till the end of
the object and object_sizes[3] lower bound for subobject.  */
-static vec object_sizes[4];
+static GTY (()) vec object_sizes[4];

I don't think this needs a GTY marker.


 /* Bitmaps what object sizes have been computed already.  */
-static bitmap computed[4];
+static GTY (()) bitmap computed[4];

This is the one you probably needed :-)



+/* Like compute_builtin_object_size but intended to be called
+   without a corresponding __builtin_object_size in the program.  */
+
+bool
+compute_object_size (tree ptr, int object_size_type,
+unsigned HOST_WIDE_INT *psize)
+{
+  static unsigned lastfunchash;
+  unsigned curfunchash
+= IDENTIFIER_HASH_VALUE (DECL_NAME (current_function_decl));
+
+  /* Initialize the internal data structures for each new function
+ and keep the computed data around for any subsequent calls to
+ compute_object_size.  */
+  if (curfunchash != lastfunchash)
My worry here would be a hash collision.  Then we'd be using object 
sizes from the wrong function.



Isn't the goal here to be able to get format-length warnings when there 
aren't explicit calls to _b_o_s in the IL?   Can't you initialize the 
object-size framework at the start of your pass and tear it down when 
your pass is complete?  You could do that by exporting the init/fini 
routines and calling them directly, or by wrapping that in a class and 
instantiating the class when you need it.


That would avoid having to worry about the GC system entirely since you 
wouldn't have stuff living across passes.


Jeff


[PATCH, i386]: Improve mask op patterns a bit

2016-11-23 Thread Uros Bizjak
Hello!

This patch cleans and improves mask op patterns a bit. The patch uses
insn mode attribute to control emission of word-mode operations and
macroizes a couple of patterns.

No functional changes.

2016-11-23  Uros Bizjak  

* gcc.target/config/i386.md (*movqi_internal): Calculate mode
attribute of alternatives 7,8,9 depending on TARGET_AVX512DQ.
: Emit kmovw for MODE_HI insn mode attribute.
(*k): Calculate mode attribute depending on
TARGET_AVX512DQ.  Emit kw for MODE_HI insn mode attribute.
(*andqi_1): Calculate mode attribute of alternative 3 depending
on TARGET_AVX512DQ.  Emit kandw for MODE_HI insn mode attribute.
(kandn): Calculate mode attribute of alternative 2 depending
on TARGET_AVX512DQ.  Emit kandnw for MODE_HI insn mode attribute.
(kxnor): Merge insn patterns using SWI1248_AVX512BW mode
iterator.  Calculate mode attribute of alternative 1 depending
on TARGET_AVX512DQ.  Emit kxnorw for MODE_HI insn mode attribute.
(*one_cmplqi2_1): Calculate mode attribute of alternative 2 depending
on TARGET_AVX512DQ.  Emit knotw for MODE_HI insn mode attribute.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 242765)
+++ config/i386/i386.md (working copy)
@@ -970,6 +970,11 @@
 (define_mode_iterator SWI1248_AVX512BWDQ
   [(QI "TARGET_AVX512DQ") HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
 
+;; All integer modes with AVX512BW, where HImode operation
+;; can be used instead of QImode.
+(define_mode_iterator SWI1248_AVX512BW
+  [QI HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
+
 ;; All integer modes without QImode.
 (define_mode_iterator SWI248x [HI SI DI])
 
@@ -2574,11 +2582,15 @@
 
 (define_insn "*movqi_internal"
   [(set (match_operand:QI 0 "nonimmediate_operand"
-   "=q,q ,q ,r,r ,?r,m ,k,k,r ,m,k")
+   "=q,q ,q ,r,r ,?r,m ,k,k,r,m,k")
(match_operand:QI 1 "general_operand"
-   "q ,qn,qm,q,rn,qm,qn,r ,k,k,k,m"))]
+   "q ,qn,qm,q,rn,qm,qn,r,k,k,k,m"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
+  static char buf[128];
+  const char *ops;
+  const char *suffix;
+
   switch (get_attr_type (insn))
 {
 case TYPE_IMOVX:
@@ -2588,24 +2600,33 @@
 case TYPE_MSKMOV:
   switch (which_alternative)
 {
-   case 7: return TARGET_AVX512DQ ? "kmovb\t{%k1, %0|%0, %k1}"
-  : "kmovw\t{%k1, %0|%0, %k1}";
-   case 8: return TARGET_AVX512DQ ? "kmovb\t{%1, %0|%0, %1}"
-  : "kmovw\t{%1, %0|%0, %1}";
-   case 9: return TARGET_AVX512DQ ? "kmovb\t{%1, %k0|%k0, %1}"
-  : "kmovw\t{%1, %k0|%k0, %1}";
+   case 7:
+ ops = "kmov%s\t{%%k1, %%0|%%0, %%k1}";
+ break;
+   case 9:
+ ops = "kmov%s\t{%%1, %%k0|%%k0, %%1}";
+ break;
case 10:
case 11:
  gcc_assert (TARGET_AVX512DQ);
- return "kmovb\t{%1, %0|%0, %1}";
-   default: gcc_unreachable ();
+ /* FALLTHRU */
+   case 8:
+ ops = "kmov%s\t{%%1, %%0|%%0, %%1}";
+ break;
+   default:
+ gcc_unreachable ();
}
 
+  suffix = (get_attr_mode (insn) == MODE_HI) ? "w" : "b";
+
+  snprintf (buf, sizeof (buf), ops, suffix);
+  return buf;
+
 default:
   if (get_attr_mode (insn) == MODE_SI)
-return "mov{l}\t{%k1, %k0|%k0, %k1}";
+   return "mov{l}\t{%k1, %k0|%k0, %k1}";
   else
-return "mov{b}\t{%1, %0|%0, %1}";
+   return "mov{b}\t{%1, %0|%0, %1}";
 }
 }
   [(set (attr "isa")
@@ -2640,6 +2661,9 @@
   (const_string "SI")
 (eq_attr "alternative" "6")
   (const_string "QI")
+(and (eq_attr "alternative" "7,8,9")
+ (not (match_test "TARGET_AVX512DQ")))
+  (const_string "HI")
 (eq_attr "type" "imovx")
   (const_string "SI")
 (and (eq_attr "type" "imov")
@@ -8055,23 +8079,26 @@
(any_logic:SWI1248x (match_dup 1)
(match_dup 2)))])
 
-(define_mode_iterator SWI1248_AVX512BW
-  [QI HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
-
 (define_insn "*k"
   [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand" "=k")
-   (any_logic:SWI1248_AVX512BW (match_operand:SWI1248_AVX512BW 1 
"mask_reg_operand" "k")
-   (match_operand:SWI1248_AVX512BW 2 
"mask_reg_operand" "k")))]
+   (any_logic:SWI1248_AVX512BW
+ (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand" "k")
+ (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand" "k")))]
   "TARGET_AVX512F"
   {
-if (!TARGET_AVX512DQ && mode == QImode)
+if (get_attr_mode (insn) == MODE_HI)

[3/3] Fix PR78120, in ifcvt/rtlanal/i386.

2016-11-23 Thread Bernd Schmidt

On 11/23/2016 07:57 PM, Bernd Schmidt wrote:

3. ifcvt computes the sum of costs for the involved blocks, but only
makes a before/after comparison when optimizing for size. When
optimizing for speed, it uses max_seq_cost, which is an estimate
computed from BRANCH_COST, which in turn can be zero for predictable
branches on x86.


This is the final patch and has the testcase. It also happens to be the 
least risky of the series so it could be applied on its own (without the 
test).



Bernd

	PR rtl-optimization/78120
	* ifcvt.c (noce_conversion_profitable_p): Check original cost in all
	cases, and additionally test against max_seq_cost for speed
	optimization.
	(noce_process_if_block): Compute an estimate for the original cost when
	optimizing for speed, using the minimum of then and else block costs.

	PR rtl-optimization/78120
	* gcc.target/i386/pr78120.c: New test.

Index: gcc/ifcvt.c
===
--- gcc/ifcvt.c	(revision 242038)
+++ gcc/ifcvt.c	(working copy)
@@ -812,8 +812,10 @@ struct noce_if_info
  we're optimizing for size.  */
   bool speed_p;
 
-  /* The combined cost of COND, JUMP and the costs for THEN_BB and
- ELSE_BB.  */
+  /* An estimate of the original costs.  When optimizing for size, this is the
+ combined cost of COND, JUMP and the costs for THEN_BB and ELSE_BB.
+ When optimizing for speed, we use the costs of COND plus the minimum of
+ the costs for THEN_BB and ELSE_BB, as computed in the next field.  */
   unsigned int original_cost;
 
   /* Maximum permissible cost for the unconditional sequence we should
@@ -852,12 +857,12 @@ noce_conversion_profitable_p (rtx_insn *
   /* Cost up the new sequence.  */
   unsigned int cost = seq_cost (seq, speed_p);
 
+  if (cost <= if_info->original_cost)
+return true;
+
   /* When compiling for size, we can make a reasonably accurately guess
- at the size growth.  */
-  if (!speed_p)
-return cost <= if_info->original_cost;
-  else
-return cost <= if_info->max_seq_cost;
+ at the size growth.  When compiling for speed, use the maximum.  */
+  return speed_p && cost <= if_info->max_seq_cost;
 }
 
 /* Helper function for noce_try_store_flag*.  */
@@ -3441,15 +3446,24 @@ noce_process_if_block (struct noce_if_in
 	}
 }
 
-  if (! bb_valid_for_noce_process_p (then_bb, cond, _info->original_cost,
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+  unsigned int then_cost = 0, else_cost = 0;
+  if (!bb_valid_for_noce_process_p (then_bb, cond, _cost,
 _info->then_simple))
 return false;
 
   if (else_bb
-  && ! bb_valid_for_noce_process_p (else_bb, cond, _info->original_cost,
-  _info->else_simple))
+  && !bb_valid_for_noce_process_p (else_bb, cond, _cost,
+   _info->else_simple))
 return false;
 
+  if (else_bb == NULL)
+if_info->original_cost += then_cost;
+  else if (speed_p)
+if_info->original_cost += MIN (then_cost, else_cost);
+  else
+if_info->original_cost += then_cost + else_cost;
+
   insn_a = last_active_insn (then_bb, FALSE);
   set_a = single_set (insn_a);
   gcc_assert (set_a);
Index: gcc/testsuite/gcc.target/i386/pr78120.c
===
--- gcc/testsuite/gcc.target/i386/pr78120.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr78120.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=generic" } */
+/* { dg-final { scan-assembler "adc" } } */
+/* { dg-final { scan-assembler-not "jmp" } } */
+
+typedef unsigned long u64;
+
+typedef struct {
+  u64 hi, lo;
+} u128;
+
+static inline u128 add_u128 (u128 a, u128 b)
+{
+  a.lo += b.lo;
+  if (a.lo < b.lo)
+a.hi++;
+
+  return a;
+}
+
+extern u128 t1, t2, t3;
+
+void foo (void)
+{
+  t1 = add_u128 (t1, t2);
+  t1 = add_u128 (t1, t3);
+}


Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386.

2016-11-23 Thread Bernd Schmidt

On 11/23/2016 07:57 PM, Bernd Schmidt wrote:

2. The i386 backend mishandles SET rtxs. If you have a fairly plain
single-insn SET, you tend to get COSTS_N_INSNS (2) out of set_rtx_cost,
because rtx_costs has a default of COSTS_N_INSNS (1) for a SET, and you
get the cost of the src in addition to that.



Bernd

	PR rtl-optimization/78120
	* config/i386/i386.c (ix86_rtx_costs): Fully handle SETs.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 242038)
+++ gcc/config/i386/i386.c	(working copy)
@@ -39925,6 +39925,7 @@ ix86_rtx_costs (rtx x, machine_mode mode
   enum rtx_code code = GET_CODE (x);
   enum rtx_code outer_code = (enum rtx_code) outer_code_i;
   const struct processor_costs *cost = speed ? ix86_cost : _size_cost;
+  int src_cost;
 
   switch (code)
 {
@@ -39935,7 +39936,23 @@ ix86_rtx_costs (rtx x, machine_mode mode
 	  *total = ix86_set_reg_reg_cost (GET_MODE (SET_DEST (x)));
 	  return true;
 	}
-  return false;
+
+  if (register_operand (SET_SRC (x), VOIDmode))
+	/* Avoid potentially incorrect high cost from rtx_costs
+	   for non-tieable SUBREGs.  */
+	src_cost = 0;
+  else
+	{
+	  src_cost = rtx_cost (SET_SRC (x), mode, SET, 1, speed);
+
+	  if (CONSTANT_P (SET_SRC (x)))
+	/* Constant costs assume a base value of COSTS_N_INSNS (1) and add
+	   a small value, possibly zero for cheap constants.  */
+	src_cost += COSTS_N_INSNS (1);
+	}
+
+  *total = src_cost + rtx_cost (SET_DEST (x), mode, SET, 0, speed);
+  return true;
 
 case CONST_INT:
 case CONST:


Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386.

2016-11-23 Thread Bernd Schmidt

Note that I misspelled the PR number in the 0/3 message :-/

On 11/23/2016 07:57 PM, Bernd Schmidt wrote:

1. I noticed comparisons between set_src_cost and set_rtx_cost seemed to
be invalid. There seems to be no good reason that insn_rtx_cost
shouldn't use the latter. It also makes the numbers comparable to the
ones you get from seq_cost.



Bernd
	PR rtl-optimization/78120
	* rtlanal.c (insn_rtx_cost): Use set_rtx_cost.

Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c	(revision 242038)
+++ gcc/rtlanal.c	(working copy)
@@ -5211,7 +5211,7 @@ insn_rtx_cost (rtx pat, bool speed)
   else
 return 0;
 
-  cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
+  cost = set_rtx_cost (set, speed);
   return cost > 0 ? cost : COSTS_N_INSNS (1);
 }
 


[0/3] Fix PR71280, in ifcvt/rtlanal/i386.

2016-11-23 Thread Bernd Schmidt
This is a small series of patches to fix various problems in cost 
calculations that together caused PR71280, a missed optimization 
opportunity.


A summary of the problems:

1. I noticed comparisons between set_src_cost and set_rtx_cost seemed to 
be invalid. There seems to be no good reason that insn_rtx_cost 
shouldn't use the latter. It also makes the numbers comparable to the

ones you get from seq_cost.

2. The i386 backend mishandles SET rtxs. If you have a fairly plain 
single-insn SET, you tend to get COSTS_N_INSNS (2) out of set_rtx_cost, 
because rtx_costs has a default of COSTS_N_INSNS (1) for a SET, and you 
get the cost of the src in addition to that.


3. ifcvt computes the sum of costs for the involved blocks, but only 
makes a before/after comparison when optimizing for size. When 
optimizing for speed, it uses max_seq_cost, which is an estimate 
computed from BRANCH_COST, which in turn can be zero for predictable 
branches on x86.


It seems a little risky to tweak costs this late in the process, but all 
of these should be improvements so it would put us on a better footing 
for fixing performance issues. I'll leave it to the reviewer to decide 
whether we want this now or after gcc-7.


The series was bootstrapped and tested on x86_64-linux. There's the 
following new guality fail:


-PASS: gcc.dg/guality/pr54693-2.c   -Os  line 21 x == 10 - i
-PASS: gcc.dg/guality/pr54693-2.c   -Os  line 21 y == 20 - 2 * i
+FAIL: gcc.dg/guality/pr54693-2.c   -Os  line 21 x == 10 - i
+FAIL: gcc.dg/guality/pr54693-2.c   -Os  line 21 y == 20 - 2 * i

which appears to be caused by loss of debuginfo in ivopts:

-  # DEBUG x => (int) ((unsigned int) x_9(D) - (unsigned int) i_14)
-  # DEBUG y => (int) ((unsigned int) y_10(D) - (unsigned int) i_14 * 2)
+  # DEBUG x => NULL
+  # DEBUG y => NULL

I'd claim this is out of scope for this patch series. So, ok?


Bernd


Re: [patch] Fix PR rtl-optimization/78437

2016-11-23 Thread Jeff Law

On 11/23/2016 03:32 AM, Paolo Bonzini wrote:



On 23/11/2016 11:26, Eric Botcazou wrote:

Does it really do that with a (set (reg1:QI)), as opposed to a
(set (strict_low_part (subreg:QI (reg1:DI)))?


That's the question (note that REE runs after register allocation).


IIRC, strict_low_part is required even after register allocation, and
patterns such as this one in config/i386/i386.md seem to confirm it:
Right.  The strict_low_part is required after register allocation. 
Without the strict_low_part, the contents of the bits outside the mode 
of the strict_low_part are undefined.


Jeff


Re: Set mode of decimal floats before calling layout_type

2016-11-23 Thread Jeff Law

On 11/23/2016 07:45 AM, Richard Sandiford wrote:

Previously decimal floating-point types were created and laid
out as binary floating-point types, then the caller changed
the mode to a decimal mode later.  The problem with that
approach is that not all targets support an equivalent binary
floating-point mode.  When they didn't, we would give the
type BLKmode and lay it out as a zero-sized type.

This probably had no effect in practice.  If a target doesn't
support a binary mode then it's unlikely to support the decimal
equivalent either.  However, with the stricter mode checking
added by later patches, we would assert if a scalar floating-
point type didn't have a scalar floating-point mode.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-16  Richard Sandiford  
Alan Hayward  
David Sherwood  

* stor-layout.c (layout_type): Allow the caller to set the mode of
a float type.  Only choose one here if the mode is still VOIDmode.
* tree.c (build_common_tree_nodes): Set the type mode of decimal
floats before calling layout_type.
* config/rs6000/rs6000.c (rs6000_init_builtins): Likewise.

OK.
jeff



Re: [PATCH] combine: Improve change_zero_ext (fixes PR71847)

2016-11-23 Thread Segher Boessenkool
On Wed, Nov 23, 2016 at 05:53:43PM +0100, Georg-Johann Lay wrote:
> >So why does the define_insn allow it?
> 
> Because the insn predicate is register_operand:HI which should be fine 
> as it is non-strict RTL.  Or are predicates supposed to reject such odd 
> operands the backend would never generate itself?

I am not sure.

> Are you saying that register_operand:HI is not correct?

No; I think register_operand:HI should not allow odd hard registers.

> IMO, if combine is playing with hard regs, it should be conservative and 
> not create new hard regs out of thin air...  like
> 
> * Test HARD_REGNO_MODE_OK

Yes, I'll have a patch for that in a minute.  Combine is supposed to make
code better, not worse ;-)

> * Don't increase HARD_REGNO_NREGS because it's not a good idea to create 
> hard regs out of thin air.  On avr, for example, one should strive to 
> get to smaller modes, not to blow mode sizes...

Combine tries to make fewer RTL insns (not machine insns), and then rejects
a combination if the rtx costs say it is more expensive than before.  It
sounds like your costs are not tuned well enough?


Segher


Re: [PATCH, rs6000] Fix PR78458, LRA ICE building libgcc for powerpc-linux-gnuspe e500v2

2016-11-23 Thread Joseph Myers
On Wed, 23 Nov 2016, Peter Bergner wrote:

> Joseph, I copied the testsuite preamble from another SPE test case.
> Does it look correct to you to catch the bug?

I think the preamble is fine.

I'm running execution testing with my two SPE patches and will submit them 
if results seem reasonable (since the previous state was the build 
failing, I don't have baseline results for unmodified GCC to compare 
with).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

2016-11-23 Thread Christophe Lyon
On 23 November 2016 at 17:30, Michael Collison  wrote:
> Hi Christophe,
>
> This is not a regression per se; the patch causes the test case to generate 
> one less instruction overall, but one additional 'and'. Trunk before the 
> patch (-O2):
>
> foo:
> and w0, w0, 255
> lsl w1, w0, 20
> orr w0, w1, w0, lsl 8
> mov w1, 65024
> movkw1, 0xfe0, lsl 16
> and w0, w0, w1
> ret
>
> The new trunk aarch64 compiler with the patch generates fewer instructions 
> but one additional 'and'
>
> foo:
> and w0, w0, 255
> lsl w1, w0, 20
> orr w0, w1, w0, lsl 8
> and x0, x0, 268434944
> and x0, x0, -2031617
> ret
>

OK,
by "regression" I meant that a test used to pass and now fails.
It looks like you have to update it to take into account the new, better code.

Christophe

> -Original Message-
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Wednesday, November 23, 2016 6:42 AM
> To: James Greenhalgh
> Cc: Michael Collison; gcc-patches@gcc.gnu.org; nd
> Subject: Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions
>
> Hi Michael,
>
>
> On 21 November 2016 at 10:52, James Greenhalgh  
> wrote:
>> On Fri, Nov 18, 2016 at 07:41:58AM +, Michael Collison wrote:
>>> James,
>>>
>>> I incorporated all your suggestions, and successfully bootstrapped
>>> and re-ran the testsuite.
>>>
>>> Okay for trunk?
>>>
>>> 2016-11-18  Michael Collison  
>>>
>>>   * config/aarch64/aarch64-protos.h
>>>   (aarch64_and_split_imm1, aarch64_and_split_imm2)
>>>   (aarch64_and_bitmask_imm): New prototypes
>>>   * config/aarch64/aarch64.c (aarch64_and_split_imm1):
>>>   New overloaded function to create bit mask covering the
>>>   lowest to highest bits set.
>>>   (aarch64_and_split_imm2): New overloaded functions to create bit
>>>   mask of zeros between first and last bit set.
>>>   (aarch64_and_bitmask_imm): New function to determine if a integer
>>>   is a valid two instruction "and" operation.
>>>   * config/aarch64/aarch64.md:(and3): New define_insn and _split
>>>   allowing wider range of constants with "and" operations.
>>>   * (ior3, xor3): Use new LOGICAL2 iterator to prevent
>>>   "and" operator from matching restricted constant range used for
>>>   ior and xor operators.
>>>   * config/aarch64/constraints.md (UsO constraint): New SImode 
>>> constraint
>>>   for constants in "and" operantions.
>>>   (UsP constraint): New DImode constraint for constants in "and" 
>>> operations.
>>>   * config/aarch64/iterators.md (lconst2): New mode iterator.
>>>   (LOGICAL2): New code iterator.
>>>   * config/aarch64/predicates.md (aarch64_logical_and_immediate): New
>>>   predicate
>>>   (aarch64_logical_and_operand): New predicate allowing extended 
>>> constants
>>>   for "and" operations.
>>>   * testsuite/gcc.target/aarch64/and_const.c: New test to verify
>>>   additional constants are recognized and fewer instructions generated.
>>>   * testsuite/gcc.target/aarch64/and_const2.c: New test to verify
>>>   additional constants are recognized and fewer instructions generated.
>>>
>>
>>> diff --git a/gcc/config/aarch64/aarch64-protos.h
>>> b/gcc/config/aarch64/aarch64-protos.h
>>> index 3cdd69b..7ef8cdf 100644
>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;
>>> HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned,
>>> unsigned);  int aarch64_get_condition_code (rtx);  bool
>>> aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>>> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT
>>> +val_in); unsigned HOST_WIDE_INT aarch64_and_split_imm2
>>> +(HOST_WIDE_INT val_in); bool aarch64_and_bitmask_imm (unsigned
>>> +HOST_WIDE_INT val_in, machine_mode mode);
>>>  int aarch64_branch_cost (bool, bool);  enum aarch64_symbol_type
>>> aarch64_classify_symbolic_expression (rtx);  bool
>>> aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); diff --git
>>> a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index
>>> 3e663eb..8e33c40 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -3600,6 +3600,43 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, 
>>> machine_mode mode)
>>>return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
>>> }
>>>
>>> +/* Create mask of ones, covering the lowest to highest bits set in
>>> +VAL_IN.  */
>>> +
>>> +unsigned HOST_WIDE_INT
>>> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in) {
>>> +  int lowest_bit_set = ctz_hwi (val_in);
>>> +  int highest_bit_set = floor_log2 (val_in);
>>> +  gcc_assert (val_in != 0);
>>
>> The comment above the function should make this 

Re: [PATCH, rs6000] Fix PR78458, LRA ICE building libgcc for powerpc-linux-gnuspe e500v2

2016-11-23 Thread Segher Boessenkool
On Wed, Nov 23, 2016 at 12:13:23PM -0600, Peter Bergner wrote:
> --- gcc/config/rs6000/rs6000.h(revision 241976)
> +++ gcc/config/rs6000/rs6000.h(working copy)
> @@ -1279,9 +1279,11 @@ enum data_align { align_abi, align_opt,
> enough space to account for vectors in FP regs.  However, TFmode/TDmode
> should not use VSX instructions to do a caller save. */
>  #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE)  
> \
> -  (TARGET_VSX
> \
> -   && ((MODE) == VOIDmode || ALTIVEC_OR_VSX_VECTOR_MODE (MODE))  
> \
> -   && FP_REGNO_P (REGNO) \
> +  (NREGS <= rs6000_hard_regno_nregs[MODE][REGNO] \
> +   ? MODE\

Please put parens around NREGS and that last MODE.

> --- gcc/testsuite/gcc.target/powerpc/pr78458.c(revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr78458.c(working copy)
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mcpu=8548 -mspe -mabi=spe -mlra" } */
> +/* { dg-skip-if "not an SPE target" { ! powerpc_spe_nocache } { "*" } { "" } 
> } */

You don't need the default arguments, FWIW.

Okay for trunk.  Thanks!


Segher


Re: [PATCH] Fix PR bootstrap/78493

2016-11-23 Thread Jeff Law

On 11/23/2016 09:13 AM, Martin Liška wrote:

Hello.

As described in the PR, the patch fixes profiled bootstrap on x86_64-linux-gnu.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. And
profiled bootstrap on x86_64-linux-gnu finishes successfully.

Ready to be installed?
Martin


0001-Fix-PR-bootstrap-78493.patch


From 8b7cd9a83cd14f7a15f39e105ccd78e143ec84f2 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 23 Nov 2016 14:08:52 +0100
Subject: [PATCH] Fix PR bootstrap/78493

gcc/ChangeLog:

2016-11-23  Martin Liska  

PR bootstrap/78493
* vec.h (~auto_vec): Do va_heap::release just if
this->m_vec == _auto.  That would help compiler not to
trigger -Werror=free-nonheap-object.
It's probably the case that the profiling information inhibited a jump 
thread optimization (path was considered cold and thus 
duplication/isolation not profitable) which left the dead path in the IL.


I don't like that we're essentially inlining the ::release method.  At 
the least we should have a pair of comments.  In the destructor we 
should indicate why we've inlined the release method.  In the release 
method we should make a note that if the release method is changed that 
a suitable change to the auto_vec destructor may be needed.


Alternately, given all the problems we have with this kind of problem, 
we should seriously consider throttling back what we consider an error 
during a profiled bootstrap.  This kind of stuff is a maintenance 
nightmare with minimal value.


jeff



Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Martin Sebor

My only real concern here is that if we call compute_builtin_object_size
without having initialized the passes, then we initialize, compute, then
finalize.  Subsequent calls will go through the same process -- the key
being each one re-computes the internal state which might get expensive.

Wouldn't it just make more sense to pull up the init/fini calls, either
explicitly (which likely means externalizing the init/fini routines) or
by wrapping all this stuff in a class and instantiating a suitable object?

I think the answer to your memory management question is that internal
state is likely not marked as a GC root and thus if you get a GC call
pieces of internal state are not seen as reachable, but you still can
reference them.  ie, you end up with dangling pointers.

Usually all you'd have to do is mark them so that gengtype will see
them.  Bitmaps, trees, rtl, are all good examples.  So marking the
bitmap would look like:

static GTY (()) bitmap computed[4];

Or something like that.

You might try --enable-checking=yes,extra,gc,gcac

That will be slow, but is often helpful for tracking down cases where
someone has an object expected to be live across passes, but it isn't
reachable because someone failed to register a GC root.


Thanks.  Attached is an updated patch that avoids flushing the computed
data until the current function changes, and tells the garbage collector
not to collect it.  I haven't finished bootstrapping and regtesting it
yet but running it through Valgrind doesn't show any errors.  Please
let me know if this is what you had in mind.

Thanks
Martin
PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer

gcc/testsuite/ChangeLog:

	PR middle-end/78245
	* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

	PR middle-end/78245
	* gimple-ssa-sprintf.c (object_sizes, computed): Declared GTY.
	(get_destination_size): Call compute_object_size.
	* tree-object-size.c (addr_object_size): Adjust.
	(pass_through_call): Adjust.
	(compute_object_size, internal_object_size): New functions.
	(compute_builtin_object_size): Call internal_object_size.
	(init_object_sizes): Initialize computed bitmap so the garbage
	collector knows about it.
	(fini_object_sizes): Clear the computed bitmap when non-null.
	(pass_object_sizes::execute): Adjust.
	* tree-object-size.h (compute_object_size): Declare.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index ead8b0e..ea56570 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -2468,7 +2468,7 @@ get_destination_size (tree dest)
  object (the function fails without optimization in this type).  */
   int ost = optimize > 0;
   unsigned HOST_WIDE_INT size;
-  if (compute_builtin_object_size (dest, ost, ))
+  if (compute_object_size (dest, ost, ))
 return size;
 
   return HOST_WIDE_INT_M1U;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
index 8d97fa8..9874332 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
@@ -1,5 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-std=c99 -O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* { dg-options "-O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* Verify that all sprintf built-ins detect overflow involving directives
+   with non-constant arguments known to be constrained by some range of
+   values, and even when writing into dynamically allocated buffers.
+   -O2 (-ftree-vrp) is necessary for the tests involving ranges to pass,
+   otherwise -O1 is sufficient.  */
 
 #ifndef LINE
 #  define LINE 0
@@ -7,18 +12,26 @@
 
 #define bos(x) __builtin_object_size (x, 0)
 
-#define T(bufsize, fmt, ...)		\
-do {\
-  if (!LINE || __LINE__ == LINE)	\
-	{\
-	  char *d = (char *)__builtin_malloc (bufsize);			\
-	  __builtin___sprintf_chk (d, 0, bos (d), fmt, __VA_ARGS__);	\
-	  sink (d);			\
-	}\
-} while (0)
+/* Defined (and redefined) to the allocation function to use, either
+   malloc, or alloca, or a VLA.  */
+#define ALLOC(p, n)   (p) = __builtin_malloc (n)
 
-void
-sink (void*);
+/* Defined (and redefined) to the sprintf function to exercise.  */
+#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...)		\
+  __builtin___sprintf_chk (d, 0, objsize, fmt, __VA_ARGS__)
+
+#define T(bufsize, fmt, ...)\
+  do {			\
+if (!LINE || __LINE__ == LINE)			\
+  {			\
+	char *d;	\
+	ALLOC (d, bufsize);\
+	TEST_SPRINTF (d, 0, bos (d), fmt, __VA_ARGS__);	\
+	sink (d);	\
+  }			\
+  } while (0)
+
+void sink (void*);
 
 /* Identity function to verify that the checker figures out the value
of the operand even when it's not constant (i.e., makes use of
@@ -232,3 +245,88 @@ void test_sprintf_chk_range_sshort (signed short *a, signed short *b)
   T ( 4, "%i",  Ra (998,  999));
   

Re: [PATCH] combine: Improve change_zero_ext (fixes PR71847)

2016-11-23 Thread Jeff Law

On 11/23/2016 09:53 AM, Georg-Johann Lay wrote:

Segher Boessenkool schrieb:

On Wed, Nov 23, 2016 at 04:58:22PM +0100, Georg-Johann Lay wrote:

Hi, this causes an illegal code issue on avr.


Sorry about that.

[ snip ]


Trying 19 -> 7:
Failed to match this instruction:
(set (reg:HI 45 [ x+3 ])
(zero_extend:HI (reg:QI 25 r25 [ x+3 ])))
Successfully matched this instruction:
(set (reg:HI 45 [ x+3 ])
(and:HI (reg:HI 25 r25)
(const_int 255 [0xff])))
allowing combination of insns 19 and 7



R25 is a 16-bit hard reg, but HImode (and larger) registers must
start at an even register number.


So why does the define_insn allow it?


Because the insn predicate is register_operand:HI which should be fine
as it is non-strict RTL.  Or are predicates supposed to reject such odd
operands the backend would never generate itself?

Are you saying that register_operand:HI is not correct?

IMO, if combine is playing with hard regs, it should be conservative and
not create new hard regs out of thin air...  like

* Test HARD_REGNO_MODE_OK

* Don't increase HARD_REGNO_NREGS because it's not a good idea to create
hard regs out of thin air.  On avr, for example, one should strive to
get to smaller modes, not to blow mode sizes...
There's already some code in the combiner to avoid some of these 
situations.  For example:


 /* If register alignment is being enforced for multi-word items in all
 cases except for parameters, it is possible to have a register 
copy
 insn referencing a hard register that is not allowed to 
contain the
 mode being copied and which would not be valid as an operand 
of most

 insns.  Eliminate this problem by not combining with such an insn.

 Also, on some machines we don't want to extend the life of a hard
 register.  */

  if (REG_P (src)
  && ((REGNO (dest) < FIRST_PSEUDO_REGISTER
   && ! HARD_REGNO_MODE_OK (REGNO (dest), GET_MODE (dest)))
  /* Don't extend the life of a hard register unless it is
 user variable (if we have few registers) or it can't
 fit into the desired register (meaning something special
 is going on).
 Also avoid substituting a return register into I3, because
 reload can't handle a conflict with constraints of other
 inputs.  */
  || (REGNO (src) < FIRST_PSEUDO_REGISTER
  && ! HARD_REGNO_MODE_OK (REGNO (src), GET_MODE (src)
return 0;



Jeff


Re: Tighten check for whether a sibcall references local variables

2016-11-23 Thread Jeff Law

On 11/23/2016 10:09 AM, Richard Sandiford wrote:

Richard Biener  writes:

On Tue, Nov 22, 2016 at 10:00 AM, Richard Sandiford
 wrote:

This loop:

  /* Make sure the tail invocation of this function does not refer
 to local variables.  */
  FOR_EACH_LOCAL_DECL (cfun, idx, var)
{
  if (TREE_CODE (var) != PARM_DECL
  && auto_var_in_fn_p (var, cfun->decl)
  && (ref_maybe_used_by_stmt_p (call, var)
  || call_may_clobber_ref_p (call, var)))
return;
}

triggered even for local variables that are passed by value.
This meant that we didn't allow local aggregates to be passed
to a sibling call but did (for example) allow global aggregates
to be passed.

I think the loop is really checking for indirect references,
so should be able to skip any variables that never have their
address taken.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?


Ok.  We've had various correctness issues in this part in the past so
I'd prefer if you can rewrite your dg-do compile tests to dg-do run ones
that verify the code works correctly.  I suggest to use a dg-additional-sources
with a separate TU for the execution driver.


OK, how's this?  Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Richard


gcc/testsuite/
* gcc.dg/tree-ssa/tailcall-7-run.c: New test.
* gcc.dg/tree-ssa/tailcall-8-run.c: Likewise.

Works for me.
jeff



Re: [Patch 16/17 libgcc ARM] Half to double precision conversions

2016-11-23 Thread James Greenhalgh
On Wed, Nov 16, 2016 at 04:00:12PM +, Kyrill Tkachov wrote:
> <...>
> 
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 8393f65..4074773 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -5177,20 +5177,35 @@
>""
>  )
> -;; DFmode to HFmode conversions have to go through SFmode.
> +;; DFmode to HFmode conversions on targets without a single-step hardware
> +;; instruction for it would have to go through SFmode.  This is dangerous
> +;; as it introduces double rounding.
> +;;
> +;; Disable this pattern unless we are in an unsafe math mode, or we have
> +;; a single-step instruction.
> +
>  (define_expand "truncdfhf2"
> -  [(set (match_operand:HF  0 "general_operand" "")
> +  [(set (match_operand:HF  0 "s_register_operand" "")
>   (float_truncate:HF
> -  (match_operand:DF 1 "general_operand" "")))]
> -  "TARGET_EITHER"
> -  "
> -  {
> -rtx op1;
> -op1 = convert_to_mode (SFmode, operands[1], 0);
> -op1 = convert_to_mode (HFmode, op1, 0);
> -emit_move_insn (operands[0], op1);
> -DONE;
> -  }"
> +  (match_operand:DF 1 "s_register_operand" "")))]
> +  "(TARGET_EITHER && flag_unsafe_math_optimizations)
> +   || (TARGET_32BIT && TARGET_FP16_TO_DOUBLE)"
> +{
> +  /* We don't have a direct instruction for this, so we must be in
> + an unsafe math mode, and going via SFmode.  */
> +
> +  if (!(TARGET_32BIT && TARGET_FP16_TO_DOUBLE))
> +{
> +  rtx op1;
> +  gcc_assert (flag_unsafe_math_optimizations);
> 
> I'd remove this assert. From the condition of the expander it is obvious
> that if !(TARGET_32BIT && TARGET_FP16_TO_DOUBLE) then
> flag_unsafe_math_optimizations is true.
> 
> 
> Ok with this change.

Thanks.

In the end, this is what I committed.

Thanks,
James

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index abd3276..118b93b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -245,7 +245,6 @@ static bool arm_output_addr_const_extra (FILE *, rtx);
 static bool arm_allocate_stack_slots_for_args (void);
 static bool arm_warn_func_return (tree);
 static tree arm_promoted_type (const_tree t);
-static tree arm_convert_to_type (tree type, tree expr);
 static bool arm_scalar_mode_supported_p (machine_mode);
 static bool arm_frame_pointer_required (void);
 static bool arm_can_eliminate (const int, const int);
@@ -654,9 +653,6 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_PROMOTED_TYPE
 #define TARGET_PROMOTED_TYPE arm_promoted_type
 
-#undef TARGET_CONVERT_TO_TYPE
-#define TARGET_CONVERT_TO_TYPE arm_convert_to_type
-
 #undef TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
 
@@ -2535,6 +2531,11 @@ arm_init_libfuncs (void)
 ? "__gnu_h2f_ieee"
 : "__gnu_h2f_alternative"));
 
+  set_conv_libfunc (trunc_optab, HFmode, DFmode,
+   (arm_fp16_format == ARM_FP16_FORMAT_IEEE
+? "__gnu_d2h_ieee"
+: "__gnu_d2h_alternative"));
+
   /* Arithmetic.  */
   set_optab_libfunc (add_optab, HFmode, NULL);
   set_optab_libfunc (sdiv_optab, HFmode, NULL);
@@ -5259,6 +5260,8 @@ arm_libcall_uses_aapcs_base (const_rtx libcall)
SFmode));
   add_libcall (libcall_htab, convert_optab_libfunc (trunc_optab, SFmode,
DFmode));
+  add_libcall (libcall_htab,
+  convert_optab_libfunc (trunc_optab, HFmode, DFmode));
 }
 
   return libcall && libcall_htab->find (libcall) != NULL;
@@ -22514,23 +22517,6 @@ arm_promoted_type (const_tree t)
   return NULL_TREE;
 }
 
-/* Implement TARGET_CONVERT_TO_TYPE.
-   Specifically, this hook implements the peculiarity of the ARM
-   half-precision floating-point C semantics that requires conversions between
-   __fp16 to or from double to do an intermediate conversion to float.  */
-
-static tree
-arm_convert_to_type (tree type, tree expr)
-{
-  tree fromtype = TREE_TYPE (expr);
-  if (!SCALAR_FLOAT_TYPE_P (fromtype) || !SCALAR_FLOAT_TYPE_P (type))
-return NULL_TREE;
-  if ((TYPE_PRECISION (fromtype) == 16 && TYPE_PRECISION (type) > 32)
-  || (TYPE_PRECISION (type) == 16 && TYPE_PRECISION (fromtype) > 32))
-return convert (type, convert (float_type_node, expr));
-  return NULL_TREE;
-}
-
 /* Implement TARGET_SCALAR_MODE_SUPPORTED_P.
This simply adds HFmode as a supported mode; even though we don't
implement arithmetic on this type directly, it's supported by
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 7ad0fbf..15930f0 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -179,6 +179,11 @@ extern void 
(*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_FP16\
   (ARM_FPU_FSET_HAS (TARGET_FPU_FEATURES, FPU_FL_FP16))
 
+/* FPU supports converting 

Re: [Patch 14/17] [libgcc, ARM] Generalise float-to-half conversion function.

2016-11-23 Thread James Greenhalgh
On Wed, Nov 16, 2016 at 04:38:17PM +, Kyrill Tkachov wrote:
> Hi James,
> diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c
> index 39c863c..ba89796 100644
> --- a/libgcc/config/arm/fp16.c
> +++ b/libgcc/config/arm/fp16.c
> @@ -22,40 +22,74 @@
> see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> .  */
> +struct format
> +{
> +  /* Number of bits.  */
> +  unsigned long long size;
> +  /* Exponent bias.  */
> +  unsigned long long bias;
> +  /* Exponent width in bits.  */
> +  unsigned long long exponent;
> +  /* Significand precision in explicitly stored bits.  */
> +  unsigned long long significand;
> +};
> +
> +static const struct format
> +binary32 =
> +{
> +  32,   /* size.  */
> +  127,  /* bias.  */
> +  8,/* exponent.  */
> +  23/* significand.  */
> +};
> +
>  static inline unsigned short
> -__gnu_f2h_internal(unsigned int a, int ieee)
> +__gnu_float2h_internal (const struct format* fmt,
> + unsigned long long a, int ieee)
>  {
> -  unsigned short sign = (a >> 16) & 0x8000;
> -  int aexp = (a >> 23) & 0xff;
> -  unsigned int mantissa = a & 0x007f;
> -  unsigned int mask;
> -  unsigned int increment;
> +  unsigned long long point = 1ULL << fmt->significand;;
> 
> Trailing ';'.
> 
> <...>
> @@ -93,7 +127,13 @@ __gnu_f2h_internal(unsigned int a, int ieee)
>/* We leave the leading 1 in the mantissa, and subtract one
>   from the exponent bias to compensate.  */
> -  return sign | (((aexp + 14) << 10) + (mantissa >> 13));
> +  return sign | (((aexp + 14) << 10) + (mantissa >> (fmt->significand - 
> 10)));
> +}
> 
> I suppose I'm not very familiar with the soft-fp code but I don't see at a 
> glance how
> the comment relates to the operation it's above of (where is the 'one' being 
> subtracted
> from the bias?). If you want to improve that comment or give me a quick 
> explanation of why
> the code does what it says it does it would be appreciated.
> 
> I've gone through the generalisation and it looks correct to me.
> So given that you have put this through the testing you say you did this is 
> ok with the nits
> above addressed.

Hi Kyrill,

Thanks. In the end this is what I committed.

James

diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c
index 39c863c..76f7327 100644
--- a/libgcc/config/arm/fp16.c
+++ b/libgcc/config/arm/fp16.c
@@ -22,40 +22,74 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+struct format
+{
+  /* Number of bits.  */
+  unsigned long long size;
+  /* Exponent bias.  */
+  unsigned long long bias;
+  /* Exponent width in bits.  */
+  unsigned long long exponent;
+  /* Significand precision in explicitly stored bits.  */
+  unsigned long long significand;
+};
+
+static const struct format
+binary32 =
+{
+  32,   /* size.  */
+  127,  /* bias.  */
+  8,/* exponent.  */
+  23/* significand.  */
+};
+
 static inline unsigned short
-__gnu_f2h_internal(unsigned int a, int ieee)
+__gnu_float2h_internal (const struct format* fmt,
+   unsigned long long a, int ieee)
 {
-  unsigned short sign = (a >> 16) & 0x8000;
-  int aexp = (a >> 23) & 0xff;
-  unsigned int mantissa = a & 0x007f;
-  unsigned int mask;
-  unsigned int increment;
+  unsigned long long point = 1ULL << fmt->significand;
+  unsigned short sign = (a >> (fmt->size - 16)) & 0x8000;
+  int aexp;
+  unsigned long long mantissa;
+  unsigned long long mask;
+  unsigned long long increment;
+
+  /* Get the exponent and mantissa encodings.  */
+  mantissa = a & (point - 1);
 
-  if (aexp == 0xff)
+  mask = (1 << fmt->exponent) - 1;
+  aexp = (a >> fmt->significand) & mask;
+
+  /* Infinity, NaN and alternative format special case.  */
+  if (((unsigned int) aexp) == mask)
 {
   if (!ieee)
return sign;
   if (mantissa == 0)
return sign | 0x7c00;   /* Infinity.  */
   /* Remaining cases are NaNs.  Convert SNaN to QNaN.  */
-  return sign | 0x7e00 | (mantissa >> 13);
+  return sign | 0x7e00 | (mantissa >> (fmt->significand - 10));
 }
 
+  /* Zero.  */
   if (aexp == 0 && mantissa == 0)
 return sign;
 
-  aexp -= 127;
+  /* Construct the exponent and mantissa.  */
+  aexp -= fmt->bias;
+
+  /* Decimal point is immediately after the significand.  */
+  mantissa |= point;
 
-  /* Decimal point between bits 22 and 23.  */
-  mantissa |= 0x0080;
   if (aexp < -14)
 {
-  mask = 0x00ff;
+  mask = point | (point - 1);
+  /* Minimum exponent for half-precision is 2^-24.  */
   if (aexp >= -25)
mask >>= 25 + aexp;
 }
   else
-mask = 0x1fff;
+mask = (point - 1) >> 10;
 
   /* Round.  */
   if (mantissa & mask)
@@ -64,8 +98,8 @@ __gnu_f2h_internal(unsigned int a, int ieee)
   if ((mantissa & mask) == increment)
increment = mantissa & (increment << 1);
   mantissa += increment;
-  if (mantissa >= 

Re: [C++ PATCH] Fix wrong-code on var construction in static init (PR c++/77907)

2016-11-23 Thread Jason Merrill
On Wed, Nov 23, 2016 at 11:36 AM, Jakub Jelinek  wrote:
> On Wed, Nov 23, 2016 at 10:42:57AM -0500, Jason Merrill wrote:
>> OK, but I wonder why we don't do constant initialization of that variable...
>
> Dunno either, check_initializer simply returns the call to the constructor
> for the var.  I've tried something like:
>
> --- gcc/cp/decl.c.jj2016-11-21 19:47:06.0 +0100
> +++ gcc/cp/decl.c   2016-11-23 17:31:15.963594787 +0100
> @@ -6339,7 +6339,27 @@ check_initializer (tree decl, tree init,
>   init_code = NULL_TREE;
> }
>   else
> -   init = NULL_TREE;
> +   {
> + init = NULL_TREE;
> + if (TREE_CODE (init_code) == CALL_EXPR)
> +   {
> + tree callee = get_callee_fndecl (init_code);
> + /* If INIT_CODE is a call to constexpr ctor,
> +see if we can optimize it into a constant
> +initialization.  */
> + if (callee
> + && DECL_DECLARED_CONSTEXPR_P (callee)
> + && DECL_CONSTRUCTOR_P (callee)
> + && !flag_no_inline)
> +   {
> + init = maybe_constant_value (init_code);
> + if (TREE_CODE (init) == CALL_EXPR)
> +   init = NULL_TREE;
> + else
> +   init_code = NULL_TREE;
> +   }
> +   }
> +   }
> }
>
>if (init && TREE_CODE (init) != TREE_VEC)
>
> but store_init_value still doesn't like that and turns it into a runtime
> initialization.
>
> So, shall I check the patch in and defer the rest to you, or just defer it?

Go ahead and check it in.

Jason


And fix the mcore port too

2016-11-23 Thread Jeff Law


Another prototype that didn't get fixed in the rtx_insn * changes.

Installing on the trunk as obvious.

Jeff
commit 9fb780b14647e3c0d4232029b9f1b7d6de6103d6
Author: Jeff Law 
Date:   Wed Nov 23 11:15:03 2016 -0700

* config/mcore/mcore.c (emit_new_cond_insn): Fix prototype.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b8a968e..686eb72 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -7,6 +7,8 @@
 
 2016-11-23  Jeff Law  
 
+   * config/mcore/mcore.c (emit_new_cond_insn): Fix prototype.
+
* config/iq2000/iq2000.c (iq2000_rtx_costs): Avoid multiplication
in boolean context warning.
 
diff --git a/gcc/config/mcore/mcore.c b/gcc/config/mcore/mcore.c
index 2d4a911..f9fb629 100644
--- a/gcc/config/mcore/mcore.c
+++ b/gcc/config/mcore/mcore.c
@@ -98,7 +98,7 @@ static const char * output_inline_const 
(machine_mode, rtx *);
 static void   layout_mcore_frame(struct mcore_frame *);
 static void   mcore_setup_incoming_varargs (cumulative_args_t, 
machine_mode, tree, int *, int);
 static cond_type  is_cond_candidate (rtx);
-static rtx_insn  *emit_new_cond_insn(rtx, int);
+static rtx_insn  *emit_new_cond_insn(rtx_insn *, int);
 static rtx_insn  *conditionalize_block  (rtx_insn *);
 static void   conditionalize_optimization   (void);
 static void   mcore_reorg   (void);


[PATCH, rs6000] Fix PR78458, LRA ICE building libgcc for powerpc-linux-gnuspe e500v2

2016-11-23 Thread Peter Bergner
PR78458 shows a problem in LRA spilling caused by HARD_REGNO_CALLER_SAVE_MODE()
returning a bogus mode (IFmode).  This patch solves the problem by just
returning MODE if MODE is wide enough to save and restore NREGS itself.

This patch passed bootstrap and regtesting on powerpc64le-linux as well
as on powerpc64-linux (testsuite run in both 32-bit and 64-bit modes).
Joseph has confirmed it fixes his ICE and gets him further in his toolchain
build (now ICEs in glibc), which he has separate patches for.

Joseph, I copied the testsuite preamble from another SPE test case.
Does it look correct to you to catch the bug?

Assuming the test case is correct, is this ok for trunk?

Peter


gcc/
PR target/78458
* config/rs6000/rs6000.h (HARD_REGNO_CALLER_SAVE_MODE): Return MODE
if it is at least NREGS wide.

gcc/testsuite/
PR target/78458
* gcc.target/powerpc/pr78458.c: New.

Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 241976)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -1279,9 +1279,11 @@ enum data_align { align_abi, align_opt,
enough space to account for vectors in FP regs.  However, TFmode/TDmode
should not use VSX instructions to do a caller save. */
 #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE)
\
-  (TARGET_VSX  \
-   && ((MODE) == VOIDmode || ALTIVEC_OR_VSX_VECTOR_MODE (MODE))
\
-   && FP_REGNO_P (REGNO)   \
+  (NREGS <= rs6000_hard_regno_nregs[MODE][REGNO]   \
+   ? MODE  \
+   : TARGET_VSX
\
+ && ((MODE) == VOIDmode || ALTIVEC_OR_VSX_VECTOR_MODE (MODE))  \
+ && FP_REGNO_P (REGNO) \
? V2DFmode  \
: TARGET_E500_DOUBLE && (MODE) == SImode\
? SImode\
Index: gcc/testsuite/gcc.target/powerpc/pr78458.c
===
--- gcc/testsuite/gcc.target/powerpc/pr78458.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr78458.c  (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=8548 -mspe -mabi=spe -mlra" } */
+/* { dg-skip-if "not an SPE target" { ! powerpc_spe_nocache } { "*" } { "" } } 
*/
+
+extern void bar (void);
+long double
+pr78458 (long double p1)
+{
+  bar ();
+  asm volatile ("# clobbers" :::
+   "r14", "r15", "r16", "r17", "r18", "r19",
+   "r20", "r21", "r22", "r23", "r24", "r25",
+   "r26", "r27", "r28", "r29", "r30", "r31");
+  return p1;
+}



Fix iq2000 builds

2016-11-23 Thread Jeff Law



The XXX in boolean context warning is now triggering on a bit of code in 
the iq2000 backend.  I'm trying to guess Stan's intent from 2003.  If 
I've got it wrong, the worst that happens in an incorrect cost computation.


Installing on the trunk.

Jeff
commit 6ed89ddbdbbde3ffd42b2ab6fad61a76e85918f8
Author: law 
Date:   Wed Nov 23 18:10:53 2016 +

* config/iq2000/iq2000.c (iq2000_rtx_costs): Avoid multiplication
in boolean context warning.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242787 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index db002f7..b8a968e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -7,6 +7,9 @@
 
 2016-11-23  Jeff Law  
 
+   * config/iq2000/iq2000.c (iq2000_rtx_costs): Avoid multiplication
+   in boolean context warning.
+
* config/ia64/ia64.c (ia64_emit_insn_before): Fix prototype.
 
 2016-11-23  James Greenhalgh  
diff --git a/gcc/config/iq2000/iq2000.c b/gcc/config/iq2000/iq2000.c
index 2c936da..a3195db 100644
--- a/gcc/config/iq2000/iq2000.c
+++ b/gcc/config/iq2000/iq2000.c
@@ -3309,7 +3309,7 @@ iq2000_rtx_costs (rtx x, machine_mode mode, int 
outer_code ATTRIBUTE_UNUSED,
int num_words = (GET_MODE_SIZE (mode) > UNITS_PER_WORD) ? 2 : 1;
 
if (simple_memory_operand (x, mode))
- return COSTS_N_INSNS (num_words);
+ return COSTS_N_INSNS (num_words) != 0;
 
* total = COSTS_N_INSNS (2 * num_words);
break;


Re: PR78153

2016-11-23 Thread Prathamesh Kulkarni
On 23 November 2016 at 17:51, Richard Biener  wrote:
> On Wed, 23 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 23 November 2016 at 17:21, Richard Biener  wrote:
>> > On Wed, 23 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 23 November 2016 at 15:16, Richard Biener  wrote:
>> >> > On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 22 November 2016 at 20:53, Richard Biener  wrote:
>> >> >> > On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >> >
>> >> >> >> On 22 November 2016 at 20:18, Richard Biener  
>> >> >> >> wrote:
>> >> >> >> > On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >> >> >
>> >> >> >> >> On 21 November 2016 at 15:10, Richard Biener  
>> >> >> >> >> wrote:
>> >> >> >> >> > On Sun, 20 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >> >> >> >
>> >> >> >> >> >> Hi,
>> >> >> >> >> >> As suggested by Martin in PR78153 strlen's return value 
>> >> >> >> >> >> cannot exceed
>> >> >> >> >> >> PTRDIFF_MAX.
>> >> >> >> >> >> So I set it's range to [0, PTRDIFF_MAX - 1] in 
>> >> >> >> >> >> extract_range_basic()
>> >> >> >> >> >> in the attached patch.
>> >> >> >> >> >>
>> >> >> >> >> >> However it regressed strlenopt-3.c:
>> >> >> >> >> >>
>> >> >> >> >> >> Consider fn1() from strlenopt-3.c:
>> >> >> >> >> >>
>> >> >> >> >> >> __attribute__((noinline, noclone)) size_t
>> >> >> >> >> >> fn1 (char *p, char *q)
>> >> >> >> >> >> {
>> >> >> >> >> >>   size_t s = strlen (q);
>> >> >> >> >> >>   strcpy (p, q);
>> >> >> >> >> >>   return s - strlen (p);
>> >> >> >> >> >> }
>> >> >> >> >> >>
>> >> >> >> >> >> The optimized dump shows the following:
>> >> >> >> >> >>
>> >> >> >> >> >> __attribute__((noclone, noinline))
>> >> >> >> >> >> fn1 (char * p, char * q)
>> >> >> >> >> >> {
>> >> >> >> >> >>   size_t s;
>> >> >> >> >> >>   size_t _7;
>> >> >> >> >> >>   long unsigned int _9;
>> >> >> >> >> >>
>> >> >> >> >> >>   :
>> >> >> >> >> >>   s_4 = strlen (q_3(D));
>> >> >> >> >> >>   _9 = s_4 + 1;
>> >> >> >> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >> >> >> >> >>   _7 = 0;
>> >> >> >> >> >>   return _7;
>> >> >> >> >> >>
>> >> >> >> >> >> }
>> >> >> >> >> >>
>> >> >> >> >> >> which introduces the regression, because the test expects 
>> >> >> >> >> >> "return 0;" in fn1().
>> >> >> >> >> >>
>> >> >> >> >> >> The issue seems to be in vrp2:
>> >> >> >> >> >>
>> >> >> >> >> >> Before the patch:
>> >> >> >> >> >> Visiting statement:
>> >> >> >> >> >> s_4 = strlen (q_3(D));
>> >> >> >> >> >> Found new range for s_4: VARYING
>> >> >> >> >> >>
>> >> >> >> >> >> Visiting statement:
>> >> >> >> >> >> _1 = s_4;
>> >> >> >> >> >> Found new range for _1: [s_4, s_4]
>> >> >> >> >> >> marking stmt to be not simulated again
>> >> >> >> >> >>
>> >> >> >> >> >> Visiting statement:
>> >> >> >> >> >> _7 = s_4 - _1;
>> >> >> >> >> >> Applying pattern match.pd:111, gimple-match.c:27997
>> >> >> >> >> >> Match-and-simplified s_4 - _1 to 0
>> >> >> >> >> >> Intersecting
>> >> >> >> >> >>   [0, 0]
>> >> >> >> >> >> and
>> >> >> >> >> >>   [0, +INF]
>> >> >> >> >> >> to
>> >> >> >> >> >>   [0, 0]
>> >> >> >> >> >> Found new range for _7: [0, 0]
>> >> >> >> >> >>
>> >> >> >> >> >> __attribute__((noclone, noinline))
>> >> >> >> >> >> fn1 (char * p, char * q)
>> >> >> >> >> >> {
>> >> >> >> >> >>   size_t s;
>> >> >> >> >> >>   long unsigned int _1;
>> >> >> >> >> >>   long unsigned int _9;
>> >> >> >> >> >>
>> >> >> >> >> >>   :
>> >> >> >> >> >>   s_4 = strlen (q_3(D));
>> >> >> >> >> >>   _9 = s_4 + 1;
>> >> >> >> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >> >> >> >> >>   _1 = s_4;
>> >> >> >> >> >>   return 0;
>> >> >> >> >> >>
>> >> >> >> >> >> }
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> After the patch:
>> >> >> >> >> >> Visiting statement:
>> >> >> >> >> >> s_4 = strlen (q_3(D));
>> >> >> >> >> >> Intersecting
>> >> >> >> >> >>   [0, 9223372036854775806]
>> >> >> >> >> >> and
>> >> >> >> >> >>   [0, 9223372036854775806]
>> >> >> >> >> >> to
>> >> >> >> >> >>   [0, 9223372036854775806]
>> >> >> >> >> >> Found new range for s_4: [0, 9223372036854775806]
>> >> >> >> >> >> marking stmt to be not simulated again
>> >> >> >> >> >>
>> >> >> >> >> >> Visiting statement:
>> >> >> >> >> >> _1 = s_4;
>> >> >> >> >> >> Intersecting
>> >> >> >> >> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> >> >> >> >> >> and
>> >> >> >> >> >>   [0, 9223372036854775806]
>> >> >> >> >> >> to
>> >> >> >> >> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> >> >> >> >> >> Found new range for _1: [0, 9223372036854775806]
>> >> >> >> >> >> marking stmt to be not simulated again
>> >> >> >> >> >>
>> >> >> >> >> >> Visiting statement:
>> >> >> >> >> >> _7 = s_4 - _1;
>> >> >> >> >> >> Intersecting
>> >> >> >> >> >>   ~[9223372036854775807, 9223372036854775809]
>> >> >> >> >> >> and
>> >> >> >> >> >>   ~[9223372036854775807, 9223372036854775809]
>> >> >> 

Re: [patch] Fix PR rtl-optimization/78437

2016-11-23 Thread Jeff Law

On 11/23/2016 05:26 AM, Bernd Schmidt wrote:

On 11/23/2016 11:18 AM, Paolo Bonzini wrote:



On 23/11/2016 10:35, Eric Botcazou wrote:

I now wonder
whether this also matters for !WORD_REGISTER_OPERATIONS targets, e.g.
x86:

 (set (reg1:DI) ...
 ...
 (set (reg1:QI) (expression:QI))
 ...
 (set (reg2:DI) (any_extend:DI (reg1:QI)))
 ...
 (use (reg1:DI))

where the use reads well-defined upper bits from the very first set.


Does it really do that with a (set (reg1:QI)), as opposed to a
(set (strict_low_part (subreg:QI (reg1:DI)))?


It would need strict_low_part unless we're dealing with subwords. I
think the patch should maybe check for that in the !W_R_O case.

Does WORD_REGISTER_OPERATIONS really buy much on targets that use it?
This sort of situation seems very surprising and unfortunate.
It certainly was helpful on the PA back in the day.  Less sure on modern 
systems.

jeff


Fix ia64 builds

2016-11-23 Thread Jeff Law


The recent rtx_insn * changes mucked up ia64 in a trivial way.  The 
definition of ia64_emit_insn_before was changed, but not the prototype. 
This results in a build error.


Fixed in the obvious way.  Applying to the trunk.

Jeff
commit f491be0d30b8e0063e54972497b8e049482f03b8
Author: law 
Date:   Wed Nov 23 17:55:57 2016 +

* config/ia64/ia64.c (ia64_emit_insn_before): Fix prototype.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242785 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 600a9ef..62e354c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2016-11-23  Jeff Law  
+
+   * config/ia64/ia64.c (ia64_emit_insn_before): Fix prototype.
+
 2016-11-23  James Greenhalgh  
 
PR target/63250
diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index c8dbb4c..169b571 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -7138,7 +7138,7 @@ static char mem_ops_in_group[4];
 static int current_cycle;
 
 static rtx ia64_single_set (rtx_insn *);
-static void ia64_emit_insn_before (rtx, rtx);
+static void ia64_emit_insn_before (rtx, rtx_insn *);
 
 /* Map a bundle number to its pseudo-op.  */
 


Re: [PATCH] Fix PR71762

2016-11-23 Thread Richard Biener
On November 23, 2016 6:25:43 PM GMT+01:00, Bernd Schmidt  
wrote:
>On 11/10/2016 02:28 PM, Richard Biener wrote:
>> Any takers for the RTL implementation?
>
>Do you have a testcase you think can be optimized?

The forwrop test case I XFAILed with the patch.

Richard.

>
>Bernd




Re: [PATCH] Add map clauses to libgomp test device-3.f90

2016-11-23 Thread Jakub Jelinek
On Tue, Nov 15, 2016 at 08:01:32PM +0300, Alexander Monakov wrote:
> On Tue, 15 Nov 2016, Alexander Monakov wrote:
> > Yep, I do see new test execution failures with both Intel MIC and PTX 
> > offloading
> > on device-1.f90, device-3.f90 and target2.f90.  Here's an actually-tested 
> > patch
> > for the first two (on target2.f90 there's a different problem).
> 
> And here's a patch for target2.f90.  I don't have a perfect understanding of
> mapping clauses, but the test appears to need to explicitly map pointer
> variables, at a minimum.  Also, 'map (from: r)' is missing on the last target
> region.
> 
>   * testsuite/libgomp.fortran/target2.f90 (foo): Add mapping clauses to
>   target construct.
> 
> diff --git a/libgomp/testsuite/libgomp.fortran/target2.f90 
> b/libgomp/testsuite/libgomp.fortran/target2.f90
> index 42f704f..7119774 100644
> --- a/libgomp/testsuite/libgomp.fortran/target2.f90
> +++ b/libgomp/testsuite/libgomp.fortran/target2.f90
> @@ -63,7 +63,7 @@ contains
>r = r .or. (any (k(5:n-5) /= 17)) .or. (lbound (k, 1) /= 4) .or. 
> (ubound (k, 1) /= n)
>  !$omp end target
>  if (r) call abort
> -!$omp target map (to: d(2:n+1), n)
> +!$omp target map (to: d(2:n+1), f, j) map (from: r)

The missing map (from: r) is obvious.  Explicitly mentioned f and j is
tested in another part of the test, here I believe it should be implicitly
firstprivate, and then
"If the original list item has the POINTER attribute, the new list items
receive the same association status of the original list item as if by
pointer assignment."
shall apply.  I think I'd prefer the test to be broken over forgetting this
isn't handled correctly.

Jakub


Re: [PATCH] Fix PR71762

2016-11-23 Thread Bernd Schmidt

On 11/10/2016 02:28 PM, Richard Biener wrote:

Any takers for the RTL implementation?


Do you have a testcase you think can be optimized?


Bernd



Re: Tighten check for whether a sibcall references local variables

2016-11-23 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Nov 22, 2016 at 10:00 AM, Richard Sandiford
>  wrote:
>> This loop:
>>
>>   /* Make sure the tail invocation of this function does not refer
>>  to local variables.  */
>>   FOR_EACH_LOCAL_DECL (cfun, idx, var)
>> {
>>   if (TREE_CODE (var) != PARM_DECL
>>   && auto_var_in_fn_p (var, cfun->decl)
>>   && (ref_maybe_used_by_stmt_p (call, var)
>>   || call_may_clobber_ref_p (call, var)))
>> return;
>> }
>>
>> triggered even for local variables that are passed by value.
>> This meant that we didn't allow local aggregates to be passed
>> to a sibling call but did (for example) allow global aggregates
>> to be passed.
>>
>> I think the loop is really checking for indirect references,
>> so should be able to skip any variables that never have their
>> address taken.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Ok.  We've had various correctness issues in this part in the past so
> I'd prefer if you can rewrite your dg-do compile tests to dg-do run ones
> that verify the code works correctly.  I suggest to use a 
> dg-additional-sources
> with a separate TU for the execution driver.

OK, how's this?  Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Richard


gcc/testsuite/
* gcc.dg/tree-ssa/tailcall-7-run.c: New test.
* gcc.dg/tree-ssa/tailcall-8-run.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-7-run.c 
b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-7-run.c
new file mode 100644
index 000..d18ff9d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-7-run.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-additional-sources "tailcall-7.c" } */
+
+struct s { int x; };
+extern struct s global;
+
+void g1 (void);
+void g2 (void);
+void g3 (struct s *);
+struct s g4 (struct s);
+struct s g5 (void);
+struct s g6 (void);
+struct s g7 (void);
+struct s g8 (struct s *);
+int g9 (struct s);
+int g10 (int);
+
+struct s last;
+struct s tmp;
+
+struct s
+f (int i)
+{
+  struct s ret;
+  ret.x = i + 100;
+  last = ret;
+  return ret;
+}
+
+void
+callit (void (*fn) (void))
+{
+  fn ();
+}
+
+int
+test (int last_val, int global_val, int tmp_val)
+{
+  return last.x == last_val && global.x == global_val && tmp.x == tmp_val;
+}
+
+int
+main (void)
+{
+  global.x = 200;
+  tmp.x = 300;
+  g1 ();
+  if (!test (101, 200, 300))
+__builtin_abort ();
+  g2 ();
+  if (!test (102, 102, 300))
+__builtin_abort ();
+  g3 ();
+  if (!test (103, 102, 103))
+__builtin_abort ();
+  if (g4 (tmp).x != 104 || !test (104, 102, 103))
+__builtin_abort ();
+  if (g5 ().x != 105 || !test (105, 102, 103))
+__builtin_abort ();
+  if (g6 ().x != 106 || !test (106, 102, 103))
+__builtin_abort ();
+  if (g7 ().x != 107 || !test (107, 107, 103))
+__builtin_abort ();
+  if (g8 ().x != 108 || !test (108, 107, 108))
+__builtin_abort ();
+  if (g9 (tmp) != 9 || !test (109, 107, 108))
+__builtin_abort ();
+  if (g10 (10) != 10 || !test (110, 107, 108))
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8-run.c 
b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8-run.c
new file mode 100644
index 000..f892909
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8-run.c
@@ -0,0 +1,86 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-additional-sources "tailcall-8.c" } */
+
+struct s { int x; };
+
+int expected;
+struct s *last_ptr;
+struct s tmp;
+
+void
+start (int val, struct s *initial_last_ptr)
+{
+  expected = val;
+  tmp.x = val;
+  last_ptr = initial_last_ptr;
+}
+
+void
+f_direct (struct s param)
+{
+  if (param.x != expected)
+__builtin_abort ();
+}
+
+void
+f_indirect (struct s *ptr)
+{
+  if (ptr->x != expected)
+__builtin_abort ();
+  last_ptr = ptr;
+  ptr->x += 100;
+}
+
+void
+f_void (void)
+{
+  if (last_ptr->x != expected + 100)
+__builtin_abort ();
+}
+
+
+void g1 (struct s);
+void g2 (struct s *);
+void g3 (struct s *);
+void g4 (struct s *);
+void g5 (struct s);
+void g6 (struct s);
+void g7 (struct s);
+void g8 (struct s *);
+void g9 (struct s *);
+
+int
+main (void)
+{
+  struct s g6_s = { 106 };
+
+  start (1, 0);
+  g1 (tmp);
+
+  start (2, 0);
+  g2 ();
+
+  start (3, 0);
+  g3 ();
+
+  start (4, 0);
+  g4 ();
+
+  start (5, 0);
+  g5 (tmp);
+
+  start (6, _s);
+  g6 (tmp);
+
+  start (7, 0);
+  g7 (tmp);
+
+  start (8, 0);
+  g8 ();
+
+  start (9, 0);
+  g9 ();
+
+  return 0;
+}



Re: [Patch v4 0/17] Add support for _Float16 to AArch64 and ARM

2016-11-23 Thread James Greenhalgh
On Mon, Nov 21, 2016 at 09:19:35AM +, Kyrill Tkachov wrote:
> 
> On 18/11/16 18:19, James Greenhalgh wrote:
> >On Fri, Nov 11, 2016 at 03:37:17PM +, James Greenhalgh wrote:
> >>Hi,
> >>
> >>This patch set enables the _Float16 type specified in ISO/IEC TS 18661-3
> >>for AArch64 and ARM. The patch set has been posted over the past two months,
> >>with many of the target-independent changes approved. I'm reposting it in
> >>entirity in the form I hope to commit it to trunk.
> >>
> >>The patch set can be roughly split in three; first, hookization of
> >>TARGET_FLT_EVAL_METHOD, and changes to the excess precision logic in the
> >>compiler to handle the new values for FLT_EVAL_METHOD defined in
> >>ISO/IEC TS-18661-3. Second, the AArch64 changes required to enable _Float16,
> >>and finally the ARM changes required to enable _Float16.
> >>
> >>The broad goals and an outline of each patch in the patch set were
> >>described in https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02383.html .
> >>As compared to the original submission, the patch set has grown an ARM
> >>port, and has had several rounds of technical review on the target
> >>independent aspects.
> >>
> >>This has resulted in many of the patches already being approved, a full
> >>summary of the status of each ticket is immediately below.
> >>
> >>Clearly the focus for review of this patch set now needs to be the AArch64
> >>and ARM ports, I hope the appropriate maintainers will be able to do so in
> >>time for the patch set to be accepted for GCC 7.
> >>
> >>I've built and tested the full patch set on ARM (cross and native),
> >>AArch64 (cross and native) and x86_64 (native) with no identified issues.
> >All the target independent changes are now approved, and all the ARM patches
> >have been approved (two are conditional on minor changes).
> >
> >I'd like to apply the target independent and ARM changes to trunk, while I
> >wait for approval of the AArch64 changes. The changes for the two ports 
> >should
> >be independent. Would that be acceptable, or would you prefer me to wait
> >for review of the AArch64 changes?
> 
> That's fine with me (I'd like to start getting the new features in to trunk so
> they can start getting the stage 3 testing).

I haven't seen any other objections to this, and I too would like to ensure
that the changes get as much Stage 3 testing across multiple architectures
as I can.

As the three AArch64 patches can be considered independent, I'll push the
target independent changes and the ARM changes now.

I've rerun a bootstrap on x86-64 and ARM before doing so, and there are
still no regressions.

That leaves these patches outstanding for AArch64 (which I'd like to ping).


AArch64 changes

3 patches, none reviewed


[Patch AArch64 11/17] Add floatdihf2 and floatunsdihf2 patterns

   Not reviewed, last pinged (^6):
   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00584.html

[Patch libgcc AArch64 12/17] Enable hfmode soft-float conversions and 
truncations

   Not reviewed:
   https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02395.html

[Patch AArch64 13/17] Enable _Float16 for AArch64

   Not reviewed:
   https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01176.html

Thanks,
James

> 
> Kyrill
> 
> >I will then send a follow-up patch for doc/extend.texi detailing the
> >availability of _Float16 on ARM (I'm holding off on doing this until I know
> >which order the ARM and AArch64 parts will go in).
> >
> >Thanks,
> >James
> >
> >>--
> >>Target independent changes
> >>
> >>10 patches, 9 previously approved, 1 New implementing testsuite
> >>changes to enable _Float16 tests in more circumstances on ARM.
> >>--
> >>
> >>[Patch 1/17] Add a new target hook for describing excess precision 
> >>intentions
> >>
> >>   Approved: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00781.html
> >>
> >>[Patch 2/17] Implement TARGET_C_EXCESS_PRECISION for i386
> >>
> >>   Blanket approved by Jeff in:
> >> https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02402.html
> >>
> >>[Patch 3/17] Implement TARGET_C_EXCESS_PRECISION for s390
> >>
> >>   Approved: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01554.html
> >>
> >>[Patch 4/17] Implement TARGET_C_EXCESS_PRECISION for m68k
> >>
> >>   Blanket approved by Jeff in:
> >> https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02402.html
> >>   And by Andreas: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02414.html
> >>
> >>   There was a typo in the original patch, fixed in:
> >> https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01173.html
> >>   which I would apply as an "obvious" fix to the original patch.
> >>
> >>[Patch 5/17] Add -fpermitted-flt-eval-methods=[c11|ts-18661-3]
> >>
> >>   Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02405.html
> >>
> >>   Joseph had a comment in
> >>   https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00335.html that the tests
> >>   should check FLT_EVAL_METHOD from  rather than
> >>   

Re: [PATCH] combine: Improve change_zero_ext (fixes PR71847)

2016-11-23 Thread Georg-Johann Lay

Segher Boessenkool schrieb:

On Wed, Nov 23, 2016 at 04:58:22PM +0100, Georg-Johann Lay wrote:

Hi, this causes an illegal code issue on avr.


Sorry about that.

[ snip ]


Trying 19 -> 7:
Failed to match this instruction:
(set (reg:HI 45 [ x+3 ])
(zero_extend:HI (reg:QI 25 r25 [ x+3 ])))
Successfully matched this instruction:
(set (reg:HI 45 [ x+3 ])
(and:HI (reg:HI 25 r25)
(const_int 255 [0xff])))
allowing combination of insns 19 and 7


R25 is a 16-bit hard reg, but HImode (and larger) registers must start 
at an even register number.


So why does the define_insn allow it?


Because the insn predicate is register_operand:HI which should be fine 
as it is non-strict RTL.  Or are predicates supposed to reject such odd 
operands the backend would never generate itself?


Are you saying that register_operand:HI is not correct?

IMO, if combine is playing with hard regs, it should be conservative and 
not create new hard regs out of thin air...  like


* Test HARD_REGNO_MODE_OK

* Don't increase HARD_REGNO_NREGS because it's not a good idea to create 
hard regs out of thin air.  On avr, for example, one should strive to 
get to smaller modes, not to blow mode sizes...



Johann



Please open a PR so I don't lose track.

Thanks,

Segher


Re: [C++ PATCH] Fix wrong-code on var construction in static init (PR c++/77907)

2016-11-23 Thread Jakub Jelinek
On Wed, Nov 23, 2016 at 10:42:57AM -0500, Jason Merrill wrote:
> OK, but I wonder why we don't do constant initialization of that variable...

Dunno either, check_initializer simply returns the call to the constructor
for the var.  I've tried something like:

--- gcc/cp/decl.c.jj2016-11-21 19:47:06.0 +0100
+++ gcc/cp/decl.c   2016-11-23 17:31:15.963594787 +0100
@@ -6339,7 +6339,27 @@ check_initializer (tree decl, tree init,
  init_code = NULL_TREE;
}
  else
-   init = NULL_TREE;
+   {
+ init = NULL_TREE;
+ if (TREE_CODE (init_code) == CALL_EXPR)
+   {
+ tree callee = get_callee_fndecl (init_code);
+ /* If INIT_CODE is a call to constexpr ctor,
+see if we can optimize it into a constant
+initialization.  */
+ if (callee
+ && DECL_DECLARED_CONSTEXPR_P (callee)
+ && DECL_CONSTRUCTOR_P (callee)
+ && !flag_no_inline)
+   {
+ init = maybe_constant_value (init_code);
+ if (TREE_CODE (init) == CALL_EXPR)
+   init = NULL_TREE;
+ else
+   init_code = NULL_TREE;
+   }
+   }
+   }
}
 
   if (init && TREE_CODE (init) != TREE_VEC)

but store_init_value still doesn't like that and turns it into a runtime
initialization.

So, shall I check the patch in and defer the rest to you, or just defer it?

Jakub


Re: [PATCH] combine: Improve change_zero_ext (fixes PR71847)

2016-11-23 Thread Segher Boessenkool
On Wed, Nov 23, 2016 at 04:58:22PM +0100, Georg-Johann Lay wrote:
> Hi, this causes an illegal code issue on avr.

Sorry about that.

[ snip ]

> Trying 19 -> 7:
> Failed to match this instruction:
> (set (reg:HI 45 [ x+3 ])
> (zero_extend:HI (reg:QI 25 r25 [ x+3 ])))
> Successfully matched this instruction:
> (set (reg:HI 45 [ x+3 ])
> (and:HI (reg:HI 25 r25)
> (const_int 255 [0xff])))
> allowing combination of insns 19 and 7

> R25 is a 16-bit hard reg, but HImode (and larger) registers must start 
> at an even register number.

So why does the define_insn allow it?

Please open a PR so I don't lose track.

Thanks,


Segher


RE: [Aarch64][PATCH] Improve Logical And Immediate Expressions

2016-11-23 Thread Michael Collison
Hi Christophe,

This is not a regression per se; the patch causes the test case to generate one 
less instruction overall, but one additional 'and'. Trunk before the patch 
(-O2):

foo:
and w0, w0, 255
lsl w1, w0, 20
orr w0, w1, w0, lsl 8
mov w1, 65024
movkw1, 0xfe0, lsl 16
and w0, w0, w1
ret

The new trunk aarch64 compiler with the patch generates fewer instructions but 
one additional 'and'

foo:
and w0, w0, 255
lsl w1, w0, 20
orr w0, w1, w0, lsl 8
and x0, x0, 268434944
and x0, x0, -2031617
ret

-Original Message-
From: Christophe Lyon [mailto:christophe.l...@linaro.org] 
Sent: Wednesday, November 23, 2016 6:42 AM
To: James Greenhalgh
Cc: Michael Collison; gcc-patches@gcc.gnu.org; nd
Subject: Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

Hi Michael,


On 21 November 2016 at 10:52, James Greenhalgh  wrote:
> On Fri, Nov 18, 2016 at 07:41:58AM +, Michael Collison wrote:
>> James,
>>
>> I incorporated all your suggestions, and successfully bootstrapped 
>> and re-ran the testsuite.
>>
>> Okay for trunk?
>>
>> 2016-11-18  Michael Collison  
>>
>>   * config/aarch64/aarch64-protos.h
>>   (aarch64_and_split_imm1, aarch64_and_split_imm2)
>>   (aarch64_and_bitmask_imm): New prototypes
>>   * config/aarch64/aarch64.c (aarch64_and_split_imm1):
>>   New overloaded function to create bit mask covering the
>>   lowest to highest bits set.
>>   (aarch64_and_split_imm2): New overloaded functions to create bit
>>   mask of zeros between first and last bit set.
>>   (aarch64_and_bitmask_imm): New function to determine if a integer
>>   is a valid two instruction "and" operation.
>>   * config/aarch64/aarch64.md:(and3): New define_insn and _split
>>   allowing wider range of constants with "and" operations.
>>   * (ior3, xor3): Use new LOGICAL2 iterator to prevent
>>   "and" operator from matching restricted constant range used for
>>   ior and xor operators.
>>   * config/aarch64/constraints.md (UsO constraint): New SImode constraint
>>   for constants in "and" operantions.
>>   (UsP constraint): New DImode constraint for constants in "and" 
>> operations.
>>   * config/aarch64/iterators.md (lconst2): New mode iterator.
>>   (LOGICAL2): New code iterator.
>>   * config/aarch64/predicates.md (aarch64_logical_and_immediate): New
>>   predicate
>>   (aarch64_logical_and_operand): New predicate allowing extended 
>> constants
>>   for "and" operations.
>>   * testsuite/gcc.target/aarch64/and_const.c: New test to verify
>>   additional constants are recognized and fewer instructions generated.
>>   * testsuite/gcc.target/aarch64/and_const2.c: New test to verify
>>   additional constants are recognized and fewer instructions generated.
>>
>
>> diff --git a/gcc/config/aarch64/aarch64-protos.h 
>> b/gcc/config/aarch64/aarch64-protos.h
>> index 3cdd69b..7ef8cdf 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;  
>> HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, 
>> unsigned);  int aarch64_get_condition_code (rtx);  bool 
>> aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT 
>> +val_in); unsigned HOST_WIDE_INT aarch64_and_split_imm2 
>> +(HOST_WIDE_INT val_in); bool aarch64_and_bitmask_imm (unsigned 
>> +HOST_WIDE_INT val_in, machine_mode mode);
>>  int aarch64_branch_cost (bool, bool);  enum aarch64_symbol_type 
>> aarch64_classify_symbolic_expression (rtx);  bool 
>> aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); diff --git 
>> a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 
>> 3e663eb..8e33c40 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -3600,6 +3600,43 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, 
>> machine_mode mode)
>>return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];  
>> }
>>
>> +/* Create mask of ones, covering the lowest to highest bits set in 
>> +VAL_IN.  */
>> +
>> +unsigned HOST_WIDE_INT
>> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in) {
>> +  int lowest_bit_set = ctz_hwi (val_in);
>> +  int highest_bit_set = floor_log2 (val_in);
>> +  gcc_assert (val_in != 0);
>
> The comment above the function should make this precondition clear. Or 
> you could pick a well defined behaviour for when val_in == 0 (return 
> 0?), if that fits the rest of the algorithm.
>
> Otherwise, this patch looks OK to me.
>
> Thanks,
> James
>

This patch (r242739) causes a regression on aarch64:

  gcc.dg/zero_bits_compound-2.c scan-assembler-times \\(and: 2 now fails.

Christophe.


[PATCH] Fix PR bootstrap/78493

2016-11-23 Thread Martin Liška
Hello.

As described in the PR, the patch fixes profiled bootstrap on x86_64-linux-gnu.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. And
profiled bootstrap on x86_64-linux-gnu finishes successfully.

Ready to be installed?
Martin
>From 8b7cd9a83cd14f7a15f39e105ccd78e143ec84f2 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 23 Nov 2016 14:08:52 +0100
Subject: [PATCH] Fix PR bootstrap/78493

gcc/ChangeLog:

2016-11-23  Martin Liska  

	PR bootstrap/78493
	* vec.h (~auto_vec): Do va_heap::release just if
	this->m_vec == _auto.  That would help compiler not to
	trigger -Werror=free-nonheap-object.
---
 gcc/vec.h | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/vec.h b/gcc/vec.h
index 14fb2a6..d2d253b 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1272,7 +1272,14 @@ public:
 
   ~auto_vec ()
   {
-this->release ();
+if (this->m_vec == _auto)
+  {
+	gcc_checking_assert (this->using_auto_storage ());
+	this->m_vec->m_vecpfx.m_num = 0;
+	return;
+  }
+
+va_heap::release (this->m_vec);
   }
 
 private:
-- 
2.10.2



Re: [PATCH] combine: Convert subreg-of-lshiftrt to zero_extract properly (PR78390)

2016-11-23 Thread Michael Matz
Hi,

On Wed, 23 Nov 2016, Segher Boessenkool wrote:

> > Even with non-constant shifts it remains an extract and make_extraction 
> > does support variable start positions (which is the shift amount), as does 
> > the zero_extract pattern (depending on target of course).
> 
> Sure, but the extraction length is non-constant as well then!  And not
> just not-constant, with a conditional inside it, even.

Right, that would be the consequence.  I don't see much care for this 
problem anywhere, which probably hints that there aren't many targets 
accepting variant positions for zero_extract :)

> > >   if (GET_CODE (inner) == LSHIFTRT
> > > + && CONST_INT_P (XEXP (inner, 1))
> > >   && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner))
> > >   && subreg_lowpart_p (x))
> > > {
> > >   new_rtx = make_compound_operation (XEXP (inner, 0), next_code);
> > > + int width = GET_MODE_PRECISION (GET_MODE (inner))
> > > + - INTVAL (XEXP (inner, 1));
> > 
> > GET_MODE (new_rtx), because that's the object you're giving to 
> > make_extraction, not inner (and not XEXP(inner, 0)).
> 
> This is about the *original* rtx, the lshiftrt: how many non-zero bits
> does that leave.  The bits zeroed out by the lshiftrt have to be zeroed
> out by the zero_extract as well, to keep the same semantics in the
> resulting rtl.

Of course.  I saw it more as a requirements on the inputs of 
make_extraction.  In the end this is moot, as all three modes (of inner, 
of (inner,0) and of new_rtx) are the same :)


Ciao,
Michael.


Re: [PATCH] combine: Improve change_zero_ext (fixes PR71847)

2016-11-23 Thread Georg-Johann Lay

Hi, this causes an illegal code issue on avr.

Test case (reduced from gcc.dg/builtins-32.c):

extern int signbitf (float);

int test (float x)
{
  return signbitf (x);
}

Before combine, the dump reads

(note 4 0 19 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 19 4 3 2 (set (reg:QI 51 [ x+3 ])
(reg:QI 25 r25 [ x+3 ])) "builtins-32.c":4 56 {movqi_insn}
 (expr_list:REG_DEAD (reg:QI 25 r25 [ x+3 ])
(nil)))
(note 3 19 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 3 8 2 (set (reg:HI 45 [ x+3 ])
(zero_extend:HI (reg:QI 51 [ x+3 ]))) "builtins-32.c":5 370 
{zero_extendqihi2}



and then the .combine dump comes up with:


Trying 19 -> 7:
Failed to match this instruction:
(set (reg:HI 45 [ x+3 ])
(zero_extend:HI (reg:QI 25 r25 [ x+3 ])))
Successfully matched this instruction:
(set (reg:HI 45 [ x+3 ])
(and:HI (reg:HI 25 r25)
(const_int 255 [0xff])))
allowing combination of insns 19 and 7

...

Trying 8 -> 13:
Successfully matched this instruction:
(parallel [
(set (reg/i:HI 24 r24)
(and:HI (reg:HI 25 r25)
(const_int 128 [0x80])))
(clobber (scratch:QI))
])

R25 is a 16-bit hard reg, but HImode (and larger) registers must start 
at an even register number.


HARD_REGNO_MODE_OK rejects such registers, and reload then generates a 
new movhi insn to fix the andhi3.  The .reload dump reads:


(insn 22 8 13 2 (set (reg/i:HI 24 r24)
(reg:HI 25 r25)) "builtins-32.c":6 68 {*movhi}
 (nil))
(insn 13 22 14 2 (parallel [
(set (reg/i:HI 24 r24)
(and:HI (reg/i:HI 24 r24)
(const_int 128 [0x80])))
(clobber (scratch:QI))
]) "builtins-32.c":6 251 {andhi3}
 (nil))

This HI move is obviously wrong, persists until asm out and then gas 
complains, of course:


builtins-32.s: Assembler messages:
builtins-32.s:15: Error: even register number required


http://gcc.gnu.org/r241664 is the first revision that exposes the problem.

Command line used is

$ avr-gcc builtins-32.c -mmcu=avr5 -Os -c -save-temps -dp -da

and I configured with

$ ../../gcc.gnu.org/trunk/configure --target=avr 
--prefix=/local/gnu/install/gcc-7 --disable-shared --disable-nls 
--with-dwarf2 --enable-target-optspace=yes --with-gnu-as --with-gnu-ld 
--enable-checking=release --enable-languages=c,c++,lto



Ideas?

The problem is that reload cannot actually fix the situation.  The 
andhi3 is the only insn in the function, and r25 is an incoming register 
("x" is passed in QI:r22 .. QI:r25), but after the change from combine 
it appears as if QI:r26 is also an incoming register as r25 das HImode 
thereafter!



Johann


On 25.10.2016 12:40, Segher Boessenkool wrote:

This improves a few things in change_zero_ext.  Firstly, it should use
the passed in pattern in recog_for_combine, not the pattern of the insn
(they are not the same if the whole pattern was replaced).  Secondly,
it handled zero_ext of a subreg, but with hard registers we do not get
a subreg, instead the mode of the reg is changed.  So this handles that.
Thirdly, after changing a zero_ext to an AND, the resulting RTL may become
non-canonical, like (ior (ashift ..) (and ..)); the AND should be first,
it is commutative.  And lastly, zero_extract as a set_dest wasn't handled
at all, but now it is.

This fixes the testcase in PR71847, and improves code generation in some
other edge cases too.

Tested on powerpc64-linux {-m32,-m64}; I'll also test it on x86 and
will build lots of crosses before committing.


Segher


2016-10-25  Segher Boessenkool  

PR target/71847
* combine.c (change_zero_ext): Handle zero_ext of hard registers.
Swap commutative operands in new RTL if needed.  Handle zero_ext
in the set_dest.
(recog_for_combine): Pass *pnewpat to change_zero_ext instead of
PATTERN (insn).

---
 gcc/combine.c | 55 +++
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index ee12714..b46405b 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11140,9 +11140,10 @@ recog_for_combine_1 (rtx *pnewpat, rtx_insn *insn, rtx 
*pnotes)
Return whether anything was so changed.  */

 static bool
-change_zero_ext (rtx *src)
+change_zero_ext (rtx pat)
 {
   bool changed = false;
+  rtx *src = _SRC (pat);

   subrtx_ptr_iterator::array_type array;
   FOR_EACH_SUBRTX_PTR (iter, array, src, NONCONST)
@@ -11174,6 +11175,14 @@ change_zero_ext (rtx *src)
  size = GET_MODE_PRECISION (GET_MODE (XEXP (x, 0)));
  x = SUBREG_REG (XEXP (x, 0));
}
+  else if (GET_CODE (x) == ZERO_EXTEND
+  && SCALAR_INT_MODE_P (mode)
+  && REG_P (XEXP (x, 0))
+  && HARD_REGISTER_P (XEXP (x, 0)))
+   {
+ size = GET_MODE_PRECISION (GET_MODE (XEXP (x, 0)));
+ x = gen_rtx_REG (mode, REGNO (XEXP (x, 0)));
+   }
   else
continue;

@@ 

  1   2   >