[PATCH, i386] Introduce support for PKU instructions.

2015-12-17 Thread Kirill Yukhin
Hello,
The patch at the bottom introduces support for the Intel PKRU instructions
rdpkru and wrpkru.
It is pretty straightforward, so I hope it is still suitable for v6.

Names for the new intrinsics will appear shortly in a new revision of the SDM.

Bootstrapped & regtested.

Is it ok for trunk?
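
For readers unfamiliar with the register these two instructions access,
here is a minimal C sketch of the PKRU layout, assuming the encoding given
in Intel's protection-keys description: 16 keys, two bits per key, where
bit 2*i is "access disable" and bit 2*i+1 is "write disable".  The helper
and macro names are hypothetical and not part of this patch.  Because the
intrinsic names are not final yet, the sketch only computes the 32-bit
value that would be handed to wrpkru and does not execute the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only.  */
#define PKRU_KEY_COUNT 16u
#define PKRU_AD 0x1u   /* Access disable bit for a key.  */
#define PKRU_WD 0x2u   /* Write disable bit for a key.  */

/* Return PKRU with the two rights bits of KEY replaced by RIGHTS.  */
static uint32_t
pkru_set_rights (uint32_t pkru, unsigned key, uint32_t rights)
{
  assert (key < PKRU_KEY_COUNT);
  assert ((rights & ~(PKRU_AD | PKRU_WD)) == 0);
  unsigned shift = 2 * key;
  pkru &= ~((uint32_t) 0x3 << shift);   /* Clear the old rights.  */
  pkru |= rights << shift;              /* Install the new rights.  */
  return pkru;
}

/* Extract the two rights bits of KEY from PKRU.  */
static uint32_t
pkru_get_rights (uint32_t pkru, unsigned key)
{
  assert (key < PKRU_KEY_COUNT);
  return (pkru >> (2 * key)) & 0x3u;
}
```

A typical use would be to read the current value with rdpkru, adjust one
key with pkru_set_rights, and write the result back with wrpkru.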

gcc/
* common/config/i386/i386-common.c (OPTION_MASK_ISA_PKU_SET): New.
(OPTION_MASK_ISA_PKU_UNSET): Ditto.
(ix86_handle_option): Handle OPT_mpku.
* config.gcc: Add pkuintrin.h to i[34567]86-*-* and x86_64-*-*
targets.
* config/i386/cpuid.h (host_detect_local_cpu): Detect PKU feature.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle PKU ISA
flag.
* config/i386/i386.c (ix86_target_string): Add "-mpku" to
ix86_target_opts.
(ix86_option_override_internal): Define PTA_PKU, mention new key
in skylake-avx512. Handle new ISA bits.
(ix86_valid_target_attribute_inner_p): Add "pku".
(enum ix86_builtins): Add IX86_BUILTIN_RDPKRU and IX86_BUILTIN_WRPKRU.
(builtin_description bdesc_special_args[]): Add new built-ins.
* config/i386/i386.h (define TARGET_PKU): New.
(define TARGET_PKU_P): Ditto.
* config/i386/i386.md (define_c_enum "unspec"): Add UNSPEC_PKU.
(define_c_enum "unspecv"): Add UNSPECV_PKU.
(define_expand "rdpkru"): New.
(define_insn "rdpkru_2"): Ditto.
(define_expand "wrpkru"): Ditto.
(define_insn "wrpkru_2"): Ditto.
* config/i386/i386.opt (mpku): Ditto.
* config/i386/pkuintrin.h: New file.
* config/i386/x86intrin.h: Include pkuintrin.h
* doc/extend.texi: Describe new built-ins.
* doc/invoke.texi: Describe new switches.

gcc/testsuite/
* g++.dg/other/i386-2.C: Add -mpku.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/rdpku-1.c: New test.
* gcc.target/i386/sse-12.c: Add -mpku.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-33.c: Ditto.
* gcc.target/i386/wrpku-1.c: New test.

--
Thanks, K

commit ebd39dd557ddd0d1aae344655f1bd69673477865
Author: Kirill Yukhin 
Date:   Wed Dec 16 10:52:37 2015 +0300

PKU. Initial support.

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index a9d2208..6039e04 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -129,6 +129,7 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_F16C | OPTION_MASK_ISA_AVX_SET)
 #define OPTION_MASK_ISA_MWAITX_SET OPTION_MASK_ISA_MWAITX
 #define OPTION_MASK_ISA_CLZERO_SET OPTION_MASK_ISA_CLZERO
+#define OPTION_MASK_ISA_PKU_SET OPTION_MASK_ISA_PKU
 
 /* Define a set of ISAs which aren't available when a given ISA is
disabled.  MMX and SSE ISAs are handled separately.  */
@@ -190,6 +191,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_CLWB_UNSET OPTION_MASK_ISA_CLWB
 #define OPTION_MASK_ISA_MWAITX_UNSET OPTION_MASK_ISA_MWAITX
 #define OPTION_MASK_ISA_CLZERO_UNSET OPTION_MASK_ISA_CLZERO
+#define OPTION_MASK_ISA_PKU_UNSET OPTION_MASK_ISA_PKU
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
as -mno-sse4.1. */
@@ -962,6 +964,19 @@ ix86_handle_option (struct gcc_options *opts,
}
   return true;
 
+case OPT_mpku:
+  if (value)
+   {
+ opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PKU_SET;
+ opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PKU_SET;
+   }
+  else
+   {
+ opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_PKU_UNSET;
+ opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PKU_UNSET;
+   }
+  return true;
+
 
   /* Comes from final.c -- no real reason to change it.  */
 #define MAX_CODE_ALIGN 16
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 882e413..4fd6d8b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -372,7 +372,8 @@ i[34567]86-*-*)
   xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
   avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
  avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
-  avx512vbmivlintrin.h clwbintrin.h pcommitintrin.h mwaitxintrin.h clzerointrin.h"
+  avx512vbmivlintrin.h clwbintrin.h pcommitintrin.h
+  mwaitxintrin.h clzerointrin.h pkuintrin.h"
;;
 x86_64-*-*)
cpu_type=i386
@@ -393,7 +394,8 @@ x86_64-*-*)
   xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
   avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
  avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
-  avx512vbmivlintrin.h clwbintrin.h pcommitintrin.h mwaitxintrin.h clzerointrin.h"
+  avx512vbmivlintrin.h clwbintrin.h pcommitintrin.h
+ 

Re: [PATCH] OpenACC documentation for libgomp

2015-12-17 Thread Sandra Loosemore

On 12/16/2015 06:29 AM, James Norris wrote:

Hi,

Attached is the patch to add OpenACC documentation for libgomp.

Ok to commit to trunk?


I have some copy-editing nits.  I can't say I'm familiar enough with 
this functionality to comment intelligently on the content, though



+To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
+flag @command{-fopenacc} must be specified.  This enables the OpenACC directive


s/@command/@option


+@node acc_get_num_devices
+@section @code{acc_get_num_devices} -- Get number of devices for given device type
+@table @asis
+@item @emph{Description}
+This routine returns a value indicating the
+number of devices available for the given device type.  It determines
+the number of devices in a @emph{passive} manner.  In other words, it
+does not alter the state within the runtime environment aside from
+possibly initializing an uninitialized device.  This aspect allows


s/aspect //


+the routine to be called without concern for altering the interaction
+with an attached accelerator device.


I think "...concern that it might alter" is what you intend to say 
here.


I'm not too sure about the formatting style here.  It does seem to be 
consistent with the style of the existing content of the manual to have 
a separate section for each function instead of listing them in a table, 
but the existing docs have prototypes that are missing from your 
additions, and I'd really like to see index entries for all these things



+@node acc_on_device
+@section @code{acc_on_device} -- Whether executing on a particular device
+@table @asis
+@item @emph{Description}:
+This routine tells the program whether it is executing on a particular
+device.  Based on the argument passed, GCC tries to evaluate this to a
+constant at compile time, but library functions are also provided, for


s/, for/ for/


+@node CUDA Streams Usage
+@chapter CUDA Streams Usage
+
+This applies to the @code{nvptx} plugin only.
+
+The library provides elements that perform asynchronous movement of
+data and asynchronous operation of computing constructs.  This
+asynchronous functionality is implemented by making use of CUDA
+streams@footnote{See "Stream Management" in "CUDA Driver API",
+TRM-06703-001, Version 5.5, July 2013, for additional information}.
+
+The primary means by which the asynchronous functionality is accessed
+is through the use of those OpenACC directives which make use of the


s/which/that/


+@code{async} and @code{wait} clauses.  When the @code{async} clause is
+first used with a directive, it will create a CUDA stream.  If an


s/will create/creates/


+@code{async-argument} is used with the @code{async} clause, then the
+stream will be associated with the specified @code{async-argument}.


s/will be/is/


+
+Following the creation of an association between a CUDA stream and the
+@code{async-argument} of an @code{async} clause, both the @code{wait}
+clause and the @code{wait} directive can be used.  When either the
+clause or directive is used after stream creation, it creates a
+rendezvous point whereby execution will wait until all operations


s/will wait/waits/


+associated with the @code{async-argument}, that is, stream, have
+completed.
+
+Normally, the management of the streams that are created as a result of
+using the @code{async} clause, is done without any intervention by the
+caller.  This implies the association between the @code{async-argument}


You've got an unnecessary comma there.  I think this would be easier to 
parse if rewritten "Normally, streams that are created as a result of 
using the @code{async} clause are managed without any intervention by 
the caller."



+and the CUDA stream will be maintained for the lifetime of the program.


s/will be/is/


+However, this association can be changed through the use of the library
+function @code{acc_set_cuda_stream}.  When the function
+@code{acc_set_cuda_stream} is used, the CUDA stream that was


s/is used/is called/ ??


+originally associated with the @code{async} clause will be destroyed.


s/will be/is/


+Caution should be taken when changing the association as subsequent
+references to the @code{async-argument} will be referring to a different


s/will be referring/refer/


+As the OpenACC library is built using the CUDA Driver API, the question has
+arisen on what impact does using the OpenACC library have on a program that
+uses the Runtime library, or a library based on the Runtime library, e.g.,
+CUBLAS@footnote{See section 2.26, "Interactions with the CUDA Driver API" in
+"CUDA Runtime API", Version 5.5, July 2013 and section 2.27, "VDPAU
+Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
+July 2013, for additional information on library interoperability.}.


This is really hard to parse.  Can we say something like

The OpenACC library uses the CUDA Driver API, and may interact with 
programs that use the Runtime library directly, or another library based 
on t

[COMMITTED] Add myself to MAINTAINERS (Write After Approval)

2015-12-17 Thread Saraswati, Sujoy (OSTL)
Hi,

I've added myself to "Write After Approval" maintainers.

Committed revision 231805.

Regards,
Sujoy

Index: MAINTAINERS
===
--- MAINTAINERS (revision 231804)
+++ MAINTAINERS (revision 231805)
@@ -554,6 +554,7 @@ Matthew Sachs   

 Hariharan Sandanagobalane  
 Iain Sandoe
 Duncan Sands   
+Sujoy Saraswati

 Trevor Saunders
 William Schmidt

 Tilo Schwarz   
Index: ChangeLog
===
--- ChangeLog   (revision 231804)
+++ ChangeLog   (revision 231805)
@@ -1,3 +1,7 @@
+2015-12-18  Sujoy Saraswati  
+
+   * MAINTAINERS (Write After Approval): Add myself.
+
 2015-12-17  Sebastian Pop  

* Makefile.in: Replace ISL with isl.


[PATCH] Move split-path pass next to the tracer pass

2015-12-17 Thread Jeff Law


Richi noted that the two passes, which do very similar things, are at
slightly different places in the pipeline.  There really isn't a good
reason for that.


This patch sinks the split-path pass slightly so that it's immediately 
before the tracer pass.



Bootstrapped and regression tested on x86_64-linux-gnu & installed on 
the trunk.




Jeff
commit 221e081904a5cb3423a03b7f6e3173fe205f0adb
Author: law 
Date:   Fri Dec 18 04:04:20 2015 +

[PATCH] Move split-path pass next to the tracer pass

* passes.def: Put the split-paths pass immediately before the
tracer pass.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@231800 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 070b2dd..bf01dfb 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,8 @@
 2015-12-17  Jeff Law  
 
+   * passes.def: Put the split-paths pass immediately before the
+   tracer pass.
+
* doc/invoke.texi (-O2 options): Remove -fsplit-paths.
(-O3 options): Add -fsplit-paths.
* gimple-ssa-split-paths.c: Include predict.h
diff --git a/gcc/passes.def b/gcc/passes.def
index 2ba8490..59114a9 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -302,10 +302,10 @@ along with GCC; see the file COPYING3.  If not see
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_simduid_cleanup);
   NEXT_PASS (pass_lower_vector_ssa);
-  NEXT_PASS (pass_split_paths);
   NEXT_PASS (pass_cse_reciprocals);
   NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
   NEXT_PASS (pass_strength_reduction);
+  NEXT_PASS (pass_split_paths);
   NEXT_PASS (pass_tracer);
   NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
   NEXT_PASS (pass_strlen);


Re: [Patch, fortran] pr68196 [4.9/5 Regression] ICE on function result with procedure pointer component

2015-12-17 Thread Steve Kargl
On Thu, Dec 17, 2015 at 11:12:17PM +0100, Paul Richard Thomas wrote:
> 
> Some problems have come up that are not dissimilar to the original
> bug, involving infinite recursion with procedure components, with the
> same type as the containing type. The fix is verging on the trivial.
> However, given that I found two further bugs in fixing the one
> reported, I worry that there are more lurking nearby.
> 
> Bootstraps and regtests on x86_64 - OK for trunk and, in a couple of
> weeks 5 and 4.9 branches?
> 

OK.

Do you have a testcase that should also be committed?

-- 
Steve


Re: Fix PR66206

2015-12-17 Thread Bernd Schmidt

On 12/18/2015 02:15 AM, Andrew Pinski wrote:


Except PATTERN (insn) will never be a REG.
The only case where the input can be a REG is:
gcc_assert (!find_btr_use (src));


Yeah, so we _are_ calling it with a REG. It's a minor issue that won't 
trigger in practice, but in order to close the PR we might as well fix 
it and this is the least invasive way.



Bernd


Re: [PATCH] shrink-wrap: Once more PRs 67778, 68634, and now 68909

2015-12-17 Thread Bernd Schmidt

On 12/17/2015 10:07 PM, Segher Boessenkool wrote:

It turns out v4 wasn't quite complete anyway; so here "v5".

If a candidate PRE cannot get the prologue because a block BB is
reachable from it, but PRE does not dominate BB, we try again with the
dominators of PRE.  That "try again" needs to again consider BB though,
we aren't done with it.

This fixes this problem.  Tested on the 68909 testcase, and bootstrapped
and regression checked on powerpc64-linux.  Is this okay for trunk?


This code is getting really quite confusing, and at the least I think we 
need more documentation of what exactly vec is supposed to contain at 
the entry to the inner while loop here.
I'm also beginning to think we should disable this part of the code for 
gcc-6.



Bernd


libgo patch committed: Make sure NLA_HDRLEN is defined

2015-12-17 Thread Ian Lance Taylor
This libgo patch from Lynn Boger makes sure that NLA_HDRLEN is defined
in the syscall package.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline and GCC 5 branch.
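
The C half of the change reuses a pattern that mksysinfo.sh already applies
to TUNGETFILTER: a macro whose value is an expression is copied into an
enumerator, presumably so that the generated Go declarations can refer to a
plain constant.  A minimal sketch of that pattern follows.  All DEMO_* names
and the struct are hypothetical stand-ins, not the real netlink
definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for a small attribute header.  */
struct demo_attr
{
  unsigned short len;
  unsigned short type;
};

/* A macro defined by an expression, not by a literal.  */
#define DEMO_ALIGNTO 4
#define DEMO_ALIGN(n) (((n) + DEMO_ALIGNTO - 1) & ~(DEMO_ALIGNTO - 1))
#define DEMO_HDRLEN ((int) DEMO_ALIGN (sizeof (struct demo_attr)))

/* Capture the macro's value as an enumerator, but only when the
   macro exists on this system.  */
enum
{
#ifdef DEMO_HDRLEN
  DEMO_HDRLEN_val = DEMO_HDRLEN,
#endif
  DEMO_dummy_val = 0
};

/* The enumerator and the macro must agree.  */
static void
demo_self_check (void)
{
  assert (DEMO_HDRLEN_val == DEMO_HDRLEN);
  assert (DEMO_HDRLEN_val % DEMO_ALIGNTO == 0);
}
```

The shell half of the patch then only has to emit a Go constant that refers
to the _val name when no direct definition was found.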

Ian
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 231795)
+++ libgo/mksysinfo.sh  (revision 231796)
@@ -267,6 +267,9 @@ enum {
 #ifdef TUNGETFILTER
   TUNGETFILTER_val = TUNGETFILTER,
 #endif
+#ifdef NLA_HDRLEN
+  NLA_HDRLEN_val = NLA_HDRLEN,
+#endif
 
 };
 EOF
@@ -1075,8 +1078,6 @@ if ! grep '^const TUNGETFILTER' ${OUT} >
   fi
 fi
 
-
-
 # The ioctl flags for terminal control
 grep '^const _TC[GS]ET' gen-sysinfo.go | grep -v _val | \
 sed -e 's/^\(const \)_\(TC[GS]ET[^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
@@ -1422,9 +1423,15 @@ grep '^type _rtnexthop ' gen-sysinfo.go
 # The GNU/Linux netlink flags.
 grep '^const _NETLINK_' gen-sysinfo.go | \
   sed -e 's/^\(const \)_\(NETLINK_[^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
-grep '^const _NLA_' gen-sysinfo.go | \
+grep '^const _NLA_' gen-sysinfo.go | grep -v '_val =' | \
   sed -e 's/^\(const \)_\(NLA_[^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
 
+if ! grep '^const NLA_HDRLEN' ${OUT} >/dev/null 2>&1; then
+  if grep '^const _NLA_HDRLEN_val' ${OUT} >/dev/null 2>&1; then
+echo 'const NLA_HDRLEN = _NLA_HDRLEN_val' >> ${OUT}
+  fi
+fi
+
 # The GNU/Linux packet socket flags.
 grep '^const _PACKET_' gen-sysinfo.go | \
   sed -e 's/^\(const \)_\(PACKET_[^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}


Re: Fix PR66206

2015-12-17 Thread Andrew Pinski
On Thu, Dec 17, 2015 at 5:00 PM, Bernd Schmidt  wrote:
> This is a small problem found by a static analyzer, a function in bt-load
> can in theory return the address of a local variable.
>
> Bootstrapped and tested on x86_64-linux, ok?

Except PATTERN (insn) will never be a REG.
The only case where the input can be a REG is:
gcc_assert (!find_btr_use (src));

And that is basically making sure the left hand side is not part of
all_btrs so the way to fix that would be simpler if you define a new
function called btr_use_p or something to that effect and return
true/false instead.

Also if you are touching this code, can you change the literal 0 to
NULL inside find_btr_use.

Thanks,
Andrew





>
>
> Bernd


Fix PR66206

2015-12-17 Thread Bernd Schmidt
This is a small problem found by a static analyzer, a function in 
bt-load can in theory return the address of a local variable.
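
To make the hazard concrete, here is a minimal sketch under simplifying
assumptions: node_ref is a hypothetical stand-in for rtx, and find_use is a
reduced model of the fixed function, not the real bt-load code.  When the
value is passed by value and the search starts at the address of that
parameter, a match on the top-level slot yields a pointer that dies on
return.  Taking a pointer to the caller's slot avoids this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an rtx node.  */
struct node
{
  int is_target_reg;
};
typedef struct node *node_ref;

/* Fixed shape: XP points at storage owned by the caller, so any
   location returned from here remains valid after the call.  The
   problematic shape took "node_ref x" and could return &x, which is
   the address of a by-value parameter.  */
static node_ref *
find_use (node_ref *xp)
{
  assert (xp != NULL);
  if (*xp != NULL && (*xp)->is_target_reg)
    return xp;
  return NULL;
}
```

In this model a caller holding "node_ref slot" passes &slot and gets back
either &slot or NULL.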


Bootstrapped and tested on x86_64-linux, ok?


Bernd
	PR rtl-optimization/66206
	* bt-load.c (find_btr_use): Change first arg to be a pointer to an rtx.
	All callers changed.

Index: gcc/bt-load.c
===
--- gcc/bt-load.c	(revision 231653)
+++ gcc/bt-load.c	(working copy)
@@ -1,4 +1,3 @@
-
 /* Perform branch target register load optimizations.
Copyright (C) 2001-2015 Free Software Foundation, Inc.
 
@@ -188,14 +187,14 @@ basic_block_freq (const_basic_block bb)
   return bb->frequency;
 }
 
-/* If X references (sets or reads) any branch target register, return one
-   such register.  If EXCLUDEP is set, disregard any references within
-   that location.  */
+/* If the rtx at *XP references (sets or reads) any branch target
+   register, return one such register.  If EXCLUDEP is set, disregard
+   any references within that location.  */
 static rtx *
-find_btr_use (rtx x, rtx *excludep = 0)
+find_btr_use (rtx *xp, rtx *excludep = 0)
 {
   subrtx_ptr_iterator::array_type array;
-  FOR_EACH_SUBRTX_PTR (iter, array, &x, NONCONST)
+  FOR_EACH_SUBRTX_PTR (iter, array, xp, NONCONST)
 {
   rtx *loc = *iter;
   if (loc == excludep)
@@ -232,7 +231,7 @@ insn_sets_btr_p (const rtx_insn *insn, i
   if (REG_P (dest)
 	  && TEST_HARD_REG_BIT (all_btrs, REGNO (dest)))
 	{
-	  gcc_assert (!find_btr_use (src));
+	  gcc_assert (!find_btr_use (&src));
 
 	  if (!check_const || CONSTANT_P (src))
 	{
@@ -324,7 +323,7 @@ new_btr_user (basic_block bb, int insn_l
  to decide whether we can replace all target register
  uses easily.
*/
-  rtx *usep = find_btr_use (PATTERN (insn));
+  rtx *usep = find_btr_use (&PATTERN (insn));
   rtx use;
   btr_user *user = NULL;
 
@@ -335,7 +334,7 @@ new_btr_user (basic_block bb, int insn_l
   /* We want to ensure that USE is the only use of a target
 	 register in INSN, so that we know that to rewrite INSN to use
 	 a different target register, all we have to do is replace USE.  */
-  unambiguous_single_use = !find_btr_use (PATTERN (insn), usep);
+  unambiguous_single_use = !find_btr_use (&PATTERN (insn), usep);
   if (!unambiguous_single_use)
 	usep = NULL;
 }
@@ -511,7 +510,7 @@ compute_defs_uses_and_gen (btr_heap_t *a
 		}
 	  else
 		{
-		  if (find_btr_use (PATTERN (insn)))
+		  if (find_btr_use (&PATTERN (insn)))
 		{
 		  btr_user *user = new_btr_user (bb, insn_luid, insn);
 


Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

2015-12-17 Thread H.J. Lu
On Thu, Dec 17, 2015 at 1:59 PM, H.J. Lu  wrote:
> On Thu, Dec 17, 2015 at 1:21 PM, Uros Bizjak  wrote:
>> On Thu, Dec 17, 2015 at 7:09 PM, H.J. Lu  wrote:
>>> On Thu, Dec 17, 2015 at 8:11 AM, H.J. Lu  wrote:
>>>> On Thu, Dec 17, 2015 at 7:50 AM, H.J. Lu  wrote:
>>>>> On Thu, Dec 17, 2015 at 5:42 AM, Uros Bizjak  wrote:
>>>>>> On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu  wrote:
>>>>>>> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
>>>>>>>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>>>>>>>>> Since sibcall never returns, we can only use call-clobbered register
>>>>>>>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>>>>>>>> be properly restored.
>>>>>>>>>
>>>>>>>>> Tested on x86-64 with -m32.  OK for trunk?

>>>>>>>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>>>>>>>> class, and register_no_elim_operand predicate should be used with "U"
>>>>>>>> constraint. Also, please introduce new predicate, similar to how
>>>>>>>> GOT_memory_operand is defined and handled.

>>>>>>>
>>>>>>> Here is the updated patch.  There is a predicate already,
>>>>>>> sibcall_memory_operand.  It allows any registers to
>>>>>>> be as GOT base, which is the root of our problem.
>>>>>>> This patch removes GOT slot from it and handles
>>>>>>> sibcall over GOT slot with *sibcall_GOT_32 and
>>>>>>> *sibcall_value_GOT_32 patterns.  Since I need to
>>>>>>> expose constraints on GOT base register to RA,
>>>>>>> I have to use 2 operands, GOT base and function
>>>>>>> symbol, to describe sibcall over 32-bit GOT slot.
>>>>>>
>>>>>> Please use
>>>>>>
>>>>>>(mem:SI (plus:SI
>>>>>>  (match_operand:SI 0 "register_no_elim_operand" "U")
>>>>>>  (match_operand:SI 1 "GOT32_symbol_operand")))
>>>>>> ...
>>>>>>
>>>>>> to avoid manual rebuild of the operand.
>>>>>>
>>>>>
>>>>> Is this OK?
>>>>>

>>>> An updated patch to allow sibcall_memory_operand for RTL
>>>> expansion.  OK for trunk if there is no regression?

>>>
>>> There is no regressions on x86-64 with -m32.  OK for trunk?
>>
>> OK for mainline, with a following change:
>>
>> @@ -597,11 +597,17 @@
>>  (match_operand 0 "memory_operand"
>>
>>  ;; Return true if OP is a memory operands that can be used in sibcalls.
>> +;; Since sibcall never returns, we can only use call-clobbered register
>> +;; as GOT base.  Allow GOT slot here only with pseudo register as GOT
>> +;; base.  Properly handle sibcall over GOT slot with *sibcall_GOT_32
>> +;; and *sibcall_value_GOT_32 patterns.
>>  (define_predicate "sibcall_memory_operand"
>>(and (match_operand 0 "memory_operand")
>> (match_test "CONSTANT_P (XEXP (op, 0))
>>  || (GET_CODE (XEXP (op, 0)) == PLUS
>>  && REG_P (XEXP (XEXP (op, 0), 0))
>> +&& (REGNO (XEXP (XEXP (op, 0), 0))
>> +>= FIRST_PSEUDO_REGISTER)
>>  && GET_CODE (XEXP (XEXP (op, 0), 1)) == CONST
>>  && GET_CODE (XEXP (XEXP (XEXP (op, 0), 1), 0)) == UNSPEC
>>  && XINT (XEXP (XEXP (XEXP (op, 0), 1), 0), 1) == UNSPEC_GOT)")))
>>
>> You can use (!HARD_REGISTER_NUM_P (...) || call_used_regs[...]) here.
>> Call-used hard regs are still allowed here.
>>
>> Can you please also rewrite this horrible match_test as a block of C
>> code using GOT32_symbol_operand predicate?
>>
>
> I am retesting the patch with
>
> ;; Return true if OP is a memory operands that can be used in sibcalls.
> ;; Since sibcall never returns, we can only use call-clobbered register
> ;; as GOT base.  Allow GOT slot here only with pseudo register as GOT
> ;; base.  Properly handle sibcall over GOT slot with *sibcall_GOT_32
> ;; and *sibcall_value_GOT_32 patterns.
> (define_predicate "sibcall_memory_operand"
>   (match_operand 0 "memory_operand")
> {
>   op = XEXP (op, 0);
>   if (CONSTANT_P (op))
> return true;
>   if (GET_CODE (op) == PLUS && REG_P (XEXP (op, 0)))
> {
>   int regno = REGNO (XEXP (op, 0));
>   if (!HARD_REGISTER_NUM_P (regno) || call_used_regs[regno])
> {
>   op = XEXP (op, 1);
>   if (GOT32_symbol_operand (op, VOIDmode))
> return true;
> }
> }
>   return false;
> })
>
>
> I will check it in if there is no regression.
>

There is no regression.  But I missed sibcall to local function
with -O2 -fPIC -m32 -fno-plt -mregparm=3:

extern void bar (int, int, int) __attribute__((visibility("hidden")));

void
foo (int a, int b, int c)
{
  bar (a, b, c);
  bar (a, b, c);
}

It doesn't need GOT.  This patch fixes it.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0e2bec3..691915f9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6657,6 +6657,7 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
 {
   tree type, decl_or_type;
   rtx a, b;
+  bool bind_global = decl && !targetm.binds_local_p (decl);

   /* If we are generating position-independent code, we cannot sibcall
  optimize direct calls to

Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-17 Thread Jeff Law

On 12/11/2015 03:05 AM, Richard Biener wrote:

On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law  wrote:

On 12/03/2015 07:38 AM, Richard Biener wrote:


This pass is now enabled by default with -Os but has no limits on the
amount of stmts it copies.


The more statements it copies, the more likely it is that the path splitting
will turn out to be useful!  It's counter-intuitive.


Well, it's still not appropriate for -Os (nor -O2 I think).  -ftracer is enabled
with -fprofile-use (but it is also properly driven to only trace hot paths)
and otherwise not by default at any optimization level.

I've just committed a patch to limit to loops we're optimizing for speed
and moved the transformation from -O2 to -O3.


I put in some instrumentation to see when this was triggering and, as
expected, the vast majority of triggers are with very small blocks, 2-3
statements.  But those are probably the least interesting.  There are
limited instances where it triggers on large blocks (say > 10
statements).  But those were with GCC sources.  I'm going to pull out
SPEC and do some instrumented builds with that, obviously focusing on
those benchmarks where Ajit saw improvements.



Hmmm, the updated code keeps the single latch property, but I'm pretty sure
it won't keep a single exit policy.

To keep a single exit policy would require keeping an additional block
around.  Each of the split paths would unconditionally transfer to this new
block.  The new block would then either transfer to the latch block or out
of the loop.


Don't see how this would work for the CFG pattern it operates on unless you
duplicate the exit condition into that new block creating an even more
obfuscated CFG.

Upon further reflection, I don't think this is important as the pass
runs after the tree loop optimizers.




Note that both passes are placed quite late and thus won't see much
of the GIMPLE optimizations (DOM mainly).  I wonder why they were
not placed adjacent to each other.

I'm going to move them to be adjacent.  If for no other reason than
it'll make comparisons easier without having to worry about any passes
between them.  I suspect that'll drop in tonight after I get the kids to
sleep :-)


Jeff



[PATCH] Limit path splitting to loops we optimize for speed

2015-12-17 Thread Jeff Law


It's not currently clear what the final disposition for the path 
splitting code will be.  However, there's no reason not to implement 
Richi's request that we only do this transformation when optimizing for 
speed and at optimization levels higher than -O2.


This patch limits the transformation to loops where 
optimize_loop_for_speed_p is true.  The patch also moves the 
transformation to -O3.
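
As an illustration of the loop shape the pass looks for, here is a small
sketch; it is not the split-path-1.c testcase.  The function name and the
arithmetic are made up.  The IF-THEN-ELSE inside the loop merges in a join
block just ahead of the latch, and that join block is the candidate for
duplication into both arms.  Whether the pass actually fires on this code
depends on the compiler's heuristics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: each iteration has a THEN arm, an ELSE arm
   and a shared join block that feeds the loop latch.  */
static unsigned
sum_with_join (const unsigned *v, unsigned n)
{
  assert (v != NULL || n == 0);
  unsigned acc = 0;
  for (unsigned i = 0; i < n; i++)
    {
      unsigned t;
      if (v[i] & 1)
        t = v[i] * 3;   /* THEN arm.  */
      else
        t = v[i] + 7;   /* ELSE arm.  */
      /* Join block: executed after either arm, then falls into the
         latch.  Duplicating it gives each arm its own copy.  */
      acc += t ^ i;
    }
  return acc;
}
```

With this patch applied, such a loop is only considered when
optimize_loop_for_speed_p holds for it, and the option has to be enabled
either by -O3 or by an explicit -fsplit-paths.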


Bootstrapped & regression tested on x86_64-linux-gnu.  Installed on the 
trunk.




Jeff
commit 47545a4249a6d9ff3003e0e98a11aced97c6c7e1
Author: Jeff Law 
Date:   Thu Dec 17 16:32:19 2015 -0700

[PATCH] Limit path splitting to loops we optimize for speed

* doc/invoke.texi (-O2 options): Remove -fsplit-paths.
(-O3 options): Add -fsplit-paths.
* gimple-ssa-split-paths.c: Include predict.h
(split_paths): Only split paths in a loop that should be
optimized for speed.
* opts.c (default_options_table): Move -fsplit-paths from -O2 to
-O3.

* gcc.dg/tree-ssa/split-path-1.c: Explicitly ask for path
splitting optimizations.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a1f71bd..070b2dd 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2015-12-17  Jeff Law  
+
+   * doc/invoke.texi (-O2 options): Remove -fsplit-paths.
+   (-O3 options): Add -fsplit-paths.
+   * gimple-ssa-split-paths.c: Include predict.h
+   (split_paths): Only split paths in a loop that should be
+   optimized for speed.
+   * opts.c (default_options_table): Move -fsplit-paths from -O2 to
+   -O3.
+
 2015-12-17  Nathan Sidwell  
 
* ipa-icf.c (sem_item_optimizer::merge): Don't pick 'main' as the
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index cdc5d2c..60530c0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7836,7 +7836,6 @@ also turns on the following optimization flags:
 -frerun-cse-after-loop  @gol
 -fsched-interblock  -fsched-spec @gol
 -fschedule-insns  -fschedule-insns2 @gol
--fsplit-paths @gol
 -fstrict-aliasing -fstrict-overflow @gol
 -ftree-builtin-call-dce @gol
 -ftree-switch-conversion -ftree-tail-merge @gol
@@ -7853,7 +7852,7 @@ Optimize yet more.  @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
 @option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
-@option{-ftree-loop-distribute-patterns},
+@option{-ftree-loop-distribute-patterns}, @option{-fsplit-paths},
 @option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
 @option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
 
diff --git a/gcc/gimple-ssa-split-paths.c b/gcc/gimple-ssa-split-paths.c
index 602e916..540fdf3 100644
--- a/gcc/gimple-ssa-split-paths.c
+++ b/gcc/gimple-ssa-split-paths.c
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "gimple-iterator.h"
 #include "tracer.h"
+#include "predict.h"
 
 /* Given LATCH, the latch block in a loop, see if the shape of the
path reaching LATCH is suitable for being split by duplication.
@@ -180,9 +181,14 @@ split_paths ()
 
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
 {
+  /* Only split paths if we are optimizing this loop for speed.  */
+  if (!optimize_loop_for_speed_p (loop))
+   continue;
+
   /* See if there is a block that we can duplicate to split the
 path to the loop latch.  */
-  basic_block bb = find_block_to_duplicate_for_splitting_paths (loop->latch);
+  basic_block bb
+   = find_block_to_duplicate_for_splitting_paths (loop->latch);
 
   /* BB is the merge point for an IF-THEN-ELSE we want to transform.
 
diff --git a/gcc/opts.c b/gcc/opts.c
index d46f304..7ab585f 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -523,11 +523,11 @@ static const struct default_options default_options_table[] =
 { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
-{ OPT_LEVELS_2_PLUS, OPT_fsplit_paths, NULL, 1 },
 
 /* -O3 optimizations.  */
 { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fpredictive_commoning, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_fsplit_paths, NULL, 1 },
 /* Inlining of functions reducing size is a good idea with -Os
regardless of them being declared inline.  */
 { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d5ae299..baa159d 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-12-17  Jeff Law  
+
+   * gcc.dg/tree-ssa/split-path-1.c: Explicitly ask for path
+   splitting optimizations.
+
 2015-12-17  Nathan Sidwell  
 
* gcc.dg/ipa/ipa-icf-merge-1.c: New.
diff --git a/gcc/testsuite/gcc.

Re: [PTX] simplify calling struct

2015-12-17 Thread Bernhard Reutner-Fischer
On December 16, 2015 2:53:51 PM GMT+01:00, Nathan Sidwell wrote:
>PTX's machine_function structure squirrels away the function type to
>calculate the presence of varadic args later, rather than calculate it
>immediately.  It also uses an rtx field as a boolean.  This patch
>reorganizes it with less verbose names and more apt types.


+  bool is_varadic;  /* This call is varadic  */
+  bool has_varadic;  /* Current function has a varadic call.  */

Just curious what varadic is? Is that maybe somehow related to variadic?

TIA,
>
>I also noticed that nvptx_hard_regno_mode_ok wasn't  being used, so
>that's deleted.
>
>nathan




Re: [testsuite][ARM target attributes] Fix effective_target tests

2015-12-17 Thread Christophe Lyon
Hi,

Here is an updated version of this patch.
I did test it with
-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard in
addition to my usual set of options.

Compared to the previous version:
- I added some doc in sourcebuild.texi
- I no longer modify arm_vfp_ok...
- I replaced all uses of arm_vfp with the new arm_fp because I found
that the existing tests do not actually need to pass -mfpu=vfp: this
is implicitly set as the default when using -mfloat-abi={softfp|hard}
- I chose not to remove arm_vfp_ok because we may need it in the
future, if a test really needs vfp (as opposed to neon for instance)
- in gcc.target/arm/attr-crypto.c I force the initial fpu to be vfp
via pragma instead, so that the next pragma fpu
fpu=crypto-neon-fp-armv8 is always compatible, regardless of the
command-line options/default fpu
- same for attr-neon2.c and attr-neon3.c
- I updated cmp-2.c, unsigned-float.c, vfp-1.c, vfp-ldmdbd.c,
vfp-ldmdbs.c, vfp-ldmiad.c, vfp-ldmias.c, vfp-stmdbd.c, vfp-stmdbs.c,
vfp-stmiad.c, vfp-stmias.c, vnmul-[1234].c to use the new arm_fp
effective target instead of arm_vfp. This is so that they don't need
to use -mfpu=vfp and can use the new dg-add-options arm_fp
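
For illustration, a converted test could start roughly as follows. This is a minimal sketch: the names arm_fp_ok and arm_fp are the ones listed in the ChangeLog of this mail, while the -O2 option and the function body are placeholders and do not reproduce any of the tests named above.

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_fp_ok } */
/* { dg-options "-O2" } */
/* { dg-add-options arm_fp } */

/* Placeholder body: any single-precision arithmetic would do here.  */
float
scale (float x, float y)
{
  return x * y;
}
```

The point of the sketch is that the test no longer names an FPU itself; whatever flags are needed are left to dg-add-options.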

The validation results show (in addition to what I originally reported):
- attr-crypto.c and attr-neon3.c now ICE in some cases. This is PR68895.
- depending on the GCC configuration (e.g. --with-fpu=neon)
attr-neon3.c may fail. This is PR68896.

OK?

Christophe

2015-12-17  Christophe Lyon  

* doc/sourcebuild.texi (arm_fp_ok): Document new entry.
(arm_fp): Likewise.
* lib/target-supports.exp
(check_effective_target_arm_fp_ok_nocache): New.
(check_effective_target_arm_fp_ok): New.
(add_options_for_arm_fp): New.
(check_effective_target_arm_crypto_ok_nocache): Require
target_arm_v8_neon_ok instead of arm32.
(check_effective_target_arm_crypto_pragma_ok_nocache): New.
(check_effective_target_arm_crypto_pragma_ok): New.
(add_options_for_arm_vfp): New.
* gcc.target/arm/attr-crypto.c: Use arm_crypto_pragma_ok effective
target. Do not force -mfloat-abi=softfp, use arm_fp_ok effective
target instead. Force initial fpu to vfp.
* gcc.target/arm/attr-neon-builtin-fail.c: Do not force
-mfloat-abi=softfp, use arm_fp_ok effective target instead.
* gcc.target/arm/attr-neon-fp16.c: Likewise. Remove arm_neon_ok
dependency.
* gcc.target/arm/attr-neon2.c: Do not force -mfloat-abi=softfp,
use arm_vfp effective target instead. Force initial fpu to vfp.
* gcc.target/arm/attr-neon3.c: Likewise.
* gcc.target/arm/cmp-2.c: Use arm_fp_ok effective target instead of
arm_vfp_ok.
* gcc.target/arm/unsigned-float.c: Likewise.
* gcc.target/arm/vfp-1.c: Likewise.
* gcc.target/arm/vfp-ldmdbd.c: Likewise.
* gcc.target/arm/vfp-ldmdbs.c: Likewise.
* gcc.target/arm/vfp-ldmiad.c: Likewise.
* gcc.target/arm/vfp-ldmias.c: Likewise.
* gcc.target/arm/vfp-stmdbd.c: Likewise.
* gcc.target/arm/vfp-stmdbs.c: Likewise.
* gcc.target/arm/vfp-stmiad.c: Likewise.
* gcc.target/arm/vfp-stmias.c: Likewise.
* gcc.target/arm/vnmul-1.c: Likewise.
* gcc.target/arm/vnmul-2.c: Likewise.
* gcc.target/arm/vnmul-3.c: Likewise.
* gcc.target/arm/vnmul-4.c: Likewise.



On 10 December 2015 at 20:52, Christophe Lyon
 wrote:
> On 10 December 2015 at 14:14, Kyrill Tkachov  wrote:
>>
>> On 10/12/15 13:04, Christophe Lyon wrote:
>>>
>>> On 10 December 2015 at 13:30, Kyrill Tkachov 
>>> wrote:

>>>> Hi Christophe,
>>>>
>>>>
>>>> On 08/12/15 11:18, Christophe Lyon wrote:
>>>>>
>>>>> On 8 December 2015 at 11:50, Kyrill Tkachov
>>>>> wrote:
>>>>>>
>>>>>> Hi Christophe,
>>>>>>
>>>>>>
>>>>>> On 27/11/15 13:00, Christophe Lyon wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> After the recent commits from Christian adding target attributes
>>>>>>> support for ARM FPU settings,  I've noticed that some of the tests
>>>>>>> were failing because of incorrect assumptions wrt the default
>>>>>>> cpu/fpu/float-abi of the compiler.
>>>>>>>
>>>>>>> This patch fixes the problems I've noticed in the following way:
>>>>>>> - do not force -mfloat-abi=softfp in dg-options, to avoid conflicts
>>>>>>> when gcc is configured --with-float=hard
>>>>>>>
>>>>>>> - change arm_vfp_ok such that it tries several -mfpu/-mfloat-abi
>>>>>>> flags, checks that __ARM_FP is defined and __ARM_NEON_FP is not
>>>>>>> defined
>>>>>>>
>>>>>>> - introduce arm_fp_ok, which is similar but does not enforce fpu
>>>>>>> setting
>>>>>>>
>>>>>>> - add a new effective_target: arm_crypto_pragma_ok to check that
>>>>>>> setting this fpu via a pragma is actually supported by the current
>>>>>>> "multilib". This is different from checking the command-line option
>>>>>>> because the pragma might conflict with the command-line options in
>>>>>>> use.
>>>>>>>
>>>>>>> The updates in the testcases are as follows:
>>>>>>> - attr-crypto.c, we have to make sure that the default fpu does not
>>>>>>> conflict with t

[Patch, fortran] pr68196 [4.9/5 Regression] ICE on function result with procedure pointer component

2015-12-17 Thread Paul Richard Thomas
Dear All,

Some problems have come up that are not dissimilar to the original
bug, involving infinite recursion with procedure components, with the
same type as the containing type. The fix is verging on the trivial.
However, given that I found two further bugs in fixing the one
reported, I worry that there are more lurking nearby.
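
The kind of recursion involved can be modelled outside gfortran. The following is only a sketch: struct dtype, struct component and has_default_initializer are hypothetical, much-simplified stand-ins for gfc_symbol, gfc_component and gfc_has_default_initializer, and the control flow is condensed compared with the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-ins for gfortran's derived type
   and component records; not the real gfc_symbol / gfc_component.  */
struct dtype;

struct component
{
  struct dtype *derived;      /* Type of the component, if derived.  */
  bool pointer;               /* Data pointer component.  */
  bool proc_pointer;          /* Procedure pointer component.  */
  bool has_initializer;       /* Component carries its own initializer.  */
  struct component *next;
};

struct dtype
{
  struct component *components;
};

/* Return true if DER or any nested non-pointer component has a default
   initializer.  Pointer and procedure pointer components are not
   followed, so a type that refers to itself through one of them does
   not recurse forever.  */
bool
has_default_initializer (const struct dtype *der)
{
  assert (der != NULL);
  for (const struct component *c = der->components; c; c = c->next)
    {
      if (c->has_initializer)
        return true;
      if (c->derived && !c->pointer && !c->proc_pointer
          && has_default_initializer (c->derived))
        return true;
    }
  return false;
}
```

Without the proc_pointer test, a type whose procedure pointer component refers back to the containing type would make this function call itself without end.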

Bootstraps and regtests on x86_64 - OK for trunk and, in a couple of
weeks 5 and 4.9 branches?

Cheers

Paul

2015-12-17  Paul Thomas  

PR fortran/68196
*expr.c (gfc_has_default_initializer): Prevent infinite recursion
through this function for procedure pointer components.
* trans-array.c (structure_alloc_comps): Ditto times two.


2015-12-17  Paul Thomas  

PR fortran/68196
* gfortran.dg/proc_ptr_48.f90: New test.
Index: gcc/fortran/expr.c
===
*** gcc/fortran/expr.c  (revision 231253)
--- gcc/fortran/expr.c  (working copy)
*** gfc_has_default_initializer (gfc_symbol
*** 3930,3936 
for (c = der->components; c; c = c->next)
  if (c->ts.type == BT_DERIVED)
{
! if (!c->attr.pointer
 && gfc_has_default_initializer (c->ts.u.derived))
  return true;
if (c->attr.pointer && c->initializer)
--- 3930,3936 
for (c = der->components; c; c = c->next)
  if (c->ts.type == BT_DERIVED)
{
! if (!c->attr.pointer && !c->attr.proc_pointer
 && gfc_has_default_initializer (c->ts.u.derived))
  return true;
if (c->attr.pointer && c->initializer)
Index: gcc/fortran/trans-array.c
===
*** gcc/fortran/trans-array.c   (revision 231253)
--- gcc/fortran/trans-array.c   (working copy)
*** structure_alloc_comps (gfc_symbol * der_
*** 8074,8080 
}

  if (cmp_has_alloc_comps
!   && !c->attr.pointer
&& !called_dealloc_with_status)
{
  /* Do not deallocate the components of ultimate pointer
--- 8075,8081 
}

  if (cmp_has_alloc_comps
!   && !c->attr.pointer && !c->attr.proc_pointer
&& !called_dealloc_with_status)
{
  /* Do not deallocate the components of ultimate pointer
*** structure_alloc_comps (gfc_symbol * der_
*** 8264,8270 
 components that are really allocated, the deep copy code has to
 be generated first and then added to the if-block in
 gfc_duplicate_allocatable ().  */
! if (cmp_has_alloc_comps)
{
  rank = c->as ? c->as->rank : 0;
  tmp = fold_convert (TREE_TYPE (dcmp), comp);
--- 8265,8272 
 components that are really allocated, the deep copy code has to
 be generated first and then added to the if-block in
 gfc_duplicate_allocatable ().  */
! if (cmp_has_alloc_comps
! && !c->attr.proc_pointer)
{
  rank = c->as ? c->as->rank : 0;
  tmp = fold_convert (TREE_TYPE (dcmp), comp);


Re: [PATCH 1/5] Fix asymmetric comparison functions

2015-12-17 Thread Jason Merrill
The C++ changes are also for handling comparing an element to itself, 
which shouldn't happen; I'd prefer a gcc_checking_assert that it doesn't.


Jason


Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

2015-12-17 Thread H.J. Lu
On Thu, Dec 17, 2015 at 1:21 PM, Uros Bizjak  wrote:
> On Thu, Dec 17, 2015 at 7:09 PM, H.J. Lu  wrote:
>> On Thu, Dec 17, 2015 at 8:11 AM, H.J. Lu  wrote:
>>> On Thu, Dec 17, 2015 at 7:50 AM, H.J. Lu  wrote:
>>>> On Thu, Dec 17, 2015 at 5:42 AM, Uros Bizjak  wrote:
>>>>> On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu  wrote:
>>>>>> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
>>>>>>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>>>>>>>> Since sibcall never returns, we can only use call-clobbered register
>>>>>>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>>>>>>> be properly restored.
>>>>>>>>
>>>>>>>> Tested on x86-64 with -m32.  OK for trunk?
>>>>>>>
>>>>>>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>>>>>>> class, and register_no_elim_operand predicate should be used with "U"
>>>>>>> constraint. Also, please introduce new predicate, similar to how
>>>>>>> GOT_memory_operand is defined and handled.
>>>>>>>
>>>>>>
>>>>>> Here is the updated patch.  There is a predicate already,
>>>>>> sibcall_memory_operand.  It allows any registers to
>>>>>> be as GOT base, which is the root of our problem.
>>>>>> This patch removes GOT slot from it and handles
>>>>>> sibcall over GOT slot with *sibcall_GOT_32 and
>>>>>> *sibcall_value_GOT_32 patterns.  Since I need to
>>>>>> expose constraints on GOT base register to RA,
>>>>>> I have to use 2 operands, GOT base and function
>>>>>> symbol, to describe sibcall over 32-bit GOT slot.
>>>>>
>>>>> Please use
>>>>>
>>>>>(mem:SI (plus:SI
>>>>>  (match_operand:SI 0 "register_no_elim_operand" "U")
>>>>>  (match_operand:SI 1 "GOT32_symbol_operand")))
>>>>> ...
>>>>>
>>>>> to avoid manual rebuild of the operand.
>>>>>
>>>>
>>>> Is this OK?

>>>
>>> An updated patch to allow sibcall_memory_operand for RTL
>>> expansion.  OK for trunk if there is no regression?
>>>
>>
>> There is no regressions on x86-64 with -m32.  OK for trunk?
>
> OK for mainline, with a following change:
>
> @@ -597,11 +597,17 @@
>  (match_operand 0 "memory_operand"
>
>  ;; Return true if OP is a memory operands that can be used in sibcalls.
> +;; Since sibcall never returns, we can only use call-clobbered register
> +;; as GOT base.  Allow GOT slot here only with pseudo register as GOT
> +;; base.  Properly handle sibcall over GOT slot with *sibcall_GOT_32
> +;; and *sibcall_value_GOT_32 patterns.
>  (define_predicate "sibcall_memory_operand"
>(and (match_operand 0 "memory_operand")
> (match_test "CONSTANT_P (XEXP (op, 0))
>  || (GET_CODE (XEXP (op, 0)) == PLUS
>  && REG_P (XEXP (XEXP (op, 0), 0))
> +&& (REGNO (XEXP (XEXP (op, 0), 0))
> +>= FIRST_PSEUDO_REGISTER)
>  && GET_CODE (XEXP (XEXP (op, 0), 1)) == CONST
>  && GET_CODE (XEXP (XEXP (XEXP (op, 0), 1), 0)) == UNSPEC
>  && XINT (XEXP (XEXP (XEXP (op, 0), 1), 0), 1) == UNSPEC_GOT)")))
>
> You can use (!HARD_REGISTER_NUM_P (...) || call_used_regs[...]) here.
> Call-used hard regs are still allowed here.
>
> Can you please also rewrite this horrible match_test as a block of C
> code using GOT32_symbol_operand predicate?
>

I am retesting the patch with

;; Return true if OP is a memory operands that can be used in sibcalls.
;; Since sibcall never returns, we can only use call-clobbered register
;; as GOT base.  Allow GOT slot here only with pseudo register as GOT
;; base.  Properly handle sibcall over GOT slot with *sibcall_GOT_32
;; and *sibcall_value_GOT_32 patterns.
(define_predicate "sibcall_memory_operand"
  (match_operand 0 "memory_operand")
{
  op = XEXP (op, 0);
  if (CONSTANT_P (op))
return true;
  if (GET_CODE (op) == PLUS && REG_P (XEXP (op, 0)))
{
  int regno = REGNO (XEXP (op, 0));
  if (!HARD_REGISTER_NUM_P (regno) || call_used_regs[regno])
{
  op = XEXP (op, 1);
  if (GOT32_symbol_operand (op, VOIDmode))
return true;
}
}
  return false;
})


I will check it in if there is no regression.
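
The register test in the predicate above can be modelled in plain C. This is only a sketch under stated assumptions: FIRST_PSEUDO, call_used and got_base_ok_for_sibcall are hypothetical names, and the register numbering is invented for illustration rather than taken from the i386 port.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register file for illustration only: registers 0..7 are
   hard registers, everything from 8 up is a pseudo.  The real values
   come from the target headers.  */
enum { FIRST_PSEUDO = 8 };

/* Toy call_used table: hard registers 0..2 are call-clobbered,
   3..7 are callee-saved.  */
static const bool call_used[FIRST_PSEUDO] =
  { true, true, true, false, false, false, false, false };

/* Return true if REGNO may serve as the GOT base of a sibcall address:
   either a pseudo (its hard register is chosen later) or a hard
   register that the callee may clobber anyway.  */
bool
got_base_ok_for_sibcall (int regno)
{
  assert (regno >= 0);
  if (regno >= FIRST_PSEUDO)
    return true;
  return call_used[regno];
}
```

A callee-saved hard register is rejected because the sibcall never returns to restore it.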

Thanks.


-- 
H.J.


Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

2015-12-17 Thread Uros Bizjak
On Thu, Dec 17, 2015 at 7:09 PM, H.J. Lu  wrote:
> On Thu, Dec 17, 2015 at 8:11 AM, H.J. Lu  wrote:
>> On Thu, Dec 17, 2015 at 7:50 AM, H.J. Lu  wrote:
>>> On Thu, Dec 17, 2015 at 5:42 AM, Uros Bizjak  wrote:
>>>> On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu  wrote:
>>>>> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
>>>>>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>>>>>>> Since sibcall never returns, we can only use call-clobbered register
>>>>>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>>>>>> be properly restored.
>>>>>>>
>>>>>>> Tested on x86-64 with -m32.  OK for trunk?
>>>>>>
>>>>>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>>>>>> class, and register_no_elim_operand predicate should be used with "U"
>>>>>> constraint. Also, please introduce new predicate, similar to how
>>>>>> GOT_memory_operand is defined and handled.
>>>>>>
>>>>>
>>>>> Here is the updated patch.  There is a predicate already,
>>>>> sibcall_memory_operand.  It allows any registers to
>>>>> be as GOT base, which is the root of our problem.
>>>>> This patch removes GOT slot from it and handles
>>>>> sibcall over GOT slot with *sibcall_GOT_32 and
>>>>> *sibcall_value_GOT_32 patterns.  Since I need to
>>>>> expose constraints on GOT base register to RA,
>>>>> I have to use 2 operands, GOT base and function
>>>>> symbol, to describe sibcall over 32-bit GOT slot.
>>>>
>>>> Please use
>>>>
>>>>(mem:SI (plus:SI
>>>>  (match_operand:SI 0 "register_no_elim_operand" "U")
>>>>  (match_operand:SI 1 "GOT32_symbol_operand")))
>>>> ...
>>>>
>>>> to avoid manual rebuild of the operand.

>>>
>>> Is this OK?
>>>
>>
>> An updated patch to allow sibcall_memory_operand for RTL
>> expansion.  OK for trunk if there is no regression?
>>
>
> There is no regressions on x86-64 with -m32.  OK for trunk?

OK for mainline, with a following change:

@@ -597,11 +597,17 @@
 (match_operand 0 "memory_operand"

 ;; Return true if OP is a memory operands that can be used in sibcalls.
+;; Since sibcall never returns, we can only use call-clobbered register
+;; as GOT base.  Allow GOT slot here only with pseudo register as GOT
+;; base.  Properly handle sibcall over GOT slot with *sibcall_GOT_32
+;; and *sibcall_value_GOT_32 patterns.
 (define_predicate "sibcall_memory_operand"
   (and (match_operand 0 "memory_operand")
(match_test "CONSTANT_P (XEXP (op, 0))
 || (GET_CODE (XEXP (op, 0)) == PLUS
 && REG_P (XEXP (XEXP (op, 0), 0))
+&& (REGNO (XEXP (XEXP (op, 0), 0))
+>= FIRST_PSEUDO_REGISTER)
 && GET_CODE (XEXP (XEXP (op, 0), 1)) == CONST
 && GET_CODE (XEXP (XEXP (XEXP (op, 0), 1), 0)) == UNSPEC
 && XINT (XEXP (XEXP (XEXP (op, 0), 1), 0), 1) == UNSPEC_GOT)")))

You can use (!HARD_REGISTER_NUM_P (...) || call_used_regs[...]) here.
Call-used hard regs are still allowed here.

Can you please also rewrite this horrible match_test as a block of C
code using GOT32_symbol_operand predicate?

Thanks,
Uros.


Fix alias.c wrt aliases and anchors

2015-12-17 Thread Jan Hubicka
Hi,
the alias-2.c testcase fails on targets with anchors.  The reason is that
the variable itself is anchored while the alias is not, and they point to the
same location.  I followed the docs of SYMBOL_REF claiming that
SYMBOL_REF_DECL is NULL if the symbol is a label, and thought it was safe to
disambiguate them.

This patch moves the common logic into compare_base_symbol_refs, which acts
like compare_base_decls but on RTL SYMBOL_REFs and can handle
queries where one parameter is a DECL while the other is an anchor.

I am not fully sure about the case where we know that the variable is
in a block and we know its offset.  With the current implementation I do not think
it is safe to use the offset oracle, because one offset is an offset inside a variable
while the other is an offset inside the block.  This can be compensated for.
Also it seems that the alias code should not ignore the base information it has
in memory attributes, as it can be more precise.

This patch fixes the alias-2.c testcase and was bootstrapped/regtested on arm.
OK?

I will look into anchor code - it seems it should create anchor for the alias,
too because it is non-interposable, but for some reason it does not.
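
The block-offset part of the comparison can be sketched in isolation. This is a model only: struct sym and compare_block_syms are hypothetical stand-ins for a SYMBOL_REF with block info and for the tail of compare_base_symbol_refs in the patch of this mail. The return codes follow the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a SYMBOL_REF that carries block info.  */
struct sym
{
  int block;        /* Identifier of the object block.  */
  long offset;      /* Offset of the symbol inside that block.  */
  bool is_anchor;   /* True for a section anchor.  */
};

/* Compare two block symbols.  Returns 1 if they are known to be the
   same location, -1 if they coincide but the alias may be interposed
   (so they may or may not alias), 0 if they are known to differ, and
   -2 if the offset-based oracle cannot be used because one symbol is
   an anchor and the other is not.  */
int
compare_block_syms (const struct sym *x, const struct sym *y, bool binds_def)
{
  assert (x && y);
  if (x->block != y->block)
    return 0;
  if (x->offset == y->offset)
    return binds_def ? 1 : -1;
  if (x->is_anchor != y->is_anchor)
    return -2;
  return 0;
}
```

The -2 case is the one the mail is unsure about: the two offsets are measured from different starting points, so comparing them directly is not safe.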

Honza

PR middle-end/68832
* alias.c (compare_base_symbol_refs): New function.
(rtx_equal_for_memref_p, base_alias_check, memrefs_conflict_p): Use it.

Index: alias.c
===
--- alias.c (revision 231722)
+++ alias.c (working copy)
@@ -158,6 +158,7 @@
 static int write_dependence_p (const_rtx,
   const_rtx, machine_mode, rtx,
   bool, bool, bool);
+static int compare_base_symbol_refs (const_rtx, const_rtx);
 
 static void memory_modified_1 (rtx, const_rtx, void *);
 
@@ -1756,16 +1757,8 @@
   return LABEL_REF_LABEL (x) == LABEL_REF_LABEL (y);
 
 case SYMBOL_REF:
-  {
-   tree x_decl = SYMBOL_REF_DECL (x);
-   tree y_decl = SYMBOL_REF_DECL (y);
+  return compare_base_symbol_refs (x, y) == 1;
 
-   if (!x_decl || !y_decl)
- return XSTR (x, 0) == XSTR (y, 0);
-   else
- return compare_base_decls (x_decl, y_decl) == 1;
-  }
-
 case ENTRY_VALUE:
   /* This is magic, don't go through canonicalization et al.  */
   return rtx_equal_p (ENTRY_VALUE_EXP (x), ENTRY_VALUE_EXP (y));
@@ -2052,6 +2045,65 @@
   return ret;
 }
 
+/* Same as compare_base_decls but for SYMBOL_REF.
+   Return -2 if the offset based oracle can not be used (i.e.
+   we have a symbol and section anchor which is located insite
+   the same block.  */
+
+static int
+compare_base_symbol_refs (const_rtx x_base, const_rtx y_base)
+{
+  tree x_decl = SYMBOL_REF_DECL (x_base);
+  tree y_decl = SYMBOL_REF_DECL (y_base);
+  bool binds_def = true;
+  if (x_decl && y_decl)
+return compare_base_decls (x_decl, y_decl);
+  if (x_decl || y_decl)
+{
+  if (!x_decl)
+   {
+ std::swap (x_decl, y_decl);
+ std::swap (x_base, y_base);
+   }
+  /* Variable and anchor representing the variable alias.  If x_base
+is not a static variable or y_base is not an anchor (it is a label)
+we are safe.  */
+  if (!SYMBOL_REF_HAS_BLOCK_INFO_P (y_base)
+ || (TREE_CODE (x_decl) != VAR_DECL
+ || (!TREE_STATIC (x_decl) && !TREE_PUBLIC (x_decl
+   return 0;
+  symtab_node *x_node = symtab_node::get_create (x_decl)
+   ->ultimate_alias_target ();
+  /* External variable can not be in section anchor.  */
+  if (!x_node->definition)
+   return 0;
+  x_base = XEXP (DECL_RTL (x_node->decl), 0);
+  /* If not in anchor, we can disambiguate.  */
+  if (!SYMBOL_REF_HAS_BLOCK_INFO_P (x_base))
+   return 0;
+
+  /* We have an alias of anchored variable.  If it can be interposed;
+we must assume it may or may not alias its anchor.  */
+  binds_def = decl_binds_to_current_def_p (x_decl);
+}
+  /* If we have variable in section anchor, we can compare by offset.  */
+  if (SYMBOL_REF_HAS_BLOCK_INFO_P (x_base)
+  && SYMBOL_REF_HAS_BLOCK_INFO_P (y_base))
+{
+  if (SYMBOL_REF_BLOCK (x_base) != SYMBOL_REF_BLOCK (y_base))
+   return 0;
+  if (SYMBOL_REF_BLOCK_OFFSET (x_base) == SYMBOL_REF_BLOCK_OFFSET (y_base))
+   return binds_def ? 1 : -1;
+  /* Mixing anchors and non-anchors may result to false negative.
+We probably never do that.  */
+  if (SYMBOL_REF_ANCHOR_P (x_base) != SYMBOL_REF_ANCHOR_P (y_base))
+   return -2;
+  return 0;
+}
+  /* Label and label or label and section anchor. Copare symbol name.  */
+  return XSTR (x_base, 0) == XSTR (y_base, 0);
+}
+
 /* Return 0 if the addresses X and Y are known to point to different
objects, 1 if they might be pointers to the same object.  */
 
@@ -2090,16 +2142,8 @@
 return 1;
 
   if (GET_CODE (x_base) == SYMBOL_REF && GET_CODE (y_base) == SYMBOL_REF)
-{
-  

Re: ipa-cp heuristics fixes

2015-12-17 Thread Jan Hubicka
> Jakub,
> thanks a lot for looking into this! I am now bit on tight schedule moving back
> to Prague and I knew little about the implementation of debug info for
> optimized out arguments.
Hi,
here is a better testcase that also triggers splitting:
struct a {int a;int b;};

inline
static int reta (struct a a, int unused, int c)
{
  if (__builtin_expect (c,1) != 0)
   {
 return c;
   }
  test();
  test();
  test();
  test();
  test();
  test();
  test();
  test();
  test();
  return a.a;
}
main()
{
  struct a a={1,1};
  int v = reta(a,1,1);
  struct a a2={1,1};
  v += reta(a2,2,1);
  return v;
}

Compile with -fno-early-inlining

Honza


[PATCH] shrink-wrap: Once more PRs 67778, 68634, and now 68909

2015-12-17 Thread Segher Boessenkool
It turns out v4 wasn't quite complete anyway; so here "v5".

If a candidate PRE cannot get the prologue because a block BB is
reachable from it, but PRE does not dominate BB, we try again with the
dominators of PRE.  That "try again" needs to again consider BB though,
we aren't done with it.

This fixes this problem.  Tested on the 68909 testcase, and bootstrapped
and regression checked on powerpc64-linux.  Is this okay for trunk?
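
The retry logic can be illustrated on a toy control flow graph. What follows is a sketch, not the code in shrink-wrap.c: the arrays succs and idom and the functions dominates and find_prologue_block are hypothetical, and checks such as whether a block can get the prologue at all are left out.

```c
#include <assert.h>
#include <stdbool.h>

enum { NBLOCKS = 5, MAXSUCC = 2 };

/* Hypothetical toy CFG: succs[b] lists the successors of block b
   (-1 terminated), idom[b] is the immediate dominator of b.  Block 0
   is the entry.  */
static const int succs[NBLOCKS][MAXSUCC + 1] = {
  { 1, 2, -1 }, { 3, -1, -1 }, { 3, -1, -1 }, { 4, -1, -1 }, { -1, -1, -1 }
};
static const int idom[NBLOCKS] = { 0, 0, 0, 0, 3 };

/* Return true if block A dominates block B.  */
static bool
dominates (int a, int b)
{
  while (b != a && b != 0)
    b = idom[b];
  return b == a;
}

/* Starting from candidate PRE, walk up the dominator tree until PRE
   dominates every block reachable from it.  SEEN marks the blocks that
   will run with the prologue.  Returns the final PRE.  */
int
find_prologue_block (int pre, bool seen[NBLOCKS])
{
  int stack[NBLOCKS], top = 0;
  assert (pre >= 0 && pre < NBLOCKS);

  for (;;)
    {
      if (!seen[pre])
        {
          seen[pre] = true;
          stack[top++] = pre;
        }

      bool ok = true;
      while (top > 0)
        {
          int bb = stack[--top];
          if (!dominates (pre, bb))
            {
              ok = false;
              stack[top++] = bb;   /* Not done with BB: retry it later.  */
              break;
            }
          for (int i = 0; succs[bb][i] >= 0; i++)
            if (!seen[succs[bb][i]])
              {
                seen[succs[bb][i]] = true;
                stack[top++] = succs[bb][i];
              }
        }

      if (ok || pre == 0)
        return pre;
      pre = idom[pre];
    }
}
```

Starting from block 1 in this graph, block 3 is reached but not dominated, so the search restarts from block 0. If block 3 were not pushed back, its successor 4 would never be visited.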


Segher


2015-12-17  Segher Boessenkool  

PR rtl-optimization/67778
PR rtl-optimization/68634
PR rtl-optimization/68909
* shrink-wrap.c (try_shrink_wrapping): If BB isn't dominated by PRE,
push it back on VEC.

---
 gcc/shrink-wrap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index f65b0c3..85e5a8b 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -781,6 +781,7 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
  if (!dominated_by_p (CDI_DOMINATORS, bb, pre))
{
  ok = false;
+ vec.quick_push (bb);
  break;
}
 
-- 
1.9.3



Re: config-list.mk and obsoleted configurations

2015-12-17 Thread Jeff Law




Well, it might be the only target that has warnings because of that,
but from a quick look it seems like any target that uses avr-stdint.h or
newlib-stdint.h could theoretically have null values for those macros.
Without a bit of digging I'm not sure how much of that is real and how
much is completely theoretical archs that would have any number of other
problems.
The other targets don't trip over it for various reasons.  You have to 
dig into how the target stuff is setup in the avr port.  I outlined it a 
while back then went and had a beer to erase the memory of how that 
stuff got expanded.


jeff


Re: stop IPA wrapping 'main'

2015-12-17 Thread Jan Hubicka
> gcc.dg/20031102-1.c now causes some 'surprising' optimization
> behaviour.  It is essentially
> 
> int FooBar(void)
> {
>  ... stuff
>   return 0;
> }
> 
> int main(void)
> {
>   return FooBar();
> }
> 
> 
> What happens is  that FooBar gets inlined into main, and then
> ipa-icf notices FooBar and main have identical bodies.  It chooses
> to have FooBar tail call main, which results in a surprising  call
> of 'main'.   On PTX this is particularly unfortunate because we have
> to emit a single prototype for main with the regular argc and argv
> arguments (the backend gets around 'int main (void)' by faking the
> additional 2 args).  But that fails here because the tail call
> doesn't match the prototype.
> 
> Anyway, picking 'main' as the source function struck me as a poor
> choice, hence the attached patch.  It picks the second function of a
> congruent set, if the first is 'main'.  Note that even on, say
> x86-linux, we emit a tail call rather than an alias for the included
> testcase.
> 
> I removed the gcc_assert, as the vector indexing operator already
> checks the subscript is within range.
> 
> Alternatively I could probably just fixup the testcase to make
> FooBar uninlinable, as I suspect that might have been the original
> intent.
> 
> tested on x86_64-linux and ptx-none.
> 
> nathan

> 2015-12-17  Nathan Sidwell  
> 
>   gcc/
>   * ipa-icf.c (sem_item_optimizer::merge): Don't pick 'main' as the
>   source function.
> 
>   gcc/testsuite/
>   * gcc.dg/ipa/ipa-icf-merge-1.c: New.

OK, thanks. Indeed we should not introduce new calls to main :)
It contains some magic stuff on x86 targets, too.

Honza
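
The selection rule described in the quoted mail ("pick the second function of a congruent set, if the first is 'main'") can be sketched as below. The function pick_merge_source and the comparison by name are assumptions made for illustration; this is not how ipa-icf.c represents or identifies its items.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the member of a congruence class that the other
   members should be merged into.  NAMES holds the N function names of
   the class.  The first member is the default choice, but if it is
   "main" and there is another member, the second one is used.  */
size_t
pick_merge_source (const char *const *names, size_t n)
{
  assert (n > 0);
  if (n > 1 && strcmp (names[0], "main") == 0)
    return 1;
  return 0;
}
```

With the class { main, FooBar } this picks FooBar, so main ends up calling FooBar rather than the other way round.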


Re: ipa-cp heuristics fixes

2015-12-17 Thread Jan Hubicka
Jakub,
thanks a lot for looking into this! I am now bit on tight schedule moving back
to Prague and I knew little about the implementation of debug info for
optimized out arguments.
> 
> Ok, so here is a WIP patch changing the functions you wanted, untested so
> far.
> 
> I've been looking at 3 testcases (attached), -1.c and -3.c with -g -O2,
> and -2.c with -g -O3.
> The -3.c one is a copy of the test we have for the ipa-split debug info
> stuff, before/after the patch we generate the same stuff.
> -2.c testcase is for the (new or now much more often taken patch) of

The path is not really new or much more taken :) We used to do 50k clones on
Firefox, now we do 11k clones.  It is just taken on different testcases
than before... So the good news is that solving this should improve debug info
in quite a few cases.

> ipa-cp, the patch arranges for proper debug info in that case
> (but, I'm really surprised why when the function is already cloned, nothing
> figures out that the clone is always called with the same constant
> passed to the arg8 and the argument isn't removed and replaced by constant.

I will take a look. Perhaps Martin will know.

> -1.c is a testcase for the IPA-SRA path, where we unfortunately end up with
> -a slight regression (on the IL size, in the end we generate the same
> assembly):
> +  # DEBUG D#8 s=> arg8
> +  # DEBUG arg8 => D#8
># DEBUG arg8 => 7
> with the patch.  On that testcase, arg8 is used, but it is always passed
> value 7 (similarly to -2.c testcase) and in that case we really don't
> need/want the decl_debug_args stuff, it is unnecessary, it is enough to say
> in the callee that arg8 is 7.  Nothing on the caller side sets the magic
> corresponding D# debug expr decl anyway.
> Either tree_versioning is too low-level for the debug info addition, or
> we need to figure out how to tell it if a constant will be always passed
> to some argument and what that constant will be, so that we'd emit

tree_versioning has the info for that.  It obtains args_to_skip for arguments
that should be skipped and also tree_map, which tells, for those arguments
that are skipped (removed), what they should be replaced with.  For those we
substituted by a constant, we get the constant there.
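
As a rough model of that lookup (a sketch only: struct parm_replacement and constant_for_skipped_parm are hypothetical, a plain bit mask stands in for the args_to_skip bitmap, and an array stands in for tree_map):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one tree_map entry: parameter PARM is
   replaced by the constant VALUE in the clone.  */
struct parm_replacement
{
  unsigned parm;
  long value;
};

/* If parameter PARM is dropped from the clone (its bit is set in
   SKIP_MASK) and MAP records a constant for it, store that constant in
   *VALUE and return true.  Otherwise return false.  */
bool
constant_for_skipped_parm (unsigned skip_mask,
                           const struct parm_replacement *map, size_t n,
                           unsigned parm, long *value)
{
  assert (value != NULL);
  assert (parm < 32);
  if (!(skip_mask & (1u << parm)))
    return false;
  for (size_t i = 0; i < n; i++)
    if (map[i].parm == parm)
      {
        *value = map[i].value;
        return true;
      }
  return false;
}
```

For the arg8 case discussed above, a hit in such a lookup would be enough to say in the callee that the parameter is 7, without any caller-side debug bind.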

Note that there is also code to replace aggregates by constants that bypasses
tree_versioning, but I do not think we remove the aggregate arguments after
propagating the constant in (we should).

I wonder if you tried to trigger a cascaded cloning.  For example:
struct a {int a;int b;};

static int reta (struct a a, int unused)
{
  return a.a;
}
main()
{
  struct a a={1,1};
  int v = reta(a,1);
  struct a a2={1,1};
  v += reta(a2,2);
  return v;
}

will first produce isra clone and then constprop clone and finally turn it to 
inline clone.

The changes to cgraph and tree-inline make sense to me.

Honza
> +  /* For optimized away parameters, add on the caller side
> +  before the call
> +  DEBUG D#X => parm_Y(D)
> +  stmts and associate D#X with parm in decl_debug_args_lookup
> +  vector to say for debug info that if parameter parm had been passed,
> +  it would have value parm_Y(D).  */
> +  if (e->callee->clone.combined_args_to_skip && MAY_HAVE_DEBUG_STMTS)
> + {
> +   vec **debug_args
> + = decl_debug_args_lookup (e->callee->decl);
> +   if (debug_args)
> + {
> +   tree parm;
> +   unsigned i = 0, num;
> +   unsigned len = vec_safe_length (*debug_args);
> +   for (parm = DECL_ARGUMENTS (decl), num = 0;
> +parm; parm = DECL_CHAIN (parm), num++)
> + if (bitmap_bit_p (e->callee->clone.combined_args_to_skip, num)
> + && is_gimple_reg (parm))
> +   {
> + gimple *def_temp;
> + unsigned last = i;
> +
> + while (i < len && (**debug_args)[i] != DECL_ORIGIN (parm))
> +   i += 2;
> + if (i >= len)
> +   {
> + i = 0;
> + while (i < last && (**debug_args)[i]
> +!= DECL_ORIGIN (parm))
> +   i += 2;
> + if (i >= last)
> +   continue;
> +   }
> + tree ddecl = (**debug_args)[i + 1];
> + tree arg = gimple_call_arg (e->call_stmt, num);
> + def_temp
> +   = gimple_build_debug_bind (ddecl, unshare_expr (arg),
> +  e->call_stmt);
> + gsi_insert_before (&gsi, def_temp, GSI_SAME_STMT);
> +   }
> + }
> + }
> +
>gsi_replace (&gsi, new_stmt, false);
>/* We need to defer cleaning EH info on the new statement to
>   fixup-cfg.  We may not have dominator information at this point
> --- gcc/tree-inline.c.jj  2015-12-10 16:56:26.0 +0100
> +++ gcc/tree-inline.c 2015-12-17 18:56:18.66716

Re: PATCH: PR target/66232: -fPIC -fno-plt -mx32 fails to generate indirect branch via GOT

2015-12-17 Thread H.J. Lu
On Thu, Dec 17, 2015 at 8:49 AM, H.J. Lu  wrote:
> Since Pmode is 64-bit with -maddress-mode=long for x32, indirect call
> via GOT slot doesn't need zero_extend.  This patch limits *call_got_x32
> and *call_value_got_x32 patterns to 32-bit Pmode, adds *call_got_x32_long
> and *call_value_got_x32_long for 64-bit Pmode.
>
> OK for trunk if there is no regression?
>
>
> H.J.
> ---
> gcc/
>
> PR target/66232
> * config/i386/i386.md (*call_got_x32): Limited to 32-bit Pmode.
> (*call_value_got_x32): Likewise.
> (*call_got_x32_long): New pattern.
> (call_value_got_x32_long): Likewise.
>

Here is a different approach without adding new patterns.
Either one works.

-- 
H.J.
From 118cf4c2c928608f7ad7edd0812d6f2f880dbd55 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 17 Dec 2015 08:42:06 -0800
Subject: [PATCH] Allow indirect call via GOT for 64-bit Pmode x32

Since Pmode is 64-bit with -maddress-mode=long for x32, indirect call
via GOT slot doesn't need zero_extend.  This patch enables indirect call
via GOT for x32 with 64-bit Pmode.

gcc/

	PR target/66232
	* config/i386/constraints.md (Bs): Allow GOT slot for x32 with
	64-bit Pmode.
	(Bw): Likewise.
	(Bz): Likewise.
	* config/i386/predicates.md (call_insn_operand): Likewise.
	(sibcall_insn_operand): Likewise.

gcc/testsuite/

	PR target/66232
	* gcc.target/i386/pr66232-10.c: New test.
	* gcc.target/i386/pr66232-11.c: Likewise.
	* gcc.target/i386/pr66232-12.c: Likewise.
	* gcc.target/i386/pr66232-13.c: Likewise.
---
 gcc/config/i386/constraints.md | 12 +++
 gcc/config/i386/predicates.md  | 32 +-
 gcc/testsuite/gcc.target/i386/pr66232-10.c | 13 
 gcc/testsuite/gcc.target/i386/pr66232-11.c | 14 +
 gcc/testsuite/gcc.target/i386/pr66232-12.c | 13 
 gcc/testsuite/gcc.target/i386/pr66232-13.c | 13 
 6 files changed, 79 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-13.c

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 2861d8d..b46d32b 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -162,13 +162,17 @@
 
 (define_constraint "Bs"
   "@internal Sibcall memory operand."
-  (and (not (match_test "TARGET_X32"))
-   (match_operand 0 "sibcall_memory_operand")))
+  (ior (and (not (match_test "TARGET_X32"))
+	(match_operand 0 "sibcall_memory_operand"))
+   (and (match_test "TARGET_X32 && Pmode == DImode")
+	(match_operand 0 "GOT_memory_operand"
 
 (define_constraint "Bw"
   "@internal Call memory operand."
-  (and (not (match_test "TARGET_X32"))
-   (match_operand 0 "memory_operand")))
+  (ior (and (not (match_test "TARGET_X32"))
+	(match_operand 0 "memory_operand"))
+   (and (match_test "TARGET_X32 && Pmode == DImode")
+	(match_operand 0 "GOT_memory_operand"
 
 (define_constraint "Bz"
   "@internal Constant call address operand."
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 8bdd5d8..ab8ef06 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -606,32 +606,36 @@
 			&& GET_CODE (XEXP (XEXP (XEXP (op, 0), 1), 0)) == UNSPEC
 			&& XINT (XEXP (XEXP (XEXP (op, 0), 1), 0), 1) == UNSPEC_GOT)")))
 
+;; Return true if OP is a GOT memory operand.
+(define_predicate "GOT_memory_operand"
+  (match_operand 0 "memory_operand")
+{
+  op = XEXP (op, 0);
+  return (GET_CODE (op) == CONST
+	  && GET_CODE (XEXP (op, 0)) == UNSPEC
+	  && XINT (XEXP (op, 0), 1) == UNSPEC_GOTPCREL);
+})
+
 ;; Test for a valid operand for a call instruction.
 ;; Allow constant call address operands in Pmode only.
 (define_special_predicate "call_insn_operand"
   (ior (match_test "constant_call_address_operand
 		 (op, mode == VOIDmode ? mode : Pmode)")
(match_operand 0 "call_register_no_elim_operand")
-   (and (not (match_test "TARGET_X32"))
-	(match_operand 0 "memory_operand"
+   (ior (and (not (match_test "TARGET_X32"))
+		 (match_operand 0 "sibcall_memory_operand"))
+	(and (match_test "TARGET_X32 && Pmode == DImode")
+		 (match_operand 0 "GOT_memory_operand")
 
 ;; Similarly, but for tail calls, in which we cannot allow memory references.
 (define_special_predicate "sibcall_insn_operand"
   (ior (match_test "constant_call_address_operand
 		 (op, mode == VOIDmode ? mode : Pmode)")
(match_operand 0 "register_no_elim_operand")
-   (and (not (match_test "TARGET_X32"))
-	(match_operand 0 "sibcall_memory_operand"
-
-;; Return true if OP is a GOT memory operand.
-(define_predicate "GOT_memory_operand"
-  (match_operand 0 "memory_operand")
-{
-  op = XEXP (op, 0);
-  return (GET_CODE (op) == CONST
-	  && GET_CODE (X

Re: config-list.mk and obsoleted configurations

2015-12-17 Thread Trevor Saunders
On Thu, Dec 17, 2015 at 12:53:20PM -0700, Jeff Law wrote:
> On 12/17/2015 11:58 AM, Jan-Benedict Glaw wrote:
> >On Thu, 2015-12-17 11:39:24 -0700, Jeff Law  wrote:
> >>On 12/17/2015 11:34 AM, Jan-Benedict Glaw wrote:
> >>>On Thu, 2015-12-17 11:05:42 -0700, Jeff Law  wrote:
> On 12/16/2015 03:46 AM, Jan-Benedict Glaw wrote:
> >Shall I bisect one of the cases anew, with the "Test value of
> >_GLIBCXX_USE_C99_WCHAR not whether it is defined" patch that
> >uncovered it, applied? Starting with some arbitrary old revision?
> Yes.  I'd really like to see config-list.mk working again.  The
> first step is always building a test the developers can easily work
> with.
> >>>
> >>>Will do. Have a good starting point?
> >>The biggest problem is the breakage around either USE_C99_WCHAR or delayed
> >>folding.  I think I counted 30+ targets that were affected.
> >
> >It's probably delayed folding; seems the USE_C99_WCHAR stuff only
> >uncovers it, doesn't it?
> >
> >>Once that's settled, I suspect anything remaining will be pretty minor.
> >>
> >>I'd disable interix completely.
> >
> >Seems to be not hard to fix. Breaks with:
> I know, but it's not worth fixing IMHO.  Interix has been a dead product for
> a long time.  We almost got rid of it several years ago, but someone
> objected and said they'd maintain it.  I asked Trevor to put it back on the
> deprecated list a little while ago.
> 
> AFAICT it hasn't been building since 2012.  I fixed some of the problems a
> few months ago, but just can't really justify anyone's time to figure out
> which way to #define this away to preserve prior behaviour and to continue
> to keep it working over time.

 and killing it will help move towards killing other things you dislike
 like sdb and dbx.

> 
> >
> >>Not sure what to do with avr-rtems at this point.
> >
> >My buildrobot just fails at the very same USE_C99_WCHAR issue right
> >now. Is there something more hidden, later on in the build?
> avr-rtems has deeper issues, which ultimately look like the same problem
> you're seeing with delayed folding, but aren't the same problem.
> 
> Essentially avr-rtems's definitions of various standard types are all
> conditional on flags with a default that is NULL.  Those are ultimately
> passed to one of the str* functions and GCC throws a warning/failure.

Well, it might be the only target that has warnings because of that,
but from a quick look it seems like any target that uses avr-stdint.h or
newlib-stdint.h could theoretically have null values for those macros.
Without a bit of digging I'm not sure how much of that is real and how
much is completely theoretical archs that would have any number of other
problems.

Trev


> 
> There's no way to fold those down to a constant, (or even to prove the NULL
> case couldn't happen IIRC).  So even once the current delayed folding issue
> gets fixed, avr-rtems will remain broken.
> 
> It's also unclear how long avr-rtems will be around.  I get the sense it's
> on its last legs -- and given we have both avr and rtems coverage via other
> targets, I don't think building avr-rtems is really all that helpful.
> 
> Jeff


Re: [PATCH][WIP] libstdc++: Make certain exceptions transaction_safe.

2015-12-17 Thread Jonathan Wakely

On 14/11/15 20:45 +0100, Torvald Riegel wrote:

+void
+_txnal_cow_string_D1(void *that)
+{
+  typedef std::basic_string<char> bs_type;
+  bs_type::_Rep *rep = reinterpret_cast<bs_type::_Rep*>(
+  const_cast<char*>(_txnal_cow_string_c_str(that))) - 1;
+
+  // The string can be shared, in which case we would need to decrement the
+  // reference count.  We cannot undo that because we might loose the string
+  // otherwise.  Therefore, we register a commit action that will dispose of
+  // the string's _Rep.
+  enum {_ITM_noTransactionId  = 1};
+  _ITM_addUserCommitAction(_txnal_cow_string_D1_commit, _ITM_noTransactionId,
+  rep);
+}


s/loose/lose/


Re: config-list.mk and obsoleted configurations

2015-12-17 Thread Jeff Law

On 12/17/2015 11:58 AM, Jan-Benedict Glaw wrote:

On Thu, 2015-12-17 11:39:24 -0700, Jeff Law  wrote:

On 12/17/2015 11:34 AM, Jan-Benedict Glaw wrote:

On Thu, 2015-12-17 11:05:42 -0700, Jeff Law  wrote:

On 12/16/2015 03:46 AM, Jan-Benedict Glaw wrote:

Shall I bisect one of the cases anew, with the "Test value of
_GLIBCXX_USE_C99_WCHAR not whether it is defined" patch that
uncovered it, applied? Starting with some arbitrary old revision?

Yes.  I'd really like to see config-list.mk working again.  The
first step is always building a test the developers can easily work
with.


Will do. Have a good starting point?

The biggest problem is the breakage around either USE_C99_WCHAR or delayed
folding.  I think I counted 30+ targets that were affected.


It's probably delayed folding; seems the USE_C99_WCHAR stuff only
uncovers it, doesn't it?


Once that's settled, I suspect anything remaining will be pretty minor.

I'd disable interix completely.


Seems to be not hard to fix. Breaks with:
I know, but it's not worth fixing IMHO.  Interix has been a dead product 
for a long time.  We almost got rid of it several years ago, but someone 
objected and said they'd maintain it.  I asked Trevor to put it back on 
the deprecated list a little while ago.


AFAICT it hasn't been building since 2012.  I fixed some of the problems 
a few months ago, but just can't really justify anyone's time to figure 
out which way to #define this away to preserve prior behaviour and to 
continue to keep it working over time.






Not sure what to do with avr-rtems at this point.


My buildrobot just fails at the very same USE_C99_WCHAR issue right
now. Is there something more hidden, later on in the build?
avr-rtems has deeper issues, which ultimately look like the same problem 
you're seeing with delayed folding, but aren't the same problem.


Essentially avr-rtems's definitions of various standard types are all 
conditional on flags with a default that is NULL.  Those are ultimately 
passed to one of the str* functions and GCC throws a warning/failure.


There's no way to fold those down to a constant, (or even to prove the 
NULL case couldn't happen IIRC).  So even once the current delayed 
folding issue gets fixed, avr-rtems will remain broken.


It's also unclear how long avr-rtems will be around.  I get the sense 
it's on its last legs -- and given we have both avr and rtems coverage 
via other targets, I don't think building avr-rtems is really all that 
helpful.


Jeff


Re: [PATCH] IRA: Fix % constraint modifier handling on disabled alternatives.

2015-12-17 Thread Vladimir Makarov

On 12/14/2015 08:05 AM, Andreas Krebbel wrote:

Hi,

the constraint modifier % applies to all the alternatives of a pattern
and hence is mostly added to the first constraint of an operand.  IRA
currently ignores it if the alternative with the % gets disabled by
using the `enabled' attribute or if it is not among the preferred
alternatives.

Fixed with the attached patch by moving the % check to the first loop
which walks unconditionally over all the constraints.
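
The effect described above can be illustrated with a small sketch. This is not the IRA code: the constraint-string model (alternatives separated by ','), the function names and the `enabled` vector are all hypothetical, chosen only to show why a '%' that sits in a disabled alternative is missed unless the scan is unconditional.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: one operand's constraint string, alternatives
// separated by ','.  '%' marks the operand as commutative with the next one.

// Old behaviour (simplified): '%' is only noticed while scanning an
// alternative that is enabled, so a '%' in a disabled alternative is missed.
bool commutative_enabled_only(const std::string &constraint,
                              const std::vector<bool> &enabled)
{
  std::size_t alt = 0;
  for (char c : constraint)
    {
      if (c == ',')
        ++alt;
      else if (c == '%' && alt < enabled.size() && enabled[alt])
        return true;
    }
  return false;
}

// Fixed behaviour (simplified): a first pass looks at every constraint
// character unconditionally, so '%' is found wherever it appears.
bool commutative_any(const std::string &constraint)
{
  return constraint.find('%') != std::string::npos;
}
```

With the constraint "%0,r" and only the second alternative enabled, the first function reports the operand as non-commutative while the second still sees the modifier.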

Ok for mainline?



Yes, Andreas.

Thanks for working on this issue.



Re: [PATCH 1/5] Fix asymmetric comparison functions

2015-12-17 Thread Andrew Pinski
On Thu, Dec 17, 2015 at 12:58 AM, Yury Gribov  wrote:
> Some obvious symmetry fixes.
>
> Cc-ing
> * Andrey (Belevantsev) for bb_top_order_comparator
> * Andrew (MacLeod) for compare_case_labels
> * Andrew (Pinski) for resort_field_decl_cmp

IIRC this was actually not written by me but I copied it back from an
older version.  But then again this was over 10 years ago so I don't
remember the history on this any more.

Thanks,
Andrew

> * Diego for pair_cmp
> * Geoff for resort_method_name_cmp
> * Jakub for compare_case_labels
> * Jason for method_name_cmp
> * Richard for insert_phi_nodes_compare_var_infos, compare_case_labels
> * Steven for cmp_v_in_regset_pool
>
> /Yury


Re: [PATCH 3/5] "Fix" intransitive comparison in reload_pseudo_compare_func

2015-12-17 Thread Vladimir Makarov

On 12/17/2015 04:00 AM, Yury Gribov wrote:
This patch fixes intransitive comparison in 
reload_pseudo_compare_func. Imagine the following

situation:
1) bitmap_bit_p is unset for A and B but set for C
2) A < B (due to early ira_reg_class_max_nregs comparison)
3) B < C (due to following regno_assign_info comparison)

It may then easily happen that A > C (due to regno_assign_info 
comparison) which violates the transitiveness requirement of total 
ordering.
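
The A/B/C situation above can be reproduced with a toy comparator. This is a simplified sketch, not the body of reload_pseudo_compare_func: the `Pseudo` struct, its fields and the concrete values are assumptions made up for illustration, with the one property that matters kept intact (the nregs key is consulted only when neither element is in the bitmap).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a reload pseudo; not GCC's data.
struct Pseudo
{
  bool in_bitmap; // models bitmap_bit_p (...) being set
  int nregs;      // models the ira_reg_class_max_nregs value
  int freq;       // models the regno_assign_info frequency
};

// Returns <0 if a sorts before b, >0 if after, 0 if equal.
int compare(const Pseudo &a, const Pseudo &b)
{
  // The nregs key is used only when neither element is in the bitmap.
  if (a.nregs != b.nregs && !a.in_bitmap && !b.in_bitmap)
    return b.nregs - a.nregs;
  return b.freq - a.freq;
}

// Brute-force check: a < b and b < c must imply a < c.
bool is_transitive(const std::vector<Pseudo> &v)
{
  for (const Pseudo &a : v)
    for (const Pseudo &b : v)
      for (const Pseudo &c : v)
        if (compare(a, b) < 0 && compare(b, c) < 0 && compare(a, c) >= 0)
          return false;
  return true;
}
```

Taking A = {false, 2, 1}, B = {false, 1, 3} and C = {true, 1, 2} gives A < B via nregs and B < C via freq, yet A > C via freq, so the brute-force check reports the ordering as intransitive.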


Unfortunately I'm not sure how to properly fix this so Cc-ing Vladimir 
for help.


  Yury, thanks for reporting this.  Yes, that could be a problem, but I 
cannot approve this patch as it might result in *significant* 
performance degradation.  I remember the code.  What you propose is the 
original patch (PR57878), and it was modified to the current version 
precisely because of the negative performance impact.  The current code 
is safe: it might result in infinite cycling for some sort algorithms, 
but not for the qsort we use.


  I'll think about how to fix it better.  I will probably need two 
comparison functions for different assignment iterations.  The solution 
will need benchmarking as the code is critical for LRA performance.  
Could you file a bug report so that the issue is not forgotten?





Re: [PATCH] Fix PR c++/68831 (superfluous -Waddress warning for C++ delete)

2015-12-17 Thread Jason Merrill

OK.

Jason


Re: config-list.mk and obsoleted configurations

2015-12-17 Thread Jan-Benedict Glaw
On Thu, 2015-12-17 11:39:24 -0700, Jeff Law  wrote:
> On 12/17/2015 11:34 AM, Jan-Benedict Glaw wrote:
> > On Thu, 2015-12-17 11:05:42 -0700, Jeff Law  wrote:
> > > On 12/16/2015 03:46 AM, Jan-Benedict Glaw wrote:
> > > > Shall I bisect one of the cases anew, with the "Test value of
> > > > _GLIBCXX_USE_C99_WCHAR not whether it is defined" patch that
> > > > uncovered it, applied? Starting with some arbitrary old revision?
> > > Yes.  I'd really like to see config-list.mk working again.  The
> > > first step is always building a test the developers can easily work
> > > with.
> >
> > Will do. Have a good starting point?
> The biggest problem is the breakage around either USE_C99_WCHAR or delayed
> folding.  I think I counted 30+ targets that were affected.

It's probably delayed folding; seems the USE_C99_WCHAR stuff only
uncovers it, doesn't it?

> Once that's settled, I suspect anything remaining will be pretty minor.
> 
> I'd disable interix completely.

Seems to be not hard to fix. Breaks with:

g++ -fno-PIE -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
-Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual 
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror 
-fno-common  -DHAVE_CONFIG_H -I. -I. -I../../../gcc/gcc -I../../../gcc/gcc/. 
-I../../../gcc/gcc/../include -I../../../gcc/gcc/../libcpp/include 
-I/opt/cfarm/mpc/include  -I../../../gcc/gcc/../libdecnumber 
-I../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I../../../gcc/gcc/../libbacktrace   -I. -I. -I../../../gcc/gcc 
-I../../../gcc/gcc/. -I../../../gcc/gcc/../include 
-I../../../gcc/gcc/../libcpp/include -I/opt/cfarm/mpc/include  
-I../../../gcc/gcc/../libdecnumber -I../../../gcc/gcc/../libdecnumber/dpd 
-I../libdecnumber -I../../../gcc/gcc/../libbacktrace  \
../../../gcc/gcc/config/i386/winnt.c
../../../gcc/gcc/config/i386/winnt.c: In function ‘void 
i386_pe_unique_section(tree, int)’:
../../../gcc/gcc/config/i386/winnt.c:376:8: error: ‘flag_writable_rel_rdata’ 
was not declared in this scope
   if (!flag_writable_rel_rdata)
^~~

../../../gcc/gcc/config/i386/winnt.c: In function ‘unsigned int 
i386_pe_section_type_flags(tree, const char*, int)’:
../../../gcc/gcc/config/i386/winnt.c:432:8: error: ‘flag_writable_rel_rdata’ 
was not declared in this scope
   if (!flag_writable_rel_rdata)
^~~

../../../gcc/gcc/config/i386/t-interix:22: recipe for target 'winnt.o' failed




jbglaw@pluto:~/src/toolchain/gcc [master] $ git grep flag_writable_rel_rdata
gcc/ChangeLog-2012: Add new flag variable flag_writable_rel_rdata.
gcc/config/i386/cygming.opt:Common Report Var(flag_writable_rel_rdata) Init(0)
gcc/config/i386/winnt.c:  if (!flag_writable_rel_rdata)
gcc/config/i386/winnt.c:  if (!flag_writable_rel_rdata)



> Not sure what to do with avr-rtems at this point.

My buildrobot just fails at the very same USE_C99_WCHAR issue right
now. Is there something more hidden, later on in the build?

> >   Oh, there are some targets that were obsoleted today. I think the
> >OpenBSD3 and the two knetbsd configurations will need an
> >--enable-obsolete. I suggest this (untested) patch:
> >
> >contrib/
> >2015-12-17  Jan-Benedict Glaw  
> >
> > * config-list.mk (LIST): Add --enable-obsolete to recently obsoleted
> > targets x86_64-knetbsd-gnu, i686-knetbsd-gnu and i686-openbsd3.0 .
> Seems fine to me once it's gone through whatever testing you want to do.

Will verify that it's needed and if it is (as suspected), I'll commit
it properly.

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of: 17:45 <@Eimann> Hmm, the E90 no longer has a lifetime call-time counter
the second  : 17:46 <@jbglaw> Eimann: What do you need that for?
              17:46 <@jbglaw> Eimann: What matters to me in a phone is that
              I can hear the person I'm talking to. And that they understand me...
              17:47 <@KrisK> jbglaw: what you mean is vodka.
              17:47 <@KrisK> jbglaw: it rings and you hear voices




Re: [PATCH] C FE: improvements to ranges of bad return values

2015-12-17 Thread Jeff Law

On 12/16/2015 07:19 PM, David Malcolm wrote:

In the C FE, c_parser_statement_after_labels passes "xloc" to
c_finish_return, which is the location of the first token
within the returned expression.

Hence we don't get a full underline for the following:

diagnostic-range-bad-return.c:34:10: warning: function returns address of local 
variable [-Wreturn-local-addr]
return &some_local;
   ^

This feels like a bug; this patch fixes it to use the location of
the expr if available, and to fall back to xloc otherwise, giving
us underlining of the full expression:

diagnostic-range-bad-return.c:34:10: warning: function returns address of local 
variable [-Wreturn-local-addr]
return &some_local;
   ^~~
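
The difference between the two diagnostics can be sketched as a small rendering helper. This is only an illustration under assumptions: the `Range` struct, the "finish == 0 means no range" convention and the function name are invented here and are not GCC's location API. It shows the fallback described above: use the expression's range when it has one, otherwise a bare caret at the first token.

```cpp
#include <cassert>
#include <string>

// Hypothetical column range of an expression; finish == 0 means "no range
// recorded".  Columns are 1-based, and finish >= start is assumed.
struct Range
{
  unsigned start;
  unsigned finish;
};

// Build the caret/underline line: '^' at the start column, then '~' up to
// and including the finish column.  Falls back to a bare caret at
// first_token_col when the expression carries no range.
std::string underline(Range expr, unsigned first_token_col)
{
  unsigned start = expr.finish ? expr.start : first_token_col;
  unsigned finish = expr.finish ? expr.finish : first_token_col;
  std::string line(start - 1, ' ');
  line += '^';
  line.append(finish - start, '~');
  return line;
}
```

For `&some_local` starting at column 10 and ending at column 20, the helper yields a caret followed by ten tildes; with no range it yields the single caret of the first diagnostic.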

The testcase also adds some coverage for underlining the
"return" token for the cases where we're warning about the
erroneous presence/absence of a return value.

As an additional tweak, it struck me that we could be more
user-friendly for these latter diagnostics by issuing a note
about where the function was declared, so this patch also adds
an inform for these cases:

diagnostic-range-bad-return.c: In function 'missing_return_value':
diagnostic-range-bad-return.c:31:3: warning: 'return' with no value, in 
function returning non-void
return; /* { dg-warning "'return' with no value, in function returning 
non-void" } */
^~

diagnostic-range-bad-return.c:29:5: note: declared here
  int missing_return_value (void)
  ^~~~

(ideally we'd put the underline on the return type, but that location
isn't captured)

This latter part of the patch is an enhancement rather than a
bugfix, though FWIW, and I'm not sure I can argue this with a
straight face, the tweak was posted as part of:
   "[PATCH 16/22] C/C++ frontend: use tree ranges in various diagnostics"
in https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00745.html
during stage 1.  Hopefully low risk, and a small usability improvement;
but if this is pushing it, it'd be simple to split this up and only
do the bug fix.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu;
adds 12 PASS results to gcc.sum.

OK for trunk for stage 3?

gcc/c/ChangeLog:
* c-parser.c (c_parser_statement_after_labels): When calling
c_finish_return, Use the return expression's location if it has
one, falling back to the location of the first token within it.
* c-typeck.c (c_finish_return): When issuing warnings about
the incorrect presence/absence of a return value, issue a note
showing the declaration of the function.
This is fine.  I think the first is pretty easy to justify.  The second 
is harder.  Again I think it's very low risk and has user-visible benefits.


Jeff


[PATCH] [graphite] replace ISL with isl

2015-12-17 Thread Sebastian Pop
---
 Makefile.in   |  2 +-
 Makefile.tpl  |  2 +-
 config/isl.m4 |  2 +-
 configure | 10 +++---
 configure.ac  | 14 
 contrib/download_prerequisites|  2 +-
 gcc/Makefile.in   |  2 +-
 gcc/common.opt|  2 +-
 gcc/configure |  8 ++---
 gcc/configure.ac  |  8 ++---
 gcc/doc/install.texi  |  8 ++---
 gcc/doc/invoke.texi   |  4 +--
 gcc/graphite-isl-ast-to-gimple.c  | 47 +--
 gcc/graphite-scop-detection.c |  4 +--
 gcc/graphite-sese-to-poly.c   |  6 ++--
 gcc/graphite.c|  4 +--
 gcc/graphite.h|  2 +-
 gcc/params.def|  2 +-
 gcc/testsuite/gcc.dg/graphite/fuse-1.c|  4 +--
 gcc/testsuite/gcc.dg/graphite/fuse-2.c|  4 +--
 gcc/testsuite/gcc.dg/graphite/interchange-1.c |  2 +-
 gcc/testsuite/gcc.dg/graphite/pr35356-1.c |  2 +-
 gcc/toplev.c  |  2 +-
 23 files changed, 69 insertions(+), 74 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index cb62c35..e9b5950 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -312,7 +312,7 @@ NORMAL_TARGET_EXPORTS = \
 HOST_GMPLIBS = @gmplibs@
 HOST_GMPINC = @gmpinc@
 
-# Where to find ISL
+# Where to find isl
 HOST_ISLLIBS = @isllibs@
 HOST_ISLINC = @islinc@
 
diff --git a/Makefile.tpl b/Makefile.tpl
index 693e4d5..f7bb77e 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -315,7 +315,7 @@ NORMAL_TARGET_EXPORTS = \
 HOST_GMPLIBS = @gmplibs@
 HOST_GMPINC = @gmpinc@
 
-# Where to find ISL
+# Where to find isl
 HOST_ISLLIBS = @isllibs@
 HOST_ISLINC = @islinc@
 
diff --git a/config/isl.m4 b/config/isl.m4
index e4e4aab..86ccb94 100644
--- a/config/isl.m4
+++ b/config/isl.m4
@@ -94,7 +94,7 @@ AC_DEFUN([ISL_REQUESTED],
 
 # ISL_CHECK_VERSION ISL_CHECK_VERSION ()
 # 
-# Test that ISL contains functionality added to the minimum expected version.
+# Test whether isl contains functionality added to the minimum expected version.
 AC_DEFUN([ISL_CHECK_VERSION],
 [
   if test "${ENABLE_ISL_CHECK}" = yes ; then
diff --git a/configure b/configure
index c3c5cb0..f5786ed 100755
--- a/configure
+++ b/configure
@@ -1549,7 +1549,7 @@ Optional Packages:
   --with-boot-libs=LIBS   libraries for stage2 and later
   --with-boot-ldflags=FLAGS
   linker flags for stage2 and later
-  --with-isl=PATH Specify prefix directory for the installed ISL
+  --with-isl=PATH Specify prefix directory for the installed isl
   package. Equivalent to
   --with-isl-include=PATH/include plus
   --with-isl-lib=PATH/lib
@@ -5943,7 +5943,7 @@ fi
 
 
 
-# GCC GRAPHITE dependency ISL.
+# GCC GRAPHITE dependency isl.
 # Basic setup is inlined here, actual checks are in config/isl.m4
 
 
@@ -5956,7 +5956,7 @@ fi
 # Treat --without-isl as a request to disable
 # GRAPHITE support and skip all following checks.
 if test "x$with_isl" != "xno"; then
-  # Check for ISL
+  # Check for isl
 
 
 # Check whether --with-isl-include was given.
@@ -6079,13 +6079,13 @@ $as_echo "recommended isl version is 0.15, minimum required isl version 0.14 is
 && test "x${isllibs}" = x \
 && test "x${islinc}" = x ; then
 
-as_fn_error "Unable to find a usable ISL.  See config.log for details." "$LINENO" 5
+as_fn_error "Unable to find a usable isl.  See config.log for details." "$LINENO" 5
   fi
 
 
 fi
 
-# If the ISL check failed, disable builds of in-tree variant of ISL
+# If the isl check failed, disable builds of in-tree variant of isl
 if test "x$with_isl" = xno ||
test "x$gcc_cv_isl" = xno; then
   noconfigdirs="$noconfigdirs isl"
diff --git a/configure.ac b/configure.ac
index a6998ff..a719e03 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1773,31 +1773,31 @@ AC_ARG_WITH(boot-ldflags,
  fi])
 AC_SUBST(poststage1_ldflags)
 
-# GCC GRAPHITE dependency ISL.
+# GCC GRAPHITE dependency isl.
 # Basic setup is inlined here, actual checks are in config/isl.m4
 
 AC_ARG_WITH(isl,
   [AS_HELP_STRING(
[--with-isl=PATH],
-   [Specify prefix directory for the installed ISL package.
+   [Specify prefix directory for the installed isl package.
 Equivalent to --with-isl-include=PATH/include
 plus --with-isl-lib=PATH/lib])])
 
 # Treat --without-isl as a request to disable
 # GRAPHITE support and skip all following checks.
 if test "x$with_isl" != "xno"; then
-  # Check for ISL
+  # Check for isl
   dnl Provide configure switches and initialize islinc & isllibs
   dnl with user input.
   ISL_INIT_FLAGS
- 

stop IPA wrapping 'main'

2015-12-17 Thread Nathan Sidwell
gcc.dg/20031102-1.c now causes some 'surprising' optimization behaviour.  It is 
essentially


int FooBar(void)
{
 ... stuff
  return 0;
}

int main(void)
{
  return FooBar();
}


What happens is  that FooBar gets inlined into main, and then ipa-icf notices 
FooBar and main have identical bodies.  It chooses to have FooBar tail call 
main, which results in a surprising  call of 'main'.   On PTX this is 
particularly unfortunate because we have to emit a single prototype for main 
with the regular argc and argv arguments (the backend gets around 'int main 
(void)' by faking the additional 2 args).  But that fails here because the tail 
call doesn't match the prototype.


Anyway, picking 'main' as the source function struck me as a poor choice, hence 
the attached patch.  It picks the second function of a congruent set, if the 
first is 'main'.  Note that even on, say x86-linux, we emit a tail call rather 
than an alias for the included testcase.
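
The selection rule described above can be sketched outside of ipa-icf as follows. The function names and the use of plain strings for class members are assumptions for illustration; the real code works on sem_item objects and tests MAIN_NAME_P.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Pick the index of the member that the others will be redirected to.
// Avoids "main" when the class has another member to offer.
std::size_t pick_source(const std::vector<std::string> &members)
{
  if (members.size() > 1 && members[0] == "main")
    return 1;
  return 0;
}

// Return (source, alias) pairs for one congruence class.  Classes with
// fewer than two members produce no merges.
std::vector<std::pair<std::string, std::string>>
merge_pairs(const std::vector<std::string> &members)
{
  std::vector<std::pair<std::string, std::string>> out;
  if (members.size() < 2)
    return out;
  std::size_t src = pick_source(members);
  for (std::size_t j = 0; j < members.size(); j++)
    if (j != src)
      out.emplace_back(members[src], members[j]);
  return out;
}
```

For the class {main, FooBar} this produces the single pair (FooBar, main), i.e. main is redirected to FooBar rather than the other way round. Because the source is no longer always element 0, the loop has to start at 0 and skip the source explicitly.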


I removed the gcc_assert, as the vector indexing operator already checks the 
subscript is within range.


Alternatively I could probably just fixup the testcase to make FooBar 
uninlinable, as I suspect that might have been the original intent.


tested on x86_64-linux and ptx-none.

nathan
2015-12-17  Nathan Sidwell  

	gcc/
	* ipa-icf.c (sem_item_optimizer::merge): Don't pick 'main' as the
	source function.

	gcc/testsuite/
	* gcc.dg/ipa/ipa-icf-merge-1.c: New.
	
Index: ipa-icf.c
===
--- ipa-icf.c	(revision 231770)
+++ ipa-icf.c	(working copy)
@@ -3398,14 +3398,20 @@ sem_item_optimizer::merge_classes (unsig
 	if (c->members.length () == 1)
 	  continue;
 
-	gcc_assert (c->members.length ());
-
 	sem_item *source = c->members[0];
 
-	for (unsigned int j = 1; j < c->members.length (); j++)
+	if (MAIN_NAME_P (DECL_NAME (source->decl)))
+	  /* If merge via wrappers, picking main as the target can be
+	 problematic.  */
+	  source = c->members[1];
+
+	for (unsigned int j = 0; j < c->members.length (); j++)
 	  {
 	sem_item *alias = c->members[j];
 
+	if (alias == source)
+	  continue;
+
 	if (dump_file)
 	  {
 		fprintf (dump_file, "Semantic equality hit:%s->%s\n",
Index: testsuite/gcc.dg/ipa/ipa-icf-merge-1.c
===
--- testsuite/gcc.dg/ipa/ipa-icf-merge-1.c	(revision 0)
+++ testsuite/gcc.dg/ipa/ipa-icf-merge-1.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -fdump-ipa-icf" } */
+
+/* Picking 'main' as a candidate target for equivalent functions is not a
+   good idea.  */
+
+int baz (int);
+
+int foo ()
+{
+  return baz (baz (0));
+}
+
+
+int main ()
+{
+  return baz (baz (0));
+}
+
+/* Notice the two functions are the same.  */
+/* { dg-final { scan-ipa-dump "Semantic equality hit:foo->main" "icf" } } */
+
+/* Make sure we don't tail call main.  */
+/* { dg-final { scan-ipa-dump-not "= main \\(\\);" "icf" } } */
+
+/* Make sure we tail call foo.  */
+/* { dg-final { scan-ipa-dump "= foo \\(\\);" "icf" } } */


Re: config-list.mk and obsoleted configurations

2015-12-17 Thread Jeff Law

On 12/17/2015 11:34 AM, Jan-Benedict Glaw wrote:

On Thu, 2015-12-17 11:05:42 -0700, Jeff Law  wrote:

On 12/16/2015 03:46 AM, Jan-Benedict Glaw wrote:

Shall I bisect one of the cases anew, with the "Test value of
_GLIBCXX_USE_C99_WCHAR not whether it is defined" patch that
uncovered it, applied? Starting with some arbitrary old revision?

Yes.  I'd really like to see config-list.mk working again.  The
first step is always building a test the developers can easily work
with.


Will do. Have a good starting point?
The biggest problem is the breakage around either USE_C99_WCHAR or 
delayed folding.  I think I counted 30+ targets that were affected.


Once that's settled, I suspect anything remaining will be pretty minor.

I'd disable interix completely.

Not sure what to do with avr-rtems at this point.


   Oh, there are some targets that were obsoleted today. I think the
OpenBSD3 and the two knetbsd configurations will need an
--enable-obsolete. I suggest this (untested) patch:

contrib/
2015-12-17  Jan-Benedict Glaw  

* config-list.mk (LIST): Add --enable-obsolete to recently obsoleted
targets x86_64-knetbsd-gnu, i686-knetbsd-gnu and i686-openbsd3.0 .

Seems fine to me once it's gone through whatever testing you want to do.

jeff



config-list.mk and obsoleted configurations (was: [BUILDROBOT] "error: null argument where non-null required" on multiple targets)

2015-12-17 Thread Jan-Benedict Glaw
On Thu, 2015-12-17 11:05:42 -0700, Jeff Law  wrote:
> On 12/16/2015 03:46 AM, Jan-Benedict Glaw wrote:
> > Shall I bisect one of the cases anew, with the "Test value of
> > _GLIBCXX_USE_C99_WCHAR not whether it is defined" patch that
> > uncovered it, applied? Starting with some arbitrary old revision?
> Yes.  I'd really like to see config-list.mk working again.  The
> first step is always building a test the developers can easily work
> with.

Will do. Have a good starting point?

  Oh, there are some targets that were obsoleted today. I think the
OpenBSD3 and the two knetbsd configurations will need an
--enable-obsolete. I suggest this (untested) patch:

contrib/
2015-12-17  Jan-Benedict Glaw  

* config-list.mk (LIST): Add --enable-obsolete to recently obsoleted
targets x86_64-knetbsd-gnu, i686-knetbsd-gnu and i686-openbsd3.0 .

diff --git a/contrib/ChangeLog b/contrib/ChangeLog
index 8d39e68..ab8060b 100644
--- a/contrib/ChangeLog
+++ b/contrib/ChangeLog
@@ -1,3 +1,8 @@
+2015-12-17  Jan-Benedict Glaw  
+
+   * config-list.mk (LIST): Add --enable-obsolete to recently obsoleted
+   targets x86_64-knetbsd-gnu, i686-knetbsd-gnu and i686-openbsd3.0 .
+
 2015-12-06  Tobias Burnus  
 
* download_prerequisites: Download ISL 0.15 instead of 0.14.
diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index f0e39d6..0f15464 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -28,7 +28,8 @@ LIST = aarch64-elf aarch64-linux-gnu \
   hppa64-hpux11.0OPT-enable-sjlj-exceptions=yes hppa2.0-hpux11.9 \
   i686-pc-linux-gnu i686-apple-darwin i686-apple-darwin9 i686-apple-darwin10 \
   i486-freebsd4 i686-freebsd6 i686-kfreebsd-gnu \
-  i686-netbsdelf9 i686-knetbsd-gnu i686-openbsd i686-openbsd3.0 \
+  i686-netbsdelf9 i686-knetbsd-gnuOPT-enable-obsolete \
+  i686-openbsd i686-openbsd3.0OPT-enable-obsolete \
   i686-elf i686-kopensolaris-gnu i686-symbolics-gnu i686-pc-msdosdjgpp \
   i686-lynxos i686-nto-qnx \
   i686-rtems i686-solaris2.10 i686-wrs-vxworks \
@@ -74,7 +75,7 @@ LIST = aarch64-elf aarch64-linux-gnu \
   vax-netbsdelf vax-openbsd visium-elf x86_64-apple-darwin \
   x86_64-pc-linux-gnuOPT-with-fpmath=avx \
   x86_64-elfOPT-with-fpmath=sse x86_64-freebsd6 x86_64-netbsd \
-  x86_64-knetbsd-gnu x86_64-w64-mingw32 \
+  x86_64-knetbsd-gnuOPT-enable-obsolete x86_64-w64-mingw32 \
   x86_64-mingw32OPT-enable-sjlj-exceptions=yes xstormy16-elf xtensa-elf \
   xtensa-linux \
   i686-interix3OPT-enable-obsolete



MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of: 23:53 <@jbglaw> Right, I'm going to climb into bed now.
the second  : 23:57 <@jever2> .oO( climb ..., does he still have bars on his bed,
              like my kids used to?)
              00:00 <@jbglaw> jever2: *smack*
              00:01 <@jever2> *ouch*, what for, thoughts are free!
              00:02 <@jbglaw> Nah, free thoughts have been over since 1984!
              00:03 <@jever2> 1984? I've only been married since 1985!




Re: [PATCH] PR c++/68795: fix uninitialized close_paren_loc in cp_parser_postfix_expression

2015-12-17 Thread Bernd Schmidt

On 12/17/2015 07:32 PM, David Malcolm wrote:

+   if (close_paren_loc)


close_paren_loc != UNKNOWN_LOCATION - it's very confusing otherwise.


Bernd


Re: ipa-cp heuristics fixes

2015-12-17 Thread Jakub Jelinek
On Wed, Dec 16, 2015 at 08:15:12PM +0100, Jan Hubicka wrote:
> just to summarize a discussion on IRC. The problem is that we produce debug
> statements for eliminated arguments only in ipa-sra and ipa-split, while we
> don't do anything for cgraph clones. This is a problem on release branches,
> too.
> 
> It seems we have all the necessary logic, but the callee modification code from
> ipa-split should be moved to tree_function_versioning (which is used by both
> ipa-split and cgraph clone mechanism) and caller modification copied to
> cgraph_edge::redirect_call_stmt_to_callee.
> 
> I am trying to do that. It seems bit difficult as the caller and callee
> modifications are tied together and I do not know how chaining of
> transformations is going to work. 

Ok, so here is a WIP patch changing the functions you wanted, untested so
far.

I've been looking at 3 testcases (attached), -1.c and -3.c with -g -O2,
and -2.c with -g -O3.
The -3.c one is a copy of the test we have for the ipa-split debug info
stuff, before/after the patch we generate the same stuff.
-2.c testcase is for the (new, or now much more often taken) path of
ipa-cp; the patch arranges for proper debug info in that case
(but I'm really surprised that, when the function is already cloned, nothing
figures out that the clone is always called with the same constant
passed to arg8, so the argument isn't removed and replaced by the constant).
-1.c is a testcase for the IPA-SRA path, where we unfortunately end up with
a slight regression (in the IL size; in the end we generate the same
assembly):
+  # DEBUG D#8 s=> arg8
+  # DEBUG arg8 => D#8
   # DEBUG arg8 => 7
with the patch.  On that testcase, arg8 is used, but it is always passed
value 7 (similarly to -2.c testcase) and in that case we really don't
need/want the decl_debug_args stuff, it is unnecessary, it is enough to say
in the callee that arg8 is 7.  Nothing on the caller side sets the magic
corresponding D# debug expr decl anyway.
Either tree_versioning is too low-level for the debug info addition, or
we need to figure out how to tell it if a constant will be always passed
to some argument and what that constant will be, so that we'd emit
always the # DEBUG arg8 => constant in that case instead of the source bind
stuff (but then figure out what has added that and avoid duplication too).

And then there is another thing, but best to be handled somewhere in
dwarf2out.c or in the debugger.  The arguments are printed in pretty random
order:
#0  foo (arg7=arg7@entry=30, arg8=arg8@entry=7, arg6=6, arg5=5, arg4=4, arg3=3, 
arg2=2, arg1=1) at pr68860-2.c:15
So, either the debugger for functions with abstract origins should look at
the order of arguments in the abstract origin and ignore order in the
particular instantiation, or dwarf2out.c should sort the
DW_TAG_formal_parameter such that it if at all possible matches the order
specified in the source.
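
The second option (sorting the parameters to match the source order) could look roughly like the sketch below. It works on plain name lists, not on DWARF DIEs or trees; the function name and the "unknown names go last" policy are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Reorder `clone_params` so that they follow the order given by
// `source_params` (standing in for the abstract origin).  Names absent from
// the source list are kept after the known ones, in their existing order.
std::vector<std::string>
in_source_order(std::vector<std::string> clone_params,
                const std::vector<std::string> &source_params)
{
  auto rank = [&](const std::string &name) {
    auto it = std::find(source_params.begin(), source_params.end(), name);
    return it - source_params.begin(); // equals size() when not found
  };
  std::stable_sort(clone_params.begin(), clone_params.end(),
                   [&](const std::string &a, const std::string &b) {
                     return rank(a) < rank(b);
                   });
  return clone_params;
}
```

Applied to the order from the backtrace above (arg7, arg8, arg6, ..., arg1) with the source order arg1..arg8, the result is arg1..arg8 again.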

--- gcc/ipa-split.c.jj  2015-12-10 11:14:00.0 +0100
+++ gcc/ipa-split.c 2015-12-17 18:21:39.402036180 +0100
@@ -1209,7 +1209,6 @@ split_function (basic_block return_bb, s
   gimple *last_stmt = NULL;
   unsigned int i;
   tree arg, ddef;
-  vec **debug_args = NULL;
 
   if (dump_file)
 {
@@ -1432,73 +1431,38 @@ split_function (basic_block return_bb, s
  vector to say for debug info that if parameter parm had been passed,
  it would have value parm_Y(D).  */
   if (args_to_skip)
-for (parm = DECL_ARGUMENTS (current_function_decl), num = 0;
-parm; parm = DECL_CHAIN (parm), num++)
-  if (bitmap_bit_p (args_to_skip, num)
- && is_gimple_reg (parm))
-   {
- tree ddecl;
- gimple *def_temp;
-
- /* This needs to be done even without MAY_HAVE_DEBUG_STMTS,
-otherwise if it didn't exist before, we'd end up with
-different SSA_NAME_VERSIONs between -g and -g0.  */
- arg = get_or_create_ssa_default_def (cfun, parm);
- if (!MAY_HAVE_DEBUG_STMTS)
-   continue;
-
- if (debug_args == NULL)
-   debug_args = decl_debug_args_insert (node->decl);
- ddecl = make_node (DEBUG_EXPR_DECL);
- DECL_ARTIFICIAL (ddecl) = 1;
- TREE_TYPE (ddecl) = TREE_TYPE (parm);
- DECL_MODE (ddecl) = DECL_MODE (parm);
- vec_safe_push (*debug_args, DECL_ORIGIN (parm));
- vec_safe_push (*debug_args, ddecl);
- def_temp = gimple_build_debug_bind (ddecl, unshare_expr (arg),
- call);
- gsi_insert_after (&gsi, def_temp, GSI_NEW_STMT);
-   }
-  /* And on the callee side, add
- DEBUG D#Y s=> parm
- DEBUG var => D#Y
- stmts to the first bb where var is a VAR_DECL created for the
- optimized away parameter in DECL_INITIAL block.  This hints
- in the debug info that var (whole DECL_ORIGIN is the parm PARM_DECL)
- is optimized away, but could be looked up at the call site
- as value of D#X there.  */
-  if (debug_args != NULL)
 

[PATCH] PR c++/68795: fix uninitialized close_paren_loc in cp_parser_postfix_expression

2015-12-17 Thread David Malcolm
cp_parser_parenthesized_expression_list can leave *close_paren_loc
untouched if an error occurs; specifically when following this goto:

7402      if (expr == error_mark_node)
7403        goto skip_comma;

which can lead to cp_parser_postfix_expression attempting to
use uninitialized data for the finishing location of a
parenthesized expression.

The attached patch fixes this by having cp_parser_postfix_expression
initialize the underlying location to UNKNOWN_LOCATION, and only use
it if it's been written to.
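
A minimal sketch of the pattern described above, not the actual parser code: the output location starts at a sentinel value and the caller only builds a range when the callee has overwritten it. The names parse_list, build_call_range and the location value 42 are hypothetical; location_t and UNKNOWN_LOCATION are simplified stand-ins for the GCC types.

```cpp
#include <cassert>

// Simplified stand-ins for GCC's location_t and UNKNOWN_LOCATION.
using location_t = unsigned int;
constexpr location_t UNKNOWN_LOCATION = 0;

struct range
{
  location_t start = UNKNOWN_LOCATION;
  location_t finish = UNKNOWN_LOCATION;
};

// Hypothetical helper: writes *close_paren_loc only when parsing succeeds,
// mirroring a callee that may leave its output parameter untouched on error.
bool parse_list (bool fail, location_t *close_paren_loc)
{
  if (fail)
    return false;              // error path: output not written
  if (close_paren_loc)
    *close_paren_loc = 42;     // arbitrary non-zero location for the sketch
  return true;
}

// Caller: initialize to the sentinel, then use the value only if it was set.
range build_call_range (location_t start, bool fail)
{
  location_t close_paren_loc = UNKNOWN_LOCATION;
  parse_list (fail, &close_paren_loc);

  range r;
  if (close_paren_loc)         // written to by the callee
    {
      r.start = start;
      r.finish = close_paren_loc;
    }
  return r;
}
```

On the error path the range simply stays at UNKNOWN_LOCATION instead of picking up whatever happened to be on the stack.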

Verified the fix manually by compiling
  g++.old-deja/g++.ns/invalid1.C
before and after under valgrind.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/cp/ChangeLog:
* parser.c (cp_parser_postfix_expression): Initialize
close_paren_loc to UNKNOWN_LOCATION; only use it if
it has been written to by
cp_parser_parenthesized_expression_list.
(cp_parser_postfix_dot_deref_expression): Likewise.
(cp_parser_parenthesized_expression_list): Document the behavior
with respect to the CLOSE_PAREN_LOC param.
---
 gcc/cp/parser.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a420cf1..56dfe42 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -6664,7 +6664,7 @@ cp_parser_postfix_expression (cp_parser *parser, bool address_p, bool cast_p,
bool saved_non_integral_constant_expression_p = false;
tsubst_flags_t complain = complain_flags (decltype_p);
vec<tree, va_gc> *args;
-   location_t close_paren_loc;
+   location_t close_paren_loc = UNKNOWN_LOCATION;
 
 is_member_access = false;
 
@@ -6826,10 +6826,13 @@ cp_parser_postfix_expression (cp_parser *parser, bool address_p, bool cast_p,
koenig_p,
complain);
 
-   location_t combined_loc = make_location (token->location,
-start_loc,
-close_paren_loc);
-   postfix_expression.set_location (combined_loc);
+   if (close_paren_loc)
+ {
+   location_t combined_loc = make_location (token->location,
+start_loc,
+close_paren_loc);
+   postfix_expression.set_location (combined_loc);
+ }
 
/* The POSTFIX_EXPRESSION is certainly no longer an id.  */
idk = CP_ID_KIND_NONE;
@@ -7298,7 +7301,10 @@ cp_parser_postfix_dot_deref_expression (cp_parser *parser,
plain identifier argument, normal_attr for an attribute that wants
an expression, or non_attr if we aren't parsing an attribute list.  If
NON_CONSTANT_P is non-NULL, *NON_CONSTANT_P indicates whether or
-   not all of the expressions in the list were constant.  */
+   not all of the expressions in the list were constant.
+   If CLOSE_PAREN_LOC is non-NULL, and no errors occur, then *CLOSE_PAREN_LOC
+   will be written to with the location of the closing parenthesis.  If
+   an error occurs, it may or may not be written to.  */
 
 static vec<tree, va_gc> *
 cp_parser_parenthesized_expression_list (cp_parser* parser,
-- 
1.8.5.3



Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

2015-12-17 Thread H.J. Lu
On Thu, Dec 17, 2015 at 8:11 AM, H.J. Lu  wrote:
> On Thu, Dec 17, 2015 at 7:50 AM, H.J. Lu  wrote:
>> On Thu, Dec 17, 2015 at 5:42 AM, Uros Bizjak  wrote:
>>> On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu  wrote:
>>>> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
>>>>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>>>>>> Since sibcall never returns, we can only use call-clobbered register
>>>>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>>>>> be properly restored.
>>>>>>
>>>>>> Tested on x86-64 with -m32.  OK for trunk?
>>>>>
>>>>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>>>>> class, and register_no_elim_operand predicate should be used with "U"
>>>>> constraint. Also, please introduce new predicate, similar to how
>>>>> GOT_memory_operand is defined and handled.
>>>>>
>>>>
>>>> Here is the updated patch.  There is a predicate already,
>>>> sibcall_memory_operand.  It allows any registers to
>>>> be as GOT base, which is the root of our problem.
>>>> This patch removes GOT slot from it and handles
>>>> sibcall over GOT slot with *sibcall_GOT_32 and
>>>> *sibcall_value_GOT_32 patterns.  Since I need to
>>>> expose constraints on GOT base register to RA,
>>>> I have to use 2 operands, GOT base and function
>>>> symbol, to describe sibcall over 32-bit GOT slot.
>>>
>>> Please use
>>>
>>>(mem:SI (plus:SI
>>>  (match_operand:SI 0 "register_no_elim_operand" "U")
>>>  (match_operand:SI 1 "GOT32_symbol_operand")))
>>> ...
>>>
>>> to avoid manual rebuild of the operand.
>>>
>>
>> Is this OK?
>>
>
> An updated patch to allow sibcall_memory_operand for RTL
> expansion.  OK for trunk if there is no regression?
>

There are no regressions on x86-64 with -m32.  OK for trunk?

-- 
H.J.


Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Bernd Schmidt

On 12/17/2015 06:44 PM, Kyrill Tkachov wrote:

Perhaps I had underestimated how involved this issue is :)
So if I want to improve the aarch64 situation for GCC 6,
would the recommended course of action be to just define the
QI and HImode compare against zero patterns?


For GCC 6 I think this is the only approach.


Bernd



Re: [BUILDROBOT] "error: null argument where non-null required" on multiple targets

2015-12-17 Thread Jeff Law

On 12/16/2015 03:46 AM, Jan-Benedict Glaw wrote:

On Tue, 2015-12-15 10:43:58 -0700, Jeff Law  wrote:

On 12/14/2015 01:07 PM, Jan-Benedict Glaw wrote:

On Mon, 2015-12-14 18:54:28 +, Moore, Catherine 
 wrote:

avr-rtems   
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478544
mipsel-elf  
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478844
mipsisa64r2-sde-elf 
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478855
mipsisa64sb1-elf
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478865
mips-rtems  
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478877
powerpc-eabialtivec 
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478922
powerpc-eabispe 
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478932
powerpc-rtems   
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478956
ppc-elf 
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478968
sh-superh-elf   
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=479077


Is there an easy way to reproduce the MIPS problems that you
reported?  I don't seem to be able to do it with a cross-compiler
targeting mipsel-elf.


What's your build compiler? For these builds, where it showed up, I'm
using a freshly compiled HEAD/master version. So basically, compile a
current GCC for your build machine:

Right.  This is something that only shows up when using the trunk to build
the crosses.

When I looked, I thought I bisected it to the delayed folding work.


Shall I bisect one of the cases anew, with the "Test value of
_GLIBCXX_USE_C99_WCHAR not whether it is defined" patch that uncovered
it, applied? Starting with some arbitrary old revision?

Yes.  I'd really like to see config-list.mk working again.  The first 
step is always building a test the developers can easily work with.



jeff


Re: [PATCH] Fix PR c++/68831 (superfluous -Waddress warning for C++ delete)

2015-12-17 Thread Patrick Palka
On Thu, Dec 10, 2015 at 6:54 PM, Patrick Palka  wrote:
> Is this OK to commit if bootstrap + regtest on x86_64 succeeds?
>
> gcc/cp/ChangeLog:
>
> PR c++/68831
> * init.c (build_delete): Use a warning sentinel to disable
> -Waddress warnings when building the conditional that tests
> if the operand is NULL.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/68831
> * g++.dg/pr68831.C: New test.

Ping.

> ---
>  gcc/cp/init.c  |  1 +
>  gcc/testsuite/g++.dg/pr68831.C | 10 ++
>  2 files changed, 11 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/pr68831.C
>
> diff --git a/gcc/cp/init.c b/gcc/cp/init.c
> index 5ecf9fb..2fffc61 100644
> --- a/gcc/cp/init.c
> +++ b/gcc/cp/init.c
> @@ -4439,6 +4439,7 @@ build_delete (tree otype, tree addr, special_function_kind auto_delete,
>else
> {
>   /* Handle deleting a null pointer.  */
> + warning_sentinel s (warn_address);
>   ifexp = fold (cp_build_binary_op (input_location,
> NE_EXPR, addr, nullptr_node,
> complain));
> diff --git a/gcc/testsuite/g++.dg/pr68831.C b/gcc/testsuite/g++.dg/pr68831.C
> new file mode 100644
> index 000..8d32819
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr68831.C
> @@ -0,0 +1,10 @@
> +// PR c++/68831
> +// { dg-options "-Waddress" }
> +
> +class DenseMap {
> +public:
> +  ~DenseMap();
> +};
> +extern const DenseMap &GCMap;
> +void foo() { delete &GCMap; }
> +
> --
> 2.6.4.491.gda30757.dirty
>
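
For readers unfamiliar with the idiom, the "warning sentinel" in the ChangeLog above is an RAII guard: it saves a warning flag, clears it for the duration of a scope, and restores it on exit. The following is a minimal sketch of that idea only; flag_sentinel, warn_flag and value_inside_guard are hypothetical names and this is not claimed to be the definition GCC uses.

```cpp
#include <cassert>

// Hypothetical global standing in for a warning option such as warn_address.
int warn_flag = 1;

// RAII guard: remember the flag's value, clear it, restore it on scope exit.
struct flag_sentinel
{
  int &flag;
  int saved;

  explicit flag_sentinel (int &f) : flag (f), saved (f) { flag = 0; }
  ~flag_sentinel () { flag = saved; }

  // Copying would restore the flag twice, so forbid it.
  flag_sentinel (const flag_sentinel &) = delete;
  flag_sentinel &operator= (const flag_sentinel &) = delete;
};

// Returns the flag's value as seen while the guard is active.
int value_inside_guard ()
{
  flag_sentinel s (warn_flag);
  return warn_flag;            // warnings are disabled here
}                              // s is destroyed, flag restored
```

Because the guard is scoped, the flag is restored on every exit path, including early returns.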


Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Kyrill Tkachov


On 17/12/15 17:27, Segher Boessenkool wrote:

On Thu, Dec 17, 2015 at 05:12:16PM +0100, Bernd Schmidt wrote:

On 12/17/2015 05:10 PM, Kyrill Tkachov wrote:

Well, this patch still produces the QImode comparison if the target has
a QImode comparison
(the have_insn_for check in the simplify_comparison hunk).

Ok, I didn't look that closely because I had doubts about the approach.
This kind of check also goes somewhat against the principles of just
producing canonical forms of RTL.

The canonicalisation rules exist so that optimisers only need to match
one form instead of several, and machine descriptions only need to
describe one form instead of several.  For this bitmasking case it
perversely forces you to describe the same instruction in many ways,
for many targets.  This is what the change_zero_ext was about as well.

It's not so easy to fix for the compare case.  Maybe the idea of making
genrecog make code that recognises more forms of the same insn will work
out.  GCC 7 in any case...


Perhaps I had underestimated how involved this issue is :)
So if I want to improve the aarch64 situation for GCC 6,
would the recommended course of action be to just define the
QI and HImode compare against zero patterns?

Note that I think the make_extraction hunk from my patch is in line
with the function comment of make_extraction that says:
"   IN_COMPARE is nonzero if we are in a COMPARE.  This means that a
ZERO_EXTRACT should be built even for bits starting at bit 0."

whereas the condition that I'm adding "&& !in_compare" is explicitly trying
to avoid an extraction.

But anyway, if this has the potential to cause negative fallout that I
had not anticipated, it can wait for later.

Thanks,
Kyrill



Segher




Re: [PATCH][AArch64][1/2] PR rtl-optimization/68796 Add compare-of-zero_extract pattern

2015-12-17 Thread Kyrill Tkachov

Hi James,

On 17/12/15 17:24, James Greenhalgh wrote:

On Thu, Dec 17, 2015 at 03:36:40PM +, Kyrill Tkachov wrote:

2015-12-17  Kyrylo Tkachov  

 PR rtl-optimization/68796
 * config/aarch64/aarch64.md (*and3nr_compare0_zextract):
 New pattern.
 * config/aarch64/aarch64.c (aarch64_select_cc_mode): Handle
 ZERO_EXTRACT comparison with zero.
 (aarch64_mask_from_zextract_ops): New function.
 * config/aarch64/aarch64-protos.h (aarch64_mask_from_zextract_ops):
 New prototype.

2015-12-17  Kyrylo Tkachov  

 PR rtl-optimization/68796
 * gcc.target/aarch64/tst_3.c: New test.
 * gcc.target/aarch64/tst_4.c: Likewise.

Two comments.


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 87d6eb1358845527d7068550925949802a7e48e2..febca98d38d5f09c97b0f79adc55bb29eca217b9 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -330,6 +330,7 @@ int aarch64_uxt_size (int, HOST_WIDE_INT);
  int aarch64_vec_fpconst_pow_of_2 (rtx);
  rtx aarch64_final_eh_return_addr (void);
  rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
+rtx aarch64_mask_from_zextract_ops (rtx, rtx);
  const char *aarch64_output_move_struct (rtx *operands);
  rtx aarch64_return_addr (int, rtx);
  rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cb8955d5d6c909e8179bb1ab8203eb165f55e4b6..58a9fc68f391162ed9847d7fb79d70d3ee9919f5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4147,7 +4147,9 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
&& y == const0_rtx
&& (code == EQ || code == NE || code == LT || code == GE)
&& (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND
- || GET_CODE (x) == NEG))
+ || GET_CODE (x) == NEG
+ || (GET_CODE (x) == ZERO_EXTRACT && CONST_INT_P (XEXP (x, 1))
+ && CONST_INT_P (XEXP (x, 2)))))
  return CC_NZmode;
  
/* A compare with a shifted operand.  Because of canonicalization,

@@ -10757,6 +10759,21 @@ aarch64_simd_imm_zero_p (rtx x, machine_mode mode)
return x == CONST0_RTX (mode);
  }
  
+

+/* Return the bitmask CONST_INT to select the bits required by a zero extract
+   operation of width WIDTH at bit position POS.  */
+
+rtx
+aarch64_mask_from_zextract_ops (rtx width, rtx pos)
+{

It is up to you, but would this not more naturally be:

   unsigned HOST_WIDE_INT
   aarch64_mask_from_zextract_ops (rtx width, rtx pos)

Given how it gets used elsewhere?


It gets used in exactly two places, once in the condition of the pattern
where we have to extract its UINTVAL and once when outputting the assembly
string where we want the rtx wrapper around it to assign it to operands[1],
so I'd argue it's a 50-50 choice.
So I'll leave it as it is unless you have a strong preference.
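
To make the arithmetic concrete, here is a standalone sketch of the mask computation using plain integers instead of rtx operands. The function name and the std::uint64_t types are assumptions for illustration; the real helper takes and returns rtx and asserts CONST_INT_P on its arguments.

```cpp
#include <cassert>
#include <cstdint>

// Bitmask selecting WIDTH bits starting at bit position POS.
// Assumes 0 < width < 64 and pos + width <= 64; shifting a 64-bit value
// by 64 or more would be undefined behaviour.
std::uint64_t mask_from_zextract_ops (unsigned width, unsigned pos)
{
  std::uint64_t mask = (std::uint64_t{1} << width) - 1;  // WIDTH low bits set
  return mask << pos;                                    // move them to POS
}
```

With width 8 and pos 0 this gives 0xff, i.e. the #255 immediate of the TST instruction mentioned elsewhere in the thread.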


+  gcc_assert (CONST_INT_P (width));
+  gcc_assert (CONST_INT_P (pos));
+
+  unsigned HOST_WIDE_INT mask
+= ((unsigned HOST_WIDE_INT)1 << UINTVAL (width)) - 1;

Space between (unsigned HOST_WIDE_INT) and 1.



Consider it done.
Thanks,
Kyrill


+  return GEN_INT (mask << UINTVAL (pos));
+}
+
  bool
  aarch64_simd_imm_scalar_p (rtx x, machine_mode mode ATTRIBUTE_UNUSED)
  {

Otherwise, this is OK.

Thanks,
James





Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Jeff Law

On 12/17/2015 10:04 AM, Kyrill Tkachov wrote:


In this case, I'm expecting a QImode compare with zero to map down to
the aarch64 TST reg, #255 instruction which
definitely zeroes out any bits outside of QImode (as it is a bitwise AND
with a bitmask),
so zero_extract is the more correct expression here, no?

It's more about the semantics of the code and how it interacts with RTL 
generation, optimization and analysis than it is with the final assembly 
generated by the backend that drives SUBREG vs zero_extract.


The backend assembly code generator is free to implement stricter 
semantics (such as defining all the bits for a paradoxical subreg), but 
the rest of the compiler can not depend on those stricter semantics.


The easiest way to think about the subreg case here is that it's used 
when we've got a narrow object that we want to view in a wider mode, but 
we don't actually care about the upper bits.  The widening is merely to 
make the mode match another operand.



zero_extract is still the canonical form.  subreg is a specialized form 
for cases where the upper bits are "don't care" values.  This should 
probably be documented as the current state of the world.
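
As a rough illustration in ordinary integer arithmetic (not RTL), the sketch below models a zero extract as "shift, then mask", so every bit above the extracted field is defined to be zero. For a comparison against zero, the narrow compare, the zero extract of the low 8 bits, and an AND with 255 all agree. The function names are hypothetical, and the model assumes width < 64.

```cpp
#include <cassert>
#include <cstdint>

// zero_extract model: WIDTH bits starting at POS; all upper result bits are 0.
// Assumes width < 64 and pos < 64.
std::uint64_t zero_extract (std::uint64_t x, unsigned width, unsigned pos)
{
  return (x >> pos) & ((std::uint64_t{1} << width) - 1);
}

// Compare of the low byte against zero, as a narrow-mode comparison would do.
bool low_byte_is_zero (std::uint64_t x)
{
  return static_cast<std::uint8_t> (x) == 0;
}

// The same test written as a bitwise AND with 255, like TST reg, #255.
bool tst_255_is_zero (std::uint64_t x)
{
  return (x & 0xff) == 0;
}
```

The three forms differ only in how the upper bits are described, which is exactly where the subreg form ("don't care") and the zero_extract form ("defined as zero") part ways.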


I think it's an open question whether or not to drop the subreg form and 
always use zero-extract.  I've certainly seen cases where the former is 
*supposed* to allow better code generation, but in fact actually gets in 
the way resulting in poorer code generation.


Jeff


Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Segher Boessenkool
On Thu, Dec 17, 2015 at 05:12:16PM +0100, Bernd Schmidt wrote:
> On 12/17/2015 05:10 PM, Kyrill Tkachov wrote:
> >Well, this patch still produces the QImode comparison if the target has
> >a QImode comparison
> >(the have_insn_for check in the simplify_comparison hunk).
> 
> Ok, I didn't look that closely because I had doubts about the approach. 
> This kind of check also goes somewhat against the principles of just 
> producing canonical forms of RTL.

The canonicalisation rules exist so that optimisers only need to match
one form instead of several, and machine descriptions only need to
describe one form instead of several.  For this bitmasking case it
perversely forces you to describe the same instruction in many ways,
for many targets.  This is what the change_zero_ext was about as well.

It's not so easy to fix for the compare case.  Maybe the idea of making
genrecog make code that recognises more forms of the same insn will work
out.  GCC 7 in any case...


Segher


Re: [PATCH][AArch64][1/2] PR rtl-optimization/68796 Add compare-of-zero_extract pattern

2015-12-17 Thread James Greenhalgh
On Thu, Dec 17, 2015 at 03:36:40PM +, Kyrill Tkachov wrote:
> 2015-12-17  Kyrylo Tkachov  
> 
> PR rtl-optimization/68796
> * config/aarch64/aarch64.md (*and3nr_compare0_zextract):
> New pattern.
> * config/aarch64/aarch64.c (aarch64_select_cc_mode): Handle
> ZERO_EXTRACT comparison with zero.
> (aarch64_mask_from_zextract_ops): New function.
> * config/aarch64/aarch64-protos.h (aarch64_mask_from_zextract_ops):
> New prototype.
> 
> 2015-12-17  Kyrylo Tkachov  
> 
> PR rtl-optimization/68796
> * gcc.target/aarch64/tst_3.c: New test.
> * gcc.target/aarch64/tst_4.c: Likewise.

Two comments.

> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 87d6eb1358845527d7068550925949802a7e48e2..febca98d38d5f09c97b0f79adc55bb29eca217b9 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -330,6 +330,7 @@ int aarch64_uxt_size (int, HOST_WIDE_INT);
>  int aarch64_vec_fpconst_pow_of_2 (rtx);
>  rtx aarch64_final_eh_return_addr (void);
>  rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
> +rtx aarch64_mask_from_zextract_ops (rtx, rtx);
>  const char *aarch64_output_move_struct (rtx *operands);
>  rtx aarch64_return_addr (int, rtx);
>  rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index cb8955d5d6c909e8179bb1ab8203eb165f55e4b6..58a9fc68f391162ed9847d7fb79d70d3ee9919f5 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4147,7 +4147,9 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
>&& y == const0_rtx
>&& (code == EQ || code == NE || code == LT || code == GE)
>&& (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND
> -   || GET_CODE (x) == NEG))
> +   || GET_CODE (x) == NEG
> +   || (GET_CODE (x) == ZERO_EXTRACT && CONST_INT_P (XEXP (x, 1))
> +   && CONST_INT_P (XEXP (x, 2)))))
>  return CC_NZmode;
>  
>/* A compare with a shifted operand.  Because of canonicalization,
> @@ -10757,6 +10759,21 @@ aarch64_simd_imm_zero_p (rtx x, machine_mode mode)
>return x == CONST0_RTX (mode);
>  }
>  
> +
> +/* Return the bitmask CONST_INT to select the bits required by a zero extract
> +   operation of width WIDTH at bit position POS.  */
> +
> +rtx
> +aarch64_mask_from_zextract_ops (rtx width, rtx pos)
> +{

It is up to you, but would this not more naturally be:

  unsigned HOST_WIDE_INT
  aarch64_mask_from_zextract_ops (rtx width, rtx pos)

Given how it gets used elsewhere?

> +  gcc_assert (CONST_INT_P (width));
> +  gcc_assert (CONST_INT_P (pos));
> +
> +  unsigned HOST_WIDE_INT mask
> += ((unsigned HOST_WIDE_INT)1 << UINTVAL (width)) - 1;

Space between (unsigned HOST_WIDE_INT) and 1.

> +  return GEN_INT (mask << UINTVAL (pos));
> +}
> +
>  bool
>  aarch64_simd_imm_scalar_p (rtx x, machine_mode mode ATTRIBUTE_UNUSED)
>  {

Otherwise, this is OK.

Thanks,
James



Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Kyrill Tkachov

Hi Jeff,

On 17/12/15 16:59, Jeff Law wrote:

On 12/17/2015 09:26 AM, Kyrill Tkachov wrote:

One could argue that if the target has (or advertises having) a native
QImode register comparison then it's objectively a simplification to
transform a comparison in a wider mode
to a comparison in the shorter mode.

Generally true.

The most commonly cited exception is any port that defines WORD_REGISTER_OPERATIONS.  However, I would be comfortable with the idea that defining QImode comparisons on a target with WORD_REGISTER_OPERATIONS is a pretty explicit indication 
that it wants to try and shorten comparisons for one reason or another.





I was investigating WORD_REGISTER_OPERATIONS as part of this. But we can't 
define it for aarch64.
In any case, aarch64 doesn't have QImode registers so I thought we'd try to 
avoid creating them.






If, however, the target doesn't have such an instruction (like aarch64
doesn't have QImode registers) then
truncating the wider mode to QImode through a subreg is not less complex
than a zero_extract, as both will
involve some form of extracting/masking the desired QImode bits. So
picking a canonical form there makes sense,
and the documentation already specifies the zero_extract form as the
canonical.

Would be nice to get a definite clarification on whether the subreg form
is indeed the canonical one.

The subreg style "extension" isn't really an extension.  It is a way to say that we want to look at the object in a wider mode, but we don't actually care about the upper bits.  It's generally expected that the subreg won't result in the 
generation of any code.


A zero extract defines all the bits.


In this case, I'm expecting a QImode compare with zero to map down to the 
aarch64 TST reg, #255 instruction which
definitely zeroes out any bits outside of QImode (as it is a bitwise AND with a 
bitmask),
so zero_extract is the more correct expression here, no?




In theory the optimizers can use a SUBREG just like they could a REG, which 
should enable additional optimization.  In practice I don't think that's been 
as true as we'd like.

jeff





Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Jeff Law

On 12/17/2015 09:26 AM, Kyrill Tkachov wrote:

One could argue that if the target has (or advertises having) a native
QImode register comparison then it's objectively a simplification to
transform a comparison in a wider mode
to a comparison in the shorter mode.

Generally true.

The most commonly cited exception is any port that defines 
WORD_REGISTER_OPERATIONS.  However, I would be comfortable with the idea 
that defining QImode comparisons on a target with 
WORD_REGISTER_OPERATIONS is a pretty explicit indication that it wants 
to try and shorten comparisons for one reason or another.







If, however, the target doesn't have such an instruction (like aarch64
doesn't have QImode registers) then
truncating the wider mode to QImode through a subreg is not less complex
than a zero_extract, as both will
involve some form of extracting/masking the desired QImode bits. So
picking a canonical form there makes sense,
and the documentation already specifies the zero_extract form as the
canonical.

Would be nice to get a definite clarification on whether the subreg form
is indeed the canonical one.

The subreg style "extension" isn't really an extension.  It is a way to 
say that we want to look at the object in a wider mode, but we don't 
actually care about the upper bits.  It's generally expected that the 
subreg won't result in the generation of any code.


A zero extract defines all the bits.

In theory the optimizers can use a SUBREG just like they could a REG, 
which should enable additional optimization.  In practice I don't think 
that's been as true as we'd like.


jeff



Re: [Patch, avr] Provide correct memory move costs

2015-12-17 Thread Denis Chertykov
2015-12-16 10:08 GMT+03:00 Senthil Kumar Selvaraj
:
> Hi,
>
>   When analyzing code size regressions for AVR for top-of-trunk, I
>   found a few cases where aggressive inlining (by the middle-end)
>   of functions containing calls to memcpy was bloating up the code.
>
>   Turns out that the AVR backend has MOVE_MAX set to 4 (unchanged from the
>   original commit), when it really should be 1, as the AVRs can only
>   move a single byte between reg and memory in a single instruction.
>   Setting it to 4 causes the middle end to underestimate the
>   cost of memcopys with a compile time constant length parameter, as it
>   thinks a 4 byte copy's cost is only a single instruction.
>
>   Just setting MOVE_MAX to 1 makes the middle end too conservative
>   though, and causes a bunch of regression tests to fail, as lots of
>   optimizations fail to pass the code size increase threshold check,
> even when not optimizing for size.
>
>   Instead, the below patch sets MOVE_MAX_PIECES to 2, and implements a
>   target hook that tells the middle-end to use load/store insns for
>   memory moves up to two bytes. Also, the patch sets MOVE_RATIO to 3 when
>   optimizing for speed, so that moves up to 4 bytes will occur through
>   load/store sequences, like it does now.
>
>   With this, only a couple of regression tests fail. uninit-19.c fails
>   because it thinks only non-pic code won't inline a function, but the
>   cost computation prevents inlining for AVRs. The test passes if
>   the optimization level is increased to -O3.
>
> strlenopt-8.c has an XPASS and a FAIL because a previous pass issued
> a builtin_memcpy instead of a MEM assignment. Execution still passes.
>
>   I'll continue running more tests to see if there are other performance
>   related consequences.
>
>   Is this ok? If ok, could someone commit please? I don't have commit
>   access.
>
> Regards
> Senthil
>
> gcc/ChangeLog
>
> 2015-12-16  Senthil Kumar Selvaraj  
>
> * config/avr/avr.h (MOVE_MAX): Set value to 1.
> (MOVE_MAX_PIECES): Define.
> (MOVE_RATIO): Define.
> * config/avr/avr.c (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P):
> Provide target hook.
> (avr_use_by_pieces_infrastructure_p): New function.
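
The cost reasoning in the quoted message can be sketched with a simplified model: count how many individual moves a fixed-size copy needs when the widest single move is a given number of bytes, and expand the copy inline only if that count is below the move ratio. This is a deliberately reduced model for illustration, not the middle-end's actual heuristic or the new AVR hook; by_pieces_moves and expand_inline are hypothetical names.

```cpp
#include <cassert>

// Number of moves needed to copy SIZE bytes when the widest single move is
// MAX_PIECE bytes (assumed to be a power of two), using widest pieces first.
unsigned by_pieces_moves (unsigned size, unsigned max_piece)
{
  unsigned moves = 0;
  for (unsigned piece = max_piece; piece >= 1; piece /= 2)
    {
      moves += size / piece;   // as many pieces of this width as fit
      size %= piece;           // remainder goes to narrower pieces
    }
  return moves;
}

// Simplified decision: expand inline when the move count is below MOVE_RATIO.
bool expand_inline (unsigned size, unsigned max_piece, unsigned move_ratio)
{
  return by_pieces_moves (size, max_piece) < move_ratio;
}
```

Under this model a 4-byte copy costs one move with a maximum piece of 4, but four moves with a maximum piece of 1. With a maximum piece of 2 and a ratio of 3, copies of up to 4 bytes are still expanded inline while a 5-byte copy is not, which matches the behaviour the message describes.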

Committed.

Denis.


Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Jeff Law

On 12/17/2015 08:58 AM, Bernd Schmidt wrote:


I suspect that this is an oversight in the documentation, and if given
two choices the simpler form is intended to be the canonical one.

The other BZ I was looking at in this space was 15596.  It's PPC, but 
shows a generic weakness in how we identify extractions and insertions. 
 Fixing it would probably help all the ports that have relatively 
strong methods to set/clear a series of bits in the middle of a word.


It feels like combine has all the information necessary to improve 
things, but the overall combiner flow and APIs are extremely uncooperative.


jeff



C++ PATCH for c++/67550 (wrong value for reference to const class var)

2015-12-17 Thread Jason Merrill
In my rework of decl_constant_value and kin, I enabled its use in more 
places, which revealed a problem: when it is allowing non-constexpr 
aggregate initializers, we need to double-check that we aren't returning 
something that had initializers stripped out in split_nonconstant_init.


Tested x86_64-pc-linux-gnu, applying to trunk and 5.
commit 230c77258443c76da9837a078f0326fee3311f02
Author: Jason Merrill 
Date:   Thu Dec 17 09:56:20 2015 -0500

	PR c++/67550
	* init.c (constant_value_1): Don't return a CONSTRUCTOR missing
	non-constant elements.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index a08f7d7..b7f10a1 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2093,6 +2093,11 @@ constant_value_1 (tree decl, bool strict_p, bool return_aggregate_cst_ok_p)
 	  && (TREE_CODE (init) == CONSTRUCTOR
 		  || TREE_CODE (init) == STRING_CST)))
 	break;
+  /* Don't return a CONSTRUCTOR for a variable with partial run-time
+	 initialization, since it doesn't represent the entire value.  */
+  if (TREE_CODE (init) == CONSTRUCTOR
+	  && !DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl))
+	break;
   decl = unshare_expr (init);
 }
   return decl;
diff --git a/gcc/testsuite/g++.dg/init/aggr13.C b/gcc/testsuite/g++.dg/init/aggr13.C
new file mode 100644
index 000..08248a6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/aggr13.C
@@ -0,0 +1,17 @@
+// PR c++/67550
+// { dg-do run }
+
+struct S {
+  int x;
+  int y;
+};
+int foo() { return 1; }
+
+int main() {
+  S const data[] = {{0, foo()}};
+
+  S data2[] = {data[0]};
+
+  if (!data2[0].y)
+__builtin_abort();
+}


PATCH: PR target/66232: -fPIC -fno-plt -mx32 fails to generate indirect branch via GOT

2015-12-17 Thread H.J. Lu
Since Pmode is 64-bit with -maddress-mode=long for x32, indirect call
via GOT slot doesn't need zero_extend.  This patch limits *call_got_x32
and *call_value_got_x32 patterns to 32-bit Pmode, adds *call_got_x32_long
and *call_value_got_x32_long for 64-bit Pmode.

OK for trunk if there is no regression?


H.J.
---
gcc/

PR target/66232
* config/i386/i386.md (*call_got_x32): Limited to 32-bit Pmode.
(*call_value_got_x32): Likewise.
(*call_got_x32_long): New pattern.
(*call_value_got_x32_long): Likewise.

gcc/testsuite/

PR target/66232
* gcc.target/i386/pr66232-10.c: New test.
* gcc.target/i386/pr66232-11.c: Likewise.
* gcc.target/i386/pr66232-12.c: Likewise.
* gcc.target/i386/pr66232-13.c: Likewise.
---
 gcc/config/i386/i386.md| 19 +--
 gcc/testsuite/gcc.target/i386/pr66232-10.c | 13 +
 gcc/testsuite/gcc.target/i386/pr66232-11.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr66232-12.c | 13 +
 gcc/testsuite/gcc.target/i386/pr66232-13.c | 13 +
 5 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-13.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 49b2216..dc61050 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11861,7 +11861,14 @@
   [(call (mem:QI (zero_extend:DI
   (match_operand:SI 0 "GOT_memory_operand" "Bg")))
 (match_operand 1))]
-  "TARGET_X32"
+  "TARGET_X32 && Pmode == SImode"
+  "* return ix86_output_call_insn (insn, operands[0]);"
+  [(set_attr "type" "call")])
+
+(define_insn "*call_got_x32_long"
+  [(call (mem:QI (match_operand:DI 0 "GOT_memory_operand" "Bg"))
+(match_operand 1))]
+  "TARGET_X32 && Pmode == DImode"
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
@@ -12038,7 +12045,15 @@
(zero_extend:DI
  (match_operand:SI 1 "GOT_memory_operand" "Bg")))
  (match_operand 2)))]
-  "TARGET_X32"
+  "TARGET_X32 && Pmode == SImode"
+  "* return ix86_output_call_insn (insn, operands[1]);"
+  [(set_attr "type" "callv")])
+
+(define_insn "*call_value_got_x32_long"
+  [(set (match_operand 0)
+   (call (mem:QI (match_operand:DI 1 "GOT_memory_operand" "Bg"))
+ (match_operand 2)))]
+  "TARGET_X32 && Pmode == DImode"
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
diff --git a/gcc/testsuite/gcc.target/i386/pr66232-10.c b/gcc/testsuite/gcc.target/i386/pr66232-10.c
new file mode 100644
index 000..c4e9157
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66232-10.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-require-effective-target maybe_x32 } */
+/* { dg-options "-O2 -mx32 -fpic -fno-plt -maddress-mode=long" } */
+
+extern void bar (void);
+
+void
+foo (void)
+{
+  bar ();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*.bar@GOTPCREL" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66232-11.c b/gcc/testsuite/gcc.target/i386/pr66232-11.c
new file mode 100644
index 000..05794af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66232-11.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-require-effective-target maybe_x32 } */
+/* { dg-options "-O2 -mx32 -fpic -fno-plt -maddress-mode=long" } */
+
+extern void bar (void);
+
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]*.bar@GOTPCREL" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66232-12.c b/gcc/testsuite/gcc.target/i386/pr66232-12.c
new file mode 100644
index 000..313b9e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66232-12.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-require-effective-target maybe_x32 } */
+/* { dg-options "-O2 -mx32 -fpic -fno-plt -maddress-mode=long" } */
+
+extern int bar (void);
+
+int
+foo (void)
+{
+  return bar ();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*.bar@GOTPCREL" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66232-13.c b/gcc/testsuite/gcc.target/i386/pr66232-13.c
new file mode 100644
index 000..50a12cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66232-13.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-require-effective-target maybe_x32 } */
+/* { dg-options "-O2 -mx32 -fpic -fno-plt -maddress-mode=long" } */
+
+extern int bar (void);
+
+int
+foo (void)
+{
+  return bar () + 1;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]*.bar@GOTPCREL" } } */
-- 
2.5.0



[COMMITTED] Add myself to MAINTAINERS (Write After Approval)

2015-12-17 Thread Andris Pavenis

Just committed as revision 231774.

Andris

Index: MAINTAINERS
===
--- MAINTAINERS (revision 231774)
+++ MAINTAINERS (working copy)
@@ -525,6 +525,7 @@
  Patrick Palka
  Seongbae Park
  Devang Patel
+Andris Pavenis
  Fernando Pereira
  Kaushik Phatak
  Nicolas Pitre
Index: ChangeLog
===
--- ChangeLog   (revision 231774)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2015-12-17  Andris Pavenis
+
+   * MAINTAINERS (Write After Approval): Add Myself.
+
  2015-12-17  Nathan Sidwell

 * config/isl.m4 (ISL_CHECK_VERSION): Add gmp libs.



Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Prefer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Jeff Law

On 12/17/2015 08:58 AM, Bernd Schmidt wrote:

On 12/17/2015 04:36 PM, Kyrill Tkachov wrote:

The documentation on RTL canonical forms in md.texi says:

"Equality comparisons of a group of bits (usually a single bit) with zero
  will be written using @code{zero_extract} rather than the equivalent
  @code{and} or @code{sign_extract} operations. "

However, this is not always followed in combine. If it's trying to
optimise
a comparison against zero of a bitmask that is the mode mask of some mode
(255 for QImode and 65535 for HImode in the testcases of this patch)
it will instead create a subreg to that shorter mode.


I suspect that this is an oversight in the documentation, and if given
two choices the simpler form is intended to be the canonical one.
It's also the case that sometimes a SUBREG is preferred because it 
conveys that certain bits are "don't care".  In theory this may allow 
things to optimize better.


However, in practice, I'm not sure that's regularly the case because 
various passes are weak in trying to exploit the semantics of the SUBREG 
and passes are generally pretty strong in their handling of zero_extract 
and friends.


IIRC I actually bumped against this in the gcc-5 cycle when fixing some 
suboptimal code generation issues.  I think it was BZ15184.  I'd check 
the archives for Dec 2014 and Jan 2015.  There may be a mention of this 
issue in there from me (I can recall bumping into it, but can't recall 
if I ever mentioned it publicly or if I ever submitted the change to 
prefer the zero_extract form over the subreg form).


Jeff


C++ PATCH for c++/67576 (multiple evaluation of typeid operand)

2015-12-17 Thread Jason Merrill
When I changed build_typeid to take the address of a polymorphic operand 
rather than using the lvalue directly, I forgot the parallel change from 
stabilize_reference to save_expr.
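
The failure mode is the familiar one of an operand being evaluated twice. As a rough analogy in plain C (this is not the front-end code; the macro and helper names are made up for illustration), a null-checked dereference that mentions its operand twice runs the operand's side effects twice, while one that captures the operand first runs them once:

```c
#include <assert.h>

/* Hypothetical macro: P appears twice, so its side effects run twice.  */
#define DEREF_CHECKED_TWICE(p) ((p) != 0 ? *(p) : 0)

/* Hypothetical helper: the operand is evaluated once, at the call site,
   much like wrapping it in a temporary before reusing it.  */
static int deref_checked_once(const int *p)
{
  return p != 0 ? *p : 0;
}

/* Returns how far the index advanced when the macro form is used.  */
static int advance_with_macro(void)
{
  int vals[] = { 10, 20, 30 };
  const int *ary[] = { &vals[0], &vals[1], &vals[2] };
  int iter = 0;
  (void) DEREF_CHECKED_TWICE(ary[iter++]);
  return iter;
}

/* Returns how far the index advanced when the function form is used.  */
static int advance_with_function(void)
{
  int vals[] = { 10, 20, 30 };
  const int *ary[] = { &vals[0], &vals[1], &vals[2] };
  int iter = 0;
  (void) deref_checked_once(ary[iter++]);
  return iter;
}

/* Hypothetical self-check mirroring the "should be 1, but 2" testcase.  */
static void check_single_evaluation(void)
{
  assert(advance_with_macro() == 2);
  assert(advance_with_function() == 1);
}
```

The macro form corresponds to the bug (the index ends up at 2), the function form to the intended behaviour (the index ends up at 1).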


Tested x86_64-pc-linux-gnu, applying to trunk and 5.
commit 5361caf55040d2a15b5ebb5ff0fc1e3e605dba9c
Author: Jason Merrill 
Date:   Thu Dec 17 00:10:20 2015 -0500

	PR c++/67576
	PR c++/25466
	* rtti.c (build_typeid): Use save_expr, not stabilize_reference.

diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
index b397b55..f42b1cb 100644
--- a/gcc/cp/rtti.c
+++ b/gcc/cp/rtti.c
@@ -332,7 +332,7 @@ build_typeid (tree exp, tsubst_flags_t complain)
   /* So we need to look into the vtable of the type of exp.
  Make sure it isn't a null lvalue.  */
   exp = cp_build_addr_expr (exp, complain);
-  exp = stabilize_reference (exp);
+  exp = save_expr (exp);
   cond = cp_convert (boolean_type_node, exp, complain);
   exp = cp_build_indirect_ref (exp, RO_NULL, complain);
 }
diff --git a/gcc/testsuite/g++.dg/rtti/typeid11.C b/gcc/testsuite/g++.dg/rtti/typeid11.C
new file mode 100644
index 000..384b0f4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/rtti/typeid11.C
@@ -0,0 +1,16 @@
+// { dg-do run }
+
+#include <typeinfo>
+
+struct Base { virtual void foo() {} }; // polymorphic
+
+int main()
+{
+  Base b;
+  Base *ary[] = { &b, &b, &b};
+
+  int iter = 0;
+  typeid(*ary[iter++]);
+  if (iter != 1)	// should be 1
+    __builtin_abort();	// but 2
+}


Re: [PATCH] Fix PR68852

2015-12-17 Thread Kyrill Tkachov


On 14/12/15 15:14, Richard Biener wrote:

The following fixes PR68852 - so I finally needed to sit down and
fix the "build-from-scalars" hack in the SLP vectorizer by pretending
we'd have a sane vectorizer IL.  Basically I now mark the SLP node
with a proper vect_def_type but I have to push that down to the
stmt-info level whenever sth would look at it.

It's a bit ugly but not too much yet ;)

Anyway, the proper fix is to have a sane data structure, nothing for
GCC 6 though.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Verified SPEC CPU 2006 is happy with the patch.


Unfortunately it's not very happy on aarch64 ;)
416.gamess, and trans.fppized.f in particular, ICEs after this patch with:

trans.fppized.f:2086:0:

   SUBROUTINE TRFMCX(NPRINT,ICORBS,IORBS,IORB,DOFOCK,DOEXCH,


internal compiler error: in vect_analyze_stmt, at tree-vect-stmts.c:8013
0xd34d1b vect_analyze_stmt(gimple*, bool*, _slp_tree*)
$SRC/tree-vect-stmts.c:8013
0xd4b64a vect_slp_analyze_node_operations
$SRC/tree-vect-slp.c:2237
0xd4b533 vect_slp_analyze_node_operations
$SRC/tree-vect-slp.c:2221
0xd4b533 vect_slp_analyze_node_operations
$SRC/tree-vect-slp.c:2221
0xd4b533 vect_slp_analyze_node_operations
$SRC/tree-vect-slp.c:2221
0xd4b533 vect_slp_analyze_node_operations
$SRC/tree-vect-slp.c:2221
0xd4f7dc vect_slp_analyze_operations(vec<_slp_instance*, va_heap, vl_ptr>, 
void*)
$SRC/tree-vect-slp.c:2269
0xd546a0 vect_slp_analyze_bb_1
$SRC/tree-vect-slp.c:2543
0xd546a0 vect_slp_bb(basic_block_def*)
$SRC/tree-vect-slp.c:2630
0xd56985 execute
$SRC/tree-vectorizer.c:759
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

when using the flags
-mcpu=cortex-a53+crypto -save-temps -Ofast -fomit-frame-pointer 
-fno-aggressive-loop-optimizations

I'll open a bug report to keep track of it.

Thanks,
Kyrill


Richard.

2015-12-14  Richard Biener  

PR tree-optimization/68852
* tree-vectorizer.h (struct _slp_tree): Add def_type member.
(SLP_TREE_DEF_TYPE): New accessor.
* tree-vect-stmts.c (vect_is_simple_use): Remove BB vectorization
hack.
* tree-vect-slp.c (vect_create_new_slp_node): Initialize
SLP_TREE_DEF_TYPE.
(vect_build_slp_tree): When a node is to be built up from scalars
do not push a NULL as child but instead set its def_type to
vect_external_def.
(vect_analyze_slp_cost_1): Check for child def-type instead
of NULL.
(vect_detect_hybrid_slp_stmts): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_get_slp_defs): Likewise.
(vect_slp_analyze_node_operations): Likewise.  Before
processing node push the children def-types to the underlying
stmts vinfo and restore it afterwards.
(vect_schedule_slp_instance): Likewise.
(vect_slp_analyze_bb_1): Do not mark stmts not in SLP instances
as not vectorizable.

* g++.dg/torture/pr68852.C: New testcase.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h   (revision 231552)
--- gcc/tree-vectorizer.h   (working copy)
*** struct _slp_tree {
*** 107,112 
--- 107,114 
 unsigned int vec_stmts_size;
 /* Whether the scalar computations use two different operators.  */
 bool two_operators;
+   /* The DEF type of this node.  */
+   enum vect_def_type def_type;
   };
   
   
*** typedef struct _slp_instance {

*** 139,144 
--- 141,147 
   #define SLP_TREE_NUMBER_OF_VEC_STMTS(S)  (S)->vec_stmts_size
   #define SLP_TREE_LOAD_PERMUTATION(S) (S)->load_permutation
   #define SLP_TREE_TWO_OPERATORS(S) (S)->two_operators
+ #define SLP_TREE_DEF_TYPE(S)   (S)->def_type
   
   
   
Index: gcc/tree-vect-stmts.c

===
*** gcc/tree-vect-stmts.c   (revision 231552)
--- gcc/tree-vect-stmts.c   (working copy)
*** vect_is_simple_use (tree operand, vec_in
*** 8649,8658 
 else
   {
 stmt_vec_info stmt_vinfo = vinfo_for_stmt (*def_stmt);
!   if (is_a <bb_vec_info> (vinfo) && !STMT_VINFO_VECTORIZABLE (stmt_vinfo))
!   *dt = vect_external_def;
!   else
!   *dt = STMT_VINFO_DEF_TYPE (stmt_vinfo);
   }
   
 if (dump_enabled_p ())

--- 8652,8658 
 else
   {
 stmt_vec_info stmt_vinfo = vinfo_for_stmt (*def_stmt);
!   *dt = STMT_VINFO_DEF_TYPE (stmt_vinfo);
   }
   
 if (dump_enabled_p ())

Index: gcc/testsuite/g++.dg/torture/pr68852.C
===
--- gcc/testsuite/g++.dg/torture/pr68852.C  (revision 0)
+++ gcc/testsuite/g++.dg/torture/pr68852.C  (wo

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Prefer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Kyrill Tkachov


On 17/12/15 16:12, Bernd Schmidt wrote:

On 12/17/2015 05:10 PM, Kyrill Tkachov wrote:

Well, this patch still produces the QImode comparison if the target has
a QImode comparison
(the have_insn_for check in the simplify_comparison hunk).


Ok, I didn't look that closely because I had doubts about the approach. This 
kind of check also goes somewhat against the principles of just producing 
canonical forms of RTL.



One could argue that if the target has (or advertises having) a native
QImode register comparison then it's objectively a simplification to
transform a comparison in a wider mode to a comparison in the shorter mode.

If, however, the target doesn't have such an instruction (like aarch64,
which doesn't have QImode registers) then truncating the wider mode to
QImode through a subreg is not less complex than a zero_extract, as both
will involve some form of extracting/masking the desired QImode bits. So
picking a canonical form there makes sense, and the documentation already
specifies the zero_extract form as the canonical one.

It would be nice to get a definite clarification on whether the subreg form
is indeed the canonical one. Then we can document it and I can just add a
QI/HImode compare pattern to aarch64.

Thanks,
Kyrill



Bernd





Re: [Fortran, Patch] (RFC, Coarray) Implement TS18508's EVENTS

2015-12-17 Thread Alessandro Fanfarillo
Great! Thanks.

2015-12-17 15:57 GMT+01:00 Steve Kargl :
> On Thu, Dec 17, 2015 at 01:22:06PM +0100, Alessandro Fanfarillo wrote:
>>
>> I've noticed that this patch has been applied only on trunk and not on
>> the gcc-5-branch. Is it a problem to include EVENTS in gcc-5?
>>
>
> No problem.  When I applied the EVENTS patch to trunk,
> the 5.3 release was being prepared.  I was going to
> wait for a week or two after 5.3 came out, then apply
> the patch.  Now that you have commit access, feel
> free to back port the patch.  Remember to post the
> patch that you commit to both the fortran and gcc-patches
> list.
>
> --
> Steve


Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

2015-12-17 Thread H.J. Lu
On Thu, Dec 17, 2015 at 7:50 AM, H.J. Lu  wrote:
> On Thu, Dec 17, 2015 at 5:42 AM, Uros Bizjak  wrote:
>> On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu  wrote:
>>> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
>>>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>>>>> Since sibcall never returns, we can only use call-clobbered register
>>>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>>>> be properly restored.
>>>>>
>>>>> Tested on x86-64 with -m32.  OK for trunk?
>>>>
>>>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>>>> class, and register_no_elim_operand predicate should be used with "U"
>>>> constraint. Also, please introduce new predicate, similar to how
>>>> GOT_memory_operand is defined and handled.
>>>>
>>>
>>> Here is the updated patch.  There is a predicate already,
>>> sibcall_memory_operand.  It allows any registers to
>>> be as GOT base, which is the root of our problem.
>>> This patch removes GOT slot from it and handles
>>> sibcall over GOT slot with *sibcall_GOT_32 and
>>> *sibcall_value_GOT_32 patterns.  Since I need to
>>> expose constraints on GOT base register to RA,
>>> I have to use 2 operands, GOT base and function
>>> symbol, to describe sibcall over 32-bit GOT slot.
>>
>> Please use
>>
>>(mem:SI (plus:SI
>>  (match_operand:SI 0 "register_no_elim_operand" "U")
>>  (match_operand:SI 1 "GOT32_symbol_operand")))
>> ...
>>
>> to avoid manual rebuild of the operand.
>>
>
> Is this OK?
>

An updated patch to allow sibcall_memory_operand for RTL
expansion.  OK for trunk if there is no regression?

Thanks.


-- 
H.J.
From dffd3a70b9788174f9b279ff27bf72dbc2384659 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 16 Dec 2015 12:34:57 -0800
Subject: [PATCH] Use call-clobbered register for sibcall via GOT

Since sibcall never returns, we can only use call-clobbered register as
GOT base.  Otherwise, callee-saved register used as GOT base won't be
properly restored.  sibcall_memory_operand is changed to allow 32-bit
GOT slot only with pseudo register as GOT base for RTL expansion.  2
new patterns, *sibcall_GOT_32 and *sibcall_value_GOT_32, are added to
expose GOT base register to register allocator so that call-clobbered
register will be used for GOT base.

gcc/

	PR target/68937
	* config/i386/i386.c (ix86_function_ok_for_sibcall): Count
	call via GOT slot as indirect call.
	* config/i386/i386.md (*sibcall_GOT_32): New pattern.
	(*sibcall_value_GOT_32): Likewise.
	* config/i386/predicates.md (sibcall_memory_operand): Allow
	32-bit GOT slot only with pseudo register as GOT base.
	(GOT32_symbol_operand): New predicate.

gcc/testsuite/

	PR target/68937
	* gcc.target/i386/pr68937-1.c: New test.
	* gcc.target/i386/pr68937-2.c: Likewise.
	* gcc.target/i386/pr68937-3.c: Likewise.
	* gcc.target/i386/pr68937-4.c: Likewise.
	* gcc.target/i386/pr68937-5.c: Likewise.
---
 gcc/config/i386/i386.c|  4 +++-
 gcc/config/i386/i386.md   | 33 +++
 gcc/config/i386/predicates.md | 12 +++
 gcc/testsuite/gcc.target/i386/pr68937-1.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-2.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-3.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-4.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-5.c |  9 +
 8 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-5.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cecea24..0e2bec3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6723,8 +6723,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   /* If this call is indirect, we'll need to be able to use a
 	 call-clobbered register for the address of the target function.
 	 Make sure that all such registers are not used for passing
-	 parameters.  Note that DLLIMPORT functions are indirect.  */
+	 parameters.  Note that DLLIMPORT functions and call via GOT
+	 slot are indirect.  */
   if (!decl
+	  || (flag_pic && !flag_plt)
 	  || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
 	{
 	  /* Check if regparm >= 3 since arg_reg_available is set to
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 49b2216..6ab8eaa 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11865,6 +11865,22 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
+;; Since sibcall never returns, we can only use call-clobbered register
+;; as GOT base.
+(define_insn "*sibcall_GOT_32"
+  [(call (mem:QI
+	   (mem:SI (plus:SI
+		 (match_operand:SI 0 "register_

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Prefer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Bernd Schmidt

On 12/17/2015 05:10 PM, Kyrill Tkachov wrote:

Well, this patch still produces the QImode comparison if the target has
a QImode comparison
(the have_insn_for check in the simplify_comparison hunk).


Ok, I didn't look that closely because I had doubts about the approach. 
This kind of check also goes somewhat against the principles of just 
producing canonical forms of RTL.



Bernd


Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Prefer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Kyrill Tkachov


On 17/12/15 15:58, Bernd Schmidt wrote:

On 12/17/2015 04:36 PM, Kyrill Tkachov wrote:

The documentation on RTL canonical forms in md.texi says:

"Equality comparisons of a group of bits (usually a single bit) with zero
  will be written using @code{zero_extract} rather than the equivalent
  @code{and} or @code{sign_extract} operations. "

However, this is not always followed in combine. If it's trying to optimise
a comparison against zero of a bitmask that is the mode mask of some mode
(255 for QImode and 65535 for HImode in the testcases of this patch)
it will instead create a subreg to that shorter mode.


I suspect that this is an oversight in the documentation, and if given two 
choices the simpler form is intended to be the canonical one.


it ends up trying to make a QImode comparison against zero, for which
targets like
aarch64 have no pattern.


So, can you define a pattern for it...


To get the benefit on aarch64 this needs patch 1/2 that adds an aarch64
pattern
for comparing a zero_extract with zero.


... instead of this one?



Yes, I had investigated that approach and it has the same effect (on aarch64).
My motivation for this approach was to try avoiding defining multiple
patterns for what should be equivalent expressions. But if the short subreg
form is intended to be the canonical form...


What do people think of this approach?
I hope this just enforces the already documented canonicalisation rules
with minimal(none?) negative
fallout.


I'm not so sure about this. Other ports have QImode comparisons and I would 
want to see some evidence that there are no code quality regressions. This is 
not stage 3 material in any case.



Well, this patch still produces the QImode comparison if the target has a
QImode comparison (the have_insn_for check in the simplify_comparison hunk).
As I said, the effects on arm and aarch64 were strictly beneficial.
On x86_64 I saw no codegen difference on SPEC2006.
If this is considered too risky at this stage I can propose a QImode pattern
for aarch64 instead to isolate this fix to that backend.

Thanks,
Kyrill



Bernd




Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Prefer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Bernd Schmidt

On 12/17/2015 04:36 PM, Kyrill Tkachov wrote:

The documentation on RTL canonical forms in md.texi says:

"Equality comparisons of a group of bits (usually a single bit) with zero
  will be written using @code{zero_extract} rather than the equivalent
  @code{and} or @code{sign_extract} operations. "

However, this is not always followed in combine. If it's trying to optimise
a comparison against zero of a bitmask that is the mode mask of some mode
(255 for QImode and 65535 for HImode in the testcases of this patch)
it will instead create a subreg to that shorter mode.


I suspect that this is an oversight in the documentation, and if given 
two choices the simpler form is intended to be the canonical one.



it ends up trying to make a QImode comparison against zero, for which
targets like
aarch64 have no pattern.


So, can you define a pattern for it...


To get the benefit on aarch64 this needs patch 1/2 that adds an aarch64
pattern
for comparing a zero_extract with zero.


... instead of this one?


What do people think of this approach?
I hope this just enforces the already documented canonicalisation rules
with minimal(none?) negative
fallout.


I'm not so sure about this. Other ports have QImode comparisons and I 
would want to see some evidence that there are no code quality 
regressions. This is not stage 3 material in any case.



Bernd


Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

2015-12-17 Thread H.J. Lu
On Thu, Dec 17, 2015 at 5:42 AM, Uros Bizjak  wrote:
> On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu  wrote:
>> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
>>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>>>> Since sibcall never returns, we can only use call-clobbered register
>>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>>> be properly restored.
>>>>
>>>> Tested on x86-64 with -m32.  OK for trunk?
>>>
>>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>>> class, and register_no_elim_operand predicate should be used with "U"
>>> constraint. Also, please introduce new predicate, similar to how
>>> GOT_memory_operand is defined and handled.
>>>
>>
>> Here is the updated patch.  There is a predicate already,
>> sibcall_memory_operand.  It allows any registers to
>> be as GOT base, which is the root of our problem.
>> This patch removes GOT slot from it and handles
>> sibcall over GOT slot with *sibcall_GOT_32 and
>> *sibcall_value_GOT_32 patterns.  Since I need to
>> expose constraints on GOT base register to RA,
>> I have to use 2 operands, GOT base and function
>> symbol, to describe sibcall over 32-bit GOT slot.
>
> Please use
>
>(mem:SI (plus:SI
>  (match_operand:SI 0 "register_no_elim_operand" "U")
>  (match_operand:SI 1 "GOT32_symbol_operand")))
> ...
>
> to avoid manual rebuild of the operand.
>

Is this OK?

Thanks.

-- 
H.J.
From 9a5818415f9de92454ee555e8d8c3bd675fe30dd Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 16 Dec 2015 12:34:57 -0800
Subject: [PATCH] Use call-clobbered register for sibcall via GOT

Since sibcall never returns, we can only use call-clobbered register
as GOT base.  Otherwise, callee-saved register used as GOT base won't
be properly restored.

gcc/

	PR target/68937
	* config/i386/i386.c (ix86_function_ok_for_sibcall): Count
	call via GOT slot as indirect call.
	* config/i386/i386.md (*sibcall_GOT_32): New pattern.
	(*sibcall_value_GOT_32): Likewise.
	* config/i386/predicates.md (sibcall_memory_operand): Remove
	GOT slot.
	(GOT32_symbol_operand): New predicate.

gcc/testsuite/

	PR target/68937
	* gcc.target/i386/pr68937-1.c: New test.
	* gcc.target/i386/pr68937-2.c: Likewise.
	* gcc.target/i386/pr68937-3.c: Likewise.
	* gcc.target/i386/pr68937-4.c: Likewise.
	* gcc.target/i386/pr68937-5.c: Likewise.
---
 gcc/config/i386/i386.c|  4 +++-
 gcc/config/i386/i386.md   | 33 +++
 gcc/config/i386/predicates.md | 16 +--
 gcc/testsuite/gcc.target/i386/pr68937-1.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-2.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-3.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-4.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-5.c |  9 +
 8 files changed, 107 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-5.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cecea24..0e2bec3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6723,8 +6723,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   /* If this call is indirect, we'll need to be able to use a
 	 call-clobbered register for the address of the target function.
 	 Make sure that all such registers are not used for passing
-	 parameters.  Note that DLLIMPORT functions are indirect.  */
+	 parameters.  Note that DLLIMPORT functions and call via GOT
+	 slot are indirect.  */
   if (!decl
+	  || (flag_pic && !flag_plt)
 	  || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
 	{
 	  /* Check if regparm >= 3 since arg_reg_available is set to
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 49b2216..6ab8eaa 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11865,6 +11865,22 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
+;; Since sibcall never returns, we can only use call-clobbered register
+;; as GOT base.
+(define_insn "*sibcall_GOT_32"
+  [(call (mem:QI
+	   (mem:SI (plus:SI
+		 (match_operand:SI 0 "register_no_elim_operand" "U")
+		 (match_operand:SI 1 "GOT32_symbol_operand"))))
+	 (match_operand 2))]
+  "!TARGET_MACHO && !TARGET_64BIT && SIBLING_CALL_P (insn)"
+{
+  rtx fnaddr = gen_rtx_PLUS (Pmode, operands[0], operands[1]);
+  fnaddr = gen_const_mem (Pmode, fnaddr);
+  return ix86_output_call_insn (insn, fnaddr);
+}
+  [(set_attr "type" "call")])
+
 (define_insn "*sibcall"
   [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "UBsBz"))
 	 (match_operand 1))]
@@ -12042,6 +12058,23 @@
   "* return ix86_output_call_insn (in

[PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Prefer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Kyrill Tkachov

Hi all,

The documentation on RTL canonical forms in md.texi says:

"Equality comparisons of a group of bits (usually a single bit) with zero
 will be written using @code{zero_extract} rather than the equivalent
 @code{and} or @code{sign_extract} operations. "

However, this is not always followed in combine. If it's trying to optimise
a comparison against zero of a bitmask that is the mode mask of some mode
(255 for QImode and 65535 for HImode in the testcases of this patch)
it will instead create a subreg to that shorter mode.
This means that for the example:
int
f255 (int x)
{
  if (x & 255)
    return 1;
  return x;
}

it ends up trying to make a QImode comparison against zero, for which
targets like aarch64 have no pattern.
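
For readers less familiar with the RTL forms involved, the two shapes compute the same predicate. A minimal C sketch (the function names are illustrative only, and it assumes CHAR_BIT == 8) of the and-bitmask form next to the narrowing form that a QImode subreg comparison roughly corresponds to:

```c
#include <assert.h>
#include <limits.h>

/* The and-bitmask form: what the source code literally says.  */
static int low_byte_nonzero_mask(int x)
{
  return (x & 255) != 0;
}

/* The narrowing form: roughly what a QImode subreg comparison computes.
   Conversion to unsigned char is modulo 2^CHAR_BIT, so this relies on
   CHAR_BIT == 8.  */
static int low_byte_nonzero_narrow(int x)
{
  return (unsigned char) x != 0;
}

/* Hypothetical self-check over a few boundary values.  */
static void check_forms_agree(void)
{
  static const int samples[] = {
    0, 1, 255, 256, 257, 65535, 65536, -1, -256, INT_MIN, INT_MAX
  };
  unsigned i;

  assert(CHAR_BIT == 8);
  for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert(low_byte_nonzero_mask(samples[i])
           == low_byte_nonzero_narrow(samples[i]));
}
```

Since both forms agree on every input, the choice between them is purely about which one the target can match cheaply.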

This patch attempts to fix this in two places in combine.
First is simplify_comparison when handling the and-bitmask case.
Currently it will call gen_lowpart_or_truncate on the argument to produce
the short subreg. With this patch we don't do that when comparing against
zero. This way the and-bitmask form is preserved for make_extraction later
on to convert into a zero_extract.
The second place is in make_extraction itself, where it tries to avoid
creating a zero_extract, but the canonicalisation rules and the function
comment for make_extraction say that it should try hard to create a
zero_extract when inside a comparison in particular
(" IN_COMPARE is nonzero if we are in a COMPARE.  This means that a
   ZERO_EXTRACT should be built even for bits starting at bit 0.")

With this patch for the testcases:
int
f255 (int x)
{
  if (x & 255)
    return 1;
  return x;
}

int
foo (long x)
{
   return ((short) x != 0) ? x : 1;
}

we now generate for aarch64 at -O2:
f255:
tst x0, 255
csinc   w0, w0, wzr, eq
ret

and
foo:
tst x0, 65535
csinc   x0, x0, xzr, ne
ret


instead of the previous:
f255:
and w1, w0, 255
cmp w1, wzr
csinc   w0, w0, wzr, eq
ret

foo:
sxth    w1, w0
cmp w1, wzr
csinc   x0, x0, xzr, ne
ret


Bootstrapped and tested on arm, aarch64, x86_64.
To get the benefit on aarch64 this needs patch 1/2 that adds an aarch64 pattern
for comparing a zero_extract with zero.
On aarch64 this increases the usage of the TST instruction by about 54%
on SPEC2006.
Performance-wise there were no regressions and slight improvements on
SPECINT that may just be above normal noise (overall 0.5% improvement).
On arm it makes very little difference (arm already defines QI and HImode
comparisons against zero) but makes more use of the lsrs-immediate
instruction in place of the arm tst instruction, which has a shorter
encoding in Thumb2 state.
On x86_64 I saw no difference in code size for SPEC2006 on my setup.

What do people think of this approach?
I hope this just enforces the already documented canonicalisation rules
with minimal (none?) negative fallout.

Thanks,
Kyrill


2015-12-17  Kyrylo Tkachov  

PR rtl-optimization/68796
* combine.c (make_extraction): Don't try to avoid the extraction if
inside a compare.
(simplify_comparison): Don't truncate to lowpart if comparing against
zero and target doesn't have a native compare instruction in the
required short mode.

2015-12-17  Kyrylo Tkachov  

PR rtl-optimization/68796
* gcc.target/aarch64/tst_5.c: New test.
* gcc.target/aarch64/tst_6.c: Likewise.
diff --git a/gcc/combine.c b/gcc/combine.c
index 8601d8983ce345e2129dd047b3520d98c0582842..345e63f9a05f2310a5c9e5b239ed069d22565d1c 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7337,10 +7337,13 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
  low-order bit and this is either not in the destination or we have the
  appropriate STRICT_LOW_PART operation available.
 
+ Don't do this if we are inside a comparison, as the canonicalization
+ rules call for a zero_extract form.
  For MEM, we can avoid an extract if the field starts on an appropriate
  boundary and we can change the mode of the memory reference.  */
 
   if (tmode != BLKmode
+  && !in_compare
   && ((pos_rtx == 0 && (pos % BITS_PER_WORD) == 0
 	   && !MEM_P (inner)
 	   && (inner_mode == tmode
@@ -12108,14 +12111,19 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 
 	 unless TRULY_NOOP_TRUNCATION allows it or the register is
 	 known to hold a value of the required mode the
-	 transformation is invalid.  */
+	 transformation is invalid.
+	 If the target does not have a compare instruction of that mode
+	 don't do this when comparing against 0 since the canonicalization
+	 rules require such an operation to be represented as a
+	 zero_extract, which make_extraction will produce later on.  */
 	  if ((equality_comparison_p || unsigned_comparison_p)
 	  && CONST_INT_P (XEXP (op0, 1))
 	  && (i = exact_log2 ((UINTVAL (XEXP (op0, 1))
    & GET_MOD

[PATCH][AArch64][1/2] PR rtl-optimization/68796 Add compare-of-zero_extract pattern

2015-12-17 Thread Kyrill Tkachov

Hi all,

In this PR I'm trying to increase the use of the aarch64 instruction TST, which
performs a bitwise AND with a bitmask and compares the result with zero.
GCC has many ways of representing these operations in RTL. Depending on the
mask, the target and the context it might be an AND-immediate, a ZERO_EXTRACT
or a ZERO_EXTEND of a subreg.

aarch64.md already contains a pattern for the compare with and-immediate case,
which is the most general form of this, but it doesn't match in many common
cases.

The documentation on canonicalization in md.texi says:
"Equality comparisons of a group of bits (usually a single bit) with zero
 will be written using @code{zero_extract} rather than the equivalent
 @code{and} or @code{sign_extract} operations. "

This means that we should define a compare with a zero-extract pattern in
aarch64, which is what this patch does. It's fairly simple: it constructs the
TST mask from the operands of the zero_extract and updates the SELECT_CC_MODE
implementation to assign the correct CC_NZ mode to such comparisons.  Note that
this is valid only for equality comparisons against zero.
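
The mask construction and the equivalence it relies on can be sketched outside
of GCC as follows.  This is only an illustration in plain C, not the patch
itself: mask_from_zextract and zero_extract are hypothetical stand-ins that
work on uint64_t values instead of CONST_INT rtxes, bit 0 is assumed to be the
least significant bit, and the width is assumed to be below 64 so that the
shift is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for aarch64_mask_from_zextract_ops: build the
   bitmask that selects WIDTH bits starting at bit position POS.  */
uint64_t
mask_from_zextract (unsigned width, unsigned pos)
{
  /* Shifting a 64-bit value by 64 or more is undefined in C.  */
  assert (width >= 1 && width < 64 && pos < 64);
  uint64_t mask = ((uint64_t) 1 << width) - 1;
  return mask << pos;
}

/* Hypothetical model of (zero_extract X WIDTH POS): the WIDTH bits of X
   starting at POS, zero-extended.  */
uint64_t
zero_extract (uint64_t x, unsigned width, unsigned pos)
{
  assert (width >= 1 && width < 64 && pos < 64);
  return (x >> pos) & (((uint64_t) 1 << width) - 1);
}
```

Under these assumptions, zero_extract (x, w, p) == 0 holds exactly when
(x & mask_from_zextract (w, p)) == 0, which is why a single TST with the
shifted mask is enough for an equality comparison against zero.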

So for the testcase:
int
f1 (int x)
{
  if (x & 1)
    return 1;
  return x;
}

we now generate:
f1:
tst x0, 1
csinc   w0, w0, wzr, eq
ret

instead of the previous:
f1:
and w1, w0, 1
cmp w1, wzr
csinc   w0, w0, wzr, eq
ret


and for the testcase:
int
f2 (long x)
{
   return ((short) x >= 0) ? x : 0;
}

we now generate:
f2:
tst x0, 32768
csel    x0, x0, xzr, eq
ret

instead of:
f2:
sxth    w1, w0
cmp w1, wzr
csel    x0, x0, xzr, ge
ret

i.e. we test the sign bit rather than perform the full comparison with zero.
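
The f2 case rests on the observation that a 16-bit value is non-negative
exactly when bit 15 is clear.  A minimal sketch of that equivalence in plain C
follows.  The function names are made up for illustration, and the narrowing
conversion to int16_t is implementation-defined in ISO C; the sketch assumes
GCC's behaviour of reducing the value modulo 2^16.

```c
#include <stdint.h>

/* The comparison as written in the testcase: truncate, then compare.  */
int
nonneg_via_cast (long x)
{
  return (int16_t) x >= 0;
}

/* The form the new pattern enables: test bit 15 directly.  */
int
nonneg_via_mask (long x)
{
  return (x & 0x8000L) == 0;
}
```

Both functions are expected to agree for every input, including values whose
upper bits are set, because only bit 15 decides the sign of the truncated
value.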

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2015-12-17  Kyrylo Tkachov  

PR rtl-optimization/68796
* config/aarch64/aarch64.md (*and3nr_compare0_zextract):
New pattern.
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Handle
ZERO_EXTRACT comparison with zero.
(aarch64_mask_from_zextract_ops): New function.
* config/aarch64/aarch64-protos.h (aarch64_mask_from_zextract_ops):
New prototype.

2015-12-17  Kyrylo Tkachov  

PR rtl-optimization/68796
* gcc.target/aarch64/tst_3.c: New test.
* gcc.target/aarch64/tst_4.c: Likewise.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 87d6eb1358845527d7068550925949802a7e48e2..febca98d38d5f09c97b0f79adc55bb29eca217b9 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -330,6 +330,7 @@ int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
+rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
 rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cb8955d5d6c909e8179bb1ab8203eb165f55e4b6..58a9fc68f391162ed9847d7fb79d70d3ee9919f5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4147,7 +4147,9 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   && y == const0_rtx
   && (code == EQ || code == NE || code == LT || code == GE)
   && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND
-	  || GET_CODE (x) == NEG))
+	  || GET_CODE (x) == NEG
+	  || (GET_CODE (x) == ZERO_EXTRACT && CONST_INT_P (XEXP (x, 1))
+	  && CONST_INT_P (XEXP (x, 2)))))
 return CC_NZmode;
 
   /* A compare with a shifted operand.  Because of canonicalization,
@@ -10757,6 +10759,21 @@ aarch64_simd_imm_zero_p (rtx x, machine_mode mode)
   return x == CONST0_RTX (mode);
 }
 
+
+/* Return the bitmask CONST_INT to select the bits required by a zero extract
+   operation of width WIDTH at bit position POS.  */
+
+rtx
+aarch64_mask_from_zextract_ops (rtx width, rtx pos)
+{
+  gcc_assert (CONST_INT_P (width));
+  gcc_assert (CONST_INT_P (pos));
+
+  unsigned HOST_WIDE_INT mask
+    = ((unsigned HOST_WIDE_INT) 1 << UINTVAL (width)) - 1;
+  return GEN_INT (mask << UINTVAL (pos));
+}
+
 bool
 aarch64_simd_imm_scalar_p (rtx x, machine_mode mode ATTRIBUTE_UNUSED)
 {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4604fd2588be87944a72224dccb3dfb32e42a1ad..fd2b3ef64f1736545948eb49e5ac6dfbd206e3e9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3698,6 +3698,28 @@ (define_insn "*and3nr_compare0"
   [(set_attr "type" "logics_reg,logics_imm")]
 )
 
+(define_insn "*and3nr_compare0_zextract"
+  [(set (reg:CC_NZ CC_REGNUM)
+	(compare:CC_NZ
+	 (zero_extract:GPI (match_operand:GPI 0 "register_operand" "r")
+		  (match_operand:GPI 1 "const_int_operand" "n")
+		 

Re: [PATCH 0/2] obsolete some old targets

2015-12-17 Thread Trevor Saunders
On Tue, Dec 15, 2015 at 03:25:18PM -0700, Jeff Law wrote:
> On 12/15/2015 03:02 PM, Trevor Saunders wrote:
> >>
> >>Can you mark interix as obsolete?  It hasn't even built for a long time.
> >
> >  Sure, I can do that if you want, I just wasn't sure before you wanted
> >  to.
> Please do.  I know we've been round and round on that one before, but given
> it hasn't been building since 2012, I think obsoleting is appropriate.

ok, I committed these two patches and a third obsoleting interix.  Given
it's mechanically the same as these, I took this as approval to go ahead
with it and save you a second of review time.  If you object, obviously
we can change that.

> Fixing it wouldn't be hard, it just doesn't seem worth the effort.

agreed

Trev

> 
> jeff


Re: [PATCH 0/2] obsolete some old targets

2015-12-17 Thread Trevor Saunders
On Thu, Dec 17, 2015 at 03:36:18PM +0100, Kamil Rytarowski wrote:
> Hi,
> 
> I talked with devs and it will be better to just keep it removed and focus on 
> native NetBSD with NetBSD userland.
> 
> Actually nobody seems to be interested in the Debian/NetBSD distribution.

that's what I thought from googling, so I'll just go ahead and commit
these patches.

Trev

> 
> Thanks,
> 
> > Sent: Thursday, December 17, 2015 at 3:24 PM
> > From: "Trevor Saunders" 
> > To: "Kamil Rytarowski" 
> > Cc: tbsaunde+...@tbsaunde.org
> > Subject: Re: [PATCH 0/2] obsolete some old targets
> >
> > On Thu, Dec 17, 2015 at 12:37:47PM +0100, Kamil Rytarowski wrote:
> > > -BEGIN PGP SIGNED MESSAGE-
> > > Hash: SHA256
> > > 
> > > I want to keep knetbsd alive. My application for FSF is still ongoing.
> > > 
> > > Please hold on.
> > 
> > Well, These patches aren't going to make resurrecting knetbsd
> > significantly harder, there isn't even a significant amount of knetbsd
> > specific code in gcc, so even removing it should be easily reverted
> > should that become desirable.  On the other hand it doesn't seem like
> > removing the knetbsd specific code will help much at the moment either.
> > So I guess it doesn't really matter one way or another to me.
> > 
> > Trev
> > 
> > > 
> > > Thanks
> > > 
> > > On 15.12.2015 04:55, tbsaunde+...@tbsaunde.org wrote:
> > > > From: Trevor Saunders 
> > > > 
> > > > Hi,
> > > > 
> > > > http://gcc.gnu.org/ml/gcc-patches/2015-12/msg00365.html reminded me
> > > > I hadn't gotten around to marking *-knetbsd and openbsd 2/3
> > > > obsolete as I offered to do back in the spring.
> > > > 
> > > > I tested I could still build on x86_64-linux-gnu, and could only
> > > > cross compile to i386-openbsd2 i386-openbsd3 and
> > > > x86_64_64-knetbsd-gnu with --enable-obsolete.  Given how late in
> > > > the cycle we are I'm not sure if we should remove these targets as
> > > > soon as stage 1 opens, but we might as well obsolete them I guess,
> > > > ok to commit?
> > > > 
> > > > Trev
> > > > 
> > > > 
> > > > > Trevor Saunders (2):
> > > > >   mark *-knetbsd-* as obsolete
> > > > >   obsolete openbsd 2.0 and 3.X
> > > > >
> > > > > gcc/config.gcc | 4 +++-
> > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > 
> > 


Re: [PATCH] Fix PR68707, 67323

2015-12-17 Thread Alan Lawrence

On 17/12/15 10:46, Richard Biener wrote:
> On Thu, 17 Dec 2015, Alan Lawrence wrote:
> > On 16/12/15 15:01, Richard Biener wrote:
> > >
> > > The following patch adds a heuristic to prefer store/load-lanes
> > > over SLP when vectorizing.  Compared to the variant attached to
> > > the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching
> > > what you've tested).
> >
> > Not sure I follow this. Compared to the variant attached to the PR - we will
> > now attempt to use load-lanes, if (say) all of the loads are strided, even if
> > we know we don't support load-lanes (for any of them). That sounds the wrong
> > way around and I think rather different to what you proposed earlier? (At the
> > least, the debug message "can use load/store lanes" is potentially misleading,
> > that's not necessarily the case!)
>
> Ah, indeed.  Note that the whole thing is still guarded by the check
> that we can use store-lanes for the store.
>
> I can also do it the other way around (as previously proposed) which
> would change outcome for slp-perm-11.c.  That proposal would not reject
> the SLP if there were any strided grouped loads involved.

Indeed; the STMT_VINFO_STRIDED_P || !vect_load_lanes_supported approach (as on
PR68707) vectorizes slp-perm-11.c with SLP, which works much better than the
!STMT_VINFO_STRIDED_P && !vect_load_lanes_supported, which tries to use st2 (and
only sort-of works - you get an st2 output, but no ld2, and lots of faff).

I think I move for the patch from PR68707, therefore. (Ramana - any thoughts?)

> Btw, another option is to push the decision past full SLP analysis
> and thus make the decision globally for all SLP instances - currently
> SLP instances are cancelled on a one-by-one basis meaning we might
> do SLP plus load/store-lanes in the same loop.

I don't see anything inherently wrong with doing both in the same loop. On
simple loops, I suspect we'll do better committing to one strategy or the other
(tho really it's only the VF required I think?), but then, on such simple loops,
there are probably not very many SLP instances!

> Maybe we have to go all the way to implementing a better vectorization
> cost hook just for the permutations - the SLP path in theory knows
> exactly which ones it will generate.

Yes, I think this sounds like a good plan for GCC 7. It doesn't require
constructing an entire stmt (if you are concerned about the cost of that), and
on most targets, probably integrates fairly easily with the
expand_vec_perm_const hooks.


--Alan


[PTX] Reorder hard regs

2015-12-17 Thread Nathan Sidwell
This reorders the hardregs to be a contiguous block, and names them somewhat
more conventionally.  (I had considered %sp, %fp etc, but went with the longer
names).


nathan
2015-12-17  Nathan Sidwell  

	* config/nvptx/nvptx.h (NVPTX_RETURN_REGNUM, FRAME_POINTER_REGNUM,
	ARG_POINTER_REGNUM, STATIC_CHAIN_REGNUM): Renumber.
	(REGISTER_NAMES): Update and rename.
	(FIXED_REGISTERS, CALL_USED_REGISTERS): Update.
	(enum_reg_class, REG_CLASS_NAMES, REG_CLASS_CONTENTS): Reformat.

Index: config/nvptx/nvptx.h
===
--- config/nvptx/nvptx.h	(revision 231769)
+++ config/nvptx/nvptx.h	(working copy)
@@ -78,19 +78,15 @@
 #define PTRDIFF_TYPE (TARGET_ABI64 ? "long int" : "int")
 
 #define POINTER_SIZE (TARGET_ABI64 ? 64 : 32)
-
 #define Pmode (TARGET_ABI64 ? DImode : SImode)
 
 /* Registers.  Since ptx is a virtual target, we just define a few
-   hard registers for special purposes and leave pseudos unallocated.  */
-
-#define FIRST_PSEUDO_REGISTER 16
-/* We have to have some available hard registers, to keep gcc setup
+   hard registers for special purposes and leave pseudos unallocated.
+   We have to have some available hard registers, to keep gcc setup
happy.  */
-#define FIXED_REGISTERS	\
-  { 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 }
-#define CALL_USED_REGISTERS\
-  { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }
+#define FIRST_PSEUDO_REGISTER 16
+#define FIXED_REGISTERS	{ 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
+#define CALL_USED_REGISTERS { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }
 
 #define HARD_REGNO_NREGS(REG, MODE)		\
   ((void)(REG), (void)(MODE), 1)
@@ -100,32 +96,13 @@
  ((void)(REG), (void)(MODE), true)
 
 /* Register Classes.  */
-
-enum reg_class
-  {
-NO_REGS,
-ALL_REGS,
-LIM_REG_CLASSES
-  };
-
+enum reg_class {  NO_REGS,ALL_REGS,	LIM_REG_CLASSES };
+#define REG_CLASS_NAMES { "NO_REGS",  "ALL_REGS" }
+#define REG_CLASS_CONTENTS { { 0x }, { 0x } }
 #define N_REG_CLASSES (int) LIM_REG_CLASSES
 
-#define REG_CLASS_NAMES {	  \
-"NO_REGS",			  \
-"ALL_REGS" }
-
-#define REG_CLASS_CONTENTS	\
-{\
-  /* NO_REGS.  */		\
-  { 0x },			\
-  /* ALL_REGS.  */		\
-  { 0x },			\
-}
-
 #define GENERAL_REGS ALL_REGS
-
 #define REGNO_REG_CLASS(R) ((void)(R), ALL_REGS)
-
 #define BASE_REG_CLASS ALL_REGS
 #define INDEX_REG_CLASS NO_REGS
 
@@ -151,17 +128,16 @@ enum reg_class
 #define FRAME_GROWS_DOWNWARD 0
 #define STACK_GROWS_DOWNWARD 1
 
+#define NVPTX_RETURN_REGNUM 0
 #define STACK_POINTER_REGNUM 1
-#define NVPTX_RETURN_REGNUM 4
-#define FRAME_POINTER_REGNUM 15
-#define ARG_POINTER_REGNUM 14
-
-#define STATIC_CHAIN_REGNUM 12
+#define FRAME_POINTER_REGNUM 2
+#define ARG_POINTER_REGNUM 3
+#define STATIC_CHAIN_REGNUM 4
 
 #define REGISTER_NAMES			\
   {	\
-"%hr0", "%outargs", "%hfp", "%hr3", "%retval", "%hr5", "%hr6", "%hr7",	\
-"%hr8", "%hr9", "%hr10", "%hr11", "%chain_in", "%hr13", "%argp", "%frame" \
+"%value", "%stack", "%frame", "%args", "%chain", "%hr5", "%hr6", "%hr7", \
+"%hr8", "%hr9", "%hr10", "%hr11", "%hr12", "%hr13", "%hr14", "%hr15" \
   }
 
 #define FIRST_PARM_OFFSET(FNDECL) ((void)(FNDECL), 0)


Re: [PATCH 4/5] Fix intransitive comparison in compare_access_positions

2015-12-17 Thread Martin Jambor
Hi,

On Thu, Dec 17, 2015 at 12:02:11PM +0300, Yury Gribov wrote:
> Another intransitive comparison in reload_pseudo_compare_func. Buggy
> scenario:
> 1) A and B are ints of equal precision so we return 0
> 2) C is REAL and thus can compare differently to A and B
> 
> Cc-ing Martin who's the original author.

I cannot approve it but I also do not object to this change.
Thanks,

Martin

> 
> /Yury

> From 6f3930ad81945f6b5d7aecfdda16089547a592d3 Mon Sep 17 00:00:00 2001
> From: Yury Gribov 
> Date: Sat, 12 Dec 2015 10:39:15 +0300
> Subject: [PATCH 4/5] Fix intransitive comparison in compare_access_positions.
> 
> 2015-12-17  Yury Gribov  
> 
>   * tree-sra.c (compare_access_positions):
>   Make transitive.
> 


Re: [Fortran, Patch] (RFC, Coarray) Implement TS18508's EVENTS

2015-12-17 Thread Steve Kargl
On Thu, Dec 17, 2015 at 01:22:06PM +0100, Alessandro Fanfarillo wrote:
> 
> I've noticed that this patch has been applied only on trunk and not on
> the gcc-5-branch. Is it a problem to include EVENTS in gcc-5?
> 

No problem.  When I applied the EVENTS patch to trunk,
the 5.3 release was being prepared.  I was going to
wait for a week or two after 5.3 came out, then apply
the patch.  Now that you have commit access, feel
free to back port the patch.  Remember to post the
patch that you commit to both the fortran and gcc-patches
lists.

-- 
Steve


Re: [PATCH][combine] Check WORD_REGISTER_OPERATIONS normally rather than through preprocessor

2015-12-17 Thread Segher Boessenkool
Hi Kyrill,

On Tue, Dec 15, 2015 at 05:07:41PM +, Kyrill Tkachov wrote:
> As part of the war on conditional compilation here's an #if check on 
> WORD_REGISTER_OPERATIONS that
> seems to have been missed out.
> 
> Bootstrapped and tested on arm, aarch64, x86_64.
> 
> Is it still ok to commit these kinds of conditional compilation conversions?

You could say it is a bugfix, a missed case in the conversion ;-)

> diff --git a/gcc/combine.c b/gcc/combine.c
> index 
> 8601d8983ce345e2129dd047b3520d98c0582842..0658a6dbc6df6862df662bc7842c13ed06b36b04
>  100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -11488,10 +11488,10 @@ simplify_comparison (enum rtx_code code, rtx *pop0, 
> rtx *pop1)
>/* Try a few ways of applying the same transformation to both operands.  */
>while (1)
>  {
> -#if !WORD_REGISTER_OPERATIONS
>/* The test below this one won't handle SIGN_EXTENDs on these machines,
>so check specially.  */
> -  if (code != GTU && code != GEU && code != LTU && code != LEU
> +  if (!WORD_REGISTER_OPERATIONS && code != GTU && code != GEU
> +   && code != LTU && code != LEU

Please keep all the code != together, i.e.

+  if (!WORD_REGISTER_OPERATIONS
+ && code != GTU && code != GEU && code != LTU && code != LEU

Okay with that change.
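
The general shape of this kind of conversion can be sketched in stand-alone C.
The snippet assumes, as the conversion itself does, that the target macro is
always defined to 0 or 1; the enum and the function name are hypothetical and
only stand in for the real rtx codes and the surrounding combine.c logic.

```c
/* Assumed for the sketch: the macro is always defined, to 0 or 1.  */
#ifndef WORD_REGISTER_OPERATIONS
#define WORD_REGISTER_OPERATIONS 0
#endif

/* Hypothetical stand-in for the comparison codes used in combine.c.  */
enum rtx_code_sketch { EQ, NE, GT, GTU, GE, GEU, LT, LTU, LE, LEU };

/* The condition is an ordinary expression, so both arms are always
   parsed and type-checked; the compiler folds the constant away.  */
int
sign_extend_check_applies (enum rtx_code_sketch code)
{
  return (!WORD_REGISTER_OPERATIONS
          && code != GTU && code != GEU && code != LTU && code != LEU);
}
```

Compared with an #if block, code guarded this way cannot silently bit-rot on
targets where the macro has the other value.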


Segher


[PATCH] Fix PR68946

2015-12-17 Thread Richard Biener

This fixes PR68946.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2015-12-17  Richard Biener  

PR tree-optimization/68946
* tree-vect-slp.c (vect_slp_analyze_node_operations): Push
SLP def type to stmt operands one stmt at a time.

* gcc.dg/torture/pr68946.c: New testcase.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 231745)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2221,12 +2250,6 @@ vect_slp_analyze_node_operations (slp_tr
 if (!vect_slp_analyze_node_operations (child))
   return false;
 
-  /* Push SLP node def-type to stmts.  */
-  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
-if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
-  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (child), j, stmt)
-   STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt)) = SLP_TREE_DEF_TYPE (child);
-
   bool res = true;
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt)
 {
@@ -2234,19 +2257,21 @@ vect_slp_analyze_node_operations (slp_tr
   gcc_assert (stmt_info);
   gcc_assert (STMT_SLP_TYPE (stmt_info) != loop_vect);
 
-  if (!vect_analyze_stmt (stmt, &dummy, node))
-   {
- res = false;
- break;
-   }
+  /* Push SLP node def-type to stmt operands.  */
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), j, child)
+   if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
+ STMT_VINFO_DEF_TYPE (vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (child)[i]))
+   = SLP_TREE_DEF_TYPE (child);
+  res = vect_analyze_stmt (stmt, &dummy, node);
+  /* Restore def-types.  */
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), j, child)
+   if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
+ STMT_VINFO_DEF_TYPE (vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (child)[i]))
+   = vect_internal_def;
+  if (! res)
+   break;
 }
 
-  /* Restore stmt def-types.  */
-  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
-if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
-  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (child), j, stmt)
-   STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt)) = vect_internal_def;
-
   return res;
 }
 
Index: gcc/testsuite/gcc.dg/torture/pr68946.c
===
--- gcc/testsuite/gcc.dg/torture/pr68946.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr68946.c  (working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-vect-cost-model" } */
+
+int printf (const char *, ...);
+
+int a, b, g;
+short c, e, h, i;
+int f[8];
+void fn1() {
+short j;
+for (; a;) {
+   printf("%d", g);
+   b = 7;
+   for (; b >= 0; b--) {
+   i = 1;
+   short k = f[b];
+   e = k ? k : 3;
+   j = (i && (c |= e)) << 3;
+   int l = j, m = 0;
+   h = l < 0 || l >> m;
+   f[b] = h;
+   }
+}
+}


[ptx] annotate 2 tests

2015-12-17 Thread Nathan Sidwell

These two tests require label values, annotated thusly.

nathan
2015-12-17  Nathan Sidwell  

	* c-c++-common/Wunused-var-13.c: Requires label values.
	* gcc.dg/torture/pr46216.c: Likewise.

Index: c-c++-common/Wunused-var-13.c
===
--- c-c++-common/Wunused-var-13.c	(revision 231757)
+++ c-c++-common/Wunused-var-13.c	(working copy)
@@ -1,6 +1,7 @@
 /* PR c/46015 */
 /* { dg-options "-Wunused" } */
 /* { dg-do compile } */
+/* { dg-require-effective-target label_values } */
 
 int
 f1 (int i)
Index: gcc.dg/torture/pr46216.c
===
--- gcc.dg/torture/pr46216.c	(revision 231757)
+++ gcc.dg/torture/pr46216.c	(working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target label_values } */
 
 typedef int Embryo_Cell;
 int


Re: [PATCHES, PING*5] Enhance standard DWARF for Ada

2015-12-17 Thread Pierre-Marie de Rodat

On 12/16/2015 10:30 PM, Jason Merrill wrote:
> OK with those changes.


All changes done, and all patches pushed. Thank you very much!!

--
Pierre-Marie de Rodat


Re: libgcc: unwind-ia64.c without malloc/free

2015-12-17 Thread Jeff Law

On 12/17/2015 06:17 AM, Bernd Schmidt wrote:
> On 12/17/2015 12:17 AM, Bernd Edlinger wrote:
> > this is just an idea, how to avoid use of malloc in unwind-ia64.c.
> >
> > [...]
> >
> > What do you think?
>
> Not worth worrying about IMO. I think ia64 is dead and best left to rest
> in maintenance mode.

Agreed.  And in general using alloca is a problem waiting to happen
unless you can prove there's no way to blow out the stack.  I can't
count the number of problems of that nature we've fixed in glibc over
the last 5 years when the hackers realized that was a great attack vector.
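
One common way to keep stack use provably bounded in ordinary code is a
fixed-size stack buffer with a heap fallback.  The sketch below illustrates
that general idea only; it is not taken from glibc or libgcc, the names and
the 256-byte limit are arbitrary assumptions, and it may not suit an unwinder,
where malloc itself is what the original proposal tried to avoid.

```c
#include <stdlib.h>
#include <string.h>

/* Arbitrary limit chosen for the sketch.  */
#define SCRATCH_ON_STACK 256

/* Copy LEN bytes into a scratch buffer and return their sum, or -1 if
   memory cannot be obtained.  Stack use never exceeds SCRATCH_ON_STACK
   bytes no matter how large LEN is.  */
long
sum_via_scratch (const unsigned char *data, size_t len)
{
  unsigned char small[SCRATCH_ON_STACK];
  unsigned char *buf = small;
  long sum = 0;

  if (len > sizeof small)
    {
      buf = malloc (len);
      if (buf == NULL)
        return -1;
    }

  memcpy (buf, data, len);
  for (size_t i = 0; i < len; i++)
    sum += buf[i];

  if (buf != small)
    free (buf);
  return sum;
}
```

Unlike alloca, an attacker-controlled length here can at worst make the
allocation fail, which the caller can detect.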


jeff


Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

2015-12-17 Thread Uros Bizjak
On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu  wrote:
> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>>> Since sibcall never returns, we can only use call-clobbered register
>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>> be properly restored.
>>>
>>> Tested on x86-64 with -m32.  OK for trunk?
>>
>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>> class, and register_no_elim_operand predicate should be used with "U"
>> constraint. Also, please introduce new predicate, similar to how
>> GOT_memory_operand is defined and handled.
>>
>
> Here is the updated patch.  There is a predicate already,
> sibcall_memory_operand.  It allows any register to
> be used as GOT base, which is the root of our problem.
> This patch removes GOT slot from it and handles
> sibcall over GOT slot with *sibcall_GOT_32 and
> *sibcall_value_GOT_32 patterns.  Since I need to
> expose constraints on GOT base register to RA,
> I have to use 2 operands, GOT base and function
> symbol, to describe sibcall over 32-bit GOT slot.

Please use

   (mem:SI (plus:SI
 (match_operand:SI 0 "register_no_elim_operand" "U")
 (match_operand:SI 1 "GOT32_symbol_operand")))
...

to avoid manual rebuild of the operand.

Uros.


RE: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2015-12-17 Thread Wilco Dijkstra
James Greenhalgh wrote:
> On Wed, Dec 16, 2015 at 01:05:21PM +, Wilco Dijkstra wrote:
> > James Greenhalgh wrote:
> > > On Tue, Dec 15, 2015 at 10:54:49AM +, Wilco Dijkstra wrote:
> > > > ping
> > > >
> > > > > -Original Message-
> > > > > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> > > > > Sent: 06 November 2015 20:06
> > > > > To: 'gcc-patches@gcc.gnu.org'
> > > > > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > >
> > > > > > This patch adds support for the TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > > > hook. When the cost of GENERAL_REGS and FP_REGS is identical, the
> > > > > > register allocator always uses ALL_REGS even when it has a much higher
> > > > > > cost. The hook changes the class to either FP_REGS or GENERAL_REGS
> > > > > > depending on the mode of the register. This results in better register
> > > > > > allocation overall, fewer spills and reduced codesize - particularly in
> > > > > > SPEC2006 gamess.
> > > > >
> > > > > GCC regression passes with several minor fixes.
> > > > >
> > > > > OK for commit?
> > > > >
> > > > > ChangeLog:
> > > > > 2015-11-06  Wilco Dijkstra  
> > > > >
> > > > >   * gcc/config/aarch64/aarch64.c
> > > > >   (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define.
> > > > >   (aarch64_ira_change_pseudo_allocno_class): New function.
> > > > >   * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2.
> > > > >   * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > >   (test_corners_sisd_di): Improve force to SIMD register.
> > > > >   (test_corners_sisd_si): Likewise.
> > > > >   * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build with 
> > > > > -O2.
> > > > >   * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c:
> > > > >   Remove scan-assembler check for ldr.
> > >
> > > Drop the gcc/ from the ChangeLog.
> > >
> > > > > --
> > > > >  gcc/config/aarch64/aarch64.c   | 22 
> > > > > ++
> > > > >  gcc/testsuite/gcc.target/aarch64/cvtf_1.c  |  2 +-
> > > > >  gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c  |  4 ++--
> > > > >  gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c |  2 +-
> > > > >  .../gcc.target/aarch64/vect-ld1r-compile-fp.c  |  1 -
> > >
> > > > These testsuite changes concern me a bit, and you don't mention them
> > > > beyond saying they are minor fixes...
> >
> > Well any changes to register allocator preferencing would cause fallout in
> > tests that are assuming which register is allocated, especially if they use
> > nasty inline assembler hacks to do so...
> 
> Sure, but the testcases here each operate on data that should live in
> FP_REGS given the initial conditions that the nasty hacks try to mimic -
> that's what makes the regressions notable.
> 
> >
> > > > >  #define FCVTDEF(ftype,itype) \
> > > > >  void \
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c 
> > > > > b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > index 363f554..8465c89 100644
> > > > > --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b)
> > > > >  {
> > > > >force_simd_di (b);
> > > > >b = b >> 63;
> > > > > +  force_simd_di (b);
> > > > >b = b >> 0;
> > > > >b += b >> 65; /* { dg-warning "right shift count >= width of type" 
> > > > > } */
> > > > > -  force_simd_di (b);
> > >
> > > This one I don't understand, but seems to say that we've decided to move
> > > b out of FP_REGS after getting it in there for b = b << 63; ? So this is
> > > another register allocator regression?
> >
> > No, basically the register allocator is now making better decisions as to
> > where to allocate integer variables. It will only allocate them to FP
> > registers if they are primarily used by other FP operations. The
> > force_simd_di inline assembler tries to mimic FP uses, and if there are
> > enough of them at the right places then everything works as expected.  If
> > however you do 3 consecutive integer operations then the allocator will now
> > correctly prefer to allocate them to the integer registers (while previously
> > it wouldn't, which is inefficient).
> 
> I'm not sure I understand this argument in the abstract (though I believe
> it for some of the supported cores for the AArch64 target). At an abstract
> level, given a set of operations which can execute in either FP_REGS or
> GENERAL_REGS and initial and post conditions that allocate all input and
> output registers from those operations to FP_REGS, I would expect those
> operations to take place using FP_REGS? Your patch seems to break this
> expectation?

No, my patch doesn't break that expectation. The goal is that if the cost of 
allocating to either integer or FP registers is the same, we prefer the most
natural register file based on the 

Re: [PATCH][ARC] Refurbish and extend builtin function support for ARC

2015-12-17 Thread Joern Wolfgang Rennecke



On 17/12/15 09:31, Claudiu Zissulescu wrote:
> Please find a new patch that refurbishes and extends the builtin function
> support for ARC. I also added a number of builtins for ARCv2 architecture, and
> a number of tests.
>
> Ok to commit?
>
> gcc/
> 2015-12-14  Claudiu Zissulescu
>
> ...
>
> 	(VUNSPEC_DEXCL_NORES, VUNSPEC_LR_HIGH): Remove

Typo: missing a period.
Otherwise, this is OK.

Although, I think the regular builtin part of arc_expand_builtin could be
simpler if you increased the size of xop by one, and put target into xop[0].

This would then lend itself to further simplification if we had something
to common these switches on the number of arguments to pass to GEN_FCN
strewn over various parts and ports of gcc.
Like:

rtx_insn *
apply_GEN_FCN (enum insn_code icode, rtx *arg)
{
  switch (insn_data[icode].n_generator_args)
    {
    case 0:
      return GEN_FCN (icode) ();
    case 1:
      return GEN_FCN (icode) (arg[0]);
    ...
    }
}

This could be generated by one of the generator programs so that the switch
has as many cases as required to cover the full range of
insn_data[icode].n_generator_args.
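
The dispatch idea can be tried out in isolation with a small stand-alone
model.  Everything in the sketch below is hypothetical: plain int values stand
in for rtx operands, struct generator stands in for an insn_data entry, and
the three gen_* functions are toy generators.  It only illustrates switching
on a stored argument count in one place instead of at every call site.

```c
/* Hypothetical model of an insn_data entry: an argument count plus a
   generator function of the matching arity.  */
struct generator
{
  int n_args;
  union
  {
    int (*f0) (void);
    int (*f1) (int);
    int (*f2) (int, int);
  } fn;
};

/* Toy generators standing in for the gen_* functions.  */
static int gen_seven (void) { return 7; }
static int gen_neg (int a) { return -a; }
static int gen_add (int a, int b) { return a + b; }

const struct generator generators[] = {
  { 0, { .f0 = gen_seven } },
  { 1, { .f1 = gen_neg } },
  { 2, { .f2 = gen_add } },
};

/* Single place that switches on the argument count; callers just pass
   an operand array.  Returns -1 for an unsupported count.  */
int
apply_generator (const struct generator *g, const int *arg)
{
  switch (g->n_args)
    {
    case 0:
      return g->fn.f0 ();
    case 1:
      return g->fn.f1 (arg[0]);
    case 2:
      return g->fn.f2 (arg[0], arg[1]);
    default:
      return -1;
    }
}
```

If such a helper were emitted by a generator program, as suggested above, the
number of cases could track the largest argument count actually present in the
machine description.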


Re: [PATCH] Fix some blockers of PR c++/24666 (arrays decay to pointers too early)

2015-12-17 Thread Patrick Palka

On Thu, 17 Dec 2015, Paolo Carlini wrote:

> Hi,
>
> On 16/12/2015 23:10, Patrick Palka wrote:
> > gcc/cp/ChangeLog:
> >
> > PR c++/59878
> > * typeck.c (convert_for_initialization): Don't perform an early
> > decaying conversion if converting to a class type.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR c++/59878
> > * g++.dg/conversion/pr59878.C: New test.
> Nit: note that the actual bug number is 59879, not 59878. Can you please
> correct all those 8 to 9?

Sorry about that... Going to correct this with the following patch after
a quick regtest:

--- 8< ---

Subject: [PATCH] Fix wrong PR references

PR c++/59878 -> PR c++/59879
---
 gcc/cp/ChangeLog  |  2 +-
 gcc/testsuite/ChangeLog   |  4 ++--
 gcc/testsuite/g++.dg/conversion/pr59878.C | 25 -
 gcc/testsuite/g++.dg/conversion/pr59879.C | 25 +
 4 files changed, 28 insertions(+), 28 deletions(-)
 delete mode 100644 gcc/testsuite/g++.dg/conversion/pr59878.C
 create mode 100644 gcc/testsuite/g++.dg/conversion/pr59879.C

diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 14292e9..a192f00 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -2,7 +2,7 @@

PR c++/16333
PR c++/41426
-   PR c++/59878
+   PR c++/59879
PR c++/66895
* typeck.c (convert_for_initialization): Don't perform an early
decaying conversion if converting to a class type.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index fafa8cc..7386f6b 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -37,11 +37,11 @@

PR c++/16333
PR c++/41426
-   PR c++/59878
+   PR c++/59879
PR c++/66895
* g++.dg/conversion/pr16333.C: New test.
* g++.dg/conversion/pr41426.C: New test.
-   * g++.dg/conversion/pr59878.C: New test.
+   * g++.dg/conversion/pr59879.C: New test.
* g++.dg/conversion/pr66895.C: New test.

 2015-12-16  Martin Sebor  
diff --git a/gcc/testsuite/g++.dg/conversion/pr59878.C 
b/gcc/testsuite/g++.dg/conversion/pr59878.C
deleted file mode 100644
index ed567fe..000
--- a/gcc/testsuite/g++.dg/conversion/pr59878.C
+++ /dev/null
@@ -1,25 +0,0 @@
-// PR c++/59878
-
-struct Test {
- template 
- Test(const char (&array)[N]) {}
-};
-
-Test test() {
- return "test1";
-}
-
-void test2(Test arg = "test12") {}
-
-template 
-void test3(T arg = "test123") {}
-
-template 
-void test4(const T &arg = "test123") {}
-
-int main() {
- test();
- test2();
- test3();
- test4();
-}
diff --git a/gcc/testsuite/g++.dg/conversion/pr59879.C b/gcc/testsuite/g++.dg/conversion/pr59879.C
new file mode 100644
index 000..7bd5b99
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr59879.C
@@ -0,0 +1,25 @@
+// PR c++/59879
+
+struct Test {
+ template 
+ Test(const char (&array)[N]) {}
+};
+
+Test test() {
+ return "test1";
+}
+
+void test2(Test arg = "test12") {}
+
+template 
+void test3(T arg = "test123") {}
+
+template 
+void test4(const T &arg = "test123") {}
+
+int main() {
+ test();
+ test2();
+ test3();
+ test4();
+}
--
2.7.0.rc0.50.g1470d8f.dirty


Re: [PATCH] Remove unused modified_noreturn_calls

2015-12-17 Thread Richard Biener
On Thu, 17 Dec 2015, Bernd Schmidt wrote:

> On 12/17/2015 10:59 AM, Richard Biener wrote:
> > 
> > +extern void gt_ggc_mx (gimple *&);
> > +extern void gt_pch_nx (gimple *&);
> > +
> 
> This doesn't occur in the ChangeLog - unrelated change?

Not unrelated, it's required to make gtype-desc.c compile.  See
other occurrences of these forward-decls.  They are needed from
hash_map/table.

Took me quite a while to figure out ;)

Richard.


Re: libgcc: unwind-ia64.c without malloc/free

2015-12-17 Thread Bernd Schmidt

On 12/17/2015 12:17 AM, Bernd Edlinger wrote:

this is just an idea, how to avoid use of malloc in unwind-ia64.c.

[...]

What do you think?


Not worth worrying about IMO. I think ia64 is dead and best left to rest 
in maintenance mode.



Bernd


Re: [PATCH] Remove unused modified_noreturn_calls

2015-12-17 Thread Bernd Schmidt

On 12/17/2015 10:59 AM, Richard Biener wrote:


+extern void gt_ggc_mx (gimple *&);
+extern void gt_pch_nx (gimple *&);
+


This doesn't occur in the ChangeLog - unrelated change?


Bernd



Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

2015-12-17 Thread H.J. Lu
On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>> Since sibcall never returns, we can only use call-clobbered register
>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>> be properly restored.
>>
>> Tested on x86-64 with -m32.  OK for trunk?
>
> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
> class, and register_no_elim_operand predicate should be used with "U"
> constraint. Also, please introduce new predicate, similar to how
> GOT_memory_operand is defined and handled.
>

Here is the updated patch.  There is a predicate already,
sibcall_memory_operand.  It allows any register to be
used as GOT base, which is the root of our problem.
This patch removes the GOT slot from it and handles
a sibcall over a GOT slot with the *sibcall_GOT_32 and
*sibcall_value_GOT_32 patterns.  Since I need to
expose the constraints on the GOT base register to RA,
I have to use 2 operands, GOT base and function
symbol, to describe a sibcall over a 32-bit GOT slot.

OK for master if there are no regressions?

Thanks.

-- 
H.J.
From e055e1ea71353897aa7ce4b38a5186c8b64ddc7c Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 16 Dec 2015 12:34:57 -0800
Subject: [PATCH] Use call-clobbered register for sibcall via GOT

Since sibcall never returns, we can only use call-clobbered register
as GOT base.  Otherwise, callee-saved register used as GOT base won't
be properly restored.

gcc/

	PR target/68937
	* config/i386/i386.c (ix86_function_ok_for_sibcall): Count
	call via GOT slot as indirect call.
	* config/i386/i386.md (*sibcall_GOT_32): New pattern.
	(*sibcall_value_GOT_32): Likewise.
	* config/i386/predicates.md (sibcall_memory_operand): Remove
	GOT slot.

gcc/testsuite/

	PR target/68937
	* gcc.target/i386/pr68937-1.c: New test.
	* gcc.target/i386/pr68937-2.c: Likewise.
	* gcc.target/i386/pr68937-3.c: Likewise.
	* gcc.target/i386/pr68937-4.c: Likewise.
	* gcc.target/i386/pr68937-5.c: Likewise.
---
 gcc/config/i386/i386.c|  4 ++-
 gcc/config/i386/i386.md   | 43 +++
 gcc/config/i386/predicates.md | 10 +++
 gcc/testsuite/gcc.target/i386/pr68937-1.c | 13 ++
 gcc/testsuite/gcc.target/i386/pr68937-2.c | 13 ++
 gcc/testsuite/gcc.target/i386/pr68937-3.c | 13 ++
 gcc/testsuite/gcc.target/i386/pr68937-4.c | 13 ++
 gcc/testsuite/gcc.target/i386/pr68937-5.c |  9 +++
 8 files changed, 111 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-5.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cecea24..0e2bec3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6723,8 +6723,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   /* If this call is indirect, we'll need to be able to use a
 	 call-clobbered register for the address of the target function.
 	 Make sure that all such registers are not used for passing
-	 parameters.  Note that DLLIMPORT functions are indirect.  */
+	 parameters.  Note that DLLIMPORT functions and call via GOT
+	 slot are indirect.  */
   if (!decl
+	  || (flag_pic && !flag_plt)
 	  || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
 	{
 	  /* Check if regparm >= 3 since arg_reg_available is set to
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 49b2216..7c62586 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11865,6 +11865,27 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
+;; Since sibcall never returns, we can only use call-clobbered register
+;; as GOT base.
+(define_insn "*sibcall_GOT_32"
+  [(call (mem:QI
+	   (mem:SI (plus:SI
+		 (match_operand:SI 0 "register_no_elim_operand" "U")
+		 (const:SI
+		   (unspec:SI [(match_operand:SI 1 "symbol_operand")]
+			UNSPEC_GOT)
+	 (match_operand 2))]
+  "!TARGET_MACHO && !TARGET_64BIT && SIBLING_CALL_P (insn)"
+{
+  rtx fnaddr = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, operands[1]),
+			   UNSPEC_GOT);
+  fnaddr = gen_rtx_CONST (Pmode, fnaddr);
+  fnaddr = gen_rtx_PLUS (Pmode, operands[0], fnaddr);
+  fnaddr = gen_const_mem (Pmode, fnaddr);
+  return ix86_output_call_insn (insn, fnaddr);
+}
+  [(set_attr "type" "call")])
+
 (define_insn "*sibcall"
   [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "UBsBz"))
 	 (match_operand 1))]
@@ -12042,6 +12063,28 @@
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
+;; Since sibcall never returns, we can only use call-clobbered register
+;; as GOT base.
+(define_insn "*sibcall_value_GOT_32"
+  [(set (match_operand 0)
+(call (mem:QI
+		(mem:SI (p

Re: [PATCH 5/5] Fix intransitive comparison in dr_group_sort_cmp

2015-12-17 Thread Richard Biener
On Thu, 17 Dec 2015, Yury Gribov wrote:

> On 12/17/2015 02:57 PM, Richard Biener wrote:
> > On Thu, 17 Dec 2015, Yury Gribov wrote:
> > 
> > > That's an interesting one. The original comparison function assumes that
> > > operand_equal_p(a,b) is true iff compare_tree(a, b) == 0.
> > > Unfortunately that's not true (functions are written by different
> > > authors).
> > > 
> > > This causes subtle violation of transitiveness.
> > > 
> > > I believe removing operand_equal_p should preserve the intended semantics
> > > (same approach taken in another comparison function in this file -
> > > comp_dr_with_seg_len_pair).
> > > 
> > > Cc-ing Cong Hou and Richard who are the authors.
> > 
> > I don't think the patch is good.  compare_tree really doesn't expect
> > equal elements (and it returning zero is bad or a bug).
> 
> Hm but that's how it's used in other comparator in this file
> (comp_dr_with_seg_len_pair).

But for sure

  switch (code)
{
/* For const values, we can just use hash values for comparisons.  */
case INTEGER_CST:
case REAL_CST:
case FIXED_CST:
case STRING_CST:
case COMPLEX_CST:
case VECTOR_CST:
  {
hashval_t h1 = iterative_hash_expr (t1, 0);
hashval_t h2 = iterative_hash_expr (t2, 0);
if (h1 != h2)
  return h1 < h2 ? -1 : 1;
break;
  }

doesn't detect inequality correctly (it assumes the hash is
collision-free).

Also note that operator== of dr_with_seg_len also uses
operand_equal_p (plus compare_tree).

IMHO compare_tree should be cleaned up with respect to what
trees we expect here (no REAL_CSTs for example) and properly
do comparisons.
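
A toy illustration of the collision problem, not compare_tree itself: the
sketch below uses a deliberately weak, hypothetical hash (toy_hash) and
plain strings instead of trees.  A hash-only ordering reports two different
values as equal whenever their hashes collide; adding a structural
tie-break is one way to avoid that.

```c
#include <assert.h>
#include <string.h>

/* Toy hash, chosen so that collisions are easy to produce: the sum of
   the bytes.  This is an assumption for illustration only.  */
static unsigned
toy_hash (const char *s)
{
  unsigned h = 0;
  while (*s)
    h += (unsigned char) *s++;
  return h;
}

/* Hash-only ordering: returns 0 for any two strings whose hashes
   collide, even if the strings differ.  */
static int
cmp_hash_only (const char *a, const char *b)
{
  unsigned h1 = toy_hash (a), h2 = toy_hash (b);
  if (h1 != h2)
    return h1 < h2 ? -1 : 1;
  return 0;
}

/* Same ordering, but falls back to a full comparison on equal hashes,
   so 0 is returned only for genuinely equal strings.  */
static int
cmp_hash_then_value (const char *a, const char *b)
{
  int r = cmp_hash_only (a, b);
  return r != 0 ? r : strcmp (a, b);
}
```

With this hash, "ab" and "ba" collide, so cmp_hash_only treats them as
equal while cmp_hash_then_value still orders them.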

> > But it's also
> > "lazy" in that it will return 0 when it hopes a further disambiguation
> > inside dr_group_sort_cmp on a different field will eventually lead to
> > a non-zero compare_tree.
> > 
> > So eventually if compare_tree returns zero we have to fall back to the
> > final disambiguator using gimple_uid.
> >
> > That said, I'd like to see the testcase where you observe an
> > intransitive comparison.
> 
> Let me dig through my debugging logs (I'll send detailed repro tomorrow).

Thanks.

Richard.


Re: [PATCH 1/5] Fix asymmetric comparison functions

2015-12-17 Thread Andrey Belevantsev

On 17.12.2015 15:13, Yury Gribov wrote:

On 12/17/2015 02:58 PM, Andrey Belevantsev wrote:

Hello,

On 17.12.2015 11:58, Yury Gribov wrote:

Some obvious symmetry fixes.

Cc-ing
* Andrey (Belevantsev) for bb_top_order_comparator


Here, as Jakub mentioned, we assume that the argument addresses will
never be equal,


The problem is that this is not guaranteed.


Well, if the consensus is that this is indeed the case, you're free to 
change both places as you suggest.


Yours,
Andrey




thus that would always be different basic blocks (the
comparator is used for providing a custom sort over loop body bbs) and
you don't need a return 0 there.  You can put there gcc_unreachable
instead as in ...


* Andrew (MacLeod) for compare_case_labels
* Andrew (Pinski) for resort_field_decl_cmp
* Diego for pair_cmp
* Geoff for resort_method_name_cmp
* Jakub for compare_case_labels
* Jason for method_name_cmp
* Richard for insert_phi_nodes_compare_var_infos, compare_case_labels
* Steven for cmp_v_in_regset_pool


... this case -- here gcc_unreachable () marks that we're sorting pool
pointers and their values are always different.  Please do not remove it.


Same here.

/Yury




Re: [PATCH 1/5] Fix asymmetric comparison functions

2015-12-17 Thread Yury Gribov

On 12/17/2015 03:25 PM, Richard Biener wrote:

On Thu, 17 Dec 2015, Yury Gribov wrote:


On 12/17/2015 02:59 PM, Richard Biener wrote:

On Thu, 17 Dec 2015, Yury Gribov wrote:


On 12/17/2015 02:41 PM, Richard Biener wrote:

On Thu, 17 Dec 2015, Yury Gribov wrote:


Some obvious symmetry fixes.

Cc-ing
* Andrey (Belevantsev) for bb_top_order_comparator
* Andrew (MacLeod) for compare_case_labels
* Andrew (Pinski) for resort_field_decl_cmp
* Diego for pair_cmp
* Geoff for resort_method_name_cmp
* Jakub for compare_case_labels
* Jason for method_name_cmp
* Richard for insert_phi_nodes_compare_var_infos, compare_case_labels
* Steven for cmp_v_in_regset_pool


So for compare_case_labels we only ever have one label with
!CASE_LOW - which means you only run into the case that needs
!CASE_LOW && !CASE_LOW if comparing an element with itself, correct?

In this case (missing "same element" handling rather than symmetry
fixing) I'd prefer a

if (case1 == case2)
  return 0;

So just to confirm - do the patches also contain same element
compare fixings?


Yes, that's a fix for same element.  How about adding if + gcc_assert that
both cases can't be NULL otherwise?


Well, does qsort require the compare function to result in zero
for same elements when the sequence to be sorted doesn't contain
duplicates?


Sure, that's part of total ordering requirement in standard.


If an assert works for you that hints at these places found via static
analysis rather than a runtime fuzzer?


Sorry, not sure I fully understood but - yes, adding assertion would typically
allow for better checking by static analyzers.


The question was if you actually observed the case to happen with a
testcase (and whatever mangled qsort implementation) or whether
it was a theoretical outcome computed by a static analyzer.

That is, whether you could hand me a testcase where it happens
or not.


Well, this was detected by calling the comparison function on (x, x) in a
qsort interceptor and checking that the return value is zero. So I guess
it's more of the "theoretical" sort.

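A minimal sketch of that kind of check, assuming a hypothetical helper
check_reflexive (the real interceptor is not shown in this thread): it
calls the comparator on each element paired with itself and reports the
first element for which the result is non-zero.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*cmp_fn) (const void *, const void *);

/* Hypothetical helper: return the index of the first element for which
   cmp (x, x) != 0, or n if the comparator is reflexive on the array.  */
static size_t
check_reflexive (const void *base, size_t n, size_t size, cmp_fn cmp)
{
  const char *p = (const char *) base;
  for (size_t i = 0; i < n; i++)
    if (cmp (p + i * size, p + i * size) != 0)
      return i;
  return n;
}

/* Comparator that assumes its two arguments are never equal, so it
   never returns 0.  */
static int
cmp_assumes_distinct (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return x < y ? -1 : 1;
}

/* Comparator that handles equal arguments.  */
static int
cmp_int (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}
```

On any non-empty int array, check_reflexive flags cmp_assumes_distinct at
index 0 and accepts cmp_int.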

/Yura


Re: [PATCH 5/5] Fix intransitive comparison in dr_group_sort_cmp

2015-12-17 Thread Yury Gribov

On 12/17/2015 02:57 PM, Richard Biener wrote:

On Thu, 17 Dec 2015, Yury Gribov wrote:


That's an interesting one. The original comparison function assumes that
operand_equal_p(a,b) is true iff compare_tree(a, b) == 0.
Unfortunately that's not true (functions are written by different authors).

This causes subtle violation of transitiveness.

I believe removing operand_equal_p should preserve the intended semantics
(same approach taken in another comparison function in this file -
comp_dr_with_seg_len_pair).

Cc-ing Cong Hou and Richard who are the authors.


I don't think the patch is good.  compare_tree really doesn't expect
equal elements (and it returning zero is bad or a bug).


Hm but that's how it's used in other comparator in this file 
(comp_dr_with_seg_len_pair).



But it's also
"lazy" in that it will return 0 when it hopes a further disambiguation
inside dr_group_sort_cmp on a different field will eventually lead to
a non-zero compare_tree.

So eventually if compare_tree returns zero we have to fall back to the
final disambiguator using gimple_uid.

That said, I'd like to see the testcase where you observe an
intransitive comparison.


Let me dig through my debugging logs (I'll send detailed repro tomorrow).

/Yura



[PATCH] Fix PR68951

2015-12-17 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-12-17  Richard Biener  

PR tree-optimization/68951
* tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
Ignore strided non-group accesses.

* gcc.dg/torture/pr68951.c: New testcase.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 231745)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -1215,6 +1215,12 @@ vect_peeling_hash_get_lowest_cost (_vect
   && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
 continue;
 
+  /* Strided accesses perform only component accesses, alignment is
+ irrelevant for them.  */
+  if (STMT_VINFO_STRIDED_P (stmt_info)
+ && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
+   continue;
+
   save_misalignment = DR_MISALIGNMENT (dr);
   vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
   vect_get_data_access_cost (dr, &inside_cost, &outside_cost,
Index: gcc/testsuite/gcc.dg/torture/pr68951.c
===
--- gcc/testsuite/gcc.dg/torture/pr68951.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr68951.c  (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+static int g_534[1][1];
+int fn1()
+{
+  int i;
+  for (i = 0; i < 4; i++)
+g_534[i + 2][i] ^= 3;
+  for (;;)
+;
+}


Re: [PATCH 1/5] Fix asymmetric comparison functions

2015-12-17 Thread Richard Biener
On Thu, 17 Dec 2015, Yury Gribov wrote:

> On 12/17/2015 02:59 PM, Richard Biener wrote:
> > On Thu, 17 Dec 2015, Yury Gribov wrote:
> > 
> > > On 12/17/2015 02:41 PM, Richard Biener wrote:
> > > > On Thu, 17 Dec 2015, Yury Gribov wrote:
> > > > 
> > > > > Some obvious symmetry fixes.
> > > > > 
> > > > > Cc-ing
> > > > > * Andrey (Belevantsev) for bb_top_order_comparator
> > > > > * Andrew (MacLeod) for compare_case_labels
> > > > > * Andrew (Pinski) for resort_field_decl_cmp
> > > > > * Diego for pair_cmp
> > > > > * Geoff for resort_method_name_cmp
> > > > > * Jakub for compare_case_labels
> > > > > * Jason for method_name_cmp
> > > > > * Richard for insert_phi_nodes_compare_var_infos, compare_case_labels
> > > > > * Steven for cmp_v_in_regset_pool
> > > > 
> > > > So for compare_case_labels we only ever have one label with
> > > > !CASE_LOW - which means you only run into the case that needs
> > > > !CASE_LOW && !CASE_LOW if comparing an element with itself, correct?
> > > > 
> > > > In this case (missing "same element" handling rather than symmetry
> > > > fixing) I'd prefer a
> > > > 
> > > >if (case1 == case2)
> > > >  return 0;
> > > > 
> > > > So just to confirm - do the patches also contain same element
> > > > compare fixings?
> > > 
> > > Yes, that's a fix for same element.  How about adding if + gcc_assert that
> > > both cases can't be NULL otherwise?
> > 
> > Well, does qsort require the compare function to result in zero
> > for same elements when the sequence to be sorted doesn't contain
> > duplicates?
> 
> Sure, that's part of total ordering requirement in standard.
> 
> > If an assert works for you that hints at these places found via static
> > analysis rather than a runtime fuzzer?
> 
> Sorry, not sure I fully understood but - yes, adding assertion would typically
> allow for better checking by static analyzers.

The question was if you actually observed the case to happen with a
testcase (and whatever mangled qsort implementation) or whether
it was a theoretical outcome computed by a static analyzer.

That is, whether you could hand me a testcase where it happens
or not.

Richard.


Re: [Fortran, Patch] (RFC, Coarray) Implement TS18508's EVENTS

2015-12-17 Thread Alessandro Fanfarillo
Hi,

I've noticed that this patch has been applied only on trunk and not on
the gcc-5-branch. Is it a problem to include EVENTS in gcc-5?

2015-12-02 23:00 GMT+01:00 Steve Kargl :
> Committed as revision 231208.
>
> Alessandro, Tobias, is this a candidate for a commit to
> the 5-branch when it is re-opened?
>
> --
> steve
>
> On Wed, Dec 02, 2015 at 03:16:05PM +0100, Alessandro Fanfarillo wrote:
>> *PING*
>>
>> 2015-11-26 17:51 GMT+01:00 Steve Kargl :
>> > On Wed, Nov 25, 2015 at 06:24:49PM +0100, Alessandro Fanfarillo wrote:
>> >> Dear all,
>> >>
>> >> in attachment the previous patch compatible with the current trunk.
>> >> The patch also includes the changes introduced in the latest TS 18508.
>> >>
>> >> Built and regtested on x86_64-pc-linux-gnu.
>> >>
>> >> PS: I will add the test cases in a different patch.
>> >>
>> >
>> > I have now built and regression tested the patch on
>> > x86_64-*-freebsd and i386-*-freebsd.  There were no
>> > regressions.  In reading through the patch, nothing
>> > jumped out at me as suspicious/wrong.  Tobias, this
>> > is OK to commit.  If you haven't committed it by Sunday,
>> > I'll do it for you.
>> >
>> > --
>> > steve
>
> --
> Steve


Re: [PATCH 1/5] Fix asymmetric comparison functions

2015-12-17 Thread Yury Gribov

On 12/17/2015 02:59 PM, Richard Biener wrote:

On Thu, 17 Dec 2015, Yury Gribov wrote:


On 12/17/2015 02:41 PM, Richard Biener wrote:

On Thu, 17 Dec 2015, Yury Gribov wrote:


Some obvious symmetry fixes.

Cc-ing
* Andrey (Belevantsev) for bb_top_order_comparator
* Andrew (MacLeod) for compare_case_labels
* Andrew (Pinski) for resort_field_decl_cmp
* Diego for pair_cmp
* Geoff for resort_method_name_cmp
* Jakub for compare_case_labels
* Jason for method_name_cmp
* Richard for insert_phi_nodes_compare_var_infos, compare_case_labels
* Steven for cmp_v_in_regset_pool


So for compare_case_labels we only ever have one label with
!CASE_LOW - which means you only run into the case that needs
!CASE_LOW && !CASE_LOW if comparing an element with itself, correct?

In this case (missing "same element" handling rather than symmetry
fixing) I'd prefer a

   if (case1 == case2)
 return 0;

So just to confirm - do the patches also contain same element
compare fixings?


Yes, that's a fix for same element.  How about adding if + gcc_assert that
both cases can't be NULL otherwise?


Well, does qsort require the compare function to result in zero
for same elements when the sequence to be sorted doesn't contain
duplicates?


Sure, that's part of total ordering requirement in standard.


If an assert works for you that hints at these places found via static
analysis rather than a runtime fuzzer?


Sorry, not sure I fully understood but - yes, adding assertion would 
typically allow for better checking by static analyzers.


/Yura



Re: [PATCH 1/5] Fix asymmetric comparison functions

2015-12-17 Thread Yury Gribov

On 12/17/2015 02:58 PM, Andrey Belevantsev wrote:

Hello,

On 17.12.2015 11:58, Yury Gribov wrote:

Some obvious symmetry fixes.

Cc-ing
* Andrey (Belevantsev) for bb_top_order_comparator


Here, as Jakub mentioned, we assume that the argument addresses will
never be equal,


The problem is that this is not guaranteed.


thus that would always be different basic blocks (the
comparator is used for providing a custom sort over loop body bbs) and
you don't need a return 0 there.  You can put there gcc_unreachable
instead as in ...


* Andrew (MacLeod) for compare_case_labels
* Andrew (Pinski) for resort_field_decl_cmp
* Diego for pair_cmp
* Geoff for resort_method_name_cmp
* Jakub for compare_case_labels
* Jason for method_name_cmp
* Richard for insert_phi_nodes_compare_var_infos, compare_case_labels
* Steven for cmp_v_in_regset_pool


... this case -- here gcc_unreachable () marks that we're sorting pool
pointers and their values are always different.  Please do not remove it.


Same here.

/Yury


[RFC] Use gfc_decl_attributes in fortran frontend

2015-12-17 Thread Tom de Vries

Hi,

Consider this patch, which reduces max_len of the oacc function 
attribute to 0:

...
diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 8556b70..60f4ad3 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -93,7 +93,7 @@ static const struct attribute_spec gfc_attribute_table[] =
affects_type_identity } */
   { "omp declare target", 0, 0, true,  false, false,
 gfc_handle_omp_declare_target_attribute, false },
-  { "oacc function", 0, -1, true,  false, false,
+  { "oacc function", 0, 0, true,  false, false,
 gfc_handle_omp_declare_target_attribute, false },
   { NULL,0, 0, false, false, false, NULL, false }
 };
...

The patch is obviously incorrect, but the idea here is to try to trigger 
this error in decl_attributes:

...
  else if (list_length (args) < spec->min_length
   || (spec->max_length >= 0
   && list_length (args) > spec->max_length))
{
  error ("wrong number of arguments specified for %qE"
 " attribute",
 name);
...

When running goacc.exp=routine-4.f90, we trigger the error, but then run 
into an assert.


The assert is caused by the fact that %qE is not handled by the fortran 
format decoder gfc_format_decoder, so this assert triggers in pp_format:

...
ok = pp_format_decoder (pp) (pp, text, p,
 precision, wide, plus, hash);
gcc_assert (ok);
...


So, it seems that we call decl_attributes from the fortran frontend 
without installing a format decoder that can handle any potential errors.


This patch attempts to fix that, but having little experience in both 
diagnostics and fortran frontend, I'm not sure if this is the right way.
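
The idea can be sketched outside GCC as follows.  All names here
(decoder_fn, current_decoder, call_with_tree_decoder and the two toy
decoders) are hypothetical stand-ins, not GCC APIs; unlike the patch, which
switches back to the Fortran decoder unconditionally, the sketch saves and
restores whatever decoder was installed before the call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical decoder type: returns nonzero if it handled the spec.  */
typedef int (*decoder_fn) (const char *spec);

/* Toy decoders: each one understands exactly one format spec.  */
static int
fortran_decoder (const char *spec)
{
  return strcmp (spec, "%L") == 0;
}

static int
tree_decoder (const char *spec)
{
  return strcmp (spec, "%qE") == 0;
}

/* Stand-in for the globally installed format decoder.  */
static decoder_fn current_decoder = fortran_decoder;

/* Stand-in for shared code that emits a diagnostic using %qE.  Returns
   nonzero if the installed decoder could handle it.  */
static int
shared_code_reports_error (void)
{
  return current_decoder ("%qE");
}

/* Install the tree decoder around a call into shared code, then put
   the previous decoder back.  */
static int
call_with_tree_decoder (int (*fn) (void))
{
  decoder_fn saved = current_decoder;
  current_decoder = tree_decoder;
  int res = fn ();
  current_decoder = saved;
  return res;
}
```

Called directly, shared_code_reports_error fails to decode %qE; called
through call_with_tree_decoder it succeeds, and the original decoder is
back in place afterwards.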


After applying the patch, the assert is fixed and we can see the actual 
error without having to start up the debugger:

...
src/gcc/testsuite/gfortran.dg/goacc/routine-4.f90:121:0: Error: wrong 
number of arguments specified for ‘oacc function’ attribute

...

Thanks,
- Tom
Use gfc_decl_attributes in fortran frontend

---
 gcc/fortran/error.c  | 18 --
 gcc/fortran/gfortran.h   |  2 ++
 gcc/fortran/trans-decl.c | 18 ++
 3 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/gcc/fortran/error.c b/gcc/fortran/error.c
index 8f57aff..fd66d75 100644
--- a/gcc/fortran/error.c
+++ b/gcc/fortran/error.c
@@ -1417,11 +1417,18 @@ gfc_errors_to_warnings (bool f)
 }
 
 void
-gfc_diagnostics_init (void)
+gfc_diagnostics_fortran (void)
 {
   diagnostic_starter (global_dc) = gfc_diagnostic_starter;
   diagnostic_finalizer (global_dc) = gfc_diagnostic_finalizer;
   diagnostic_format_decoder (global_dc) = gfc_format_decoder;
+}
+
+void
+gfc_diagnostics_init (void)
+{
+  gfc_diagnostics_fortran ();
+
   global_dc->caret_chars[0] = '1';
   global_dc->caret_chars[1] = '2';
   pp_warning_buffer = new (XNEW (output_buffer)) output_buffer ();
@@ -1433,13 +1440,20 @@ gfc_diagnostics_init (void)
 }
 
 void
-gfc_diagnostics_finish (void)
+gfc_diagnostics_tree (void)
 {
   tree_diagnostics_defaults (global_dc);
   /* We still want to use the gfc starter and finalizer, not the tree
  defaults.  */
   diagnostic_starter (global_dc) = gfc_diagnostic_starter;
   diagnostic_finalizer (global_dc) = gfc_diagnostic_finalizer;
+}
+
+void
+gfc_diagnostics_finish (void)
+{
+  gfc_diagnostics_tree ();
+
   global_dc->caret_chars[0] = '^';
   global_dc->caret_chars[1] = '^';
 }
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index d203c32..1f7cdc2 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2687,6 +2687,8 @@ bool gfc_find_sym_in_expr (gfc_symbol *, gfc_expr *);
 void gfc_error_init_1 (void);
 void gfc_diagnostics_init (void);
 void gfc_diagnostics_finish (void);
+void gfc_diagnostics_fortran (void);
+void gfc_diagnostics_tree (void);
 void gfc_buffer_error (bool);
 
 const char *gfc_print_wide_char (gfc_char_t);
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 8c4fa03..9ed1d07 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -1326,6 +1326,16 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
 }
 
 
+static tree
+gfc_decl_attributes (tree *node, tree attributes, int flags)
+{
+  tree res;
+  gfc_diagnostics_tree ();
+  res = decl_attributes (node, attributes, flags);
+  gfc_diagnostics_fortran ();
+  return res;
+}
+
 static void build_function_decl (gfc_symbol * sym, bool global);
 
 
@@ -1567,7 +1577,7 @@ gfc_get_symbol_decl (gfc_symbol * sym)
 
   /* Add attributes to variables.  Functions are handled elsewhere.  */
   attributes = add_attributes_to_decl (sym->attr, NULL_TREE);
-  decl_attributes (&decl, attributes, 0);
+  gfc_decl_attributes (&decl, attributes, 0);
 
   /* Symbols from modules should have their assembler names mangled.
  This is done here rather than in gfc_finish_var_decl because it
@@ -1802,7 +1812,7 @@ get_proc_pointer_decl (gfc_symbol *sym)
 set

Re: [PATCH 1/5] Fix asymmetric comparison functions

2015-12-17 Thread Yury Gribov

On 12/17/2015 02:39 PM, Jakub Jelinek wrote:

On Thu, Dec 17, 2015 at 11:58:30AM +0300, Yury Gribov wrote:

2015-12-17  Yury Gribov  

* c-family/c-common.c (resort_field_decl_cmp):
Make symmteric.
* cp/class.c (method_name_cmp): Ditto.
(resort_method_name_cmp): Ditto.
* fortran/interface.c (pair_cmp): Ditto.


Note, c-family, cp and fortran have their own ChangeLog files, so
the entries without those prefixes need to go into each one and can't
refer to other ChangeLog through Ditto/Likewise etc.
Typo in symmteric.


Right, thanks.


That said, is this actually really a problem?  I mean, is qsort
allowed to call the comparison function with the same arguments?
I think lots of the comparison functions just assume that
for int cmpfn (const void *x, const void *y) x != y.
And if qsort can't call the comparison function with the same argument,
then perhaps the caller has some knowledge your checker does not, say
that the entries that would compare equal by the comparison function
simply can't appear in the array (so the caller knows that the comparison
function should never return 0).


Self-comparisons are certainly less dangerous than intransitive ones. I am
personally not aware of any libc whose qsort compares an element to itself.


However
* comparing an element to itself is still a valid thing for qsort to do
* most other comparison functions in GCC support this


--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5882,7 +5882,9 @@ compare_case_labels (const void *p1, const void *p2)
else if (idx1 == idx2)
  {
/* Make sure the default label is first in a group.  */
-  if (!CASE_LOW (ci1->expr))
+  if (!CASE_LOW (ci1->expr) && !CASE_LOW (ci2->expr))
+   return 0;
+  else if (!CASE_LOW (ci1->expr))
return -1;
else if (!CASE_LOW (ci2->expr))
return 1;
--
1.9.1


Say here, we know there is at most one default label in a switch, never
more.  So, unless qsort is allowed to call compare_case_labels
with p1 == p2 (which really doesn't make sense), this case just won't
happen.
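
For reference, the shape of the patched comparison can be sketched on a
simplified stand-in.  The struct label type and compare_labels below are
hypothetical, not the tree-vrp.c code: a NULL low pointer plays the role of
a missing CASE_LOW, i.e. the default label.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a case label: low == NULL marks the
   default label, idx is the group the label belongs to.  */
struct label
{
  int idx;
  const int *low;
};

static int
compare_labels (const void *p1, const void *p2)
{
  const struct label *a = (const struct label *) p1;
  const struct label *b = (const struct label *) p2;

  if (a->idx != b->idx)
    return a->idx < b->idx ? -1 : 1;

  /* Make sure the default label is first in a group, and return 0
     when both sides are the default label (self-comparison).  */
  if (!a->low && !b->low)
    return 0;
  else if (!a->low)
    return -1;
  else if (!b->low)
    return 1;

  return (*a->low > *b->low) - (*a->low < *b->low);
}
```

With at most one default label per group, the first branch is only reached
when an element is compared with itself, which is exactly the case under
discussion.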



