date:20170926

[PR middle-end/82319] Fix ICE in pattern

2017-09-26 Thread Yuri Gribov

Hi all,

This patch fixes a trivial ICE in a recently added pattern.  Bootstrapped and
regtested on x86_64.

Ok to commit?

-Y


pr82319-1.patch
Description: Binary data

Re: 0005-Part-5.-Add-x86-CET-documentation

2017-09-26 Thread Sandra Loosemore


On 09/26/2017 07:47 AM, Tsimbalist, Igor V wrote:

Here is a new version of the patch.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index a374890..a900ed1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5655,6 +5655,13 @@ compiled with the @option{-fcf-protection=branch} 
option.  The
 compiler assumes that the function's address is a valid target for a
 control-flow transfer.

+@emph{x86 implementation:} when @option{-fcf-protection} option is
+specified the compiler inserts an ENDBR instruction at function's
+prologue if the function's type does not have the @code{nocf_check}
+attribute and addresses to which indirect control-flow transfer can
+happen.  The instruction triggers the HW check if a control-flow
+transfer to the address of ENDBR instruction is valid.


Implementation details like this should be comments in the code, not 
included in the user-facing documentation.



@@ -5662,7 +5669,8 @@ not be instrumented when compiled with the
 that the function's address from the pointer is a valid target for
 a control-flow transfer.  A direct function call through a function
 name is assumed to be a safe call thus direct calls are not
-instrumented by the compiler.
+instrumented by the compiler.  For @emph{x86 implementation} the
+compiler inserts a NOTRACK prefix before an indirect call instruction.


Likewise here.


@@ -21217,6 +21225,25 @@ void __builtin_ia32_wrpkru (unsigned int)
 unsigned int __builtin_ia32_rdpkru ()
 @end smallexample

+The following built-in functions are available when @option{-mcet} is used.
+They are used to support Intel Control-flow Enforcment Technology (CET).
+Each built-in function generate a machine instruction that is part of the


s/generate a/generates the/


@@ -11378,6 +11379,20 @@ You can also use the @code{nocf_check} attribute to 
identify
 which functions and calls should be skipped from instrumentation
 (@pxref{Function Attributes}).

+Currently x86 GNU/Linux target provides an implementation based on


s/x86/the x86/


+Intel Control-flow Enforcement Technology (CET), thus @option{-mcet}


s/@option/the @option/


+option is required to enable this feature.


I think you should put a cross-reference to the x86 options node here, 
and move all the following x86-specific discussion to that section.



In order to get an
+application to be CET compatible the x86 implementation requires
+all object files have to be compiled with
+@option{-fcf-protection} option and all linked in libraries have
+to be CET compatible.


I'm having difficulty parsing this.  What does "CET compatible" mean? 
Is this an ABI compatibility issue, so that all objects linked into the 
executable have to be compiled with the (same?) @option{-fcf-protection} 
option if any of them do?  Or do you just lose checking on code in 
uninstrumented objects?



+Instrumentation for x86 is controlled by target specific options


hyphenate target-specific here


+@option{-mcet}, @option{-mibt} and @option{-mshstk}. The compiler
+also provides a number of built-in functions for fine-grained control
+of CET-based implementation.  See @xref{x86 Built-in Functions},
+for more information.
+
 @item -fstack-protector
 @opindex fstack-protector
 Emit extra code to check for buffer overflows, such as stack smashing
@@ -25755,15 +25770,19 @@ preferred alignment to 
@option{-mpreferred-stack-boundary=2}.
 @need 200
 @itemx -mclzero
 @opindex mclzero
+@need 200
 @itemx -mpku
 @opindex mpku
+@need 200
+@itemx -mcet
+@opindex mcet
 These switches enable the use of instructions in the MMX, SSE,
 SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD,
 SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM,
 AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA AVX512VBMI, BMI, BMI2, FXSR,
-XSAVE, XSAVEOPT, LZCNT, RTM, MPX, MWAITX, PKU, 3DNow!@: or enhanced 3DNow!@:
-extended instruction sets.  Each has a corresponding @option{-mno-} option
-to disable use of these instructions.
+XSAVE, XSAVEOPT, LZCNT, RTM, MPX, MWAITX, PKU, IBT, SHSTK,
+3DNow!@: or enhanced 3DNow!@: extended instruction sets.  Each has a
+corresponding @option{-mno-} option to disable use of these instructions.

 These extensions are also available as built-in functions: see
 @ref{x86 Built-in Functions}, for details of the functions enabled and
@@ -25783,6 +25802,11 @@ supported architecture, using the appropriate flags.  
In particular,
 the file containing the CPU detection code should be compiled without
 these options.

+The @option{-mcet} option turns on @option{-mibt} and @option{-mshstk}


s/turns on/turns on the/


+options.  @option{-mibt} option enables idirect branch tracking support


s/@option/The @option/
s/idirect/indirect/


+and @option{-mshstk} option enables shadow stack support from


s/@option/the @option/


+Intel Control-flow Enforcement Technology (CET).
+
 @item -mdump-tune-features
 @opindex mdump-tune-features
 This option instructs GCC to dump the names of the x86

Re: 0002-Part-2.-Document-finstrument-control-flow-and-notrack attribute

2017-09-26 Thread Sandra Loosemore


On 09/26/2017 07:45 AM, Tsimbalist, Igor V wrote:

Here is the updated version (version#3). All comments below are fixed.


This still needs more work.  Specific comments below:


+The @code{nocf_check} attribute is applied to an object's type.
+In case of assignment of a function address or a function pointer to
+another pointer, the attribute is not carried over from the right-hand
+object's type, the type of left-hand object stays unchanged.  The


s/object's type,/object's type;/


@@ -11348,6 +11349,35 @@ is used to link a program, the GCC driver 
automatically links
 against @file{libmpxwrappers}.  See also @option{-static-libmpxwrappers}.
 Enabled by default.

+@item -fcf-protection==@r{[}full@r{|}branch@r{|}return@r{|}none@r{]}
+@opindex fcf-protection
+Enable code instrumentation of control-flow transfers to increase
+program security by checking that target addresses of control-flow
+transfer instructions (such as indirect function call, function return,
+indirect jump) are valid.  This prevents diverting the control
+flow instructions from its original target address to a new undesigned


s/control flow instructions/control-flow instructions/

I'd rewrite the next sentence as

This prevents diverting the flow of control to an unexpected target.


+target.  This is intended to protect against such threats as
+Return-oriented Programming (ROP), and similarly call/jmp-oriented
+programming (COP/JOP).
+
+Each compiler target, which is going to support the control-flow
+instrumentation, is supposed to have its own target specific
+implementation. For all targets where an implementation is absent the
+usage of @option{-fcf-protection} option causes an error message.


I would really prefer that you list the targets this works on here instead.


+The value @code{branch} tells the compiler to implement checking of
+validity of control-flow transfer at the point of indirect branch
+instructions, i.e. call/jmp instructions.  The value @code{return}
+implements checking of validity at the point of returning from a
+function.  The value @code{full} is an alias for specifying both
+@code{branch} and @code{return}. The value @code{none} turns off
+instrumentation.  This value may be used for future architectures
+where @option{-fcf-protection} option is switched on by default.


I don't think we need to document GCC's future behavior for future 
architectures (I'm always going around removing useless discussion from 
20 years ago of possible extensions that never got implemented).  I 
assume that this is just provided for completeness and to override a 
previous -fcf-protection option on the command line.



+You can also use the @code{nocf_check} attribute to identify
+which functions and calls should be skipped from instrumentation
+(@pxref{Function Attributes}).
+
 @item -fstack-protector
 @opindex fstack-protector
 Emit extra code to check for buffer overflows, such as stack smashing
diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 12355c2..b4fc5f3 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -4040,6 +4040,22 @@ is used in place of the actual insn pattern.  This is 
done in cases where
 the pattern is either complex or misleading.
 @end table

+The note @code{REG_CALL_NOCF_CHECK} is used in conjunction with the
+@option{-fcf-protection=branch} option.  The note is set if a
+@code{nocf_check} attribute is specified for a function type or a
+pointer to function type.  The note is stored in the @code{REG_NOTES}
+field of an insn.
+
+@table @code
+@findex REG_CALL_NOCF_CHECK
+@item REG_CALL_NOCF_CHECK
+A user has a control through the @code{nocf_check} attribute to identify


s/A user has a control/Users have control/


+which call to a function should be skipped from control-flow instrumentation


s/call/calls/


+when the option @option{-fcf-protection=branch} is specified.  The compiler
+puts a @code{REG_CALL_NOCF_CHECK} note on @code{CALL_INSN} instruction,
+which has a function type marked with a @code{nocf_check} attribute.


s/@code{CALL_INSN} instruction, which/each @code{CALL_INSN} instruction 
that/


-Sandra

Re: [RFC] propagate malloc attribute in ipa-pure-const pass

2017-09-26 Thread Prathamesh Kulkarni

On 25 September 2017 at 17:24, Jan Hubicka  wrote:
>> Hi Honza,
>> Could you please have a look at this patch ?
>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html
>
> I can, and I should have done so a long time ago. I really apologize for the
> slow response, and I will try to be more timely from now on. The reason was
> that I had some patches that I was thinking I would like to push out first,
> but I guess since they are still not ready it is better to go the other way
> around.
No worries, and thanks for the feedback!
>
> +/* A map from node to subset of callees. The subset contains those callees
> + * whose return-value is returned by the node. */
> +static hash_map< cgraph_node *, vec* > *return_callees_map;
>
> Extra * at the beginning of the line.  It would make more sense to put those
> and the other bits into function_summary rather than using the hooks,
> but that is something we can do incrementally.
>
> I wonder what happens here when, say, ipa-icf redirects the call to an equivalent
> function and removes the callee?  Perhaps we really want to have a set of call
> sites rather than nodes stored from analysis to execution. Call sites have
> unique stmts and uids, so it will be possible to map them back and forth.
IIUC, a call site is represented using cgraph_edge?
So change return_callees_map to be the mapping from a node to the subset of
its call sites where the node returns the value of one of its callees:
static hash_map< cgraph_node *, vec *> *return_callees_map; ?
>
> +static bool
> +check_retval_uses (tree retval, gimple *stmt)
> +{
>
> There is a missing toplevel comment on those.
>
> +/*
> + * Currently this function does a very conservative analysis to check if
> + * function could be a malloc candidate.
> + *
> + * The function is considered to be a candidate if
> + * 1) The function returns a value of pointer type.
> + * 2) SSA_NAME_DEF_STMT (return_value) is either a function call or
> + *a phi, and element of phi is either NULL or
> + *SSA_NAME_DEF_STMT(element) is function call.
> + * 3) The return-value has immediate uses only within comparisons (gcond or 
> gassign)
> + *and return_stmt (and likewise a phi arg has immediate use only within 
> comparison
> + *or the phi stmt).
> + */
>
> Now * at the beginning of lines. Theoretically, by the coding standards, the
> comment should start with a description of what the function does and what
> the parameters are.
> I believe Richi already commented on this part - which is more of his domain,
> but it seems fine to me.
>
> Perhaps with the -details dump it would be nice to dump the reason why the
> malloc candidate was rejected.
>
> +DEBUG_FUNCTION
> +static void
> +dump_malloc_lattice (FILE *dump_file, const char *s)
>
> +static void
> +propagate_malloc (void)
>
> For coding standards, please add block comments.
Thanks for the suggestions, I will try to address them in the next
version of the patch.

Regards,
Prathamesh
>
> With these changes the patch looks good to me!
> Honza
>
>>
>> I tested it with SPEC2006 on AArch64 Cortex-a57 processor and saw some
>> improvement for
>> 433.milc (+1.79%), 437.leslie3d (+2.84%) and 470.lbm (+4%) and not
>> much differences for other benchmarks.
>> I don't expect them to be precise though, it was run with only one
>> iteration of SPEC.
>> Thanks!
>>
>> Regards,
>> Prathamesh
>> >
>> > Thanks,
>> > Prathamesh
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >>>
>> >>> Thanks,
>> >>> Prathamesh
>> 
>>  Thanks,
>>  Prathamesh
>> >
>> > Regards,
>> > Prathamesh
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >>>
>> >>> Honza
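
To make the candidate criteria quoted in the comment above more concrete, here is a small sketch of user-level functions that seem to fit, or not fit, those three rules. The function names are invented, and whether the pass actually accepts or rejects each one is an assumption based only on the rules as quoted, not on the patch itself.

```c
#include <stdlib.h>

/* Looks like a candidate: returns a pointer, the value comes directly
   from a call, and its only other use is a comparison.  */
void *
xalloc_or_abort (size_t n)
{
  void *p = malloc (n);
  if (p == NULL)
    abort ();
  return p;
}

/* Also looks like a candidate: the returned value is a merge (a phi in
   SSA form) whose arguments are both results of function calls.  */
void *
alloc_maybe_zeroed (size_t n, int zeroed)
{
  void *p;
  if (zeroed)
    p = calloc (1, n);
  else
    p = malloc (n);
  return p;
}

/* Probably not a candidate under rule 3: the pointer is also stored
   into a global, which is a use other than a comparison or the
   return statement.  */
static void *last_block;

void *
alloc_and_remember (size_t n)
{
  void *p = malloc (n);
  last_block = p;
  return p;
}
```

The third function shows why the analysis is conservative: once the pointer is stored elsewhere, the caller's result may alias other memory, so treating it like a fresh allocation would be unsafe.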

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #7 of 8

2017-09-26 Thread Michael Meissner

On Tue, Sep 26, 2017 at 04:56:54PM -0500, Segher Boessenkool wrote:
> On Tue, Sep 26, 2017 at 10:48:29AM -0400, Michael Meissner wrote:
> > * config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
> > Adjust code to eliminate needing to do the shift right 32-bits
> > operation after XSCVDPSPN.
> 
> After staring at this way too long...  Looks correct.  What a monster :-)
> 
> Okay for trunk.  Thanks!

Thanks for taking the time to verify it.

Yeah, it is a monster to get right.  It would be nice to put this off to a
separate MD pass, instead of abusing peephole2's.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[committed] Fix PR 39570 - cabs/cabsf named differently on NetBSD

2017-09-26 Thread Krister Walfridsson


I have committed the attached patch to fix PR 39570.

The problem is that the NetBSD cabs/cabsf/cabsl functions are called
__c99_cabs etc., as NetBSD needed to change the ABI before it had symbol
versioning. This is handled in the system header file as

   double cabs(double _Complex) __asm("__c99_cabs");

but __builtin_cabs still generates a call to cabs (which fails much of the
Fortran testsuite).


I have fixed this by using SUBTARGET_INIT_BUILTINS in the same way that
Darwin solves a similar problem.
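
As a rough illustration of the header-level renaming described above, the sketch below uses an asm label to bind a source-level name to a differently named symbol. All identifiers here (demo_impl, demo_abs2, demo_c99_hypot2) are made up; the sketch only mirrors the mechanism of the quoted cabs declaration and does not use NetBSD's real headers. The patch does the equivalent inside the compiler, by calling set_user_assembler_name on the builtin decls.

```c
/* Stand-in for the library's implementation.  The asm label on the
   prior declaration makes the function below be emitted under the
   made-up symbol name demo_c99_hypot2.  */
double demo_impl (double re, double im) __asm__ ("demo_c99_hypot2");

double
demo_impl (double re, double im)
{
  /* Squared magnitude, to keep the arithmetic exact in the example.  */
  return re * re + im * im;
}

/* What a system header might do: code that calls demo_abs2 ends up
   referencing the symbol demo_c99_hypot2 instead.  */
double demo_abs2 (double re, double im) __asm__ ("demo_c99_hypot2");
```

A call such as demo_abs2 (3.0, 4.0) then reaches demo_impl, which is the behaviour the builtins lacked before this fix: the header renamed cabs, but __builtin_cabs still referenced the old symbol.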



Bootstrapped and tested on i386-unknown-netbsdelf6.1 and
x86_64-unknown-netbsd6.1.

 /Krister


2017-09-26  Krister Walfridsson  

PR target/39570
* gcc/config/netbsd-protos.h: New file.
* gcc/config/netbsd.c: New file.
* gcc/config/netbsd.h (SUBTARGET_INIT_BUILTINS): Define.
* gcc/config/t-netbsd: New file.
* gcc/config.gcc (tm_p_file): Add netbsd-protos.h.
(tmake_file) Add t-netbsd.
(extra_objs) Add netbsd.o.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 253215)
+++ gcc/config.gcc  (revision 253216)
@@ -792,7 +792,9 @@
   target_has_targetcm=yes
   ;;
 *-*-netbsd*)
-  tmake_file="t-slibgcc"
+  tm_p_file="${tm_p_file} netbsd-protos.h"
+  tmake_file="t-netbsd t-slibgcc"
+  extra_objs="${extra_objs} netbsd.o"
   gas=yes
   gnu_ld=yes
   use_gcc_stdint=wrap
Index: gcc/config/t-netbsd
===
--- gcc/config/t-netbsd (nonexistent)
+++ gcc/config/t-netbsd (revision 253216)
@@ -0,0 +1,21 @@
+# Copyright (C) 2017 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+netbsd.o: $(srcdir)/config/netbsd.c
+   $(COMPILE) $<
+   $(POSTCOMPILE)
Index: gcc/config/netbsd.c
===
--- gcc/config/netbsd.c (nonexistent)
+++ gcc/config/netbsd.c (revision 253216)
@@ -0,0 +1,54 @@
+/* Functions for generic NetBSD as target machine for GNU C compiler.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "varasm.h"
+#include "netbsd-protos.h"
+
+static void
+netbsd_patch_builtin (enum built_in_function fncode)
+{
+  tree fn = builtin_decl_explicit (fncode);
+  tree sym;
+  char *newname;
+
+  if (!fn)
+return;
+
+  sym = DECL_ASSEMBLER_NAME (fn);
+  newname = ACONCAT (("__c99_", IDENTIFIER_POINTER (sym), NULL));
+
+  set_user_assembler_name (fn, newname);
+
+  fn = builtin_decl_implicit (fncode);
+  if (fn)
+set_user_assembler_name (fn, newname);
+}
+
+void
+netbsd_patch_builtins (void)
+{
+  netbsd_patch_builtin (BUILT_IN_CABSF);
+  netbsd_patch_builtin (BUILT_IN_CABS);
+  netbsd_patch_builtin (BUILT_IN_CABSL);
+}
Index: gcc/config/netbsd.h
===
--- gcc/config/netbsd.h (revision 253215)
+++ gcc/config/netbsd.h (revision 253216)
@@ -164,3 +164,9 @@
 
 #undef WINT_TYPE
 #define WINT_TYPE "int"
+
+#undef  SUBTARGET_INIT_BUILTINS
+#define SUBTARGET_INIT_BUILTINS \
+  do { \
+netbsd_patch_builtins ();  \
+  } while(0)
Index: gcc/config/netbsd-protos.h
===
--- gcc/config/netbsd-protos.h  (nonexistent)
+++ gcc/config/netbsd-protos.h  (revision 253216)
@@ -0,0 +1,20 @@
+/* Prototypes.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #7 of 8

2017-09-26 Thread Segher Boessenkool

On Tue, Sep 26, 2017 at 10:48:29AM -0400, Michael Meissner wrote:
>   * config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
>   Adjust code to eliminate needing to do the shift right 32-bits
>   operation after XSCVDPSPN.

After staring at this way too long...  Looks correct.  What a monster :-)

Okay for trunk.  Thanks!


Segher

[PATCH] C++: show location of problematic extern "C" specifications

2017-09-26 Thread David Malcolm

There are a few places where the C++ FE will complain when attempting
to do things within an extern "C" linkage specifier.

I've run into problems where it wasn't clear where the pertinent
extern "C" was; for example, when failing to close an extern "C" linkage
specifier in a header, leading to "template with C linkage" errors in
a different source file.

As of r251026 there will be a message highlighting the unclosed '{', but
this may be hard to spot at the very end of the errors.

This patch adds a note to the various diagnostics that complain
about C linkage, showing the user where the extern "C" specification
began.
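
For context, the usual way a C header avoids the unclosed-specifier problem described above is the guarded pattern sketched below. The header contents and the name demo_add are hypothetical; the point is only that the opening and closing halves of the linkage specification must both be present, since a missing closing brace leaves everything included afterwards inside extern "C".

```c
/* Hypothetical contents of a header shared between C and C++,
   shown inline so the example is a single file.  */
#ifdef __cplusplus
extern "C" {
#endif

int demo_add (int a, int b);

#ifdef __cplusplus
}  /* Closes the extern "C" block; leaving this out is the mistake
      that leads to the confusing errors in later files.  */
#endif

/* Implementation, normally in a separate .c file.  */
int
demo_add (int a, int b)
{
  return a + b;
}
```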

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/cp/ChangeLog:
* cp-tree.h (struct saved_scope): Add "location" field.
(maybe_show_extern_c_location): New decl.
* decl.c (grokfndecl): When complaining about literal operators
with C linkage, issue a note giving the location of the
extern "C".
* parser.c (cp_parser_linkage_specification): Store the location
of the "extern" token within the scope_chain.
(maybe_show_extern_c_location): New function.
(cp_parser_explicit_specialization): When complaining about
template specializations with C linkage, issue a note giving the
location of the extern "C".
(cp_parser_explicit_template_declaration): Likewise for templates.

gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/udlit-extern-c.C: New test case.
* g++.dg/diagnostic/unclosed-extern-c.C: Add example of a template
erroneously covered by an unclosed extern "C".
* g++.dg/template/extern-c.C: New test case.
---
 gcc/cp/cp-tree.h   |  3 ++
 gcc/cp/decl.c  |  1 +
 gcc/cp/parser.c| 19 ++-
 gcc/testsuite/g++.dg/cpp0x/udlit-extern-c.C|  7 
 .../g++.dg/diagnostic/unclosed-extern-c.C  | 11 +-
 gcc/testsuite/g++.dg/template/extern-c.C   | 39 ++
 6 files changed, 78 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extern-c.C
 create mode 100644 gcc/testsuite/g++.dg/template/extern-c.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index e508598..762cc7b 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -1568,6 +1568,8 @@ struct GTY(()) saved_scope {
   hash_map *GTY((skip)) x_local_specializations;
 
   struct saved_scope *prev;
+
+  location_t location;
 };
 
 extern GTY(()) struct saved_scope *scope_chain;
@@ -6352,6 +6354,7 @@ extern bool parsing_nsdmi (void);
 extern bool parsing_default_capturing_generic_lambda_in_template (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
 extern location_t defarg_location (tree);
+extern void maybe_show_extern_c_location (void);
 
 /* in pt.c */
 extern bool check_template_shadow  (tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 50fa1ba..d08ac9a 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -8724,6 +8724,7 @@ grokfndecl (tree ctype,
   if (DECL_LANGUAGE (decl) == lang_c)
{
  error ("literal operator with C linkage");
+ maybe_show_extern_c_location ();
  return NULL_TREE;
}
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d831d66..b90f40d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -13823,7 +13823,8 @@ cp_parser_linkage_specification (cp_parser* parser)
   tree linkage;
 
   /* Look for the `extern' keyword.  */
-  cp_parser_require_keyword (parser, RID_EXTERN, RT_EXTERN);
+  cp_token *extern_token
+= cp_parser_require_keyword (parser, RID_EXTERN, RT_EXTERN);
 
   /* Look for the string-literal.  */
   linkage = cp_parser_string_literal (parser, false, false);
@@ -13843,6 +13844,7 @@ cp_parser_linkage_specification (cp_parser* parser)
 
   /* We're now using the new linkage.  */
   push_lang_context (linkage);
+  scope_chain->location = extern_token->location;
 
   /* If the next token is a `{', then we're using the first
  production.  */
@@ -16589,6 +16591,19 @@ cp_parser_explicit_instantiation (cp_parser* parser)
   timevar_pop (TV_TEMPLATE_INST);
 }
 
+/* Helper function for diagnostics that have complained about things
+   being used with 'extern "C"' linkage.
+
+   Attempt to issue a note showing where the 'extern "C"' linkage began.  */
+
+void
+maybe_show_extern_c_location (void)
+{
+  if (scope_chain->location != UNKNOWN_LOCATION)
+inform (scope_chain->location, "% linkage started here");
+}
+
+
 /* Parse an explicit-specialization.
 
explicit-specialization:
@@ -16623,6 +16638,7 @@ cp_parser_explicit_specialization (cp_parser* parser)
   if (current_lang_name == lang_name_c)
 {
   error_at (token->location, "template specialization with C linkage");
+  maybe_show_extern_c_location ();
   /* Give it C++ linkage to avoid confusing other parts of the
 front

Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern

2017-09-26 Thread James Greenhalgh

On Mon, Sep 25, 2017 at 11:13:57AM +0100, Sudi Das wrote:
> 
> Hi James
> 
> I added aarch64_output_simd_general_immediate after looking at the
> similarities of the immediates for mov/mvni and orr/bic. The CHECK macro in
> aarch64_simd_valid_immediate both checks and converts the immediates in the
> manner needed for the instructions.
> 
> Having said that, I agree that maybe I could have refactored
> aarch64_output_simd_mov_immediate to do the work rather than creating a new
> function to do similar things. I have done so in this patch.

Thanks, this looks much neater.

> I have also changed the names of the enum simd_immediate_check to better
> indicate what they are doing.

Thanks, I'd tweak them to look more like the bitmasks you use them as, but
that is a small change for my personal preference.

> Lastly I have added more cases in the tests (according to all the possible
> CHECKs) and made them dg-do assemble (although I had to add --save-temps so
> that the scan-assembler would work). Do you think I should not put that
> option and rather create separate tests?

This is good - thanks.

I think clean up the enum definitions and this patch will be good.

> @@ -308,6 +308,16 @@ enum aarch64_parse_opt_result
>AARCH64_PARSE_INVALID_ARG  /* Invalid arch, tune, cpu arg.  */
>  };
>  
> +/* Enum to distinguish which type of check is to be done in
> +   aarch64_simd_valid_immediate.  This is used as a bitmask where
> +   AARCH64_CHECK_MOV has both bits set.  Thus AARCH64_CHECK_MOV will
> +   perform all checks.  Adding new types would require changes accordingly.  
> */
> +enum simd_immediate_check {
> +  AARCH64_CHECK_ORR  = 1,/* Perform immediate checks for ORR.  */
> +  AARCH64_CHECK_BIC  = 2,/* Perform immediate checks for BIC.  */
> +  AARCH64_CHECK_MOV  = 3 /* Perform all checks (used for MOVI/MNVI).  */

These are used in bit-mask style, so how about:

  AARCH64_CHECK_ORR = 1 << 0,
  AARCH64_CHECK_BIC = 1 << 1,
  AARCH64_CHECK_MOV = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC

Which is more self-documenting.
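
A minimal sketch of how such a bitmask-style enum is typically consumed, for anyone reading along. The DEMO_ names and the helper function are invented for the example and are not the identifiers or logic from the patch.

```c
#include <stdbool.h>

/* Bit-per-check layout, mirroring the style suggested above.  */
enum demo_immediate_check
{
  DEMO_CHECK_ORR = 1 << 0,
  DEMO_CHECK_BIC = 1 << 1,
  DEMO_CHECK_MOV = DEMO_CHECK_ORR | DEMO_CHECK_BIC
};

/* Return true if every check requested in WANTED is enabled in WHICH.  */
static bool
demo_check_enabled (enum demo_immediate_check which,
		    enum demo_immediate_check wanted)
{
  return (which & wanted) == wanted;
}
```

Written this way, DEMO_CHECK_MOV enabling both checks follows from its definition rather than from the reader knowing that 3 is 1 | 2.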

> @@ -13001,7 +13013,8 @@ aarch64_float_const_representable_p (rtx x)
>  char*
>  aarch64_output_simd_mov_immediate (rtx const_vector,
>  machine_mode mode,
> -unsigned width)
> +unsigned width,
> +enum simd_immediate_check which)

This function is sorely missing a comment explaining the parameters - it
would be very helpful if you could add one as part of this patch.

Thanks,
James

Re: [PATCH] [ARC][ZOL] Account for empty body loops

2017-09-26 Thread Andrew Burgess

* Claudiu Zissulescu [2017-09-01 14:32:10 +0200]:

> From: claziss 
> 
> Hi Andrew,
> 
> By mistake I've pushed an incomplete ZOL-rework patch, and it is missing the
> attached parts. Can you please check if it is ok?
> 
> Thank you,
> Claudiu
> 
> gcc/
> 2017-09-01  Claudiu Zissulescu 
> 
>   * config/arc/arc.c (hwloop_optimize): Account for empty
>   body loops.

Looks good to me.

Thanks,
Andrew


> 
> testsuite/
> 2017-09-01  Claudiu Zissulescu 
> 
>   * gcc.target/arc/loop-1.c: Add test.
> ---
>  gcc/config/arc/arc.c  | 13 +++--
>  gcc/testsuite/gcc.target/arc/loop-1.c | 12 
>  2 files changed, 23 insertions(+), 2 deletions(-)
>  create mode 100755 gcc/testsuite/gcc.target/arc/loop-1.c
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 52a9b24..d519063 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -7240,6 +7240,12 @@ hwloop_optimize (hwloop_info loop)
>   fprintf (dump_file, ";; loop %d too long\n", loop->loop_no);
>return false;
>  }
> +  else if (!loop->length)
> +{
> +  if (dump_file)
> + fprintf (dump_file, ";; loop %d is empty\n", loop->loop_no);
> +  return false;
> +}
>  
>/* Check if we use a register or not.  */
>if (!REG_P (loop->iter_reg))
> @@ -7311,8 +7317,11 @@ hwloop_optimize (hwloop_info loop)
>&& INSN_P (last_insn)
>&& (JUMP_P (last_insn) || CALL_P (last_insn)
> || GET_CODE (PATTERN (last_insn)) == SEQUENCE
> -   || get_attr_type (last_insn) == TYPE_BRCC
> -   || get_attr_type (last_insn) == TYPE_BRCC_NO_DELAY_SLOT))
> +   /* At this stage we can have (insn (clobber (mem:BLK
> +  (reg instructions, ignore them.  */
> +   || (GET_CODE (PATTERN (last_insn)) != CLOBBER
> +   && (get_attr_type (last_insn) == TYPE_BRCC
> +   || get_attr_type (last_insn) == TYPE_BRCC_NO_DELAY_SLOT
>  {
>if (loop->length + 2 > ARC_MAX_LOOP_LENGTH)
>   {
> diff --git a/gcc/testsuite/gcc.target/arc/loop-1.c 
> b/gcc/testsuite/gcc.target/arc/loop-1.c
> new file mode 100755
> index 000..274bb46
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/loop-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Check how we handle empty body loops.  */
> +
> +int a;
> +void fn1(void) {
> +  int i;
> +  for (; i < 8; i++) {
> +double A[a];
> +  }
> +}
> -- 
> 1.9.1
>

Re: Remove non-GAS non-ELF support in alpha backend

2017-09-26 Thread Bernhard Reutner-Fischer

On Wed, Mar 14, 2012 at 07:02:33PM +0100, Rainer Orth wrote:
> Richard Henderson  writes:
> 
> > On 03/14/12 09:09, Rainer Orth wrote:
> >
> > Nearly ok.
> >
> >> +  targetm.asm_file_start_file_directive = 0;
> >
> > This is default and may be deleted.
> 
> Or would be if alpha.c didn't override the default.
> 
> This is what I actually committed after retesting.

Looking at ASM_OUTPUT_SOURCE_FILENAME, I wonder why both alpha and mips
do not use the default hook nowadays?
mmix seems to do something different for real, fwiw.

TIA,
> 
>   Rainer
> 
> 
> 2012-03-09  Rainer Orth  
> 
>   * config/alpha/alpha.c [HAVE_STAMP_H]: Remove.
>   (alpha_file_start) [MS_STAMP]: Remove.
> 
>   * config/alpha/elf.h (TARGET_GAS): Remove.
>   * config/alpha/freebsd.h (TARGET_DEFAULT): Remove.
>   * config/alpha/linux.h (TARGET_DEFAULT): Remove.
>   * config/alpha/netbsd.h (TARGET_DEFAULT): Remove.
>   * config/alpha/vms.h (TARGET_DEFAULT): Remove.
>   * config.gcc (alpha*-*-linux*): Remove target_cpu_default.
>   (alpha*-*-freebsd*): Likewise.
>   (alpha*-*-netbsd*): Likewise.
>   (alpha*-*-openbsd*): Likewise.
>   (alpha*-*-*): Remove target_cpu_default2.
>   * config/alpha/alpha.c (alpha_output_filename): Remove !TARGET_GAS
>   handling.
>   * config/alpha/alpha.h (TARGET_AS_CAN_SUBTRACT_LABELS): Remove.
>   (TARGET_AS_SLASH_BEFORE_SUFFIX): Remove.
>   * config/alpha/alpha.c (print_operand): Always assume
>   TARGET_AS_SLASH_BEFORE_SUFFIX.
>   * config/alpha/alpha.md ("*builtin_setjmp_receiver_er_sl_1"):
>   Remove TARGET_AS_CAN_SUBTRACT_LABELS.
>   ("*builtin_setjmp_receiver_er_1"): Remove.
>   * config/alpha/alpha.opt (malpha-as): Remove.
>   (mgas): Ignore.
>   * doc/invoke.texi (Option Summary, DEC Alpha Options): Remove
>   -malpha-as, -mgas.
>   Remove DEC Unix reference.
> 
>   * config/alpha/alpha.h (OBJECT_FORMAT_COFF): Remove.
>   (EXTENDED_COFF): Remove.
>   * config/alpha/elf.h (OBJECT_FORMAT_COFF): Don't undef.
>   (EXTENDED_COFF): Don't undef.
>   * config/alpha/alpha.c (alpha_file_start): Always assume
>   OBJECT_FORMAT_ELF.
>   Don't set targetm.asm_file_start_file_directive.
>   [!OBJECT_FORMAT_ELF]: Remove.
>   (TARGET_ASM_FILE_START_FILE_DIRECTIVE): Remove.
> 
>   * config/alpha/alpha.h (SDB_DEBUGGING_INFO): Remove.
>   (DBX_DEBUGGING_INFO): Remove.
>   (MIPS_DEBUGGING_INFO): Remove.
>   (PREFERRED_DEBUGGING_TYPE): Remove.
>   (DBX_OUTPUT_SOURCE_LINE): Remove.
>   (SDB_OUTPUT_SOURCE_LINE): Remove.
>   (DBX_CONTIN_LENGTH): Remove.
>   (NO_DBX_FUNCTION_END): Remove.
>   (ASM_STABS_OP): Remove.
>   (ASM_STABN_OP): Remove.
>   (ASM_STABD_OP): Remove.
>   (SDB_ALLOW_FORWARD_REFERENCES): Remove.
>   (SDB_ALLOW_UNKNOWN_REFERENCES): Remove.
>   (PUT_SDB_DEF): Remove.
>   (PUT_SDB_PLAIN_DEF): Remove.
>   (PUT_SDB_TYPE): Remove.
>   (sdb_label_count): Remove.
>   (PUT_SDB_BLOCK_START): Remove.
>   (PUT_SDB_BLOCK_END): Remove.
>   (PUT_SDB_FUNCTION_START): Remove.
>   (PUT_SDB_FUNCTION_END): Remove.
>   (PUT_SDB_EPILOGUE_END): Remove.
>   * config/alpha/elf.h (SDB_DEBUGGING_INFO): Don't undef.
>   (MIPS_DEBUGGING_INFO): Don't undef.
>   (DBX_DEBUGGING_INFO): Don't undef.
>   * config/alpha/vms.h (SDB_DEBUGGING_INFO): Don't undef.
>   (MIPS_DEBUGGING_INFO): Don't undef.
>   (DBX_DEBUGGING_INFO): Don't undef.
>   * config/alpha/freebsd.h (DBX_CONTIN_CHAR): Remove.
>   * config/alpha/alpha.c (alpha_option_override): Remove SDB_DEBUG
>   handling.
>   (alpha_start_function): Likewise.
>   (sdb_label_count): Remove.
>   (alpha_output_filename): Remove DBX_DEBUG handling.
>   (alpha_file_start): Likewise.
>

[patch, fortran, committed] Fix wrong warning inside associate construct

2017-09-26 Thread Thomas Koenig


Hello world,

I have committed the attached patch as obvious after regression-testing.

It removes the wrong warning from my recent DO warning patch that Jakub
pointed out.  The test case that is restored with this patch is enough
to catch any regression.

Regards

Thomas

2017-09-26  Thomas Koenig  

* frontend-passes.c (do_subscript): Don't do anything
if inside an associate list.

2017-09-26  Thomas Koenig  

* gfortran.dg/gomp/associate1.f90: Remove unnecessary
warning from associate construct and do loop.

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #3 of 8

2017-09-26 Thread Michael Meissner

On Tue, Sep 26, 2017 at 11:36:14AM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 26, 2017 at 10:34:44AM -0400, Michael Meissner wrote:
> > * config/rs6000/rs6000.md (movsi_from_df): Optimize converting a
> > DFmode to a SFmode, and then needing to move the SFmode to a GPR
> > to use the XSCVDPSP instruction instead of FRSP and XSCVDPSPN.
> 
> > --- gcc/config/rs6000/rs6000.md (revision 253170)
> > +++ gcc/config/rs6000/rs6000.md (working copy)
> > @@ -6919,6 +6919,26 @@ (define_insn_and_split "*movdi_from_sf_z
> > "4,  4,   4,   4,8,
> >  8,  4")])
> >  
> > +;; Like movsi_from_sf, but combine a convert from DFmode to SFmode before
> > +;; moving it to SImode.  We can do a SFmode store without having to do the
> > +;; conversion explicitly.  If we are doing a register->register conversion, use
> > +;; XSCVDPSP instead of XSCVDPSPN, since the former handles cases where the
> > +;; input will not fit in a SFmode, and the later assumes the value has already
> > +;; been rounded.
> > +(define_insn "*movsi_from_df"
> > +  [(set (match_operand:SI 0 "nonimmediate_operand" "=wa,m,wY,Z")
> > +   (unspec:SI [(float_truncate:SF
> > +(match_operand:DF 1 "gpc_reg_operand" "wa, f,wb,wa"))]
> > +   UNSPEC_SI_FROM_SF))]
> 
> (The indentation is a bit broken here -- DF line is indented a space too
> many, and the constraint strings do not line up).

That must be something funky with patches and tabs.  It looks ok after I apply the
patch (the match_operand:DF is indented one space under float_truncate:SF).

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Enable ifunc attribute by default for SPARC GNU/Linux

2017-09-26 Thread Joseph Myers

Similar to other architectures with IFUNC binutils/glibc support, this
patch enables the ifunc attribute for SPARC GNU/Linux.  This is needed
for building glibc with the current checks on IFUNC resolver types
(and use of the attribute in glibc rather than manually created IFUNCs
is beneficial anyway because it results in better debug info).

Tested compilation of glibc (in conjunction with a glibc patch to
support using the attribute on SPARC) with build-many-glibcs.py.  I
have not run the GCC tests for SPARC.  OK to commit?

2017-09-26  Joseph Myers  

* config.gcc (default_gnu_indirect_function): Default to yes for
sparc*-*-linux* with glibc.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 253204)
+++ gcc/config.gcc  (working copy)
@@ -3100,7 +3100,7 @@
 ;;
 *-*-linux*)
case ${target} in
-   aarch64*-* | i[34567]86-* | powerpc*-* | s390*-* | x86_64-*)
+   aarch64*-* | i[34567]86-* | powerpc*-* | s390*-* | sparc*-* | x86_64-*)
default_gnu_indirect_function=yes
;;
esac

-- 
Joseph S. Myers
jos...@codesourcery.com
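
To illustrate which target triplets the extended case arm accepts, here is a rough sketch that uses Python's fnmatch as a stand-in for shell `case` globbing.  The helper name `ifunc_default_enabled` and the sample triplets are assumptions made for this illustration, and fnmatch only approximates shell pattern semantics.

```python
from fnmatch import fnmatchcase

# Alternatives from the patched inner case arm in config.gcc.
ARCH_PATTERNS = ["aarch64*-*", "i[34567]86-*", "powerpc*-*", "s390*-*",
                 "sparc*-*", "x86_64-*"]


def ifunc_default_enabled(target):
    """Hypothetical helper: True if the patched case arm would set
    default_gnu_indirect_function=yes for this target triplet."""
    # Outer arm: only *-*-linux* targets reach the inner case at all.
    if not fnmatchcase(target, "*-*-linux*"):
        return False
    return any(fnmatchcase(target, pattern) for pattern in ARCH_PATTERNS)


# Illustrative triplets only; they are not taken from the thread.
for triplet in ("sparc64-unknown-linux-gnu", "x86_64-pc-linux-gnu",
                "mips-unknown-linux-gnu", "sparc-sun-solaris2.11"):
    print(triplet, ifunc_default_enabled(triplet))
```

The Solaris triplet is rejected by the outer `*-*-linux*` arm even though it matches `sparc*-*`, which is why the ChangeLog restricts the change to sparc*-*-linux*.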

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #2 of 8

2017-09-26 Thread Michael Meissner

On Tue, Sep 26, 2017 at 11:06:09AM -0500, Segher Boessenkool wrote:
> > @@ -6850,52 +6850,41 @@ (define_insn_and_split "movsi_from_sf"
> >rtx op1 = operands[1];
> >rtx op2 = operands[2];
> >rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
> > +  rtx op2_si = gen_rtx_REG (SImode, REGNO (op2));
> >  
> 
> Does this work, btw?  I would expect you need reg_or_subregno, for op0
> that is (the new op2 might be fine, not sure).  You do use it in most
> places; please check.

It does work, but using reg_or_subregno is better.  I've changed it.

> >  ;; movsi_from_sf with zero extension
> >  ;;
> >  ;; RLDICL   LWZ  LFIWZX   LXSIWZX   VSX->GPR
> > -;; MTVSRWZ  VSX->VSX
> > +;; VSX->VSX MTVSRWZ
> >  
> >  (define_insn_and_split "*movdi_from_sf_zero_ext"
> >[(set (match_operand:DI 0 "gpc_reg_operand"
> > "=r, r,   ?*wI,?*wH, r,
> > -   wIwH,?wK")
> > +wK, wIwH")
> 
> This loses the "?", is that on purpose?

No.  I'll put it back (but I do think it is harmless, since the direct move
occurs before it).  In my original patches, I just left the value in a vector
register, and let the register allocator generate the direct move.  However,
when I updated the peephole2 (patch #7), it was a lot easier to write, if I
kept in the direct move option.  But I missed putting the '?' back in.
Thanks.

> >[(set_attr "type"
> > "*,  load,fpload,  fpload,  mftgpr,
> > -mffgpr, veclogical")
> > +vecexts,mffgpr")
> 
> vecsimple or vecfloat I guess, not vecexts.  We have no way of describing
> it exactly, of course.  Maybe just "two".

Ok.

> Okay for trunk with those things taken care of.  Thanks!

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #8 of 8

2017-09-26 Thread Segher Boessenkool

On Tue, Sep 26, 2017 at 10:50:14AM -0400, Michael Meissner wrote:
>   * gcc.target/powerpc/pr71977-1.c: Update test to know that we
>   don't generate a 32-bit shift after doing XSCVDPSPN.
>   * gcc.target/powerpc/direct-move-float1.c: Likewise.
>   * gcc.target/powerpc/direct-move-float3.c: New test.

> --- gcc/testsuite/gcc.target/powerpc/pr71977-1.c  (revision 253176)
> +++ gcc/testsuite/gcc.target/powerpc/pr71977-1.c  (working copy)
> @@ -23,9 +23,9 @@ mask_and_float_var (float f, uint32_t ma
>return u.value;
>  }
>  
> -/* { dg-final { scan-assembler "\[ \t\]xxland " } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]and "} } */
> -/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]stxv"} } */
> -/* { dg-final { scan-assembler-not "\[ \t\]lxv" } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]srdi "   } } */
> +/* { dg-final { scan-assembler {\mxxland\M}  } } */
> +/* { dg-final { scan-assembler-not {\mand\M} } } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M}  } } */
> +/* { dg-final { scan-assembler-not {\mstxv\M}} } */
> +/* { dg-final { scan-assembler-not {\mlxv\M} } } */
> +/* { dg-final { scan-assembler-not {\msrdi\M}} } */

Careful, you still want to disallow lxvx and stxvx -- so just remove
the \M from those patterns, I'd say (if that works :-) )

Okay for trunk with that fixed.  Thanks,


Segher
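
The effect of dropping \M can be sketched as follows.  Tcl's \m and \M match the start and the end of a word; the sketch approximates both with Python's \b, which is an assumption that holds for these mnemonics but is not a general equivalence.  The assembler lines are made up for the illustration.

```python
import re

# Made-up assembler output containing only the indexed forms.
asm = "\tstxvx 34,0,9\n\tlxvx 32,0,10\n"

# Rough analogue of {\mstxv\M}: requires a word end right after "stxv".
strict = re.compile(r"\bstxv\b")
# Rough analogue of {\mstxv}: any mnemonic that starts with "stxv".
loose = re.compile(r"\bstxv")

strict_hit = strict.search(asm) is not None
loose_hit = loose.search(asm) is not None

# A scan-assembler-not built on the strict form would pass on this output,
# although stxvx is present; the loose form would flag it.
print(strict_hit, loose_hit)
```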

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #6 of 8

2017-09-26 Thread Segher Boessenkool

On Tue, Sep 26, 2017 at 10:44:24AM -0400, Michael Meissner wrote:
>   * config/rs6000/vsx.md (vsx_xscvdpspn): Eliminate useless
>   alternative constraint.
>   (vsx_xscvspdpn): Likewise.
>   (vsx_xscvspdpn_scalar): Likewise.

Okay, nice cleanup!  Thanks,


Segher

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #5 of 8

2017-09-26 Thread Segher Boessenkool

Hi!

On Tue, Sep 26, 2017 at 10:39:06AM -0400, Michael Meissner wrote:
>   * config/rs6000/vsx.md (vsx_xscvdpsp_scalar): Use "ww" constraint
>   instead of "f" to allow SFmode to be in traditional Altivec
>   registers.

Okay.  Thanks,


Segher

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #4 of 8

2017-09-26 Thread Segher Boessenkool

On Tue, Sep 26, 2017 at 10:36:34AM -0400, Michael Meissner wrote:
>   * config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so that
>   it is adjacent to the other XSCVSPDP insns.

Okay for trunk.  Thanks,


Segher

Re: [patch, fortran] Warn about out-of-bounds access with DO subscripts

2017-09-26 Thread Thomas Koenig


Hi Jakub,


   associate(k => v, l => a(i, j), m => a(i, :))

And I don't really see a bug in the testcase...


Hm, I will look at this. Maybe some strange
interaction with associate here...

Regards

Thomas

Re: [PATCH v2] [libcc1] Rename C{,P}_COMPILER_NAME and remove triplet from them

2017-09-26 Thread Sergio Durigan Junior

Ping^2.

On Friday, September 15 2017, I wrote:

> Ping.
>
> On Friday, September 01 2017, I wrote:
>
>> On Wednesday, August 23 2017, Pedro Alves wrote:
>>
>>> On 08/23/2017 05:17 AM, Sergio Durigan Junior wrote:
>>>> Hi there,
>>>>
>>>> This is a series of two patches, one for GDB and one for GCC, which aims
>>>> to improve the detection and handling of triplets present on compiler
>>>> names.  The motivation for this series was mostly the fact that GDB's
>>>> "compile" command is broken on Debian unstable, as can be seen here:
>>>>
>>>>
>>>>
>>>> The reason for the failure is the fact that Debian compiles GCC using
>>>> the --program-{prefix,suffix} options from configure in order to name
>>>> the compiler using the full triplet (i.e., Debian's GCC is not merely
>>>> named "gcc", but e.g. "x86_64-linux-gnu-gcc-7"), which end up naming the
>>>> C_COMPILER_NAME and CP_COMPILER_NAME defines with the specified prefix
>>>> and suffix.  Therefore, the regexp being used to match the compiler name
>>>> is wrong because it doesn't take into account the fact that the defines
>>>> may already contain the triplets.
>>>
>>> As discussed on IRC, I think the problem is that C_COMPILER_NAME
>>> in libcc1 includes the full triplet in the first place.  I think
>>> that it shouldn't.  I think that C_COMPILER_NAME should always
>>> be "gcc".
>>>
>>> The problem is in bootstrapping code, before there's a plugin
>>> yet -- i.e., in the code that libcc1 uses to find the compiler (which
>>> then loads a plugin that libcc1 talks with).
>>>
>>> Please bear with me while I lay down my rationale, so that we're
>>> on the same page.
>>>
>>> C_COMPILER_NAME seems to include the prefix currently in an attempt
>>> to support cross debugging, or more generically, --enable-targets=all
>>> in gdb, but the whole thing doesn't really work as intended if
>>> C_COMPILER_NAME already includes a target prefix.
>>>
>>> IIUC the libcc1/plugin design, a single "libcc1.so" (what gdb loads,
>>> not the libcc1plugin compiler plugin) should work with any compiler in
>>> the PATH, in case you have several in the system.  E.g., one for
>>> each arch.
>>>
>>> Let me expand.
>>>
>>> The idea is that gdb always dlopens "libcc1.so", by that name exactly.
>>> Usually that'll open the libcc1.so installed in the system, e.g.,
>>> "/usr/lib64/libcc1.so", which for convenience was originally built from
>>> the same source tree as the system's compiler.  You could force gdb to
>>> load some other libcc1.so, e.g., by tweaking LD_LIBRARY_PATH of course,
>>> but you shouldn't need to.
>>>
>>> libcc1.so is responsible for finding a compiler that targets the
>>> architecture of the inferior that the user is debugging in gdb.
>>> E.g., say you're cross debugging for arm-none-eabi, on a
>>> x86-64 Fedora host.  GDB knows the target inferior's architecture, and
>>> passes down to (the system) libcc1 a triplet regex like "arm*-*eabi*" or
>>> similar.  libcc1 appends "-" + C_COMPILER_NAME to that regex,
>>> generating something like "arm*-*eabi*-gcc", and then looks for binaries
>>> in PATH that match that regex.  When one is found, e.g.,
>>> "arm-none-eabi-gcc", libcc1 forks/execs that compiler, passing it
>>> "-fplugin=libcc1plugin".  libcc1 then communicates with that compiler's
>>> libcc1plugin plugin via a socket.
>>>
>>> In this scheme, "libcc1.so", the library that gdb loads, has no
>>> target-specific logic at all.  It should work with any compiler
>>> in the system, for any target/arch.  All it does is marshall the gcc/gdb
>>> interface between the gcc plugin and gdb, it is not linked against gcc.
>>> That boundary is versioned, and ABI-stable.  So as long as the
>>> libcc1.so that gdb loads understands the same version of the gcc/gdb
>>> interface API as gdb does, it all should work.  (The APIs
>>> are always extended keeping backward compatibility.)
>>>
>>> So in this scheme, having the "C_COMPILER_NAME" macro in libcc1
>>> include the target prefix for the --target that the plugin that
>>> libcc1 is built along with, seems to serve no real purpose, AFAICT.
>>> It's just getting in the way.
>>>
>>> I.e., something like:
>>>
>>>   "$gdb_specified_triplet_re" + "-" + C_COMPILER_NAME
>>>
>>> works if C_COMPILER_NAME is exactly "gcc", but not if C_COMPILER_NAME is
>>> already:
>>>
>>>   "$whatever_triplet_libcc1_happened_to_be_built_with" + "-gcc"
>>>
>>> because we end up with:
>>>
>>>   "$gdb_specified_triplet_re" + "-" +
>>>     "$whatever_triplet_libcc1_happened_to_be_built_with" + "-gcc"
>>>
>>> which is the problem case.
>>>
>>> In sum, I think the libcc1.so (not the plugin) should _not_ have baked
>>> in target awareness, and thus C_COMPILER_NAME should always be "gcc", and
>>> then libcc1's regex should be adjusted to also tolerate a suffix in
>>> the final compiler binary name regex.
>>>
>>> WDYT?
>>
>> As I replied before, I agree with Pedro's

Re: [patch, fortran] Warn about out-of-bounds access with DO subscripts

2017-09-26 Thread Thomas Koenig


Hi!


On Mon, 25 Sep 2017 18:50:49 +0200, Thomas Koenig  wrote:

Thanks for the review, committed as r253156.

Now, on to some other bugs...


No, back to this one please.  ;-)


OK, if you insist :-)


Apparently, the changes you prepared for existing testcases did not get
committed, so I'm now seeing some FAILs there.  See also recent posts on
the  mailing list:

 FAIL: gfortran.dg/gomp/associate1.f90   -O  (test for excess errors)
 FAIL: gfortran.dg/predcom-1.f   -O  (test for excess errors)
 FAIL: gfortran.dg/unconstrained_commons.f   -O  (test for excess errors)


I have committed those changes, so that should be gone.


And the following gets highlighted, too:

 FAIL: compiler driver --help=fortran option(s): "^ +-.*[^:.]$" absent from output: 
"  -Wdo-subscript  Warn about possibly incorrect subscripts in do loops"
 FAIL: compiler driver --help=warnings option(s): "^ +-.*[^:.]$" absent from output: 
"  -Wdo-subscript  Warn about possibly incorrect subscripts in do loops"


This I don't understand.  Was there anything wrong with my
change to fortran/lang.opt?

Regards

Thomas
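
One possible reading of the FAIL lines above: the pattern "^ +-.*[^:.]$" must be absent from the --help output, and it matches any option line whose last character is neither a period nor a colon.  The sketch below checks the quoted help line against that pattern using Python's re module as a stand-in for the testsuite's Tcl regex; that the missing trailing period is the cause is an inference from the FAIL text, not something confirmed in this thread.

```python
import re

# Pattern quoted in the FAIL message; it must NOT match any help line.
must_be_absent = re.compile(r"^ +-.*[^:.]$", re.MULTILINE)

# Help line exactly as quoted in the FAIL message (no trailing period).
reported = ("  -Wdo-subscript  "
            "Warn about possibly incorrect subscripts in do loops")
# Hypothetical variant with a trailing period added.
with_period = reported + "."

reported_matches = must_be_absent.search(reported) is not None
with_period_matches = must_be_absent.search(with_period) is not None

print(reported_matches, with_period_matches)
```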

Re: [GCC][PATCH][TESTSUITE][ARM][COMMITTED] Invert check to misalign in vect_hw_misalign (PR 78421)

2017-09-26 Thread Mike Stump

On Sep 25, 2017, at 9:58 PM, Christophe Lyon  wrote:
> 
> Yes, thanks! I was missing the 'expr' part.
> 
> Here is what I have committed (r253187), to avoid further noise in the results.

Yup, looks good.  Thanks.

Re: [PATCH 4/5] New target check: vect_nopeel - v2

2017-09-26 Thread Richard Sandiford

Andreas Krebbel  writes:
> - vect_nopeel renamed to vect_no_peel
> - documentation added.
>
> gcc/ChangeLog:
>
> 2017-09-26  Andreas Krebbel  
>
>   * doc/sourcebuild.texi: Document vect_no_peel.
>
> gcc/testsuite/ChangeLog:
>
> 2017-09-26  Andreas Krebbel  
>
>   * g++.dg/vect/slp-pr56812.cc: Check vect_nopeel.
>   * lib/target-supports.exp (check_effective_target_vect_nopeel):
>   New proc.

Sorry for the bikeshedding, but how about having a positive test
like vect_can_peel instead?  ! vect_no... can be hard to read in
complex conditions.  (There's already that problem with existing
vect_no...s.)

> -/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */
> +/* For targets without vector loop peeling the loop becomes cheap
> +   enough to be vectorized.  */
> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" { target { ! vect_no_peel } } } } */

How about an xfail instead?  Then it'll be noticeable (via an XPASS)
if we fail to vectorise the loop when we should.

Thanks,
Richard

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #3 of 8

2017-09-26 Thread Segher Boessenkool

Hi!

On Tue, Sep 26, 2017 at 10:34:44AM -0400, Michael Meissner wrote:
>   * config/rs6000/rs6000.md (movsi_from_df): Optimize converting a
>   DFmode to a SFmode, and then needing to move the SFmode to a GPR
>   to use the XSCVDPSP instruction instead of FRSP and XSCVDPSPN.

> --- gcc/config/rs6000/rs6000.md   (revision 253170)
> +++ gcc/config/rs6000/rs6000.md   (working copy)
> @@ -6919,6 +6919,26 @@ (define_insn_and_split "*movdi_from_sf_z
>   "4,  4,   4,   4,8,
>8,  4")])
>  
> +;; Like movsi_from_sf, but combine a convert from DFmode to SFmode before
> +;; moving it to SImode.  We can do a SFmode store without having to do the
> +;; conversion explicitly.  If we are doing a register->register conversion, use
> +;; XSCVDPSP instead of XSCVDPSPN, since the former handles cases where the
> +;; input will not fit in a SFmode, and the later assumes the value has already
> +;; been rounded.
> +(define_insn "*movsi_from_df"
> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=wa,m,wY,Z")
> + (unspec:SI [(float_truncate:SF
> +  (match_operand:DF 1 "gpc_reg_operand" "wa, f,wb,wa"))]
> + UNSPEC_SI_FROM_SF))]

(The indentation is a bit broken here -- DF line is indented a space too
many, and the constraint strings do not line up).

Other than that: looks fine, thanks!


Segher

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #2 of 8

2017-09-26 Thread Segher Boessenkool

Hi,

On Tue, Sep 26, 2017 at 10:32:03AM -0400, Michael Meissner wrote:
>   * config/rs6000/rs6000.md (movsi_from_sf): Adjust code to
>   eliminate doing a 32-bit shift right or vector extract after doing
>   XSCVDPSPN.  Use zero_extendsidi2 instead of p8_mfvsrd_4_disf to
>   move the value to the GPRs.
>   (movdi_from_sf_zero_ext): Likewise.
>   (reload_gpr_from_vsxsf): Likewise.

> --- gcc/config/rs6000/rs6000.md   (revision 253169)
> +++ gcc/config/rs6000/rs6000.md   (working copy)
> @@ -6806,25 +6806,25 @@ (define_insn "*movsi_internal1_single"
>  ;; needed.
>  
>  ;;   MR   LWZ  LFIWZX   LXSIWZX   STW
> -;;   STFS STXSSP   STXSSPX  VSX->GPR  MTVSRWZ
> -;;   VSX->VSX
> +;;   STFS STXSSP   STXSSPX  VSX->GPR  VSX->VSX,
> +;;   MTVSRWZ

(Typo: comma at end of line).

>  (define_insn_and_split "movsi_from_sf"
>[(set (match_operand:SI 0 "nonimmediate_operand"
>   "=r, r,   ?*wI,?*wH, m,
> -  m,  wY,  Z,   r,wIwH,
> -  ?wK")
> +  m,  wY,  Z,   r,?*wIwH,
> +  wIwH")
>  
>   (unspec:SI [(match_operand:SF 1 "input_operand"
>   "r,  m,   Z,   Z,r,
> -  f,  wb,  wu,  wIwH, r,
> -  wK")]
> +  f,  wb,  wu,  wIwH, wIwH,
> +  r")]
>   UNSPEC_SI_FROM_SF))
>  
> (clobber (match_scratch:V4SF 2
>   "=X, X,   X,   X,X,
> -  X,  X,   X,   wa,   X,
> -  wa"))]
> +  X,  X,   X,   wIwH, X,
> +  X"))]
>  
>"TARGET_NO_SF_SUBREG
> && (register_operand (operands[0], SImode)
> @@ -6839,10 +6839,10 @@ (define_insn_and_split "movsi_from_sf"
> stxssp %1,%0
> stxsspx %x1,%y0
> #
> -   mtvsrwz %x0,%1
> -   #"
> +   xscvdpspn %x0,%x1
> +   mtvsrwz %x0,%1"
>"&& reload_completed
> -   && register_operand (operands[0], SImode)
> +   && int_reg_operand (operands[0], SImode)
> && vsx_reg_sfsubreg_ok (operands[1], SFmode)"
>[(const_int 0)]
>  {

So you swap the last two alternatives.  Hrm okay.

> @@ -6850,52 +6850,41 @@ (define_insn_and_split "movsi_from_sf"
>rtx op1 = operands[1];
>rtx op2 = operands[2];
>rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
> +  rtx op2_si = gen_rtx_REG (SImode, REGNO (op2));
>  

Does this work, btw?  I would expect you need reg_or_subregno, for op0
that is (the new op2 might be fine, not sure).  You do use it in most
places; please check.

>  ;; movsi_from_sf with zero extension
>  ;;
>  ;;   RLDICL   LWZ  LFIWZX   LXSIWZX   VSX->GPR
> -;;   MTVSRWZ  VSX->VSX
> +;;   VSX->VSX MTVSRWZ
>  
>  (define_insn_and_split "*movdi_from_sf_zero_ext"
>[(set (match_operand:DI 0 "gpc_reg_operand"
>   "=r, r,   ?*wI,?*wH, r,
> - wIwH,?wK")
> +  wK, wIwH")

This loses the "?", is that on purpose?

>[(set_attr "type"
>   "*,  load,fpload,  fpload,  mftgpr,
> -  mffgpr, veclogical")
> +  vecexts,mffgpr")

vecsimple or vecfloat I guess, not vecexts.  We have no way of describing
it exactly, of course.  Maybe just "two".

Okay for trunk with those things taken care of.  Thanks!


Segher

Re: [PATCH v2,rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant

2017-09-26 Thread Bill Schmidt

On Sep 26, 2017, at 5:57 AM, Segher Boessenkool wrote:
> 
>> +/* { dg-final { scan-assembler-not "swap" } } */
> 
> So what is this really testing for?  xxswapd?  But a) we never generate
> that, and b) you could use a better regex?

Agreed, this looks like an unnecessary test for now.  Changing to "xxswapd"
would future-proof the test in case we ever generated that.  Agree with 
Segher that it would be much better to have the tests have uniform naming.

No further comments from me; looks good.

Bill

Re: [PATCH] Make SRA qsort comparator transitive

2017-09-26 Thread Richard Biener

On September 26, 2017 5:20:25 PM GMT+02:00, Martin Jambor wrote:
>Hi,
>
>On Mon, Sep 25, 2017 at 04:22:06PM +0300, Alexander Monakov wrote:
>> 
>> Thanks!  If this is resolved, haifa-sched autoprefetch ranking will
>become the
>> last remaining (among discovered so far) inconsistent qsort
>comparator in GCC.
>> 
>
>So the following has passed bootstrap and testing on x86_64-linux
>(including Ada).  I have failed to create a C/C++ testcase but when I
>was trying to come up with one I realized there is code in
>analyze_access_subtree that replaces integer types with smaller
>precision with full-precision ones for all cases but bit-fields.
>Therefore I adjusted the disqualification to only trigger for
>bit-fields (that happen to be of equal size as a non-integer register
>type), which I think is very unlikely to ever happen.
>
>Richi, is it OK for trunk?

OK. 

Richard. 

>Thanks,
>
>Martin
>
>
>2017-09-26  Martin Jambor  
>
>   * tree-sra.c (compare_access_positions): Put integral types first,
>   stabilize sorting of integral types, remove conditions putting
>   non-full-precision integers last.
>   (sort_and_splice_var_accesses): Disable scalarization if a
>   non-integer would be represented by a non-full-precision integer.
>---
> gcc/tree-sra.c | 42 --
> 1 file changed, 32 insertions(+), 10 deletions(-)
>
>diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>index 163b7a2d03b..f5675edc7f1 100644
>--- a/gcc/tree-sra.c
>+++ b/gcc/tree-sra.c
>@@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const
>void *b)
>  && TREE_CODE (f2->type) != COMPLEX_TYPE
>  && TREE_CODE (f2->type) != VECTOR_TYPE)
>   return -1;
>-  /* Put the integral type with the bigger precision first.  */
>+  /* Put any integral type before any non-integral type.  When
>splicing, we
>+   make sure that those with insufficient precision and occupying the
>+   same space are not scalarized.  */
>   else if (INTEGRAL_TYPE_P (f1->type)
>+ && !INTEGRAL_TYPE_P (f2->type))
>+  return -1;
>+  else if (!INTEGRAL_TYPE_P (f1->type)
>  && INTEGRAL_TYPE_P (f2->type))
>-  return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
>-  /* Put any integral type with non-full precision last.  */
>-  else if (INTEGRAL_TYPE_P (f1->type)
>- && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type))
>- != TYPE_PRECISION (f1->type)))
>   return 1;
>-  else if (INTEGRAL_TYPE_P (f2->type)
>- && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type))
>- != TYPE_PRECISION (f2->type)))
>-  return -1;
>+  /* Put the integral type with the bigger precision first.  */
>+  else if (INTEGRAL_TYPE_P (f1->type)
>+ && INTEGRAL_TYPE_P (f2->type)
>+ && (TYPE_PRECISION (f2->type) != TYPE_PRECISION (f1->type)))
>+  return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
>   /* Stabilize the sort.  */
>   return TYPE_UID (f1->type) - TYPE_UID (f2->type);
> }
>@@ -2055,6 +2056,11 @@ sort_and_splice_var_accesses (tree var)
>   bool grp_partial_lhs = access->grp_partial_lhs;
>   bool first_scalar = is_gimple_reg_type (access->type);
>   bool unscalarizable_region = access->grp_unscalarizable_region;
>+  bool bf_non_full_precision
>+  = (INTEGRAL_TYPE_P (access->type)
>+ && TYPE_PRECISION (access->type) != access->size
>+ && TREE_CODE (access->expr) == COMPONENT_REF
>+ && DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1)));
> 
>   if (first || access->offset >= high)
>   {
>@@ -2102,6 +2108,22 @@ sort_and_splice_var_accesses (tree var)
>this combination of size and offset, the comparison function
>should have put the scalars first.  */
> gcc_assert (first_scalar || !is_gimple_reg_type (ac2->type));
>+/* It also prefers integral types to non-integral.  However, when
>the
>+   precision of the selected type does not span the entire area and
>+   should also be used for a non-integer (i.e. float), we must not
>+   let that happen.  Normally analyze_access_subtree expands the
>type
>+   to cover the entire area but for bit-fields it doesn't.  */
>+if (bf_non_full_precision && !INTEGRAL_TYPE_P (ac2->type))
>+  {
>+if (dump_file && (dump_flags & TDF_DETAILS))
>+  {
>+fprintf (dump_file, "Cannot scalarize the following access "
>+ "because insufficient precision integer type was "
>+ "selected.\n  ");
>+dump_access (dump_file, access, false);
>+  }
>+unscalarizable_region = true;
>+  }
> ac2->group_representative = access;
> j++;
>   }

Re: [PATCH] Make SRA qsort comparator transitive

2017-09-26 Thread Martin Jambor

Hi,

On Mon, Sep 25, 2017 at 04:22:06PM +0300, Alexander Monakov wrote:
> 
> Thanks!  If this is resolved, haifa-sched autoprefetch ranking will become the
> last remaining (among discovered so far) inconsistent qsort comparator in GCC.
> 

So the following has passed bootstrap and testing on x86_64-linux
(including Ada).  I have failed to create a C/C++ testcase but when I
was trying to come up with one I realized there is code in
analyze_access_subtree that replaces integer types with smaller
precision with full-precision ones for all cases but bit-fields.
Therefore I adjusted the disqualification to only trigger for
bit-fields (that happen to be of equal size as a non-integer register
type), which I think is very unlikely to ever happen.

Richi, is it OK for trunk?

Thanks,

Martin


2017-09-26  Martin Jambor  

* tree-sra.c (compare_access_positions): Put integral types first,
stabilize sorting of integral types, remove conditions putting
non-full-precision integers last.
(sort_and_splice_var_accesses): Disable scalarization if a
non-integer would be represented by a non-full-precision integer.
---
 gcc/tree-sra.c | 42 --
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 163b7a2d03b..f5675edc7f1 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const void *b)
   && TREE_CODE (f2->type) != COMPLEX_TYPE
   && TREE_CODE (f2->type) != VECTOR_TYPE)
return -1;
-  /* Put the integral type with the bigger precision first.  */
+  /* Put any integral type before any non-integral type.  When splicing, we
+make sure that those with insufficient precision and occupying the
+same space are not scalarized.  */
   else if (INTEGRAL_TYPE_P (f1->type)
+  && !INTEGRAL_TYPE_P (f2->type))
+   return -1;
+  else if (!INTEGRAL_TYPE_P (f1->type)
   && INTEGRAL_TYPE_P (f2->type))
-   return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
-  /* Put any integral type with non-full precision last.  */
-  else if (INTEGRAL_TYPE_P (f1->type)
-  && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type))
-  != TYPE_PRECISION (f1->type)))
return 1;
-  else if (INTEGRAL_TYPE_P (f2->type)
-  && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type))
-  != TYPE_PRECISION (f2->type)))
-   return -1;
+  /* Put the integral type with the bigger precision first.  */
+  else if (INTEGRAL_TYPE_P (f1->type)
+  && INTEGRAL_TYPE_P (f2->type)
+  && (TYPE_PRECISION (f2->type) != TYPE_PRECISION (f1->type)))
+   return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
   /* Stabilize the sort.  */
   return TYPE_UID (f1->type) - TYPE_UID (f2->type);
 }
@@ -2055,6 +2056,11 @@ sort_and_splice_var_accesses (tree var)
   bool grp_partial_lhs = access->grp_partial_lhs;
   bool first_scalar = is_gimple_reg_type (access->type);
   bool unscalarizable_region = access->grp_unscalarizable_region;
+  bool bf_non_full_precision
+   = (INTEGRAL_TYPE_P (access->type)
+  && TYPE_PRECISION (access->type) != access->size
+  && TREE_CODE (access->expr) == COMPONENT_REF
+  && DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1)));
 
   if (first || access->offset >= high)
{
@@ -2102,6 +2108,22 @@ sort_and_splice_var_accesses (tree var)
 this combination of size and offset, the comparison function
 should have put the scalars first.  */
  gcc_assert (first_scalar || !is_gimple_reg_type (ac2->type));
+ /* It also prefers integral types to non-integral.  However, when the
+precision of the selected type does not span the entire area and
+should also be used for a non-integer (i.e. float), we must not
+let that happen.  Normally analyze_access_subtree expands the type
+to cover the entire area but for bit-fields it doesn't.  */
+ if (bf_non_full_precision && !INTEGRAL_TYPE_P (ac2->type))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Cannot scalarize the following access "
+  "because insufficient precision integer type was "
+  "selected.\n  ");
+ dump_access (dump_file, access, false);
+   }
+ unscalarizable_region = true;
+   }
  ac2->group_representative = access;
  j++;
}
-- 
2.14.1
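
The ordering change in compare_access_positions can be checked on a toy model.  The sketch below covers only the integral/non-integral tail of the comparator as it appears in the diff; the `Ty` record, its fields, and the sample types are assumptions for illustration, and whether this particular triple is the inconsistency that was originally reported is not stated in the thread.

```python
from collections import namedtuple
from itertools import permutations

# Hypothetical stand-in for a tree type: "full" means precision == size.
Ty = namedtuple("Ty", "name integral precision full uid")


def old_cmp(f1, f2):
    """Tail of the comparator before the patch (simplified model)."""
    if f1.integral and f2.integral:
        return f2.precision - f1.precision
    if f1.integral and not f1.full:
        return 1
    if f2.integral and not f2.full:
        return -1
    return f1.uid - f2.uid


def new_cmp(f1, f2):
    """Tail of the comparator after the patch (simplified model)."""
    if f1.integral and not f2.integral:
        return -1
    if not f1.integral and f2.integral:
        return 1
    if f1.integral and f2.integral and f1.precision != f2.precision:
        return f2.precision - f1.precision
    return f1.uid - f2.uid


def is_transitive(cmp, items):
    """Brute force: a <= b and b <= c must imply a <= c."""
    for a, b, c in permutations(items, 3):
        if cmp(a, b) <= 0 and cmp(b, c) <= 0 and cmp(a, c) > 0:
            return False
    return True


TYPES = [
    Ty("int_a", True, 32, True, 1),
    Ty("bitfield", True, 24, False, 2),
    Ty("float", False, 32, True, 3),
    Ty("int_b", True, 32, True, 5),
]

print(is_transitive(old_cmp, TYPES), is_transitive(new_cmp, TYPES))
```

In the old model the two full-precision integers compare equal to each other, yet the TYPE_UID fallback places one before the float and the other after it; the new model orders every integral type before every non-integral one and breaks ties among equal-precision integers by TYPE_UID.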

[PING] [PATCH] rl78 adddi3 improvement

2017-09-26 Thread Sebastian Perta

Hi,

I would like to ping the below patch posted on 14th of August.

Thank you!

Sebastian


-Original Message-
From: Sebastian Perta
Sent: 14 August 2017 15:26
To: 'gcc-patches@gcc.gnu.org' 
Subject: [PATCH] rl78 adddi3 improvement

The following patch improves both the speed and code size of 64-bit addition
for RL78: it emits a library function call instead of emitting inline code
for every single 64-bit addition.
The addition function added to libgcc is hand-written, so it is more
optimal than what GCC generates.

The change can easily be seen with the following test case:
long long my_adddi3(long long a, long long b) {
    return a + b;
}
I did not add this to the regression tests as it is very simple and there are
many test cases in the regression suite which already exercise this, for
example gcc.c-torture/execute/20090711-1.c and
gcc.c-torture/execute/20091229-1.c.

Regression test is OK, tested with the following command:
make -k check-gcc RUNTESTFLAGS=--target_board=rl78-sim

Please let me know if this is OK, Thank you!
Sebastian
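
As a rough illustration of why an inline 64-bit add is bulky on a small target, the sketch below adds two 64-bit values as four 16-bit limbs with carry propagation.  It is not the routine from adddi3.S, which is not reproduced here; the 16-bit limb width and the name `add64_by_limbs` are assumptions made for this example.

```python
MASK16 = 0xFFFF


def add64_by_limbs(a, b):
    """Add two 64-bit values limb by limb, wrapping modulo 2**64."""
    result = 0
    carry = 0
    for i in range(4):
        shift = 16 * i
        # Add the matching 16-bit limbs plus the carry from the limb below.
        total = ((a >> shift) & MASK16) + ((b >> shift) & MASK16) + carry
        result |= (total & MASK16) << shift
        carry = total >> 16
    # The carry out of the top limb is discarded, as in fixed-width addition.
    return result


print(hex(add64_by_limbs(0x0000FFFFFFFFFFFF, 1)))
```

Each limb step costs several instructions, so repeating the sequence at every addition grows the code quickly; sharing one library routine trades that for a call.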



Index: gcc/ChangeLog
===
--- gcc/ChangeLog(revision 251091)
+++ gcc/ChangeLog(working copy)
@@ -1,3 +1,12 @@
+2017-08-14  Sebastian Perta  
+
+changed long long addition for RL78
+* gcc/config/rl78/rl78.c (rl78_emit_libcall): new function.
+* gcc/config/rl78/rl78-protos.h (rl78_emit_libcall): new function.
+* gcc/config/rl78/rl78.md: new define_expand "adddi3".
+* libgcc/config/rl78/adddi3.S: new assembly file.
+* libgcc/config/rl78/t-rl78: added adddi3.S to LIB2ADD.
+
 2017-08-14  Bin Cheng  

 PR tree-optimization/81799
Index: gcc/config/rl78/rl78-protos.h
===
--- gcc/config/rl78/rl78-protos.h(revision 251091)
+++ gcc/config/rl78/rl78-protos.h(working copy)
@@ -56,3 +56,13 @@
 int, int, int);

 int rl78_one_far_p (rtx *operands, int num_operands);
+
+#ifdef RTX_CODE
+#ifdef HAVE_MACHINE_MODES
+
+rtx rl78_emit_libcall (const char*, enum rtx_code,
+   enum machine_mode, enum machine_mode,
+   int, rtx*);
+
+#endif
+#endif
Index: gcc/config/rl78/rl78.c
===
--- gcc/config/rl78/rl78.c(revision 251091)
+++ gcc/config/rl78/rl78.c(working copy)
@@ -4791,4 +4791,43 @@


 struct gcc_target targetm = TARGET_INITIALIZER;

+rtx
+rl78_emit_libcall (const char *name, enum rtx_code code,
+   enum machine_mode dmode, enum machine_mode smode,
+   int noperands, rtx *operands) {
+  rtx ret;
+  rtx_insn *insns;
+  rtx libcall;
+  rtx equiv;
+
+  start_sequence ();
+  libcall = gen_rtx_SYMBOL_REF (Pmode, name);
+
+  switch (noperands)
+{
+case 2:
+  ret = emit_library_call_value (libcall, NULL_RTX, LCT_CONST,
+ dmode, 1, operands[1], smode);
+  equiv = gen_rtx_fmt_e (code, dmode, operands[1]);
+  break;
+
+case 3:
+  ret = emit_library_call_value (libcall, NULL_RTX,
+ LCT_CONST, dmode, 2,
+ operands[1], smode, operands[2],
+ smode);
+  equiv = gen_rtx_fmt_ee (code, dmode, operands[1], operands[2]);
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  insns = get_insns ();
+  end_sequence ();
+  emit_libcall_block (insns, operands[0], ret, equiv);
+  return ret;
+}
+
 #include "gt-rl78.h"
Index: gcc/config/rl78/rl78.md
===
--- gcc/config/rl78/rl78.md(revision 251091)
+++ gcc/config/rl78/rl78.md(working copy)
@@ -224,6 +224,16 @@
DONE;"
 )

+(define_expand "adddi3"
+ [(set (match_operand:DI  0 "nonimmediate_operand" "")
+(plus:DI (match_operand:DI 1 "general_operand"  "")
+ (match_operand:DI 2 "general_operand"  "")))
+   ]
+  ""
+  "rl78_emit_libcall (\"__adddi3\", PLUS, DImode, DImode, 3, operands);
+   DONE;"
+)
+
 (define_insn "addsi3_internal_virt"
   [(set (match_operand:SI  0 "nonimmediate_operand" "=v,, vm")
 (plus:SI (match_operand:SI 1 "general_operand"  "0, vim, vim")
Index: libgcc/config/rl78/adddi3.S
===
--- libgcc/config/rl78/adddi3.S(nonexistent)
+++ libgcc/config/rl78/adddi3.S(working copy)
@@ -0,0 +1,58 @@
+;   Copyright (C) 2017 Free Software Foundation, Inc.
+;   Contributed by Sebastian Perta.
+;
+; This file is free software; you can redistribute it and/or modify it
+; under the terms of the GNU General Public License as published by the
+; Free Software Foundation; either version 3, or (at your option) any
+; later version.
+;
+; This file is distributed in the hope that it will be useful, but
+; WITHOUT ANY WARRANTY; without even the implied warranty of

Re: [PATCH][GRAPHITE] More TLC

2017-09-26 Thread Sven Verdoolaege

On Tue, Sep 26, 2017 at 09:19:50AM -0500, Sebastian Pop wrote:
> Sven, is there already a function that computes the sum of all
> strides in a proximity map?  Maybe you have code that does
> something similar in pet or ppcg?

What exactly do you want to sum?
If this involves any counting, then it cannot currently
be done in pet or ppcg since isl does not support counting yet
and the public version of barvinok is GPL licensed.

Also, it's better to ask such questions on the isl mailing list
isl-developm...@googlegroups.com

skimo

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #1 of 8

2017-09-26 Thread Segher Boessenkool

Hi!

On Tue, Sep 26, 2017 at 10:30:03AM -0400, Michael Meissner wrote:
> I have broken the patches down to 8 chunks.

Thanks for doing this.

> +(define_split
> +  [(set (match_operand:DI 0 "int_reg_operand")
> + (sign_extend:DI (match_operand:SI 1 "vsx_register_operand")))]

Should be EXTSI instead of DI, for clarity.

> +  "TARGET_DIRECT_MOVE_64BIT && reload_completed"
> +  [(set (match_dup 2)
> + (match_dup 1))
> +   (set (match_dup 0)
> + (sign_extend:DI (match_dup 2)))]
> +{
> +  operands[2] = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
> +})

Okay for trunk with that.  Thanks!


Segher

Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag

2017-09-26 Thread David Edelsohn

On Tue, Sep 26, 2017 at 4:44 AM, Janus Weil  wrote:
> 2017-09-25 23:23 GMT+02:00 Steve Kargl :
>> On Mon, Sep 25, 2017 at 11:14:42PM +0200, Janus Weil wrote:
>>> 2017-09-25 17:07 GMT+02:00 David Edelsohn :
>>> > promotion_3.f90 and promotion_4.f90 are failing on at least PowerPC
>>> > and AArch64.  Are these new tests limited to x86 or some long double
>>> > assumptions?
>>>
>>> These tests require the availability of a 10- or 16-byte-wide REAL
>>> type, respectively. I have to admit that I do not have a complete
>>> overview of which targets in GCC's wide portfolio provide such a type.
>>>
>>> It seems that REAL(16) is supported via libquadmath on 32-bit x86,
>>> x86-64 and Itanium at least. I'm not sure about REAL(10).
>>>
>>> Targets that do not support such a type probably need to be XFAILed.
>>>
>>
>> Janus, I think you can control with a dg option
>>
>> dg-require-effective-target fortran_large_real
>>
>> See, for example, gfortran.dg/random_3.f90
>
> Thanks for the pointer, Steve.
>
> However, it seems that "fortran_large_real" only requires some real
> type that is larger than 8 byte, but makes no assumptions on its
> actual size (10 or 16 byte). Therefore it's probably not very useful
> for promotion_{3,4}.
>
> But: I found that there's also a "fortran_real_16", which should be
> suitable for promotion_3. Can someone verify if the following fixes
> the problem on the failing targets:
>
> Index: promotion_3.f90
> ===
> --- promotion_3.f90(revision 253134)
> +++ promotion_3.f90(working copy)
> @@ -1,5 +1,6 @@
>  ! { dg-do run }
>  ! { dg-options "-fdefault-real-16" }
> +! { dg-require-effective-target fortran_real_16 }
>  !
>  ! PR 82143: add a -fdefault-real-16 flag
>  !
>
>
> If it does, I'll be happy to commit that. For promotion_4, we probably
> need to add an effective target "fortran_real_10" (which does not seem
> to exist yet).

Testing fortran_real_16 fixes promotion_3.f90 on AIX.  I expect that
the new dg test will work for promotion_4.f90.

Thanks, David

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #8 of 8

2017-09-26 Thread Michael Meissner

Off list, Segher asked that I break the patch eliminating a shift right when
transferring SFmode from a vector register to a GPR register down into smaller
chunks.  The power7 and power8 instructions that convert values in the double
precision format to single precision actually duplicate the 32-bits in the
first word and second word (the ISA says the second word is undefined).  We are
in the process of issuing an update to ISA 3.0 to clarify that this will be the
required behavior going forward.
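
As a rough illustration of why the shift becomes unnecessary (a model written
for this note, not code from the patch): if the 32-bit single-precision image is
present in both words of the 64-bit doubleword, then reading the low word gives
the same bits as reading the whole doubleword and shifting right by 32.  The
function names below are hypothetical, and the model assumes nothing about the
hardware beyond the duplication described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the IEEE single-precision bit pattern of F.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Hypothetical model of the doubleword after the conversion: the
   single-precision image appears in both 32-bit words.  */
static uint64_t
model_duplicated_sf (float f)
{
  uint64_t bits = float_bits (f);
  return (bits << 32) | bits;
}

/* Old sequence: move all 64 bits, then shift right by 32.  */
static uint32_t
via_shift (uint64_t dw)
{
  return (uint32_t) (dw >> 32);
}

/* New sequence: move only the low 32 bits, zero extended.  */
static uint32_t
via_low_word (uint64_t dw)
{
  return (uint32_t) (dw & 0xffffffffu);
}

/* Both sequences must yield the same bits under the duplication model.  */
static void
self_check (void)
{
  const float samples[] = { 0.0f, 1.0f, -2.5f, 3.0e10f };

  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      uint64_t dw = model_duplicated_sf (samples[i]);
      assert (via_shift (dw) == float_bits (samples[i]));
      assert (via_low_word (dw) == float_bits (samples[i]));
    }
}
```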

I have broken the patches down to 8 chunks.  Some of the patches are just
cosmetic fixes for things I noticed while doing the main patch.  One patch eliminates
the shift.  Another fixes up the peephole2 that optimizes putting a SFmode into
a union and then doing masking on the value.  And the final patch updates the
tests that need to be changed.

I have verified that each of these sub-patches builds, and after all 8 patches
have been applied, I did the full bootstrap and regression test, and like the
previous combination patch there were no regressions.  If only some of the
patches are applied, then there will be 3 regressions until the remaining
patches are applied.

This is patch #8.  Can I check this into the trunk?  It fixes the two tests
(pr71977-1.c and direct-move-float1.c) that need to be adjusted with the
previous patches applied.  It also adds a new test for combining a round from
DFmode to SFmode with a move to a GPR.

2017-09-25  Michael Meissner  

* gcc.target/powerpc/pr71977-1.c: Update test to know that we
don't generate a 32-bit shift after doing XSCVDPSPN.
* gcc.target/powerpc/direct-move-float1.c: Likewise.
* gcc.target/powerpc/direct-move-float3.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/testsuite/gcc.target/powerpc/pr71977-1.c
===
--- gcc/testsuite/gcc.target/powerpc/pr71977-1.c(revision 253176)
+++ gcc/testsuite/gcc.target/powerpc/pr71977-1.c(working copy)
@@ -23,9 +23,9 @@ mask_and_float_var (float f, uint32_t ma
   return u.value;
 }
 
-/* { dg-final { scan-assembler "\[ \t\]xxland " } } */
-/* { dg-final { scan-assembler-not "\[ \t\]and "} } */
-/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */
-/* { dg-final { scan-assembler-not "\[ \t\]stxv"} } */
-/* { dg-final { scan-assembler-not "\[ \t\]lxv" } } */
-/* { dg-final { scan-assembler-not "\[ \t\]srdi "   } } */
+/* { dg-final { scan-assembler {\mxxland\M}  } } */
+/* { dg-final { scan-assembler-not {\mand\M} } } */
+/* { dg-final { scan-assembler-not {\mmfvsrd\M}  } } */
+/* { dg-final { scan-assembler-not {\mstxv\M}} } */
+/* { dg-final { scan-assembler-not {\mlxv\M} } } */
+/* { dg-final { scan-assembler-not {\msrdi\M}} } */
Index: gcc/testsuite/gcc.target/powerpc/direct-move-float1.c
===
--- gcc/testsuite/gcc.target/powerpc/direct-move-float1.c   (revision 253159)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-float1.c   (working copy)
@@ -4,10 +4,10 @@
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O2" } */
-/* { dg-final { scan-assembler "mtvsrd" } } */
-/* { dg-final { scan-assembler "mfvsrd" } } */
-/* { dg-final { scan-assembler "xscvdpspn" } } */
-/* { dg-final { scan-assembler "xscvspdpn" } } */
+/* { dg-final { scan-assembler {\mmtvsrd\M}} } */
+/* { dg-final { scan-assembler {\mmfvsrwz\M}   } } */
+/* { dg-final { scan-assembler {\mxscvdpspn\M} } } */
+/* { dg-final { scan-assembler {\mxscvspdpn\M} } } */
 
 /* Check code generation for direct move for float types.  */
 
Index: gcc/testsuite/gcc.target/powerpc/direct-move-float3.c
===
--- gcc/testsuite/gcc.target/powerpc/direct-move-float3.c   (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-float3.c   (revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mpower8-vector -O2" } */
+
+/* Test that we generate XSCVDPSP instead of FRSP and XSCVDPSPN when we combine
+   a round from double to float and moving the float value to a GPR.  */
+
+union u {
+  float f;
+  unsigned int ui;
+  int si;
+};
+
+unsigned int
+ui_d (double d)
+{
+  union u x;
+  x.f = d;
+  return x.ui;
+}
+
+/* { dg-final { scan-assembler {\mmfvsrwz\M}   } } */
+/* { dg-final { scan-assembler {\mxscvdpsp\M}  } } */
+/* { dg-final { scan-assembler-not {\mmfvsrd\M}} } */
+/* { dg-final

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #7 of 8

2017-09-26 Thread Michael Meissner

Off list, Segher asked that I break the patch eliminating a shift right when
transferring SFmode from a vector register to a GPR register down into smaller
chunks.  The power7 and power8 instructions that convert values in the double
precision format to single precision actually duplicate the 32-bits in the
first word and second word (the ISA says the second word is undefined).  We are
in the process of issuing an update to ISA 3.0 to clarify that this will be the
required behavior going forward.

I have broken the patches down to 8 chunks.  Some of the patches are just
cosmetic fixes for things I noticed while doing the main patch.  One patch eliminates
the shift.  Another fixes up the peephole2 that optimizes putting a SFmode into
a union and then doing masking on the value.  And the final patch updates the
tests that need to be changed.

I have verified that each of these sub-patches builds, and after all 8 patches
have been applied, I did the full bootstrap and regression test, and like the
previous combination patch there were no regressions.  If only some of the
patches are applied, then there will be 3 regressions until the remaining
patches are applied.

This is patch #7.  Can I check this into the trunk?  This patch fixes the
peephole to optimize code like the following (which shows up in the math library):

float value;
unsigned int u2;
union {
  float f;
  unsigned int ui;
} u;

u.f = value;
u2 = u.ui & 0x8000;
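
A self-contained variant of the snippet above may help when experimenting with
the peephole.  This is only a sketch: the function name is hypothetical, float
is assumed to be a 32-bit IEEE type, and the mask 0x80000000 is chosen here to
isolate the sign bit, so it differs from the constant shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Put a float into a union and mask its integer image, the pattern the
   peephole is meant to handle.  The mask isolates the sign bit.  */
static uint32_t
sign_bit_of (float value)
{
  union
  {
    float f;
    uint32_t ui;
  } u;

  u.f = value;
  return u.ui & 0x80000000u;
}

/* Only negative values (including -0.0) have the sign bit set.  */
static void
self_check (void)
{
  assert (sign_bit_of (1.0f) == 0);
  assert (sign_bit_of (-1.0f) == 0x80000000u);
  assert (sign_bit_of (-0.0f) == 0x80000000u);
  assert (sign_bit_of (0.0f) == 0);
}
```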

2017-09-25  Michael Meissner  

* config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
Adjust code to eliminate needing to do the shift right 32-bits
operation after XSCVDPSPN.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 253175)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -4733,9 +4733,10 @@ (define_constants
    (SFBOOL_SHL_D            7) ;; shift left dest
    (SFBOOL_SHL_A            8) ;; shift left arg
    (SFBOOL_MTVSR_D          9) ;; move to vecter dest
-   (SFBOOL_BOOL_A_DI       10) ;; SFBOOL_BOOL_A1/A2 as DImode
-   (SFBOOL_TMP_VSX_DI      11) ;; SFBOOL_TMP_VSX as DImode
-   (SFBOOL_MTVSR_D_V4SF    12)])   ;; SFBOOL_MTVSRD_D as V4SFmode
+   (SFBOOL_MFVSR_A_V4SF    10) ;; SFBOOL_MFVSR_A as V4SFmode
+   (SFBOOL_BOOL_A_DI       11) ;; SFBOOL_BOOL_A1/A2 as DImode
+   (SFBOOL_TMP_VSX_DI      12) ;; SFBOOL_TMP_VSX as DImode
+   (SFBOOL_MTVSR_D_V4SF    13)])   ;; SFBOOL_MTVSRD_D as V4SFmode
 
 ;; Attempt to optimize some common GLIBC operations using logical operations to
 ;; pick apart SFmode operations.  For example, there is code from e_powf.c
@@ -4773,29 +4774,22 @@ (define_constants
 ;;
 ;; (set (reg:DI reg3) (unspec:DI [(reg:V4SF reg2)] UNSPEC_P8V_RELOAD_FROM_VSX))
 ;;
-;; (set (reg:DI reg3) (lshiftrt:DI (reg:DI reg3) (const_int 32)))
+;; (set (reg:DI reg4) (and:DI (reg:DI reg3) (reg:DI reg3)))
 ;;
-;; (set (reg:DI reg5) (and:DI (reg:DI reg3) (reg:DI reg4)))
+;; (set (reg:DI reg5) (ashift:DI (reg:DI reg4) (const_int 32)))
 ;;
-;; (set (reg:DI reg6) (ashift:DI (reg:DI reg5) (const_int 32)))
+;; (set (reg:SF reg6) (unspec:SF [(reg:DI reg5)] UNSPEC_P8V_MTVSRD))
 ;;
-;; (set (reg:SF reg7) (unspec:SF [(reg:DI reg6)] UNSPEC_P8V_MTVSRD))
-;;
-;; (set (reg:SF reg7) (unspec:SF [(reg:SF reg7)] UNSPEC_VSX_CVSPDPN))
+;; (set (reg:SF reg6) (unspec:SF [(reg:SF reg6)] UNSPEC_VSX_CVSPDPN))
 
 (define_peephole2
   [(match_scratch:DI SFBOOL_TMP_GPR "r")
(match_scratch:V4SF SFBOOL_TMP_VSX "wa")
 
-   ;; MFVSRD
+   ;; MFVSRWZ (aka zero_extend)
(set (match_operand:DI SFBOOL_MFVSR_D "int_reg_operand")
-   (unspec:DI [(match_operand:V4SF SFBOOL_MFVSR_A "vsx_register_operand")]
-  UNSPEC_P8V_RELOAD_FROM_VSX))
-
-   ;; SRDI
-   (set (match_dup SFBOOL_MFVSR_D)
-   (lshiftrt:DI (match_dup SFBOOL_MFVSR_D)
-(const_int 32)))
+   (zero_extend:DI
+(match_operand:SI SFBOOL_MFVSR_A "vsx_register_operand")))
 
;; AND/IOR/XOR operation on int
(set (match_operand:SI SFBOOL_BOOL_D "int_reg_operand")
@@ -4820,15 +4814,15 @@ (define_peephole2
&& (REG_P (operands[SFBOOL_BOOL_A2])
|| CONST_INT_P (operands[SFBOOL_BOOL_A2]))
&& (REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_MFVSR_D])
-   || peep2_reg_dead_p (3, operands[SFBOOL_MFVSR_D]))
+   || peep2_reg_dead_p (2, operands[SFBOOL_MFVSR_D]))
&& (REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A1])
|| (REG_P (operands[SFBOOL_BOOL_A2])
   && REGNO (operands[SFBOOL_MFVSR_D])
== REGNO

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #6 of 8

2017-09-26 Thread Michael Meissner

Off list, Segher asked that I break the patch eliminating a shift right when
transferring SFmode from a vector register to a GPR register down into smaller
chunks.  The power7 and power8 instructions that convert values in the double
precision format to single precision actually duplicate the 32-bits in the
first word and second word (the ISA says the second word is undefined).  We are
in the process of issuing an update to ISA 3.0 to clarify that this will be the
required behavior going forward.

I have broken the patches down to 8 chunks.  Some of the patches are just
cosmetic fixes for things I noticed while doing the main patch.  One patch eliminates
the shift.  Another fixes up the peephole2 that optimizes putting a SFmode into
a union and then doing masking on the value.  And the final patch updates the
tests that need to be changed.

I have verified that each of these sub-patches builds, and after all 8 patches
have been applied, I did the full bootstrap and regression test, and like the
previous combination patch there were no regressions.  If only some of the
patches are applied, then there will be 3 regressions until the remaining
patches are applied.

This is patch #6.  Can I check this into the trunk?  When I first did the
power7 port years ago, I decorated a lot of the insns with 2 alternatives, one
that used the type specific constraint (i.e. "wf" for V4SF, "wd" for V2DF,
etc.)  and then the all VSX constraint ("wa") as an alternative with '?'.  The
theory was we might want to favor Altivec or FPR registers for a given vector
type.  However, we've never done this.  Furthermore, with the change to use
LRA instead of reload, this isn't as useful.  So, as I encounter these dual
alternatives, I have been eliminating them.

2017-09-25  Michael Meissner  

* config/rs6000/vsx.md (vsx_xscvdpspn): Eliminate useless
alternative constraint.
(vsx_xscvspdpn): Likewise.
(vsx_xscvspdpn_scalar): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 253166)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -1811,24 +1811,24 @@ (define_insn "vsx_xscvdpsp_scalar"
 
 ;; ISA 2.07 xscvdpspn/xscvspdpn that does not raise an error on signalling NaNs
 (define_insn "vsx_xscvdpspn"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ww,?ww")
-   (unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "wd,wa")]
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ww")
+   (unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "ws")]
 UNSPEC_VSX_CVDPSPN))]
   "TARGET_XSCVDPSPN"
   "xscvdpspn %x0,%x1"
   [(set_attr "type" "fp")])
 
 (define_insn "vsx_xscvspdpn"
-  [(set (match_operand:DF 0 "vsx_register_operand" "=ws,?ws")
-   (unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wf,wa")]
+  [(set (match_operand:DF 0 "vsx_register_operand" "=ws")
+   (unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
   UNSPEC_VSX_CVSPDPN))]
   "TARGET_XSCVSPDPN"
   "xscvspdpn %x0,%x1"
   [(set_attr "type" "fp")])
 
 (define_insn "vsx_xscvdpspn_scalar"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wf,?wa")
-   (unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww,ww")]
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+   (unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww")]
 UNSPEC_VSX_CVDPSPN))]
   "TARGET_XSCVDPSPN"
   "xscvdpspn %x0,%x1"

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #5 of 8

2017-09-26 Thread Michael Meissner

Off list, Segher asked that I break the patch eliminating a shift right when
transferring SFmode from a vector register to a GPR register down into smaller
chunks.  The power7 and power8 instructions that convert values in the double
precision format to single precision actually duplicate the 32-bits in the
first word and second word (the ISA says the second word is undefined).  We are
in the process of issuing an update to ISA 3.0 to clarify that this will be the
required behavior going forward.

I have broken the patches down to 8 chunks.  Some of the patches are just
cosmetic fixes for things I noticed while doing the main patch.  One patch eliminates
the shift.  Another fixes up the peephole2 that optimizes putting a SFmode into
a union and then doing masking on the value.  And the final patch updates the
tests that need to be changed.

I have verified that each of these sub-patches builds, and after all 8 patches
have been applied, I did the full bootstrap and regression test, and like the
previous combination patch there were no regressions.  If only some of the
patches are applied, then there will be 3 regressions until the remaining
patches are applied.

This is patch #5.  Can I check this into the trunk?  In working on the patch, I
noticed that the XSCVDPSP insn used "f" to limit the register to traditional
FPR registers.  This insn was written for power7, which did not support SFmode
in Altivec registers.  Now that power8 supports SFmode in Altivec registers,
this patch uses the proper constraint ("ww") so we can avoid a move
instruction.

2017-09-25  Michael Meissner  

* config/rs6000/vsx.md (vsx_xscvdpsp_scalar): Use "ww" constraint
instead of "f" to allow SFmode to be in traditional Altivec
registers.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 253165)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -1803,7 +1803,7 @@ (define_insn "vsx_xvcvhpsp"
 ;; format of scalars is actually DF.
 (define_insn "vsx_xscvdpsp_scalar"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
-   (unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "f")]
+   (unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww")]
 UNSPEC_VSX_CVSPDP))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "xscvdpsp %x0,%x1"

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #4 of 8

2017-09-26 Thread Michael Meissner

Off list, Segher asked that I break the patch eliminating a shift right when
transferring SFmode from a vector register to a GPR register down into smaller
chunks.  The power7 and power8 instructions that convert values in the double
precision format to single precision actually duplicate the 32-bits in the
first word and second word (the ISA says the second word is undefined).  We are
in the process of issuing an update to ISA 3.0 to clarify that this will be the
required behavior going forward.

I have broken the patches down to 8 chunks.  Some of the patches are just
cosmetic fixes for things I noticed while doing the main patch.  One patch eliminates
the shift.  Another fixes up the peephole2 that optimizes putting a SFmode into
a union and then doing masking on the value.  And the final patch updates the
tests that need to be changed.

I have verified that each of these sub-patches builds, and after all 8 patches
have been applied, I did the full bootstrap and regression test, and like the
previous combination patch there were no regressions.  If only some of the
patches are applied, then there will be 3 regressions until the remaining
patches are applied.

This is patch #4.  Can I check this into the trunk?  This is a cosmetic change:
I noticed that the two insns providing XSCVSPDP were separated by another
insn, and this patch moves the 2nd insn to be adjacent to the first.

2017-09-25  Michael Meissner  

* config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so that
it is adjacent to the other XSCVSPDP insns.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 253173)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -1781,6 +1781,15 @@ (define_insn "vsx_xscvspdp"
   "xscvspdp %x0,%x1"
   [(set_attr "type" "fp")])
 
+;; Same as vsx_xscvspdp, but use SF as the type
+(define_insn "vsx_xscvspdp_scalar2"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
+   (unspec:SF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
+  UNSPEC_VSX_CVSPDP))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "xscvspdp %x0,%x1"
+  [(set_attr "type" "fp")])
+
 ;; Generate xvcvhpsp instruction
 (define_insn "vsx_xvcvhpsp"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
@@ -1800,15 +1809,6 @@ (define_insn "vsx_xscvdpsp_scalar"
   "xscvdpsp %x0,%x1"
   [(set_attr "type" "fp")])
 
-;; Same as vsx_xscvspdp, but use SF as the type
-(define_insn "vsx_xscvspdp_scalar2"
-  [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
-   (unspec:SF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
-  UNSPEC_VSX_CVSPDP))]
-  "VECTOR_UNIT_VSX_P (V4SFmode)"
-  "xscvspdp %x0,%x1"
-  [(set_attr "type" "fp")])
-
 ;; ISA 2.07 xscvdpspn/xscvspdpn that does not raise an error on signalling NaNs
 (define_insn "vsx_xscvdpspn"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=ww,?ww")

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #3 of 8

2017-09-26 Thread Michael Meissner

Off list, Segher asked that I break the patch eliminating a shift right when
transferring SFmode from a vector register to a GPR register down into smaller
chunks.  The power7 and power8 instructions that convert values in the double
precision format to single precision actually duplicate the 32-bits in the
first word and second word (the ISA says the second word is undefined).  We are
in the process of issuing an update to ISA 3.0 to clarify that this will be the
required behavior going forward.

I have broken the patches down to 8 chunks.  Some of the patches are just
cosmetic fixes for things I noticed while doing the main patch.  One patch eliminates
the shift.  Another fixes up the peephole2 that optimizes putting a SFmode into
a union and then doing masking on the value.  And the final patch updates the
tests that need to be changed.

I have verified that each of these sub-patches builds, and after all 8 patches
have been applied, I did the full bootstrap and regression test, and like the
previous combination patch there were no regressions.  If only some of the
patches are applied, then there will be 3 regressions until the remaining
patches are applied.

This is patch #3.  Can I check this into the trunk?  I noticed that sometimes
you want to convert a DFmode to SFmode and then move it to a GPR register.
This patch adds a combiner insn to use the XSCVDPSP instruction (which does the
proper rounding and handles out-of-range values) instead of the XSCVDPSPN
instruction (which assumes the value has already been rounded, and does not
raise an exception in case of a NaN).
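
The source-level pattern in question can be sketched as follows.  This is an
illustration written for this note rather than code from the patch; the
function name is hypothetical, and the expected bit patterns assume IEEE 754
arithmetic in the default round-to-nearest mode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round a double to float, then return the float's bit image as an
   integer: a convert followed by a move to an integer register.  */
static uint32_t
float_bits_from_double (double d)
{
  float f = (float) d;		/* Rounding step.  */
  uint32_t bits;

  memcpy (&bits, &f, sizeof bits);
  return bits;
}

static void
self_check (void)
{
  /* Exactly representable values keep their value.  */
  assert (float_bits_from_double (1.0) == 0x3f800000u);
  assert (float_bits_from_double (-2.0) == 0xc0000000u);

  /* 1 + 2^-30 is not representable as a float and rounds to 1.0f.  */
  assert (float_bits_from_double (1.0 + 0x1p-30) == 0x3f800000u);

  /* A value too large for float becomes +infinity.  */
  assert (float_bits_from_double (1.0e300) == 0x7f800000u);
}
```

The last two cases are the ones where the rounding step matters, since the
input is not already a valid single-precision value.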

2017-09-25  Michael Meissner  

* config/rs6000/rs6000.md (movsi_from_df): Optimize converting a
DFmode to a SFmode, and then needing to move the SFmode to a GPR
to use the XSCVDPSP instruction instead of FRSP and XSCVDPSPN.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 253170)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -6919,6 +6919,26 @@ (define_insn_and_split "*movdi_from_sf_z
"4,  4,   4,   4,8,
 8,  4")])
 
+;; Like movsi_from_sf, but combine a convert from DFmode to SFmode before
+;; moving it to SImode.  We can do a SFmode store without having to do the
+;; conversion explicitly.  If we are doing a register->register conversion, use
+;; XSCVDPSP instead of XSCVDPSPN, since the former handles cases where the
+;; input will not fit in a SFmode, and the later assumes the value has already
+;; been rounded.
+(define_insn "*movsi_from_df"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=wa,m,wY,Z")
+   (unspec:SI [(float_truncate:SF
+(match_operand:DF 1 "gpc_reg_operand" "wa, f,wb,wa"))]
+   UNSPEC_SI_FROM_SF))]
+
+  "TARGET_NO_SF_SUBREG"
+  "@
+   xscvdpsp %x0,%x1
+   stfs%U0%X0 %1,%0
+   stxssp %1,%0
+   stxsspx %x1,%y0"
+  [(set_attr "type"   "fp,fpstore,fpstore,fpstore")])
+
 ;; Split a load of a large constant into the appropriate two-insn
 ;; sequence.

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #2 of 8

2017-09-26 Thread Michael Meissner

Off list, Segher asked that I break the patch eliminating a shift right when
transferring SFmode from a vector register to a GPR register down into smaller
chunks.  The power7 and power8 instructions that convert values in the double
precision format to single precision actually duplicate the 32-bits in the
first word and second word (the ISA says the second word is undefined).  We are
in the process of issuing an update to ISA 3.0 to clarify that this will be the
required behavior going forward.

I have broken the patches down to 8 chunks.  Some of the patches are just
cosmetic fixes for things I noticed while doing the main patch.  One patch eliminates
the shift.  Another fixes up the peephole2 that optimizes putting a SFmode into
a union and then doing masking on the value.  And the final patch updates the
tests that need to be changed.

I have verified that each of these sub-patches builds, and after all 8 patches
have been applied, I did the full bootstrap and regression test, and like the
previous combination patch there were no regressions.  If only some of the
patches are applied, then there will be 3 regressions until the remaining
patches are applied.

This is patch #2.  Can I check this into the trunk?  Compared to the previous
patch, I simplified this to use zero_extendsidi2 instead of using an UNSPEC,
and I deleted the UNSPEC.

2017-09-25  Michael Meissner  

* config/rs6000/rs6000.md (movsi_from_sf): Adjust code to
eliminate doing a 32-bit shift right or vector extract after doing
XSCVDPSPN.  Use zero_extendsidi2 instead of p8_mfvsrd_4_disf to
move the value to the GPRs.
(movdi_from_sf_zero_ext): Likewise.
(reload_gpr_from_vsxsf): Likewise.
(p8_mfvsrd_4_disf): Delete, no longer used.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 253169)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -6806,25 +6806,25 @@ (define_insn "*movsi_internal1_single"
 ;; needed.
 
 ;; MR   LWZ  LFIWZX   LXSIWZX   STW
-;; STFS STXSSP   STXSSPX  VSX->GPR  MTVSRWZ
-;; VSX->VSX
+;; STFS STXSSP   STXSSPX  VSX->GPR  VSX->VSX,
+;; MTVSRWZ
 
 (define_insn_and_split "movsi_from_sf"
   [(set (match_operand:SI 0 "nonimmediate_operand"
"=r, r,   ?*wI,?*wH, m,
-m,  wY,  Z,   r,wIwH,
-?wK")
+m,  wY,  Z,   r,?*wIwH,
+wIwH")
 
(unspec:SI [(match_operand:SF 1 "input_operand"
"r,  m,   Z,   Z,r,
-f,  wb,  wu,  wIwH, r,
-wK")]
+f,  wb,  wu,  wIwH, wIwH,
+r")]
UNSPEC_SI_FROM_SF))
 
(clobber (match_scratch:V4SF 2
"=X, X,   X,   X,X,
-X,  X,   X,   wa,   X,
-wa"))]
+X,  X,   X,   wIwH, X,
+X"))]
 
   "TARGET_NO_SF_SUBREG
&& (register_operand (operands[0], SImode)
@@ -6839,10 +6839,10 @@ (define_insn_and_split "movsi_from_sf"
stxssp %1,%0
stxsspx %x1,%y0
#
-   mtvsrwz %x0,%1
-   #"
+   xscvdpspn %x0,%x1
+   mtvsrwz %x0,%1"
   "&& reload_completed
-   && register_operand (operands[0], SImode)
+   && int_reg_operand (operands[0], SImode)
&& vsx_reg_sfsubreg_ok (operands[1], SFmode)"
   [(const_int 0)]
 {
@@ -6850,52 +6850,41 @@ (define_insn_and_split "movsi_from_sf"
   rtx op1 = operands[1];
   rtx op2 = operands[2];
   rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
+  rtx op2_si = gen_rtx_REG (SImode, REGNO (op2));
 
   emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-
-  if (int_reg_operand (op0, SImode))
-{
-  emit_insn (gen_p8_mfvsrd_4_disf (op0_di, op2));
-  emit_insn (gen_lshrdi3 (op0_di, op0_di, GEN_INT (32)));
-}
-  else
-{
-  rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1));
-  rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12);
-  emit_insn (gen_vextract4b (op0_di, op1_v16qi, byte_off));
-}
-
+  emit_insn (gen_zero_extendsidi2 (op0_di, op2_si));
   DONE;
 }
   [(set_attr "type"
"*,  load,fpload,  fpload,   store,
-fpstore,fpstore, fpstore, mftgpr,   mffgpr,
-veclogical")
+fpstore,fpstore, fpstore, mftgpr,   fp,
+mffgpr")
 
(set_attr "length"
"4,  4,

Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #1 of 8

2017-09-26 Thread Michael Meissner

Off list, Segher asked that I break the patch eliminating a shift right when
transferring SFmode from a vector register to a GPR register down into smaller
chunks.  The power7 and power8 instructions that convert values in the double
precision format to single precision actually duplicate the 32-bits in the
first word and second word (the ISA says the second word is undefined).  We are
in the process of issuing an update to ISA 3.0 to clarify that this will be the
required behavior going forward.

I have broken the patches down to 8 chunks.  Some of the patches are just
cosmetic fixes for things I noticed while doing the main patch.  One patch eliminates
the shift.  Another fixes up the peephole2 that optimizes putting a SFmode into
a union and then doing masking on the value.  And the final patch updates the
tests that need to be changed.

I have verified that each of these sub-patches builds, and after all 8 patches
have been applied, I did the full bootstrap and regression test, and like the
previous combination patch there were no regressions.  If only some of the
patches are applied, then there will be 3 regressions until the remaining
patches are applied.

This is patch #1.  Can I check this into the trunk?  I noticed without this
patch, sometimes the register allocator would do a store and then a load to
move a SImode value from a vector register to a GPR and sign extend it, instead
of doing the move and then the sign extension.
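
As a rough C model of the "move, then sign extend" sequence (a sketch only;
the function name is made up, and which 32-bit half the direct move reads is
an assumption here):

```c
#include <stdint.h>

/* Model: take the low 32 bits of the vector doubleword (standing in for
   the 32-bit direct move), then sign extend them to 64 bits (standing in
   for EXTSW).  The xor/subtract form avoids relying on
   implementation-defined unsigned-to-signed conversion.  */
int64_t
move_then_extsw (uint64_t vector_dword)
{
  uint32_t word = (uint32_t) vector_dword;
  return (int64_t) (word ^ UINT32_C (0x80000000)) - INT64_C (0x80000000);
}
```

No memory access is involved, which is the point of preferring this sequence
over a store followed by a sign-extending load.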

2017-09-25  Michael Meissner  

* config/rs6000/rs6000.md (extendsi2): Add a splitter to do
sign extension from a vector register to a GPR by doing a 32-bit
direct move and then an EXTSW.
(extendsi2 splitter): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 253157)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -986,8 +986,11 @@ (define_insn_and_split "*extendhi2
 
 
 (define_insn "extendsi2"
-  [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,wl,wu,wj,wK,wH")
-   (sign_extend:EXTSI (match_operand:SI 1 "lwa_operand" "Y,r,Z,Z,r,wK,wH")))]
+  [(set (match_operand:EXTSI 0 "gpc_reg_operand"
+"=r, r,   wl,wu,wj,wK, wH,wr")
+
+   (sign_extend:EXTSI (match_operand:SI 1 "lwa_operand"
+"Y,  r,   Z, Z, r, wK, wH,?wIwH")))]
   ""
   "@
lwa%U1%X1 %0,%1
@@ -996,10 +999,23 @@ (define_insn "extendsi2"
lxsiwax %x0,%y1
mtvsrwa %x0,%1
vextsw2d %0,%1
+   #
#"
-  [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts,vecperm")
+  [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts,vecperm,mftgpr")
(set_attr "sign_extend" "yes")
-   (set_attr "length" "4,4,4,4,4,4,8")])
+   (set_attr "length" "4,4,4,4,4,4,8,8")])
+
+(define_split
+  [(set (match_operand:DI 0 "int_reg_operand")
+   (sign_extend:DI (match_operand:SI 1 "vsx_register_operand")))]
+  "TARGET_DIRECT_MOVE_64BIT && reload_completed"
+  [(set (match_dup 2)
+   (match_dup 1))
+   (set (match_dup 0)
+   (sign_extend:DI (match_dup 2)))]
+{
+  operands[2] = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+})
 
 (define_split
   [(set (match_operand:DI 0 "altivec_register_operand")

Re: [PATCH][GRAPHITE] Simplify SCOP detection

2017-09-26 Thread Sebastian Pop

On Tue, Sep 26, 2017 at 7:03 AM, Richard Biener  wrote:

>
> The following is the result of me trying to understand SCOP detection
> and the validity checks spread around the machinery.  It removes several
> quadraticnesses by folding validity checks into
> scop_detection::harmful_loop_in_region where we already walk over all
> BBs in the region and process individual found loops.
>
> It also rewrites build_scop_depth/build_scop_breadth into something
> I can understand.
>
> Bootstrap and regtest is running on x86_64-unknown-linux-gnu (graphite.exp
> for all langs is happy, so is SPEC CPU 2006 testing where the statistics
> agree before/after the patch).
>
> I'll apply this after the bootstrap finished.
>

Have you tried to bootstrap with BOOT_CFLAGS="-O2 -fgraphite-identity"?


> Richard.
>
> 2017-09-26  Richard Biener  
>
> * graphite-scop-detection.c (scop_detection::build_scop_depth):
> Rewrite,
> fold in ...
> (scop_detection::build_scop_breadth): ... this.  Removed.
> (scop_detection::loop_is_valid_in_scop): Fold into single caller.
> (scop_detection::harmful_stmt_in_bb): Likewise.
> (scop_detection::graphite_can_represent_stmt): Likewise.
> (scop_detection::loop_body_is_valid_scop): Likewise.  Remove
> recursion.
> (scop_detection::can_represent_loop): Remove recursion, fold in
> ...
> (scop_detection::can_represent_loop_1): ... this.  Removed.
> (scop_detection::harmful_loop_in_region): Simplify after inlining
> the above and remove more quadraticness.
> (build_scops): Adjust.
> * tree-data-ref.c (loop_nest_has_data_refs): Remove pointless
> quadraticness.
>
>
This goes in the right direction: it cuts down compilation time.
As it is not a trivial change, I need some time to understand how
the scop detection works with this change.

Sebastian

Re: [PATCH][GRAPHITE] More TLC

2017-09-26 Thread Sebastian Pop

On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener  wrote:

> On Fri, 22 Sep 2017, Sebastian Pop wrote:
>
> > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener 
> wrote:
> >
> > >
> > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > TLC.  It also adds a testcase I reduced from a stupid mistake I made
> > > when reworking canonicalize_loop_closed_ssa.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > >
> > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > -Ofast -march=haswell -floop-nest-optimize are
> > >
> > >  61 loop nests "optimized"
> > >  45 loop nest transforms cancelled because of code generation issues
> > >  21 loop nest optimizations timed out the 35 ISL "operations" we
> allow
> > >
> > > I say "optimized" because the usual transform I've seen is static
> tiling
> > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > There's no way to automagically figure what kind of transform ISL did
> > >
> >
> > Here is how to automate (without magic) the detection
> > of the transform that isl did.
> >
> > The problem solved by isl is the minimization of strides
> > in memory, and to do this, we need to tell the isl scheduler
> > the validity dependence graph, in graphite-optimize-isl.c
> > see the validity (RAW, WAR, WAW) and the proximity
> > (RAR + validity) maps.  The proximity does include the
> > read after read, as the isl scheduler needs to minimize
> > strides between consecutive reads.
> >
> > When you apply the schedule to the dependence graph,
> > you can tell from the result the strides in memory.  A good
> > way to say whether a transform was beneficial is to sum up
> > all memory strides and make sure that the sum
> > decreases after the transform.  We could add a printf with the
> > sum of strides before and after transforms, and have the
> > testcases check for that.
>
> Interesting.  Can you perhaps show me in code how to do that?
>
>
Sven, is there already a function that computes the sum of all
strides in a proximity map?  Maybe you have code that does
something similar in pet or ppcg?
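
For illustration only, the "sum of strides" metric can be sketched in plain C
without isl.  This is a toy, not a proposal for graphite-optimize-isl.c: it
assumes a single row-major ROWS x COLS array and compares the original loop
order with an interchanged one; the function name and parameters are made up.

```c
#include <stdlib.h>

/* Sum of absolute address deltas (in elements) between consecutive
   accesses to a row-major ROWS x COLS array.  With INTERCHANGE == 0 the
   column index runs in the inner loop; with INTERCHANGE != 0 the row
   index does.  */
long
sum_of_strides (long rows, long cols, int interchange)
{
  long outer = interchange ? cols : rows;
  long inner = interchange ? rows : cols;
  long prev = -1;
  long sum = 0;

  for (long o = 0; o < outer; o++)
    for (long n = 0; n < inner; n++)
      {
        long i = interchange ? n : o;   /* row index */
        long j = interchange ? o : n;   /* column index */
        long addr = i * cols + j;       /* row-major element offset */
        if (prev >= 0)
          sum += labs (addr - prev);
        prev = addr;
      }
  return sum;
}
```

For a 4x4 array the row-wise walk gives a sum of 15 and the interchanged walk
gives 81, so a testcase could check that the sum does not increase after the
transform.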

Thanks,
Sebastian

Re: [PATCH] Fix PR82321

2017-09-26 Thread Sebastian Pop

On Tue, Sep 26, 2017 at 6:02 AM, Richard Biener  wrote:

>
> Latent, exposed by me removing the "redundant"
> rewrite-into-loop-closed-ssa.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>
> Richard.
>
> 2017-09-26  Richard Biener  
>
> PR tree-optimization/82321
> * graphite.c (canonicalize_loop_closed_ssa): Properly check
> for the def being inside the loop.
>
> * gcc.dg/graphite/pr82321.c: New testcase.
>
>
Looks good.
Thanks!

Re: [PATCH] Add helper to sort sibling loops, do so in GRAPHITE

2017-09-26 Thread Sebastian Pop

On Mon, Sep 25, 2017 at 8:18 AM, Richard Biener  wrote:

>
> The following adds a helper to sort the sibling loop list in RPO order
> as it can get messed up (we only ever add loops at the start of the list).
> GRAPHITE SCOP detection assumes this list is sorted naturally in RPO
> order (as a flow_loops_find would generate).
>
> Turns out it helps a few more loops in SPEC CPU 2006 to get optimized
> by GRAPHITE.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC 2k6 is happy
> with GRAPHITE.
>
> I've tested the variant below with the extra call in pass_tree_loop_init
> but as no pass cares about the sibling list order but graphite I'll not
> commit that hunk.
>
> Applied to trunk (w/o that hunk)
>
> Richard.
>
> 2017-09-25  Richard Biener  
>
> * cfgloop.h (sort_sibling_loops): Declare.
> * cfgloop.c (sort_sibling_loops_cmp): New helper.
> (sort_sibling_loops): New function sorting the sibling loop list
> in RPO order.
> * graphite.c (graphite_transform_loops): Sort sibling loops.
>
>
Looks good.  Thanks!
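
A minimal sketch of the idea in standalone C, for readers unfamiliar with the
sibling list.  This is not the cfgloop.c implementation: the struct, its
fields, and the function names are hypothetical, and each loop is assumed to
carry the RPO index of its header block.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a loop in a sibling list.  */
struct toy_loop
{
  int header_rpo;           /* RPO index of the loop header block */
  struct toy_loop *next;    /* next sibling */
};

/* qsort comparator: order loops by the RPO index of their header.  */
static int
cmp_by_header_rpo (const void *a, const void *b)
{
  const struct toy_loop *la = *(const struct toy_loop *const *) a;
  const struct toy_loop *lb = *(const struct toy_loop *const *) b;
  return (la->header_rpo > lb->header_rpo) - (la->header_rpo < lb->header_rpo);
}

/* Return the head of the sibling list starting at FIRST, relinked in
   increasing header RPO order.  On allocation failure the list is left
   untouched.  */
struct toy_loop *
sort_siblings (struct toy_loop *first)
{
  size_t n = 0;
  for (struct toy_loop *l = first; l; l = l->next)
    n++;
  if (n < 2)
    return first;

  struct toy_loop **v = malloc (n * sizeof *v);
  if (!v)
    return first;

  size_t i = 0;
  for (struct toy_loop *l = first; l; l = l->next)
    v[i++] = l;

  qsort (v, n, sizeof *v, cmp_by_header_rpo);

  /* Relink the siblings in sorted order.  */
  for (i = 0; i + 1 < n; i++)
    v[i]->next = v[i + 1];
  v[n - 1]->next = NULL;

  first = v[0];
  free (v);
  return first;
}
```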

Re: [PATCH][GRAPHITE] More -fopt-info, do not abort from ISL

2017-09-26 Thread Sebastian Pop

On Mon, Sep 25, 2017 at 4:47 AM, Richard Biener  wrote:

>
> The following also dumps if the optimized schedule is equal to the
> original one.  It also makes all ISL operations (well, nearly) not
> abort on errors but instead propagate errors upward.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2017-09-25  Richard Biener  
>
> * graphite-optimize-isl.c (optimize_isl): Fail and dump if
> ISL errors other than isl_error_quota happen.  Dump if the
> schedule is the same.
> * graphite-sese-to-poly.c (build_poly_scop): Fail on ISL
> errors instead of aborting inside ISL.
>
>
Looks good.

[PATCH 1/2] C++: avoid partial duplicate implementation of cp_parser_error

2017-09-26 Thread David Malcolm

In r251026 (aka 3fe34694f0990d1d649711ede0326497f8a849dc,
"C/C++: show pertinent open token when missing a close token")
I copied part of cp_parser_error into cp_parser_required_error,
leading to duplication of code.

This patch eliminates this duplication by merging the two copies of the
code into a new cp_parser_error_1 subroutine.

Doing so removes an indentation level, making the patch appear to have
more churn than it really does.

The patch also undoes the change to g++.dg/parse/pragma2.C, as the
old behavior is restored.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/cp/ChangeLog:
* parser.c (get_matching_symbol): Move to before...
(cp_parser_error): Split out into...
(cp_parser_error_1): ...this new function, merging in content
from...
(cp_parser_required_error): ...here.  Eliminate partial duplicate
of body of cp_parser_error in favor of a call to the new
cp_parser_error_1 helper function.

gcc/testsuite/ChangeLog:
* g++.dg/parse/pragma2.C: Update to reflect reinstatement of the
"#pragma is not allowed here" error.
---
 gcc/cp/parser.c  | 169 ---
 gcc/testsuite/g++.dg/parse/pragma2.C |   4 +-
 2 files changed, 97 insertions(+), 76 deletions(-)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 25b91df..56d9442 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -2767,53 +2767,116 @@ cp_lexer_peek_conflict_marker (cp_lexer *lexer, enum cpp_ttype tok1_kind,
   return true;
 }
 
-/* If not parsing tentatively, issue a diagnostic of the form
+/* Get a description of the matching symbol to TOKEN_DESC e.g. "(" for
+   RT_CLOSE_PAREN.  */
+
+static const char *
+get_matching_symbol (required_token token_desc)
+{
+  switch (token_desc)
+{
+default:
+  gcc_unreachable ();
+  return "";
+case RT_CLOSE_BRACE:
+  return "{";
+case RT_CLOSE_PAREN:
+  return "(";
+}
+}
+
+/* Subroutine of cp_parser_error and cp_parser_required_error.
+
+   Issue a diagnostic of the form
   FILE:LINE: MESSAGE before TOKEN
where TOKEN is the next token in the input stream.  MESSAGE
(specified by the caller) is usually of the form "expected
-   OTHER-TOKEN".  */
+   OTHER-TOKEN".
+
+   This bypasses the check for tentative passing, and potentially
+   adds material needed by cp_parser_required_error.
+
+   If MISSING_TOKEN_DESC is not RT_NONE, and MATCHING_LOCATION is not
+   UNKNOWN_LOCATION, then we have an unmatched symbol at
+   MATCHING_LOCATION; highlight this secondary location.  */
 
 static void
-cp_parser_error (cp_parser* parser, const char* gmsgid)
+cp_parser_error_1 (cp_parser* parser, const char* gmsgid,
+  required_token missing_token_desc,
+  location_t matching_location)
 {
-  if (!cp_parser_simulate_error (parser))
+  cp_token *token = cp_lexer_peek_token (parser->lexer);
+  /* This diagnostic makes more sense if it is tagged to the line
+ of the token we just peeked at.  */
+  cp_lexer_set_source_position_from_token (token);
+
+  if (token->type == CPP_PRAGMA)
 {
-  cp_token *token = cp_lexer_peek_token (parser->lexer);
-  /* This diagnostic makes more sense if it is tagged to the line
-of the token we just peeked at.  */
-  cp_lexer_set_source_position_from_token (token);
+  error_at (token->location,
+   "%<#pragma%> is not allowed here");
+  cp_parser_skip_to_pragma_eol (parser, token);
+  return;
+}
 
-  if (token->type == CPP_PRAGMA)
+  /* If this is actually a conflict marker, report it as such.  */
+  if (token->type == CPP_LSHIFT
+  || token->type == CPP_RSHIFT
+  || token->type == CPP_EQ_EQ)
+{
+  location_t loc;
+  if (cp_lexer_peek_conflict_marker (parser->lexer, token->type, &loc))
{
- error_at (token->location,
-   "%<#pragma%> is not allowed here");
- cp_parser_skip_to_pragma_eol (parser, token);
+ error_at (loc, "version control conflict marker in file");
  return;
}
+}
 
-  /* If this is actually a conflict marker, report it as such.  */
-  if (token->type == CPP_LSHIFT
- || token->type == CPP_RSHIFT
- || token->type == CPP_EQ_EQ)
-   {
- location_t loc;
- if (cp_lexer_peek_conflict_marker (parser->lexer, token->type, &loc))
-   {
- error_at (loc, "version control conflict marker in file");
- return;
-   }
-   }
+  gcc_rich_location richloc (input_location);
+
+  bool added_matching_location = false;
+
+  if (missing_token_desc != RT_NONE)
+{
+  /* If matching_location != UNKNOWN_LOCATION, highlight it.
+Attempt to consolidate diagnostics by printing it as a
+   secondary range within the main diagnostic.  */
+  if (matching_location != UNKNOWN_LOCATION)
+   added_matching_location
+ =

[PATCH 2/2] C/C++: add fix-it hints for various missing symbols (v2)

2017-09-26 Thread David Malcolm

The patch improves our C/C++ frontends' handling of missing
symbols, by making c_parser_require and cp_parser_require use
"better" locations for the diagnostic, and insert fix-it hints,
under certain circumstances (see the comments in the patch for
full details).

For example, for this code with a missing semicolon:

  $ cat test.c
  int missing_semicolon (void)
  {
return 42
  }

  trunk currently emits:

  test.c:4:1: error: expected ';' before '}' token
   }
   ^

This patch adds a fix-it hint for the missing semicolon, and puts
the error at the location of the missing semicolon, printing the
followup token as a secondary location:

  test.c:3:12: error: expected ';' before '}' token
 return 42
  ^
  ;
   }
   ~

More examples can be seen in the test cases.
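
To make the choice of location concrete, here is a small standalone C sketch.
It is not the patch's implementation (the patch records the previous token's
location in the parser); this toy instead scans the source text backwards, and
the function name is made up.

```c
#include <ctype.h>
#include <stddef.h>

/* Given source text SRC and the offset NEXT_TOK of the token at which the
   parser noticed the missing symbol, return the offset at which a fix-it
   insertion would be suggested: immediately after the last
   non-whitespace character that precedes NEXT_TOK.  */
size_t
insertion_offset (const char *src, size_t next_tok)
{
  size_t p = next_tok;
  while (p > 0 && isspace ((unsigned char) src[p - 1]))
    p--;
  return p;
}
```

For the test.c example above, the parser notices the problem at the '}' on
line 4, but the suggested insertion point is just after "42" on line 3.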

This is a revised version of the patch I posted here:
  https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00135.html

Some of the changes in that patch landed in trunk in r251026
(aka 3fe34694f0990d1d649711ede0326497f8a849dc
"C/C++: show pertinent open token when missing a close token"),
so this patch contains the remaining part, updated also for the
previous patch that reunifies the cloned copies of cp_parser_error
introduced in r251026.

It also:
- fixes the typo seen by Jeff
- eliminates some unnecessary changes to c-c++-common/missing-symbol.c
- fixes some bugs

r250133, r250134, and r251026 already incorporated the suggestion from
Richard Sandiford to consolidate note-printing when the matching location
is near the primary location of the diagnostic.

This patch doesn't address Joseph's requests to tackle PR 7356 and
PR 18248, but he said that it was OK to leave these for followups.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu
in conjunction with patch 1 of the kit.

OK for trunk?

gcc/c-family/ChangeLog:
* c-common.c (enum missing_token_insertion_kind): New enum.
(get_missing_token_insertion_kind): New function.
(maybe_suggest_missing_token_insertion): New function.
* c-common.h (maybe_suggest_missing_token_insertion): New decl.

gcc/c/ChangeLog:
* c-parser.c (struct c_parser): Add "previous_token_loc" field.
(c_parser_consume_token): Set parser->previous_token_loc.
(get_matching_symbol): Likewise.
(c_parser_require): Add "type_is_unique" param and use it
to guard calls to maybe_suggest_missing_token_insertion.
(c_parser_parms_list_declarator): Override default value of new
"type_is_unique" param to c_parser_require.
(c_parser_asm_statement): Likewise.
* c-parser.h (c_parser_require): Add "type_is_unique" param,
defaulting to true.

gcc/cp/ChangeLog:
* parser.c (get_required_cpp_ttype): New function.
(cp_parser_error_1): Call it, using the result to call
maybe_suggest_missing_token_insertion.

gcc/testsuite/ChangeLog:
* c-c++-common/cilk-plus/AN/parser_errors.c: Update expected
output to reflect changes to reported locations of missing
symbols.
* c-c++-common/cilk-plus/AN/parser_errors2.c: Likewise.
* c-c++-common/cilk-plus/AN/parser_errors3.c: Likewise.
* c-c++-common/cilk-plus/AN/pr61191.c: Likewise.
* c-c++-common/gomp/pr63326.c: Likewise.
* c-c++-common/missing-close-symbol.c: Likewise, also update for
new fix-it hints.
* c-c++-common/missing-symbol.c: Likewise, also add test coverage
for missing colon in ternary operator.
* g++.dg/cpp1y/digit-sep-neg.C: Likewise.
* g++.dg/cpp1y/pr65202.C: Likewise.
* g++.dg/missing-symbol-2.C: New test case.
* g++.dg/other/do1.C: Update expected output to reflect
changes to reported locations of missing symbols.
* g++.dg/parse/error11.C: Likewise.
* g++.dg/template/error11.C: Likewise.
* gcc.dg/missing-symbol-2.c: New test case.
* gcc.dg/missing-symbol-3.c: New test case.
* gcc.dg/noncompile/940112-1.c: Update expected output to reflect
changes to reported locations of missing symbols.
* gcc.dg/noncompile/971104-1.c: Likewise.
* obj-c++.dg/exceptions-6.mm: Likewise.
* obj-c++.dg/pr48187.mm: Likewise.
* objc.dg/exceptions-6.m: Likewise.
---
 gcc/c-family/c-common.c| 158 +
 gcc/c-family/c-common.h|   3 +
 gcc/c/c-parser.c   |  29 +++-
 gcc/c/c-parser.h   |   3 +-
 gcc/cp/parser.c|  51 ++-
 .../c-c++-common/cilk-plus/AN/parser_errors.c  |   4 +-
 .../c-c++-common/cilk-plus/AN/parser_errors2.c |   3 +-
 .../c-c++-common/cilk-plus/AN/parser_errors3.c |   3 +-
 gcc/testsuite/c-c++-common/cilk-plus/AN/pr61191.c  |   3 +-
 gcc/testsuite/c-c++-common/gomp/pr63326.c  |  22 +--
 gcc/testsuite/c-c++-common/missing-close-symbol.c  |   2

[PATCH 0/2] Re: [PATCH] C/C++: add fix-it hints for various missing symbols

2017-09-26 Thread David Malcolm

On Mon, 2017-08-28 at 09:22 -0600, Jeff Law wrote:
> On 07/03/2017 12:37 PM, David Malcolm wrote:
> > This patch improves our C/C++ frontends' handling of missing
> > symbols, by making c_parser_require and cp_parser_require use
> > "better" locations for the diagnostic, and insert fix-it hints,
> > under certain circumstances (see the comments in the patch for
> > full details).
> > 
> > For example, for this code with a missing semicolon:
> > 
> >   $ cat test.c
> >   int missing_semicolon (void)
> >   {
> > return 42
> >   }
> > 
> > trunk currently emits:
> > 
> >   test.c:4:1: error: expected ‘;’ before ‘}’ token
> >}
> >^
> > 
> > This patch adds a fix-it hint for the missing semicolon, and puts
> > the error at the location of the missing semicolon, printing the
> > followup token as a secondary location:
> > 
> >   test.c:3:12: error: expected ‘;’ before ‘}’ token
> >  return 42
> >   ^
> >   ;
> >}
> >~
> > 
> > More examples can be seen in the test cases.
> > 
> > For reference, clang prints the following:
> > 
> >   test.c:3:12: error: expected ';' after return statement
> > return 42
> >  ^
> >  ;
> > 
> > i.e. describing what syntactic thing came before, which
> > I think is likely to be more meaningful to the user.
> > 
> > clang can also print notes about matching opening symbols
> > e.g. the note here:
> > 
> >   missing-symbol-2.c:25:22: error: expected ']'
> > const char test [42;
> >^
> >   missing-symbol-2.c:25:19: note: to match this '['
> > const char test [42;
> > ^
> > which, although somewhat redundant for this example, seems much
> > more
> > useful if there's non-trivial nesting of constructs, or more than a
> > few
> > lines separating the open/close symbols (e.g. showing a stray
> > "namespace {"
> > that the user forgot to close).
> > 
> > I'd like to implement both of these ideas as followups, but in
> > the meantime, is the fix-it hint patch OK for trunk?
> > (successfully bootstrapped & regrtested on x86_64-pc-linux-gnu)
> > 
> > gcc/c-family/ChangeLog:
> > * c-common.c (c_parse_error): Add RICHLOC param, and use it
> > rather
> > than implicitly using input_location.
> > (enum missing_token_insertion_kind): New enum.
> > (get_missing_token_insertion_kind): New function.
> > (maybe_suggest_missing_token_insertion): New function.
> > * c-common.h (c_parse_error): Add RICHLOC param.
> > (maybe_suggest_missing_token_insertion): New decl.
> > 
> > gcc/c/ChangeLog:
> > * c-parser.c (struct c_parser): Add "previous_token_loc" field.
> > (c_parser_consume_token): Set parser->previous_token_loc.
> > (c_parser_error): Rename to...
> > (c_parser_error_richloc): ...this, making static, and adding
> > "richloc" parameter, passing it to the c_parse_error call,
> > rather than calling c_parser_set_source_position_from_token.
> > (c_parser_error): Reintroduce, reimplementing in terms of the
> > above.
> > (c_parser_require): Add "type_is_unique" param.  Use
> > c_parser_error_richloc rather than c_parser_error, calling
> > maybe_suggest_missing_token_insertion.
> > (c_parser_parms_list_declarator): Override default value of new
> > "type_is_unique" param to c_parser_require.
> > (c_parser_asm_statement): Likewise.
> > * c-parser.h (c_parser_require): Add "type_is_unique" param,
> > defaulting to true.
> > 
> > gcc/cp/ChangeLog:
> > * parser.c (cp_parser_error): Add rich_location to call to
> > c_parse_error.
> > (get_required_cpp_ttype): New function.
> > (cp_parser_required_error): Remove calls to cp_parser_error,
> > instead setting a non-NULL gmsgid, and handling it if set by
> > calling c_parse_error, potentially with a fix-it hint.
> > 
> > gcc/testsuite/ChangeLog:
> > * c-c++-common/cilk-plus/AN/parser_errors.c: Update expected
> > output to reflect changes to reported locations of missing
> > symbols.
> > * c-c++-common/cilk-plus/AN/parser_errors2.c: Likewise.
> > * c-c++-common/cilk-plus/AN/parser_errors3.c: Likewise.
> > * c-c++-common/cilk-plus/AN/pr61191.c: Likewise.
> > * c-c++-common/gomp/pr63326.c: Likewise.
> > * c-c++-common/missing-symbol.c: New test case.
> > * g++.dg/cpp1y/digit-sep-neg.C: Update expected output to
> > reflect
> > changes to reported locations of missing symbols.
> > * g++.dg/cpp1y/pr65202.C: Likewise.
> > * g++.dg/other/do1.C: Likewise.
> > * g++.dg/missing-symbol-2.C: New test case.
> > * g++.dg/parse/error11.C: Update expected output to reflect
> > changes to reported locations of missing symbols.
> > * g++.dg/parse/pragma2.C: Likewise.
> > * g++.dg/template/error11.C: Likewise.
> > * gcc.dg/missing-symbol-2.c: New test case.
> > * gcc.dg/missing-symbol-3.c: New test case.
> > * gcc.dg/noncompile/940112-1.c: Update expected output

RE: 0005-Part-5.-Add-x86-CET-documentation

2017-09-26 Thread Tsimbalist, Igor V

Here is a new version of the patch.

Igor


> -Original Message-
> From: Sandra Loosemore [mailto:san...@codesourcery.com]
> Sent: Monday, September 25, 2017 5:43 AM
> To: Uros Bizjak ; Tsimbalist, Igor V
> 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: 0005-Part-5.-Add-x86-CET-documentation
> 
> On 09/20/2017 08:13 AM, Uros Bizjak wrote:
> > On Wed, Sep 20, 2017 at 11:20 AM, Tsimbalist, Igor V
> >  wrote:
> >> Uros, could you please review this patch as it's a part of x86 specific
> changes you have reviewed already.
> >
> > Please proofread and spell-check the text. There are grammatical errors,
> e.g.:
> > [snip]
> 
> If/when there is a newer version of this patch, cc it to me and I will also
> make a pass through it.
> 
> -Sandra



0005-Part-5.-Add-x86-CET-documentation.patch
Description: 0005-Part-5.-Add-x86-CET-documentation.patch

RE: 0002-Part-2.-Document-finstrument-control-flow-and-notrack attribute

2017-09-26 Thread Tsimbalist, Igor V

Here is the updated version (version#3). All comments below are fixed.

Igor


> -Original Message-
> From: Tsimbalist, Igor V
> Sent: Monday, September 25, 2017 11:57 PM
> To: Sandra Loosemore ; 'gcc-
> patc...@gcc.gnu.org' 
> Cc: Jeff Law ; Tsimbalist, Igor V
> 
> Subject: RE: 0002-Part-2.-Document-finstrument-control-flow-and-notrack
> attribute
> 
> > -Original Message-
> > From: Sandra Loosemore [mailto:san...@codesourcery.com]
> > Sent: Monday, September 25, 2017 5:07 AM
> > To: Tsimbalist, Igor V ; 'gcc-
> > patc...@gcc.gnu.org' 
> > Cc: Jeff Law 
> > Subject: Re:
> > 0002-Part-2.-Document-finstrument-control-flow-and-notrack
> > attribute
> >
> > On 09/19/2017 07:45 AM, Tsimbalist, Igor V wrote:
> > > Here is an updated patch (version #2). Mainly attribute and option
> > > names
> > were changed.
> > >
> > > gcc/doc/
> > >   * extend.texi: Add 'nocf_check' documentation.
> > >   * gimple.texi: Add second parameter to
> > gimple_build_call_from_tree.
> > >   * invoke.texi: Add -fcf-protection documentation.
> > >   * rtl.texi: Add REG_CALL_NOTRACK documentation.
> > >
> > > Is it ok for trunk?
> > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index
> > > cd5733e..6bdb183 100644
> > > --- a/gcc/doc/extend.texi
> > > +++ b/gcc/doc/extend.texi
> > > @@ -5646,6 +5646,56 @@ Specify which floating-point unit to use.
> > > You must specify the  @code{target("fpmath=sse,387")} option as
> > > @code{target("fpmath=sse+387")} because the comma would separate
> > > different options.
> > > +
> > > +@item nocf_check
> > > +@cindex @code{nocf_check} function attribute The
> @code{nocf_check}
> > > +attribute on a function is used to inform the compiler that the
> > > +function's prolog should not be instrumented when
> >
> > s/prolog/prologue/
> 
> Fixed.
> 
> > > +compiled with the @option{-fcf-protection=branch} option.  The
> > > +compiler assumes that the function's address is a valid target for
> > > +a control-flow transfer.
> > > +
> > > +The @code{nocf_check} attribute on a type of pointer to function is
> > > +used to inform the compiler that a call through the pointer should
> > > +not be instrumented when compiled with the
> > > +@option{-fcf-protection=branch} option.  The compiler assumes that
> > > +the function's address from the pointer is a valid target for a
> > > +control-flow transfer.  A direct function call through a function
> > > +name is assumed as a safe call thus direct calls will not be
> >
> > ...is assumed to be a safe call, thus direct calls are not...
> 
> Fixed.
> 
> > > +instrumented by the compiler.
> > > +
> > > +The @code{nocf_check} attribute is applied to an object's type.  A
> > > +The @code{nocf_check} attribute is transfered to a call instruction
> > > +at the GIMPLE and RTL translation phases.  The attribute is not
> > > +propagated through assignment, store and load.
> >
> > extend.texi is user-facing documentation, but the second sentence here
> > is implementor-speak and not meaningful to users of GCC.  I don't
> > understand what the third sentence is trying to say.
> 
> The second sentence is removed. The third sentence is re-written as
> 
> In case of assignment of a function address or a function pointer to another
> pointer, the attribute is not carried over from the right-hand object's type;
> the type of the left-hand object stays unchanged.  The compiler checks for
> a @code{nocf_check} attribute mismatch and reports a warning in case of
> mismatch.
> 
> > > +
> > > +@smallexample
> > > +@{
> > > +int foo (void) __attribute__(nocf_check); void (*foo1)(void)
> > > +__attribute__(nocf_check); void (*foo2)(void);
> > > +
> > > +int
> > > +foo (void) /* The function's address is assumed as valid.  */
> >
> > s/as valid/to be valid/
> 
> Fixed.
> 
> > > +
> > > +  /* This call site is not checked for control-flow validness.  */
> >
> > s/validness/validity/g
> 
> Fixed.
> 
> > > +  (*foo1)();
> > > +
> > > +  foo1 = foo2;
> > > +  /* This call site is still not checked for control-flow validness.
> > > + */  (*foo1)();
> > > +
> > > +  /* This call site is checked for control-flow validness.  */
> > > + (*foo2)();
> > > +
> > > +  foo2 = foo1;
> > > +  /* This call site is still checked for control-flow validness.
> > > + */ (*foo2)();
> > > +
> > > +  return 0;
> > > +@}
> > > +@end smallexample
> > > +
> > >  @end table
> > >
> > >  On the x86, the inliner does not inline a diff --git
> > > a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index 635abd3..b6d9149
> > > 100644
> > > --- a/gcc/doc/gimple.texi
> > > +++ b/gcc/doc/gimple.texi
> > > @@ -1310,9 +1310,11 @@ operand is validated with
> > @code{is_gimple_operand}).
> > >  @end deftypefn
> > >
> > >
> > > -@deftypefn {GIMPLE function} gcall *gimple_build_call_from_tree
> > > (tree
> > > call_expr) -Build a @code{GIMPLE_CALL}

Re: [PATCH 3/5] pr65947-9.c: Requires char to be signed by default.

2017-09-26 Thread Richard Biener

On Tue, Sep 26, 2017 at 1:39 PM, Andreas Krebbel
 wrote:
> Fails on S/390 with char defaulting to unsigned char.

Ok.

> gcc/testsuite/ChangeLog:
>
> 2017-09-26  Andreas Krebbel  
>
> * gcc.dg/vect/pr65947-9.c: Use signed char explicitly.
> ---
>  gcc/testsuite/gcc.dg/vect/pr65947-9.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-9.c 
> b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
> index d769af9..e8f20aa 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr65947-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
> @@ -10,7 +10,7 @@ extern void abort (void) __attribute__ ((noreturn));
> vectorize because the vectorisation requires a slot for default values.  
> */
>
>  signed char __attribute__((noinline,noclone))
> -condition_reduction (char *a, char min_v)
> +condition_reduction (signed char *a, signed char min_v)
>  {
>signed char last = -72;
>
> --
> 2.9.1
>
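
For readers wondering why the signedness of plain char matters here, a short
standalone C sketch (not part of the testcase; the function names are made up).
It assumes only what the C standard guarantees: plain char may be signed or
unsigned, and CHAR_MIN tells you which.

```c
#include <limits.h>

/* Nonzero if plain char is signed on this target.  */
int
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* Nonzero if V survives being stored in a plain char.  */
int
survives_in_plain_char (int v)
{
  char c = (char) v;
  return c == v;
}

/* Nonzero if V survives being stored in a signed char.  */
int
survives_in_signed_char (int v)
{
  signed char c = (signed char) v;
  return c == v;
}
```

A value such as -72 always survives in a signed char, but it survives in a
plain char only on targets where char is signed, which is why the test spells
out signed char.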

Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-09-26 Thread Wilco Dijkstra

Jakub Jelinek wrote:

> Well, we don't want to regress performance-wise on one of the most important
> primary targets.  I don't care that much if the RTL/backend work is done
> together with the patch, or as a follow-up during stage1/3, but it should be
> done; the testcases I've posted can be used as a basis of a P1 runtime
> performance regression.

It should be sufficient to file a bug about inefficient 64-bit constant
expansions on x64. I didn't see a significant difference in my benchmarking of
it on x64, so I'd say it's only a performance regression if large benchmarks
regress measurably (quite unlikely).
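
For context, a sketch of the two forms under discussion, assuming the
simplification named in the subject rewrites 1U << (31 - x) as a shifted
constant, 0x80000000 >> x (and likewise for the 64-bit variant).  The function
names are made up; both forms are only defined for x within the type's width.

```c
#include <stdint.h>

/* Original form: subtract, then shift left.  Valid for 0 <= x <= 31.  */
uint32_t
shl_form (unsigned x)
{
  return UINT32_C (1) << (31 - x);
}

/* Simplified form: shift a constant right.  Valid for 0 <= x <= 31.  */
uint32_t
shr_form (unsigned x)
{
  return UINT32_C (0x80000000) >> x;
}

/* 64-bit variants, valid for 0 <= x <= 63.  The simplified form needs a
   64-bit constant, which is where the constant-expansion cost comes in.  */
uint64_t
shl_form64 (unsigned x)
{
  return UINT64_C (1) << (63 - x);
}

uint64_t
shr_form64 (unsigned x)
{
  return UINT64_C (0x8000000000000000) >> x;
}
```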

Wilco

[GCC][PATCH][testsuite][mid-end] Fix failing slp test on aarch64 and arm.

2017-09-26 Thread Tamar Christina

Hi All,

The SLP vectorization test currently fails on AArch32 and AArch64
because it does not take into account that we do have 128-bit vectors in
NEON. This means that two of the loops get vectorized instead of just one.

So update the conditions to include a check for NEON.

Regtested on aarch64-none-elf.

Ok for trunk?

Thanks,
Tamar.

gcc/testsuite/
2017-09-26  Tamar Christina  

* gcc.dg/vect/slp-perm-9.c: Add arm_neon_ok checks.

-- 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
index 4d9c11dcc476a8023b3eaac2ae76cc01bd0db182..816c4b31be80dc6ab77bda838f77357e2157ffb9 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
@@ -54,8 +54,8 @@ int main (int argc, const char* argv[])
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { {! vect_perm } || {! vect_sizes_32B_16B } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  { target { { vect_perm } && { vect_sizes_32B_16B } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { { {! vect_perm } || {! vect_sizes_32B_16B } } && {! arm_neon_ok} } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  { target { { { vect_perm } && { vect_sizes_32B_16B } } || arm_neon_ok } } } } */
 /* { dg-final { scan-tree-dump-times "permutation requires at least three vectors" 1 "vect" { target vect_perm_short } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { {! vect_perm } || {! vect_sizes_32B_16B } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { { vect_perm } && { vect_sizes_32B_16B } } } } } */

Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-09-26 Thread Jakub Jelinek

On Tue, Sep 26, 2017 at 12:44:10PM +, Sudi Das wrote:
> 
> Still waiting on Jakub's comment on whether there are more things needed
> at the backend.  But I have updated the patch according to Richard's
> comments.

Well, we don't want to regress performance-wise on one of the most important
primary targets.  I don't care that much whether the RTL/backend work is done
together with the patch or as a follow-up during stage1/3, but it should be
done; the testcases I've posted can be used as the basis for a P1 runtime
performance regression.

Jakub

Re: [PATCH] BRIG frontend: request for a global review

2017-09-26 Thread Martin Jambor

Hi,

On Sun, Sep 17, 2017 at 02:13:34PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Tue, 24 Jan 2017 15:30:34 -0500, David Malcolm  wrote:
> > On Tue, 2017-01-24 at 13:52 +0100, Martin Jambor wrote:
> > > [...] I have just
> > > committed the BRIG FE as revision 244867.
> 
> In a build with that enabled, I just happened to "make html" in "gcc/",
> and ran into:
> 
> [...]
> makeinfo --split-size=500 --html -I [...]/source-gcc/gcc/doc -I 
> [...]/source-gcc/gcc/doc/include \
> -I [...]/source-gcc/gcc/brig -o 
> [...]/build-gcc/gcc/HTML/gcc-8.0.0/brig
> makeinfo: missing file argument.
> Try `makeinfo --help' for more information.
> [...]/source-gcc/gcc/brig/Make-lang.in:117: recipe for target 
> '[...]/build-gcc/gcc/HTML/gcc-8.0.0/brig/index.html' failed
> make: *** [[...]/build-gcc/gcc/HTML/gcc-8.0.0/brig/index.html] Error 255
> 
> > A deps issue for the docs I noticed when glancing through the commit:
> > 
> > diff --git a/gcc/brig/Make-lang.in b/gcc/brig/Make-lang.in
> > new file mode 100644 (file)
> > index 000..b85b1b0
> > --- /dev/null
> > +++ b/gcc/brig/Make-lang.in
> > 
> > [...snip...]
> > 
> > +# Documentation.
> > +
> > +GO_TEXI_FILES = \
> > +   brig/gccbrig.texi \
> > +   $(gcc_docdir)/include/fdl.texi \
> > +   $(gcc_docdir)/include/gpl_v3.texi \
> > +   $(gcc_docdir)/include/gcc-common.texi \
> > +   gcc-vers.texi
> > 
> > Presumably this should be BRIG_TEXI_FILES, rather than GO_TEXI_FILES?
> > 
> > +# doc/gccbrig.info: $(BRIG_TEXI_FILES)
> > +#  if test "x$(BUILD_INFO)" = xinfo; then \
> > +#rm -f doc/gccbrig.info*; \
> > +#$(MAKEINFO) $(MAKEINFOFLAGS) -I $(gcc_docdir) \
> > +#  -I $(gcc_docdir)/include -o $@ $<; \
> > +#  else true; fi
> > +
> > +# doc/gccbrig.dvi: $(BRIG_TEXI_FILES)
> > +#  $(TEXI2DVI) -I $(abs_docdir) -I $(abs_docdir)/include -o $@ $<
> > +
> > +# doc/gccbrig.pdf: $(BRIG_TEXI_FILES)
> > +#  $(TEXI2PDF) -I $(abs_docdir) -I $(abs_docdir)/include -o $@ $<
> > +
> > +$(build_htmldir)/brig/index.html: $(BRIG_TEXI_FILES)
> > +   $(mkinstalldirs) $(@D)
> > +   rm -f $(@D)/*
> > +   $(TEXI2HTML) -I $(gcc_docdir) -I $(gcc_docdir)/include \
> > +   -I $(srcdir)/brig -o $(@D) $<
> > 
> > ...for use in describing the deps of the above.
> 
> ..., so that still needs to be fixed.  Alas, that won't help: the
> "gccbrig.texi" file doesn't actually exist.  ;-)
> 

I see.  I always check only "make info" when verifying documentation
changes and so missed this.  Thanks for providing the interim fix; Pekka
and/or I will add some basic content by the time the next GCC 7 release
comes out (IIRC, it is supposed to come out at the end of this year or
the beginning of next year).

Martin

Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-09-26 Thread Sudi Das


Still waiting on Jakub's comment on whether there are more things needed at
the backend. But I have updated the patch according to Richard's comments.

Thanks
Sudi



From: Richard Biener 
Sent: Friday, August 4, 2017 11:16 AM
To: Sudi Das
Cc: Wilco Dijkstra; Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
    
On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das  wrote:
>
>
>
>
> Sorry about the delayed response but looking at the above discussion, should 
> I conclude that this is a valid tree simplification?

Yes, I think so.  Jakub requested code to undo this at RTL expansion
based on target costs; I'm not sure we really should require that from
you, given that the user could have written the target sequence himself.

Few comments about the patch:

+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x). */
+(simplify
+ (lshift integer_onep@0 (minus INTEGER_CST@1 @2))

I think this warrants a single_use check on the minus (note :s isn't enough
as with the unsigned case we'd happily ignore it by design).

+  (if (INTEGRAL_TYPE_P (type)
+   && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
+   && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1))

You can relax this by using

  && wi::eq_p (@1, TYPE_PRECISION (type) - 1)

+   (if (TYPE_UNSIGNED(type))
+ (rshift (lshift @0 @1) @2)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert:type (rshift (lshift (convert:utype @0) @1) @2))
+

You can write (convert (rshift ...)), without the :type.

I'm leaving it to Jakub whether you need to write that RTL expansion tweak.

Thanks,
Richard.
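
For readers following the thread, the fold under review can be checked
with a small standalone C sketch.  The function names below
(shift_before, shift_after, count_shift_mismatches) are made up for
illustration and are not part of the patch.  The sketch assumes a 32-bit
unsigned type and x in the range 0..31; outside that range the original
shift is undefined anyway.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: 1 << (C - x), with C = precision - 1 = 31.  */
uint32_t shift_before (unsigned x)
{
  return (uint32_t) 1 << (31 - x);
}

/* Folded form: (1 << C) >> x.  The left shift is a constant, so only
   one variable shift remains.  */
uint32_t shift_after (unsigned x)
{
  return ((uint32_t) 1 << 31) >> x;
}

/* Count the x in [0, 31] for which the two forms disagree.  */
int count_shift_mismatches (void)
{
  int bad = 0;
  for (unsigned x = 0; x < 32; x++)
    if (shift_before (x) != shift_after (x))
      bad++;
  return bad;
}
```

For signed types the right shift has to be done on the unsigned
variant of the type, since an arithmetic right shift of the sign bit
would smear it; that is what the convert:utype in the pattern is for.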

> I am pasting the diff of the assembly that AArch64 generates with the test 
> case that I added. I see fewer instructions generated with the patch.
>
> --- pr80131-1.s    2017-08-01 10:02:43.243374174 +0100
> +++ pr80131-1.s-patched    2017-08-01 10:00:54.776455630 +0100
> @@ -24,10 +24,8 @@
>  str    x0, [sp, 8]
>  ldr    x0, [sp, 8]
>  mov    w1, w0
> -    mov    w0, 63
> -    sub    w0, w0, w1
> -    mov    x1, 1
> -    lsl    x0, x1, x0
> +    mov    x0, -9223372036854775808
> +    lsr    x0, x0, x1
>  add    sp, sp, 16
>  ret
>  .size    f2, .-f2
> @@ -39,10 +37,8 @@
>  str    x0, [sp, 8]
>  ldr    x0, [sp, 8]
>  mov    w1, w0
> -    mov    w0, 63
> -    sub    w0, w0, w1
> -    mov    x1, 1
> -    lsl    x0, x1, x0
> +    mov    x0, -9223372036854775808
> +    lsr    x0, x0, x1
>  add    sp, sp, 16
>  ret
>  .size    f3, .-f3
> @@ -52,11 +48,9 @@
>  f4:
>  sub    sp, sp, #16
>  str    w0, [sp, 12]
> -    mov    w1, 31
>  ldr    w0, [sp, 12]
> -    sub    w0, w1, w0
> -    mov    w1, 1
> -    lsl    w0, w1, w0
> +    mov    w1, -2147483648
> +    lsr    w0, w1, w0
>  add    sp, sp, 16
>  ret
>  .size    f4, .-f4
>
>
> Thanks
>
> Sudi
>
>
>
>
> From: Wilco Dijkstra
> Sent: Thursday, April 13, 2017 1:01 PM
> To: Richard Biener; Jakub Jelinek
> Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
>
> Richard Biener wrote:
>> It is IMHO a valid GIMPLE optimization / canonicalization.
>>
>>    movabsq $-9223372036854775808, %rax
>>
>> so this should then have been generated as 1<<63?
>>
>> At some point variable shifts were quite expensive as well..
>
> Yes I don't see a major difference between movabsq and
>
> movl    $1, %eax
> salq    $63, %rax
>
> on my Sandy Bridge, but if the above is faster then that is what the x64
> backend should emit - it's 1 byte smaller as well, so probably better in all
> cases.
>
> Wilco
diff --git a/gcc/match.pd b/gcc/match.pd
index e9017e4..160c12d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -600,6 +600,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& tree_nop_conversion_p (type, TREE_TYPE (@1)))
(lshift @0 @2)))
 
+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x). */
+(simplify
+ (lshift integer_onep@0 (minus@1 INTEGER_CST@2 @3))
+  (if (INTEGRAL_TYPE_P (type)
+   && wi::eq_p (@2, TYPE_PRECISION (type) - 1)
+   && single_use (@1))
+   (if (TYPE_UNSIGNED(type))
+ (rshift (lshift @0 @2) @3)
+   (with
+{ tree utype = unsigned_type_for (type); }
+(convert (rshift (lshift (convert:utype @0) @2) @3))
+
 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/pr80131-1.c
new file mode 100644
index 000..317ea3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80131-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+/* Checks the simplification of:
+   1 << (C - x) to (1 << C) >> x, where C = precision (type) - 1
+   f1 is not simplified but f2, f3 and f4 are. */
+
+__INT64_TYPE__ f1

Re: [PATCH 4/5] New target check: vect_nopeel - v2

2017-09-26 Thread Rainer Orth

Hi Andreas,

> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 307c726..3acfd85 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1398,6 +1398,9 @@ Target supports a vector misalign access.
>  @item vect_no_align
>  Target does not support a vector alignment mechanism.
>
> +@item vect_no_peel
> +Target does not require any loop peeling for alignment purposes.
> +
>  @item vect_no_int_min_max
>  Target does not support a vector min and max instruction on @code{int}.

please keep the items sorted alphabetically.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH] x32: Encode %esp as %rsp to avoid 0x67 prefix

2017-09-26 Thread Uros Bizjak

On Tue, Sep 26, 2017 at 2:19 PM, Jakub Jelinek  wrote:
> On Sun, Sep 24, 2017 at 11:25:34AM +0200, Uros Bizjak wrote:
>> We can use 'q' modifier just before register output part (plus a small
>> simplification).
>>
>> Can you try the attached (untested) patch?
>>
>> Uros.
>
>> Index: i386.c
>> ===
>> --- i386.c(revision 253118)
>> +++ i386.c(working copy)
>> @@ -19953,6 +19953,14 @@ ix86_print_operand_address_as (FILE *file, rtx add
>> code = 'k';
>>   }
>>
>> +  /* Since the upper 32 bits of RSP are always zero for x32, we can
>> +  encode %esp as %rsp to avoid 0x67 prefix if there is no index or
>> +  base register.  */
>> +  if (TARGET_X32 && Pmode == SImode
>> +   && ((!index && base && REGNO (base) == SP_REG)
>> +   || (!base && index && REGNO (index) == SP_REG)))
>> + code = 'q';
>> +
>>if (ASSEMBLER_DIALECT == ASM_ATT)
>>   {
>> if (disp)
>
> This broke
> +FAIL: gcc.target/i386/pr55049-1.c (internal compiler error)
> +FAIL: gcc.target/i386/pr55049-1.c (test for excess errors)
> +FAIL: gcc.target/i386/pr59034-1.c (internal compiler error)
> +FAIL: gcc.target/i386/pr59034-1.c (test for excess errors)
> +FAIL: g++.dg/other/pr59492.C  -std=gnu++11 (internal compiler error)
> +FAIL: g++.dg/other/pr59492.C  -std=gnu++11 (test for excess errors)
> +FAIL: g++.dg/other/pr59492.C  -std=gnu++14 (internal compiler error)
> +FAIL: g++.dg/other/pr59492.C  -std=gnu++14 (test for excess errors)
> +FAIL: g++.dg/other/pr59492.C  -std=gnu++98 (internal compiler error)
> +FAIL: g++.dg/other/pr59492.C  -std=gnu++98 (test for excess errors)
> with --enable-checking=yes,rtl,extra.
>
> While I believe non-NULL index must be a REG, that is not the case for base,
> which can be pc_rtx.  A few lines later it calls print_reg on base and/or
> index (if non-NULL) and that assumes it is pc_rtx or uses REGNO on it,
> so I think we can't see a SUBREG or something similar there.
>
> Fixed thusly, ok for trunk?
>
> 2017-09-26  Jakub Jelinek  
>
> PR target/82267
> * config/i386/i386.c (ix86_print_operand_address_as): Only test
> REGNO (base) == SP_REG if base is a REG.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2017-09-26 11:38:54.0 +0200
> +++ gcc/config/i386/i386.c  2017-09-26 14:04:51.59554 +0200
> @@ -19957,7 +19957,7 @@ ix86_print_operand_address_as (FILE *fil
>  encode %esp as %rsp to avoid 0x67 prefix if there is no index or
>  base register.  */
>if (TARGET_X32 && Pmode == SImode
> - && ((!index && base && REGNO (base) == SP_REG)
> + && ((!index && base && REG_P (base) && REGNO (base) == SP_REG)
>   || (!base && index && REGNO (index) == SP_REG)))
> code = 'q';
>
>
>
> Jakub

Re: [PATCH] x32: Encode %esp as %rsp to avoid 0x67 prefix

2017-09-26 Thread Jakub Jelinek

On Sun, Sep 24, 2017 at 11:25:34AM +0200, Uros Bizjak wrote:
> We can use 'q' modifier just before register output part (plus a small
> simplification).
> 
> Can you try the attached (untested) patch?
> 
> Uros.

> Index: i386.c
> ===
> --- i386.c(revision 253118)
> +++ i386.c(working copy)
> @@ -19953,6 +19953,14 @@ ix86_print_operand_address_as (FILE *file, rtx add
> code = 'k';
>   }
>  
> +  /* Since the upper 32 bits of RSP are always zero for x32, we can
> +  encode %esp as %rsp to avoid 0x67 prefix if there is no index or
> +  base register.  */
> +  if (TARGET_X32 && Pmode == SImode
> +   && ((!index && base && REGNO (base) == SP_REG)
> +   || (!base && index && REGNO (index) == SP_REG)))
> + code = 'q';
> +
>if (ASSEMBLER_DIALECT == ASM_ATT)
>   {
> if (disp)

This broke
+FAIL: gcc.target/i386/pr55049-1.c (internal compiler error)
+FAIL: gcc.target/i386/pr55049-1.c (test for excess errors)
+FAIL: gcc.target/i386/pr59034-1.c (internal compiler error)
+FAIL: gcc.target/i386/pr59034-1.c (test for excess errors)
+FAIL: g++.dg/other/pr59492.C  -std=gnu++11 (internal compiler error)
+FAIL: g++.dg/other/pr59492.C  -std=gnu++11 (test for excess errors)
+FAIL: g++.dg/other/pr59492.C  -std=gnu++14 (internal compiler error)
+FAIL: g++.dg/other/pr59492.C  -std=gnu++14 (test for excess errors)
+FAIL: g++.dg/other/pr59492.C  -std=gnu++98 (internal compiler error)
+FAIL: g++.dg/other/pr59492.C  -std=gnu++98 (test for excess errors)
with --enable-checking=yes,rtl,extra.

While I believe non-NULL index must be a REG, that is not the case for base,
which can be pc_rtx.  A few lines later it calls print_reg on base and/or
index (if non-NULL) and that assumes it is pc_rtx or uses REGNO on it,
so I think we can't see a SUBREG or something similar there.

Fixed thusly, ok for trunk?

2017-09-26  Jakub Jelinek  

PR target/82267
* config/i386/i386.c (ix86_print_operand_address_as): Only test
REGNO (base) == SP_REG if base is a REG.

--- gcc/config/i386/i386.c.jj   2017-09-26 11:38:54.0 +0200
+++ gcc/config/i386/i386.c  2017-09-26 14:04:51.59554 +0200
@@ -19957,7 +19957,7 @@ ix86_print_operand_address_as (FILE *fil
 encode %esp as %rsp to avoid 0x67 prefix if there is no index or
 base register.  */
   if (TARGET_X32 && Pmode == SImode
- && ((!index && base && REGNO (base) == SP_REG)
+ && ((!index && base && REG_P (base) && REGNO (base) == SP_REG)
  || (!base && index && REGNO (index) == SP_REG)))
code = 'q';
 
 

Jakub
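
The shape of the fix, testing the kind of the operand before using a
register-only accessor, can be illustrated with a toy model in plain C.
Everything below (toy_rtx, toy_reg_p, toy_regno, TOY_SP_REG) is
hypothetical and only mimics the idea; it is not GCC's rtx API, and the
register number is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an operand that is either a register or the pc.  */
enum toy_code { TOY_REG, TOY_PC };

struct toy_rtx
{
  enum toy_code code;
  unsigned regno;   /* Only meaningful when code == TOY_REG.  */
};

#define TOY_SP_REG 7   /* Arbitrary number chosen for the sketch.  */

int toy_reg_p (const struct toy_rtx *x)
{
  return x->code == TOY_REG;
}

/* Checked accessor: aborts when applied to a non-register, in the same
   spirit as an accessor check in a checking-enabled build.  */
unsigned toy_regno (const struct toy_rtx *x)
{
  assert (toy_reg_p (x));
  return x->regno;
}

/* Guarded test: toy_regno is only reached once toy_reg_p has
   succeeded, so a pc operand is rejected instead of aborting.  */
int toy_base_is_sp (const struct toy_rtx *base, const struct toy_rtx *index)
{
  return !index && base && toy_reg_p (base)
         && toy_regno (base) == TOY_SP_REG;
}
```

Without the toy_reg_p test in toy_base_is_sp, passing the pc operand
would hit the assertion in the accessor, which is the analogue of the
failures listed above.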

Re: [PATCH 2/5] pr60656.c: New target check: vect_mult_long

2017-09-26 Thread Andreas Krebbel

On 09/26/2017 01:57 PM, Rainer Orth wrote:
> Hi Andreas,
> 
>> We don't have a 64 bit vector integer multiply on z.  Add a specific
>> check for that.
>>
>> 2017-09-26  Andreas Krebbel  
>>
>>  * gcc.dg/vect/pr60656.c: Check vect_mult_long.
>>  * lib/target-supports.exp (check_effective_target_vect_mult_long):
>>  New proc.
> 
> as usual, this and the other new effective-target keywords need
> documenting in sourcebuild.texi.
> 
>   Rainer
> 

Ok.

-Andreas-

pr60656.c: New target check: vect_mult_long

We don't have a 64 bit vector integer multiply on z.  Add a specific
check for that.
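
As an illustration of the kind of loop this is about, here is a minimal
C sketch.  It is not the content of pr60656.c; the function name
cube_widen and the use of long long for the 64-bit type are assumptions
made for the example.

```c
#include <assert.h>

/* Cube each int element into a 64-bit result.  The first product can
   be done as a widening int * int -> 64-bit multiplication; the second
   one multiplies two 64-bit values, which needs a 64-bit vector
   integer multiply to be vectorized.  */
void cube_widen (const int *in, long long *out, int n)
{
  for (int i = 0; i < n; i++)
    {
      long long p = in[i];
      out[i] = p * in[i] * in[i];
    }
}
```

On a target without a 64-bit vector integer multiply, the second
multiplication is the step that blocks vectorization, which is why the
test needs its own effective-target check.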

gcc/ChangeLog:

2017-09-26  Andreas Krebbel  

* doc/sourcebuild.texi: Document vect_mult_long.

gcc/testsuite/ChangeLog:

2017-09-26  Andreas Krebbel  

* gcc.dg/vect/pr60656.c: Check vect_mult_long.
* lib/target-supports.exp (check_effective_target_vect_mult_long):
New proc.

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 9901c94..307c726 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1467,6 +1467,10 @@ into @code{int} results, or can promote (unpack) from 
@code{short} to
 Target supports a vector widening multiplication of @code{int} operands
 into @code{long} results.

+@item vect_mult_long
+Target supports a vector multiplication of @code{long} operands into
+@code{long} results.
+
 @item vect_sdot_qi
 Target supports a vector dot-product of @code{signed char}.

diff --git a/gcc/testsuite/gcc.dg/vect/pr60656.c 
b/gcc/testsuite/gcc.dg/vect/pr60656.c
index d9e30bb..f44269a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr60656.c
+++ b/gcc/testsuite/gcc.dg/vect/pr60656.c
@@ -43,4 +43,5 @@ int main()
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target
vect_widen_mult_si_to_di_pattern } } } */
+/* P * P * P requires a widening multiplication first as well as a 
longxlong->long after that.  */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target {
vect_widen_mult_si_to_di_pattern && vect_mult_long } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index b45a19e..7fdfbbb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5166,6 +5166,30 @@ proc check_effective_target_vect_long { } {
 return $answer
 }

+# Return 1 if the target supports hardware vector multiplication of
+# long operands with a long result, 0 otherwise.
+#
+# This can change for different subtargets so do not cache the result.
+
+proc check_effective_target_vect_mult_long { } {
+if { [istarget i?86-*-*] || [istarget x86_64-*-*]
+|| (([istarget powerpc*-*-*]
+  && ![istarget powerpc-*-linux*paired*])
+  && [check_effective_target_ilp32])
+|| [is-effective-target arm_neon]
+|| ([istarget sparc*-*-*] && [check_effective_target_ilp32])
+|| [istarget aarch64*-*-*]
+|| ([istarget mips*-*-*]
+ && [et-is-effective-target mips_msa]) } {
+   set answer 1
+} else {
+   set answer 0
+}
+
+verbose "check_effective_target_vect_mult_long: returning $answer" 2
+return $answer
+}
+
 # Return 1 if the target supports hardware vectors of float, 0 otherwise.
 #
 # This won't change for different subtargets so cache the result.
-- 
2.9.1

Re: [PATCH 4/5] New target check: vect_nopeel - v2

2017-09-26 Thread Andreas Krebbel

- vect_nopeel renamed to vect_no_peel
- documentation added.

gcc/ChangeLog:

2017-09-26  Andreas Krebbel  

* doc/sourcebuild.texi: Document vect_no_peel.

gcc/testsuite/ChangeLog:

2017-09-26  Andreas Krebbel  

* g++.dg/vect/slp-pr56812.cc: Check vect_no_peel.
* lib/target-supports.exp (check_effective_target_vect_no_peel):
New proc.
---
 gcc/doc/sourcebuild.texi |  3 +++
 gcc/testsuite/g++.dg/vect/slp-pr56812.cc |  4 +++-
 gcc/testsuite/lib/target-supports.exp| 22 ++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 307c726..3acfd85 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1398,6 +1398,9 @@ Target supports a vector misalign access.
 @item vect_no_align
 Target does not support a vector alignment mechanism.

+@item vect_no_peel
+Target does not require any loop peeling for alignment purposes.
+
 @item vect_no_int_min_max
 Target does not support a vector min and max instruction on @code{int}.

diff --git a/gcc/testsuite/g++.dg/vect/slp-pr56812.cc 
b/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
index 80bdcdd..3dbaf76 100644
--- a/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
+++ b/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
@@ -17,4 +17,6 @@ void mydata::Set (float x)
 data[i] = x;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */
+/* For targets without vector loop peeling the loop becomes cheap
+   enough to be vectorized.  */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" { target { ! vect_no_peel } } } } */

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 7fdfbbb..31e802d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3199,6 +3199,28 @@ proc check_effective_target_vect_floatuint_cvt { } {
 return $et_vect_floatuint_cvt_saved($et_index)
 }

+# Return 1 if peeling for alignment is never profitable on the target
+#
+
+proc check_effective_target_vect_no_peel { } {
+global et_vect_no_peel_saved
+global et_index
+
+if [info exists et_vect_no_peel_saved($et_index)] {
+   verbose "check_effective_target_vect_no_peel: using cached result" 2
+} else {
+   set et_vect_no_peel_saved($et_index) 0
+if { ([istarget s390*-*-*]
+ && [check_effective_target_s390_vx]) } {
+   set et_vect_no_peel_saved($et_index) 1
+}
+}
+
+verbose "check_effective_target_vect_no_peel:\
+returning $et_vect_no_peel_saved($et_index)" 2
+return $et_vect_no_peel_saved($et_index)
+}
+
 # Return 1 if the target supports #pragma omp declare simd, 0 otherwise.
 #
 # This won't change for different subtargets so cache the result.
-- 
2.9.1

Re: [PATCH] Optimize x == 0 && y == 0 into (x | y) == 0 in reassoc range opt (PR middle-end/35691)

2017-09-26 Thread Richard Biener

On Tue, 26 Sep 2017, Jakub Jelinek wrote:

> Hi!
> 
> Right now we handle x == 0 && y == 0 into (x | y) == 0
> and x == -1 && y == -1 into (x & y) == -1 optimizations just in
> match.pd, where it will handle the case where the && (or || if using !=)
> is actually & (or |) and they are next to each other.
> It doesn't handle the case when we have such comparisons as part of
> a larger && or || test that is split into multiple basic blocks or
> intermixed with other comparisons.
> 
> The following patch teaches optimize_range_tests to optimize even these.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard.

> 2017-09-26  Jakub Jelinek  
> 
>   PR middle-end/35691
>   * tree-ssa-reassoc.c (update_range_test): Dump r->exp each time
>   if it is different SSA_NAME.
>   (optimize_range_tests_cmp_bitwise): New function.
>   (optimize_range_tests): Call it.
> 
>   * gcc.dg/pr35691-5.c: New test.
>   * gcc.dg/pr35691-6.c: New test.
> 
> --- gcc/tree-ssa-reassoc.c.jj 2017-09-26 09:35:15.101842092 +0200
> +++ gcc/tree-ssa-reassoc.c2017-09-26 10:18:07.600993304 +0200
> @@ -2379,7 +2379,16 @@ update_range_test (struct range_entry *r
>   r = otherrange + i;
> else
>   r = otherrangep[i];
> -   fprintf (dump_file, " and %c[", r->in_p ? '+' : '-');
> +   if (r->exp
> +   && r->exp != range->exp
> +   && TREE_CODE (r->exp) == SSA_NAME)
> + {
> +   fprintf (dump_file, " and ");
> +   print_generic_expr (dump_file, r->exp);
> + }
> +   else
> + fprintf (dump_file, " and");
> +   fprintf (dump_file, " %c[", r->in_p ? '+' : '-');
> print_generic_expr (dump_file, r->low);
> fprintf (dump_file, ", ");
> print_generic_expr (dump_file, r->high);
> @@ -2880,6 +2889,134 @@ optimize_range_tests_to_bit_test (enum t
>return any_changes;
>  }
>  
> +/* Optimize x != 0 && y != 0 && z != 0 into (x | y | z) != 0
> +   and similarly x != -1 && y != -1 && y != -1 into (x & y & z) != -1.  */
> +
> +static bool
> +optimize_range_tests_cmp_bitwise (enum tree_code opcode, int first, int 
> length,
> +   vec *ops,
> +   struct range_entry *ranges)
> +{
> +  int i;
> +  unsigned int b;
> +  bool any_changes = false;
> +  auto_vec buckets;
> +  auto_vec chains;
> +  auto_vec candidates;
> +
> +  for (i = first; i < length; i++)
> +{
> +  if (ranges[i].exp == NULL_TREE
> +   || TREE_CODE (ranges[i].exp) != SSA_NAME
> +   || !ranges[i].in_p
> +   || TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) <= 1
> +   || TREE_CODE (TREE_TYPE (ranges[i].exp)) == BOOLEAN_TYPE
> +   || ranges[i].low == NULL_TREE
> +   || ranges[i].low != ranges[i].high)
> + continue;
> +
> +  bool zero_p = integer_zerop (ranges[i].low);
> +  if (!zero_p && !integer_all_onesp (ranges[i].low))
> + continue;
> +
> +  b = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) * 2 + !zero_p;
> +  if (buckets.length () <= b)
> + buckets.safe_grow_cleared (b + 1);
> +  if (chains.length () <= (unsigned) i)
> + chains.safe_grow (i + 1);
> +  chains[i] = buckets[b];
> +  buckets[b] = i + 1;
> +}
> +
> +  FOR_EACH_VEC_ELT (buckets, b, i)
> +if (i && chains[i - 1])
> +  {
> + int j, k = i;
> + for (j = chains[i - 1]; j; j = chains[j - 1])
> +   {
> + gimple *gk = SSA_NAME_DEF_STMT (ranges[k - 1].exp);
> + gimple *gj = SSA_NAME_DEF_STMT (ranges[j - 1].exp);
> + if (reassoc_stmt_dominates_stmt_p (gk, gj))
> +   k = j;
> +   }
> + tree type1 = TREE_TYPE (ranges[k - 1].exp);
> + tree type2 = NULL_TREE;
> + bool strict_overflow_p = false;
> + candidates.truncate (0);
> + for (j = i; j; j = chains[j - 1])
> +   {
> + tree type = TREE_TYPE (ranges[j - 1].exp);
> + strict_overflow_p |= ranges[j - 1].strict_overflow_p;
> + if (j == k
> + || useless_type_conversion_p (type1, type))
> +   ;
> + else if (type2 == NULL_TREE
> +  || useless_type_conversion_p (type2, type))
> +   {
> + if (type2 == NULL_TREE)
> +   type2 = type;
> + candidates.safe_push (&ranges[j - 1]);
> +   }
> +   }
> + unsigned l = candidates.length ();
> + for (j = i; j; j = chains[j - 1])
> +   {
> + tree type = TREE_TYPE (ranges[j - 1].exp);
> + if (j == k)
> +   continue;
> + if (useless_type_conversion_p (type1, type))
> +   ;
> + else if (type2 == NULL_TREE
> +  || useless_type_conversion_p (type2, type))
> +   continue;
> + candidates.safe_push (&ranges[j - 1]);
> +   }
> + gimple_seq seq = NULL;
> + tree op = NULL_TREE;
> + unsigned int id;
> + struct range_entry *r;
> +

[PATCH][GRAPHITE] Simplify SCOP detection

2017-09-26 Thread Richard Biener


The following is the result of me trying to understand SCOP detection
and the validity checks spread around the machinery.  It removes several
quadraticnesses by folding validity checks into 
scop_detection::harmful_loop_in_region where we already walk over all
BBs in the region and process individual found loops.

It also rewrites build_scop_depth/build_scop_breadth into something
I can understand.

Bootstrap and regtest are running on x86_64-unknown-linux-gnu (graphite.exp
for all langs is happy, and so is SPEC CPU 2006 testing, where the statistics
agree before/after the patch).

I'll apply this after the bootstrap has finished.

Richard.

2017-09-26  Richard Biener  

* graphite-scop-detection.c (scop_detection::build_scop_depth): Rewrite,
fold in ...
(scop_detection::build_scop_breadth): ... this.  Removed.
(scop_detection::loop_is_valid_in_scop): Fold into single caller.
(scop_detection::harmful_stmt_in_bb): Likewise.
(scop_detection::graphite_can_represent_stmt): Likewise.
(scop_detection::loop_body_is_valid_scop): Likewise.  Remove recursion.
(scop_detection::can_represent_loop): Remove recursion, fold in ...
(scop_detection::can_represent_loop_1): ... this.  Removed.
(scop_detection::harmful_loop_in_region): Simplify after inlining
the above and remove more quadraticness.
(build_scops): Adjust.
* tree-data-ref.c (loop_nest_has_data_refs): Remove pointless
quadraticness.


Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 253199)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -362,17 +362,7 @@ public:
 
   /* Build scop outer->inner if possible.  */
 
-  sese_l build_scop_depth (sese_l s, loop_p loop);
-
-  /* If loop and loop->next are valid scops, try to merge them.  */
-
-  sese_l build_scop_breadth (sese_l s1, loop_p loop);
-
-  /* Return true when LOOP is a valid scop, that is a Static Control Part, a
- region of code that can be represented in the polyhedral model.  SCOP
- defines the region we analyse.  */
-
-  bool loop_is_valid_in_scop (loop_p loop, sese_l scop) const;
+  void build_scop_depth (loop_p loop);
 
   /* Return true when BEGIN is the preheader edge of a loop with a single exit
  END.  */
@@ -398,18 +388,6 @@ public:
 
   void remove_intersecting_scops (sese_l s1);
 
-  /* Return true when the body of LOOP has statements that can be represented
- as a valid scop.  */
-
-  bool loop_body_is_valid_scop (loop_p loop, sese_l scop) const;
-
-  /* Return true when BB contains a harmful operation for a scop: that
- can be a function call with side effects, the induction variables
- are not linear with respect to SCOP, etc.  The current open
- scop should end before this statement.  */
-
-  bool harmful_stmt_in_bb (sese_l scop, basic_block bb) const;
-
   /* Return true when a statement in SCOP cannot be represented by Graphite.
  The assumptions are that L1 dominates L2, and SCOP->entry dominates L1.
  Limit the number of bbs between adjacent loops to
@@ -467,19 +445,12 @@ public:
  FIXME: For the moment, graphite cannot be used on loops that iterate using
  induction variables that wrap.  */
 
-  static bool can_represent_loop_1 (loop_p loop, sese_l scop);
-
-  /* Return true when all the loops within LOOP can be represented by
- Graphite.  */
-
   static bool can_represent_loop (loop_p loop, sese_l scop);
 
   /* Returns the number of pbbs that are in loops contained in SCOP.  */
 
   static int nb_pbbs_in_loops (scop_p scop);
 
-  static bool graphite_can_represent_stmt (sese_l, gimple *, basic_block);
-
 private:
   vec scops;
 };
@@ -673,10 +644,6 @@ scop_detection::merge_sese (sese_l first
   return invalid_sese;
 }
 
-  /* Analyze all the BBs in new sese.  */
-  if (harmful_loop_in_region (combined))
-return invalid_sese;
-
   DEBUG_PRINT (dp << "[merged-sese] s1: "; print_sese (dump_file, combined));
 
   return combined;
@@ -684,71 +651,40 @@ scop_detection::merge_sese (sese_l first
 
 /* Build scop outer->inner if possible.  */
 
-sese_l
-scop_detection::build_scop_depth (sese_l s, loop_p loop)
-{
-  if (!loop)
-return s;
-
-  DEBUG_PRINT (dp << "[Depth loop_" << loop->num << "]\n");
-  s = build_scop_depth (s, loop->inner);
-
-  sese_l s2 = merge_sese (s, get_sese (loop));
-  if (!s2)
-{
-  /* s might be a valid scop, so return it and start analyzing from the
-adjacent loop.  */
-  build_scop_depth (invalid_sese, loop->next);
-  return s;
-}
-
-  if (!loop_is_valid_in_scop (loop, s2))
-return build_scop_depth (invalid_sese, loop->next);
-
-  return build_scop_breadth (s2, loop);
-}
-
-/* If loop and loop->next are valid scops, try to merge them.  */
-
-sese_l
-scop_detection::build_scop_breadth (sese_l s1, loop_p loop)
+void

[PATCH] Optimize x == 0 && y == 0 into (x | y) == 0 in reassoc range opt (PR middle-end/35691)

2017-09-26 Thread Jakub Jelinek

Hi!

Right now we perform the x == 0 && y == 0 into (x | y) == 0
and x == -1 && y == -1 into (x & y) == -1 optimizations only in
match.pd, which handles the case where the && (or || if using !=)
is actually & (or |) and the comparisons are next to each other.
It doesn't handle the case where such comparisons are part of
a larger && or || test that is split into multiple basic blocks or
intermixed with other comparisons.

The following patch teaches optimize_range_tests to optimize even these.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
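
The two identities themselves can be checked with a small standalone C
sketch.  The function names are made up for the example and nothing
here is taken from the patch or the new tests; the sketch assumes
two's complement int, and only compares the source-level forms over a
small range of values.

```c
#include <assert.h>

/* x == 0 && y == 0 versus (x | y) == 0.  */
int both_zero_naive (int x, int y) { return x == 0 && y == 0; }
int both_zero_folded (int x, int y) { return (x | y) == 0; }

/* x == -1 && y == -1 versus (x & y) == -1.  */
int both_ones_naive (int x, int y) { return x == -1 && y == -1; }
int both_ones_folded (int x, int y) { return (x & y) == -1; }

/* Count disagreements between the naive and folded forms for all
   pairs with x and y in [-128, 127].  */
int count_bitwise_mismatches (void)
{
  int bad = 0;
  for (int x = -128; x <= 127; x++)
    for (int y = -128; y <= 127; y++)
      {
        if (both_zero_naive (x, y) != both_zero_folded (x, y))
          bad++;
        if (both_ones_naive (x, y) != both_ones_folded (x, y))
          bad++;
      }
  return bad;
}
```

The folded forms need one comparison instead of two, which is the
benefit when the individual tests would otherwise end up in separate
basic blocks.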

2017-09-26  Jakub Jelinek  

PR middle-end/35691
* tree-ssa-reassoc.c (update_range_test): Dump r->exp each time
if it is different SSA_NAME.
(optimize_range_tests_cmp_bitwise): New function.
(optimize_range_tests): Call it.

* gcc.dg/pr35691-5.c: New test.
* gcc.dg/pr35691-6.c: New test.

--- gcc/tree-ssa-reassoc.c.jj   2017-09-26 09:35:15.101842092 +0200
+++ gcc/tree-ssa-reassoc.c  2017-09-26 10:18:07.600993304 +0200
@@ -2379,7 +2379,16 @@ update_range_test (struct range_entry *r
r = otherrange + i;
  else
r = otherrangep[i];
- fprintf (dump_file, " and %c[", r->in_p ? '+' : '-');
+ if (r->exp
+ && r->exp != range->exp
+ && TREE_CODE (r->exp) == SSA_NAME)
+   {
+ fprintf (dump_file, " and ");
+ print_generic_expr (dump_file, r->exp);
+   }
+ else
+   fprintf (dump_file, " and");
+ fprintf (dump_file, " %c[", r->in_p ? '+' : '-');
  print_generic_expr (dump_file, r->low);
  fprintf (dump_file, ", ");
  print_generic_expr (dump_file, r->high);
@@ -2880,6 +2889,134 @@ optimize_range_tests_to_bit_test (enum t
   return any_changes;
 }
 
+/* Optimize x != 0 && y != 0 && z != 0 into (x | y | z) != 0
+   and similarly x != -1 && y != -1 && y != -1 into (x & y & z) != -1.  */
+
+static bool
+optimize_range_tests_cmp_bitwise (enum tree_code opcode, int first, int length,
+ vec *ops,
+ struct range_entry *ranges)
+{
+  int i;
+  unsigned int b;
+  bool any_changes = false;
+  auto_vec buckets;
+  auto_vec chains;
+  auto_vec candidates;
+
+  for (i = first; i < length; i++)
+{
+  if (ranges[i].exp == NULL_TREE
+ || TREE_CODE (ranges[i].exp) != SSA_NAME
+ || !ranges[i].in_p
+ || TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) <= 1
+ || TREE_CODE (TREE_TYPE (ranges[i].exp)) == BOOLEAN_TYPE
+ || ranges[i].low == NULL_TREE
+ || ranges[i].low != ranges[i].high)
+   continue;
+
+  bool zero_p = integer_zerop (ranges[i].low);
+  if (!zero_p && !integer_all_onesp (ranges[i].low))
+   continue;
+
+  b = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) * 2 + !zero_p;
+  if (buckets.length () <= b)
+   buckets.safe_grow_cleared (b + 1);
+  if (chains.length () <= (unsigned) i)
+   chains.safe_grow (i + 1);
+  chains[i] = buckets[b];
+  buckets[b] = i + 1;
+}
+
+  FOR_EACH_VEC_ELT (buckets, b, i)
+if (i && chains[i - 1])
+  {
+   int j, k = i;
+   for (j = chains[i - 1]; j; j = chains[j - 1])
+ {
+   gimple *gk = SSA_NAME_DEF_STMT (ranges[k - 1].exp);
+   gimple *gj = SSA_NAME_DEF_STMT (ranges[j - 1].exp);
+   if (reassoc_stmt_dominates_stmt_p (gk, gj))
+ k = j;
+ }
+   tree type1 = TREE_TYPE (ranges[k - 1].exp);
+   tree type2 = NULL_TREE;
+   bool strict_overflow_p = false;
+   candidates.truncate (0);
+   for (j = i; j; j = chains[j - 1])
+ {
+   tree type = TREE_TYPE (ranges[j - 1].exp);
+   strict_overflow_p |= ranges[j - 1].strict_overflow_p;
+   if (j == k
+   || useless_type_conversion_p (type1, type))
+ ;
+   else if (type2 == NULL_TREE
+|| useless_type_conversion_p (type2, type))
+ {
+   if (type2 == NULL_TREE)
+ type2 = type;
+   candidates.safe_push (&ranges[j - 1]);
+ }
+ }
+   unsigned l = candidates.length ();
+   for (j = i; j; j = chains[j - 1])
+ {
+   tree type = TREE_TYPE (ranges[j - 1].exp);
+   if (j == k)
+ continue;
+   if (useless_type_conversion_p (type1, type))
+ ;
+   else if (type2 == NULL_TREE
+|| useless_type_conversion_p (type2, type))
+ continue;
+   candidates.safe_push (&ranges[j - 1]);
+ }
+   gimple_seq seq = NULL;
+   tree op = NULL_TREE;
+   unsigned int id;
+   struct range_entry *r;
+   candidates.safe_push (&ranges[k - 1]);
+   FOR_EACH_VEC_ELT (candidates, id, r)
+ {
+   gimple *g;
+   if (id == 0)
+ {
+   op =

Re: [PATCH 2/5] pr60656.c: New target check: vect_mult_long

2017-09-26 Thread Rainer Orth

Hi Andreas,

> We don't have a 64 bit vector integer multiply on z.  Add a specific
> check for that.
>
> 2017-09-26  Andreas Krebbel  
>
>   * gcc.dg/vect/pr60656.c: Check vect_mult_long.
>   * lib/target-supports.exp (check_effective_target_vect_mult_long):
>   New proc.

as usual, this and the other new effective-target keywords need
documenting in sourcebuild.texi.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH 1/8] Enable vect testcases on S/390.

2017-09-26 Thread Andreas Krebbel

On 09/26/2017 01:06 PM, Rainer Orth wrote:
> Hi Andreas,
> 
>> Add s390 platform checks where appropriate.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2017-09-26  Andreas Krebbel  
>>
>>  * lib/target-supports.exp: Enable tests for S/390.
> 
> this needs to be more specific: which procs were modified?
> 
> Besides, the changes to check_vect_support_and_set_flags and
> check_effective_target_s390_vxe aren't covered at all.

I had already committed the patch but have adjusted the changelog entry 
afterwards.

-Andreas-

Re: [patch, fortran] Warn about out-of-bounds access with DO subscripts

2017-09-26 Thread Jakub Jelinek

On Tue, Sep 26, 2017 at 09:17:40AM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Mon, 25 Sep 2017 18:50:49 +0200, Thomas Koenig  
> wrote:
> > Thanks for the review, committed as r253156.
> > 
> > Now, on to some other bugs...
> 
> No, back to this one please.  ;-)
> 
> Apparently, the changes you prepared for existing testcases did not get
> committed, so I'm now seeing some FAILs there.  See also recent posts on
> the  mailing list:
> 
> FAIL: gfortran.dg/gomp/associate1.f90   -O  (test for excess errors)
> FAIL: gfortran.dg/predcom-1.f   -O  (test for excess errors)
> FAIL: gfortran.dg/unconstrained_commons.f   -O  (test for excess errors)

At least the gfortran.dg/gomp testcase doesn't seem to be related to OpenMP
at all, it fails also with just:
program associate1
  integer :: v, i, j
  real :: a(3, 3)
  i = 1
  j = 2
  associate(k => v, l => a(i, j), m => a(i, :))
  k = 1
  do i = 1, 10
k = k + 2
  end do
  end associate
end program

And I don't really see a bug in the testcase...

Jakub

[PATCH 3/5] pr65947-9.c: Requires char to be signed by default.

2017-09-26 Thread Andreas Krebbel

Fails on S/390 with char defaulting to unsigned char.
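
For context, whether plain char is signed is implementation-defined in C,
which is why a test that stores negative values such as -72 needs an
explicit signed char.  The sketch below uses illustrative helper names (not
taken from the testcase) to show how the same value behaves in each type.

```c
#include <limits.h>

/* Nonzero if plain char is signed on this target; the C standard leaves
   this choice to the implementation.  */
int
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* signed char can hold -72 on every target.  */
int
signed_value (void)
{
  signed char c = -72;
  return c;
}

/* Converted to unsigned char, -72 wraps modulo UCHAR_MAX + 1,
   giving 184 when CHAR_BIT is 8.  */
int
unsigned_value (void)
{
  unsigned char c = (unsigned char) -72;
  return c;
}
```

On a target where plain char is unsigned, a comparison against a negative
initial value can therefore never succeed.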

gcc/testsuite/ChangeLog:

2017-09-26  Andreas Krebbel  

* gcc.dg/vect/pr65947-9.c: Use signed char explicitly.
---
 gcc/testsuite/gcc.dg/vect/pr65947-9.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-9.c 
b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
index d769af9..e8f20aa 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-9.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
@@ -10,7 +10,7 @@ extern void abort (void) __attribute__ ((noreturn));
vectorize because the vectorisation requires a slot for default values.  */
 
 signed char __attribute__((noinline,noclone))
-condition_reduction (char *a, char min_v)
+condition_reduction (signed char *a, signed char min_v)
 {
   signed char last = -72;
 
-- 
2.9.1

[PATCH 4/5] New target check: vect_nopeel

2017-09-26 Thread Andreas Krebbel

Without peeling loops for vector alignment the vectorization costs are
lower and in some cases make the loop vectorizer cover optimizations
which otherwise would be handled in SLP instead.

This adds a new target check for that purpose.
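
As background, peeling for alignment means running a few scalar iterations
until the pointer reaches a vector boundary and only then entering the
vector loop.  A hand-written sketch of the idea follows.  The 16-byte
boundary, the 4-byte float, and the function name are assumptions made for
illustration; this is not what GCC emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill data[0..n) with x and return the number of peeled (prologue)
   iterations.  Assumes sizeof (float) == 4 and a 16-byte vector.  */
size_t
set_with_peeling (float *data, size_t n, float x)
{
  size_t i = 0;

  /* Prologue: scalar iterations until data + i is 16-byte aligned.  */
  while (i < n && ((uintptr_t) (data + i) % 16) != 0)
    data[i++] = x;
  size_t peeled = i;

  /* Main loop: each group of four floats starts on a 16-byte boundary;
     the four stores stand in for one aligned vector store.  */
  for (; i + 4 <= n; i += 4)
    {
      data[i] = x;
      data[i + 1] = x;
      data[i + 2] = x;
      data[i + 3] = x;
    }

  /* Epilogue: leftover elements that do not fill a whole vector.  */
  for (; i < n; i++)
    data[i] = x;

  return peeled;
}
```

A target that handles unaligned vector accesses cheaply can skip the
prologue entirely, which changes the cost the vectorizer computes.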

gcc/testsuite/ChangeLog:

2017-09-26  Andreas Krebbel  

* g++.dg/vect/slp-pr56812.cc: Check vect_nopeel.
* lib/target-supports.exp (check_effective_target_vect_nopeel):
New proc.
---
 gcc/testsuite/g++.dg/vect/slp-pr56812.cc |  4 +++-
 gcc/testsuite/lib/target-supports.exp| 22 ++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/vect/slp-pr56812.cc 
b/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
index 80bdcdd..955b2ef 100644
--- a/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
+++ b/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
@@ -17,4 +17,6 @@ void mydata::Set (float x)
 data[i] = x;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */
+/* For targets without vector loop peeling the loop becomes cheap
+   enough to be vectorized.  */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" { 
target { ! vect_nopeel } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 7fdfbbb..686465a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3199,6 +3199,28 @@ proc check_effective_target_vect_floatuint_cvt { } {
 return $et_vect_floatuint_cvt_saved($et_index)
 }
 
+# Return 1 if peeling for alignment is never profitable on the target
+#
+
+proc check_effective_target_vect_nopeel { } {
+global et_vect_nopeel_saved
+global et_index
+
+if [info exists et_vect_nopeel_saved($et_index)] {
+   verbose "check_effective_target_vect_nopeel: using cached result" 2
+} else {
+   set et_vect_nopeel_saved($et_index) 0
+if { ([istarget s390*-*-*]
+ && [check_effective_target_s390_vx]) } {
+   set et_vect_nopeel_saved($et_index) 1
+}
+}
+
+verbose "check_effective_target_vect_nopeel:\
+returning $et_vect_nopeel_saved($et_index)" 2
+return $et_vect_nopeel_saved($et_index)
+}
+
 # Return 1 if the target supports #pragma omp declare simd, 0 otherwise.
 #
 # This won't change for different subtargets so cache the result.
-- 
2.9.1

[PATCH 2/5] pr60656.c: New target check: vect_mult_long

2017-09-26 Thread Andreas Krebbel

We don't have a 64 bit vector integer multiply on z.  Add a specific
check for that.
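
The kind of loop affected looks roughly like the sketch below.  This is an
illustration and not the contents of pr60656.c; the function name and the
use of long long as the 64-bit type are assumptions.

```c
/* Store the cube of each int input as a 64-bit value.  */
void
cube_widen (const int *in, long long *out, int n)
{
  for (int i = 0; i < n; i++)
    {
      long long p = in[i];
      /* The first multiply has two operands that came from int, so a
         widening 32x32->64 multiply is enough.  The second multiply has
         a genuine 64-bit operand and needs a 64x64->64 multiply.  */
      out[i] = p * p * p;
    }
}
```

Vectorizing the loop therefore needs both operations in vector form, which
is what the combined target selector expresses.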

2017-09-26  Andreas Krebbel  

* gcc.dg/vect/pr60656.c: Check vect_mult_long.
* lib/target-supports.exp (check_effective_target_vect_mult_long):
New proc.
---
 gcc/testsuite/gcc.dg/vect/pr60656.c   |  3 ++-
 gcc/testsuite/lib/target-supports.exp | 24 
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr60656.c 
b/gcc/testsuite/gcc.dg/vect/pr60656.c
index d9e30bb..f44269a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr60656.c
+++ b/gcc/testsuite/gcc.dg/vect/pr60656.c
@@ -43,4 +43,5 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
vect_widen_mult_si_to_di_pattern } } } */
+/* P * P * P requires a widening multiplication first as well as a 
longxlong->long after that.  */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
vect_widen_mult_si_to_di_pattern && vect_mult_long } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index b45a19e..7fdfbbb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5166,6 +5166,30 @@ proc check_effective_target_vect_long { } {
 return $answer
 }
 
+# Return 1 if the target supports hardware vector multiplication of
+# long operands with a long result, 0 otherwise.
+#
+# This can change for different subtargets so do not cache the result.
+
+proc check_effective_target_vect_mult_long { } {
+if { [istarget i?86-*-*] || [istarget x86_64-*-*]
+|| (([istarget powerpc*-*-*]
+  && ![istarget powerpc-*-linux*paired*])
+  && [check_effective_target_ilp32])
+|| [is-effective-target arm_neon]
+|| ([istarget sparc*-*-*] && [check_effective_target_ilp32])
+|| [istarget aarch64*-*-*]
+|| ([istarget mips*-*-*]
+ && [et-is-effective-target mips_msa]) } {
+   set answer 1
+} else {
+   set answer 0
+}
+
+verbose "check_effective_target_vect_mult_long: returning $answer" 2
+return $answer
+}
+
 # Return 1 if the target supports hardware vectors of float, 0 otherwise.
 #
 # This won't change for different subtargets so cache the result.
-- 
2.9.1

[PATCH 5/5] Testcases using dg-options require at least -mzarch.

2017-09-26 Thread Andreas Krebbel

Testcases which override the vect default options using dg-options
need at least -mzarch on S/390 32 bit.

gcc/testsuite/ChangeLog:

2017-09-26  Andreas Krebbel  

* gfortran.dg/vect/fast-math-mgrid-resid.f: Use -mzarch on S/390.
* gfortran.dg/vect/pr77848.f: Likewise.
---
 gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f | 1 +
 gcc/testsuite/gfortran.dg/vect/pr77848.f   | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f 
b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
index 54f1e9e..7e2816b 100644
--- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
+++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
@@ -2,6 +2,7 @@
 ! { dg-require-effective-target vect_double }
 ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 
-fpredictive-commoning -fdump-tree-pcom-details" }
 ! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } 
}
+! { dg-additional-options "-mzarch" { target { s390*-*-* } } }
 
 *** RESID COMPUTES THE RESIDUAL:  R = V - AU
 *
diff --git a/gcc/testsuite/gfortran.dg/vect/pr77848.f 
b/gcc/testsuite/gfortran.dg/vect/pr77848.f
index d54676e..8749275 100644
--- a/gcc/testsuite/gfortran.dg/vect/pr77848.f
+++ b/gcc/testsuite/gfortran.dg/vect/pr77848.f
@@ -1,7 +1,8 @@
 ! PR 77848: Verify versioning is on when vectorization fails
 ! { dg-do compile }
 ! { dg-options "-O3 -ffast-math -fdump-tree-ifcvt -fdump-tree-vect-details" }
-
+! { dg-additional-options "-mzarch" { target { s390*-*-* } } }
+  
   subroutine sub(x,a,n,m)
   implicit none
   real*8 x(*),a(*),atemp
-- 
2.9.1

[PATCH 1/5] Enable vect_float with S/390 VXE and adjust testcases

2017-09-26 Thread Andreas Krebbel

The target supports routines provide vect_double and vect_float but
these do not appear to be used consistently in the vect testcases.
With z13 we only have support for vector double but with z14 also for
vector float.  This patch adds vect_float to the testcases using the
float data type and makes the vect_float target check return 1 only
on z14.

gcc/testsuite/ChangeLog:

2017-09-26  Andreas Krebbel  

* lib/target-supports.exp (check_effective_target_vect_float):
Return 1 on S/390 with VXE.
* gcc.dg/vect/pr31699.c: Require vect_float.
* gcc.dg/vect/pr61194.c: Likewise.
* gcc.dg/vect/pr65947-10.c: Likewise.
* gcc.dg/vect/pr66142.c: Likewise.
* gcc.dg/vect/slp-10.c: Likewise.
* gcc.dg/vect/slp-11c.c: Likewise.
* gcc.dg/vect/slp-12b.c: Likewise.
* gcc.dg/vect/slp-18.c: Likewise.
* gcc.dg/vect/slp-33.c: Likewise.
* gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
* gcc.dg/vect/slp-cond-2.c: Likewise.
* gcc.dg/vect/vect-cond-10.c: Likewise.
* gcc.dg/vect/vect-cond-8.c: Likewise.
* gcc.dg/vect/vect-cond-9.c: Likewise.
* gcc.dg/vect/vect-float-extend-1.c: Likewise.
* gcc.dg/vect/vect-float-truncate-1.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr31699.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/pr61194.c   | 1 +
 gcc/testsuite/gcc.dg/vect/pr65947-10.c| 1 +
 gcc/testsuite/gcc.dg/vect/pr66142.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/slp-10.c| 1 +
 gcc/testsuite/gcc.dg/vect/slp-11c.c   | 1 +
 gcc/testsuite/gcc.dg/vect/slp-12b.c   | 1 +
 gcc/testsuite/gcc.dg/vect/slp-18.c| 1 +
 gcc/testsuite/gcc.dg/vect/slp-33.c| 1 +
 gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c  | 2 ++
 gcc/testsuite/gcc.dg/vect/slp-cond-2.c| 2 ++
 gcc/testsuite/gcc.dg/vect/vect-cond-10.c  | 1 +
 gcc/testsuite/gcc.dg/vect/vect-cond-8.c   | 1 +
 gcc/testsuite/gcc.dg/vect/vect-cond-9.c   | 1 +
 gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c   | 1 +
 gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c | 1 +
 gcc/testsuite/lib/target-supports.exp | 4 +++-
 17 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr31699.c 
b/gcc/testsuite/gcc.dg/vect/pr31699.c
index 59b8daa..7ec4dfe 100644
--- a/gcc/testsuite/gcc.dg/vect/pr31699.c
+++ b/gcc/testsuite/gcc.dg/vect/pr31699.c
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target vect_double } */
+/* { dg-require-effective-target vect_float } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/vect/pr61194.c 
b/gcc/testsuite/gcc.dg/vect/pr61194.c
index f7c71b9..8421367 100644
--- a/gcc/testsuite/gcc.dg/vect/pr61194.c
+++ b/gcc/testsuite/gcc.dg/vect/pr61194.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_cond_mixed } */
+/* { dg-require-effective-target vect_float } */
 
 #include "tree-vect.h"
 
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-10.c 
b/gcc/testsuite/gcc.dg/vect/pr65947-10.c
index a8a674f..321cb8c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-10.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-10.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target vect_float } */
 
 #include "tree-vect.h"
 
diff --git a/gcc/testsuite/gcc.dg/vect/pr66142.c 
b/gcc/testsuite/gcc.dg/vect/pr66142.c
index 94854ea..8c79f29 100644
--- a/gcc/testsuite/gcc.dg/vect/pr66142.c
+++ b/gcc/testsuite/gcc.dg/vect/pr66142.c
@@ -41,4 +41,4 @@ foo (float *a, float *b, float *c)
   *a = z;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" 
{ target vect_condition } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" 
{ target { vect_condition && vect_float } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-10.c 
b/gcc/testsuite/gcc.dg/vect/slp-10.c
index 3395d22..61c5d3c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-10.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-10.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_float } */
 
 #include 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/slp-11c.c 
b/gcc/testsuite/gcc.dg/vect/slp-11c.c
index 8edd663..bdcf434 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11c.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11c.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_float } */
 
 #include 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12b.c 
b/gcc/testsuite/gcc.dg/vect/slp-12b.c
index d6fe4e4..48e7865 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12b.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12b.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_uintfloat_cvt } */
+/* { dg-require-effective-target vect_float } */
 
 #include 
 #include "tree-vect.h"
diff --git

[PATCH 0/5] vect testsuite adjustments for S/390

2017-09-26 Thread Andreas Krebbel

These patches adjust the vect testcases and target support checks so
that the right set of testcases is run on S/390 (z13 and z14).

Ok for mainline?

Andreas Krebbel (5):
  Enable vect_float with S/390 VXE and adjust testcases
  pr60656.c: New target check: vect_mult_long
  pr65947-9.c: Requires char to be signed by default.
  New target check: vect_nopeel
  Testcases using dg-options require at least -mzarch.

 gcc/testsuite/g++.dg/vect/slp-pr56812.cc   |  4 +-
 gcc/testsuite/gcc.dg/vect/pr31699.c|  2 +-
 gcc/testsuite/gcc.dg/vect/pr60656.c|  3 +-
 gcc/testsuite/gcc.dg/vect/pr61194.c|  1 +
 gcc/testsuite/gcc.dg/vect/pr65947-10.c |  1 +
 gcc/testsuite/gcc.dg/vect/pr65947-9.c  |  2 +-
 gcc/testsuite/gcc.dg/vect/pr66142.c|  2 +-
 gcc/testsuite/gcc.dg/vect/slp-10.c |  1 +
 gcc/testsuite/gcc.dg/vect/slp-11c.c|  1 +
 gcc/testsuite/gcc.dg/vect/slp-12b.c|  1 +
 gcc/testsuite/gcc.dg/vect/slp-18.c |  1 +
 gcc/testsuite/gcc.dg/vect/slp-33.c |  1 +
 gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c   |  2 +
 gcc/testsuite/gcc.dg/vect/slp-cond-2.c |  2 +
 gcc/testsuite/gcc.dg/vect/vect-cond-10.c   |  1 +
 gcc/testsuite/gcc.dg/vect/vect-cond-8.c|  1 +
 gcc/testsuite/gcc.dg/vect/vect-cond-9.c|  1 +
 gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c|  1 +
 gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c  |  1 +
 .../gfortran.dg/vect/fast-math-mgrid-resid.f   |  1 +
 gcc/testsuite/gfortran.dg/vect/pr77848.f   |  3 +-
 gcc/testsuite/lib/target-supports.exp  | 50 +-
 22 files changed, 76 insertions(+), 7 deletions(-)

-- 
2.9.1

Re: [PATCH 1/8] Enable vect testcases on S/390.

2017-09-26 Thread Rainer Orth

Hi Andreas,

> Add s390 platform checks where appropriate.
>
> gcc/testsuite/ChangeLog:
>
> 2017-09-26  Andreas Krebbel  
>
>   * lib/target-supports.exp: Enable tests for S/390.

this needs to be more specific: which procs were modified?

Besides, the changes to check_vect_support_and_set_flags and
check_effective_target_s390_vxe aren't covered at all.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

[PATCH] Fix PR82321

2017-09-26 Thread Richard Biener


Latent, exposed by me removing the "redundant" 
rewrite-into-loop-closed-ssa.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-09-26  Richard Biener  

PR tree-optimization/82321
* graphite.c (canonicalize_loop_closed_ssa): Properly check
for the def being inside the loop.

* gcc.dg/graphite/pr82321.c: New testcase.

Index: gcc/graphite.c
===
--- gcc/graphite.c  (revision 253188)
+++ gcc/graphite.c  (working copy)
@@ -326,7 +327,9 @@ canonicalize_loop_closed_ssa (loop_p loo
 
  /* Only add close phi nodes for SSA_NAMEs defined in LOOP.  */
  if (TREE_CODE (arg) != SSA_NAME
- || loop_containing_stmt (SSA_NAME_DEF_STMT (arg)) != loop)
+ || SSA_NAME_IS_DEFAULT_DEF (arg)
+ || ! flow_bb_inside_loop_p (loop,
+ gimple_bb (SSA_NAME_DEF_STMT (arg))))
continue;
 
  tree res = copy_ssa_name (arg);
Index: gcc/testsuite/gcc.dg/graphite/pr82321.c
===
--- gcc/testsuite/gcc.dg/graphite/pr82321.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr82321.c (working copy)
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -floop-nest-optimize" } */
+
+int y8;
+
+void
+dm (int io)
+{
+  if (y8 != 0)
+{
+  int pu = 1;
+
+  while (io < 2)
+   {
+ int xo = (pu != 0) ? y8 : 0;
+
+ while (y8 != 0)
+   if (xo != 0)
+ {
+gi:
+   xo = (__INTPTR_TYPE__)
+   pu = 0;
+ }
+   }
+}
+
+  if (io != 0)
+{
+  y8 = 1;
+  while (y8 != 0)
+   if (io / !y8 != 0)
+ y8 = 0;
+
+  goto gi;
+}
+}

Re: [PATCH v2,rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant

2017-09-26 Thread Segher Boessenkool

Hi Kelvin,

On Mon, Sep 25, 2017 at 04:11:32PM -0600, Kelvin Nilsen wrote:
> On Power8 little endian, two instructions are needed to load from the
> natural in-memory representation of a vector into a vector register: a
> load followed by a swap.  When the vector value to be loaded is a
> constant, more efficient code can be achieved by swapping the
> representation of the constant in memory so that only a load instruction
> is required.
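
To illustrate the equivalence being exploited, here is a small C model.
It treats a vector as two 64-bit doublewords and the swap as an exchange of
those doublewords; the type and helper names are hypothetical and this is
not code from the patch.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register.  */
typedef struct { uint64_t dw[2]; } vec128;

/* Model of the load: copy both doublewords from memory.  */
vec128
load_vec (const uint64_t mem[2])
{
  vec128 v = { { mem[0], mem[1] } };
  return v;
}

/* Model of the swap: exchange the two doublewords.  */
vec128
swap_dw (vec128 v)
{
  vec128 r = { { v.dw[1], v.dw[0] } };
  return r;
}

/* Done once, ahead of time: store the constant with its doublewords
   already exchanged.  */
void
preswap_constant (const uint64_t in[2], uint64_t out[2])
{
  out[0] = in[1];
  out[1] = in[0];
}
```

In this model swap_dw (load_vec (c)) and load_vec of the pre-swapped copy
of c give the same register contents, so the swap can be dropped.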

>   * gcc.target/powerpc/swaps-p8-28.c: New test.
>   * gcc.target/powerpc/swaps-p8-29.c: New test.
>   * gcc.target/powerpc/swaps-p8-31.c: New test.
>   * gcc.target/powerpc/swaps-p8-32.c: New test.
>   * gcc.target/powerpc/swaps-p8-34.c: New test.
>   * gcc.target/powerpc/swaps-p8-35.c: New test.
>   * gcc.target/powerpc/swaps-p8-37.c: New test.
>   * gcc.target/powerpc/swaps-p8-38.c: New test.
>   * gcc.target/powerpc/swaps-p8-40.c: New test.
>   * gcc.target/powerpc/swaps-p8-41.c: New test.
>   * gcc.target/powerpc/swaps-p8-43.c: New test.
>   * gcc.target/powerpc/swaps-p8-44.c: New test.
>   * gcc.target/powerpc/swps-p8-30.c: New test.
>   * gcc.target/powerpc/swps-p8-33.c: New test.
>   * gcc.target/powerpc/swps-p8-36.c: New test.
>   * gcc.target/powerpc/swps-p8-39.c: New test.
>   * gcc.target/powerpc/swps-p8-42.c: New test.
>   * gcc.target/powerpc/swps-p8-45.c: New test.

I think you want to name those "swps" files "swaps" as well?  (See below).

> +  /* If this is not a load or is not a swap, return false */

End the sentence with a dot (and space space) please.

> +   /* Constants held on the stack are not "true" constants
> +  because their values are not part of the static load
> +  image.  If this constant's base reference is a stack
> +  or frame pointer, it is seen as an artificial
> +  reference. */

Dot space space.

> +static void
> +replace_swapped_load_constant (swap_web_entry *insn_entry, rtx swap_insn)
> +{
> +  /* Find the load.  */
> +  struct df_insn_info *insn_info = DF_INSN_INFO_GET (swap_insn);
> +  rtx_insn *load_insn = 0;

Don't initialise this (you set it a few lines later :-) )

> +  df_ref use  = DF_INSN_INFO_USES (insn_info);
> +  gcc_assert (use);
> +
> +  struct df_link *def_link = DF_REF_CHAIN (use);
> +  gcc_assert (def_link && !def_link->next);
> +
> +  load_insn = DF_REF_INSN (def_link->ref);
> +  gcc_assert (load_insn);

You can remove most of these asserts btw; if e.g. the first one would
fail, the very next line would ICE anyway.  The ->next test is probably
useful; if you don't have useless asserts the useful ones stand out more ;-)

> +  else if ((mode == V8HImode)
> +#ifdef HAVE_V8HFmode
> +|| (mode == V8HFmode)
> +#endif
> +)

Hrm.  So rs6000-modes.def claims it is creating V8HFmode:

VECTOR_MODES (FLOAT, 16); /*   V8HF  V4SF V2DF */

but VECTOR_MODES does not do that, because we have no HFmode.  Surprising.
Looks like a bug even.

Maybe we want to delete the #ifdef later; this code is fine until then.

> --- gcc/testsuite/gcc.target/powerpc/swaps-p8-28.c(revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/swaps-p8-28.c(working copy)
> @@ -0,0 +1,29 @@
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-do run { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> "-mcpu=power8" } } */
> +/* { dg-options "-mcpu=power8 -O3 " } */

Run tests need to check p8vector_hw instead of powerpc_p8vector_ok, or they
will crash on older systems.

> --- gcc/testsuite/gcc.target/powerpc/swps-p8-36.c (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/swps-p8-36.c (working copy)
> @@ -0,0 +1,31 @@
> +/* This file's name was changed from swaps-p8-36.c so that the
> +   assembler search for "not swap" would not get a false
> +   positive on the name of the file.  */

Oh.

> +/* { dg-final { scan-assembler-not "swap" } } */

So what is this really testing for?  xxswapd?  But a) we never generate
that, and b) you could use a better regex?

Or what else is it looking for?  I bet b) holds anyway :-)

Looks good except for those details.


Segher

[PATCH 8/8] S/390: Fix vmslg instruction and builtin.

2017-09-26 Thread Andreas Krebbel

gcc/ChangeLog:

2017-09-26  Andreas Krebbel  

* config/s390/vx-builtins.md ("vmslg"): Add missing operand in
assembler output.
* config/s390/s390-builtins.def: Fix constraint on op4.
---
 gcc/ChangeLog | 6 ++
 gcc/config/s390/s390-builtins.def | 2 +-
 gcc/config/s390/vx-builtins.md| 4 ++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d4f67f5..07d665c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
 2017-09-26  Andreas Krebbel  
 
+   * config/s390/vx-builtins.md ("vmslg"): Add missing operand in
+   assembler output.
+   * config/s390/s390-builtins.def: Fix constraint on op4.
+
+2017-09-26  Andreas Krebbel  
+
* config/s390/s390.c (s390_expand_vec_compare): Use the new mode
independent expanders.
* config/s390/vector.md ("vec_cmpuneq", "vec_cmpltgt")
diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index ddcf370..3f7bae7 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -2271,7 +2271,7 @@ OB_DEF_VAR (s390_vec_test_mask_dbl, s390_vtm, 
  0,
 B_DEF  (s390_vtm,   vec_test_mask_intv16qi,0,  
 B_VX,   0,  BT_FN_INT_UV16QI_UV16QI)
 
 B_DEF  (s390_vec_msum_u128, vec_msumv2di,   0, 
 B_VXE,  O4_U2,  BT_FN_UV16QI_UV2DI_UV2DI_UV16QI_INT)
-B_DEF  (s390_vmslg, vmslg,  0, 
 B_VXE,  O4_U2,  BT_FN_INT128_UV2DI_UV2DI_INT128_INT)
+B_DEF  (s390_vmslg, vmslg,  0, 
 B_VXE,  O4_U4,  BT_FN_INT128_UV2DI_UV2DI_INT128_INT)
 
 OB_DEF (s390_vec_eqv,   s390_vec_eqv_b8,
s390_vec_eqv_dbl_c, B_VXE,  BT_FN_OV4SI_OV4SI_OV4SI)
 OB_DEF_VAR (s390_vec_eqv_b8,s390_vnx,   0, 
 0,  BT_OV_BV16QI_BV16QI_BV16QI)
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 4c157e3..7fb176c 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -1190,7 +1190,7 @@
   (match_operand:QI4 "const_mask_operand" "C")]
  UNSPEC_VEC_MSUM))]
   "TARGET_VXE"
-  "vmslg\t%v0,%v1,%v2,%v3"
+  "vmslg\t%v0,%v1,%v2,%v3,%4"
   [(set_attr "op_type" "VRR")])
 
 (define_insn "vmslg"
@@ -1201,7 +1201,7 @@
(match_operand:QI4 "const_mask_operand" "C")]
   UNSPEC_VEC_MSUM))]
   "TARGET_VXE"
-  "vmslg\t%v0,%v1,%v2,%v3"
+  "vmslg\t%v0,%v1,%v2,%v3,%4"
   [(set_attr "op_type" "VRR")])
 
 
-- 
2.9.1

[PATCH 6/8] S/390: Set the preferred mode for float vectors

2017-09-26 Thread Andreas Krebbel

gcc/ChangeLog:

2017-09-26  Andreas Krebbel  

* config/s390/s390.c (s390_preferred_simd_mode): Return V4SFmode
for SFmode.
---
 gcc/ChangeLog  | 5 +
 gcc/config/s390/s390.c | 8 
 2 files changed, 13 insertions(+)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7863af1..a33de8f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2017-09-26  Andreas Krebbel  
 
+   * config/s390/s390.c (s390_preferred_simd_mode): Return V4SFmode
+   for SFmode.
+
+2017-09-26  Andreas Krebbel  
+
* config/s390/vector.md ("vec_unpacks_low_v16qi"): Rename to
vec_unpacks_lo_v16qi.
("vec_unpacku_low_v16qi"): Rename to vec_unpacku_lo_v16qi.
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index e3fafa2a6..0ceeef4 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -15859,6 +15859,14 @@ s390_atomic_assign_expand_fenv (tree *hold, tree 
*clear, tree *update)
 static machine_mode
 s390_preferred_simd_mode (scalar_mode mode)
 {
+  if (TARGET_VXE)
+switch (mode)
+  {
+  case E_SFmode:
+   return V4SFmode;
+  default:;
+  }
+
   if (TARGET_VX)
 switch (mode)
   {
-- 
2.9.1

[PATCH 7/8] S/390: Fix vector fp unordered compares

2017-09-26 Thread Andreas Krebbel

V2DF mode was still hard-coded here.
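
For reference, the four comparisons routed through these expanders have the
scalar meaning sketched below.  The helper names are hypothetical; the
sketch uses the standard <math.h> comparison macros and says nothing about
how the vector patterns themselves are implemented.

```c
#include <math.h>

/* UNEQ: unordered (at least one NaN) or equal.  */
int
cmp_uneq (double a, double b)
{
  return isunordered (a, b) || a == b;
}

/* LTGT: a < b or a > b; false if either operand is a NaN.  */
int
cmp_ltgt (double a, double b)
{
  return islessgreater (a, b) != 0;
}

/* ORDERED: neither operand is a NaN.  */
int
cmp_ordered (double a, double b)
{
  return !isunordered (a, b);
}

/* UNORDERED: at least one operand is a NaN.  */
int
cmp_unordered (double a, double b)
{
  return isunordered (a, b) != 0;
}
```

The same definitions apply per element whether the vector holds two doubles
or four floats, which is why the expanders only need to dispatch on the
operand mode.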

gcc/ChangeLog:

2017-09-26  Andreas Krebbel  

* config/s390/s390.c (s390_expand_vec_compare): Use the new mode
independent expanders.
* config/s390/vector.md ("vec_cmpuneq", "vec_cmpltgt")
("vec_ordered", "vec_unordered"): New expanders.
---
 gcc/ChangeLog |  7 ++
 gcc/config/s390/s390.c|  8 +++---
 gcc/config/s390/vector.md | 64 +++
 3 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a33de8f..d4f67f5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,12 @@
 2017-09-26  Andreas Krebbel  
 
+   * config/s390/s390.c (s390_expand_vec_compare): Use the new mode
+   independent expanders.
+   * config/s390/vector.md ("vec_cmpuneq", "vec_cmpltgt")
+   ("vec_ordered", "vec_unordered"): New expanders.
+
+2017-09-26  Andreas Krebbel  
+
* config/s390/s390.c (s390_preferred_simd_mode): Return V4SFmode
for SFmode.
 
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 0ceeef4..d2671ba 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6396,16 +6396,16 @@ s390_expand_vec_compare (rtx target, enum rtx_code cond,
  /* UNLT: a u< b -> !(a >= b) */
case UNLT: cond = GE; neg_p = true;break;
case UNEQ:
- emit_insn (gen_vec_cmpuneqv2df (target, cmp_op1, cmp_op2));
+ emit_insn (gen_vec_cmpuneq (target, cmp_op1, cmp_op2));
  return;
case LTGT:
- emit_insn (gen_vec_cmpltgtv2df (target, cmp_op1, cmp_op2));
+ emit_insn (gen_vec_cmpltgt (target, cmp_op1, cmp_op2));
  return;
case ORDERED:
- emit_insn (gen_vec_orderedv2df (target, cmp_op1, cmp_op2));
+ emit_insn (gen_vec_ordered (target, cmp_op1, cmp_op2));
  return;
case UNORDERED:
- emit_insn (gen_vec_unorderedv2df (target, cmp_op1, cmp_op2));
+ emit_insn (gen_vec_unordered (target, cmp_op1, cmp_op2));
  return;
default: break;
}
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 06c88c1..d40bf1e 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1406,6 +1406,22 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
+(define_expand "vec_cmpuneq"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")
+   (match_operand 2 "register_operand" "")]
+  "TARGET_VX"
+{
+  if (GET_MODE (operands[1]) == V4SFmode)
+emit_insn (gen_vec_cmpuneqv4sf (operands[0], operands[1], operands[2]));
+  else if (GET_MODE (operands[1]) == V2DFmode)
+emit_insn (gen_vec_cmpuneqv2df (operands[0], operands[1], operands[2]));
+  else
+gcc_unreachable ();
+
+  DONE;
+})
+
 ; LTGT a <> b -> a > b | b > a
 (define_expand "vec_cmpltgt"
   [(set (match_operand: 0 "register_operand" "=v")
@@ -1418,6 +1434,22 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
+(define_expand "vec_cmpltgt"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")
+   (match_operand 2 "register_operand" "")]
+  "TARGET_VX"
+{
+  if (GET_MODE (operands[1]) == V4SFmode)
+emit_insn (gen_vec_cmpltgtv4sf (operands[0], operands[1], operands[2]));
+  else if (GET_MODE (operands[1]) == V2DFmode)
+emit_insn (gen_vec_cmpltgtv2df (operands[0], operands[1], operands[2]));
+  else
+gcc_unreachable ();
+
+  DONE;
+})
+
 ; ORDERED (a, b): a >= b | b > a
 (define_expand "vec_ordered"
   [(set (match_operand:  0 "register_operand" "=v")
@@ -1430,6 +1462,22 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
+(define_expand "vec_ordered"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")
+   (match_operand 2 "register_operand" "")]
+  "TARGET_VX"
+{
+  if (GET_MODE (operands[1]) == V4SFmode)
+emit_insn (gen_vec_orderedv4sf (operands[0], operands[1], operands[2]));
+  else if (GET_MODE (operands[1]) == V2DFmode)
+emit_insn (gen_vec_orderedv2df (operands[0], operands[1], operands[2]));
+  else
+gcc_unreachable ();
+
+  DONE;
+})
+
 ; UNORDERED (a, b): !ORDERED (a, b)
 (define_expand "vec_unordered"
   [(set (match_operand:  0 "register_operand" "=v")
@@ -1443,6 +1491,22 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
+(define_expand "vec_unordered"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")
+   (match_operand 2 "register_operand" "")]
+  "TARGET_VX"
+{
+  if (GET_MODE (operands[1]) == V4SFmode)
+emit_insn (gen_vec_unorderedv4sf (operands[0], operands[1], operands[2]));
+  else if (GET_MODE (operands[1]) == V2DFmode)
+emit_insn (gen_vec_unorderedv2df (operands[0], operands[1], operands[2]));
+  else
+gcc_unreachable ();
+
+  DONE;
+})
+
 (define_insn "*vec_load_pair"
   [(set (match_operand:V_HW_64

[PATCH 5/8] S/390: Fix rtl standard names for vector unpack low->lo

2017-09-26 Thread Andreas Krebbel

gcc/ChangeLog:

2017-09-26  Andreas Krebbel  

* config/s390/vector.md ("vec_unpacks_low_v16qi"): Rename to
vec_unpacks_lo_v16qi.
("vec_unpacku_low_v16qi"): Rename to vec_unpacku_lo_v16qi.
---
 gcc/ChangeLog | 6 ++
 gcc/config/s390/vector.md | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d2808b5..7863af1 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
 2017-09-26  Andreas Krebbel  
 
+   * config/s390/vector.md ("vec_unpacks_low_v16qi"): Rename to
+   vec_unpacks_lo_v16qi.
+   ("vec_unpacku_low_v16qi"): Rename to vec_unpacku_lo_v16qi.
+
+2017-09-26  Andreas Krebbel  
+
* config/s390/vector.md ("vec_unpacks_lo_v4sf")
("vec_unpacks_hi_v4sf", "vec_unpacks_lo_v2df")
("vec_unpacks_hi_v2df", "vec_pack_trunc_v2df"): New expanders.
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index c15d81d..06c88c1 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1652,7 +1652,7 @@
   "vuphb\t%0,%1"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "vec_unpacks_low_v16qi"
+(define_insn "vec_unpacks_lo_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
(sign_extend:V8HI
 (vec_select:V8QI
@@ -1676,7 +1676,7 @@
   "vuplhb\t%0,%1"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "vec_unpacku_low_v16qi"
+(define_insn "vec_unpacku_lo_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
(zero_extend:V8HI
 (vec_select:V8QI
-- 
2.9.1

[PATCH 4/8] S/390: Add FP vec_pack/unpack

2017-09-26 Thread Andreas Krebbel

gcc/ChangeLog:

2017-09-26  Andreas Krebbel  

* config/s390/vector.md ("vec_unpacks_lo_v4sf")
("vec_unpacks_hi_v4sf", "vec_unpacks_lo_v2df")
("vec_unpacks_hi_v2df", "vec_pack_trunc_v2df"): New expanders.
---
 gcc/ChangeLog |  6 +++
 gcc/config/s390/vector.md | 96 ++-
 2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7843857..d2808b5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
 2017-09-26  Andreas Krebbel  
 
+   * config/s390/vector.md ("vec_unpacks_lo_v4sf")
+   ("vec_unpacks_hi_v4sf", "vec_unpacks_lo_v2df")
+   ("vec_unpacks_hi_v2df", "vec_pack_trunc_v2df"): New expanders.
+
+2017-09-26  Andreas Krebbel  
+
* config/s390/predicates.md ("const_shift_by_byte_operand"): New
predicate.
* config/s390/vector.md ("*vec_srb"): Change modes to V_128
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index e61bb88..c15d81d 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1781,7 +1781,7 @@
 
 ;; vector load lengthened
 
-; vflls
+; vflls float -> double
 (define_insn "*vec_extendv4sf"
   [(set (match_operand:V2DF 0 "register_operand" "=v")
(float_extend:V2DF
@@ -1792,6 +1792,34 @@
   "vldeb\t%v0,%v1"
   [(set_attr "op_type" "VRR")])
 
+(define_expand "vec_unpacks_lo_v4sf"
+  [(set (match_dup 2)
+   (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
+ (match_dup 1)]
+UNSPEC_VEC_MERGEL))
+   (set (match_operand:V2DF   0 "register_operand" "=v")
+   (float_extend:V2DF
+(vec_select:V2SF
+ (match_dup 2)
+ (parallel [(const_int 0) (const_int 2)]))))]
+  "TARGET_VX"
+{ operands[2] = gen_reg_rtx(V4SFmode); })
+
+(define_expand "vec_unpacks_hi_v4sf"
+  [(set (match_dup 2)
+   (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
+ (match_dup 1)]
+UNSPEC_VEC_MERGEH))
+   (set (match_operand:V2DF   0 "register_operand" "=v")
+   (float_extend:V2DF
+(vec_select:V2SF
+ (match_dup 2)
+ (parallel [(const_int 0) (const_int 2)]))))]
+  "TARGET_VX"
+{ operands[2] = gen_reg_rtx(V4SFmode); })
+
+
+; double -> long double
 (define_insn "*vec_extendv2df"
   [(set (match_operand:V1TF 0 "register_operand" "=v")
(float_extend:V1TF
@@ -1802,6 +1830,72 @@
   "wflld\t%v0,%v1"
   [(set_attr "op_type" "VRR")])
 
+(define_expand "vec_unpacks_lo_v2df"
+  [(set (match_dup 2)
+   (unspec:V2DF [(match_operand:V2DF 1 "register_operand" "v")
+ (match_dup 1)]
+UNSPEC_VEC_MERGEL))
+   (set (match_operand:V1TF   0 "register_operand" "=v")
+   (float_extend:V1TF
+(vec_select:V1DF
+ (match_dup 2)
+ (parallel [(const_int 0)]))))]
+  "TARGET_VXE"
+{ operands[2] = gen_reg_rtx (V2DFmode); })
+
+(define_expand "vec_unpacks_hi_v2df"
+  [(set (match_dup 2)
+   (unspec:V2DF [(match_operand:V2DF 1 "register_operand" "v")
+ (match_dup 1)]
+UNSPEC_VEC_MERGEH))
+   (set (match_operand:V1TF   0 "register_operand" "=v")
+   (float_extend:V1TF
+(vec_select:V1DF
+ (match_dup 2)
+ (parallel [(const_int 0)]))))]
+  "TARGET_VXE"
+{ operands[2] = gen_reg_rtx (V2DFmode); })
+
+
+; 2 x v2df -> 1 x v4sf
+(define_expand "vec_pack_trunc_v2df"
+  [(set (match_dup 3)
+   (unspec:V4SF [(match_operand:V2DF 1 "register_operand" "")
+ (const_int VEC_INEXACT)
+ (const_int VEC_RND_CURRENT)]
+UNSPEC_VEC_VFLR))
+   (set (match_dup 4)
+   (unspec:V4SF [(match_operand:V2DF 2 "register_operand" "")
+ (const_int VEC_INEXACT)
+ (const_int VEC_RND_CURRENT)]
+UNSPEC_VEC_VFLR))
+   (set (match_dup 6)
+   (unspec:V16QI [(subreg:V16QI (match_dup 3) 0)
+  (subreg:V16QI (match_dup 4) 0)
+  (match_dup 5)]
+ UNSPEC_VEC_PERM))
+   (set (match_operand:V4SF 0 "register_operand" "")
+   (subreg:V4SF (match_dup 6) 0))]
+  "TARGET_VX"
+{
+  rtx constv, perm[16];
+  int i;
+
+  for (i = 0; i < 4; ++i)
+{
+  perm[i] = GEN_INT (i);
+  perm[i + 4] = GEN_INT (i + 8);
+  perm[i + 8] = GEN_INT (i + 16);
+  perm[i + 12] = GEN_INT (i + 24);
+}
+  constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+
+  operands[3] = gen_reg_rtx (V4SFmode);
+  operands[4] = gen_reg_rtx (V4SFmode);
+  operands[5] = force_reg (V16QImode, constv);
+  operands[6] = gen_reg_rtx (V16QImode);
+})
+
 ; reduc_smin
 ; reduc_smax
 ; reduc_umin
-- 
2.9.1

[PATCH 1/8] Enable vect testcases on S/390.

2017-09-26 Thread Andreas Krebbel

Add s390 platform checks where appropriate.

gcc/testsuite/ChangeLog:

2017-09-26  Andreas Krebbel  

* lib/target-supports.exp: Enable tests for S/390.
---
 gcc/testsuite/ChangeLog   |   4 ++
 gcc/testsuite/lib/target-supports.exp | 131 ++
 2 files changed, 106 insertions(+), 29 deletions(-)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 6401706..72cf8c3 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2017-09-26  Andreas Krebbel  
+
+   * lib/target-supports.exp: Enable tests for S/390.
+
 2017-09-26  Richard Biener  
 
PR tree-optimization/82320
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 7834c30..8b25797 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3066,7 +3066,9 @@ proc check_effective_target_vect_int { } {
 || [is-effective-target arm_neon]
 || ([istarget mips*-*-*]
 && ([et-is-effective-target mips_loongson]
-|| [et-is-effective-target mips_msa])) } {
+|| [et-is-effective-target mips_msa]))
+|| ([istarget s390*-*-*]
+&& [check_effective_target_s390_vx]) } {
set et_vect_int_saved($et_index) 1
}
 }
@@ -5069,7 +5071,9 @@ proc check_effective_target_vect_shift { } {
 || [is-effective-target arm_neon]
 || ([istarget mips*-*-*]
 && ([et-is-effective-target mips_msa]
-|| [et-is-effective-target mips_loongson])) } {
+|| [et-is-effective-target mips_loongson]))
+|| ([istarget s390*-*-*]
+&& [check_effective_target_s390_vx]) } {
   set et_vect_shift_saved($et_index) 1
}
 }
@@ -5087,7 +5091,9 @@ proc check_effective_target_whole_vector_shift { } {
 || ([is-effective-target arm_neon]
 && [check_effective_target_arm_little_endian])
 || ([istarget mips*-*-*]
-&& [et-is-effective-target mips_loongson]) } {
+&& [et-is-effective-target mips_loongson])
+|| ([istarget s390*-*-*]
+&& [check_effective_target_s390_vx]) } {
set answer 1
 } else {
set answer 0
@@ -5133,7 +5139,9 @@ proc check_effective_target_vect_shift_char { } {
  && ![istarget powerpc-*-linux*paired*])
 || [is-effective-target arm_neon]
 || ([istarget mips*-*-*]
-&& [et-is-effective-target mips_msa]) } {
+&& [et-is-effective-target mips_msa])
+|| ([istarget s390*-*-*]
+&& [check_effective_target_s390_vx]) } {
   set et_vect_shift_char_saved($et_index) 1
}
 }
@@ -5156,7 +5164,9 @@ proc check_effective_target_vect_long { } {
 || ([istarget sparc*-*-*] && [check_effective_target_ilp32])
 || [istarget aarch64*-*-*]
 || ([istarget mips*-*-*]
- && [et-is-effective-target mips_msa]) } {
+ && [et-is-effective-target mips_msa])
+|| ([istarget s390*-*-*]
+&& [check_effective_target_s390_vx]) } {
set answer 1
 } else {
set answer 0
@@ -5219,7 +5229,9 @@ proc check_effective_target_vect_double { } {
 || [istarget spu-*-*]
 || ([istarget powerpc*-*-*] && [check_vsx_hw_available])
 || ([istarget mips*-*-*]
-&& [et-is-effective-target mips_msa]) } {
+&& [et-is-effective-target mips_msa])
+|| ([istarget s390*-*-*]
+&& [check_effective_target_s390_vx]) } {
set et_vect_double_saved($et_index) 1
}
 }
@@ -5243,7 +5255,9 @@ proc check_effective_target_vect_long_long { } {
set et_vect_long_long_saved($et_index) 0
if { [istarget i?86-*-*] || [istarget x86_64-*-*]
 || ([istarget mips*-*-*]
-&& [et-is-effective-target mips_msa]) } {
+&& [et-is-effective-target mips_msa])
+|| ([istarget s390*-*-*]
+&& [check_effective_target_s390_vx]) } {
   set et_vect_long_long_saved($et_index) 1
 }
 }
@@ -5343,7 +5357,9 @@ proc check_effective_target_vect_perm { } {
 || [istarget i?86-*-*] || [istarget x86_64-*-*]
 || ([istarget mips*-*-*]
 && ([et-is-effective-target mpaired_single]
-|| [et-is-effective-target mips_msa])) } {
+|| [et-is-effective-target mips_msa]))
+|| ([istarget s390*-*-*]
+&& [check_effective_target_s390_vx]) } {
set et_vect_perm_saved($et_index) 1
 }
 }
@@ -5372,7 +5388,9 @@ proc check_effective_target_vect_perm_byte { } {
 || [istarget powerpc*-*-*]

[PATCH 3/8] S/390: Add support for vec_shr

2017-09-26 Thread Andreas Krebbel

gcc/ChangeLog:

2017-09-26  Andreas Krebbel  

* config/s390/predicates.md ("const_shift_by_byte_operand"): New
predicate.
* config/s390/vector.md ("*vec_srb"): Change modes to V_128
and V16QI.
("*vec_slb"): New insn pattern.
("vec_shr_"): New expander.
* config/s390/vx-builtins.md ("vec_slb"): Turn into expander
and force the shift count operand to V16QImode.
("vec_srb"): Set shift count mode to V16QI.
---
 gcc/ChangeLog  | 12 
 gcc/config/s390/predicates.md  |  7 +++
 gcc/config/s390/vector.md  | 39 ---
 gcc/config/s390/vx-builtins.md | 23 +--
 4 files changed, 64 insertions(+), 17 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index dcee7cb..7843857 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,17 @@
 2017-09-26  Andreas Krebbel  
 
+   * config/s390/predicates.md ("const_shift_by_byte_operand"): New
+   predicate.
+   * config/s390/vector.md ("*vec_srb"): Change modes to V_128
+   and V16QI.
+   ("*vec_slb"): New insn pattern.
+   ("vec_shr_"): New expander.
+   * config/s390/vx-builtins.md ("vec_slb"): Turn into expander
+   and force the shift count operand to V16QImode.
+   ("vec_srb"): Set shift count mode to V16QI.
+
+2017-09-26  Andreas Krebbel  
+
* config/s390/vector.md ("vec_widen_umult_lo_")
("vec_widen_umult_hi_", "vec_widen_smult_lo_")
("vec_widen_smult_hi_"): New expander definitions.
diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index db966dd..bbff8d8 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -508,3 +508,10 @@
 }
   return true;
 })
+
+(define_predicate "const_shift_by_byte_operand"
+  (match_code "const_int")
+{
+  unsigned HOST_WIDE_INT val = INTVAL (op);
+  return val <= 128 && val % 8 == 0;
+})
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 29131cd..e61bb88 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -980,15 +980,43 @@
 
 ; Pattern used by e.g. popcount
 (define_insn "*vec_srb"
-  [(set (match_operand:V_HW 0 "register_operand" "=v")
-   (unspec:V_HW [(match_operand:V_HW 1 "register_operand"   "v")
- (match_operand: 2 "register_operand" "v")]
-UNSPEC_VEC_SRLB))]
+  [(set (match_operand:V_128 0 "register_operand" "=v")
+   (unspec:V_128 [(match_operand:V_128 1 "register_operand"  "v")
+  (match_operand:V16QI 2 "register_operand"  "v")]
+  UNSPEC_VEC_SRLB))]
   "TARGET_VX"
   "vsrlb\t%v0,%v1,%v2"
   [(set_attr "op_type" "VRR")])
 
 
+; Vector shift left by byte
+
+(define_insn "*vec_slb"
+  [(set (match_operand:V_128 0 "register_operand" "=v")
+   (unspec:V_128 [(match_operand:V_128 1 "register_operand"  "v")
+   (match_operand:V16QI 2 "register_operand"  "v")]
+  UNSPEC_VEC_SLB))]
+  "TARGET_VX"
+  "vslb\t%v0,%v1,%v2"
+  [(set_attr "op_type" "VRR")])
+
+; vec_shr is defined as shift towards element 0
+; this means it is a left shift on BE targets!
+(define_expand "vec_shr_"
+  [(set (match_dup 3)
+   (unspec:V16QI [(match_operand:SI 2 "const_shift_by_byte_operand" "")
+  (const_int 7)
+  (match_dup 3)]
+  UNSPEC_VEC_SET))
+   (set (match_operand:V_128 0 "register_operand" "")
+   (unspec:V_128 [(match_operand:V_128 1 "register_operand" "")
+   (match_dup 3)]
+  UNSPEC_VEC_SLB))]
+  "TARGET_VX"
+ {
+   operands[3] = gen_reg_rtx(V16QImode);
+ })
+
 ; vmnb, vmnh, vmnf, vmng
 (define_insn "smin3"
   [(set (match_operand:VI  0 "register_operand" "=v")
@@ -1779,9 +1807,6 @@
 ; reduc_umin
 ; reduc_umax
 
-; vec_shl vrep + vsl
-; vec_shr
-
 ; vec_pack_sfix_trunc: convert + pack ?
 ; vec_pack_ufix_trunc
 ; vec_unpacks_float_hi
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 54796df..4c157e3 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -1005,15 +1005,16 @@
 
 ; Vector shift left by byte
 
-(define_insn "vec_slb"
-  [(set (match_operand:V_HW 0 "register_operand" "=v")
-   (unspec:V_HW [(match_operand:V_HW 1 "register_operand"   "v")
- (match_operand: 2 "register_operand" "v")]
+; Pattern definition in vector.md, see vec_vslb
+(define_expand "vec_slb"
+  [(set (match_operand:V_HW 0 "register_operand" "")
+   (unspec:V_HW [(match_operand:V_HW 1 "register_operand"   "")
+ (match_operand: 2 "register_operand" "")]
 UNSPEC_VEC_SLB))]
   "TARGET_VX"
-  "vslb\t%v0,%v1,%v2"
-  [(set_attr "op_type"

[PATCH 2/8] S/390: Add widening vector mult lo/hi patterns

2017-09-26 Thread Andreas Krebbel

Add support for widening vector multiply lo/hi patterns.  These do not
directly match IBM Z instructions but can be emulated with the even/odd
multiplies followed by a vector merge.
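
As a rough illustration of the emulation described above, here is a small
C model; it is not the machine description itself.  It assumes eight
unsigned 16-bit elements per vector, element 0 being the leftmost one
(big-endian numbering), and it treats "hi" as the half that contains
element 0.  All function names are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define N 8        /* narrow elements per input vector (assumed for the sketch) */
#define W (N / 2)  /* wide elements per result vector */

/* Multiply elements 0, 2, 4, ... and keep the full double-width products.  */
static void mult_even (const uint16_t a[N], const uint16_t b[N], uint32_t out[W])
{
  for (size_t i = 0; i < W; ++i)
    out[i] = (uint32_t) a[2 * i] * b[2 * i];
}

/* Multiply elements 1, 3, 5, ... and keep the full double-width products.  */
static void mult_odd (const uint16_t a[N], const uint16_t b[N], uint32_t out[W])
{
  for (size_t i = 0; i < W; ++i)
    out[i] = (uint32_t) a[2 * i + 1] * b[2 * i + 1];
}

/* Interleave the first halves of x and y: x0, y0, x1, y1.  */
static void merge_high (const uint32_t x[W], const uint32_t y[W], uint32_t out[W])
{
  for (size_t i = 0; i < W / 2; ++i)
    {
      out[2 * i] = x[i];
      out[2 * i + 1] = y[i];
    }
}

/* Interleave the second halves of x and y: x2, y2, x3, y3.  */
static void merge_low (const uint32_t x[W], const uint32_t y[W], uint32_t out[W])
{
  for (size_t i = 0; i < W / 2; ++i)
    {
      out[2 * i] = x[W / 2 + i];
      out[2 * i + 1] = y[W / 2 + i];
    }
}

/* Widening products of elements 0..3: even/odd multiply, then merge high.  */
void widen_umult_hi (const uint16_t a[N], const uint16_t b[N], uint32_t out[W])
{
  uint32_t even[W], odd[W];
  mult_even (a, b, even);
  mult_odd (a, b, odd);
  merge_high (even, odd, out);
}

/* Widening products of elements 4..7: even/odd multiply, then merge low.  */
void widen_umult_lo (const uint16_t a[N], const uint16_t b[N], uint32_t out[W])
{
  uint32_t even[W], odd[W];
  mult_even (a, b, even);
  mult_odd (a, b, odd);
  merge_low (even, odd, out);
}
```

The merge step restores the original element order: the even and odd
products of neighbouring elements end up next to each other again, so each
result vector holds the widened products of one contiguous half of the
inputs.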

gcc/ChangeLog:

2017-09-26  Andreas Krebbel  

* config/s390/vector.md ("vec_widen_umult_lo_")
("vec_widen_umult_hi_", "vec_widen_smult_lo_")
("vec_widen_smult_hi_"): New expander definitions.
---
 gcc/ChangeLog |  6 
 gcc/config/s390/vector.md | 83 ---
 2 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7c6d7dc..dcee7cb 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-09-26  Andreas Krebbel  
+
+   * config/s390/vector.md ("vec_widen_umult_lo_")
+   ("vec_widen_umult_hi_", "vec_widen_smult_lo_")
+   ("vec_widen_smult_hi_"): New expander definitions.
+
 2017-09-26  Richard Biener  
 
PR tree-optimization/82320
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 3cf7989..29131cd 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1065,10 +1065,85 @@
   "vmlo\t%v0,%v1,%v2"
   [(set_attr "op_type" "VRR")])
 
-; vec_widen_umult_hi
-; vec_widen_umult_lo
-; vec_widen_smult_hi
-; vec_widen_smult_lo
+
+; Widening hi/lo multiplications
+
+; The S/390 instructions vml and vmh return the low or high parts of
+; the double sized result elements in the corresponding elements of
+; the target register.  That's NOT what the vec_widen_umult_lo/hi
+; patterns are expected to do.
+
+; We emulate the widening lo/hi multiplies with the even/odd versions
+; followed by a vector merge
+
+
+(define_expand "vec_widen_umult_lo_"
+  [(set (match_dup 3)
+   (unspec: [(match_operand:VI_QHS 1 "register_operand" "%v")
+ (match_operand:VI_QHS 2 "register_operand"  "v")]
+UNSPEC_VEC_UMULT_EVEN))
+   (set (match_dup 4)
+   (unspec: [(match_dup 1) (match_dup 2)]
+UNSPEC_VEC_UMULT_ODD))
+   (set (match_operand: 0 "register_operand" "=v")
+   (unspec: [(match_dup 3) (match_dup 4)]
+UNSPEC_VEC_MERGEL))]
+  "TARGET_VX"
+ {
+   operands[3] = gen_reg_rtx (mode);
+   operands[4] = gen_reg_rtx (mode);
+ })
+
+(define_expand "vec_widen_umult_hi_"
+  [(set (match_dup 3)
+   (unspec: [(match_operand:VI_QHS 1 "register_operand" "%v")
+ (match_operand:VI_QHS 2 "register_operand"  "v")]
+UNSPEC_VEC_UMULT_EVEN))
+   (set (match_dup 4)
+   (unspec: [(match_dup 1) (match_dup 2)]
+UNSPEC_VEC_UMULT_ODD))
+   (set (match_operand: 0 "register_operand" "=v")
+   (unspec: [(match_dup 3) (match_dup 4)]
+UNSPEC_VEC_MERGEH))]
+  "TARGET_VX"
+ {
+   operands[3] = gen_reg_rtx (mode);
+   operands[4] = gen_reg_rtx (mode);
+ })
+
+(define_expand "vec_widen_smult_lo_"
+  [(set (match_dup 3)
+   (unspec: [(match_operand:VI_QHS 1 "register_operand" "%v")
+ (match_operand:VI_QHS 2 "register_operand"  "v")]
+UNSPEC_VEC_SMULT_EVEN))
+   (set (match_dup 4)
+   (unspec: [(match_dup 1) (match_dup 2)]
+UNSPEC_VEC_SMULT_ODD))
+   (set (match_operand: 0 "register_operand" "=v")
+   (unspec: [(match_dup 3) (match_dup 4)]
+UNSPEC_VEC_MERGEL))]
+  "TARGET_VX"
+ {
+   operands[3] = gen_reg_rtx (mode);
+   operands[4] = gen_reg_rtx (mode);
+ })
+
+(define_expand "vec_widen_smult_hi_"
+  [(set (match_dup 3)
+   (unspec: [(match_operand:VI_QHS 1 "register_operand" "%v")
+ (match_operand:VI_QHS 2 "register_operand"  "v")]
+UNSPEC_VEC_SMULT_EVEN))
+   (set (match_dup 4)
+   (unspec: [(match_dup 1) (match_dup 2)]
+UNSPEC_VEC_SMULT_ODD))
+   (set (match_operand: 0 "register_operand" "=v")
+   (unspec: [(match_dup 3) (match_dup 4)]
+UNSPEC_VEC_MERGEH))]
+  "TARGET_VX"
+ {
+   operands[3] = gen_reg_rtx (mode);
+   operands[4] = gen_reg_rtx (mode);
+ })
 
 ; vec_widen_ushiftl_hi
 ; vec_widen_ushiftl_lo
-- 
2.9.1

[PATCH 0/8] S/390: Enable vect tests on S/390 + fixes and improvements

2017-09-26 Thread Andreas Krebbel

Committed to mainline

Andreas Krebbel (8):
  Enable vect testcases on S/390.
  S/390: Add widening vector mult lo/hi patterns
  S/390: Add support for vec_shr
  S/390: Add FP vec_pack/unpack
  S/390: Fix rtl standard names for vector unpack low->lo
  S/390: Set the preferred mode for float vectors
  S/390: Fix vector fp unordered compares
  S/390: Fix vmslg instruction and builtin.

 gcc/ChangeLog |  48 ++
 gcc/config/s390/predicates.md |   7 +
 gcc/config/s390/s390-builtins.def |   2 +-
 gcc/config/s390/s390.c|  16 +-
 gcc/config/s390/vector.md | 286 --
 gcc/config/s390/vx-builtins.md|  27 ++--
 gcc/testsuite/ChangeLog   |   4 +
 gcc/testsuite/lib/target-supports.exp | 131 
 8 files changed, 461 insertions(+), 60 deletions(-)

-- 
2.9.1

Re: [Ada] Improve performance of 'Image with enumeration types.

2017-09-26 Thread Duncan Sands


On 09/26/2017 12:17 PM, Eric Botcazou wrote:

>> By the way, why not always do this "inlining", even when not optimizing?
>
> Because this generates more bloated code and inferior debugging experience.
>
>> This is a trick question, because when you answer "because XYZ" I will then
>> reply "but XYZ is a common reason that people disable inlining when
>> optimizing, so shouldn't you only do it when inlining is enabled?" :)
>
> People ought not to disable inlining when optimizing though.


I've seen a few projects disable inlining when optimizing because it can 
generate bloated code and an inferior debugging experience :)  But I won't argue 
the point any further (that this should really be conditioned on inlining being 
enabled, not on optimization being enabled) as while I'm probably right in 
theory, in practice I doubt it will actually cause trouble for anyone.


Best wishes, Duncan.

Re: [Ada] Improve performance of 'Image with enumeration types.

2017-09-26 Thread Duncan Sands


Hi Arno,


>>>> it looks like this is in essence inlining the run-time library
>>>> routine. In which case, shouldn't you only do it if inlining is
>>>> enabled?  For example, it seems rather odd to do this if
>>>> compiling with -Os.
>>>
>>> Actually, measurements showed that this instance of inlining is a
>>> win for both performance and code size, so it’s a good candidate
>>> even for -Os. Note that we inline string concatenation routines
>>> for the same reason.
>>
>> thanks for explaining.  I think it merits a comment in the code though.
>>
>> By the way, why not always do this "inlining", even when not optimizing?
>
> That's a practical trade off, based on our past experience.


if it's a trade-off then there must be a down-side.  What is the down-side?

Best wishes, Duncan.

Re: [Ada] Improve performance of 'Image with enumeration types.

2017-09-26 Thread Eric Botcazou

> By the way, why not always do this "inlining", even when not optimizing?

Because this generates more bloated code and inferior debugging experience.

> This is a trick question, because when you answer "because XYZ" I will then
> reply "but XYZ is a common reason that people disable inlining when
> optimizing, so shouldn't you only do it when inlining is enabled?" :)

People ought not to disable inlining when optimizing though.

-- 
Eric Botcazou

Re: [Ada] Improve performance of 'Image with enumeration types.

2017-09-26 Thread Arnaud Charlet

Duncan,

> >>it looks like this is in essence inlining the run-time library
> >>routine. In which case, shouldn't you only do it if inlining is
> >>enabled?  For example, it seems rather odd to do this if
> >>compiling with -Os.
> >
> >Actually, measurements showed that this instance of inlining is a
> >win for both performance and code size, so it’s a good candidate
> >even for -Os. Note that we inline string concatenation routines
> >for the same reason.
> 
> thanks for explaining.  I think it merits a comment in the code though.
> 
> By the way, why not always do this "inlining", even when not optimizing?

That's a practical trade off, based on our past experience.

Arno

Re: [Ada] Improve performance of 'Image with enumeration types.

2017-09-26 Thread Duncan Sands


Hi Pierre-Marie,

On 09/26/2017 11:30 AM, Pierre-Marie de Rodat wrote:
> On 09/25/2017 02:47 PM, Duncan Sands wrote:
>> it looks like this is in essence inlining the run-time library routine. In
>> which case, shouldn't you only do it if inlining is enabled?  For example, it
>> seems rather odd to do this if compiling with -Os.
>
> Actually, measurements showed that this instance of inlining is a win for both
> performance and code size, so it’s a good candidate even for -Os. Note that we
> inline string concatenation routines for the same reason.


thanks for explaining.  I think it merits a comment in the code though.

By the way, why not always do this "inlining", even when not optimizing?

This is a trick question, because when you answer "because XYZ" I will then 
reply "but XYZ is a common reason that people disable inlining when optimizing, 
so shouldn't you only do it when inlining is enabled?" :)


Best wishes, Duncan.

PS: I'm imagining XYZ is related to a better debugging experience.

Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag

2017-09-26 Thread Janus Weil

Hi Rainer,

>> Attached is a more complete patch, which should fix all problems that
>> were reported concerning these two test cases. Would be great if
>> someone could confirm that it works on a failing target (I currently
>> only have access to x86_64-linux-gnu machines).
>
> I've just checked sparc-sun-solaris2.11: works fine.  promotion_3.f90
> PASSes as before, but promotion_4.f90 is now UNSUPPORTED instead of
> failing.

thanks for checking!


>> Ok for trunk?
>
> The new fortran_real_10 effective-target keyword needs documenting in
> sourcebuild.texi.

Good point. fortran_real_16 was missing there as well. Added both (new
patch attached).

I'll commit this tonight, unless there are further comments ...

Cheers,
Janus
Index: gcc/doc/sourcebuild.texi
===
--- gcc/doc/sourcebuild.texi(revision 253134)
+++ gcc/doc/sourcebuild.texi(working copy)
@@ -1357,6 +1357,12 @@ Target has runtime support for any options added w
 @item fortran_integer_16
 Target supports Fortran @code{integer} that is 16 bytes or longer.
 
+@item fortran_real_10
+Target supports Fortran @code{real} that is 10 bytes or longer.
+
+@item fortran_real_16
+Target supports Fortran @code{real} that is 16 bytes or longer.
+
 @item fortran_large_int
 Target supports Fortran @code{integer} kinds larger than @code{integer(8)}.
 
Index: gcc/testsuite/gfortran.dg/promotion_3.f90
===
--- gcc/testsuite/gfortran.dg/promotion_3.f90   (revision 253134)
+++ gcc/testsuite/gfortran.dg/promotion_3.f90   (working copy)
@@ -1,5 +1,6 @@
 ! { dg-do run }
 ! { dg-options "-fdefault-real-16" }
+! { dg-require-effective-target fortran_real_16 }
 !
 ! PR 82143: add a -fdefault-real-16 flag
 !
Index: gcc/testsuite/gfortran.dg/promotion_4.f90
===
--- gcc/testsuite/gfortran.dg/promotion_4.f90   (revision 253134)
+++ gcc/testsuite/gfortran.dg/promotion_4.f90   (working copy)
@@ -1,5 +1,6 @@
 ! { dg-do run }
 ! { dg-options "-fdefault-real-10" }
+! { dg-require-effective-target fortran_real_10 }
 !
 ! PR 82143: add a -fdefault-real-16 flag
 !
@@ -12,5 +13,5 @@ double precision :: d
 if (kind(r4) /= 4) call abort
 if (kind(r8) /= 8) call abort
 if (kind(r) /= 10) call abort
-if (kind(d) /= 16) call abort
+if (kind(d)  < 10) call abort
 end
Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 253134)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -1464,7 +1464,21 @@ proc check_effective_target_fortran_real_16 { } {
 }]
 }
 
+# Return 1 if the target supports Fortran real kind 10,
+# 0 otherwise. Contrary to check_effective_target_fortran_large_real
+# this checks for real(10) only.
+#
+# When the target name changes, replace the cached result.
 
+proc check_effective_target_fortran_real_10 { } {
+return [check_no_compiler_messages fortran_real_10 executable {
+   ! Fortran
+   real(kind=10) :: x
+   x = cos (x)
+   end
+}]
+}
+
 # Return 1 if the target supports Fortran's IEEE modules,
 # 0 otherwise.
 #

[patch] [arm] Fix pr82175 - fix -mcpu=native not working correctly

2017-09-26 Thread Richard Earnshaw (lists)

The new option processing machinery relies on %< rules in the specs to
suppress options that are rewritten.  Suppression appears to be a
two-phase process where the option is partially suppressed when %< is
processed and then fully suppressed at the end of the string.  Strings
are separated by commas and there can be multiple strings used to form
DRIVER_SELF_SPECS.

The fix in this case is to separate the driver self specs for ARM into
separate rules as described; this forces the -m{cpu,tune,arch}=native
options to be properly removed before proceeding to the next rule set.
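
The difference between one spec string and several can be shown in plain
C.  This is only a sketch: it assumes, as the description above suggests,
that DRIVER_SELF_SPECS ends up as a list of string initializers, and the
rule texts and every EXAMPLE_/example_ name below are placeholders
invented for illustration, not the real ARM specs.

```c
#include <stddef.h>

/* Placeholder rule texts (hypothetical, not real spec syntax).  */
#define EXAMPLE_RULE_A "<rule A: rewrite and suppress the native options>"
#define EXAMPLE_RULE_B "<rule B: rules that rely on the rewritten options>"

/* Adjacent string literals are concatenated by the compiler, so this is a
   single string: both rules are processed as one unit.  */
static const char *const example_specs_joined[] = {
  EXAMPLE_RULE_A " " EXAMPLE_RULE_B
};

/* With a comma the rules become two separate strings, so the first one is
   fully processed before the second one starts.  */
static const char *const example_specs_split[] = {
  EXAMPLE_RULE_A, EXAMPLE_RULE_B
};

/* Number of separate strings in each variant.  */
size_t example_joined_count (void)
{
  return sizeof example_specs_joined / sizeof example_specs_joined[0];
}

size_t example_split_count (void)
{
  return sizeof example_specs_split / sizeof example_specs_split[0];
}
```

In this model the joined variant yields one string and the split variant
yields two, which corresponds to the "separate sub-rules with commas"
change in the ChangeLog entry below.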

PR target/82175
* config/arm/arm.h (DRIVER_SELF_SPECS): Separate sub-rules with
commas.

Tested on cross and native.  Applied to trunk.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 9a171b0..0804e2a 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2237,9 +2237,12 @@ const char *arm_be8_option (int argc, const char **argv);
   " %{mfloat-abi=*: abi %*}"	\
   "

Re: [Ada] Improve performance of 'Image with enumeration types.

2017-09-26 Thread Pierre-Marie de Rodat


On 09/25/2017 02:47 PM, Duncan Sands wrote:
> it looks like this is in essence inlining the run-time library routine.
> In which case, shouldn't you only do it if inlining is enabled?  For
> example, it seems rather odd to do this if compiling with -Os.


Actually, measurements showed that this instance of inlining is a win 
for both performance and code size, so it’s a good candidate even for 
-Os. Note that we inline string concatenation routines for the same reason.


--
Pierre-Marie de Rodat

Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag

2017-09-26 Thread Rainer Orth

Hi Janus,

> Attached is a more complete patch, which should fix all problems that
> were reported concerning these two test cases. Would be great if
> someone could confirm that it works on a failing target (I currently
> only have access to x86_64-linux-gnu machines).

I've just checked sparc-sun-solaris2.11: works fine.  promotion_3.f90
PASSes as before, but promotion_4.f90 is now UNSUPPORTED instead of
failing.

> Ok for trunk?

The new fortran_real_10 effective-target keyword needs documenting in
sourcebuild.texi.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag

2017-09-26 Thread Janus Weil

2017-09-26 10:44 GMT+02:00 Janus Weil :
> 2017-09-25 23:23 GMT+02:00 Steve Kargl :
>> On Mon, Sep 25, 2017 at 11:14:42PM +0200, Janus Weil wrote:
>>> 2017-09-25 17:07 GMT+02:00 David Edelsohn :
>>> > promotion_3.f90 and promotion_4.f90 are failing on at least PowerPC
>>> > and AArch64.  Are these new tests limited to x86 or some long double
>>> > assumptions?
>>>
>>> These tests require the availability of  a 10- or 16-byte-wide REAL
>>> type, respectively. I have to admit that I do not have a complete
>>> overview of which targets in GCC's wide portfolio provide such a type.
>>>
>>> It seems that REAL(16) is supported via libquadmath on 32-bit x86,
>>> x86-64 and Itanium at least. I'm not sure about REAL(10).
>>>
>>> Targets that do not support such a type probably need to be XFAILed.
>>>
>>
>> Janus, I think you can control with a dg option
>>
>> dg-require-effective-target fortran_large_real
>>
>> See, for example, gfortran.dg/random_3.f90
>
> Thanks for the pointer, Steve.
>
> However, it seems that "fortran_large_real" only requires some real
> type that is larger than 8 byte, but makes no assumptions on its
> actual size (10 or 16 byte). Therefore it's probably not very useful
> for promotion_{3,4}.
>
> But: I found that there's also a "fortran_real_16", which should be
> suitable for promotion_3. Can someone verify if the following fixes
> the problem on the failing targets:
>
> Index: promotion_3.f90
> ===
> --- promotion_3.f90(revision 253134)
> +++ promotion_3.f90(working copy)
> @@ -1,5 +1,6 @@
>  ! { dg-do run }
>  ! { dg-options "-fdefault-real-16" }
> +! { dg-require-effective-target fortran_real_16 }
>  !
>  ! PR 82143: add a -fdefault-real-16 flag
>  !
>
>
> If it does, I'll be happy to commit that. For promotion_4, we probably
> need to add an effective target "fortran_real_10" (which does not seem
> to exists yet).


Attached is a more complete patch, which should fix all problems that
were reported concerning these two test cases. Would be great if
someone could confirm that it works on a failing target (I currently
only have access to x86_64-linux-gnu machines).

Ok for trunk?

Cheers,
Janus
Index: gcc/testsuite/gfortran.dg/promotion_3.f90
===
--- gcc/testsuite/gfortran.dg/promotion_3.f90   (revision 253134)
+++ gcc/testsuite/gfortran.dg/promotion_3.f90   (working copy)
@@ -1,5 +1,6 @@
 ! { dg-do run }
 ! { dg-options "-fdefault-real-16" }
+! { dg-require-effective-target fortran_real_16 }
 !
 ! PR 82143: add a -fdefault-real-16 flag
 !
Index: gcc/testsuite/gfortran.dg/promotion_4.f90
===
--- gcc/testsuite/gfortran.dg/promotion_4.f90   (revision 253134)
+++ gcc/testsuite/gfortran.dg/promotion_4.f90   (working copy)
@@ -1,5 +1,6 @@
 ! { dg-do run }
 ! { dg-options "-fdefault-real-10" }
+! { dg-require-effective-target fortran_real_10 }
 !
 ! PR 82143: add a -fdefault-real-16 flag
 !
@@ -12,5 +13,5 @@ double precision :: d
 if (kind(r4) /= 4) call abort
 if (kind(r8) /= 8) call abort
 if (kind(r) /= 10) call abort
-if (kind(d) /= 16) call abort
+if (kind(d)  < 10) call abort
 end
Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 253134)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -1464,7 +1464,21 @@ proc check_effective_target_fortran_real_16 { } {
 }]
 }
 
+# Return 1 if the target supports Fortran real kind 10,
+# 0 otherwise. Contrary to check_effective_target_fortran_large_real
+# this checks for real(10) only.
+#
+# When the target name changes, replace the cached result.
 
+proc check_effective_target_fortran_real_10 { } {
+return [check_no_compiler_messages fortran_real_10 executable {
+   ! Fortran
+   real(kind=10) :: x
+   x = cos (x)
+   end
+}]
+}
+
 # Return 1 if the target supports Fortran's IEEE modules,
 # 0 otherwise.
 #

Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag

2017-09-26 Thread Janus Weil

2017-09-25 23:23 GMT+02:00 Steve Kargl :
> On Mon, Sep 25, 2017 at 11:14:42PM +0200, Janus Weil wrote:
>> 2017-09-25 17:07 GMT+02:00 David Edelsohn :
>> > promotion_3.f90 and promotion_4.f90 are failing on at least PowerPC
>> > and AArch64.  Are these new tests limited to x86 or some long double
>> > assumptions?
>>
>> These tests require the availability of a 10- or 16-byte-wide REAL
>> type, respectively. I have to admit that I do not have a complete
>> overview of which targets in GCC's wide portfolio provide such a type.
>>
>> It seems that REAL(16) is supported via libquadmath on 32-bit x86,
>> x86-64 and Itanium at least. I'm not sure about REAL(10).
>>
>> Targets that do not support such a type probably need to be XFAILed.
>>
>
> Janus, I think you can control with a dg option
>
> dg-require-effective-target fortran_large_real
>
> See, for example, gfortran.dg/random_3.f90

Thanks for the pointer, Steve.

However, it seems that "fortran_large_real" only requires some real
type that is larger than 8 bytes, but makes no assumptions about its
actual size (10 or 16 bytes). Therefore it's probably not very useful
for promotion_{3,4}.

But: I found that there's also a "fortran_real_16", which should be
suitable for promotion_3. Can someone verify if the following fixes
the problem on the failing targets:

Index: promotion_3.f90
===
--- promotion_3.f90(revision 253134)
+++ promotion_3.f90(working copy)
@@ -1,5 +1,6 @@
 ! { dg-do run }
 ! { dg-options "-fdefault-real-16" }
+! { dg-require-effective-target fortran_real_16 }
 !
 ! PR 82143: add a -fdefault-real-16 flag
 !

If it does, I'll be happy to commit that. For promotion_4, we probably
need to add an effective target "fortran_real_10" (which does not seem
to exist yet).

Cheers,
Janus

Re: [Ada] Use the Monotonic Clock on Linux

2017-09-26 Thread Pierre-Marie de Rodat


On 09/25/2017 02:36 PM, Duncan Sands wrote:

+    --  The most recent calls to clock_gettime were more better.


were more better -> were better


Yes, we fixed that in a later commit. :-)

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2a6c14a68616dfb8d8578bb8692c5e05de4aade3#patch3

--
Pierre-Marie de Rodat

[C++ Patch] PR 65579 ("gcc requires definition of a static constexpr member...")

2017-09-26 Thread Paolo Carlini


Hi,

this is a relatively old bug already analyzed by Martin last year. He 
also proposed a patch:


https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00593.html

After a short exchange Jason proposed a different approach based on 
simply completing the involved vars:


https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01420.html

Having verified that Martin wasn't actively working on the bug, I 
decided to try a very straightforward implementation of Jason's 
suggestion - see the attached, tested x86_64-linux - which appears to 
work fine as-is. Naturally, one could imagine restricting/enlarging the 
set of decls to complete: some choices don't seem good, like extending 
to non-constexpr vars too (the corresponding snippet is ill-formed anyway 
due to the initialization). I didn't try to test all the possible 
variants...


Thanks, Paolo.

///

/cp
2017-09-26  Paolo Carlini  

PR c++/65579
* decl.c (grokdeclarator): Before calling cp_apply_type_quals_to_decl
on constexpr VAR_DECLs complete their type.

/testsuite
2017-09-26  Paolo Carlini  

PR c++/65579
* g++.dg/cpp0x/constexpr-template11.C: New.
Index: cp/decl.c
===
--- cp/decl.c   (revision 253134)
+++ cp/decl.c   (working copy)
@@ -12348,7 +12348,11 @@ grokdeclarator (const cp_declarator *declarator,
 
 /* Set constexpr flag on vars (functions got it in grokfndecl).  */
 if (constexpr_p && VAR_P (decl))
-  DECL_DECLARED_CONSTEXPR_P (decl) = true;
+  {
+   DECL_DECLARED_CONSTEXPR_P (decl) = true;
+   if (!processing_template_decl)
+ TREE_TYPE (decl) = complete_type (TREE_TYPE (decl));
+  }
 
 /* Record constancy and volatility on the DECL itself .  There's
no need to do this when processing a template; we'll do this
Index: testsuite/g++.dg/cpp0x/constexpr-template11.C
===
--- testsuite/g++.dg/cpp0x/constexpr-template11.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-template11.C   (working copy)
@@ -0,0 +1,16 @@
+// PR c++/65579
+// { dg-do link { target c++11 } }
+
+template <typename>
+struct S {
+int i;
+};
+
+struct T {
+  static constexpr S<int> s = { 1 };
+};
+};
+
+int main()
+{
+  return T::s.i;
+}

Re: Transform (x / y) != 0 to x >=y and (x / y) == 0 to x < y if x, y are unsigned

2017-09-26 Thread Richard Biener

On Mon, Sep 25, 2017 at 7:14 PM, Prathamesh Kulkarni
 wrote:
> On 18 September 2017 at 15:40, Prathamesh Kulkarni
>  wrote:
>> On 15 September 2017 at 22:09, Marc Glisse  wrote:
>>> On Fri, 15 Sep 2017, Wilco Dijkstra wrote:
>>>
>>>> Marc Glisse wrote:
>>>>
>>>>> The question is whether, having computed c=a/b, it is cheaper to test a<b or c!=0.
>>>>> I think it is usually the second one, but not for all types on all
>>>>> targets. Although since
>>>>> you mention VRP, it is easier to do further optimizations using the
>>>>> information a<b.
>>>>
>>>> No, a<b [...] throughput on
>>>> all modern cores, so rather than having to wait until the division
>>>> finishes, you can
>>>> execute whatever depends on the comparison many cycles earlier.
>>>>
>>>> Generally you want to avoid division as much as possible and when that
>>>> fails
>>>> reduce any dependencies on the result of divisions.
>>>
>>>
>>> This would indicate that we do not need to check for single-use, makes the
>>> patch simpler, thanks.
>>> (let's ignore -Os)
>> Hi,
>> Thanks for the suggestions, I have updated the patch.
>> Is this OK ?
>> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
>> I will try to address the right shift by 4 case in a follow-up patch.
>>
> ping https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01145.html

Ok.

Thanks,
Richard.

> Thanks,
> Prathamesh
>> Thanks,
>> Prathamesh
>>>
>>> --
>>> Marc Glisse
