Re: [0/3] Turn current_vector_size into a vec_info field

2019-10-20 Thread Richard Biener
On October 20, 2019 3:21:32 PM GMT+02:00, Richard Sandiford wrote:
>Now that we're keeping multiple vec_infos around at the same time,
>it seemed worth turning current_vector_size into a vec_info field.
>This for example simplifies the book-keeping in vect_analyze_loop
>and helps with some follow-on changes.
>
>Tested on aarch64-linux-gnu and x86_64-linux-gnu.

OK. 

Thanks, 
Richard. 

>Richard



Re: [wwwdocs] Update gcc-10/changes.html re Intel ISA (was: gcc-wwwdocs branch master updated. 63fbcfeaf27d9dd2083ccbd34bdff8fccb63949c)

2019-10-20 Thread Hongtao Liu
On Mon, Oct 21, 2019 at 1:15 AM Gerald Pfeifer  wrote:
>
> On Fri, 11 Oct 2019, liuho...@gcc.gnu.org wrote:
> > commit 63fbcfeaf27d9dd2083ccbd34bdff8fccb63949c
> > Author: liuhongt 
> > Date:   Fri Oct 11 14:27:47 2019 +0800
> >
> > Update gcc10 changes with new intel ISA.
>
> I just applied this follow-up patch which adds markup.  Usually we
> also refer to "command-line option" or similar, but I leave it up
> to you whether you want to do this here.
>
> Gerald
>
> - Log -
> commit 403208f04a685071344227d54127664e6894ee0a
> Author: Gerald Pfeifer 
> Date:   Sun Oct 20 19:11:06 2019 +0200
>
> Properly mark up command-line options re the new Intel ISA.
>
> diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
> index 478436d..7e7b666 100644
> --- a/htdocs/gcc-10/changes.html
> +++ b/htdocs/gcc-10/changes.html
> @@ -193,10 +193,12 @@ a work-in-progress.
>Support to expand __builtin_roundeven into the appropriate
>  SSE 4.1 instruction has been added.
>
> -  GCC now supports the Intel CPU named Cooperlake through 
> -march=cooperlake.
> +  GCC now supports the Intel CPU named Cooperlake through
> +-march=cooperlake.
>  The switch enables the AVX512BF16 ISA extensions.
>
> -  GCC now supports the Intel CPU named Tigerlake through 
> -march=tigerlake.
> +  GCC now supports the Intel CPU named Tigerlake through
> +-march=tigerlake.
>  The switch enables the MOVDIRI MOVDIR64B AVX512VP2INTERSECT ISA 
> extensions.
>
>  

Thanks, I will watch out for this next time.

-- 
BR,
Hongtao


Re: [PATCH] Fix -Wshadow=local warnings in rtl.h

2019-10-20 Thread Bernd Edlinger
On 10/5/19 9:24 AM, Jakub Jelinek wrote:
> On Sat, Oct 05, 2019 at 06:12:37AM +, Bernd Edlinger wrote:
>> On 10/3/19 5:25 PM, Jakub Jelinek wrote:
>>> Does this affect debuggability of --enable-checking=yes,rtl compilers?
>>> I mean, often when we replace some macros with inlines step in GDB
>>> becomes a bigger nightmare, having to go through tons of inline frames.
>>> gdbinit.in has a lengthy list of inlines to skip in rtl.h, shouldn't this be
>>> added to that list?  Not 100% sure how well it will work on rtl checking
>>> vs. non-rtl checking builds.
>>>
>>
>> I don't see a big problem here.  If I type "s" in gdb it jumps to the check
>> function and the next s jumps back, adding skip instructions in gdbinit.in
>> does not seem to have any effect for me, but the debug is not that uncomfortable
>> anyway.
>>
>> Interestingly, gdb also jumps into the check function when I press n.
>> That is also independent of the gdbinit.in, seems to be a bug due to inlining
>> code from another line than the original macro, but nothing terribly bad.
> 
> Unfortunately, for me the above two count as terribly bad, a show
> stopper here.  A lot of function calls in RTL are like:
> rtx_equal_for_memref_p (XEXP (x, 0), XEXP (y, 0))
> rtx_equal_p (ENTRY_VALUE_EXP (x), ENTRY_VALUE_EXP (y))
> force_operand (XEXP (dest_mem, 0), target)
> etc.  (just random examples), during debugging one is absolutely
> uninterested in stepping into the implementation of XEXP, you know what it
> means, you want to go straight into the rtx_equal_for_memref_p etc.  call.
> It is already bad that one has to step through the poly_int* stuff or
> rhs_regno for REGNO (we should add rhs_regno to gdbinit.in and some of the
> poly_int* stuff too).  And no, one can't use n instead of s, because then
> the whole call is skipped.  b rtx_equal_for_memref_p; n works, but it is
> time consuming and one needs to delete the breakpoint again.
> 

Okay, I think I have fixed both the "s" ignoring the skip status on inlined
subroutines, with a gdb-patch I posted here:
http://sourceware.org/ml/gdb-patches/2019-10/msg00685.html

And the "n" jumping into the bottom half of the inlined template will be fixed
by the patch "Fix dwarf-lineinfo inconsistency of inlined subroutines"
which I posted here:
https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01459.html

Those only affect optimized code, and are already there with the tree.h
as you will probably know.

With those two patches, the rtl-checking patch gives a very consistent debug
impression on both a non-optimized stage-1 compiler and an optimized stage-3
compiler.

I have cleaned up the rtl checking patch once more, removed the no longer used
RTL_CHECKC3, XC3EXP macros, and found a way to suppress a template specialization
in gdbinit.in using a regular expression matching syntax, instead of suppressing
all of rtl.h; that might be more specific than suppressing per file.
Suppressing all of rtl.h would work as well, if that is considered a better approach.
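
To illustrate why turning a checking macro into an inline function changes
what "s" steps through, here is a minimal sketch in plain C.  It is not taken
from rtl.h: toy_rtx, TOY_XEXP_MACRO, toy_check_bounds and TOY_XEXP are
hypothetical names, and the real patch uses C++ templates rather than a
plain inline function.

```c
#include <assert.h>

/* Hypothetical stand-in for an RTL expression with a few operands.  */
struct toy_rtx
{
  int num_ops;
  struct toy_rtx *ops[3];
};

/* Macro form: the access is expanded in place, so the debugger has no
   separate function to step into.  */
#define TOY_XEXP_MACRO(X, N) ((X)->ops[(N)])

/* Checking helper: a real function, so "step" may enter it unless the
   debugger has been told to skip it.  */
static inline struct toy_rtx *
toy_check_bounds (struct toy_rtx *x, int n)
{
  assert (n >= 0 && n < x->num_ops);
  return x;
}

/* Checked form: same access, routed through the helper.  Note that N is
   evaluated twice in this sketch.  */
#define TOY_XEXP(X, N) (toy_check_bounds ((X), (N))->ops[(N)])

/* Both forms yield the same operand for a valid index.  */
static int
toy_forms_agree (struct toy_rtx *x, int n)
{
  return TOY_XEXP_MACRO (x, n) == TOY_XEXP (x, n);
}
```

In a call such as foo (TOY_XEXP (x, 0)), stepping would first enter
toy_check_bounds before reaching foo, which is the situation the skip
entries in gdbinit.in are meant to hide.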


Now I think the show-stoppers, regarding debuggability of the template-driven
rtl-checking macros are no longer an issue, right?

Also performance-wise it is better than what we had before, IMHO.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
2019-10-04  Bernd Edlinger  

	* gdbinit.in: Add skip for the new rtl checking templates.
	* rtl.h (RTL_CHECK1, RTL_CHECK2, RTL_CHECKC1, RTL_CHECKC2
	RTVEC_ELT): Reimplement with inline functions.
	(RTL_CHECKC3, XC3EXP): Remove.
	(RTL_FLAG_CHECK): New variadic macro.
	(RTL_FLAG_CHECK1-6): Use RTL_FLAG_CHECK.
	(RTL_FLAG_CHECK7): Remove.
	(rtvec_check): New helper function.
	(rtl_check_bounds, rtl_check_code,
	rtl_flag_check): New helper templates.
	* rtl.c (rtl_check_failed_code3): Remove.

Index: gcc/gdbinit.in
===
--- gcc/gdbinit.in	(revision 277155)
+++ gcc/gdbinit.in	(working copy)
@@ -299,6 +299,10 @@ skip PATTERN
 skip INSN_LOCATION
 skip INSN_HAS_LOCATION
 skip JUMP_LABEL_AS_INSN
+skip rtvec_check
+skip -rfunction rtl_check_bounds<.*>
+skip -rfunction rtl_check_code<.*>
+skip -rfunction rtl_flag_check<.*>
 
 # Restore pagination to the previous state.
 python if __gcc_prev_pagination: gdb.execute("set pagination on")
Index: gcc/rtl.c
===
--- gcc/rtl.c	(revision 277155)
+++ gcc/rtl.c	(working copy)
@@ -892,17 +892,6 @@ rtl_check_failed_code2 (const_rtx r, enum rtx_code
 }
 
 void
-rtl_check_failed_code3 (const_rtx r, enum rtx_code code1, enum rtx_code code2,
-			enum rtx_code code3, const char *file, int line,
-			const char *func)
-{
-  internal_error
-("RTL check: expected code '%s', '%s' or '%s', have '%s' in %s, at %s:%d",
- GET_RTX_NAME (code1), GET_RTX_NAME (code2), GET_RTX_NAME (code3),
- GET_RTX_NAME (GET_CODE (r)), func, trim_filename (file), line);
-}
-
-void
 rtl_check_failed_code_mode (const_rtx r, enum rtx_code code, 

Re: [PATCH] Fix description of -fcommon

2019-10-20 Thread Sandra Loosemore

On 10/20/19 11:14 AM, Bernd Edlinger wrote:

Hi,

I've noticed that the description of -fcommon that gets printed
with "gcc -v --help" is exactly the opposite of what this
option actually does.

With -fcommon, different global variables w/o initial value
are placed in common blocks, similar to Fortran named common
blocks, while with -fno-common, those variables are placed
in the .bss segment.

I believe the description is describing the "flag_no_common",
but that is not what the user needs to know.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Good catch!  Yes, this is fine.

-Sandra


[wwwdocs] codingconventions.html - hboehm.info now defaults to https.

2019-10-20 Thread Gerald Pfeifer
Committed.

Gerald

- Log -
commit 7d0ef4e2d84d051e0764ca2236f20b1de7970b4a
Author: Gerald Pfeifer 
Date:   Sun Oct 20 22:07:54 2019 +0200

hboehm.info now defaults to https.

diff --git a/htdocs/codingconventions.html b/htdocs/codingconventions.html
index b489125..71a8772 100644
--- a/htdocs/codingconventions.html
+++ b/htdocs/codingconventions.html
@@ -657,7 +657,7 @@ However, the upstream source seems to be dead, so fastjar is
 essentially maintained in the GCC source tree.
 
 boehm-gc: The master sources are at http://hboehm.info/gc/;>http://hboehm.info/gc/.
+href="https://hboehm.info/gc/;>http://hboehm.info/gc/.
 Patches should be sent to
 mailto:bd...@lists.opendylan.org;>bd...@lists.opendylan.org,
 but it's acceptable to check them in the GCC source tree before getting


[doc] install.texi - hboehm.info tweak

2019-10-20 Thread Gerald Pfeifer
Committed.

Gerald

2019-10-20  Gerald Pfeifer  

* doc/install.texi (Configuration, --enable-objc-gc): hboehm.info
now defaults to https.

Index: doc/install.texi
===
--- doc/install.texi(revision 277213)
+++ doc/install.texi(working copy)
@@ -2347,7 +2347,7 @@ The following options apply to the build of the Ob
 @item --enable-objc-gc
 Specify that an additional variant of the GNU Objective-C runtime library
 is built, using an external build of the Boehm-Demers-Weiser garbage
-collector (@uref{http://www.hboehm.info/gc/}).  This library needs to be
+collector (@uref{https://www.hboehm.info/gc/}).  This library needs to be
 available for each multilib variant, unless configured with
 @option{--enable-objc-gc=@samp{auto}} in which case the build of the
 additional runtime library is skipped when not available and the build


[PATCH] Fix dwarf-lineinfo inconsistency of inlined subroutines

2019-10-20 Thread Bernd Edlinger
Hi,

this fixes an issue with the gdb step-over aka. "n" command.

It can be seen when you debug an optimized stage-3 cc1;
it does not affect -O0 code, though.

This example debug session will explain the effect.

(gdb) b get_alias_set
Breakpoint 5 at 0xa099f0: file ../../gcc-trunk/gcc/alias.c, line 837.
(gdb) r
Breakpoint 5, get_alias_set (t=t@entry=0x77ff7ab0) at 
../../gcc-trunk/gcc/alias.c:837
837   if (t == error_mark_node
(gdb) n
839   && (TREE_TYPE (t) == 0 || TREE_TYPE (t) == error_mark_node)))
(gdb) n
3382  return __t;  <-- now we have a problem: wrong line info here
(gdb) bt
#0  get_alias_set (t=t@entry=0x77ff7ab0) at ../../gcc-trunk/gcc/tree.h:3382
#1  0x00b25dfe in set_mem_attributes_minus_bitpos (ref=0x7746f990, 
t=0x77ff7ab0, objectp=1, bitpos=...)
at ../../gcc-trunk/gcc/emit-rtl.c:1957
#2  0x01137a55 in make_decl_rtl (decl=0x77ff7ab0) at 
../../gcc-trunk/gcc/varasm.c:1518
#3  0x0113b6e8 in assemble_variable (decl=0x77ff7ab0, 
top_level=, at_end=, 
dont_output_data=0) at ../../gcc-trunk/gcc/varasm.c:2246
#4  0x0113f0ea in varpool_node::assemble_decl (this=0x7745b000) at 
../../gcc-trunk/gcc/varpool.c:584
#5  0x0113fa17 in varpool_node::assemble_decl (this=0x7745b000) at 
../../gcc-trunk/gcc/varpool.c:750


There are at least two problems here:

First you did not want to step into the TREE_TYPE, but it happens all
the time, even if you use "n" to step over it.

And secondly, from the call stack, you don't know where you are in get_alias_set.
But the code that is executing at this point is actually the x == 0 || x == error_mark_node
from alias.c, line 839, which contains the inlined body of the TREE_TYPE, but
the rest of the if.  So there is an inconsistency in the  

Contents of the .debug_info section:

 <2><4f686>: Abbrev Number: 12 (DW_TAG_inlined_subroutine)
<4f687>   DW_AT_abstract_origin: <0x53d4e>
<4f68b>   DW_AT_entry_pc: 0x7280
<4f693>   DW_AT_GNU_entry_view: 1
<4f695>   DW_AT_ranges  : 0xb480
<4f699>   DW_AT_call_file   : 8  <- alias.c
<4f69a>   DW_AT_call_line   : 839
<4f69c>   DW_AT_call_column : 8
<4f69d>   DW_AT_sibling : <0x4f717>

 The File Name Table (offset 0x253):
  8 2   0   0   alias.c
  102   0   0   tree.h

Contents of the .debug_ranges section:

b480 7280 7291 
b480 2764 277e 
b480 

The problem is at pc=0x7291 in the Line Number Section:

 Line Number Statements:

  [0x8826]  Special opcode 61: advance Address by 4 to 0x7284 and Line by 0 to 3380
  [0x8827]  Set is_stmt to 1
  [0x8828]  Special opcode 189: advance Address by 13 to 0x7291 and Line by 2 to 3382 (*)
  [0x8829]  Set is_stmt to 0 (**)
  [0x882a]  Copy (view 1)
  [0x882b]  Set File Name to entry 8 in the File Name Table <- back to alias.c
  [0x882d]  Set column to 8
  [0x882f]  Advance Line by -2543 to 839
  [0x8832]  Copy (view 2)
  [0x8833]  Set column to 27
  [0x8835]  Special opcode 61: advance Address by 4 to 0x7295 and Line by 0 to 839
  [0x8836]  Set column to 3
  [0x8838]  Set is_stmt to 1 <-- next line info counts: alias.c:847
  [0x8839]  Special opcode 153: advance Address by 10 to 0x729f and Line by 8 to 847
  [0x883a]  Set column to 7

(*) this line is tree.h:3382, but the program counter is *not* within the subroutine,
but exactly at the first instruction *after* the subroutine according to the debug_ranges.
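
As a rough illustration of the point about (*), and not part of the patch:
range list entries are half-open, so the end address itself lies outside the
range.  The sketch below hard-codes the two ranges from the .debug_ranges
dump above; pc_range, pc_in_ranges and tree_type_ranges are names made up
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One [begin, end) entry from a range list; end is exclusive.  */
struct pc_range
{
  unsigned long begin;
  unsigned long end;
};

/* Return true if PC lies inside any of the N ranges.  */
static bool
pc_in_ranges (const struct pc_range *ranges, size_t n, unsigned long pc)
{
  for (size_t i = 0; i < n; i++)
    if (pc >= ranges[i].begin && pc < ranges[i].end)
      return true;
  return false;
}

/* The two ranges listed for the inlined subroutine in the dump.  */
static const struct pc_range tree_type_ranges[] = {
  { 0x7280, 0x7291 },
  { 0x2764, 0x277e },
};

/* 0x7284 is inside the inlined body; 0x7291 is the first address after it,
   yet the line table still attributes it to tree.h:3382.  */
static void
check_ranges (void)
{
  assert (pc_in_ranges (tree_type_ranges, 2, 0x7284));
  assert (!pc_in_ranges (tree_type_ranges, 2, 0x7291));
}
```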

What makes it worse is that (**) makes gdb ignore the new location info alias.c:839,
which means that normally the n command would have continued to pc=0x729f, which is at alias.c:847.


The problem happens due to a block with only var
This patch fixes this problem by moving (**) to the first statement with a different line number.

In alias.c.316r.final this looks like this:

(note 2884 2883 1995 31 0x7f903a931ba0 NOTE_INSN_BLOCK_BEG)
(note 1995 2884 2885 31 ../../gcc-trunk/gcc/tree.h:3377 NOTE_INSN_INLINE_ENTRY)
(note 2885 1995 1996 31 0x7f903a931c00 NOTE_INSN_BLOCK_BEG)
[...]
(note 50 39 59 32 [bb 32] NOTE_INSN_BASIC_BLOCK)
(note 59 50 60 32 NOTE_INSN_DELETED)
(note 60 59 1997 32 NOTE_INSN_DELETED)
(note 1997 60 2239 32 ../../gcc-trunk/gcc/tree.h:3382 NOTE_INSN_BEGIN_STMT)
(note 2239 1997 2240 32 (var_location __tD.143911 (nil)) NOTE_INSN_VAR_LOCATION)
(note 2240 2239 2241 32 (var_location __sD.143912 (nil)) NOTE_INSN_VAR_LOCATION)
(note 2241 2240 2242 32 (var_location __fD.143913 (nil)) NOTE_INSN_VAR_LOCATION)
(note 2242 2241 2243 32 (var_location __lD.143914 (nil)) NOTE_INSN_VAR_LOCATION)
(note 2243 2242 2886 32 (var_location __gD.143915 (nil)) NOTE_INSN_VAR_LOCATION)
(note 2886 2243 2887 32 0x7f903a931c00 NOTE_INSN_BLOCK_END)
(note 2887 2886 57 32 0x7f903a931ba0 NOTE_INSN_BLOCK_END)
(insn:TI 57 2887 61 32 (set (reg/f:DI 0 ax [orig:87 _7 ] [87])
(mem/f/j:DI (plus:DI 

Re: [PATCH] Fix (hypothetical) problem with pre-reload splitters (PR target/92140)

2019-10-20 Thread Uros Bizjak
On Sun, Oct 20, 2019 at 1:24 PM Jakub Jelinek  wrote:
>
> Hi!
>
> As mentioned in the PR, the x86 backend has various define_insn_and_split
> patterns that are meant to match usually during combine, are then
> unconditionally split during split1 pass and as they have
> && can_create_pseudo_p () in their define_insn condition, if they get
> matched after split1, nothing would split them anymore and they wouldn't
> match after reload.
>
> The split1 pass already sets a property that can be used.
>
> I've first tried to remove some constraints and associated attributes, but
> it seems from further discussions in the PR I don't know much about the
> reasons why they were added and if they are still needed or not, so this
> version of the patch just replaces the can_create_pseudo_p () conditions
> with a new predicate that stops matching already after the split1 pass.

As explained by Segher in the PR, there is no 100% guarantee that
combine won't produce a pattern with a wrong hard register as e.g. the
count reg of a shift insn. RA will die on this kind of pattern with
reload failure, so these constraints are used together with
ix86_legitimate_combined_insn target hook to reject invalid
combinations involving hard registers.
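
For readers following along, the effect of the new predicate from the quoted
patch can be sketched outside of GCC as below.  This is only a model under
stated assumptions: toy_state, TOY_PROP_SPLIT_DONE and the toy_* functions
are hypothetical stand-ins for cfun->curr_properties, PROP_rtl_split_insns,
can_create_pseudo_p and ix86_pre_reload_split.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical property bit standing in for PROP_rtl_split_insns.  */
#define TOY_PROP_SPLIT_DONE (1u << 0)

/* Hypothetical snapshot of the compilation state of one function.  */
struct toy_state
{
  bool can_create_pseudo;   /* assumed to become false once reload begins */
  unsigned properties;      /* bitmask of properties set by earlier passes */
};

/* Old condition: holds at any point before reload.  */
static bool
toy_old_condition (const struct toy_state *s)
{
  return s->can_create_pseudo;
}

/* New condition: additionally stops holding once the split pass has run.  */
static bool
toy_pre_reload_split (const struct toy_state *s)
{
  return s->can_create_pseudo && !(s->properties & TOY_PROP_SPLIT_DONE);
}

/* Walk the three interesting phases and compare both conditions.  */
static void
toy_check_phases (void)
{
  struct toy_state before_split = { true, 0 };
  struct toy_state after_split = { true, TOY_PROP_SPLIT_DONE };
  struct toy_state after_reload = { false, TOY_PROP_SPLIT_DONE };

  assert (toy_old_condition (&before_split));
  assert (toy_pre_reload_split (&before_split));

  /* The window the patch closes: the old condition still matches here,
     but nothing would split the insn any more.  */
  assert (toy_old_condition (&after_split));
  assert (!toy_pre_reload_split (&after_split));

  assert (!toy_old_condition (&after_reload));
  assert (!toy_pre_reload_split (&after_reload));
}
```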

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Assuming that the property used is the correct way to go, OK.

Thanks,
Uros.

> 2019-10-20  Jakub Jelinek  
>
> * config/i386/i386-protos.h (ix86_pre_reload_split): Declare.
> * config/i386/i386.c (ix86_pre_reload_split): New function.
> * config/i386/i386.md (*fix_trunc_i387_1, *add3_eq,
> *add3_ne, *add3_eq_0, *add3_ne_0, *add3_eq,
> *add3_ne, *add3_eq_1, *add3_eq_0, *add3_ne_0,
> *anddi3_doubleword, *andndi3_doubleword, *di3_doubleword,
> *one_cmpldi2_doubleword, *ashl3_doubleword_mask,
> *ashl3_doubleword_mask_1, *ashl3_mask, *ashl3_mask_1,
> *3_mask, *3_mask_1,
> *3_doubleword_mask,
> *3_doubleword_mask_1, *3_mask,
> *3_mask_1, *_mask, 
> *_mask_1,
> *btr_mask, *btr_mask_1, *jcc_bt, *jcc_bt_1,
> *jcc_bt_mask, *popcounthi2_1, frndintxf2_,
> *fist2__1, *3_1, *di3_doubleword):
> Use ix86_pre_reload_split instead of can_create_pseudo_p in condition.
> * config/i386/sse.md (*sse4_1_v8qiv8hi2_2,
> *avx2_v8qiv8si2_2,
> *sse4_1_v4qiv4si2_2,
> *sse4_1_v4hiv4si2_2,
> *avx512f_v8qiv8di2_2,
> *avx2_v4qiv4di2_2, 
> *avx2_v4hiv4di2_2,
> *sse4_1_v2hiv2di2_2,
> *sse4_1_v2siv2di2_2, sse4_2_pcmpestr,
> sse4_2_pcmpistr): Likewise.
>
> --- gcc/config/i386/i386-protos.h.jj2019-10-19 14:45:47.693185643 +0200
> +++ gcc/config/i386/i386-protos.h   2019-10-19 19:08:12.371359926 +0200
> @@ -55,6 +55,7 @@ extern rtx standard_80387_constant_rtx (
>  extern int standard_sse_constant_p (rtx, machine_mode);
>  extern const char *standard_sse_constant_opcode (rtx_insn *, rtx *);
>  extern bool ix86_standard_x87sse_constant_load_p (const rtx_insn *, rtx);
> +extern bool ix86_pre_reload_split (void);
>  extern bool symbolic_reference_mentioned_p (rtx);
>  extern bool extended_reg_mentioned_p (rtx);
>  extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
> --- gcc/config/i386/i386.c.jj   2019-10-19 14:45:47.729185094 +0200
> +++ gcc/config/i386/i386.c  2019-10-19 19:08:12.376359849 +0200
> @@ -4894,6 +4894,18 @@ ix86_standard_x87sse_constant_load_p (co
>return true;
>  }
>
> +/* Predicate for pre-reload splitters with associated instructions,
> +   which can match any time before the split1 pass (usually combine),
> +   then are unconditionally split in that pass and should not be
> +   matched again afterwards.  */
> +
> +bool
> +ix86_pre_reload_split (void)
> +{
> +  return (can_create_pseudo_p ()
> + && !(cfun->curr_properties & PROP_rtl_split_insns));
> +}
> +
>  /* Returns true if OP contains a symbol reference */
>
>  bool
> --- gcc/config/i386/i386.md.jj  2019-10-19 14:46:15.489760948 +0200
> +++ gcc/config/i386/i386.md 2019-10-19 19:08:12.381359773 +0200
> @@ -4920,7 +4920,7 @@ (define_insn_and_split "*fix_trunc
> && !TARGET_FISTTP
> && !(SSE_FLOAT_MODE_P (GET_MODE (operands[1]))
>  && (TARGET_64BIT || mode != DImode))
> -   && can_create_pseudo_p ()"
> +   && ix86_pre_reload_split ()"
>"#"
>"&& 1"
>[(const_int 0)]
> @@ -6857,7 +6857,7 @@ (define_insn_and_split "*add3_eq"
>   (match_operand:SWI 2 "")))
> (clobber (reg:CC FLAGS_REG))]
>"ix86_binary_operator_ok (PLUS, mode, operands)
> -   && can_create_pseudo_p ()"
> +   && ix86_pre_reload_split ()"
>"#"
>"&& 1"
>[(set (reg:CC FLAGS_REG)
> @@ -6881,7 +6881,7 @@ (define_insn_and_split "*add3_ne"
> && (mode != DImode
> || INTVAL (operands[2]) != HOST_WIDE_INT_C (-0x8000))
> && ix86_binary_operator_ok (PLUS, mode, operands)
> -   && can_create_pseudo_p ()"
> +   && ix86_pre_reload_split ()"
>

Re: [testsuite] Add test for PR91532

2019-10-20 Thread Prathamesh Kulkarni
On Sat, 19 Oct 2019 at 23:45, Richard Sandiford wrote:
>
> Prathamesh Kulkarni  writes:
> > Hi Richard,
> > Sorry for not adding the test in PR91532 fix.
> > Is the attached patch OK to commit ?
> >
> > Thanks,
> > Prathamesh
> >
> > 2019-10-18  Prathamesh Kulkarni  
> >
> >   PR tree-optimization/91532
> > testsuite/
> >   * gcc.target/aarch64/sve/fmla_2.c: Add dg-scan check for deleted 
> > store.
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
> > index 5c04bcdb3f5..bebb073d1f8 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-options "-O3" } */
> > +/* { dg-options "-O3 -fdump-tree-ifcvt-details" } */
> >
> >  #include 
> >
> > @@ -15,5 +15,6 @@ f (double *restrict a, double *restrict b, double 
> > *restrict c,
> >  }
> >  }
> >
> > +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "ifcvt" } } */
> >  /* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, 
> > z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
> >  /* { dg-final { scan-assembler-not {\tfmad\t} } } */
>
> I think it'd be better to have a scan-assembler-times for st1d instead,
> so that we're testing the end result rather than how we get there.
Hi Richard,
Thanks for the suggestions, is the attached patch OK ?

Thanks,
Prathamesh
>
> Thanks,
> Richard
2019-10-21  Prathamesh Kulkarni  

PR tree-optimization/91532
* gcc.target/aarch64/sve/fmla_2.c: Add dg-scan check for two st1d
insns.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
index 5c04bcdb3f5..51925fa8f50 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
@@ -17,3 +17,4 @@ f (double *restrict a, double *restrict b, double *restrict c,
 
 /* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
 /* { dg-final { scan-assembler-not {\tfmad\t} } } */
+/* { dg-final { scan-assembler-times {\tst1d} 2 } } */


Re: [PATCH V6 05/11] bpf: new GCC port

2019-10-20 Thread Gerald Pfeifer
On Mon, 9 Sep 2019, Jose E. Marchesi wrote:
> I just committed the port to svn trunk, in a single commit, yay!

Congratulations!

> Many thanks to you, Richard, Segher and the other reviewers for the
> great feedback and suggestions.  What got committed is certainly WAY
> better than what I submitted initially.

Happy to hear. :-)

> Now the real fun starts :))

I noticed that https://gcc.gnu.org does not have a news item related
to this contribution.  Would you mind adding one?  (Our web pages are
now in GIT, cf. https://gcc.gnu.org/about.html - let me know if you need
help.)

Also gcc/doc/contrib.texi doesn't know anyone by your name yet.  Happy
to change this; let's sync off list?

Gerald


Re: [PATCH][wwwdocs] Update GCC 9 release note

2019-10-20 Thread Gerald Pfeifer
On Thu, 10 Oct 2019, H.J. Lu wrote:
> Here is the same patch for git repo.  Is it OK?

Has this been available since GCC 9.1, or has it been added later?
(If the latter, please add this to a GCC 9.2 or GCC 9.3 section in
that file).

Ok.

Gerald


[wwwdocs] Update gcc-10/changes.html re Intel ISA (was: gcc-wwwdocs branch master updated. 63fbcfeaf27d9dd2083ccbd34bdff8fccb63949c)

2019-10-20 Thread Gerald Pfeifer
On Fri, 11 Oct 2019, liuho...@gcc.gnu.org wrote:
> commit 63fbcfeaf27d9dd2083ccbd34bdff8fccb63949c
> Author: liuhongt 
> Date:   Fri Oct 11 14:27:47 2019 +0800
> 
> Update gcc10 changes with new intel ISA.

I just applied this follow-up patch which adds markup.  Usually we
also refer to "command-line option" or similar, but I leave it up
to you whether you want to do this here.

Gerald

- Log -
commit 403208f04a685071344227d54127664e6894ee0a
Author: Gerald Pfeifer 
Date:   Sun Oct 20 19:11:06 2019 +0200

Properly mark up command-line options re the new Intel ISA.

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index 478436d..7e7b666 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -193,10 +193,12 @@ a work-in-progress.
   Support to expand __builtin_roundeven into the appropriate
 SSE 4.1 instruction has been added.
   
-  GCC now supports the Intel CPU named Cooperlake through 
-march=cooperlake.
+  GCC now supports the Intel CPU named Cooperlake through
+-march=cooperlake.
 The switch enables the AVX512BF16 ISA extensions.
   
-  GCC now supports the Intel CPU named Tigerlake through -march=tigerlake.
+  GCC now supports the Intel CPU named Tigerlake through
+-march=tigerlake.
 The switch enables the MOVDIRI MOVDIR64B AVX512VP2INTERSECT ISA extensions.
   
 


[PATCH] Fix description of -fcommon

2019-10-20 Thread Bernd Edlinger
Hi,

I've noticed that the description of -fcommon that gets printed
with "gcc -v --help" is exactly the opposite of what this
option actually does.

With -fcommon, different global variables w/o initial value
are placed in common blocks, similar to Fortran named common
blocks, while with -fno-common, those variables are placed
in the .bss segment.

I believe the description is describing the "flag_no_common",
but that is not what the user needs to know.
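
A small C fragment may make the difference concrete.  It is only a sketch:
shared_counter, initialized_counter and check_counters are names invented
for this example, and the two-file scenario is described in the comments
rather than shown.

```c
#include <assert.h>

/* Tentative definition: file scope, no initializer.  With -fcommon GCC
   emits it as a common symbol, so a second file that also contains
   "int shared_counter;" ends up sharing the same storage at link time.
   With -fno-common it is an ordinary definition in .bss, and a second
   definition in another file is a multiple-definition link error.  */
int shared_counter;

/* An initialized definition is never placed in a common block.  */
int initialized_counter = 1;

/* Either way, the uninitialized global starts out as zero.  */
static void
check_counters (void)
{
  assert (shared_counter == 0);
  assert (initialized_counter == 1);
}
```

So the corrected help text matches what the user observes: -fcommon is the
option that puts such globals in the common section.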


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
2019-10-20  Bernd Edlinger  

	* common.opt (-fcommon): Fix description.

Index: gcc/common.opt
===
--- gcc/common.opt	(revision 277155)
+++ gcc/common.opt	(working copy)
@@ -1132,7 +1132,7 @@
 
 fcommon
 Common Report Var(flag_no_common,0)
-Do not put uninitialized globals in the common section.
+Put uninitialized globals in the common section.
 
 fcompare-debug
 Driver


Re: [patch,testsuite] More fixes for small targets.

2019-10-20 Thread Jeff Law
On 10/18/19 9:33 AM, Georg-Johann Lay wrote:
> Here are some more cases fixed for small targets, for noise reduction.
> 
> Ok to apply?
> 
> Johann
> 
> gcc/testsuite/
> Fix some fallout for small targets.
> 
> PR testsuite/52641
> * gcc.dg/torture/pr86034.c: Use 32-bit base type for a bitfield of
> width > 16 bits.
> * gcc.dg/torture/pr90972.c [avr]: Add option "-w".
> * gcc.dg/torture/pr87693.c: Same.
> * gcc.dg/torture/pr91178.c: Add dg-require-effective-target size32plus.
> * gcc.dg/torture/pr91178-2.c: Same.
> * gcc.dg/torture/20181024-1.c
> * gcc.dg/torture/pr86554-1.c: Use 32-bit integers.
> * gcc.dg/tree-ssa/pr91091-1.c: Same.
OK
jeff
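
As a hedged illustration of the pr86034.c entry above (the actual test is
not shown in this thread, so toy_flags and check_bitfield are made up): on a
target where int is 16 bits wide, a bit-field wider than 16 bits needs a
wider base type.

```c
#include <assert.h>
#include <stdint.h>

/* With a 16-bit int, "unsigned int wide : 20" would be rejected because
   the width exceeds the base type.  A 32-bit base type keeps the
   declaration valid there too.  Where uint32_t is not a typedef for
   unsigned int, this relies on the implementation accepting other
   bit-field types, which GCC does.  */
struct toy_flags
{
  uint32_t wide : 20;
  uint32_t narrow : 4;
};

/* Store a value that needs more than 16 bits and read it back.  */
static void
check_bitfield (void)
{
  struct toy_flags f = { 0, 0 };
  f.wide = 0xABCDE;
  f.narrow = 0x9;
  assert (f.wide == 0xABCDE);
  assert (f.narrow == 0x9);
}
```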



Re: GCC wwwdocs move to git done

2019-10-20 Thread Gerald Pfeifer
On Wed, 9 Oct 2019, Joseph Myers wrote:
> I've done the move of GCC wwwdocs to git (using the previously posted and 
> discussed scripts), including setting up the post-receive hook to do the 
> same things previously covered by the old CVS hooks, and minimal updates 
> to the web pages dealing with the CVS setup for wwwdocs.

Really, really, cool.  Thanks a huge bunch, Joseph!

> Note 2: changes may be needed to the process for updating www.gnu.org 
> and Gerald's validator.

The validator *should* be working in this new world now.

And I am in contact with webmas...@gnu.org and will update this round
as things evolve.  (Not an urgency, just important.)

Gerald


Re: gcc-wwwdocs branch master updated. cdc7bf90357701877546f8bac160d0fb9e20b334

2019-10-20 Thread Gerald Pfeifer
On Wed, 9 Oct 2019, js...@gcc.gnu.org wrote:
> +Use "git commit" and "git push origin
> +master" to check in the patch.

I will admit I made a couple of first commits without reading those
details and just used a plain "git push".  

Is there any problem with that, any drawback?

Or could we simplify those instructions?

Gerald


[wwwdocs] readings.html - tweak polyhedron.com link

2019-10-20 Thread Gerald Pfeifer
Committed.

And as a side note, this should be the first commit in the world
of wwwdocs GIT that was properly checked by my bot. :-)

Gerald


commit bf45ac10505f02e59a0dfb13540cc8d7f5a21a68
Author: Gerald Pfeifer 
Date:   Sun Oct 20 17:54:09 2019 +0200

www.polyhedron.com is now polyhedron.com.

diff --git a/htdocs/readings.html b/htdocs/readings.html
index 42dd285..fd42aaf 100644
--- a/htdocs/readings.html
+++ b/htdocs/readings.html
@@ -461,7 +461,7 @@ names.
 Tests of run-time checking capabilities
 
   
-https://www.polyhedron.com/;>Polyhedron tests
+https://polyhedron.com;>Polyhedron tests
   
 
   


Re: [wwwdocs] Improve markup/nicer formatting for GIT instructions.

2019-10-20 Thread Gerald Pfeifer
On Sat, 19 Oct 2019, Gerald Pfeifer wrote:
> And this makes it a bit nicer (and shorter).

And this makes the anonymous checkout of wwwdocs a simple copy and paste.

Committed.

Gerald


diff --git a/htdocs/about.html b/htdocs/about.html
index 48918c8..a67e358 100644
--- a/htdocs/about.html
+++ b/htdocs/about.html
@@ -59,8 +59,11 @@ and SSH installed, you can check out the web pages via
 
 where username is your user name at gcc.gnu.org.
 
-For anonymous access, use
-git://gcc.gnu.org/git/gcc-wwwdocs.git instead.
+For anonymous access, use
+
+git clone git://gcc.gnu.org/git/gcc-wwwdocs.git
+
+


Fix wrong code issue in access path oracle

2019-10-20 Thread Jan Hubicka
Hi,
this patch fixes the miscompilation of babel that Jeff wrote me about.  The problem is
the array walking in nonoverlapping_refs_since_match_p, which gets the
following testcase wrong:

int
main (int argc, char **argv)
{
  int c;
  unsigned char out[][1] = { {71}, {71}, {71} };

  for (int i = 0; i < 3; i++)
if (!out[i][0])
  __builtin_abort ();
  return 0;
}

Now the oracle is called for the out[2] store and the out[i][0] read.  It correctly
identifies that the base pointers are the same and proceeds to
nonoverlapping_refs_since_match_p.

This function expects two access paths of the same form (i.e. something like
a.b.foo and a.b.bar), attempts to match the common part of the paths and, if
they differ at some point, disambiguates the accesses.  It walks from the
outermost to the innermost reference.  At this point it sees that there is
one array access in one path and two in the other.  To synchronize them it pops
the [i] access and continues comparing base[0] and base[2], which get
disambiguated because they happen to have the same type size, but only by
accident: changing [1] in the testcase to [2] avoids the misoptimization.
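
The overlap can be checked by hand with byte offsets.  The following is only
a model of the testcase's memory layout, not of the oracle itself;
byte_range, ranges_overlap, row_access and element_access are hypothetical
helpers written for this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A byte range [offset, offset + size) relative to a common base.  */
struct byte_range
{
  size_t offset;
  size_t size;
};

static bool
ranges_overlap (struct byte_range a, struct byte_range b)
{
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/* For "unsigned char out[3][1]": the store to out[row] covers a whole row.  */
static struct byte_range
row_access (size_t row)
{
  struct byte_range r = { row * sizeof (unsigned char[1]),
                          sizeof (unsigned char[1]) };
  return r;
}

/* The read of out[i][col] covers one element inside row i.  */
static struct byte_range
element_access (size_t i, size_t col)
{
  struct byte_range r = { i * sizeof (unsigned char[1])
                          + col * sizeof (unsigned char),
                          sizeof (unsigned char) };
  return r;
}

static void
check_overlap (void)
{
  /* Real layout: with i == 2 the read hits the byte written via out[2].  */
  assert (ranges_overlap (row_access (2), element_access (2, 0)));
  /* Faulty view: once [i] is dropped, index 0 is compared against index 2
     and the two accesses look disjoint.  */
  assert (!ranges_overlap (row_access (2), element_access (0, 0)));
}
```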

While writing the code I was imagining that the access path with more
array references will eventually lead to the same type as the shorter access path.
This is not the case here, where the shorter access path ends in a non-scalar value.
I was thinking of this option, but incorrectly convinced myself that the size
check later is sufficient.

It is not safe to pop the access without verifying that the size of the inner
type is actually an integer multiple of the other type size (so we maintain
the invariant that bases are either the same or completely disjoint).
This seems to get tricky at least with -fno-strict-aliasing if code walks past
the end of arrays (and expects it to work) and probably also for trailing arrays, so
this patch is just a quick fix to disable the logic for all array references
which are not known to have zero offset (where the invariant is trivially
maintained).

I will discuss options with richi next week, but perhaps it is not worth the
extra effort as the case we look for is relatively rare. For now I am going to
commit the following patch which fixes the wrong code (and xfails the testcase
I made for different length paths). I will collect some data on how much it matters
in practice on Monday.

Bootstrapped/regtested x86_64-linux.

* gcc.c-torture/execute/alias-access-path-2.c: New testcase.
* gcc.dg/tree-ssa/alias-access-path-11.c: Xfail.
* tree-ssa-alias.c (nonoverlapping_refs_since_match_p): Do not
skip non-zero array accesses.

Index: testsuite/gcc.c-torture/execute/alias-access-path-2.c
===
--- testsuite/gcc.c-torture/execute/alias-access-path-2.c   (nonexistent)
+++ testsuite/gcc.c-torture/execute/alias-access-path-2.c   (working copy)
@@ -0,0 +1,11 @@
+int
+main (int argc, char **argv)
+{
+  int c;
+  unsigned char out[][1] = { {71}, {71}, {71} };
+
+  for (int i = 0; i < 3; i++)
+if (!out[i][0])
+  __builtin_abort ();
+  return 0;
+}
Index: testsuite/gcc.dg/tree-ssa/alias-access-path-11.c
===
--- testsuite/gcc.dg/tree-ssa/alias-access-path-11.c(revision 276935)
+++ testsuite/gcc.dg/tree-ssa/alias-access-path-11.c(working copy)
@@ -12,4 +12,4 @@ test(int i,int j)
   (*innerptr)[3][j]=11;
   return (*barptr)[i][2][j];
 }
-/* { dg-final { scan-tree-dump-times "return 10" 1 "fre3"} } */
+/* { dg-final { scan-tree-dump-times "return 10" 1 "fre3" { xfail *-*-* } } } */
Index: tree-ssa-alias.c
===
--- tree-ssa-alias.c(revision 276935)
+++ tree-ssa-alias.c(working copy)
@@ -1444,20 +1444,36 @@ nonoverlapping_refs_since_match_p (tree
  for (; narray_refs1 > narray_refs2; narray_refs1--)
{
  ref1 = component_refs1.pop ();
- /* Track whether we possibly introduced partial overlap assuming
-that innermost type sizes does not match.  This only can
-happen if the offset introduced by the ARRAY_REF
-is non-zero.  */
+
+ /* If index is non-zero we need to check whether the reference
+does not break the main invariant that bases are either
+disjoint or equal.  Consider the example:
+
+unsigned char out[][1];
+out[1]="a";
+out[i][0];
+
+Here bases out and out are same, but after removing the
+[i] index, this invariant no longer holds, because
+out[i] points to the middle of array out.
+
+TODO: If size of type of the skipped reference is an integer
+multiply of the size of type of the other reference this
+invariant can be verified, but even then it is not completely
+safe with !flag_strict_aliasing if the 

[3/3] Replace current_vector_size with vec_info::vector_size

2019-10-20 Thread Richard Sandiford
Now that all necessary routines have access to the vec_info,
it's trivial to convert current_vector_size to a member variable.
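
As a rough sketch of the idea (not the GCC code itself), the fragment below
keeps the chosen size in a per-context structure, with 0 meaning that no size
has been picked yet. The names ctx, preferred_size and pick_size, and the
sizes returned, are assumptions made for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vec_info: each analysis context carries its
   own vector size in bytes, with 0 meaning "not picked yet".  */
struct ctx
{
  size_t vector_size;
};

/* Hypothetical stand-in for a target query: the preferred vector size
   for an element of ELEM_SIZE bytes, or 0 if unsupported.  */
static size_t
preferred_size (size_t elem_size)
{
  switch (elem_size)
    {
    case 1: case 2: case 4: return 16;
    case 8: return 32;
    default: return 0;
    }
}

/* Return the vector size to use for ELEM_SIZE in context C, recording
   the first successful choice in C rather than in a global.  */
static size_t
pick_size (struct ctx *c, size_t elem_size)
{
  assert (c != NULL);
  size_t size = preferred_size (elem_size);
  if (size == 0)
    return 0;
  if (c->vector_size == 0)
    c->vector_size = size;
  return c->vector_size;
}
```

Two contexts can then settle on different sizes without interfering with each
other, which is the property that matters once several vec_infos are alive at
the same time.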


2019-10-20  Richard Sandiford  

gcc/
* tree-vectorizer.h (vec_info::vector_size): New member variable.
(vect_update_max_nunits): Update comment.
(current_vector_size): Delete.
* tree-vect-stmts.c (current_vector_size): Likewise.
(get_vectype_for_scalar_type): Use vec_info::vector_size instead
of current_vector_size.
(get_mask_type_for_scalar_type): Likewise.
* tree-vectorizer.c (try_vectorize_loop_1): Likewise.
* tree-vect-loop.c (vect_update_vf_for_slp): Likewise.
(vect_analyze_loop, vect_halve_mask_nunits): Likewise.
(vect_double_mask_nunits, vect_transform_loop): Likewise.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
(vect_make_slp_decision, vect_slp_bb_region): Likewise.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2019-10-20 14:14:33.692550581 +0100
+++ gcc/tree-vectorizer.h   2019-10-20 14:14:36.768528611 +0100
@@ -326,6 +326,10 @@ typedef std::pair vec_object
   /* Cost data used by the target cost model.  */
   void *target_cost_data;
 
+  /* The vector size for this loop in bytes, or 0 if we haven't picked
+ a size yet.  */
+  poly_uint64 vector_size;
+
 private:
   stmt_vec_info new_stmt_vec_info (gimple *stmt);
   void set_vinfo_for_stmt (gimple *, stmt_vec_info);
@@ -1472,7 +1476,7 @@ vect_get_num_copies (loop_vec_info loop_
 static inline void
 vect_update_max_nunits (poly_uint64 *max_nunits, poly_uint64 nunits)
 {
-  /* All unit counts have the form current_vector_size * X for some
+  /* All unit counts have the form vec_info::vector_size * X for some
  rational X, so two unit sizes must have a common multiple.
  Everything is a multiple of the initial value of 1.  */
   *max_nunits = force_common_multiple (*max_nunits, nunits);
@@ -1588,7 +1592,6 @@ extern dump_user_location_t find_loop_lo
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 
 /* In tree-vect-stmts.c.  */
-extern poly_uint64 current_vector_size;
 extern tree get_vectype_for_scalar_type (vec_info *, tree);
 extern tree get_vectype_for_scalar_type_and_size (tree, poly_uint64);
 extern tree get_mask_type_for_scalar_type (vec_info *, tree);
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-10-20 14:14:33.692550581 +0100
+++ gcc/tree-vect-stmts.c   2019-10-20 14:14:36.768528611 +0100
@@ -11133,22 +11133,20 @@ get_vectype_for_scalar_type_and_size (tr
   return vectype;
 }
 
-poly_uint64 current_vector_size;
-
 /* Function get_vectype_for_scalar_type.
 
Returns the vector type corresponding to SCALAR_TYPE as supported
by the target.  */
 
 tree
-get_vectype_for_scalar_type (vec_info *, tree scalar_type)
+get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
 {
   tree vectype;
   vectype = get_vectype_for_scalar_type_and_size (scalar_type,
- current_vector_size);
+ vinfo->vector_size);
   if (vectype
-  && known_eq (current_vector_size, 0U))
-current_vector_size = GET_MODE_SIZE (TYPE_MODE (vectype));
+  && known_eq (vinfo->vector_size, 0U))
+vinfo->vector_size = GET_MODE_SIZE (TYPE_MODE (vectype));
   return vectype;
 }
 
@@ -11166,7 +11164,7 @@ get_mask_type_for_scalar_type (vec_info
 return NULL;
 
   return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype),
- current_vector_size);
+ vinfo->vector_size);
 }
 
 /* Function get_same_sized_vectype
Index: gcc/tree-vectorizer.c
===
--- gcc/tree-vectorizer.c   2019-10-20 14:13:50.784857051 +0100
+++ gcc/tree-vectorizer.c   2019-10-20 14:14:36.768528611 +0100
@@ -971,7 +971,7 @@ try_vectorize_loop_1 (hash_tablevector_size.is_constant ())
dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
 "loop vectorized using %wu byte vectors\n", bytes);
   else
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2019-10-20 14:14:33.692550581 +0100
+++ gcc/tree-vect-loop.c2019-10-20 14:14:36.764528643 +0100
@@ -1414,7 +1414,7 @@ vect_update_vf_for_slp (loop_vec_info lo
dump_printf_loc (MSG_NOTE, vect_location,
 "Loop contains SLP and non-SLP stmts\n");
   /* Both the vectorization factor and unroll factor have the form
-current_vector_size * X for some rational X, so they must have
+loop_vinfo->vector_size * X for some rational X, so they must have
 a common multiple.  */
   vectorization_factor
= 

[2/3] Pass vec_infos to more routines

2019-10-20 Thread Richard Sandiford
These 11 patches just pass vec_infos to one routine each.  Splitting
them up makes it easier to write the changelogs, but they're so trivial
that it seemed better to send them all in one message.


Pass a vec_info to vect_supportable_shift

2019-10-20  Richard Sandiford  

gcc/
* tree-vectorizer.h (vect_supportable_shift): Take a vec_info.
* tree-vect-stmts.c (vect_supportable_shift): Likewise.
* tree-vect-patterns.c (vect_synth_mult_by_constant): Update call
accordingly.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2019-10-20 13:58:02.095634389 +0100
+++ gcc/tree-vectorizer.h   2019-10-20 14:14:00.632786715 +0100
@@ -1634,7 +1634,7 @@ extern void vect_get_load_cost (stmt_vec
stmt_vector_for_cost *, bool);
 extern void vect_get_store_cost (stmt_vec_info, int,
 unsigned int *, stmt_vector_for_cost *);
-extern bool vect_supportable_shift (enum tree_code, tree);
+extern bool vect_supportable_shift (vec_info *, enum tree_code, tree);
 extern tree vect_gen_perm_mask_any (tree, const vec_perm_indices &);
 extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &);
 extern void optimize_mask_stores (class loop*);
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-10-20 13:58:02.111634275 +0100
+++ gcc/tree-vect-stmts.c   2019-10-20 14:14:00.628786742 +0100
@@ -5465,7 +5465,7 @@ vectorizable_assignment (stmt_vec_info s
either as shift by a scalar or by a vector.  */
 
 bool
-vect_supportable_shift (enum tree_code code, tree scalar_type)
+vect_supportable_shift (vec_info *, enum tree_code code, tree scalar_type)
 {
 
   machine_mode vec_mode;
Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c2019-10-17 14:22:55.519309037 +0100
+++ gcc/tree-vect-patterns.c2019-10-20 14:14:00.628786742 +0100
@@ -2720,6 +2720,7 @@ apply_binop_and_append_stmt (tree_code c
 vect_synth_mult_by_constant (tree op, tree val,
 stmt_vec_info stmt_vinfo)
 {
+  vec_info *vinfo = stmt_vinfo->vinfo;
   tree itype = TREE_TYPE (op);
   machine_mode mode = TYPE_MODE (itype);
   struct algorithm alg;
@@ -2738,7 +2739,7 @@ vect_synth_mult_by_constant (tree op, tr
 
   /* Targets that don't support vector shifts but support vector additions
  can synthesize shifts that way.  */
-  bool synth_shift_p = !vect_supportable_shift (LSHIFT_EXPR, multtype);
+  bool synth_shift_p = !vect_supportable_shift (vinfo, LSHIFT_EXPR, multtype);
 
   HOST_WIDE_INT hwval = tree_to_shwi (val);
   /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.


Pass a vec_info to vect_supportable_direct_optab_p

2019-10-20  Richard Sandiford  

gcc/
* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
a vec_info.
(vect_recog_dot_prod_pattern): Update call accordingly.
(vect_recog_sad_pattern, vect_recog_pow_pattern): Likewise.
(vect_recog_widen_sum_pattern): Likewise.

Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c2019-10-20 14:14:00.628786742 +0100
+++ gcc/tree-vect-patterns.c2019-10-20 14:14:03.588765602 +0100
@@ -187,7 +187,7 @@ vect_get_external_def_edge (vec_info *vi
is nonnull.  */
 
 static bool
-vect_supportable_direct_optab_p (tree otype, tree_code code,
+vect_supportable_direct_optab_p (vec_info *, tree otype, tree_code code,
 tree itype, tree *vecotype_out,
 tree *vecitype_out = NULL)
 {
@@ -985,7 +985,7 @@ vect_recog_dot_prod_pattern (stmt_vec_in
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
-  if (!vect_supportable_direct_optab_p (type, DOT_PROD_EXPR, half_type,
+  if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
type_out, &half_vectype))
 return NULL;
 
@@ -1143,7 +1143,7 @@ vect_recog_sad_pattern (stmt_vec_info st
   vect_pattern_detected ("vect_recog_sad_pattern", last_stmt);
 
   tree half_vectype;
-  if (!vect_supportable_direct_optab_p (sum_type, SAD_EXPR, half_type,
+  if (!vect_supportable_direct_optab_p (vinfo, sum_type, SAD_EXPR, half_type,
type_out, &half_vectype))
 return NULL;
 
@@ -1273,6 +1273,7 @@ vect_recog_widen_mult_pattern (stmt_vec_
 static gimple *
 vect_recog_pow_pattern (stmt_vec_info stmt_vinfo, tree *type_out)
 {
+  vec_info *vinfo = stmt_vinfo->vinfo;
   gimple *last_stmt = stmt_vinfo->stmt;
   tree base, exp;
   gimple *stmt;
@@ -1366,7 +1367,7 @@ vect_recog_pow_pattern (stmt_vec_info st
   || (TREE_CODE (exp) == REAL_CST
   && real_equal (_REAL_CST (exp), 

Re: [Darwin, testsuite, committed] Fix Wnonnull on Darwin.

2019-10-20 Thread Iain Sandoe
Martin Sebor  wrote:

> On 10/19/19 2:56 AM, Iain Sandoe wrote:
>> Andreas Schwab  wrote:
>>> On Okt 19 2019, Iain Sandoe  wrote:
>>> 
>>>> This test has failed always on Darwin, because Darwin does not mark
>>>> entries in string.h with nonnull attributes.  Since the purpose of the test
>>>> is to check that the warnings are issued for an inlined function, not that
>>>> the target headers are marked up, we can provide locally marked up
>>>> function declarations for Darwin.
>>> 
>>> If the test depends on the non-std declarations, then it should use them
>>> everywhere.
>> from my perspective, agreed, Martin?
> 
> I don't see a problem with it.  I prefer tests that don't depend
> on system headers to avoid these kinds of issues

We can do that anyway then: I can just adjust the current code to remove the
special-casing, and to use __SIZE_TYPE__ instead of size_t everywhere, OK?

> That said, it shouldn't matter if the declaration of a built-in
> function has the nonnull attribute.  As long as the built-in has
> one and isn't disabled GCC should use it.  I'd be curious to know
> what is actually preventing the warning from triggering there.

This is a secondary problem: the Darwin header implementation expands the
memcpy to:
__builtin___memcpy_chk (dst, src,  __builtin_object_size (dst),  len);

Which, obviously, isn’t doing what you expect.

So, this does suggest that
a) for the test to be well-controlled, we need to avoid system headers
b) there might be some other builtins to check.

(but if we restrict the discussion to my understood purpose of the Wnonnull
 test, only (a) matters and that’s what’s proposed above).
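
For illustration, a header-independent declaration of the kind discussed in
(a) could look like the sketch below. The function name copy_bytes is
hypothetical and is not what the test uses; the nonnull attribute and
__SIZE_TYPE__ are GCC features:

```c
#include <assert.h>

/* Hypothetical local declaration, independent of any system header.
   __SIZE_TYPE__ is the compiler-provided spelling of the type of size_t,
   and the attribute marks arguments 1 and 2 as non-null.  */
extern void *copy_bytes (void *dst, const void *src, __SIZE_TYPE__ n)
  __attribute__ ((nonnull (1, 2)));

/* Plain byte-wise copy so the example is self-contained.  */
void *
copy_bytes (void *dst, const void *src, __SIZE_TYPE__ n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  for (__SIZE_TYPE__ i = 0; i < n; i++)
    d[i] = s[i];
  return dst;
}
```

Because the attribute sits on a declaration the test controls, the result no
longer depends on how a target's string.h is marked up.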

thanks.
Iain




[1/3] Avoid setting current_vector_size in get_vec_alignment_for_array_type

2019-10-20 Thread Richard Sandiford
The increase_alignment pass was using get_vectype_for_scalar_type
to get the preferred vector type for each array element type.
This has the effect of carrying over the vector size chosen by
the first successful call to all subsequent calls, whereas it seems
more natural to treat each array type independently and pick the
"best" vector type for each element type.


2019-10-20  Richard Sandiford  

gcc/
* tree-vectorizer.c (get_vec_alignment_for_array_type): Use
get_vectype_for_scalar_type_and_size instead of
get_vectype_for_scalar_type.
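
The intent described at the top of this message can be sketched roughly as
follows. This is a simplified model under stated assumptions, not the GCC
implementation: best_vector_size and array_vector_alignment are hypothetical
names, and the sizes returned are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical target query: best vector size in bytes for an element of
   ELEM_SIZE bytes when the caller imposes no size of its own.  */
static size_t
best_vector_size (size_t elem_size)
{
  switch (elem_size)
    {
    case 1: case 2: case 4: return 16;
    case 8: return 32;
    default: return 0;
    }
}

/* Return the alignment in bytes worth giving an array of ARRAY_BYTES bytes
   with ELEM_SIZE-byte elements, or 0 if no increase is useful.  Each call
   asks for the best size afresh, so the answer does not depend on which
   array happened to be looked at first.  */
static size_t
array_vector_alignment (size_t elem_size, size_t array_bytes)
{
  size_t vsize = best_vector_size (elem_size);
  if (vsize == 0 || array_bytes < vsize)
    return 0;
  return vsize;
}
```

In this model the result for one array is the same no matter what order the
arrays are visited in.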

Index: gcc/tree-vectorizer.c
===
--- gcc/tree-vectorizer.c   2019-10-20 13:58:02.091634417 +0100
+++ gcc/tree-vectorizer.c   2019-10-20 14:13:50.784857051 +0100
@@ -1347,7 +1347,8 @@ get_vec_alignment_for_array_type (tree t
   gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
   poly_uint64 array_size, vector_size;
 
-  tree vectype = get_vectype_for_scalar_type (strip_array_types (type));
+  tree scalar_type = strip_array_types (type);
+  tree vectype = get_vectype_for_scalar_type_and_size (scalar_type, 0);
   if (!vectype
   || !poly_int_tree_p (TYPE_SIZE (type), &array_size)
   || !poly_int_tree_p (TYPE_SIZE (vectype), &vector_size)


[0/3] Turn current_vector_size into a vec_info field

2019-10-20 Thread Richard Sandiford
Now that we're keeping multiple vec_infos around at the same time,
it seemed worth turning current_vector_size into a vec_info field.
This for example simplifies the book-keeping in vect_analyze_loop
and helps with some follow-on changes.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Richard


Re: Move code out of vect_slp_analyze_bb_1

2019-10-20 Thread Richard Sandiford
Richard Biener  writes:
> On October 19, 2019 5:06:40 PM GMT+02:00, Richard Sandiford 
>  wrote:
>>After the previous patch, it seems more natural to apply the
>>PARAM_SLP_MAX_INSNS_IN_BB threshold as soon as we know what
>>the region is, rather than delaying it to vect_slp_analyze_bb_1.
>>(But rather than carve out the biggest region possible and then
>>reject it, wouldn't it be better to stop when the region gets
>>too big, so we at least have a chance of vectorising something?)
>>
>>It also seems more natural for vect_slp_bb_region to create the
>>bb_vec_info itself rather than (a) having to pass bits of data down
>>for the initialisation and (b) forcing vect_slp_analyze_bb_1 to free
>>on every failure return.
>>
>>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Ok. But I wonder what the reason was for this limit? Dependency analysis was 
> greatly simplified, being no longer quadratic here. Can you check where the 
> limit originally came from? But indeed splitting the region makes more sense 
> then, but at dataref group boundaries I'd say. 

Yeah, looks like it was the complexity of dependence analysis:

  https://gcc.gnu.org/ml/gcc-patches/2009-05/msg01303.html

  > Is there any limit on the size of the BB you consider for
  > vectorization?  I see we do compute_all_dependences on it - that
  > might be quite costly.

  I added slp-max-insns-in-bb parameter with initial value 1000.

Thanks,
Richard


Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-20 Thread Ramana Radhakrishnan
On Fri, Oct 18, 2019 at 8:49 PM Richard Earnshaw
 wrote:
>
>
> This series of patches rewrites all the DImode arithmetic patterns for
> the Arm backend when compiling for Arm or Thumb2 to split the
> operations during expand (the thumb1 code is unchanged and cannot
> benefit from early splitting as we are unable to expose the carry
> flag).
>
> This has a number of benefits:
>  - register allocation has more freedom to use independent
>registers for the upper and lower halves of the register
>  - we can make better use of combine for spotting insn merge
>opportunities without needing many additional patterns that are
>only used for DImode
>  - we eliminate a number of bugs in the machine description where
>the carry calculations were not correctly propagated by the
>split patterns (we mostly got away with this because the
>splitting previously happened only after most of the important
>optimization passes had been run).
>
> The patch series starts by paring back all the DImode arithmetic
> support to a very simple form without any splitting at all and then
> progressively re-implementing the patterns with early split
> operations.  This proved to be the only sane way of untangling the
> existing code due to a number of latent bugs which would have been
> exposed if a different approach had been taken.
>
> Each patch should produce a working compiler (it did when it was
> originally written), though since the patch set has been re-ordered
> slightly there is a possibility that some of the intermediate steps
> may have missing test updates that are only cleaned up later.
> However, only the end of the series should be considered complete.
> I've kept the patch as a series to permit easier regression hunting
> should that prove necessary.

Yay! It's quite nice to see this go in.

Ramana


>
> R.
>
> Richard Earnshaw (29):
>   [arm] Rip out DImode addition and subtraction splits.
>   [arm] Perform early splitting of adddi3.
>   [arm] Early split zero- and sign-extension
>   [arm] Rewrite addsi3_carryin_shift_ in canonical form
>   [arm] fix constraints on addsi3_carryin_alt2
>   [arm] Early split subdi3
>   [arm] Remove redundant DImode subtract patterns
>   [arm] Introduce arm_carry_operation
>   [arm] Correctly cost addition with a carry-in
>   [arm] Correct cost calculations involving borrow for subtracts.
>   [arm] Reduce cost of insns that are simple reg-reg moves.
>   [arm] Implement negscc using SBC when appropriate.
>   [arm] Add alternative canonicalizations for subtract-with-carry +
> shift
>   [arm] Early split simple DImode equality comparisons
>   [arm] Improve handling of DImode comparisions against constants.
>   [arm] early split most DImode comparison operations.
>   [arm] Handle some constant comparisons using rsbs+rscs
>   [arm] Cleanup dead code - old support for DImode comparisons
>   [arm] Handle immediate values in uaddvsi4
>   [arm] Early expansion of uaddvdi4.
>   [arm] Improve code generation for addvsi4.
>   [arm] Allow the summation result of signed add-with-overflow to be
> discarded.
>   [arm] Early split addvdi4
>   [arm] Improve constant handling for usubvsi4.
>   [arm] Early expansion of usubvdi4.
>   [arm] Improve constant handling for subvsi4.
>   [arm] Early expansion of subvdi4
>   [arm] Improvements to negvsi4 and negvdi4.
>   [arm] Fix testsuite nit when compiling for thumb2
>
>  gcc/config/arm/arm-modes.def  |   19 +-
>  gcc/config/arm/arm-protos.h   |1 +
>  gcc/config/arm/arm.c  |  598 -
>  gcc/config/arm/arm.md | 2020 ++---
>  gcc/config/arm/iterators.md   |   15 +-
>  gcc/config/arm/predicates.md  |   29 +-
>  gcc/config/arm/thumb2.md  |8 +-
>  .../gcc.dg/builtin-arith-overflow-3.c |   41 +
>  gcc/testsuite/gcc.target/arm/negdi-3.c|4 +-
>  9 files changed, 1757 insertions(+), 978 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-arith-overflow-3.c
>


[PATCH] Fix (hypothetical) problem with pre-reload splitters (PR target/92140)

2019-10-20 Thread Jakub Jelinek
Hi!

As mentioned in the PR, the x86 backend has various define_insn_and_split
patterns that are usually meant to match during combine and are then
unconditionally split during the split1 pass.  As they have
&& can_create_pseudo_p () in their define_insn condition, if they get matched
after split1, nothing would split them anymore and they wouldn't match after
reload.

The split1 pass already sets a property that can be used.

I've first tried to remove some constraints and associated attributes, but
it seems from further discussions in the PR I don't know much about the
reasons why they were added and if they are still needed or not, so this
version of the patch just replaces the can_create_pseudo_p () conditions
with a new predicate that stops matching already after the split1 pass.
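
To make the difference between the two conditions concrete, here is a small
stand-alone model. It is only a sketch: the property bits, struct func_state
and both function names are invented for the example and are not GCC
interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical property bits standing in for the properties a function
   accumulates as passes run.  */
enum { PROP_SPLIT_DONE = 1 << 0, PROP_RELOAD_DONE = 1 << 1 };

struct func_state
{
  unsigned int properties;
};

/* Stand-in for the old condition: true until register allocation.  */
static bool
can_create_pseudo (const struct func_state *f)
{
  return !(f->properties & PROP_RELOAD_DONE);
}

/* Stand-in for the new condition: true only while pseudos can still be
   created and the first split pass has not run yet.  */
static bool
pre_split_only (const struct func_state *f)
{
  return can_create_pseudo (f) && !(f->properties & PROP_SPLIT_DONE);
}
```

Between the split pass and reload the old condition still holds while the new
one does not, which is the window the patch closes.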

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-10-20  Jakub Jelinek  

* config/i386/i386-protos.h (ix86_pre_reload_split): Declare.
* config/i386/i386.c (ix86_pre_reload_split): New function.
* config/i386/i386.md (*fix_trunc_i387_1, *add3_eq,
*add3_ne, *add3_eq_0, *add3_ne_0, *add3_eq,
*add3_ne, *add3_eq_1, *add3_eq_0, *add3_ne_0,
*anddi3_doubleword, *andndi3_doubleword, *di3_doubleword,
*one_cmpldi2_doubleword, *ashl3_doubleword_mask,
*ashl3_doubleword_mask_1, *ashl3_mask, *ashl3_mask_1,
*3_mask, *3_mask_1,
*3_doubleword_mask,
*3_doubleword_mask_1, *3_mask,
*3_mask_1, *_mask, *_mask_1,
*btr_mask, *btr_mask_1, *jcc_bt, *jcc_bt_1,
*jcc_bt_mask, *popcounthi2_1, frndintxf2_,
*fist2__1, *3_1, *di3_doubleword):
Use ix86_pre_reload_split instead of can_create_pseudo_p in condition.
* config/i386/sse.md (*sse4_1_v8qiv8hi2_2,
*avx2_v8qiv8si2_2,
*sse4_1_v4qiv4si2_2,
*sse4_1_v4hiv4si2_2,
*avx512f_v8qiv8di2_2,
*avx2_v4qiv4di2_2, *avx2_v4hiv4di2_2,
*sse4_1_v2hiv2di2_2,
*sse4_1_v2siv2di2_2, sse4_2_pcmpestr,
sse4_2_pcmpistr): Likewise.

--- gcc/config/i386/i386-protos.h.jj2019-10-19 14:45:47.693185643 +0200
+++ gcc/config/i386/i386-protos.h   2019-10-19 19:08:12.371359926 +0200
@@ -55,6 +55,7 @@ extern rtx standard_80387_constant_rtx (
 extern int standard_sse_constant_p (rtx, machine_mode);
 extern const char *standard_sse_constant_opcode (rtx_insn *, rtx *);
 extern bool ix86_standard_x87sse_constant_load_p (const rtx_insn *, rtx);
+extern bool ix86_pre_reload_split (void);
 extern bool symbolic_reference_mentioned_p (rtx);
 extern bool extended_reg_mentioned_p (rtx);
 extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
--- gcc/config/i386/i386.c.jj   2019-10-19 14:45:47.729185094 +0200
+++ gcc/config/i386/i386.c  2019-10-19 19:08:12.376359849 +0200
@@ -4894,6 +4894,18 @@ ix86_standard_x87sse_constant_load_p (co
   return true;
 }
 
+/* Predicate for pre-reload splitters with associated instructions,
+   which can match any time before the split1 pass (usually combine),
+   then are unconditionally split in that pass and should not be
+   matched again afterwards.  */
+
+bool
+ix86_pre_reload_split (void)
+{
+  return (can_create_pseudo_p ()
+ && !(cfun->curr_properties & PROP_rtl_split_insns));
+}
+
 /* Returns true if OP contains a symbol reference */
 
 bool
--- gcc/config/i386/i386.md.jj  2019-10-19 14:46:15.489760948 +0200
+++ gcc/config/i386/i386.md 2019-10-19 19:08:12.381359773 +0200
@@ -4920,7 +4920,7 @@ (define_insn_and_split "*fix_trunc
&& !TARGET_FISTTP
&& !(SSE_FLOAT_MODE_P (GET_MODE (operands[1]))
 && (TARGET_64BIT || mode != DImode))
-   && can_create_pseudo_p ()"
+   && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(const_int 0)]
@@ -6857,7 +6857,7 @@ (define_insn_and_split "*add3_eq"
  (match_operand:SWI 2 "")))
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, mode, operands)
-   && can_create_pseudo_p ()"
+   && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (reg:CC FLAGS_REG)
@@ -6881,7 +6881,7 @@ (define_insn_and_split "*add3_ne"
&& (mode != DImode
|| INTVAL (operands[2]) != HOST_WIDE_INT_C (-0x8000))
&& ix86_binary_operator_ok (PLUS, mode, operands)
-   && can_create_pseudo_p ()"
+   && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (reg:CC FLAGS_REG)
@@ -6904,7 +6904,7 @@ (define_insn_and_split "*add3_eq_0
  (match_operand:SWI 1 "")))
(clobber (reg:CC FLAGS_REG))]
   "ix86_unary_operator_ok (PLUS, mode, operands)
-   && can_create_pseudo_p ()"
+   && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (reg:CC FLAGS_REG)
@@ -6925,7 +6925,7 @@ (define_insn_and_split "*add3_ne_0
  (match_operand:SWI 1 "")))
(clobber (reg:CC FLAGS_REG))]
   "ix86_unary_operator_ok (PLUS, mode, operands)
-   && can_create_pseudo_p ()"
+   && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (reg:CC FLAGS_REG)
@@ -6951,7 +6951,7 @@ (define_insn_and_split "*sub3_eq"
  

Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-20 Thread Richard Earnshaw (lists)
On 19/10/2019 17:31, Segher Boessenkool wrote:
> Hi Richard,
> 
> On Fri, Oct 18, 2019 at 08:48:31PM +0100, Richard Earnshaw wrote:
>>
>> This series of patches rewrites all the DImode arithmetic patterns for
>> the Arm backend when compiling for Arm or Thumb2 to split the
>> operations during expand (the thumb1 code is unchanged and cannot
>> benefit from early splitting as we are unable to expose the carry
>> flag).
> 
> Very nice :-)
> 
> I have a bunch of testcases from when I did something similar for PowerPC
> that I wanted to test...  But I cannot get your series to apply.  Do you
> have a git repo I can pull from?
> 

Perhaps because it's already committed to trunk?


> Here is one test case (it's a bit geared towards what our ISA can do):
> 
> ===
> typedef unsigned int u32;
> typedef unsigned long long u64;
> 
> u64 add(u64 a, u64 b) { return a + b; }
> u64 add1(u64 a) { return a + 1; }
> u64 add42(u64 a) { return a + 42; }
> u64 addm1(u64 a) { return a - 1; }
> u64 addff(u64 a) { return a + 0xULL; }
> u64 addH(u64 a) { return a + 0x12345678ULL; }
> u64 addH0(u64 a) { return a + 0x1234ULL; }
> u64 addS(u64 a, u32 b) { return a + b; }
> u64 addSH(u64 a, u32 b) { return a + ((u64)b << 32); }
> u64 addB1(u64 a) { return a + 0x1ULL; }
> u64 addB8(u64 a) { return a + 0x8ULL; }
> 
> u64 addSH42(u64 a, u32 b) { return a + ((u64)b << 32) + 42; }
> u64 addSHm1(u64 a, u32 b) { return a + ((u64)b << 32) - 1; }
> u64 addSHff(u64 a, u32 b) { return a + ((u64)b << 32) + 0xULL; }
> ===
> 
> rs6000 -m32 currently has non-optimal code for addm1, addSHm1; trunk arm
> has non-optimal code for addH0, addSH, addB1, addB8, addSH42, addSHm1, and
> addSHff if I understand well enough.  So I'd love to see what it does with
> your series applied :-)
> 
> 
> Segher
> 

We do pretty well on this.  Only addSHm1 needs three insns (except where
the constant isn't valid for arm), and I think that's the minimum for
this case anyway.  Several of the tests only need one insn.
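
For readers less familiar with the two-insn sequences in the output, the
following C sketch models what an adds/adc pair computes. It is an
illustration only; the function name add64_by_halves is made up and nothing
here is taken from the patch series.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 64-bit values using only 32-bit operations: the low halves are
   added first (like ADDS), and the carry out of that addition is fed into
   the addition of the high halves (like ADC).  */
static uint64_t
add64_by_halves (uint64_t a, uint64_t b)
{
  uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
  uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

  uint32_t lo = a_lo + b_lo;
  /* The low addition wrapped if and only if the result is below an input.  */
  uint32_t carry = lo < a_lo;
  uint32_t hi = a_hi + b_hi + carry;

  return ((uint64_t) hi << 32) | lo;
}
```

Splitting early exposes the two halves as separate 32-bit operations, so each
half can be allocated and optimized independently.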

R.
.arch armv8-a
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file   "lltest.c"
.text
.align  2
.global add
.syntax unified
.arm
.fpu softvfp
.type   add, %function
add:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
adds r0, r0, r2
adc r1, r1, r3
bx  lr
.size   add, .-add
.align  2
.global add1
.syntax unified
.arm
.fpu softvfp
.type   add1, %function
add1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
adds r0, r0, #1
adc r1, r1, #0
bx  lr
.size   add1, .-add1
.align  2
.global add42
.syntax unified
.arm
.fpu softvfp
.type   add42, %function
add42:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
adds r0, r0, #42
adc r1, r1, #0
bx  lr
.size   add42, .-add42
.align  2
.global addm1
.syntax unified
.arm
.fpu softvfp
.type   addm1, %function
addm1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
subs r0, r0, #1
sbc r1, r1, #0
bx  lr
.size   addm1, .-addm1
.align  2
.global addff
.syntax unified
.arm
.fpu softvfp
.type   addff, %function
addff:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
subs r0, r0, #1
adc r1, r1, #0
bx  lr
.size   addff, .-addff
.align  2
.global addH
.syntax unified
.arm
.fpu softvfp
.type   addH, %function
addH:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
movw r3, #22136
adds r0, r0, r3
movw r3, #4660
adc r1, r3, r1
bx  lr
.size   addH, .-addH
.align  2
.global addH0
.syntax unified
.arm
.fpu softvfp
.type   addH0, %function
addH0:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
add r1, r1, #4608
add r1, r1,