Re: [0/67] Add wrapper classes for machine_modes

2017-05-04 Thread Jeff Law

On 12/09/2016 05:48 AM, Richard Sandiford wrote:

This series includes most of the changes in group C from:

 https://gcc.gnu.org/ml/gcc/2016-11/msg00033.html

The idea is to add wrapper classes around machine_mode_enum
for specific groups of modes, such as scalar integers, scalar floats,
complex values, etc.  This has two main benefits: one specific to SVE
and one not.

The SVE-specific benefit is that it helps to introduce the concept
of variable-length vectors.  To do that we need to change the size
of a vector mode from being a known compile-time constant to being
(possibly) a run-time invariant.  We then need to do the same for
unconstrained machine_modes, which might or might not be vectors.
Introducing these new constrained types means that we can continue
to treat them as having a constant size.

The other benefit is that it uses static type checking to enforce
conditions that are easily forgotten otherwise.  The most common
sources of problems seem to be:

(a) using VOIDmode or BLKmode where a scalar integer was expected
 (e.g. when getting the number of bits in the value).

(b) simplifying vector operations in ways that only make sense for
 scalars.

The series helps with both of these, although we don't get the full
benefit of (b) until variable-sized modes are introduced.
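
A minimal sketch of the static-checking idea in (a), not the series' actual code: the enum values, the size table and the class body below are assumptions made for illustration. The point is only that a function taking the wrapper type cannot be handed VOIDmode or BLKmode without going through a checked conversion.

```cpp
#include <cassert>

// Hypothetical stand-in for the mode enum; only a few values are modelled.
enum machine_mode_enum { VOIDmode, BLKmode, SImode, DImode, SFmode, V4SImode };

// Illustrative sizes in bits, indexed by the enum above (0 = no scalar size).
static const unsigned short mode_bits[] = { 0, 0, 32, 64, 32, 128 };

inline bool
is_scalar_int (machine_mode_enum m)
{
  return m == SImode || m == DImode;
}

// Wrapper that can only ever hold a scalar integer mode.
class scalar_int_mode
{
public:
  scalar_int_mode () : m_mode (SImode) {}

  // Checked conversion: the only way to get from a raw enum to the wrapper.
  static bool
  try_from (machine_mode_enum m, scalar_int_mode *out)
  {
    if (!is_scalar_int (m))
      return false;
    *out = scalar_int_mode (m);
    return true;
  }

  machine_mode_enum raw () const { return m_mode; }

private:
  // Private and explicit, so a raw enum never converts implicitly.
  explicit scalar_int_mode (machine_mode_enum m) : m_mode (m) {}
  machine_mode_enum m_mode;
};

// Accepts only the wrapper; int_mode_bits (VOIDmode) does not compile.
inline unsigned
int_mode_bits (scalar_int_mode m)
{
  return mode_bits[m.raw ()];
}
```

With this shape, the "number of bits in VOIDmode" mistake becomes a compile-time error at the call site rather than a run-time surprise.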

I know of three specific cases in which the static type checking
forced fixes for things that turned out to be real bugs (although
we didn't know that at the time, otherwise we'd have posted patches).
They were later fixed for trunk by:

   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01783.html
   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02983.html
   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02896.html

The group C patches in ARM/sve-branch did slow compile time down a little.
I've since taken steps to avoid that:

- Make the tailcall pass handle aggregate parameters and return values
   (already in trunk).

- Turn some of the new wrapper functions into inline functions.

- Make all the machmode.h macros that used:

 __builtin_constant_p (M) ? foo_inline (M) : foo_array[M]

   forward to an ALWAYS_INLINE function, so that (a) M is only evaluated
   once and (b) __builtin_constant_p is applied to a variable, and so is
   deferred until later passes.  This helped the optimisation to fire in
   more cases and to continue firing when M is a class rather than a
   raw enum.

- In a similar vein, make sure that conditions like:

  SImode == DImode

   are treated as builtin_constant_p by gencondmd, so that .md patterns
   with those conditions are dropped.
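
The machmode.h macro change in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the real header: mode_enum, mode_size_inline, mode_size_array and fetch_mode are hypothetical names, and the example relies on the GCC/Clang extensions __builtin_constant_p and __attribute__ ((always_inline)).

```cpp
#include <cassert>

// Hypothetical modes and table, for illustration only.
enum mode_enum { QImode, HImode, SImode, DImode, NUM_MODES };

static const unsigned char mode_size_array[NUM_MODES] = { 1, 2, 4, 8 };

// Version that folds to a constant when the argument is a known constant.
inline unsigned
mode_size_inline (mode_enum m)
{
  switch (m)
    {
    case QImode: return 1;
    case HImode: return 2;
    case SImode: return 4;
    case DImode: return 8;
    default: return 0;
    }
}

#define ALWAYS_INLINE inline __attribute__ ((always_inline))

// The macro argument is bound to the parameter M once, and
// __builtin_constant_p is applied to that variable rather than to the
// original macro argument expression.
ALWAYS_INLINE unsigned
get_mode_size (mode_enum m)
{
  return __builtin_constant_p (m) ? mode_size_inline (m) : mode_size_array[m];
}

// The macro now just forwards, so M is evaluated exactly once.
#define GET_MODE_SIZE(M) (get_mode_size (M))

// Helper with a side effect, used to observe how often M is evaluated.
static int fetch_count = 0;

inline mode_enum
fetch_mode (mode_enum m)
{
  ++fetch_count;
  return m;
}
```

Calling GET_MODE_SIZE (fetch_mode (DImode)) increments fetch_count once, whereas the older form of the macro would mention M more than once in its expansion.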

With these changes the series is actually a very slight compile-time win.
That might seem unlikely, but there are several possible reasons:

1. The machmode.h macro change above might allow more constant folding.

2. The series has a tendency to evaluate modes once, rather than
continually fetching them from (sometimes quite deep) rtx nests.
Refetching a mode is a particular problem if a call comes between
two uses, since the compiler then has to re-evaluate the whole thing.

3. The series introduces many uses of new SCALAR_*TYPE_MODE macros,
as alternatives to TYPE_MODE.  The new macros avoid the usual:

  (VECTOR_TYPE_P (TYPE_CHECK (NODE)) \
   ? vector_type_mode (NODE) : (NODE)->type_common.mode)

and become direct field accesses in release builds.

VECTOR_TYPE_P would be consistently false for these uses,
but call-clobbered registers would usually be treated as clobbered
by the condition as a whole.

Maybe (3) is the most likely reason.

I tested this by compiling the testsuite for:

 aarch64-linux-gnu alpha-linux-gnu arc-elf arm-linux-gnueabi
 arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf
 epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf
 hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
 i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
 m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu
 mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
 nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnuspe
 powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf
 rx-elf s390-linux-gnu s390x-linux-gnu sh-linux-gnu sparc-linux-gnu
 sparc64-linux-gnu sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf
 xstormy16-elf v850-elf vax-netbsdelf visium-elf x86_64-darwin
 x86_64-linux-gnu xtensa-elf

and checking that there were no changes in assembly.  Also tested
in the normal way on aarch64-linux-gnu and x86_64-linux-gnu.

The series depends on the already-posted:

   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01657.html

So can we get the discussion around the prerequisite restarted -- I like
the core ideas around building wrapper classes around machine modes, but 
obviously we can't really move forward on this without the prereqs.


jeff



Re: [PATCH] disable -Walloc-size-larger-than and -Wstringop-overflow for non-C front ends (PR 80545)

2017-05-04 Thread Jeff Law

On 04/28/2017 04:02 PM, Martin Sebor wrote:

The two options were included in -Wall and enabled for all front
ends but only made to be recognized by the driver for the C family
of compilers.  That made it impossible to suppress those warnings
when compiling code for those other front ends (like Fortran).

The attached patch adjusts the warnings so that they are only
enabled for the C family of front ends and not for any others,
as per Richard's suggestion.  (The other solution would have
been to make the warnings available to all front ends.  Since
non-C languages don't have a way of calling the affected
functions -- or do they? -- this is probably not necessary.)

Martin

gcc-80545.diff


PR driver/80545 - option -Wstringop-overflow not recognized by Fortran

gcc/c-family/ChangeLog:

PR driver/80545
* c.opt (-Walloc-size-larger-than, -Wstringop-overflow): Enable
and make available for the C family only.

OK.
jeff


Re: [PATCH] handling address mode changes inside extract_bit_field

2017-05-04 Thread Jeff Law

On 03/01/2017 03:06 PM, Jim Wilson wrote:

This is a proposed patch for the bug 79794 which I just submitted.
This isn't a regression, so this can wait for after the gcc 7 branch
if necessary.

The problem here is that a reg+offset MEM target is passed to
extract_bit_field with a vector register source.  On aarch64, we have
an instruction for this, but it accepts a reg address only, so the
address gets loaded into a reg inside extract_bit_field.  We then
return to expand_expr which does
   ! rtx_equal_p (temp, target)
which fails because of the address mode change, so we end up copying
target into a reg and then back to itself.

expand_expr has a solution for this problem.  There is an alt_rtl
variable that can be set when temp is logically the same as target.
This variable is currently not passed into extract_bit_field.  This
patch does that.

There is an additional complication that the actual address load into
a reg occurs inside maybe_expand_insn, and it doesn't seem reasonable
to pass alt_rtl into that.  However, I can grab a bit from the
expand_operand structure to indicate when an operand is the target,
and then clear it if target is replaced with a reg.
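
The flag mechanism described above can be modelled in a few lines. This is a simplified sketch under assumptions, not the patch itself: rtx values are represented as strings, and the operand struct, legitimize and expand_into are hypothetical names standing in for expand_operand, maybe_legitimize_operand and the caller in extract_bit_field_1.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for an expansion operand.
struct operand
{
  std::string rtx;  // textual stand-in for an rtx
  bool target;      // true while this operand still denotes the caller's target
};

// Model of legitimization.  Reloading the address into a register keeps the
// operand logically equal to the target; replacing the operand with a fresh
// register does not, so the flag is cleared in that case.
inline void
legitimize (operand &op, bool insn_accepts_mem)
{
  if (!insn_accepts_mem)
    {
      op.rtx = "(reg tmp)";
      op.target = false;
    }
  else if (op.rtx.find ("plus") != std::string::npos)
    op.rtx = "(mem (reg addr))";
}

// Returns true and records TARGET in *ALT_RTL when the expanded result is
// still logically the target, even if it is no longer textually equal to it.
inline bool
expand_into (const std::string &target, bool insn_accepts_mem,
             std::string *alt_rtl)
{
  operand op = { target, true };
  legitimize (op, insn_accepts_mem);
  if (op.target)
    {
      *alt_rtl = target;
      return true;
    }
  return false;
}
```

A caller that sees alt_rtl set can skip the copy back into the target, which is the redundant move the description above is trying to avoid.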

The resulting patch works, but ends up a bit more invasive than I
hoped.  The patch has passed a bootstrap and make check test on x86_64
and aarch64.

Jim


alt-rtl.patch


Proposed patch for RTL expand bug affecting aarch64 vector code.

PR middle-end/79794
* expmed.c (extract_bit_field_1): Add alt_rtl argument.  Before
maybe_expand_insn call, set ops[0].target.  If still set after call,
set alt_rtl.  Add extra arg to recursive calls.
(extract_bit_field): Add alt_rtl argument.  Pass to
extract_bit_field_1.
* expmed.h (extract_bit_field): Fix prototype.
* expr.c (emit_group_load_1, copy_blkmode_from_reg)
(copy_blkmode_to_reg, read_complex_part, store_field): Pass extra NULL
to extract_bit_field calls.
(expand_expr_real_1): Pass alt_rtl to expand_expr_real instead of 0.
Pass alt_rtl to extract_bit_field calls.
* calls.c (store_unaligned_arguments_into_pseudos)
(load_register_parameters): Pass extra NULL to extract_bit_field calls.
* optabs.c (maybe_legitimize_operand): Clear op->target when calling
gen_reg_rtx.
* optabs.h (struct expand_operand): Add target bitfield.

The only part I found intrusive was the op->target stuff, but it wasn't
terrible.


The additional argument creates visual clutter in the diffs as the 
callers get updated, but that's pretty easy to filter out.


This seems fine to me.  A testcase to add to the gcc.target testsuite 
would be useful, but I don't think it's strictly necessary.




jeff


Re: [PATCH] prevent -Wno-system-headers from suppressing -Wstringop-overflow (PR 79214)

2017-05-04 Thread Jeff Law

On 05/04/2017 03:09 PM, Martin Sebor wrote:

On 05/04/2017 01:17 PM, Jeff Law wrote:

On 01/25/2017 02:12 PM, Martin Sebor wrote:

While putting together examples for the GCC 7 changes document
I noticed that a few of the buffer overflow warnings issued by
-Wstringop-overflow are defeated by Glibc's macros for string
manipulation functions like strncat and strncpy.

While testing my fix I also noticed that I had missed a couple
of functions when implementing the warning: memmove and stpcpy.

The attached patch adds handlers for those and fixes the three
bugs below I raised for these omissions.

Is this patch okay for trunk?

PR preprocessor/79214 -  -Wno-system-header defeats strncat buffer
   overflow warnings
PR middle-end/79222 - missing -Wstringop-overflow= on a stpcpy overflow
PR middle-end/79223 - missing -Wstringop-overflow on a memmove overflow

Martin

gcc-79214.diff


PR preprocessor/79214 -  -Wno-system-header defeats strncat buffer
overflow warnings
PR middle-end/79222 - missing -Wstringop-overflow= on a stpcpy overflow
PR middle-end/79223 - missing -Wstringop-overflow on a memmove overflow

gcc/ChangeLog:

PR preprocessor/79214
PR middle-end/79222
PR middle-end/79223
* builtins.c (check_sizes): Add inlinining context and issue

s/inlinining/inlining/


warnings even when -Wno-system-headers is set.
(check_strncat_sizes): Same.
(expand_builtin_strncat): Same.
(expand_builtin_memmove): New function.
(expand_builtin_stpncpy): Same.
(expand_builtin): Handle memmove and stpncpy.

gcc/testsuite/ChangeLog:

PR preprocessor/79214
PR middle-end/79222
PR middle-end/79223
* gcc.dg/pr79214.c: New test.
* gcc.dg/pr79214.h: New test header.
* gcc.dg/pr79222.c: New test.
* gcc.dg/pr79223.c: New test.
* gcc.dg/pr78138.c: Adjust.

OK with the ChangeLog nit fixed.


Done.  Are bugs of this type candidates for backporting to release
branches?

Generally not, since they are not regressions.  They could also cause
false positive warnings, which in turn could cause code that was
previously building OK to no longer build.


You can try to make a case to Jakub, Joseph & Richi for an exception
though.  It's ultimately their call for a non-regression fix.


jeff


Re: [PATCH] Output DIEs for outlined OpenMP functions in correct lexical scope

2017-05-04 Thread Kevin Buettner
Ahem...  I forgot to note that:

I have bootstrapped and regression tested my patch on x86_64-pc-linux-gnu.

Kevin

On Thu, 4 May 2017 17:45:51 -0700
Kevin Buettner  wrote:

> Consider the following OpenMP program:
> 
> void foo (int a1) {}
> 
> int
> main (void)
> {
>   static int s1 = -41;
>   int i1 = 11, i2;
> 
>   for (i2 = 1; i2 <= 2; i2++)
> {
>   int pass = i2;
> #pragma omp parallel num_threads (2) firstprivate (i1)
>   {
> foo (i1);
>   }
>   foo(pass);
> }
>   foo (s1); foo (i2);
> }
> 
> At the moment, when debugging such a program, GDB is not able to find
> and print the values of s1, i2, and pass.
> 
> My changes to omp-low.c and omp-expand.c, in conjunction with several
> other patches, allow GDB to find and print these values.
> 
> This is the current behavior when debugging in GDB:
> 
> (gdb) b 14
> Breakpoint 1 at 0x400617: file ex3.c, line 14.
> (gdb) run
> Starting program: /mesquite2/.ironwood2/omp-tests/k/ex3-trunk
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x773ca700 (LWP 32628)]
> 
> Thread 1 "ex3-trunk" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
> 14  foo (i1);
> (gdb) p s1
> No symbol "s1" in current context.
> (gdb) p i1
> $1 = 11
> (gdb) p i2
> No symbol "i2" in current context.
> (gdb) p pass
> No symbol "pass" in current context.
> (gdb) c
> Continuing.
> [Switching to Thread 0x773ca700 (LWP 32628)]
> 
> Thread 2 "ex3-trunk" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
> 14  foo (i1);
> (gdb) p s1
> No symbol "s1" in current context.
> (gdb) p i1
> $2 = 11
> (gdb) p i2
> No symbol "i2" in current context.
> (gdb) p pass
> No symbol "pass" in current context.
> (gdb) bt
> #0  main._omp_fn.0 () at ex3.c:14
> #1  0x77bc4926 in gomp_thread_start (xdata=)
> at gcc/libgomp/team.c:122
> #2  0x7799761a in start_thread () from /lib64/libpthread.so.0
> #3  0x776d159d in clone () from /lib64/libc.so.6
> 
> Note that GDB is unable to find s1, i2, or pass for either thread.
> 
> I show the backtrace for thread 2 because it's the more difficult case
> to handle due to the stack trace stopping at clone().  The stack frame
> for main(), which is where the variables of interest reside, is not a
> part of this stack.
> 
> When we run this example using the patches associated with this change
> along with several other patches, GDB's behavior looks like this:
> 
> (gdb) b 14
> Breakpoint 1 at 0x400617: file ex3.c, line 14.
> (gdb) run
> Starting program: /mesquite2/.ironwood2/omp-tests/k/ex3-new
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x773ca700 (LWP 32643)]
> 
> Thread 1 "ex3-new" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
> 14  foo (i1);
> (gdb) p s1
> $1 = -41
> (gdb) p i1
> $2 = 11
> (gdb) p i2
> $3 = 1
> (gdb) p pass
> $4 = 1
> (gdb) c
> Continuing.
> [Switching to Thread 0x773ca700 (LWP 32643)]
> 
> Thread 2 "ex3-new" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
> 14  foo (i1);
> (gdb) p s1
> $5 = -41
> (gdb) p i1
> $6 = 11
> (gdb) p i2
> $7 = 1
> (gdb) p pass
> $8 = 1
> 
> I didn't show the stack here.  It's the same as before.  (I would,
> however, like to be able to make GDB display a unified stack.)
> 
> Note that GDB is now able to find and print values for s1, i2, and
> pass.
> 
> GCC constructs a new function for executing the parallel code.
> The debugging information entry for this function is presently
> placed at the same level as that of the function in which the
> "#pragma omp parallel" directive appeared.
> 
> This is partial output from "readelf -w" for the (non-working) example
> shown above:
> 
>  <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
> <2e>   DW_AT_external: 1
> <2e>   DW_AT_name: (indirect string, offset: 0x80): main
> <32>   DW_AT_decl_file   : 1
> <33>   DW_AT_decl_line   : 4
> <34>   DW_AT_prototyped  : 1
> <34>   DW_AT_type: <0x9d>
> <38>   DW_AT_low_pc  : 0x400591
> <40>   DW_AT_high_pc : 0x71
> <48>   DW_AT_frame_base  : 1 byte block: 9c (DW_OP_call_frame_cfa)
> <4a>   DW_AT_GNU_all_tail_call_sites: 1
> <4a>   DW_AT_sibling : <0x9d>
>  <2><4e>: Abbrev Number: 3 (DW_TAG_variable)
> <4f>   DW_AT_name: s1
> <52>   DW_AT_decl_file   : 1
> <53>   DW_AT_decl_line   : 6
> <54>   DW_AT_type: <0x9d>
> <58>   DW_AT_location: 9 byte block: 3 30 10 60 0 0 0 0 0   
> (DW_OP_addr: 601030)
>  ...
>  

[PATCH] Output DIEs for outlined OpenMP functions in correct lexical scope

2017-05-04 Thread Kevin Buettner
Consider the following OpenMP program:

void foo (int a1) {}

int
main (void)
{
  static int s1 = -41;
  int i1 = 11, i2;

  for (i2 = 1; i2 <= 2; i2++)
    {
      int pass = i2;
#pragma omp parallel num_threads (2) firstprivate (i1)
      {
        foo (i1);
      }
      foo (pass);
    }
  foo (s1); foo (i2);
}

At the moment, when debugging such a program, GDB is not able to find
and print the values of s1, i2, and pass.

My changes to omp-low.c and omp-expand.c, in conjunction with several
other patches, allow GDB to find and print these values.

This is the current behavior when debugging in GDB:

(gdb) b 14
Breakpoint 1 at 0x400617: file ex3.c, line 14.
(gdb) run
Starting program: /mesquite2/.ironwood2/omp-tests/k/ex3-trunk
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x773ca700 (LWP 32628)]

Thread 1 "ex3-trunk" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
14  foo (i1);
(gdb) p s1
No symbol "s1" in current context.
(gdb) p i1
$1 = 11
(gdb) p i2
No symbol "i2" in current context.
(gdb) p pass
No symbol "pass" in current context.
(gdb) c
Continuing.
[Switching to Thread 0x773ca700 (LWP 32628)]

Thread 2 "ex3-trunk" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
14  foo (i1);
(gdb) p s1
No symbol "s1" in current context.
(gdb) p i1
$2 = 11
(gdb) p i2
No symbol "i2" in current context.
(gdb) p pass
No symbol "pass" in current context.
(gdb) bt
#0  main._omp_fn.0 () at ex3.c:14
#1  0x77bc4926 in gomp_thread_start (xdata=)
at gcc/libgomp/team.c:122
#2  0x7799761a in start_thread () from /lib64/libpthread.so.0
#3  0x776d159d in clone () from /lib64/libc.so.6

Note that GDB is unable to find s1, i2, or pass for either thread.

I show the backtrace for thread 2 because it's the more difficult case
to handle due to the stack trace stopping at clone().  The stack frame
for main(), which is where the variables of interest reside, is not a
part of this stack.

When we run this example using the patches associated with this change
along with several other patches, GDB's behavior looks like this:

(gdb) b 14
Breakpoint 1 at 0x400617: file ex3.c, line 14.
(gdb) run
Starting program: /mesquite2/.ironwood2/omp-tests/k/ex3-new
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x773ca700 (LWP 32643)]

Thread 1 "ex3-new" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
14  foo (i1);
(gdb) p s1
$1 = -41
(gdb) p i1
$2 = 11
(gdb) p i2
$3 = 1
(gdb) p pass
$4 = 1
(gdb) c
Continuing.
[Switching to Thread 0x773ca700 (LWP 32643)]

Thread 2 "ex3-new" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
14  foo (i1);
(gdb) p s1
$5 = -41
(gdb) p i1
$6 = 11
(gdb) p i2
$7 = 1
(gdb) p pass
$8 = 1

I didn't show the stack here.  It's the same as before.  (I would,
however, like to be able to make GDB display a unified stack.)

Note that GDB is now able to find and print values for s1, i2, and
pass.

GCC constructs a new function for executing the parallel code.
The debugging information entry for this function is presently
placed at the same level as that of the function in which the
"#pragma omp parallel" directive appeared.

This is partial output from "readelf -w" for the (non-working) example
shown above:

 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
<2e>   DW_AT_external: 1
<2e>   DW_AT_name: (indirect string, offset: 0x80): main
<32>   DW_AT_decl_file   : 1
<33>   DW_AT_decl_line   : 4
<34>   DW_AT_prototyped  : 1
<34>   DW_AT_type: <0x9d>
<38>   DW_AT_low_pc  : 0x400591
<40>   DW_AT_high_pc : 0x71
<48>   DW_AT_frame_base  : 1 byte block: 9c (DW_OP_call_frame_cfa)
<4a>   DW_AT_GNU_all_tail_call_sites: 1
<4a>   DW_AT_sibling : <0x9d>
 <2><4e>: Abbrev Number: 3 (DW_TAG_variable)
<4f>   DW_AT_name: s1
<52>   DW_AT_decl_file   : 1
<53>   DW_AT_decl_line   : 6
<54>   DW_AT_type: <0x9d>
<58>   DW_AT_location: 9 byte block: 3 30 10 60 0 0 0 0 0   
(DW_OP_addr: 601030)
 ...
 <2><7c>: Abbrev Number: 4 (DW_TAG_lexical_block)
<7d>   DW_AT_low_pc  : 0x4005a9
<85>   DW_AT_high_pc : 0x31
 <3><8d>: Abbrev Number: 5 (DW_TAG_variable)
<8e>   DW_AT_name: (indirect string, offset: 0x55): pass
<92>   DW_AT_decl_file   : 1
<93>   DW_AT_decl_line   : 11
<94>   DW_AT_type: <0x9d>
<98>   DW_AT_location: 2 byte block: 91 64  (DW_OP_fbreg: -28)
 ...
 <1>: Abbrev Number: 7 (DW_TAG_subprogram)
   DW_AT_name: 
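
The scoping problem visible in the dump above can be modelled with a small scope tree. This is a sketch under assumptions rather than a description of GDB's or GCC's code: the scope struct and lookup are hypothetical, and "nested" placement assumes the fix puts the outlined function's DIE inside the lexical block containing the directive, as the subject line suggests.

```cpp
#include <cassert>
#include <map>
#include <string>

// One lexical scope: a parent link plus the variables declared directly in it.
struct scope
{
  const scope *parent;
  std::map<std::string, int> vars;
};

// Innermost-first lookup, walking outwards through the parent links.
inline const int *
lookup (const scope *s, const std::string &name)
{
  for (; s; s = s->parent)
    {
      auto it = s->vars.find (name);
      if (it != s->vars.end ())
        return &it->second;
    }
  return nullptr;
}

// Builds the scopes of the example program and reports whether NAME is
// visible from the outlined function.  NESTED selects where that function
// hangs: under the compilation unit (as a sibling of main) or under the
// lexical block that contains the parallel region.
inline bool
visible_from_outlined (bool nested, const std::string &name)
{
  scope cu = { nullptr, {} };
  scope main_fn = { &cu, { { "s1", -41 }, { "i1", 11 }, { "i2", 1 } } };
  scope block = { &main_fn, { { "pass", 1 } } };
  scope outlined = { nested ? &block : &cu, { { "i1", 11 } } };
  return lookup (&outlined, name) != nullptr;
}
```

With the sibling placement only i1 (the firstprivate copy) resolves, which matches the "No symbol" responses in the first GDB session; with the nested placement s1, i2 and pass resolve as well.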

[PATCH 1/2] C++ template type diff printing

2017-05-04 Thread David Malcolm
This patch kit implements two new options to make it easier
to read diagnostics involving mismatched template types:
  -fdiagnostics-show-template-tree and
  -fno-elide-type.

It adds two new formatting codes: %H and %I which are
equivalent to %qT, but are to be used together for type
comparisons e.g.
  "can't convert from %H to %I".

The formatters work together, and if they're used with template
types, they highlight the differences between the types via color.

For example in:

 #include <map>
 #include <vector>
 using std::vector;
 using std::map;

 void takes_mivf (map<int, vector<float>> v);

 int test ()
 {
   takes_mivf (map<int, vector<double>> ());
 }

rather than printing:

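  could not convert 'std::map<int, std::vector<double> >()'
    from 'std::map<int, std::vector<double> >' to 'std::map<int, std::vector<float> >'

with -felide-type (the default), it prints:

  could not convert 'std::map<int, std::vector<double> >()'
    from 'map<[...],vector<double>>' to 'map<[...],vector<float>>'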

where "[...]"  is used to elide matching parts of the template,
and the different parts ("double" and "float") are colorized so
they catch the reader's eye.
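
The elision rule can be sketched as a recursive comparison of two type trees. This is an illustration only, not the code in error.c: type_node, leaf, tmpl, print_plain and print_with_elision are hypothetical names, and colorization is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A type name plus its template arguments (empty for non-templates).
struct type_node
{
  std::string name;
  std::vector<type_node> args;
};

inline type_node
leaf (const std::string &name)
{
  return type_node{ name, {} };
}

inline type_node
tmpl (const std::string &name, std::vector<type_node> args)
{
  return type_node{ name, std::move (args) };
}

inline bool
same_type (const type_node &a, const type_node &b)
{
  if (a.name != b.name || a.args.size () != b.args.size ())
    return false;
  for (std::size_t i = 0; i < a.args.size (); ++i)
    if (!same_type (a.args[i], b.args[i]))
      return false;
  return true;
}

// Print T in full, with no elision.
inline std::string
print_plain (const type_node &t)
{
  std::string out = t.name;
  if (t.args.empty ())
    return out;
  out += "<";
  for (std::size_t i = 0; i < t.args.size (); ++i)
    out += (i ? "," : "") + print_plain (t.args[i]);
  return out + ">";
}

// Print T, replacing each argument that is identical in OTHER with "[...]"
// and recursing into arguments that share a template name but differ inside.
inline std::string
print_with_elision (const type_node &t, const type_node &other)
{
  std::string out = t.name;
  if (t.args.empty ())
    return out;
  out += "<";
  for (std::size_t i = 0; i < t.args.size (); ++i)
    {
      if (i)
        out += ",";
      bool paired = i < other.args.size ();
      if (paired && same_type (t.args[i], other.args[i]))
        out += "[...]";
      else if (paired && t.args[i].name == other.args[i].name)
        out += print_with_elision (t.args[i], other.args[i]);
      else
        out += print_plain (t.args[i]);
    }
  return out + ">";
}
```

Each side is printed against the other, which is why the two halves of the message have to be produced together rather than one at a time.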

With -fdiagnostics-show-template-tree a tree-like structure of the
template is printed, showing the differences; in this case:

  map<
    [...],
    vector<
      [double != float]>>

again with colorization of the different parts.

You can see a colorized version of this here:
  https://dmalcolm.fedorapeople.org/gcc/2017-05-04/template-tree.html

With -fno-elide-type, the output is as before the patch, but colorized;
an example can be seen here:
  https://dmalcolm.fedorapeople.org/gcc/2017-05-04/template-tree-fno-elide-type.html

Implementing %H and %I is slightly fiddly: given that they affect
each other, printing the types in pp_format has to be delayed until
both are seen, so the patch adds an optional hook to pretty-printers
to allow for a post-processing stage after stage 2 of pp_format, which
the C++ frontend implements by doing the comparison printing.
In an earlier version of this patch I had a single format code which
was printed as "from %qT to %qT", but I realized that that wasn't going
to support i18n, so I went with the two format codes.
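
A rough sketch of the deferral described above, assuming a much simpler formatter than pp_format: the two codes only reserve a slot when they are seen, and a post-processing step fills both slots once the whole format string has been scanned. The names format_deferred and render_both are hypothetical, and render_both merely quotes the types where the real hook would do the comparison printing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> rendered_pair;

// Stand-in for the post-processing hook: it sees both types at once.
inline rendered_pair
render_both (const std::string &from, const std::string &to)
{
  return rendered_pair ("'" + from + "'", "'" + to + "'");
}

// Scan FMT, leaving an empty placeholder chunk for %H and for %I, then fill
// both placeholders together in a final pass.
inline std::string
format_deferred (const std::string &fmt, const std::string &from,
                 const std::string &to)
{
  std::vector<std::string> chunks;
  std::ptrdiff_t h_slot = -1, i_slot = -1;
  std::string text;

  for (std::size_t pos = 0; pos < fmt.size (); ++pos)
    {
      bool is_code = (fmt[pos] == '%' && pos + 1 < fmt.size ()
                      && (fmt[pos + 1] == 'H' || fmt[pos + 1] == 'I'));
      if (!is_code)
        {
          text += fmt[pos];
          continue;
        }
      chunks.push_back (text);
      text.clear ();
      std::ptrdiff_t &slot = fmt[pos + 1] == 'H' ? h_slot : i_slot;
      slot = static_cast<std::ptrdiff_t> (chunks.size ());
      chunks.push_back (std::string ());  // deferred: filled in below
      ++pos;
    }
  chunks.push_back (text);

  // Post-processing stage: runs only once both codes have been seen.
  if (h_slot >= 0 && i_slot >= 0)
    {
      rendered_pair r = render_both (from, to);
      chunks[h_slot] = r.first;
      chunks[i_slot] = r.second;
    }

  std::string out;
  for (std::size_t k = 0; k < chunks.size (); ++k)
    out += chunks[k];
  return out;
}
```

Because each code names its own role, a translation is free to reorder them in the format string, which a single combined code could not allow.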

To keep the patch simpler to review, I've only converted one
diagnostic to using them (which is the one used by the testcases);
the rest are converted in the followup patch.

The option names are chosen to be the same as the equivalent
options in clang.

The combination of the two patches was successfully
bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c-family/ChangeLog:
* c-format.c (gcc_cxxdiag_char_table): Add 'H' and 'I' to
format_chars.
* c.opt (fdiagnostics-show-template-tree): New option.
(felide-type): New option.

gcc/c/ChangeLog:
* c-objc-common.c (c_tree_printer): Gain a const char **
parameter.

gcc/cp/ChangeLog:
* call.c (perform_implicit_conversion_flags): Convert
"from %qT to %qT" to "from %H to %I" in diagnostic.
* error.c (struct deferred_printed_type): New struct.
(class cxx_format_postprocessor): New class.
(cxx_initialize_diagnostics): Wire up a cxx_format_postprocessor
to pp->m_format_postprocessor.
(comparable_template_types_p): New function.
(newline_and_indent): New function.
(arg_to_string): New function.
(print_nonequal_arg): New function.
(type_to_string_with_compare): New function.
(print_template_tree_comparison): New function.
(append_formatted_chunk): New function.
(add_quotes): New function.
(cxx_format_postprocessor::handle): New function.
(defer_half_of_type_diff): New function.
(cp_printer): Add "buffer_ptr" param.  Implement %H and %I.

gcc/ChangeLog:
* diagnostic-color.c (color_dict): Add "type-diff".
(parse_gcc_colors): Update comment.
* doc/invoke.texi (Diagnostic Message Formatting Options): Add
-fdiagnostics-show-template-tree and -fno-elide-type.
(GCC_COLORS): Add type-diff to example.
(type-diff=): New.
(-fdiagnostics-show-template-tree): New.
(-fno-elide-type): New.
* tree-diagnostic.c (default_tree_printer): Update for new
const char ** param.
* tree-diagnostic.h (default_tree_printer): Likewise.
* pretty-print.c (pp_format): Pass formatters[argno] to the
pp_format_decoder callback.  Call any m_format_postprocessor's
"handle" method.
(pretty_printer::pretty_printer): Initialize
m_format_postprocessor.
(pretty_printer::~pretty_printer): Delete any
m_format_postprocessor.
* pretty-print.h (printer_fn): Add a const char ** parameter.
(class format_postprocessor): New class.
(struct pretty_printer::format_decoder): Document the new
parameter.
(struct pretty_printer::m_format_postprocessor): New field.

gcc/fortran/ChangeLog:
* error.c 

[PATCH 2/2] Use %H and %I throughout C++ frontend.

2017-05-04 Thread David Malcolm
This is the second half of the kit, which uses %H and %I throughout
the C++ frontend whenever describing type mismatches between a pair
of %qT.

gcc/cp/ChangeLog:
* call.c (print_conversion_rejection): Replace pairs of %qT with
%H and %I in various places.
(build_user_type_conversion_1): Likewise.
(build_integral_nontype_arg_conv): Likewise.
(build_conditional_expr_1): Likewise.
(convert_like_real): Likewise.
(convert_arg_to_ellipsis): Likewise.
(joust): Likewise.
(initialize_reference): Likewise.
* cvt.c (cp_convert_to_pointer): Likewise.
(cp_convert_to_pointer): Likewise.
(convert_to_reference): Likewise.
(ocp_convert): Likewise.
* typeck.c (cp_build_binary_op): Likewise.
(convert_member_func_to_ptr): Likewise.
(build_reinterpret_cast_1): Likewise.
(convert_for_assignment): Likewise.
* typeck2.c (check_narrowing): Likewise.
---
 gcc/cp/call.c| 38 +++---
 gcc/cp/cvt.c | 18 +-
 gcc/cp/typeck.c  | 22 +++---
 gcc/cp/typeck2.c |  6 +++---
 4 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 737f312..3b7d3e3 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -3402,7 +3402,7 @@ print_conversion_rejection (location_t loc, struct 
conversion_info *info)
from);
   else
inform (loc, "  no known conversion for implicit "
-   "%<this%> parameter from %qT to %qT",
+   "%<this%> parameter from %H to %I",
from, info->to_type);
 }
   else if (!TYPE_P (info->from))
@@ -3415,10 +3415,10 @@ print_conversion_rejection (location_t loc, struct 
conversion_info *info)
 }
   else if (info->n_arg == -2)
 /* Conversion of conversion function return value failed.  */
-inform (loc, "  no known conversion from %qT to %qT",
+inform (loc, "  no known conversion from %H to %I",
from, info->to_type);
   else
-inform (loc, "  no known conversion for argument %d from %qT to %qT",
+inform (loc, "  no known conversion for argument %d from %H to %I",
info->n_arg + 1, from, info->to_type);
 }
 
@@ -3925,7 +3925,7 @@ build_user_type_conversion_1 (tree totype, tree expr, int 
flags,
 {
   if (complain & tf_error)
{
- error ("conversion from %qT to %qT is ambiguous",
+ error ("conversion from %H to %I is ambiguous",
 fromtype, totype);
  print_z_candidates (location_of (expr), candidates);
}
@@ -4052,7 +4052,7 @@ build_integral_nontype_arg_conv (tree type, tree expr, 
tsubst_flags_t complain)
  break;
 
if (complain & tf_error)
- error_at (loc, "conversion from %qT to %qT not considered for "
+ error_at (loc, "conversion from %H to %I not considered for "
"non-type template argument", t, type);
/* fall through.  */
 
@@ -4833,14 +4833,14 @@ build_conditional_expr_1 (location_t loc, tree arg1, 
tree arg2, tree arg3,
  if (unsafe_conversion_p (loc, stype, arg2, false))
{
  if (complain & tf_error)
-   error_at (loc, "conversion of scalar %qT to vector %qT "
+   error_at (loc, "conversion of scalar %H to vector %I "
   "involves truncation", arg2_type, vtype);
  return error_mark_node;
}
  if (unsafe_conversion_p (loc, stype, arg3, false))
{
  if (complain & tf_error)
-   error_at (loc, "conversion of scalar %qT to vector %qT "
+   error_at (loc, "conversion of scalar %H to vector %I "
   "involves truncation", arg3_type, vtype);
  return error_mark_node;
}
@@ -5229,7 +5229,7 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree 
arg2, tree arg3,
 arg3_type);
   if (complain & tf_warning)
do_warn_double_promotion (result_type, arg2_type, arg3_type,
- "implicit conversion from %qT to %qT to "
+ "implicit conversion from %H to %I to "
  "match other result of conditional",
  loc);
 
@@ -6603,7 +6603,7 @@ convert_like_real (conversion *convs, tree expr, tree fn, 
int argnum,
 from std::nullptr_t requires direct-initialization.  */
   if (NULLPTR_TYPE_P (TREE_TYPE (expr))
  && TREE_CODE (totype) == BOOLEAN_TYPE)
-   complained = permerror (loc, "converting to %qT from %qT requires "
+   complained = permerror (loc, "converting to %H from %I requires "
"direct-initialization",
totype, TREE_TYPE (expr));
 
@@ -6612,7 +6612,7 @@ convert_like_real 

[gomp4] Add front end support for the if_present clause with the update directive

2017-05-04 Thread Cesar Philippidis
This patch makes the C, C++ and Fortran FEs aware of the new OpenACC 2.5
if_present clause for the update directive. The ME and runtime support
will come in a separate followup patch.
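
For orientation, a minimal sketch of what user code with the new clause might look like. It is an assumption-laden example, not one of the patch's tests: sum_after_update is a hypothetical function, and the directive is guarded by _OPENACC so the file also builds with compilers or options that do not enable OpenACC.

```cpp
#include <cassert>

// Refreshes the host copy of A from the device if A is present there, then
// sums it on the host.  With if_present, the update is skipped for data that
// is not present on the device instead of being treated as an error.
inline int
sum_after_update (int *a, int n)
{
#ifdef _OPENACC
#pragma acc update self (a[0:n]) if_present
#endif
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}
```

Since the array in a plain host-only call was never copied to a device, the update is a no-op and the function simply returns the host-side sum.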

Thomas, for some reason I'm seeing a couple of new UNRESOLVED tests for
update-1.C. The c++ tests running with goacc.exp are built with
-fopenacc, but for some reason the tests in g++.dg/goacc/ are still run
without -fopenacc for g++.dg/dg.exp. Maybe there's something wrong with
g++.dg/goacc/goacc.exp handling of .C files?

This patch has been committed to gomp-4_0-branch.

Cesar
2017-05-04  Cesar Philippidis  

	gcc/c-family/
	* c-pragma.h (enum pragma_omp_clause): Add
	PRAGMA_OACC_CLAUSE_IF_PRESENT.

	gcc/c/
	* c-parser.c (c_parser_omp_clause_name): Add support for if_present.
	(c_parser_oacc_all_clauses): Likewise.
	(c_finish_oacc_routine): Likewise.
	(OACC_UPDATE_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_IF_PRESENT.
	* c-typeck.c (c_finish_omp_clauses): Add support for if_present.

	gcc/cp/
	* parser.c (cp_parser_omp_clause_name): Add support for if_present.
	(cp_parser_oacc_all_clauses): Likewise.
	(cp_parser_oacc_kernels_parallel): Likewise.
	(OACC_UPDATE_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_IF_PRESENT.
	* pt.c (tsubst_omp_clauses): Add support for if_present.
	* semantics.c (finish_omp_clauses): Likewise.

	gcc/fortran/
	* gfortran.h (gfc_omp_clauses): Add if_present member.
	* openmp.c (enum omp_mask2): Add OMP_CLAUSE_IF_PRESENT.
	(gfc_match_omp_clauses): Handle it.
	(OACC_UPDATE_CLAUSES): Add OMP_CLAUSE_IF_PRESENT.
	* trans-openmp.c (gfc_trans_omp_clauses_1): Generate an omp clause for
	if_present.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_IF_PRESENT.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Likewise, but just ignore it for
	now.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	* tree.c (omp_clause_num_ops): Add an entry for OMP_CLAUSE_IF_PRESENT.
	(omp_clause_code_name): Likewise.
	* tree-core.h (enum omp_clause_code): Likewise.

	gcc/testsuite/
	* c-c++-common/goacc/update-if_present-1.c: New test.
	* c-c++-common/goacc/update-if_present-2.c: New test.
	* g++.dg/goacc/update-1.C: New test.
	* gfortran.dg/goacc/update-if_present-1.f90: New test.
	* gfortran.dg/goacc/update-if_present-2.f90: New test.


diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 7b77dca..f1716ad 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -171,6 +171,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_VECTOR_LENGTH,
   PRAGMA_OACC_CLAUSE_WAIT,
   PRAGMA_OACC_CLAUSE_WORKER,
+  PRAGMA_OACC_CLAUSE_IF_PRESENT,
   PRAGMA_OACC_CLAUSE_COLLAPSE = PRAGMA_OMP_CLAUSE_COLLAPSE,
   PRAGMA_OACC_CLAUSE_COPYIN = PRAGMA_OMP_CLAUSE_COPYIN,
   PRAGMA_OACC_CLAUSE_DEVICE = PRAGMA_OMP_CLAUSE_DEVICE,
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index b1af31f..957007e 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -10393,7 +10393,9 @@ c_parser_omp_clause_name (c_parser *parser, bool consume_token = true)
 	result = PRAGMA_OACC_CLAUSE_HOST;
 	  break;
 	case 'i':
-	  if (!strcmp ("inbranch", p))
+	  if (!strcmp ("if_present", p))
+	result = PRAGMA_OACC_CLAUSE_IF_PRESENT;
+	  else if (!strcmp ("inbranch", p))
 	result = PRAGMA_OMP_CLAUSE_INBRANCH;
 	  else if (!strcmp ("independent", p))
 	result = PRAGMA_OACC_CLAUSE_INDEPENDENT;
@@ -13268,6 +13270,12 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_omp_clause_if (parser, clauses, false);
 	  c_name = "if";
 	  break;
+	case PRAGMA_OACC_CLAUSE_IF_PRESENT:
+	  clauses = c_parser_oacc_simple_clause (parser, here,
+		 OMP_CLAUSE_IF_PRESENT,
+		 clauses);
+	  c_name = "if_present";
+	  break;
 	case PRAGMA_OACC_CLAUSE_INDEPENDENT:
 	  clauses = c_parser_oacc_simple_clause (parser, here,
 		 OMP_CLAUSE_INDEPENDENT,
@@ -14344,6 +14352,7 @@ c_finish_oacc_routine (struct oacc_routine_data *data, tree fndecl,
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_TYPE)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_HOST)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF_PRESENT)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
 
 #define OACC_UPDATE_CLAUSE_DEVICE_TYPE_MASK\
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index b04db44..70c15be 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -13396,6 +13396,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	case OMP_CLAUSE_BIND:
 	case OMP_CLAUSE_NOHOST:
 	case OMP_CLAUSE_TILE:
+	case OMP_CLAUSE_IF_PRESENT:
 	  pc = &OMP_CLAUSE_CHAIN (c);
 	  continue;
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index b082feb..b9c9747 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29833,7 +29833,9 @@ cp_parser_omp_clause_name (cp_parser *parser, bool consume_token = true)
 	result = PRAGMA_OACC_CLAUSE_HOST;
 	  break;
 	case 'i':
-	  if (!strcmp ("inbranch", p))
+	  if (!strcmp 

[PATCH 11/12 rev1] [i386] Add remainder of -mcall-ms2sysv-xlogues implementation

2017-05-04 Thread Daniel Santos
Now generates RTL with appropriate stack restore and leave patterns.  Slightly
cleaned up code that calculates the number of vector elements for clarity.

Tests are good when rebased onto gcc-7_1_0-release as HEAD currently fails to
bootstrap.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 287 +++--
 1 file changed, 278 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f2772b2d10e..e43dc819f9a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14148,6 +14148,78 @@ ix86_elim_entry_set_got (rtx reg)
 }
 }
 
+static rtx
+gen_frame_set (rtx reg, rtx frame_reg, int offset, bool store)
+{
+  rtx addr, mem;
+
+  if (offset)
+addr = gen_rtx_PLUS (Pmode, frame_reg, GEN_INT (offset));
+  mem = gen_frame_mem (GET_MODE (reg), offset ? addr : frame_reg);
+  return gen_rtx_SET (store ? mem : reg, store ? reg : mem);
+}
+
+static inline rtx
+gen_frame_load (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, false);
+}
+
+static inline rtx
+gen_frame_store (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, true);
+}
+
+static void
+ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame)
+{
+  struct machine_function *m = cfun->machine;
+  const unsigned ncregs = NUM_X86_64_MS_CLOBBERED_REGS
+ + m->call_ms2sysv_extra_regs;
+  rtvec v = rtvec_alloc (ncregs + 1);
+  unsigned int align, i, vi = 0;
+  rtx_insn *insn;
+  rtx sym, addr;
+  rtx rax = gen_rtx_REG (word_mode, AX_REG);
+  const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
+  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
+  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset;
+  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+
+  /* Verify that the incoming stack 16-byte alignment offset matches the
+ layout we're using.  */
+  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+
+  /* Get the stub symbol.  */
+  sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
+ : XLOGUE_STUB_SAVE);
+  RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
+
+  /* Setup RAX as the stub's base pointer.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (rax_offset, &align);
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  insn = emit_insn (gen_rtx_SET (rax, addr));
+
+  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+GEN_INT (-stack_alloc_size), -1,
+m->fs.cfa_reg == stack_pointer_rtx);
+  for (i = 0; i < ncregs; ++i)
+{
+  const xlogue_layout::reginfo &r = xlogue.get_reginfo (i);
+  rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
+r.regno);
+  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);;
+}
+
+  gcc_assert (vi == (unsigned)GET_NUM_ELEM (v));
+
+  insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
+  RTX_FRAME_RELATED_P (insn) = true;
+}
+
 /* Expand the prologue into a bunch of separate insns.  */
 
 void
@@ -14395,7 +14467,7 @@ ix86_expand_prologue (void)
 performing the actual alignment.  Otherwise we cannot guarantee
 that there's enough storage above the realignment point.  */
   allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
-  if (allocate)
+  if (allocate && !m->call_ms2sysv)
 pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
   GEN_INT (-allocate), -1, false);
 
@@ -14403,7 +14475,6 @@ ix86_expand_prologue (void)
   insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
stack_pointer_rtx,
GEN_INT (-align_bytes)));
-
   /* For the purposes of register save area addressing, the stack
 pointer can no longer be used to access anything in the frame
 below m->fs.sp_realigned_offset and the frame pointer cannot be
@@ -14420,6 +14491,9 @@ ix86_expand_prologue (void)
m->fs.sp_valid = false;
 }
 
+  if (m->call_ms2sysv)
+ix86_emit_outlined_ms2sysv_save (frame);
+
   allocate = frame.stack_pointer_offset - m->fs.sp_offset;
 
   if (flag_stack_usage_info)
@@ -14740,17 +14814,19 @@ ix86_emit_restore_regs_using_pop (void)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false, true))
   ix86_emit_restore_reg_using_pop (gen_rtx_REG (word_mode, regno));
 }
 
-/* Emit code and notes for the LEAVE instruction.  */
+/* Emit code and notes 

[PATCH, rs6000] Fix vec_xl and vec_xst intrinsics for P8

2017-05-04 Thread Bill Schmidt
Hi,

In an earlier patch, I changed vec_xl and vec_xst to make use of new
POWER9 instructions when loading or storing vector short/char values.
In so doing, I failed to enable the existing instruction use for
-mcpu=power8, so these were no longer considered valid by the compiler.
Not good.

This patch fixes the problem by using other existing built-in definitions
when the POWER9 instructions are not available.  I've added a test case
to improve coverage and demonstrate that the problem is fixed.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2017-05-04  Bill Schmidt  

* config/rs6000/rs6000.c: Define POWER8 built-ins for vec_xl and
vec_xst with short and char pointer arguments.

[gcc/testsuite]

2017-05-04  Bill Schmidt  

* gcc.target/powerpc/p8-vec-xl-xst.c: New file.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 247560)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -18183,6 +18183,17 @@ altivec_init_builtins (void)
   def_builtin ("__builtin_vsx_st_elemrev_v16qi",
   void_ftype_v16qi_long_pvoid, VSX_BUILTIN_ST_ELEMREV_V16QI);
 }
+  else
+{
+  rs6000_builtin_decls[(int)VSX_BUILTIN_LD_ELEMREV_V8HI]
+   = rs6000_builtin_decls[(int)VSX_BUILTIN_LXVW4X_V8HI];
+  rs6000_builtin_decls[(int)VSX_BUILTIN_LD_ELEMREV_V16QI]
+   = rs6000_builtin_decls[(int)VSX_BUILTIN_LXVW4X_V16QI];
+  rs6000_builtin_decls[(int)VSX_BUILTIN_ST_ELEMREV_V8HI]
+   = rs6000_builtin_decls[(int)VSX_BUILTIN_STXVW4X_V8HI];
+  rs6000_builtin_decls[(int)VSX_BUILTIN_ST_ELEMREV_V16QI]
+   = rs6000_builtin_decls[(int)VSX_BUILTIN_STXVW4X_V16QI];
+}
 
   def_builtin ("__builtin_vec_vsx_ld", opaque_ftype_long_pcvoid,
   VSX_BUILTIN_VEC_LD);
Index: gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst.c
===
--- gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst.c(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst.c(working copy)
@@ -0,0 +1,62 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+/* Verify fix for problem where vec_xl and vec_xst are not recognized
+   for the vector char and vector short cases on P8 only.  */
+
+#include <altivec.h>
+
+vector unsigned char
+foo (unsigned char * address)
+{
+  return __builtin_vec_xl (0, address);
+}
+
+void
+bar (vector unsigned char x, unsigned char * address)
+{
+  __builtin_vec_xst (x, 0, address);
+}
+
+vector unsigned short
+foot (unsigned short * address)
+{
+  return __builtin_vec_xl (0, address);
+}
+
+void
+bart (vector unsigned short x, unsigned short * address)
+{
+  __builtin_vec_xst (x, 0, address);
+}
+
+vector unsigned char
+fool (unsigned char * address)
+{
+  return vec_xl (0, address);
+}
+
+void
+barl (vector unsigned char x, unsigned char * address)
+{
+  vec_xst (x, 0, address);
+}
+
+vector unsigned short
+footle (unsigned short * address)
+{
+  return vec_xl (0, address);
+}
+
+void
+bartle (vector unsigned short x, unsigned short * address)
+{
+  vec_xst (x, 0, address);
+}
+
+/* { dg-final { scan-assembler-times "lxvd2x"   4 } } */
+/* { dg-final { scan-assembler-times "stxvd2x"  4 } } */
+/* { dg-final { scan-assembler-times "xxpermdi" 8 } } */



[PATCH 09/12 rev1] [i386] Add patterns and predicates mcall-ms2sysv-xlogues

2017-05-04 Thread Daniel Santos
I've cleaned up the patterns and predicates as per your instructions, resulting
in 74 fewer lines of code.  Adding explicit insns to restore the stack pointer
and to perform the "leave" (to the patterns restore_multiple_and_return
and restore_multiple_leave_return, respectively) disambiguates them just fine
without the const_int tag while correctly describing exactly what the pattern
does.

Thanks for your guidance.  I understand RTL much better now.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/predicates.md | 81 +++
 gcc/config/i386/sse.md| 37 
 2 files changed, 118 insertions(+)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 8f250a2e720..e7371a41b16 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1657,3 +1657,84 @@
   (ior (match_operand 0 "register_operand")
(and (match_code "const_int")
(match_test "op == constm1_rtx"
+
+;; Return true if the vector ends with between 12 and 18 register saves using
+;; RAX as the base address.
+(define_predicate "save_multiple"
+  (match_code "parallel")
+{
+  const unsigned len = XVECLEN (op, 0);
+  unsigned i;
+
+  /* Starting from end of vector, count register saves.  */
+  for (i = 0; i < len; ++i)
+{
+  rtx src, dest, addr;
+  rtx e = XVECEXP (op, 0, len - 1 - i);
+
+  if (GET_CODE (e) != SET)
+   break;
+
+  src  = SET_SRC (e);
+  dest = SET_DEST (e);
+
+  if (!REG_P (src) || !MEM_P (dest))
+   break;
+
+  addr = XEXP (dest, 0);
+
+  /* Good if dest address is in RAX.  */
+  if (REG_P (addr) && REGNO (addr) == AX_REG)
+   continue;
+
+  /* Good if dest address is offset of RAX.  */
+  if (GET_CODE (addr) == PLUS
+ && REG_P (XEXP (addr, 0))
+ && REGNO (XEXP (addr, 0)) == AX_REG)
+   continue;
+
+  break;
+}
+  return (i >= 12 && i <= 18);
+})
+
+
+;; Return true if the vector ends with between 12 and 18 register loads using
+;; RSI as the base address.
+(define_predicate "restore_multiple"
+  (match_code "parallel")
+{
+  const unsigned len = XVECLEN (op, 0);
+  unsigned i;
+
+  /* Starting from end of vector, count register restores.  */
+  for (i = 0; i < len; ++i)
+{
+  rtx src, dest, addr;
+  rtx e = XVECEXP (op, 0, len - 1 - i);
+
+  if (GET_CODE (e) != SET)
+   break;
+
+  src  = SET_SRC (e);
+  dest = SET_DEST (e);
+
+  if (!MEM_P (src) || !REG_P (dest))
+   break;
+
+  addr = XEXP (src, 0);
+
+  /* Good if src address is in RSI.  */
+  if (REG_P (addr) && REGNO (addr) == SI_REG)
+   continue;
+
+  /* Good if src address is offset of RSI.  */
+  if (GET_CODE (addr) == PLUS
+ && REG_P (XEXP (addr, 0))
+ && REGNO (XEXP (addr, 0)) == SI_REG)
+   continue;
+
+  break;
+}
+  return (i >= 12 && i <= 18);
+})
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 094404bc913..d488b25c254 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -20010,3 +20010,40 @@
   (match_operand:VI48_512 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512VPOPCNTDQ"
   "vpopcnt\t{%1, %0|%0, %1}")
+
+;; Save multiple registers out-of-line.
+(define_insn "save_multiple"
+  [(match_parallel 0 "save_multiple"
+[(use (match_operand:P 1 "symbol_operand"))])]
+  "TARGET_SSE && TARGET_64BIT"
+  "call\t%P1")
+
+;; Restore multiple registers out-of-line.
+(define_insn "restore_multiple"
+  [(match_parallel 0 "restore_multiple"
+[(use (match_operand:P 1 "symbol_operand"))])]
+  "TARGET_SSE && TARGET_64BIT"
+  "call\t%P1")
+
+;; Restore multiple registers out-of-line and return.
+(define_insn "restore_multiple_and_return"
+  [(match_parallel 0 "restore_multiple"
+[(return)
+ (use (match_operand:P 1 "symbol_operand"))
+ (set (reg:DI SP_REG) (reg:DI R10_REG))
+])]
+  "TARGET_SSE && TARGET_64BIT"
+  "jmp\t%P1")
+
+;; Restore multiple registers out-of-line when hard frame pointer is used,
+;; perform the leave operation prior to returning (from the function).
+(define_insn "restore_multiple_leave_return"
+  [(match_parallel 0 "restore_multiple"
+[(return)
+ (use (match_operand:P 1 "symbol_operand"))
+ (set (reg:DI SP_REG) (plus:DI (reg:DI BP_REG) (const_int 8)))
+ (set (reg:DI BP_REG) (mem:DI (reg:DI BP_REG)))
+ (clobber (mem:BLK (scratch)))
+])]
+  "TARGET_SSE && TARGET_64BIT"
+  "jmp\t%P1")
-- 
2.11.0
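
For readers following along: the two predicates in this patch share one counting loop.  It walks the PARALLEL from its last element backwards, stops at the first element that is not a register save (or load) addressed through the expected base register, and accepts when between 12 and 18 elements matched.  A minimal stand-alone sketch of that loop in plain C follows.  The `struct elem` type, its fields and the name `ends_with_saves` are assumptions made up for this illustration; they only model the RTL checks and are not GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one element of a PARALLEL; not real RTL.  */
struct elem
{
  bool is_set;       /* models GET_CODE (e) == SET */
  bool src_is_reg;   /* models REG_P (SET_SRC (e)) */
  bool dest_is_mem;  /* models MEM_P (SET_DEST (e)) */
  int base_regno;    /* base register of the memory address */
};

/* Count the register saves at the end of VEC that are addressed
   through register BASE, and report whether the count is in [12, 18].  */
static bool
ends_with_saves (const struct elem *vec, size_t len, int base)
{
  size_t i;

  assert (vec != NULL || len == 0);

  /* Starting from the end of the vector, count register saves.  */
  for (i = 0; i < len; ++i)
    {
      const struct elem *e = &vec[len - 1 - i];

      if (!e->is_set || !e->src_is_reg || !e->dest_is_mem)
        break;
      if (e->base_regno != base)
        break;
    }
  return i >= 12 && i <= 18;
}
```

Counting from the end is what lets the same predicate serve several patterns: whatever leading elements a pattern adds (the USE of the stub symbol, a RETURN, the stack pointer adjustment) are simply never reached by the loop.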



Re: [PATCH] prevent -Wno-system-headers from suppressing -Wstringop-overflow (PR 79214)

2017-05-04 Thread Martin Sebor

On 05/04/2017 01:17 PM, Jeff Law wrote:

On 01/25/2017 02:12 PM, Martin Sebor wrote:

While putting together examples for the GCC 7 changes document
I noticed that a few of the buffer overflow warnings issued by
-Wstringop-overflow are defeated by Glibc's macros for string
manipulation functions like strncat and strncpy.

While testing my fix I also noticed that I had missed a couple
of functions when implementing the warning: memmove and stpcpy.

The attached patch adds handlers for those and fixes the three
bugs below I raised for these omissions.

Is this patch okay for trunk?

PR preprocessor/79214 -  -Wno-system-header defeats strncat buffer
   overflow warnings
PR middle-end/79222 - missing -Wstringop-overflow= on a stpcpy overflow
PR middle-end/79223 - missing -Wstringop-overflow on a memmove overflow

Martin

gcc-79214.diff


PR preprocessor/79214 -  -Wno-system-header defeats strncat buffer
overflow warnings
PR middle-end/79222 - missing -Wstringop-overflow= on a stpcpy overflow
PR middle-end/79223 - missing -Wstringop-overflow on a memmove overflow

gcc/ChangeLog:

PR preprocessor/79214
PR middle-end/79222
PR middle-end/79223
* builtins.c (check_sizes): Add inlinining context and issue

s/inlinining/inlining/


warnings even when -Wno-system-headers is set.
(check_strncat_sizes): Same.
(expand_builtin_strncat): Same.
(expand_builtin_memmove): New function.
(expand_builtin_stpncpy): Same.
(expand_builtin): Handle memmove and stpncpy.

gcc/testsuite/ChangeLog:

PR preprocessor/79214
PR middle-end/79222
PR middle-end/79223
* gcc.dg/pr79214.c: New test.
* gcc.dg/pr79214.h: New test header.
* gcc.dg/pr79222.c: New test.
* gcc.dg/pr79223.c: New test.
* gcc.dg/pr78138.c: Adjust.

OK with the ChangeLog nit fixed.


Done.  Are bugs of this type candidates for backporting to release
branches?

Martin
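
For context, a typical strncat overflow of the kind these warnings target is a call whose bound is the size of the destination rather than the space remaining in it.  A minimal sketch of the correct bound computation follows; `append_bounded` is a hypothetical helper written for this illustration and is not part of GCC or Glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append SRC to the NUL-terminated string in DST,
   which has room for DSTSIZE bytes in total, truncating if needed.  */
static char *
append_bounded (char *dst, size_t dstsize, const char *src)
{
  size_t used = strlen (dst);

  /* DST must already hold a terminated string that fits.  */
  assert (used < dstsize);

  /* strncat appends at most N characters and then a NUL, so N must
     leave one byte for the terminator.  Passing DSTSIZE here instead
     is the kind of overflow -Wstringop-overflow is meant to diagnose.  */
  return strncat (dst, src, dstsize - used - 1);
}
```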


Bump version namespace and remove _Rb_tree useless template parameter

2017-05-04 Thread François Dumont

Hi

Here is the patch to remove the useless _Is_pod_comparator
_Rb_tree_impl template parameter. As this is an ABI-breaking change it
is limited to the versioned namespace mode, and the patch also bumps the
namespace version.


Working on this patch I wonder if the gnu-versioned-namespace.ver 
is really up to date. The list of export expressions is far smaller than 
the one in gnu.ver. Would the testsuite show that some symbols are not 
properly exported ?


Bump version namespace.
* config/abi/pre/gnu-versioned-namespace.ver: Bump version namespace
from __7 to __8. Bump GLIBCXX_7.0 into GLIBCXX_8.0.
* include/bits/c++config: Adapt.
* include/bits/regex.h: Adapt.
* include/experimental/bits/fs_fwd.h: Adapt.
* include/experimental/bits/lfts_config.h: Adapt.
* include/std/variant: Adapt.
* python/libstdcxx/v6/printers.py: Adapt.
* testsuite/libstdc++-prettyprinters/48362.cc: Adapt.
* include/bits/stl_tree.h (_Rb_tree_impl<>): Remove _Is_pod_comparator
template parameter when version namespace is active.

Tested under Linux x86_64 with version namespace.

Ok to commit ?

François


diff --git a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
index 5fc627c..1721810 100644
--- a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
+++ b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
@@ -19,7 +19,7 @@
 ## with this library; see the file COPYING3.  If not see
 ## .
 
-GLIBCXX_7.0 {
+GLIBCXX_8.0 {
 
   global:
 
@@ -27,7 +27,7 @@ GLIBCXX_7.0 {
 extern "C++"
 {
   std::*;
-  std::__7::*;
+  std::__8::*;
   std::random_device::*
 };
 
@@ -60,7 +60,7 @@ GLIBCXX_7.0 {
 # vtable
 _ZTVSt*;
 _ZTVNSt*;
-_ZTVN9__gnu_cxx3__718stdio_sync_filebufI[cw]NSt3__711char_traitsI[cw];
+_ZTVN9__gnu_cxx3__818stdio_sync_filebufI[cw]NSt3__811char_traitsI[cw];
 
 # thunk
 _ZTv0_n12_NS*;
@@ -75,51 +75,51 @@ GLIBCXX_7.0 {
 _ZTSNSt*;
 
 # locale
-_ZNSt3__79has_facetINS_*;
+_ZNSt3__89has_facetINS_*;
 
 # hash
-_ZNSt8__detail3__712__prime_listE;
-_ZNSt3tr18__detail3__712__prime_listE;
+_ZNSt8__detail3__812__prime_listE;
+_ZNSt3tr18__detail3__812__prime_listE;
 
 # thread/mutex/condition_variable/future
 __once_proxy;
 
 # std::__detail::_List_node_base
-_ZNSt8__detail3__715_List_node_base7_M_hook*;
-_ZNSt8__detail3__715_List_node_base9_M_unhookEv;
-_ZNSt8__detail3__715_List_node_base10_M_reverseEv;
-_ZNSt8__detail3__715_List_node_base11_M_transfer*;
-_ZNSt8__detail3__715_List_node_base4swapER*;
+_ZNSt8__detail3__815_List_node_base7_M_hook*;
+_ZNSt8__detail3__815_List_node_base9_M_unhookEv;
+_ZNSt8__detail3__815_List_node_base10_M_reverseEv;
+_ZNSt8__detail3__815_List_node_base11_M_transfer*;
+_ZNSt8__detail3__815_List_node_base4swapER*;
 
 # std::__convert_to_v
-_ZNSt3__714__convert_to_v*;
+_ZNSt3__814__convert_to_v*;
 
 # std::__copy_streambufs
-_ZNSt3__717__copy_streambufsI*;
-_ZNSt3__721__copy_streambufs_eofI*;
+_ZNSt3__817__copy_streambufsI*;
+_ZNSt3__821__copy_streambufs_eofI*;
 
 # __gnu_cxx::__atomic_add
 # __gnu_cxx::__exchange_and_add
-_ZN9__gnu_cxx3__712__atomic_addEPV[il][il];
-_ZN9__gnu_cxx3__718__exchange_and_addEPV[li][il];
+_ZN9__gnu_cxx3__812__atomic_addEPV[il][il];
+_ZN9__gnu_cxx3__818__exchange_and_addEPV[li][il];
 
 # __gnu_cxx::__pool
-_ZN9__gnu_cxx3__76__poolILb[01]EE13_M_initializeEv;
-_ZN9__gnu_cxx3__76__poolILb[01]EE16_M_reserve_blockE[jmy][jmy];
-_ZN9__gnu_cxx3__76__poolILb[01]EE16_M_reclaim_blockEPc[jmy];
-_ZN9__gnu_cxx3__76__poolILb[01]EE10_M_destroyEv;
-_ZN9__gnu_cxx3__76__poolILb1EE16_M_get_thread_idEv;
+_ZN9__gnu_cxx3__86__poolILb[01]EE13_M_initializeEv;
+_ZN9__gnu_cxx3__86__poolILb[01]EE16_M_reserve_blockE[jmy][jmy];
+_ZN9__gnu_cxx3__86__poolILb[01]EE16_M_reclaim_blockEPc[jmy];
+_ZN9__gnu_cxx3__86__poolILb[01]EE10_M_destroyEv;
+_ZN9__gnu_cxx3__86__poolILb1EE16_M_get_thread_idEv;
 
-_ZN9__gnu_cxx3__717__pool_alloc_base9_M_refillE[jmy];
-_ZN9__gnu_cxx3__717__pool_alloc_base16_M_get_free_listE[jmy];
-_ZN9__gnu_cxx3__717__pool_alloc_base12_M_get_mutexEv;
+_ZN9__gnu_cxx3__817__pool_alloc_base9_M_refillE[jmy];
+_ZN9__gnu_cxx3__817__pool_alloc_base16_M_get_free_listE[jmy];
+_ZN9__gnu_cxx3__817__pool_alloc_base12_M_get_mutexEv;
 
-_ZN9__gnu_cxx3__79free_list6_M_getE[jmy];
-_ZN9__gnu_cxx3__79free_list8_M_clearEv;
+_ZN9__gnu_cxx3__89free_list6_M_getE[jmy];
+_ZN9__gnu_cxx3__89free_list8_M_clearEv;
 
 # __gnu_cxx::stdio_sync_filebuf
-_ZTVN9__gnu_cxx3__718stdio_sync_filebufI[cw]St3__711char_traitsI[cw]EEE;
-_ZN9__gnu_cxx3__718stdio_sync_filebufI[cw]NSt3__711char_traitsI[cw]EEE[5-9]*;
+_ZTVN9__gnu_cxx3__818stdio_sync_filebufI[cw]St3__811char_traitsI[cw]EEE;
+   

Re: [PATCH] warn for reading past the end by library functions (PR 54924, 79234)

2017-05-04 Thread Jeff Law

On 04/20/2017 04:49 PM, Martin Sebor wrote:

PR libstdc++/54924 - Warn for std::string constructor with wrong
size asks for a warning when constructing a std::string from
a character array and a number of elements that's in excess of
the number of elements.  E.g.,

   std::string s ("abc", 7);

PR middle-end/79234 - warn on past the end reads by library functions
is a more general enhancement that suggests warning for calls to any
standard library functions that read past the end of a provided array.
For example:

   char a[8];
   memcpy (a, "abcdef", sizeof a);

The attached patch extends the -Wstringop-overflow warning to also
detect and warn for reading past the end in memcmp, memchr, memcpy,
and mempcpy.  The patch doesn't handle memmove because there's
a separate bug for -Wstringop-overflow not handling the function.
A patch for it was submitted in January and deferred to GCC 8:

https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01994.html

Although the patch handles the std::string case fine the warning
for it is suppressed by -Wsystem-headers.  There's also a separate
bug for that (bug 79214) and a patch for it was submitted back in
January and deferred to GCC 8:

https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01994.html

Note I just ack'd the deferred patch noted above.




Martin


gcc-79234.diff


PR libstdc++/54924 - Warn for std::string constructor with wrong size
PR middle-end/79234 - warn on past the end reads by library functions

gcc/ChangeLog:

PR middle-end/79234
* builtins.c (check_sizes): Adjust to handle reading past the end.
Avoid printing excessive upper bound of ranges.
(expand_builtin_memchr): New function.
(compute_dest_size): Rename...
(compute_objsize): ...to this.
(expand_builtin_memcpy): Adjust.
(expand_builtin_mempcpy): Adjust.
(expand_builtin_strcat): Adjust.
(expand_builtin_strcpy): Adjust.
(check_strncat_sizes): Adjust.
(expand_builtin_strncat): Adjust.
(expand_builtin_strncpy): Adjust and simplify.
(expand_builtin_memset): Adjust.
(expand_builtin_bzero): Adjust.
(expand_builtin_memcmp): Adjust.
(expand_builtin): Handle memcmp.
(maybe_emit_chk_warning): Check strncat just once.

gcc/testsuite/ChangeLog:

PR middle-end/79234
* gcc.dg/builtin-stringop-chk-8.c: New test.
* gcc.dg/builtin-stringop-chk-1.c: Adjust.
* gcc.dg/builtin-stringop-chk-4.c: Same.
* gcc.dg/builtin-strncat-chk-1.c: Same.
* g++.dg/ext/strncpy-chk1.C: Same.
* g++.dg/torture/Wsizeof-pointer-memaccess1.C: Same.
* gcc.dg/out-of-bounds-1.c: Same.
* gcc.dg/pr78138.c: Same.
* gcc.dg/torture/Wsizeof-pointer-memaccess1.c: Same.
* gfortran.dg/mvbits_7.f90: Same.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index f3bee5b..892f576 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3044,10 +3045,10 @@ expand_builtin_memcpy_args (tree dest, tree src, tree len, rtx target, tree exp)
 MAXLEN is the user-supplied bound on the length of the source sequence
 (such as in strncat(d, s, N).  It specifies the upper limit on the number
 of bytes to write.
-   STR is the source string (such as in strcpy(d, s)) when the epxression
+   SRC is the source string (such as in strcpy(d, s)) when the epxression

s/epxression/expression

OK with the nit fixed.

jeff
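
To make the quoted memcpy example concrete: with `char a[8]`, the call `memcpy (a, "abcdef", sizeof a)` reads eight bytes from a seven-byte string literal.  A minimal sketch of a copy that never reads past the source is shown below; `copy_clamped` is a hypothetical helper for illustration only, not a GCC or libc function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most DSTSIZE bytes into DST, but never
   read more than SRCSIZE bytes from SRC.  Returns the bytes copied.  */
static size_t
copy_clamped (void *dst, size_t dstsize, const void *src, size_t srcsize)
{
  size_t n = dstsize < srcsize ? dstsize : srcsize;

  assert (dst != NULL && src != NULL);

  /* Clamping to the smaller of the two sizes keeps both the write
     and the read within bounds.  */
  memcpy (dst, src, n);
  return n;
}
```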



Re: [PATCH] prevent -Wno-system-headers from suppressing -Wstringop-overflow (PR 79214)

2017-05-04 Thread Jeff Law

On 01/25/2017 02:12 PM, Martin Sebor wrote:

While putting together examples for the GCC 7 changes document
I noticed that a few of the buffer overflow warnings issued by
-Wstringop-overflow are defeated by Glibc's macros for string
manipulation functions like strncat and strncpy.

While testing my fix I also noticed that I had missed a couple
of functions when implementing the warning: memmove and stpcpy.

The attached patch adds handlers for those and fixes the three
bugs below I raised for these omissions.

Is this patch okay for trunk?

PR preprocessor/79214 -  -Wno-system-header defeats strncat buffer
   overflow warnings
PR middle-end/79222 - missing -Wstringop-overflow= on a stpcpy overflow
PR middle-end/79223 - missing -Wstringop-overflow on a memmove overflow

Martin

gcc-79214.diff


PR preprocessor/79214 -  -Wno-system-header defeats strncat buffer overflow 
warnings
PR middle-end/79222 - missing -Wstringop-overflow= on a stpcpy overflow
PR middle-end/79223 - missing -Wstringop-overflow on a memmove overflow

gcc/ChangeLog:

PR preprocessor/79214
PR middle-end/79222
PR middle-end/79223
* builtins.c (check_sizes): Add inlinining context and issue

s/inlinining/inlining/


warnings even when -Wno-system-headers is set.
(check_strncat_sizes): Same.
(expand_builtin_strncat): Same.
(expand_builtin_memmove): New function.
(expand_builtin_stpncpy): Same.
(expand_builtin): Handle memmove and stpncpy.

gcc/testsuite/ChangeLog:

PR preprocessor/79214
PR middle-end/79222
PR middle-end/79223
* gcc.dg/pr79214.c: New test.
* gcc.dg/pr79214.h: New test header.
* gcc.dg/pr79222.c: New test.
* gcc.dg/pr79223.c: New test.
* gcc.dg/pr78138.c: Adjust.

OK with the ChangeLog nit fixed.

jeff


Re: C PATCH to fix missing -Wlogical-op warning (PR c/80525)

2017-05-04 Thread Jeff Law

On 05/04/2017 06:23 AM, Marek Polacek wrote:

On Thu, May 04, 2017 at 02:13:24PM +0200, Richard Biener wrote:

On Thu, May 4, 2017 at 2:11 PM, Marek Polacek  wrote:

On Thu, May 04, 2017 at 12:42:03PM +0200, Richard Biener wrote:

+static tree
+unwrap_c_maybe_const (tree *tp, int *walk_subtrees, void *)
+{
+  if (TREE_CODE (*tp) == C_MAYBE_CONST_EXPR)
+{
+  *tp = C_MAYBE_CONST_EXPR_EXPR (*tp);
+  /* C_MAYBE_CONST_EXPRs don't nest.  */
+  *walk_subtrees = false;


This changes trees in-place -- do you need to operate on a copy?


Ugh, yes.  But I can't simply copy_node, because that creates new VAR_DECLs,
and operand_equal_p would consider them unequal.  Hmm...  We need something
else.


unshare_expr?


Yeah, so:

2017-05-04  Marek Polacek  

PR c/80525
* c-warn.c (unwrap_c_maybe_const): New.
(warn_logical_operator): Call it.

* c-c++-common/Wlogical-op-1.c: Don't use -fwrapv anymore.
* c-c++-common/Wlogical-op-2.c: New test.

OK.
jeff
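
The exchange above turns on a general point: a walker that rewrites nodes through `*tp` changes the caller's expression unless it runs on a copy, and that copy must not duplicate declarations, or equality checks on them stop matching.  A minimal sketch in plain C with a made-up three-kind tree follows.  `struct node`, `unshare` and `unwrap` are illustrative names only; they are not GCC's `tree`, `unshare_expr` or `walk_tree`, and the sketch never frees the nodes it allocates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature expression tree; not GCC's tree type.  */
enum code { LEAF, WRAPPER, BINARY };

struct node
{
  enum code code;
  int value;            /* payload for LEAF */
  struct node *op[2];   /* WRAPPER uses op[0]; BINARY uses both */
};

/* Copy interior nodes but share leaves, so that leaves (standing in
   for declarations) still compare equal by identity.  */
static struct node *
unshare (struct node *n)
{
  struct node *copy;

  if (n == NULL || n->code == LEAF)
    return n;

  copy = malloc (sizeof *copy);
  assert (copy != NULL);
  *copy = *n;
  copy->op[0] = unshare (n->op[0]);
  copy->op[1] = unshare (n->op[1]);
  return copy;
}

/* Replace every WRAPPER below *TP by its operand.  This modifies the
   tree in place, so callers should pass an unshared copy.  */
static void
unwrap (struct node **tp)
{
  if (*tp == NULL || (*tp)->code == LEAF)
    return;

  if ((*tp)->code == WRAPPER)
    {
      /* Wrappers are assumed not to nest, so stop descending here.  */
      *tp = (*tp)->op[0];
      return;
    }

  unwrap (&(*tp)->op[0]);
  unwrap (&(*tp)->op[1]);
}
```

Calling `unwrap` on the result of `unshare` leaves the original expression untouched while the leaves in the copy remain the very same objects.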


Re: PR80613

2017-05-04 Thread Jeff Law

On 05/04/2017 10:00 AM, Prathamesh Kulkarni wrote:

Hi,
As mentioned in PR, the issue is that cddce1 marks the call to
__builtin_strdup as necessary:
marking necessary through .MEM_6 stmt p_7 = __builtin_strdup ();

and since p_7 doesn't get added to worklist in propagate_necessity()
because it's used only within free(), it's treated as "dead"
and wrongly gets released.
The patch fixes that by adding strdup/strndup in corresponding condition
in eliminate_unnecessary_stmts().

Another issue was that my previous patch failed to remove multiple
calls to strdup:
char *f(char **tt)
{
   char *t = *tt;
   char *p;

   p = __builtin_strdup (t);
   p = __builtin_strdup (t);
   return p;
}

That's fixed in patch by adding strdup/strndup to another
corresponding condition in propagate_necessity() so that only one
instance of strdup would be kept.

Bootstrapped+tested on x86_64-unknown-linux-gnu.
Cross-testing on arm*-*-* and aarch64*-*-* in progress.
OK to commit if testing passes ?

Thanks
Prathamesh


pr80613-1.txt


2017-05-04  Prathamesh Kulkarni

PR tree-optimization/80613
* tree-ssa-dce.c (propagate_necessity): Add cases for BUILT_IN_STRDUP
and BUILT_IN_STRNDUP.
* (eliminate_unnecessary_stmts): Likewise.

testsuite/
* gcc.dg/tree-ssa/pr80613-1.c: New test-case.
* gcc.dg/tree-ssa/pr80613-2.c: New test-case.
So I'm comfortable with the change to eliminate_unnecessary_stmts as 
well as the associated testcase pr80613-1.c.  Given that it addresses the 
core of the bug, I'd go ahead and install that part immediately.


I'm still trying to understand the code in propagate_necessity.





diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr80613-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr80613-1.c
new file mode 100644
index 000..56176427922
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr80613-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+char *a(int);
+int b;
+
+void c() {
+  for (;;) {
+char d = *a(b);
+char *e = __builtin_strdup ();
+__builtin_free(e);
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr80613-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr80613-2.c
new file mode 100644
index 000..c58cc08d6c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr80613-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cddce1" } */
+
+/* There should only be one instance of __builtin_strdup after cddce1.  */
+
+char *f(char **tt)
+{
+  char *t = *tt;
+  char *p;
+
+  p = __builtin_strdup (t);
+  p = __builtin_strdup (t);
+  return p;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_strdup" 1 "cddce1" } } */
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index e17659df91f..7c05f981307 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -852,7 +852,9 @@ propagate_necessity (bool aggressive)
  == BUILT_IN_ALLOCA_WITH_ALIGN)
  || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_SAVE
  || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE
- || DECL_FUNCTION_CODE (callee) == BUILT_IN_ASSUME_ALIGNED))
+ || DECL_FUNCTION_CODE (callee) == BUILT_IN_ASSUME_ALIGNED
+ || DECL_FUNCTION_CODE (callee) == BUILT_IN_STRDUP
+ || DECL_FUNCTION_CODE (callee) == BUILT_IN_STRNDUP))
continue;
What I'm struggling with is that str[n]dup read from the memory pointed 
to by their incoming argument, so ISTM they are not "merely acting as 
barriers or that only store to memory" and thus shouldn't be treated 
like memset, malloc & friends.  Otherwise couldn't we end up incorrectly 
removing a store to memory that is only read by a live strdup?


So while I agree you ought to be able to remove the first call to strdup 
in the second testcase, I'm not sure the approach you've used works 
correctly.


jeff
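
A small illustration of the concern raised above, as a sketch only: the function below stores into a local buffer whose sole reader is a live strdup call.  The name `dup_greeting` is made up for this example, and `__builtin_strdup` is used as in the testcases from the patch (a GCC/Clang builtin).  If a dead-code pass treated strdup like a call that only writes memory, it could wrongly decide that the three stores are unnecessary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The stores into BUF are read by nothing except strdup, so they are
   necessary exactly because the strdup call is.  */
char *
dup_greeting (void)
{
  char buf[3];
  char *p;

  buf[0] = 'h';
  buf[1] = 'i';
  buf[2] = '\0';

  p = __builtin_strdup (buf);   /* reads buf, returns a heap copy */
  assert (p != NULL);
  return p;
}
```

The caller owns the returned string and is expected to release it with free.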




Re: PR80613

2017-05-04 Thread Jeff Law

On 05/04/2017 11:52 AM, Martin Sebor wrote:

On 05/04/2017 10:00 AM, Prathamesh Kulkarni wrote:

Hi,
As mentioned in PR, the issue is that cddce1 marks the call to
__builtin_strdup as necessary:
marking necessary through .MEM_6 stmt p_7 = __builtin_strdup ();

and since p_7 doesn't get added to worklist in propagate_necessity()
because it's used only within free(), it's treated as "dead"
and wrongly gets released.
The patch fixes that by adding strdup/strndup in corresponding condition
in eliminate_unnecessary_stmts().

Another issue was that my previous patch failed to remove multiple
calls to strdup:
char *f(char **tt)
{
  char *t = *tt;
  char *p;

  p = __builtin_strdup (t);
  p = __builtin_strdup (t);
  return p;


Since this is clearly a bug in the program -- the first result
leaks -- would it make sense to issue a warning before removing
the duplicate call?  (It would be nice to issue a warning not
just for strdup but also for other memory allocation functions,
so perhaps that should be a separate enhancement request.)

Seems like it should be a separate enhancement request.

jeff


[C++ PATCH] fix constraint diag

2017-05-04 Thread Nathan Sidwell

Changing '%qE %S' to '%<%E %S>' is missing a % and ICEs on constraints/req4.

Fixed thusly.
--
Nathan Sidwell
2017-05-04  Nathan Sidwell  

	* constraint.cc (diagnose_check_constraint): Fix %E thinko.

Index: constraint.cc
===
--- constraint.cc	(revision 247613)
+++ constraint.cc	(working copy)
@@ -2859,7 +2859,7 @@ diagnose_check_constraint (location_t lo
 {
   if (elide_constraint_failure_p ())
 return;
-  inform (loc, "in the expansion of concept %<%E %S>", check, sub);
+  inform (loc, "in the expansion of concept %<%E %S%>", check, sub);
   cur = get_concept_definition (decl);
   tsubst_expr (cur, targs, tf_warning_or_error, NULL_TREE, false);
   return;


Re: [Patch, fortran] PR70071

2017-05-04 Thread Harald Anlauf
On 05/04/17 18:15, Steve Kargl wrote:
> On Thu, May 04, 2017 at 05:26:17PM +0200, Harald Anlauf wrote:
>> While trying to clean up my working copy, I found that the trivial
>> patch for the ICE-on-invalid as described in the PR regtests cleanly
>> for 7-release on i686-pc-linux-gnu.
>>
>> Here's the cleaned-up version (diffs attached).
>>
>> 2017-05-04  Harald Anlauf  
>>
>>  PR fortran/70071
>>  * array.c (gfc_ref_dimen_size): Handle bad subscript triplets.
>>
>> 2017-05-04  Harald Anlauf  
>>
>>  PR fortran/70071
>>  * gfortran.dg/coarray_44.f90: New testcase.
>>
> 
> Harald,
> 
> The patch looks reasonable.  Do you have a commit privilege?
>

Steve,

no, I don't.

Would you like to take care of the patch?  Then please do so.

Thanks,
Harald



Re: PR80613

2017-05-04 Thread Trevor Saunders
On Thu, May 04, 2017 at 11:52:31AM -0600, Martin Sebor wrote:
> On 05/04/2017 10:00 AM, Prathamesh Kulkarni wrote:
> > Hi,
> > As mentioned in PR, the issue is that cddce1 marks the call to
> > __builtin_strdup as necessary:
> > marking necessary through .MEM_6 stmt p_7 = __builtin_strdup ();
> > 
> > and since p_7 doesn't get added to worklist in propagate_necessity()
> > because it's used only within free(), it's treated as "dead"
> > and wrongly gets released.
> > The patch fixes that by adding strdup/strndup in corresponding condition

so, I think it doesn't actually matter since we can completely remove
the strdup(), but in this case we could also replace the strdup() with
calloc(1, 1), though I'm not sure it would ever happen in practice.  The
reasoning is that accessing (&d)[x] for any x other than 0 would be
invalid.  If (&d)[0], aka d, is not '\0' then strdup() will access
(&d)[1].  Therefore we can infer that d is 0, and so the strdup() must
allocate one byte whose value is 0.
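
A small sketch of that equivalence, assuming the argument in the PR's testcase is &d, the address of a single char object d.  The function names are made up for illustration:

```c
#include <stdlib.h>

/* Hypothetical illustration: d is a single char, so (&d)[1] is out of
   bounds.  The strdup call is only well defined when d == '\0', and in
   that case it returns a one-byte allocation holding 0.  */
char *
dup_single_char (void)
{
  char d = '\0';
  return __builtin_strdup (&d);
}

/* The replacement suggested above: one zero-initialized byte.  */
char *
equivalent_alloc (void)
{
  return (char *) calloc (1, 1);
}
```

Both functions hand back a heap block whose single byte is 0, which is all a caller could observe.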

> > in eliminate_unnecessary_stmts().
> > 
> > Another issue, was that my previous patch failed to remove multiple
> > calls to strdup:
> > char *f(char **tt)
> > {
> >   char *t = *tt;
> >   char *p;
> > 
> >   p = __builtin_strdup (t);
> >   p = __builtin_strdup (t);
> >   return p;
> 
> Since this is clearly a bug in the program -- the first result
> leaks -- would it make sense to issue a warning before removing
> the duplicate call?  (It would be nice to issue a warning not
> just for strdup but also for other memory allocation functions,
> so perhaps that should be a separate enhancement request.)

I'm actually planning on looking at this this weekend after meaning to
for a while.  My main goal was to catch places where unique_ptr could be
used, but I think some leak detection can live in the same place.

Trev

> 
> Martin
> 
> > }
> > 
> > That's fixed in patch by adding strdup/strndup to another
> > corresponding condition in propagate_necessity() so that only one
> > instance of strdup would be kept.
> > 
> > Bootstrapped+tested on x86_64-unknown-linux-gnu.
> > Cross-testing on arm*-*-* and aarch64*-*-* in progress.
> > OK to commit if testing passes ?
> > 
> > Thanks
> > Prathamesh
> > 
> 


Re: [PATCH GCC8][17/33]Treat complex cand step as invariant expression

2017-05-04 Thread Bin.Cheng
On Wed, May 3, 2017 at 2:43 PM, Richard Biener
 wrote:
> On Tue, Apr 18, 2017 at 12:46 PM, Bin Cheng  wrote:
>> Hi,
>> We generally need to compute cand step in loop preheader and use it in loop 
>> body.
>> Unless it's an SSA_NAME of constant integer, an invariant expression is 
>> needed.
>
> I'm confused as to what this patch does.  Comments talk about "Handle step as"
> but then we print "Depend on inv...".  And we share bitmaps, well it seems
>
> + find_inv_vars (data, &step, &cand->inv_vars);
> +
> + iv_inv_expr_ent *inv_expr = get_loop_invariant_expr (data, step);
> + /* Share bitmap between inv_vars and inv_exprs for cand.  */
> + if (inv_expr != NULL)
> +   {
> + cand->inv_exprs = cand->inv_vars;
> + cand->inv_vars = NULL;
> + if (cand->inv_exprs)
> +   bitmap_clear (cand->inv_exprs);
> + else
> +   cand->inv_exprs = BITMAP_ALLOC (NULL);
> +
> + bitmap_set_bit (cand->inv_exprs, inv_expr->id);
>
> just shares the bitmap allocation (and leaks cand->inv_exprs?).
>
> Note that generally it might be cheaper to use bitmap_head instead of
> bitmap in the various structures (and then bitmap_initialize ()), this
> saves one indirection.
>
> Anyway, the relation between inv_vars and inv_exprs is what confuses me.
> Maybe it's the same as for cost_pair?  invariants vs. loop invariants?
> whatever that means...
>
> That is, can you expand the comments in cost_pair / iv_cand for inv_vars
> vs. inv_exprs, esp what "invariant" actually means?
When we represent a use with a cand, some of the resulting computation
is loop invariant.  That invariant computation is a newly created
invariant expression, based on ssa_vars that existed before ivopts.
If the invariant expression is complicated, we treat it as an invariant
expression in its own right and say the cost_pair depends on inv_exprs.
If the invariant expression is simple enough, we record all the existing
ssa_vars it is based on in inv_vars and say the cost_pair depends on
inv_vars.  The same applies to struct iv_cand.  If cand.step is
simple enough, we simply record the ssa_var it is based on in inv_vars;
otherwise, the step is a new invariant expression which didn't exist
before, and we record it in cand.inv_exprs.
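
As a rough source-level analogy (hand-written C, not ivopts output; the function names are made up), the two cases for a candidate's step look something like this:

```c
#include <stddef.h>

/* The step is the single invariant variable n, which already exists
   before the loop.  In the terms above, the candidate depends on
   inv_vars.  */
long
sum_simple_step (const long *a, size_t count, size_t n)
{
  long sum = 0;
  for (size_t i = 0, off = 0; i < count; i++, off += n)
    sum += a[off];
  return sum;
}

/* The step is n * m + 1, which does not exist as a variable before the
   transformation.  It is computed once ahead of the loop; in the terms
   above, the candidate depends on a new entry in inv_exprs.  */
long
sum_complex_step (const long *a, size_t count, size_t n, size_t m)
{
  size_t step = n * m + 1;	/* hoisted invariant expression */
  long sum = 0;
  for (size_t i = 0, off = 0; i < count; i++, off += step)
    sum += a[off];
  return sum;
}
```

The second function pays for one extra computation in the preheader, which is the cost that recording the expression in inv_exprs is meant to account for.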

I've added comments for inv_vars/inv_exprs; is this OK?  I noticed there
is a redundant field cost_pair.inv_expr, which I deleted as obvious in a
standalone patch.

Thanks,
bin
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>>
>> 2017-04-11  Bin Cheng  
>>
>> * tree-ssa-loop-ivopts.c (struct iv_cand): New field inv_exprs.
>> (dump_cand): Support iv_cand.inv_exprs.
>> (add_candidate_1): Record invariant exprs in iv_cand.inv_exprs
>> for candidates.
>> (iv_ca_set_no_cp, iv_ca_set_cp, free_loop_data): Support
>> iv_cand.inv_exprs.
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 8a01e0a..93d7966 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -347,9 +347,10 @@ struct cost_pair
   struct iv_cand *cand;/* The candidate.  */
   comp_cost cost;  /* The cost.  */
   enum tree_code comp; /* For iv elimination, the comparison.  */
-  bitmap inv_vars; /* The list of invariants that have to be
-  preserved.  */
-  bitmap inv_exprs;/* Loop invariant expressions.  */
+  bitmap inv_vars; /* The list of invariant ssa_vars that have to be
+  preserved when representing iv_use with iv_cand.  */
+  bitmap inv_exprs;/* The list of newly created invariant expressions
+  when representing iv_use with iv_cand.  */
   tree value;  /* For final value elimination, the expression for
   the final value of the iv.  For iv elimination,
   the new bound to compare with.  */
@@ -419,8 +420,11 @@ struct iv_cand
   unsigned cost_step;  /* Cost of the candidate's increment operation.  */
   struct iv_use *ainc_use; /* For IP_{BEFORE,AFTER}_USE candidates, the place
  where it is incremented.  */
-  bitmap inv_vars; /* The list of invariants that are used in step of the
-  biv.  */
+  bitmap inv_vars; /* The list of invariant ssa_vars used in step of the
+  iv_cand.  */
+  bitmap inv_exprs;/* If step is more complicated than a single ssa_var,
+  handle it as a new invariant expression which will
+  be hoisted out of loop.  */
   struct iv *orig_iv;  /* The original iv if this cand is added from biv with
   smaller type.  */
 };
@@ -790,6 +794,11 @@ dump_cand (FILE *file, struct iv_cand *cand)
   fprintf (file, "  Depend on inv.vars: ");
   dump_bitmap (file, cand->inv_vars);
 }
+  if (cand->inv_exprs)
+{

Re: PR80613

2017-05-04 Thread Martin Sebor

On 05/04/2017 10:00 AM, Prathamesh Kulkarni wrote:

Hi,
As mentioned in PR, the issue is that cddce1 marks the call to
__builtin_strdup as necessary:
marking necessary through .MEM_6 stmt p_7 = __builtin_strdup ();

and since p_7 doesn't get added to worklist in propagate_necessity()
because it's used only within free(), it's treated as "dead"
and wrongly gets released.
The patch fixes that by adding strdup/strndup in corresponding condition
in eliminate_unnecessary_stmts().

Another issue, was that my previous patch failed to remove multiple
calls to strdup:
char *f(char **tt)
{
  char *t = *tt;
  char *p;

  p = __builtin_strdup (t);
  p = __builtin_strdup (t);
  return p;


Since this is clearly a bug in the program -- the first result
leaks -- would it make sense to issue a warning before removing
the duplicate call?  (It would be nice to issue a warning not
just for strdup but also for other memory allocation functions,
so perhaps that should be a separate enhancement request.)

Martin


}

That's fixed in patch by adding strdup/strndup to another
corresponding condition in propagate_necessity() so that only one
instance of strdup would be kept.

Bootstrapped+tested on x86_64-unknown-linux-gnu.
Cross-testing on arm*-*-* and aarch64*-*-* in progress.
OK to commit if testing passes ?

Thanks
Prathamesh





Re: [PATCH] Fix -fopt-info documentation in invoke.texi

2017-05-04 Thread Steve Ellcey
On Thu, 2017-05-04 at 12:24 +0200, Richard Biener wrote:
> > 
> > OK to checkin?
> Ok for trunk and branches.
> 
> Richard.

I just realized there is a problem/inconsistency with my patch.  The
ChangeLog says I am changing invoke.texi but the change is actually to
optinfo.texi.

It looks like invoke.texi and optinfo.texi have the same examples in
them but changes in one did not necessarily get duplicated in the other.
That is why I got confused.  invoke.texi already has the right default
(optimized-optall) for -fopt-info but optinfo.texi has the wrong one.

Here is an updated patch, the changes to optinfo.texi are the same but
I am also tweaking invoke.texi to include the explicit ordering statement
I put in optinfo.texi and I fixed the ChangeLog entry to match the actual
patch.

Since the actual textual changes are the same as before (just not where I
said they were) I will go ahead and check this in tomorrow unless there are
objections.

Steve Ellcey
sell...@cavium.com


2017-05-05  Steve Ellcey  

* doc/invoke.texi (-fopt-info): Explicitly say order of options
included in -fopt-info does not matter.
* doc/optinfo.texi (-fopt-info): Fix description of default
behavior. Explicitly say order of options included in -fopt-info
does not matter.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 68a558e..57c9678 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13219,7 +13219,8 @@ gcc -O2 -ftree-vectorize -fopt-info-vec-missed
 prints information about missed optimization opportunities from
 vectorization passes on @file{stderr}.  
 Note that @option{-fopt-info-vec-missed} is equivalent to 
-@option{-fopt-info-missed-vec}.
+@option{-fopt-info-missed-vec}.  The order of the optimization group
+names and message types listed after @option{-fopt-info} does not matter.
 
 As another example,
 @smallexample
diff --git a/gcc/doc/optinfo.texi b/gcc/doc/optinfo.texi
index e17cb37..7e32efe 100644
--- a/gcc/doc/optinfo.texi
+++ b/gcc/doc/optinfo.texi
@@ -208,16 +208,18 @@ optimized locations from all the inlining passes into
 If the @var{filename} is provided, then the dumps from all the
 applicable optimizations are concatenated into the @file{filename}.
 Otherwise the dump is output onto @file{stderr}. If @var{options} is
-omitted, it defaults to @option{all-all}, which means dump all
-available optimization info from all the passes. In the following
-example, all optimization info is output on to @file{stderr}.
+omitted, it defaults to @option{optimized-optall}, which means dump
+all information about successful optimizations from all the passes.
+In the following example, the optimization information is output on
+to @file{stderr}.
 
 @smallexample
 gcc -O3 -fopt-info
 @end smallexample
 
 Note that @option{-fopt-info-vec-missed} behaves the same as
-@option{-fopt-info-missed-vec}.
+@option{-fopt-info-missed-vec}.  The order of the optimization group
+names and message types listed after @option{-fopt-info} does not matter.
 
 As another example, consider





[PATCH] Fix-it hints for -Wimplicit-fallthrough

2017-05-04 Thread David Malcolm
As of r247522, fix-it-hints can suggest the insertion of new lines.

This patch updates -Wimplicit-fallthrough to provide suggestions
with fix-it hints, showing the user where to insert "break;" or
fallthrough attributes.

For example:

 test.c: In function 'set_x':
 test.c:15:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
x = a;
~~^~~
 test.c:22:5: note: here
  case 'b':
  ^~~~
 test.c:22:5: note: insert '__attribute__ ((fallthrough));' to silence this warning
 +__attribute__ ((fallthrough));
  case 'b':
  ^~~~
 test.c:22:5: note: insert 'break;' to avoid fall-through
 +break;
  case 'b':
  ^~~~

The idea is that if an IDE supports -fdiagnostics-parseable-fixits, the
user can fix these issues by clicking on them.
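
For illustration, here is a guess at the kind of source that produces output like the above, with the first suggestion already applied.  This is not the actual test.c or any file from the testsuite; the body of set_x is invented:

```c
/* Hypothetical source: without the attribute line, the assignment in
   case 'a' falls through into case 'b' and triggers the warning.  */
int
set_x (char c, int a, int b)
{
  int x = 0;
  switch (c)
    {
    case 'a':
      x = a;
      __attribute__ ((fallthrough));	/* first suggested fix-it */
    case 'b':
      x += b;
      break;
    default:
      break;
    }
  return x;
}
```

Applying the second suggestion instead would replace the attribute line with "break;", which changes behavior: case 'a' would then return a rather than a + b.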

It's loosely based on part of Marek's patch:
  https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01948.html
but with extra logic for locating a suitable place to insert the
fix-it hint, so that, if possible, it is on a line by itself before
the "case", indented the same as the prior statement.  If that's not
possible, no fix-it hint is emitted.
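
The "on a line by itself" condition can be sketched as a standalone check.  This is only an illustration under assumptions: blank_before_column_p is a made-up name, columns are taken to be 1-based, and the code is not the patch's implementation:

```c
#include <stdbool.h>

/* Return true if the first COLUMN - 1 characters of LINE are all spaces
   or tabs, i.e. nothing precedes the token that starts at 1-based
   COLUMN.  LINE has LINE_WIDTH characters and need not be
   NUL-terminated.  */
bool
blank_before_column_p (const char *line, int line_width, int column)
{
  if (!line || column < 1 || line_width < column)
    return false;
  for (int i = 0; i < column - 1; i++)
    if (line[i] != ' ' && line[i] != '\t')
      return false;
  return true;
}
```

When a check like this fails, for example because a statement shares the line with the "case", the safe choice is to emit no fix-it hint at all.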

In Marek's patch he wrote:
  /* For C++17, we'd recommend [[fallthrough]];, but we're not
 there yet.  For C++11, recommend [[gnu::fallthrough]];.  */
"[[fallthrough]]" appears to work, but it appears that lang_hooks.name
doesn't expose C++17 yet, so the patch recommends [[gnu::fallthrough]]
for C++17.


Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
PR c/7652
* diagnostic.c (diagnostic_report_diagnostic): Only add fixits
to the edit_context if they can be auto-applied.
* gimplify.c (add_newline_fixit_with_indentation): New function.
(warn_implicit_fallthrough_r): Add suggestions on how to fix
-Wimplicit-fallthrough.

gcc/testsuite/ChangeLog:
PR c/7652
* g++.dg/Wimplicit-fallthrough-fixit-c++11.C: New test case.
* g++.dg/Wimplicit-fallthrough-fixit-c++98.C: New test case.
* gcc.dg/Wimplicit-fallthrough-fixit.c: New test case.

libcpp/ChangeLog:
PR c/7652
* include/line-map.h
(rich_location::fixits_cannot_be_auto_applied): New method.
(rich_location::fixits_can_be_auto_applied_p): New accessor.
(rich_location::m_fixits_cannot_be_auto_applied): New field.
* line-map.c (rich_location::rich_location): Initialize new field.
---
 gcc/diagnostic.c   |  3 +-
 gcc/gimplify.c | 90 +-
 .../g++.dg/Wimplicit-fallthrough-fixit-c++11.C | 43 +++
 .../g++.dg/Wimplicit-fallthrough-fixit-c++98.C | 43 +++
 gcc/testsuite/gcc.dg/Wimplicit-fallthrough-fixit.c | 73 ++
 libcpp/include/line-map.h  | 22 ++
 libcpp/line-map.c  |  3 +-
 7 files changed, 274 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/Wimplicit-fallthrough-fixit-c++11.C
 create mode 100644 gcc/testsuite/g++.dg/Wimplicit-fallthrough-fixit-c++98.C
 create mode 100644 gcc/testsuite/gcc.dg/Wimplicit-fallthrough-fixit.c

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index dc81755..40509f1 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -942,7 +942,8 @@ diagnostic_report_diagnostic (diagnostic_context *context,
   diagnostic->x_data = NULL;
 
   if (context->edit_context_ptr)
-context->edit_context_ptr->add_fixits (diagnostic->richloc);
+if (diagnostic->richloc->fixits_can_be_auto_applied_p ())
+  context->edit_context_ptr->add_fixits (diagnostic->richloc);
 
   context->lock--;
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index fd27eb1..19dd6dc 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2038,6 +2038,60 @@ should_warn_for_implicit_fallthrough 
(gimple_stmt_iterator *gsi_p, tree label)
   return true;
 }
 
+/* Attempt to add a fix-it hint to RICHLOC, suggesting the insertion
+   of a new line containing LINE_CONTENT (which does not need
+   a '\n' character).
+   The new line is to be inserted immediately before the
+   line containing NEXT, using the location of PREV to determine
+   indentation.
+   The new line will only be inserted if there is nothing on
+   NEXT's line before NEXT, and we can determine the indentation
+   of PREV.  */
+
+static void
+add_newline_fixit_with_indentation (rich_location *richloc,
+   location_t prev,
+   location_t next,
+   const char *line_content)
+{
+  /* Check that the line containing NEXT is blank, up to NEXT.  */
+  int line_width;
+  expanded_location exploc_next = expand_location (next);
+  const char *line
+= location_get_source_line (exploc_next.file, exploc_next.line,
+   &line_width);
+  if (!line)
+return;
+  if (line_width < exploc_next.column)
+return;
+  /* 

Re: [PATCH 5/7] clean up quoting problems - c-family (PR 80280 et al.)

2017-05-04 Thread Martin Sebor

On 05/03/2017 09:56 AM, Joseph Myers wrote:

On Tue, 2 May 2017, Martin Sebor wrote:


+  inform (loc, "in the expansion of concept %qE %qS", check, sub);


Are you sure about this (two consecutive quoted strings, open quote of %qS
following closing quote of %qE) or should it be a single quoted string
%<%E %S%>?


I suspect you're right.  I've changed it to the latter.

Thanks
Martin



[PATCH OBVIOUS]Remove unused structure field inv_expr

2017-05-04 Thread Bin Cheng
Hi,
This patch removes the unused field inv_expr in struct cost_pair, left over by
previous refactoring.
Built on x86_64.  Applying as obvious.

Thanks,
bin
2017-05-04  Bin Cheng  

* tree-ssa-loop-ivopts.c (struct cost_pair): Remove field inv_expr
which is not used any more.

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 7caa40d..adb985b 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -353,7 +353,6 @@ struct cost_pair
   tree value;  /* For final value elimination, the expression for
   the final value of the iv.  For iv elimination,
   the new bound to compare with.  */
-  iv_inv_expr_ent *inv_expr; /* Loop invariant expression.  */
 };
 
 /* Use.  */


Re: [PATCH][ARM] Update max_cond_insns settings

2017-05-04 Thread Wilco Dijkstra
Richard Earnshaw wrote:

> -  5, /* Max cond insns.  */
> +  2, /* Max cond insns.  */

> This parameter is also used for A32 code.  Is that really the right
> number there as well?

Yes, this parameter has always been the same for ARM and Thumb-2.

> I do wonder if the code in arm_option_params_internal should be tweaked
> to hard-limit the number of skipped insns for Thumb2 to one IT block.  So

You mean https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01191.html ? :-)

Wilco

Re: [PATCH, rs6000] Avoid vectorizing versioned copy loops with vectorization factor 2

2017-05-04 Thread Bill Schmidt
...only without the typo in the ChangeLog below...

> On May 3, 2017, at 2:43 PM, Bill Schmidt  wrote:
> 
> Hi,
> 
> We recently became aware of some poor code generation as a result of
> unprofitable (for POWER) loop vectorization.  When a loop is simply copying
> data with 64-bit loads and stores, vectorizing with 128-bit loads and stores
> generally does not provide any benefit on modern POWER processors.
> Furthermore, if there is a requirement to version the loop for aliasing,
> alignment, etc., the cost of the versioning test is almost certainly a
> performance loss for such loops.  The user code example included such a copy
> loop, executed only a few times on average, within an outer loop that was
> executed many times on average, causing a tremendous slowdown.
> 
> This patch very specifically targets these kinds of loops and no others,
> and artificially inflates the vectorization cost to ensure vectorization
> does not appear profitable.  This is done within the target model cost
> hooks to avoid affecting other targets.  A new test case is included that
> demonstrates the refusal to vectorize.
> 
> We've done SPEC performance testing to verify that the patch does not
> degrade such workloads.  Results were all in the noise range.  The
> customer code performance loss was verified to have been reversed.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this ok for trunk?
> 
> Thanks,
> Bill
> 
> 
> [gcc]
> 
> 2017-05-03  Bill Schmidt  
> 
>   * config/rs6000/rs6000.c (rs6000_vect_nonmem): New static var.
>   (rs6000_init_cost): Initialize rs6000_vect_nonmem.
>   (rs6000_add_stmt_cost): Update rs6000_vect_nonmem.
>   (rs6000_finish_cost): Avoid vectorizing simple copy loops with
>   VF=2 that require versioning.
> 
> [gcc/testsuite]
> 
> 2017-05-03  Bill Schmidt  
> 
>   * gcc.target/powerpc/veresioned-copy-loop.c: New file.

^^ fixed to "versioned".

Bill
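
For anyone skimming the quoted description, the loops in question look roughly like the sketch below.  This is an invented example, not the new test case and not the user's code; copy_words is a made-up name:

```c
#include <stddef.h>

/* Hypothetical pure copy loop over 64-bit elements.  dst and src may
   overlap, so vectorizing with 128-bit accesses (vectorization factor 2)
   needs a runtime versioning check before the vector loop can run.  */
void
copy_words (long long *dst, const long long *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}
```

With a small trip count, the versioning check can cost more than the two-element vector body could ever save, which is the situation described in the quoted message.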

> 
> 
> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 247560)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -5873,6 +5873,8 @@ rs6000_density_test (rs6000_cost_data *data)
> 
> /* Implement targetm.vectorize.init_cost.  */
> 
> +static bool rs6000_vect_nonmem;
> +
> static void *
> rs6000_init_cost (struct loop *loop_info)
> {
> @@ -5881,6 +5883,7 @@ rs6000_init_cost (struct loop *loop_info)
>   data->cost[vect_prologue] = 0;
>   data->cost[vect_body] = 0;
>   data->cost[vect_epilogue] = 0;
> +  rs6000_vect_nonmem = false;
>   return data;
> }
> 
> @@ -5907,6 +5910,19 @@ rs6000_add_stmt_cost (void *data, int count, enum
> 
>   retval = (unsigned) (count * stmt_cost);
>   cost_data->cost[where] += retval;
> +
> +  /* Check whether we're doing something other than just a copy loop.
> +  Not all such loops may be profitably vectorized; see
> +  rs6000_finish_cost.  */
> +  if ((where == vect_body
> +&& (kind == vector_stmt || kind == vec_to_scalar || kind == vec_perm
> +|| kind == vec_promote_demote || kind == vec_construct
> +|| kind == scalar_to_vec))
> +   || (where != vect_body
> +   && (kind == vec_to_scalar || kind == vec_perm
> +   || kind == vec_promote_demote || kind == vec_construct
> +   || kind == scalar_to_vec)))
> + rs6000_vect_nonmem = true;
> }
> 
>   return retval;
> @@ -5923,6 +5939,19 @@ rs6000_finish_cost (void *data, unsigned *prologue
>   if (cost_data->loop_info)
> rs6000_density_test (cost_data);
> 
> +  /* Don't vectorize minimum-vectorization-factor, simple copy loops
> + that require versioning for any reason.  The vectorization is at
> + best a wash inside the loop, and the versioning checks make
> + profitability highly unlikely and potentially quite harmful.  */
> +  if (cost_data->loop_info)
> +{
> +  loop_vec_info vec_info = loop_vec_info_for_loop (cost_data->loop_info);
> +  if (!rs6000_vect_nonmem
> +   && LOOP_VINFO_VECT_FACTOR (vec_info) == 2
> +   && LOOP_REQUIRES_VERSIONING (vec_info))
> + cost_data->cost[vect_body] += 1;
> +}
> +
>   *prologue_cost = cost_data->cost[vect_prologue];
>   *body_cost = cost_data->cost[vect_body];
>   *epilogue_cost = cost_data->cost[vect_epilogue];
> Index: gcc/testsuite/gcc.target/powerpc/versioned-copy-loop.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/versioned-copy-loop.c(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/versioned-copy-loop.c(working copy)
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-O3 -fdump-tree-vect-details" } */
> +
> +/* Verify that a pure copy loop with a vectorization factor 

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-05-04 Thread Thomas Schwinge
Hi!

On Wed, 3 May 2017 11:00:14 +0200, Jakub Jelinek  wrote:
> On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote:
> > > In order to configure gcc to load libcuda.so.1 dynamically,
> > > one has to either configure it --without-cuda-driver, or without
> > > --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
> > > options if cuda.h and -lcuda aren't found in the default locations.

(I still have to follow up with my additional GCC changes...)


> > > The nvptx-tools change
> > 
> > (I'll get to that later.)
> 
> I'd like to ping the nvptx-tools change.  Shall I make a github pull request
> for that?

In the future, yes please.

This time, I've handled it in
.

> I have additional following two further patches, the first one just to shut
> up -Wformat-security warning

Tom had already submitted
 including the
same fix, which I've merged earlier today.

> the other one discovered today to fix build
> against glibc trunk - they have changed getopt related includes there

I handled that one in
.

Thanks!


Grüße
 Thomas


Re: Handle data dependence relations with different bases

2017-05-04 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, May 4, 2017 at 2:12 PM, Richard Biener
>  wrote:
>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford
>>  wrote:
>>> This patch tries to calculate conservatively-correct distance
>>> vectors for two references whose base addresses are not the same.
>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
>>> isn't guaranteed to occur.
>>>
>>> The motivating example is:
>>>
>>>   struct s { int x[8]; };
>>>   void
>>>   f (struct s *a, struct s *b)
>>>   {
>>> for (int i = 0; i < 8; ++i)
>>>   a->x[i] += b->x[i];
>>>   }
>>>
>>> in which the "a" and "b" accesses are either independent or have a
>>> dependence distance of 0 (assuming -fstrict-aliasing).  Neither case
>>> prevents vectorisation, so we can vectorise without an alias check.
>>>
>>> I'd originally wanted to do the same thing for arrays as well, e.g.:
>>>
>>>   void
>>>   f (int a[][8], int b[][8])
>>>   {
>>> for (int i = 0; i < 8; ++i)
>>>   a[0][i] += b[0][i];
>>>   }
>>>
>>> I think this is valid because C11 6.7.6.2/6 says:
>>>
>>>   For two array types to be compatible, both shall have compatible
>>>   element types, and if both size specifiers are present, and are
>>>   integer constant expressions, then both size specifiers shall have
>>>   the same constant value.
>>>
>>> So if we access an array through an int (*)[8], it must have type X[8]
>>> or X[], where X is compatible with int.  It doesn't seem possible in
>>> either case for "a[0]" and "b[0]" to overlap when "a != b".
>>>
>>> However, Richard B said that (at least in gimple) we support arbitrary
>>> overlap of arrays and allow arrays to be accessed with different
>>> dimensionality.  There are examples of this in PR50067.  I've therefore
>>> only handled references that end in a structure field access.
>>>
>>> There are two ways of handling these dependences in the vectoriser:
>>> use them to limit VF, or check at runtime as before.  I've gone for
>>> the approach of checking at runtime if we can, to avoid limiting VF
>>> unnecessarily.  We still fall back to a VF cap when runtime checks
>>> aren't allowed.
>>>
>>> The patch tests whether we queued an alias check with a dependence
>>> distance of X and then picked a VF <= X, in which case it's safe to
>>> drop the alias check.  Since vect_prune_runtime_alias_check_list can
>>> be called twice with different VF for the same loop, it's no longer
>>> safe to clear may_alias_ddrs on exit.  Instead we should use
>>> comp_alias_ddrs to check whether versioning is necessary.
>>>
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> You seem to do your "fancy" thing but also later compute the old
>> base equality anyway (for same_base_p).  It looks to me for this
>> case the new fancy code can be simply skipped, keeping num_dimensions
>> as before?
>>
>> +  /* Try to approach equal type sizes.  */
>> +  if (!COMPLETE_TYPE_P (type_a)
>> + || !COMPLETE_TYPE_P (type_b)
>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
>> +   break;
>>
>> ah, interesting idea to avoid a quadratic search.  Note that you should
>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR
>> as they are used for type-punning.

All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs,
ARRAY_REFs or COMPONENT_REFs of structures, since that's all that
dr_analyze_indices allows, so I think we're safe in terms of the tree codes.

>> I see nonoverlapping_component_refs_of_decl_p should simply skip
>> ARRAY_REFs - but I also see there:
>>
>>   /* ??? We cannot simply use the type of operand #0 of the refs here
>>  as the Fortran compiler smuggles type punning into COMPONENT_REFs
>>  for common blocks instead of using unions like everyone else.  */
>>   tree type1 = DECL_CONTEXT (field1);
>>   tree type2 = DECL_CONTEXT (field2);
>>
>> so you probably can't simply use TREE_TYPE (outer_ref) for type 
>> compatibility.
>> You also may not use types_compatible_p here as for LTO that is _way_ too
>> lax for aggregates.  The above uses
>>
>>   /* We cannot disambiguate fields in a union or qualified union.  */
>>   if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>>  return false;
>>
>> so you should also bail out on unions here, rather than the check you do 
>> later.

The loop stops before we get to a union, so I think "only" the RECORD_TYPE
COMPONENT_REF handling is a potential problem.  Does this mean that
I should use the nonoverlapping_component_refs_of_decl_p code:

  tree field1 = TREE_OPERAND (ref1, 1);
  tree field2 = TREE_OPERAND (ref2, 1);

  /* ??? We cannot simply use the type of operand #0 of the refs here
 as the Fortran compiler smuggles type punning into COMPONENT_REFs
 for common blocks instead of using 

[gomp5] Default to nonmonotonic schedule for dynamic/guided

2017-05-04 Thread Jakub Jelinek
Hi!

OpenMP 5.0 will change the default: when neither the monotonic nor the
nonmonotonic modifier is present on a dynamic or guided schedule, the
schedule is now nonmonotonic, whereas in 4.5 the default was monotonic.
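
As a user-level sketch of what this means (invented functions, not from the testsuite; build with -fopenmp for the pragmas to take effect):

```c
/* No modifier is given, so with the new default this dynamic schedule
   is treated as nonmonotonic.  */
long
sum_default (int n)
{
  long sum = 0;
#pragma omp parallel for schedule(dynamic) reduction(+:sum)
  for (int i = 0; i < n; i++)
    sum += i;
  return sum;
}

/* Code that relies on each thread seeing its chunks in increasing
   iteration order can still ask for that explicitly.  */
long
sum_monotonic (int n)
{
  long sum = 0;
#pragma omp parallel for schedule(monotonic: dynamic) reduction(+:sum)
  for (int i = 0; i < n; i++)
    sum += i;
  return sum;
}
```

Both functions compute the same result; the difference is only in the order in which chunks may be handed to a given thread.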

2017-05-04  Jakub Jelinek  

* omp-expand.c (expand_parallel_call, expand_omp_for): For dynamic
and guided schedule without monotonic and nonmonotonic modifier,
default to nonmonotonic.

* gcc.dg/gomp/for-4.c: Expected nonmonotonic functions in the dumps.
* gcc.dg/gomp/for-5.c: Likewise.
* g++.dg/gomp/for-4.C: Likewise.
* g++.dg/gomp/for-5.C: Likewise.

--- gcc/omp-expand.c.jj 2017-05-04 15:05:50.0 +0200
+++ gcc/omp-expand.c2017-05-04 17:58:48.633304639 +0200
@@ -536,8 +536,8 @@ expand_parallel_call (struct omp_region
  break;
case OMP_CLAUSE_SCHEDULE_DYNAMIC:
case OMP_CLAUSE_SCHEDULE_GUIDED:
- if (region->inner->sched_modifiers
- & OMP_CLAUSE_SCHEDULE_NONMONOTONIC)
+ if ((region->inner->sched_modifiers
+  & OMP_CLAUSE_SCHEDULE_MONOTONIC) == 0)
{
  start_ix2 = 3 + region->inner->sched_kind;
  break;
@@ -5854,7 +5854,7 @@ expand_omp_for (struct omp_region *regio
  break;
case OMP_CLAUSE_SCHEDULE_DYNAMIC:
case OMP_CLAUSE_SCHEDULE_GUIDED:
- if ((fd.sched_modifiers & OMP_CLAUSE_SCHEDULE_NONMONOTONIC)
+ if ((fd.sched_modifiers & OMP_CLAUSE_SCHEDULE_MONOTONIC) == 0
  && !fd.ordered
  && !fd.have_ordered)
{
--- gcc/testsuite/gcc.dg/gomp/for-4.c.jj2017-05-04 15:05:34.0 
+0200
+++ gcc/testsuite/gcc.dg/gomp/for-4.c   2017-05-04 18:17:27.792233682 +0200
@@ -12,5 +12,5 @@ void foo (int n)
 bar(i);
 }
 
-/* { dg-final { scan-tree-dump-times "GOMP_loop_dynamic_start" 1 "ompexp" } } 
*/
-/* { dg-final { scan-tree-dump-times "GOMP_loop_dynamic_next" 1 "ompexp" } } */
+/* { dg-final { scan-tree-dump-times "GOMP_loop_nonmonotonic_dynamic_start" 1 
"ompexp" } } */
+/* { dg-final { scan-tree-dump-times "GOMP_loop_nonmonotonic_dynamic_next" 1 
"ompexp" } } */
--- gcc/testsuite/gcc.dg/gomp/for-5.c.jj2017-05-04 15:05:34.0 
+0200
+++ gcc/testsuite/gcc.dg/gomp/for-5.c   2017-05-04 18:19:16.363931760 +0200
@@ -12,5 +12,5 @@ void foo (int n)
 bar(i);
 }
 
-/* { dg-final { scan-tree-dump-times "GOMP_loop_guided_start" 1 "ompexp" } } */
-/* { dg-final { scan-tree-dump-times "GOMP_loop_guided_next" 1 "ompexp" } } */
+/* { dg-final { scan-tree-dump-times "GOMP_loop_nonmonotonic_guided_start" 1 "ompexp" } } */
+/* { dg-final { scan-tree-dump-times "GOMP_loop_nonmonotonic_guided_next" 1 "ompexp" } } */
--- gcc/testsuite/g++.dg/gomp/for-4.C.jj 2017-05-04 15:05:46.0 +0200
+++ gcc/testsuite/g++.dg/gomp/for-4.C   2017-05-04 18:18:00.182845275 +0200
@@ -12,5 +12,5 @@ void foo (int n)
 bar(i);
 }
 
-/* { dg-final { scan-tree-dump-times "GOMP_loop_dynamic_start" 1 "ompexp" } } */
-/* { dg-final { scan-tree-dump-times "GOMP_loop_dynamic_next" 1 "ompexp" } } */
+/* { dg-final { scan-tree-dump-times "GOMP_loop_nonmonotonic_dynamic_start" 1 "ompexp" } } */
+/* { dg-final { scan-tree-dump-times "GOMP_loop_nonmonotonic_dynamic_next" 1 "ompexp" } } */
--- gcc/testsuite/g++.dg/gomp/for-5.C.jj 2017-05-04 15:05:46.0 +0200
+++ gcc/testsuite/g++.dg/gomp/for-5.C   2017-05-04 18:18:12.796694018 +0200
@@ -12,5 +12,5 @@ void foo (int n)
 bar(i);
 }
 
-/* { dg-final { scan-tree-dump-times "GOMP_loop_guided_start" 1 "ompexp" } } */
-/* { dg-final { scan-tree-dump-times "GOMP_loop_guided_next" 1 "ompexp" } } */
+/* { dg-final { scan-tree-dump-times "GOMP_loop_nonmonotonic_guided_start" 1 "ompexp" } } */
+/* { dg-final { scan-tree-dump-times "GOMP_loop_nonmonotonic_guided_next" 1 "ompexp" } } */

Jakub


Re: [PATCH] Small type_hash_canon improvement

2017-05-04 Thread Jakub Jelinek
On Thu, May 04, 2017 at 06:21:17PM +0200, Richard Biener wrote:
> >the
> >only other user after all calls free_node in a loop, so it is highly
> >unlikely it would do anything there.
> >
> >If you mean the INTEGER_TYPE handling, then yes, I guess it could be
> >done in free_node too and can move it there.  If it was without
> >the && TREE_TYPE (TYPE_M*_VALUE (type)) == type extra checks, then it
> >is certainly unsafe and breaks bootstrap even, e.g. build_range_type
> >and other spots happily create INTEGER_TYPEs with min/max value that
> >have some other type.  But when the type of the INTEGER_CSTs is the
> >type we are ggc_freeing, anything that would refer to those constants
> >afterwards would be necessarily broken (as their TREE_TYPE would be
> >ggc_freed, possibly reused for something completely unrelated).
> >Thus I think it should be safe even in the LTO case and thus doable
> >in free_node.
> 
> OK.  OTOH LTO frees the whole SCC and thus doesn't expect any pointed to stuff
> to be freed.  Not sure if we allow double ggc_free of stuff.

We don't, that crashes miserably.

Jakub


Re: [PATCH] Small type_hash_canon improvement

2017-05-04 Thread Richard Biener
On May 4, 2017 6:03:46 PM GMT+02:00, Jakub Jelinek  wrote:
>On Thu, May 04, 2017 at 05:54:47PM +0200, Richard Biener wrote:
>> >2017-05-04  Jakub Jelinek  
>> >
>> >* tree.c (next_type_uid): Change type to unsigned.
>> >(type_hash_canon): Decrement back next_type_uid if
>> >freeing a type node with the highest TYPE_UID.  For INTEGER_TYPEs
>> >also ggc_free TYPE_MIN_VALUE, TYPE_MAX_VALUE and TYPE_CACHED_VALUES
>> >if possible.
>> >
>> >--- gcc/tree.c.jj   2017-05-03 16:55:39.688052581 +0200
>> >+++ gcc/tree.c  2017-05-03 18:49:30.662185944 +0200
>> >@@ -151,7 +151,7 @@ static const char * const tree_node_kind
>> > /* Unique id for next decl created.  */
>> > static GTY(()) int next_decl_uid;
>> > /* Unique id for next type created.  */
>> >-static GTY(()) int next_type_uid = 1;
>> >+static GTY(()) unsigned next_type_uid = 1;
>> > /* Unique id for next debug decl created.  Use negative numbers,
>> >to catch erroneous uses.  */
>> > static GTY(()) int next_debug_decl_uid;
>> >@@ -7188,6 +7188,19 @@ type_hash_canon (unsigned int hashcode,
>> > {
>> >   tree t1 = ((type_hash *) *loc)->type;
>> >   gcc_assert (TYPE_MAIN_VARIANT (t1) == t1);
>> >+  if (TYPE_UID (type) + 1 == next_type_uid)
>> >+   --next_type_uid;
>> >+  if (TREE_CODE (type) == INTEGER_TYPE)
>> >+   {
>> >+ if (TYPE_MIN_VALUE (type)
>> >+ && TREE_TYPE (TYPE_MIN_VALUE (type)) == type)
>> >+   ggc_free (TYPE_MIN_VALUE (type));
>> >+ if (TYPE_MAX_VALUE (type)
>> >+ && TREE_TYPE (TYPE_MAX_VALUE (type)) == type)
>> >+   ggc_free (TYPE_MAX_VALUE (type));
>> >+ if (TYPE_CACHED_VALUES_P (type))
>> >+   ggc_free (TYPE_CACHED_VALUES (type));
>> >+   }
>> >   free_node (type);
>> 
>> Shouldn't free_node handle this?  That said, is freeing min/max safe?
>> The constants are shared after all.
>
>The next_type_uid handling, I think it is better in type_hash_canon,

Agreed.

>the
>only other user after all calls free_node in a loop, so it is highly
>unlikely it would do anything there.
>
>If you mean the INTEGER_TYPE handling, then yes, I guess it could be
>done in free_node too and can move it there.  If it was without
>the && TREE_TYPE (TYPE_M*_VALUE (type)) == type extra checks, then it
>is certainly unsafe and breaks bootstrap even, e.g. build_range_type
>and other spots happily create INTEGER_TYPEs with min/max value that
>have some other type.  But when the type of the INTEGER_CSTs is the
>type we are ggc_freeing, anything that would refer to those constants
>afterwards would be necessarily broken (as their TREE_TYPE would be
>ggc_freed, possibly reused for something completely unrelated).
>Thus I think it should be safe even in the LTO case and thus doable
>in free_node.

OK.  OTOH LTO frees the whole SCC and thus doesn't expect any pointed to stuff 
to be freed.  Not sure if we allow double ggc_free of stuff.

Richard.

>
>   Jakub



Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)

2017-05-04 Thread Martin Sebor

On 05/03/2017 03:27 PM, Joseph Myers wrote:

On Wed, 3 May 2017, Martin Sebor wrote:


Clarifying the comment is helpful, but a data structure involving putting
the same character in both still doesn't make sense to me.  It would seem
a lot clearer to (for example) split "DFKTEV" into separate "DFTV" and
"EK" cases, where "EK" uses NULL there just like "s", "d" etc. do.


Then the begin/end strings for the "DFTV" entry will be the empty
string (to indicate that they are expected to be quoted), as in
the attached incremental diff.  Let me know if I misunderstood
and you had something else in mind.


Yes, that's what I'd expect (incrementally).


FWIW, I don't mind doing this way if you prefer, but I'm hard
pressed to see the improvement.  All it did is grow the size of
the tables.  The code stayed the same.


Really I think it might be better not to have pointers / strings there at
all - rather, have a four-state enum value that says directly whether
those format specifiers are quote-neutral, should-be-quoted, left-quote or
right-quote.  Or that information could go in the existing flags2 field,
'"' to mean should-be-quoted, '<' to mean left-quote and '>' to mean
right-quote, for example.
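
The four-state idea above could be sketched roughly as follows. This is only an illustration: the enum, its enumerator names and classify_flags2 are hypothetical and do not appear in c-format.c; the only thing taken from the discussion is the proposed meaning of the '"', '<' and '>' characters in a flags2-style string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical four-state classification of a format specifier's
   quoting behaviour, as proposed in the discussion.  */
enum quote_kind
{
  QUOTE_NEUTRAL,          /* Quoting does not matter.  */
  QUOTE_SHOULD_BE_QUOTED, /* Expected to appear between %< and %>.  */
  QUOTE_LEFT,             /* Opens a quoted sequence.  */
  QUOTE_RIGHT             /* Closes a quoted sequence.  */
};

/* Derive the state from a flags2-style string: '"' means
   should-be-quoted, '<' means left-quote, '>' means right-quote.
   Any other content (or a null pointer) is treated as neutral.  */
enum quote_kind
classify_flags2 (const char *flags2)
{
  if (flags2 == NULL)
    return QUOTE_NEUTRAL;
  if (strchr (flags2, '"') != NULL)
    return QUOTE_SHOULD_BE_QUOTED;
  if (strchr (flags2, '<') != NULL)
    return QUOTE_LEFT;
  if (strchr (flags2, '>') != NULL)
    return QUOTE_RIGHT;
  return QUOTE_NEUTRAL;
}
```

Keeping the state in an existing string field avoids adding pointer members to every table entry, which seems to be the point of the flags2 suggestion.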


I like the flags2 idea.  I split up the initialization array to also
detect quoted %K, and unquoted %R and %r.  With that I ran into test
failures that took me a bit to debug.  It turns out that there's code
(a nasty hack, really) that makes assumptions about some of
the conversion specifiers.  I dealt with the failures by simplifying
the initialization code and removing the hack.

Martin

PR translation/80280 - Missing closing quote (%>) c/semantics.c and c/c-typeck.c

gcc/c-family/ChangeLog:

	PR translation/80280
	* c-format.h (struct format_flag_spec): Add new member.
	(T89_T): New macro.
	* c-format.c (local_tree_type_node): New global.
	(printf_flag_specs, asm_fprintf_flag_spec): Initialize new data.
	(gcc_diag_flag_specs, scanf_flag_specs, strftime_flag_specs): Ditto.
	(strfmon_flag_specs): Likewise.
	(gcc_diag_char_table, gcc_cdiag_char_table): Split up specifiers
	with distinct quoting properties.
	(gcc_tdiag_char_table, gcc_cxxdiag_char_table): Same.
	(flag_chars_t::validate): Add argument and handle bad quoting.
	(check_format_info_main): Handle quoting problems.
	(init_dynamic_diag_info): Simplify.

gcc/testsuite/ChangeLog:

	PR translation/80280
	* gcc.dg/format/gcc_diag-10.c: New test.

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 400eb66..98fff4c 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -53,6 +53,9 @@ struct function_format_info
   unsigned HOST_WIDE_INT first_arg_num;	/* number of first arg (zero for varargs) */
 };
 
+/* Initialized in init_dynamic_diag_info.  */
+static tree local_tree_type_node;
+
 static bool decode_format_attr (tree, function_format_info *, int);
 static int decode_format_type (const char *);
 
@@ -492,17 +495,17 @@ static const format_length_info gcc_gfc_length_specs[] =
 
 static const format_flag_spec printf_flag_specs[] =
 {
-  { ' ',  0, 0, N_("' ' flag"),N_("the ' ' printf flag"),  STD_C89 },
-  { '+',  0, 0, N_("'+' flag"),N_("the '+' printf flag"),  STD_C89 },
-  { '#',  0, 0, N_("'#' flag"),N_("the '#' printf flag"),  STD_C89 },
-  { '0',  0, 0, N_("'0' flag"),N_("the '0' printf flag"),  STD_C89 },
-  { '-',  0, 0, N_("'-' flag"),N_("the '-' printf flag"),  STD_C89 },
-  { '\'', 0, 0, N_("''' flag"),N_("the ''' printf flag"),  STD_EXT },
-  { 'I',  0, 0, N_("'I' flag"),N_("the 'I' printf flag"),  STD_EXT },
-  { 'w',  0, 0, N_("field width"), N_("field width in printf format"), STD_C89 },
-  { 'p',  0, 0, N_("precision"),   N_("precision in printf format"),   STD_C89 },
-  { 'L',  0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 },
-  { 0, 0, 0, NULL, NULL, STD_C89 }
+  { ' ',  0, 0, 0, N_("' ' flag"),N_("the ' ' printf flag"),  STD_C89 },
+  { '+',  0, 0, 0, N_("'+' flag"),N_("the '+' printf flag"),  STD_C89 },
+  { '#',  0, 0, 0, N_("'#' flag"),N_("the '#' printf flag"),  STD_C89 },
+  { '0',  0, 0, 0, N_("'0' flag"),N_("the '0' printf flag"),  STD_C89 },
+  { '-',  0, 0, 0, N_("'-' flag"),N_("the '-' printf flag"),  STD_C89 },
+  { '\'', 0, 0, 0, N_("''' flag"),N_("the ''' printf flag"),  STD_EXT },
+  { 'I',  0, 0, 0, N_("'I' flag"),N_("the 'I' printf flag"),  STD_EXT },
+  { 'w',  0, 0, 0, N_("field width"), N_("field width in printf format"), STD_C89 },
+  { 'p',  0, 0, 0, N_("precision"),   N_("precision in printf format"),   STD_C89 },
+  { 'L',  0, 0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 },
+  { 0, 0, 0, 0, NULL, NULL, STD_C89 }
 };
 
 
@@ 

Re: [Patch, fortran] PR70071

2017-05-04 Thread Steve Kargl
On Thu, May 04, 2017 at 05:26:17PM +0200, Harald Anlauf wrote:
> While trying to clean up my working copy, I found that the trivial
> patch for the ICE-on-invalid as described in the PR regtests cleanly
> for 7-release on i686-pc-linux-gnu.
> 
> Here's the cleaned-up version (diffs attached).
> 
> 2017-05-04  Harald Anlauf  
> 
>   PR fortran/70071
>   * array.c (gfc_ref_dimen_size): Handle bad subscript triplets.
> 
> 2017-05-04  Harald Anlauf  
> 
>   PR fortran/70071
>   * gfortran.dg/coarray_44.f90: New testcase.
> 

Harald,

The patch looks reasonable.  Do you have a commit privilege?
-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow


Re: [PATCH][AArch64] Accept more addressing modes for PRFM

2017-05-04 Thread Kyrill Tkachov


On 15/02/17 15:30, Richard Earnshaw (lists) wrote:

On 15/02/17 15:03, Kyrill Tkachov wrote:

Hi Richard,

On 15/02/17 15:00, Richard Earnshaw (lists) wrote:

On 03/02/17 17:12, Kyrill Tkachov wrote:

Hi all,

While evaluating Maxim's SW prefetch patches [1] I noticed that the
aarch64 prefetch pattern is
overly restrictive in its address operand. It only accepts simple
register addressing modes.
In fact, the PRFM instruction accepts almost all modes that a normal
64-bit LDR supports.
The restriction in the pattern leads to explicit address calculation
code to be emitted which we could avoid.

This patch relaxes the restrictions on the prefetch define_insn. It
creates a predicate and constraint that
allow the full addressing modes that PRFM allows. Thus for the testcase
in the patch (adapted from one of the existing
__builtin_prefetch tests in the testsuite) we can generate a:
prfm    PLDL1STRM, [x1, 8]

instead of the current
prfm    PLDL1STRM, [x1]
with an explicit increment of x1 by 8 in a separate instruction.
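
As a rough illustration of the kind of source that benefits (this is not the testcase from the patch; the function name is invented here), consider a loop that prefetches one element ahead. The prefetched address is a base register plus a constant 8-byte offset on LP64 targets, which is the shape the relaxed pattern is meant to accept directly. Whether a given compiler version actually emits the offset form depends on the optimisation level and tuning, so treat the expected assembly as an assumption.

```c
#include <stddef.h>

/* Sum an array while prefetching the element after the current one.  */
long
sum_with_prefetch (const long *a, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Read prefetch (second argument 0) with no temporal locality
         (third argument 0).  Forming &a[i + 1] is valid even on the
         last iteration, and a prefetch of an unmapped address does
         not fault.  */
      __builtin_prefetch (&a[i + 1], 0, 0);
      total += a[i];
    }
  return total;
}
```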

I've removed the %a output modifier in the output template and wrapped
the address operand into a DImode MEM before
passing it down to aarch64_print_operand.

This is because operand 0 is an address operand rather than a memory
operand and thus doesn't have a mode associated
with it.  When processing the 'a' output modifier the code in final.c
will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
argument.  This will ICE on aarch64 because we need a mode for the
memory in order for aarch64_classify_address to work
correctly.  Rather than overriding the VOIDmode in
aarch64_print_operand_address I decided to instead create the DImode
MEM in the "prefetch" output template and treat it as a normal 64-bit
memory address, which at the point of assembly output
is what it is anyway.

With this patch I see a reduction in instruction count in the SPEC2006
benchmarks when SW prefetching is enabled on top
of Maxim's patchset because fewer address calculation instructions are
emitted due to the use of the more expressive
addressing modes. It also fixes a performance regression that I observed
in 410.bwaves from Maxim's patches on Cortex-A72.
I'll be running a full set of benchmarks to evaluate this further, but I
think this is the right thing to do.

Bootstrapped and tested on aarch64-none-linux-gnu.

Maxim, do you want to try this on top of your patches on your hardware
to see if it helps with the regressions you mentioned?

Thanks,
Kyrill


[1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html

2016-02-03  Kyrylo Tkachov  

  * config/aarch64/aarch64.md (prefetch): Adjust predicate and
  constraint on operand 0 to allow more general addressing modes.
  Adjust output template.
  * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
  New function.
  * config/aarch64/aarch64-protos.h
  (aarch64_address_valid_for_prefetch_p): Declare prototype.
  * config/aarch64/constraints.md (Dp): New address constraint.
  * config/aarch64/predicates.md (aarch64_prefetch_operand): New
  predicate.

2016-02-03  Kyrylo Tkachov  

  * gcc.target/aarch64/prfm_imm_offset_1.c: New test.

aarch64-prfm-imm.patch


Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
contains an address, not a MEM.  So it's theoretically possible for
generic code to want to look inside the first operand and find an
address directly.  This change would break that assumption.

With this change the prefetch operand is still an address, not a MEM
during all the
optimisation passes.
It's wrapped in a MEM only during the ultimate printing of the assembly
string
during 'final'.


Ah!  I'd missed that.

This is OK for stage1.


I've bootstrapped and tested the patch against current trunk and committed
it as r247603.

Thanks,
Kyrill


R.


Kyrill


R.


commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
Author: Kyrylo Tkachov 
Date:   Thu Feb 2 14:46:11 2017 +

  [AArch64] Accept more addressing modes for PRFM

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index babc327..61706de 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned,
unsigned);
   int aarch64_get_condition_code (rtx);
+bool aarch64_address_valid_for_prefetch_p (rtx, bool);
   bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
   unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
   unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index acc093a..c05eff3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4549,6 +4549,24 @@ aarch64_classify_address (struct
aarch64_address_info *info,
   

[PATCH] c/c++: Add fix-it hints for suggested missing #includes

2017-05-04 Thread David Malcolm
As of r247522, fix-it-hints can suggest the insertion of new lines.

This patch uses this to implement a new "maybe_add_include_fixit"
function in c-common.c and uses it in the two places where the C and C++
frontend can suggest missing #include directives. [1]

The idea is that the user can then click on the fix-it in an IDE
and have it add the #include for them (or use -fdiagnostics-generate-patch).

Examples can be seen in the test cases.

The function attempts to put the #include in a reasonable place:
immediately after the last #include within the file, or at the
top of the file.  It is idempotent, so -fdiagnostics-generate-patch
does the right thing if several such diagnostics are emitted.
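
The placement rule and the idempotency check can be sketched on plain text lines as below. This is a deliberate simplification: the patch itself works on libcpp line maps and records which includes it has already suggested per file, whereas the two helpers here (both names hypothetical) just scan an array of strings.

```c
#include <stddef.h>
#include <string.h>

/* Return the 0-based index at which a new #include line should be
   inserted: immediately after the last line starting with "#include",
   or 0 (top of file) if there is no such line.  */
size_t
include_insertion_index (const char *const *lines, size_t n_lines)
{
  size_t insert_at = 0;
  for (size_t i = 0; i < n_lines; i++)
    if (strncmp (lines[i], "#include", 8) == 0)
      insert_at = i + 1;
  return insert_at;
}

/* Return nonzero if DIRECTIVE is already present verbatim.  A caller
   that skips insertion in that case adds each header at most once,
   however many diagnostics ask for it.  */
int
already_included (const char *const *lines, size_t n_lines,
                  const char *directive)
{
  for (size_t i = 0; i < n_lines; i++)
    if (strcmp (lines[i], directive) == 0)
      return 1;
  return 0;
}
```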

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

[1] I'm working on a followup which tweaks another diagnostic so that it
can suggest that a #include was missing, so I'll use it there as well.

gcc/c-family/ChangeLog:
* c-common.c (try_to_locate_new_include_insertion_point): New
function.
(per_file_includes_t): New typedef.
(added_includes_t): New typedef.
(added_includes): New variable.
(maybe_add_include_fixit): New function.
* c-common.h (maybe_add_include_fixit): New decl.

gcc/c/ChangeLog:
* c-decl.c (implicitly_declare): When suggesting a missing
#include, provide a fix-it hint.

gcc/cp/ChangeLog:
* name-lookup.c (get_std_name_hint): Add '<' and '>' around
the header names.
(maybe_suggest_missing_header): Update for addition of '<' and '>'
to above.  Provide a fix-it hint.

gcc/testsuite/ChangeLog:
* g++.dg/lookup/missing-std-include-2.C: New test case.
* gcc.dg/missing-header-fixit-1.c: New test case.
---
 gcc/c-family/c-common.c| 117 +
 gcc/c-family/c-common.h|   2 +
 gcc/c/c-decl.c |  10 +-
 gcc/cp/name-lookup.c   |  94 +
 .../g++.dg/lookup/missing-std-include-2.C  |  55 ++
 gcc/testsuite/gcc.dg/missing-header-fixit-1.c  |  36 +++
 6 files changed, 267 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lookup/missing-std-include-2.C
 create mode 100644 gcc/testsuite/gcc.dg/missing-header-fixit-1.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 0884922..19f7e60 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -7983,4 +7983,121 @@ c_flt_eval_method (bool maybe_c11_only_p)
 return c_ts18661_flt_eval_method ();
 }
 
+/* Attempt to locate a suitable location within FILE for a
+   #include directive to be inserted before.  FILE should
+   be a string from libcpp (pointer equality is used).
+
+   Attempt to return the location within FILE immediately
+   after the last #include within that file, or the start of
+   that file if it has no #include directives.
+
+   Return UNKNOWN_LOCATION if no suitable location is found,
+   or if an error occurs.  */
+
+static location_t
+try_to_locate_new_include_insertion_point (const char *file)
+{
+  /* Locate the last ordinary map within FILE that ended with a #include.  */
+  const line_map_ordinary *last_include_ord_map = NULL;
+
+  /* ...and the next ordinary map within FILE after that one.  */
+  const line_map_ordinary *last_ord_map_after_include = NULL;
+
+  /* ...and the first ordinary map within FILE.  */
+  const line_map_ordinary *first_ord_map_in_file = NULL;
+
+  for (unsigned int i = 0; i < LINEMAPS_ORDINARY_USED (line_table); i++)
+{
+  const line_map_ordinary *ord_map
+   = LINEMAPS_ORDINARY_MAP_AT (line_table, i);
+
+  const line_map_ordinary *from = INCLUDED_FROM (line_table, ord_map);
+  if (from)
+   if (from->to_file == file)
+ {
+   last_include_ord_map = from;
+   last_ord_map_after_include = NULL;
+ }
+
+  if (ord_map->to_file == file)
+   {
+ if (!first_ord_map_in_file)
+   first_ord_map_in_file = ord_map;
+ if (last_include_ord_map && !last_ord_map_after_include)
+   last_ord_map_after_include = ord_map;
+   }
+}
+
+  /* Determine where to insert the #include.  */
+  const line_map_ordinary *ord_map_for_insertion;
+
+  /* We want the next ordmap in the file after the last one that's a
+ #include, but failing that, the start of the file.  */
+  if (last_ord_map_after_include)
+ord_map_for_insertion = last_ord_map_after_include;
+  else
+ord_map_for_insertion = first_ord_map_in_file;
+
+  if (!ord_map_for_insertion)
+return UNKNOWN_LOCATION;
+
+  /* The "start_location" is column 0, meaning "the whole line".
+ rich_location and edit_context can't cope with this, so use
+ column 1 instead.  */
+  location_t col_0 = ord_map_for_insertion->start_location;
+  return linemap_position_for_loc_and_offset (line_table, col_0, 1);
+}
+
+/* A map from 

Re: [PATCH] Small type_hash_canon improvement

2017-05-04 Thread Jakub Jelinek
On Thu, May 04, 2017 at 05:54:47PM +0200, Richard Biener wrote:
> >2017-05-04  Jakub Jelinek  
> >
> > * tree.c (next_type_uid): Change type to unsigned.
> > (type_hash_canon): Decrement back next_type_uid if
> > freeing a type node with the highest TYPE_UID.  For INTEGER_TYPEs
> > also ggc_free TYPE_MIN_VALUE, TYPE_MAX_VALUE and TYPE_CACHED_VALUES
> > if possible.
> >
> >--- gcc/tree.c.jj2017-05-03 16:55:39.688052581 +0200
> >+++ gcc/tree.c   2017-05-03 18:49:30.662185944 +0200
> >@@ -151,7 +151,7 @@ static const char * const tree_node_kind
> > /* Unique id for next decl created.  */
> > static GTY(()) int next_decl_uid;
> > /* Unique id for next type created.  */
> >-static GTY(()) int next_type_uid = 1;
> >+static GTY(()) unsigned next_type_uid = 1;
> > /* Unique id for next debug decl created.  Use negative numbers,
> >to catch erroneous uses.  */
> > static GTY(()) int next_debug_decl_uid;
> >@@ -7188,6 +7188,19 @@ type_hash_canon (unsigned int hashcode,
> > {
> >   tree t1 = ((type_hash *) *loc)->type;
> >   gcc_assert (TYPE_MAIN_VARIANT (t1) == t1);
> >+  if (TYPE_UID (type) + 1 == next_type_uid)
> >+--next_type_uid;
> >+  if (TREE_CODE (type) == INTEGER_TYPE)
> >+{
> >+  if (TYPE_MIN_VALUE (type)
> >+  && TREE_TYPE (TYPE_MIN_VALUE (type)) == type)
> >+ggc_free (TYPE_MIN_VALUE (type));
> >+  if (TYPE_MAX_VALUE (type)
> >+  && TREE_TYPE (TYPE_MAX_VALUE (type)) == type)
> >+ggc_free (TYPE_MAX_VALUE (type));
> >+  if (TYPE_CACHED_VALUES_P (type))
> >+ggc_free (TYPE_CACHED_VALUES (type));
> >+}
> >   free_node (type);
> 
> Shouldn't free_node handle this?  That said, is freeing min/max safe?  The 
> constants are shared after all.

The next_type_uid handling, I think it is better in type_hash_canon, the
only other user after all calls free_node in a loop, so it is highly
unlikely it would do anything there.

If you mean the INTEGER_TYPE handling, then yes, I guess it could be
done in free_node too and can move it there.  If it was without
the && TREE_TYPE (TYPE_M*_VALUE (type)) == type extra checks, then it
is certainly unsafe and breaks bootstrap even, e.g. build_range_type
and other spots happily create INTEGER_TYPEs with min/max value that
have some other type.  But when the type of the INTEGER_CSTs is the
type we are ggc_freeing, anything that would refer to those constants
afterwards would be necessarily broken (as their TREE_TYPE would be
ggc_freed, possibly reused for something completely unrelated).
Thus I think it should be safe even in the LTO case and thus doable
in free_node.
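
A toy model of the two ideas discussed here, using malloc/free in place of the GC allocator. Every name below is hypothetical and the structures are far simpler than real tree nodes; the sketch only shows (a) handing back the uid when the most recently created type is discarded and (b) freeing min/max constants only when they point back at the type being discarded.

```c
#include <stdlib.h>

struct toy_type;

/* A constant that records which type it belongs to.  */
struct toy_const
{
  struct toy_type *type;
  long value;
};

struct toy_type
{
  unsigned uid;
  struct toy_const *min_value;
  struct toy_const *max_value;
};

/* Unique id for the next type created.  */
static unsigned next_uid = 1;

struct toy_type *
toy_type_new (void)
{
  struct toy_type *t = malloc (sizeof *t);
  if (t == NULL)
    return NULL;
  t->uid = next_uid++;
  t->min_value = NULL;
  t->max_value = NULL;
  return t;
}

struct toy_const *
toy_const_new (struct toy_type *type, long value)
{
  struct toy_const *c = malloc (sizeof *c);
  if (c == NULL)
    return NULL;
  c->type = type;
  c->value = value;
  return c;
}

unsigned
toy_next_uid (void)
{
  return next_uid;
}

/* Discard a freshly created type that turned out to be a duplicate.  */
void
toy_type_discard (struct toy_type *t)
{
  /* Reuse the uid only if T holds the highest uid handed out so far.  */
  if (t->uid + 1 == next_uid)
    --next_uid;
  /* Free a constant only if its type is T itself; a constant of some
     other type may still be referenced elsewhere.  */
  if (t->min_value && t->min_value->type == t)
    free (t->min_value);
  if (t->max_value && t->max_value != t->min_value
      && t->max_value->type == t)
    free (t->max_value);
  free (t);
}
```

The back-pointer test is what makes the free safe in this model: a constant whose type is the node being discarded cannot be used meaningfully afterwards anyway.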

Jakub


PR80613

2017-05-04 Thread Prathamesh Kulkarni
Hi,
As mentioned in the PR, the issue is that cddce1 marks the call to
__builtin_strdup as necessary:
marking necessary through .MEM_6 stmt p_7 = __builtin_strdup ();

and since p_7 doesn't get added to worklist in propagate_necessity()
because it's used only within free(), it's treated as "dead"
and wrongly gets released.
The patch fixes that by adding strdup/strndup in corresponding condition
in eliminate_unnecessary_stmts().

Another issue was that my previous patch failed to remove multiple
calls to strdup:
char *f(char **tt)
{
  char *t = *tt;
  char *p;

  p = __builtin_strdup (t);
  p = __builtin_strdup (t);
  return p;
}

That's fixed in this patch by adding strdup/strndup to another
corresponding condition in propagate_necessity() so that only one
instance of strdup is kept.

Bootstrapped+tested on x86_64-unknown-linux-gnu.
Cross-testing on arm*-*-* and aarch64*-*-* in progress.
OK to commit if testing passes?

Thanks
Prathamesh
2017-05-04  Prathamesh Kulkarni  

PR tree-optimization/80613
* tree-ssa-dce.c (propagate_necessity): Add cases for BUILT_IN_STRDUP
and BUILT_IN_STRNDUP.
(eliminate_unnecessary_stmts): Likewise.

testsuite/
* gcc.dg/tree-ssa/pr80613-1.c: New test-case.
* gcc.dg/tree-ssa/pr80613-2.c: New test-case.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr80613-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr80613-1.c
new file mode 100644
index 000..56176427922
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr80613-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+char *a(int);
+int b;
+
+void c() {
+  for (;;) {
+char d = *a(b);
+char *e = __builtin_strdup ();
+__builtin_free(e);
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr80613-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr80613-2.c
new file mode 100644
index 000..c58cc08d6c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr80613-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cddce1" } */
+
+/* There should only be one instance of __builtin_strdup after cddce1.  */
+
+char *f(char **tt)
+{
+  char *t = *tt;
+  char *p;
+
+  p = __builtin_strdup (t);
+  p = __builtin_strdup (t);
+  return p;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_strdup" 1 "cddce1" } } */
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index e17659df91f..7c05f981307 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -852,7 +852,9 @@ propagate_necessity (bool aggressive)
  == BUILT_IN_ALLOCA_WITH_ALIGN)
  || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_SAVE
  || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE
- || DECL_FUNCTION_CODE (callee) == BUILT_IN_ASSUME_ALIGNED))
+ || DECL_FUNCTION_CODE (callee) == BUILT_IN_ASSUME_ALIGNED
+ || DECL_FUNCTION_CODE (callee) == BUILT_IN_STRDUP
+ || DECL_FUNCTION_CODE (callee) == BUILT_IN_STRNDUP))
continue;
 
  /* Calls implicitly load from memory, their arguments
@@ -1353,6 +1355,8 @@ eliminate_unnecessary_stmts (void)
  && DECL_FUNCTION_CODE (call) != BUILT_IN_MALLOC
  && DECL_FUNCTION_CODE (call) != BUILT_IN_CALLOC
  && DECL_FUNCTION_CODE (call) != BUILT_IN_ALLOCA
+ && DECL_FUNCTION_CODE (call) != BUILT_IN_STRDUP
+ && DECL_FUNCTION_CODE (call) != BUILT_IN_STRNDUP
  && (DECL_FUNCTION_CODE (call)
  != BUILT_IN_ALLOCA_WITH_ALIGN)))
  /* Avoid doing so for bndret calls for the same reason.  */


Re: [PR 80622] Treat const pools as initialized in SRA

2017-05-04 Thread Richard Biener
On May 4, 2017 5:09:15 PM GMT+02:00, Martin Jambor  wrote:
>Hi,
>
>PR 80622 happens because when setting grp_write lazily, the code does
>not acknowledge that constant pool bases come initialized and so
>contain data even when not written to.  The patch below fixes that but
>it also puts a test for pre-initialization into a special function,
>uses it at all appropriate places and moves the test in question to an
>earlier time, which is a tiny bit cheaper because it may avoid
>unnecessary re-invocation of propagate_subaccesses_across_link.
>
>Bootstrapped and tested on x86_64-linux, OK for trunk?

OK.

Richard.

>Thanks,
>
>Martin
>
>
>
>2017-05-04  Martin Jambor  
>
>   PR tree-optimization/80622
>   * tree-sra.c (comes_initialized_p): New function.
>   (build_accesses_from_assign): Only set write lazily when
>   comes_initialized_p is false.
>   (analyze_access_subtree): Use comes_initialized_p.
>   (propagate_subaccesses_across_link): Assert !comes_initialized_p
>   instead of testing for PARM_DECL.
>
>testsuite/
>   * gcc.dg/tree-ssa/pr80622.c: New test.
>---
> gcc/testsuite/gcc.dg/tree-ssa/pr80622.c | 19 +++
>gcc/tree-sra.c  | 29
>+
> 2 files changed, 40 insertions(+), 8 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr80622.c
>
>diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr80622.c
>b/gcc/testsuite/gcc.dg/tree-ssa/pr80622.c
>new file mode 100644
>index 000..96dcb8fcdc0
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr80622.c
>@@ -0,0 +1,19 @@
>+/* { dg-do run } */
>+/* { dg-options "-O" } */
>+
>+struct S { int d; char e; int f; char g; } a;
>+char c;
>+
>+int
>+main ()
>+{
>+  struct S b[][1] = {3, 0, 3, 4, 3, 0, 3, 4, 3, 0, 3, 4, 3, 0, 3, 4, 3,
>+  0, 3, 4, 3, 0, 3, 4, 3, 0, 3, 4, 3, 0, 3, 4, 3, 0,
>+  3, 4, 3, 4, 7, 7, 3, 5, 0, 3, 4, 7, 7, 3, 5, 0, 3,
>+  4, 3, 4, 7, 7, 3, 5, 0, 3, 4, 7, 7, 3, 5, 0, 3, 4};
>+  a = b[4][0];
>+  c = b[4][0].e;
>+  if (a.g != 4)
>+__builtin_abort ();
>+  return 0;
>+}
>diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>index 1606573aead..8ac9c0783ff 100644
>--- a/gcc/tree-sra.c
>+++ b/gcc/tree-sra.c
>@@ -1305,6 +1305,15 @@ disqualify_if_bad_bb_terminating_stmt (gimple
>*stmt, tree lhs, tree rhs)
>   return false;
> }
> 
>+/* Return true if the nature of BASE is such that it contains data even if
>+   there is no write to it in the function.  */
>+
>+static bool
>+comes_initialized_p (tree base)
>+{
>+  return TREE_CODE (base) == PARM_DECL || constant_decl_p (base);
>+}
>+
>/* Scan expressions occurring in STMT, create access structures for all
>accesses
>to candidates for scalarization and remove those candidates which occur
>in
>statements or expressions that prevent them from being split apart. 
>Return
>@@ -1364,8 +1373,10 @@ build_accesses_from_assign (gimple *stmt)
>   link->racc = racc;
>   add_link_to_rhs (racc, link);
>/* Let's delay marking the areas as written until propagation of
>accesses
>-   across link.  */
>-  lacc->write = false;
>+   across link, unless the nature of rhs tells us that its data comes
>+   from elsewhere.  */
>+  if (!comes_initialized_p (racc->base))
>+  lacc->write = false;
> }
> 
>   return lacc || racc;
>@@ -2472,8 +2483,7 @@ analyze_access_subtree (struct access *root,
>struct access *parent,
> 
>   if (!hole || root->grp_total_scalarization)
> root->grp_covered = 1;
>-  else if (root->grp_write || TREE_CODE (root->base) == PARM_DECL
>- || constant_decl_p (root->base))
>+  else if (root->grp_write || comes_initialized_p (root->base))
> root->grp_unscalarized_data = 1; /* not covered and written to */
>   return sth_created;
> }
>@@ -2581,11 +2591,14 @@ propagate_subaccesses_across_link (struct
>access *lacc, struct access *racc)
> 
>/* IF the LHS is still not marked as being written to, we only need to
>do so
>  if the RHS at this level actually was.  */
>-  if (!lacc->grp_write &&
>-  (racc->grp_write || TREE_CODE (racc->base) == PARM_DECL))
>+  if (!lacc->grp_write)
> {
>-  lacc->grp_write = true;
>-  ret = true;
>+  gcc_checking_assert (!comes_initialized_p (racc->base));
>+  if (racc->grp_write)
>+  {
>+lacc->grp_write = true;
>+ret = true;
>+  }
> }
> 
>   if (is_gimple_reg_type (lacc->type)



Re: [PATCH] Small type_hash_canon improvement

2017-05-04 Thread Richard Biener
On May 4, 2017 4:43:45 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>While type_hash_canon in case of reusing an already existing type
>ggc_frees the freshly created type, we still waste one type uid
>for each such case, this patch attempts to avoid that.
>Furthermore, for INTEGER_TYPE we keep around the min and max value
>INTEGER_CSTs and the cached values vector (until it is GCed).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
>2017-05-04  Jakub Jelinek  
>
>   * tree.c (next_type_uid): Change type to unsigned.
>   (type_hash_canon): Decrement back next_type_uid if
>   freeing a type node with the highest TYPE_UID.  For INTEGER_TYPEs
>   also ggc_free TYPE_MIN_VALUE, TYPE_MAX_VALUE and TYPE_CACHED_VALUES
>   if possible.
>
>--- gcc/tree.c.jj  2017-05-03 16:55:39.688052581 +0200
>+++ gcc/tree.c 2017-05-03 18:49:30.662185944 +0200
>@@ -151,7 +151,7 @@ static const char * const tree_node_kind
> /* Unique id for next decl created.  */
> static GTY(()) int next_decl_uid;
> /* Unique id for next type created.  */
>-static GTY(()) int next_type_uid = 1;
>+static GTY(()) unsigned next_type_uid = 1;
> /* Unique id for next debug decl created.  Use negative numbers,
>to catch erroneous uses.  */
> static GTY(()) int next_debug_decl_uid;
>@@ -7188,6 +7188,19 @@ type_hash_canon (unsigned int hashcode,
> {
>   tree t1 = ((type_hash *) *loc)->type;
>   gcc_assert (TYPE_MAIN_VARIANT (t1) == t1);
>+  if (TYPE_UID (type) + 1 == next_type_uid)
>+  --next_type_uid;
>+  if (TREE_CODE (type) == INTEGER_TYPE)
>+  {
>+if (TYPE_MIN_VALUE (type)
>+&& TREE_TYPE (TYPE_MIN_VALUE (type)) == type)
>+  ggc_free (TYPE_MIN_VALUE (type));
>+if (TYPE_MAX_VALUE (type)
>+&& TREE_TYPE (TYPE_MAX_VALUE (type)) == type)
>+  ggc_free (TYPE_MAX_VALUE (type));
>+if (TYPE_CACHED_VALUES_P (type))
>+  ggc_free (TYPE_CACHED_VALUES (type));
>+  }
>   free_node (type);

Shouldn't free_node handle this?  That said, is freeing min/max safe?  The 
constants are shared after all.

Richard.

>   return t1;
> }
>
>   Jakub
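
The uid part of the patch can be illustrated outside of GCC.  Below is a
minimal sketch, assuming a toy allocator: Node, make_node and discard_node
are hypothetical names, not GCC functions.  It only shows the idea of handing
an id back when the object being freed holds the most recently issued one.

```cpp
#include <cassert>

// Hypothetical stand-in for a tree node that carries a unique id.
struct Node {
  unsigned uid;
};

// Next id to hand out; starts at 1, like next_type_uid in the patch.
static unsigned next_uid = 1;

// Create a node and consume one id.
Node *make_node() { return new Node{next_uid++}; }

// Free a node.  If it holds the most recently issued id, give the id
// back so that the next allocation reuses it.
void discard_node(Node *n) {
  if (n->uid + 1 == next_uid)
    --next_uid;
  delete n;
}
```

Only the newest id can be reclaimed this way; discarding an older node
leaves the counter untouched, so ids stay unique.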



[gomp5] Allow use_device_ptr clause on target data without map clauses

2017-05-04 Thread Jakub Jelinek
Hi!

I've created branches/gomp-5_0-branch as a playground for OpenMP 5.0
implementation (so far mostly the TR4 - OpenMP Version 5.0 Preview 1
from last fall, but including later changes too).

This is the first patch - OpenMP 5.0 will allow target data
with just use_device_ptr clauses and no map clauses, which was invalid in
4.5.

Tested on x86_64-linux, committed to gomp-5_0-branch.

2017-05-04  Jakub Jelinek  

* c-parser.c (c_parser_omp_target_data): Allow target data
with only use_device_ptr clauses.

* parser.c (cp_parser_omp_target_data): Allow target data
with only use_device_ptr clauses.

* c-c++-common/gomp/target-data-1.c: New test.

--- gcc/c/c-parser.c.jj 2017-05-04 15:05:06.0 +0200
+++ gcc/c/c-parser.c 2017-05-04 15:27:33.306131900 +0200
@@ -16091,6 +16091,8 @@ c_parser_omp_target_data (location_t loc
*pc = OMP_CLAUSE_CHAIN (*pc);
continue;
  }
+  else if (OMP_CLAUSE_CODE (*pc) == OMP_CLAUSE_USE_DEVICE_PTR)
+   map_seen = 3;
   pc = &OMP_CLAUSE_CHAIN (*pc);
 }
 
@@ -16099,7 +16101,7 @@ c_parser_omp_target_data (location_t loc
   if (map_seen == 0)
error_at (loc,
  "%<#pragma omp target data%> must contain at least "
- "one %<map%> clause");
+ "one %<map%> or %<use_device_ptr%> clause");
   return NULL_TREE;
 }
 
--- gcc/cp/parser.c.jj  2017-05-04 15:05:49.0 +0200
+++ gcc/cp/parser.c 2017-05-04 15:28:27.167445027 +0200
@@ -35627,6 +35627,8 @@ cp_parser_omp_target_data (cp_parser *pa
*pc = OMP_CLAUSE_CHAIN (*pc);
continue;
  }
+  else if (OMP_CLAUSE_CODE (*pc) == OMP_CLAUSE_USE_DEVICE_PTR)
+   map_seen = 3;
   pc = &OMP_CLAUSE_CHAIN (*pc);
 }
 
@@ -35635,7 +35637,7 @@ cp_parser_omp_target_data (cp_parser *pa
   if (map_seen == 0)
error_at (pragma_tok->location,
  "%<#pragma omp target data%> must contain at least "
- "one %<map%> clause");
+ "one %<map%> or %<use_device_ptr%> clause");
   return NULL_TREE;
 }
 
--- gcc/testsuite/c-c++-common/gomp/target-data-1.c.jj  2017-05-04 17:30:41.339849938 +0200
+++ gcc/testsuite/c-c++-common/gomp/target-data-1.c 2017-05-04 17:31:48.27004 +0200
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+
+void
+foo (void)
+{
+  int a[4] = { 1, 2, 3, 4 };
+  #pragma omp target data map(to:a)
+  #pragma omp target data use_device_ptr(a)
+  #pragma omp target is_device_ptr(a)
+  {
+a[0]++;
+  }
+  #pragma omp target data  /* { dg-error "must contain at least one" } */
+  a[0]++;
+  #pragma omp target data map(to:a)
+  #pragma omp target data use_device_ptr(a) use_device_ptr(a) /* { dg-error "appears more than once in data clauses" } */
+  a[0]++;
+}

Jakub
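
For readers who do not have the parser open, the check being relaxed can be
modelled roughly as follows.  This is a simplified sketch under assumptions:
the clause enum and target_data_ok are hypothetical, and the handling of map
clauses (3 for a valid kind, a low bit for an invalid one) is my reading of
the surrounding code rather than something shown in the hunks above.  Only
the "map_seen = 3 for use_device_ptr" step comes directly from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified clause kinds.
enum class clause { map_valid, map_invalid, use_device_ptr, other };

// Returns true if a "target data" directive with these clauses passes
// the "must contain at least one ..." check.
bool target_data_ok(const std::vector<clause> &clauses) {
  int map_seen = 0;
  for (clause c : clauses) {
    switch (c) {
      case clause::map_valid:
        map_seen = 3;
        break;
      case clause::map_invalid:
        map_seen |= 1;  // assumed: remembered, but not sufficient alone
        break;
      case clause::use_device_ptr:
        map_seen = 3;   // the new case added by the patch
        break;
      case clause::other:
        break;
    }
  }
  return map_seen == 3;
}
```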


Re: [PATCH GCC8][25/33]New loop constraint flags

2017-05-04 Thread Bin.Cheng
On Wed, Apr 26, 2017 at 3:01 PM, Bin.Cheng  wrote:
> On Wed, Apr 26, 2017 at 2:27 PM, Richard Biener
>  wrote:
>> On Tue, Apr 18, 2017 at 12:51 PM, Bin Cheng  wrote:
>>> Hi,
>>> This patch adds new loop constraint flags marking prologue, epilogue and 
>>> versioned loops generated
>>> by vectorizer, unroller and versioning.  These flags will be used in IVOPTs 
>>> in order to differentiate
>>> possible hot innermost loop from others.  I also plan to use them to avoid 
>>> unnecessary cunroll on
>>> such loops.
>>> Is it OK?
>>
>> Hmm, it doesn't really match "constraints".
>>
>> I'd rather somehow track the "original" loop it was versioned / copied
>> from plus either a
>> "kind" (epilogue, prologue, version) or determine this from dominance
>> relationship between
>> the copy and the original loop.
> Or we generalize "constraints flags", saying to introduce general
> bit-wise flags for loop structure.  Among these flags, one kind is
> constraint flags, the rest are general flags.  We could also change
> boolean fields into such flags?
> I do have following patches relying on this to avoid complete unroll
> for prologue/epilogue loops.
>>
>> Thus,
>>
>> struct loop {
>>  ...
>>  /* If not zero the loop number this loop was copied from.  */
>>  unsigned clone_of;
> In this case, we need to track relations between different loops, which
> looks like a burden.  For example, we need to make sure a loop number
> won't be reused.  It gets even more complicated in the extreme case
> where the original loop is removed.
>
I will drop this one from this patch series.

Thanks,
bin
>>
>> would that help?  With knowing loop relation we can also more
>> aggressively version
>> and eventually later collapse the two versions again if we can still
>> identify them and
>> they are still reasonably similar.
>>
>> Richard.
>>
>>>
>>> Thanks,
>>> bin
>>> 2017-04-11  Bin Cheng  
>>>
>>> * cfgloop.h (LOOP_C_PROLOG, LOOP_C_EPILOG, LOOP_C_VERSION): New.
>>> * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Set
>>> LOOP_C_EPILOG for unrolled epilogue loop.
>>> (vect_do_peeling): Set LOOP_C_PROLOG and LOOP_C_EPILOG for peeled
>>> loops.
>>> (vect_loop_versioning): Set LOOP_C_VERSION for versioned loop.


[C++ PATCH] global trees

2017-05-04 Thread Nathan Sidwell

This patch moves more things into the cp_global_trees array.

1) a set of identifiers, in particular initializer_list is no longer 
checked by strcmp or multiple get_identifier calls.


2) The anonymous namespace identifier

3) The global namespace and its name.

committed to trunk.

nathan
--
Nathan Sidwell
2017-05-04  Nathan Sidwell  

	More global trees.
	* cp-tree.h (enum cp_tree_index): Add CPTI_GLOBAL,
	CPTI_GLOBAL_TYPE, CPTI_GLOBAL_IDENTIFIER, CPTI_ANON_IDENTIFIER,
	CPTI_INIT_LIST_IDENTIFIER.
	(global_namespace, global_type_node, global_identifier,
	anon_identifier, init_list_identifier): New.
	* decl.c (global_type_node, global_scope_name): Delete.
	(initialize_predefined_identifiers): Add new identifiers.
	(cxx_init_decl_processing): Adjust.
	* name-lookup.h (global_namespace, global_type_node): Delete.
	* name-lookup.c (global_namespace, anonymous_namespace_name,
	get_anonymous_namespace_name): Delete.
	(namespace_scope_ht_size, begin_scope, pushtag_1,
	push_namespace): Adjust.
	* call.c (type_has_extended_temps): Use init_list_identifier.
	* pt.c (listify): Likewise.

Index: cp/call.c
===
--- cp/call.c	(revision 247591)
+++ cp/call.c	(working copy)
@@ -10543,15 +10543,15 @@ type_has_extended_temps (tree type)
 bool
 is_std_init_list (tree type)
 {
-  /* Look through typedefs.  */
   if (!TYPE_P (type))
 return false;
   if (cxx_dialect == cxx98)
 return false;
+  /* Look through typedefs.  */
   type = TYPE_MAIN_VARIANT (type);
   return (CLASS_TYPE_P (type)
 	  && CP_TYPE_CONTEXT (type) == std_node
-	  && strcmp (TYPE_NAME_STRING (type), "initializer_list") == 0);
+	  && init_list_identifier == DECL_NAME (TYPE_NAME (type)));
 }
 
 /* Returns true iff DECL is a list constructor: i.e. a constructor which
Index: cp/cp-tree.h
===
--- cp/cp-tree.h	(revision 247591)
+++ cp/cp-tree.h	(working copy)
@@ -119,6 +119,8 @@ enum cp_tree_index
 CPTI_VTBL_PTR_TYPE,
 CPTI_STD,
 CPTI_ABI,
+CPTI_GLOBAL,
+CPTI_GLOBAL_TYPE,
 CPTI_CONST_TYPE_INFO_TYPE,
 CPTI_TYPE_INFO_PTR_TYPE,
 CPTI_ABORT_FNDECL,
@@ -138,9 +140,12 @@ enum cp_tree_index
 CPTI_THIS_IDENTIFIER,
 CPTI_PFN_IDENTIFIER,
 CPTI_VPTR_IDENTIFIER,
+CPTI_GLOBAL_IDENTIFIER,
 CPTI_STD_IDENTIFIER,
+CPTI_ANON_IDENTIFIER,
 CPTI_AUTO_IDENTIFIER,
 CPTI_DECLTYPE_AUTO_IDENTIFIER,
+CPTI_INIT_LIST_IDENTIFIER,
 
 CPTI_LANG_NAME_C,
 CPTI_LANG_NAME_CPLUSPLUS,
@@ -184,6 +189,8 @@ extern GTY(()) tree cp_global_trees[CPTI
 #define vtbl_ptr_type_node		cp_global_trees[CPTI_VTBL_PTR_TYPE]
 #define std_node			cp_global_trees[CPTI_STD]
 #define abi_node			cp_global_trees[CPTI_ABI]
+#define global_namespace		cp_global_trees[CPTI_GLOBAL]
+#define global_type_node		cp_global_trees[CPTI_GLOBAL_TYPE]
 #define const_type_info_type_node	cp_global_trees[CPTI_CONST_TYPE_INFO_TYPE]
 #define type_info_ptr_type		cp_global_trees[CPTI_TYPE_INFO_PTR_TYPE]
 #define abort_fndecl			cp_global_trees[CPTI_ABORT_FNDECL]
@@ -224,12 +231,14 @@ extern GTY(()) tree cp_global_trees[CPTI
 #define this_identifier			cp_global_trees[CPTI_THIS_IDENTIFIER]
 #define pfn_identifier			cp_global_trees[CPTI_PFN_IDENTIFIER]
 #define vptr_identifier			cp_global_trees[CPTI_VPTR_IDENTIFIER]
-/* The name of the std namespace.  */
+/* The name of the ::, std & anon namespaces.  */
+#define global_identifier		cp_global_trees[CPTI_GLOBAL_IDENTIFIER]
 #define std_identifier			cp_global_trees[CPTI_STD_IDENTIFIER]
+#define anon_identifier			cp_global_trees[CPTI_ANON_IDENTIFIER]
 /* auto and declspec(auto) identifiers.  */
 #define auto_identifier			cp_global_trees[CPTI_AUTO_IDENTIFIER]
 #define decltype_auto_identifier	cp_global_trees[CPTI_DECLTYPE_AUTO_IDENTIFIER]
-/* The name of a C++17 deduction guide.  */
+#define init_list_identifier		cp_global_trees[CPTI_INIT_LIST_IDENTIFIER]
 #define lang_name_c			cp_global_trees[CPTI_LANG_NAME_C]
 #define lang_name_cplusplus		cp_global_trees[CPTI_LANG_NAME_CPLUSPLUS]
 
@@ -277,6 +286,7 @@ extern GTY(()) tree cp_global_trees[CPTI
access nodes in tree.h.  */
 
 #define access_default_node		null_node
+
 
 #include "name-lookup.h"
 
Index: cp/decl.c
===
--- cp/decl.c	(revision 247591)
+++ cp/decl.c	(working copy)
@@ -140,14 +140,6 @@ static void expand_static_init (tree, tr
 
 tree cp_global_trees[CPTI_MAX];
 
-/* Indicates that there is a type value in some namespace, although
-   that is not necessarily in scope at the moment.  */
-
-tree global_type_node;
-
-/* The node that holds the "name" of the global scope.  */
-tree global_scope_name;
-
 #define local_names cp_function_chain->x_local_names
 
 /* A list of objects which have constructors or destructors
@@ -3935,7 +3927,7 @@ make_unbound_class_template (tree contex
 
 
 
-/* Push the declarations of builtin types into the namespace.
+/* Push 

Re: [PATCH GCC8][28/33]Don't count non-interger PHIs for register pressure

2017-05-04 Thread Bin.Cheng
On Wed, Apr 26, 2017 at 3:32 PM, Bin.Cheng  wrote:
> On Wed, Apr 26, 2017 at 3:23 PM, Richard Biener
>  wrote:
>> On Wed, Apr 26, 2017 at 3:37 PM, Bin.Cheng  wrote:
>>> On Wed, Apr 26, 2017 at 2:32 PM, Richard Biener
>>>  wrote:
>>>> On Tue, Apr 18, 2017 at 12:52 PM, Bin Cheng  wrote:
>>>>> Hi,
>>>>> Given only integer variables are meaningful for register pressure
>>>>> estimation in IVOPTs, this patch skips non-integer type PHIs when
>>>>> counting register pressure.
>>>>> Is it OK?
>>>>
>>>> Huh.  I suppose it only makes a difference because you are ignoring
>>>> POINTER_TYPE_P IVs?  At least I would be surprised if get_iv returns
>>>> true for float or vector PHIs (yeah, see early out in get_iv)?  So why
>>>> exclude POINTER_TYPE_P IVs?
>>> Hmm, but if get_iv returns non-NULL, the phi won't be counted because
>>> loop is continued?  Actually, all IV and invariants are skipped by
>>> checking get_iv, so this is only to skip floating point phis.
>>
>> Err, but AFAICS get_iv will return non-NULL for POINTER_TYPE_P IVs
>> which you then skip by your added
>>
>> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (op)))
>> +   continue;
>>
>> thus float IVs are always skipped by means of get_iv returning NULL.
>>
>> Oh, the get_iv check continues for non-NULL result ... so it makes sense.
>> But still, why exclude POINTER_TYPE_P non-IV ops?
> POINTER_TYPE_P is simply an oversight; I will update the patch.
Here is the updated version, which also handles POINTER_TYPE_P.

Thanks,
bin
>
> Thanks,
> bin
>>
>> Richard.
>>
>>> Thanks,
>>> bin

>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> bin
>>>>>
>>>>> 2017-04-11  Bin Cheng  
>>>>>
>>>>> * tree-ssa-loop-ivopts.c (determine_set_costs): Skip non-integer
>>>>> when counting register pressure.
>>>>>
From 9c2d8f5f3b749863bcb9a32ff3a520a8d3eda9f1 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 7 Mar 2017 16:26:27 +
Subject: [PATCH 26/33] skip-non_int-phi-reg-pressure-20170401.txt

---
 gcc/tree-ssa-loop-ivopts.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 205d118..8d6adfe 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5579,6 +5579,10 @@ determine_set_costs (struct ivopts_data *data)
   if (get_iv (data, op))
continue;
 
+  if (!POINTER_TYPE_P (TREE_TYPE (op))
+ && !INTEGRAL_TYPE_P (TREE_TYPE (op)))
+   continue;
+
   n++;
 }
 
-- 
1.9.1
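
The effect of the updated hunk can be shown with a toy model.  This is a
sketch under assumptions: phi_model, type_kind and count_pressure_phis are
hypothetical, and the is_iv field merely stands in for get_iv () returning
non-NULL.  It is not the code in determine_set_costs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified type classification of a PHI result.
enum class type_kind { integer, pointer, floating, vector };

struct phi_model {
  type_kind kind;
  bool is_iv;  // stands in for get_iv () returning non-NULL
};

// Count the PHIs that contribute to integer register pressure: IVs are
// accounted for elsewhere, and only integer or pointer values are kept.
unsigned count_pressure_phis(const std::vector<phi_model> &phis) {
  unsigned n = 0;
  for (const phi_model &p : phis) {
    if (p.is_iv)
      continue;
    if (p.kind != type_kind::pointer && p.kind != type_kind::integer)
      continue;
    ++n;
  }
  return n;
}
```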



[Patch, fortran] PR70071

2017-05-04 Thread Harald Anlauf
While trying to clean up my working copy, I found that the trivial
patch for the ICE-on-invalid as described in the PR regtests cleanly
for 7-release on i686-pc-linux-gnu.

Here's the cleaned-up version (diffs attached).


2017-05-04  Harald Anlauf  

PR fortran/70071
* array.c (gfc_ref_dimen_size): Handle bad subscript triplets.



2017-05-04  Harald Anlauf  

PR fortran/70071
* gfortran.dg/coarray_44.f90: New testcase.


If somebody wants to forward port this to 8-trunk, please go ahead.

Thanks,
Harald
Index: gcc/fortran/array.c
===
--- gcc/fortran/array.c (revision 247015)
+++ gcc/fortran/array.c (working copy)
@@ -2202,9 +2202,15 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int dimen,
   mpz_t diff;
   bool t;
 
-  if (dimen < 0 || ar == NULL || dimen > ar->dimen - 1)
+  if (dimen < 0 || ar == NULL)
 gfc_internal_error ("gfc_ref_dimen_size(): Bad dimension");
 
+  if (dimen > ar->dimen - 1)
+{
+  gfc_error ("Bad array dimension at %L", >c_where[dimen]);
+  return false;
+}
+
   switch (ar->dimen_type[dimen])
 {
 case DIMEN_ELEMENT:
Index: gcc/testsuite/gfortran.dg/coarray_44.f90
===
--- gcc/testsuite/gfortran.dg/coarray_44.f90(revision 0)
+++ gcc/testsuite/gfortran.dg/coarray_44.f90(revision 0)
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-options "-fcoarray=single" }
+!
+! PR fortran/70071
+! Based on testcases by Gerhard Steinmetz
+
+program pr70071
+  implicit none
+  integer, allocatable :: z(:)[:,:]
+  allocate (z(2)[1::2,*])  ! { dg-error "Bad array dimension" }
+  allocate (z(1::2)[2,*])  ! { dg-error "Bad array specification in ALLOCATE" }
+end program pr70071


Re: [PATCH GCC8][13/33]Rewrite cost computation of ivopts

2017-05-04 Thread Bin.Cheng
On Wed, Apr 26, 2017 at 11:18 AM, Richard Biener
 wrote:
> On Wed, Apr 26, 2017 at 12:12 PM, Bin.Cheng  wrote:
>> On Wed, Apr 26, 2017 at 10:50 AM, Richard Biener
>>  wrote:
>>> On Tue, Apr 18, 2017 at 12:43 PM, Bin Cheng  wrote:
>>>> Hi,
>>>> This is the major part of this patch series.  It rewrites cost
>>>> computation of ivopts using tree affine.
>>>> Apart from description given by cover message:
>>>>   A) New computation cost model.  Currently, there is a large amount of
>>>>      code trying to understand tree expressions and estimate their
>>>>      computation cost.  The model was designed long ago for generic
>>>>      tree expressions.  In order to process generic expressions (even
>>>>      address expressions of array/memory references), it has code for
>>>>      too many corner cases.  The problem is that it is effectively
>>>>      impossible to handle all complicated expressions, even with
>>>>      complicated logic in functions like get_computation_cost_at,
>>>>      difference_cost, ptr_difference_cost, get_address_cost and so
>>>>      on...  The second problem is that it is hard to keep the cost
>>>>      model consistent among special cases.  As special cases are added
>>>>      from time to time, the model is no longer unified.  There are
>>>>      cases where the right cost results in bad code, or vice versa,
>>>>      where the wrong cost results in good code.  Finally, it is
>>>>      difficult to add code for new cases.
>>>>      This patch introduces a new cost computation model by using tree
>>>>      affine.  Tree exprs are lowered to aff_tree, which is usually a
>>>>      simple arithmetic operation.  Code handling special cases is no
>>>>      longer necessary, which simplifies things considerably.  It is
>>>>      also easier to compute consistent costs among different
>>>>      expressions using tree affine, which gives us a unified cost
>>>>      model.
>>>> This patch also fixes the issue that cost computation for address type
>>>> iv_use is inconsistent with how it is rewritten in the end.  It
>>>> greatly simplifies cost computation.
>>>>
>>>> Is it OK?
>>>
>>> The patch is quite hard to follow (diff messes up here -- this is a
>>> case where a context
>> Hi Richard,
>> Thanks for reviewing, attachment is the updated context diff.  It also
>> includes a minor fix handling pre-increment addressing mode,
>> specifically, it adds two lines of code:
>>
>>  if (stmt_after_increment (data->current_loop, cand, use->stmt))
>> ainc_offset += ainc_step;
>>
>>
>>> diff is easier to read).  I trust you on the implementation details
>>> here; the overall structure looks ok to me.  The only question I have
>>> is regarding get_loop_invariant_expr, which seems to be much less
>>> sophisticated than before (basically it's now what was
>>> record_inv_expr?).  I suppose the old functionality is superseded by
>>> using affines everywhere else.
>> Yes, the previous version tries hard to cancel common subexpressions
>> when representing a use with a cand.  The new version relies on tree
>> affine, which is much better.  One problem with invariant expression
>> estimation is that we don't know in IVOPTs whether the inv_expr will be
>> hoisted by later passes.  This problem has always existed; we can only
>> make assumptions here.  I think the new version is a bit more
>> aggressive in recording new inv_exprs.
>
> Thanks.
>
> LGTM.
Trivial update replacing calls to aff_combination_simple_p because of
change in previous patch.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
>>>
>>> Otherwise ok.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>
>>>> Thanks,
>>>> bin
>>>> 2017-04-11  Bin Cheng  
>>>>
>>>> * tree-ssa-loop-ivopts.c (get_loop_invariant_expr): Simplify.
>>>> (adjust_setup_cost): New parameter supporting round up adjustment.
>>>> (struct address_cost_data): Delete.
>>>> (force_expr_to_var_cost): Don't bound cost with spill_cost.
>>>> (split_address_cost, ptr_difference_cost): Delete.
>>>> (difference_cost, compare_aff_trees, record_inv_expr): Delete.
>>>> (struct ainc_cost_data): New struct.
>>>> (get_address_cost_ainc): New function.
>>>> (get_address_cost, get_computation_cost): Reimplement.
>>>> (determine_group_iv_cost_address): Record inv_expr for all uses of
>>>> a group.
>>>> (determine_group_iv_cost_cond): Call get_loop_invariant_expr.
>>>> (iv_ca_has_deps): Reimplemented to ...
>>>> (iv_ca_more_deps): ... this.  Check if NEW_CP introduces more deps
>>>> than OLD_CP.
>>>> (iv_ca_extend): Call iv_ca_more_deps.
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 203b272..08a78c8 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2911,6 +2911,45 @@ find_inv_vars (struct ivopts_data *data, tree *expr_p, 

Re: [PATCH GCC8][11/33]New interfaces for tree affine

2017-05-04 Thread Bin.Cheng
On Mon, Apr 24, 2017 at 11:43 AM, Richard Biener
 wrote:
> On Tue, Apr 18, 2017 at 12:43 PM, Bin Cheng  wrote:
>> Hi,
>> This patch adds three simple interfaces for tree affine which will be used in
>> cost computation later.
>>
>> Is it OK?
>
>
> +static inline tree
> +aff_combination_type (aff_tree *aff)
>
> misses a function comment.  Please do not introduce new 'static inline'
> function in headers but instead use plain 'inline'.
>
> +/* Return true if AFF is simple enough.  */
> +static inline bool
> +aff_combination_simple_p (aff_tree *aff)
> +{
>
> what is "simple"?  Based on that find a better name.
> "singleton"?  But aff_combination_const_p isn't
> simple_p (for whatever reason).
Patch updated.  The one (13th) depending on this one is updated too.

Thanks,
bin
>
> Richard.
>
>
>
>> Thanks,
>> bin
>> 2017-04-11  Bin Cheng  
>>
>> * tree-affine.h (aff_combination_type): New interface.
>> (aff_combination_const_p, aff_combination_simple_p): New interfaces.
diff --git a/gcc/tree-affine.h b/gcc/tree-affine.h
index b8eb8cc..f9bdcb5 100644
--- a/gcc/tree-affine.h
+++ b/gcc/tree-affine.h
@@ -88,8 +88,15 @@ bool aff_comb_cannot_overlap_p (aff_tree *, const widest_int &,
 /* Debugging functions.  */
 void debug_aff (aff_tree *);
 
+/* Return AFF's type.  */
+inline tree
+aff_combination_type (aff_tree *aff)
+{
+  return aff->type;
+}
+
 /* Return true if AFF is actually ZERO.  */
-static inline bool
+inline bool
 aff_combination_zero_p (aff_tree *aff)
 {
   if (!aff)
@@ -101,4 +108,22 @@ aff_combination_zero_p (aff_tree *aff)
   return false;
 }
 
+/* Return true if AFF is actually const.  */
+inline bool
+aff_combination_const_p (aff_tree *aff)
+{
+  return (aff == NULL || aff->n == 0);
+}
+
+/* Return true iff AFF contains one singleton variable.  Users need to
+   make sure AFF points to a valid combination.  */
+inline bool
+aff_combination_singleton_var_p (aff_tree *aff)
+{
+  gcc_assert (aff != NULL);
+
+  return (aff->n == 1
+ && aff->offset == 0
+ && (aff->elts[0].coef == 1 || aff->elts[0].coef == -1));
+}
 #endif /* GCC_TREE_AFFINE_H */
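
To make the two new predicates concrete, here is a small stand-alone model.
It is only a sketch under assumptions: aff_model and elt_model are
hypothetical simplifications of aff_tree (which stores wider integers and
more fields), and the function names carry a _model suffix to mark them as
illustrations rather than the interfaces in tree-affine.h.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplification: one term "coef * variable".
struct elt_model {
  long coef;
  int var;  // opaque variable id
};

// Hypothetical simplification of an affine combination:
// offset + sum of coef_i * var_i.
struct aff_model {
  long offset;
  std::vector<elt_model> elts;
};

// True if the combination has no variable terms (or is absent).
bool const_p_model(const aff_model *aff) {
  return aff == nullptr || aff->elts.empty();
}

// True if the combination is exactly +var or -var with zero offset.
bool singleton_var_p_model(const aff_model &aff) {
  return aff.elts.size() == 1 && aff.offset == 0 &&
         (aff.elts[0].coef == 1 || aff.elts[0].coef == -1);
}
```

With this reading, 7 is const, -x is a singleton variable, and neither x + 4
nor 2*x qualifies as a singleton.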


Re: [PATCH, GCC/ARM 2/2, ping] Allow combination of aprofile and rmprofile multilibs

2017-05-04 Thread Thomas Preudhomme

Now that stage1 is open, ping?

Best regards,

Thomas

On 03/01/17 17:23, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 06/12/16 11:35, Thomas Preudhomme wrote:

Ping?

*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow combinations of aprofile and rmprofile values for
--with-multilib-list.
* config/arm/t-multilib: New file.
* config/arm/t-aprofile: Remove initialization of MULTILIB_*
variables.  Remove setting of ISA and floating-point ABI in
MULTILIB_OPTIONS and MULTILIB_DIRNAMES.  Set architecture and FPU in
MULTI_ARCH_OPTS_A and MULTI_ARCH_DIRS_A rather than MULTILIB_OPTIONS
and MULTILIB_DIRNAMES respectively.  Add comment to introduce all
matches.  Add architecture matches for marvell-pj4 and generic-armv7-a
CPU options.
* config/arm/t-rmprofile: Likewise except for the matches changes.
* doc/install.texi (--with-multilib-list): Document the combination of
aprofile and rmprofile values and warn about pitfalls in doing that.

Best regards,

Thomas

On 17/11/16 20:43, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 08/11/16 13:36, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 02/11/16 10:05, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 24/10/16 09:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 13/10/16 16:35, Thomas Preudhomme wrote:

Hi ARM maintainers,

This patchset aims at adding multilib support for R and M profile ARM
architectures and allowing it to be built alongside multilib for A profile
ARM architectures.  This specific patch is concerned with the latter.  The
patch works by moving the bits shared by both aprofile and rmprofile
multilib builds (variable initialization as well as the ISA and float ABI to
build multilibs for) to a new t-multilib file.  Then, based on which profile
was requested in the --with-multilib-list option, that file includes
t-aprofile and/or t-rmprofile, where the architecture and FPU to build the
multilibs for are specified.

Unfortunately the duplication of the CPU to A profile architecture mapping
could not be avoided because substitutions due to MULTILIB_MATCHES are not
transitive.  Therefore, mapping armv7-a to armv7 for the rmprofile multilib
build does not have the expected effect.  Two patches were written to allow
this using 2 different approaches but I decided against it because this is
not the right solution IMO.  See caveats below for what I believe is the
correct approach.


*** combined build caveats ***

As the documentation in this patch warns, there are a few caveats to using a
combined multilib build due to the way the multilib framework works.

1) For instance, when using only rmprofile, the combination of options
-mthumb -march=armv7 -mfpu=neon selects the thumb/-march=armv7 multilib, but
in a combined multilib build the default multilib would be used.  This is
because in the rmprofile build -mfpu=neon is not specified in
MULTILIB_OPTIONS and thus the option is ignored when considering
MULTILIB_REQUIRED entries.

2) Another issue is the fact that the aprofile and rmprofile multilib builds
have some conflicting requirements in terms of how to map options for which
no multilib is built to another option.  (i) A first example of this is the
difference in CPU to architecture mapping mentioned above: the rmprofile
multilib build needs A profile CPUs and architectures to be mapped down to
ARMv7 so that one of the v7-ar multilibs gets chosen in such a case, but
aprofile needs A profile architectures to stand on their own because
multilibs are built for several architectures.

(ii) Another example of this is that in the aprofile multilib build no
multilib is built with -mfpu=fpv5-d16 but some multilibs are built with
-mfpu=fpv4-d16.  Therefore, aprofile defines a match rule to map fpv5-d16
onto fpv4-d16.  However, the rmprofile multilib build *does* build some
multilibs with -mfpu=fpv5-d16.  This has the consequence that when building
for -mthumb -march=armv7e-m -mfpu=fpv5-d16 -mfloat-abi=hard the default
multilib is chosen because this is rewritten into -mthumb -march=armv7e-m
-mfpu=fpv4-d16 -mfloat-abi=hard and there is no multilib for that.

Both of these issues could be handled by using MULTILIB_REUSE instead of
MULTILIB_MATCHES but this would require a large set of rules.  I believe
instead the right approach is to create a new mechanism to inform GCC on how
options can be down mapped _when no multilib can be found_, which would
require a smaller set of rules and would make it explicit that the options
are not equivalent.  A patch will be posted to this effect at a later time.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow combinations of aprofile and rmprofile values
for
--with-multilib-list.
* config/arm/t-multilib: New file.
* config/arm/t-aprofile: Remove initialization of 

Re: [PATCH GCC8][07/33]Offset validity check in address expression

2017-05-04 Thread Bin.Cheng
On Wed, May 3, 2017 at 10:49 AM, Richard Biener
 wrote:
> On Tue, May 2, 2017 at 7:06 PM, Bin.Cheng  wrote:
>> On Mon, Apr 24, 2017 at 11:34 AM, Richard Biener
>>  wrote:
>>> On Tue, Apr 18, 2017 at 12:41 PM, Bin Cheng  wrote:
>>>> Hi,
>>>> For now, we check validity of offset by computing the maximum offset
>>>> then checking if offset is smaller than the max offset.  This is
>>>> inaccurate; for example, some targets may require offset to be aligned
>>>> by power of 2.  This patch introduces a new interface checking validity
>>>> of offset.  It also buffers rtx among different calls.
>>>>
>>>> Is it OK?
>>>
>>> -  static vec<HOST_WIDE_INT> max_offset_list;
>>> -
>>> +  auto_vec<rtx> addr_list;
>>>as = TYPE_ADDR_SPACE (TREE_TYPE (use->iv->base));
>>>mem_mode = TYPE_MODE (TREE_TYPE (*use->op_p));
>>>
>>> -  num = max_offset_list.length ();
>>> +  num = addr_list.length ();
>>>list_index = (unsigned) as * MAX_MACHINE_MODE + (unsigned) mem_mode;
>>>if (list_index >= num)
>>>
>>> num here is always zero and thus the compare is always true.
>>>
>>> +  addr_list.safe_grow_cleared (list_index + MAX_MACHINE_MODE);
>>> +  for (; num < addr_list.length (); num++)
>>> +   addr_list[num] = NULL;
>>>
>>> the loop is now redundant (safe_grow_cleared)
>>>
>>> +  addr = addr_list[list_index];
>>> +  if (!addr)
>>>  {
>>>
>>> always true again...
>>>
>>> I wonder if you really intended to drop 'static' from addr_list?
>>> There's no caching
>>> across function calls.
>> Right, the redundancy is because I tried to cache across function
>> calls with declarations like:
>>   static unsigned num = 0;
>>   static GTY ((skip)) rtx *addr_list = NULL;
>> But this doesn't work, the addr_list[list_index] still gets corrupted 
>> somehow.
>
> Well, you need GTY (()), not GTY((skip)) on them.  Not sure if it works
> for function-scope decls, you have to check.  Look at whether a GC
> root is created for the variable in gt-tree-ssa-loop-ivopts.h (need to tweak
> GTFILES in the makefile plus include that generated file).  tree-ssa-address.c
> uses a global root for mem_addr_template_list for example.
Thanks for the help; patch updated.
Bootstrapped and tested on x86_64.  Is it OK?

Thanks,
bin

2017-05-02  Bin Cheng  

* Makefile.in (GTFILES): Add tree-ssa-loop-ivopts.c.
* tree-ssa-loop-ivopts.c (compute_max_addr_offset): Delete.
(addr_list, addr_offset_valid_p): New.
(split_address_groups): Check offset validity with above function.
(gt-tree-ssa-loop-ivopts.h): Include.

>>>
>>>
>>>
>>>> Thanks,
>>>> bin
>>>> 2017-04-11  Bin Cheng  
>>>>
>>>> * tree-ssa-loop-ivopts.c (compute_max_addr_offset): Delete.
>>>> (addr_offset_valid_p): New function.
>>>> (split_address_groups): Check offset validity with above function.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2411671..97259ac 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2484,7 +2484,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/gimple-ssa.h \
   $(srcdir)/tree-chkp.c \
   $(srcdir)/tree-ssanames.c $(srcdir)/tree-eh.c $(srcdir)/tree-ssa-address.c \
-  $(srcdir)/tree-cfg.c \
+  $(srcdir)/tree-cfg.c $(srcdir)/tree-ssa-loop-ivopts.c \
   $(srcdir)/tree-dfa.c \
   $(srcdir)/tree-iterator.c $(srcdir)/gimple-expr.c \
   $(srcdir)/tree-chrec.h \
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 7caa40d..203b272 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2460,67 +2460,36 @@ find_interesting_uses_outside (struct ivopts_data *data, edge exit)
 }
 }
 
-/* Compute maximum offset of [base + offset] addressing mode
-   for memory reference represented by USE.  */
+/* Return TRUE if OFFSET is within the range of [base + offset] addressing
+   mode for memory reference represented by USE.  */
 
-static HOST_WIDE_INT
-compute_max_addr_offset (struct iv_use *use)
+static GTY (()) vec<rtx, va_gc> *addr_list;
+
+static bool
+addr_offset_valid_p (struct iv_use *use, HOST_WIDE_INT offset)
 {
-  int width;
   rtx reg, addr;
-  HOST_WIDE_INT i, off;
-  unsigned list_index, num;
-  addr_space_t as;
-  machine_mode mem_mode, addr_mode;
-  static vec<HOST_WIDE_INT> max_offset_list;
-
-  as = TYPE_ADDR_SPACE (TREE_TYPE (use->iv->base));
-  mem_mode = TYPE_MODE (TREE_TYPE (*use->op_p));
+  unsigned list_index;
+  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (use->iv->base));
+  machine_mode addr_mode, mem_mode = TYPE_MODE (TREE_TYPE (*use->op_p));
 
-  num = max_offset_list.length ();
   list_index = (unsigned) as * MAX_MACHINE_MODE + (unsigned) mem_mode;
-  if (list_index >= num)
-{
-  max_offset_list.safe_grow (list_index + MAX_MACHINE_MODE);
-  for (; num < max_offset_list.length (); num++)
-   max_offset_list[num] = -1;
-}
+  if (list_index >= vec_safe_length (addr_list))
+
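
The motivation given at the start of this thread (a maximum-offset comparison
is inaccurate when the target also requires alignment) can be shown with a
toy model.  This is a sketch under assumptions: the target rule, the limit of
1020 and all three function names are hypothetical, and the real
addr_offset_valid_p queries the target through rtx rather than a predicate
like this.

```cpp
#include <cassert>

// Hypothetical target rule: an offset is encodable only if it lies in
// [0, 1020] and is a multiple of 4.
bool target_accepts_offset(long offset) {
  return offset >= 0 && offset <= 1020 && offset % 4 == 0;
}

// Model of the old scheme: remember only the largest encodable offset
// and compare against it.
const long max_offset_model = 1020;
bool old_style_check(long offset) {
  return offset >= 0 && offset <= max_offset_model;
}

// Model of the new scheme: ask the target about the exact offset.
bool new_style_check(long offset) {
  return target_accepts_offset(offset);
}
```

An offset of 6 passes the range comparison but is rejected once the exact
value is checked, which is the kind of case the new interface is meant to
catch.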

[arm-embedded] [PATCH, GCC/x86 mingw32] Add configure option to force wildcard behavior on Windows

2017-05-04 Thread Thomas Preudhomme

Hi,

We've decided to apply the following patch to the ARM/embedded-7-branch as we 
did earlier for the ARM/embedded-6-branch. Patch attached for reference.


Best regards,

Thomas

On 17/02/17 22:52, JonY wrote:

On 02/17/2017 11:31 AM, Thomas Preudhomme wrote:

Here you are:

2017-01-24  Thomas Preud'homme  

* configure.ac (--enable-mingw-wildcard): Add new configurable
feature.
* configure: Regenerate.
* config.in: Regenerate.
* config/i386/driver-mingw32.c: new file.
* config/i386/x-mingw32: Add rule to build driver-mingw32.o.
* config.host: Link driver-mingw32.o on MinGW host.
* doc/install.texi: Document new --enable-mingw-wildcard configure
option.

Must have forgotten to paste it.


Thanks, I'll stage it locally until stage 1 opens.



diff --git a/gcc/ChangeLog.arm b/gcc/ChangeLog.arm
new file mode 100644
index ..d336f6a29a7f68fb938b2fae45f453bd20f35903
--- /dev/null
+++ b/gcc/ChangeLog.arm
@@ -0,0 +1,13 @@
+2017-05-04  Thomas Preud'homme  
+
+	Backport from mainline
+	2017-05-04  Thomas Preud'homme  
+
+	* configure.ac (--enable-mingw-wildcard): Add new configurable feature.
+	* configure: Regenerate.
+	* config.in: Regenerate.
+	* config/i386/driver-mingw32.c: new file.
+	* config/i386/x-mingw32: Add rule to build driver-mingw32.o.
+	* config.host: Link driver-mingw32.o on MinGW host.
+	* doc/install.texi: Document new --enable-mingw-wildcard configure
+	option.
diff --git a/gcc/config.host b/gcc/config.host
index 6b28f3033ef92f1f0e09cc41f3a90be05c5e1e43..5e2db5327e3094a19cd29c81ceb1a9e2b11797c9 100644
--- a/gcc/config.host
+++ b/gcc/config.host
@@ -239,6 +239,7 @@ case ${host} in
 host_xmake_file="${host_xmake_file} i386/x-mingw32"
 host_exeext=.exe
 out_host_hook_obj=host-mingw32.o
+host_extra_gcc_objs="${host_extra_gcc_objs} driver-mingw32.o"
 host_lto_plugin_soname=liblto_plugin-0.dll
 ;;
   x86_64-*-mingw*)
@@ -247,6 +248,7 @@ case ${host} in
 host_xmake_file="${host_xmake_file} i386/x-mingw32"
 host_exeext=.exe
 out_host_hook_obj=host-mingw32.o
+host_extra_gcc_objs="${host_extra_gcc_objs} driver-mingw32.o"
 host_lto_plugin_soname=liblto_plugin-0.dll
 ;;
   i[34567]86-*-darwin* | x86_64-*-darwin*)
diff --git a/gcc/config.in b/gcc/config.in
index d87cb3c9fab0499137702be24085e6f61d7f89e4..bf2aa7b2e7d81a593b72a8d0359864773754ef5d 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -2009,6 +2009,12 @@
 #endif
 
 
+/* Value to set mingw's _dowildcard to. */
+#ifndef USED_FOR_TARGET
+#undef MINGW_DOWILDCARD
+#endif
+
+
 /* Define if host mkdir takes a single argument. */
 #ifndef USED_FOR_TARGET
 #undef MKDIR_TAKES_ONE_ARG
diff --git a/gcc/config/i386/driver-mingw32.c b/gcc/config/i386/driver-mingw32.c
new file mode 100644
index ..b70363ad26a7dc8ffccbb273e46d4dd6de1a6f8c
--- /dev/null
+++ b/gcc/config/i386/driver-mingw32.c
@@ -0,0 +1,26 @@
+/* Host OS specific configuration for the gcc driver.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+
+/* When defined, force the use (if non null) or not (otherwise) of CLI
+   globbing.  */
+#ifdef MINGW_DOWILDCARD
+int _dowildcard = MINGW_DOWILDCARD;
+#endif
diff --git a/gcc/config/i386/x-mingw32 b/gcc/config/i386/x-mingw32
index 6a2d5a5069461b93884fa68ffbcbb13585d24c37..85f2793e5e934b6bf5629667578940cf3f172be3 100644
--- a/gcc/config/i386/x-mingw32
+++ b/gcc/config/i386/x-mingw32
@@ -29,3 +29,6 @@ host-mingw32.o : $(srcdir)/config/i386/host-mingw32.c $(CONFIG_H) $(SYSTEM_H) \
   coretypes.h hosthooks.h hosthooks-def.h toplev.h $(DIAGNOSTIC_H) $(HOOKS_H)
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/i386/host-mingw32.c
+
+driver-mingw32.o : $(srcdir)/config/i386/driver-mingw32.c $(CONFIG_H)
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
diff --git a/gcc/configure b/gcc/configure
index ea73b151a4e1797983665a7f5437136dc8dcb46e..1a295674eb448725e24df85b2fe85e74b32f02d7 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -921,6 +921,7 @@ with_libiconv_prefix
 enable_sjlj_exceptions
 with_gcc_major_version_only
 enable_secureplt
+enable_mingw_wildcard
 

[PR 80622] Treat const pools as initialized in SRA

2017-05-04 Thread Martin Jambor
Hi,

PR 80622 happens because when setting grp_write lazily, the code does
not acknowledge that constant pool bases come initialized and so
contain data even when not written to.  The patch below fixes that but
it also puts a test for pre-initialization into a special function,
uses it at all appropriate places and moves the test in question to an
earlier time, which is a tiny bit cheaper because it may avoid
unnecessary re-invocation of propagate_subaccesses_across_link.
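
For illustration only, the rule can be modelled outside of GCC roughly as
follows.  This is a toy sketch, not the tree-sra.c code: the enum, the struct
and all toy_* names are made up for the example.  Only the shape of the test
(a PARM_DECL or a constant pool base counts as pre-initialized) is taken from
the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kinds of aggregate bases SRA sees.  */
enum base_kind { BASE_LOCAL_VAR, BASE_PARM_DECL, BASE_CONST_POOL };

/* Toy access record; only the write flag discussed above is modelled.  */
struct toy_access
{
  enum base_kind base;
  bool grp_write;
};

/* Model of comes_initialized_p: parameters and constant pool entries
   contain data even if the function never writes to them.  */
static bool
toy_comes_initialized_p (enum base_kind base)
{
  return base == BASE_PARM_DECL || base == BASE_CONST_POOL;
}

/* Model of the decision at "lhs = rhs": the write flag of LHS may only be
   left clear (to be set lazily later) when RHS does not come initialized.  */
static void
toy_record_assign (struct toy_access *lhs, const struct toy_access *rhs)
{
  lhs->grp_write = toy_comes_initialized_p (rhs->base);
}

/* Model of the lazy propagation step: mark LHS as written only if RHS
   actually was.  Returns true if LHS changed.  */
static bool
toy_propagate_write (struct toy_access *lhs, const struct toy_access *rhs)
{
  if (!lhs->grp_write && rhs->grp_write)
    {
      lhs->grp_write = true;
      return true;
    }
  return false;
}
```

In this model a copy from a constant pool entry marks the destination as
written right away, so the propagation step never has to special-case it.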

Bootstrapped and tested on x86_64-linux, OK for trunk?

Thanks,

Martin



2017-05-04  Martin Jambor  

PR tree-optimization/80622
* tree-sra.c (comes_initialized_p): New function.
(build_accesses_from_assign): Only set write lazily when
comes_initialized_p is false.
(analyze_access_subtree): Use comes_initialized_p.
(propagate_subaccesses_across_link): Assert !comes_initialized_p
instead of testing for PARM_DECL.

testsuite/
* gcc.dg/tree-ssa/pr80622.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr80622.c | 19 +++
 gcc/tree-sra.c  | 29 +
 2 files changed, 40 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr80622.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr80622.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr80622.c
new file mode 100644
index 000..96dcb8fcdc0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr80622.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+struct S { int d; char e; int f; char g; } a;
+char c;
+
+int
+main ()
+{
+  struct S b[][1] = {3, 0, 3, 4, 3, 0, 3, 4, 3, 0, 3, 4, 3, 0, 3, 4, 3,
+  0, 3, 4, 3, 0, 3, 4, 3, 0, 3, 4, 3, 0, 3, 4, 3, 0,
+  3, 4, 3, 4, 7, 7, 3, 5, 0, 3, 4, 7, 7, 3, 5, 0, 3,
+  4, 3, 4, 7, 7, 3, 5, 0, 3, 4, 7, 7, 3, 5, 0, 3, 4};
+  a = b[4][0];
+  c = b[4][0].e;
+  if (a.g != 4)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 1606573aead..8ac9c0783ff 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1305,6 +1305,15 @@ disqualify_if_bad_bb_terminating_stmt (gimple *stmt, 
tree lhs, tree rhs)
   return false;
 }
 
+/* Return true if the nature of BASE is such that it contains data even if
+   there is no write to it in the function.  */
+
+static bool
+comes_initialized_p (tree base)
+{
+  return TREE_CODE (base) == PARM_DECL || constant_decl_p (base);
+}
+
 /* Scan expressions occurring in STMT, create access structures for all 
accesses
to candidates for scalarization and remove those candidates which occur in
statements or expressions that prevent them from being split apart.  Return
@@ -1364,8 +1373,10 @@ build_accesses_from_assign (gimple *stmt)
   link->racc = racc;
   add_link_to_rhs (racc, link);
   /* Let's delay marking the areas as written until propagation of accesses
-across link.  */
-  lacc->write = false;
+across link, unless the nature of rhs tells us that its data comes
+from elsewhere.  */
+  if (!comes_initialized_p (racc->base))
+   lacc->write = false;
 }
 
   return lacc || racc;
@@ -2472,8 +2483,7 @@ analyze_access_subtree (struct access *root, struct 
access *parent,
 
   if (!hole || root->grp_total_scalarization)
 root->grp_covered = 1;
-  else if (root->grp_write || TREE_CODE (root->base) == PARM_DECL
-  || constant_decl_p (root->base))
+  else if (root->grp_write || comes_initialized_p (root->base))
 root->grp_unscalarized_data = 1; /* not covered and written to */
   return sth_created;
 }
@@ -2581,11 +2591,14 @@ propagate_subaccesses_across_link (struct access *lacc, 
struct access *racc)
 
   /* IF the LHS is still not marked as being written to, we only need to do so
  if the RHS at this level actually was.  */
-  if (!lacc->grp_write &&
-  (racc->grp_write || TREE_CODE (racc->base) == PARM_DECL))
+  if (!lacc->grp_write)
 {
-  lacc->grp_write = true;
-  ret = true;
+  gcc_checking_assert (!comes_initialized_p (racc->base));
+  if (racc->grp_write)
+   {
+ lacc->grp_write = true;
+ ret = true;
+   }
 }
 
   if (is_gimple_reg_type (lacc->type)
-- 
2.12.2



[PATCH] Small type_hash_canon improvement

2017-05-04 Thread Jakub Jelinek
Hi!

While type_hash_canon, when reusing an already existing type, ggc_frees
the freshly created type, we still waste one type uid for each such case;
this patch attempts to avoid that.
Furthermore, for INTEGER_TYPE we currently keep around the min and max value
INTEGER_CSTs and the cached values vector (until they are GCed); the patch
frees those too where possible.
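
The uid part can be pictured with a small stand-alone sketch.  The names
below (next_uid, toy_alloc_uid, toy_release_uid) are invented for the example
and are not the tree.c ones; the only thing borrowed from the patch is the
"give the uid back if it was the most recently handed out" test.

```c
/* Next uid to hand out; unsigned so that "uid + 1 == next_uid" is a plain
   unsigned comparison.  */
static unsigned next_uid = 1;

/* Hand out a fresh uid.  */
static unsigned
toy_alloc_uid (void)
{
  return next_uid++;
}

/* Release UID.  It can only be recycled if it is the highest one handed
   out so far; otherwise it is simply lost, as before.  */
static void
toy_release_uid (unsigned uid)
{
  if (uid + 1 == next_uid)
    --next_uid;
}
```

So a node that is created and immediately discarded no longer consumes a uid,
while releasing an older uid leaves the counter untouched.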

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-05-04  Jakub Jelinek  

* tree.c (next_type_uid): Change type to unsigned.
(type_hash_canon): Decrement back next_type_uid if
freeing a type node with the highest TYPE_UID.  For INTEGER_TYPEs
also ggc_free TYPE_MIN_VALUE, TYPE_MAX_VALUE and TYPE_CACHED_VALUES
if possible.

--- gcc/tree.c.jj   2017-05-03 16:55:39.688052581 +0200
+++ gcc/tree.c  2017-05-03 18:49:30.662185944 +0200
@@ -151,7 +151,7 @@ static const char * const tree_node_kind
 /* Unique id for next decl created.  */
 static GTY(()) int next_decl_uid;
 /* Unique id for next type created.  */
-static GTY(()) int next_type_uid = 1;
+static GTY(()) unsigned next_type_uid = 1;
 /* Unique id for next debug decl created.  Use negative numbers,
to catch erroneous uses.  */
 static GTY(()) int next_debug_decl_uid;
@@ -7188,6 +7188,19 @@ type_hash_canon (unsigned int hashcode,
 {
   tree t1 = ((type_hash *) *loc)->type;
   gcc_assert (TYPE_MAIN_VARIANT (t1) == t1);
+  if (TYPE_UID (type) + 1 == next_type_uid)
+   --next_type_uid;
+  if (TREE_CODE (type) == INTEGER_TYPE)
+   {
+ if (TYPE_MIN_VALUE (type)
+ && TREE_TYPE (TYPE_MIN_VALUE (type)) == type)
+   ggc_free (TYPE_MIN_VALUE (type));
+ if (TYPE_MAX_VALUE (type)
+ && TREE_TYPE (TYPE_MAX_VALUE (type)) == type)
+   ggc_free (TYPE_MAX_VALUE (type));
+ if (TYPE_CACHED_VALUES_P (type))
+   ggc_free (TYPE_CACHED_VALUES (type));
+   }
   free_node (type);
   return t1;
 }

Jakub


Re: [wwwdocs] gcc-8/porting_to.html

2017-05-04 Thread Thomas Preudhomme

Committed with the suggested changes (see attachment for reference).

Thanks.

Best regards,

Thomas

On 23/03/17 06:47, Gerald Pfeifer wrote:

Hi Thomas,

On Wed, 22 Mar 2017, Thomas Preudhomme wrote:

Is this ok for wwwdocs once [1] is committed in GCC 8 cycle?


+ GCC on Microsoft Windows can now be configured via
+   --enable-mingw-wildcard or
+   --disable-mingw-wildcard to force a specific behavior for
+   GCC itself with regards to supporting or not the wildcard character.

Here I would omit the "or not" which I believe does not work well
in English.

+   Prior versions of GCC would follow the configuration of MinGW runtime.

And here add "the" before "MinGW".

This looks fine to me with these two minor changes, thank you.

Gerald
Index: htdocs/gcc-8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.2
diff -u -r1.2 changes.html
--- htdocs/gcc-8/changes.html	12 Mar 2017 14:25:34 -	1.2
+++ htdocs/gcc-8/changes.html	23 Mar 2017 10:44:35 -
@@ -135,7 +135,16 @@
 
 
 
-
+Windows
+   
+ GCC on Microsoft Windows can now be configured via
+   --enable-mingw-wildcard or
+   --disable-mingw-wildcard to force a specific behavior for
+   GCC itself with regards to supporting the wildcard character.  Prior
+   versions of GCC would follow the configuration of the MinGW runtime.
+   This behavior can still be obtained by not using the above options or by
+   using --enable-mingw-wildcard=platform.
+   
 
 
 


Re: [PATCH 1/4][PR tree-optimization/78496] Don't simplify conditionals too early in VRP

2017-05-04 Thread Jeff Law

On 05/04/2017 04:59 AM, Richard Biener wrote:



I think this is a hack ;)  Basically the issue is that jump-threading uses
ASSERT_EXPRs at all (which are an implementation detail of VRP).  As far as
I understand, it does that because VRP can do "fancy" things and create
ASSERT_EXPRs that do not directly map to the conditional but to its operand
def stmts.

Agreed it's a hack, but perhaps for different reasons.

The only reason we have a threading pass inside VRP is so that we can 
get access to the internal state.  It pre-dates the ability to query any 
kind of range data outside of VRP by a decade.


The long term plan is to drop the threading passes from VRP once we can 
get accurate ranges outside VRP using Andrew's work.  Once we can do 
that, DOM or backward threader can query that data and the jump 
threading pass inside VRP becomes redundant/useless.


The first major milestone for that work will be the ability to drop 
ASSERT_EXPRs completely from the IL, but still get just as accurate 
range information.  ASSERT_EXPRs are really just an implementation 
detail inside VRP.


--


Your understanding is slightly wrong, however.  The ASSERT_EXPRs and 
conditionals map 100% through propagation and into simplification.  It's 
only during simplification that we lose the direct mapping as we change 
the conditional in order to remove the unnecessary type conversion. 
Threading runs after simplification.


Another approach here would be to simplify the ASSERT_EXPR in the same 
manner that we simplify the conditional.  That may even be better for 
various reasons in the short term.  Let me poke at that.





I have meanwhile factored this "fanciness" out into (ok, bad name...)
register_edge_assert_for, which records all these fancy asserts in a vec.
This is now used from EVRP:

   gimple *stmt = last_stmt (pred_e->src);
   if (stmt
   && gimple_code (stmt) == GIMPLE_COND
   && (op0 = gimple_cond_lhs (stmt))
   && TREE_CODE (op0) == SSA_NAME
   && (INTEGRAL_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)))
   || POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "Visiting controlling predicate ");
   print_gimple_stmt (dump_file, stmt, 0, 0);
 }
   /* Entering a new scope.  Try to see if we can find a VR
  here.  */
   tree op1 = gimple_cond_rhs (stmt);
   if (TREE_OVERFLOW_P (op1))
 op1 = drop_tree_overflow (op1);
   tree_code code = gimple_cond_code (stmt);

   auto_vec asserts;
   register_edge_assert_for (op0, pred_e, code, op0, op1, asserts);
   if (TREE_CODE (op1) == SSA_NAME)
 register_edge_assert_for (op1, pred_e, code, op0, op1, asserts);

regular VRP transforms those into its own assert representation in
finish_register_edge_assert_for (though assert_info should be enough
for jump threading).

Yea, this looks like a much simplified form of what Andrew has done.

I wasn't aware of this stuff from evrp; it may be useful to simplify the 
3rd patch in this series (which is still evolving and tries to put in a 
real solution for the conditional equivalence problems).



So I think it should be possible for jump threading to use this new
machinery (and even DOM based jump threading or backwards threading
could make use of this!)

Roughly the plan.

I think Aldy is close to being ready to start review work on the new API 
class for querying ranges.  I think it addresses the big issues we both 
want to see addressed (no more ANTI ranges, no fixed number of intervals 
within a range representation).  Removal of ANTI ranges does 
significantly simplify things on the client side which IMHO should be 
the driver at this point.


In parallel I expect to carve out two more hunks of Andrew's work in the 
near future.  Essentially they're APIs that allow efficient computation 
and mapping of blocks that generate range data for a particular name and 
walkers.  They should allow simplification of some stuff in DOM and 
improve the most glaring weakness in the backwards threader.


jeff


Re: [wwwdocs] gcc-8/porting_to.html

2017-05-04 Thread Thomas Preudhomme

Great, thanks. I'll go and commit the corresponding wwwdocs change.

Best regards,

Thomas

On 04/05/17 12:03, JonY wrote:

On 03/23/2017 10:47 AM, Thomas Preudhomme wrote:

Ack. Please find updated patch as per suggestions.

Best regards,

Thomas



I've applied the changes to GCC 8 trunk as r247588.




[PATCH] Improve VR computation for [x, y] & z or [x, y] | z (PR tree-optimization/80558)

2017-05-04 Thread Jakub Jelinek
Hi!

This patch improves value range computation of BIT_{AND,IOR}_EXPR
with one singleton range and one range_int_cst_p, where the singleton
range has n clear least significant bits, then m set bits and either
that is all it has (i.e. negation of a power of 2), or the bits above
those two sets of bits are the same for all values in the range (i.e.
min and max range have those bits identical).
During x86_64-linux and i686-linux bootstraps together this triggers
214000 times, though I have not actually gathered statistics on whether
the range computed without this patch would be wider in all cases.
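
As a rough stand-alone illustration of the condition (not the tree-vrp.c code
itself), the check can be sketched on 32-bit integers as below.  The helper
names ctz32, low_mask and fold_bit_range are made up for the example, and a
fixed precision of 32 is assumed instead of TYPE_PRECISION.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count trailing zero bits of V; returns 32 for zero.  */
static unsigned
ctz32 (uint32_t v)
{
  unsigned n = 0;
  if (v == 0)
    return 32;
  while ((v & 1u) == 0)
    {
      v >>= 1;
      ++n;
    }
  return n;
}

/* Mask with the low COUNT bits set; COUNT may be 32.  */
static uint32_t
low_mask (unsigned count)
{
  return count >= 32 ? UINT32_MAX : (UINT32_C (1) << count) - 1;
}

/* Try to fold [LO, HI] op Z into [LO op Z, HI op Z], where op is | if
   IS_IOR and & otherwise.  Returns false if the conditions are not met.  */
bool
fold_bit_range (bool is_ior, int32_t lo, int32_t hi, int32_t z,
		int32_t *out_lo, int32_t *out_hi)
{
  /* For |, look at the bitwise not of the constant.  */
  uint32_t w = is_ior ? ~(uint32_t) z : (uint32_t) z;
  unsigned n, m;

  if (w == 0)
    {
      n = 32;
      m = 0;
    }
  else
    {
      /* N clear low bits, then M consecutive set bits.  */
      n = ctz32 (w);
      uint32_t rest = ~(w | low_mask (n));
      m = rest == 0 ? 32 - n : ctz32 (rest) - n;
    }

  /* All bits above the low M + N must agree across the range.  */
  uint32_t high = ~low_mask (m + n);
  if (((uint32_t) lo & high) != ((uint32_t) hi & high))
    return false;

  *out_lo = is_ior ? (lo | z) : (lo & z);
  *out_hi = is_ior ? (hi | z) : (hi & z);
  return true;
}
```

With the inputs used in the new testcase, [1603, 2015] & 496 folds to
[64, 464] and [-18, 19] | 7 folds to [-17, 23], whereas something like
[5, 19] & 4 is rejected because the bits above the mask differ between the
two bounds.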

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-05-04  Jakub Jelinek  

PR tree-optimization/80558
* tree-vrp.c (extract_range_from_binary_expr_1): Optimize
[x, y] op z into [x op z, y op z] for op & or | if conditions
are met.

* gcc.dg/tree-ssa/vrp115.c: New test.

--- gcc/tree-vrp.c.jj   2017-04-29 18:13:50.0 +0200
+++ gcc/tree-vrp.c  2017-05-03 16:08:44.525256483 +0200
@@ -3162,8 +3162,59 @@ extract_range_from_binary_expr_1 (value_
  &may_be_nonzero1,
  &must_be_nonzero1);
 
+  if (code == BIT_AND_EXPR || code == BIT_IOR_EXPR)
+   {
+ value_range *vr0p = NULL, *vr1p = NULL;
+ if (range_int_cst_singleton_p (&vr1))
+   {
+ vr0p = &vr0;
+ vr1p = &vr1;
+   }
+ else if (range_int_cst_singleton_p (&vr0))
+   {
+ vr0p = &vr1;
+ vr1p = &vr0;
+   }
+ /* For op & or | attempt to optimize:
+[x, y] op z into [x op z, y op z]
+if z is a constant which (for op | its bitwise not) has n
+consecutive least significant bits cleared followed by m 1
+consecutive bits set immediately above it and either
+m + n == precision, or (x >> (m + n)) == (y >> (m + n)).
+The least significant n bits of all the values in the range are
+cleared or set, the m bits above it are preserved and any bits
+above these are required to be the same for all values in the
+range.  */
+ if (vr0p && range_int_cst_p (vr0p))
+   {
+ wide_int w = vr1p->min;
+ int m = 0, n = 0;
+ if (code == BIT_IOR_EXPR)
+   w = ~w;
+ if (wi::eq_p (w, 0))
+   n = TYPE_PRECISION (expr_type);
+ else
+   {
+ n = wi::ctz (w);
+ w = ~(w | wi::mask (n, false, w.get_precision ()));
+ if (wi::eq_p (w, 0))
+   m = TYPE_PRECISION (expr_type) - n;
+ else
+   m = wi::ctz (w) - n;
+   }
+ wide_int mask = wi::mask (m + n, true, w.get_precision ());
+ if (wi::eq_p (mask & vr0p->min, mask & vr0p->max))
+   {
+ min = int_const_binop (code, vr0p->min, vr1p->min);
+ max = int_const_binop (code, vr0p->max, vr1p->min);
+   }
+   }
+   }
+
   type = VR_RANGE;
-  if (code == BIT_AND_EXPR)
+  if (min && max)
+   /* Optimized above already.  */;
+  else if (code == BIT_AND_EXPR)
{
  min = wide_int_to_tree (expr_type,
  must_be_nonzero0 & must_be_nonzero1);
--- gcc/testsuite/gcc.dg/tree-ssa/vrp115.c.jj   2017-05-03 16:12:55.514087451 
+0200
+++ gcc/testsuite/gcc.dg/tree-ssa/vrp115.c  2017-05-03 16:11:35.0 
+0200
@@ -0,0 +1,50 @@
+/* PR tree-optimization/80558 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
+
+void link_error (void);
+
+void
+f1 (int x)
+{
+  if (x >= 5 && x <= 19)
+{
+  x &= -2;
+  if (x < 4 || x > 18)
+   link_error ();
+}
+}
+
+void
+f2 (int x)
+{
+  if (x >= 5 && x <= 19)
+{
+  x |= 7;
+  if (x < 7 || x > 23)
+   link_error ();
+}
+}
+
+void
+f3 (int x)
+{
+  if (x >= -18 && x <= 19)
+{
+  x |= 7;
+  if (x < -17 || x > 23)
+   link_error ();
+}
+}
+
+void
+f4 (int x)
+{
+  if (x >= 1603 && x <= 2015)
+{
+  x &= 496;
+  if (x < 64 || x > 464)
+   link_error ();
+}
+}

Jakub


[PATCH] Improve memory CSE

2017-05-04 Thread Richard Biener

The following improves how we find the latest VUSE whose definition
dominates a PHI in get_continuation_for_phi.  Rather than the
very simplistic variant that requires one of the PHI args to provide
it, we see if walking upwards from any of the PHI args will get
us to such a VUSE (except that we do not handle further PHIs).

In reality somehow caching the "active" VUSE at the end of BBs
would be quite helpful.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-05-04  Richard Biener  

* tree-ssa-alias.c (get_continuation_for_phi): Improve looking
for the last VUSE which def dominates the PHI.  Directly call
maybe_skip_until.
(get_continuation_for_phi_1): Remove.

* gcc.dg/tree-ssa/ssa-fre-58.c: New testcase.

Index: gcc/tree-ssa-alias.c
===
*** gcc/tree-ssa-alias.c(revision 247581)
--- gcc/tree-ssa-alias.c(working copy)
*** maybe_skip_until (gimple *phi, tree targ
*** 2663,2732 
return true;
  }
  
- /* For two PHI arguments ARG0 and ARG1 try to skip non-aliasing code
-until we hit the phi argument definition that dominates the other one.
-Return that, or NULL_TREE if there is no such definition.  */
- 
- static tree
- get_continuation_for_phi_1 (gimple *phi, tree arg0, tree arg1,
-   ao_ref *ref, unsigned int *cnt,
-   bitmap *visited, bool abort_on_visited,
-   void *(*translate)(ao_ref *, tree, void *, bool *),
-   void *data)
- {
-   gimple *def0 = SSA_NAME_DEF_STMT (arg0);
-   gimple *def1 = SSA_NAME_DEF_STMT (arg1);
-   tree common_vuse;
- 
-   if (arg0 == arg1)
- return arg0;
-   else if (gimple_nop_p (def0)
-  || (!gimple_nop_p (def1)
-  && dominated_by_p (CDI_DOMINATORS,
- gimple_bb (def1), gimple_bb (def0
- {
-   if (maybe_skip_until (phi, arg0, ref, arg1, cnt,
-   visited, abort_on_visited, translate, data))
-   return arg0;
- }
-   else if (gimple_nop_p (def1)
-  || dominated_by_p (CDI_DOMINATORS,
- gimple_bb (def0), gimple_bb (def1)))
- {
-   if (maybe_skip_until (phi, arg1, ref, arg0, cnt,
-   visited, abort_on_visited, translate, data))
-   return arg1;
- }
-   /* Special case of a diamond:
-MEM_1 = ...
-goto (cond) ? L1 : L2
-L1: store1 = ...#MEM_2 = vuse(MEM_1)
-  goto L3
-L2: store2 = ...#MEM_3 = vuse(MEM_1)
-L3: MEM_4 = PHI
-  We were called with the PHI at L3, MEM_2 and MEM_3 don't
-  dominate each other, but still we can easily skip this PHI node
-  if we recognize that the vuse MEM operand is the same for both,
-  and that we can skip both statements (they don't clobber us).
-  This is still linear.  Don't use maybe_skip_until, that might
-  potentially be slow.  */
-   else if ((common_vuse = gimple_vuse (def0))
-  && common_vuse == gimple_vuse (def1))
- {
-   bool disambiguate_only = true;
-   *cnt += 2;
-   if ((!stmt_may_clobber_ref_p_1 (def0, ref)
-  || (translate
-   && (*translate) (ref, arg0, data, &disambiguate_only) == NULL))
- && (!stmt_may_clobber_ref_p_1 (def1, ref)
- || (translate
- && (*translate) (ref, arg1, data, &disambiguate_only) == NULL)))
-   return common_vuse;
- }
- 
-   return NULL_TREE;
- }
- 
  
  /* Starting from a PHI node for the virtual operand of the memory reference
 REF find a continuation virtual operand that allows to continue walking
--- 2695,2700 
*** get_continuation_for_phi (gimple *phi, a
*** 2749,2792 
  
/* For two or more arguments try to pairwise skip non-aliasing code
   until we hit the phi argument definition that dominates the other one.  
*/
!   else if (nargs >= 2)
! {
!   tree arg0, arg1;
!   unsigned i;
  
!   /* Find a candidate for the virtual operand which definition
!dominates those of all others.  */
!   arg0 = PHI_ARG_DEF (phi, 0);
!   if (!SSA_NAME_IS_DEFAULT_DEF (arg0))
!   for (i = 1; i < nargs; ++i)
  {
!   arg1 = PHI_ARG_DEF (phi, i);
!   if (SSA_NAME_IS_DEFAULT_DEF (arg1))
  {
!   arg0 = arg1;
!   break;
  }
-   if (dominated_by_p (CDI_DOMINATORS,
-   gimple_bb (SSA_NAME_DEF_STMT (arg0)),
-   gimple_bb (SSA_NAME_DEF_STMT (arg1
- arg0 = arg1;
  }
  
!   /* Then pairwise reduce against the found candidate.  */
!   for (i = 0; i < nargs; ++i)
!   {
! arg1 = PHI_ARG_DEF (phi, i);
! arg0 = get_continuation_for_phi_1 (phi, arg0, arg1, ref,
! 

Re: Fix bootstrap issue with gcc 4.1

2017-05-04 Thread Jan Hubicka
> On Thu, May 4, 2017 at 11:04 AM, Jan Hubicka  wrote:
> >> >
> >> >Sure, I'm not questioning the patch, just wondering if we shouldn't
> >> >improve
> >> >store-merging further (we want to do it anyway for e.g. bitop adjacent
> >> >operations etc.).
> >>
> >> We definitely want to do that.  It should also 'nicely' merge with bswap 
> >> for gathering the load side of a piecewise memory to memory copy.
> >
> > The code we produce now in .optimized is:
> >[15.35%]:
> >   # DEBUG this => _42
> >   MEM[(struct inline_summary *)_42].estimated_self_stack_size = 0;
> >   MEM[(struct inline_summary *)_42].self_size = 0;
> >   _44 = &MEM[(struct inline_summary *)_42].self_time;
> >   # DEBUG this => _44
> >   # DEBUG sig => 0
> >   # DEBUG exp => 0
> >   MEM[(struct sreal *)_42 + 16B].m_sig = 0;
> >   MEM[(struct sreal *)_42 + 16B].m_exp = 0;
> >   sreal::normalize (_44);
> >   # DEBUG this => NULL
> >   # DEBUG sig => NULL
> >   # DEBUG exp => NULL
> >   MEM[(struct inline_summary *)_42].min_size = 0;
> >   MEM[(struct inline_summary *)_42].inlinable = 0;
> >   MEM[(struct inline_summary *)_42].contains_cilk_spawn = 0;
> >   MEM[(struct inline_summary *)_42].single_caller = 0;
> >   MEM[(struct inline_summary *)_42].fp_expressions = 0;
> >   MEM[(struct inline_summary *)_42].estimated_stack_size = 0;
> >   MEM[(struct inline_summary *)_42].stack_frame_offset = 0;
> 
> It should handle at least the bitfields here (inlinable, contains_cilk_spawn,
> single_caller and fp_expressions).  Ah, I guess it doesn't because the
> padding isn't initialized and it doesn't want to touch it.  Well, just a
> guess.
> 
> inline_summary is also very badly laid out now with lots of padding.

Yep, I am aware it does not make sense now after the reorganization.  I plan
to break it into an inliner-independent part and inliner hints, and then I
will fix the layout ;)

Honza
> 
> >   _45 = &MEM[(struct inline_summary *)_42].time;
> >   # DEBUG this => _45
> >   # DEBUG sig => 0
> >   # DEBUG exp => 0
> >   MEM[(struct sreal *)_42 + 56B].m_sig = 0;
> >   MEM[(struct sreal *)_42 + 56B].m_exp = 0;
> >   sreal::normalize (_45);
> >   # DEBUG this => NULL
> >   # DEBUG sig => NULL
> >   # DEBUG exp => NULL
> >   MEM[(struct inline_summary *)_42].size = 0;
> >   MEM[(void *)_42 + 80B] = 0;
> >   MEM[(void *)_42 + 88B] = 0;
> >   MEM[(void *)_42 + 96B] = 0;
> >   MEM[(struct predicate * *)_42 + 104B] = 0;
> >   MEM[(void *)_42 + 112B] = 0;
> >   MEM[(void *)_42 + 120B] = 0;
> >   goto ; [100.00%]
> >
> > so indeed it is not quite pretty at all. Even the bitfields are split
> > and there is an offlined call to normalize zero in sreal.
> > I wonder why the inliner does not see this as an obviously good inlining
> > opportunity.
> > I will look into that problem. For store merging it would indeed be
> > very nice to improve this as well.
> >
> > Honza
> >>
> >> Richard.
> >>
> >> >
> >> > Jakub


[PATCH] RISC-V: Unify indention in riscv.md

2017-05-04 Thread Palmer Dabbelt
From: Kito Cheng 

This contains only whitespace changes.

gcc/ChangeLog

2017-05-04  Kito Cheng  

* config/riscv/riscv.md: Unify indentation.
---
 gcc/ChangeLog |   4 +
 gcc/config/riscv/riscv.md | 559 --
 2 files changed, 291 insertions(+), 272 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8548845..fc85689 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2017-05-04  Kito Cheng  
+
+   * config/riscv/riscv.md: Unify indentation.
+
 2017-05-04  Richard Sandiford  
 
* tree-ssa-loop-manip.c (niter_for_unrolled_loop): Add commentary
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4cbb243..18dba3b 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -398,47 +398,47 @@
 ;;
 
 (define_insn "add3"
-  [(set (match_operand:ANYF 0 "register_operand" "=f")
-   (plus:ANYF (match_operand:ANYF 1 "register_operand" "f")
-  (match_operand:ANYF 2 "register_operand" "f")))]
+  [(set (match_operand:ANYF0 "register_operand" "=f")
+   (plus:ANYF (match_operand:ANYF 1 "register_operand" " f")
+  (match_operand:ANYF 2 "register_operand" " f")))]
   "TARGET_HARD_FLOAT"
   "fadd.\t%0,%1,%2"
   [(set_attr "type" "fadd")
(set_attr "mode" "")])
 
 (define_insn "addsi3"
-  [(set (match_operand:SI 0 "register_operand" "=r,r")
-   (plus:SI (match_operand:SI 1 "register_operand" "r,r")
- (match_operand:SI 2 "arith_operand" "r,I")))]
+  [(set (match_operand:SI  0 "register_operand" "=r,r")
+   (plus:SI (match_operand:SI 1 "register_operand" " r,r")
+(match_operand:SI 2 "arith_operand"" r,I")))]
   ""
   { return TARGET_64BIT ? "addw\t%0,%1,%2" : "add\t%0,%1,%2"; }
   [(set_attr "type" "arith")
(set_attr "mode" "SI")])
 
 (define_insn "adddi3"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
-   (plus:DI (match_operand:DI 1 "register_operand" "r,r")
- (match_operand:DI 2 "arith_operand" "r,I")))]
+  [(set (match_operand:DI  0 "register_operand" "=r,r")
+   (plus:DI (match_operand:DI 1 "register_operand" " r,r")
+(match_operand:DI 2 "arith_operand"" r,I")))]
   "TARGET_64BIT"
   "add\t%0,%1,%2"
   [(set_attr "type" "arith")
(set_attr "mode" "DI")])
 
 (define_insn "*addsi3_extended"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI   0 "register_operand" "=r,r")
(sign_extend:DI
-(plus:SI (match_operand:SI 1 "register_operand" "r,r")
- (match_operand:SI 2 "arith_operand" "r,I"]
+(plus:SI (match_operand:SI 1 "register_operand" " r,r")
+ (match_operand:SI 2 "arith_operand"" r,I"]
   "TARGET_64BIT"
   "addw\t%0,%1,%2"
   [(set_attr "type" "arith")
(set_attr "mode" "SI")])
 
 (define_insn "*addsi3_extended2"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI   0 "register_operand" "=r,r")
(sign_extend:DI
- (subreg:SI (plus:DI (match_operand:DI 1 "register_operand" "r,r")
- (match_operand:DI 2 "arith_operand" "r,I"))
+ (subreg:SI (plus:DI (match_operand:DI 1 "register_operand" " r,r")
+ (match_operand:DI 2 "arith_operand"" r,I"))
 0)))]
   "TARGET_64BIT"
   "addw\t%0,%1,%2"
@@ -454,47 +454,47 @@
 ;;
 
 (define_insn "sub3"
-  [(set (match_operand:ANYF 0 "register_operand" "=f")
-   (minus:ANYF (match_operand:ANYF 1 "register_operand" "f")
-   (match_operand:ANYF 2 "register_operand" "f")))]
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+   (minus:ANYF (match_operand:ANYF 1 "register_operand" " f")
+   (match_operand:ANYF 2 "register_operand" " f")))]
   "TARGET_HARD_FLOAT"
   "fsub.\t%0,%1,%2"
   [(set_attr "type" "fadd")
(set_attr "mode" "")])
 
 (define_insn "subdi3"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-   (minus:DI (match_operand:DI 1 "reg_or_0_operand" "rJ")
-  (match_operand:DI 2 "register_operand" "r")))]
+  [(set (match_operand:DI 0"register_operand" "= r")
+   (minus:DI (match_operand:DI 1  "reg_or_0_operand" " rJ")
+  (match_operand:DI 2 "register_operand" "  r")))]
   "TARGET_64BIT"
   "sub\t%0,%z1,%2"
   [(set_attr "type" "arith")
(set_attr "mode" "DI")])
 
 (define_insn "subsi3"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (minus:SI (match_operand:SI 1 "reg_or_0_operand" "rJ")
-  (match_operand:SI 2 "register_operand" "r")))]
+  [(set (match_operand:SI   0 "register_operand" "= r")
+   (minus:SI (match_operand:SI 1 "reg_or_0_operand" " rJ")
+  

[PATCH] RISC-V: Add -mstrict-align option

2017-05-04 Thread Palmer Dabbelt
From: Andrew Waterman 

The RISC-V user ISA permits misaligned accesses, but they may trap
and be emulated.  That emulation software needs to be compiled assuming
strict alignment.

Even when strict alignment is not required, set SLOW_UNALIGNED_ACCESS
based upon -mtune to avoid a performance pitfall.
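
To make the first point concrete: code that must never issue a misaligned
word access can assemble a value from individual bytes.  The helper below is
only a sketch of that idea and is not part of the patch; load_le32 is a
made-up name.

```c
#include <stdint.h>

/* Load a 32-bit little-endian value from P, which may have any alignment.
   Only byte accesses appear in the source, so the code as written does not
   rely on misaligned word loads being supported.  */
uint32_t
load_le32 (const unsigned char *p)
{
  return (uint32_t) p[0]
	 | ((uint32_t) p[1] << 8)
	 | ((uint32_t) p[2] << 16)
	 | ((uint32_t) p[3] << 24);
}
```

For example, reading four bytes starting at an odd address of a byte buffer
with this helper is well defined on any target, independent of how the
hardware treats misaligned accesses.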

gcc/ChangeLog:

2017-05-04  Andrew Waterman  

* config/riscv/riscv.opt (mstrict-align): New option.
* config/riscv/riscv.h (STRICT_ALIGNMENT): Use it.  Update comment.
(SLOW_UNALIGNED_ACCESS): Define.
(riscv_slow_unaligned_access): Declare.
* config/riscv/riscv.c (riscv_tune_info): Add slow_unaligned_access
field.
(riscv_slow_unaligned_access): New variable.
(rocket_tune_info): Set slow_unaligned_access to true.
(optimize_size_tune_info): Set slow_unaligned_access to false.
(riscv_cpu_info_table): Add entry for optimize_size_tune_info.
(riscv_valid_lo_sum_p): Use TARGET_STRICT_ALIGN.
(riscv_option_override): Set riscv_slow_unaligned_access.
* doc/invoke.texi: Add -mstrict-align to RISC-V.
---
 gcc/ChangeLog              | 16
 gcc/config/riscv/riscv.c   | 20 +---
 gcc/config/riscv/riscv.h   | 10 ++
 gcc/config/riscv/riscv.opt |  4
 gcc/doc/invoke.texi        |  6 ++
 5 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fc85689..6b82034 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,19 @@
+2017-05-04  Andrew Waterman  
+
+   * config/riscv/riscv.opt (mstrict-align): New option.
+   * config/riscv/riscv.h (STRICT_ALIGNMENT): Use it.  Update comment.
+   (SLOW_UNALIGNED_ACCESS): Define.
+   (riscv_slow_unaligned_access): Declare.
+   * config/riscv/riscv.c (riscv_tune_info): Add slow_unaligned_access
+   field.
+   (riscv_slow_unaligned_access): New variable.
+   (rocket_tune_info): Set slow_unaligned_access to true.
+   (optimize_size_tune_info): Set slow_unaligned_access to false.
+   (riscv_cpu_info_table): Add entry for optimize_size_tune_info.
+   (riscv_valid_lo_sum_p): Use TARGET_STRICT_ALIGN.
+   (riscv_option_override): Set riscv_slow_unaligned_access.
+   * doc/invoke.texi: Add -mstrict-align to RISC-V.
+
 2017-05-04  Kito Cheng  
 
* config/riscv/riscv.md: Unify indentation.
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index d5928c3..f7fec4b 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -255,6 +255,7 @@ struct riscv_tune_info
   unsigned short issue_rate;
   unsigned short branch_cost;
   unsigned short memory_cost;
+  bool slow_unaligned_access;
 };
 
 /* Information about one CPU we know about.  */
@@ -268,6 +269,9 @@ struct riscv_cpu_info {
 
 /* Global variables for machine-dependent things.  */
 
+/* Whether unaligned accesses execute very slowly.  */
+bool riscv_slow_unaligned_access;
+
 /* Which tuning parameters to use.  */
 static const struct riscv_tune_info *tune_info;
 
@@ -301,7 +305,8 @@ static const struct riscv_tune_info rocket_tune_info = {
   {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},  /* int_div */
   1,   /* issue_rate */
   3,   /* branch_cost */
-  5    /* memory_cost */
+  5,   /* memory_cost */
+  true,   /* slow_unaligned_access */
 };
 
 /* Costs to use when optimizing for size.  */
@@ -313,12 +318,14 @@ static const struct riscv_tune_info optimize_size_tune_info = {
   {COSTS_N_INSNS (1), COSTS_N_INSNS (1)},  /* int_div */
   1,   /* issue_rate */
   1,   /* branch_cost */
-  2    /* memory_cost */
+  2,   /* memory_cost */
+  false,   /* slow_unaligned_access */
 };
 
 /* A table describing all the processors GCC knows about.  */
 static const struct riscv_cpu_info riscv_cpu_info_table[] = {
   { "rocket", &rocket_tune_info },
+  { "size", &optimize_size_tune_info },
 };
 
 /* Return the riscv_cpu_info entry for the given name string.  */
@@ -726,7 +733,8 @@ riscv_valid_lo_sum_p (enum riscv_symbol_type sym_type, enum machine_mode mode)
   /* We may need to split multiword moves, so make sure that each word
  can be accessed without inducing a carry.  */
   if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
-  && GET_MODE_BITSIZE (mode) > GET_MODE_ALIGNMENT (mode))
+  && (!TARGET_STRICT_ALIGN
+ || GET_MODE_BITSIZE (mode) > GET_MODE_ALIGNMENT (mode)))
 return false;
 
   return true;
@@ -3773,6 +3781,12 @@ riscv_option_override (void)
 

Re: Handle data dependence relations with different bases

2017-05-04 Thread Richard Biener
On Thu, May 4, 2017 at 2:12 PM, Richard Biener
 wrote:
> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford
>  wrote:
>> This patch tries to calculate conservatively-correct distance
>> vectors for two references whose base addresses are not the same.
>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
>> isn't guaranteed to occur.
>>
>> The motivating example is:
>>
>>   struct s { int x[8]; };
>>   void
>>   f (struct s *a, struct s *b)
>>   {
>> for (int i = 0; i < 8; ++i)
>>   a->x[i] += b->x[i];
>>   }
>>
>> in which the "a" and "b" accesses are either independent or have a
>> dependence distance of 0 (assuming -fstrict-aliasing).  Neither case
>> prevents vectorisation, so we can vectorise without an alias check.
>>
>> I'd originally wanted to do the same thing for arrays as well, e.g.:
>>
>>   void
>>   f (int a[][8], int b[][8])
>>   {
>> for (int i = 0; i < 8; ++i)
>>   a[0][i] += b[0][i];
>>   }
>>
>> I think this is valid because C11 6.7.6.2/6 says:
>>
>>   For two array types to be compatible, both shall have compatible
>>   element types, and if both size specifiers are present, and are
>>   integer constant expressions, then both size specifiers shall have
>>   the same constant value.
>>
>> So if we access an array through an int (*)[8], it must have type X[8]
>> or X[], where X is compatible with int.  It doesn't seem possible in
>> either case for "a[0]" and "b[0]" to overlap when "a != b".
>>
>> However, Richard B said that (at least in gimple) we support arbitrary
>> overlap of arrays and allow arrays to be accessed with different
>> dimensionality.  There are examples of this in PR50067.  I've therefore
>> only handled references that end in a structure field access.
>>
>> There are two ways of handling these dependences in the vectoriser:
>> use them to limit VF, or check at runtime as before.  I've gone for
>> the approach of checking at runtime if we can, to avoid limiting VF
>> unnecessarily.  We still fall back to a VF cap when runtime checks
>> aren't allowed.
>>
>> The patch tests whether we queued an alias check with a dependence
>> distance of X and then picked a VF <= X, in which case it's safe to
>> drop the alias check.  Since vect_prune_runtime_alias_check_list can
>> be called twice with different VF for the same loop, it's no longer
>> safe to clear may_alias_ddrs on exit.  Instead we should use
>> comp_alias_ddrs to check whether versioning is necessary.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> You seem to do your "fancy" thing but also later compute the old
> base equality anyway (for same_base_p).  It looks to me for this
> case the new fancy code can be simply skipped, keeping num_dimensions
> as before?
>
> +  /* Try to approach equal type sizes.  */
> +  if (!COMPLETE_TYPE_P (type_a)
> + || !COMPLETE_TYPE_P (type_b)
> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
> +   break;
>
> ah, interesting idea to avoid a quadratic search.  Note that you should
> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR
> as they are used for type-punning.  I see
> nonoverlapping_component_refs_of_decl_p
> should simply skip ARRAY_REFs - but I also see there:
>
>   /* ??? We cannot simply use the type of operand #0 of the refs here
>  as the Fortran compiler smuggles type punning into COMPONENT_REFs
>  for common blocks instead of using unions like everyone else.  */
>   tree type1 = DECL_CONTEXT (field1);
>   tree type2 = DECL_CONTEXT (field2);
>
> so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility.
> You also may not use types_compatible_p here as for LTO that is _way_ too
> lax for aggregates.  The above uses
>
>   /* We cannot disambiguate fields in a union or qualified union.  */
>   if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>  return false;
>
> so you should also bail out on unions here, rather than the check you do 
> later.
>
> You seem to rely on getting an access_fn entry for each handled_component_p.
> It looks like this is the case -- we even seem to stop at unions (with the 
> same
> fortran "issue").  I'm not sure that's the best thing to do but you
> rely on that.
>
> I don't understand the looping, it needs more comments.  You seem to be
> looking for the innermost compatible RECORD_TYPE but then num_dimensions
> is how many compatible refs you found on the way (with incompatible ones
> not counting?!).  What about an inner varying array of structs?  This seems to
> be disregarded in the analysis now?  Thus, a[i].s.b[i].j vs. __real
> b[i].s.b[i].j?
>
> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p
> conveniently start from the other
> end of the ref here.

That said, for the motivational cases we either have one ref having
more 

Re: C PATCH to fix missing -Wlogical-op warning (PR c/80525)

2017-05-04 Thread Marek Polacek
On Thu, May 04, 2017 at 02:13:24PM +0200, Richard Biener wrote:
> On Thu, May 4, 2017 at 2:11 PM, Marek Polacek  wrote:
> > On Thu, May 04, 2017 at 12:42:03PM +0200, Richard Biener wrote:
> >> > +static tree
> >> > +unwrap_c_maybe_const (tree *tp, int *walk_subtrees, void *)
> >> > +{
> >> > +  if (TREE_CODE (*tp) == C_MAYBE_CONST_EXPR)
> >> > +{
> >> > +  *tp = C_MAYBE_CONST_EXPR_EXPR (*tp);
> >> > +  /* C_MAYBE_CONST_EXPRs don't nest.  */
> >> > +  *walk_subtrees = false;
> >>
> >> This changes trees in-place -- do you need to operate on a copy?
> >
> > Ugh, yes.  But I can't simply copy_node, because that creates new VAR_DECLs,
> > and operand_equal_p would consider them unequal.  Hmm...  We need something
> > else.
> 
> unshare_expr?

Yeah, so:

2017-05-04  Marek Polacek  

PR c/80525
* c-warn.c (unwrap_c_maybe_const): New.
(warn_logical_operator): Call it.

* c-c++-common/Wlogical-op-1.c: Don't use -fwrapv anymore.
* c-c++-common/Wlogical-op-2.c: New test.

diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
index 45dd583..aa0cfa9 100644
--- gcc/c-family/c-warn.c
+++ gcc/c-family/c-warn.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "asan.h"
 #include "gcc-rich-location.h"
+#include "gimplify.h"
 
 /* Print a warning if a constant expression had overflow in folding.
Invoke this function on every expression that the language
@@ -112,6 +113,21 @@ overflow_warning (location_t loc, tree value)
 }
 }
 
+/* Helper function for walk_tree.  Unwrap C_MAYBE_CONST_EXPRs in an expression
+   pointed to by TP.  */
+
+static tree
+unwrap_c_maybe_const (tree *tp, int *walk_subtrees, void *)
+{
+  if (TREE_CODE (*tp) == C_MAYBE_CONST_EXPR)
+{
+  *tp = C_MAYBE_CONST_EXPR_EXPR (*tp);
+  /* C_MAYBE_CONST_EXPRs don't nest.  */
+  *walk_subtrees = false;
+}
+  return NULL_TREE;
+}
+
 /* Warn about uses of logical || / && operator in a context where it
is likely that the bitwise equivalent was intended by the
programmer.  We have seen an expression in which CODE is a binary
@@ -189,11 +205,11 @@ warn_logical_operator (location_t location, enum tree_code code, tree type,
  (with OR) or trivially false (with AND).  If so, do not warn.
  This is a common idiom for testing ranges of data types in
  portable code.  */
+  op_left = unshare_expr (op_left);
+  walk_tree_without_duplicates (&op_left, unwrap_c_maybe_const, NULL);
   lhs = make_range (op_left, &in0_p, &low0, &high0, &strict_overflow_p);
   if (!lhs)
 return;
-  if (TREE_CODE (lhs) == C_MAYBE_CONST_EXPR)
-lhs = C_MAYBE_CONST_EXPR_EXPR (lhs);
 
   /* If this is an OR operation, invert both sides; now, the result
  should be always false to get a warning.  */
@@ -204,11 +220,11 @@ warn_logical_operator (location_t location, enum tree_code code, tree type,
   if (tem && integer_zerop (tem))
 return;
 
+  op_right = unshare_expr (op_right);
+  walk_tree_without_duplicates (&op_right, unwrap_c_maybe_const, NULL);
   rhs = make_range (op_right, &in1_p, &low1, &high1, &strict_overflow_p);
   if (!rhs)
 return;
-  if (TREE_CODE (rhs) == C_MAYBE_CONST_EXPR)
-rhs = C_MAYBE_CONST_EXPR_EXPR (rhs);
 
   /* If this is an OR operation, invert both sides; now, the result
  should be always false to get a warning.  */
diff --git gcc/testsuite/c-c++-common/Wlogical-op-1.c gcc/testsuite/c-c++-common/Wlogical-op-1.c
index e89a35a..c5f992a 100644
--- gcc/testsuite/c-c++-common/Wlogical-op-1.c
+++ gcc/testsuite/c-c++-common/Wlogical-op-1.c
@@ -1,8 +1,6 @@
 /* PR c/63357 */
 /* { dg-do compile } */
-/* For -fwrapv see PR80525, xfailing the subtest isn't possible as it passes
-   with the C++ FE which doesn't have maybe_const_expr.  */
-/* { dg-options "-fwrapv -Wlogical-op" } */
+/* { dg-options "-Wlogical-op" } */
 
 #ifndef __cplusplus
 # define bool _Bool
diff --git gcc/testsuite/c-c++-common/Wlogical-op-2.c gcc/testsuite/c-c++-common/Wlogical-op-2.c
index e69de29..6360ef9 100644
--- gcc/testsuite/c-c++-common/Wlogical-op-2.c
+++ gcc/testsuite/c-c++-common/Wlogical-op-2.c
@@ -0,0 +1,12 @@
+/* PR c/80525 */
+/* { dg-do compile } */
+/* { dg-options "-Wlogical-op" } */
+
+int
+fn (int a, int b)
+{
+  if ((a + 1) && (a + 1)) /* { dg-warning "logical .and. of equal expressions" } */
+return a;
+  if ((a + 1) || (a + 1)) /* { dg-warning "logical .or. of equal expressions" } */
+return b;
+}

Marek


Re: C PATCH to fix missing -Wlogical-op warning (PR c/80525)

2017-05-04 Thread Richard Biener
On Thu, May 4, 2017 at 2:11 PM, Marek Polacek  wrote:
> On Thu, May 04, 2017 at 12:42:03PM +0200, Richard Biener wrote:
>> > +static tree
>> > +unwrap_c_maybe_const (tree *tp, int *walk_subtrees, void *)
>> > +{
>> > +  if (TREE_CODE (*tp) == C_MAYBE_CONST_EXPR)
>> > +{
>> > +  *tp = C_MAYBE_CONST_EXPR_EXPR (*tp);
>> > +  /* C_MAYBE_CONST_EXPRs don't nest.  */
>> > +  *walk_subtrees = false;
>>
>> This changes trees in-place -- do you need to operate on a copy?
>
> Ugh, yes.  But I can't simply copy_node, because that creates new VAR_DECLs,
> and operand_equal_p would consider them unequal.  Hmm...  We need something
> else.

unshare_expr?
>
> Marek


Re: C PATCH to fix missing -Wlogical-op warning (PR c/80525)

2017-05-04 Thread Marek Polacek
On Thu, May 04, 2017 at 12:42:03PM +0200, Richard Biener wrote:
> > +static tree
> > +unwrap_c_maybe_const (tree *tp, int *walk_subtrees, void *)
> > +{
> > +  if (TREE_CODE (*tp) == C_MAYBE_CONST_EXPR)
> > +{
> > +  *tp = C_MAYBE_CONST_EXPR_EXPR (*tp);
> > +  /* C_MAYBE_CONST_EXPRs don't nest.  */
> > +  *walk_subtrees = false;
> 
> This changes trees in-place -- do you need to operate on a copy?

Ugh, yes.  But I can't simply copy_node, because that creates new VAR_DECLs,
and operand_equal_p would consider them unequal.  Hmm...  We need something
else.
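
To make the requirement concrete, here is a toy sketch in plain C, not GCC
code: every type and function name below is hypothetical.  It copies the
interior nodes of a small expression tree while sharing the leaf
"declarations", then strips wrapper nodes from the copy only.  That is
roughly the property being asked for: the original stays untouched, and
pointer comparisons on the leaves still succeed.

```c
#include <stdlib.h>

/* Toy expression tree; all names are hypothetical stand-ins.  */
enum node_code { LEAF_DECL, WRAPPER, PLUS };

struct node
{
  enum node_code code;
  struct node *op[2];   /* Operands; both NULL for LEAF_DECL.  */
};

/* Copy interior nodes recursively but share LEAF_DECL nodes, so that
   identity comparisons on the leaves give the same answer as before.  */
struct node *
copy_sharing_decls (struct node *t)
{
  if (t == NULL || t->code == LEAF_DECL)
    return t;
  struct node *c = malloc (sizeof *c);
  if (c == NULL)
    abort ();
  c->code = t->code;
  c->op[0] = copy_sharing_decls (t->op[0]);
  c->op[1] = copy_sharing_decls (t->op[1]);
  return c;
}

/* Replace every WRAPPER in *TP by its first operand, in place.
   The skipped wrapper nodes are simply leaked in this sketch.  */
void
strip_wrappers (struct node **tp)
{
  while (*tp != NULL && (*tp)->code == WRAPPER)
    *tp = (*tp)->op[0];
  if (*tp == NULL || (*tp)->code == LEAF_DECL)
    return;
  strip_wrappers (&(*tp)->op[0]);
  strip_wrappers (&(*tp)->op[1]);
}
```

Running strip_wrappers on the result of copy_sharing_decls leaves the
source tree as it was, which a plain in-place walk would not.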

Marek


Re: Handle data dependence relations with different bases

2017-05-04 Thread Richard Biener
On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford
 wrote:
> This patch tries to calculate conservatively-correct distance
> vectors for two references whose base addresses are not the same.
> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
> isn't guaranteed to occur.
>
> The motivating example is:
>
>   struct s { int x[8]; };
>   void
>   f (struct s *a, struct s *b)
>   {
> for (int i = 0; i < 8; ++i)
>   a->x[i] += b->x[i];
>   }
>
> in which the "a" and "b" accesses are either independent or have a
> dependence distance of 0 (assuming -fstrict-aliasing).  Neither case
> prevents vectorisation, so we can vectorise without an alias check.
>
> I'd originally wanted to do the same thing for arrays as well, e.g.:
>
>   void
>   f (int a[][8], int b[][8])
>   {
> for (int i = 0; i < 8; ++i)
>   a[0][i] += b[0][i];
>   }
>
> I think this is valid because C11 6.7.6.2/6 says:
>
>   For two array types to be compatible, both shall have compatible
>   element types, and if both size specifiers are present, and are
>   integer constant expressions, then both size specifiers shall have
>   the same constant value.
>
> So if we access an array through an int (*)[8], it must have type X[8]
> or X[], where X is compatible with int.  It doesn't seem possible in
> either case for "a[0]" and "b[0]" to overlap when "a != b".
>
> However, Richard B said that (at least in gimple) we support arbitrary
> overlap of arrays and allow arrays to be accessed with different
> dimensionality.  There are examples of this in PR50067.  I've therefore
> only handled references that end in a structure field access.
>
> There are two ways of handling these dependences in the vectoriser:
> use them to limit VF, or check at runtime as before.  I've gone for
> the approach of checking at runtime if we can, to avoid limiting VF
> unnecessarily.  We still fall back to a VF cap when runtime checks
> aren't allowed.
>
> The patch tests whether we queued an alias check with a dependence
> distance of X and then picked a VF <= X, in which case it's safe to
> drop the alias check.  Since vect_prune_runtime_alias_check_list can
> be called twice with different VF for the same loop, it's no longer
> safe to clear may_alias_ddrs on exit.  Instead we should use
> comp_alias_ddrs to check whether versioning is necessary.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
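
The VF <= X argument in the quoted description can be checked with a small
stand-alone model.  This is only a sketch: the function names and the
MAX_VF limit are made up for illustration, and "vectorisation" is modelled
as loading all lanes of a chunk before storing any of them.

```c
#include <stddef.h>
#include <string.h>

#define MAX_VF 8

/* Scalar reference loop: base[i + dist] += base[i].  */
void
add_scalar (int *base, size_t n, size_t dist)
{
  for (size_t i = 0; i < n; ++i)
    base[i + dist] += base[i];
}

/* The same loop run in chunks of VF lanes, loading every lane of a chunk
   before storing any of them.  Requires 1 <= VF <= MAX_VF.  */
void
add_chunked (int *base, size_t n, size_t dist, size_t vf)
{
  int src[MAX_VF], dst[MAX_VF];
  for (size_t i = 0; i < n; i += vf)
    {
      size_t lanes = n - i < vf ? n - i : vf;
      for (size_t l = 0; l < lanes; ++l)
        {
          src[l] = base[i + l];
          dst[l] = base[i + dist + l];
        }
      for (size_t l = 0; l < lanes; ++l)
        base[i + dist + l] = dst[l] + src[l];
    }
}

/* Return 1 if both loops give the same result on the data 1, 2, 3, ...
   Requires n + dist <= 64.  */
int
chunked_matches_scalar (size_t n, size_t dist, size_t vf)
{
  int a[64], b[64];
  for (size_t i = 0; i < 64; ++i)
    a[i] = b[i] = (int) i + 1;
  add_scalar (a, n, dist);
  add_chunked (b, n, dist, vf);
  return memcmp (a, b, sizeof a) == 0;
}
```

A distance of 0 (the a == b case of the motivating example) matches for
any VF, a distance of 2 matches with VF 2, and a distance of 2 with VF 4
does not, which is the case where the runtime check is still needed.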

You seem to do your "fancy" thing but also later compute the old
base equality anyway (for same_base_p).  It looks to me for this
case the new fancy code can be simply skipped, keeping num_dimensions
as before?

+  /* Try to approach equal type sizes.  */
+  if (!COMPLETE_TYPE_P (type_a)
+ || !COMPLETE_TYPE_P (type_b)
+ || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
+ || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
+   break;

ah, interesting idea to avoid a quadratic search.  Note that you should
conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR
as they are used for type-punning.  I see
nonoverlapping_component_refs_of_decl_p
should simply skip ARRAY_REFs - but I also see there:

  /* ??? We cannot simply use the type of operand #0 of the refs here
 as the Fortran compiler smuggles type punning into COMPONENT_REFs
 for common blocks instead of using unions like everyone else.  */
  tree type1 = DECL_CONTEXT (field1);
  tree type2 = DECL_CONTEXT (field2);

so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility.
You also may not use types_compatible_p here as for LTO that is _way_ too
lax for aggregates.  The above uses

  /* We cannot disambiguate fields in a union or qualified union.  */
  if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
 return false;

so you should also bail out on unions here, rather than the check you do later.
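
As a plain C illustration of why the RECORD_TYPE test matters (the type
and function names here are hypothetical): two distinct fields of a struct
occupy disjoint bytes, while two distinct fields of a union start at the
same offset, so "different field" says nothing about overlap for unions.

```c
#include <stddef.h>

struct rec_sketch { int x; int y; };
union uni_sketch { int x; int y; };

/* Return nonzero if the byte ranges [off1, off1 + size1) and
   [off2, off2 + size2) have at least one byte in common.  */
int
byte_ranges_overlap_p (size_t off1, size_t size1, size_t off2, size_t size2)
{
  return off1 < off2 + size2 && off2 < off1 + size1;
}
```

Feeding it offsetof values for x and y gives "no overlap" for
rec_sketch and "overlap" for uni_sketch.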

You seem to rely on getting an access_fn entry for each handled_component_p.
It looks like this is the case -- we even seem to stop at unions (with the same
fortran "issue").  I'm not sure that's the best thing to do but you
rely on that.

I don't understand the looping, it needs more comments.  You seem to be
looking for the innermost compatible RECORD_TYPE but then num_dimensions
is how many compatible refs you found on the way (with incompatible ones
not counting?!).  What about an inner varying array of structs?  This seems to
be disregarded in the analysis now?  Thus, a[i].s.b[i].j vs. __real
b[i].s.b[i].j?

nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p
conveniently start from the other
end of the ref here.

Richard.

> Thanks,
> Richard
>
>
> gcc/
> 2017-05-03  Richard Sandiford  
>
> * tree-data-ref.h (subscript): Add access_fn field.
> (data_dependence_relation): Add could_be_independent_p.
> (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros.
> 

Re: Handle data dependence relations with different bases

2017-05-04 Thread Richard Sandiford
"Bin.Cheng"  writes:
> On Thu, May 4, 2017 at 11:06 AM, Richard Sandiford
>  wrote:
>> "Bin.Cheng"  writes:
>>> On Wed, May 3, 2017 at 9:00 AM, Richard Sandiford
>>>  wrote:
 Index: gcc/tree-data-ref.c
 ===
 --- gcc/tree-data-ref.c 2017-02-23 19:54:15.0 +
 +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100
 @@ -123,8 +123,7 @@ Software Foundation; either version 3, o
  } dependence_stats;

 static bool subscript_dependence_tester_1 (struct data_dependence_relation *,
 -  struct data_reference *,
 -  struct data_reference *,
 +  unsigned int, unsigned int,
struct loop *);
>>> As mentioned, how about passing access_fn directly, rather than less
>>> meaningful 0/1 values?
>>
>> The problem is that access_fn is a property of the individual
>> subscripts, whereas this is operating on a full data_reference.
>>
>> One alternative would be to use conditions like:
>>
>>   first_is_a ? SUB_ACCESS_FN_A (sub) : SUB_ACCESS_FN_B (sub)
>>
>> but IMO that's less readable than the existing:
>>
>>   SUB_ACCESS_FN (sub, index)
>>
>> Or we could have individual access_fn arrays for A and B, separate
>> from the main subscript array, but that would mean allocating three
>> arrays instead of one.
> Thanks for the explanation, I see the problem now.  Even if the latter
> sequence could be different for A and B, they should have the same
> number of indices?  If that's the case, is it possible to just record the
> starting position (or length) in DR_ACCESS_FN and use that information
> to access the access_fn vector?  This would save the copy in subscript.
> Anyway, this is not an important problem.

I think that would mean trading fields in the subscript for fields
in the main ddr structure.  One advantage of doing it in the subscript
is that those are freed after the analysis is complete, whereas the
ddr stays around until the caller has finished with it.
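
To illustrate the indexing argument, here is a stand-alone sketch with
made-up names: "long" stands in for the real tree type, and count_smaller
is not a function from the patch.  With a two-element array in each
subscript, one helper can be asked about either ordering of the two
references just by swapping the index arguments.

```c
#include <stddef.h>

/* Stand-in for struct subscript; access_fn[0] belongs to reference A
   and access_fn[1] to reference B.  */
struct subscript_sketch
{
  long access_fn[2];
};

#define SUB_ACCESS_FN_SKETCH(SUB, I) ((SUB)->access_fn[I])

/* Count the subscripts in which the reference selected by FIRST has a
   smaller access function than the one selected by SECOND.  */
size_t
count_smaller (const struct subscript_sketch *subs, size_t n,
               unsigned int first, unsigned int second)
{
  size_t count = 0;
  for (size_t i = 0; i < n; ++i)
    if (SUB_ACCESS_FN_SKETCH (&subs[i], first)
        < SUB_ACCESS_FN_SKETCH (&subs[i], second))
      ++count;
  return count;
}
```

Calling count_smaller (subs, n, 0, 1) and count_smaller (subs, n, 1, 0)
covers both directions without a second copy of the loop or a
"first_is_a ? ... : ..." condition at each use.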

 + latter sequence.  */
 +  unsigned int start_a = 0;
 +  unsigned int start_b = 0;
 +  unsigned int num_dimensions = 0;
 +  unsigned int struct_start_a = 0;
 +  unsigned int struct_start_b = 0;
 +  unsigned int struct_num_dimensions = 0;
 +  unsigned int index_a = 0;
 +  unsigned int index_b = 0;
 +  tree next_ref_a = DR_REF (a);
 +  tree next_ref_b = DR_REF (b);
 +  tree struct_ref_a = NULL_TREE;
 +  tree struct_ref_b = NULL_TREE;
 +  while (index_a < num_dimensions_a && index_b < num_dimensions_b)
 +{
 +  gcc_checking_assert (handled_component_p (next_ref_a));
 +  gcc_checking_assert (handled_component_p (next_ref_b));
 +  tree outer_ref_a = TREE_OPERAND (next_ref_a, 0);
 +  tree outer_ref_b = TREE_OPERAND (next_ref_b, 0);
 +  tree type_a = TREE_TYPE (outer_ref_a);
 +  tree type_b = TREE_TYPE (outer_ref_b);
 +  if (types_compatible_p (type_a, type_b))
 +   {
 + /* This pair of accesses belong to a suitable sequence.  */
 + if (start_a + num_dimensions != index_a
 + || start_b + num_dimensions != index_b)
 +   {
 + /* Start a new sequence here.  */
 + start_a = index_a;
 + start_b = index_b;
 + num_dimensions = 0;
 +   }
 + num_dimensions += 1;
 + if (TREE_CODE (type_a) == RECORD_TYPE)
 +   {
 + struct_start_a = start_a;
 + struct_start_b = start_b;
 + struct_num_dimensions = num_dimensions;
 + struct_ref_a = outer_ref_a;
 + struct_ref_b = outer_ref_b;
 +   }
 + next_ref_a = outer_ref_a;
 + next_ref_b = outer_ref_b;
 + index_a += 1;
 + index_b += 1;
 + continue;
 +   }
 +  /* Try to approach equal type sizes.  */
 +  if (!COMPLETE_TYPE_P (type_a)
 + || !COMPLETE_TYPE_P (type_b)
 + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
 + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
 +   break;
 +  unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a));
 +  unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b));
 +  if (size_a <= size_b)
 +   {
 + index_a += 1;
 + next_ref_a = outer_ref_a;
 +   }
 +  if (size_b <= size_a)
 +   {
 + index_b += 1;
 + next_ref_b = outer_ref_b;
 +   }
  }

 -  /* If the references do not access the same 

Re: [RFC][PATCH] Introduce -fdump*-folding

2017-05-04 Thread Martin Liška

On 05/04/2017 12:40 PM, Richard Biener wrote:

On Thu, May 4, 2017 at 11:22 AM, Martin Liška  wrote:

On 05/03/2017 12:12 PM, Richard Biener wrote:


On Wed, May 3, 2017 at 10:10 AM, Martin Liška  wrote:


Hello

Last release cycle I spent quite some time reading the IVOPTS pass
dump file. Using -fdump*-details generates a lot of 'Applying pattern'
lines, which can make a dump file harder to read.

Here are stats for tramp3d with -O2 and -fdump-tree-all-details.  The
percentage shows how many lines match the aforementioned pattern:

    tramp3d-v4.cpp.164t.ivopts: 6.34%
      tramp3d-v4.cpp.091t.ccp2: 5.04%
  tramp3d-v4.cpp.093t.cunrolli: 4.41%
  tramp3d-v4.cpp.129t.laddress: 3.70%
      tramp3d-v4.cpp.032t.ccp1: 2.31%
      tramp3d-v4.cpp.038t.evrp: 1.90%
 tramp3d-v4.cpp.033t.forwprop1: 1.74%
      tramp3d-v4.cpp.103t.vrp1: 1.52%
 tramp3d-v4.cpp.124t.forwprop3: 1.31%
      tramp3d-v4.cpp.181t.vrp2: 1.30%
   tramp3d-v4.cpp.161t.cunroll: 1.22%
tramp3d-v4.cpp.027t.fixup_cfg3: 1.11%
   tramp3d-v4.cpp.153t.ivcanon: 1.07%
      tramp3d-v4.cpp.126t.ccp3: 0.96%
      tramp3d-v4.cpp.143t.sccp: 0.91%
 tramp3d-v4.cpp.185t.forwprop4: 0.82%
       tramp3d-v4.cpp.011t.cfg: 0.74%
 tramp3d-v4.cpp.096t.forwprop2: 0.50%
tramp3d-v4.cpp.019t.fixup_cfg1: 0.37%
 tramp3d-v4.cpp.120t.phicprop1: 0.33%
       tramp3d-v4.cpp.133t.pre: 0.32%
 tramp3d-v4.cpp.182t.phicprop2: 0.27%
tramp3d-v4.cpp.170t.veclower21: 0.25%
   tramp3d-v4.cpp.029t.einline: 0.24%
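
The message does not say how these figures were produced.  One plausible
way to compute such a share, shown purely as a sketch with a hypothetical
function name, is to count the lines of a dump that contain the marker
string and divide by the total number of lines:

```c
#include <stddef.h>
#include <string.h>

/* Return the percentage of lines in TEXT that contain NEEDLE.
   Lines are separated by '\n'; an empty TEXT yields 0.  */
double
matching_line_percentage (const char *text, const char *needle)
{
  size_t total = 0, hits = 0, nlen = strlen (needle);
  while (*text != '\0')
    {
      const char *end = strchr (text, '\n');
      size_t len = end ? (size_t) (end - text) : strlen (text);
      ++total;
      /* Search for NEEDLE within the current line only.  */
      for (size_t i = 0; nlen > 0 && i + nlen <= len; ++i)
        if (memcmp (text + i, needle, nlen) == 0)
          {
            ++hits;
            break;
          }
      text += len + (end ? 1 : 0);
    }
  return total ? 100.0 * (double) hits / (double) total : 0.0;
}
```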

I'm suggesting to add new TDF that will be allocated for that.
Patch can bootstrap on ppc64le-redhat-linux and survives regression
tests.

Thoughts?



Ok.  Soon we'll want to change dump_flags to uint64_t ...  (we have 1 bit
left if you allow negative dump_flags).  It'll trickle down to a lot of
interfaces, so introducing dump_flags_t at the same time might be a good idea.



Hello.

I've prepared a patch that migrates all interfaces and introduces
dump_flags_t.


Great.


I'm currently testing that.  Apart from that, Richi requested a more
generic approach: a hierarchical structure of options.


Didn't really "request" it, it's just something we eventually need to do when
we run out of bits again ;)


I know, but it was me who came up with the idea of more fine suboptions :)





Can you please take a look at the self-contained source file that shows the
way I've decided to go?
Another question is whether we also want to implement "aliases", where for
instance the current 'all' is equal to the union of a couple of suboptions?


Yeah, I think we do want -all-all-all and -foo-all to work.  Not sure
about -all-foo-all.


Actually only having 'all' is quite easy to implement.

Let's imagine the following hierarchy:

(root)
- vops
- folding
  - gimple
    - ctor
    - array_ref
    - arithmetic
  - generic
    - c
    - c++
    - ctor
    - xyz

Then '-fdump-passname-folding-all' will be equal to '-fdump-passname-folding'.



The important thing is to make sure dump_flags_t stays POD and thus is
eligible to be passed in register(s).  In the end we might simply come up
with a two-level hierarchy, each 32bits (or we can even get back to 32bits
in total with two times 16bits).


I'm aware of having the type as POD.



It looks like you didn't actually implement this as a hierarchy though but
still allocate from one pool of bits (so you only change how
users access this?)


Yep, all leaf options are mapped to a mask and all inner nodes are just the
union of their suboptions. That will allow us to have 64 leaf suboptions. On
reaching the limit we can encode the values in a more sophisticated way. That,
however, brings the need to implement more complicated '&' and '|' operators.

I'll finish the implementation and try to migrate that to current handling.
Guess, I'm quite close.

Martin



Thanks,
Richard.



Thanks for feedback,
Martin



Thanks,
Richard.


Martin







Re: [wwwdocs] gcc-8/porting_to.html

2017-05-04 Thread JonY
On 03/23/2017 10:47 AM, Thomas Preudhomme wrote:
> Ack. Please find updated patch as per suggestions.
> 
> Best regards,
> 
> Thomas
> 

I've applied the changes to GCC 8 trunk as r247588.






Re: [PATCH 1/4][PR tree-optimization/78496] Don't simplify conditionals too early in VRP

2017-05-04 Thread Richard Biener
On Wed, May 3, 2017 at 6:32 PM, Jeff Law  wrote:
> [ With the patch attached... ]
>
>
> On 05/03/2017 10:31 AM, Jeff Law wrote:
>>
>> This is the first of 3-5 patches to address pr78496.
>>
>> The goal of these patches is to catch jump threads earlier in the pipeline
>> to avoid undesirable behavior in PRE and more generally be able to exploit
>> the secondary opportunities exposed by jump threading.
>>
>> One of the more serious issues I found while investigating 78496 was VRP
>> failing to find what should have been obvious jump threads.  The fundamental
>> issue is VRP will simplify conditionals which are fed by a typecast prior to
>> jump threading.   So something like this:
>>
>> x = (typecast) y;
>> if (x == 42)
>>
>> Can often be transformed into:
>>
>> if (y == 42)
>>
>>
>> The problem is any ASSERT_EXPRS after the conditional will reference "x"
>> rather than "y".  That in turn makes it impossible for VRP to use those
>> ASSERT_EXPRs to thread later jumps that use x == 
>>
>>
>> More concretely consider this gimple code:
>>
>>
>> ;;   basic block 5, loop depth 0, count 0, freq 1, maybe hot
>> ;;prev block 4, next block 12, flags: (NEW, REACHABLE, VISITED)
>> ;;pred:   3 [50.0%]  (TRUE_VALUE,EXECUTABLE)
>> ;;4 [100.0%]  (FALLTHRU,EXECUTABLE)
>># iftmp.0_2 = PHI <1(3), 0(4)>
>>in_loop_7 = (unsigned char) iftmp.0_2;
>>if (in_loop_7 != 0)
>>  goto <bb 6>; [33.00%]
>>else
>>  goto <bb 12>; [67.00%]
>>
>> ;;succ:   6 [33.0%]  (TRUE_VALUE,EXECUTABLE)
>> ;;12 [67.0%]  (FALSE_VALUE,EXECUTABLE)
>>
>> ;;   basic block 12, loop depth 0, count 0, freq 6700, maybe hot
>> ;;prev block 5, next block 6, flags: (NEW)
>> ;;pred:   5 [67.0%]  (FALSE_VALUE,EXECUTABLE)
>>in_loop_15 = ASSERT_EXPR ;
>>goto <bb 7>; [100.00%]
>> ;;succ:   7 [100.0%]  (FALLTHRU)
>>
>> ;;   basic block 6, loop depth 0, count 0, freq 3300, maybe hot
>> ;;prev block 12, next block 7, flags: (NEW, REACHABLE, VISITED)
>> ;;pred:   5 [33.0%]  (TRUE_VALUE,EXECUTABLE)
>>in_loop_14 = ASSERT_EXPR ;
>>simple_iv ();
>> ;;succ:   7 [100.0%]  (FALLTHRU,EXECUTABLE)
>>
>> And later we have:
>>
>> ;;   basic block 9, loop depth 0, count 0, freq 8476, maybe hot
>> ;;prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED)
>> ;;pred:   7 [84.8%]  (FALSE_VALUE,EXECUTABLE)
>>if (in_loop_7 == 0)
>>  goto ; [36.64%]
>>else
>>  goto ; [63.36%]
>>
>> VRP knows it can replace the uses of in_loop_7 in the conditionals in
>> blocks 5 and 9 with iftmp.0_2 and happily does so *before* jump threading
>> (but well after ASSERT_EXPR insertion).
>>
>> As a result VRP is unable to utilize the ASSERT_EXPRs in blocks 12 and 6
>> (which reference in_loop_7) to thread the jump at bb9 (which now references
>> iftmp.0_2).
>>
>>
>> The cases in pr78496 are slightly more complex, but boil down to the same
>> core issue -- simplifying the conditional too early.
>>
>> Thankfully this is easy to fix.  We just split the conditional
>> simplification into two steps so that the transformation noted above occurs
>> after jump threading (the other simplifications we want to occur before jump
>> threading).
>>
>> This allows VRP1 to pick up 27 missed jump threads in the testcase from
>> 78496.  It could well be enough to address 78496, but since we don't have a
>> solid description of the desired end result I won't consider 78496 fixed
>> quite yet as there's significant further improvements we can make.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu.  Installing on the
>> trunk.

I think this is a hack ;)  Basically the issue is that jump-threading
uses ASSERT_EXPRs at all (which are an implementation detail of VRP).
As far as I understand it does that because VRP can do "fancy" things
and create ASSERT_EXPRs that do not directly map to the conditional
but to its operand def stmts.
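
A small stand-alone check of why an assertion on a cast result can also
say something about the cast's operand, using the in_loop_7 =
(unsigned char) iftmp.0_2 pattern from the quoted dump.  The function name
is hypothetical and this is brute force, not how VRP reasons.

```c
/* Return 1 if, for every y in [lo, hi], testing the truncated copy
   "(unsigned char) y != 0" gives the same answer as testing "y != 0".
   Requires hi < INT_MAX so that the loop counter cannot overflow.  */
int
cast_test_equivalent_p (int lo, int hi)
{
  for (int y = lo; y <= hi; ++y)
    {
      unsigned char x = (unsigned char) y;
      if ((x != 0) != (y != 0))
        return 0;
    }
  return 1;
}
```

For the range [0, 1] of the PHI result the two tests always agree, so
knowledge gained on either edge applies to both names.  For a range such
as [0, 256] they do not, because 256 truncates to 0.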

I have meanwhile factored this "fanciness" out into (ok, bad name...)
register_edge_assert_for which records all these fancy asserts in a
vec.  This is now used from EVRP:

  gimple *stmt = last_stmt (pred_e->src);
  if (stmt
      && gimple_code (stmt) == GIMPLE_COND
      && (op0 = gimple_cond_lhs (stmt))
      && TREE_CODE (op0) == SSA_NAME
      && (INTEGRAL_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)))
          || POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)))))
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        {
          fprintf (dump_file, "Visiting controlling predicate ");
          print_gimple_stmt (dump_file, stmt, 0, 0);
        }
      /* Entering a new scope.  Try to see if we can find a VR
         here.  */
      tree op1 = gimple_cond_rhs (stmt);
      if (TREE_OVERFLOW_P (op1))
        op1 = drop_tree_overflow (op1);
      tree_code code = gimple_cond_code (stmt);

 

Re: Handle data dependence relations with different bases

2017-05-04 Thread Bin.Cheng
On Thu, May 4, 2017 at 11:06 AM, Richard Sandiford
 wrote:
> "Bin.Cheng"  writes:
>> On Wed, May 3, 2017 at 9:00 AM, Richard Sandiford
>>  wrote:
>>> Index: gcc/tree-data-ref.h
>>> ===
>>> --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100
>>> +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100
>>> @@ -191,6 +191,9 @@ struct conflict_function
>>>
>>>  struct subscript
>>>  {
>>> +  /* The access functions of the two references.  */
>>> +  tree access_fn[2];
>> Is it better to follow existing code, i.e, name this as
>> access_fn_a/access_fn_b.  Thus we don't need to use const value 0/1 in
>> various places, which is a little bit confusing.
>
> [Answered below]
>
>>> +
>>>/* A description of the iterations for which the elements are
>>>   accessed twice.  */
>>>conflict_function *conflicting_iterations_in_a;
>>> @@ -209,6 +212,7 @@ struct subscript
>>>
>>>  typedef struct subscript *subscript_p;
>>>
>>> +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I]
>>>  #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a
>>>  #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b
>>>  #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict
>>> @@ -264,6 +268,33 @@ struct data_dependence_relation
>>>/* Set to true when the dependence relation is on the same data
>>>   access.  */
>>>bool self_reference_p;
>>> +
>>> +  /* True if the dependence described is conservatively correct rather
>>> + than exact, and if it is still possible for the accesses to be
>>> + conditionally independent.  For example, the a and b references in:
>>> +
>>> +   struct s *a, *b;
>>> +   for (int i = 0; i < n; ++i)
>>> + a->f[i] += b->f[i];
>>> +
>>> + conservatively have a distance vector of (0), for the case in which
>>> + a == b, but the accesses are independent if a != b.  Similarly,
>>> + the a and b references in:
>>> +
>>> +   struct s *a, *b;
>>> +   for (int i = 0; i < n; ++i)
>>> + a[0].f[i] += b[i].f[i];
>>> +
>>> + conservatively have a distance vector of (0), but they are independent
>>> + when a != b + i.  In contrast, the references in:
>>> +
>>> +   struct s *a;
>>> +   for (int i = 0; i < n; ++i)
>>> + a->f[i] += a->f[i];
>>> +
>>> + have the same distance vector of (0), but the accesses can never be
>>> + independent.  */
>>> +  bool could_be_independent_p;
>>>  };
>>>
>>>  typedef struct data_dependence_relation *ddr_p;
>>> @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \
>>>  #define DDR_DIST_VECT(DDR, I) \
>>>DDR_DIST_VECTS (DDR)[I]
>>>  #define DDR_REVERSED_P(DDR) (DDR)->reversed_p
>>> +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p
>>>
>>>
>>>  bool dr_analyze_innermost (struct data_reference *, struct loop *);
>>> @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data
>>>return false;
>>>
>>>return true;
>>> -}
>>> -
>>> -/* Return true when the DDR contains two data references that have the
>>> -   same access functions.  */
>>> -
>>> -static inline bool
>>> -same_access_functions (const struct data_dependence_relation *ddr)
>>> -{
>>> -  unsigned i;
>>> -
>>> -  for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
>>> -if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i),
>>> - DR_ACCESS_FN (DDR_B (ddr), i)))
>>> -  return false;
>>> -
>>> -  return true;
>>>  }
>>>
>>>  /* Returns true when all the dependences are computable.  */
>>> Index: gcc/tree-data-ref.c
>>> ===
>>> --- gcc/tree-data-ref.c 2017-02-23 19:54:15.0 +
>>> +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100
>>> @@ -123,8 +123,7 @@ Software Foundation; either version 3, o
>>>  } dependence_stats;
>>>
>>>  static bool subscript_dependence_tester_1 (struct data_dependence_relation 
>>> *,
>>> -  struct data_reference *,
>>> -  struct data_reference *,
>>> +  unsigned int, unsigned int,
>>>struct loop *);
>> As mentioned, how about passing access_fn directly, rather than less
>> meaningful 0/1 values?
>
> The problem is that access_fn is a property of the individual
> subscripts, whereas this is operating on a full data_reference.
>
> One alternative would be to use conditions like:
>
>   first_is_a ? SUB_ACCESS_FN_A (sub) : SUB_ACCESS_FN_B (sub)
>
> but IMO that's less readable than the existing:
>
>   SUB_ACCESS_FN (sub, index)
>
> Or we could have individual access_fn arrays for A and B, separate
> from the main subscript array, but that would mean allocating three
> arrays instead of one.
Thanks for explanation, I see the problem now.  Even the 

Re: C PATCH to fix missing -Wlogical-op warning (PR c/80525)

2017-05-04 Thread Richard Biener
On Thu, May 4, 2017 at 12:27 PM, Marek Polacek  wrote:
> This PR points out a missing -Wlogical-op warning (unless you use -fwrapv).
>
> We end up calling warn_logical_operator with op_left that is
> C_M_C_E  != 0
> and op_right that is
> a + 1
>
> But make_range just cannot handle C_M_C_Es right; for exprs it simply picks
> the first operand and that doesn't work with C_M_C_E, where we'd need to use
> C_MAYBE_CONST_EXPR_EXPR.  warn_logical_operator has code that strips C_M_C_E
> but that is insufficient.  I think we have to use walk_tree, because as this
> testcase shows, those C_M_C_Es might not be the outermost expression.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2017-05-03  Marek Polacek  
>
> PR c/80525
> * c-warn.c (unwrap_c_maybe_const): New.
> (warn_logical_operator): Call it.
>
> * c-c++-common/Wlogical-op-1.c: Don't use -fwrapv anymore.
> * c-c++-common/Wlogical-op-2.c: New test.
>
> diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
> index 45dd583..0f17bc5 100644
> --- gcc/c-family/c-warn.c
> +++ gcc/c-family/c-warn.c
> @@ -112,6 +112,21 @@ overflow_warning (location_t loc, tree value)
>  }
>  }
>
> +/* Helper function for walk_tree.  Unwrap C_MAYBE_CONST_EXPRs in an 
> expression
> +   pointed to by TP.  */
> +
> +static tree
> +unwrap_c_maybe_const (tree *tp, int *walk_subtrees, void *)
> +{
> +  if (TREE_CODE (*tp) == C_MAYBE_CONST_EXPR)
> +{
> +  *tp = C_MAYBE_CONST_EXPR_EXPR (*tp);
> +  /* C_MAYBE_CONST_EXPRs don't nest.  */
> +  *walk_subtrees = false;

This changes trees in-place -- do you need to operate on a copy?

> +}
> +  return NULL_TREE;
> +}
> +
>  /* Warn about uses of logical || / && operator in a context where it
> is likely that the bitwise equivalent was intended by the
> programmer.  We have seen an expression in which CODE is a binary
> @@ -189,11 +204,10 @@ warn_logical_operator (location_t location, enum 
> tree_code code, tree type,
>   (with OR) or trivially false (with AND).  If so, do not warn.
>   This is a common idiom for testing ranges of data types in
>   portable code.  */
> +  walk_tree_without_duplicates (&op_left, unwrap_c_maybe_const, NULL);
>lhs = make_range (op_left, &in0_p, &low0, &high0, &strict_overflow_p);
>if (!lhs)
>  return;
> -  if (TREE_CODE (lhs) == C_MAYBE_CONST_EXPR)
> -lhs = C_MAYBE_CONST_EXPR_EXPR (lhs);
>
>/* If this is an OR operation, invert both sides; now, the result
>   should be always false to get a warning.  */
> @@ -204,11 +218,10 @@ warn_logical_operator (location_t location, enum 
> tree_code code, tree type,
>if (tem && integer_zerop (tem))
>  return;
>
> +  walk_tree_without_duplicates (&op_right, unwrap_c_maybe_const, NULL);
>rhs = make_range (op_right, &in1_p, &low1, &high1, &strict_overflow_p);
>if (!rhs)
>  return;
> -  if (TREE_CODE (rhs) == C_MAYBE_CONST_EXPR)
> -rhs = C_MAYBE_CONST_EXPR_EXPR (rhs);
>
>/* If this is an OR operation, invert both sides; now, the result
>   should be always false to get a warning.  */
> diff --git gcc/testsuite/c-c++-common/Wlogical-op-1.c 
> gcc/testsuite/c-c++-common/Wlogical-op-1.c
> index e89a35a..c5f992a 100644
> --- gcc/testsuite/c-c++-common/Wlogical-op-1.c
> +++ gcc/testsuite/c-c++-common/Wlogical-op-1.c
> @@ -1,8 +1,6 @@
>  /* PR c/63357 */
>  /* { dg-do compile } */
> -/* For -fwrapv see PR80525, xfailing the subtest isn't possible as it passes
> -   with the C++ FE which doesn't have maybe_const_expr.  */
> -/* { dg-options "-fwrapv -Wlogical-op" } */
> +/* { dg-options "-Wlogical-op" } */
>
>  #ifndef __cplusplus
>  # define bool _Bool
> diff --git gcc/testsuite/c-c++-common/Wlogical-op-2.c 
> gcc/testsuite/c-c++-common/Wlogical-op-2.c
> index e69de29..6360ef9 100644
> --- gcc/testsuite/c-c++-common/Wlogical-op-2.c
> +++ gcc/testsuite/c-c++-common/Wlogical-op-2.c
> @@ -0,0 +1,12 @@
> +/* PR c/80525 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wlogical-op" } */
> +
> +int
> +fn (int a, int b)
> +{
> +  if ((a + 1) && (a + 1)) /* { dg-warning "logical .and. of equal 
> expressions" } */
> +return a;
> +  if ((a + 1) || (a + 1)) /* { dg-warning "logical .or. of equal 
> expressions" } */
> +return b;
> +}
>
> Marek
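
The in-place concern raised above can be illustrated with a toy model.
This is only a sketch: toy_node, toy_unwrap and toy_count_wraps are
made-up names, not GCC's tree type or walk_tree API.  It shows a walker
that strips wrapper nodes wherever they occur (not only at the outermost
level) and makes visible that the rewrite is seen by every holder of the
same expression.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression node; NOT GCC's tree representation.  */
enum toy_code { TOY_LEAF, TOY_PLUS, TOY_WRAP };

struct toy_node
{
  enum toy_code code;
  int value;                /* Only meaningful for TOY_LEAF.  */
  struct toy_node *op[2];   /* TOY_WRAP uses op[0] only.  */
};

/* Replace every TOY_WRAP reachable from *TP by its operand, in place.
   Wrappers are assumed not to nest, so the walk does not descend into
   an operand that has just been unwrapped.  */
static void
toy_unwrap (struct toy_node **tp)
{
  struct toy_node *t = *tp;
  if (t == NULL)
    return;
  if (t->code == TOY_WRAP)
    {
      *tp = t->op[0];
      return;
    }
  if (t->code == TOY_PLUS)
    {
      toy_unwrap (&t->op[0]);
      toy_unwrap (&t->op[1]);
    }
}

/* Count the TOY_WRAP nodes reachable from T.  */
static int
toy_count_wraps (const struct toy_node *t)
{
  if (t == NULL)
    return 0;
  return (t->code == TOY_WRAP)
         + toy_count_wraps (t->op[0])
         + toy_count_wraps (t->op[1]);
}
```

Calling toy_unwrap on a pointer to a TOY_PLUS node whose first operand
is wrapped rewrites that operand slot inside the TOY_PLUS node itself,
so any other user of the same node sees the unwrapped form too.  Working
on a copy would avoid that at the cost of an allocation.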


Re: [PATCH, GCC/ARM, Stage 1] PR71607: Fix ICE when loading constant

2017-05-04 Thread Prakhar Bahuguna
On 03/05/2017 11:30:13, Richard Earnshaw (lists) wrote:
> On 20/04/17 10:54, Prakhar Bahuguna wrote:
> > [ARM] PR71607: Fix ICE when loading constant
> > 
> > gcc/ChangeLog:
> > 
> > 2017-04-18  Andre Vieira  
> > Prakhar Bahuguna  
> > 
> > PR target/71607
> > * config/arm/arm.md (use_literal_pool): Remove.
> > (64-bit immediate split): No longer takes cost into consideration
> > if 'arm_disable_literal_pool' is enabled.
> > * config/arm/arm.c (arm_tls_referenced_p): Add diagnostic if TLS is
> > used when arm_disable_literal_pool is enabled.
> > (arm_max_const_double_inline_cost): Remove use of
> > arm_disable_literal_pool.
> > (arm_reorg): Add return if arm_disable_literal_pool is enabled.
> > * config/arm/vfp.md (no_literal_pool_df_immediate): New.
> > (no_literal_pool_sf_immediate): New.
> > 
> > testsuite/ChangeLog:
> > 
> > 2017-04-18  Andre Vieira  
> > Thomas Preud'homme  
> > Prakhar Bahuguna  
> > 
> > PR target/71607
> > * gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
> > * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
> > * gcc.target/arm/thumb2-slow-flash-data-2.c: New.
> > * gcc.target/arm/thumb2-slow-flash-data-3.c: New.
> > * gcc.target/arm/thumb2-slow-flash-data-4.c: New.
> > * gcc.target/arm/thumb2-slow-flash-data-5.c: New.
> > * gcc.target/arm/tls-disable-literal-pool.c: New.
> > 
> > Okay for stage1?
> > 
> 
> This patch lacks a description of what's going on and why the change is
> necessary (it should stand alone from the PR data).  It's clearly a
> non-trivial change, so why have you adopted this approach?
> 
> R.
> 

Hi,

This patch is based off an earlier patch that was applied to the
embedded-6-branch, and I had neglected to include the full description, which
is presented below:

This patch tackles the issue reported in PR71607. This patch takes a different
approach for disabling the creation of literal pools. Instead of disabling the
patterns that would normally transform the rtl into actual literal pools, it
disables the creation of this literal pool rtl by making the target hook
TARGET_CANNOT_FORCE_CONST_MEM return true if arm_disable_literal_pool is true.
I added patterns to split floating point constants for both SF and DFmode. A
pattern to handle the addressing of label_refs had to be included as well since
all "memory_operand" patterns are disabled when TARGET_CANNOT_FORCE_CONST_MEM
returns true. Also the pattern for splitting 32-bit immediates had to be
changed; it was not accepting unsigned 32-bit integers with the MSB
set. I believe const_int_operand expects the mode of the operand to be set to
VOIDmode and not SImode. I have only changed it in the patterns that were
affecting this code, though I suggest looking into changing it in the rest of
the ARM backend.

Additionally, the use of thread-local storage is disabled if literal pools are
disabled, as there are no relocations for TLS variables and incorrect code is
generated as a result. The patch now emits a diagnostic in TLS-enabled
toolchains if a TLS symbol is found when -mpure-code or -mslow-flash-data are
enabled.
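
As a rough illustration of what building a constant without a literal
pool involves, here is a small sketch.  It assumes an IEEE-754 binary32
float and models only the arithmetic of splitting a 32-bit value into
two 16-bit immediates, the way a movw/movt pair would materialise it.
The names toy_imm_halves, toy_split_imm32 and toy_float_bits are
hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The two 16-bit immediates of a 32-bit constant (hypothetical type).  */
struct toy_imm_halves
{
  uint16_t lo;   /* Low half, as a movw would load it.  */
  uint16_t hi;   /* High half, as a movt would insert it.  */
};

/* Split VALUE into halves.  VALUE is treated as unsigned, so constants
   with the most significant bit set are handled like any other.  */
static struct toy_imm_halves
toy_split_imm32 (uint32_t value)
{
  struct toy_imm_halves h;
  h.lo = (uint16_t) (value & 0xffffu);
  h.hi = (uint16_t) (value >> 16);
  return h;
}

/* Return the bit pattern of F, assuming float is 32 bits wide.  */
static uint32_t
toy_float_bits (float f)
{
  uint32_t bits;
  assert (sizeof bits == sizeof f);
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

For example, toy_split_imm32 (toy_float_bits (1.0f)) yields the halves
0x3f80 (high) and 0x0000 (low) under the IEEE-754 assumption.  A DFmode
constant would need the same treatment for each of its two 32-bit words.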

-- 

Prakhar Bahuguna


Re: [RFC][PATCH] Introduce -fdump*-folding

2017-05-04 Thread Richard Biener
On Thu, May 4, 2017 at 11:22 AM, Martin Liška  wrote:
> On 05/03/2017 12:12 PM, Richard Biener wrote:
>>
>> On Wed, May 3, 2017 at 10:10 AM, Martin Liška  wrote:
>>>
>>> Hello
>>>
>>> Last release cycle I spent quite some time reading IVOPTS pass dump
>>> files. Using -fdump*-details generates a lot of 'Applying pattern'
>>> lines, which can make a dump file harder to read.
>>>
>>> Here are stats for tramp3d with -O2 and -fdump-tree-all-details.
>>> The percentage shows how many lines in each dump file match the
>>> aforementioned pattern:
>>>
>>> tramp3d-v4.cpp.164t.ivopts: 6.34%
>>>   tramp3d-v4.cpp.091t.ccp2: 5.04%
>>>   tramp3d-v4.cpp.093t.cunrolli: 4.41%
>>>   tramp3d-v4.cpp.129t.laddress: 3.70%
>>>   tramp3d-v4.cpp.032t.ccp1: 2.31%
>>>   tramp3d-v4.cpp.038t.evrp: 1.90%
>>>  tramp3d-v4.cpp.033t.forwprop1: 1.74%
>>>   tramp3d-v4.cpp.103t.vrp1: 1.52%
>>>  tramp3d-v4.cpp.124t.forwprop3: 1.31%
>>>   tramp3d-v4.cpp.181t.vrp2: 1.30%
>>>tramp3d-v4.cpp.161t.cunroll: 1.22%
>>> tramp3d-v4.cpp.027t.fixup_cfg3: 1.11%
>>>tramp3d-v4.cpp.153t.ivcanon: 1.07%
>>>   tramp3d-v4.cpp.126t.ccp3: 0.96%
>>>   tramp3d-v4.cpp.143t.sccp: 0.91%
>>>  tramp3d-v4.cpp.185t.forwprop4: 0.82%
>>>tramp3d-v4.cpp.011t.cfg: 0.74%
>>>  tramp3d-v4.cpp.096t.forwprop2: 0.50%
>>> tramp3d-v4.cpp.019t.fixup_cfg1: 0.37%
>>>  tramp3d-v4.cpp.120t.phicprop1: 0.33%
>>>tramp3d-v4.cpp.133t.pre: 0.32%
>>>  tramp3d-v4.cpp.182t.phicprop2: 0.27%
>>> tramp3d-v4.cpp.170t.veclower21: 0.25%
>>>tramp3d-v4.cpp.029t.einline: 0.24%
>>>
>>> I suggest adding a new TDF flag dedicated to these lines.
>>> The patch bootstraps on ppc64le-redhat-linux and survives regression
>>> tests.
>>>
>>> Thoughts?
>>
>>
>> Ok.  Soon we'll want to change dump_flags to uint64_t ...  (we have 1 bit
>> left
>> if you allow negative dump_flags).  It'll trickle down to a lot of
>> interfaces
>> so introducing dump_flags_t at the same time might be a good idea.
>
>
> Hello.
>
> I've prepared patch that migrates all interfaces and introduces
> dump_flags_t.

Great.

> I'm currently testing that. Apart from that, Richi requested a more
> generic approach: a hierarchical structure of options.

Didn't really "request" it, it's just something we eventually need to do when
we run out of bits again ;)

>
> Can you please take a look at the self-contained source file that shows the
> approach I've decided to take?
> Another question is whether we also want to implement "aliases", where for
> instance the current 'all' is equal to the union of a couple of suboptions?

Yeah, I think we do want -all-all-all and -foo-all to work.  Not sure
about -all-foo-all.

The important thing is to make sure dump_flags_t stays POD and thus is
eligible to be passed in register(s).  In the end we might simply come up
with a two-level hierarchy, each 32bits (or we can even get back to 32bits
in total with two times 16bits).

It looks like you didn't actually implement this as a hierarchy though but
still allocate from one pool of bits (so you only do a change to how
users access this?)

Thanks,
Richard.
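
To make the two-level idea concrete, here is a minimal sketch of what a
POD flags type with two 16-bit levels could look like.  Everything in it
(toy_dump_flags, the TOY_* constants, toy_dump_enabled_p and the matching
rule) is an assumption for illustration, not the proposed dump_flags_t.

```c
#include <assert.h>
#include <stdint.h>

/* Two 16-bit levels, 32 bits in total; plain data, so it can be passed
   by value in a register on common ABIs.  Hypothetical layout.  */
struct toy_dump_flags
{
  uint16_t group;    /* First level: which kind of dump.  */
  uint16_t detail;   /* Second level: which suboptions.  */
};

enum
{
  TOY_GROUP_TREE = 1 << 0,
  TOY_GROUP_RTL  = 1 << 1,
  TOY_GROUP_IPA  = 1 << 2,
  TOY_GROUP_ALL  = TOY_GROUP_TREE | TOY_GROUP_RTL | TOY_GROUP_IPA
};

enum
{
  TOY_DETAIL_DETAILS = 1 << 0,
  TOY_DETAIL_FOLDING = 1 << 1,
  TOY_DETAIL_ALL     = TOY_DETAIL_DETAILS | TOY_DETAIL_FOLDING
};

/* Return 1 if HAVE enables what WANT asks for: at least one requested
   group is on and every requested detail bit is on.  */
static int
toy_dump_enabled_p (struct toy_dump_flags have, struct toy_dump_flags want)
{
  return (have.group & want.group) != 0
         && (have.detail & want.detail) == want.detail;
}
```

With this layout an alias such as "all" is just a constant that is the
union of the bits on its level, so something like -foo-all would set one
group bit and TOY_DETAIL_ALL.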

>
> Thanks for feedback,
> Martin
>
>>
>> Thanks,
>> Richard.
>>
>>> Martin
>
>


Re: [PATCH][AArch64] Model Cortex-A53 load forwarding

2017-05-04 Thread Richard Earnshaw (lists)
On 05/04/17 13:29, Wilco Dijkstra wrote:
> Code scheduling for Cortex-A53 isn't as good as it could be.  It turns out
> code runs faster overall if we place loads and stores with a dependency
> closer together.  To achieve this effect, this patch adds a bypass between
> cortex_a53_load1 and cortex_a53_load*/cortex_a53_store* if the result of an
> earlier load is used in an address calculation.  This significantly improved
> benchmark scores in a proprietary benchmark suite.
> 
> Passes AArch64 bootstrap and regress. OK for stage 1?
> 

What about an ARM bootstrap?  OK if that also passes.

R.

> ChangeLog:
> 2017-04-05  Wilco Dijkstra  
> 
>   * config/arm/aarch-common.c (arm_early_load_addr_dep_ptr):
>   New function.
>   (arm_early_store_addr_dep_ptr): Likewise.
>   * config/arm/aarch-common-protos.h
>   (arm_early_load_addr_dep_ptr): Add prototype.
>   (arm_early_store_addr_dep_ptr): Likewise.
>   * config/arm/cortex-a53.md: Add new bypasses.
> ---
> 
> diff --git a/gcc/config/arm/aarch-common-protos.h 
> b/gcc/config/arm/aarch-common-protos.h
> index 
> 8e9fb7a895b0a4aaf1585eb3368443899b061c9b..5298172e6b6930a110388a40a7533ff208a87095
>  100644
> --- a/gcc/config/arm/aarch-common-protos.h
> +++ b/gcc/config/arm/aarch-common-protos.h
> @@ -30,7 +30,9 @@ extern bool aarch_rev16_p (rtx);
>  extern bool aarch_rev16_shleft_mask_imm_p (rtx, machine_mode);
>  extern bool aarch_rev16_shright_mask_imm_p (rtx, machine_mode);
>  extern int arm_early_load_addr_dep (rtx, rtx);
> +extern int arm_early_load_addr_dep_ptr (rtx, rtx);
>  extern int arm_early_store_addr_dep (rtx, rtx);
> +extern int arm_early_store_addr_dep_ptr (rtx, rtx);
>  extern int arm_mac_accumulator_is_mul_result (rtx, rtx);
>  extern int arm_mac_accumulator_is_result (rtx, rtx);
>  extern int arm_no_early_alu_shift_dep (rtx, rtx);
> diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
> index 
> dd37be0291a633f606d95ec8acacc598435828b3..74b80b272550028919c4274387944867ffed43d1
>  100644
> --- a/gcc/config/arm/aarch-common.c
> +++ b/gcc/config/arm/aarch-common.c
> @@ -241,6 +241,24 @@ arm_early_load_addr_dep (rtx producer, rtx consumer)
>return reg_overlap_mentioned_p (value, addr);
>  }
>  
> +/* Return nonzero if the CONSUMER instruction (a load) does need
> +   a Pmode PRODUCER's value to calculate the address.  */
> +
> +int
> +arm_early_load_addr_dep_ptr (rtx producer, rtx consumer)
> +{
> +  rtx value = arm_find_sub_rtx_with_code (PATTERN (producer), SET, false);
> +  rtx addr = arm_find_sub_rtx_with_code (PATTERN (consumer), SET, false);
> +
> +  if (!value || !addr || !MEM_P (SET_SRC (value)))
> +return 0;
> +
> +  value = SET_DEST (value);
> +  addr = SET_SRC (addr);
> +
> +  return GET_MODE (value) == Pmode && reg_overlap_mentioned_p (value, addr);
> +}
> +
>  /* Return nonzero if the CONSUMER instruction (an ALU op) does not
> have an early register shift value or amount dependency on the
> result of PRODUCER.  */
> @@ -336,6 +354,24 @@ arm_early_store_addr_dep (rtx producer, rtx consumer)
>return !arm_no_early_store_addr_dep (producer, consumer);
>  }
>  
> +/* Return nonzero if the CONSUMER instruction (a store) does need
> +   a Pmode PRODUCER's value to calculate the address.  */
> +
> +int
> +arm_early_store_addr_dep_ptr (rtx producer, rtx consumer)
> +{
> +  rtx value = arm_find_sub_rtx_with_code (PATTERN (producer), SET, false);
> +  rtx addr = arm_find_sub_rtx_with_code (PATTERN (consumer), SET, false);
> +
> +  if (!value || !addr || !MEM_P (SET_SRC (value)))
> +return 0;
> +
> +  value = SET_DEST (value);
> +  addr = SET_DEST (addr);
> +
> +  return GET_MODE (value) == Pmode && reg_overlap_mentioned_p (value, addr);
> +}
> +
>  /* Return non-zero iff the consumer (a multiply-accumulate or a
> multiple-subtract instruction) has an accumulator dependency on the
> result of the producer and no other dependency on that result.  It
> diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
> index 
> b367ad403a4a641da34521c17669027b87092737..f8225f33c7a06485147b30fe2633309ac252d0c7
>  100644
> --- a/gcc/config/arm/cortex-a53.md
> +++ b/gcc/config/arm/cortex-a53.md
> @@ -246,6 +246,16 @@
>"cortex_a53_store*"
>"arm_no_early_store_addr_dep")
>  
> +;; Model a bypass for load to load/store address.
> +
> +(define_bypass 3 "cortex_a53_load1"
> +  "cortex_a53_load*"
> +  "arm_early_load_addr_dep_ptr")
> +
> +(define_bypass 3 "cortex_a53_load1"
> +  "cortex_a53_store*"
> +  "arm_early_store_addr_dep_ptr")
> +
>  ;; Model a GP->FP register move as similar to stores.
>  
>  (define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"
> 
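
The condition the new bypass predicates test can be paraphrased with a
toy model.  This is a sketch only: toy_insn and toy_early_addr_dep_ptr
are invented for illustration and ignore everything RTL-specific (SET
patterns, register overlap, modes other than a single pointer flag).

```c
#include <assert.h>

/* Very reduced instruction description (hypothetical).  */
struct toy_insn
{
  int is_load;               /* Nonzero if the insn loads from memory.  */
  int dest_reg;              /* Register written by the insn.  */
  int dest_is_pointer_mode;  /* Nonzero if the result is pointer-sized.  */
  int addr_reg;              /* Register used to form the memory address,
                                or -1 if the insn has no memory operand.  */
};

/* Return 1 if PRODUCER is a load whose pointer-sized result is used by
   CONSUMER to compute its memory address, 0 otherwise.  */
static int
toy_early_addr_dep_ptr (const struct toy_insn *producer,
                        const struct toy_insn *consumer)
{
  if (!producer->is_load || consumer->addr_reg < 0)
    return 0;
  return producer->dest_is_pointer_mode
         && producer->dest_reg == consumer->addr_reg;
}
```

Only when such a predicate holds would the scheduler apply the bypass
latency given in the define_bypass instead of the default load latency.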



Re: [PATCH] Fix ICE in get_range_info (PR tree-optimization/80612)

2017-05-04 Thread Richard Biener
On Thu, May 4, 2017 at 11:14 AM, Marek Polacek  wrote:
> We need to check that the SSA_NAME we're passing down to get_range_info
> is of INTEGRAL_TYPE_P; on pointers we'd crash on an assert.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk and 7.2?

Ok.

Richard.

> 2017-05-04  Marek Polacek  
>
> PR tree-optimization/80612
> * calls.c (get_size_range): Check for INTEGRAL_TYPE_P.
>
> * gcc.dg/torture/pr80612.c: New test.
>
> diff --git gcc/calls.c gcc/calls.c
> index c26f157..bd081cc 100644
> --- gcc/calls.c
> +++ gcc/calls.c
> @@ -1270,7 +1270,7 @@ get_size_range (tree exp, tree range[2])
>
>wide_int min, max;
>enum value_range_type range_type
> -= (TREE_CODE (exp) == SSA_NAME
> += ((TREE_CODE (exp) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE (exp)))
> ? get_range_info (exp, &min, &max) : VR_VARYING);
>
>if (range_type == VR_VARYING)
> diff --git gcc/testsuite/gcc.dg/torture/pr80612.c 
> gcc/testsuite/gcc.dg/torture/pr80612.c
> index e69de29..225b811 100644
> --- gcc/testsuite/gcc.dg/torture/pr80612.c
> +++ gcc/testsuite/gcc.dg/torture/pr80612.c
> @@ -0,0 +1,15 @@
> +/* PR tree-optimization/80612 */
> +/* { dg-do compile } */
> +
> +struct obstack *a;
> +struct obstack {
> +  union {
> +void *plain;
> +void (*extra)();
> +  } chunkfun;
> +} fn1(void p4()) {
> +  a->chunkfun.plain = p4;
> +  a->chunkfun.extra(a);
> +}
> +void fn2(int) __attribute__((__alloc_size__(1)));
> +void fn3() { fn1(fn2); }
>
> Marek


Re: Fix bootstrap issue with gcc 4.1

2017-05-04 Thread Richard Biener
On Thu, May 4, 2017 at 11:04 AM, Jan Hubicka  wrote:
>> >
>> >Sure, I'm not questioning the patch, just wondering if we shouldn't
>> >improve
>> >store-merging further (we want to do it anyway for e.g. bitop adjacent
>> >operations etc.).
>>
>> We definitely want to do that.  It should also 'nicely' merge with bswap for 
>> gathering the load side of a piecewise memory to memory copy.
>
> The code we produce now in .optimized is:
>[15.35%]:
>   # DEBUG this => _42
>   MEM[(struct inline_summary *)_42].estimated_self_stack_size = 0;
>   MEM[(struct inline_summary *)_42].self_size = 0;
>   _44 = &MEM[(struct inline_summary *)_42].self_time;
>   # DEBUG this => _44
>   # DEBUG sig => 0
>   # DEBUG exp => 0
>   MEM[(struct sreal *)_42 + 16B].m_sig = 0;
>   MEM[(struct sreal *)_42 + 16B].m_exp = 0;
>   sreal::normalize (_44);
>   # DEBUG this => NULL
>   # DEBUG sig => NULL
>   # DEBUG exp => NULL
>   MEM[(struct inline_summary *)_42].min_size = 0;
>   MEM[(struct inline_summary *)_42].inlinable = 0;
>   MEM[(struct inline_summary *)_42].contains_cilk_spawn = 0;
>   MEM[(struct inline_summary *)_42].single_caller = 0;
>   MEM[(struct inline_summary *)_42].fp_expressions = 0;
>   MEM[(struct inline_summary *)_42].estimated_stack_size = 0;
>   MEM[(struct inline_summary *)_42].stack_frame_offset = 0;

It should handle at least the bitfields here (inlinable, contains_cilk_spawn,
single_caller and fp_expressions).  Ah, I guess it doesn't because the
padding isn't initialized and it doesn't want to touch it.  Well, just a guess.

inline_summary is also very badly laid out now with lots of padding.
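
As a generic illustration of how member order creates padding (this is
not the real inline_summary layout; both structs below are made up), the
same four members can occupy different amounts of space:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small members interleaved with larger ones: each larger member may
   force alignment padding in front of it.  */
struct toy_padded
{
  uint8_t flag_a;
  uint64_t big;
  uint8_t flag_b;
  uint32_t mid;
};

/* Same members sorted by decreasing size: padding, if any, only
   appears at the tail.  */
struct toy_reordered
{
  uint64_t big;
  uint32_t mid;
  uint8_t flag_a;
  uint8_t flag_b;
};

/* Bytes of STRUCT_SIZE that are not covered by PAYLOAD bytes of
   actual members.  */
static size_t
toy_padding (size_t struct_size, size_t payload)
{
  assert (struct_size >= payload);
  return struct_size - payload;
}
```

The members add up to 14 bytes in both cases, so
toy_padding (sizeof (struct toy_padded), 14) versus
toy_padding (sizeof (struct toy_reordered), 14) gives the wasted bytes
for each ordering on the ABI at hand.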

>   _45 = &MEM[(struct inline_summary *)_42].time;
>   # DEBUG this => _45
>   # DEBUG sig => 0
>   # DEBUG exp => 0
>   MEM[(struct sreal *)_42 + 56B].m_sig = 0;
>   MEM[(struct sreal *)_42 + 56B].m_exp = 0;
>   sreal::normalize (_45);
>   # DEBUG this => NULL
>   # DEBUG sig => NULL
>   # DEBUG exp => NULL
>   MEM[(struct inline_summary *)_42].size = 0;
>   MEM[(void *)_42 + 80B] = 0;
>   MEM[(void *)_42 + 88B] = 0;
>   MEM[(void *)_42 + 96B] = 0;
>   MEM[(struct predicate * *)_42 + 104B] = 0;
>   MEM[(void *)_42 + 112B] = 0;
>   MEM[(void *)_42 + 120B] = 0;
>   goto ; [100.00%]
>
> so indeed it is not quite pretty at all. Even the bitfields are split
> and there is an offlined call to normalize zero in sreal.
> I wonder why the inliner does not see this as an obviously good inlining
> opportunity.
> I will look into that problem. For store merging it would indeed be
> very nice to improve this as well.
>
> Honza
>>
>> Richard.
>>
>> >
>> > Jakub


Re: [PATCH][AArch64] Enable AUTOPREFETCHER_WEAK with -mcpu=generic

2017-05-04 Thread Richard Earnshaw (lists)
On 05/04/17 13:38, Wilco Dijkstra wrote:
> Many supported cores use the AUTOPREFETCHER_WEAK setting which tries
> to order loads and stores to improve streaming performance.  Since significant
> gains were reported in http://patchwork.ozlabs.org/patch/534469/ it seems
> like a good idea to enable this setting too for -mcpu=generic.  Since the
> weak model only keeps the order if it doesn't make the schedule worse, it
> should not impact performance adversely on cores that don't show a gain.
> Any objections?
> 
> ChangeLog:
> 2017-04-05  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.c (generic_tunings): Update prefetch model.
> 

OK.  The consensus seems to be in favour of this.

R.

> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 8b729b1b1f87316e940d7fc657f235a935ffa93e..b249ce2b310707c7ded2827d505ce2ddfcfbf976
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -547,7 +547,7 @@ static const struct tune_params generic_tunings =
>2, /* min_div_recip_mul_df.  */
>0, /* max_case_values.  */
>0, /* cache_line_size.  */
> -  tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
> +  tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>(AARCH64_EXTRA_TUNE_NONE)  /* tune_flags.  */
>  };
> 



C PATCH to fix missing -Wlogical-op warning (PR c/80525)

2017-05-04 Thread Marek Polacek
This PR points out a missing -Wlogical-op warning (unless you use -fwrapv).

We end up calling warn_logical_operator with op_left that is
C_M_C_E  != 0
and op_right that is
a + 1

But make_range just cannot handle C_M_C_Es right; for exprs it simply picks the
first operand and that doesn't work with C_M_C_E, where we'd need to use
C_MAYBE_CONST_EXPR_EXPR.  warn_logical_operator has code that strips C_M_C_E
but that is insufficient.  I think we have to use walk_tree, because as this
testcase shows, those C_M_C_Es might not be the outermost expression.  

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-05-03  Marek Polacek  

PR c/80525
* c-warn.c (unwrap_c_maybe_const): New.
(warn_logical_operator): Call it.

* c-c++-common/Wlogical-op-1.c: Don't use -fwrapv anymore.
* c-c++-common/Wlogical-op-2.c: New test.

diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
index 45dd583..0f17bc5 100644
--- gcc/c-family/c-warn.c
+++ gcc/c-family/c-warn.c
@@ -112,6 +112,21 @@ overflow_warning (location_t loc, tree value)
 }
 }
 
+/* Helper function for walk_tree.  Unwrap C_MAYBE_CONST_EXPRs in an expression
+   pointed to by TP.  */
+
+static tree
+unwrap_c_maybe_const (tree *tp, int *walk_subtrees, void *)
+{
+  if (TREE_CODE (*tp) == C_MAYBE_CONST_EXPR)
+{
+  *tp = C_MAYBE_CONST_EXPR_EXPR (*tp);
+  /* C_MAYBE_CONST_EXPRs don't nest.  */
+  *walk_subtrees = false;
+}
+  return NULL_TREE;
+}
+
 /* Warn about uses of logical || / && operator in a context where it
is likely that the bitwise equivalent was intended by the
programmer.  We have seen an expression in which CODE is a binary
@@ -189,11 +204,10 @@ warn_logical_operator (location_t location, enum 
tree_code code, tree type,
  (with OR) or trivially false (with AND).  If so, do not warn.
  This is a common idiom for testing ranges of data types in
  portable code.  */
+  walk_tree_without_duplicates (&op_left, unwrap_c_maybe_const, NULL);
   lhs = make_range (op_left, &in0_p, &low0, &high0, &strict_overflow_p);
   if (!lhs)
 return;
-  if (TREE_CODE (lhs) == C_MAYBE_CONST_EXPR)
-lhs = C_MAYBE_CONST_EXPR_EXPR (lhs);
 
   /* If this is an OR operation, invert both sides; now, the result
  should be always false to get a warning.  */
@@ -204,11 +218,10 @@ warn_logical_operator (location_t location, enum 
tree_code code, tree type,
   if (tem && integer_zerop (tem))
 return;
 
+  walk_tree_without_duplicates (&op_right, unwrap_c_maybe_const, NULL);
   rhs = make_range (op_right, &in1_p, &low1, &high1, &strict_overflow_p);
   if (!rhs)
 return;
-  if (TREE_CODE (rhs) == C_MAYBE_CONST_EXPR)
-rhs = C_MAYBE_CONST_EXPR_EXPR (rhs);
 
   /* If this is an OR operation, invert both sides; now, the result
  should be always false to get a warning.  */
diff --git gcc/testsuite/c-c++-common/Wlogical-op-1.c 
gcc/testsuite/c-c++-common/Wlogical-op-1.c
index e89a35a..c5f992a 100644
--- gcc/testsuite/c-c++-common/Wlogical-op-1.c
+++ gcc/testsuite/c-c++-common/Wlogical-op-1.c
@@ -1,8 +1,6 @@
 /* PR c/63357 */
 /* { dg-do compile } */
-/* For -fwrapv see PR80525, xfailing the subtest isn't possible as it passes
-   with the C++ FE which doesn't have maybe_const_expr.  */
-/* { dg-options "-fwrapv -Wlogical-op" } */
+/* { dg-options "-Wlogical-op" } */
 
 #ifndef __cplusplus
 # define bool _Bool
diff --git gcc/testsuite/c-c++-common/Wlogical-op-2.c 
gcc/testsuite/c-c++-common/Wlogical-op-2.c
index e69de29..6360ef9 100644
--- gcc/testsuite/c-c++-common/Wlogical-op-2.c
+++ gcc/testsuite/c-c++-common/Wlogical-op-2.c
@@ -0,0 +1,12 @@
+/* PR c/80525 */
+/* { dg-do compile } */
+/* { dg-options "-Wlogical-op" } */
+
+int
+fn (int a, int b)
+{
+  if ((a + 1) && (a + 1)) /* { dg-warning "logical .and. of equal expressions" 
} */
+return a;
+  if ((a + 1) || (a + 1)) /* { dg-warning "logical .or. of equal expressions" 
} */
+return b;
+}

Marek


Re: [PATCH][AArch64] Set jump alignment to 4 for Cortex cores

2017-05-04 Thread Richard Earnshaw (lists)
On 12/04/17 13:50, Wilco Dijkstra wrote:
> Set jump alignment to 4 for Cortex cores as it reduces codesize by 0.4% on
> average with no obvious performance difference.  See original discussion of
> the overheads of various alignments:
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg02075.html
> 
> Bootstrap OK, OK for stage 1?
> 
> ChangeLog:
> 2017-04-12  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (cortexa35_tunings): Set jump alignment to 4.
> (cortexa53_tunings): Likewise.
> (cortexa57_tunings): Likewise.
> (cortexa72_tunings): Likewise.
> (cortexa73_tunings): Likewise.
> 

OK.

R.

> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> a6004e6e283ba7157e65b678cf668f8a47e21abb..a8b3a29dd2e242a35f37b8c6a6fb30699ace5e01
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -564,7 +564,7 @@ static const struct tune_params cortexa35_tunings =
>    (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
>     | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
>    16, /* function_align.  */
> -  8, /* jump_align.  */
> +  4, /* jump_align.  */
>    8, /* loop_align.  */
>    2, /* int_reassoc_width.  */
>    4, /* fp_reassoc_width.  */
> @@ -590,7 +590,7 @@ static const struct tune_params cortexa53_tunings =
>    (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
>     | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
>    16, /* function_align.  */
> -  8, /* jump_align.  */
> +  4, /* jump_align.  */
>    8, /* loop_align.  */
>    2, /* int_reassoc_width.  */
>    4, /* fp_reassoc_width.  */
> @@ -616,7 +616,7 @@ static const struct tune_params cortexa57_tunings =
>    (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
>     | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
>    16, /* function_align.  */
> -  8, /* jump_align.  */
> +  4, /* jump_align.  */
>    8, /* loop_align.  */
>    2, /* int_reassoc_width.  */
>    4, /* fp_reassoc_width.  */
> @@ -642,7 +642,7 @@ static const struct tune_params cortexa72_tunings =
>    (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
>     | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
>    16, /* function_align.  */
> -  8, /* jump_align.  */
> +  4, /* jump_align.  */
>    8, /* loop_align.  */
>    2, /* int_reassoc_width.  */
>    4, /* fp_reassoc_width.  */
> @@ -668,7 +668,7 @@ static const struct tune_params cortexa73_tunings =
>    (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
>     | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
>    16, /* function_align.  */
> -  8, /* jump_align.  */
> +  4, /* jump_align.  */
>    8, /* loop_align.  */
>    2, /* int_reassoc_width.  */
>    4, /* fp_reassoc_width.  */
> 



Re: [PATCH][AArch64] Update alignment for -mcpu=generic

2017-05-04 Thread Richard Earnshaw (lists)
On 12/04/17 13:58, Wilco Dijkstra wrote:
> With -mcpu=generic the loop alignment is currently 4.  All but one of the
> supported cores use 8 or higher.  Since using 8 provides performance gains
> on several cores, it is best to use that by default.  As discussed in [1],
> the jump alignment has no effect on performance, yet has a relatively high
> codesize cost [2], so setting it to 4 is best.  This gives a 0.2% overall 
> codesize improvement as well as performance gains in several benchmarks.
> Any objections?
> 
> Bootstrap OK on AArch64, OK for stage 1?
> 
> ChangeLog:
> 2017-04-12  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (generic_tunings): Set jump alignment to 4.
>   Set loop alignment to 8.
> 

OK.  It looks to me as though these two values were back-to-front.
Having loop align lower than jump align sounds just perverse.
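
The code-size cost of an alignment setting comes from the padding inserted in front of the aligned label. The following is a minimal sketch of that arithmetic, not GCC code: `padding_for` is a hypothetical helper, and it assumes the alignment is a power of two. Since A64 instructions are 4 bytes long, a code address is always a multiple of 4, so an alignment of 4 never needs padding while an alignment of 8 may need one instruction's worth.

```c
/* Number of padding bytes needed to bring ADDR up to a multiple of
   ALIGN, where ALIGN is assumed to be a power of two.  */
static unsigned long
padding_for (unsigned long addr, unsigned long align)
{
  return (align - (addr & (align - 1))) & (align - 1);
}
```

For a label that would otherwise land at 0x1004, `padding_for (0x1004, 4)` is 0 and `padding_for (0x1004, 8)` is 4.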

R.

> [1] https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00574.html
> [2] https://gcc.gnu.org/ml/gcc-patches/2016-06/msg02075.html
> 
> ---
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index c8cf7169a5d387de336920b50c83761dc0c96f3a..8b729b1b1f87316e940d7fc657f235a935ffa93e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -538,8 +538,8 @@ static const struct tune_params generic_tunings =
>    2, /* issue_rate  */
>    (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
>    8, /* function_align.  */
> -  8, /* jump_align.  */
> -  4, /* loop_align.  */
> +  4, /* jump_align.  */
> +  8, /* loop_align.  */
>    2, /* int_reassoc_width.  */
>    4, /* fp_reassoc_width.  */
>    1, /* vec_reassoc_width.  */
> 
> 



Re: Cap niter_for_unrolled_loop to upper bound

2017-05-04 Thread Richard Biener
On Thu, May 4, 2017 at 8:47 AM, Richard Sandiford
 wrote:
> For the reasons explained in PR77536, niter_for_unrolled_loop assumes 5
> iterations in the absence of profiling information, although it doesn't
> increase beyond the estimate for the original loop.  This left a hole in
> which the new estimate could be less than the old one but still greater
> than the limit imposed by CEIL (nb_iterations_upper_bound, unroll factor).
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> 2017-05-04  Richard Sandiford  
>
> * tree-ssa-loop-manip.c (niter_for_unrolled_loop): Add commentary
> to explain the use of truncating division.  Cap the number of
> iterations to the maximum given by nb_iterations_upper_bound,
> if defined.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-profile-1.c: New test.
>
> Index: gcc/tree-ssa-loop-manip.c
> ===
> --- gcc/tree-ssa-loop-manip.c   2017-05-03 08:46:26.068861808 +0100
> +++ gcc/tree-ssa-loop-manip.c   2017-05-04 07:41:56.686034705 +0100
> @@ -1104,6 +1104,9 @@ niter_for_unrolled_loop (struct loop *lo
>    gcc_assert (factor != 0);
>    bool profile_p = false;
>    gcov_type est_niter = expected_loop_iterations_unbounded (loop, &profile_p);
> +  /* Note that this is really CEIL (est_niter + 1, factor) - 1, where the
> +     "+ 1" converts latch iterations to loop iterations and the "- 1"
> +     converts back.  */
>    gcov_type new_est_niter = est_niter / factor;
>
>    /* Without profile feedback, loops for which we do not know a better estimate
> @@ -1120,6 +1123,15 @@ niter_for_unrolled_loop (struct loop *lo
>          new_est_niter = 5;
>      }
>
> +  if (loop->any_upper_bound)
> +    {
> +      /* As above, this is really CEIL (upper_bound + 1, factor) - 1.  */
> +      widest_int bound = wi::udiv_floor (loop->nb_iterations_upper_bound,
> +                                         factor);
> +      if (wi::ltu_p (bound, new_est_niter))
> +        new_est_niter = bound.to_uhwi ();
> +    }
> +
>    return new_est_niter;
>  }
>
> Index: gcc/testsuite/gcc.dg/vect/vect-profile-1.c
> ===
> --- /dev/null   2017-05-04 07:24:39.449302696 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-profile-1.c  2017-05-04 07:41:56.685075916 
> +0100
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fdump-tree-vect-details-blocks" } */
> +
> +/* At least one of these should correspond to a full vector.  */
> +
> +void
> +f1 (int *x)
> +{
> +  for (int j = 0; j < 2; ++j)
> +x[j] += 1;
> +}
> +
> +void
> +f2 (int *x)
> +{
> +  for (int j = 0; j < 4; ++j)
> +x[j] += 1;
> +}
> +
> +void
> +f3 (int *x)
> +{
> +  for (int j = 0; j < 8; ++j)
> +x[j] += 1;
> +}
> +
> +void
> +f4 (int *x)
> +{
> +  for (int j = 0; j < 16; ++j)
> +x[j] += 1;
> +}
> +
> +/* { dg-final { scan-tree-dump {goto ; \[0+.0*%\]} vect } } */
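
The arithmetic in the patch above can be checked in isolation. The following is a minimal standalone sketch, not the GCC code: `ceil_div`, `cap_unrolled_niter` and `identity_holds` are hypothetical helpers, plain `unsigned long` stands in for `gcov_type` and `widest_int`, and the "assume 5 iterations without profile data" step is left out. It shows that truncating division gives the same result as CEIL (n + 1, factor) - 1, and how the upper bound caps the estimate.

```c
#include <assert.h>

/* Hypothetical helper: ceiling division for FACTOR > 0.  */
static unsigned long
ceil_div (unsigned long n, unsigned long factor)
{
  return (n + factor - 1) / factor;
}

/* Sketch of the capping step.  EST_NITER and UPPER_BOUND count latch
   iterations; HAS_UPPER_BOUND plays the role of loop->any_upper_bound.  */
static unsigned long
cap_unrolled_niter (unsigned long est_niter, unsigned long factor,
                    int has_upper_bound, unsigned long upper_bound)
{
  assert (factor != 0);
  /* Truncating division: the same as CEIL (est_niter + 1, factor) - 1.  */
  unsigned long new_est_niter = est_niter / factor;
  if (has_upper_bound)
    {
      unsigned long bound = upper_bound / factor;
      if (bound < new_est_niter)
        new_est_niter = bound;
    }
  return new_est_niter;
}

/* Return 1 if n / factor == CEIL (n + 1, factor) - 1 for all n < LIMIT.  */
static int
identity_holds (unsigned long limit, unsigned long factor)
{
  for (unsigned long n = 0; n < limit; n++)
    if (n / factor != ceil_div (n + 1, factor) - 1)
      return 0;
  return 1;
}
```

With an estimate of 20 latch iterations, an unroll factor of 4 and an upper bound of 7, the uncapped estimate would be 5 but the capped one is 1.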


Re: [PATCH] Fix -fopt-info documentation in invoke.texi

2017-05-04 Thread Richard Biener
On Thu, May 4, 2017 at 12:51 AM, Steve Ellcey  wrote:
> The description of the default behavior of -fopt-info in invoke.texi is
> wrong.  This patch fixes it.  I also added a sentence to explicitly say
> what is implied by the note that -fopt-info-vec-missed is the same as
> -fopt-info-missed-vec.  Namely, that order doesn't matter.
>
> OK to checkin?

Ok for trunk and branches.

Richard.

> Steve Ellcey
> sell...@cavium.com
>
>
> 2017-05-03  Steve Ellcey  
>
> * doc/invoke.texi (-fopt-info): Fix description of default
> behavior. Explicitly say order of options included in -fopt-info
> does not matter.
>
>
> diff --git a/gcc/doc/optinfo.texi b/gcc/doc/optinfo.texi
> index e17cb37..f4158a0 100644
> --- a/gcc/doc/optinfo.texi
> +++ b/gcc/doc/optinfo.texi
> @@ -208,16 +208,17 @@ optimized locations from all the inlining passes into
>  If the @var{filename} is provided, then the dumps from all the
>  applicable optimizations are concatenated into the @file{filename}.
>  Otherwise the dump is output onto @file{stderr}. If @var{options} is
> -omitted, it defaults to @option{all-all}, which means dump all
> -available optimization info from all the passes. In the following
> -example, all optimization info is output on to @file{stderr}.
> +omitted, it defaults to @option{optimized-optall}, which means dump
> +information about successfully applied optimizations from all the passes.
> +In the following example, the optimization info is output on to @file{stderr}.
>
>  @smallexample
>  gcc -O3 -fopt-info
>  @end smallexample
>
>  Note that @option{-fopt-info-vec-missed} behaves the same as
> -@option{-fopt-info-missed-vec}.
> +@option{-fopt-info-missed-vec}.  The order of the optimization group
> +names and message types listed after @option{-fopt-info} does not matter.
>
>  As another example, consider


Re: [PATCH][ARM] Update max_cond_insns settings

2017-05-04 Thread Richard Earnshaw (lists)
On 12/04/17 14:02, Wilco Dijkstra wrote:
> The existing setting of max_cond_insns for most cores is non-optimal.
> Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
> Also such long sequences of conditional instructions can increase the number
> of executed instructions significantly, so using 5 for max_cond_insns is
> non-optimal.
> 
> Previous benchmarking showed that setting max_cond_insns to 2 was the best value
> for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2, apart from Cortex-A35
> and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
> advantage of producing more uniform code.
> 
> Bootstrap and regress OK on arm-none-linux-gnueabihf.
> 
> OK for stage 1?
> 
> ChangeLog:
> 2017-04-12  Wilco Dijkstra  
> 
> * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
> (arm_cortex_a35_tune): Likewise.
> ---
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
>    arm_default_branch_cost,
>    &arm_default_vec_cost,
>    1, /* Constant limit.  */
> -  5, /* Max cond insns.  */
> +  2, /* Max cond insns.  */
>    8, /* Memset max inline.  */
>    1, /* Issue rate.  */
>    ARM_PREFETCH_NOT_BENEFICIAL,
> @@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
>    arm_default_branch_cost,
>    &arm_default_vec_cost,
>    1, /* Constant limit.  */
> -  5, /* Max cond insns.  */
> +  2, /* Max cond insns.  */
>    8, /* Memset max inline.  */
>    2, /* Issue rate.  */
>    ARM_PREFETCH_NOT_BENEFICIAL,
> 


This parameter is also used for A32 code.  Is that really the right
number there as well?

I do wonder if the code in arm_option_params_internal should be tweaked
to hard-limit the number of skipped insns for Thumb2 to one IT block.  So

/* When -mrestrict-it is in use tone down the if-conversion.  */
max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
  ? 1
  : (TARGET_THUMB2 ? MIN (current_tune->max_insns_skipped, 4)
     : current_tune->max_insns_skipped);
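
The suggestion above amounts to a three-way selection. A standalone sketch of it, assuming plain `int` parameters in place of `TARGET_THUMB2`, `arm_restrict_it` and the tuning structure (the function name `effective_max_insns_skipped` is hypothetical and not part of GCC), might look like this:

```c
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* TUNE_MAX stands in for current_tune->max_insns_skipped, THUMB2 for
   TARGET_THUMB2 and RESTRICT_IT for arm_restrict_it.  */
static int
effective_max_insns_skipped (int tune_max, int thumb2, int restrict_it)
{
  if (thumb2 && restrict_it)
    return 1;                  /* -mrestrict-it: tone down if-conversion.  */
  if (thumb2)
    return MIN (tune_max, 4);  /* One IT block covers at most 4 insns.  */
  return tune_max;             /* A32: use the tuning value unchanged.  */
}
```

With the old tuning value of 5 this yields 4 for Thumb-2 and 5 for A32; with the proposed value of 2 both modes get 2.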




[OBVIOUS][PATCH] Remove an unused variable.

2017-05-04 Thread Martin Liška

Hi.

Installed as obvious.

Martin

From 67276230a5f6150a214d8be4ebbd4962f2ce371b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 4 May 2017 11:59:31 +0200
Subject: [PATCH] Remove an unused variable.

gcc/ChangeLog:

2017-05-04  Martin Liska  

	* tree-vrp.c (simplify_cond_using_ranges_2): Remove unused
	variable cond_code.
---
 gcc/tree-vrp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 36cb7480ecd..8d6124f58a6 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9644,7 +9644,6 @@ simplify_cond_using_ranges_2 (gcond *stmt)
 {
   tree op0 = gimple_cond_lhs (stmt);
   tree op1 = gimple_cond_rhs (stmt);
-  enum tree_code cond_code = gimple_cond_code (stmt);
 
   /* If we have a comparison of an SSA_NAME (OP0) against a constant,
  see if OP0 was set by a type conversion where the source of
-- 
2.12.2



Re: [PATCH] Backport the recent ARM ABI patch to 6 (PR target/77728)

2017-05-04 Thread Marek Polacek
Ping.

On Thu, Apr 27, 2017 at 12:44:42PM +0200, Marek Polacek wrote:
> This is a backport of the ARM ABI fix, except that it doesn't change code,
> only adds the ABI warning.
> 
> So there were four changes, three of them are changing "else if (res < 0)"
> to "if (res != 0)" and the fourth was the "res != 0" change in
> arm_function_arg_boundary.
> 
> I've verified on a testcase that we now get the warning but there are no
> changes in .s files.
> 
> Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for 6?
> 
> 2017-04-26  Marek Polacek  
>   Ramana Radhakrishnan  
>   Jakub Jelinek  
> 
>   PR target/77728
>   * config/arm/arm.c: Include gimple.h.
>   (aapcs_layout_arg): Emit -Wpsabi note if arm_needs_doubleword_align
>   returns negative, increment ncrn if it returned non-zero.
>   (arm_needs_doubleword_align): Return int instead of bool,
>   ignore DECL_ALIGN of non-FIELD_DECL TYPE_FIELDS chain
>   members, but if there is any such non-FIELD_DECL
>   > PARM_BOUNDARY aligned decl, return -1 instead of false.
>   (arm_function_arg): Emit -Wpsabi note if arm_needs_doubleword_align
>   returns negative, increment nregs if it returned non-zero.
>   (arm_setup_incoming_varargs): Likewise.
>   (arm_function_arg_boundary): Emit -Wpsabi note if
>   arm_needs_doubleword_align returns negative, return
>   DOUBLEWORD_ALIGNMENT if it returned non-zero.
> 
>   * g++.dg/abi/pr77728-1.C: New test.
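
The three-valued return convention described in the ChangeLog above can be illustrated outside GCC. This is only a sketch under stated assumptions: `struct member`, `PARM_BOUNDARY_BITS`, `needs_doubleword_align` and `next_core_reg` are hypothetical stand-ins for the tree chain, `PARM_BOUNDARY`, `arm_needs_doubleword_align` and the NCRN rounding in `aapcs_layout_arg`, and the note is modelled as a flag rather than a diagnostic.

```c
/* Hypothetical model of one member on a TYPE_FIELDS-like chain.  */
struct member
{
  int is_field;     /* Nonzero for a real data member (FIELD_DECL).  */
  unsigned align;   /* Alignment in bits.  */
};

#define PARM_BOUNDARY_BITS 32   /* Assumed word-sized parameter boundary.  */

/* Return 1 if a data member needs more than word alignment, -1 if only
   a non-field member (for example a static variable) does, 0 otherwise.  */
static int
needs_doubleword_align (const struct member *m, int n)
{
  int ret = 0;
  for (int i = 0; i < n; i++)
    if (m[i].align > PARM_BOUNDARY_BITS)
      {
        if (m[i].is_field)
          return 1;
        ret = -1;
      }
  return ret;
}

/* Caller pattern on the release branch: record a note for a negative
   result, but still round the register number up for any nonzero
   result, so generated code does not change.  */
static int
next_core_reg (int ncrn, const struct member *m, int n, int *noted)
{
  if (ncrn & 1)
    {
      int res = needs_doubleword_align (m, n);
      if (res < 0)
        *noted = 1;
      if (res != 0)
        ncrn++;
    }
  return ncrn;
}
```

A type whose only over-aligned member is not a data member yields -1, which triggers the note while keeping the old register allocation.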
> 
> diff --git gcc/config/arm/arm.c gcc/config/arm/arm.c
> index 6373103..b3da8c8 100644
> --- gcc/config/arm/arm.c
> +++ gcc/config/arm/arm.c
> @@ -61,6 +61,7 @@
>  #include "builtins.h"
>  #include "tm-constrs.h"
>  #include "rtl-iter.h"
> +#include "gimple.h"
>  
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -78,7 +79,7 @@ struct four_ints
>  
>  /* Forward function declarations.  */
>  static bool arm_const_not_ok_for_debug_p (rtx);
> -static bool arm_needs_doubleword_align (machine_mode, const_tree);
> +static int arm_needs_doubleword_align (machine_mode, const_tree);
>  static int arm_compute_static_chain_stack_bytes (void);
>  static arm_stack_offsets *arm_get_frame_offsets (void);
>  static void arm_add_gc_roots (void);
> @@ -6137,8 +6138,20 @@ aapcs_layout_arg (CUMULATIVE_ARGS *pcum, machine_mode 
> mode,
>/* C3 - For double-word aligned arguments, round the NCRN up to the
>   next even number.  */
>ncrn = pcum->aapcs_ncrn;
> -  if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> -ncrn++;
> +  if (ncrn & 1)
> +{
> +  int res = arm_needs_doubleword_align (mode, type);
> +  /* Only warn during RTL expansion of call stmts, otherwise we would
> +  warn e.g. during gimplification even on functions that will be
> +  always inlined, and we'd warn multiple times.  Don't warn when
> +  called in expand_function_start either, as we warn instead in
> +  arm_function_arg_boundary in that case.  */
> +  if (res < 0 && warn_psabi && currently_expanding_gimple_stmt)
> + inform (input_location, "parameter passing for argument of type "
> + "%qT will change in GCC 7.1", type);
> +  if (res != 0)
> + ncrn++;
> +}
>  
>nregs = ARM_NUM_REGS2(mode, type);
>  
> @@ -6243,12 +6256,16 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree 
> fntype,
>  }
>  }
>  
> -/* Return true if mode/type need doubleword alignment.  */
> -static bool
> +/* Return 1 if double word alignment is required for argument passing.
> +   Return -1 if double word alignment used to be required for argument
> +   passing before PR77728 ABI fix, but is not required anymore.
> +   Return 0 if double word alignment is not required and wasn't required
> +   before either.  */
> +static int
>  arm_needs_doubleword_align (machine_mode mode, const_tree type)
>  {
>if (!type)
> -return PARM_BOUNDARY < GET_MODE_ALIGNMENT (mode);
> +return GET_MODE_ALIGNMENT (mode) > PARM_BOUNDARY;
>  
>/* Scalar and vector types: Use natural alignment, i.e. of base type.  */
>if (!AGGREGATE_TYPE_P (type))
> @@ -6258,12 +6275,21 @@ arm_needs_doubleword_align (machine_mode mode, 
> const_tree type)
>if (TREE_CODE (type) == ARRAY_TYPE)
>  return TYPE_ALIGN (TREE_TYPE (type)) > PARM_BOUNDARY;
>  
> +  int ret = 0;
>/* Record/aggregate types: Use greatest member alignment of any member.  
> */ 
>for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
>  if (DECL_ALIGN (field) > PARM_BOUNDARY)
> -  return true;
> +  {
> + if (TREE_CODE (field) == FIELD_DECL)
> +   return 1;
> + else
> +   /* Before PR77728 fix, we were incorrectly considering also
> +  other aggregate fields, like VAR_DECLs, TYPE_DECLs etc.
> +  Make sure we can warn about that with -Wpsabi.  */
> +   ret = -1;
> +  }
>  
> -  return false;
> +  return 

Re: [PATCH, GCC] Require c99_runtime for pr78622.c

2017-05-04 Thread Thomas Preudhomme

And now with the patch.

Best regards,

Thomas

On 04/05/17 10:36, Thomas Preudhomme wrote:

Hi,

Testcase gcc.c-torture/execute/pr78622.c uses the %hhd printf specifier,
which was introduced in C99. C89 only recognizes the h, l and L length
specifiers; it does not recognize the hh length specifier. As such, this
commit adds a c99_runtime effective target requirement.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-05-04  Thomas Preud'homme  

* gcc.c-torture/execute/pr78622.c: Require c99_runtime effective
target.


Committed as obvious.

Best regards,

Thomas
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78622.c b/gcc/testsuite/gcc.c-torture/execute/pr78622.c
index 85084bef026fc6622d7c83e6e74186fe52a5edd6..384803d1a091723fdb6368faf5b5389de7509555 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr78622.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr78622.c
@@ -1,6 +1,7 @@
 /* PR middle-end/78622 - [7 Regression] -Wformat-overflow/-fprintf-return-value
incorrect with overflow/wrapping
{ dg-skip-if "Requires %hhd format" { hppa*-*-hpux* } { "*" } { "" } }
+   { dg-require-effective-target c99_runtime }
{ dg-additional-options "-Wformat-overflow=2" } */
 
 __attribute__((noinline, noclone)) int


Re: Handle data dependence relations with different bases

2017-05-04 Thread Richard Sandiford
"Bin.Cheng"  writes:
> On Wed, May 3, 2017 at 9:00 AM, Richard Sandiford
>  wrote:
>> Index: gcc/tree-data-ref.h
>> ===
>> --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100
>> +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100
>> @@ -191,6 +191,9 @@ struct conflict_function
>>
>>  struct subscript
>>  {
>> +  /* The access functions of the two references.  */
>> +  tree access_fn[2];
> Would it be better to follow the existing code, i.e. name these
> access_fn_a/access_fn_b?  Then we wouldn't need to use the constant values
> 0/1 in various places, which is a little bit confusing.

[Answered below]

>> +
>>/* A description of the iterations for which the elements are
>>   accessed twice.  */
>>conflict_function *conflicting_iterations_in_a;
>> @@ -209,6 +212,7 @@ struct subscript
>>
>>  typedef struct subscript *subscript_p;
>>
>> +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I]
>>  #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a
>>  #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b
>>  #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict
>> @@ -264,6 +268,33 @@ struct data_dependence_relation
>>/* Set to true when the dependence relation is on the same data
>>   access.  */
>>bool self_reference_p;
>> +
>> +  /* True if the dependence described is conservatively correct rather
>> + than exact, and if it is still possible for the accesses to be
>> + conditionally independent.  For example, the a and b references in:
>> +
>> +   struct s *a, *b;
>> +   for (int i = 0; i < n; ++i)
>> + a->f[i] += b->f[i];
>> +
>> + conservatively have a distance vector of (0), for the case in which
>> + a == b, but the accesses are independent if a != b.  Similarly,
>> + the a and b references in:
>> +
>> +   struct s *a, *b;
>> +   for (int i = 0; i < n; ++i)
>> + a[0].f[i] += b[i].f[i];
>> +
>> + conservatively have a distance vector of (0), but they are independent
>> + when a != b + i.  In contrast, the references in:
>> +
>> +   struct s *a;
>> +   for (int i = 0; i < n; ++i)
>> + a->f[i] += a->f[i];
>> +
>> + have the same distance vector of (0), but the accesses can never be
>> + independent.  */
>> +  bool could_be_independent_p;
>>  };
>>
>>  typedef struct data_dependence_relation *ddr_p;
>> @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \
>>  #define DDR_DIST_VECT(DDR, I) \
>>DDR_DIST_VECTS (DDR)[I]
>>  #define DDR_REVERSED_P(DDR) (DDR)->reversed_p
>> +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p
>>
>>
>>  bool dr_analyze_innermost (struct data_reference *, struct loop *);
>> @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data
>>return false;
>>
>>return true;
>> -}
>> -
>> -/* Return true when the DDR contains two data references that have the
>> -   same access functions.  */
>> -
>> -static inline bool
>> -same_access_functions (const struct data_dependence_relation *ddr)
>> -{
>> -  unsigned i;
>> -
>> -  for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++)
>> -if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i),
>> - DR_ACCESS_FN (DDR_B (ddr), i)))
>> -  return false;
>> -
>> -  return true;
>>  }
>>
>>  /* Returns true when all the dependences are computable.  */
>> Index: gcc/tree-data-ref.c
>> ===
>> --- gcc/tree-data-ref.c 2017-02-23 19:54:15.0 +
>> +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100
>> @@ -123,8 +123,7 @@ Software Foundation; either version 3, o
>>  } dependence_stats;
>>
>>  static bool subscript_dependence_tester_1 (struct data_dependence_relation 
>> *,
>> -  struct data_reference *,
>> -  struct data_reference *,
>> +  unsigned int, unsigned int,
>>struct loop *);
> As mentioned, how about passing access_fn directly, rather than less
> meaningful 0/1 values?

The problem is that access_fn is a property of the individual
subscripts, whereas this is operating on a full data_reference.

One alternative would be to use conditions like:

  first_is_a ? SUB_ACCESS_FN_A (sub) : SUB_ACCESS_FN_B (sub)

but IMO that's less readable than the existing:

  SUB_ACCESS_FN (sub, index)

Or we could have individual access_fn arrays for A and B, separate
from the main subscript array, but that would mean allocating three
arrays instead of one.
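
To make the trade-off concrete, here is a small standalone sketch of the indexed form. It is not the GCC code: `access_fn_t` is a hypothetical stand-in for `tree` (an access function is just a descriptive string here) and `collect_access_fns` is an invented helper. The point is that swapping the roles of the two references only means passing the other index.

```c
/* Hypothetical stand-in for GCC's "tree".  */
typedef const char *access_fn_t;

struct subscript
{
  /* The access functions of the two references, indexed by 0 or 1.  */
  access_fn_t access_fn[2];
};

#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I]

/* Store in OUT the access function of reference INDEX (0 or 1) for
   each of the N subscripts.  */
static void
collect_access_fns (const struct subscript *subs, int n, unsigned int index,
                    access_fn_t *out)
{
  for (int i = 0; i < n; i++)
    out[i] = SUB_ACCESS_FN (&subs[i], index);
}
```

A caller that wants the same walk from the other reference's point of view calls the helper again with the other index, with no `first_is_a ? ... : ...` selection at each use.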

>>  /* Returns true iff A divides B.  */
>>
>> @@ -144,6 +143,21 @@ int_divides_p (int a, int b)
>>return ((b % a) == 0);
>>  }
>>
>> +/* Return true if reference REF contains a union access.  */
>> +
>> +static bool
>> +ref_contains_union_access_p (tree ref)
>> +{
>> +  while 

Re: [PATCH][AArch64] Improve address cost for -mcpu=generic

2017-05-04 Thread Richard Earnshaw (lists)
On 12/04/17 14:08, Wilco Dijkstra wrote:
> All cores which add a cpu_addrcost_table use a non-zero value for
> HI and TI mode shifts (a non-zero value for general indexing also 
> applies to all shifts).  Given this, it makes no sense to use a
> different setting in generic_addrcost_table.  So change it so that all
> supported cores, including -mcpu=generic, now generate the same:
> 

That's not quite true: exynosm1_addrcost_table has 0 for HI but 2 for
TI; though use of TI here is a bit misleading as it essentially means
"any scaling that isn't 2, 4 or 8".

That being said, this change does seem to be closer to the more general
trend; furthermore, single scaled-offset mems are still merged into one
instruction.
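
Read that way, the four entries act as a lookup keyed on the index scaling. The following sketch illustrates the reading given above and is not the code in aarch64.c: `struct scale_costs` and `scale_cost` are hypothetical names, and the mapping of HI, SI and DI to scalings of 2, 4 and 8 follows from the remark that TI covers everything else.

```c
/* Hypothetical mirror of the four per-scale entries in the table.  */
struct scale_costs
{
  int hi;   /* Index scaled by 2.  */
  int si;   /* Index scaled by 4.  */
  int di;   /* Index scaled by 8.  */
  int ti;   /* Any other scaling.  */
};

/* Return the extra cost of an index scaled by SCALE.  */
static int
scale_cost (const struct scale_costs *c, int scale)
{
  switch (scale)
    {
    case 2: return c->hi;
    case 4: return c->si;
    case 8: return c->di;
    default: return c->ti;
    }
}
```

With the proposed generic values { 1, 0, 0, 1 }, scalings of 4 and 8 stay free while a scaling of 2, as in the short-array example quoted below, costs 1.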

OK.

R.

> int f(short *p, short *q, long x) { return p[x] + q[x]; }
> 
>   lsl x2, x2, 1
>   ldrsh   w3, [x0, x2]
>   ldrsh   w0, [x1, x2]
>   add w0, w3, w0
>   ret
> 
> Bootstrapped for AArch64. Any comments? OK for stage 1?
> 
> ChangeLog:
> 2017-04-12  Wilco Dijkstra  
> 
> * gcc/config/aarch64/aarch64.c (generic_addrcost_table):
> Change HI/TI mode setting.
> 
> ---
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 419b756efcb40e48880cd4529efc4f9f59938325..728ce7029f1e2b5161d9f317d10e564dd5a5f472
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -193,10 +193,10 @@ static const struct aarch64_flag_desc 
> aarch64_tuning_flags[] =
>  static const struct cpu_addrcost_table generic_addrcost_table =
>  {
>  {
> -  0, /* hi  */
> +  1, /* hi  */
>0, /* si  */
>0, /* di  */
> -  0, /* ti  */
> +  1, /* ti  */
>  },
>0, /* pre_modify  */
>0, /* post_modify  */
> 



[PATCH, GCC] Require c99_runtime for pr78622.c

2017-05-04 Thread Thomas Preudhomme

Hi,

Testcase gcc.c-torture/execute/pr78622.c uses the %hhd printf specifier,
which was introduced in C99. C89 only recognizes the h, l and L length
specifiers; it does not recognize the hh length specifier. As such, this
commit adds a c99_runtime effective target requirement.
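
For readers unfamiliar with the hh modifier, here is a minimal sketch of what a C99 runtime does with it. It is not taken from the testcase: `format_hhu` and `format_hhd` are hypothetical helpers, and the example assumes 8-bit chars. With hh, the promoted argument is converted back to `unsigned char` or `signed char` before printing.

```c
#include <stdio.h>

/* Format VALUE with "%hhu": the value is converted to unsigned char
   before printing, so only the low 8 bits survive.  Returns the
   snprintf result.  */
static int
format_hhu (char *buf, size_t size, int value)
{
  return snprintf (buf, size, "%hhu", value);
}

/* The signed form, as used by the testcase.  */
static int
format_hhd (char *buf, size_t size, int value)
{
  return snprintf (buf, size, "%hhd", value);
}
```

On a C99 runtime `format_hhu` turns 300 into "44" (300 modulo 256). A C89-only runtime is not required to understand the modifier at all, which is why the test needs the effective-target guard.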

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-05-04  Thomas Preud'homme  

* gcc.c-torture/execute/pr78622.c: Require c99_runtime effective
target.


Committed as obvious.

Best regards,

Thomas


Re: Handle data dependence relations with different bases

2017-05-04 Thread Bin.Cheng
On Wed, May 3, 2017 at 9:00 AM, Richard Sandiford
 wrote:
> This patch tries to calculate conservatively-correct distance
> vectors for two references whose base addresses are not the same.
> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
> isn't guaranteed to occur.
>
> The motivating example is:
>
>   struct s { int x[8]; };
>   void
>   f (struct s *a, struct s *b)
>   {
> for (int i = 0; i < 8; ++i)
>   a->x[i] += b->x[i];
>   }
>
> in which the "a" and "b" accesses are either independent or have a
> dependence distance of 0 (assuming -fstrict-aliasing).  Neither case
> prevents vectorisation, so we can vectorise without an alias check.
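
A hand-written model can make that claim concrete. The sketch below is an illustration only: `f_scalar` is the loop from the example, while `f_chunked` is a hypothetical stand-in for a loop vectorised by 4, in which all loads of a chunk happen before any store of that chunk. Both produce the same result whether the two pointers are equal (distance 0) or point to different objects (independent).

```c
struct s { int x[8]; };

/* The loop from the example, as written.  */
static void
f_scalar (struct s *a, struct s *b)
{
  for (int i = 0; i < 8; ++i)
    a->x[i] += b->x[i];
}

/* Model of vectorising by 4: load and add a whole chunk first, then
   store the whole chunk.  */
static void
f_chunked (struct s *a, struct s *b)
{
  for (int i = 0; i < 8; i += 4)
    {
      int tmp[4];
      for (int j = 0; j < 4; ++j)
        tmp[j] = a->x[i + j] + b->x[i + j];
      for (int j = 0; j < 4; ++j)
        a->x[i + j] = tmp[j];
    }
}
```

Because each element is read only by the iteration that writes it, reordering loads and stores within a chunk cannot change the outcome in either case.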
>
> I'd originally wanted to do the same thing for arrays as well, e.g.:
>
>   void
>   f (int a[][8], int b[][8])
>   {
> for (int i = 0; i < 8; ++i)
>   a[0][i] += b[0][i];
>   }
>
> I think this is valid because C11 6.7.6.2/6 says:
>
>   For two array types to be compatible, both shall have compatible
>   element types, and if both size specifiers are present, and are
>   integer constant expressions, then both size specifiers shall have
>   the same constant value.
>
> So if we access an array through an int (*)[8], it must have type X[8]
> or X[], where X is compatible with int.  It doesn't seem possible in
> either case for "a[0]" and "b[0]" to overlap when "a != b".
>
> However, Richard B said that (at least in gimple) we support arbitrary
> overlap of arrays and allow arrays to be accessed with different
> dimensionality.  There are examples of this in PR50067.  I've therefore
> only handled references that end in a structure field access.
>
> There are two ways of handling these dependences in the vectoriser:
> use them to limit VF, or check at runtime as before.  I've gone for
> the approach of checking at runtime if we can, to avoid limiting VF
> unnecessarily.  We still fall back to a VF cap when runtime checks
> aren't allowed.
>
> The patch tests whether we queued an alias check with a dependence
> distance of X and then picked a VF <= X, in which case it's safe to
> drop the alias check.  Since vect_prune_runtime_alias_test_list can
> be called twice with different VF for the same loop, it's no longer
> safe to clear may_alias_ddrs on exit.  Instead we should use
> comp_alias_ddrs to check whether versioning is necessary.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
Hi Richard,
It's nice to explore more alias opportunities; some simple comments are
embedded below.
>
> Thanks,
> Richard
>
>
> gcc/
> 2017-05-03  Richard Sandiford  
>
> * tree-data-ref.h (subscript): Add access_fn field.
> (data_dependence_relation): Add could_be_independent_p.
> (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros.
> (same_access_functions): Move to tree-data-ref.c.
> * tree-data-ref.c (ref_contains_union_access_p): New function.
> (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of
> DR_ACCESS_FN.
> (constant_access_functions): Likewise.
> (add_other_self_distances): Likewise.
> (same_access_functions): Likewise.  (Moved from tree-data-ref.h.)
> (initialize_data_dependence_relation): Use XCNEW and remove
> explicit zeroing of DDR_REVERSED_P.  Look for a subsequence
> of access functions that have the same type.  Allow the
> subsequence to end with different bases in some circumstances.
> Record the chosen access functions in SUB_ACCESS_FN.
> (build_classic_dist_vector_1): Replace ddr_a and ddr_b with
> a_index and b_index.  Use SUB_ACCESS_FN instead of DR_ACCESS_FN.
> (subscript_dependence_tester_1): Likewise dra and drb.
> (build_classic_dist_vector): Update calls accordingly.
> (subscript_dependence_tester): Likewise.
> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check
> DDR_COULD_BE_INDEPENDENT_P.
> * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test
> comp_alias_ddrs instead of may_alias_ddrs.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try
> to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back
> to using the recorded distance vectors if that fails.
> (dependence_distance_ge_vf): New function.
> (vect_prune_runtime_alias_test_list): Use it.  Don't clear
> LOOP_VINFO_MAY_ALIAS_DDRS.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-alias-check-3.c: New test.
> * gcc.dg/vect/vect-alias-check-4.c: Likewise.
> * gcc.dg/vect/vect-alias-check-5.c: Likewise.
>
> Index: gcc/tree-data-ref.h
> ===
> --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100
> +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100
> @@ -191,6 +191,9 @@ struct conflict_function
>
>  struct 

Re: [RFC][PATCH] Introduce -fdump*-folding

2017-05-04 Thread Martin Liška

On 05/03/2017 12:12 PM, Richard Biener wrote:

On Wed, May 3, 2017 at 10:10 AM, Martin Liška  wrote:

Hello

Last release cycle I spent quite some time reading IVOPTS pass dump files.
Using -fdump*-details generates a lot of 'Applying pattern' lines, which
can make a dump file harder to read.

Here are stats for tramp3d with -O2 and -fdump-tree-all-details. The
percentage shows how many lines match the aforementioned pattern:

tramp3d-v4.cpp.164t.ivopts: 6.34%
  tramp3d-v4.cpp.091t.ccp2: 5.04%
  tramp3d-v4.cpp.093t.cunrolli: 4.41%
  tramp3d-v4.cpp.129t.laddress: 3.70%
  tramp3d-v4.cpp.032t.ccp1: 2.31%
  tramp3d-v4.cpp.038t.evrp: 1.90%
 tramp3d-v4.cpp.033t.forwprop1: 1.74%
  tramp3d-v4.cpp.103t.vrp1: 1.52%
 tramp3d-v4.cpp.124t.forwprop3: 1.31%
  tramp3d-v4.cpp.181t.vrp2: 1.30%
   tramp3d-v4.cpp.161t.cunroll: 1.22%
tramp3d-v4.cpp.027t.fixup_cfg3: 1.11%
   tramp3d-v4.cpp.153t.ivcanon: 1.07%
  tramp3d-v4.cpp.126t.ccp3: 0.96%
  tramp3d-v4.cpp.143t.sccp: 0.91%
 tramp3d-v4.cpp.185t.forwprop4: 0.82%
   tramp3d-v4.cpp.011t.cfg: 0.74%
 tramp3d-v4.cpp.096t.forwprop2: 0.50%
tramp3d-v4.cpp.019t.fixup_cfg1: 0.37%
 tramp3d-v4.cpp.120t.phicprop1: 0.33%
   tramp3d-v4.cpp.133t.pre: 0.32%
 tramp3d-v4.cpp.182t.phicprop2: 0.27%
tramp3d-v4.cpp.170t.veclower21: 0.25%
   tramp3d-v4.cpp.029t.einline: 0.24%

I suggest adding a new TDF flag allocated for that.
The patch bootstraps on ppc64le-redhat-linux and survives regression tests.

Thoughts?


Ok.  Soon we'll want to change dump_flags to uint64_t ...  (we have 1 bit left
if you allow negative dump_flags).  It'll trickle down to a lot of interfaces,
so introducing dump_flags_t at the same time might be a good idea.
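
As a rough illustration of why the width matters, consider the sketch below. It makes no claim about how GCC eventually defined the type: the `dump_flags_t` typedef here is a plain `uint64_t`, and `dump_flag_bit` and `dump_flag_set_p` are hypothetical helpers.

```c
#include <stdint.h>

/* Hypothetical typedef, named after the proposal above.  */
typedef uint64_t dump_flags_t;

/* Build the flag for bit position BIT (0..63).  The 64-bit constant
   matters: shifting a plain int "1" by 31 or more is out of range.  */
static dump_flags_t
dump_flag_bit (unsigned bit)
{
  return UINT64_C (1) << bit;
}

/* Return nonzero if FLAG is set in FLAGS.  */
static int
dump_flag_set_p (dump_flags_t flags, dump_flags_t flag)
{
  return (flags & flag) != 0;
}
```

With a signed 32-bit `int`, the last usable position is the sign bit, which is the "1 bit left if you allow negative dump_flags" situation; a 64-bit unsigned type leaves room for 64 distinct flags.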


Hello.

I've prepared a patch that migrates all interfaces and introduces
dump_flags_t; I'm currently testing it. Apart from that, Richi asked for a
more generic approach with a hierarchical structure of options.

Can you please take a look at the self-contained source file that shows the
way I've decided to go? Another question is whether we also want to implement
"aliases", where for instance the current 'all' would equal the union of a
couple of suboptions.
Thanks for feedback,
Martin



Thanks,
Richard.


Martin


#include <cstdint>
#include <vector>
#include "stdio.h"
#include "string.h"

using namespace std;

enum dump_option_enum
{
  FOLDING,
  FOLDING_GIMPLE,
  FOLDING_GIMPLE_CTOR,
  FOLDING_GIMPLE_MATCH,
  FOLDING_GENERIC,
  FOLDING_GENERIC_C,
  FOLDING_GENERIC_CPP,
  DUMP_OPTION_COUNT
};

struct dump_option_node
{
  inline dump_option_node (const char *_name, dump_option_enum _enum_value);
  inline void initialize (uint64_t *mask_translation);
  inline uint64_t initialize_masks (unsigned *current, uint64_t *mask_translation);
  inline uint64_t parse(char *token);

  const char *name;
  dump_option_enum enum_value;
  vector<dump_option_node *> children;
  uint64_t mask;
};

dump_option_node::dump_option_node (const char *_name,
                                    dump_option_enum _enum_value):
  name (_name), enum_value (_enum_value), mask (0)
{}

void
dump_option_node::initialize (uint64_t *mask_translation)
{
  unsigned current = 0;
  initialize_masks (&current, mask_translation);
}

uint64_t
dump_option_node::initialize_masks (unsigned *current, uint64_t *mask_translation)
{
  if (children.empty ())
    /* Leaf option: claim the next free bit.  Shift a 64-bit one, since a
       plain int "1" would overflow from bit 31 on.  */
    mask = ((uint64_t) 1 << ((*current)++));
  // TODO: add assert
  else
    {
      /* Inner option: its mask is the union of its children's masks.  */
      uint64_t combined = 0;
      for (unsigned i = 0; i < children.size (); i++)
        combined |= children[i]->initialize_masks (current, mask_translation);

      mask = combined;
    }

  mask_translation[enum_value] = mask;
  return mask;
}

uint64_t
dump_option_node::parse (char *token)
{
  if (token == NULL)
    return mask;

  for (unsigned i = 0; i < children.size (); i++)
    if (strcmp (children[i]->name, token) == 0)
      {
        token = strtok (NULL, "-");
        return children[i]->parse (token);
      }

  return 0;
}

struct dump_flags_t
{
  dump_flags_t ();
  dump_flags_t (uint64_t mask);
  dump_flags_t (dump_option_enum enum_value);

  inline void operator|= (dump_flags_t other)
  {
    m_mask |= other.m_mask;
  }

  inline void operator&= (dump_flags_t other)
  {
    m_mask &= other.m_mask;
  }

  inline bool operator& (dump_flags_t other)
  {
    return m_mask & other.m_mask;
  }

  static inline void init ();
  static dump_flags_t parse (char *option);

  uint64_t m_mask;

  static dump_option_node *root;
  static uint64_t mask_translation[DUMP_OPTION_COUNT];
};


[PATCH] Fix ICE in get_range_info (PR tree-optimization/80612)

2017-05-04 Thread Marek Polacek
We need to check that the SSA_NAME we're passing down to get_range_info
is of INTEGRAL_TYPE_P; on pointers we'd crash on an assert.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 7.2?

2017-05-04  Marek Polacek  

PR tree-optimization/80612
* calls.c (get_size_range): Check for INTEGRAL_TYPE_P.

* gcc.dg/torture/pr80612.c: New test.

diff --git gcc/calls.c gcc/calls.c
index c26f157..bd081cc 100644
--- gcc/calls.c
+++ gcc/calls.c
@@ -1270,7 +1270,7 @@ get_size_range (tree exp, tree range[2])
 
   wide_int min, max;
   enum value_range_type range_type
-= (TREE_CODE (exp) == SSA_NAME
+= ((TREE_CODE (exp) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE (exp)))
? get_range_info (exp, &min, &max) : VR_VARYING);
 
   if (range_type == VR_VARYING)
diff --git gcc/testsuite/gcc.dg/torture/pr80612.c gcc/testsuite/gcc.dg/torture/pr80612.c
index e69de29..225b811 100644
--- gcc/testsuite/gcc.dg/torture/pr80612.c
+++ gcc/testsuite/gcc.dg/torture/pr80612.c
@@ -0,0 +1,15 @@
+/* PR tree-optimization/80612 */
+/* { dg-do compile } */
+
+struct obstack *a;
+struct obstack {
+  union {
+void *plain;
+void (*extra)();
+  } chunkfun;
+} fn1(void p4()) {
+  a->chunkfun.plain = p4;
+  a->chunkfun.extra(a);
+}
+void fn2(int) __attribute__((__alloc_size__(1)));
+void fn3() { fn1(fn2); }

Marek


[PATCH] Tweak array_at_struct_end_p

2017-05-04 Thread Richard Biener

The following picks the changes suggested as followup for PR80533
that do not cause the warning regression on accessing a [0] array.

Additionally the patch removes the unnecessary allow_compref argument
of the function.

The question whether we want to allow an array to extend into
padding still stands.  This patch allows it for C99 flex arrays
(but not pre-C99 GNU extension [0] due to the above warning
regression, also not for [1] or larger arrays we treat as flex arrays
when we can't see an underlying decl).
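
To make the padding question concrete, here is a small C sketch.  The
struct names and the helper functions are made up for illustration; the
pre-C99 [0] form is left out because it is a GNU extension.  With a
C99 flexible array member, the bytes between the start of the array and
sizeof (struct) are pure tail padding, which is the region the patch
allows the array to extend into.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: the array type has no size.  */
struct flex
{
  int len;
  char tag;
  char data[];
};

/* Trailing one-element array, the older idiom for the same thing.  */
struct one
{
  int len;
  char tag;
  char data[1];
};

/* Bytes between the start of DATA and the end of struct flex.  These
   are tail padding only, since DATA itself contributes no size.  */
static size_t
flex_tail_padding (void)
{
  return sizeof (struct flex) - offsetof (struct flex, data);
}

/* Allocate a struct flex with room for at least N payload bytes.  */
static struct flex *
flex_alloc (size_t n)
{
  struct flex *p = malloc (sizeof (struct flex) + n);
  if (p == NULL)
    return NULL;
  p->len = (int) n;
  p->tag = 0;
  memset (p->data, 0, n);
  return p;
}
```

On x86_64 GNU/Linux I would expect flex_tail_padding () to be 3 (DATA
at offset 5, struct size 8), but that depends on the ABI.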

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-05-04  Richard Biener  

* tree.c (array_at_struct_end_p): Handle arrays at struct
end with flexarrays more conservatively.  Refactor and treat
arrays of arrays or aggregates more strictly.  Fix
VIEW_CONVERT_EXPR handling.  Remove allow_compref argument.
* tree.h (array_at_struct_end_p): Adjust prototype.
* emit-rtl.c (set_mem_attributes_minus_bitpos): Adjust.
* gimple-fold.c (get_range_strlen): Likewise.
* tree-chkp.c (chkp_may_narrow_to_field): Likewise.

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 247542)
+++ gcc/tree.c  (working copy)
@@ -13227,18 +13235,26 @@ array_ref_up_bound (tree exp)
   return NULL_TREE;
 }
 
-/* Returns true if REF is an array reference to an array at the end of
-   a structure.  If this is the case, the array may be allocated larger
-   than its upper bound implies.  When ALLOW_COMPREF is true considers
-   REF when it's a COMPONENT_REF in addition ARRAY_REF and
-   ARRAY_RANGE_REF.  */
+/* Returns true if REF is an array reference or a component reference
+   to an array at the end of a structure.
+   If this is the case, the array may be allocated larger
+   than its upper bound implies.  */
 
 bool
-array_at_struct_end_p (tree ref, bool allow_compref)
+array_at_struct_end_p (tree ref)
 {
-  if (TREE_CODE (ref) != ARRAY_REF
-  && TREE_CODE (ref) != ARRAY_RANGE_REF
-  && (!allow_compref || TREE_CODE (ref) != COMPONENT_REF))
+  tree atype;
+
+  if (TREE_CODE (ref) == ARRAY_REF
+  || TREE_CODE (ref) == ARRAY_RANGE_REF)
+{
+  atype = TREE_TYPE (TREE_OPERAND (ref, 0));
+  ref = TREE_OPERAND (ref, 0);
+}
+  else if (TREE_CODE (ref) == COMPONENT_REF
+  && TREE_CODE (TREE_TYPE (TREE_OPERAND (ref, 1))) == ARRAY_TYPE)
+atype = TREE_TYPE (TREE_OPERAND (ref, 1));
+  else
 return false;
 
   while (handled_component_p (ref))
@@ -13246,19 +13262,42 @@ array_at_struct_end_p (tree ref, bool al
   /* If the reference chain contains a component reference to a
  non-union type and there follows another field the reference
 is not at the end of a structure.  */
-  if (TREE_CODE (ref) == COMPONENT_REF
- && TREE_CODE (TREE_TYPE (TREE_OPERAND (ref, 0))) == RECORD_TYPE)
+  if (TREE_CODE (ref) == COMPONENT_REF)
{
- tree nextf = DECL_CHAIN (TREE_OPERAND (ref, 1));
- while (nextf && TREE_CODE (nextf) != FIELD_DECL)
-   nextf = DECL_CHAIN (nextf);
- if (nextf)
-   return false;
+ if (TREE_CODE (TREE_TYPE (TREE_OPERAND (ref, 0))) == RECORD_TYPE)
+   {
+ tree nextf = DECL_CHAIN (TREE_OPERAND (ref, 1));
+ while (nextf && TREE_CODE (nextf) != FIELD_DECL)
+   nextf = DECL_CHAIN (nextf);
+ if (nextf)
+   return false;
+   }
}
+  /* If we have a multi-dimensional array we do not consider
+ a non-innermost dimension as flex array if the whole
+multi-dimensional array is at struct end.
+Same for an array of aggregates with a trailing array
+member.  */
+  else if (TREE_CODE (ref) == ARRAY_REF)
+   return false;
+  else if (TREE_CODE (ref) == ARRAY_RANGE_REF)
+   ;
+  /* If we view an underlying object as sth else then what we
+ gathered up to now is what we have to rely on.  */
+  else if (TREE_CODE (ref) == VIEW_CONVERT_EXPR)
+   break;
+  else
+   gcc_unreachable ();
 
   ref = TREE_OPERAND (ref, 0);
 }
 
+  /* The array now is at struct end.  Treat flexible arrays as
+ always subject to extend, even into just padding constrained by
+ an underlying decl.  */
+  if (! TYPE_SIZE (atype))
+return true;
+
   tree size = NULL;
 
   if (TREE_CODE (ref) == MEM_REF
Index: gcc/tree.h
===
--- gcc/tree.h  (revision 247542)
+++ gcc/tree.h  (working copy)
@@ -4885,12 +4885,10 @@ extern tree array_ref_up_bound (tree);
EXP, an ARRAY_REF or an ARRAY_RANGE_REF.  */
 extern tree array_ref_low_bound (tree);
 
-/* Returns true if REF is an array reference to an array at the end of
-   a structure.  If this is the case, the array may be allocated larger
-   than its upper bound implies.  When second argument is true 

[PATCH 3/3] Vect peeling cost model

2017-05-04 Thread Robin Dapp
gcc/ChangeLog:

2017-04-26  Robin Dapp  

* tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
Change cost model.
(vect_peeling_hash_choose_best_peeling): Return extended peel info.
(vect_peeling_supportable): Return peeling status.
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 7b68582..da49e35 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -904,9 +904,9 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
 
 
 /* Function vect_update_misalignment_for_peel.
-   Sets DR's misalignment
+   Set DR's misalignment
- to 0 if it has the same alignment as DR_PEEL,
-   - to the misalignment computed using NPEEL if DR's salignment is known,
+   - to the misalignment computed using NPEEL if DR's misalignment is known,
- to -1 (unknown) otherwise.
 
DR - the data reference whose misalignment is to be adjusted.
@@ -1293,7 +1293,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
 {
   vect_peel_info elem = *slot;
   int dummy;
-  unsigned int inside_cost = 0, outside_cost = 0, i;
+  unsigned int inside_cost = 0, outside_cost = 0;
   gimple *stmt = DR_STMT (elem->dr);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
@@ -1342,7 +1342,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
choosing an option with the lowest cost (if cost model is enabled) or the
option that aligns as many accesses as possible.  */
 
-static struct data_reference *
+static struct _vect_peel_extended_info
 vect_peeling_hash_choose_best_peeling (hash_table *peeling_htab,
    loop_vec_info loop_vinfo,
unsigned int *npeel,
@@ -1369,7 +1369,7 @@ vect_peeling_hash_choose_best_peeling (hash_table *peeling_hta
 
*npeel = res.peel_info.npeel;
*body_cost_vec = res.body_cost_vec;
-   return res.peel_info.dr;
+   return res;
 }
 
 /* Return true if the new peeling NPEEL is supported.  */
@@ -1518,7 +1518,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   vec datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   enum dr_alignment_support supportable_dr_alignment;
-  struct data_reference *dr0 = NULL, *first_store = NULL;
+
+  struct data_reference *most_frequent_read = NULL;
+  unsigned int dr_read_count = 0;
+  struct data_reference *most_frequent_write = NULL;
+  unsigned int dr_write_count = 0;
   struct data_reference *dr;
   unsigned int i, j;
   bool do_peeling = false;
@@ -1527,11 +1531,13 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   gimple *stmt;
   stmt_vec_info stmt_info;
   unsigned int npeel = 0;
-  bool all_misalignments_unknown = true;
+  bool one_misalignment_known = false;
+  bool one_misalignment_unknown = false;
+
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
-  unsigned int nelements, mis, same_align_drs_max = 0;
+  unsigned int nelements, mis;
   stmt_vector_for_cost body_cost_vec = stmt_vector_for_cost ();
   hash_table peeling_htab (1);
 
@@ -1652,57 +1658,67 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   npeel_tmp += nelements;
 }
 
-  all_misalignments_unknown = false;
-  /* Data-ref that was chosen for the case that all the
- misalignments are unknown is not relevant anymore, since we
- have a data-ref with known alignment.  */
-  dr0 = NULL;
+	  one_misalignment_known = true;
 }
   else
 {
-  /* If we don't know any misalignment values, we prefer
- peeling for data-ref that has the maximum number of data-refs
- with the same alignment, unless the target prefers to align
- stores over load.  */
-  if (all_misalignments_unknown)
-{
-		  unsigned same_align_drs
-		= STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
-  if (!dr0
-		  || same_align_drs_max < same_align_drs)
-{
-  same_align_drs_max = same_align_drs;
-  dr0 = dr;
-}
-		  /* For data-refs with the same number of related
-		 accesses prefer the one where the misalign
-		 computation will be invariant in the outermost loop.  */
-		  else if (same_align_drs_max == same_align_drs)
+	  /* If we don't know any misalignment values, we prefer
+		 peeling for data-ref that has the maximum number of data-refs
+		 with the same alignment, unless the target prefers to align
+		 stores over load.  */
+	  unsigned same_align_dr_count
+		= STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
+
+	  /* For data-refs with the same number of related
+		 accesses prefer the one where the 

[PATCH 2/3] Vect peeling cost model

2017-05-04 Thread Robin Dapp
Wrap some frequently used snippets in separate functions.

gcc/ChangeLog:

2017-04-26  Robin Dapp  

* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Rename.
(vect_get_peeling_costs_all_drs): Create function.
(vect_peeling_hash_get_lowest_cost):
Use vect_get_peeling_costs_all_drs.
(vect_peeling_supportable): Create function.
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 9ffae94..7b68582 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -903,7 +903,11 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
 }
 
 
-/* Function vect_update_misalignment_for_peel
+/* Function vect_update_misalignment_for_peel.
+   Sets DR's misalignment
+   - to 0 if it has the same alignment as DR_PEEL,
+   - to the misalignment computed using NPEEL if DR's salignment is known,
+   - to -1 (unknown) otherwise.
 
DR - the data reference whose misalignment is to be adjusted.
DR_PEEL - the data reference whose misalignment is being made
@@ -916,7 +920,7 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
struct data_reference *dr_peel, int npeel)
 {
   unsigned int i;
-  vec same_align_drs;
+  vec same_aligned_drs;
   struct data_reference *current_dr;
   int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
   int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr_peel))));
@@ -932,9 +936,9 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
 
   /* It can be assumed that the data refs with the same alignment as dr_peel
  are aligned in the vector loop.  */
-  same_align_drs
+  same_aligned_drs
 = STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (DR_STMT (dr_peel)));
-  FOR_EACH_VEC_ELT (same_align_drs, i, current_dr)
+  FOR_EACH_VEC_ELT (same_aligned_drs, i, current_dr)
 {
   if (current_dr != dr)
 continue;
@@ -1234,27 +1238,23 @@ vect_peeling_hash_get_most_frequent (_vect_peel_info **slot,
   return 1;
 }
 
+/* Get the costs of peeling NPEEL iterations checking data access costs
+   for all data refs. */
 
-/* Traverse peeling hash table and calculate cost for each peeling option.
-   Find the one with the lowest cost.  */
-
-int
-vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
-   _vect_peel_extended_info *min)
+static void
+vect_get_peeling_costs_all_drs (struct data_reference *dr0,
+unsigned int *inside_cost,
+unsigned int *outside_cost,
+stmt_vector_for_cost *body_cost_vec,
+unsigned int npeel, unsigned int vf)
 {
-  vect_peel_info elem = *slot;
-  int save_misalignment, dummy;
-  unsigned int inside_cost = 0, outside_cost = 0, i;
-  gimple *stmt = DR_STMT (elem->dr);
+  gimple *stmt = DR_STMT (dr0);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   vec datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
-  struct data_reference *dr;
-  stmt_vector_for_cost prologue_cost_vec, body_cost_vec, epilogue_cost_vec;
 
-  prologue_cost_vec.create (2);
-  body_cost_vec.create (2);
-  epilogue_cost_vec.create (2);
+  unsigned i;
+  data_reference *dr;
 
   FOR_EACH_VEC_ELT (datarefs, i, dr)
 {
@@ -1272,12 +1272,40 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
 	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
 	continue;
 
+  int save_misalignment;
   save_misalignment = DR_MISALIGNMENT (dr);
-  vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
-  vect_get_data_access_cost (dr, &inside_cost, &outside_cost,
- &body_cost_vec);
+  if (dr == dr0 && npeel == vf / 2)
+	SET_DR_MISALIGNMENT (dr, 0);
+  else
+	vect_update_misalignment_for_peel (dr, dr0, npeel);
+  vect_get_data_access_cost (dr, inside_cost, outside_cost,
+ body_cost_vec);
   SET_DR_MISALIGNMENT (dr, save_misalignment);
 }
+}
+
+/* Traverse peeling hash table and calculate cost for each peeling option.
+   Find the one with the lowest cost.  */
+
+int
+vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
+   _vect_peel_extended_info *min)
+{
+  vect_peel_info elem = *slot;
+  int dummy;
+  unsigned int inside_cost = 0, outside_cost = 0, i;
+  gimple *stmt = DR_STMT (elem->dr);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  stmt_vector_for_cost prologue_cost_vec, body_cost_vec,
+		   epilogue_cost_vec;
+
+  prologue_cost_vec.create (2);
+  body_cost_vec.create (2);
+  epilogue_cost_vec.create (2);
+
+  vect_get_peeling_costs_all_drs (elem->dr, &inside_cost, &outside_cost,
+  &body_cost_vec, elem->npeel, 0);
 
   outside_cost += vect_get_known_peeling_cost
 (loop_vinfo, elem->npeel, &dummy,
@@ -1292,7 +1320,8 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
   epilogue_cost_vec.release ();
 
   if (inside_cost < min->inside_cost
-  || (inside_cost == min->inside_cost && outside_cost < 

Re: [PATCH, GCC/ARM, Stage 1] Enable Purecode for ARMv8-M Baseline

2017-05-04 Thread Ramana Radhakrishnan
On Thu, May 04, 2017 at 09:50:42AM +0100, Prakhar Bahuguna wrote:


> > 
> > Otherwise ok. Please respin and test with an armhf thumb32 bootstrap
> > and regression test run.
> > 
> > regards
> > Ramana
> 
> I've respun this patch with the suggested changes, along with a new changelog
> for docs:

And tested, hopefully, with a bootstrap and regression test run on armhf
GNU/Linux?

Ok if no regressions.

regards
Ramana

> 
> doc/ChangeLog:
> 
> 2017-01-11  Prakhar Bahuguna  
>   Andre Simoes Dias Vieira  
> 
>   * invoke.texi (-mpure-code): Change "ARMv7-M targets" for
>   "M-profile targets with the MOVT instruction".
> 
> -- 
> 
> Prakhar Bahuguna

> From e0f62c9919ceb9cfc6b4cc49615fb7188ae50519 Mon Sep 17 00:00:00 2001
> From: Prakhar Bahuguna 
> Date: Wed, 15 Mar 2017 10:25:03 +
> Subject: [PATCH] Enable Purecode for ARMv8-M Baseline.
> 
> ---
>  gcc/config/arm/arm.c   | 78 
> ++
>  gcc/config/arm/arm.md  |  6 +-
>  gcc/doc/invoke.texi|  3 +-
>  .../gcc.target/arm/pure-code/pure-code.exp |  5 +-
>  4 files changed, 58 insertions(+), 34 deletions(-)
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 83914913433..e0a7cabcb2e 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -2833,16 +2833,16 @@ arm_option_check_internal (struct gcc_options *opts)
>flag_pic = 0;
>  }
>  
> -  /* We only support -mslow-flash-data on armv7-m targets.  */
> -  if (target_slow_flash_data
> -  && ((!(arm_arch7 && !arm_arch_notm) && !arm_arch7em)
> -   || (TARGET_THUMB1_P (flags) || flag_pic || TARGET_NEON)))
> -error ("-mslow-flash-data only supports non-pic code on armv7-m targets");
> -
> -  /* We only support pure-code on Thumb-2 M-profile targets.  */
> -  if (target_pure_code
> -  && (!arm_arch_thumb2 || arm_arch_notm || flag_pic || TARGET_NEON))
> -error ("-mpure-code only supports non-pic code on armv7-m targets");
> +  /* We only support -mpure-code and -mslow-flash-data on M-profile targets
> + with MOVT.  */
> +  if ((target_pure_code || target_slow_flash_data)
> +  && (!TARGET_HAVE_MOVT || arm_arch_notm || flag_pic || TARGET_NEON))
> +{
> +  const char *flag = (target_pure_code ? "-mpure-code" :
> +  "-mslow-flash-data");
> +  error ("%s only supports non-pic code on M-profile targets with the "
> +  "MOVT instruction", flag);
> +}
>  
>  }
>  
> @@ -4077,7 +4077,7 @@ const_ok_for_arm (HOST_WIDE_INT i)
>  || (i & ~0xfc000003) == 0))
>   return TRUE;
>  }
> -  else
> +  else if (TARGET_THUMB2)
>  {
>HOST_WIDE_INT v;
>  
> @@ -4093,6 +4093,14 @@ const_ok_for_arm (HOST_WIDE_INT i)
>if (i == v)
>   return TRUE;
>  }
> +  else if (TARGET_HAVE_MOVT)
> +{
> +  /* Thumb-1 Targets with MOVT.  */
> +  if (i > 0xffff)
> + return FALSE;
> +  else
> + return TRUE;
> +}
>  
>return FALSE;
>  }
> @@ -7736,6 +7744,32 @@ arm_legitimate_address_outer_p (machine_mode mode, rtx x, RTX_CODE outer,
>return 0;
>  }
>  
> +/* Return true if we can avoid creating a constant pool entry for x.  */
> +static bool
> +can_avoid_literal_pool_for_label_p (rtx x)
> +{
> +  /* Normally we can assign constant values to target registers without
> + the help of constant pool.  But there are cases we have to use constant
> + pool like:
> + 1) assign a label to register.
> + 2) sign-extend a 8bit value to 32bit and then assign to register.
> +
> + Constant pool access in format:
> + (set (reg r0) (mem (symbol_ref (".LC0"))))
> + will cause the use of literal pool (later in function arm_reorg).
> + So here we mark such format as an invalid format, then the compiler
> + will adjust it into:
> + (set (reg r0) (symbol_ref (".LC0")))
> + (set (reg r0) (mem (reg r0))).
> + No extra register is required, and (mem (reg r0)) won't cause the use
> + of literal pools.  */
> +  if (arm_disable_literal_pool && GET_CODE (x) == SYMBOL_REF
> +  && CONSTANT_POOL_ADDRESS_P (x))
> +return 1;
> +  return 0;
> +}
> +
> +
>  /* Return nonzero if X is a valid Thumb-2 address operand.  */
>  static int
>  thumb2_legitimate_address_p (machine_mode mode, rtx x, int strict_p)
> @@ -7799,23 +7833,7 @@ thumb2_legitimate_address_p (machine_mode mode, rtx x, int strict_p)
> && thumb2_legitimate_index_p (mode, xop0, strict_p)));
>  }
>  
> -  /* Normally we can assign constant values to target registers without
> - the help of constant pool.  But there are cases we have to use constant
> - pool like:
> - 1) assign a label to register.
> - 2) sign-extend a 8bit value to 32bit and then assign to register.
> -
> - Constant pool access in 

Re: Fix bootstrap issue with gcc 4.1

2017-05-04 Thread Jan Hubicka
> >
> >Sure, I'm not questioning the patch, just wondering if we shouldn't
> >improve
> >store-merging further (we want to do it anyway for e.g. bitop adjacent
> >operations etc.).
> 
> We definitely want to do that.  It should also 'nicely' merge with bswap for 
> gathering the load side of a piecewise memory to memory copy.

The code we produce now in .optimized is:
   [15.35%]:  
  # DEBUG this => _42   
  MEM[(struct inline_summary *)_42].estimated_self_stack_size = 0;  
  MEM[(struct inline_summary *)_42].self_size = 0;  
  _44 = &MEM[(struct inline_summary *)_42].self_time;
  # DEBUG this => _44   
  # DEBUG sig => 0  
  # DEBUG exp => 0  
  MEM[(struct sreal *)_42 + 16B].m_sig = 0; 
  MEM[(struct sreal *)_42 + 16B].m_exp = 0; 
  sreal::normalize (_44);   
  # DEBUG this => NULL  
  # DEBUG sig => NULL   
  # DEBUG exp => NULL   
  MEM[(struct inline_summary *)_42].min_size = 0;   
  MEM[(struct inline_summary *)_42].inlinable = 0;  
  MEM[(struct inline_summary *)_42].contains_cilk_spawn = 0;
  MEM[(struct inline_summary *)_42].single_caller = 0;  
  MEM[(struct inline_summary *)_42].fp_expressions = 0; 
  MEM[(struct inline_summary *)_42].estimated_stack_size = 0;   
  MEM[(struct inline_summary *)_42].stack_frame_offset = 0; 
  _45 = &MEM[(struct inline_summary *)_42].time;
  # DEBUG this => _45   
  # DEBUG sig => 0  
  # DEBUG exp => 0  
  MEM[(struct sreal *)_42 + 56B].m_sig = 0; 
  MEM[(struct sreal *)_42 + 56B].m_exp = 0; 
  sreal::normalize (_45);   
  # DEBUG this => NULL  
  # DEBUG sig => NULL   
  # DEBUG exp => NULL   
  MEM[(struct inline_summary *)_42].size = 0;   
  MEM[(void *)_42 + 80B] = 0;   
  MEM[(void *)_42 + 88B] = 0;   
  MEM[(void *)_42 + 96B] = 0;   
  MEM[(struct predicate * *)_42 + 104B] = 0;
  MEM[(void *)_42 + 112B] = 0;  
  MEM[(void *)_42 + 120B] = 0;  
  goto ; [100.00%]   

so indeed it is not pretty at all.  Even the bitfields are split
and there is an offlined call to normalize zero in sreal.
I wonder why the inliner does not see this as an obviously good inlining
opportunity.
I will look into that problem.  For store merging it would indeed be
very nice to improve this as well.
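
As a rough illustration of what merging such stores would mean, here is
a sketch in C.  The struct and both functions are hypothetical; plain
bytes stand in for the bitfields of the real inline_summary.  Clearing
the fields one by one and writing a single 8-byte zero leave the object
in the same state, which is what makes the merge valid.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record with adjacent small fields and no padding.  */
struct summary
{
  uint8_t inlinable;
  uint8_t contains_spawn;
  uint8_t single_caller;
  uint8_t fp_expressions;
  uint32_t estimated_stack_size;
};

/* What the source says: one store per field, as in the dump above.  */
static void
clear_fieldwise (struct summary *s)
{
  s->inlinable = 0;
  s->contains_spawn = 0;
  s->single_caller = 0;
  s->fp_expressions = 0;
  s->estimated_stack_size = 0;
}

/* What a merged form could look like: one 8-byte store covering all
   five fields.  Only valid because the struct is exactly 8 bytes.  */
static void
clear_merged (struct summary *s)
{
  uint64_t zero = 0;
  assert (sizeof (struct summary) == sizeof zero);
  memcpy (s, &zero, sizeof zero);
}
```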

Honza
> 
> Richard.
> 
> >
> > Jakub

