Re: [PATCH] Improve BB vectorization dependence analysis

2015-11-17 Thread Richard Biener
On Tue, 17 Nov 2015, Richard Biener wrote:

> On Mon, 16 Nov 2015, Alan Lawrence wrote:
> 
> > On 09/11/15 12:55, Richard Biener wrote:
> > > 
> > > Currently BB vectorization computes all dependences inside a BB
> > > region and fails all vectorization if it cannot handle some of them.
> > > 
> > > This is obviously not needed - BB vectorization can restrict the
> > > dependence tests to those that are needed to apply the load/store
> > > motion effectively performed by the vectorization (sinking all
> > > participating loads/stores to the place of the last one).
> > > 
> > > With restructuring it that way it's also easy to not give up completely
> > > but only for the SLP instance we cannot vectorize (this gives
> > > a slight bump in my SPEC CPU 2006 testing to 756 vectorized basic
> > > block regions).
> > > 
> > > But first and foremost this patch is to reduce the dependence analysis
> > > cost and somewhat mitigate the compile-time effects of the first patch.
> > > 
> > > For fixing PR56118 only a cost model issue remains.
> > > 
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > 
> > > Richard.
> > > 
> > > 2015-11-09  Richard Biener  
> > > 
> > >   PR tree-optimization/56118
> > >   * tree-vectorizer.h (vect_find_last_scalar_stmt_in_slp): Declare.
> > >   * tree-vect-slp.c (vect_find_last_scalar_stmt_in_slp): Export.
> > >   * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): New
> > >   function.
> > >   (vect_slp_analyze_data_ref_dependences): Instead of computing
> > >   all dependences of the region DRs just analyze the code motions
> > >   SLP vectorization will perform.  Remove SLP instances that
> > >   cannot have their store/load motions applied.
> > >   (vect_analyze_data_refs): Allow DRs without a vectype
> > >   in BB vectorization.
> > > 
> > >   * gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c: Adjust.
> > 
> > Since this, I've been seeing an ICE on gfortran.dg/vect/vect-9.f90 on both
> > aarch64-none-linux-gnu and arm-none-linux-gnueabihf:
> > 
> > spawn /home/alalaw01/build/gcc/testsuite/gfortran4/../../gfortran
> > -B/home/alalaw01/build/gcc/testsuite/gfortran4/../../
> > -B/home/alalaw01/build/aarch64-unknown-linux-gnu/./libgfortran/
> > /home/alalaw01/gcc/gcc/testsuite/gfortran.dg/vect/vect-9.f90
> > -fno-diagnostics-show-caret -fdiagnostics-color=never -O -O2 
> > -ftree-vectorize
> > -fvect-cost-model=unlimited -fdump-tree-vect-details -Ofast -S -o vect-9.s
> > /home/alalaw01/gcc/gcc/testsuite/gfortran.dg/vect/vect-9.f90:5:0: Error:
> > definition in block 13 follows the use for SSA_NAME: _339 in statement:
> > vectp.156_387 = &*cc_36(D)[_339];
> > /home/alalaw01/gcc/gcc/testsuite/gfortran.dg/vect/vect-9.f90:5:0: internal
> > compiler error: verify_ssa failed
> > 0xcfc61b verify_ssa(bool, bool)
> > ../../gcc-fsf/gcc/tree-ssa.c:1039
> > 0xa2fc0b execute_function_todo
> > ../../gcc-fsf/gcc/passes.c:1952
> > 0xa30393 do_per_function
> > ../../gcc-fsf/gcc/passes.c:1632
> > 0xa3058f execute_todo
> > ../../gcc-fsf/gcc/passes.c:2000
> > Please submit a full bug report...
> > FAIL: gfortran.dg/vect/vect-9.f90   -O  (internal compiler error)
> > FAIL: gfortran.dg/vect/vect-9.f90   -O  (test for excess errors)
> > 
> > Still there (on aarch64) at r230329.
> 
> Please open a bugreport.

I have opened PR68379 with preliminary analysis.

Richard.


Re: [PATCH, PR middle-end/68134] Reject scalar modes in default get_mask_mode hook

2015-11-17 Thread Richard Biener
On Tue, Nov 17, 2015 at 12:49 PM, Ilya Enkovich  wrote:
> Hi,
>
> Default hook for get_mask_mode is supposed to return integer vector modes.  
> This means it should reject scalar modes returned by mode_for_vector.  
> Bootstrapped and regtested on x86_64-unknown-linux-gnu, regtested on 
> aarch64-unknown-linux-gnu.  OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-17  Ilya Enkovich  
>
> PR middle-end/68134
> * targhooks.c (default_get_mask_mode): Filter out
> scalar modes returned by mode_for_vector.
>
> gcc/testsuite/
>
> 2015-11-17  Ilya Enkovich  
>
> PR middle-end/68134
> * gcc.dg/pr68134.c: New test.
>
>
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index c34b4e9..66d983b 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -1093,8 +1093,8 @@ default_get_mask_mode (unsigned nunits, unsigned vector_size)
>gcc_assert (elem_size * nunits == vector_size);
>
>vector_mode = mode_for_vector (elem_mode, nunits);
> -  if (VECTOR_MODE_P (vector_mode)
> -  && !targetm.vector_mode_supported_p (vector_mode))
> +  if (!VECTOR_MODE_P (vector_mode)
> +  || !targetm.vector_mode_supported_p (vector_mode))
>  vector_mode = BLKmode;
>
>return vector_mode;
> diff --git a/gcc/testsuite/gcc.dg/pr68134.c b/gcc/testsuite/gcc.dg/pr68134.c
> new file mode 100644
> index 000..522b4c6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr68134.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c99" } */
> +
> +#include <stdint.h>
> +
> +typedef double float64x1_t __attribute__ ((vector_size (8)));
> +typedef uint64_t uint64x1_t;
> +
> +void
> +foo (void)
> +{
> +  float64x1_t arg1 = (float64x1_t) 0x3fedf9d4343c7c80;
> +  float64x1_t arg2 = (float64x1_t) 0x3fcdc53742ea9c40;
> +  uint64x1_t result = (uint64x1_t) (arg1 == arg2);
> +  uint64_t got = result;
> +  uint64_t exp = 0;
> +  if (got != 0)
> +__builtin_abort ();
> +}
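
The effect of the changed condition can be illustrated with a small
stand-alone model.  This is only a sketch: the toy_mode enum and the
pick_old/pick_new helpers are hypothetical stand-ins, not GCC's
machine_mode or the real hook, and the "supported" flag stands in for
targetm.vector_mode_supported_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kinds of mode mode_for_vector may return.  */
enum toy_mode { TOY_BLK, TOY_SCALAR_INT, TOY_VECTOR_INT };

/* Condition before the patch: only unsupported vector modes become BLK,
   so a scalar integer mode is passed through unchanged.  */
enum toy_mode
pick_old (enum toy_mode m, bool supported)
{
  if (m == TOY_VECTOR_INT && !supported)
    return TOY_BLK;
  return m;
}

/* Condition after the patch: anything that is not a supported vector
   mode becomes BLK, which also filters out scalar modes.  */
enum toy_mode
pick_new (enum toy_mode m, bool supported)
{
  if (m != TOY_VECTOR_INT || !supported)
    return TOY_BLK;
  return m;
}
```

In this model pick_old lets TOY_SCALAR_INT through while pick_new maps
it to TOY_BLK; supported vector modes are returned unchanged by both.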


Re: [PATCH] Clarify that -fwrapv covers all signed arithmetic overflow

2015-11-17 Thread Joseph Myers
On Tue, 17 Nov 2015, Paolo Bonzini wrote:

> GCC's -fwrapv option does not affect code generation for shifts
> because currently GCC does not rely on the fact that certain
> signed shifts trigger undefined behavior.  However, the definition
> of signed arithmetic overflow does extend to shifts; it is only
> code generation that is limited to addition, subtraction and
> multiplication.

It is part of the GNU C language, independent of -fwrapv, that shifts 
where the shift amount is in the range [0, width - 1] are fully defined 
(although ubsan will still sanitize those not defined in ISO C) - they are 
considered to be defined in terms of bits, not integers, so overflow is 
not a meaningful concept for them.

-fwrapv *should* however affect division and modulo -1, although it 
doesn't at present (see bug 30484).
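
As a rough illustration of what wrapping semantics would mean for the
operations mentioned here, the following sketch spells them out with
unsigned arithmetic.  It is not what GCC emits; the helper names are
made up for this example, and it assumes a two's complement int with
UINT_MAX == 2 * INT_MAX + 1.

```c
#include <assert.h>
#include <limits.h>

/* Map an unsigned value to the int with the same two's complement bit
   pattern, without relying on an implementation-defined conversion.  */
int
to_int_wrapped (unsigned u)
{
  if (u <= (unsigned) INT_MAX)
    return (int) u;
  return -(int) (UINT_MAX - u) - 1;
}

/* Addition that wraps modulo 2^N instead of overflowing.  */
int
wrap_add (int a, int b)
{
  return to_int_wrapped ((unsigned) a + (unsigned) b);
}

/* Division where INT_MIN / -1 wraps back to INT_MIN.  */
int
wrap_div (int a, int b)
{
  assert (b != 0);
  if (b == -1)
    return to_int_wrapped (0u - (unsigned) a);
  return a / b;
}

/* Remainder where anything modulo -1 is simply 0.  */
int
wrap_rem (int a, int b)
{
  assert (b != 0);
  if (b == -1)
    return 0;
  return a % b;
}
```

The b == -1 cases are handled separately because INT_MIN / -1 and
INT_MIN % -1 are the only division inputs whose mathematical quotient
does not fit in int.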

-- 
Joseph S. Myers
jos...@codesourcery.com


[RFC PATCH] Do not sanitize left shifts for -fwrapv

2015-11-17 Thread Paolo Bonzini
A left shift into the sign bit is a kind of overflow, and the
standard chooses to treat left shifts of negative values the
same way.

However, the -fwrapv option modifies the language to one where
integers are defined as two's complement---which also defines
entirely the behavior of shifts.  Disable sanitization of left
shifts when -fwrapv is in effect.
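
For reference, the two conditions that the c-ubsan.c comments in this
patch describe can be written out as plain C predicates.  This is a
sketch only: the function names are hypothetical, the shift amount is
assumed to be already in range, and int is assumed to have no padding
bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Precision of int minus one, the "uprecm1" of the comments.  */
#define UPRECM1 ((unsigned) (sizeof (int) * CHAR_BIT - 1))

/* C99 rule: x << y is undefined if (unsigned) x >> (uprecm1 - y)
   is non-zero.  */
bool
c99_lshift_undefined (int x, unsigned y)
{
  assert (y <= UPRECM1);
  return ((unsigned) x >> (UPRECM1 - y)) != 0;
}

/* C++11 rule: x << y is undefined if x < 0 or if
   (unsigned) x >> (uprecm1 - y) is greater than 1.  */
bool
cxx11_lshift_undefined (int x, unsigned y)
{
  assert (y <= UPRECM1);
  return x < 0 || ((unsigned) x >> (UPRECM1 - y)) > 1;
}
```

With these, shifting 1 into the sign bit is flagged by the C99
predicate but not by the C++11 one, while shifts of negative values
are flagged by both.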

This needs test cases of course, but I wanted to be sure in advance
whether this is an acceptable change and whether it is considered
a bug (thus acceptable for stage 3).  The same change was proposed
for LLVM at https://llvm.org/bugs/show_bug.cgi?id=25552.

Paolo

* c-family/c-ubsan.c (ubsan_instrument_shift): Disable sanitization
of left shifts for wrapping signed types as well.


Index: c-family/c-ubsan.c
===
--- c-family/c-ubsan.c  (revision 227511)
+++ c-family/c-ubsan.c  (working copy)
@@ -150,7 +150,7 @@
  (unsigned) x >> (uprecm1 - y)
  if non-zero, is undefined.  */
   if (code == LSHIFT_EXPR
-  && !TYPE_UNSIGNED (type0)
+  && !TYPE_OVERFLOW_WRAPS (type0)
   && flag_isoc99)
 {
   tree x = fold_build2 (MINUS_EXPR, op1_utype, uprecm1,
@@ -165,7 +165,7 @@
  x < 0 || ((unsigned) x >> (uprecm1 - y))
  if > 1, is undefined.  */
   if (code == LSHIFT_EXPR
-  && !TYPE_UNSIGNED (type0)
+  && !TYPE_OVERFLOW_WRAPS (type0)
   && (cxx_dialect >= cxx11))
 {
   tree x = fold_build2 (MINUS_EXPR, op1_utype, uprecm1,


[v3] Handle C++11 overloads on Solaris 12

2015-11-17 Thread Rainer Orth
Solaris 12 recently introduced the C++11 <math.h> overloads, which
caused bootstrap to be broken on both mainline and the gcc-5 branch:

In file included from 
/vol/gcc/src/hg/trunk/local/libstdc++-v3/include/precompiled/stdc++.h:41:0:
/var/gcc/regression/trunk/12-gcc/build/i386-pc-solaris2.12/libstdc++-v3/include/cmath:
 In function 'constexpr int std::fpclassify(float)':
/var/gcc/regression/trunk/12-gcc/build/i386-pc-solaris2.12/libstdc++-v3/include/cmath:561:3:
 error: redefinition of 'constexpr int std::fpclassify(float)'
   fpclassify(float __x)
   ^
In file included from /usr/include/math.h:13:0,
 from 
/var/gcc/regression/trunk/12-gcc/build/i386-pc-solaris2.12/libstdc++-v3/include/cmath:44,
 from 
/vol/gcc/src/hg/trunk/local/libstdc++-v3/include/precompiled/stdc++.h:41:
/usr/include/iso/math_c99.h:647:13: note: 'int std::fpclassify(float)' 
previously defined here
  inline int fpclassify(float __X) { return __builtin_fpclassify(
 ^

The following patch fixes this by testing for the problem and wrapping
the overloads in include/c_global/cmath and include/tr1/cmath
appropriately.  The test needs to be dynamic since apparently a backport
to Solaris 11 (and perhaps even Solaris 10) is planned.

Bootstrapped without regressions on {i386-pc,sparc-sun}-solaris2.1[012]
and x86_64-pc-linux-gnu.  Ok for mainline and the gcc-5 branch (where
the patch is identical except for minimally different context and not
restricting the dg-excess-errors in c99_classification_macros_c.cc to
*-*-solaris2.1[01]* since the branch still defaults to -std=gnu++98).

Rainer


2015-11-09  Rainer Orth  

libstdc++-v3:
* acinclude.m4 (GLIBCXX_CHECK_MATH11_PROTO): New test.
* configure.ac: Use it.
* configure: Regenerate.
* config.h.in: Regenerate.

* include/c_global/cmath [__cplusplus >= 201103L]
(std::fpclassify): Wrap in !__CORRECT_ISO_CPP11_MATH_H_PROTO.
(std::isfinite): Likewise.
(std::isinf): Likewise.
(std::isnan): Likewise.
(std::isnormal): Likewise.
(std::signbit): Likewise.
(std::isgreater): Likewise.
(std::isgreaterequal): Likewise.
(std::isless): Likewise.
(std::islessequal): Likewise.
(std::islessgreater): Likewise.
(std::isunordered): Likewise.
(std::acosh): Likewise.
(std::asinh): Likewise.
(std::atanh): Likewise.
(std::cbrt): Likewise.
(std::copysign): Likewise.
(std::erf): Likewise.
(std::erfc): Likewise.
(std::exp2): Likewise.
(std::expm1): Likewise.
(std::fdim): Likewise.
(std::fma): Likewise.
(std::fmax): Likewise.
(std::fmin): Likewise.
(std::hypot): Likewise.
(std::ilogb): Likewise.
(std::lgamma): Likewise.
(std::llrint): Likewise.
(std::llround): Likewise.
(std::log1p): Likewise.
(std::log2): Likewise.
(std::logb): Likewise.
(std::lrint): Likewise.
(std::lround): Likewise.
(std::nearbyint): Likewise.
(std::nextafter): Likewise.
(std::nexttoward): Likewise.
(std::remainder): Likewise.
(std::remquo): Likewise.
(std::rint): Likewise.
(std::round): Likewise.
(std::scalbln): Likewise.
(std::scalbn): Likewise.
(std::tgamma): Likewise.
(std::trunc): Likewise.
* include/tr1/cmath [_GLIBCXX_USE_C99_MATH_TR1] (std::tr1::acosh):
Wrap in !__CORRECT_ISO_CPP11_MATH_H_PROTO.
(std::tr1::asinh): Likewise.
(std::tr1::atanh): Likewise.
(std::tr1::cbrt): Likewise.
(std::tr1::copysign): Likewise.
(std::tr1::erf): Likewise.
(std::tr1::erfc): Likewise.
(std::tr1::exp2): Likewise.
(std::tr1::expm1): Likewise.
(std::tr1::fabs): Likewise.
(std::tr1::fdim): Likewise.
(std::tr1::fma): Likewise.
(std::tr1::fmax): Likewise.
(std::tr1::fmin): Likewise.
(std::tr1::hypot): Likewise.
(std::tr1::ilogb): Likewise.
(std::tr1::lgamma): Likewise.
(std::tr1::llrint): Likewise.
(std::tr1::llround): Likewise.
(std::tr1::log1p): Likewise.
(std::tr1::log2): Likewise.
(std::tr1::logb): Likewise.
(std::tr1::lrint): Likewise.
(std::tr1::lround): Likewise.
(std::tr1::nearbyint): Likewise.
(std::tr1::nextafter): Likewise.
(std::tr1::nexttoward): Likewise.
(std::tr1::remainder): Likewise.
(std::tr1::remquo): Likewise.
(std::tr1::rint): Likewise.
(std::tr1::scalbln): Likewise.
(std::tr1::scalbn): Likewise.
(std::tr1::tgamma): Likewise.
(std::tr1::trunc): Likewise.
(std::tr1::pow): Likewise.

* testsuite/26_numerics/headers/cmath/c99_classification_macros_c.cc:
Restrict 

Re: [PATCH 00/16] Unit tests framework (v3)

2015-11-17 Thread Bernd Schmidt

On 11/17/2015 02:53 AM, Mike Stump wrote:

On Nov 16, 2015, at 3:12 PM, Jeff Law  wrote:

So I'd tend to want them either at the end of the file with a
single #if CHECKING_P or as a separate foo-tests file.


Hum…  I kinda don’t want the main files mucked up with tests.  I
think I’d rather have

#if CHECKING_P
#include "test/expr-test.h"
#endif

at the end, and punt the whole lot into a single subdirectory that
most people, most of the time, can simply ignore.  Wading through a
ton of code that you aren’t interested in, is, well, annoying.


Most of the tests submitted so far are relatively tiny; sometimes the 
list of #includes in the testcase is longer than the tests themselves. 
If they are at the end of a file you'd hardly be wading through them 
either. Let's just use common sense and make separate files if we ever 
get huge amounts of test code and keep it simple otherwise.
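
A minimal sketch of the "tests at the end of the file under a single
#if CHECKING_P" layout might look as follows.  The clamp_to_byte
helper and its selftest are invented for illustration, and CHECKING_P
is defined locally only so that the example stands alone.

```c
#include <assert.h>

#ifndef CHECKING_P
#define CHECKING_P 1	/* Assumption: enabled for this sketch.  */
#endif

/* Ordinary code in the main part of the file (hypothetical helper).  */
int
clamp_to_byte (int v)
{
  if (v < 0)
    return 0;
  if (v > 255)
    return 255;
  return v;
}

#if CHECKING_P
/* Self-tests kept together at the end of the file and compiled only in
   checking builds.  Returns the number of checks that were run.  */
int
clamp_to_byte_selftest (void)
{
  int checks = 0;
  assert (clamp_to_byte (-5) == 0);
  checks++;
  assert (clamp_to_byte (300) == 255);
  checks++;
  assert (clamp_to_byte (42) == 42);
  checks++;
  return checks;
}
#endif /* CHECKING_P */
```

Moving the block under #if CHECKING_P into a separate file later would
not require touching the code above it.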



Bernd


Port libvtv to Solaris

2015-11-17 Thread Rainer Orth
Now that init priority support on Solaris is on mainline, porting libvtv
proved to be relatively easy, though it discovered a couple of quirks on
a non-gld non-x86 platform.

A considerable part of the patch lives in Solaris-specific files and
thus doesn't need approval, though some changes require explanation:

* In gcc.c (LINK_COMMAND_SPEC), VTABLE_VERIFICATION_SPEC was before
  %{L*}.  The spec includes -lvtv.  Solaris ld, unlike GNU ld, heeds
  the relative order of -L and -l switches, so the libvtv testcases
  wouldn't link manually, but did inside a testsuite run where
  LD_LIBRARY_PATH points ld at the correct directory.

* Solaris/SPARC uses an 8 kB page size, so a couple of cases where uses
  of VTV_PAGE_SIZE had been replaced with hardcoded values of 4096 had
  to be reverted.

* Inside libgcc, the vtv_*.c files are compiled with
  -finhibit-size-directive, whereas in libvtv that flag is absent.  This
  caused all testcases to fail due to a linker warning:

FAIL: libvtv.cc/bb_tests.cc -O0 -fvtable-verify=std (test for excess errors)
Excess errors:
ld: warning: symbol '_vtable_map_vars_end' has differing sizes:
(file 
/var/gcc/gcc-6.0.0-2015/12-gcc-vtv/sparc-sun-solaris2.12/./libvtv/.libs/libvtv.so
 value=0x1000; file /var/gcc/gcc-6.0.0-2015/12-gcc-vtv/gcc/vtv_end.o 
value=0x0);
/var/gcc/gcc-6.0.0-2015/12-gcc-vtv/gcc/vtv_end.o definition taken

* Like Cygwin, Solaris has no obstack functions in libc, so I'm now
  using a common conditional for that.

* libvtv requires constructor priority support and dl_iterate_phdr.  The
  former needs either a recent (Solaris 12 only so far) ld or gld, the
  latter came in Solaris 11 only.

* Unlike glibc systems, Solaris has no __fortify_fail in libc; some of
  this can probably be provided using libbacktrace, which I haven't yet
  done.

* It also lacks program_invocation_name, but the functionality can be
  provided via getexecname() instead.

* On Solaris 12/SPARC with Solaris as, the .vtable_map_vars section
  wouldn't be pagesize aligned (I'm still looking into how to fix this; it
  works out of the box for .bss), so the section length calculated in
  read_section_offset_and_length would be negative, leading to all sorts
  of havoc.  For the moment, I'm using gas instead to avoid this.

* The patch also fixes a number of typos noticed during testing.
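
To illustrate the page-size item in the list above: rounding a length
up to a page boundary gives different results for 4 kB and 8 kB pages,
which is why a hardcoded 4096 is not a safe substitute for
VTV_PAGE_SIZE.  The helper below is a generic sketch, not code from
libvtv.

```c
#include <assert.h>
#include <stddef.h>

/* Round N up to the next multiple of PAGE, which must be a power of
   two.  */
size_t
round_up_to_page (size_t n, size_t page)
{
  assert (page != 0 && (page & (page - 1)) == 0);
  return (n + page - 1) & ~(page - 1);
}
```

For a 5000-byte object this yields 8192 with either page size, but a
4097-byte object needs 8192 bytes either way while a 9000-byte object
needs 12288 bytes with 4 kB pages and 16384 bytes with 8 kB pages.  On
POSIX systems the actual page size can be queried at run time with
sysconf (_SC_PAGESIZE).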

With those changes, I get almost clean libvtv test results on
i386-pc-solaris2.12 (as/ld and gas/gld) and sparc-sun-solaris2.12
(gas/ld):

* i386-pc-solaris2.12:

=== libvtv tests ===


Running target unix

=== libvtv Summary for unix ===

# of expected passes		176

Running target unix/-m64
WARNING: program timed out.
FAIL: libvtv.mt.cc/register_set_pair_inserts_mt.cc -O0 -fvtable-verify=std 
-lpthread execution test
WARNING: program timed out.
FAIL: libvtv.mt.cc/register_set_pair_inserts_mt.cc -O2 -fvtable-verify=std 
-lpthread execution test

=== libvtv Summary for unix/-m64 ===

# of expected passes		174
# of unexpected failures	2

=== libvtv Summary ===

# of expected passes		350
# of unexpected failures	2

* sparc-sun-solaris2.12:

=== libvtv tests ===


Running target unix
WARNING: program timed out.
FAIL: libvtv.mt.cc/register_set_pair_inserts_mt.cc -O0 -fvtable-verify=std 
-lpthread execution test
WARNING: program timed out.
FAIL: libvtv.mt.cc/register_set_pair_mt.cc -O0 -fvtable-verify=std -lpthread 
execution test
WARNING: program timed out.
FAIL: libvtv.mt.cc/register_set_pair_inserts_mt.cc -O2 -fvtable-verify=std 
-lpthread execution test
WARNING: program timed out.
FAIL: libvtv.mt.cc/register_set_pair_mt.cc -O2 -fvtable-verify=std -lpthread 
execution test

=== libvtv Summary for unix ===

# of expected passes		172
# of unexpected failures	4

Running target unix/-m64
WARNING: program timed out.
FAIL: libvtv.mt.cc/register_set_pair_inserts_mt.cc -O0 -fvtable-verify=std 
-lpthread execution test
WARNING: program timed out.
FAIL: libvtv.mt.cc/register_set_pair_mt.cc -O0 -fvtable-verify=std -lpthread 
execution test
WARNING: program timed out.
FAIL: libvtv.mt.cc/register_set_pair_inserts_mt.cc -O2 -fvtable-verify=std 
-lpthread execution test
WARNING: program timed out.
FAIL: libvtv.mt.cc/register_set_pair_mt.cc -O2 -fvtable-verify=std -lpthread 
execution test

=== libvtv Summary for unix/-m64 ===

# of expected passes		172
# of unexpected failures	4

=== libvtv Summary ===

# of expected passes		344
# of unexpected failures	8

I'm still investigating what causes those timeouts; it seems to be a
scalability issue in libc.

While I realize that we are past stage1, maybe the fact that this patch
is for an off-by-default feature and well localized still could allow it
into mainline at this point?

Thanks.
Rainer


2015-08-20  Rainer Orth  

Re: [PATCH][ARM] PR 68143 Properly update memory offsets when expanding setmem

2015-11-17 Thread Ramana Radhakrishnan


On 06/11/15 10:46, Kyrill Tkachov wrote:
> Hi all,
> 
> In this wrong-code PR the vector setmem expansion and 
> arm_block_set_aligned_vect in particular
> use the wrong offset when calling adjust_automodify_address. In the attached 
> testcase during the
> initial zeroing out we get two V16QI stores, but they both are recorded by 
> adjust_automodify_address
> as modifying x+0 rather than x+0 and x+12 (the total size to be written is 
> 28).
> 
> This led to the scheduling pass moving the store from "x.g = 2;" to before 
> the zeroing stores.
> 
> This patch fixes the problem by keeping track of the offset to which stores 
> are emitted and
> passing it to adjust_automodify_address as appropriate.
> 
> From inspection I see arm_block_set_unaligned_vect also has this issue so I 
> performed the same
> fix in that function as well.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?
> 
> This bug appears on GCC 5 too and I'm currently testing this patch there.
> Ok to backport to GCC 5 as well?

> 
> Thanks,
> Kyrill
> 
> 2015-11-06  Kyrylo Tkachov  
> 
> PR target/68143
> * config/arm/arm.c (arm_block_set_unaligned_vect): Keep track of
> offset from dstbase and use it appropriately in
> adjust_automodify_address.
> (arm_block_set_aligned_vect): Likewise.
> 
> 2015-11-06  Kyrylo Tkachov  
> 
> PR target/68143
> * gcc.target/arm/pr68143_1.c: New test.

Sorry about the delay in reviewing this. There's nothing ARM-specific about 
this test - I'd just put this in gcc.c-torture/execute; there are enough 
auto-testers with NEON enabled that will show up issues if this starts failing.

Ok with that change.

Ramana

> 
> arm-setmem-offset.patch
> 
> 
> commit 78c6989a7af1df672ea227057180d79d717ed5f3
> Author: Kyrylo Tkachov 
> Date:   Wed Oct 28 17:29:18 2015 +
> 
> [ARM] Properly update memory offsets when expanding setmem
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 66e8afc..adf3143 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -29268,7 +29268,7 @@ arm_block_set_unaligned_vect (rtx dstbase,
>rtx (*gen_func) (rtx, rtx);
>machine_mode mode;
>unsigned HOST_WIDE_INT v = value;
> -
> +  unsigned int offset = 0;
>gcc_assert ((align & 0x3) != 0);
>nelt_v8 = GET_MODE_NUNITS (V8QImode);
>nelt_v16 = GET_MODE_NUNITS (V16QImode);
> @@ -29289,7 +29289,7 @@ arm_block_set_unaligned_vect (rtx dstbase,
>  return false;
>  
>dst = copy_addr_to_reg (XEXP (dstbase, 0));
> -  mem = adjust_automodify_address (dstbase, mode, dst, 0);
> +  mem = adjust_automodify_address (dstbase, mode, dst, offset);
>  
>v = sext_hwi (v, BITS_PER_WORD);
>val_elt = GEN_INT (v);
> @@ -29306,7 +29306,11 @@ arm_block_set_unaligned_vect (rtx dstbase,
>  {
>emit_insn ((*gen_func) (mem, reg));
>if (i + 2 * nelt_mode <= length)
> - emit_insn (gen_add2_insn (dst, GEN_INT (nelt_mode)));
> + {
> +   emit_insn (gen_add2_insn (dst, GEN_INT (nelt_mode)));
> +   offset += nelt_mode;
> +   mem = adjust_automodify_address (dstbase, mode, dst, offset);
> + }
>  }
>  
>/* If there are not less than nelt_v8 bytes leftover, we must be in
> @@ -29317,6 +29321,9 @@ arm_block_set_unaligned_vect (rtx dstbase,
>if (i + nelt_v8 < length)
>  {
>emit_insn (gen_add2_insn (dst, GEN_INT (length - i)));
> +  offset += length - i;
> +  mem = adjust_automodify_address (dstbase, mode, dst, offset);
> +
>/* We are shifting bytes back, set the alignment accordingly.  */
>if ((length & 1) != 0 && align >= 2)
>   set_mem_align (mem, BITS_PER_UNIT);
> @@ -29327,12 +29334,13 @@ arm_block_set_unaligned_vect (rtx dstbase,
>else if (i < length && i + nelt_v8 >= length)
>  {
>if (mode == V16QImode)
> - {
> -   reg = gen_lowpart (V8QImode, reg);
> -   mem = adjust_automodify_address (dstbase, V8QImode, dst, 0);
> - }
> + reg = gen_lowpart (V8QImode, reg);
> +
>emit_insn (gen_add2_insn (dst, GEN_INT ((length - i)
>                 + (nelt_mode - nelt_v8))));
> +  offset += (length - i) + (nelt_mode - nelt_v8);
> +  mem = adjust_automodify_address (dstbase, V8QImode, dst, offset);
> +
>/* We are shifting bytes back, set the alignment accordingly.  */
>if ((length & 1) != 0 && align >= 2)
>   set_mem_align (mem, BITS_PER_UNIT);
> @@ -29359,6 +29367,7 @@ arm_block_set_aligned_vect (rtx dstbase,
>rtx rval[MAX_VECT_LEN];
>machine_mode mode;
>unsigned HOST_WIDE_INT v = value;
> +  unsigned int offset = 0;
>  
>gcc_assert ((align & 0x3) == 0);
>nelt_v8 = GET_MODE_NUNITS (V8QImode);
> @@ -29390,14 +29399,15 @@ arm_block_set_aligned_vect (rtx dstbase,
>/* Handle first 16 bytes specially using vst1:v16qi instruction.  */
>if (mode == V16QImode)
>  {
> -  mem = 

Re: [PATCH, PR middle-end/68134] Reject scalar modes in default get_mask_mode hook

2015-11-17 Thread Bernd Schmidt

On 11/17/2015 12:49 PM, Ilya Enkovich wrote:

Default hook for get_mask_mode is supposed to return integer vector
modes.  This means it should reject scalar modes returned by
mode_for_vector.  Bootstrapped and regtested on
x86_64-unknown-linux-gnu, regtested on aarch64-unknown-linux-gnu.  OK
for trunk?


You didn't say what exactly fails if an integer mode is returned. I'm 
assuming it's build_truth_vector_type which can call make_vector_type 
with an integer mode.


The patch looks OK to me.


Bernd



Re: C++ PATCH to integrate c++-delayed-folding branch

2015-11-17 Thread Jason Merrill

On 11/17/2015 04:09 AM, Andreas Schwab wrote:

Can we please get trunk back to bootstrap land?


Which target isn't bootstrapping for you?

Jason




[PATCH] Improve comments in pass_tree_loop_init::execute

2015-11-17 Thread Tom de Vries

Hi,

this no-functional-changes patch improves comments in 
pass_tree_loop_init::execute.


For the discussion related to the comment for scev_initialize, see:
- https://gcc.gnu.org/ml/gcc-patches/2013-02/msg01127.html
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56426

OK for trunk?

Thanks,
- Tom

Improve comments in pass_tree_loop_init::execute

2015-11-17  Tom de Vries  

	* tree-ssa-loop.c (pass_tree_loop_init::execute): Improve comments.

---
 gcc/tree-ssa-loop.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 570406f..40df84f 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -276,12 +276,21 @@ public:
 unsigned int
 pass_tree_loop_init::execute (function *fun ATTRIBUTE_UNUSED)
 {
+  /* When processing a loop in the loop pipeline, we should be able to assert
+ that:
+   (loops_state_satisfies_p (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS
+	  | LOOP_CLOSED_SSA)
+	&& scev_initialized_p ())
+  */
+
   loop_optimizer_init (LOOPS_NORMAL
 		   | LOOPS_HAVE_RECORDED_EXITS);
   rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
 
-  /* We might discover new loops, e.g. when turning irreducible
- regions into reducible.  */
+  /* Note that we run scev_initialize here even if number_of_loops () <= 1.
+ Even if we have no real loops now, we might discover new loops while
+ executing the loop pipeline, e.g. when turning irreducible regions into
+ reducible, in which case we still would need scev to be initialized.  */
   scev_initialize ();
 
   return 0;


Re: [AArch64][PATCH 4/7] Add ACLE feature macro for ARMv8.1,Adv.SIMD instructions.

2015-11-17 Thread James Greenhalgh
On Tue, Oct 27, 2015 at 11:33:21AM +, James Greenhalgh wrote:
> On Fri, Oct 23, 2015 at 01:22:16PM +0100, Matthew Wahab wrote:
> > The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
> > sqrdmlah and sqrdmlsh. This patch adds the feature macro
> > __ARM_FEATURE_QRDMX to indicate the presence of these instructions,
> > generating it when the feature is available, as it is when
> > -march=armv8.1-a is selected.
> > 
> > Tested the series for aarch64-none-linux-gnu with native bootstrap and
> > make check on an ARMv8 architecture. Also tested aarch64-none-elf with
> > cross-compiled check-gcc on an ARMv8.1 emulator.
> > 
> > Ok for trunk?
> > Matthew
> 
> I don't see this macro documented in the versions of ACLE available from
> the ARM documentation sites, and googling doesn't show anything other
> than your patches. You don't explicitly mention anywhere in cover text for
> this series where these new features are (or will be?) documented.
> 
> Could you please write a more complete description of where these new
> macros and intrinsics come from and what they are intended to do? I would
> not like to accept them without some confidence that these names have
> been finalized, and I am nervous about having the best description of the
> behaviour of them be the GCC source code.

This macro and the intrinsics included in this patch set are as they will
appear in a future release of ACLE.

__ARM_FEATURE_QRDMX will be defined to 1 if the SQRDMLAH and SQRDMLSH
instructions are available.

The intrinsics added take this form for the non-lane intrinsics:

  int16x4_t vqrdmlah_s16 (int16x4_t a, int16x4_t b, int16x4_t c)
a -> Vd.4H, b -> Vn.4H, c-> Vm.4h
VQRDMLAH Vd.4H,Vn.4H,Vm.4H
Vd.4H -> result

And this form for the lane intrinsics:

  int16x4_t vqrdmlah_lane_s16 (int16x4_t a, int16x4_t b,
   int16x4_t v, const int lane)
a -> Vd.4H, b -> Vn.4H, v -> Vm.4h, 0 <= lane <= 3
VQRDMLAH Vd.4H,Vn.4H,Vm.H[lane]
Vd.4H -> result

Using the same syntax as is in the ARM Neon Intrinsics Reference [1].

These intrinsics are only available when __ARM_FEATURE_QRDMX is defined.
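
User code would therefore be expected to guard any use of the new
intrinsics with the feature macro.  The following is a sketch of that
pattern; the HAVE_QRDMX macro and the two function names are invented
for the example, and only __ARM_FEATURE_QRDMX and vqrdmlah_s16 come
from the description above.

```c
#include <assert.h>

#if defined (__ARM_FEATURE_QRDMX)
#include <arm_neon.h>
#define HAVE_QRDMX 1

/* Thin wrapper that is only compiled when the instructions exist.  */
int16x4_t
mla_high_s16 (int16x4_t a, int16x4_t b, int16x4_t c)
{
  return vqrdmlah_s16 (a, b, c);
}
#else
#define HAVE_QRDMX 0
#endif

/* Lets callers pick a fallback path when the feature is absent.  */
int
qrdmx_available (void)
{
  return HAVE_QRDMX;
}
```

When compiled without -march=armv8.1-a (or for another target) the
wrapper is simply not defined and qrdmx_available returns 0.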

With all that said...

This patch is OK, but please fix the ChangeLog entry:

> > * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add
> > ARM_FEATURE_QRDMX.

  s/ARM_FEATURE_QRDMX/__ARM_FEATURE_QRDMX/

Thanks,
James

---
[1]: 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
  



Re: [PATCH, PR middle-end/68134] Reject scalar modes in default get_mask_mode hook

2015-11-17 Thread Ilya Enkovich
2015-11-17 15:26 GMT+03:00 Bernd Schmidt :
> On 11/17/2015 12:49 PM, Ilya Enkovich wrote:
>>
>> Default hook for get_mask_mode is supposed to return integer vector
>> modes.  This means it should reject scalar modes returned by
>> mode_for_vector.  Bootstrapped and regtested on
>> x86_64-unknown-linux-gnu, regtested on aarch64-unknown-linux-gnu.  OK
>> for trunk?
>
>
> You didn't say what exactly fails if an integer mode is returned. I'm
> assuming it's build_truth_vector_type which can call make_vector_type with
> an integer mode.

With an integer mode we don't have such an instruction in the optab,
but we don't lower the operation either.

Ilya

>
> The patch looks OK to me.
>
>
> Bernd


[visium] Provide user-mode version of libraries

2015-11-17 Thread Eric Botcazou
This adds a user-mode set of multilibs to the visium-elf port.

Applied on the mainline.


2015-11-17  Eric Botcazou  

* config/visium/t-visium (MULTILIB_OPTIONS): Add muser-mode.
(MULTILIB_DIRNAMES): Adjust accordingly.

-- 
Eric Botcazou

Index: config/visium/t-visium
===
--- config/visium/t-visium	(revision 230453)
+++ config/visium/t-visium	(working copy)
@@ -19,5 +19,5 @@
 
 # The compiler defaults to -mcpu=gr5 but this may be overridden via --with-cpu
 # at configure time so the -mcpu setting must be symmetrical.
-MULTILIB_OPTIONS = mcpu=gr5/mcpu=gr6
-MULTILIB_DIRNAMES = gr5 gr6
+MULTILIB_OPTIONS = mcpu=gr5/mcpu=gr6 muser-mode
+MULTILIB_DIRNAMES = gr5 gr6 user


Re: [PATCH] Improve comments in pass_tree_loop_init::execute

2015-11-17 Thread Richard Biener
On Tue, 17 Nov 2015, Tom de Vries wrote:

> Hi,
> 
> this no-functional-changes patch improves comments in
> pass_tree_loop_init::execute.
> 
> For the discussion related to the comment for scev_initialize, see:
> - https://gcc.gnu.org/ml/gcc-patches/2013-02/msg01127.html
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56426
> 
> OK for trunk?

The comment about SCEV is no longer accurate as we gate pass_tree_loop
on having "real" loops.

Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves

2015-11-17 Thread Bernd Schmidt

On 11/17/2015 10:08 AM, Kyrill Tkachov wrote:

Yes, I had considered that as well. It should be equivalent. I didn't
use !reg_used_between_p because I thought
it'd be more expensive than checking reg_overlap_mentioned_p since we
must iterate over a number of instructions
and call reg_overlap_mentioned_p on each one. But I suppose this case is
rare enough that it wouldn't make any
measurable difference.

Would you prefer to use !reg_used_between_p here?


I would but apparently it doesn't work, so that's kind of neither here 
nor there.



The added comment could lead to some confusion since it's placed in
front of an existing if statement that also tests a different
condition. Also, if we go with your fix,


+  || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN
(cand->insn


Shouldn't this really be !rtx_equal_p?



Maybe, will it behave the right way if the two regs have different modes
or when subregs are involved?


It would return false, in which case we'll conservatively fail here. I 
think that's desirable?



Bernd


Re: [PATCH][GCC] Make stackalign test LTO proof

2015-11-17 Thread Bernd Schmidt

On 11/16/2015 04:48 PM, Andre Vieira wrote:

On 16/11/15 15:34, Joern Wolfgang Rennecke wrote:

I just happened to stumble on this problem with another port.
The volatile & test solution doesn't work, though.

What does work, however, is:

__asm__ ("" : : "" (dummy));


I can confirm that Joern's solution works for me too.


Ok to make that change.
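
For readers unfamiliar with the idiom: an empty asm statement that
takes a variable as an input operand makes the compiler treat the
variable as used, so it cannot be optimized away.  The sketch below
uses a common variant of that idiom (an "r" constraint on the address
plus a "memory" clobber) rather than the exact statement quoted above,
and the function name is made up.  It requires GNU C extended asm.

```c
#include <assert.h>

/* Keep a local variable alive across optimization.  */
int
keep_local_alive (void)
{
  int dummy = 42;

  /* Emits no instructions, but the compiler must assume that the asm
     may read memory through the address of DUMMY.  */
  __asm__ volatile ("" : : "r" (&dummy) : "memory");

  assert (dummy == 42);
  return dummy;
}
```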


Bernd


Re: [PATCH] Clarify that -fwrapv covers all signed arithmetic overflow

2015-11-17 Thread Paolo Bonzini


On 17/11/2015 13:58, Joseph Myers wrote:
>> > GCC's -fwrapv option does not affect code generation for shifts
>> > because currently GCC does not rely on the fact that certain
>> > signed shifts trigger undefined behavior.  However, the definition
>> > of signed arithmetic overflow does extend to shifts; it is only
>> > code generation that is limited to addition, subtraction and
>> > multiplication.
> It is part of the GNU C language, independent of -fwrapv, that shifts 
> where the shift amount is in the range [0, width - 1] are fully defined 
> (although ubsan will still sanitize those not defined in ISO C)

-fwrapv should probably disable those checks, just like it disables e.g.
"signed integer overflow: 1 + 2147483647 cannot be represented in type
'int'".

> - they are 
> considered to be defined in terms of bits, not integers, so overflow is 
> not a meaningful concept for them.

Can you suggest a wording for "if the GNU C language definition changes
[which, no matter how unlikely, is explicitly not ruled out by the
manual] -fwrapv will be extended to signed shifts, and shifts of
negative numbers would return A*2^B whenever the result fits in the type"?
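
As a rough illustration of "A*2^B whenever the result fits in the type", the following sketch computes the product in a wider type and reports whether it fits in int. The helper name shift_left_if_fits is hypothetical, and the code assumes long long is at least twice as wide as int; it is not how GCC implements shifts.

```c
#include <limits.h>

/* Hypothetical helper: compute A * 2^B and report whether it fits in
   int.  Returns 1 and stores the product in *OUT when it fits, 0
   otherwise.  The product is formed in long long so the check itself
   never overflows (assumes long long is at least twice as wide as
   int).  */
int
shift_left_if_fits (int a, int b, int *out)
{
  long long wide;

  /* Only shift amounts in [0, width - 1] are considered.  */
  if (b < 0 || b >= (int) (sizeof (int) * CHAR_BIT))
    return 0;

  wide = (long long) a * (1LL << b);
  if (wide < INT_MIN || wide > INT_MAX)
    return 0;

  *out = (int) wide;
  return 1;
}
```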

Paolo

> -fwrapv *should* however affect division and modulo -1, although it 
> doesn't at present (see bug 30484).


Re: [PATCH][ARM] PR 68143 Properly update memory offsets when expanding setmem

2015-11-17 Thread Kyrill Tkachov


On 17/11/15 12:58, Kyrill Tkachov wrote:

Hi Ramana,

On 17/11/15 12:02, Ramana Radhakrishnan wrote:


On 06/11/15 10:46, Kyrill Tkachov wrote:

Hi all,

In this wrong-code PR the vector setmem expansion and 
arm_block_set_aligned_vect in particular
use the wrong offset when calling adjust_automodify_address. In the attached 
testcase during the
initial zeroing out we get two V16QI stores, but they both are recorded by 
adjust_automodify_address
as modifying x+0 rather than x+0 and x+12 (the total size to be written is 28).

This led to the scheduling pass moving the store from "x.g = 2;" to before the 
zeroing stores.

This patch fixes the problem by keeping track of the offset to which stores are 
emitted and
passing it to adjust_automodify_address as appropriate.

 From inspection I see arm_block_set_unaligned_vect also has this issue so I 
performed the same
fix in that function as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

This bug appears on GCC 5 too and I'm currently testing this patch there.
Ok to backport to GCC 5 as well?
Thanks,
Kyrill

2015-11-06  Kyrylo Tkachov  

 PR target/68143
 * config/arm/arm.c (arm_block_set_unaligned_vect): Keep track of
 offset from dstbase and use it appropriately in
 adjust_automodify_address.
 (arm_block_set_aligned_vect): Likewise.

2015-11-06  Kyrylo Tkachov  

 PR target/68143
 * gcc.target/arm/pr68143_1.c: New test.

Sorry about the delay in reviewing this. There's nothing arm specific about 
this test - I'd just put this in gcc.c-torture/execute, there are enough 
auto-testers with neon on that will show up issues if this starts failing.


Thanks, will do. I was on the fence about whether this should go in torture.
I'll put it there.



For the record, here's what I committed with r230462.

2015-11-17  Kyrylo Tkachov  

PR target/68143
* config/arm/arm.c (arm_block_set_unaligned_vect): Keep track of
offset from dstbase and use it appropriately in
adjust_automodify_address.
(arm_block_set_aligned_vect): Likewise.

2015-11-17  Kyrylo Tkachov  

PR target/68143
* gcc.c-torture/execute/pr68143_1.c: New test.


Kyrill



Ok with that change.

Ramana



[PATCH] Clarify that -fwrapv covers all signed arithmetic overflow

2015-11-17 Thread Paolo Bonzini
GCC's -fwrapv option does not affect code generation for shifts
because currently GCC does not rely on the fact that certain
signed shifts trigger undefined behavior.  However, the definition
of signed arithmetic overflow does extend to shifts; it is only
code generation that is limited to addition, subtraction and
multiplication.

Make the documentation of -fwrapv consistent with the existing
text under -fstrict-overflow ("Using '-fwrapv' means that integer
signed overflow is fully defined: it wraps.").
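
To make "it wraps" concrete, here is a small sketch of the result that wrapping semantics give for signed addition, written so that it does not itself depend on -fwrapv. The helper name wrapping_add is hypothetical. The conversion back to int is implementation-defined in ISO C; the sketch assumes GCC's documented behaviour of reducing modulo 2^N.

```c
#include <limits.h>

/* Hypothetical helper: signed addition with two's-complement
   wraparound.  Unsigned arithmetic is defined to wrap modulo 2^N; the
   conversion of the result back to int is implementation-defined in
   ISO C, and this sketch assumes it reduces modulo 2^N as GCC
   documents.  */
int
wrapping_add (int a, int b)
{
  return (int) ((unsigned int) a + (unsigned int) b);
}
```

With this definition wrapping_add (INT_MAX, 1) yields INT_MIN instead of invoking undefined behaviour.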

Ok for trunk and the branches?

Paolo

* doc/invoke.texi (Optimize Options): Clarify the effect of -fwrapv.

Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 227511)
+++ doc/invoke.texi (working copy)
@@ -23705,9 +23705,10 @@
 @item -fwrapv
 @opindex fwrapv
 This option instructs the compiler to assume that signed arithmetic
-overflow of addition, subtraction and multiplication wraps around
-using twos-complement representation.  This flag enables some optimizations
-and disables others.  This option is enabled by default for the Java
+overflow wraps around using twos-complement representation.
+This flag affects code generation for addition, subtraction
+and multiplication, enabling some optimizations
+and disabling others.  This option is enabled by default for the Java
 front end, as required by the Java language specification.
 The options @option{-ftrapv} and @option{-fwrapv} override each other, so using
 @option{-ftrapv} @option{-fwrapv} on the command-line results in


Re: Aw: Re: TR1 Special Math

2015-11-17 Thread Szabolcs Nagy

On 17/11/15 02:00, Ed Smith-Rowland wrote:

On 11/16/2015 07:28 PM, Florian Goth wrote:

Any particular pointers how I can help in improving the implementation?



Immediately: I have a good patch with xfails where #include  should 
inject into namespace std.  That's
probably a one liner in the makefiles that's better done in tree.  That stuff 
kills me.
The values checking and NaN checking is very good.



there are several correctness bugs visible in the code.
(e.g. sinc(inf) returning nan instead of 0.)

so at least test all combinations of special numbers
(+-0, +-inf, qnan) and possibly a few other points,
including subnormal, small normal and large normal inputs;
this helped me catch corner-case bugs in musl math lib.

it would be nice to know something about the expected
accuracy of these functions (some of them i'd guess
to be hard to implement with low ulp errors).
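
a minimal sketch of the special-value handling meant above, assuming sinc(x) = sin(x)/x. the helper name sinc_special_case is hypothetical and this is not code from the patch under discussion; it only covers the special inputs and leaves ordinary arguments to the caller.

```c
#include <math.h>

/* Hypothetical helper: if X is a special input for sinc(x) = sin(x)/x,
   store the expected result in *RESULT and return 1; otherwise return
   0 and leave the ordinary computation to the caller.  */
int
sinc_special_case (double x, double *result)
{
  if (isnan (x))
    {
      *result = x;      /* NaN in, NaN out.  */
      return 1;
    }
  if (isinf (x))
    {
      *result = 0.0;    /* |sin| <= 1, so sin(x)/x tends to 0.  */
      return 1;
    }
  if (x == 0.0)
    {
      *result = 1.0;    /* Limit at zero; covers both +0 and -0.  */
      return 1;
    }
  return 0;
}
```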



Re: [PATCH][ARM] PR 68143 Properly update memory offsets when expanding setmem

2015-11-17 Thread Kyrill Tkachov

Hi Ramana,

On 17/11/15 12:02, Ramana Radhakrishnan wrote:


On 06/11/15 10:46, Kyrill Tkachov wrote:

Hi all,

In this wrong-code PR the vector setmem expansion and 
arm_block_set_aligned_vect in particular
use the wrong offset when calling adjust_automodify_address. In the attached 
testcase during the
initial zeroing out we get two V16QI stores, but they both are recorded by 
adjust_automodify_address
as modifying x+0 rather than x+0 and x+12 (the total size to be written is 28).

This led to the scheduling pass moving the store from "x.g = 2;" to before the 
zeroing stores.

This patch fixes the problem by keeping track of the offset to which stores are 
emitted and
passing it to adjust_automodify_address as appropriate.
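
To illustrate the offset bookkeeping, here is a toy model (not the arm.c code) that lists the byte offsets at which vector stores land when the final store is shifted back to end exactly at the block size. The function name store_offsets and its interface are hypothetical. For a 28-byte block and 16-byte stores it yields 0 and 12, matching the x+0 and x+12 described above.

```c
#include <stddef.h>

/* Hypothetical model, not GCC code: compute the byte offsets at which
   NELT-byte vector stores are emitted to cover LENGTH bytes.  Full
   stores advance by NELT; a final partial chunk is handled by shifting
   the last store back so that it ends exactly at LENGTH.  Returns the
   number of offsets written to OUT (at most MAX), or 0 if LENGTH is
   smaller than one vector.  */
size_t
store_offsets (size_t length, size_t nelt, size_t *out, size_t max)
{
  size_t n = 0;
  size_t offset = 0;

  if (nelt == 0 || length < nelt)
    return 0;

  while (offset + nelt <= length && n < max)
    {
      out[n++] = offset;
      offset += nelt;
    }

  if (offset < length && n < max)
    out[n++] = length - nelt;   /* Overlapping tail store.  */

  return n;
}
```

Each offset in the list is what the memory reference for that store has to describe; recording 0 for every store is what let the scheduler reorder a later store ahead of the zeroing.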

 From inspection I see arm_block_set_unaligned_vect also has this issue so I 
performed the same
fix in that function as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

This bug appears on GCC 5 too and I'm currently testing this patch there.
Ok to backport to GCC 5 as well?
Thanks,
Kyrill

2015-11-06  Kyrylo Tkachov  

 PR target/68143
 * config/arm/arm.c (arm_block_set_unaligned_vect): Keep track of
 offset from dstbase and use it appropriately in
 adjust_automodify_address.
 (arm_block_set_aligned_vect): Likewise.

2015-11-06  Kyrylo Tkachov  

 PR target/68143
 * gcc.target/arm/pr68143_1.c: New test.

Sorry about the delay in reviewing this. There's nothing arm specific about 
this test - I'd just put this in gcc.c-torture/execute, there are enough 
auto-testers with neon on that will show up issues if this starts failing.


Thanks, will do. I was on the fence about whether this should go in torture.
I'll put it there.

Kyrill



Ok with that change.

Ramana


arm-setmem-offset.patch


commit 78c6989a7af1df672ea227057180d79d717ed5f3
Author: Kyrylo Tkachov 
Date:   Wed Oct 28 17:29:18 2015 +

 [ARM] Properly update memory offsets when expanding setmem

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 66e8afc..adf3143 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29268,7 +29268,7 @@ arm_block_set_unaligned_vect (rtx dstbase,
rtx (*gen_func) (rtx, rtx);
machine_mode mode;
unsigned HOST_WIDE_INT v = value;
-
+  unsigned int offset = 0;
gcc_assert ((align & 0x3) != 0);
nelt_v8 = GET_MODE_NUNITS (V8QImode);
nelt_v16 = GET_MODE_NUNITS (V16QImode);
@@ -29289,7 +29289,7 @@ arm_block_set_unaligned_vect (rtx dstbase,
  return false;
  
dst = copy_addr_to_reg (XEXP (dstbase, 0));

-  mem = adjust_automodify_address (dstbase, mode, dst, 0);
+  mem = adjust_automodify_address (dstbase, mode, dst, offset);
  
v = sext_hwi (v, BITS_PER_WORD);

val_elt = GEN_INT (v);
@@ -29306,7 +29306,11 @@ arm_block_set_unaligned_vect (rtx dstbase,
  {
emit_insn ((*gen_func) (mem, reg));
if (i + 2 * nelt_mode <= length)
-   emit_insn (gen_add2_insn (dst, GEN_INT (nelt_mode)));
+   {
+ emit_insn (gen_add2_insn (dst, GEN_INT (nelt_mode)));
+ offset += nelt_mode;
+ mem = adjust_automodify_address (dstbase, mode, dst, offset);
+   }
  }
  
/* If there are not less than nelt_v8 bytes leftover, we must be in

@@ -29317,6 +29321,9 @@ arm_block_set_unaligned_vect (rtx dstbase,
if (i + nelt_v8 < length)
  {
emit_insn (gen_add2_insn (dst, GEN_INT (length - i)));
+  offset += length - i;
+  mem = adjust_automodify_address (dstbase, mode, dst, offset);
+
/* We are shifting bytes back, set the alignment accordingly.  */
if ((length & 1) != 0 && align >= 2)
set_mem_align (mem, BITS_PER_UNIT);
@@ -29327,12 +29334,13 @@ arm_block_set_unaligned_vect (rtx dstbase,
else if (i < length && i + nelt_v8 >= length)
  {
if (mode == V16QImode)
-   {
- reg = gen_lowpart (V8QImode, reg);
- mem = adjust_automodify_address (dstbase, V8QImode, dst, 0);
-   }
+   reg = gen_lowpart (V8QImode, reg);
+
emit_insn (gen_add2_insn (dst, GEN_INT ((length - i)
  + (nelt_mode - nelt_v8;
+  offset += (length - i) + (nelt_mode - nelt_v8);
+  mem = adjust_automodify_address (dstbase, V8QImode, dst, offset);
+
/* We are shifting bytes back, set the alignment accordingly.  */
if ((length & 1) != 0 && align >= 2)
set_mem_align (mem, BITS_PER_UNIT);
@@ -29359,6 +29367,7 @@ arm_block_set_aligned_vect (rtx dstbase,
rtx rval[MAX_VECT_LEN];
machine_mode mode;
unsigned HOST_WIDE_INT v = value;
+  unsigned int offset = 0;
  
gcc_assert ((align & 0x3) == 0);

nelt_v8 = GET_MODE_NUNITS (V8QImode);
@@ -29390,14 +29399,15 @@ arm_block_set_aligned_vect (rtx dstbase,
/* Handle first 16 bytes specially using vst1:v16qi instruction.  */
if (mode == V16QImode)
  {
-  mem = 

Re: [PATCH] Clarify that -fwrapv covers all signed arithmetic overflow

2015-11-17 Thread Joseph Myers
On Tue, 17 Nov 2015, Paolo Bonzini wrote:

> * it doesn't promise that GCC will never rely on undefined behavior
> rules for signed left shifts

I think we should remove the ", but this is subject to change" in 
implement-c.texi (while replacing it with noting that ubsan will still 
diagnose such cases, and they will also be diagnosed where constant 
expressions are required).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.

2015-11-17 Thread David Edelsohn
Kirill,

* c-c++-common/attr-simd.c
and
* c-c++-common/attr-simd-3.c

fail on 32 bit systems, e.g., see powerpc64-linux tested in 32 bit mode.

- David


[PATCH] PR fortran/43996 -- Too large array constructor in SPREAD

2015-11-17 Thread Steve Kargl
The attached patch fixes an issue with SPREAD and the PARAMETER
attribute when an array constructor is too large for expansion.
gfortran now issues an error message and points to the
-fmax-array-constructor option.

Patch built on i386-*-freebsd and x86_64-*-freebsd.  There are
no regressions.  OK to commit?
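
For illustration only, the size test can be modelled as below. The helper name constructor_too_large is hypothetical and this is not the code in simplify.c; it divides instead of multiplying so that the comparison cannot overflow. With the 720 x 360 array from the new testcase the element count is 259200, which exceeds a limit of 65535 (the default for -fmax-array-constructor, as far as I know).

```c
/* Hypothetical model of the size test: returns 1 when SIZE * NCOPIES
   exceeds LIMIT, 0 otherwise.  Dividing instead of multiplying keeps
   the comparison free of signed overflow.  Assumes all three arguments
   are non-negative.  */
int
constructor_too_large (long size, long ncopies, long limit)
{
  if (size == 0 || ncopies == 0)
    return 0;
  /* For positive integers, size * ncopies > limit holds exactly when
     size > limit / ncopies (integer division).  */
  return size > limit / ncopies;
}
```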

2015-11-17  Steven G. Kargl  

PR fortran/43996
* simplify.c (gfc_simplify_spread): Issue error for too large array 
constructor in a PARAMETER statement.

2015-11-17  Steven G. Kargl  

PR fortran/43996
* gfortran.dg/pr43996.f90

-- 
Steve
Index: gcc/fortran/simplify.c
===
--- gcc/fortran/simplify.c	(revision 230463)
+++ gcc/fortran/simplify.c	(working copy)
@@ -5991,8 +5991,8 @@ gfc_simplify_spacing (gfc_expr *x)
 gfc_expr *
 gfc_simplify_spread (gfc_expr *source, gfc_expr *dim_expr, gfc_expr *ncopies_expr)
 {
-  gfc_expr *result = 0L;
-  int i, j, dim, ncopies;
+  gfc_expr *result = NULL;
+  int nelem, i, j, dim, ncopies;
   mpz_t size;
 
   if ((!gfc_is_constant_expr (source)
@@ -6019,8 +6019,20 @@ gfc_simplify_spread (gfc_expr *source, g
   else
 mpz_init_set_ui (size, 1);
 
-  if (mpz_get_si (size)*ncopies > flag_max_array_constructor)
-return NULL;
+  nelem = mpz_get_si (size) * ncopies;
+  if (nelem > flag_max_array_constructor)
+{
+  if (gfc_current_ns->sym_root->n.sym->attr.flavor == FL_PARAMETER)
+	{
+	  gfc_error ("The number of elements (%d) in the array constructor "
+		 "at %L requires an increase of the allowed %d upper "
+		 "limit.  See %<-fmax-array-constructor%> option.",
+		 nelem, >where, flag_max_array_constructor);
+	  return _bad_expr;
+	}
+  else
+	return NULL;
+}
 
   if (source->expr_type == EXPR_CONSTANT)
 {
Index: gcc/testsuite/gfortran.dg/pr43996.f90
===
--- gcc/testsuite/gfortran.dg/pr43996.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr43996.f90	(working copy)
@@ -0,0 +1,7 @@
+! { dg-do compile }
+! PR fortran/43996
+!
+real, parameter :: a(720,360) = spread((/(j, j=1,720) /), dim=2, ncopies=360) ! { dg-error "number of elements" }
+real x
+x = a(720,360)
+end


[PATCH] g++.dg/cpp1y/pr58708.C wchar_t size

2015-11-17 Thread David Edelsohn
The testcase in the GCC testsuite assumes that wchar_t is 32 bits,
which is not correct on AIX.  32 bit AIX maintains 16 bit wchar_t for
backward compatibility (64 bit AIX uses 32 bit wchar_t).

What is the preferred method to make the testcase safe for smaller wchar_t?

The following patch works for me.  I wasn't sure what header file and
what macro test would be considered portable.  I could include
stdint.h and compare

WCHAR_MAX == UINT16_MAX

or

WCHAR_MAX < UINT32_MAX

Thanks, David

Index: pr58708.C
===
--- pr58708.C   (revision 230463)
+++ pr58708.C   (working copy)
@@ -1,5 +1,7 @@
 // { dg-do run { target c++14 } }

+#include 
+
 template
   struct is_same
   {
@@ -43,7 +45,11 @@
   if (foo.chars[1] != 98) __builtin_abort();
   if (foo.chars[2] != 99) __builtin_abort();

-  auto wfoo = L"\x01020304\x05060708"_foo;
+#if WCHAR_MAX == 65535
+auto wfoo = L"\x0102\x0304"_foo;
+#else
+auto wfoo = L"\x01020304\x05060708"_foo;
+#endif
   if (is_same::value != true)
__builtin_abort();
   if (sizeof(wfoo.chars)/sizeof(wchar_t) != 2) __builtin_abort();
   if (wfoo.chars[0] != 16909060) __builtin_abort();


Re: [PATCH] g++.dg/cpp1y/pr58708.C wchar_t size

2015-11-17 Thread Jonathan Wakely
On 17 November 2015 at 16:04, David Edelsohn wrote:
> The testcase in the GCC testsuite assumes that wchar_t is 32 bits,
> which is not correct on AIX.  32 bit AIX maintains 16 bit wchar_t for
> backward compatibility (64 bit AIX uses 32 bit wchar_t).
>
> What is the preferred method to make the testcase safe for smaller wchar_t?
>
> The following patch works for me.  I wasn't sure what header file and
> what macro test would be considered portable.  I could include
> stdint.h and compare
>
> WCHAR_MAX == UINT16_MAX
>
> or
>
> WCHAR_MAX < UINT32_MAX

__SIZEOF_WCHAR_T__ is always pre-defined by the compiler, so that
could be used.
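
A sketch of how that macro could drive the choice of literal, assuming a GCC-compatible compiler that predefines __SIZEOF_WCHAR_T__. The function names are hypothetical and this is plain C rather than the C++14 testcase.

```c
#include <stddef.h>

/* Width of wchar_t in bytes as reported by the predefined macro.
   __SIZEOF_WCHAR_T__ is predefined by GCC; other compilers may not
   provide it.  */
int
wchar_bytes (void)
{
  return __SIZEOF_WCHAR_T__;
}

/* Pick a two-character wide literal whose escapes fit in wchar_t, in
   the spirit of the proposed testcase change.  */
const wchar_t *
two_wide_chars (void)
{
#if __SIZEOF_WCHAR_T__ == 2
  return L"\x0102\x0304";
#else
  return L"\x01020304\x05060708";
#endif
}
```

Unlike WCHAR_MAX, this needs no header for the preprocessor test.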


Re: [gomp4, ptx] worker & gang complex double reductions

2015-11-17 Thread Nathan Sidwell

On 11/16/15 17:07, Nathan Sidwell wrote:

I've committed this patch to the gomp4 branch.  It adds support for worker and
gang level complex double reductions.


I was unsatisfied with that approach, so I've separated the two mechanisms into 
different functions with the attached patch.  The locking scheme returns to the 
early variant (but using the cmp builtin):


  while (cmp (, 0, 1)
continue;
  T accum = *ptr;
  accum = accum OP myval;
  *ptr = accum
  cmp (, 1, 0);

A new dispatcher function decides which approach to take, and that's where we 
can add atomic optimization smarts and the like.
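
For comparison, here is a host-side sketch of the same acquire / update / release shape using C11 atomics. It is only an analogy: the names cplx, lock_word and locked_add are hypothetical, a two-double struct stands in for complex double, and nothing here reflects how the nvptx builtins are expanded.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a complex double accumulator.  */
struct cplx { double re, im; };

static atomic_uint lock_word;   /* 0 = unlocked, 1 = locked.  */

/* Add MYVAL into *PTR under a spin lock built from compare-and-swap,
   mirroring the acquire / update / release shape of the pseudocode.  */
void
locked_add (struct cplx *ptr, struct cplx myval)
{
  unsigned int expected = 0;

  /* Acquire: spin until we swap 0 -> 1.  A failed exchange overwrites
     EXPECTED with the current value, so reset it each time.  */
  while (!atomic_compare_exchange_weak (&lock_word, &expected, 1u))
    expected = 0;

  struct cplx accum = *ptr;
  accum.re += myval.re;
  accum.im += myval.im;
  *ptr = accum;

  /* Release: put the lock back to 0.  */
  atomic_store (&lock_word, 0u);
}
```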


nathan

2015-11-17  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_lockless_update): Remove complex
	double handling here ...
	(nvptx_lockfull_update): ... to this new function.
	(nvptx_reduction_update): New function.
	(nvptx_goacc_reduction_fini): Call it.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 230463)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -4354,11 +4354,10 @@ nvptx_global_lock_addr ()
write = guess OP myval;
actual = cmp (ptr, guess, write)
  } while (actual bit-differnt-to guess);
+   return write;
 
-  Unfortunately for types larger than 64 bits, there is no cmp
-  instruction.  We use a lock variable in global memory to synthesize
-  the above sequence.  (A lock in global memory is necessary to force
-  execution engine descheduling and avoid resource starvation.)  */
+   This relies on a cmp instruction, which is available for 32-
+   and 64-bit types.  Larger types must use a locking scheme.  */
 
 static tree
 nvptx_lockless_update (location_t loc, gimple_stmt_iterator *gsi,
@@ -4368,20 +4367,9 @@ nvptx_lockless_update (location_t loc, g
   tree_code code = NOP_EXPR;
   tree arg_type = unsigned_type_node;
   tree var_type = TREE_TYPE (var);
-  tree dest_type = var_type;
-  tree inner_type = NULL_TREE; /* Non-null if synthesizing cmp */
 
-  if (TREE_CODE (var_type) == COMPLEX_TYPE)
-{
-  if (TYPE_SIZE (TREE_TYPE (var_type))
-	  == TYPE_SIZE (long_long_unsigned_type_node))
-	/* Must do by parts.  */
-	var_type = TREE_TYPE (var_type);
-  else
-	code = VIEW_CONVERT_EXPR;
-}
-
-  if (TREE_CODE (var_type) == REAL_TYPE)
+  if (TREE_CODE (var_type) == COMPLEX_TYPE
+  || TREE_CODE (var_type) == REAL_TYPE)
 code = VIEW_CONVERT_EXPR;
 
   if (TYPE_SIZE (var_type) == TYPE_SIZE (long_long_unsigned_type_node))
@@ -4390,31 +4378,21 @@ nvptx_lockless_update (location_t loc, g
   fn = NVPTX_BUILTIN_CMP_SWAPLL;
 }
 
-  if (var_type != dest_type)
-{
-  inner_type = arg_type;
-  arg_type = dest_type;
-  /* We use the cmp insn to do the global locking.  */
-  fn = NVPTX_BUILTIN_CMP_SWAP;
-}
-
   tree swap_fn = nvptx_builtin_decl (fn, true);
 
-  /* Build and insert the initialization sequence.  */
   gimple_seq init_seq = NULL;
   tree init_var = make_ssa_name (arg_type);
-  tree init_expr = omp_reduction_init_op (loc, op, dest_type);
-  if (arg_type != dest_type)
-init_expr = fold_build1 (code, arg_type, init_expr);
+  tree init_expr = omp_reduction_init_op (loc, op, var_type);
+  init_expr = fold_build1 (code, arg_type, init_expr);
   gimplify_assign (init_var, init_expr, _seq);
   gimple *init_end = gimple_seq_last (init_seq);
 
   gsi_insert_seq_before (gsi, init_seq, GSI_SAME_STMT);
-
+  
   /* Split the block just after the init stmts.  */
   basic_block pre_bb = gsi_bb (*gsi);
   edge pre_edge = split_block (pre_bb, init_end);
-  basic_block head_bb = pre_edge->dest;
+  basic_block loop_bb = pre_edge->dest;
   pre_bb = pre_edge->src;
   /* Reset the iterator.  */
   *gsi = gsi_for_stmt (gsi_stmt (*gsi));
@@ -4422,179 +4400,166 @@ nvptx_lockless_update (location_t loc, g
   tree expect_var = make_ssa_name (arg_type);
   tree actual_var = make_ssa_name (arg_type);
   tree write_var = make_ssa_name (arg_type);
-  tree lock_state = NULL_TREE;
-  tree uns_unlocked = NULL_TREE, uns_locked = NULL_TREE;
-
+  
   /* Build and insert the reduction calculation.  */
   gimple_seq red_seq = NULL;
-  if (inner_type)
-{
-  /* Unlock the lock using cmp with an appropriate expected
-	 value.  This ends up with us unlocking only on subsequent
-	 iterations.  */
-  lock_state = make_ssa_name (unsigned_type_node);
-  uns_unlocked = build_int_cst (unsigned_type_node, 0);
-  uns_locked = build_int_cst (unsigned_type_node, 1);
-  
-  tree unlock_expr = nvptx_global_lock_addr ();
-  unlock_expr = build_call_expr_loc (loc, swap_fn, 3, unlock_expr,
-	 lock_state, uns_unlocked);
-  gimplify_and_add (unlock_expr, _seq);
-}
-
-  tree write_expr = expect_var;
-  if (arg_type != dest_type)
-write_expr = fold_build1 (code, dest_type, expect_var);
-  write_expr = fold_build2 (op, dest_type, write_expr, var);
-  if (arg_type != dest_type)
-write_expr = fold_build1 (code, arg_type, 

Re: [PATCH][GCC] Make stackalign test LTO proof

2015-11-17 Thread Andre Vieira

On 17/11/15 12:29, Bernd Schmidt wrote:

On 11/16/2015 04:48 PM, Andre Vieira wrote:

On 16/11/15 15:34, Joern Wolfgang Rennecke wrote:

I just happened to stumble on this problem with another port.
The volatile & test solution doesn't work, though.

What does work, however, is:

__asm__ ("" : : "" (dummy));


I can confirm that Joern's solution works for me too.


Ok to make that change.


Bernd


OK, Joern will you submit a patch for this or shall I?

Cheers,
Andre



Re: [PATCH] Add LANG_HOOKS_EMPTY_RECORD_P for C++ empty class

2015-11-17 Thread Richard Biener
On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu  wrote:
> Empty record should be returned and passed the same way in C and C++.
> This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which
> defaults to return false.  For C++, LANG_HOOKS_EMPTY_RECORD_P is defined
> to is_really_empty_class, which returns true for C++ empty classes.  For
> LTO, we stream out a bit to indicate if a record is empty and we store
> it in TYPE_LANG_FLAG_0 when streaming in.  get_ref_base_and_extent is
> changed to set bitsize to 0 for empty records.  Middle-end and x86
> backend are updated to ignore empty records for parameter passing and
> function value return.  Other targets may need similar changes.

Please avoid a new langhook for this and instead claim a bit in tree_type_common
like for example restrict_flag (double-check it is unused for non-pointers).

I don't like that you need to modify targets - those checks should be done
in the caller (which may just use a new wrapper with the logic and then
dispatching to the actual hook).
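
A toy sketch of that structure, with the empty-record test done once in a caller-side wrapper before dispatching to the target hook. All names and types here are hypothetical stand-ins and not GCC's real hooks or tree nodes.

```c
/* Hypothetical stand-ins; none of these are GCC's real types or
   hooks.  */
struct type_info
{
  unsigned empty_record : 1;   /* The bit claimed in the type node.  */
  unsigned size_bytes;
};

typedef int (*arg_regs_hook) (const struct type_info *);

/* Caller-side wrapper: handle empty records once, then dispatch to the
   target hook for everything else.  Returns the number of registers
   used, 0 for an empty record.  */
int
arg_regs_needed (const struct type_info *type, arg_regs_hook target_hook)
{
  if (type->empty_record)
    return 0;                  /* The hook never sees empty records.  */
  return target_hook (type);
}

/* Example target hook: one register per 8 bytes, rounded up.  */
int
toy_target_hook (const struct type_info *type)
{
  return (int) ((type->size_bytes + 7) / 8);
}
```

With this shape a target needs no change at all to get the empty-record behaviour.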

Why do you need to adjust get_ref_base_and_extent?

Thanks,
Richard.

> gcc/
>
> PR c++/60336
> PR middle-end/67239
> PR target/68355
> * calls.c (store_one_arg): Use 0 for empty record size.  Don't
> push 0 size argument onto stack.
> (must_pass_in_stack_var_size_or_pad): Return false for empty
> record.
> * function.c (locate_and_pad_parm): Use 0 for empty record size.
> * tree-dfa.c (get_ref_base_and_extent): Likewise.
> * langhooks-def.h (LANG_HOOKS_EMPTY_RECORD_P): New.
> (LANG_HOOKS_DECLS): Add LANG_HOOKS_EMPTY_RECORD_P.
> * langhooks.h (lang_hooks_for_decls): Add empty_record_p.
> * lto-streamer.h (LTO_major_version): Increase by 1 to 6.
> * targhooks.c: Include "langhooks.h".
> (std_gimplify_va_arg_expr): Use 0 for empty record size.
> * tree-streamer-in.c (unpack_ts_base_value_fields): Stream in
> TYPE_LANG_FLAG_0.
> * tree-streamer-out.c: Include "langhooks.h".
> (pack_ts_base_value_fields): Stream out a bit to indicate if a
> record is empty.
> * config/i386/i386.c (classify_argument): Return 0 for empty
> record.
> (construct_container): Return NULL for empty record.
> (ix86_function_arg): Likewise.
> (ix86_function_arg_advance): Skip empty record.
> (ix86_return_in_memory): Return false for empty record.
> (ix86_gimplify_va_arg): Use 0 for empty record size.
>
> gcc/cp/
>
> PR c++/60336
> PR middle-end/67239
> PR target/68355
> * class.c (is_empty_class): Changed to return bool and take
> const_tree.
> (is_really_empty_class): Changed to take const_tree.  Check
> if TYPE_BINFO is zero.
> * cp-tree.h (is_empty_class): Updated.
> (is_really_empty_class): Likewise.
> * cp-lang.c (LANG_HOOKS_EMPTY_RECORD_P): New.
>
> gcc/lto/
>
> PR c++/60336
> PR middle-end/67239
> PR target/68355
> * lto-lang.c (lto_empty_record_p): New.
> (LANG_HOOKS_EMPTY_RECORD_P): Likewise.
>
> gcc/testsuite/
>
> PR c++/60336
> PR middle-end/67239
> PR target/68355
> * g++.dg/abi/empty12.C: New test.
> * g++.dg/abi/empty12.h: Likewise.
> * g++.dg/abi/empty12a.c: Likewise.
> * g++.dg/pr60336-1.C: Likewise.
> * g++.dg/pr60336-2.C: Likewise.
> * g++.dg/pr68355.C: Likewise.
> ---
>  gcc/calls.c | 41 
> +++--
>  gcc/config/i386/i386.c  | 18 +++-
>  gcc/cp/class.c  | 17 ---
>  gcc/cp/cp-lang.c|  2 ++
>  gcc/cp/cp-tree.h|  4 ++--
>  gcc/function.c  |  7 +--
>  gcc/langhooks-def.h |  2 ++
>  gcc/langhooks.h |  3 +++
>  gcc/lto-streamer.h  |  2 +-
>  gcc/lto/lto-lang.c  | 13 
>  gcc/targhooks.c |  6 +-
>  gcc/testsuite/g++.dg/abi/empty12.C  | 17 +++
>  gcc/testsuite/g++.dg/abi/empty12.h  |  9 
>  gcc/testsuite/g++.dg/abi/empty12a.c |  6 ++
>  gcc/testsuite/g++.dg/pr60336-1.C| 17 +++
>  gcc/testsuite/g++.dg/pr60336-2.C| 28 +
>  gcc/testsuite/g++.dg/pr68355.C  | 24 ++
>  gcc/tree-dfa.c  |  2 ++
>  gcc/tree-streamer-in.c  |  5 +
>  gcc/tree-streamer-out.c |  6 ++
>  20 files changed, 204 insertions(+), 25 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/abi/empty12.C
>  create mode 100644 gcc/testsuite/g++.dg/abi/empty12.h
>  create mode 100644 gcc/testsuite/g++.dg/abi/empty12a.c
>  create mode 100644 gcc/testsuite/g++.dg/pr60336-1.C
>  create mode 100644 

Re: [PATCH][RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves

2015-11-17 Thread Kyrill Tkachov


On 17/11/15 12:10, Bernd Schmidt wrote:

On 11/17/2015 10:08 AM, Kyrill Tkachov wrote:

Yes, I had considered that as well. It should be equivalent. I didn't
use !reg_used_between_p because I thought
it'd be more expensive than checking reg_overlap_mentioned_p since we
must iterate over a number of instructions
and call reg_overlap_mentioned_p on each one. But I suppose this case is
rare enough that it wouldn't make any
measurable difference.

Would you prefer to use !reg_used_between_p here?


I would but apparently it doesn't work, so that's kind of neither here nor 
there.


The added comment could lead to some confusion since it's placed in
front of an existing if statement that also tests a different
condition. Also, if we go with your fix,


+  || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN
(cand->insn


Shouldn't this really be !rtx_equal_p?



Maybe, will it behave the right way if the two regs have different modes
or when subregs are involved?


It would return false, in which case we'll conservatively fail here. I think 
that's desirable?



Well, I think the statement we want to make is
"return false from this function if the two expressions contain the same register 
number".
if (!rtx_equal_p (..., ...))
  return false;

will only return false if the two expressions are the same REG with the same 
mode.
if (!reg_overlap_mentioned_p (..., ...))
  return false;

should return false even if the modes are different or one is a subreg, which 
is what we want.

I did not see any codegen regressions using reg_overlap_mentioned_p on aarch64, 
so I don't think
it will restrict any legitimate cases.
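
To spell out the distinction, here is a toy model of the two tests. It is not the implementation of rtx_equal_p or reg_overlap_mentioned_p; the struct and function names are hypothetical, and a register reference is simplified to a starting register number, a register count and a mode.

```c
/* Hypothetical model: a register reference covers NREGS consecutive
   hard registers starting at REGNO, viewed in mode MODE.  */
struct reg_ref { unsigned regno; unsigned nregs; int mode; };

/* Strict equality: same register, same width, same mode.  */
int
refs_equal (struct reg_ref a, struct reg_ref b)
{
  return a.regno == b.regno && a.nregs == b.nregs && a.mode == b.mode;
}

/* Overlap: the two register ranges share at least one register,
   whatever the modes are.  */
int
refs_overlap (struct reg_ref a, struct reg_ref b)
{
  return a.regno < b.regno + b.nregs && b.regno < a.regno + a.nregs;
}
```

In this model the same register viewed in two different modes is not equal but does overlap, which is the case the overlap test is meant to catch.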

Thanks,
Kyrill



Bernd





Re: [PATCH] Fix uninitialized src_range within c_expr (Re: libcpp/C FE source range patch committed (r230331))

2015-11-17 Thread David Malcolm
On Mon, 2015-11-16 at 22:34 +0100, Bernd Schmidt wrote:
> On 11/16/2015 09:50 PM, David Malcolm wrote:
> > The root cause is uninitialized data.  Specifically, the C parser's
> > struct c_expr gained a "src_range" field, and it turns out there are a
> > few places where I wasn't initializing this when returning c_expr
> > instances on the stack, and in some cases the values could get used.
> 
> > I'm working on a followup to fix the remaining places I identified via
> > review of the source.
> 
> The patch is mostly OK IMO and should be installed to fix the problems, 

I'm attaching two followup patches.

The patch as is introduces some ICEs due to accessing EXPR_LOCATION ()
of a c_expr's "value" field, for some cases where value is NULL.  The
first attached patch bulletproofs both implementations of
set_c_expr_source_range for this case.

I've successfully bootstrapped and regression-tested the combination of
this plus the previous patch on x86_64-pc-linux-gnu; I've also got a
bootstrap ongoing on powerpc-ibm-aix7.1.3.0.

> but I think there are a few more things to consider.
> 
> Should c_expr perhaps acquire a constructor so that this problem is 
> avoided in the future? The whole thing seems somewhat error-prone.

I agree that it's error prone, and the ctor approach is what I've been
trying for the C++ FE [1] but I suspect that touching that in the C FE
would be a much more invasive patch (unless we simply give it a default
ctor that makes the src_range be a pair of UNKNOWN_LOCATIONS?).  I'll
give it a go, but it feels like a separate followup.
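
A miniature of the two ideas under discussion: one function that always initializes the range (the role a default ctor would play), and a setter that tolerates a NULL value. It is written as plain C with hypothetical names and types; it is not the c_expr definition.

```c
/* Hypothetical miniature of the parser types; names are
   illustrative.  */
typedef unsigned int loc_t;
#define UNKNOWN_LOC 0u

struct src_range { loc_t start, finish; };
struct node { struct src_range range; };
struct expr { struct node *value; struct src_range src_range; };

/* Single place that builds an expr, so src_range is never left
   uninitialized.  */
struct expr
make_expr (struct node *value)
{
  struct expr e;

  e.value = value;
  e.src_range.start = UNKNOWN_LOC;
  e.src_range.finish = UNKNOWN_LOC;
  return e;
}

/* Record the range on the expr and, when there is one, on its
   value.  */
void
set_expr_range (struct expr *e, loc_t start, loc_t finish)
{
  e->src_range.start = start;
  e->src_range.finish = finish;
  if (e->value)
    {
      e->value->range.start = start;
      e->value->range.finish = finish;
    }
}
```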

> > @@ -4278,9 +4278,11 @@ c_parser_braced_init (c_parser *parser, tree type, 
> > bool nested_p)
> > obstack_free (_init_obstack, NULL);
> > return ret;
> >   }
> > +  location_t close_loc = c_parser_peek_token (parser)->location;
> 
> It looks like we're peeking the token twice here (via a 
> c_parser_token_is_not call above the quoted code). Probably not too 
> expensive but maybe we can avoid it.

Thanks; I'm also attaching a patch that does so (not yet bootstrapped,
but will do so).


> > case RID_VA_ARG:
> > - c_parser_consume_token (parser);
> > + {
> > +   location_t start_loc = loc;
> 
> Does this really have to be indented in an extra braced block? Please 
> fix if not.

This case gains a pair of locals: start_loc and end_loc (so that we can
track the spelling range whilst retaining the "loc" used for the caret),
and I preferred to confine their scope to within the case, hence the
extra braced block.  Omitting the braced block leads to:
../../src/gcc/c/c-parser.c:7494:7: error: jump to case label [-fpermissive]
  case RID_OFFSETOF:
   ^
../../src/gcc/c/c-parser.c:7472:17: error:   crosses initialization of 
‘location_t end_loc’
  location_t end_loc = c_parser_peek_token (parser)->get_finish ();
 ^
etc.  I could fix that by moving the locals to the top of the function,
but that seems messy, so it seemed best to add the braces (and hence
indent).
Hope that sounds like the right trade-off.
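
The diagnostic can be reproduced outside GCC.  The following is a minimal
standalone sketch, not the c-parser.c code: the enum values, the function
name range_width and the plain int "locations" are hypothetical stand-ins.
It shows the braced form, which compiles; dropping the inner braces while
keeping the initialized locals makes the jump to the second case label
ill-formed in C++.

```cpp
#include <cassert>

// Hypothetical stand-ins for the keyword codes; not the real GCC enum.
enum rid_sketch { SKETCH_VA_ARG, SKETCH_OFFSETOF };

// Returns the width of a made-up "spelling range" for each keyword.
int
range_width (rid_sketch keyword, int loc)
{
  switch (keyword)
    {
    case SKETCH_VA_ARG:
      {
        // The braces end the scope of start_loc/end_loc before the next
        // label, so jumping to SKETCH_OFFSETOF does not cross an
        // initialization.
        int start_loc = loc;
        int end_loc = loc + 6;
        return end_loc - start_loc;
      }
    case SKETCH_OFFSETOF:
      return 8;
    }
  return 0;
}
```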

Is the combination of the 3 patches OK for trunk? (assuming
bootstrap it's only the braced-init tweak that hasn't been).

Thanks
Dave
[1] in "[PATCH/RFC] C++ FE: expression ranges (v2)":
   https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01859.html

From a19859a1ecdf850684b8d191b0ff57c8e50cc121 Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Tue, 17 Nov 2015 06:09:52 -0500
Subject: [PATCH 2/3] Bulletproof set_c_expr_source_range against NULL
 expr->value

gcc/c/ChangeLog:
	* c-parser.c (set_c_expr_source_range): Bulletproof both
	overloaded implementations against NULL expr->value.
---
 gcc/c/c-parser.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index bcad80c..9ab7ceb 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -65,7 +65,8 @@ set_c_expr_source_range (c_expr *expr,
 {
   expr->src_range.m_start = start;
   expr->src_range.m_finish = finish;
-  set_source_range (expr->value, start, finish);
+  if (expr->value)
+    set_source_range (expr->value, start, finish);
 }
 
 void
@@ -73,7 +74,8 @@ set_c_expr_source_range (c_expr *expr,
 			 source_range src_range)
 {
   expr->src_range = src_range;
-  set_source_range (expr->value, src_range);
+  if (expr->value)
+    set_source_range (expr->value, src_range);
 }
 
 
-- 
1.8.5.3

From 205da878acc752adb275da3ca61a342a9a124f93 Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Tue, 17 Nov 2015 10:18:39 -0500
Subject: [PATCH 3/3] Lookup next token once at end of c_parser_braced_init,
 not twice

---
 gcc/c/c-parser.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 9ab7ceb..eedcaa4 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -4270,7 +4270,8 @@ c_parser_braced_init (c_parser *parser, tree type, bool nested_p)
 	break;
 	}
 }
-  if (c_parser_next_token_is_not 

Re: [PATCH] Fix uninitialized src_range within c_expr (Re: libcpp/C FE source range patch committed (r230331))

2015-11-17 Thread Bernd Schmidt

On 11/17/2015 04:13 PM, David Malcolm wrote:

> On Mon, 2015-11-16 at 22:34 +0100, Bernd Schmidt wrote:
>> Should c_expr perhaps acquire a constructor so that this problem is
>> avoided in the future? The whole thing seems somewhat error-prone.
>
> I agree that it's error prone, and the ctor approach is what I've been
> trying for the C++ FE [1] but I suspect that touching that in the C FE
> would be a much more invasive patch (unless we simply give it a default
> ctor that makes the src_range be a pair of UNKNOWN_LOCATIONS?).

The UNKNOWN_LOCATIONS pair would have been my approach, yes.

> This case gains a pair of locals: start_loc and end_loc (so that we can
> track the spelling range whilst retaining the "loc" used for the caret),
> and I preferred to confine their scope to within the case, hence the
> extra braced block.  Omitting the braced block leads to:
> ../../src/gcc/c/c-parser.c:7494:7: error: jump to case label [-fpermissive]
>    case RID_OFFSETOF:
> ^
> ../../src/gcc/c/c-parser.c:7472:17: error:   crosses initialization of
> ‘location_t end_loc’
>    location_t end_loc = c_parser_peek_token (parser)->get_finish ();
>   ^
> etc.

Hmm, odd, I tried placing just the location_t start_loc line into the
switch and that appeared to compile fine. But I guess this is not a huge
problem.

> Is the combination of the 3 patches OK for trunk? (assuming
> bootstrap it's only the braced-init tweak that hasn't been).

Yes.


Bernd


Re: [PATCH] Clarify that -fwrapv covers all signed arithmetic overflow

2015-11-17 Thread Joseph Myers
On Tue, 17 Nov 2015, Paolo Bonzini wrote:

> Can you suggest a wording for "if the GNU C language definition changes
> [which, no matter how unlikely, is explicitly not ruled out by the
> manual] -fwrapv will be extended to signed shifts, and shifts of
> negative numbers would return A*2^B whenever the result fits in the type"?

I don't think we can usefully say how a hypothetical change in one area 
would or would not affect a particular option.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

2015-11-17 Thread Tom de Vries

On 17/11/15 11:05, Richard Biener wrote:

On Tue, Nov 17, 2015 at 12:20 AM, Tom de Vries  wrote:

On 16/11/15 13:45, Richard Biener wrote:


+ NEXT_PASS (pass_scev_cprop);


What's that for?  It's supposed to help removing loops - I don't
expect kernels to vanish.




I'm using pass_scev_cprop for the "final value replacement"
functionality.
Added comment.




That functionality is intended to enable loop removal.



Let me try to explain in a bit more detail.


I.

Consider a parloops testcase test.c, with a use of the final value of the
iteration variable (return i):
...
unsigned int
foo (int n, int *a)
{
  int i;
  for (i = 0; i < n; ++i)
    a[i] = 1;

  return i;
}
...

Say we compile with:
...
$ gcc -S -O2 test.c -ftree-parallelize-loops=2 -fdump-tree-all-details
...

We can see here in the parloops dump-file that the loop was parallelized:
...
   SUCCESS: may be parallelized
...

Now say that we run with -fno-tree-scev-cprop in addition. Instead we find
in the parloops dump-file:
...
phi is i_1 = PHI <i_10(4)>
arg of phi to exit:   value i_10 used outside loop
   checking if it a part of reduction pattern:
   FAILED: it is not a part of reduction.
...

Auto-parallelization fails in this case because there is a loop exit phi
(the one in bb 6 defining i_1) which is not part of a reduction:
...
   <bb 4>:
   # i_13 = PHI <0(3), i_10(5)>
   _5 = (long unsigned int) i_13;
   _6 = _5 * 4;
   _8 = a_7(D) + _6;
   *_8 = 1;
   i_10 = i_13 + 1;
   if (n_4(D) > i_10)
     goto <bb 5>;
   else
     goto <bb 6>;

   <bb 5>:
   goto <bb 4>;

   <bb 6>:
   # i_1 = PHI <i_10(4)>
   _20 = (unsigned int) i_1;
...

With -ftree-scev-cprop, we find in the pass_scev_cprop dump-file:
...
final value replacement:
   i_1 = PHI <i_10(4)>
   with
   i_1 = n_4(D);
...

And the resulting loop no longer has any loop exit phis, so
auto-parallelization succeeds:
...
   <bb 4>:
   # i_13 = PHI <0(3), i_10(5)>
   _5 = (long unsigned int) i_13;
   _6 = _5 * 4;
   _8 = a_7(D) + _6;
   *_8 = 1;
   i_10 = i_13 + 1;
   if (n_4(D) > i_10)
     goto <bb 5>;
   else
     goto <bb 6>;

   <bb 5>:
   goto <bb 4>;

   <bb 6>:
   _20 = (unsigned int) n_4(D);
...

[ I've filed PR68373 - "autopar fails on loop exit phi with argument defined
outside loop", for a slightly different testcase where despite the final
value replacement autopar still fails. ]
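
To make the effect of final value replacement concrete at the source level,
here is a hand-written sketch.  It is only an analogy for what the pass does
on GIMPLE: foo_loop is the testcase from above, and foo_replaced is a
hypothetical manual rewrite in which the value read after the loop no longer
depends on the loop's iteration variable.  The "n > 0 ? n : 0" form covers
the path where the loop body is never entered, which in the dumps above is
handled by a separate path around the loop.

```cpp
#include <cassert>

// The testcase from the mail: the final value of i is used after the loop.
unsigned int
foo_loop (int n, int *a)
{
  int i;
  for (i = 0; i < n; ++i)
    a[i] = 1;
  return i;
}

// Hypothetical manual equivalent: the result is computed from n alone, so
// nothing defined inside the loop is live after it.
unsigned int
foo_replaced (int n, int *a)
{
  for (int i = 0; i < n; ++i)
    a[i] = 1;
  // When the loop runs, i ends up equal to n; when it is skipped, i stays 0.
  return n > 0 ? n : 0;
}
```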


II.

Now, back to oacc kernels.

Consider test-case kernels-loop-n.f95 (will add this one to the test-cases):
...
module test
contains
  subroutine foo(n)
    implicit none
    integer :: n
    integer, dimension (0:n-1) :: a, b, c
    integer:: i, ii
    do i = 0, n - 1
       a(i) = i * 2
    end do

    do i = 0, n -1
       b(i) = i * 4
    end do

    !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
    do ii = 0, n - 1
       c(ii) = a(ii) + b(ii)
    end do
    !$acc end kernels

    do i = 0, n - 1
       if (c(i) .ne. a(i) + b(i)) call abort
    end do

  end subroutine foo
end module test
...

The loop at the start of the kernels pass group contains an in-memory
iteration variable, with a store to '*_9 = _38'.
...
   :
   _13 = *.omp_data_i_4(D).c;
   c.21_14 = *_13;
   _16 = *_9;
   _17 = (integer(kind=8)) _16;
   _18 = *.omp_data_i_4(D).a;
   a.22_19 = *_18;
   _23 = MEM[(integer(kind=4)[0:D.3488] *)a.22_19][_17];
   _24 = *.omp_data_i_4(D).b;
   b.23_25 = *_24;
   _29 = MEM[(integer(kind=4)[0:D.3484] *)b.23_25][_17];
   _30 = _23 + _29;
   MEM[(integer(kind=4)[0:D.3480] *)c.21_14][_17] = _30;
   _38 = _16 + 1;
   *_9 = _38;
   if (_8 == _16)
 goto ;
   else
 goto ;
...

After pass_lim/pass_copy_prop, we've rewritten that into using a local
iteration variable, but we've generated a read of the final value of the
iteration variable outside the loop, which means auto-parallelization will
fail:
...
   :
   # D__lsm.29_12 = PHI 
   _17 = (integer(kind=8)) D__lsm.29_12;
   _23 = MEM[(integer(kind=4)[0:D.3488] *)a.22_19][_17];
   _29 = MEM[(integer(kind=4)[0:D.3484] *)b.23_25][_17];
   _30 = _23 + _29;
   MEM[(integer(kind=4)[0:D.3480] *)c.21_14][_17] = _30;
   _38 = D__lsm.29_12 + 1;
   if (_8 == D__lsm.29_12)
 goto ;
   else
 goto ;

   :
   # D__lsm.29_27 = PHI <_38(5)>
   *_9 = D__lsm.29_27;
   goto ;


So this store is not actually necessary?


a.
In the case of this example, the store is dead.

There is a corresponding load at the point that we split off the region:
...
  :
  #pragma omp return

  :
  D.3635 = .omp_data_arr.25.ii;
  ii = *D.3635;
...

This load is later removed, given that ii is unused after the region. 
But once the region is split off,  there's nothing in the context of the 
store to suggest that it's dead.


And to get rid of the load of ii before the region is split off, we 
would have to implement some sort of liveness analysis on pre-ssa code.


b.
There's the case where there is an explicit use of ii after the region, 
in which case the store is not dead.


c.
And there's the case where we use a data 

Re: [PATCH] PR 65751 Bogus in error message

2015-11-17 Thread Steve Kargl
On Tue, Nov 17, 2015 at 10:53:52AM +0100, Dominique d'Humières wrote:
> Is the following patch OK for trunk and 5.3? 

OK.

> 
> I have used the legalese found in my draft for Fortran 2015.
> Would it be acceptable to replace 
> "with the BIND attribute or the SEQUENCE attribute" 
> with
> "with the BIND or SEQUENCE attribute"?

In my opinion, yes.

-- 
Steve


Re: C++ PATCH to integrate c++-delayed-folding branch

2015-11-17 Thread David Edelsohn
On Mon, Nov 16, 2015 at 10:47 PM, Jason Merrill  wrote:
> On 11/16/2015 09:39 PM, David Edelsohn wrote:
>>
>> The PPC port seems to be bootstrapping again, but I'm not sure why.
>> Mike Meissner's patch only should have affected long double.
>
>
>> It's hard to know if there is a latent bug that has gone back into hiding.
>
>
> The problem was twofold:
>
> 1) VSX_L included IFmode, but VSa didn't, so expanding various patterns over
> VSX_L generated an IFmode insn that still had <VSa> scattered around.
> Mike's patch fixed this.
>
> 2) The delayed folding merge changed
>
>   __builtin_constant_p (non-constant-expr && false)
>
> to be false because the operand is not a C++ constant expression. Previously
> we had seen that the IFmode insn was inactive because its test was known to
> be false, but with this change we needed to evaluate its test at runtime, so
> we had to parse the insn itself, so we ran into problem #1.
>
> I have a patch to fix the __builtin_constant_p regression which I will be
> checking in shortly.

Thanks for debugging and fixing this!

- David


[PING] [PATCH] Improve C++ loop's backward-jump location

2015-11-17 Thread Andreas Arnez
Ping:

  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01192.html


> gcc/cp/ChangeLog:
>  
>   * cp-gimplify.c (genericize_cp_loop): Change LOOP_EXPR's location
>   to start of loop body instead of start of loop.
> 
> gcc/testsuite/ChangeLog:
>  
>   * g++.dg/guality/pr67192.C: New test.



Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

2015-11-17 Thread Richard Biener
On Tue, 17 Nov 2015, Tom de Vries wrote:

> On 17/11/15 11:05, Richard Biener wrote:
> > On Tue, Nov 17, 2015 at 12:20 AM, Tom de Vries 
> > wrote:
> > > On 16/11/15 13:45, Richard Biener wrote:
> > > > > > 
> > > > > > + NEXT_PASS (pass_scev_cprop);
> > > > > > > > 
> > > > > > > > What's that for?  It's supposed to help removing loops - I don't
> > > > > > > > expect kernels to vanish.
> > > > > 
> > > > > > 
> > > > > > I'm using pass_scev_cprop for the "final value replacement"
> > > > > > functionality.
> > > > > > Added comment.
> > > 
> > > 
> > > > That functionality is intended to enable loop removal.
> > > 
> > > 
> > > Let me try to explain in a bit more detail.
> > > 
> > > 
> > > I.
> > > 
> > > Consider a parloops testcase test.c, with a use of the final value of the
> > > iteration variable (return i):
> > > ...
> > > unsigned int
> > > foo (int n, int *a)
> > > {
> > >   int i;
> > >   for (i = 0; i < n; ++i)
> > >     a[i] = 1;
> > > 
> > >   return i;
> > > }
> > > ...
> > > 
> > > Say we compile with:
> > > ...
> > > $ gcc -S -O2 test.c -ftree-parallelize-loops=2 -fdump-tree-all-details
> > > ...
> > > 
> > > We can see here in the parloops dump-file that the loop was parallelized:
> > > ...
> > >SUCCESS: may be parallelized
> > > ...
> > > 
> > > Now say that we run with -fno-tree-scev-cprop in addition. Instead we find
> > > in the parloops dump-file:
> > > ...
> > > phi is i_1 = PHI <i_10(4)>
> > > arg of phi to exit:   value i_10 used outside loop
> > >checking if it a part of reduction pattern:
> > >FAILED: it is not a part of reduction.
> > > ...
> > > 
> > > Auto-parallelization fails in this case because there is a loop exit phi
> > > (the one in bb 6 defining i_1) which is not part of a reduction:
> > > ...
> > >   <bb 4>:
> > >   # i_13 = PHI <0(3), i_10(5)>
> > >   _5 = (long unsigned int) i_13;
> > >   _6 = _5 * 4;
> > >   _8 = a_7(D) + _6;
> > >   *_8 = 1;
> > >   i_10 = i_13 + 1;
> > >   if (n_4(D) > i_10)
> > >     goto <bb 5>;
> > >   else
> > >     goto <bb 6>;
> > > 
> > >   <bb 5>:
> > >   goto <bb 4>;
> > > 
> > >   <bb 6>:
> > >   # i_1 = PHI <i_10(4)>
> > >   _20 = (unsigned int) i_1;
> > > ...
> > > 
> > > With -ftree-scev-cprop, we find in the pass_scev_cprop dump-file:
> > > ...
> > > final value replacement:
> > >   i_1 = PHI <i_10(4)>
> > >with
> > >i_1 = n_4(D);
> > > ...
> > > 
> > > And the resulting loop no longer has any loop exit phis, so
> > > auto-parallelization succeeds:
> > > ...
> > >   <bb 4>:
> > >   # i_13 = PHI <0(3), i_10(5)>
> > >   _5 = (long unsigned int) i_13;
> > >   _6 = _5 * 4;
> > >   _8 = a_7(D) + _6;
> > >   *_8 = 1;
> > >   i_10 = i_13 + 1;
> > >   if (n_4(D) > i_10)
> > >     goto <bb 5>;
> > >   else
> > >     goto <bb 6>;
> > > 
> > >   <bb 5>:
> > >   goto <bb 4>;
> > > 
> > >   <bb 6>:
> > >   _20 = (unsigned int) n_4(D);
> > > ...
> > > 
> > > [ I've filed PR68373 - "autopar fails on loop exit phi with argument
> > > defined
> > > outside loop", for a slightly different testcase where despite the final
> > > value replacement autopar still fails. ]
> > > 
> > > 
> > > II.
> > > 
> > > Now, back to oacc kernels.
> > > 
> > > Consider test-case kernels-loop-n.f95 (will add this one to the
> > > test-cases):
> > > ...
> > > module test
> > > contains
> > >   subroutine foo(n)
> > >     implicit none
> > >     integer :: n
> > >     integer, dimension (0:n-1) :: a, b, c
> > >     integer:: i, ii
> > >     do i = 0, n - 1
> > >        a(i) = i * 2
> > >     end do
> > > 
> > >     do i = 0, n -1
> > >        b(i) = i * 4
> > >     end do
> > > 
> > >     !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
> > >     do ii = 0, n - 1
> > >        c(ii) = a(ii) + b(ii)
> > >     end do
> > >     !$acc end kernels
> > > 
> > >     do i = 0, n - 1
> > >        if (c(i) .ne. a(i) + b(i)) call abort
> > >     end do
> > > 
> > >   end subroutine foo
> > > end module test
> > > ...
> > > 
> > > The loop at the start of the kernels pass group contains an in-memory
> > > iteration variable, with a store to '*_9 = _38'.
> > > ...
> > >:
> > >_13 = *.omp_data_i_4(D).c;
> > >c.21_14 = *_13;
> > >_16 = *_9;
> > >_17 = (integer(kind=8)) _16;
> > >_18 = *.omp_data_i_4(D).a;
> > >a.22_19 = *_18;
> > >_23 = MEM[(integer(kind=4)[0:D.3488] *)a.22_19][_17];
> > >_24 = *.omp_data_i_4(D).b;
> > >b.23_25 = *_24;
> > >_29 = MEM[(integer(kind=4)[0:D.3484] *)b.23_25][_17];
> > >_30 = _23 + _29;
> > >MEM[(integer(kind=4)[0:D.3480] *)c.21_14][_17] = _30;
> > >_38 = _16 + 1;
> > >*_9 = _38;
> > >if (_8 == _16)
> > >  goto ;
> > >else
> > >  goto ;
> > > ...
> > > 
> > > After pass_lim/pass_copy_prop, we've rewritten that into using a local
> > > iteration variable, but we've generated a read of the final value of the
> > > iteration variable outside the loop, which means auto-parallelization will
> > > fail:
> > > ...

Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

2015-11-17 Thread Tom de Vries

On 17/11/15 16:18, Richard Biener wrote:

>>> IMHO autopar needs to handle induction itself.
>>
>> I'm not sure what you mean. Could you elaborate?  Autopar handles induction
>> variables, but it doesn't handle exit phis reading the final value of the
>> induction variable. Is that what you want fixed? How?
>
> Yes.  Perform final value replacement.



I see. Calling scev_const_prop in pass_parallelize_loops_oacc_kernels 
seems to work fine.


Doing the same for pass_parallelize_loops like this:
...
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 17415a8..d944395 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2787,6 +2787,9 @@ pass_parallelize_loops::execute (function *fun)
   if (number_of_loops (fun) <= 1)
     return 0;

+  unsigned int sccp_todo = scev_const_prop ();
+  gcc_assert (sccp_todo == 0);
+
   if (parallelize_loops ())
     {
       fun->curr_properties &= ~(PROP_gimple_eomp);
...
seems to fix PR 68373 - "autopar fails on loop exit phi with argument 
defined outside loop".


The new scev_const_prop call in autopar rewrites this phi into an 
assignment, and that allows parloops to succeed:

...
final value replacement:
  n_2 = PHI 
  with
  n_2 = n_4(D);
...

Thanks,
- Tom


Re: [PATCH] Clarify that -fwrapv covers all signed arithmetic overflow

2015-11-17 Thread Paolo Bonzini


On 17/11/2015 16:27, Joseph Myers wrote:
> > Can you suggest a wording for "if the GNU C language definition changes
> > [which, no matter how unlikely, is explicitly not ruled out by the
> > manual] -fwrapv will be extended to signed shifts, and shifts of
> > negative numbers would return A*2^B whenever the result fits in the type"?
>
> I don't think we can usefully say how a hypothetical change in one area 
> would or would not affect a particular option.

I agree.  That is why I phrased my original patch in the other way,
assuming that overflow _can_ be defined for signed left shifts but it is
not treated as undefined.  My definition of overflow for signed left
shifts would be shifting a 1 into or out of the sign bit.
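
To make the discussion concrete, here is a rough sketch of the "A*2^B
whenever the result fits in the type" reading quoted above, for a 32-bit
signed type.  It is purely illustrative: shl_if_fits is a hypothetical
helper, not something GCC or -fwrapv provides, and it says nothing about
what the compiler actually guarantees.  It computes the product in a wider
type and reports whether the result is representable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: stores a * 2^b in *result and returns true when the
// mathematical result fits in int32_t; returns false (leaving *result
// untouched) otherwise, or when the shift count is out of range.
bool
shl_if_fits (int32_t a, unsigned int b, int32_t *result)
{
  if (b >= 32)
    return false;
  // |a| <= 2^31 and 2^b <= 2^31, so the product always fits in 64 bits.
  int64_t wide = (int64_t) a * ((int64_t) 1 << b);
  if (wide < INT32_MIN || wide > INT32_MAX)
    return false;
  *result = (int32_t) wide;
  return true;
}
```

Under this reading, shifting a negative number is fine as long as the
product is representable, whereas 1 << 31 is not, since 2^31 does not fit.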

However, I understood that you don't want to define overflow of signed
left shifts.

The reason why I am proposing this patch is that the current
documentation has a sort of catch-22:

* it doesn't promise that GCC will never rely on undefined behavior
rules for signed left shifts

* it says that -fwrapv affects add/sub/mult

This means that GCC has no future-proof option for projects that wish to
rely on definedness of signed left shifts.

In fact, as you mentioned, ubsan _already_ provides a case where GCC
does not treat left shift as an operation on the bit representation.
This makes it even more important to define such an option _now_ and to
make ubsan respect it (for which I've also sent an RFC patch earlier today).

Paolo


Re: Extend tree-call-cdce to calls whose result is used

2015-11-17 Thread Richard Biener
On Tue, Nov 17, 2015 at 10:19 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Fri, Nov 13, 2015 at 2:12 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
 On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz  wrote:
> Hi,
>
> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>
>> +static bool
>> +can_use_internal_fn (gcall *call)
>> +{
>> +  /* Only replace calls that set errno.  */
>> +  if (!gimple_vdef (call))
>> +return false;
>
> Oh, I managed to confuse this in my head while reading the patch.  So,
> hmm, you don't actually replace the builtin with an internal function
> (without the condition) under no-errno-math?  Does something else do that?
> Because otherwise that seems an unnecessary restriction?
>
>> >> r229916 fixed that for the non-EH case.
>> >
>> > Ah, missed it.  Even the EH case shouldn't be difficult.  If the
>> > original dominator of the EH destination was the call block it moves,
>> > otherwise it remains unchanged.
>>
>> The target of the edge is easy in itself, I agree, but that isn't
>> necessarily the only affected block, if the EH handler doesn't
>> exit or rethrow.
>
> You're worried the non-EH and the EH regions merge again, right?  Like so:
>
> before change:
>
> BB1: throwing-call
>  fallthru/   \EH
> BB2   BBeh
>  |   /\ (stuff in EH-region)
>  | /some path out of EH region
>  | /--/
> BB3
>
> Here, BB3 must at least be dominated by BB1 (the throwing block), or by
> something further up (when there are other side-entries to the path
> BB2->BB3 or into the EH region).  When further up, nothing changes, when
> it's BB1, then it's afterwards dominated by the BB containing the
> condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh,
> which gets idom=Bcall.  Depending on how you split BB1, either Bcond or
> BBcall might still be BB1 and doesn't lead to changes in the dom tree.
>
>> > Currently we have quite some of such passes (reassoc, forwprop,
>> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap
>> > and others), but they are all handling only special situations in one
>> > way or the other.  pass_fold_builtins is another one, but it seems
>> > most related to what you want (replacing a call with something else),
>> > so I thought that'd be the natural choice.
>>
>> Well, to be pedantic, it's not really replacing the call.  Except for
>> the special case of targets that support direct assignments to errno,
>> it keeps the original call but ensures that it isn't usually executed.
>> From that point of view it doesn't really seem like a fold.
>>
>> But I suppose that's just naming again :-).  And it's easily solved with
>> s/fold/rewrite/.
>
> Exactly, in my mind pass_fold_builtin (like many of the others I
> mentioned) doesn't do folding but rewriting :)

 So I am replying here to the issue of where to do the transform call_cdce
 does and the one Richard wants to add.  For example we "lower"
 posix_memalign as early as GIMPLE lowering (that's before CFG 
 construction).
 We also lower sincos to cexpi during GENERIC folding (or if that is dropped
 either GIMPLE lowering or GIMPLE folding during gimplification would be
 appropriate).

 Now, with offloading we have to avoid creating target dependencies before
 LTO stream-out (thus no IFN replacements before that - not sure if
 Richards patches have an issue there already).
>>>
>>> No, this patch was the earliest point at which we converted to internal
>>> functions.  The idea was to make code treat ECF_PURE built-in functions
>>> and internal functions as being basically equivalent.  There's therefore
>>> not much benefit to doing a straight replacement of one with the other
>>> during either GENERIC or gimple.  Instead the series only used internal
>>> functions for things that built-in functions couldn't do, specifically:
>>>
>>> - the case used in this patch, to optimise part of a non-pure built-in
>>>   function using a pure equivalent.
>>>
>>> - vector versions of built-in functions.
>>>
>>> The cfgexpand patch makes sure that pure built-in functions are expanded
>>> like internal functions where possible.
>>>
 Which would leave us with a lowering stage early in the main
 optimization pipeline - I think fold_builtins pass is way too late but
 any "folding" pass will do (like forwprop or backprop where the latter
 might be better because it might end up computing FP "ranges" to
 improve the initial lowering 

Re: RFA (GGC): PATCH to support GGC finalizers with PCH

2015-11-17 Thread Richard Biener
On Tue, Nov 17, 2015 at 3:09 PM, Jason Merrill  wrote:
> While I was looking at the interaction of delayed folding with GGC, I
> noticed that ggc_handle_finalizers currently runs no finalizers if
> G.context_depth != 0.  So any GC objects in a greater depth will still be
> collected, but they won't have their finalizers run.  This specifically
> affects compiles that use a PCH file, since G.context_depth is set to 1
> after loading the PCH.
>
> This patch fixes ggc_handle_finalizers to look at the depth of each
> finalizer so that we still don't try to run finalizers for non-collectable
> objects loaded from the PCH, but we do run finalizers for collectable
> objects allocated after loading the PCH.
>
> I ended up not relying on this for delayed folding, but it still seems like
> a good bug fix.
>
> Tested x86_64-pc-linux-gnu.  OK for trunk?

Hmm, this enlarges finalizer/vec_finalizer.  Wouldn't it be better to
add separate finalizer vectors for context_depth != 0?  (I'm proposing
to add one for exactly context_depth == 1)

When is context_depth increased other than for PCH?

Richard.


Re: C++ PATCH to integrate c++-delayed-folding branch

2015-11-17 Thread Andreas Schwab
Jason Merrill  writes:

> On 11/17/2015 04:09 AM, Andreas Schwab wrote:
>> Can we please get trunk back to bootstrap land?
>
> Which target isn't bootstrapping for you?

PR68346, PR68361

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: Add genmatch support for internal functions

2015-11-17 Thread Richard Biener
On Tue, Nov 17, 2015 at 9:52 AM, Richard Sandiford
 wrote:
> Richard Sandiford  writes:
>> This patch makes genmatch match calls based on combined_fn rather
>> than built_in_function and extends the matching to internal functions.
>> It also uses fold_const_call to fold the calls to a constant, rather
>> than going through fold_builtin_n.
>>
>> In order to slightly simplify the code and remove potential
>> ambiguity, the patch enforces lower case for tree codes
>> (foo->FOO_EXPR), caps for functions (no built_in_hypot->BUILT_IN_HYPOT)
>> and requires an exact match for user-defined identifiers.  The first two
>> were already met in practice but there were a couple of cases where
>> operator lists were defined in one case and used in another.
>>
>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>> OK to install?
>
> The updated patch below adds the SCALAR_FLOAT_TYPE_P check discussed here:
>
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01949.html
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
> OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * match.pd: Use HYPOT and COS rather than hypot and cos.
> Use CASE_CFN_* macros.  Guard log/exp folds with
> SCALAR_FLOAT_TYPE_P.
> * genmatch.c (internal_fn): New enum.
> (fn_id::fn): Change to an unsigned int.
> (fn_id::fn_id): Accept internal_fn too.
> (add_builtin): Rename to...
> (add_function): ...this and turn into a template.
> (get_operator): Only try one variation if the original name fails.
> Only add _EXPR if the original name was all lower case.
> Try converting internal and built-in function names to their
> CFN equivalents.
> (expr::gen_transform): Use maybe_build_call_expr_loc for generic.
> (dt_simplify::gen_1): Likewise.
> (dt_node::gen_kids_1): Use gimple_call_combined_fn for gimple
> and get_call_combined_fn for generic.
> (dt_simplify::gen): Use combined_fn as the type of fn_ids.
> (decision_tree::gen): Likewise.
> (main): Use lower case in the strings for {VIEW_,}CONVERT[012].
> Use add_function rather than add_builtin.  Register internal
> functions too.
> * generic-match-head.c: Include case-cfn-macros.h.
> * gimple-fold.c (replace_stmt_with_simplification): Use
> gimple_call_combined_fn to test whether we can keep an
> existing call.
> * gimple-match.h (code_helper): Replace built_in_function
> with combined_fn.
> * gimple-match-head.c: Include fold-const-call.h, internal-fn.h
> and case-cfn-macros.h.
> (gimple_resimplify1): Use fold_const_call.
> (gimple_resimplify2, gimple_resimplify3): Likewise.
> (build_call_internal, build_call): New functions.
> (maybe_push_res_to_seq): Use them.
> (gimple_simplify): Use fold_const_call.  Set *rcode to a combined_fn
> rather than a built-in function.
> * tree.h (build_call_expr_internal_loc): Declare.
> (maybe_build_call_expr_loc): Likewise.
> * tree.c (build_call_expr_internal_loc_array): New function.
> (maybe_build_call_expr_loc): Likewise.
>
> diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c
> index f2e08ed..f55f91e 100644
> --- a/gcc/generic-match-head.c
> +++ b/gcc/generic-match-head.c
> @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-dfa.h"
>  #include "builtins.h"
>  #include "dumpfile.h"
> +#include "case-cfn-macros.h"
>
>
>  /* Routine to determine if the types T1 and T2 are effectively
> diff --git a/gcc/genmatch.c b/gcc/genmatch.c
> index 9d74ed7..daa66d9 100644
> --- a/gcc/genmatch.c
> +++ b/gcc/genmatch.c
> @@ -230,6 +230,12 @@ enum built_in_function {
>  END_BUILTINS
>  };
>
> +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) IFN_##CODE,
> +enum internal_fn {
> +#include "internal-fn.def"
> +  IFN_LAST
> +};
> +
>  /* Return true if CODE represents a commutative tree code.  Otherwise
> return false.  */
>  bool
> @@ -341,13 +347,15 @@ struct operator_id : public id_base
>const char *tcc;
>  };
>
> -/* Identifier that maps to a builtin function code.  */
> +/* Identifier that maps to a builtin or internal function code.  */
>
>  struct fn_id : public id_base
>  {
>fn_id (enum built_in_function fn_, const char *id_)
>: id_base (id_base::FN, id_), fn (fn_) {}
> -  enum built_in_function fn;
> +  fn_id (enum internal_fn fn_, const char *id_)
> +  : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)) {}
> +  unsigned int fn;
>  };
>
>  struct simplify;
> @@ -447,10 +455,12 @@ add_operator (enum tree_code code, const char *id,
>*slot = op;
>  }
>
> -/* Add a builtin identifier to the hash.  */
> +/* Add a built-in or internal function identifier to the hash.  ID is
> +   

Re: Ping: [PATCH 3/6] Vectorize internal functions

2015-11-17 Thread Richard Biener
On Tue, Nov 17, 2015 at 10:30 AM, Richard Sandiford
 wrote:
> Thanks for all the reviews for this series.  I think the patch below
> is the only target-independent one that hasn't had any comments.

This patch is ok.

Thanks,
Richard.

> Richard
>
> Richard Sandiford  writes:
>> This patch tries to vectorize built-in and internal functions as
>> internal functions first, falling back on the current built-in
>> target hooks otherwise.
>>
>>
>> gcc/
>>   * internal-fn.h (direct_internal_fn_info): Add vectorizable flag.
>>   * internal-fn.c (direct_internal_fn_array): Update accordingly.
>>   * tree-vectorizer.h (vectorizable_function): Delete.
>>   * tree-vect-stmts.c: Include internal-fn.h.
>>   (vectorizable_internal_function): New function.
>>   (vectorizable_function): Inline into...
>>   (vectorizable_call): ...here.  Explicitly reject calls that read
>>   from or write to memory.  Try using an internal function before
>>   falling back on the old vectorizable_function behavior.
>>
>> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>> index 898c83d..a5bda2f 100644
>> --- a/gcc/internal-fn.c
>> +++ b/gcc/internal-fn.c
>> @@ -69,13 +69,13 @@ init_internal_fns ()
>>
>>  /* Create static initializers for the information returned by
>> direct_internal_fn.  */
>> -#define not_direct { -2, -2 }
>> -#define mask_load_direct { -1, -1 }
>> -#define load_lanes_direct { -1, -1 }
>> -#define mask_store_direct { 3, 3 }
>> -#define store_lanes_direct { 0, 0 }
>> -#define unary_direct { 0, 0 }
>> -#define binary_direct { 0, 0 }
>> +#define not_direct { -2, -2, false }
>> +#define mask_load_direct { -1, -1, false }
>> +#define load_lanes_direct { -1, -1, false }
>> +#define mask_store_direct { 3, 3, false }
>> +#define store_lanes_direct { 0, 0, false }
>> +#define unary_direct { 0, 0, true }
>> +#define binary_direct { 0, 0, true }
>>
>>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
>>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
>> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
>> index 6cb123f..aea6abd 100644
>> --- a/gcc/internal-fn.h
>> +++ b/gcc/internal-fn.h
>> @@ -134,6 +134,14 @@ struct direct_internal_fn_info
>>   function isn't directly mapped to an optab.  */
>>signed int type0 : 8;
>>signed int type1 : 8;
>> +  /* True if the function is pointwise, so that it can be vectorized by
>> + converting the return type and all argument types to vectors of the
>> + same number of elements.  E.g. we can vectorize an IFN_SQRT on
>> + floats as an IFN_SQRT on vectors of N floats.
>> +
>> + This only needs 1 bit, but occupies the full 16 to ensure a nice
>> + layout.  */
>> +  unsigned int vectorizable : 16;
>>  };
>>
>>  extern const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1];
>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> index 75389c4..1142142 100644
>> --- a/gcc/tree-vect-stmts.c
>> +++ b/gcc/tree-vect-stmts.c
>> @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-scalar-evolution.h"
>>  #include "tree-vectorizer.h"
>>  #include "builtins.h"
>> +#include "internal-fn.h"
>>
>>  /* For lang_hooks.types.type_for_mode.  */
>>  #include "langhooks.h"
>> @@ -1632,27 +1633,32 @@ vect_finish_stmt_generation (gimple *stmt, gimple 
>> *vec_stmt,
>>  add_stmt_to_eh_lp (vec_stmt, lp_nr);
>>  }
>>
>> -/* Checks if CALL can be vectorized in type VECTYPE.  Returns
>> -   a function declaration if the target has a vectorized version
>> -   of the function, or NULL_TREE if the function cannot be vectorized.  */
>> +/* We want to vectorize a call to combined function CFN with function
>> +   decl FNDECL, using VECTYPE_OUT as the type of the output and VECTYPE_IN
>> +   as the types of all inputs.  Check whether this is possible using
>> +   an internal function, returning its code if so or IFN_LAST if not.  */
>>
>> -tree
>> -vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
>> +static internal_fn
>> +vectorizable_internal_function (combined_fn cfn, tree fndecl,
>> + tree vectype_out, tree vectype_in)
>>  {
>> -  /* We only handle functions that do not read or clobber memory.  */
>> -  if (gimple_vuse (call))
>> -return NULL_TREE;
>> -
>> -  combined_fn fn = gimple_call_combined_fn (call);
>> -  if (fn != CFN_LAST)
>> -return targetm.vectorize.builtin_vectorized_function
>> -  (fn, vectype_out, vectype_in);
>> -
>> -  if (gimple_call_builtin_p (call, BUILT_IN_MD))
>> -return targetm.vectorize.builtin_md_vectorized_function
>> -  (gimple_call_fndecl (call), vectype_out, vectype_in);
>> -
>> -  return NULL_TREE;
>> +  internal_fn ifn;
>> +  if (internal_fn_p (cfn))
>> +ifn = as_internal_fn (cfn);
>> +  else
>> +ifn = associated_internal_fn (fndecl);
>> +  if (ifn != IFN_LAST && direct_internal_fn_p (ifn))
>> +   
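
The layout remark in the internal-fn.h hunk above (a 1-bit flag widened to 16 bits so the struct fills one word) can be sketched as follows. This is a stand-alone illustration, not the GCC sources: `FnInfo`, `can_vectorize` and the two constants are hypothetical names, and the size check assumes a common ABI with 32-bit int where adjacent bit-fields are packed.

```cpp
#include <cassert>

// Hypothetical stand-in for direct_internal_fn_info: two signed 8-bit
// fields plus a flag widened to 16 bits, 32 bits in total.
struct FnInfo {
  signed int type0 : 8;
  signed int type1 : 8;
  unsigned int vectorizable : 16;
};

// Assumption: the ABI packs these bit-fields into a single unsigned int.
static_assert(sizeof(FnInfo) == sizeof(unsigned int),
              "bit-fields expected to share one word on this ABI");

// Mirrors the shape of the initializer macros in the patch.
const FnInfo not_direct = {-2, -2, false};
const FnInfo unary_direct = {0, 0, true};

// True if the (hypothetical) entry is marked pointwise/vectorizable.
bool can_vectorize(const FnInfo &info) { return info.vectorizable != 0; }
```

With these definitions, `can_vectorize (unary_direct)` is true and `can_vectorize (not_direct)` is false, and the negative type codes survive the 8-bit signed fields.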

RFA (GGC): PATCH to support GGC finalizers with PCH

2015-11-17 Thread Jason Merrill
While I was looking at the interaction of delayed folding with GGC, I 
noticed that ggc_handle_finalizers currently runs no finalizers if 
G.context_depth != 0.  So any GC objects in a greater depth will still 
be collected, but they won't have their finalizers run.  This 
specifically affects compiles that use a PCH file, since G.context_depth 
is set to 1 after loading the PCH.


This patch fixes ggc_handle_finalizers to look at the depth of each 
finalizer so that we still don't try to run finalizers for 
non-collectable objects loaded from the PCH, but we do run finalizers 
for collectable objects allocated after loading the PCH.
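
A minimal sketch of the depth check described above, as a toy model rather than the ggc-page.c code. `ToyCollector`, its members and the integer object ids are hypothetical; the marked set stands in for ggc_marked_p.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// One registered finalizer: which object it belongs to and the context
// depth at which that object was allocated.
struct Finalizer {
  int object_id;
  unsigned short depth;
};

struct ToyCollector {
  unsigned short context_depth = 0;
  std::vector<Finalizer> finalizers;
  std::set<int> marked;        // objects still reachable after marking
  std::vector<int> finalized;  // ids whose finalizer has been run

  // Record the current depth together with the finalizer.
  void register_object(int id) { finalizers.push_back({id, context_depth}); }

  // Run finalizers only for unmarked objects that are at least as deep
  // as the current context; shallower ones are left alone.
  void handle_finalizers() {
    for (std::size_t i = 0; i < finalizers.size();) {
      const Finalizer f = finalizers[i];
      if (f.depth >= context_depth && marked.count(f.object_id) == 0) {
        finalized.push_back(f.object_id);
        finalizers[i] = finalizers.back();  // unordered remove
        finalizers.pop_back();
      } else {
        ++i;
      }
    }
  }
};
```

For example, register object 1 at depth 0, raise `context_depth` to 1 (as after loading a PCH), register objects 2 and 3, and mark only 3: a call to `handle_finalizers` then runs the finalizer for 2 alone and keeps the entries for 1 and 3.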


I ended up not relying on this for delayed folding, but it still seems 
like a good bug fix.


Tested x86_64-pc-linux-gnu.  OK for trunk?
commit 0bd746ae39b37b9b08e4d861d97fe30ecf4e8ad8
Author: Jason Merrill 
Date:   Fri Nov 13 09:39:15 2015 -0500

	Support GGC finalizers with PCH.

	* ggc-page.c (class finalizer): Add m_depth field.
	(finalizer::finalizer): Initialize it.
	(finalizer::depth): Return it.
	(class vec_finalizer): Likewise.
	(ggc_internal_alloc): Adjust constructor calls.
	(ggc_handle_finalizers): Run finalizers that are deep enough.

diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
index deb21bb..1d5aeef 100644
--- a/gcc/ggc-page.c
+++ b/gcc/ggc-page.c
@@ -331,22 +331,26 @@ typedef struct page_table_chain
 class finalizer
 {
 public:
-  finalizer (void *addr, void (*f)(void *)) : m_addr (addr), m_function (f) {}
+  finalizer (void *addr, void (*f)(void *), unsigned short depth)
+: m_addr (addr), m_function (f), m_depth(depth) {}
 
   void *addr () const { return m_addr; }
-
+  unsigned short depth () const { return m_depth; }
   void call () const { m_function (m_addr); }
 
 private:
   void *m_addr;
   void (*m_function)(void *);
+  unsigned short m_depth;
 };
 
 class vec_finalizer
 {
 public:
-  vec_finalizer (uintptr_t addr, void (*f)(void *), size_t s, size_t n) :
-m_addr (addr), m_function (f), m_object_size (s), m_n_objects (n) {}
+  vec_finalizer (uintptr_t addr, void (*f)(void *), size_t s, size_t n,
+		 unsigned short depth)
+: m_addr (addr), m_function (f), m_object_size (s), m_n_objects (n),
+m_depth (depth) {}
 
   void call () const
 {
@@ -355,13 +359,15 @@ public:
 }
 
   void *addr () const { return reinterpret_cast<void *> (m_addr); }
+  unsigned short depth () const { return m_depth; }
 
 private:
   uintptr_t m_addr;
   void (*m_function)(void *);
   size_t m_object_size;
   size_t m_n_objects;
-  };
+  unsigned short m_depth;
+};
 
 #ifdef ENABLE_GC_ALWAYS_COLLECT
 /* List of free objects to be verified as actually free on the
@@ -1388,10 +1394,11 @@ ggc_internal_alloc (size_t size, void (*f)(void *), size_t s, size_t n
   timevar_ggc_mem_total += object_size;
 
   if (f && n == 1)
-G.finalizers.safe_push (finalizer (result, f));
+G.finalizers.safe_push (finalizer (result, f, G.context_depth));
   else if (f)
 G.vec_finalizers.safe_push
-  (vec_finalizer (reinterpret_cast<uintptr_t> (result), f, s, n));
+  (vec_finalizer (reinterpret_cast<uintptr_t> (result), f, s, n,
+		  G.context_depth));
 
   if (GATHER_STATISTICS)
 {
@@ -1875,14 +1882,12 @@ clear_marks (void)
 static void
 ggc_handle_finalizers ()
 {
-  if (G.context_depth != 0)
-return;
-
   unsigned length = G.finalizers.length ();
   for (unsigned int i = 0; i < length;)
 {
   finalizer &f = G.finalizers[i];
-  if (!ggc_marked_p (f.addr ()))
+  if (f.depth() >= G.context_depth
+	  && !ggc_marked_p (f.addr ()))
 	{
 	  f.call ();
 	  G.finalizers.unordered_remove (i);
@@ -1897,7 +1902,8 @@ ggc_handle_finalizers ()
   for (unsigned int i = 0; i < length;)
 {
   vec_finalizer &f = G.vec_finalizers[i];
-  if (!ggc_marked_p (f.addr ()))
+  if (f.depth() >= G.context_depth
+	  && !ggc_marked_p (f.addr ()))
 	{
 	  f.call ();
 	  G.vec_finalizers.unordered_remove (i);


Re: C++ PATCH to integrate c++-delayed-folding branch

2015-11-17 Thread Alan Lawrence

On 14/11/15 00:07, Jason Merrill wrote:

And here's the final patch integrating the delayed folding branch.  The general
idea is to mostly avoid folding until the end of the function, at which point we
fold everything as part of genericization.  Since many warnings rely on looking
at folded trees, we fold the arguments that we pass to the various warning
functions.  To avoid issues with combinatorial explosion, we cache the results
of folding in a hash_map.
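
The point about caching can be illustrated with a small sketch. It is not the C++ front end's code: `Expr`, `Folder` and `make_shared_chain` are hypothetical, "folding" is reduced to summing leaves, and std::unordered_map stands in for GCC's hash_map. When subtrees are shared, uncached folding revisits them exponentially often, while the cache makes the work linear in the number of distinct nodes.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// A tiny expression node: a leaf carries a value, an inner node adds
// its two operands.  Operands may be shared between parents.
struct Expr {
  long value = 0;
  const Expr *lhs = nullptr;
  const Expr *rhs = nullptr;
};

class Folder {
public:
  // Fold E to a constant, remembering the result per node.
  long fold(const Expr *e) {
    auto it = cache_.find(e);
    if (it != cache_.end())
      return it->second;
    ++work_;  // count nodes that were really folded
    long result = e->lhs ? fold(e->lhs) + fold(e->rhs) : e->value;
    cache_.emplace(e, result);
    return result;
  }
  unsigned work() const { return work_; }

private:
  std::unordered_map<const Expr *, long> cache_;
  unsigned work_ = 0;
};

// Build x0 = 1, x1 = x0 + x0, x2 = x1 + x1, ... up to DEPTH levels.
// The vector is sized once, so the stored pointers stay valid.
std::vector<Expr> make_shared_chain(unsigned depth) {
  std::vector<Expr> nodes(depth + 1);
  nodes[0].value = 1;
  for (unsigned i = 1; i <= depth; ++i) {
    nodes[i].lhs = &nodes[i - 1];
    nodes[i].rhs = &nodes[i - 1];
  }
  return nodes;
}
```

Folding the top of a 20-level chain yields 2^20 while folding only 21 nodes; without the cache the same walk would evaluate about two million nodes.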

In the future we probably want to move many of these warnings into
language-independent code so we can avoid folding for them in the front end; we
also shouldn't need to fold during genericization, but apparently not doing that
leads to optimization regressions.

This is mostly Kai's work; I just cleaned it up a bit to get it ready for the
merge.  Marek also helped some.  The largest change is hooking into the GTY
machinery to handle throwing away the folding cache rather than trying to do it
manually in various places.

Tested x86_64-pc-linux-gnu, applying to trunk.


Also PR68385 (arm-none-eabi).



Re: Replace match.pd DEFINE_MATH_FNs with auto-generated lists

2015-11-17 Thread Richard Biener
On Tue, Nov 17, 2015 at 10:23 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On November 10, 2015 9:13:25 PM GMT+01:00, Richard Sandiford
>>  wrote:
>>>Richard Biener  writes:
 On Sat, Nov 7, 2015 at 2:23 PM, Richard Sandiford
  wrote:
> diff --git a/gcc/genmatch.c b/gcc/genmatch.c
> index cff32b0..7139476 100644
> --- a/gcc/genmatch.c
> +++ b/gcc/genmatch.c
> @@ -4638,6 +4638,11 @@ main (int argc, char **argv)
>cpp_callbacks *cb = cpp_get_callbacks (r);
>cb->error = error_cb;
>
> +  /* Add the build directory to the #include "" search path.  */
> +  cpp_dir *dir = XCNEW (cpp_dir);
> +  dir->name = ASTRDUP (".");
> +  cpp_set_include_chains (r, dir, NULL, false);

 Does that work on non-UNIX hosts?
>>>
>>>Bah, hadn't thought about that.
>>>
 I wonder if there is sth
 better we can use by passing some -DXXX=... to the genmatch
 build command from the Makefile?
>>>
>>>toplev.c has:
>>>
>>>  src_pwd = getpwd ();
>>>  if (!src_pwd)
>>>  src_pwd = ".";
>>>
>>>where getpwd is a libiberty function.  Maybe we can use that?
>>
>> Looks like so.
>
> OK, here's the updated patch.  Tested as before.  OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * Makefile.in (MOSTLYCLEANFILES): Add cfn-operators.pd.
> (generated_files): Likewise.
> (s-cfn-operators, cfn-operators.pd): New rules.
> (s-match): Depend on cfn-operators.pd.
> * gencfn-macros.c: Expand comment to describe -o behavior.
> (print_define_operator_list): New function.
> (main): Accept -o.  Call print_define_operator_list.
> * genmatch.c (main): Add the current directory to the include path.
> * match.pd (DEFINE_MATH_FN): Delete.  Include cfn-operators.pd
> instead.
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index ba8108d..0fd8d99 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1570,7 +1570,7 @@ MOSTLYCLEANFILES = insn-flags.h insn-config.h 
> insn-codes.h \
>   tm-preds.h tm-constrs.h checksum-options gimple-match.c generic-match.c \
>   tree-check.h min-insn-modes.c insn-modes.c insn-modes.h \
>   genrtl.h gt-*.h gtype-*.h gtype-desc.c gtyp-input.list \
> - case-cfn-macros.h \
> + case-cfn-macros.h cfn-operators.pd \
>   xgcc$(exeext) cpp$(exeext) $(FULL_DRIVER_NAME) \
>   $(EXTRA_PROGRAMS) gcc-cross$(exeext) \
>   $(SPECS) collect2$(exeext) gcc-ar$(exeext) gcc-nm$(exeext) \
> @@ -2256,6 +2256,14 @@ s-case-cfn-macros: build/gencfn-macros$(build_exeext)
> $(STAMP) s-case-cfn-macros
>  case-cfn-macros.h: s-case-cfn-macros; @true
>
> +s-cfn-operators: build/gencfn-macros$(build_exeext)
> +   $(RUN_GEN) build/gencfn-macros$(build_exeext) -o \
> + > tmp-cfn-operators.pd
> +   $(SHELL) $(srcdir)/../move-if-change tmp-cfn-operators.pd \
> + cfn-operators.pd
> +   $(STAMP) s-cfn-operators
> +cfn-operators.pd: s-cfn-operators; @true
> +
>  target-hooks-def.h: s-target-hooks-def-h; @true
>  # make sure that when we build info files, the used tm.texi is up to date.
>  $(srcdir)/doc/tm.texi: s-tm-texi; @true
> @@ -2322,7 +2330,7 @@ s-tm-texi: build/genhooks$(build_exeext) 
> $(srcdir)/doc/tm.texi.in
>  gimple-match.c: s-match gimple-match-head.c ; @true
>  generic-match.c: s-match generic-match-head.c ; @true
>
> -s-match: build/genmatch$(build_exeext) $(srcdir)/match.pd
> +s-match: build/genmatch$(build_exeext) $(srcdir)/match.pd cfn-operators.pd
> $(RUN_GEN) build/genmatch$(build_exeext) --gimple $(srcdir)/match.pd \
> > tmp-gimple-match.c
> $(RUN_GEN) build/genmatch$(build_exeext) --generic $(srcdir)/match.pd 
> \
> @@ -2443,7 +2451,8 @@ generated_files = config.h tm.h $(TM_P_H) $(TM_H) 
> multilib.h \
> $(ALL_GTFILES_H) gtype-desc.c gtype-desc.h gcov-iov.h \
> options.h target-hooks-def.h insn-opinit.h \
> common/common-target-hooks-def.h pass-instances.def \
> -   c-family/c-target-hooks-def.h params.list case-cfn-macros.h
> +   c-family/c-target-hooks-def.h params.list case-cfn-macros.h \
> +   cfn-operators.pd
>
>  #
>  # How to compile object files to run on the build machine.
> diff --git a/gcc/gencfn-macros.c b/gcc/gencfn-macros.c
> index 5ee3af0..401c429 100644
> --- a/gcc/gencfn-macros.c
> +++ b/gcc/gencfn-macros.c
> @@ -40,7 +40,27 @@ along with GCC; see the file COPYING3.  If not see
>case CFN_BUILT_IN_SQRTL:
>case CFN_SQRT:
>
> -   The macros for groups with no internal function drop the last line.  */
> +   The macros for groups with no internal function drop the last line.
> +
> +   When run with -o, the generator prints a similar list of
> +   define_operator_list directives, for use by match.pd.  Each operator
> +   list starts with the built-in functions, in order of 

Re: Short-cut generation of simple built-in functions

2015-11-17 Thread Richard Biener
On Tue, Nov 17, 2015 at 10:55 AM, Richard Sandiford
 wrote:
> Richard Sandiford  writes:
>> Richard Biener  writes:
>>> On Tue, Nov 10, 2015 at 10:24 PM, Richard Sandiford
>>>  wrote:
 Richard Biener  writes:
> On Sat, Nov 7, 2015 at 2:31 PM, Richard Sandiford
>  wrote:
>> This patch short-circuits the builtins.c expansion code for a particular
>> gimple call if:
>>
>> - the function has an associated internal function
>> - the target implements that internal function
>> - the call has no side effects
>>
>> This allows a later patch to remove the builtins.c code, once calls with
>> side effects have been handled.
>>
>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>> OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> gcc/
>> * builtins.h (called_as_built_in): Declare.
>> * builtins.c (called_as_built_in): Make external.
>> * internal-fn.h (expand_internal_call): Define a variant that
>> specifies the internal function explicitly.
>> * internal-fn.c (expand_load_lanes_optab_fn)
>> (expand_store_lanes_optab_fn, expand_ANNOTATE, 
>> expand_GOMP_SIMD_LANE)
>> (expand_GOMP_SIMD_VF, expand_GOMP_SIMD_LAST_LANE)
>> (expand_GOMP_SIMD_ORDERED_START, expand_GOMP_SIMD_ORDERED_END)
>> (expand_UBSAN_NULL, expand_UBSAN_BOUNDS, expand_UBSAN_VPTR)
>> (expand_UBSAN_OBJECT_SIZE, expand_ASAN_CHECK, 
>> expand_TSAN_FUNC_EXIT)
>> (expand_UBSAN_CHECK_ADD, expand_UBSAN_CHECK_SUB)
>> (expand_UBSAN_CHECK_MUL, expand_ADD_OVERFLOW, 
>> expand_SUB_OVERFLOW)
>> (expand_MUL_OVERFLOW, expand_LOOP_VECTORIZED)
>> (expand_mask_load_optab_fn, expand_mask_store_optab_fn)
>> (expand_ABNORMAL_DISPATCHER, expand_BUILTIN_EXPECT, 
>> expand_VA_ARG)
>> (expand_UNIQUE, expand_GOACC_DIM_SIZE, expand_GOACC_DIM_POS)
>> (expand_GOACC_LOOP, expand_GOACC_REDUCTION, 
>> expand_direct_optab_fn)
>> (expand_unary_optab_fn, expand_binary_optab_fn): Add an 
>> internal_fn
>> argument.
>> (internal_fn_expanders): Update prototype.
>> (expand_internal_call): Define a variant that specifies the
>> internal function explicitly. Use it to implement the previous
>> interface.
>> * cfgexpand.c (expand_call_stmt): Try to expand calls to built-in
>> functions as calls to internal functions.
>>
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index f65011e..bbcc7dc3 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -222,7 +222,7 @@ is_builtin_fn (tree decl)
>> of the optimization level.  This means whenever a function is 
>> invoked with
>> its "internal" name, which normally contains the prefix "__builtin". 
>>  */
>>
>> -static bool
>> +bool
>>  called_as_built_in (tree node)
>>  {
>>/* Note that we must use DECL_NAME, not DECL_ASSEMBLER_NAME_SET_P 
>> since
>> diff --git a/gcc/builtins.h b/gcc/builtins.h
>> index 917eb90..1d00068 100644
>> --- a/gcc/builtins.h
>> +++ b/gcc/builtins.h
>> @@ -50,6 +50,7 @@ extern struct target_builtins *this_target_builtins;
>>  extern bool force_folding_builtin_constant_p;
>>
>>  extern bool is_builtin_fn (tree);
>> +extern bool called_as_built_in (tree);
>>  extern bool get_object_alignment_1 (tree, unsigned int *,
>> unsigned HOST_WIDE_INT *);
>>  extern unsigned int get_object_alignment (tree);
>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>> index bfbc958..dc7d4f5 100644
>> --- a/gcc/cfgexpand.c
>> +++ b/gcc/cfgexpand.c
>> @@ -2551,10 +2551,25 @@ expand_call_stmt (gcall *stmt)
>>return;
>>  }
>>
>> +  /* If this is a call to a built-in function and it has no effect other
>> + than setting the lhs, try to implement it using an internal 
>> function
>> + instead.  */
>> +  decl = gimple_call_fndecl (stmt);
>> +  if (gimple_call_lhs (stmt)
>> +  && !gimple_vdef (stmt)
>
> I think you want && ! gimple_has_side_effects (stmt)
> instead of checking !gimple_vdef (stmt).

 OK, I can do that, but what would the difference be in practice for
 these types of call?  I.e. are there cases for built-ins where:

   (A) gimple_vdef (stmt) && !gimple_side_effects (stmt)

 or:

   (B) !gimple_vdef (stmt) && gimple_side_effects (stmt)

 ?
>>>
>>> There was talk to make calls use volatile to prevent CSE and friends.
>>>
>>> Using gimple_has_side_effects 

Re: [PATCH] g++.dg/cpp1y/pr58708.C wchar_t size

2015-11-17 Thread David Edelsohn
On Tue, Nov 17, 2015 at 11:22 AM, Jonathan Wakely  wrote:
> On 17 November 2015 at 16:04, David Edelsohn wrote:
>> The testcase in the GCC testsuite assumes that wchar_t is 32 bits,
>> which is not correct on AIX.  32 bit AIX maintains 16 bit wchar_t for
>> backward compatibility (64 bit AIX uses 32 bit wchar_t).
>>
>> What is the preferred method to make the testcase safe for smaller wchar_t?
>>
>> The following patch works for me.  I wasn't sure what header file and
>> what macro test would be considered portable.  I could include
>> stdint.h and compare
>>
>> WCHAR_MAX == UINT16_MAX
>>
>> or
>>
>> WCHAR_MAX < UINT32_MAX
>
> __SIZEOF_WCHAR_T__ is always pre-defined  by the compiler, so that
> could be used.

Thanks for the pointer.  How about the following?

Thanks, David


Index: pr58708.C
===
--- pr58708.C   (revision 230463)
+++ pr58708.C   (working copy)
@@ -43,7 +43,11 @@
   if (foo.chars[1] != 98) __builtin_abort();
   if (foo.chars[2] != 99) __builtin_abort();

-  auto wfoo = L"\x01020304\x05060708"_foo;
+#if __SIZEOF_WCHAR_T__ == 2
+auto wfoo = L"\x0102\x0304"_foo;
+#else
+auto wfoo = L"\x01020304\x05060708"_foo;
+#endif
   if (is_same::value != true)
__builtin_abort();
   if (sizeof(wfoo.chars)/sizeof(wchar_t) != 2) __builtin_abort();
   if (wfoo.chars[0] != 16909060) __builtin_abort();
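
As a stand-alone sketch of the same idea, outside the testsuite: select the wide literal from `__SIZEOF_WCHAR_T__` so that each escape fits in one wchar_t. The names `sample`, `expected_first`, `sample_units` and `first_unit_matches` are hypothetical, and the macro is assumed to be predefined (it is by GCC; other compilers may not provide it).

```cpp
#include <cassert>
#include <cstddef>

// Pick a two-element wide string whose escapes fit the target's wchar_t.
#if __SIZEOF_WCHAR_T__ == 2
static const wchar_t sample[] = L"\x0102\x0304";
static const unsigned long expected_first = 0x0102UL;
#else
static const wchar_t sample[] = L"\x01020304\x05060708";
static const unsigned long expected_first = 0x01020304UL;  // 16909060
#endif

// The predefined macro should agree with the type itself.
static_assert(sizeof(wchar_t) == __SIZEOF_WCHAR_T__,
              "__SIZEOF_WCHAR_T__ disagrees with sizeof (wchar_t)");

// Number of wide characters, excluding the terminating null.
std::size_t sample_units() { return sizeof(sample) / sizeof(wchar_t) - 1; }

// Compare the first element against the value matching the chosen literal.
bool first_unit_matches() {
  return static_cast<unsigned long>(sample[0]) == expected_first;
}
```

Note that the expected element value has to change along with the literal: with a 16-bit wchar_t the first element is 0x0102, not 16909060.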


Re: [PATCH] Clarify that -fwrapv covers all signed arithmetic overflow

2015-11-17 Thread Paolo Bonzini


On 17/11/2015 17:02, Joseph Myers wrote:
> On Tue, 17 Nov 2015, Paolo Bonzini wrote:
> 
>> * it doesn't promise that GCC will never rely on undefined behavior
>> rules for signed left shifts
> 
> I think we should remove the ", but this is subject to change" in 
> implement-c.texi (while replacing it with noting that ubsan will still 
> diagnose such cases, and they will also be diagnosed where constant 
> expressions are required).

That's great.  I'll send a patch.

Paolo


Re: [PATCH] Clarify that -fwrapv covers all signed arithmetic overflow

2015-11-17 Thread Paolo Bonzini


On 17/11/2015 17:02, Joseph Myers wrote:
> On Tue, 17 Nov 2015, Paolo Bonzini wrote:
> 
>> * it doesn't promise that GCC will never rely on undefined behavior
>> rules for signed left shifts
> 
> I think we should remove the ", but this is subject to change" in 
> implement-c.texi (while replacing it with noting that ubsan will still 
> diagnose such cases, and they will also be diagnosed where constant 
> expressions are required).

... hmm, are you sure?  None of the following warn for me

int x = -1 << 2;
int y = 1 << 31;
int z = 2 << 31;

I tried with any combination of -ansi, -pedantic, -std=cXX,
-fsanitize=undefined.

Paolo


Re: [PATCH] Clarify that -fwrapv covers all signed arithmetic overflow

2015-11-17 Thread Joseph Myers
On Tue, 17 Nov 2015, Paolo Bonzini wrote:

> On 17/11/2015 17:02, Joseph Myers wrote:
> > On Tue, 17 Nov 2015, Paolo Bonzini wrote:
> > 
> >> * it doesn't promise that GCC will never rely on undefined behavior
> >> rules for signed left shifts
> > 
> > I think we should remove the ", but this is subject to change" in 
> > implement-c.texi (while replacing it with noting that ubsan will still 
> > diagnose such cases, and they will also be diagnosed where constant 
> > expressions are required).
> 
> ... hmm, are you sure?  None of the following warn for me
> 
> int x = -1 << 2;
> int y = 1 << 31;
> int z = 2 << 31;
> 
> I tried with any combination of -ansi, -pedantic, -std=cXX,
> -fsanitize=undefined.

With a recent trunk build I get:

$ ./build/gcc/xgcc -B./build/gcc/ -S -o /dev/null -pedantic -std=c11 t.c 
t.c:1:9: warning: initializer element is not a constant expression [-Wpedantic]
 int x = -1 << 2;
 ^

t.c:2:9: warning: initializer element is not a constant expression [-Wpedantic]
 int y = 1 << 31;
 ^

t.c:3:11: warning: result of '2 << 31' requires 34 bits to represent, but 'int' 
only has 32 bits [-Wshift-overflow=]
 int z = 2 << 31;
   ^

t.c:3:9: warning: initializer element is not a constant expression [-Wpedantic]
 int z = 2 << 31;
 ^

(and -pedantic-errors produces errors for the "not a constant expression" 
cases, as expected).
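
For reference, ISO C11 6.5.7 makes E1 << E2 undefined when E1 is negative or when E1 * 2^E2 is not representable in the result type, which covers all three initializers above. A small sketch of that rule and of a wrapping shift done in unsigned arithmetic follows. The helper names are hypothetical, 32-bit int is assumed, and the unsigned-to-int conversion is assumed to wrap modulo 2^32 (as GCC documents); it says nothing about what -fwrapv itself guarantees.

```cpp
#include <cassert>
#include <climits>

// True if X << N has undefined behavior under the C11 rule: negative
// left operand, shift count out of range, or unrepresentable result.
bool shl_is_undefined_c11(int x, unsigned n) {
  if (x < 0 || n >= sizeof(int) * CHAR_BIT)
    return true;
  return x > (INT_MAX >> n);
}

// Wrapping left shift computed on the unsigned representation.
// Precondition: n is smaller than the width of int.
int shl_wrap(int x, unsigned n) {
  return static_cast<int>(static_cast<unsigned>(x) << n);
}
```

Under these assumptions `shl_wrap (-1, 2)` is -4, `shl_wrap (1, 31)` is INT_MIN and `shl_wrap (2, 31)` is 0, while `shl_is_undefined_c11` returns true for all three and false for `1 << 30`.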

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: nvptx priority queues nonsupport in libgomp

2015-11-17 Thread Cesar Philippidis
On 11/17/2015 09:23 AM, Nathan Sidwell wrote:
> On 11/17/15 12:23, Nathan Sidwell wrote:
>> On 11/17/15 12:16, Cesar Philippidis wrote:
>>> This patch adds an empty priority_queue.c in libgomp for nvptx targets.
>>> Nvptx targets don't have sufficient support for a complete libgomp
>>> library, so we're only building a subset of it. And without that empty
>>> file, I was seeing an error message that looked like this:
>>>
>>> libgomp/libgomp.h:122:17: fatal error: sem.h: No such file or directory
>>>   #include "sem.h"
>>>
>>> I'm still running the entire testsuite, but it doesn't introduce any new
>>> regressions in libgomp.oacc-c. Is this OK for trunk, or am I missing
>>> something?
>>
>> Please apply to trunk.  I've just tripped over it, you've saved me  an
>> investigation ...
> 
> Actually, please put a comment in the file, rather than leave it empty

OK. I've applied this patch in r230466.

Cesar

2015-11-17  Cesar Philippidis  

	libgomp/
	* config/nvptx/priority_queue.c: New file.

diff --git a/libgomp/config/nvptx/priority_queue.c b/libgomp/config/nvptx/priority_queue.c
new file mode 100644
index 000..63aecd2
--- /dev/null
+++ b/libgomp/config/nvptx/priority_queue.c
@@ -0,0 +1 @@
+/* Empty stub for omp task priority support.  */


nvptx priority queues nonsupport in libgomp

2015-11-17 Thread Cesar Philippidis
This patch adds an empty priority_queue.c in libgomp for nvptx targets.
Nvptx targets don't have sufficient support for a complete libgomp
library, so we're only building a subset of it. And without that empty
file, I was seeing an error message that looked like this:

libgomp/libgomp.h:122:17: fatal error: sem.h: No such file or directory
 #include "sem.h"

I'm still running the entire testsuite, but it doesn't introduce any new
regressions in libgomp.oacc-c. Is this OK for trunk, or am I missing
something?

Cesar
2015-11-17  Cesar Philippidis  

	libgomp/
	* config/nvptx/priority_queue.c: New empty file.

diff --git a/libgomp/config/nvptx/priority_queue.c b/libgomp/config/nvptx/priority_queue.c
new file mode 100644
index 000..e69de29


Re: [PATCH] PR/67682, break SLP groups up if only some elements match

2015-11-17 Thread Alan Lawrence

On 16/11/15 14:42, Christophe Lyon wrote:


Hi Alan,

I've noticed that this new test (gcc.dg/vect/bb-slp-subgroups-3.c)
fails for armeb targets.
I haven't had time to look at more details yet, but I guess you can
reproduce it quickly enough.



Thanks - yes I see it now.

-fdump-tree-optimized looks sensible:

__attribute__((noinline))
test ()
{
  vector(4) int vect__14.21;
  vector(4) int vect__2.20;
  vector(4) int vect__2.19;
  vector(4) int vect__3.13;
  vector(4) int vect__2.12;

  <bb 2>:
  vect__2.12_24 = MEM[(int *)&b];
  vect__3.13_27 = vect__2.12_24 + { 1, 2, 3, 4 };
  MEM[(int *)&a] = vect__3.13_27;
  vect__2.19_31 = MEM[(int *)&b + 16B];
  vect__2.20_33 = VEC_PERM_EXPR <vect__2.12_24, vect__2.19_31, { 0, 2, 4, 6 }>;
  vect__14.21_35 = vect__2.20_33 * { 3, 4, 5, 7 };
  MEM[(int *)&a + 16B] = vect__14.21_35;
  return;
}

but while a[0...3] end up containing 5 7 9 11 as expected,
a[4..7] end up with 30 32 30 28 rather than the expected 12 24 40 70.
That is, we end up with (10 8 6 4), rather than the expected (4 6 8 10), being 
multiplied by {3,4,5,7}. Looking at the RTL, those values come from a UZP1/2 
pair that should extract elements {0,2,4,6} of b. Assembler, with my workings as 
to what's in each register:


test:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r2, #:lower16:b
	movt	r2, #:upper16:b
	vldr	d22, .L11
	vldr	d23, .L11+8
;; So d22 = (3 4), d23 = (5 7), q11 = (5 7 3 4)
	movw	r3, #:lower16:a
	movt	r3, #:upper16:a
	vld1.64	{d16-d17}, [r2:64]
;; So d16 = (b[0] b[1]), d17 = (b[2] b[3]), q8 = (b[2] b[3] b[0] b[1])
	vmov	q9, q8  @ v4si
;; q9 = (b[2] b[3] b[0] b[1])
	vldr	d20, [r2, #16]
	vldr	d21, [r2, #24]
;; So d20 = (b[4] b[5]), d21 = (b[6] b[7]), q10 = (b[6] b[7] b[4] b[5])
	vuzp.32	q10, q9
;; So  q10 = (b[3] b[1] b[7] b[5]), i.e. d20 = (b[7] b[5]) and d21 = (b[3] b[1])
;; and q9 = (b[2] b[0] b[6] b[4]), i.e. d18 = (b[6] b[4]) and d19 = (b[2] b[0])
	vldr	d20, .L11+16
	vldr	d21, .L11+24
;; d20 = (1 2), d21 = (3 4), q10 = (3 4 1 2)
	vmul.i32	q9, q9, q11
;; q9 = (b[2]*5 b[0]*7 b[6]*3 b[4]*4)
;; i.e. d18 = (b[6]*3 b[4]*4) and d19 = (b[2]*5 b[0]*7)
	vadd.i32	q8, q8, q10
;; q8 = (b[2]+3 b[3]+4 b[0]+1 b[1]+2)
;; i.e. d16 = (b[0]+1 b[1]+2), d17 = (b[2]+3 b[3]+4)
	vst1.64	{d16-d17}, [r3:64]
;; a[0] = b[0]+1, a[1] = b[1]+2, a[2] = b[2]+3, a[3]=b[3]+4 all ok
	vstr	d18, [r3, #16]
;; a[4] = b[6]*3, a[5] = b[4]*4
	vstr	d19, [r3, #24]
;; a[6] = b[2]*5, a[7] = b[0]*7
	bx	lr
.L12:
	.align	3
.L11:
	.word	3
	.word	4
	.word	5
	.word	7
	.word	1
	.word	2
	.word	3
	.word	4

Which is to say, the bit order in the q-registers is neither big- nor
little-endian, but the elements get stored back to memory in an order consistent
with how they were loaded, so we're OK as long as there are no permutes.
Unfortunately, for UZP this lane-ordering mix-up does not cancel out and messes
everything up...
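
The arithmetic behind the wrong result can be checked with a small model. It is only a sketch: `Vec4` and the helpers are hypothetical, it models the even-element permute on plain arrays rather than NEON registers, and b[5] and b[7] are assumed values (only b[0..4] and b[6] follow from the numbers quoted above).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;

// Elements {0,2,4,6} of the 8-element concatenation lo||hi, i.e. what
// the VEC_PERM_EXPR in the dump is meant to select.
Vec4 even_elements(const Vec4 &lo, const Vec4 &hi) {
  return Vec4{{lo[0], lo[2], hi[0], hi[2]}};
}

// The same elements with the lane order flipped, modelling the mix-up.
Vec4 reversed(const Vec4 &v) {
  return Vec4{{v[3], v[2], v[1], v[0]}};
}

// Element-wise product.
Vec4 multiply(const Vec4 &x, const Vec4 &y) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = x[i] * y[i];
  return r;
}
```

With b = {4, 5, 6, 7, 8, 9, 10, 11} and the multiplier {3, 4, 5, 7}, the correctly ordered permute gives 12 24 40 70, while the flipped one gives 30 32 30 28, matching the values observed in a[4..7].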


Hmmm. I feel that "properly" fixing this testcase amounts to fixing
armeb's whole register-numbering/lane-flipping scheme, and might be quite a
large task. OTOH it might also fix the significant number of failing vectorizer
tests. A simpler solution might be to disable some part of vector
support on armeb, but I'm not sure yet which part would be best.


Thoughts (CC maintainers)?

--Alan



Re: nvptx priority queues nonsupport in libgomp

2015-11-17 Thread Jakub Jelinek
On Tue, Nov 17, 2015 at 09:16:05AM -0800, Cesar Philippidis wrote:
> This patch adds an empty priority_queue.c in libgomp for nvptx targets.
> Nvptx targets don't have sufficient support for a complete libgomp
> library, so we're only building a subset of it. And without that empty
> file, I was seeing an error message that looked like this:
> 
> libgomp/libgomp.h:122:17: fatal error: sem.h: No such file or directory
>  #include "sem.h"
> 
> I'm still running the entire testsuite, but it doesn't introduce any new
> regressions in libgomp.oacc-c. Is this OK for trunk, or am I missing
> something?
> 
> Cesar

> 2015-11-17  Cesar Philippidis  
> 
>   libgomp/
>   * config/nvptx/priority_queue.c: New empty file.
> 
> diff --git a/libgomp/config/nvptx/priority_queue.c 
> b/libgomp/config/nvptx/priority_queue.c
> new file mode 100644
> index 000..e69de29

Ok for trunk.

Jakub


Re: nvptx priority queues nonsupport in libgomp

2015-11-17 Thread Nathan Sidwell

On 11/17/15 12:23, Nathan Sidwell wrote:

On 11/17/15 12:16, Cesar Philippidis wrote:

This patch adds an empty priority_queue.c in libgomp for nvptx targets.
Nvptx targets don't have sufficient support for a complete libgomp
library, so we're only building a subset of it. And without that empty
file, I was seeing an error message that looked like this:

libgomp/libgomp.h:122:17: fatal error: sem.h: No such file or directory
  #include "sem.h"

I'm still running the entire testsuite, but it doesn't introduce any new
regressions in libgomp.oacc-c. Is this OK for trunk, or am I missing
something?


Please apply to trunk.  I've just tripped over it, you've saved me  an
investigation ...


Actually, please put a comment in the file, rather than leave it empty


--
Nathan Sidwell - Director, Sourcery Services - Mentor Embedded


Re: nvptx priority queues nonsupport in libgomp

2015-11-17 Thread Nathan Sidwell

On 11/17/15 12:16, Cesar Philippidis wrote:

This patch adds an empty priority_queue.c in libgomp for nvptx targets.
Nvptx targets don't have sufficient support for a complete libgomp
library, so we're only building a subset of it. And without that empty
file, I was seeing an error message that looked like this:

libgomp/libgomp.h:122:17: fatal error: sem.h: No such file or directory
  #include "sem.h"

I'm still running the entire testsuite, but it doesn't introduce any new
regressions in libgomp.oacc-c. Is this OK for trunk, or am I missing
something?


Please apply to trunk.  I've just tripped over it, you've saved me  an 
investigation ...


nathan

--
Nathan Sidwell


Re: nvptx priority queues nonsupport in libgomp

2015-11-17 Thread Jakub Jelinek
On Tue, Nov 17, 2015 at 12:23:51PM -0500, Nathan Sidwell wrote:
> On 11/17/15 12:23, Nathan Sidwell wrote:
> >On 11/17/15 12:16, Cesar Philippidis wrote:
> >>This patch adds an empty priority_queue.c in libgomp for nvptx targets.
> >>Nvptx targets don't have sufficient support for a complete libgomp
> >>library, so we're only building a subset of it. And without that empty
> >>file, I was seeing an error message that looked like this:
> >>
> >>libgomp/libgomp.h:122:17: fatal error: sem.h: No such file or directory
> >>  #include "sem.h"
> >>
> >>I'm still running the entire testsuite, but it doesn't introduce any new
> >>regressions in libgomp.oacc-c. Is this OK for trunk, or am I missing
> >>something?
> >
> >Please apply to trunk.  I've just tripped over it, you've saved me  an
> >investigation ...
> 
> Actually, please put a comment in the file, rather than leave it empty

Yeah, something like
/* Intentionally empty.  */
or similar is better than empty file.

Jakub


[wwwdocs] Update libstdc++ release notes in gcc-6/changes.html

2015-11-17 Thread Jonathan Wakely

Committed to cvs.

Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.42
diff -u -r1.42 changes.html
--- htdocs/gcc-6/changes.html	15 Nov 2015 08:01:27 -	1.42
+++ htdocs/gcc-6/changes.html	17 Nov 2015 10:49:56 -
@@ -117,14 +117,28 @@
 non-member functions std::size,
 std::empty, and std::data for
 accessing containers and arrays;
+std::invoke;
 std::shared_mutex;
 std::void_t and std::bool_constant
-utilities. 
+metaprogramming utilities. 
   
+  Thanks to Ville Voutilainen for contributing many of the C++17 features.
 
 An experimental implementation of the File System TS.
 Experimental support for most features of the second version of the
-Library Fundamentals TS.
+Library Fundamentals TS, including polymorphic memory resources and
+array support in shared_ptr, thanks to Fan You.
+Some assertions checked by Debug Mode can now also be enabled by
+_GLIBCXX_ASSERTIONS. The subset of checks enabled by
+the new macro have less run-time overhead than the full
+_GLIBCXX_DEBUG checks and and don't affect the library
+ABI, so can be enabled per-translation unit.
+
+Timed mutex types are supported on more targets, including Darwin.
+
+Improved std::locale support for DragonFly and FreeBSD,
+thanks to John Marino and Andreas Tobler.
+
   
 
 


[PATCH] Add LANG_HOOKS_EMPTY_RECORD_P for C++ empty class

2015-11-17 Thread H.J. Lu
Empty record should be returned and passed the same way in C and C++.
This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which
defaults to return false.  For C++, LANG_HOOKS_EMPTY_RECORD_P is defined
to is_really_empty_class, which returns true for C++ empty classes.  For
LTO, we stream out a bit to indicate if a record is empty and we store
it in TYPE_LANG_FLAG_0 when streaming in.  get_ref_base_and_extent is
changed to set bitsize to 0 for empty records.  Middle-end and x86
backend are updated to ignore empty records for parameter passing and
function value return.  Other targets may need similar changes.
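
For context, a short sketch of what "empty record" means on the C++ side. It is illustrative only: `Empty` and `pass_through` are hypothetical names and the code does not inspect how arguments are actually passed. In C++ an empty class still has a nonzero sizeof, whereas GNU C gives a member-less struct size zero, which is why the two languages can disagree unless empty records are ignored for parameter passing.

```cpp
#include <cassert>
#include <type_traits>

// A class with no non-static data members, virtual functions or
// virtual bases.
struct Empty {};

// C++ requires every complete object type to have nonzero size.
static_assert(sizeof(Empty) >= 1, "empty classes still occupy storage");

// Passing Empty by value carries no data; only V matters to the callee.
int pass_through(Empty, int v) { return v; }

// Compile-time query corresponding to "is this class empty?".
bool is_empty_record() { return std::is_empty<Empty>::value; }
```

Here `is_empty_record ()` is true and `pass_through (Empty (), 7)` returns 7 regardless of whether the ABI allocates a slot for the empty argument.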

gcc/

PR c++/60336
PR middle-end/67239
PR target/68355
* calls.c (store_one_arg): Use 0 for empty record size.  Don't
push 0 size argument onto stack.
(must_pass_in_stack_var_size_or_pad): Return false for empty
record.
* function.c (locate_and_pad_parm): Use 0 for empty record size.
* tree-dfa.c (get_ref_base_and_extent): Likewise.
* langhooks-def.h (LANG_HOOKS_EMPTY_RECORD_P): New.
(LANG_HOOKS_DECLS): Add LANG_HOOKS_EMPTY_RECORD_P.
* langhooks.h (lang_hooks_for_decls): Add empty_record_p.
* lto-streamer.h (LTO_major_version): Increase by 1 to 6.
* targhooks.c: Include "langhooks.h".
(std_gimplify_va_arg_expr): Use 0 for empty record size.
* tree-streamer-in.c (unpack_ts_base_value_fields): Stream in
TYPE_LANG_FLAG_0.
* tree-streamer-out.c: Include "langhooks.h".
(pack_ts_base_value_fields): Stream out a bit to indicate if a
record is empty.
* config/i386/i386.c (classify_argument): Return 0 for empty
record.
(construct_container): Return NULL for empty record.
(ix86_function_arg): Likewise.
(ix86_function_arg_advance): Skip empty record.
(ix86_return_in_memory): Return false for empty record.
(ix86_gimplify_va_arg): Use 0 for empty record size.

gcc/cp/

PR c++/60336
PR middle-end/67239
PR target/68355
* class.c (is_empty_class): Changed to return bool and take
const_tree.
(is_really_empty_class): Changed to take const_tree.  Check
if TYPE_BINFO is zero.
* cp-tree.h (is_empty_class): Updated.
(is_really_empty_class): Likewise.
* cp-lang.c (LANG_HOOKS_EMPTY_RECORD_P): New.

gcc/lto/

PR c++/60336
PR middle-end/67239
PR target/68355
* lto-lang.c (lto_empty_record_p): New.
(LANG_HOOKS_EMPTY_RECORD_P): Likewise.

gcc/testsuite/

PR c++/60336
PR middle-end/67239
PR target/68355
* g++.dg/abi/empty12.C: New test.
* g++.dg/abi/empty12.h: Likewise.
* g++.dg/abi/empty12a.c: Likewise.
* g++.dg/pr60336-1.C: Likewise.
* g++.dg/pr60336-2.C: Likewise.
* g++.dg/pr68355.C: Likewise.
---
 gcc/calls.c | 41 +++--
 gcc/config/i386/i386.c  | 18 +++-
 gcc/cp/class.c  | 17 ---
 gcc/cp/cp-lang.c|  2 ++
 gcc/cp/cp-tree.h|  4 ++--
 gcc/function.c  |  7 +--
 gcc/langhooks-def.h |  2 ++
 gcc/langhooks.h |  3 +++
 gcc/lto-streamer.h  |  2 +-
 gcc/lto/lto-lang.c  | 13 
 gcc/targhooks.c |  6 +-
 gcc/testsuite/g++.dg/abi/empty12.C  | 17 +++
 gcc/testsuite/g++.dg/abi/empty12.h  |  9 
 gcc/testsuite/g++.dg/abi/empty12a.c |  6 ++
 gcc/testsuite/g++.dg/pr60336-1.C| 17 +++
 gcc/testsuite/g++.dg/pr60336-2.C| 28 +
 gcc/testsuite/g++.dg/pr68355.C  | 24 ++
 gcc/tree-dfa.c  |  2 ++
 gcc/tree-streamer-in.c  |  5 +
 gcc/tree-streamer-out.c |  6 ++
 20 files changed, 204 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/abi/empty12.C
 create mode 100644 gcc/testsuite/g++.dg/abi/empty12.h
 create mode 100644 gcc/testsuite/g++.dg/abi/empty12a.c
 create mode 100644 gcc/testsuite/g++.dg/pr60336-1.C
 create mode 100644 gcc/testsuite/g++.dg/pr60336-2.C
 create mode 100644 gcc/testsuite/g++.dg/pr68355.C

diff --git a/gcc/calls.c b/gcc/calls.c
index b56556a..ecc9b7a 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -4835,7 +4835,10 @@ store_one_arg (struct arg_data *arg, rtx argblock, int flags,
 Note that in C the default argument promotions
 will prevent such mismatches.  */
 
-  size = GET_MODE_SIZE (arg->mode);
+  if (lang_hooks.decls.empty_record_p (TREE_TYPE (pval)))
+   size = 0;
+  else
+   size = GET_MODE_SIZE (arg->mode);
   /* Compute how much space the push instruction will push.
 On many machines, pushing a byte will advance the stack
 

[PATCH, PR middle-end/68134] Reject scalar modes in default get_mask_mode hook

2015-11-17 Thread Ilya Enkovich
Hi,

Default hook for get_mask_mode is supposed to return integer vector modes.  
This means it should reject scalar modes returned by mode_for_vector.  
Bootstrapped and regtested on x86_64-unknown-linux-gnu, regtested on 
aarch64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-11-17  Ilya Enkovich  

PR middle-end/68134
* targhooks.c (default_get_mask_mode): Filter out
scalar modes returned by mode_for_vector.

gcc/testsuite/

2015-11-17  Ilya Enkovich  

PR middle-end/68134
* gcc.dg/pr68134.c: New test.


diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index c34b4e9..66d983b 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1093,8 +1093,8 @@ default_get_mask_mode (unsigned nunits, unsigned vector_size)
   gcc_assert (elem_size * nunits == vector_size);
 
   vector_mode = mode_for_vector (elem_mode, nunits);
-  if (VECTOR_MODE_P (vector_mode)
-  && !targetm.vector_mode_supported_p (vector_mode))
+  if (!VECTOR_MODE_P (vector_mode)
+  || !targetm.vector_mode_supported_p (vector_mode))
 vector_mode = BLKmode;
 
   return vector_mode;
diff --git a/gcc/testsuite/gcc.dg/pr68134.c b/gcc/testsuite/gcc.dg/pr68134.c
new file mode 100644
index 000..522b4c6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68134.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99" } */
+
+#include <stdint.h>
+
+typedef double float64x1_t __attribute__ ((vector_size (8)));
+typedef uint64_t uint64x1_t;
+
+void
+foo (void)
+{
+  float64x1_t arg1 = (float64x1_t) 0x3fedf9d4343c7c80;
+  float64x1_t arg2 = (float64x1_t) 0x3fcdc53742ea9c40;
+  uint64x1_t result = (uint64x1_t) (arg1 == arg2);
+  uint64_t got = result;
+  uint64_t exp = 0;
+  if (got != 0)
+__builtin_abort ();
+}
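
The effect of the condition change in the targhooks.c hunk above can be seen in a small truth-table sketch. This is a toy model, not GCC code: `is_vector` stands in for VECTOR_MODE_P (vector_mode) and `supported` for targetm.vector_mode_supported_p (vector_mode), and both function names are hypothetical.

```cpp
// Toy model of the two tests. `is_vector` stands in for VECTOR_MODE_P and
// `supported` for targetm.vector_mode_supported_p; both names are hypothetical.

// Old condition: fall back to BLKmode only for unsupported *vector* modes.
bool old_falls_back(bool is_vector, bool supported) {
  return is_vector && !supported;
}

// New condition: also fall back when mode_for_vector returned a scalar mode.
bool new_falls_back(bool is_vector, bool supported) {
  return !is_vector || !supported;
}
```

The two versions differ only when the mode is not a vector mode: the old test let such a mode through, the new one replaces it with BLKmode.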


Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-17 Thread Sebastian Huber



On 05/11/15 16:22, Daniel Gutson wrote:

On Wed, Nov 4, 2015 at 3:20 AM, Jonathan Wakely  wrote:

>On 4 November 2015 at 02:11, Daniel Gutson wrote:

>>Since this is a nothrow new, we thought that probably the system
>>might not be exceptions-friendly (such as certain embedded systems),
>>so we wanted to provide the new_handler the ability to do something else
>>other than trying to allocate memory and keep the function iterating.

>
>That could be done using an alternative overload of operator new
>instead of altering the semantics of the standard one (that could be
>provided as a GNU extension, for example).
>

>>In fact, our idea is that since the nothrow new can indeed return nullptr,
>>let the new_handler do something else and leave the no-memory-consequence
>>to the caller.
>>This new flag enables just that.

>
>
>The default configuration already allows the caller to deal with
>allocation failure from the nothrow version of new, by not installing
>a new-handler, and dealing with a null return value. How
>would I use this alternative configuration? Since the behaviour only
>changes when a new-handler is installed, presumably I'm meant to
>install some kind of new-handler that does something else ... but
>what? The patch should include documentation explaining when and how
>to use this option.

Real use cases: statistics and logging. It's a (one time) callback
reporting that something went wrong,
but not intended to fix things e.g. by attempting to free more memory.


For statistics and logging you may also use the --wrap option of GNU ld.
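
For comparison, the one-time callback use case can probably also be approximated with the standard semantics, without a configure flag. The sketch below assumes the default behaviour described in the quoted text (the allocation is retried while a new-handler is installed); the handler name and the counter are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <new>

// Hypothetical counter, for illustration only.
static int g_oom_reports = 0;

// One-shot handler: report the failure, then uninstall itself so that the
// allocation loop stops retrying and the nothrow form returns a null pointer.
void report_oom_once() {
  ++g_oom_reports;
  std::fputs("allocation failed\n", stderr);
  std::set_new_handler(nullptr);
}

// Allocate with the nothrow form; the caller handles a null result.
char* try_alloc(std::size_t n) {
  std::set_new_handler(report_oom_once);
  return new (std::nothrow) char[n];
}
```

A caller would check the result of try_alloc for null and release it with delete[]. Whether this is acceptable on a system built without exception support is a separate question that the sketch does not answer.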

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



[patch, doc] fix PR53587, missing documentation for -mms-bitfields

2015-11-17 Thread Sandra Loosemore
I've checked in this patch to fix PR53587, which is about missing 
documentation for the -mms-bitfields command-line option for x86.  It 
turns out there *was* documentation, but it was buried in the discussion 
of the corresponding variable attributes with no pointers in the option 
summary or index.  I thought the primary documentation should be in 
options.texi with pointers to it from the variable and type attribute 
entries, rather than vice-versa, so I've checked in this patch to move 
things around and add more cross-references.


-Sandra

2015-11-17  Sandra Loosemore  

	PR target/53587
	* doc/invoke.texi (Option Summary): Add -mms-bitfields to x86
	option list.
	(x86 Options): Add -mms-bitfields and -mno-ms-bitfields.  Move
	discussion of the Microsoft structure layout details here from
	its former home in extend.texi.
	* doc/extend.texi (x86 Variable Attributes): Replace detailed
	discussion with pointer to its new location.  Add cross-reference
	to corresponding type attributes.
	(x86 Type Attributes): Add cross-references to command-line options
	and variable attributes.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 230466)
+++ gcc/doc/invoke.texi	(working copy)
@@ -1103,7 +1103,7 @@ See RS/6000 and PowerPC Options.
 -mprefetchwt1 -mclflushopt -mxsavec -mxsaves @gol
 -msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlzcnt @gol
 -mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mmpx -mmwaitx -mthreads @gol
--mno-align-stringops  -minline-all-stringops @gol
+-mms-bitfields -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
 -mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -23431,6 +23431,142 @@ on thread-safe exception handling must c
 @option{-D_MT}; when linking, it links in a special thread helper library
 @option{-lmingwthrd} which cleans up per-thread exception-handling data.
 
+@item -mms-bitfields
+@itemx -mno-ms-bitfields
+@opindex mms-bitfields
+@opindex mno-ms-bitfields
+
+Enable/disable bit-field layout compatible with the native Microsoft
+Windows compiler.  
+
+If @code{packed} is used on a structure, or if bit-fields are used,
+it may be that the Microsoft ABI lays out the structure differently
+than the way GCC normally does.  Particularly when moving packed
+data between functions compiled with GCC and the native Microsoft compiler
+(either via function call or as data in a file), it may be necessary to access
+either format.
+
+This option is enabled by default for Microsoft Windows
+targets.  This behavior can also be controlled locally by use of variable
+or type attributes.  For more information, see @ref{x86 Variable Attributes}
+and @ref{x86 Type Attributes}.
+
+The Microsoft structure layout algorithm is fairly simple with the exception
+of the bit-field packing.  
+The padding and alignment of members of structures and whether a bit-field 
+can straddle a storage-unit boundary are determined by these rules:
+
+@enumerate
+@item Structure members are stored sequentially in the order in which they are
+declared: the first member has the lowest memory address and the last member
+the highest.
+
+@item Every data object has an alignment requirement.  The alignment requirement
+for all data except structures, unions, and arrays is either the size of the
+object or the current packing size (specified with either the
+@code{aligned} attribute or the @code{pack} pragma),
+whichever is less.  For structures, unions, and arrays,
+the alignment requirement is the largest alignment requirement of its members.
+Every object is allocated an offset so that:
+
+@smallexample
+offset % alignment_requirement == 0
+@end smallexample
+
+@item Adjacent bit-fields are packed into the same 1-, 2-, or 4-byte allocation
+unit if the integral types are the same size and if the next bit-field fits
+into the current allocation unit without crossing the boundary imposed by the
+common alignment requirements of the bit-fields.
+@end enumerate
+
+MSVC interprets zero-length bit-fields in the following ways:
+
+@enumerate
+@item If a zero-length bit-field is inserted between two bit-fields that
+are normally coalesced, the bit-fields are not coalesced.
+
+For example:
+
+@smallexample
+struct
+ @{
+   unsigned long bf_1 : 12;
+   unsigned long : 0;
+   unsigned long bf_2 : 12;
+ @} t1;
+@end smallexample
+
+@noindent
+The size of @code{t1} is 8 bytes with the zero-length bit-field.  If the
+zero-length bit-field were removed, @code{t1}'s size would be 4 bytes.
+
+@item If a zero-length bit-field is inserted after a bit-field, @code{foo}, and the
+alignment of the zero-length bit-field is greater than the member that follows it,
+@code{bar}, @code{bar} is aligned as the type of the zero-length bit-field.
+
+For example:
+
+@smallexample

C++ PATCHes for bootstrap/68346, 68361

2015-11-17 Thread Jason Merrill

A couple of bootstrap issues on some targets:

68346: My earlier change to avoid folding the arguments to 
warn_tautological_cmp wasn't quite right, either.  This patch folds 
within the function, at the place where we are interested in a constant 
value.


68361: The way we were trying to suppress -Wparentheses before wasn't 
effective enough.  Let's actually turn off the flag around the relevant 
convert call.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 56ca181e3d438fb9b95c865fde6e77f5335c791a
Author: Jason Merrill 
Date:   Tue Nov 17 10:51:24 2015 -0500

	PR bootstrap/68361
	* cvt.c (cp_convert_and_check): Use warning_sentinel to suppress
	-Wparentheses.

diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 0231efc..ebca004 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -644,7 +644,7 @@ cp_convert_and_check (tree type, tree expr, tsubst_flags_t complain)
   else
 	{
 	  /* Avoid bogus -Wparentheses warnings.  */
-	  TREE_NO_WARNING (folded) = true;
+	  warning_sentinel w (warn_parentheses);
 	  folded_result = cp_convert (type, folded, tf_none);
 	}
   folded_result = fold_simple (folded_result);
diff --git a/gcc/testsuite/g++.dg/warn/Wparentheses-28.C b/gcc/testsuite/g++.dg/warn/Wparentheses-28.C
new file mode 100644
index 000..f6636cb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wparentheses-28.C
@@ -0,0 +1,14 @@
+// PR bootstrap/68361
+// { dg-options -Wparentheses }
+
+struct A
+{
+  int p: 2;
+};
+
+A a, b;
+
+int main()
+{
+  bool t = (a.p = b.p);
+}
commit 7489a9400cffa4c1010debaf2d86dcd286ce1cfd
Author: Jason Merrill 
Date:   Tue Nov 17 12:56:00 2015 -0500

	PR bootstrap/68346

	* c-common.c (warn_tautological_cmp): Fold before checking for
	constants.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 06d857c..f50ca48 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -1924,7 +1924,7 @@ warn_tautological_cmp (location_t loc, enum tree_code code, tree lhs, tree rhs)
 
   /* We do not warn for constants because they are typical of macro
  expansions that test for features, sizeof, and similar.  */
-  if (CONSTANT_CLASS_P (lhs) || CONSTANT_CLASS_P (rhs))
+  if (CONSTANT_CLASS_P (fold (lhs)) || CONSTANT_CLASS_P (fold (rhs)))
 return;
 
   /* Don't warn for e.g.
diff --git a/gcc/testsuite/g++.dg/warn/Wtautological-compare2.C b/gcc/testsuite/g++.dg/warn/Wtautological-compare2.C
new file mode 100644
index 000..9d9060d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wtautological-compare2.C
@@ -0,0 +1,11 @@
+// PR bootstrap/68346
+// { dg-options -Wtautological-compare }
+
+#define INVALID_REGNUM			(~(unsigned int) 0)
+#define PIC_OFFSET_TABLE_REGNUM INVALID_REGNUM
+
+int main()
+{
+  if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
+__builtin_abort();
+}
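
The 68361 fix above swaps a per-node flag for a scoped guard around the convert call. A minimal sketch of that RAII pattern follows; `flag_sentinel` and `demo_warn_flag` are hypothetical stand-ins and not GCC's actual warning_sentinel or warn_parentheses.

```cpp
// Hypothetical stand-in for a global warning flag such as warn_parentheses.
int demo_warn_flag = 1;

// RAII guard: clears the flag on construction, restores it on destruction.
class flag_sentinel {
 public:
  explicit flag_sentinel(int& flag) : flag_(flag), saved_(flag) { flag_ = 0; }
  ~flag_sentinel() { flag_ = saved_; }
  flag_sentinel(const flag_sentinel&) = delete;
  flag_sentinel& operator=(const flag_sentinel&) = delete;

 private:
  int& flag_;
  int saved_;
};

// Returns the flag value observed while the guard is active.
int flag_during_guarded_call() {
  flag_sentinel guard(demo_warn_flag);
  return demo_warn_flag;  // 0 here; restored when guard goes out of scope
}
```

The point of the pattern is that the flag is off for everything called inside the scope and is restored on every exit path.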


Re: [PATCH 00/16] Unit tests framework (v3)

2015-11-17 Thread Jeff Law

On 11/17/2015 05:51 AM, Bernd Schmidt wrote:

On 11/17/2015 02:53 AM, Mike Stump wrote:

On Nov 16, 2015, at 3:12 PM, Jeff Law  wrote:

So I'd tend to want them either at the end of the file with a
single #if CHECKING_P or as a separate foo-tests file.


Hum…  I kinda don’t want the main files mucked up with tests.  I
think I’d rather have

#if CHECKING_P
#include "test/expr-test.h"
#endif

at the end, and punt the whole lot into a single subdirectory that
most people, most of the time, can simply ignore.  Wading through a
ton of code that you aren’t interested in, is, well, annoying.


Most of the tests submitted so far are relatively tiny, sometimes the
list of #includes in the testcase is longer than the tests themselves.
If they are at the end of a file you'd hardly be wading through them
either. Let's just use common sense and make separate files if we ever
get huge amounts of test code and keep it simple otherwise.
I'd been pondering a #include solution too.  But at this stage I don't 
think it buys us anything significant beyond just putting them at the 
end of the source files.  Obviously if we find that the tests are 
intrusive, then we can adjust.


One could legitimately ask what about tests that hit multiple source 
files.  The snarky response is that such tests aren't really suitable 
for unit testing :-)  But for those we can create a separate source 
file, or put them into the most logical location.
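
As a rough sketch of the "tests at the end of the file behind a single #if CHECKING_P" layout, assuming CHECKING_P is provided by the build (it is defaulted here only to keep the sketch self-contained); the function names are hypothetical:

```cpp
// Assumption: CHECKING_P is normally set by the build; default it here so the
// sketch is self-contained.
#ifndef CHECKING_P
#define CHECKING_P 1
#endif

// "Production" code of a hypothetical source file.
int saturating_add_u8(int a, int b) {
  int sum = a + b;
  return sum > 255 ? 255 : sum;
}

#if CHECKING_P
// Tests live at the end of the same file, behind a single guard.
bool saturating_add_u8_selftest() {
  return saturating_add_u8(1, 2) == 3
         && saturating_add_u8(200, 100) == 255;
}
#endif
```

With CHECKING_P set to 0 the test function disappears entirely, so release builds carry none of it.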


Jeff


[PATCH] fix c++/68308 - [6 Regression] ICE: tree check: expected integer_cst

2015-11-17 Thread Martin Sebor

Attached is a patch fixing the ICE caused by a prior change of mine:

  https://gcc.gnu.org/viewcvs/gcc?view=revision=230081

Tested on x86_64, committing to trunk as per Jason via IRC.

Martin

gcc/ChangeLog:
2015-11-17  Martin Sebor  

PR c++/68308
* cp/init.c (build_new_1): Check for expression constness
the right way.

testsuite/ChangeLog:
2015-11-17  Martin Sebor  

PR c++/68308
* g++.dg/init/new46.C: New test.

Index: gcc/cp/init.c
===
--- gcc/cp/init.c	(revision 230463)
+++ gcc/cp/init.c	(working copy)
@@ -2715,7 +2715,7 @@
 
   size = size_binop (MULT_EXPR, size, fold_convert (sizetype, nelts));
 
-  if (TREE_CONSTANT (outer_nelts))
+  if (INTEGER_CST == TREE_CODE (outer_nelts))
 	{
 	  if (tree_int_cst_lt (max_outer_nelts_tree, outer_nelts))
 	{
@@ -3330,7 +3330,8 @@
 	 non-class type and its value before converting to std::size_t is
 	 less than zero. ... If the expression is a constant expression,
 	 the program is ill-fomed.  */
-  if (TREE_CONSTANT (cst_nelts) && tree_int_cst_sgn (cst_nelts) == -1)
+  if (INTEGER_CST == TREE_CODE (cst_nelts)
+	  && tree_int_cst_sgn (cst_nelts) == -1)
 	{
 	  if (complain & tf_error)
 	error ("size of array is negative");
Index: gcc/testsuite/g++.dg/init/new46.C
===
--- gcc/testsuite/g++.dg/init/new46.C	(revision 0)
+++ gcc/testsuite/g++.dg/init/new46.C	(working copy)
@@ -0,0 +1,65 @@
+// { dg-do compile }
+// { dg-options "-Wall" }
+
+// Test for c++/68308 - [6 Regression] ICE: tree check: expected integer_cst,
+//  have var_decl in decompose, at tree.h:5105
+
+typedef __typeof__ (sizeof 0) size_t;
+
+// Not defined, only referenced in templates that aren't expected
+// to be instantiated, to make sure they really aren't, in order to
+// verify c++/68308.
+template  void inst_check ();
+
+// Not instantiated (must not be diagnosed).
+template 
+char* fn1_x () {
+const size_t a = sizeof (T);
+return inst_check() ? new char [a] : 0;
+}
+
+// Not instantiated (must not be diagnosed).
+template 
+char* fn2_1_x () {
+return inst_check() ? new char [N] : 0;
+}
+
+template 
+char* fn2_1 () {
+return new char [N];
+}
+
+// Not instantiated (must not be diagnosed).
+template 
+char* fn2_2_x () {
+return inst_check() ? new char [M][N] : 0;
+}
+
+template 
+char* fn2_2 () {
+return new char [M][N];   // { dg-error "size of array is too large" }
+}
+
+// Not instantiated (must not be diagnosed).
+template 
+T* fn3_x () {
+const size_t a = sizeof (T);
+return inst_check() ? new T [a] : 0;
+}
+
+template 
+T* fn3 () {
+const size_t a = sizeof (T);
+return new T [a]; // { dg-error "size of array is too large" }
+}
+
+
+struct S { char a [__SIZE_MAX__ / 8]; };
+
+void foo ()
+{
+fn2_1<1>();
+fn2_1<__SIZE_MAX__ / 4>();
+fn2_2<__SIZE_MAX__ / 4, 4>();
+fn3();
+}
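
The ICE text quoted in the test ("expected integer_cst, have var_decl") suggests that the old TREE_CONSTANT test accepted a node that was flagged constant but was not an integer literal. The following toy model, which is not GCC's real tree representation and uses hypothetical names throughout, sketches why checking the node kind first is the safer test.

```cpp
// Toy model, not GCC's real tree representation: all names are hypothetical.
enum node_kind { KIND_INTEGER_CST, KIND_VAR_DECL };

struct node {
  node_kind kind;
  bool constant_flag;  // plays the role of TREE_CONSTANT
  long value;          // only meaningful for KIND_INTEGER_CST
};

// Mirrors the shape of the patched test: look at the node kind before
// reading an integer value out of it.
bool is_negative_integer_cst(const node& n) {
  return n.kind == KIND_INTEGER_CST && n.value < 0;
}
```

In this model a const variable has constant_flag set but is still a KIND_VAR_DECL, so a test on the flag alone would go on to read a value that is not there.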


[committed] Remove dead macros

2015-11-17 Thread Richard Sandiford
Nothing uses these macros and removing them makes it more likely
that future code will use CASE_CFN_* instead.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
Applied as obvious.

Thanks,
Richard


gcc/
* tree.h (BUILTIN_EXP10_P, BUILTIN_EXPONENT_P, BUILTIN_SQRT_P)
(BUILTIN_CBRT_P, BUILTIN_ROOT_P): Delete.

diff --git a/gcc/tree.h b/gcc/tree.h
index b9c400c..41c0f7c 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -233,22 +233,6 @@ as_internal_fn (combined_fn code)
 
 /* Helper macros for math builtins.  */
 
-#define BUILTIN_EXP10_P(FN) \
- ((FN) == BUILT_IN_EXP10 || (FN) == BUILT_IN_EXP10F || (FN) == BUILT_IN_EXP10L \
-  || (FN) == BUILT_IN_POW10 || (FN) == BUILT_IN_POW10F || (FN) == BUILT_IN_POW10L)
-
-#define BUILTIN_EXPONENT_P(FN) (BUILTIN_EXP10_P (FN) \
-  || (FN) == BUILT_IN_EXP || (FN) == BUILT_IN_EXPF || (FN) == BUILT_IN_EXPL \
-  || (FN) == BUILT_IN_EXP2 || (FN) == BUILT_IN_EXP2F || (FN) == BUILT_IN_EXP2L)
-
-#define BUILTIN_SQRT_P(FN) \
- ((FN) == BUILT_IN_SQRT || (FN) == BUILT_IN_SQRTF || (FN) == BUILT_IN_SQRTL)
-
-#define BUILTIN_CBRT_P(FN) \
- ((FN) == BUILT_IN_CBRT || (FN) == BUILT_IN_CBRTF || (FN) == BUILT_IN_CBRTL)
-
-#define BUILTIN_ROOT_P(FN) (BUILTIN_SQRT_P (FN) || BUILTIN_CBRT_P (FN))
-
 #define CASE_FLT_FN(FN) case FN: case FN##F: case FN##L
 #define CASE_FLT_FN_REENT(FN) case FN##_R: case FN##F_R: case FN##L_R
 #define CASE_INT_FN(FN) case FN: case FN##L: case FN##LL: case FN##IMAX



Re: [PATCH] PR fortran/59910 -- structure constructor in DATA statement

2015-11-17 Thread Dominique d'Humières
> … but I suspect gfc_reduce_init_expr() 
> may be useful for PARAMETER statements as well (need to
> check this!).

As in the following test

  module m
implicit none
type t
  integer :: i
end type t
type(t), dimension(2), parameter :: a1  = (/ t(1), t(2) /)
type(t), dimension(1), parameter :: c = spread ( a1(1), 1, 1 )
  end module m

? Compiling it with the patch gives the ICE

f951: internal compiler error: in gfc_conv_array_initializer, at fortran/trans-array.c:5704

Otherwise the test succeeds with the patch.

Thanks for working on these PRs,

Dominique



Re: [PATCH 5/5] [AARCH64] Add variant support to -m*=native and add thunderxt88pass1.

2015-11-17 Thread Joseph Myers
invoke.texi needs updating for thunderxt88pass1 support.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves

2015-11-17 Thread Bernd Schmidt

On 11/17/2015 02:03 PM, Kyrill Tkachov wrote:

+ || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN (cand->insn
return false;



Well, I think the statement we want to make is
"return false from this function if the two expressions contain the same
register number".


I looked at it again and I think I'll need more time to really figure 
out what's going on in this pass.


However, about the above... either I'm very confused, or the actual 
statement here is "return false _unless_ the two expressions contain the 
same register number". In the testcase, the regs in question are ax and 
bx, which is then rejected if the patch has been applied.


(gdb) p tmp_reg
(reg:SI 0 ax)
(gdb) p cand->insn
(insn 59 117 60 7 (set (reg:SI 4 si [orig:107 D.1813 ] [107])
(sign_extend:SI (reg:HI 3 bx [orig:99 D.1813 ] [99])))

And I think it would be better to strengthen that to "return false 
unless registers are identical". (From the above it's clear however that 
a plain rtx_equal_p wouldn't work since there's an extension in the 
operand).


Also, I had another look at the testcase. It uses __builtin_printf and 
dg-output, which is at least unusual. Try to use the normal "if (cond) 
abort ()".



Bernd


POWERPC64_TOC_POINTER_ALIGNMENT

2015-11-17 Thread Alan Modra
David noticed that gcc112 was generating gcc/auto-host.h with
#define POWERPC64_TOC_POINTER_ALIGNMENT 32768

This is not the correct value, which is either 8 or 256 depending on how old
ld is.  On investigating I found the cause is Fedora 21 modifying the
toolchain to default to -z relro.  ld -z relro puts the relro gap just
before .got (prior to my patches reordering sections for relro on
powerpc64).  That unfortunately aligns .got, defeating the deliberate
mis-alignment of .got in the testcase.

Fixed with the following obvious patch and committed to mainline.

Incidentally, bootstrap fails for me on powerpc64 due to "comparison
is always true due to limited range of data type [-Wtype-limits]"
&& GET_MODE_SIZE (mode) <= POWERPC64_TOC_POINTER_ALIGNMENT));
since my POWERPC64_TOC_POINTER_ALIGNMENT is 256 and mode_size is an
unsigned char array.  Grrr, so what code obfuscation do we use here to
work around this annoying warning?

Cross compiling from x86_64..
/src/gcc-virgin/configure \
--with-sysroot=/powerpc64le-linux --prefix=/usr/local \
--target=powerpc64le-linux --with-cpu=power8 \
--enable-targets=powerpc64-linux,powerpc-linux,powerpcle-linux \
--disable-multilib --disable-nls --enable-__cxa_atexit \
--enable-gnu-indirect-function --enable-secureplt --with-long-double-128 \
--enable-languages=all,go

..fails with:
In file included from /src/gcc-virgin/libgcc/libgcov-driver.c:49:0:
/src/gcc-virgin/libgcc/../gcc/gcov-io.c: In function 'gcov_do_dump':
/src/gcc-virgin/libgcc/../gcc/gcov-io.c:731:51: internal compiler error: in convert_move, at expr.c:286
   r = sizeof (long long) * __CHAR_BIT__ - 1 - __builtin_clzll (v);
   ^~~
0x7d5250 convert_move(rtx_def*, rtx_def*, int)
/src/gcc-virgin/gcc/expr.c:286
0x8b0e67 expand_direct_optab_fn
/src/gcc-virgin/gcc/internal-fn.c:2132
0x6cfd9b expand_call_stmt
/src/gcc-virgin/gcc/cfgexpand.c:2565
0x6cfd9b expand_gimple_stmt_1
/src/gcc-virgin/gcc/cfgexpand.c:3525
0x6cfd9b expand_gimple_stmt
/src/gcc-virgin/gcc/cfgexpand.c:3688
0x6d171e expand_gimple_basic_block
/src/gcc-virgin/gcc/cfgexpand.c:5694
0x6d7f96 execute
/src/gcc-virgin/gcc/cfgexpand.c:6309

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 230508)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2015-11-18  Alan Modra  
+
+   * configure.ac (POWERPC64_TOC_POINTER_ALIGNMENT): Pass -z norelro
+   to ld.
+   * configure: Regenerate.
+
 2015-11-17  Tom de Vries  
 
* tree-ssa-loop.c (pass_tree_loop_init::execute): Improve comments.
Index: gcc/configure.ac
===
--- gcc/configure.ac(revision 230508)
+++ gcc/configure.ac(working copy)
@@ -5257,7 +5257,7 @@
 x: .quad .TOC.
 EOF
   if $gcc_cv_as -a64 -o conftest.o conftest.s > /dev/null 2>&1 \
- && $gcc_cv_ld $emul_name -o conftest conftest.o > /dev/null 2>&1; then
+ && $gcc_cv_ld $emul_name -z norelro -o conftest conftest.o > /dev/null 2>&1; then
 gcc_cv_ld_toc_align=`$gcc_cv_nm conftest | ${AWK} '/\.TOC\./ { match ($0, "0[[[:xdigit:]]]*", a); print strtonum ("0x" substr(a[[0]], length(a[[0]])-3)) }'`
   fi
   rm -f conftest conftest.o conftest.s

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] g++.dg/cpp1y/pr58708.C wchar_t size

2015-11-17 Thread Mike Stump
On Nov 17, 2015, at 8:50 AM, David Edelsohn  wrote:
> 
> Thanks for the pointer.  How about the following?

Ok.

sizeof (*wfoo) or sizeof (wchar_t) or some such might be even more portable.

> 
> Thanks, David
> 
> 
> Index: pr58708.C
> ===
> --- pr58708.C   (revision 230463)
> +++ pr58708.C   (working copy)
> @@ -43,7 +43,11 @@
>   if (foo.chars[1] != 98) __builtin_abort();
>   if (foo.chars[2] != 99) __builtin_abort();
> 
> -  auto wfoo = L"\x01020304\x05060708"_foo;
> +#if __SIZEOF_WCHAR_T__ == 2
> +auto wfoo = L"\x0102\x0304"_foo;
> +#else
> +auto wfoo = L"\x01020304\x05060708"_foo;
> +#endif
>   if (is_same::value != true)
> __builtin_abort();
>   if (sizeof(wfoo.chars)/sizeof(wchar_t) != 2) __builtin_abort();
>   if (wfoo.chars[0] != 16909060) __builtin_abort();



Re: vector lightweight debug mode

2015-11-17 Thread François Dumont
On 16/11/2015 11:29, Jonathan Wakely wrote:
> On 15/11/15 22:12 +0100, François Dumont wrote:
>> Here is a last version I think.
>>
>>I completed the debug light mode by adding some check on iterator
>> ranges.
>>
>>Even if check are light I made some changes to make sure that
>> internally vector is not using methods instrumented with those checks.
>> This is to make sure checks are not done several times. Doing so also
>> simplify normal mode especially when using insert range, there is no
>> need to check if parameters are integers or not.
>
> Yes, I'd also observed that those improvements could be made, to avoid
> dispatching when we already know we have iterators not integers.

I will keep those simplification even if I remove some checks.

>
>>I also introduce some __builtin_expect to make sure compiler will
>> prefer the best path.
>>
>>I didn't manage to check result on generated code. I am pretty sure
>> there will be an impact, you can't run more code without impact. But
>> that is a known drawback of debug mode, light or not, we just need to
>> minimize it. Mostly by making sure that checks are done only once.
>
> Not doing the checks is also an option. That minimizes the cost :-)

This is controlled by a macro, users already have this option.

>
> For the full debug mode we want to check everything we can, and accept
> that has a cost.
>
> For the lightweight one we need to evaluate the relative benefits. Is
> it worth adding checks for errors that only happen rarely? Does the
> benefit outweigh the cost?
>
> I'm still not convinced that's the case for the "valid range" checks.
> I'm willing to be convinced, but am not convinced yet.

Ok, so I will remove this check. And what about the insert position check? I
guess this one too, so I will remove it as well. Note that only the checks on
the most basic operations will remain, that is to say those on which the
check will have the biggest impact proportionally.

I would like us to push the simplest version so that people can start
experimenting.

I would also prefer to concentrate on _GLIBCXX_DEBUG mode :-)

>
>>It would be great to have it for gcc 6.0. I am working on the same
>> for other containers.
>
> Please don't do the valid range checks for std::deque, the checks are
> undefined for iterators into different containers and will not give a
> reliable answer.

But debug mode is full of those checks, no ?

François
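
For readers following the thread, a lightweight check of the kind discussed here might look roughly like the sketch below. This is not the code from the patch: `checked_index` is a hypothetical free function, and __builtin_expect is the GCC builtin mentioned above, used to mark the failure branch as unlikely.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: a cheap bounds check on element access. The failing
// branch is marked unlikely so that the normal path stays the preferred one.
template <typename T>
const T& checked_index(const std::vector<T>& v, std::size_t n) {
  if (__builtin_expect(n >= v.size(), 0))
    std::abort();  // a real implementation would report the failure first
  return v[n];
}
```

The check is a single comparison on the most basic operation, which is the category of check the thread agrees to keep.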


Re: Incorrect code due to indirect tail call of varargs function with hard float ABI

2015-11-17 Thread Kugan


On 17/11/15 21:05, Ramana Radhakrishnan wrote:
> Hi Kugan,
> 
> It does look like an issue.
> 
> Please open a bug report.
> 
>>
>>
>> On 17/11/15 12:00, Charles Baylis wrote:
>>> On 16 November 2015 at 22:24, Kugan  
>>> wrote:
>>>
>>>> Please note that we have a sibcall from "broken" to "indirect".
>>>>
>>>> "direct" is variadic function so it is conforming to AAPCS base standard.
>>>>
>>>> "broken" is a non-variadic function and will return the value in
>>>> floating point register for TARGET_HARD_FLOAT. Thus we should not be
>>>> doing sibcall here.
>>>>
>>>> Attached patch fixes this. Bootstrap and regression testing is ongoing.
>>>> Is this OK if no issues with the testing?
>>>
>>> Hi Kugan,
>>>
>>> It looks like this patch should work, but I think this is an overly
>>> conservative fix, as it prevents all sibcalls for hardfloat targets.
>>> It would be better if only variadic sibcalls were prevented on
>>> hardfloat. You can check for variadic calls by checking the
>>> function_type in the call expression (exp) using stdarg_p().
>>>
>>> As an example to show how to test for variadic function calls, this is
>>> how to test it in gdb:
>>>
>>> (gdb) b arm_function_ok_for_sibcall
>>> Breakpoint 1 at 0xdae59c: file
>>> /home/cbaylis/srcarea/gcc/gcc-git/gcc/config/arm/arm.c, line 6634.
>>> (gdb) r
>>> ...
>>> Breakpoint 1, arm_function_ok_for_sibcall (decl=0x0, exp=0x76104ce8)
>>> at /home/cbaylis/srcarea/gcc/gcc-git/gcc/config/arm/arm.c:6634
>>> 6634  if (cfun->machine->sibcall_blocked)
>>> (gdb) print debug_tree(exp)
>>>  >> type >> size 
>>> unit size 
>>> align 64 symtab 0 alias set -1 canonical type 0x762835e8
>>> precision 64
>>> pointer_to_this >
>>> side-effects addressable
>>> fn >> type >> 0x760e9348>
>>> ...
>>> (gdb) print stdarg_p((tree)0x760e9348)<--- from function_type ^
>>> $2 = true
>>>
>>
>> How about:
> 
> 
> 
> A run time testcase and a changelog would also be needed.
> 
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index a379121..2376d66 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -6681,6 +6681,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
>>  register.  */
>>rtx a, b;
>>
>> +  /* If it is an indirect function pointer, get the function type.  */
>> +  if (!decl
>> + && POINTER_TYPE_P (TREE_TYPE (CALL_EXPR_FN (exp)))
>> + && (TREE_CODE (TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp
>> + == FUNCTION_TYPE))
>> +   decl = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
>> +
> 
> If decl is null it's guaranteed to be an indirect function call - drop the 
> additional checks in the if clause.
> 
> 
>>a = arm_function_value (TREE_TYPE (exp), decl, false);
>>b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)),
>>   cfun->decl, false);
>>
> 
> 
> Please resubmit with a testcase, Changelog and after testing.

Hi Ramana,

Thanks for the review. I have opened a gcc bug-report for this. I tested
the attached patch for arm-none-linux-gnueabihf and
arm-none-linux-gnueabi with no new regressions. Is this OK?


Thanks,
Kugan

gcc/ChangeLog:

2015-11-18  Kugan Vivekanandarajah  

PR target/68390
* config/arm/arm.c (arm_function_ok_for_sibcall): Get function type
for indirect function call.

gcc/testsuite/ChangeLog:

2015-11-18  Kugan Vivekanandarajah  

PR target/68390
* gcc.target/arm/PR68390.c: New test.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a379121..a4509f4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6681,6 +6681,10 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
 register.  */
   rtx a, b;
 
+  /* If it is an indirect function pointer, get the function type.  */
+  if (!decl)
+   decl = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
+
   a = arm_function_value (TREE_TYPE (exp), decl, false);
   b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)),
  cfun->decl, false);
diff --git a/gcc/testsuite/gcc.target/arm/PR68390.c 
b/gcc/testsuite/gcc.target/arm/PR68390.c
index e69de29..86f07fe 100644
--- a/gcc/testsuite/gcc.target/arm/PR68390.c
+++ b/gcc/testsuite/gcc.target/arm/PR68390.c
@@ -0,0 +1,27 @@
+/* { dg-do run }  */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline))
+double direct(int x, ...)
+{
+  return x*x;
+}
+
+__attribute__ ((noinline))
+double broken(double (*indirect)(int x, ...), int v)
+{
+  return indirect(v);
+}
+
+int main ()
+{
+  double d1, d2;
+  int i = 2;
+  d1 = broken (direct, i);
+  if (d1 != i*i)
+{
+  __builtin_abort ();
+}
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/variadic_sibcall.c 
b/gcc/testsuite/gcc.target/arm/variadic_sibcall.c
deleted file mode 100644
index 

[patch] libstdc++/66059 optimise std::make_integer_sequence

2015-11-17 Thread Jonathan Wakely

I've been talking about a compiler built-in to implement
make_integer_sequence since before the proposal even made it into the
standard, so I tried to implement one that would allow:

template<typename _Tp, _Tp _Num>
 using make_integer_sequence = integer_sequence< __intseq(_Tp, _Num) >;

But I don't know the front-end well enough to make that work.

Instead here's a much more efficient implementation that takes a
divide-and-conquer approach rather than building the sequence
linearly.

Tested powerpc64le-linux, committed to trunk.
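
For readers who want to try the divide-and-conquer idea outside libstdc++,
here is a minimal C++11 sketch. The names index_tuple, cat and build are
illustrative stand-ins, not the library's internal names. Each half is built
separately and the second half is shifted by the length of the first.

```cpp
#include <cstddef>
#include <type_traits>

// Illustrative stand-in for a tuple of compile-time indices.
template<std::size_t... Is> struct index_tuple { };

// Concatenate two index tuples, shifting the second by the first's length.
template<typename A, typename B> struct cat;

template<std::size_t... I1, std::size_t... I2>
struct cat<index_tuple<I1...>, index_tuple<I2...>>
{
  using type = index_tuple<I1..., (I2 + sizeof...(I1))...>;
};

// Build index_tuple<0, 1, ..., N-1> by splitting N into two halves.
template<std::size_t N>
struct build
  : cat<typename build<N / 2>::type, typename build<N - N / 2>::type>
{ };

// Base cases that stop the recursion.
template<> struct build<1> { using type = index_tuple<0>; };
template<> struct build<0> { using type = index_tuple<>; };

static_assert(std::is_same<build<5>::type,
                           index_tuple<0, 1, 2, 3, 4>>::value,
              "build<5> yields 0..4");
```

Built this way, the recursion depth grows with about log2(N) rather than N,
which is the point of splitting in half instead of appending one index at a
time.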


commit 4bd998a906972eb9ae47ffd87f28b17816112318
Author: Jonathan Wakely 
Date:   Tue Nov 17 17:55:25 2015 +

PR libstdc++/66059 optimise _Build_index_tuple

	PR libstdc++/66059
	* include/std/utility (_Build_index_tuple): Optimise.

diff --git a/libstdc++-v3/include/std/utility b/libstdc++-v3/include/std/utility
index 89b6852..985bcb2 100644
--- a/libstdc++-v3/include/std/utility
+++ b/libstdc++-v3/include/std/utility
@@ -212,17 +212,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // Stores a tuple of indices.  Used by tuple and pair, and by bind() to
   // extract the elements in a tuple.
-  template<size_t... _Indexes>
-struct _Index_tuple
+  template<size_t... _Indexes> struct _Index_tuple { };
+
+  // Concatenates two _Index_tuples.
+  template<typename _Itup1, typename _Itup2> struct _Itup_cat;
+
+  template<size_t... _Ind1, size_t... _Ind2>
+struct _Itup_cat<_Index_tuple<_Ind1...>, _Index_tuple<_Ind2...>>
 {
-  typedef _Index_tuple<_Indexes..., sizeof...(_Indexes)> __next;
+  using __type = _Index_tuple<_Ind1..., (_Ind2 + sizeof...(_Ind1))...>;
 };
 
   // Builds an _Index_tuple<0, 1, 2, ..., _Num-1>.
   template<size_t _Num>
 struct _Build_index_tuple
+: _Itup_cat<typename _Build_index_tuple<_Num / 2>::__type,
+		typename _Build_index_tuple<_Num - _Num / 2>::__type>
+{ };
+
+  template<>
+struct _Build_index_tuple<1>
 {
-  typedef typename _Build_index_tuple<_Num - 1>::__type::__next __type;
+  typedef _Index_tuple<0> __type;
 };
 
   template<>


[PATCH] PR fortran/59910 -- structure constructor in DATA statement

2015-11-17 Thread Steve Kargl
Here's what looks like a fairly simple patch, but it leads
to a question.  Why does gfortran not try to reduce the 
components in a structure constructor in general?  I've
hidden the gfc_reduce_init_expr() behind a check for a
DATA statement, but I suspect gfc_reduce_init_expr() 
may be useful for PARAMETER statements as well (need to
check this!).

Anyway, the patch has been built and tested on x86_64-*-freebsd.
A slightly different patch was built and tested on i386-*-freebsd.

OK to commit?

2015-11-17  Steven G. Kargl  

PR fortran/59910
* primary.c (gfc_match_structure_constructor): Reduce a structure
constructor in a DATA statement.

2015-11-17  Steven G. Kargl  

PR fortran/59910
* gfortran.dg/pr59910.f90:

-- 
Steve
Index: gcc/fortran/primary.c
===
--- gcc/fortran/primary.c	(revision 230497)
+++ gcc/fortran/primary.c	(working copy)
@@ -2722,6 +2722,12 @@ gfc_match_structure_constructor (gfc_sym
   return MATCH_ERROR;
 }
 
+  /* If a structure constructor is in a DATA statement, then each entity
+ in the structure constructor must be a constant.  Try to reduce the
+ expression here.  */
+  if (gfc_in_match_data ())
+gfc_reduce_init_expr (e);
+
   *result = e;
   return MATCH_YES;
 }
Index: gcc/testsuite/gfortran.dg/pr59910.f90
===
--- gcc/testsuite/gfortran.dg/pr59910.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr59910.f90	(working copy)
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! PR fortran/59910
+!
+program main
+  implicit none
+  type bar
+  integer :: limit(1)
+  end type
+  type (bar) :: testsuite
+  data testsuite / bar(reshape(source=[10],shape=[1])) /
+end


Re: RFA (GGC): PATCH to support GGC finalizers with PCH

2015-11-17 Thread Jason Merrill

On 11/17/2015 09:39 AM, Richard Biener wrote:

On Tue, Nov 17, 2015 at 3:09 PM, Jason Merrill  wrote:

While I was looking at the interaction of delayed folding with GGC, I
noticed that ggc_handle_finalizers currently runs no finalizers if
G.context_depth != 0.  So any GC objects in a greater depth will still be
collected, but they won't have their finalizers run.  This specifically
affects compiles that use a PCH file, since G.context_depth is set to 1
after loading the PCH.

This patch fixes ggc_handle_finalizers to look at the depth of each
finalizer so that we still don't try to run finalizers for non-collectable
objects loaded from the PCH, but we do run finalizers for collectable
objects allocated after loading the PCH.

I ended up not relying on this for delayed folding, but it still seems like
a good bug fix.

Tested x86_64-pc-linux-gnu.  OK for trunk?


Hmm, this enlarges finalizer/vec_finalizer.  Wouldn't it be better to
add separate finalizer vectors for context_depth != 0?  (I'm proposing
to add one for exactly context_depth == 1)

When is context_depth increased other than for PCH?


That seems to be the only place it's changed currently.  I was assuming 
that the generalized way ggc-page handles context_depth was intended to 
support more depths in the future (perhaps for collecting after 
processing a nested function?), so my patch was following that model.


How about this?

Jason
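
To make the per-depth bookkeeping easier to follow, here is a small
standalone sketch of the idea. It is a toy model built on std::vector, not
the ggc-page.c code: Finalizer, Collector and the is_marked callback are
names invented for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy finalizer record: an object address plus the function to run on it.
struct Finalizer
{
  void *addr;
  void (*fn) (void *);
};

// Toy collector state; only the parts needed to show the idea.
struct Collector
{
  std::size_t context_depth = 0;

  // finalizers[d] holds the finalizers registered while context_depth == d.
  std::vector<std::vector<Finalizer>> finalizers
    = std::vector<std::vector<Finalizer>> (1);

  void add_finalizer (void *addr, void (*fn) (void *))
  {
    if (!fn)
      return;
    if (finalizers.size () <= context_depth)
      finalizers.resize (context_depth + 1);
    finalizers[context_depth].push_back (Finalizer{addr, fn});
  }

  // Run finalizers for unmarked objects at the current depth or deeper.
  // Shallower depths (e.g. objects loaded from a PCH) are left alone.
  template<typename Marked>
  void handle_finalizers (Marked is_marked)
  {
    for (std::size_t d = context_depth; d < finalizers.size (); ++d)
      {
        std::vector<Finalizer> &v = finalizers[d];
        for (std::size_t i = 0; i < v.size ();)
          if (!is_marked (v[i].addr))
            {
              assert (v[i].fn != nullptr);
              v[i].fn (v[i].addr);
              v[i] = v.back ();   // unordered removal
              v.pop_back ();
            }
          else
            ++i;
      }
  }
};
```

Keeping one list per depth means the records themselves do not grow, which
was the concern raised about storing a depth in each finalizer.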

commit afb196cd7fc176736f9ff2abf92690a7c4ae4f94
Author: Jason Merrill 
Date:   Fri Nov 13 09:39:15 2015 -0500

	Support GGC finalizers with PCH.

	* ggc-page.c (ggc_globals): Change finalizers and vec_finalizers
	to be vecs of vecs.
	(add_finalizer): Split out from ggc_internal_alloc.
	(ggc_handle_finalizers): Run finalizers for the current depth.
	(init_ggc, ggc_pch_read): Reserve space for finalizers.

diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
index deb21bb..c3af3c8 100644
--- a/gcc/ggc-page.c
+++ b/gcc/ggc-page.c
@@ -361,7 +361,7 @@ private:
   void (*m_function)(void *);
   size_t m_object_size;
   size_t m_n_objects;
-  };
+};
 
 #ifdef ENABLE_GC_ALWAYS_COLLECT
 /* List of free objects to be verified as actually free on the
@@ -456,11 +456,11 @@ static struct ggc_globals
  better runtime data access pattern.  */
   unsigned long **save_in_use;
 
-  /* Finalizers for single objects.  */
-  vec<finalizer> finalizers;
+  /* Finalizers for single objects.  The first index is collection_depth.  */
+  vec<vec<finalizer> > finalizers;
 
   /* Finalizers for vectors of objects.  */
-  vec<vec_finalizer> vec_finalizers;
+  vec<vec<vec_finalizer> > vec_finalizers;
 
 #ifdef ENABLE_GC_ALWAYS_COLLECT
   /* List of free objects to be verified as actually free on the
@@ -1240,6 +1240,23 @@ ggc_round_alloc_size (size_t requested_size)
   return size;
 }
 
+static void
+add_finalizer (void *result, void (*f)(void *), size_t s, size_t n)
+{
+  if (f == NULL)
+;
+  else if (n == 1)
+{
+  finalizer fin (result, f);
+  G.finalizers[G.context_depth].safe_push (fin);
+}
+  else
+{
+  vec_finalizer fin (reinterpret_cast<uintptr_t> (result), f, s, n);
+  G.vec_finalizers[G.context_depth].safe_push (fin);
+}
+}
+
 /* Allocate a chunk of memory of SIZE bytes.  Its contents are undefined.  */
 
 void *
@@ -1387,11 +1404,8 @@ ggc_internal_alloc (size_t size, void (*f)(void *), size_t s, size_t n
   /* For timevar statistics.  */
   timevar_ggc_mem_total += object_size;
 
-  if (f && n == 1)
-G.finalizers.safe_push (finalizer (result, f));
-  else if (f)
-G.vec_finalizers.safe_push
-  (vec_finalizer (reinterpret_cast<uintptr_t> (result), f, s, n));
+  if (f)
+add_finalizer (result, f, s, n);
 
   if (GATHER_STATISTICS)
 {
@@ -1788,6 +1802,9 @@ init_ggc (void)
   G.by_depth_max = INITIAL_PTE_COUNT;
   G.by_depth = XNEWVEC (page_entry *, G.by_depth_max);
   G.save_in_use = XNEWVEC (unsigned long *, G.by_depth_max);
+
+  G.finalizers.safe_grow_cleared (1);
+  G.vec_finalizers.safe_grow_cleared (1);
 }
 
 /* Merge the SAVE_IN_USE_P and IN_USE_P arrays in P so that IN_USE_P
@@ -1875,36 +1892,42 @@ clear_marks (void)
 static void
 ggc_handle_finalizers ()
 {
-  if (G.context_depth != 0)
-return;
-
-  unsigned length = G.finalizers.length ();
-  for (unsigned int i = 0; i < length;)
+  unsigned dlen = G.finalizers.length();
+  for (unsigned d = G.context_depth; d < dlen; ++d)
 {
-  finalizer &f = G.finalizers[i];
-  if (!ggc_marked_p (f.addr ()))
+  vec<finalizer> &v = G.finalizers[d];
+  unsigned length = v.length ();
+  for (unsigned int i = 0; i < length;)
 	{
-	  f.call ();
-	  G.finalizers.unordered_remove (i);
-	  length--;
+	  finalizer &f = v[i];
+	  if (!ggc_marked_p (f.addr ()))
+	{
+	  f.call ();
+	  v.unordered_remove (i);
+	  length--;
+	}
+	  else
+	i++;
 	}
-  else
-	i++;
 }
 
-
-  length = G.vec_finalizers.length ();
-  for (unsigned int i = 0; i < length;)
+  gcc_assert (dlen == G.vec_finalizers.length());
+  for (unsigned d = 

Re: [Patch, vrp] Allow VRP type conversion folding only for widenings upto word mode

2015-11-17 Thread Senthil Kumar Selvaraj
On Mon, Nov 16, 2015 at 10:02:15AM +0100, Richard Biener wrote:
> On Sat, 14 Nov 2015, Senthil Kumar Selvaraj wrote:
> 
> > On Sat, Nov 14, 2015 at 09:57:40AM +0100, Richard Biener wrote:
> > > On November 14, 2015 9:49:28 AM GMT+01:00, Senthil Kumar Selvaraj 
> > >  wrote:
> > > >On Sat, Nov 14, 2015 at 09:13:41AM +0100, Marc Glisse wrote:
> > > >> On Sat, 14 Nov 2015, Senthil Kumar Selvaraj wrote:
> > > >> 
> > > >> >This patch came out of a discussion held in the gcc mailing list
> > > >> >(https://gcc.gnu.org/ml/gcc/2015-11/msg00067.html).
> > > >> >
> > > >> >The patch restricts folding of conditional exprs with lhs previously
> > > >> >set by a type conversion to occur only if the source of the type
> > > >> >conversion's mode is word mode or smaller.
> > > >> >
> > > >> >Bootstrapped and reg tested on x86_64 (with
> > > >--enable-languages=c,c++).
> > > >> >
> > > >> >If ok, could you commit please? I don't have commit access.
> > > >> >
> > > >> >Regards
> > > >> >Senthil
> > > >> >
> > > >> >gcc/ChangeLog
> > > >> >
> > > >> >2015-11-11  Senthil Kumar Selvaraj 
> > > >
> > > >> >
> > > >> >  * tree-vrp.c (simplify_cond_using_ranges): Fold only
> > > >> >  if innerop's mode is word_mode or smaller.
> > > >> >
> > > >> >
> > > >> >diff --git gcc/tree-vrp.c gcc/tree-vrp.c
> > > >> >index e2393e4..c139bc6 100644
> > > >> >--- gcc/tree-vrp.c
> > > >> >+++ gcc/tree-vrp.c
> > > >> >@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
> > > >> >  innerop = gimple_assign_rhs1 (def_stmt);
> > > >> >
> > > >> >  if (TREE_CODE (innerop) == SSA_NAME
> > > >> >+ && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
> > > >> >+   <= GET_MODE_SIZE(word_mode))
> > > >> >&& !POINTER_TYPE_P (TREE_TYPE (innerop)))
> > > >> >  {
> > > >> >value_range *vr = get_value_range (innerop);
> > > >> 
> > > >> I thought the result of the discussion was that the transformation is
> > > >ok if
> > > >> either it is narrowing or it widens but to something no bigger than
> > > >> word_mode. So you should have 2 comparisons, or 1 with a max.
> > > >
> > > >Hmm, I came to the opposite conclusion - I thought Richard only okayed
> > > >"widening upto word-mode", not the narrowing. 
> > > 
> > > I didn't mean to suggest narrowing is not OK.  In fact narrowing is 
> > > always OK.
> > 
> > My bad. Here's a revised patch that checks for both conditions, using
> > max as Marc suggested to limit to word_mode or narrowing conversions.
> > 
> > Bootstrapped and regtested for x86_64 with c and c++.
> > 
> > Is this ok? If yes, would you commit it
> > for me please? I don't have commit access.
> > 
> > gcc/ChangeLog
> > 2015-11-14  Senthil Kumar Selvaraj  
> > 
> > * tree-vrp.c (simplify_cond_using_ranges): Fold only
> > if innerop's mode smaller or equal to word_mode or op0's mode.
> > 
> > 
> > diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> > index e2393e4..cfd90e7 100644
> > --- a/gcc/tree-vrp.c
> > +++ b/gcc/tree-vrp.c
> > @@ -9467,7 +9467,10 @@ simplify_cond_using_ranges (gcond *stmt)
> >innerop = gimple_assign_rhs1 (def_stmt);
> >  
> >if (TREE_CODE (innerop) == SSA_NAME
> > - && !POINTER_TYPE_P (TREE_TYPE (innerop)))
> > + && !POINTER_TYPE_P (TREE_TYPE (innerop))
> > + && (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (innerop)))
> > +   <= std::max (GET_MODE_SIZE (word_mode),
> > +GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0))
> 
> Please use TYPE_PRECISION (...) and GET_MODE_PRECISION (word_mode) and
> add a comment as to what we are testing here and why.
> 
> Btw, ideally we'd factor out a
> 
> bool
> desired_pro_or_demotion_p (tree to_type, tree from_type) {}
> 
> function somewhere as we have similar tests throughout the compiler
> that we might want to unify (and also have a central place to
> eventually add a target hook if ever desired).
> 
> In fact in other places we also check that the type we promote/demote
> to matches its mode precision or the type we promote/demote from
> already does not.
> 
> I'd suggest tree.[ch] for that function.
> 
> Please also add a testcase.

How does the below patch look? Bootstrapped, but not regtested yet.

The testcase was rather tricky to write - I wasn't sure how to reliably
get a type bigger than a word for all targets. I resorted to __int128,
not sure it's a good idea though - I should probably add dg-skip-if for
targets that don't support that. Do you know of a better way to write
that?

Regards
Senthil
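
As a plain illustration of the condition discussed in this thread, the
sketch below expresses it on bit precisions instead of trees. The function
name and its parameters are made up for this example; it is a simplified
stand-in, not the helper proposed for tree.[ch]. The fold moves a comparison
from op0 (the converted value) to innerop (the conversion source), so it is
accepted when innerop is no wider than op0 or no wider than a word.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper; all precisions are in bits.
//   from_prec: precision of the conversion source (innerop)
//   to_prec:   precision of the converted value being compared (op0)
//   word_prec: precision of the target's word_mode
// Returns true when comparing in the source type is acceptable.
bool compare_in_source_type_ok (unsigned from_prec, unsigned to_prec,
                                unsigned word_prec)
{
  assert (from_prec > 0 && to_prec > 0 && word_prec > 0);
  return from_prec <= std::max (word_prec, to_prec);
}
```

For example, on a target with an 8-bit word, moving an 8-bit comparison to a
16-bit source is rejected, while on a 64-bit target the same move is allowed.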

diff --git gcc/testsuite/gcc.dg/tree-ssa/vrp98.c 
gcc/testsuite/gcc.dg/tree-ssa/vrp98.c
new file mode 100644
index 000..448ceba
--- /dev/null
+++ gcc/testsuite/gcc.dg/tree-ssa/vrp98.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-tree-vrp1-details" } */
+
+#include 
+#include 
+
+typedef unsigned int word __attribute__((mode(word)));

Re: POWERPC64_TOC_POINTER_ALIGNMENT

2015-11-17 Thread Michael Meissner
On Wed, Nov 18, 2015 at 09:52:41AM +1030, Alan Modra wrote:
> David noticed that gcc112 was generating gcc/auto-host.h with
> #define POWERPC64_TOC_POINTER_ALIGNMENT 32768
> 
> This is not the correct value of either 8 or 256 depending on how old
> ld is.  On investigating I found the cause is Fedora 21 modifying the
> toolchain to default to -z relro.  ld -z relro puts the relro gap just
> before .got (prior to my patches reordering sections for relro on
> powerpc64).  That unfortunately aligns .got, defeating the deliberate
> mis-alignment of .got in the testcase.
> 
> Fixed with the following obvious patch and committed to mainline.
> 
> Incidentally, bootstrap fails for me on powerpc64 due to "comparison
> is always true due to limited range of data type [-Wtype-limits]"
> && GET_MODE_SIZE (mode) <= POWERPC64_TOC_POINTER_ALIGNMENT));
> since my POWERPC64_TOC_POINTER_ALIGNMENT is 256 and mode_size is an
> unsigned char array.  Grrr, so what code obfuscation do we use here to
> work around this annoying warning?

For the moment, I added the following to my local build.  However, I can't
build libgcc on an x86 cross compiler:

In file included from 
/home/meissner/fsf-src/ieee/libgcc/soft-fp/soft-fp.h:321:0,
 from addkf3.c:33:
addkf3.c: In function ‘__addkf3’:
/home/meissner/fsf-src/ieee/libgcc/soft-fp/op-common.h:2057:6: internal 
compiler error: in convert_move, at expr.c:286
  (r) = __builtin_clzl (x); \
  ^~~~

/home/meissner/fsf-src/ieee/libgcc/soft-fp/op-2.h:129:2: note: in expansion of 
macro ‘__FP_CLZ’
  __FP_CLZ ((R), X##_f1);   \
  ^~~~

/home/meissner/fsf-src/ieee/libgcc/soft-fp/op-common.h:827:8: note: in 
expansion of macro ‘_FP_FRAC_CLZ_2’
_FP_FRAC_CLZ_##wc (_FP_ADD_INTERNAL_diff, R);  \
^

/home/meissner/fsf-src/ieee/libgcc/soft-fp/op-common.h:850:34: note: in 
expansion of macro ‘_FP_ADD_INTERNAL’
 #define _FP_ADD(fs, wc, R, X, Y) _FP_ADD_INTERNAL (fs, wc, R, X, Y, '+')
  ^~~~

/home/meissner/fsf-src/ieee/libgcc/soft-fp/quad.h:306:29: note: in expansion of 
macro ‘_FP_ADD’
 # define FP_ADD_Q(R, X, Y)  _FP_ADD (Q, 2, R, X, Y)
 ^~~

addkf3.c:48:3: note: in expansion of macro ‘FP_ADD_Q’
   FP_ADD_Q (R, A, B);
   ^~~~

0x78432e convert_move(rtx_def*, rtx_def*, int)
/home/meissner/fsf-src/ieee/gcc/expr.c:286
0x8753a6 expand_direct_optab_fn
/home/meissner/fsf-src/ieee/gcc/internal-fn.c:2132
0x6686ca expand_call_stmt
/home/meissner/fsf-src/ieee/gcc/cfgexpand.c:2565
0x6698e4 expand_gimple_stmt_1
/home/meissner/fsf-src/ieee/gcc/cfgexpand.c:3525
0x6698e4 expand_gimple_stmt
/home/meissner/fsf-src/ieee/gcc/cfgexpand.c:3688
0x66b2fe expand_gimple_basic_block
/home/meissner/fsf-src/ieee/gcc/cfgexpand.c:5694
0x66f226 execute
/home/meissner/fsf-src/ieee/gcc/cfgexpand.c:6309
Please submit a full bug report,

Here is the temporary patch I'm using to get past rs6000.c.  But I suspect the
TOC alignment should never be 256.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 230511)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -7992,14 +7992,21 @@ rs6000_cannot_force_const_mem (machine_m
can be addressed relative to the toc pointer.  */
 
 static bool
-use_toc_relative_ref (rtx sym, machine_mode mode)
+use_toc_relative_ref (rtx sym, machine_mode mode ATTRIBUTE_UNUSED)
 {
   return ((constant_pool_expr_p (sym)
   && ASM_OUTPUT_SPECIAL_POOL_ENTRY_P (get_pool_constant (sym),
   get_pool_mode (sym)))
  || (TARGET_CMODEL == CMODEL_MEDIUM
  && SYMBOL_REF_LOCAL_P (sym)
- && GET_MODE_SIZE (mode) <= POWERPC64_TOC_POINTER_ALIGNMENT));
+ /* If the linker says that TOC alignment is 256 bytes, this test
+will always be true, since GET_MODE_SIZE returns an unsigned
+char on the PowerPC.  Prevent a warning/error in this
+case.  */
+#if POWERPC64_TOC_POINTER_ALIGNMENT < 256
+ && GET_MODE_SIZE (mode) <= POWERPC64_TOC_POINTER_ALIGNMENT
+#endif
+ ));
 }
 
 /* Our implementation of LEGITIMIZE_RELOAD_ADDRESS.  Returns a value to

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] PR fortran/59910 -- structure constructor in DATA statement

2015-11-17 Thread Steve Kargl
On Tue, Nov 17, 2015 at 04:36:01PM -0800, Steve Kargl wrote:
> On Wed, Nov 18, 2015 at 12:24:29AM +0100, Dominique d'Humières wrote:
> > > ??? but I suspect gfc_reduce_init_expr() 
> > > may be useful for PARAMETER statements as well (need to
> > > check this!).
> > 
> > As in the following test
> > 
> >   module m
> > implicit none
> > type t
> >   integer :: i
> > end type t
> > type(t), dimension(2), parameter :: a1  = (/ t(1), t(2) /)
> > type(t), dimension(1), parameter :: c = spread ( a1(1), 1, 1 )
> >   end module m
> > 
> 
> Yep.  We again arrive at gfc_conv_array_initializer with
> expr->expr_type == EXPR_FUNCTION, which isn't handled correctly.
> 
> The issue seems deeply rooted in the handling of derived types,
> which is actually worse than this!  But, that is definitely for
> another day.  See PR67817. :(
> 

Ugh. gfc_simplify_spread does not actually simplify the use of SPREAD
here, because source->expr_type == EXPR_STRUCTURE, which is not
handled.

-- 
Steve


Re: POWERPC64_TOC_POINTER_ALIGNMENT

2015-11-17 Thread Alan Modra
On Tue, Nov 17, 2015 at 07:53:18PM -0500, Michael Meissner wrote:
> Here is the temporary patch I'm using to get past rs6000.c.  But I suspect the
> TOC alignment should never be 256.

Yes, it should be.  Recent GNU ld aligns .TOC. to a 256 byte boundary.
I have this patch in my tree.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index abc8eaa..e3ec042 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -8059,12 +8059,17 @@ rs6000_cannot_force_const_mem (machine_mode mode 
ATTRIBUTE_UNUSED, rtx x)
 static bool
 use_toc_relative_ref (rtx sym, machine_mode mode)
 {
+  /* Silence complaint that the POWERPC64_TOC_POINTER_ALIGNMENT test
+ is always true.  */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wtype-limits"
   return ((constant_pool_expr_p (sym)
   && ASM_OUTPUT_SPECIAL_POOL_ENTRY_P (get_pool_constant (sym),
   get_pool_mode (sym)))
  || (TARGET_CMODEL == CMODEL_MEDIUM
  && SYMBOL_REF_LOCAL_P (sym)
  && GET_MODE_SIZE (mode) <= POWERPC64_TOC_POINTER_ALIGNMENT));
+#pragma GCC diagnostic pop
 }
 
 /* Our implementation of LEGITIMIZE_RELOAD_ADDRESS.  Returns a value to

-- 
Alan Modra
Australia Development Lab, IBM
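
The same push/ignored/pop pattern can be tried in isolation. The sketch
below is only an illustration: TOC_POINTER_ALIGNMENT, mode_size_ok and
check_every_mode_size are hypothetical stand-ins for the rs6000.c names, and
the value 256 is taken from the discussion above.

```cpp
#include <cassert>

// Hypothetical stand-in for POWERPC64_TOC_POINTER_ALIGNMENT.
#define TOC_POINTER_ALIGNMENT 256

// Stand-in for the size test in use_toc_relative_ref.
bool mode_size_ok (unsigned char mode_size)
{
  // Silence the "always true" complaint for this one comparison only.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wtype-limits"
  // An unsigned char can never exceed 255, so with an alignment of 256
  // this comparison cannot be false.
  return mode_size <= TOC_POINTER_ALIGNMENT;
#pragma GCC diagnostic pop
}

// Walk every unsigned char value to confirm the comparison never fails.
void check_every_mode_size ()
{
  for (unsigned v = 0; v <= 255; ++v)
    assert (mode_size_ok (static_cast<unsigned char> (v)));
}
```

The push and pop keep the suppression local, so the warning stays active for
the rest of the file.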


Re: Incorrect code due to indirect tail call of varargs function with hard float ABI

2015-11-17 Thread Kugan

> Hi Ramana,
> 
> Thanks for the review. I have opened a gcc bug-report for this. I tested
> the attached patch for  arm-none-linux-gnueabihf and
> arm-none-linux-gnueabi with no new regressions. Is this OK?
> 
> 
> Thanks,
> Kugan
> 
> gcc/ChangeLog:
> 
> 2015-11-18  Kugan Vivekanandarajah  
> 
>   PR target/68390
>   * config/arm/arm.c (arm_function_ok_for_sibcall): Get function type
>   for indirect function call.
> 
> gcc/testsuite/ChangeLog:
> 
> 2015-11-18  Kugan Vivekanandarajah  
> 
>   PR target/68390
>   * gcc.target/arm/PR68390.c: New test.
> 
> 
Hi Ramana,

With further testing on bare-metal, I found that decl has to remain null
for indirect function calls because of the following check:

  if (TARGET_AAPCS_BASED
  && arm_abi == ARM_ABI_AAPCS
  && decl
  && DECL_WEAK (decl))
return false;

Here is the updated patch and ChangeLog. Sorry for the noise.

Thanks,
Kugan
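
For anyone new to the thread, the underlying rule can be modelled in a few
lines. This is a toy model, not GCC code: RetLoc, FnType, return_location
and ok_for_sibcall are invented names, and the model only covers the case
discussed here (where a floating-point result is returned).

```cpp
// Where a function's return value lives in this toy model.
enum class RetLoc { CoreReg, VfpReg };

// Minimal description of a function type.
struct FnType
{
  bool variadic;     // declared with "..."
  bool returns_fp;   // returns float or double
};

// With hard float, a non-variadic function returns floating-point values
// in a VFP register; a variadic function follows the base standard and
// uses core registers.
constexpr RetLoc return_location (const FnType &fn, bool hard_float)
{
  return (hard_float && fn.returns_fp && !fn.variadic)
         ? RetLoc::VfpReg : RetLoc::CoreReg;
}

// A sibcall hands the callee's result straight back to the caller's
// caller, so both functions must agree on where the result lives.
constexpr bool ok_for_sibcall (const FnType &caller, const FnType &callee,
                               bool hard_float)
{
  return return_location (caller, hard_float)
         == return_location (callee, hard_float);
}

// "broken" (non-variadic) tail-calling a variadic function pointer.
static_assert (!ok_for_sibcall (FnType{false, true}, FnType{true, true}, true),
               "hard float: return locations differ");
```

The point of the patch is that for an indirect call the callee's function
type must be looked up from the call expression; otherwise the comparison is
made without knowing that the callee is variadic.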


gcc/ChangeLog:

2015-11-18  Kugan Vivekanandarajah  

PR target/68390
* config/arm/arm.c (arm_function_ok_for_sibcall): Get function type
for indirect function call.

gcc/testsuite/ChangeLog:

2015-11-18  Kugan Vivekanandarajah  

PR target/68390
* gcc.target/arm/PR68390.c: New test.



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a379121..0dae7da 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6680,8 +6680,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
 a VFP register but then need to transfer it to a core
 register.  */
   rtx a, b;
+  tree fn_decl = decl;
 
-  a = arm_function_value (TREE_TYPE (exp), decl, false);
+  /* If it is an indirect function pointer, get the function type.  */
+  if (!decl)
+   fn_decl = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
+
+  a = arm_function_value (TREE_TYPE (exp), fn_decl, false);
   b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)),
  cfun->decl, false);
   if (!rtx_equal_p (a, b))
diff --git a/gcc/testsuite/gcc.target/arm/PR68390.c 
b/gcc/testsuite/gcc.target/arm/PR68390.c
index e69de29..86f07fe 100644
--- a/gcc/testsuite/gcc.target/arm/PR68390.c
+++ b/gcc/testsuite/gcc.target/arm/PR68390.c
@@ -0,0 +1,27 @@
+/* { dg-do run }  */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline))
+double direct(int x, ...)
+{
+  return x*x;
+}
+
+__attribute__ ((noinline))
+double broken(double (*indirect)(int x, ...), int v)
+{
+  return indirect(v);
+}
+
+int main ()
+{
+  double d1, d2;
+  int i = 2;
+  d1 = broken (direct, i);
+  if (d1 != i*i)
+{
+  __builtin_abort ();
+}
+  return 0;
+}
+


Re: [PATCH] PR fortran/59910 -- structure constructor in DATA statement

2015-11-17 Thread Steve Kargl
On Wed, Nov 18, 2015 at 12:24:29AM +0100, Dominique d'Humières wrote:
> > ??? but I suspect gfc_reduce_init_expr() 
> > may be useful for PARAMETER statements as well (need to
> > check this!).
> 
> As in the following test
> 
>   module m
> implicit none
> type t
>   integer :: i
> end type t
> type(t), dimension(2), parameter :: a1  = (/ t(1), t(2) /)
> type(t), dimension(1), parameter :: c = spread ( a1(1), 1, 1 )
>   end module m
> 

Yep.  We again arrive at gfc_conv_array_initializer with
expr->expr_type == EXPR_FUNCTION, which isn't handled correctly.

The issue seems deeply rooted in the handling of derived types,
which is actually worse than this!  But, that is definitely for
another day.  See PR67817. :(

-- 
Steve


Re: [PATCH][GCC][ARM] Disable neon testing for armv7-m

2015-11-17 Thread James Greenhalgh
On Mon, Nov 16, 2015 at 01:15:32PM +, Andre Vieira wrote:
> On 16/11/15 12:07, James Greenhalgh wrote:
> >On Mon, Nov 16, 2015 at 10:49:11AM +, Andre Vieira wrote:
> >>Hi,
> >>
> >>   This patch changes the target support mechanism to make it
> >>recognize any ARM 'M' profile as a non-neon supporting target. The
> >>current check only tests for armv6 architectures and earlier, and
> >>does not account for armv7-m.
> >>
> >>   This is correct because there is no 'M' profile that supports neon
> >>and the current test is not sufficient to exclude armv7-m.
> >>
> >>   Tested by running regressions for this testcase for various ARM targets.
> >>
> >>   Is this OK to commit?
> >>
> >>   Thanks,
> >>   Andre Vieira
> >>
> >>gcc/testsuite/ChangeLog:
> >>2015-11-06  Andre Vieira  
> >>
> >> * gcc/testsuite/lib/target-supports.exp
> >>   (check_effective_target_arm_neon_ok_nocache): Added check
> >>   for M profile.
> >
> >> From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00 2001
> >>From: Andre Simoes Dias Vieira 
> >>Date: Fri, 13 Nov 2015 11:16:34 +
> >>Subject: [PATCH] Disable neon testing for armv7-m
> >>
> >>---
> >>  gcc/testsuite/lib/target-supports.exp | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/gcc/testsuite/lib/target-supports.exp 
> >>b/gcc/testsuite/lib/target-supports.exp
> >>index 
> >>75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6
> >> 100644
> >>--- a/gcc/testsuite/lib/target-supports.exp
> >>+++ b/gcc/testsuite/lib/target-supports.exp
> >>@@ -2854,7 +2854,7 @@ proc check_effective_target_arm_neon_ok_nocache { } {
> >>int dummy;
> >>/* Avoid the case where a test adds -mfpu=neon, but the 
> >> toolchain is
> >>   configured for -mcpu=arm926ej-s, for example.  */
> >>-   #if __ARM_ARCH < 7
> >>+   #if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M'
> >>#error Architecture too old for NEON.
> >
> >Could you fix this #error message while you're here?
> >
> >Why we can't change this test to look for the __ARM_NEON macro from ACLE:
> >
> >#if __ARM_NEON < 1
> >   #error NEON is not enabled
> >#endif
> >
> >Thanks,
> >James
> >
> 
> There is a check for this already:
> 'check_effective_target_arm_neon'. I think the idea behind
> arm_neon_ok is to check whether the hardware would support neon,
> whereas arm_neon is to check whether neon was enabled, i.e.
> -mfpu=neon was used or a mcpu was passed that has neon enabled by
> default.
> 
> The comments for 'check_effective_target_arm_neon_ok_nocache'
> highlight this, though maybe the comments for
> check_effective_target_arm_neon could be better.
> 
> # Return 1 if this is an ARM target supporting -mfpu=neon
> # -mfloat-abi=softfp or equivalent options.  Some multilibs may be
> # incompatible with these options.  Also set et_arm_neon_flags to the
> # best options to add.
> 
> proc check_effective_target_arm_neon_ok_nocache
> ...
> /* Avoid the case where a test adds -mfpu=neon, but the toolchain is
>configured for -mcpu=arm926ej-s, for example.  */
> ...
> 
> 
> and
> 
> # Return 1 if this is a ARM target with NEON enabled.
> 
> proc check_effective_target_arm_neon

OK, got it - sorry for my mistake, I had the two procs confused.

I'd still like to see the error message fixed: "Architecture too old for NEON."
is not an accurate description of the problem.

Thanks,
James
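
To show why a single message does not fit both cases, here is a small sketch
that separates the two reasons the preprocessor test can fail. NeonCheck and
classify_neon_support are hypothetical names used only for illustration; the
logic mirrors the condition __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M' from
the patch.

```cpp
// Possible outcomes of the check, kept separate so each can be reported
// with its own message.
enum class NeonCheck { Ok, ArchTooOld, MProfile };

// arm_arch plays the role of __ARM_ARCH, profile that of
// __ARM_ARCH_PROFILE.
constexpr NeonCheck classify_neon_support (int arm_arch, char profile)
{
  return arm_arch < 7 ? NeonCheck::ArchTooOld
         : profile == 'M' ? NeonCheck::MProfile
         : NeonCheck::Ok;
}

static_assert (classify_neon_support (7, 'M') == NeonCheck::MProfile,
               "armv7-m is rejected because of its profile, not its age");
```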



Re: [PATCH][RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves

2015-11-17 Thread Kyrill Tkachov

Hi Bernd,

On 16/11/15 18:40, Bernd Schmidt wrote:

On 11/16/2015 03:07 PM, Kyrill Tkachov wrote:


I've explained in the comments in the patch what's going on, but the
short version is that changing the destination of a defining insn
that feeds into an extend insn is not valid if the defining insn
doesn't feed directly into the extend insn. In the ree pass the only
way this can happen is if there is an intermediate conditional move
that the pass tries to handle in a special way. An equivalent fix
would have been to check on that path (when copy_needed in
combine_reaching_defs is true) that the state->copies_list vector
(that contains the conditional move insns feeding into the extend
insn) is empty.


I ran this through gdb, and I think I see what's going on. For reference, 
here's a comment from the source:

  /* Considering transformation of
 (set (reg1) (expression))
 ...
 (set (reg2) (any_extend (reg1)))

 into

 (set (reg2) (any_extend (expression)))
 (set (reg1) (reg2))
 ...  */

I was thinking that another possible fix would be to also check 
!reg_used_between_p for reg1 to ensure it's not used. I'm thinking this might 
be a little clearer - what is your opinion?


Yes, I had considered that as well. It should be equivalent. I didn't use 
!reg_used_between_p because I thought
it'd be more expensive than checking reg_overlap_mentioned_p since we must 
iterate over a number of instructions
and call reg_overlap_mentioned_p on each one. But I suppose this case is rare 
enough that it wouldn't make any
measurable difference.

Would you prefer to use !reg_used_between_p here?



The added comment could lead to some confusion since it's placed in front of an 
existing if statement that also tests a different condition. Also, if we go 
with your fix,


+  || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN (cand->insn


Shouldn't this really be !rtx_equal_p?



Maybe, will it behave the right way if the two regs have different modes or 
when subregs are involved?
(will we even hit such a case in this path?)

Thanks,
Kyrill


Bernd





Re: [PATCH][RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves

2015-11-17 Thread Kyrill Tkachov


On 17/11/15 09:08, Kyrill Tkachov wrote:

Hi Bernd,

On 16/11/15 18:40, Bernd Schmidt wrote:

On 11/16/2015 03:07 PM, Kyrill Tkachov wrote:


I've explained in the comments in the patch what's going on, but the
short version is that changing the destination of a defining insn
that feeds into an extend insn is not valid if the defining insn
doesn't feed directly into the extend insn. In the ree pass the only
way this can happen is if there is an intermediate conditional move
that the pass tries to handle in a special way. An equivalent fix
would have been to check on that path (when copy_needed in
combine_reaching_defs is true) that the state->copies_list vector
(that contains the conditional move insns feeding into the extend
insn) is empty.


I ran this through gdb, and I think I see what's going on. For reference, 
here's a comment from the source:

  /* Considering transformation of
 (set (reg1) (expression))
 ...
 (set (reg2) (any_extend (reg1)))

 into

 (set (reg2) (any_extend (expression)))
 (set (reg1) (reg2))
 ...  */

I was thinking that another possible fix would be to also check 
!reg_used_between_p for reg1 to ensure it's not used. I'm thinking this might 
be a little clearer - what is your opinion?


Yes, I had considered that as well. It should be equivalent. I didn't use 
!reg_used_between_p because I thought
it'd be more expensive than checking reg_overlap_mentioned_p since we must 
iterate over a number of instructions
and call reg_overlap_mentioned_p on each one. But I suppose this case is rare 
enough that it wouldn't make any
measurable difference.
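
To make the cost argument concrete, here is a toy model of the scan such a
check implies. ToyInsn and toy_reg_used_between are hypothetical names used
only for illustration; this is a sketch, not GCC's RTL representation or its
reg_used_between_p implementation. It assumes each instruction simply lists
the registers it reads, and it visits every instruction strictly between two
positions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an instruction: one destination register and the
// registers it reads.  This is NOT GCC's rtx_insn.
struct ToyInsn {
  int dest;
  std::vector<int> uses;
};

// Return true if REG is read by any instruction strictly between
// positions FROM and TO (both endpoints are excluded).
bool toy_reg_used_between(const std::vector<ToyInsn>& insns,
                          std::size_t from, std::size_t to, int reg) {
  assert(from <= to && to < insns.size());
  for (std::size_t i = from + 1; i < to; ++i)
    for (int use : insns[i].uses)
      if (use == reg)
        return true;
  return false;
}
```

With the defining insn at position 0 and the extend at position 2, an
intermediate conditional move that reads reg1 makes the check return true.
The work grows with the number of instructions in between, which is the
cost concern raised above.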


Actually, I tried it out. While a reg_used_between_p check fixed the 
testcase, it caused code quality regressions
on aarch64. It seems to be too aggressive in restricting ree.

I'll have a closer look.

Kyrill



Would you prefer to use !reg_used_between_p here?



The added comment could lead to some confusion since it's placed in front of an 
existing if statement that also tests a different condition. Also, if we go 
with your fix,


+  || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN (cand->insn


Shouldn't this really be !rtx_equal_p?



Maybe, will it behave the right way if the two regs have different modes or 
when subregs are involved?
(will we even hit such a case in this path?)

Thanks,
Kyrill


Bernd







Re: [PATCH] Make fdump-tree-sccp-details more complete

2015-11-17 Thread Richard Biener
On Mon, Nov 16, 2015 at 10:39 PM, Tom de Vries  wrote:
> Hi,
>
> pass_scev_cprop contains a bit where it replaces uses of an ssa-name with
> constants.  This is currently not noted in the dump-file, even with
> TDF_DETAILS.
>
> This patch adds that information in the dump-file, in this format:
> ...
> Replacing uses of: D__lsm.10_34 with: 1
> ...
>
> OK for trunk if bootstrap and reg-test succeed?

Ok.

Richard.

> Thanks,
> - Tom


Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-17 Thread James Greenhalgh
On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
> Hi,
> GIMPLE IVO needs to call backend interface to calculate costs for addr
> expressions like below:
>FORM1: "r73 + r74 + 16380"
>FORM2: "r73 << 2 + r74 + 16380"
> 
> They are invalid address expressions on AArch64, so will be legitimized by
> aarch64_legitimize_address.  Below are what we got from that function:
> 
> For FORM1, the address expression is legitimized into below insn sequence
> and rtx:
>r84:DI=r73:DI+r74:DI
>r85:DI=r84:DI+0x3000
>r83:DI=r85:DI
>"r83 + 4092"
> 
> For FORM2, the address expression is legitimized into below insn sequence
> and rtx:
>r108:DI=r73:DI<<0x2
>r109:DI=r108:DI+r74:DI
>r110:DI=r109:DI+0x3000
>r107:DI=r110:DI
>"r107 + 4092"
> 
> So the costs computed are 12/16 respectively.  The high cost prevents IVO
> from choosing right candidates.  Besides cost computation, I also think the
> legitimization is bad in terms of code generation.
> The root cause in aarch64_legitimize_address can be described by its
> comment:
>/* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>   where mask is selected by alignment and size of the offset.
>   We try to pick as large a range for the offset as possible to
>   maximize the chance of a CSE.  However, for aligned addresses
>   we limit the range to 4k so that structures with different sized
>   elements are likely to use the same base.  */
> I think the split of CONST is intended for REG+CONST where the const offset
> is not in the range of AArch64's addressing modes.  Unfortunately, it
> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
> when the CONST are in the range of addressing modes.  As a result, these two
> cases fall through this logic, resulting in sub-optimal results.
> 
> It's obvious we can do the legitimization below:
> FORM1:
>r83:DI=r73:DI+r74:DI
>"r83 + 16380"
> FORM2:
>r107:DI=0x3ffc
>r106:DI=r74:DI+r107:DI
>   REG_EQUAL r74:DI+0x3ffc
>"r106 + r73 << 2"
> 
> This patch handles these two cases as described.
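
The X+CONST split quoted from the source comment can be sketched as plain
arithmetic. This is a minimal illustration, assuming a 4k mask (0xfff);
split_offset is a hypothetical helper, not the GCC function, and the real
code picks the mask from the alignment and size of the access:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split OFFSET into a part that is added to the base register up front
// (OFFSET & ~MASK) and a remainder that stays in the address
// (OFFSET & MASK).  Returns {base_offset, remainder}.
std::pair<std::int64_t, std::int64_t>
split_offset(std::int64_t offset, std::int64_t mask) {
  assert(offset >= 0 && mask >= 0);
  std::int64_t base_offset = offset & ~mask;
  std::int64_t remainder = offset & mask;
  return {base_offset, remainder};
}
```

For the offset 16380 (0x3ffc) used in FORM1 and FORM2, this yields 0x3000
and 4092, matching the "+0x3000" insn and the "+ 4092" address shown in the
current output.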

Thanks for the description, it made the patch very easy to review. I only
have a style comment.

> Bootstrapped and tested on AArch64 along with the other patch.  Is it OK?
> 
> 2015-11-04  Bin Cheng  
>   Jiong Wang  
> 
>   * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>   address expressions like REG+REG+CONST and REG+NON_REG+CONST.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5c8604f..47875ac 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>  {
>HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>HOST_WIDE_INT base_offset;
> +  rtx op0 = XEXP (x,0);
> +
> +  if (GET_CODE (op0) == PLUS)
> + {
> +   rtx op0_ = XEXP (op0, 0);
> +   rtx op1_ = XEXP (op0, 1);

I don't see this trailing _ on a variable name in many places in the source
tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
Can we pick a different name for op0_ and op1_?

> +
> +   /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
> +  reach here, the 'CONST' may be valid in which case we should
> +  not split.  */
> +   if (REG_P (op0_) && REG_P (op1_))
> + {
> +   machine_mode addr_mode = GET_MODE (op0);
> +   rtx addr = gen_reg_rtx (addr_mode);
> +
> +   rtx ret = plus_constant (addr_mode, addr, offset);
> +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
> + {
> +   emit_insn (gen_adddi3 (addr, op0_, op1_));
> +   return ret;
> + }
> + }
> +   /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
> +  will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
> +  we split it into Y=REG+CONST, Y+NON_REG.  */
> +   else if (REG_P (op0_) || REG_P (op1_))
> + {
> +   machine_mode addr_mode = GET_MODE (op0);
> +   rtx addr = gen_reg_rtx (addr_mode);
> +
> +   /* Switch to make sure that register is in op0_.  */
> +   if (REG_P (op1_))
> + std::swap (op0_, op1_);
> +
> +   rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
> +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
> + {
> +   addr = force_operand (plus_constant (addr_mode,
> +op0_, offset),
> + NULL_RTX);
> +   ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
> +   return ret;
> + }

The logic here is a bit hairy to follow, you construct a PLUS RTX to check
aarch64_legitimate_address_hook_p, then construct a 

Re: [PATCH] Improve BB vectorization dependence analysis

2015-11-17 Thread Richard Biener
On Mon, 16 Nov 2015, Alan Lawrence wrote:

> On 09/11/15 12:55, Richard Biener wrote:
> > 
> > Currently BB vectorization computes all dependences inside a BB
> > region and fails all vectorization if it cannot handle some of them.
> > 
> > This is obviously not needed - BB vectorization can restrict the
> > dependence tests to those that are needed to apply the load/store
> > motion effectively performed by the vectorization (sinking all
> > participating loads/stores to the place of the last one).
> > 
> > With restructuring it that way it's also easy to not give up completely
> > but only for the SLP instance we cannot vectorize (this gives
> > a slight bump in my SPEC CPU 2006 testing to 756 vectorized basic
> > block regions).
> > 
> > But first and foremost this patch is to reduce the dependence analysis
> > cost and somewhat mitigate the compile-time effects of the first patch.
> > 
> > For fixing PR56118 only a cost model issue remains.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > 
> > Richard.
> > 
> > 2015-11-09  Richard Biener  
> > 
> > PR tree-optimization/56118
> > * tree-vectorizer.h (vect_find_last_scalar_stmt_in_slp): Declare.
> > * tree-vect-slp.c (vect_find_last_scalar_stmt_in_slp): Export.
> > * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): New
> > function.
> > (vect_slp_analyze_data_ref_dependences): Instead of computing
> > all dependences of the region DRs just analyze the code motions
> > SLP vectorization will perform.  Remove SLP instances that
> > cannot have their store/load motions applied.
> > (vect_analyze_data_refs): Allow DRs without a vectype
> > in BB vectorization.
> > 
> > * gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c: Adjust.
> 
> Since this, I've been seeing an ICE on gfortran.dg/vect/vect-9.f90 on both
> aarch64-none-linux-gnu and arm-none-linux-gnueabihf:
> 
> spawn /home/alalaw01/build/gcc/testsuite/gfortran4/../../gfortran
> -B/home/alalaw01/build/gcc/testsuite/gfortran4/../../
> -B/home/alalaw01/build/aarch64-unknown-linux-gnu/./libgfortran/
> /home/alalaw01/gcc/gcc/testsuite/gfortran.dg/vect/vect-9.f90
> -fno-diagnostics-show-caret -fdiagnostics-color=never -O -O2 -ftree-vectorize
> -fvect-cost-model=unlimited -fdump-tree-vect-details -Ofast -S -o vect-9.s
> /home/alalaw01/gcc/gcc/testsuite/gfortran.dg/vect/vect-9.f90:5:0: Error:
> definition in block 13 follows the use for SSA_NAME: _339 in statement:
> vectp.156_387 = &*cc_36(D)[_339];
> /home/alalaw01/gcc/gcc/testsuite/gfortran.dg/vect/vect-9.f90:5:0: internal
> compiler error: verify_ssa failed
> 0xcfc61b verify_ssa(bool, bool)
> ../../gcc-fsf/gcc/tree-ssa.c:1039
> 0xa2fc0b execute_function_todo
> ../../gcc-fsf/gcc/passes.c:1952
> 0xa30393 do_per_function
> ../../gcc-fsf/gcc/passes.c:1632
> 0xa3058f execute_todo
> ../../gcc-fsf/gcc/passes.c:2000
> Please submit a full bug report...
> FAIL: gfortran.dg/vect/vect-9.f90   -O  (internal compiler error)
> FAIL: gfortran.dg/vect/vect-9.f90   -O  (test for excess errors)
> 
> Still there (on aarch64) at r230329.

Please open a bugreport.

Thanks,
Richard.


[PATCH] PR 65751 Bogus &L in error message

2015-11-17 Thread Dominique d'Humières
Is the following patch OK for trunk and 5.3? 

I have used the legalese found in my draft for Fortran 2015.
Would it be acceptable to replace 
"with the BIND attribute or the SEQUENCE attribute" 
with
"with the BIND or SEQUENCE attribute"?

Dominique

Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog   (revision 230455)
+++ gcc/fortran/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2015-11-17  Dominique d'Humieres 
+
+   PR fortran/65751
+   * expr.c (gfc_check_pointer_assign): Fix error message.
+
 2015-11-16  Steven G. Kargl  
 
PR fortran/58027
Index: gcc/fortran/expr.c
===
--- gcc/fortran/expr.c  (revision 230455)
+++ gcc/fortran/expr.c  (working copy)
@@ -3632,11 +3632,10 @@
   || (lvalue->ts.type == BT_DERIVED
   && (lvalue->ts.u.derived->attr.is_bind_c
   || lvalue->ts.u.derived->attr.sequence
-   gfc_error ("Data-pointer-object &L must be unlimited "
-  "polymorphic, a sequence derived type or of a "
-  "type with the BIND attribute assignment at %L "
-  "to be compatible with an unlimited polymorphic "
-  "target", &lvalue->where);
+   gfc_error ("Data-pointer-object at %L must be unlimited "
+  "polymorphic, or of a type with the BIND attribute "
+  "or the SEQUENCE attribute, to be compatible with "
+  "an unlimited polymorphic target", &lvalue->where);
   else
gfc_error ("Different types in pointer assignment at %L; "
   "attempted assignment of %s to %s", &lvalue->where,
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 230455)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2015-11-17  Dominique d'Humieres 
+
+   PR fortran/65751
+   * gfortran.dg/unlimited_polymorphic_2.f03: Update test.
+
 2015-11-17  Uros Bizjak  
 
* gcc.dg/torture/pr68264.c: Use dg-add-options ieee.
Index: gcc/testsuite/gfortran.dg/unlimited_polymorphic_2.f03
===
--- gcc/testsuite/gfortran.dg/unlimited_polymorphic_2.f03   (revision 230455)
+++ gcc/testsuite/gfortran.dg/unlimited_polymorphic_2.f03   (working copy)
@@ -48,7 +48,7 @@
 call foo (y)
 
 y => tgt ! This is OK, of course.
-tgt => y ! { dg-error "must be unlimited polymorphic" }
+tgt => y ! { dg-error "Data-pointer-object at .1. must be unlimited polymorphic" }
 
 select type (y) ! This is the correct way to accomplish the previous
   type is (integer)
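
The updated dg-error pattern uses ".1." so that each dot matches one of the
parentheses in gfortran's "(1)" location marker. A rough sketch of that
match follows. It assumes std::regex (ECMAScript syntax) as a stand-in for
the Tcl regular expressions DejaGnu actually uses; diagnostic_matches is a
hypothetical helper, and "." behaves the same way in both syntaxes for this
pattern:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Return true if PATTERN (a regular expression) matches anywhere inside
// MESSAGE, which is how an unanchored dg-error pattern is applied.
bool diagnostic_matches(const std::string& message,
                        const std::string& pattern) {
  assert(!pattern.empty());
  return std::regex_search(message, std::regex(pattern));
}
```

The pattern is deliberately a prefix of the new message, so the test keeps
passing if the tail of the wording is adjusted later.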



Re: Extend tree-call-cdce to calls whose result is used

2015-11-17 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Nov 13, 2015 at 2:12 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz  wrote:
 Hi,

 On Mon, 9 Nov 2015, Richard Sandiford wrote:

> +static bool
> +can_use_internal_fn (gcall *call)
> +{
> +  /* Only replace calls that set errno.  */
> +  if (!gimple_vdef (call))
> +return false;

 Oh, I managed to confuse this in my head while reading the patch.  So,
 hmm, you don't actually replace the builtin with an internal function
 (without the condition) under no-errno-math?  Does something else do that?
 Because otherwise that seems an unnecessary restriction?

> >> r229916 fixed that for the non-EH case.
> >
> > Ah, missed it.  Even the EH case shouldn't be difficult.  If the
> > original dominator of the EH destination was the call block it moves,
> > otherwise it remains unchanged.
>
> The target of the edge is easy in itself, I agree, but that isn't
> necessarily the only affected block, if the EH handler doesn't
> exit or rethrow.

 You're worried the non-EH and the EH regions merge again, right?  Like so:

 before change:

 BB1: throwing-call
  fallthru/   \EH
 BB2   BBeh
  |   /\ (stuff in EH-region)
  | /some path out of EH region
  | /--/
 BB3

 Here, BB3 must at least be dominated by BB1 (the throwing block), or by
 something further up (when there are other side-entries to the path
 BB2->BB3 or into the EH region).  When further up, nothing changes, when
 it's BB1, then it's afterwards dominated by the BB containing the
 condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh,
 which gets idom=Bcall.  Depending on how you split BB1, either Bcond or
 BBcall might still be BB1 and doesn't lead to changes in the dom tree.
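
The dominator update described above can be sketched on a toy
immediate-dominator map. This is only an illustration of the stated rule;
IdomMap and retarget_idoms are hypothetical names, blocks are plain strings,
and nothing here uses GCC's dominance API:

```cpp
#include <cassert>
#include <map>
#include <string>

// Maps a block name to the name of its immediate dominator.
using IdomMap = std::map<std::string, std::string>;

// After SPLIT_BLOCK is split into COND_BLOCK (the condition) and
// CALL_BLOCK (the call), every block whose idom was SPLIT_BLOCK is now
// dominated by COND_BLOCK, except EH_BLOCK, which is only reachable
// through the call and so gets CALL_BLOCK.
IdomMap retarget_idoms(IdomMap idom, const std::string& split_block,
                       const std::string& cond_block,
                       const std::string& call_block,
                       const std::string& eh_block) {
  assert(cond_block != call_block);
  for (auto& entry : idom) {
    if (entry.second != split_block)
      continue;  // Dominated from further up: nothing changes.
    entry.second = (entry.first == eh_block) ? call_block : cond_block;
  }
  return idom;
}
```

Applied to the diagram, BB2 and BB3 move to Bcond and BBeh moves to Bcall,
while blocks dominated by something above BB1 are left alone.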

> > Currently we have quite some of such passes (reassoc, forwprop,
> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap
> > and others), but they are all handling only special situations in one
> > way or the other.  pass_fold_builtins is another one, but it seems
> > most related to what you want (replacing a call with something else),
> > so I thought that'd be the natural choice.
>
> Well, to be pedantic, it's not really replacing the call.  Except for
> the special case of targets that support direct assignments to errno,
> it keeps the original call but ensures that it isn't usually executed.
> From that point of view it doesn't really seem like a fold.
>
> But I suppose that's just naming again :-).  And it's easily solved with
> s/fold/rewrite/.

 Exactly, in my mind pass_fold_builtin (like many of the others I
 mentioned) doesn't do folding but rewriting :)
>>>
>>> So I am replying here to the issue of where to do the transform call_cdce
>>> does and the one Richard wants to add.  For example we "lower"
>>> posix_memalign as early as GIMPLE lowering (that's before CFG construction).
>>> We also lower sincos to cexpi during GENERIC folding (or if that is dropped
>>> either GIMPLE lowering or GIMPLE folding during gimplification would be
>>> appropriate).
>>>
>>> Now, with offloading we have to avoid creating target dependencies before
>>> LTO stream-out (thus no IFN replacements before that - not sure if
>>> Richards patches have an issue there already).
>>
>> No, this patch was the earliest point at which we converted to internal
>> functions.  The idea was to make code treat ECF_PURE built-in functions
>> and internal functions as being basically equivalent.  There's therefore
>> not much benefit to doing a straight replacement of one with the other
>> during either GENERIC or gimple.  Instead the series only used internal
>> functions for things that built-in functions couldn't do, specifically:
>>
>> - the case used in this patch, to optimise part of a non-pure built-in
>>   function using a pure equivalent.
>>
>> - vector versions of built-in functions.
>>
>> The cfgexpand patch makes sure that pure built-in functions are expanded
>> like internal functions where possible.
>>
>>> Which would leave us with a lowering stage early in the main
>>> optimization pipeline - I think fold_builtins pass is way too late but
>>> any "folding" pass will do (like forwprop or backprop where the latter
>>> might be better because it might end up computing FP "ranges" to
>>> improve the initial lowering code).
>>
>> This isn't at all related to what backprop is doing though.
>> backprop is about optimising definitions based on information
>> about all uses.
>>
>> Does fold_builtins need to be as 

[PATCH, testsuite]: Add ieee options for gcc.dg/torture/pr68264.c

2015-11-17 Thread Uros Bizjak
This test uses NaN, so it requires ieee options for certain targets.

2015-11-17  Uros Bizjak  

* gcc.dg/torture/pr68264.c: Use dg-add-options ieee.

Tested on alphaev68-linux-gnu and committed to mainline SVN.

Uros.

Index: gcc.dg/torture/pr68264.c
===
--- gcc.dg/torture/pr68264.c(revision 230454)
+++ gcc.dg/torture/pr68264.c(working copy)
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-add-options ieee } */
 /* { dg-require-effective-target fenv_exceptions } */

 #include 


Re: [Patch AArch64] Add support for Cortex-A35

2015-11-17 Thread Marcus Shawcroft
On 16 November 2015 at 14:36, James Greenhalgh  wrote:

> 2015-11-16  James Greenhalgh  
>
> * config/aarch64/aarch64-cores.def (cortex-a35): New.
> * config/aarch64/aarch64.c (cortexa35_tunings): New.
> * config/aarch64/aarch64-tune.md: Regenerate.
> * doc/invoke.texi (-mcpu): Add Cortex-A35
>

OK /M


Re: Add genmatch support for internal functions

2015-11-17 Thread Richard Sandiford
Richard Sandiford  writes:
> This patch makes genmatch match calls based on combined_fn rather
> than built_in_function and extends the matching to internal functions.
> It also uses fold_const_call to fold the calls to a constant, rather
> than going through fold_builtin_n.
>
> In order to slightly simplify the code and remove potential
> ambiguity, the patch enforces lower case for tree codes
> (foo->FOO_EXPR), caps for functions (no built_in_hypot->BUILT_IN_HYPOT)
> and requires an exact match for user-defined identifiers.  The first two
> were already met in practice but there were a couple of cases where
> operator lists were defined in one case and used in another.
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
> OK to install?
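
The combined numbering that the genmatch change relies on (internal function
codes placed after END_BUILTINS) can be sketched with toy enums. The TOY_*
enumerators and the helper names below are assumptions made for
illustration, not the real built-in or internal function lists:

```cpp
#include <cassert>

// Toy stand-ins for enum built_in_function and enum internal_fn.
enum toy_built_in_function { TOY_BUILT_IN_HYPOT, TOY_BUILT_IN_COS, TOY_END_BUILTINS };
enum toy_internal_fn { TOY_IFN_SQRT, TOY_IFN_POW, TOY_IFN_LAST };

// Built-in functions keep their own code.
unsigned int combined_code(toy_built_in_function fn) {
  return static_cast<unsigned int>(fn);
}

// Internal functions are numbered after the last built-in, mirroring
// fn (int (END_BUILTINS) + int (fn_)) in the patch.
unsigned int combined_code(toy_internal_fn fn) {
  return static_cast<unsigned int>(TOY_END_BUILTINS)
         + static_cast<unsigned int>(fn);
}

// A combined code at or above TOY_END_BUILTINS denotes an internal function.
bool is_internal(unsigned int code) {
  assert(code < static_cast<unsigned int>(TOY_END_BUILTINS)
                + static_cast<unsigned int>(TOY_IFN_LAST));
  return code >= static_cast<unsigned int>(TOY_END_BUILTINS);
}
```

Because both overloads produce values in one unsigned range, a single
unsigned field can hold either kind of function code.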

The updated patch below adds the SCALAR_FLOAT_TYPE_P check discussed here:

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01949.html

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* match.pd: Use HYPOT and COS rather than hypot and cos.
Use CASE_CFN_* macros.  Guard log/exp folds with
SCALAR_FLOAT_TYPE_P.
* genmatch.c (internal_fn): New enum.
(fn_id::fn): Change to an unsigned int.
(fn_id::fn_id): Accept internal_fn too.
(add_builtin): Rename to...
(add_function): ...this and turn into a template.
(get_operator): Only try one variation if the original name fails.
Only add _EXPR if the original name was all lower case.
Try converting internal and built-in function names to their
CFN equivalents.
(expr::gen_transform): Use maybe_build_call_expr_loc for generic.
(dt_simplify::gen_1): Likewise.
(dt_node::gen_kids_1): Use gimple_call_combined_fn for gimple
and get_call_combined_fn for generic.
(dt_simplify::gen): Use combined_fn as the type of fn_ids.
(decision_tree::gen): Likewise.
(main): Use lower case in the strings for {VIEW_,}CONVERT[012].
Use add_function rather than add_builtin.  Register internal
functions too.
* generic-match-head.c: Include case-cfn-macros.h.
* gimple-fold.c (replace_stmt_with_simplification): Use
gimple_call_combined_fn to test whether we can keep an
existing call.
* gimple-match.h (code_helper): Replace built_in_function
with combined_fn.
* gimple-match-head.c: Include fold-const-call.h, internal-fn.h
and case-cfn-macros.h.
(gimple_resimplify1): Use fold_const_call.
(gimple_resimplify2, gimple_resimplify3): Likewise.
(build_call_internal, build_call): New functions.
(maybe_push_res_to_seq): Use them.
(gimple_simplify): Use fold_const_call.  Set *rcode to a combined_fn
rather than a built-in function.
* tree.h (build_call_expr_internal_loc): Declare.
(maybe_build_call_expr_loc): Likewise.
* tree.c (build_call_expr_internal_loc_array): New function.
(maybe_build_call_expr_loc): Likewise.

diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c
index f2e08ed..f55f91e 100644
--- a/gcc/generic-match-head.c
+++ b/gcc/generic-match-head.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "builtins.h"
 #include "dumpfile.h"
+#include "case-cfn-macros.h"
 
 
 /* Routine to determine if the types T1 and T2 are effectively
diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index 9d74ed7..daa66d9 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -230,6 +230,12 @@ enum built_in_function {
 END_BUILTINS
 };
 
+#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) IFN_##CODE,
+enum internal_fn {
+#include "internal-fn.def"
+  IFN_LAST
+};
+
 /* Return true if CODE represents a commutative tree code.  Otherwise
return false.  */
 bool
@@ -341,13 +347,15 @@ struct operator_id : public id_base
   const char *tcc;
 };
 
-/* Identifier that maps to a builtin function code.  */
+/* Identifier that maps to a builtin or internal function code.  */
 
 struct fn_id : public id_base
 {
   fn_id (enum built_in_function fn_, const char *id_)
   : id_base (id_base::FN, id_), fn (fn_) {}
-  enum built_in_function fn;
+  fn_id (enum internal_fn fn_, const char *id_)
+  : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)) {}
+  unsigned int fn;
 };
 
 struct simplify;
@@ -447,10 +455,12 @@ add_operator (enum tree_code code, const char *id,
   *slot = op;
 }
 
-/* Add a builtin identifier to the hash.  */
+/* Add a built-in or internal function identifier to the hash.  ID is
+   the name of its CFN_* enumeration value.  */
 
+template <typename T>
 static void
-add_builtin (enum built_in_function code, const char *id)
+add_function (T code, const char *id)
 {
   fn_id *fn = new fn_id (code, id);
   id_base **slot = operators->find_slot_with_hash (fn, fn->hashval, INSERT);
@@ -485,30 +495,32 @@ 

Re: [Patch ARM] Add support for Cortex-A35

2015-11-17 Thread Ramana Radhakrishnan
On Mon, Nov 16, 2015 at 2:42 PM, James Greenhalgh
 wrote:
>
> Hi,
>
> This patch adds support to the ARM back-end for the Cortex-A35
> processor, as recently announced by ARM. The ARM Cortex-A35 provides
> full support for the ARMv8-A architecture, including the CRC extension,
> with optional Advanced-SIMD and Floating-Point support. We therefore set
> feature flags for this CPU to FL_FOR_ARCH8A and FL_CRC32 and FL_LDSCHED,
> in the same fashion as Cortex-A53 and Cortex-A57. While the Cortex-A35
> has dual issue capabilities, we model it with an issue rate of one, with
> the expectation that this will give better schedules when using the
> Cortex-A53 pipeline model.
>
> Bootstrapped with --with-tune=cortex-a35 with no issues.
>
> I'm sorry to have this upstream a little late for the close of Stage 1,
> I wanted to wait for binutils support to be committed. This happened
> on Thursday [1]. If it is OK with the ARM maintainers, I'd like to get
> this in to GCC 6.
>
> OK?

Can you also add an entry to the news page for GCC 6?

Ramana
>
> Thanks,
> James
>
> [1]: https://sourceware.org/ml/binutils-cvs/2015-11/msg00065.html
>
> ---
> 2015-11-16  James Greenhalgh  
>
> * config/arm/arm-cores.def (cortex-a35): New.
> * config/arm/arm.c (arm_cortex_a35_tune): New.
> * config/arm/arm-tables.opt: Regenerate.
> * config/arm/arm-tune.md: Regenerate.
> * config/arm/bpabi.h (BE8_LINK_SPEC): Add cortex-a35.
> * config/arm/t-aprofile: Likewise.
> * doc/invoke.texi (-mcpu): Likewise.
>


[embedded-5-branch][PATCH 0/2] Backporting algorithmic optimization and testcase change

2015-11-17 Thread Andre Vieira
This series is aimed at backporting algorithmic optimizations and a 
change to a test it affects from trunk to the embedded-5-branch.


Andre Vieira(2):
Backporting algorithmic optimization in match and simplify
Backporting fix for PR-67948.



[embedded-5-branch][PATCH 2/2]Backporting fix for PR-67948.

2015-11-17 Thread Andre Vieira
This patch backports the fix for PR-67948 from trunk to the 
embedded-5-branch.


The original patch is at:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02193.html

Tested for Cortex-M3.

Is this OK to commit?

Thanks,
Andre

gcc/testsuite/ChangeLog
2015-10-27  Andre Vieira 
Backport from mainline:
  2015-10-15  Andre Vieira  
   PR testsuite/67948
   * gcc.target/arm/xor-and.c: check for eor instead of orr.
From 89922547118e716b41ddf6edefb274322193f25c Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Thu, 15 Oct 2015 12:48:26 +0100
Subject: [PATCH] Fix for xor-and.c test

---
 gcc/testsuite/gcc.target/arm/xor-and.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/xor-and.c b/gcc/testsuite/gcc.target/arm/xor-and.c
index 53dff85f8f780fb99a93bbcc24180a3d0d5d3be9..3715530cd7bf9ad8abb24cb21cd51ae3802079e8 100644
--- a/gcc/testsuite/gcc.target/arm/xor-and.c
+++ b/gcc/testsuite/gcc.target/arm/xor-and.c
@@ -10,6 +10,6 @@ unsigned short foo (unsigned short x)
   return x;
 }
 
-/* { dg-final { scan-assembler "orr" } } */
+/* { dg-final { scan-assembler "eor" } } */
 /* { dg-final { scan-assembler-not "mvn" } } */
 /* { dg-final { scan-assembler-not "uxth" } } */
-- 
1.9.1


