Re: Simple bitop reassoc in match.pd

2016-05-11 Thread Marc Glisse

On Wed, 11 May 2016, Jeff Law wrote:

> On 05/11/2016 10:17 AM, Marc Glisse wrote:
>
>> The transformation seems right to me, but we are then missing another
>> transformation like ~X & Y -> X ^ Y when we know that X & ~Y is 0 (the 1
>> bits of X are included in those of Y). That may not be easy with the
>> limited bit tracking we have. A version limited to constant Y of the
>> form 2^n-1 (i.e. 0...01...1) where X has a range included in [0, Y] may
>> be simpler.
>
> While we don't have strong bit tracking, we do track [0..1] ranges reasonably
> well, so it may be worth doing.


I had started writing

+/* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0.  */
+#if GIMPLE
+(simplify
+ (bit_and (bit_not SSA_NAME@0) INTEGER_CST@1)
+ (if ((get_nonzero_bits (@0) & wi::bit_not (@1)) == 0)
+  (bit_xor @0 @1)))
+#endif

but then I realized that VRP does not call the simplify machinery, so this 
would have to be rewritten in simplify_bit_ops_using_ranges. The good 
point is that we could then compute:

may_be_nonzero_X & ~must_be_nonzero_Y
instead of assuming that Y is a constant (not that it would change 
anything in practice).
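
As a sanity check on the identity itself, independent of how GCC would implement it, here is a minimal C sketch. The function names are hypothetical and not part of GCC. The exhaustive loop over 8-bit values only illustrates that ~X & Y equals X ^ Y whenever X & ~Y is 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: computes ~x & y through the
   proposed rewrite.  Only valid when every 1 bit of x is also set
   in y, i.e. (x & ~y) == 0.  */
uint8_t
and_not_via_xor (uint8_t x, uint8_t y)
{
  assert ((x & (uint8_t) ~y) == 0);
  return (uint8_t) (x ^ y);
}

/* Returns 1 if the rewrite agrees with ~x & y for every 8-bit pair
   that satisfies the precondition, 0 otherwise.  */
int
rewrite_holds_for_all_bytes (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      {
        if ((x & ~y & 0xffu) != 0)
          continue;  /* Precondition not met; the rewrite does not apply.  */
        if (and_not_via_xor ((uint8_t) x, (uint8_t) y)
            != (uint8_t) (~x & y))
          return 0;
      }
  return 1;
}
```

For instance, with X = 0x05 and Y = 0x0f the precondition holds, and both ~X & Y and X ^ Y give 0x0a.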


--
Marc Glisse


Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-05-11 Thread Dhole
Attaching the patch with all the issues addressed.

On 16-05-10 13:32:03, Bernd Schmidt wrote:
> Not sure %ld is always correct here. See below.

Using %wd now
 
> >diff --git a/gcc/gcc.c b/gcc/gcc.c
> >index 1af5920..0b11cb5 100644
> >--- a/gcc/gcc.c
> >+++ b/gcc/gcc.c
> >@@ -403,6 +403,7 @@ static const char *pass_through_libs_spec_func (int, 
> >const char **);
> >  static const char *replace_extension_spec_func (int, const char **);
> >  static const char *greater_than_spec_func (int, const char **);
> >  static char *convert_white_space (char *);
> >+static void setenv_SOURCE_DATE_EPOCH_current_time (void);
> 
> Best to just move the function before its use so you don't have to declare
> it here.

done.
 
> >
> >  /* The Specs Language
> >
> >@@ -3837,6 +3838,7 @@ driver_handle_option (struct gcc_options *opts,
> >else
> > compare_debug_opt = arg;
> >save_switch (compare_debug_replacement_opt, 0, NULL, validated, 
> > true);
> >+  setenv_SOURCE_DATE_EPOCH_current_time ();
> >return true;
> >
> >  case OPT_fdiagnostics_color_:
> >@@ -9853,6 +9855,30 @@ path_prefix_reset (path_prefix *prefix)
> >prefix->max_len = 0;
> >  }
> >
> >+static void
> >+setenv_SOURCE_DATE_EPOCH_current_time ()
> 
> Functions need a comment documenting what they do. Also, not thrilled about
> the name with the caps. Maybe set_source_date_envvar.

Added comment and changed to set_source_date_epoch_envvar.

> >+  /* Array size is 21 = ceil(log_10(2^64)) + 1 to hold string 
> >representations
> >+ of 64 bit integers.  */
> >+  char source_date_epoch[21];
> >+  time_t tt;
> >+  struct tm *tb = NULL;
> >+
> >+  errno = 0;
> >+  tt = time (NULL);
> >+  if (tt < (time_t) 0 || errno != 0)
> >+tt = (time_t) 0;
> >+
> >+  tb = gmtime (&tt);
> >+
> >+  if (!strftime (source_date_epoch, 21, "%s", tb))
> >+snprintf (source_date_epoch, 21, "0");
> >+  /* Using setenv instead of xputenv because we want the variable to remain
> >+ after finalizing so that it's still set in the second run when using
> >+ -fcompare-debug.  */
> >+  setenv ("SOURCE_DATE_EPOCH", source_date_epoch, 0);
> >+}
> 
> Do we really need the whole dance with gmtime/strftime? I thought time_t is
> documented to hold the number we want, so convert that to long and print it.

Using a single snprintf with a cast to unsigned long long now.
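
A minimal sketch of what such a single-snprintf conversion might look like, assuming a hypothetical helper name (format_epoch) and buffer-size macro that are not taken from the patch. The static assertion spells out the "21 = 20 digits + NUL" arithmetic from the quoted comment.

```c
#include <stdio.h>
#include <time.h>

/* 20 decimal digits are enough for any 64-bit unsigned value, plus one
   byte for the terminating NUL, as in the quoted patch.  */
#define EPOCH_BUF_SIZE 21
_Static_assert (sizeof "18446744073709551615" == EPOCH_BUF_SIZE,
                "buffer must hold 2^64 - 1 and the terminating NUL");

/* Hypothetical helper: write TT as a decimal string into BUF.
   Negative values (e.g. a failed time () call returning -1) are
   clamped to 0, mirroring the fallback in the quoted patch.
   Returns the number of characters written, excluding the NUL.  */
int
format_epoch (time_t tt, char buf[EPOCH_BUF_SIZE])
{
  if (tt < (time_t) 0)
    tt = (time_t) 0;
  return snprintf (buf, EPOCH_BUF_SIZE, "%llu", (unsigned long long) tt);
}
```

The resulting string would then be handed to setenv ("SOURCE_DATE_EPOCH", buf, 0), as in the quoted patch.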
 
> >diff --git a/libcpp/macro.c b/libcpp/macro.c
> >index c2a8376..5f7ffbd 100644
> >--- a/libcpp/macro.c
> >+++ b/libcpp/macro.c
> >@@ -359,8 +359,9 @@ _cpp_builtin_macro_text (cpp_reader *pfile, cpp_hashnode 
> >*node,
> >
> >   /* Set a reproducible timestamp for __DATE__ and __TIME__ macro
> >  usage if SOURCE_DATE_EPOCH is defined.  */
> >-  if (pfile->source_date_epoch != (time_t) -1)
> >- tb = gmtime (&pfile->source_date_epoch);
> >+  tt = pfile->cb.get_source_date_epoch (pfile);
> >+  if (tt != (time_t) -1)
> >+tb = gmtime (&tt);
> 
> That looks like we could call the callback multiple times, which is
> inefficient and could get repeated error messages. Best to store the value
> once computed, and maybe add a testcase that the error is printed only once
> (once we have the dejagnu machinery).
> 
> The callback could potentially be NULL, right, if this isn't called from one
> of the C frontends? Best to check for that as well.

I've added the pfile->source_date_epoch back to store the value once
computed.  It's meant to be initialized to -2 (not yet set), and then
set to either -1 (SOURCE_DATE_EPOCH not set) or a non-negative value
containing the value from SOURCE_DATE_EPOCH.
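
The sentinel scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: struct reader, reader_init, reader_source_date_epoch and the example callback are hypothetical stand-ins, not the actual libcpp code. It also shows the NULL-callback check Bernd asked for.

```c
#include <stddef.h>
#include <time.h>

/* Sentinel values for the cached field, as described above.  */
#define EPOCH_NOT_YET_SET ((time_t) -2)
#define EPOCH_UNSET       ((time_t) -1)

/* Hypothetical stand-in for the relevant parts of cpp_reader.  */
struct reader
{
  time_t source_date_epoch;                          /* Cached value.  */
  time_t (*get_source_date_epoch) (struct reader *); /* May be NULL.  */
};

void
reader_init (struct reader *r, time_t (*cb) (struct reader *))
{
  r->source_date_epoch = EPOCH_NOT_YET_SET;
  r->get_source_date_epoch = cb;
}

/* Return the cached epoch, invoking the callback at most once.  */
time_t
reader_source_date_epoch (struct reader *r)
{
  if (r->source_date_epoch == EPOCH_NOT_YET_SET)
    r->source_date_epoch = (r->get_source_date_epoch != NULL
                            ? r->get_source_date_epoch (r)
                            : EPOCH_UNSET);
  return r->source_date_epoch;
}

/* Example callback that counts its invocations.  */
int example_calls = 0;

time_t
example_callback (struct reader *r)
{
  (void) r;
  example_calls++;
  return (time_t) 1462924800;
}
```

Because the callback runs only while the field still holds the "not yet set" sentinel, any diagnostic it emits for an invalid value would be printed once, no matter how often __DATE__ or __TIME__ is expanded.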

I've also added the dg-set-compiler-env-var function in the test
framework and written 2 tests: one to check the correct behaviour of
SOURCE_DATE_EPOCH, and one to check that when SOURCE_DATE_EPOCH contains
an invalid value the error is only reported once.

Cheers,
-- 
Dhole
gcc/c-family/ChangeLog:

2016-05-12  Eduard Sanou  

* c-common.c (get_source_date_epoch): Renamed to
cb_get_source_date_epoch.
* c-common.c (cb_get_source_date_epoch): Use a single generic error
message when the parsing fails.  Use error_at instead of fatal_error.
* c-common.h (get_source_date_epoch): Renamed to
cb_get_source_date_epoch.
* c-common.h (cb_get_source_date_epoch): Prototype.
* c-common.h (MAX_SOURCE_DATE_EPOCH): Define.
* c-common.h (c_omp_region_type): Remove trailing comma.
* c-lex.c (init_c_lex): Set cb->get_source_date_epoch callback.
* c-lex.c (c_lex_with_flags): Initialize source_date_epoch with the
new cpp_init_source_date_epoch.

gcc/ChangeLog:

2016-05-12  Eduard Sanou  

* doc/cppenv.texi: Note that the `%s` in `date` is a non-standard
extension.
* gcc.c (driver_handle_option): Call set_source_date_epoch_envvar.
* gcc.c (set_source_date_epoch_envvar): New function, sets
the SOURCE_DATE_EPOCH 

[PATCH v3, rs6000] Add built-in support for new Power9 darn (deliver a random number) instruction

2016-05-11 Thread Kelvin Nilsen

This patch adds built-in function support for the Power9 darn
instruction.  This patch differs from the v2 patch distributed on 5 May 
in the following ways:

1. Removed "darn" macro short-hand definitions from altivec.h
2. Removed extraneous assert in rs6000.c
3. Changed the attribute type for each of the darn instructions to
"integer".
4. Removed the redundant specification of attribute length from each of
the darn instructions.
5. Replaced a %d with %u in a fprintf formatting string within rs6000.c.
6. Fixed various spelling and style errors as identified by Bernhard
Reutner-Fischer and Segher Boessenkool.
7. After further investigation and private discussion with Segher
Boessenkool, it was determined to not replace use of the
RS6000_BTC_SPECIAL flag with a new RS6000_BTC_NULLARY flag.

I have bootstrapped and tested this patch against the trunk and against
the gcc-6-branch on both powerpc64le-unknown-linux-gnu and
powerpc64-unknown-linux-gnu with no regressions.  Is this ok for trunk
and for backporting to GCC 6 after a few days of burn-in time on the
trunk?

Thanks,
Kelvin

gcc/testsuite/ChangeLog:

2016-05-11  Kelvin Nilsen  

* gcc.target/powerpc/darn-0.c: New test.
* gcc.target/powerpc/darn-1.c: New test.
* gcc.target/powerpc/darn-2.c: New test.


gcc/ChangeLog:

2016-05-11  Kelvin Nilsen  

* config/rs6000/altivec.md (UNSPEC_DARN): New unspec constant.
(UNSPEC_DARN_32): New unspec constant.
(UNSPEC_DARN_RAW): New unspec constant.
(darn_32): New instruction.
(darn_raw): New instruction.
(darn): New instruction.
* config/rs6000/rs6000-builtin.def (RS6000_BUILTIN_0): Add
support and documentation for this macro.
(BU_P9_MISC_1): New macro definition.
(BU_P9_64BIT_MISC_0): New macro definition.
(BU_P9_MISC_0): New macro definition.
(darn_32): New builtin definition.
(darn_raw): New builtin definition.
(darn): New builtin definition.
* config/rs6000/rs6000.c: Add #define RS6000_BUILTIN_0 and #undef
RS6000_BUILTIN_0 directives to surround each occurrence of
#include "rs6000-builtin.def".
(rs6000_builtin_mask_calculate): Add in the RS6000_BTM_MODULO and
RS6000_BTM_64BIT flags to the returned mask, depending on
configuration.
(def_builtin): Correct an error in the assignments made to the
debugging variable attr_string.
(rs6000_expand_builtin): Add support for no-operand built-in
functions.
(builtin_function_type): Remove fatal_error assertion that is no
longer valid.
(rs6000_common_init_builtins): Add support for no-operand built-in
functions.
* config/rs6000/rs6000.h (RS6000_BTM_MODULO): New macro
definition.
(RS6000_BTM_PURE): Enhance comment to clarify intent of this flag
definition.
(RS6000_BTM_64BIT): New macro definition.
* doc/extend.texi: Document __builtin_darn (void),
__builtin_darn_raw (void), and __builtin_darn_32 (void) built-in
functions.


Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 235884)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -73,6 +73,9 @@
UNSPEC_VUNPACK_LO_SIGN_DIRECT
UNSPEC_VUPKHPX
UNSPEC_VUPKLPX
+   UNSPEC_DARN
+   UNSPEC_DARN_32
+   UNSPEC_DARN_RAW
UNSPEC_DST
UNSPEC_DSTT
UNSPEC_DSTST
@@ -3590,6 +3593,33 @@
   [(set_attr "length" "4")
(set_attr "type" "vecsimple")])
 
+(define_insn "darn_32"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(unspec:SI [(const_int 0)] UNSPEC_DARN_32))]
+  "TARGET_MODULO"
+  {
+ return "darn %0,0";
+  }
+  [(set_attr "type" "integer")])
+
+(define_insn "darn_raw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(unspec:DI [(const_int 0)] UNSPEC_DARN_RAW))]
+  "TARGET_MODULO && TARGET_64BIT"
+  {
+ return "darn %0,2";
+  }
+  [(set_attr "type" "integer")])
+
+(define_insn "darn"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(unspec:DI [(const_int 0)] UNSPEC_DARN))]
+  "TARGET_MODULO && TARGET_64BIT"
+  {
+ return "darn %0,1";
+  }
+  [(set_attr "type" "integer")])
+
 (define_expand "bcd_"
   [(parallel [(set (reg:CCFP 74)
   (compare:CCFP
Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 235884)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -24,6 +24,7 @@
.  */
 
 /* Before including this file, some macros must be defined:
+   RS6000_BUILTIN_0 -- 0 arg builtins
RS6000_BUILTIN_1 -- 1 arg builtins
RS6000_BUILTIN_2 -- 2 arg builtins
RS6000_BUILTIN_3 -- 3 arg builtins
@@ -43,6 +44,10 @@
ATTR

Re: [PATCH, rs6000] Fix pr70963.c test case for older hardware

2016-05-11 Thread Segher Boessenkool
On Wed, May 11, 2016 at 02:45:32PM -0500, Bill Schmidt wrote:
> When fixing PR70963, I introduced a test case that didn’t sufficiently 
> constrain
> the target architecture.  Although the built-ins I was testing are available 
> on any
> platform supporting VSX, the use of vec_all_eq for vector double and vector 
> long
> long requires at least a POWER8 target.  This patch remedies this by ensuring
> the test will not run unless p8vector support is available.
> 
> Verified on powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu.
> Is this ok for trunk?  I will wait a bit longer before deploying to earlier 
> releases…

It is okay, thanks,


Segher


> 2016-05-11  Bill Schmidt  
> 
> * gcc.target/powerpc/pr70963.c: Require at least power8 at both
> compile and run time.


Re: [PATCH] Fix crash with --help=^ (PR driver/71063)

2016-05-11 Thread Jakub Jelinek
On Wed, May 11, 2016 at 10:40:36PM +0200, Marek Polacek wrote:
> We crashed when given --help=^ and Kyrill explained why in the PR.
> The following seems as good a fix as any, I think.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2016-05-11  Marek Polacek  
> 
>   PR driver/71063
>   * opts.c (common_handle_option): Detect missing argument for --help=^.
> 
>   * gcc.dg/opts-7.c: New test.

Ok.  But while touching this, can you fix the formatting around too?
space between * and a, or ++ a, or & exclude_flags?
> 
> diff --git gcc/opts.c gcc/opts.c
> index 0f9431a..71e0779 100644
> --- gcc/opts.c
> +++ gcc/opts.c
> @@ -1640,6 +1640,11 @@ common_handle_option (struct gcc_options *opts,
>   if (* a == '^')
> {
>   ++ a;
> + if (*a == '\0')
> +   {
> + error_at (loc, "missing argument to %qs", "--help=^");
> + break;
> +   }
>   pflags = & exclude_flags;
> }
>   else

Jakub


Re: Simple bitop reassoc in match.pd

2016-05-11 Thread Marc Glisse

On Wed, 11 May 2016, Marc Glisse wrote:

> We could also use set_range_info and make simplify_conversion_using_ranges
> use get_range_info instead of get_value_range.


Something like this seems to fix the testcase. I'll try to submit it 
properly, but I don't know when. (I also added the ~X transform to my 
TODO-list)


--
Marc Glisse

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 236122)
+++ gcc/tree-vrp.c  (working copy)
@@ -8940,6 +8940,8 @@
   gassign *newop
= gimple_build_assign (tem, BIT_XOR_EXPR, op0, op1);
   gsi_insert_before (gsi, newop, GSI_SAME_STMT);
+  if (TYPE_PRECISION (TREE_TYPE (tem)) > 1)
+   set_range_info (tem, VR_RANGE, 0, 1);
   gimple_assign_set_rhs_with_ops (gsi, NOP_EXPR, tem);
 }
   /* Or without.  */
@@ -9648,7 +9650,6 @@
 {
   tree innerop, middleop, finaltype;
   gimple *def_stmt;
-  value_range *innervr;
   signop inner_sgn, middle_sgn, final_sgn;
   unsigned inner_prec, middle_prec, final_prec;
   widest_int innermin, innermed, innermax, middlemin, middlemed, middlemax;
@@ -9666,18 +9667,16 @@
   || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (innerop))
 return false;
 
-  /* Get the value-range of the inner operand.  */
-  innervr = get_value_range (innerop);
-  if (innervr->type != VR_RANGE
-  || TREE_CODE (innervr->min) != INTEGER_CST
-  || TREE_CODE (innervr->max) != INTEGER_CST)
+  /* Get the value-range of the inner operand.  Use get_range_info in
+ case innerop was created during substitute-and-fold.  */
+  wide_int imin, imax;
+  if (get_range_info (innerop, &imin, &imax) != VR_RANGE)
 return false;
+  innermin = widest_int::from (imin, TYPE_SIGN (TREE_TYPE (innerop)));
+  innermax = widest_int::from (imax, TYPE_SIGN (TREE_TYPE (innerop)));
 
   /* Simulate the conversion chain to check if the result is equal if
  the middle conversion is removed.  */
-  innermin = wi::to_widest (innervr->min);
-  innermax = wi::to_widest (innervr->max);
-
   inner_prec = TYPE_PRECISION (TREE_TYPE (innerop));
   middle_prec = TYPE_PRECISION (TREE_TYPE (middleop));
   final_prec = TYPE_PRECISION (finaltype);


[PATCH] Fix crash with --help=^ (PR driver/71063)

2016-05-11 Thread Marek Polacek
We crashed when given --help=^ and Kyrill explained why in the PR.
The following seems as good a fix as any, I think.
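
A simplified, stand-alone model of the guard this patch adds, for illustration only. The enum and the function classify_help_item are hypothetical and do not exist in opts.c. The real code reports the problem with error_at and breaks out of the option switch instead of returning a status.

```c
#include <stddef.h>

/* Possible outcomes of the hypothetical parser below.  */
enum help_item_kind { HELP_INCLUDE, HELP_EXCLUDE, HELP_MISSING_ARG };

/* Hypothetical, simplified model of the check added by the patch:
   classify one --help= item.  A leading '^' means "exclude"; if
   nothing follows it, report a missing argument instead of going on
   with an empty name.  On success *NAME points at the item name.  */
enum help_item_kind
classify_help_item (const char *a, const char **name)
{
  enum help_item_kind kind = HELP_INCLUDE;

  if (*a == '^')
    {
      ++a;
      if (*a == '\0')
        return HELP_MISSING_ARG;
      kind = HELP_EXCLUDE;
    }
  *name = a;
  return kind;
}
```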

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-05-11  Marek Polacek  

PR driver/71063
* opts.c (common_handle_option): Detect missing argument for --help=^.

* gcc.dg/opts-7.c: New test.

diff --git gcc/opts.c gcc/opts.c
index 0f9431a..71e0779 100644
--- gcc/opts.c
+++ gcc/opts.c
@@ -1640,6 +1640,11 @@ common_handle_option (struct gcc_options *opts,
if (* a == '^')
  {
++ a;
+   if (*a == '\0')
+ {
+   error_at (loc, "missing argument to %qs", "--help=^");
+   break;
+ }
pflags = & exclude_flags;
  }
else
diff --git gcc/testsuite/gcc.dg/opts-7.c gcc/testsuite/gcc.dg/opts-7.c
index e69de29..c54d0b8 100644
--- gcc/testsuite/gcc.dg/opts-7.c
+++ gcc/testsuite/gcc.dg/opts-7.c
@@ -0,0 +1,6 @@
+/* PR driver/71063 */
+/* Test we don't ICE.  */
+/* { dg-do compile } */
+/* { dg-options "--help=^" } */
+
+/* { dg-error "missing argument to" "" { target *-*-* } 0 } */

Marek


Re: [C PATCH] PR43651: add warning for duplicate qualifier

2016-05-11 Thread Mikhail Maltsev
On 05/10/2016 10:51 PM, Joseph Myers wrote:
> On Sat, 9 Apr 2016, Mikhail Maltsev wrote:
> 
>> gcc/c/ChangeLog:
>>
>> 2016-04-08  Mikhail Maltsev  
>>
>> PR c/43651
>> * c-decl.c (declspecs_add_qual): Warn when -Wduplicate-decl-specifier
>> is enabled.
>> * c-errors.c (pedwarn_c90): Return true if warned.
>> * c-tree.h (pedwarn_c90): Change return type to bool.
>> (enum c_declspec_word): Add new enumerator cdw_atomic.
>>
>> gcc/ChangeLog:
>>
>> 2016-04-08  Mikhail Maltsev  
>>
>> PR c/43651
>> * doc/invoke.texi (Wduplicate-decl-specifier): Document new option.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-04-08  Mikhail Maltsev  
>>
>> PR c/43651
>> * gcc.dg/Wduplicate-decl-specifier-c11.c: New test.
>> * gcc.dg/Wduplicate-decl-specifier.c: Likewise.
>>
>>
>> gcc/c-family/ChangeLog:
>>
>> 2016-04-08  Mikhail Maltsev  
>>
>> PR c/43651
>> * c.opt (Wduplicate-decl-specifier): New option.
> 
> OK.
> 

Committed as r236142.

-- 
Regards,
Mikhail Maltsev


[PATCH, rs6000] Fix pr70963.c test case for older hardware

2016-05-11 Thread Bill Schmidt
Hi,

When fixing PR70963, I introduced a test case that didn’t sufficiently constrain
the target architecture.  Although the built-ins I was testing are available on 
any
platform supporting VSX, the use of vec_all_eq for vector double and vector long
long requires at least a POWER8 target.  This patch remedies this by ensuring
the test will not run unless p8vector support is available.

Verified on powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu.
Is this ok for trunk?  I will wait a bit longer before deploying to earlier 
releases…

Thanks,
Bill


2016-05-11  Bill Schmidt  

* gcc.target/powerpc/pr70963.c: Require at least power8 at both
compile and run time.


Index: gcc/testsuite/gcc.target/powerpc/pr70963.c
===
--- gcc/testsuite/gcc.target/powerpc/pr70963.c  (revision 236082)
+++ gcc/testsuite/gcc.target/powerpc/pr70963.c  (working copy)
@@ -1,7 +1,8 @@
-/* { dg-do run { target { powerpc64*-*-* && vsx_hw } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-do run { target powerpc64*-*-* } } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
-/* { dg-options "-maltivec" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8" } */

#include 
#include 


[PATCH, testsuite]: Fix gcc.target/i386/sse-13.c failure with -fpic

2016-05-11 Thread Uros Bizjak
Hello!

A couple of testcases are missing dg-add-options bind_pic_locally
directive. This is needed to avoid compilation failure with
__always_inline__ attribute when "extern" and "__inline" are defined
away.

2016-05-11  Uros Bizjak  

* gcc.target/i386/sse-13.c: Add dg-add-options bind_pic_locally
directive.
* gcc.target/i386/pr66746.c: Ditto.

Tested on x86_64-linux-gnu and committed to mainline SVN.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/pr66746.c 
b/gcc/testsuite/gcc.target/i386/pr66746.c
index 3ef77bf..d7f6699 100644
--- a/gcc/testsuite/gcc.target/i386/pr66746.c
+++ b/gcc/testsuite/gcc.target/i386/pr66746.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target ia32 } } */
 /* { dg-options "-O2 -miamcu" } */
+/* { dg-add-options bind_pic_locally } */
 
 /* Defining away "extern" and "__inline" results in all of them being
compiled as proper functions.  */
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c 
b/gcc/testsuite/gcc.target/i386/sse-13.c
index 1144e5d..5562fbc 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a 
-m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi 
-mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw 
-madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha 
-mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw 
-mavx512vbmi -mavx512ifma -mclwb -mpcommit -mmwaitx -mclzero -mpku" } */
+/* { dg-add-options bind_pic_locally } */
 
 #include 
 


[PATCH, i386]: Use copy_to_suggested_reg in legitimize_pic_address some more

2016-05-11 Thread Uros Bizjak
Hello!

Patched gcc creates exactly the same RTL, so the comment does not
apply anymore. Not to mention that gen_movsi should not operate on
DImode values ...

2016-05-11  Uros Bizjak  

* config/i386/i386.c (legitimize_pic_address): Use
copy_to_suggested_reg instead of gen_movsi.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: i386.c
===
--- i386.c  (revision 236134)
+++ i386.c  (working copy)
@@ -15474,8 +15474,6 @@ legitimize_pic_address (rtx orig, rtx reg)
{
  new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), UNSPEC_PCREL);
  new_rtx = gen_rtx_CONST (Pmode, new_rtx);
-
- new_rtx = copy_to_suggested_reg (new_rtx, reg, Pmode);
}
   else if (TARGET_64BIT && ix86_cmodel != CM_LARGE_PIC)
{
@@ -15484,14 +15482,6 @@ legitimize_pic_address (rtx orig, rtx reg)
  new_rtx = gen_rtx_CONST (Pmode, new_rtx);
  new_rtx = gen_const_mem (Pmode, new_rtx);
  set_mem_alias_set (new_rtx, ix86_GOT_alias_set ());
-
- if (reg == 0)
-   reg = gen_reg_rtx (Pmode);
- /* Use directly gen_movsi, otherwise the address is loaded
-into register for CSE.  We don't want to CSE this addresses,
-instead we CSE addresses from the GOT table, so skip this.  */
- emit_insn (gen_movsi (reg, new_rtx));
- new_rtx = reg;
}
   else
{
@@ -15504,9 +15494,9 @@ legitimize_pic_address (rtx orig, rtx reg)
  new_rtx = gen_rtx_PLUS (Pmode, pic_offset_table_rtx, new_rtx);
  new_rtx = gen_const_mem (Pmode, new_rtx);
  set_mem_alias_set (new_rtx, ix86_GOT_alias_set ());
+   }
 
- new_rtx = copy_to_suggested_reg (new_rtx, reg, Pmode);
-   }
+  new_rtx = copy_to_suggested_reg (new_rtx, reg, Pmode);
 }
   else
 {


Re: Simple bitop reassoc in match.pd

2016-05-11 Thread Marc Glisse

On Wed, 11 May 2016, Jeff Law wrote:

>> We could also simplify (int)(_Bool)x to x using VRP information that x
>> is in [0, 1], but apparently when VRP replaces x==0 with y=x^1,(_Bool)y,
>> it does not compute a range for the new variable y, and by the time the
>> next VRP pass comes, it is too late.
>
> Seems like a clear oversight.
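
The two value identities this exchange relies on can be checked in plain C. The sketch below is only an illustration with hypothetical function names and says nothing about how VRP represents ranges.

```c
#include <stdbool.h>

/* (int)(_Bool)x, the double conversion discussed above.  */
int
through_bool (int x)
{
  return (int) (bool) x;
}

/* Returns 1 if, for every x in [0, 1], the conversion chain is the
   identity and x == 0 matches the x ^ 1 form VRP rewrites it to.  */
int
identities_hold_on_zero_one (void)
{
  for (int x = 0; x <= 1; x++)
    {
      if (through_bool (x) != x)
        return 0;
      int y = x ^ 1;
      if ((x == 0) != (bool) y)
        return 0;
    }
  return 1;
}
```

Outside [0, 1] the conversion is not the identity (through_bool (2) is 1), which is why the simplification needs the range information in the first place.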


In get_value_range, there is:
  /* If we query the range for a new SSA name return an unmodifiable VARYING.
 We should get here at most from the substitute-and-fold stage which
 will never try to change values.  */
so this is a known limitation.

We could try to change that (XRESIZEVEC, memset(0) on the new elements, 
update num_vr_values to the new num_ssa_names, at this point vr_value 
should be replaced with a vector).
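
A rough sketch of the "resize and zero the new elements" idea in plain C, assuming a hypothetical table type. In GCC this would use XRESIZEVEC on vr_value and update num_vr_values; the names below are stand-ins, and the sketch relies on all-bits-zero being a null pointer, as on common platforms.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for VRP's per-SSA-name range table.  */
struct range_table
{
  void **entries;      /* One slot per SSA name; NULL means "no range yet".  */
  size_t num_entries;
};

/* Grow TABLE so it has at least NEW_NUM slots, zero-filling the new
   ones.  Returns 0 on success, -1 if the allocation fails (in which
   case TABLE is left unchanged).  */
int
range_table_grow (struct range_table *table, size_t new_num)
{
  if (new_num <= table->num_entries)
    return 0;

  void **p = realloc (table->entries, new_num * sizeof *p);
  if (p == NULL)
    return -1;

  memset (p + table->num_entries, 0,
          (new_num - table->num_entries) * sizeof *p);
  table->entries = p;
  table->num_entries = new_num;
  return 0;
}
```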


We could also use set_range_info and make simplify_conversion_using_ranges 
use get_range_info instead of get_value_range. Might even move the whole 
function to match.pd then ;-)


--
Marc Glisse


PATCH: PR target/70738: Add -mgeneral-regs-only option

2016-05-11 Thread H.J. Lu
On Tue, May 10, 2016 at 1:02 PM, Sandra Loosemore
 wrote:
> On 04/20/2016 07:42 AM, Koval, Julia wrote:
>>
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index a5a8b23..82de5bf 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -5263,6 +5263,83 @@ On x86-32 targets, the @code{stdcall} attribute
>> causes the compiler to
>>  assume that the called function pops off the stack space used to
>>  pass arguments, unless it takes a variable number of arguments.
>>
>> +@item no_caller_saved_registers
>> +@cindex @code{no_caller_saved_registers} function attribute, x86
>> +Use this attribute to indicate that the specified function has no
>> +caller-saved registers.  That is, all registers are callee-saved.
>> +The compiler generates proper function entry and exit sequences to
>> +save and restore any modified registers, except for the EFLAGS
>> +register.  If the compiler generates MPX, SSE, MMX or x87 instructions
>> +in a function with @code{no_caller_saved_registers} attribute or
>> +functions called from a function with @code{no_caller_saved_registers}
>> +attribute may contain MPX, SSE, MMX or x87 instructions, the compiler
>> +must save and restore the corresponding state.
>
>
> I cannot parse the last sentence in this paragraph.  How can the compiler
> know whether called functions may contain those instructions? Plus, talking
> about what the compiler must do seems too implementor-speaky for user
> documentation.  Maybe you mean something like "The compiler also saves and
> restores state associated with MPX, SSE, MMX, and x87 instructions." ?
>
> I also think the documentation needs to give some hint about why a user
> would want to put this attribute on a function.
>
>> +
>> +@item interrupt
>> +@cindex @code{interrupt} function attribute, x86
>> +Use this attribute to indicate that the specified function is an
>> +interrupt handler.  The compiler generates function
>> +entry and exit sequences suitable for use in an interrupt handler when
>> +this attribute is present.  The @code{IRET} instruction, instead of the
>> +@code{RET} instruction, is used to return from interrupt handlers.  All
>> +registers, except for the EFLAGS register which is restored by the
>> +@code{IRET} instruction, are preserved by the compiler.  If the
>> +compiler generates MPX, SSE, MMX or x87 instructions in an interrupt
>> +handler, or functions called from an interrupt handler may contain MPX,
>> +SSE, MMX or x87 instructions, the compiler must save and restore the
>> +corresponding state.
>
>
> Similar problems here.
>
> From the further discussion that follows, it appears that you can use the
> "interrupt" attribute on exception handlers as well, but the paragraph above
> only mentions interrupt handlers.
>
>> +
>> +Since the direction flag in the FLAGS register in interrupt handlers
>> +is undetermined, cld instruction must be emitted in function prologue
>> +if rep string instructions are used in interrupt handler or interrupt
>> +handler isn't a leaf function.
>
>
> Again, this sounds like implementor-speak, and there are grammatical errors
> (noun/verb disagreement, missing articles).  Do users of this attribute need
> to know what instructions the compiler is emitting?  We already say above
> that it causes GCC to generate suitable entry and exit sequences.
>

It was done on purpose, since this document also serves as
the spec for compiler implementers.  Here is a patch to add a
-mgeneral-regs-only option to the x86 backend.  We can update
the spec for interrupt handlers to recommend compiling them
with the -mgeneral-regs-only option and add a note for compiler
implementers.

OK for trunk if there is no regression?

-- 
H.J.
From d6f72979f077064c59ac83c8fe5e738ade732e96 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 11 May 2016 09:49:33 -0700
Subject: [PATCH] Add -mgeneral-regs-only option

X86 Linux kernel is compiled only with integer instructions.  Currently,

-mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -mno-80387
-mno-fp-ret-in-387  -mskip-rax-setup

is used to compile kernel.  If we add another non-integer feature, it
has to be turned off.  We can add a -mgeneral-regs-only option, similar
to AArch64, to disable all non-integer features so that kernel doesn't
need a long list and the same option will work for future compilers.
It can also be used to compile interrupt handler.

gcc/

	PR target/70738
	* common/config/i386/i386-common.c
	(OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET): New.
	(ix86_handle_option): Disable MPX, MMX, SSE and x87 instructions
	for -mgeneral-regs-only.
	* config/i386/i386.c (ix86_option_override_internal): Don't
	enable x87 instructions if only the general registers are
	allowed.
	* config/i386/i386.opt: Add -mgeneral-regs-only.
	* doc/invoke.texi: Document -mgeneral-regs-only.

gcc/testsuite/

	PR target/70738
	* gcc.target/i386/pr70738-1.c: Likewise.
	* gcc.target/i386/pr70738-2.c: Likewise.
	* 

Re: [C/C++ PATCH] Missing warning for contradictory attributes (PR c++/71024)

2016-05-11 Thread Bernd Schmidt

On 05/11/2016 06:38 PM, Marek Polacek wrote:

> I checked and we don't have a test testing all the contradicting attributes
> as in the new test.  But with that added, we can do the following...
>
> Tested on x86_64-linux, ok for trunk?
>
> 2016-05-11  Marek Polacek
>
> * gcc.dg/attr-opt-1.c: Move to c-c++-common/.
> * gcc.dg/pr18079-2.c: Remove file.


Ok.


Bernd



Re: [C/C++ PATCH] Missing warning for contradictory attributes (PR c++/71024)

2016-05-11 Thread Marek Polacek
On Wed, May 11, 2016 at 04:56:05PM +0200, Bernd Schmidt wrote:
> On 05/11/2016 03:59 PM, Marek Polacek wrote:
> > The C++ FE was missing diagnostics e.g. when an "always_inline" on DECL was
> > followed by DECL with the "noinline" attribute.  The C FE already has code
> > dealing with this, so I factored it out to a common function.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> Ok, but could you first look whether there are existing testcases that could
> be moved to c-c++-common from gcc.dg instead of adding a new one?

I checked and we don't have a test testing all the contradicting attributes
as in the new test.  But with that added, we can do the following...

Tested on x86_64-linux, ok for trunk?

2016-05-11  Marek Polacek  

* gcc.dg/attr-opt-1.c: Move to c-c++-common/.
* gcc.dg/pr18079-2.c: Remove file.

diff --git gcc/testsuite/c-c++-common/attr-opt-1.c 
gcc/testsuite/c-c++-common/attr-opt-1.c
index e69de29..fc9b4a6 100644
--- gcc/testsuite/c-c++-common/attr-opt-1.c
+++ gcc/testsuite/c-c++-common/attr-opt-1.c
@@ -0,0 +1,37 @@
+/* PR c/70255 */
+/* { dg-do compile } */
+
+double
+fn1 (double h, double l) /* { dg-message "previous definition" } */
+{
+  return h + l;
+}
+double fn1 (double, double) __attribute__ ((optimize 
("no-associative-math"))); /* { dg-warning "optimization attribute" } */
+
+__attribute__ ((optimize ("no-associative-math"))) double
+fn2 (double h, double l)
+{
+  return h + l;
+}
+double fn2 (double, double) __attribute__ ((optimize ("no-associative-math")));
+
+__attribute__ ((optimize ("no-associative-math"))) double
+fn3 (double h, double l) /* { dg-message "previous definition" } */
+{
+  return h + l;
+}
+double fn3 (double, double) __attribute__ ((optimize 
("O2,no-associative-math"))); /* { dg-warning "optimization attribute" } */
+
+__attribute__ ((optimize ("no-associative-math,O2"))) double
+fn4 (double h, double l) /* { dg-message "previous definition" } */
+{
+  return h + l;
+}
+double fn4 (double, double) __attribute__ ((optimize 
("O2,no-associative-math"))); /* { dg-warning "optimization attribute" } */
+
+__attribute__ ((optimize ("no-reciprocal-math"), optimize 
("no-associative-math"))) int
+fn5 (void)
+{
+  return 0;
+}
+int __attribute__ ((optimize ("no-associative-math"), optimize 
("no-reciprocal-math"))) fn5 (void);
diff --git gcc/testsuite/gcc.dg/attr-opt-1.c gcc/testsuite/gcc.dg/attr-opt-1.c
deleted file mode 100644
index 4140fda..000
--- gcc/testsuite/gcc.dg/attr-opt-1.c
+++ /dev/null
@@ -1,37 +0,0 @@
-/* PR c/70255 */
-/* { dg-do compile } */
-
-double
-fn1 (double h, double l) /* { dg-message "previous definition" } */
-{
-  return h + l;
-}
-double fn1 (double, double) __attribute__ ((optimize 
("no-associative-math"))); /* { dg-warning "optimization attribute on .fn1. 
follows definition" } */
-
-__attribute__ ((optimize ("no-associative-math"))) double
-fn2 (double h, double l)
-{
-  return h + l;
-}
-double fn2 (double, double) __attribute__ ((optimize ("no-associative-math")));
-
-__attribute__ ((optimize ("no-associative-math"))) double
-fn3 (double h, double l) /* { dg-message "previous definition" } */
-{
-  return h + l;
-}
-double fn3 (double, double) __attribute__ ((optimize 
("O2,no-associative-math"))); /* { dg-warning "optimization attribute on .fn3. 
follows definition" } */
-
-__attribute__ ((optimize ("no-associative-math,O2"))) double
-fn4 (double h, double l) /* { dg-message "previous definition" } */
-{
-  return h + l;
-}
-double fn4 (double, double) __attribute__ ((optimize 
("O2,no-associative-math"))); /* { dg-warning "optimization attribute on .fn4. 
follows definition" } */
-
-__attribute__ ((optimize ("no-reciprocal-math"), optimize 
("no-associative-math"))) int
-fn5 (void)
-{
-  return 0;
-}
-int __attribute__ ((optimize ("no-associative-math"), optimize 
("no-reciprocal-math"))) fn5 (void);
diff --git gcc/testsuite/gcc.dg/pr18079-2.c gcc/testsuite/gcc.dg/pr18079-2.c
deleted file mode 100644
index 2c83b70..000
--- gcc/testsuite/gcc.dg/pr18079-2.c
+++ /dev/null
@@ -1,16 +0,0 @@
-/* PR c/18079 */
-/* { dg-do compile } */
-/* { dg-options "-Wall" } */
-
-__attribute__ ((always_inline)) void fndecl1 (void);
-__attribute__ ((noinline)) void fndecl1 (void); /* { dg-warning "attribute 
'noinline' follows declaration with attribute 'always_inline'" } */
-
-__attribute__ ((noinline)) void fndecl2 (void);
-__attribute__ ((always_inline)) void fndecl2 (void); /* { dg-warning 
"attribute 'always_inline' follows declaration with attribute 'noinline'" } */
-
-
-__attribute__ ((hot)) void fndecl3 (void);
-__attribute__ ((cold)) void fndecl3 (void); /* { dg-warning "attribute 'cold' 
follows declaration with attribute 'hot'" } */
-
-__attribute__ ((cold)) void fndecl4 (void);
-__attribute__ ((hot)) void fndecl4 (void); /* { dg-warning "attribute 'hot' 
follows declaration with attribute 'cold'" } */

Marek


Re: Simple bitop reassoc in match.pd

2016-05-11 Thread Jeff Law

On 05/11/2016 10:17 AM, Marc Glisse wrote:

The transformation seems right to me, but we are then missing another
transformation like ~X & Y -> X ^ Y when we know that X & ~Y is 0 (the 1
bits of X are included in those of Y). That may not be easy with the
limited bit tracking we have. A version limited to constant Y of the
form 2^n-1 (i.e. 0...01...1) where X has a range included in [0, Y] may
be simpler.
While we don't have strong bit tracking, we do track [0..1] ranges 
reasonably well, so it may be worth doing.



We could also simplify (int)(_Bool)x to x using VRP information that x
is in [0, 1], but apparently when VRP replaces x==0 with y=x^1,(_Bool)y,
it does not compute a range for the new variable y, and by the time the
next VRP pass comes, it is too late.

Seems like a clear oversight.

jeff



Re: Simple bitop reassoc in match.pd (was: Canonicalize X u< X to UNORDERED_EXPR)

2016-05-11 Thread Marc Glisse

On Wed, 11 May 2016, H.J. Lu wrote:


* fold-const.c (fold_binary_loc) [(X ^ Y) & Y]: Remove and merge with...
* match.pd ((X & Y) ^ Y): ... this.



It caused:

FAIL: gcc.dg/tree-ssa/vrp47.c scan-tree-dump-times vrp2 " & 1;" 0

on x86.


Ah, yes, logical_op_short_circuit so I didn't test it. Hmm, doesn't seem 
so easy. We want to compute (int)(x==0) in a branch where x is known to be 
in [0, 1]. VRP1 gives (int)(_Bool)(x^1). forwprop3 handles the double 
conversion, which gives (x^1)&1. Here the new transform applies and gives 
(~x)&1. VRP2 comes, and with (x^1)&1 it used to notice that this is just 
x^1 (remember that x is in [0, 1] in this branch), while it doesn't know 
what to do with (~x)&1. In the end, we get

notl	%eax
andl	$1, %eax
instead of
xorl	$1, %eax

(andn doesn't seem to have a version with an immediate)

The transformation seems right to me, but we are then missing another 
transformation like ~X & Y -> X ^ Y when we know that X & ~Y is 0 (the 1 
bits of X are included in those of Y). That may not be easy with the 
limited bit tracking we have. A version limited to constant Y of the form 
2^n-1 (i.e. 0...01...1) where X has a range included in [0, Y] may be 
simpler.
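
The identity behind the missing transformation can be checked by brute force. The following is a minimal C sketch, not part of any patch; the helper names identity_holds_8bit and bool_case_holds are made up for illustration. It checks that ~X & Y equals X ^ Y whenever X & ~Y is 0 over all 8-bit pairs, and the vrp47.c case where X is in [0, 1] and Y is 1.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if (~x & y) == (x ^ y) for every 8-bit
   pair (x, y) where x has no 1 bits outside those of y.  */
static int
identity_holds_8bit (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      if ((x & ~y) == 0 && (~x & y) != (x ^ y))
        return 0;
  return 1;
}

/* Hypothetical helper: the vrp47.c case.  For x in [0, 1], both
   (x ^ 1) & 1 and (~x) & 1 reduce to x ^ 1.  */
static int
bool_case_holds (void)
{
  for (unsigned x = 0; x <= 1; x++)
    if (((x ^ 1) & 1) != (x ^ 1) || ((~x) & 1) != (x ^ 1))
      return 0;
  return 1;
}
```

Both helpers are expected to return 1, which is consistent with rewriting (~x)&1 back to x^1 once the range of x is known.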


We could also simplify (int)(_Bool)x to x using VRP information that x is 
in [0, 1], but apparently when VRP replaces x==0 with y=x^1,(_Bool)y, it 
does not compute a range for the new variable y, and by the time the next 
VRP pass comes, it is too late.


In the meantime, I propose xfailing this test...

--
Marc Glisse


Re: libgomp: Make GCC 5 OpenACC offloading executables work

2016-05-11 Thread Thomas Schwinge
Hi!

On Wed, 11 May 2016 11:38:39 -0400, Nathan Sidwell  wrote:
> On 05/11/16 10:22, Bernd Schmidt wrote:
> > On 05/11/2016 03:46 PM, Thomas Schwinge wrote:
> >>> What we now got, doesn't work, for several reasons.  GCC 5 OpenACC
> >>> offloading executables will just run into SIGSEGV.
> >
> > I'm tempted to say, let's just wait until someone actually reports that in
> > bugzilla. Offloading in gcc-5 was broken enough that I expect no one was
> > actually using it. There's really very little point in carrying 
> > compatibility
> > crud around.
> 
> I agree.  This would simply be enabling a poorly performing binary, rather 
> than 
> encouraging a shiny new one.

I conceptually agree to that.  (If we're serious about that, then we can
remove more code, such as the legacy libgomp entry point itself -- a
"missing symbol: [...]" is still vaguely better than a SIGSEGV.)  Yet,
what I fixed here, is just what Jakub and Nathan agreed upon in
:
"GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback".
Currently such code will always result in a SIGSEGV, which the patch
fixes.  (And, given that we now have this patch, it seems "unfair" to
"wait until someone actually reports that in bugzilla".)  Another option,
instead of having such legacy entry points do host fallback, is to
instead call gomp_fatal with a message like "re-compile your code with a
newer GCC version".


Grüße
 Thomas


Re: [PATCH, DOC] Document ASAN_OPTIONS="halt_on_error" env variable.

2016-05-11 Thread Sandra Loosemore

On 05/11/2016 08:56 AM, Jakub Jelinek wrote:

On Wed, May 11, 2016 at 04:47:46PM +0200, Martin Liška wrote:

Thank you Jakub for the note. What about the second version of the patch?

Thanks,
Martin



>From da688c187067dc5c475a4ab5b844c11c4bcd0494 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 11 May 2016 16:05:49 +0200
Subject: [PATCH] Document ASAN_OPTIONS="halt_on_error" env variable.

gcc/ChangeLog:

2016-05-11  Martin Liska  

* doc/invoke.texi: Explain connection between -fsanitize-recover=address
and ASAN_OPTIONS="halt_on_error=1".
---
  gcc/doc/invoke.texi | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a54a0af..282367d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9998,6 +9998,12 @@ for which this feature is experimental.
  accepted, the former enables recovery for all sanitizers that support it,
  the latter disables recovery for all sanitizers that support it.

+Even if a recovery mode is turned on, it can be overridden by providing
+@code{halt_on_error=1} to one of the following environment variables:
+@env{ASAN_OPTIONS}, @env{TSAN_OPTIONS}, @env{UBSAN_OPTIONS}.
+The default value is set to @code{halt_on_error=0} for all options,
+except @env{ASAN_OPTIONS}.
+


I think it better should say that:
Even if a recovery mode is turned on the compiler side, it needs to be also
enabled on the runtime library side, otherwise the failures are still fatal.
The runtime library defaults to ... and this can be overridden through ...
or so.


Yes, please.  I cannot understand either of the first two versions of 
the patch.


-Sandra



Re: [PATCH][CilkPlus] Merge libcilkrts from upstream

2016-05-11 Thread Ilya Verbin
On Wed, May 11, 2016 at 10:47:49 +0100, Ramana Radhakrishnan wrote:
> 
> > I've looked at the generated code in more details, and for armv6 this 
> > generates
> > mcr p15, 0, r0, c7, c10, 5
> > which is not what __cilkrts_fence uses currently (CP15DSB vs CP15DMB)
> 
> Wow I hadn't noticed that it was a DSB -  DSB is way too heavy weight. 
> Userland shouldn't need to use this by default IMNSHO. It's needed if you are 
> working on non-cacheable memory or performing cache maintenance operations 
> but I can't imagine cilkplus wanting to do that ! 
> 
> http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf
> 
> It's almost like the default definitions need to be in terms of the atomic 
> extensions rather than having these written in this form. Folks usually get 
> this wrong ! 
> 
> > Looking at arm/sync.md it seems that there is no way to generate CP15DSB.
> 
> No - there is no way of generating DSB,  DMB's should be sufficient for this 
> purpose. Would anyone know what the semantics of __cilkrts_fence are that 
> require this to be a DSB ? 

__cilkrts_fence semantics is identical to __sync_synchronize, so DMB looks OK.

Maybe we should just define:
  #define __cilkrts_fence() __sync_synchronize()
?
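
A minimal sketch of how that definition would be used, assuming a GCC-compatible compiler that provides the __sync_synchronize builtin. The payload, ready and publish names are hypothetical and exist only for illustration; this is not the libcilkrts code.

```c
#include <assert.h>

/* Sketch only: the fence defined in terms of the GCC full-barrier builtin,
   as proposed above.  */
#define __cilkrts_fence() __sync_synchronize ()

/* Hypothetical data published from one thread to another.  */
static int payload;
static volatile int ready;

/* Hypothetical helper: store the payload, issue a full barrier, then raise
   the flag, so the payload store is ordered before the flag store.  */
static void
publish (int v)
{
  payload = v;
  __cilkrts_fence ();
  ready = 1;
}
```

With this definition the choice of barrier instruction is left to the compiler's target support rather than being hand-written in the runtime.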

  -- Ilya

> Ramana
> 
> > 
> >> Christophe
> >>
> >>> Thanks,
> >>>   -- Ilya


[PTX] more test markup

2016-05-11 Thread Nathan Sidwell
I've applied this patch to mark up some more tests with what they require.  Two 
crash the PTX assembler, so those are xfailed.


nathan
2016-05-11  Nathan Sidwell  

	* gcc.dg/pr68671.c: Xfail on PTX -- assembler crash.
	* gcc.c-torture/execute/pr68185.c: Likewise.
	* gcc.dg/ipa/pr70306.c: Requires global constructors.
	* gcc.dg/pr69634.c: Requires scheduling.
	* gcc.dg/torture/pr66178.c: Require label values.
	* gcc.dg/setjmp-6.c: Require indirect jumps.

Index: gcc.dg/pr68671.c
===
--- gcc.dg/pr68671.c	(revision 236101)
+++ gcc.dg/pr68671.c	(working copy)
@@ -1,6 +1,7 @@
 /* PR tree-optimization/68671 */
 /* { dg-do run } */
 /* { dg-options " -O2 -fno-tree-dce" } */
+/* { dg-xfail-if "ptxas crashes" { nvptx-*-* } { "" } { "" } } */
 
 volatile int a = -1;
 volatile int b;
Index: gcc.dg/ipa/pr70306.c
===
--- gcc.dg/ipa/pr70306.c	(revision 236101)
+++ gcc.dg/ipa/pr70306.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-options "-O2 -fdump-ipa-icf" } */
 /* { dg-do run } */
+/* { dg-require-effective-target global_constructor } */
 
 int ctor_counter = 1;
 int dtor_counter;
Index: gcc.dg/pr69634.c
===
--- gcc.dg/pr69634.c	(revision 236101)
+++ gcc.dg/pr69634.c	(working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fno-dce -fschedule-insns -fno-tree-vrp -fcompare-debug" } */
 /* { dg-additional-options "-Wno-psabi -mno-sse" { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-effective-target scheduling } */
 
 typedef unsigned short u16;
 typedef short v16u16 __attribute__ ((vector_size (16)));
Index: gcc.dg/torture/pr66178.c
===
--- gcc.dg/torture/pr66178.c	(revision 236101)
+++ gcc.dg/torture/pr66178.c	(working copy)
@@ -1,4 +1,6 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target label_values } */
+
 int test(void)
 {
 static int a =  ((char *)&(char *)&)-1;
Index: gcc.dg/setjmp-6.c
===
--- gcc.dg/setjmp-6.c	(revision 236101)
+++ gcc.dg/setjmp-6.c	(working copy)
@@ -1,6 +1,7 @@
 /* PR69569 */
 /* { dg-do compile } */
 /* { dg-options "-O3" } */
+/* { dg-require-effective-target indirect_jumps } */
 
 #include 
 
Index: gcc.c-torture/execute/pr68185.c
===
--- gcc.c-torture/execute/pr68185.c	(revision 236101)
+++ gcc.c-torture/execute/pr68185.c	(working copy)
@@ -1,3 +1,5 @@
+/* { dg-xfail-if "ptxas crashes" { nvptx-*-* } { "-O0" } { "" } } */
+
 int a, b, d = 1, e, f, o, u, w = 1, z;
 short c, q, t;
 


Re: Re: [ARM] Enable __fp16 as a function parameter and return type.

2016-05-11 Thread Joseph Myers
On Wed, 11 May 2016, Tejas Belagod wrote:

> AFAICS, I don't think it mandates a double-rounding behavior for double to
> __fp16 conversions and I don't see a change in stand between the two versions
> of ACLE on the behavior of __fp16.

It's not a change between the two versions of ACLE.  It's a change 
relative to the early (pre-ACLE) __fp16 specification (or, at least, a 
clarification thereto in email on 12 Aug 2008) that was used as a basis 
for the original implementation of __fp16 in GCC (and that thus is what's 
currently implemented by GCC and tested for in the testsuite).

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Respect --param ipa-max-agg-items=0

2016-05-11 Thread Martin Jambor
Hi,

when analyzing PR 70646, I found out that --param ipa-max-agg-items=0
does not prevent creation of aggregate jump functions because it is
checked only after the first such jump function is created.  The
following patch fixes that by checking the parameter before starting
the whole analysis.
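
The effect of checking a limit only after the first item has been recorded can be modelled in isolation. This is a hedged toy model, not the ipa-prop.c code; collect_check_late and collect_check_early are hypothetical names, and the arrays stand in for the list of aggregate items.

```c
#include <assert.h>

/* Hypothetical model of the bug: the limit is tested only after an item
   has already been recorded, so max_items == 0 still yields one item.  */
static int
collect_check_late (const int *src, int n, int max_items, int *dst)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    {
      dst[count++] = src[i];
      if (count >= max_items)
        break;
    }
  return count;
}

/* Hypothetical model of the fix: bail out before doing any work when the
   limit is zero, then fall through to the unchanged loop.  */
static int
collect_check_early (const int *src, int n, int max_items, int *dst)
{
  if (max_items == 0)
    return 0;
  return collect_check_late (src, n, max_items, dst);
}
```

In this model a limit of 0 records one item with the late check and none with the early check, while non-zero limits behave the same in both.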

Bootstrapped and lto-bootstrapped on x86_64-linux.  OK for trunk?  OK
for all active release branches?

Thanks,

Martin


2016-04-18  Martin Jambor  

PR ipa/70646
* ipa-prop.c (determine_locally_known_aggregate_parts): Bail out early
if parameter PARAM_IPA_MAX_AGG_ITEMS is zero.
---
 gcc/ipa-prop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 65482ba..f02ec47 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -1414,6 +1414,9 @@ determine_locally_known_aggregate_parts (gcall *call, 
tree arg,
   bool check_ref, by_ref;
   ao_ref r;
 
+  if (PARAM_VALUE (PARAM_IPA_MAX_AGG_ITEMS) == 0)
+return;
+
   /* The function operates in three stages.  First, we prepare check_ref, r,
  arg_base and arg_offset based on what is actually passed as an actual
  argument.  */
-- 
2.8.2



[PR 70646] Store size to inlining predicate conditions

2016-05-11 Thread Martin Jambor
Hi,

PR 70646 takes place because inlining predicate evaluation does not
check for memory access size when evaluating IS_NOT_CONSTANT
conditions.  This means that a smaller stored constant is evaluated as
disproving an IS_NOT_CONSTANT condition even though that was meant for
a larger memory read.  Inlining in turn made the conclusion that a
branch can never be taken and turned all calls there into
__builtin_unreachables, eliminating the branch in practice.  However,
intraprocedural folding did not evaluate the __builtin_constant as
true and decided to execute the removed branch.

The fix below stores the intended size into each condition and then
compares it with the size of the actual constant that is propagated
from a caller, with the exception of CHANGED conditions, which are
evaluated to false when error_mark_node is propagated instead of a
constant and which has no size.  But that is OK because CHANGED is
used only for inlining heuristics and in principle cannot be used to
optimize out anything completely.

As noted in bugzilla, I use signed HOST_WIDE_INT for the size so that
it is consistent with what get_ref_base_and_extent uses to return
size, which is what we use to get it for aggregate values.
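
The size check can be pictured with a heavily simplified model. Everything in the sketch below is an assumption made for illustration (the toy_condition struct, the is_not_constant_may_hold name, and the use of -1 for "no constant known"); the real logic lives in evaluate_conditions_for_known_args and works on trees.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for an IS_NOT_CONSTANT condition.  */
struct toy_condition
{
  long size;	/* Size in bits of the memory access the condition is about.  */
};

/* Returns 1 if the condition must still be assumed possibly true, 0 if the
   known constant disproves it.  known_size is the size in bits of the
   constant propagated from the caller, or -1 if no constant is known.  */
static int
is_not_constant_may_hold (const struct toy_condition *c, long known_size)
{
  if (known_size < 0)
    return 1;	/* Nothing is known about the operand.  */
  if (known_size != c->size)
    return 1;	/* Size mismatch: the constant says nothing about this read.  */
  return 0;	/* A constant of the right size disproves the condition.  */
}
```

In this model an 8-bit constant no longer disproves a condition recorded for a 32-bit read, which is the situation described for PR 70646.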

Bootstrapped, lto-bootstrapped and tested on x86_64-linux without any
issues.  OK for trunk and all active release branches?  (4.9 still is
active, right?)

Thanks,

Martin


2016-04-20  Martin Jambor  

PR ipa/70646
* ipa-inline.h (condition): New field size.
* ipa-inline-analysis.c (add_condition): New parameter SIZE, use it
for comparison and store it into the new condition.
(evaluate_conditions_for_known_args): Use condition size to check
access sizes for all but CHANGED conditions.
(unmodified_parm_1): New parameter size_p, store access size into it.
(unmodified_parm): Likewise.
(unmodified_parm_or_parm_agg_item): Likewise.
(eliminated_by_inlining_prob): Pass NULL to unmodified_parm as size_p.
(set_cond_stmt_execution_predicate): Extract access sizes and store
them to conditions.
(set_switch_stmt_execution_predicate): Likewise.
(will_be_nonconstant_expr_predicate): Likewise.
(will_be_nonconstant_predicate): Likewise.
(inline_read_section): Stream condition size.
(inline_write_summary): Likewise.

testsuite/
* gcc.dg/ipa/pr70646.c: New test.
---
 gcc/ipa-inline-analysis.c  | 132 +++--
 gcc/ipa-inline.h   |   2 +
 gcc/testsuite/gcc.dg/ipa/pr70646.c |  40 +++
 3 files changed, 123 insertions(+), 51 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr70646.c

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 17b21d1..68824f7 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -216,13 +216,14 @@ struct agg_position_info
   bool by_ref;
 };
 
-/* Add condition to condition list CONDS.  AGGPOS describes whether the used
-   oprand is loaded from an aggregate and where in the aggregate it is.  It can
-   be NULL, which means this not a load from an aggregate.  */
+/* Add condition to condition list SUMMARY. OPERAND_NUM, SIZE, CODE and VAL
+   correspond to fields of condition structure.  AGGPOS describes whether the
+   used operand is loaded from an aggregate and where in the aggregate it is.
+   It can be NULL, which means this not a load from an aggregate.  */
 
 static struct predicate
 add_condition (struct inline_summary *summary, int operand_num,
-  struct agg_position_info *aggpos,
+  HOST_WIDE_INT size, struct agg_position_info *aggpos,
   enum tree_code code, tree val)
 {
   int i;
@@ -248,6 +249,7 @@ add_condition (struct inline_summary *summary, int 
operand_num,
   for (i = 0; vec_safe_iterate (summary->conds, i, ); i++)
 {
   if (c->operand_num == operand_num
+ && c->size == size
  && c->code == code
  && c->val == val
  && c->agg_contents == agg_contents
@@ -264,6 +266,7 @@ add_condition (struct inline_summary *summary, int 
operand_num,
   new_cond.agg_contents = agg_contents;
   new_cond.by_ref = by_ref;
   new_cond.offset = offset;
+  new_cond.size = size;
   vec_safe_push (summary->conds, new_cond);
   return single_cond_predicate (i + predicate_first_dynamic_condition);
 }
@@ -867,21 +870,25 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
  clause |= 1 << (i + predicate_first_dynamic_condition);
  continue;
}
-  if (c->code == IS_NOT_CONSTANT || c->code == CHANGED)
+  if (c->code == CHANGED)
continue;
 
-  if (operand_equal_p (TYPE_SIZE (TREE_TYPE (c->val)),
-  TYPE_SIZE (TREE_TYPE (val)), 0))
+  if (tree_to_shwi (TYPE_SIZE (TREE_TYPE (val))) != c->size)
{
- val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (c->val), val);
+ clause 

Re: libgomp: Make GCC 5 OpenACC offloading executables work

2016-05-11 Thread Nathan Sidwell

On 05/11/16 10:22, Bernd Schmidt wrote:

On 05/11/2016 03:46 PM, Thomas Schwinge wrote:

What we now got, doesn't work, for several reasons.  GCC 5 OpenACC
offloading executables will just run into SIGSEGV.


I'm tempted to say, let's just wait until someone actually reports that in
bugzilla. Offloading in gcc-5 was broken enough that I expect no one was
actually using it. There's really very little point in carrying compatibility
crud around.


I agree.  This would simply be enabling a poorly performing binary, rather than 
encouraging a shiny new one.



nathan


Re: [PATCH, DOC] Document ASAN_OPTIONS="halt_on_error" env variable.

2016-05-11 Thread Jakub Jelinek
On Wed, May 11, 2016 at 04:47:46PM +0200, Martin Liška wrote:
> Thank you Jakub for the note. What about the second version of the patch?
> 
> Thanks,
> Martin

> >From da688c187067dc5c475a4ab5b844c11c4bcd0494 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 11 May 2016 16:05:49 +0200
> Subject: [PATCH] Document ASAN_OPTIONS="halt_on_error" env variable.
> 
> gcc/ChangeLog:
> 
> 2016-05-11  Martin Liska  
> 
>   * doc/invoke.texi: Explain connection between -fsanitize-recover=address
>   and ASAN_OPTIONS="halt_on_error=1".
> ---
>  gcc/doc/invoke.texi | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a54a0af..282367d 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -9998,6 +9998,12 @@ for which this feature is experimental.
>  accepted, the former enables recovery for all sanitizers that support it,
>  the latter disables recovery for all sanitizers that support it.
>  
> +Even if a recovery mode is turned on, it can be overridden by providing
> +@code{halt_on_error=1} to one of the following environment variables:
> +@env{ASAN_OPTIONS}, @env{TSAN_OPTIONS}, @env{UBSAN_OPTIONS}.
> +The default value is set to @code{halt_on_error=0} for all options,
> +except @env{ASAN_OPTIONS}.
> +

I think it better should say that:
Even if a recovery mode is turned on the compiler side, it needs to be also
enabled on the runtime library side, otherwise the failures are still fatal.
The runtime library defaults to ... and this can be overridden through ...
or so.

Jakub


Re: Re: [ARM] Enable __fp16 as a function parameter and return type.

2016-05-11 Thread Tejas Belagod

On 28/04/16 16:49, Joseph Myers wrote:

On Thu, 28 Apr 2016, Matthew Wahab wrote:


Hello,

The ARM target supports the half-precision floating point type __fp16
but does not allow its use as a function return or parameter type. This
patch removes that restriction and defines the ACLE macro
__ARM_FP16_ARGS to indicate this. The code generated for passing __fp16
values into and out of functions depends on the level of hardware
support but conforms to the AAPCS (see
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf).


The sole use of the TARGET_INVALID_PARAMETER_TYPE and
TARGET_INVALID_RETURN_TYPE hooks was to disallow __fp16 use as a function
return or parameter type.  Thus, I think this patch should completely
remove those hooks and poison them in system.h.

This patch addresses one incompatibility of the original __fp16
specification with the more recent ACLE specification and the
specification in ISO/IEC TS 18661-3 for how such types should work.
Another such incompatibility is the peculiar rule in the original
specification that conversions from double to __fp16 go via float, with
double rounding.  Do you have plans to eliminate that and move to the
single-rounding semantics that are in current specifications?



http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf

Section 4.1.2 states that double to fp16 should round only once and it only 
suggests that it is done via a two step hardware instruction rather than an 
emulation library if speed is the priority as pre-ARMv8 architectures do not 
support this in hardware.


http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053d/IHI0053D_acle_2_1.pdf

updates this paragraph to reflect ARMv8 hardware feature, but still maintains 
the suggestion of using two-step hardware instruction rather than emulation 
library if speed is priority for pre-ARMv8 architectures.


AFAICS, I don't think it mandates a double-rounding behavior for double to 
__fp16 conversions and I don't see a change in stand between the two versions of 
ACLE on the behavior of __fp16.


We could improve the ACLE spec to include a caveat that a two-step reduction 
could introduce a loss in precision which could result in incompatibility with 
ARMv8 architectures.
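
The precision loss from a two-step reduction can be illustrated without any __fp16 support. The sketch below is an integer analogy under stated assumptions, not the actual double to __fp16 conversion: it drops 8 low bits with round-to-nearest-even either in one step or in two steps of 4 bits. The names round_drop_bits, round_once and round_twice are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: drop the low k bits of x (k >= 1), rounding to
   nearest with ties going to the even quotient.  */
static unsigned
round_drop_bits (unsigned x, unsigned k)
{
  unsigned q = x >> k;
  unsigned rem = x & ((1u << k) - 1);
  unsigned half = 1u << (k - 1);
  if (rem > half || (rem == half && (q & 1)))
    q++;
  return q;
}

/* Single rounding: drop all 8 bits at once.  */
static unsigned
round_once (unsigned x)
{
  return round_drop_bits (x, 8);
}

/* Double rounding: drop 4 bits, then another 4 bits.  */
static unsigned
round_twice (unsigned x)
{
  return round_drop_bits (round_drop_bits (x, 4), 4);
}
```

For the input 0x178 the single rounding gives 1, while the two-step rounding first produces a tie that rounds up and ends at 2. The same mechanism is what makes a conversion via float differ from a direct one for some double values.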



I note that that AAPCS revision says for __fp16, in 7.1.1 Arithmetic
Types, "In a variadic function call this will be passed as a
double-precision value.".  I haven't checked what this patch implements,
but that could be problematic, and different from what's said under 7.2,
"For variadic functions, float arguments that match the ellipsis (...) are
converted to type double.".

In TS 18661-3, _Float16 is *not* affected by default argument promotions;
only float is.  This reflects how the default conversion of float to
double is a legacy feature; note for example how in C99 and C11 float
_Imaginary is not promoted to double _Imaginary, and float _Complex is not
promoted to double _Complex.
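
The legacy float promotion is visible in plain C. This is a small sketch using only standard <stdarg.h>; the sum_promoted name is made up for illustration. Because float arguments matching the ellipsis arrive as double, the callee has to read them with va_arg (ap, double).

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums n variadic arguments.  Callers may pass float
   values; the default argument promotions convert them to double, so
   double is the type that must be named in va_arg.  */
static double
sum_promoted (int n, ...)
{
  va_list ap;
  double sum = 0.0;

  va_start (ap, n);
  for (int i = 0; i < n; i++)
    sum += va_arg (ap, double);
  va_end (ap);
  return sum;
}
```

A call such as sum_promoted (2, 1.5f, 2.25f) passes two double values at the ABI level. Under TS 18661-3 a _Float16 argument would not get that language-level promotion.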

Thus it would be better for compatibility with TS 18661-3 to pass __fp16
values to variadic functions as themselves, unpromoted.  (Formally of
course the lack of promotion is a language feature not an ABI feature; as
long as va_arg for _Float16 named works correctly, you could promote at
the ABI level and then convert back, and the only effect would be that
sNaNs get quieted, so passing a _Float16 sNaN through variable arguments
would act as a convertFormat operation instead of a copy operation.  It's
not clear that having such an ABI-level promotion is a good idea,
however.)

Now, in the context of the current implementation and current ACLE
arithmetic on __fp16 values produces float results - the operands are
promoted at the C language level.  This is different from TS 18661-3,
where _Float16 arithmetic produces results whose semantics type is
_Float16 but which, if FLT_EVAL_METHOD is 0, are evaluated with excess
range and precision to the range and precision of float.  So if __fp16 and
float are differently passed to variadic functions, you have the issue
that if the argument is an expression resulting from __fp16 arithmetic,
the way it is passed depends on whether current ACLE or TS 18661-3 are
followed.  But if the eventual aim is for __fp16 (when using the IEEE
format rather than the alternative format) to become just a typedef for
_Float16, then these issues will need to be addressed.



__fp16's compatibility with _Float16 is still under discussion internally.

Thanks,
Tejas.



Re: [C/C++ PATCH] Missing warning for contradictory attributes (PR c++/71024)

2016-05-11 Thread Bernd Schmidt

On 05/11/2016 03:59 PM, Marek Polacek wrote:

The C++ FE was missing diagnostics e.g. when an "always_inline" on DECL was
followed by DECL with the "noinline" attribute.  The C FE already has code
dealing with this, so I factored it out to a common function.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


Ok, but could you first look whether there are existing testcases that 
could be moved to c-c++-common from gcc.dg instead of adding a new one?



Bernd


Re: [PATCH, DOC] Document ASAN_OPTIONS="halt_on_error" env variable.

2016-05-11 Thread Martin Liška
On 05/11/2016 04:20 PM, Jakub Jelinek wrote:
> On Wed, May 11, 2016 at 04:13:27PM +0200, Martin Liška wrote:
>> It's a bit confusing for a user that -fsanitize-recover=address does not recover
>> an instrumented binary. As a default value of halt_on_error is set to 0 for
>> address sanitizer, the binary fails on a first error.
>>
>> Following patch attempts to explain the ENV variable.
>>
>> Ready for trunk?
>> Thanks,
>> Martin
> 
>> >From 95f694a92428759773e5259323e82cbf49eade34 Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Wed, 11 May 2016 16:05:49 +0200
>> Subject: [PATCH] Document ASAN_OPTIONS="halt_on_error" env variable.
>>
>> gcc/ChangeLog:
>>
>> 2016-05-11  Martin Liska  
>>
>>  * doc/invoke.texi: Explain connection between -fsanitize-recover=address
>>  and ASAN_OPTIONS="halt_on_error=1".
>> ---
>>  gcc/doc/invoke.texi | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index a54a0af..722647a 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -9998,6 +9998,12 @@ for which this feature is experimental.
>>  accepted, the former enables recovery for all sanitizers that support it,
>>  the latter disables recovery for all sanitizers that support it.
>>  
>> +The error recovery mode can be overwritten by @code{halt_on_error=1} 
>> provided
>> +to one of the following environment variables: @env{ASAN_OPTIONS}, 
>> @env{TSAN_OPTIONS}
>> +or @env{UBSAN_OPTIONS}. The default value is set to @code{halt_on_error=1},
>> +only the address sanitizer by default does not recover and 
>> @code{halt_on_error=0}
>> +must be provided.
> 
> It can be overridden (not overwritten?) only in one way I believe, i.e. the
> code must be built with -fsanitize-recover= (whether by default or
> not) and the recovery must be enabled in the library (by default or using
> env var) for successful recovery.  If you compile without recovery, then no
> matter what you do on the env var side it still will be fatal.
> So the docs need to make that clear.
> 
>   Jakub
> 

Thank you Jakub for the note. What about the second version of the patch?

Thanks,
Martin
>From da688c187067dc5c475a4ab5b844c11c4bcd0494 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 11 May 2016 16:05:49 +0200
Subject: [PATCH] Document ASAN_OPTIONS="halt_on_error" env variable.

gcc/ChangeLog:

2016-05-11  Martin Liska  

	* doc/invoke.texi: Explain connection between -fsanitize-recover=address
	and ASAN_OPTIONS="halt_on_error=1".
---
 gcc/doc/invoke.texi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a54a0af..282367d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9998,6 +9998,12 @@ for which this feature is experimental.
 accepted, the former enables recovery for all sanitizers that support it,
 the latter disables recovery for all sanitizers that support it.
 
+Even if a recovery mode is turned on, it can be overridden by providing
+@code{halt_on_error=1} to one of the following environment variables:
+@env{ASAN_OPTIONS}, @env{TSAN_OPTIONS}, @env{UBSAN_OPTIONS}.
+The default value is set to @code{halt_on_error=0} for all options,
+except @env{ASAN_OPTIONS}.
+
 Syntax without explicit @var{opts} parameter is deprecated.  It is equivalent to
 @smallexample
 -fsanitize-recover=undefined,float-cast-overflow,float-divide-by-zero
-- 
2.8.2



[PATCH][ARM] PR target/71056: Don't use vectorized builtins when NEON is not available

2016-05-11 Thread Kyrill Tkachov

Hi all,

In this PR a NEON builtin is introduced during SLP vectorisation even when NEON
is not available, because arm_builtin_vectorized_function is missing an
appropriate check in the BSWAP handling code.

Then during expand, when we try to expand the NEON builtin, the code in
arm_expand_neon_builtin rightly throws an error telling the user to enable
NEON, even though the testcase doesn't use any intrinsics.

This patch fixes the bug by bailing out early if !TARGET_NEON. This allows us
to remove a redundant TARGET_NEON check further down in the function as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk?

This appears on GCC 6 as well.
On older branches the test failure doesn't trigger, but the logic looks buggy
anyway.
Ok for the branches as well if testing is clean?

Thanks,
Kyrill

2016-05-11  Kyrylo Tkachov  

PR target/71056
* config/arm/arm-builtins.c (arm_builtin_vectorized_function): Return
NULL_TREE early if NEON is not available.  Remove now redundant check
in ARM_CHECK_BUILTIN_MODE.

2016-05-11  Kyrylo Tkachov  

PR target/71056
* gcc.target/arm/pr71056.c: New test.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 90fb40fed24cd31ed7f718664fc9b45e58c3cfa8..68b2839879f78e8d819444fbc11d2a91f8d6279a 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -2861,6 +2861,10 @@ arm_builtin_vectorized_function (unsigned int fn, tree type_out, tree type_in)
   int in_n, out_n;
   bool out_unsigned_p = TYPE_UNSIGNED (type_out);
 
+  /* Can't provide any vectorized builtins when we can't use NEON.  */
+  if (!TARGET_NEON)
+return NULL_TREE;
+
   if (TREE_CODE (type_out) != VECTOR_TYPE
   || TREE_CODE (type_in) != VECTOR_TYPE)
 return NULL_TREE;
@@ -2875,7 +2879,7 @@ arm_builtin_vectorized_function (unsigned int fn, tree type_out, tree type_in)
NULL_TREE is returned if no such builtin is available.  */
 #undef ARM_CHECK_BUILTIN_MODE
 #define ARM_CHECK_BUILTIN_MODE(C)\
-  (TARGET_NEON && TARGET_FPU_ARMV8   \
+  (TARGET_FPU_ARMV8   \
&& flag_unsafe_math_optimizations \
&& ARM_CHECK_BUILTIN_MODE_1 (C))
 
diff --git a/gcc/testsuite/gcc.target/arm/pr71056.c b/gcc/testsuite/gcc.target/arm/pr71056.c
new file mode 100644
index ..136754eb13c4c4f8f840001d5520cf27f3c57461
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr71056.c
@@ -0,0 +1,32 @@
+/* PR target/71056.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_vfp3_ok } */
+/* { dg-options "-O3 -mfpu=vfpv3" } */
+
+/* Check that compiling for a non-NEON target doesn't try to introduce
+   a NEON vectorized builtin.  */
+
+extern char *buff;
+int f2 ();
+struct T1
+{
+  int reserved[2];
+  unsigned int ip;
+  unsigned short cs;
+  unsigned short rsrv2;
+};
+void
+f3 (const char *p)
+{
+  struct T1 x;
+  __builtin_memcpy (&x, p, sizeof (struct T1));
+  x.reserved[0] = __builtin_bswap32 (x.reserved[0]);
+  x.reserved[1] = __builtin_bswap32 (x.reserved[1]);
+  x.ip = __builtin_bswap32 (x.ip);
+  x.cs = x.cs << 8 | x.cs >> 8;
+  x.rsrv2 = x.rsrv2 << 8 | x.rsrv2 >> 8;
+  if (f2 ())
+{
+  __builtin_memcpy (buff, "\n", 1);
+}
+}


Re: [PATCH, DOC] Document ASAN_OPTIONS="halt_on_error" env variable.

2016-05-11 Thread Yury Gribov

On 05/11/2016 05:13 PM, Martin Liška wrote:

Hello.

It's a bit confusing for a user that -fsanitize-recover=address does not recover
an instrumented binary. As a default value of halt_on_error is set to 0 for
address sanitizer, the binary fails on a first error.


I'm the guy behind -fsanitize-recover=address so let me explain.

Error recovery requires changes both to compiler (insert calls to 
recovering __asan_report_error_X_noabort rather than noreturning 
__asan_report_error_X) and runtime (do not abort when detecting overflow 
inside intercepted API like memcpy). -fsanitize-recover controls the 
compiler side, whereas halt_on_error=0 controls the runtime side.
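
To make the split concrete, here is a minimal sketch. The file name and the functions read_elem and demo are made up for illustration; only the -fsanitize-recover=address flag and the halt_on_error=0 setting come from this thread. As I understand it, both halves in the header comment are needed for the program to survive the first report.

```c
/* recover-demo.c: hypothetical file, not part of any patch in this thread.
   Compiler side:  gcc -g -fsanitize=address -fsanitize-recover=address recover-demo.c
   Runtime side:   ASAN_OPTIONS=halt_on_error=0 ./a.out  */
#include <stdlib.h>

/* Return buf[idx] with no bounds check, so an out-of-range idx is left
   for AddressSanitizer to report.  */
int
read_elem (const int *buf, size_t idx)
{
  return buf[idx];
}

/* Allocate n ints holding 0..n-1 and read element idx.
   Returns -1 if the allocation fails.  demo (n, n) reads one element
   past the end of the heap buffer.  */
int
demo (size_t n, size_t idx)
{
  int *buf = malloc (n * sizeof *buf);
  if (buf == NULL)
    return -1;
  for (size_t i = 0; i < n; i++)
    buf[i] = (int) i;
  int value = read_elem (buf, idx);
  free (buf);
  return value;
}
```

Calling demo (4, 4) from main in a binary built and run as in the header comment should print a heap-buffer-overflow report and continue. Dropping either the compiler flag or the environment setting should make the first report fatal again.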


Unfortunately currently there is no way to inform runtime library that 
compiled code would like it to continue execution after detecting error. 
Actually it's not clear how to do that properly because different parts 
of application could be compiled with different recovery settings (e.g. 
a.c with recovery and b.c without) making it hard to understand what 
behavior user would expect from runtime library interceptors.



Following patch attempts to explain the ENV variable.

Ready for trunk?


LGTM (but I'm not a maintainer and do not have approval rights).


Thanks,
Martin





Re: libgomp: Make GCC 5 OpenACC offloading executables work

2016-05-11 Thread Bernd Schmidt

On 05/11/2016 03:46 PM, Thomas Schwinge wrote:

What we now got, doesn't work, for several reasons.  GCC 5 OpenACC
offloading executables will just run into SIGSEGV.


I'm tempted to say, let's just wait until someone actually reports that 
in bugzilla. Offloading in gcc-5 was broken enough that I expect no one 
was actually using it. There's really very little point in carrying 
compatibility crud around.



Bernd


Re: [PATCH, DOC] Document ASAN_OPTIONS="halt_on_error" env variable.

2016-05-11 Thread Jakub Jelinek
On Wed, May 11, 2016 at 04:13:27PM +0200, Martin Liška wrote:
> It's a bit confusing for a user that -fsanitize-recover=address does not recover
> an instrumented binary. As the default value of halt_on_error is 1 for the
> address sanitizer, the binary fails on the first error.
> 
> Following patch attempts to explain the ENV variable.
> 
> Ready for trunk?
> Thanks,
> Martin

> >From 95f694a92428759773e5259323e82cbf49eade34 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 11 May 2016 16:05:49 +0200
> Subject: [PATCH] Document ASAN_OPTIONS="halt_on_error" env variable.
> 
> gcc/ChangeLog:
> 
> 2016-05-11  Martin Liska  
> 
>   * doc/invoke.texi: Explain connection between -fsanitize-recover=address
>   and ASAN_OPTIONS="halt_on_error=1".
> ---
>  gcc/doc/invoke.texi | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a54a0af..722647a 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -9998,6 +9998,12 @@ for which this feature is experimental.
>  accepted, the former enables recovery for all sanitizers that support it,
>  the latter disables recovery for all sanitizers that support it.
>  
> +The error recovery mode can be overwritten by @code{halt_on_error=1} provided
> +to one of the following environment variables: @env{ASAN_OPTIONS}, @env{TSAN_OPTIONS}
> +or @env{UBSAN_OPTIONS}. The default value is set to @code{halt_on_error=1},
> +only the address sanitizer by default does not recover and @code{halt_on_error=0}
> +must be provided.

It can be overridden (not overwritten?) only in one way I believe, i.e. the
code must be built with -fsanitize-recover= (whether by default or
not) and the recovery must be enabled in the library (by default or using
env var) for successful recovery.  If you compile without recovery, then no
matter what you do on the env var side it still will be fatal.
So the docs need to make that clear.

Jakub


Re: [PATCH] Apply fix for PR68463 to RS6000

2016-05-11 Thread James Norris

Thomas,

On 05/11/2016 09:13 AM, Thomas Schwinge wrote:

Hi!

On Tue, 10 May 2016 10:39:33 -0500, James Norris  
wrote:

The fix for PR68463 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463)
was missing code that prevented the fix from working on RS6000. The
attached patch adds the missing code for RS6000.


:-( Bah.  When reviewing these changes,
,
I had been under the impression that all GNU/Linux targets use the
gcc/config/gnu-user.h file being patched there, hence my comment "I guess
we currently don't have to care about offloading configurations not using
the gnu-user.h file in which you modified the
STARTFILE_SPEC/ENDFILE_SPEC?" -- but as we now found out the hard way,
this PowerPC target does not actually, and so we need to repeat the
changes here:


  * config/rs6000/sysv4.h (CRTOFFLOADBEGIN): Define. Add 
crtoffloadbegin.o
  if offloading is enabled and -fopenacc or -fopenmp is specified.
  (CRTOFFLOADEND): Likewise.
  (STARTFILE_LINUX_SPEC): Add CRTOFFLOADBEGIN.
  (ENDFILE_LINUX_SPEC): Add CRTOFFLOADEND.


Should have added a "PR driver/68463" tag to the ChangeLog snippet, to
get this commit added to .


Oops.



Are you also going to commit this to gcc-6-branch, where it is broken in
the very same way?  You can do so "as obvious", without special approval.


Yes. And gomp4 too.

Jim




Re: [PATCH 3/3] Enhance dumps of IVOPTS

2016-05-11 Thread Martin Liška
On 05/10/2016 03:16 PM, Bin.Cheng wrote:
> Another way is to remove the use of id for struct iv_inv_expr_ent once
> for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
> to pointers, and rename iv_inv_expr_ent.id to count and use this to
> record reference number in iv_ca.  This if-statement on dump_file can
> be saved.  Also I think it simplifies current code a bit.  For now,
> there are id <-> struct maps for different structures in IVOPT which
> make it not straightforward.

Sounds good to me; I will re-implement the dump enhancement in the suggested manner.

Martin


[PATCH, DOC] Document ASAN_OPTIONS="halt_on_error" env variable.

2016-05-11 Thread Martin Liška
Hello.

It's a bit confusing for a user that -fsanitize-recover=address does not recover
an instrumented binary. As the default value of halt_on_error is 1 for the
address sanitizer, the binary fails on the first error.

Following patch attempts to explain the ENV variable.

Ready for trunk?
Thanks,
Martin
>From 95f694a92428759773e5259323e82cbf49eade34 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 11 May 2016 16:05:49 +0200
Subject: [PATCH] Document ASAN_OPTIONS="halt_on_error" env variable.

gcc/ChangeLog:

2016-05-11  Martin Liska  

	* doc/invoke.texi: Explain connection between -fsanitize-recover=address
	and ASAN_OPTIONS="halt_on_error=1".
---
 gcc/doc/invoke.texi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a54a0af..722647a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9998,6 +9998,12 @@ for which this feature is experimental.
 accepted, the former enables recovery for all sanitizers that support it,
 the latter disables recovery for all sanitizers that support it.
 
+The error recovery mode can be overwritten by @code{halt_on_error=1} provided
+to one of the following environment variables: @env{ASAN_OPTIONS}, @env{TSAN_OPTIONS}
+or @env{UBSAN_OPTIONS}. The default value is set to @code{halt_on_error=1},
+only the address sanitizer by default does not recover and @code{halt_on_error=0}
+must be provided.
+
 Syntax without explicit @var{opts} parameter is deprecated.  It is equivalent to
 @smallexample
 -fsanitize-recover=undefined,float-cast-overflow,float-divide-by-zero
-- 
2.8.2



Re: [PATCH] Apply fix for PR68463 to RS6000

2016-05-11 Thread Thomas Schwinge
Hi!

On Tue, 10 May 2016 10:39:33 -0500, James Norris  
wrote:
> The fix for PR68463 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463)
> was missing code that prevented the fix from working on RS6000. The
> attached patch adds the missing code for RS6000.

:-( Bah.  When reviewing these changes,
,
I had been under the impression that all GNU/Linux targets use the
gcc/config/gnu-user.h file being patched there, hence my comment "I guess
we currently don't have to care about offloading configurations not using
the gnu-user.h file in which you modified the
STARTFILE_SPEC/ENDFILE_SPEC?" -- but as we now found out the hard way,
this PowerPC target does not actually, and so we need to repeat the
changes here:

>  * config/rs6000/sysv4.h (CRTOFFLOADBEGIN): Define. Add 
> crtoffloadbegin.o
>  if offloading is enabled and -fopenacc or -fopenmp is specified.
>  (CRTOFFLOADEND): Likewise.
>  (STARTFILE_LINUX_SPEC): Add CRTOFFLOADBEGIN.
>  (ENDFILE_LINUX_SPEC): Add CRTOFFLOADEND.

Should have added a "PR driver/68463" tag to the ChangeLog snippet, to
get this commit added to .

Are you also going to commit this to gcc-6-branch, where it is broken in
the very same way?  You can do so "as obvious", without special approval.


Grüße
 Thomas




[PATCH] Fix PR71055

2016-05-11 Thread Richard Biener

The following fixes PR71055.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-05-11  Richard Biener  

PR tree-optimization/71055
* tree-ssa-sccvn.c (vn_reference_lookup_3): When native-interpreting
sth with precision not equal to access size verify we don't chop
off bits.

* gcc.dg/torture/pr71055.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 236069)
--- gcc/tree-ssa-sccvn.c(working copy)
*** vn_reference_lookup_3 (ao_ref *ref, tree
*** 1907,1920 
buffer, sizeof (buffer));
  if (len > 0)
{
! tree val = native_interpret_expr (vr->type,
buffer
+ ((offset - offset2)
   / BITS_PER_UNIT),
ref->size / BITS_PER_UNIT);
  if (val)
return vn_reference_lookup_or_insert_for_pieces
!(vuse, vr->set, vr->type, vr->operands, val);
}
}
  }
--- 1950,1983 
buffer, sizeof (buffer));
  if (len > 0)
{
! tree type = vr->type;
! /* Make sure to interpret in a type that has a range
!covering the whole access size.  */
! if (INTEGRAL_TYPE_P (vr->type)
! && ref->size != TYPE_PRECISION (vr->type))
!   type = build_nonstandard_integer_type (ref->size,
!  TYPE_UNSIGNED (type));
! tree val = native_interpret_expr (type,
buffer
+ ((offset - offset2)
   / BITS_PER_UNIT),
ref->size / BITS_PER_UNIT);
+ /* If we chop off bits because the types precision doesn't
+match the memory access size this is ok when optimizing
+reads but not when called from the DSE code during
+elimination.  */
+ if (val
+ && type != vr->type)
+   {
+ if (! int_fits_type_p (val, vr->type))
+   val = NULL_TREE;
+ else
+   val = fold_convert (vr->type, val);
+   }
+ 
  if (val)
return vn_reference_lookup_or_insert_for_pieces
!(vuse, vr->set, vr->type, vr->operands, val);
}
}
  }
Index: gcc/testsuite/gcc.dg/torture/pr71055.c
===
*** gcc/testsuite/gcc.dg/torture/pr71055.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr71055.c  (working copy)
***
*** 0 
--- 1,18 
+ /* { dg-do run } */
+ 
+ extern void abort (void);
+ union U { int i; _Bool b; char c; };
+ void __attribute__((noinline,noclone))
+ foo (union U *u)
+ {
+   if (u->c != 0)
+ abort ();
+ }
+ int main()
+ {
+   union U u;
+   u.i = 10;
+   u.b = 0;
+   foo (&u);
+   return 0;
+ }
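
For readers who want the idea without the tree machinery, here is a minimal sketch of the "don't chop off bits" test. It is a hypothetical stand-alone helper for the unsigned case only, not the GCC code: in the patch that role is played by int_fits_type_p on the value interpreted at the full access size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function: return true if VALUE, read
   through a wider memory access, can be represented in an unsigned type
   that has only PRECISION value bits (for example 1 for _Bool).  */
bool
fits_unsigned_precision (uint64_t value, unsigned precision)
{
  if (precision >= 64)
    return true;
  /* Any bit set at or above PRECISION would be chopped off.  */
  return (value >> precision) == 0;
}
```

With an 8-bit access and a 1-bit precision type, the byte value 10 does not fit, which corresponds to the case where the patch sets val to NULL_TREE instead of converting it.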


[C/C++ PATCH] Missing warning for contradictory attributes (PR c++/71024)

2016-05-11 Thread Marek Polacek
The C++ FE was missing diagnostics e.g. when an "always_inline" on DECL was
followed by DECL with the "noinline" attribute.  The C FE already has code
dealing with this, so I factored it out to a common function.
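
As a rough illustration of the attribute pairs the factored-out function checks, here is a simplified stand-alone model. All names in it are invented for the example and are not GCC internals; it leaves out the "optimize" attribute and the inline-keyword checks and keeps only the order of the remaining else-if chain.

```c
/* Simplified stand-alone model; the names below are made up and are not
   GCC internals.  */
enum attr_flag
{
  ATTR_NOINLINE = 1 << 0,
  ATTR_ALWAYS_INLINE = 1 << 1,
  ATTR_HOT = 1 << 2,
  ATTR_COLD = 1 << 3
};

enum attr_conflict
{
  CONFLICT_NONE,
  CONFLICT_NOINLINE_AFTER_ALWAYS_INLINE,
  CONFLICT_ALWAYS_INLINE_AFTER_NOINLINE,
  CONFLICT_COLD_AFTER_HOT,
  CONFLICT_HOT_AFTER_COLD
};

/* Return the first contradiction between the attributes of an earlier
   declaration (old_attrs) and a later one (new_attrs), checked in the
   same order as the else-if chain in the patch.  */
enum attr_conflict
first_attribute_conflict (unsigned old_attrs, unsigned new_attrs)
{
  if ((new_attrs & ATTR_NOINLINE) && (old_attrs & ATTR_ALWAYS_INLINE))
    return CONFLICT_NOINLINE_AFTER_ALWAYS_INLINE;
  if ((new_attrs & ATTR_ALWAYS_INLINE) && (old_attrs & ATTR_NOINLINE))
    return CONFLICT_ALWAYS_INLINE_AFTER_NOINLINE;
  if ((new_attrs & ATTR_COLD) && (old_attrs & ATTR_HOT))
    return CONFLICT_COLD_AFTER_HOT;
  if ((new_attrs & ATTR_HOT) && (old_attrs & ATTR_COLD))
    return CONFLICT_HOT_AFTER_COLD;
  return CONFLICT_NONE;
}
```

Because the checks form one chain, at most one of these pairs is reported per redeclaration, even if both an inline pair and a hot/cold pair are present.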

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-05-11  Marek Polacek  

PR c++/71024
* c-common.c (diagnose_mismatched_attributes): New function.
* c-common.h (diagnose_mismatched_attributes): Declare.

* c-decl.c (diagnose_mismatched_decls): Factor out code to
diagnose_mismatched_attributes and call it.

* decl.c (duplicate_decls): Call diagnose_mismatched_decls.

* c-c++-common/attributes-3.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 63a18c8..665448c 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -12824,4 +12824,58 @@ get_source_date_epoch ()
   return (time_t) epoch;
 }
 
+/* Check and possibly warn if two declarations have contradictory
+   attributes, such as always_inline vs. noinline.  */
+
+bool
+diagnose_mismatched_attributes (tree olddecl, tree newdecl)
+{
+  bool warned = false;
+
+  tree a1 = lookup_attribute ("optimize", DECL_ATTRIBUTES (olddecl));
+  tree a2 = lookup_attribute ("optimize", DECL_ATTRIBUTES (newdecl));
+  /* An optimization attribute applied on a declaration after the
+ definition is likely not what the user wanted.  */
+  if (a2 != NULL_TREE
+  && DECL_SAVED_TREE (olddecl) != NULL_TREE
+  && (a1 == NULL_TREE || !attribute_list_equal (a1, a2)))
+warned |= warning (OPT_Wattributes,
+  "optimization attribute on %qD follows "
+  "definition but the attribute doesn%'t match",
+  newdecl);
+
+  /* Diagnose inline __attribute__ ((noinline)) which is silly.  */
+  if (DECL_DECLARED_INLINE_P (newdecl)
+  && DECL_UNINLINABLE (olddecl)
+  && lookup_attribute ("noinline", DECL_ATTRIBUTES (olddecl)))
+warned |= warning (OPT_Wattributes, "inline declaration of %qD follows "
+  "declaration with attribute noinline", newdecl);
+  else if (DECL_DECLARED_INLINE_P (olddecl)
+  && DECL_UNINLINABLE (newdecl)
+  && lookup_attribute ("noinline", DECL_ATTRIBUTES (newdecl)))
+warned |= warning (OPT_Wattributes, "declaration of %q+D with attribute "
+  "noinline follows inline declaration ", newdecl);
+  else if (lookup_attribute ("noinline", DECL_ATTRIBUTES (newdecl))
+  && lookup_attribute ("always_inline", DECL_ATTRIBUTES (olddecl)))
+warned |= warning (OPT_Wattributes, "declaration of %q+D with attribute "
+  "%qs follows declaration with attribute %qs",
+  newdecl, "noinline", "always_inline");
+  else if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (newdecl))
+  && lookup_attribute ("noinline", DECL_ATTRIBUTES (olddecl)))
+warned |= warning (OPT_Wattributes, "declaration of %q+D with attribute "
+  "%qs follows declaration with attribute %qs",
+  newdecl, "always_inline", "noinline");
+  else if (lookup_attribute ("cold", DECL_ATTRIBUTES (newdecl))
+  && lookup_attribute ("hot", DECL_ATTRIBUTES (olddecl)))
+warned |= warning (OPT_Wattributes, "declaration of %q+D with attribute "
+  "%qs follows declaration with attribute %qs",
+  newdecl, "cold", "hot");
+  else if (lookup_attribute ("hot", DECL_ATTRIBUTES (newdecl))
+  && lookup_attribute ("cold", DECL_ATTRIBUTES (olddecl)))
+warned |= warning (OPT_Wattributes, "declaration of %q+D with attribute "
+  "%qs follows declaration with attribute %qs",
+  newdecl, "hot", "cold");
+  return warned;
+}
+
 #include "gt-c-family-c-common.h"
diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index 4454d08..0ee9f56 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -850,6 +850,7 @@ extern bool keyword_is_type_qualifier (enum rid);
 extern bool keyword_is_decl_specifier (enum rid);
 extern bool cxx_fundamental_alignment_p (unsigned);
 extern bool pointer_to_zero_sized_aggr_p (tree);
+extern bool diagnose_mismatched_attributes (tree, tree);
 
 #define c_sizeof(LOC, T)  c_sizeof_or_alignof_type (LOC, T, true, false, 1)
 #define c_alignof(LOC, T) c_sizeof_or_alignof_type (LOC, T, false, false, 1)
diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index 9c09536..6ba0e0e 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -2227,55 +2227,7 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
 }
 
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
-{
-  tree a1 = lookup_attribute ("optimize", DECL_ATTRIBUTES (olddecl));
-  tree a2 = lookup_attribute ("optimize", DECL_ATTRIBUTES (newdecl));
-  /* An optimization attribute applied on a declaration after the
-definition is likely not what the user wanted.  */
-  if (a2 

Re: "const" qualifier vs. OpenACC data/OpenMP map clauses

2016-05-11 Thread Thomas Schwinge
Hi!

Ping.  Or should I open a PR for that?

On Fri, 15 Apr 2016 10:47:34 +0200, I wrote:
> On Thu, 14 Apr 2016 14:21:33 -0700, Cesar Philippidis 
>  wrote:
> > This patch fixes a segfault in libgomp.oacc-fortran/non-scalar-data.f90.
> > The problem here is that 'n' is a parameter, and the kernels region
> > implicitly adds a copy clause to n. Naturally, the test segfaults when
> > it comes time to write the value back to the host as the kernels region
> > terminates.
> 
> ;-| Ha!  Glad that you found this rather quickly!
> 
> > This problem only occurs on nvptx targets.
> 
> It's a generic problem, I would say.  Just in a lot of cases, it seems
> that the writeback doesn't have a destructive effect (if the "const" data
> happens to live in a writable memory location, I suppose).
> 
> So, in C (which I'm more comfortable with than Fortran), we're basically
> talking about the following case:
> 
> const int b = 0;
> #pragma acc kernels /* implicit copy(b) */
> {
>   b = 1;
> }
> 
> ..., which is invalid, as the copyout of "b" writes to "const" memory.
> GCC does not currently diagnose that (but should, I think?).
> 
> But, GCC does diagnose "assignment of read-only variable" inside the
> offloaded block, which I wonder whether that's correct: at least in the
> case of OpenACC, the specification says that given a data clause, "the
> compiler will allocate and manage a copy of the variable or array in the
> memory of the current device, creating a visible device copy of that
> variable or array, for non-shared memory devices".  (Have not looked up
> the corresponding for OpenMP.)  Does that "visible device copy of that
> variable" still have the "const" property?  Or only for shared memory
> devices?  (Uh...)  So, to maintain coherence between shared and
> non-shared memory devices, it probably makes sense to indeed emit this
> diagnostic.  (That is, a "visible device copy" keeps the "const" property
> of the original variable.)
> 
> With that settled, does it then follow that, for example, the create
> (OpenMP: map(alloc)) and "const" qualifiers are conflicting?  That is,
> should the following emit some kind of "conflicting data clause for
> read-only variable" diagnostic?
> 
> const int b = 0;
> #pragma acc kernels create(b)
> {
>   b = 1;
> }
> 
> (I hope the intention of the specification is not to allow for, for
> example, a create clause to override the original variable's "const"
> property.)
> 
> Here's what I got so far; OK to commit to trunk (as
> gcc/testsuite/c-c++-common/{goacc,gomp}/read-only.c?)?  I suppose more
> test cases to be added once resolving the XFAILs.
> 
> void openacc(const double a)
> {
>   const short b = 0;
> #pragma acc kernels /* implicit copy(b) */ /* { dg-error "assignment of read-only variable" "TODO" { xfail *-*-* } } */
>   {
> b = 1; /* { dg-error "assignment of read-only variable" } */
>   }
> 
> #pragma acc kernels create(b) /* { dg-error "TODO conflicting data clause for read-only variable" "TODO" { xfail *-*-* } } */
>   {
> b = 1; /* { dg-error "assignment of read-only variable" } */
>   }
> 
>   (void) b;
> }
> 
> void openmp(const int a)
> {
>   a = 10; /* { dg-error "assignment of read-only parameter" } */
> 
> #pragma omp target data map(from:a) /* { dg-error "assignment of read-only parameter" "TODO" { xfail *-*-* } } */
>   {
> a = 10; /* { dg-error "assignment of read-only parameter" } */
>   }
> 
> #pragma omp target map(from:a) /* { dg-error "assignment of read-only parameter" "TODO" { xfail *-*-* } } */
>   {
> a = 10; /* { dg-error "assignment of read-only parameter" } */
>   }
> 
> #pragma omp target map(alloc:a) /* { dg-error "TODO conflicting map clause for read-only parameter" "TODO" { xfail *-*-* } } */
>   {
> a = 10; /* { dg-error "assignment of read-only parameter" } */
>   }
> 
>   const float b = 1;
>   b = 0.123;  /* { dg-error "assignment of read-only variable" } */
> 
> #pragma omp target data map(from:b) /* { dg-error "assignment of read-only variable" "TODO" { xfail *-*-* } } */
>   {
> b = 10; /* { dg-error "assignment of read-only variable" } */
>   }
> 
> #pragma omp target map(from:b) /* { dg-error "assignment of read-only variable" "TODO" { xfail *-*-* } } */
>   {
> b = 10; /* { dg-error "assignment of read-only variable" } */
>   }
> 
> #pragma omp target map(alloc:b) /* { dg-error "TODO conflicting map clause for read-only variable" "TODO" { xfail *-*-* } } */
>   {
> b = 10; /* { dg-error "assignment of read-only variable" } */
>   }
> 
>   (void) b;
> }


Grüße
 Thomas


[PATCH] Fix PR71057

2016-05-11 Thread Richard Biener

The following fixes PR71057

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-05-11  Richard Biener  

PR debug/71057
* dwarf2out.c (retry_incomplete_types): Set early_dwarf.
(dwarf2out_finish): Move retry_incomplete_types call ...
(dwarf2out_early_finish): ... here.

* g++.dg/debug/pr71057.C: New testcase.

Index: gcc/dwarf2out.c
===
*** gcc/dwarf2out.c (revision 236032)
--- gcc/dwarf2out.c (working copy)
*** gen_entry_point_die (tree decl, dw_die_r
*** 19401,19411 
--- 19401,19413 
  static void
  retry_incomplete_types (void)
  {
+   set_early_dwarf s;
int i;
  
for (i = vec_safe_length (incomplete_types) - 1; i >= 0; i--)
  if (should_emit_struct_debug ((*incomplete_types)[i], 
DINFO_USAGE_DIR_USE))
gen_type_die ((*incomplete_types)[i], comp_unit_die ());
+   vec_safe_truncate (incomplete_types, 0);
  }
  
  /* Determine what tag to use for a record type.  */
*** dwarf2out_finish (const char *filename)
*** 27382,27391 
resolve_addr (comp_unit_die ());
move_marked_base_types ();
  
-   /* Walk through the list of incomplete types again, trying once more to
-  emit full debugging info for them.  */
-   retry_incomplete_types ();
- 
if (flag_eliminate_unused_debug_types)
  prune_unused_types ();
  
--- 27384,27389 
*** dwarf2out_finish (const char *filename)
*** 27686,27691 
--- 27684,27693 
  static void
  dwarf2out_early_finish (void)
  {
+   /* Walk through the list of incomplete types again, trying once more to
+  emit full debugging info for them.  */
+   retry_incomplete_types ();
+ 
/* The point here is to flush out the limbo list so that it is empty
   and we don't need to stream it for LTO.  */
flush_limbo_die_list ();
Index: gcc/testsuite/g++.dg/debug/pr71057.C
===
*** gcc/testsuite/g++.dg/debug/pr71057.C(revision 0)
--- gcc/testsuite/g++.dg/debug/pr71057.C(working copy)
***
*** 0 
--- 1,12 
+ // { dg-do compile }
+ // { dg-options "-g" }
+ template  using decay_t = _Tp;
+ template  struct A;
+ template  struct B { B(A); };
+ template  struct C {
+   template  using constructor = B;
+ typedef constructor dummy;
+ };
+ template  struct D {};
+ C a;
+ D fn1() { fn1, a; }


Re: Simple bitop reassoc in match.pd (was: Canonicalize X u< X to UNORDERED_EXPR)

2016-05-11 Thread H.J. Lu
On Mon, May 9, 2016 at 11:11 PM, Marc Glisse  wrote:
> On Fri, 6 May 2016, Marc Glisse wrote:
>
>> Here they are. I did (X&Y)&Y and (X&Y)&(X&Z). The next one would be
>> ((X&Y)&Z)&Y, but at some point we have to defer to reassoc.
>>
>> I didn't add the convert?+tree_nop_conversion_p to the existing transform
>> I modified. I guess at some point we should make a pass and add them to all
>> the transformations on bit operations...
>>
>> For (X & Y) & Y, I believe that any conversion is fine. For the others,
>> tree_nop_conversion_p is probably too strict (narrowing should be fine for
>> all), but I was too lazy to look for tighter conditions.
>>
>> (X ^ Y) ^ Y -> X should probably have (non_lvalue ...) on its output, but
>> in a simple test it didn't seem to matter. Is non_lvalue still needed?
>>
>>
>> Bootstrap+regtest on powerpc64le-unknown-linux-gnu.
>>
>> 2016-05-06  Marc Glisse  
>>
>> gcc/
>> * fold-const.c (fold_binary_loc) [(X ^ Y) & Y]: Remove and merge
>> with...
>> * match.pd ((X & Y) ^ Y): ... this.
>> ((X & Y) & Y, (X | Y) | Y, (X ^ Y) ^ Y, (X & Y) & (X & Z), (X | Y)
>> | (X | Z), (X ^ Y) ^ (X ^ Z)): New transformations.
>>
>> gcc/testsuite/
>> * gcc.dg/tree-ssa/bit-assoc.c: New testcase.
>> * gcc.dg/tree-ssa/pr69270.c: Adjust.
>> * gcc.dg/tree-ssa/vrp59.c: Disable forwprop.
>
>
> Here it is again, I just replaced convert with convert[12] in the last 2
> transforms. This should matter for (unsigned)(si & 42) & (ui & 42u). I
> didn't change it in the other transform, because it would only matter in the
> case (T)(X & CST) & CST, which I think would be better served by extending
> the transform that currently handles (X & CST1) & CST2 (not done in this
> patch).

It caused:

FAIL: gcc.dg/tree-ssa/vrp47.c scan-tree-dump-times vrp2 " & 1;" 0

on x86.

H.J.
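
For reference, the identities named in the quoted ChangeLog can be checked by brute force. The sketch below is only a sanity check on 8-bit values, not the match.pd patterns themselves. Apart from (X ^ Y) ^ Y -> X, which the quoted mail states, the right-hand sides are my assumption about what each pattern reduces to, and the function name is made up.

```c
#include <stdbool.h>

/* Exhaustively verify, for all 8-bit x, y, z, the algebraic identities
   behind the new transformations.  Returns true if all of them hold.  */
bool
check_bitop_identities (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      {
        /* Two-operand forms: a repeated operand is absorbed or cancels.  */
        if (((x & y) & y) != (x & y)
            || ((x | y) | y) != (x | y)
            || ((x ^ y) ^ y) != x)
          return false;

        /* Three-operand forms: the shared X appears once (AND, OR) or
           cancels out entirely (XOR).  */
        for (unsigned z = 0; z < 256; z++)
          if (((x & y) & (x & z)) != ((x & y) & z)
              || ((x | y) | (x | z)) != ((x | y) | z)
              || ((x ^ y) ^ (x ^ z)) != (y ^ z))
            return false;
      }
  return true;
}
```

This says nothing about the conversion questions raised above (convert? and tree_nop_conversion_p); it only covers same-type operands.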


Re: [PATCH] clean up insn-automata.c

2016-05-11 Thread Vladimir Makarov

On 05/11/2016 01:39 AM, Alexander Monakov wrote:

On Wed, 30 Mar 2016, Bernd Schmidt wrote:

On 03/25/2016 04:43 AM, Aldy Hernandez wrote:

If Bernd is fine with this, I'm happy to retract my patch and any
possible followups.  I'm just interested in having no path causing a
possible out of bounds access.  If your patch will do that, I'm cool.

I'll need to see that patch first to comment :-)

Here's the proposed patch.  I've found that there's only one user of the
current fancy logic in output_internal_insn_code_evaluation: handling of
NULL_RTX and const0_rtx is only useful for 'state_transition' (generated
by output_trans_func), so it's possible to inline the extended handling
there, and handle only plain non-null rtx_insns in
output_internal_insn_code_evaluation.

This change allows removing extra checks and casting in
output_internal_insn_latency_func, as done by the patch.

As a nice bonus, it constrains prototypes of three automata functions to
accept 'rtx_insn *' rather than just 'rtx'.

Bootstrapped and regtested on x86_64, OK?

Yes, it is ok for the trunk.  Thank you for solving this issue, Alexander.

* genattr.c (main): Change 'rtx' to 'rtx_insn *' in prototypes of
'insn_latency', 'maximal_insn_latency', 'min_insn_conflict_delay'.
* genautomata.c (output_internal_insn_code_evaluation): Simplify.
Move handling of non-insn arguments inline into the sole user:
(output_trans_func): ...here.
(output_min_insn_conflict_delay_func): Change 'rtx' to 'rtx_insn *' in
emitted function prototype.
(output_internal_insn_latency_func): Ditto.  Simplify.
(output_internal_maximal_insn_latency_func): Ditto.  Delete
always-unused argument.
(output_insn_latency_func): Ditto.
(output_maximal_insn_latency_func): Ditto.





Re: libgomp: Make GCC 5 OpenACC offloading executables work

2016-05-11 Thread Thomas Schwinge
Hi!

Ping.

On Wed, 20 Apr 2016 13:35:28 +0200, I wrote:
> On Mon, 28 Sep 2015 15:38:57 -0400, Nathan Sidwell  wrote:
> > On 09/24/15 04:40, Jakub Jelinek wrote:
> > > Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
> > > anyway because of the incompatible PTX version
> 
> I do agree that it's reasonable to require users to re-compile their code
> when switching between major GCC releases, to retain the offloading
> feature, or otherwise resort to host fallback execution.  I'll propose
> some text along these lines for the GCC 6 release notes.
> 
> > > why don't you just
> > > do
> > >goacc_save_and_set_bind (acc_device_host);
> > >fn (hostaddrs);
> > >goacc_restore_bind ();
> > 
> > Committed the  attached.  Thanks for the review.
> 
> What we now got, doesn't work, for several reasons.  GCC 5 OpenACC
> offloading executables will just run into SIGSEGV.  Here is a patch
> (which depends on
> ).
> Unfortunately, we have to jump through some hoops: because GCC 5
> compiler-generated OpenACC reductions code emits calls to
> acc_get_device_type, and because we'll (have to) always resort to host
> fallback execution for GCC 5 executables, we also have to enforce these
> acc_get_device_type calls to return acc_device_host; otherwise reductions
> will give bogus results.  (I hope I'm correctly implementing/using the
> symbol versioning "magic".)  OK for gcc-6-branch and trunk?  Assuming we
> want this fixed on gcc-6-branch, should it be part of 6.1 (to avoid 6.1
> users running into the SIGSEGV), or delay for 6.2?
> 
> We don't have an easy way to add test cases to make sure we don't break
> such legacy interfaces, do we?  (So, I just manually checked a few test
> cases.)
> 
> commit c68c6b8e79176f5dc21684efe2517cbfb83a182e
> Author: Thomas Schwinge 
> Date:   Wed Apr 20 13:08:57 2016 +0200
> 
> libgomp: Make GCC 5 OpenACC offloading executables work
> 
>   * libgomp.h: Include "openacc.h".
>   (goacc_get_device_type_201, goacc_get_device_type_20): New
>   prototypes.
>   (oacc_20_201_symver, goacc_get_device_type_201): New macros.
>   * libgomp.map: Add acc_get_device_type with OACC_2.0.1 symbol
>   version.
>   * oacc-init.c (acc_get_device_type): Rename to
>   goacc_get_device_type_201.
>   (goacc_get_device_type_20): New function.
>   * oacc-parallel.c (GOACC_parallel): Call goacc_lazy_initialize.
>   * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Refuse version
>   0 offload images.
>   * target.c (gomp_load_image_to_device): Gracefully handle the case
>   that a plugin refuses to load offload images.
> ---
>  libgomp/libgomp.h | 10 ++
>  libgomp/libgomp.map   | 10 ++
>  libgomp/oacc-init.c   | 18 +-
>  libgomp/oacc-parallel.c   | 11 +++
>  libgomp/plugin/plugin-nvptx.c | 10 +-
>  libgomp/target.c  |  6 +-
>  6 files changed, 62 insertions(+), 3 deletions(-)
> 
> diff --git libgomp/libgomp.h libgomp/libgomp.h
> index 6a05bbc..9fa1cb1 100644
> --- libgomp/libgomp.h
> +++ libgomp/libgomp.h
> @@ -1011,6 +1011,8 @@ gomp_work_share_init_done (void)
>  /* Now that we're back to default visibility, include the globals.  */
>  #include "libgomp_g.h"
>  
> +#include "openacc.h"
> +
>  /* Include omp.h by parts.  */
>  #include "omp-lock.h"
>  #define _LIBGOMP_OMP_LOCK_DEFINED 1
> @@ -1047,11 +1049,17 @@ extern void gomp_set_nest_lock_25 (omp_nest_lock_25_t 
> *) __GOMP_NOTHROW;
>  extern void gomp_unset_nest_lock_25 (omp_nest_lock_25_t *) __GOMP_NOTHROW;
>  extern int gomp_test_nest_lock_25 (omp_nest_lock_25_t *) __GOMP_NOTHROW;
>  
> +extern acc_device_t goacc_get_device_type_201 (void) __GOACC_NOTHROW;
> +extern acc_device_t goacc_get_device_type_20 (void) __GOACC_NOTHROW;
> +
>  # define strong_alias(fn, al) \
>extern __typeof (fn) al __attribute__ ((alias (#fn)));
>  # define omp_lock_symver(fn) \
>__asm (".symver g" #fn "_30, " #fn "@@OMP_3.0"); \
>__asm (".symver g" #fn "_25, " #fn "@OMP_1.0");
> +# define oacc_20_201_symver(fn) \
> +  __asm (".symver go" #fn "_201, " #fn "@@OACC_2.0.1"); \
> +  __asm (".symver go" #fn "_20, " #fn "@OACC_2.0");
>  #else
>  # define gomp_init_lock_30 omp_init_lock
>  # define gomp_destroy_lock_30 omp_destroy_lock
> @@ -1063,6 +1071,8 @@ extern int gomp_test_nest_lock_25 (omp_nest_lock_25_t 
> *) __GOMP_NOTHROW;
>  # define gomp_set_nest_lock_30 omp_set_nest_lock
>  # define gomp_unset_nest_lock_30 omp_unset_nest_lock
>  # define gomp_test_nest_lock_30 omp_test_nest_lock
> +
> +# define goacc_get_device_type_201 acc_get_device_type
>  #endif
>  
>  #ifdef HAVE_ATTRIBUTE_VISIBILITY
> diff --git libgomp/libgomp.map libgomp/libgomp.map
> index 4d42c42..4803aab 100644
> --- libgomp/libgomp.map
> +++ libgomp/libgomp.map
> @@ -304,7 

Re: libgomp: In OpenACC testing, cycle through $offload_targets, and by default only build for the offload target that we're actually going to test

2016-05-11 Thread Thomas Schwinge
Hi!

Ping.

On Mon, 02 May 2016 11:54:27 +0200, I wrote:
> On Fri, 29 Apr 2016 09:43:41 +0200, Jakub Jelinek  wrote:
> > On Thu, Apr 28, 2016 at 12:43:43PM +0200, Thomas Schwinge wrote:
> > > commit 3b521f3e35fdb4b320e95b5f6a82b8d89399481a
> > > Author: Thomas Schwinge 
> > > Date:   Thu Apr 21 11:36:39 2016 +0200
> > > 
> > > libgomp: Unconfuse offload plugins vs. offload targets
> > 
> > I don't like this patch at all, rather than unconfusing stuff it
> > makes stuff confusing.  Plugins are just a way to support various
> > offloading targets.
> 
> Huh; my patch exactly clarifies that the offload_targets variable does
> not actually list offload target names, but does list libgomp offload
> plugin names...
> 
> > Can you please post just a short patch without all those changes
> > that does what you want, rather than renaming everything at the same time?
> 
> I thought incremental, self-contained patches were easier to review.
> Anyway, here's the three patches merged into one:
> 
> commit 8060ae3474072eef685381d80f566d1c0942c603
> Author: Thomas Schwinge 
> Date:   Thu Apr 21 11:36:39 2016 +0200
> 
> libgomp: In OpenACC testing, cycle through $offload_targets, and by 
> default only build for the offload target that we're actually going to test
> 
>   libgomp/
>   * plugin/configfrag.ac (offload_targets): Actually enumerate
>   offload targets, and add...
>   (offload_plugins): ... this one to enumerate offload plugins.
>   (OFFLOAD_PLUGINS): Renamed from OFFLOAD_TARGETS.
>   * target.c (gomp_target_init): Adjust to that.
>   * testsuite/lib/libgomp.exp: Likewise.
>   (offload_targets_s, offload_targets_s_openacc): Remove variables.
>   (offload_target_to_openacc_device_type): New proc.
>   (check_effective_target_openacc_nvidia_accel_selected)
>   (check_effective_target_openacc_host_selected): Examine
>   $openacc_device_type instead of $offload_target_openacc.
>   * Makefile.in: Regenerate.
>   * config.h.in: Likewise.
>   * configure: Likewise.
>   * testsuite/Makefile.in: Likewise.
>   * testsuite/libgomp.oacc-c++/c++.exp: Cycle through
>   $offload_targets (plus "disable") instead of
>   $offload_targets_s_openacc, and add "-foffload=$offload_target" to
>   tagopt.
>   * testsuite/libgomp.oacc-c/c.exp: Likewise.
>   * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
> ---
>  libgomp/Makefile.in|  1 +
>  libgomp/config.h.in|  4 +-
>  libgomp/configure  | 44 +++--
>  libgomp/plugin/configfrag.ac   | 39 +++-
>  libgomp/target.c   |  8 +--
>  libgomp/testsuite/Makefile.in  |  1 +
>  libgomp/testsuite/lib/libgomp.exp  | 72 
> ++
>  libgomp/testsuite/libgomp.oacc-c++/c++.exp | 30 +
>  libgomp/testsuite/libgomp.oacc-c/c.exp | 30 +
>  libgomp/testsuite/libgomp.oacc-fortran/fortran.exp | 22 ---
>  10 files changed, 142 insertions(+), 109 deletions(-)
> 
> diff --git libgomp/Makefile.in libgomp/Makefile.in
> [snipped]
> diff --git libgomp/config.h.in libgomp/config.h.in
> [snipped]
> diff --git libgomp/configure libgomp/configure
> [snipped]
> diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
> index 88b4156..de0a6f6 100644
> --- libgomp/plugin/configfrag.ac
> +++ libgomp/plugin/configfrag.ac
> @@ -26,8 +26,6 @@
>  # see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  # <http://www.gnu.org/licenses/>.
>  
> -offload_targets=
> -AC_SUBST(offload_targets)
>  plugin_support=yes
>  AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
>  if test x"$plugin_support" = xyes; then
> @@ -142,7 +140,13 @@ AC_SUBST(PLUGIN_HSA_LIBS)
>  
>  
>  
> -# Get offload targets and path to install tree of offloading compiler.
> +# Parse offload targets, and figure out libgomp plugin, and configure the
> +# corresponding offload compiler.  offload_plugins and offload_targets will be
> +# populated in the same order.
> +offload_plugins=
> +offload_targets=
> +AC_SUBST(offload_plugins)
> +AC_SUBST(offload_targets)
>  offload_additional_options=
>  offload_additional_lib_paths=
>  AC_SUBST(offload_additional_options)
> @@ -151,13 +155,13 @@ if test x"$enable_offload_targets" != x; then
>for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
>  tgt_dir=`echo $tgt | grep '=' | sed 's/.*=//'`
>  tgt=`echo $tgt | sed 's/=.*//'`
> -tgt_name=
> +tgt_plugin=
>  case $tgt in
>*-intelmic-* | *-intelmicemul-*)
> - tgt_name=intelmic
> + tgt_plugin=intelmic
>   ;;
>nvptx*)
> -tgt_name=nvptx
> + tgt_plugin=nvptx
>   PLUGIN_NVPTX=$tgt
>   PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
>   

Re: Splitting up gcc/omp-low.c?

2016-05-11 Thread Thomas Schwinge
Hi!

Ping.

On Tue, 03 May 2016 11:34:39 +0200, I wrote:
> On Wed, 13 Apr 2016 18:01:09 +0200, I wrote:
> > On Fri, 08 Apr 2016 11:36:03 +0200, I wrote:
> > > > On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek wrote:
> > > > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > > > >how about we split up gcc/omp-low.c into several
> > > > > > >files?  Would it make sense (I have not yet looked in detail) to do so
> > > > > >along the borders of the several passes defined therein?
> > 
> > > > > > I suspect a split along the ompexp/omplow boundary would be quite easy to
> > > > > achieve.
> > 
> > That was indeed the first one that I tackled, omp-expand.c (spelled out
> > "expand" instead of "exp" to avoid confusion as "exp" might also be short
> > for "expression"; OK?) [...]
> 
> That's the one I'd suggest to pursue next, now that GCC 6.1 has been
> released.  How would you like me to submit the patch for review?  (It's
> huge, obviously.)
> 
> A few high-level comments, and questions that remain to be answered:
> 
> > I have not yet moved code that is unrelated to OMP lowering out of
> > omp-low.c (into a new omp.c, or omp-misc.c, for example); for now, all of
> > that simply stays in omp-low.c.  We'll see how far we get.
> > 
> > One thing I noticed is that there sometimes is more than one suitable
> > place to put stuff: omp-low.c and omp-expand.c categorize by compiler
> > passes, and omp-offload.c -- at least in part -- [would be] about the orthogonal
> > "offloading" category.  For example, see the OMPTODO "struct oacc_loop
> > and enum oacc_loop_flags" in gcc/omp-offload.h.  We'll see how that goes.
> 
> > Some more comments, to help review:
> 
> > As I don't know how this is usually done: is it appropriate to remove
> > "Contributed by Diego Novillo" from omp-low.c (he does get mentioned for
> > his OpenMP work in gcc/doc/contrib.texi; a ton of other people have been
> > contributing a ton of other stuff since omp-low.c has been created), or
> > does this line stay in omp-low.c, or do I even duplicate it into the new
> > files?
> > 
> > I tried not to re-order stuff when moving.  But: we may actually want to
> > reorder stuff, to put it into a more sensible order.  Any suggestions?
> 
> > I had to export a small number of functions (see the prototypes not moved
> > but added to the header files).
> > 
> > Because it's also used in omp-expand.c, I moved the one-line static
> > inline is_reference function from omp-low.c to omp-low.h, and renamed it
> > to omp_is_reference because of the very generic name.  Similar functions
> > stay in omp-low.c however, so they're no longer defined next to each
> > other.  OK, or does this need a different solution?


Grüße
 Thomas


[Patch ARM/AArch64 03/11] AdvSIMD tests: be more verbose.

2016-05-11 Thread Christophe Lyon
It is useful to have more detailed information in the logs when checking
validation results: instead of repeating the intrinsic name, we now print
its return type too.
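
As a rough illustration of the new log line, here is a standalone sketch (not
the actual arm-neon-ref.h code).  The xSTR, STR and VECT_TYPE definitions are
simplified stand-ins written for this example, and format_checked and
format_checked_int8x8 are hypothetical helpers; the real CHECK macro calls
fprintf directly.

```c
#include <stdio.h>

/* Simplified stand-ins for the header's helper macros (assumed shapes).  */
#define xSTR(X) #X
#define STR(X) xSTR(X)
#define VECT_TYPE(T, W, N) T##W##x##N##_t

/* Hypothetical helper: build the message that CHECK would print.  */
static int
format_checked (char *buf, size_t len, const char *type_name, const char *msg)
{
  return snprintf (buf, len, "CHECKED %s %s\n", type_name, msg);
}

/* Example: the log line for a vector of eight 8-bit signed lanes.
   STR expands VECT_TYPE first, then stringifies the resulting type name.  */
static int
format_checked_int8x8 (char *buf, size_t len, const char *msg)
{
  return format_checked (buf, len, STR (VECT_TYPE (int, 8, 8)), msg);
}
```

With the type name in the message, two CHECK lines for the same intrinsic but
different vector types can be told apart in the logs.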

2016-05-02  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (CHECK,
CHECK_FP, CHECK_CUMULATIVE_SAT): Print which type was checked.

Change-Id: I74759d6a211cf52962f860fe77653a6f6edc1848

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 49fbd84..a2c160c 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -81,7 +81,7 @@ extern size_t strlen(const char *);
  abort();  \
}   \
   }
\
-fprintf(stderr, "CHECKED %s\n", MSG);  \
+fprintf(stderr, "CHECKED %s %s\n", STR(VECT_TYPE(T, W, N)), MSG);  \
   }
 
 /* Floating-point variant.  */
@@ -110,7 +110,7 @@ extern size_t strlen(const char *);
  abort();  \
}   \
   }
\
-fprintf(stderr, "CHECKED %s\n", MSG);  \
+fprintf(stderr, "CHECKED %s %s\n", STR(VECT_TYPE(T, W, N)), MSG);  \
   }
 
 /* Clean buffer with a non-zero pattern to help diagnose buffer
@@ -335,7 +335,8 @@ extern int VECT_VAR(expected_cumulative_sat, uint, 64, 2);
  strlen(COMMENT) > 0 ? " " COMMENT : "");  \
   abort(); \
 }  \
-fprintf(stderr, "CHECKED CUMULATIVE SAT %s\n", MSG);   \
+fprintf(stderr, "CHECKED CUMULATIVE SAT %s %s\n",  \
+   STR(VECT_TYPE(T, W, N)), MSG);  \
   }
 
 #define CHECK_CUMULATIVE_SAT_NAMED(test_name,EXPECTED,comment) \
-- 
1.9.1



[Patch ARM/AArch64 04/11] Add forgotten vsliq_n_u64 test.

2016-05-11 Thread Christophe Lyon
2016-05-02  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/vsli_n.c: Add check for 
vsliq_n_u64.

Change-Id: I90bb2b225ffd7bfd54a0827a0264ac20271f54f2

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
index 0285083..e5f78d0 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
@@ -169,6 +169,7 @@ void vsli_extra(void)
   CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_max_shift, COMMENT);
   CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_max_shift, COMMENT);
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_max_shift, COMMENT);
+  CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected_max_shift, COMMENT);
   CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_max_shift, COMMENT);
   CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_max_shift, COMMENT);
 }
-- 
1.9.1



[Patch ARM/AArch64 08/11] Add missing vstX_lane fp16 tests.

2016-05-11 Thread Christophe Lyon
2016-05-02  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c: Add fp16 tests.

Change-Id: I64e30bc30a9a9cc5c47eff212e7d745bf3230fe7

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c
index b923b64..282edd5 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c
@@ -14,6 +14,7 @@ VECT_VAR_DECL(expected_st2_0,uint,32,2) [] = { 0xfff0, 
0xfff1 };
 VECT_VAR_DECL(expected_st2_0,poly,8,8) [] = { 0xf0, 0xf1, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st2_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st2_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st2_0,hfloat,32,2) [] = { 0xc180, 0xc170 };
 VECT_VAR_DECL(expected_st2_0,int,16,8) [] = { 0xfff0, 0xfff1, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
@@ -24,6 +25,8 @@ VECT_VAR_DECL(expected_st2_0,uint,32,4) [] = { 0xfff0, 
0xfff1,
   0x0, 0x0 };
 VECT_VAR_DECL(expected_st2_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0x0, 0x0,
   0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st2_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0x0, 0x0,
+0x0, 0x0, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st2_0,hfloat,32,4) [] = { 0xc180, 0xc170,
 0x0, 0x0 };
 
@@ -39,6 +42,7 @@ VECT_VAR_DECL(expected_st2_1,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_st2_1,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st2_1,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st2_1,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st2_1,hfloat,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_st2_1,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
@@ -48,6 +52,8 @@ VECT_VAR_DECL(expected_st2_1,uint,16,8) [] = { 0x0, 0x0, 0x0, 
0x0,
 VECT_VAR_DECL(expected_st2_1,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st2_1,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
   0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st2_1,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+0x0, 0x0, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st2_1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
 
 /* Expected results for vst3, chunk 0.  */
@@ -62,6 +68,7 @@ VECT_VAR_DECL(expected_st3_0,uint,32,2) [] = { 0xfff0, 
0xfff1 };
 VECT_VAR_DECL(expected_st3_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0x0,
  0x0, 0x0, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st3_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0 };
+VECT_VAR_DECL(expected_st3_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0x0 };
 VECT_VAR_DECL(expected_st3_0,hfloat,32,2) [] = { 0xc180, 0xc170 };
 VECT_VAR_DECL(expected_st3_0,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0,
  0x0, 0x0, 0x0, 0x0 };
@@ -73,6 +80,8 @@ VECT_VAR_DECL(expected_st3_0,uint,32,4) [] = { 0xfff0, 
0xfff1,
   0xfff2, 0x0 };
 VECT_VAR_DECL(expected_st3_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0,
   0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st3_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0x0,
+0x0, 0x0, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st3_0,hfloat,32,4) [] = { 0xc180, 0xc170,
 0xc160, 0x0 };
 
@@ -88,6 +97,7 @@ VECT_VAR_DECL(expected_st3_1,uint,32,2) [] = { 0xfff2, 
0x0 };
 VECT_VAR_DECL(expected_st3_1,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st3_1,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st3_1,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st3_1,hfloat,32,2) [] = { 0xc160, 0x0 };
 VECT_VAR_DECL(expected_st3_1,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
@@ -97,6 +107,8 @@ VECT_VAR_DECL(expected_st3_1,uint,16,8) [] = { 0x0, 0x0, 
0x0, 0x0,
 VECT_VAR_DECL(expected_st3_1,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
 VECT_VAR_DECL(expected_st3_1,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
   0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st3_1,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+0x0, 0x0, 0x0, 0x0 };
 

[Patch ARM/AArch64 05/11] Add missing vreinterpretq_p{8,16} tests.

2016-05-11 Thread Christophe Lyon
2016-05-02  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c: Add
missing tests for vreinterpretq_p{8,16}.

Change-Id: I7e9bb18c668c34685f12aa578868d7752232a96c

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
index d4e5768..2570f73 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
@@ -371,6 +371,83 @@ VECT_VAR_DECL(expected_q_u64_8,uint,64,2) [] = { 
0xf7f6f5f4f3f2f1f0,
 VECT_VAR_DECL(expected_q_u64_9,uint,64,2) [] = { 0xfff3fff2fff1fff0,
 0xfff7fff6fff5fff4 };
 
+
+/* Expected results for vreinterpretq_p8_xx.  */
+VECT_VAR_DECL(expected_q_p8_1,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+   0xf4, 0xf5, 0xf6, 0xf7,
+   0xf8, 0xf9, 0xfa, 0xfb,
+   0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected_q_p8_2,poly,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
+   0xf2, 0xff, 0xf3, 0xff,
+   0xf4, 0xff, 0xf5, 0xff,
+   0xf6, 0xff, 0xf7, 0xff };
+VECT_VAR_DECL(expected_q_p8_3,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
+   0xf1, 0xff, 0xff, 0xff,
+   0xf2, 0xff, 0xff, 0xff,
+   0xf3, 0xff, 0xff, 0xff };
+VECT_VAR_DECL(expected_q_p8_4,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
+   0xff, 0xff, 0xff, 0xff,
+   0xf1, 0xff, 0xff, 0xff,
+   0xff, 0xff, 0xff, 0xff };
+VECT_VAR_DECL(expected_q_p8_5,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+   0xf4, 0xf5, 0xf6, 0xf7,
+   0xf8, 0xf9, 0xfa, 0xfb,
+   0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected_q_p8_6,poly,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
+   0xf2, 0xff, 0xf3, 0xff,
+   0xf4, 0xff, 0xf5, 0xff,
+   0xf6, 0xff, 0xf7, 0xff };
+VECT_VAR_DECL(expected_q_p8_7,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
+   0xf1, 0xff, 0xff, 0xff,
+   0xf2, 0xff, 0xff, 0xff,
+   0xf3, 0xff, 0xff, 0xff };
+VECT_VAR_DECL(expected_q_p8_8,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
+   0xff, 0xff, 0xff, 0xff,
+   0xf1, 0xff, 0xff, 0xff,
+   0xff, 0xff, 0xff, 0xff };
+VECT_VAR_DECL(expected_q_p8_9,poly,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
+   0xf2, 0xff, 0xf3, 0xff,
+   0xf4, 0xff, 0xf5, 0xff,
+   0xf6, 0xff, 0xf7, 0xff };
+
+/* Expected results for vreinterpretq_p16_xx.  */
+VECT_VAR_DECL(expected_q_p16_1,poly,16,8) [] = { 0xf1f0, 0xf3f2,
+0xf5f4, 0xf7f6,
+0xf9f8, 0xfbfa,
+0xfdfc, 0xfffe };
+VECT_VAR_DECL(expected_q_p16_2,poly,16,8) [] = { 0xfff0, 0xfff1,
+0xfff2, 0xfff3,
+0xfff4, 0xfff5,
+0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected_q_p16_3,poly,16,8) [] = { 0xfff0, 0xffff,
+0xfff1, 0xffff,
+0xfff2, 0xffff,
+0xfff3, 0xffff };
+VECT_VAR_DECL(expected_q_p16_4,poly,16,8) [] = { 0xfff0, 0xffff,
+0xffff, 0xffff,
+0xfff1, 0xffff,
+0xffff, 0xffff };
+VECT_VAR_DECL(expected_q_p16_5,poly,16,8) [] = { 0xf1f0, 0xf3f2,
+0xf5f4, 0xf7f6,
+0xf9f8, 0xfbfa,
+0xfdfc, 0xfffe };
+VECT_VAR_DECL(expected_q_p16_6,poly,16,8) [] = { 0xfff0, 0xfff1,
+0xfff2, 0xfff3,
+ 

[Patch ARM/AArch64 07/11] Add missing vget_lane fp16 tests.

2016-05-11 Thread Christophe Lyon
2016-05-02  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/vget_lane.c: Add fp16 tests.
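
The fp16 expected values in this patch are raw half-precision bit patterns.
To make them easier to read, here is a small standalone decoder sketch for the
IEEE binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits).  The
names scale_by_pow2 and half_bits_to_double are invented for this
illustration and are not part of the testsuite; the ARM alternative format
treats exponent 0x1f differently and is not covered.

```c
#include <stdint.h>

/* Multiply X by 2^E without needing libm.  */
static double
scale_by_pow2 (double x, int e)
{
  for (; e > 0; e--)
    x *= 2.0;
  for (; e < 0; e++)
    x /= 2.0;
  return x;
}

/* Decode an IEEE 754 binary16 bit pattern into *OUT.
   Returns 0 on success, -1 for infinities and NaNs (exponent field 0x1f),
   which this sketch does not decode.  */
static int
half_bits_to_double (uint16_t bits, double *out)
{
  int negative = (bits >> 15) & 0x1;
  int exponent = (bits >> 10) & 0x1f;
  int mantissa = bits & 0x3ff;
  double value;

  if (exponent == 0x1f)
    return -1;
  if (exponent == 0)
    /* Zero or subnormal: no implicit leading 1.  */
    value = scale_by_pow2 ((double) mantissa, -24);
  else
    /* Normal number: implicit leading 1, exponent bias 15.  */
    value = scale_by_pow2 (1.0 + mantissa / 1024.0, exponent - 15);

  *out = negative ? -value : value;
  return 0;
}
```

With this, 0xcb80 decodes to -15.0 and 0xca80 to -13.0, which is what lanes 1
and 3 would hold if the input buffer counts up from -16.0 (the buffer contents
are an assumption here, inferred from the other expected values).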

Change-Id: I5fafd1e90baf09588ab9f5444817c74e7d865a20

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c
index 5806050..fe41c5f 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c
@@ -13,6 +13,7 @@ uint32_t   expected_u32  = 0xfffffff1;
 uint64_t   expected_u64  = 0xfffffffffffffff0;
 poly8_t    expected_p8   = 0xf6;
 poly16_t   expected_p16  = 0xfff2;
+hfloat16_t expected_f16  = 0xcb80;
 hfloat32_t expected_f32  = 0xc1700000;
 
 int8_t expectedq_s8  = 0xff;
@@ -25,6 +26,7 @@ uint32_t   expectedq_u32 = 0xfffffff2;
 uint64_t   expectedq_u64 = 0xfffffffffffffff1;
 poly8_t    expectedq_p8  = 0xfe;
 poly16_t   expectedq_p16 = 0xfff6;
+hfloat16_t expectedq_f16 = 0xca80;
 hfloat32_t expectedq_f32 = 0xc1500000;
 
 int error_found = 0;
@@ -52,6 +54,10 @@ void exec_vget_lane (void)
 uint32_t var_int32;
 float32_t var_float32;
   } var_int32_float32;
+  union {
+uint16_t var_int16;
+float16_t var_float16;
+  } var_int16_float16;
 
 #define TEST_VGET_LANE_FP(Q, T1, T2, W, N, L) \
   VAR(var, T1, W) = vget##Q##_lane_##T2##W(VECT_VAR(vector, T1, W, N), L); \
@@ -81,10 +87,17 @@ void exec_vget_lane (void)
   VAR_DECL(var, uint, 64);
   VAR_DECL(var, poly, 8);
   VAR_DECL(var, poly, 16);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VAR_DECL(var, float, 16);
+#endif
   VAR_DECL(var, float, 32);
 
   /* Initialize input values.  */
   TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector, buffer, , float, f, 32, 2);
   VLOAD(vector, buffer, q, float, f, 32, 4);
 
@@ -99,6 +112,9 @@ void exec_vget_lane (void)
   TEST_VGET_LANE(, uint, u, 64, 1, 0);
   TEST_VGET_LANE(, poly, p, 8, 8, 6);
   TEST_VGET_LANE(, poly, p, 16, 4, 2);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VGET_LANE_FP(, float, f, 16, 4, 1);
+#endif
   TEST_VGET_LANE_FP(, float, f, 32, 2, 1);
 
   TEST_VGET_LANE(q, int, s, 8, 16, 15);
@@ -111,6 +127,9 @@ void exec_vget_lane (void)
   TEST_VGET_LANE(q, uint, u, 64, 2, 1);
   TEST_VGET_LANE(q, poly, p, 8, 16, 14);
   TEST_VGET_LANE(q, poly, p, 16, 8, 6);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VGET_LANE_FP(q, float, f, 16, 8, 3);
+#endif
   TEST_VGET_LANE_FP(q, float, f, 32, 4, 3);
 }
 
-- 
1.9.1



[Patch ARM/AArch64 09/11] Add missing vrnd{,a,m,n,p,x} tests.

2016-05-11 Thread Christophe Lyon
2016-05-02  Christophe Lyon  

* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c: New.
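
The shared vrndX.inc file relies on extra macro levels so that INSN is
expanded before it is token-pasted into an intrinsic name.  A standalone
sketch of that preprocessor behaviour follows; PASTE_RAW, PASTE, XSTR, STR and
the two functions are names invented for this illustration (not part of the
testsuite), and INSN is set to vrnda only as an example.

```c
/* The instruction name, as each vrndX test file would define it.  */
#define INSN vrnda

/* Pastes its arguments as written: operands of ## are NOT macro-expanded.  */
#define PASTE_RAW(A, B) A##_##B
/* Extra level: A and B are fully expanded before reaching PASTE_RAW.  */
#define PASTE(A, B) PASTE_RAW (A, B)

/* Two-level stringification, so the argument is expanded first.  */
#define XSTR(X) #X
#define STR(X) XSTR (X)

/* Without the extra level, the literal token INSN gets pasted.  */
static const char *
pasted_without_expansion (void)
{
  return STR (PASTE_RAW (INSN, f32));
}

/* With the extra level, INSN is replaced by vrnda before pasting.  */
static const char *
pasted_with_expansion (void)
{
  return STR (PASTE (INSN, f32));
}
```

The first function yields the string "INSN_f32" and the second "vrnda_f32",
which is why a single macro using INSN##Q##_##T2##W directly would not produce
a valid intrinsic name.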

Change-Id: Iab5f98dc4b15f9a2f61b622a9f62b207872f1737

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
new file mode 100644
index 0000000..5f492d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-add-options arm_v8_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
+  0xc1600000, 0xc1500000 };
+
+#define INSN vrnd
+#define TEST_MSG "VRND"
+
+#include "vrndX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
new file mode 100644
index 0000000..629240d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
@@ -0,0 +1,43 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1 (NAME)
+
+void FNNAME (INSN) (void)
+{
+  /* vector_res = vrndX (vector), then store the result.  */
+#define TEST_VRND2(INSN, Q, T1, T2, W, N)  \
+  VECT_VAR (vector_res, T1, W, N) =\
+INSN##Q##_##T2##W (VECT_VAR (vector, T1, W, N));   \
+vst1##Q##_##T2##W (VECT_VAR (result, T1, W, N),\
+  VECT_VAR (vector_res, T1, W, N))
+
+  /* Two auxiliary macros are necessary to expand INSN.  */
+#define TEST_VRND1(INSN, Q, T1, T2, W, N)  \
+  TEST_VRND2 (INSN, Q, T1, T2, W, N)
+
+#define TEST_VRND(Q, T1, T2, W, N) \
+  TEST_VRND1 (INSN, Q, T1, T2, W, N)
+
+  DECL_VARIABLE (vector, float, 32, 2);
+  DECL_VARIABLE (vector, float, 32, 4);
+
+  DECL_VARIABLE (vector_res, float, 32, 2);
+  DECL_VARIABLE (vector_res, float, 32, 4);
+
+  clean_results ();
+
+  VLOAD (vector, buffer, , float, f, 32, 2);
+  VLOAD (vector, buffer, q, float, f, 32, 4);
+
+  TEST_VRND ( , float, f, 32, 2);
+  TEST_VRND (q, float, f, 32, 4);
+
+  CHECK_FP (TEST_MSG, float, 32, 2, PRIx32, expected, "");
+  CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
+}
+
+int
+main (void)
+{
+  FNNAME (INSN) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
new file mode 100644
index 000..816fd28d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-add-options arm_v8_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
+  0xc1600000, 0xc1500000 };
+
+#define INSN vrnda
+#define TEST_MSG "VRNDA"
+
+#include "vrndX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
new file mode 100644
index 0000000..029880c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-add-options arm_v8_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
+  0xc1600000, 0xc1500000 };
+
+#define INSN vrndm
+#define TEST_MSG "VRNDM"
+
+#include "vrndX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
new file mode 100644
index 0000000..571243c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-add-options arm_v8_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL 

[Patch ARM/AArch64 02/11] We can remove useless #ifdefs from these tests: vmul, vshl and vtst.

2016-05-11 Thread Christophe Lyon
2016-05-02  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/vmul.c: Remove useless #ifdef.
* gcc.target/aarch64/advsimd-intrinsics/vshl.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vtst.c: Likewise.

Change-Id: I1b00b8edc4db6e6457be5bc1f92e8b6e218da644

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
index 0cbb656..63f0d8d 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
@@ -37,10 +37,8 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0x60, 0xca, 0x34, 
0x9e,
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc4c7, 0xc4bac000,
   0xc4ae4ccd, 0xc4a1d999 };
 
-#ifndef INSN_NAME
 #define INSN_NAME vmul
 #define TEST_MSG "VMUL"
-#endif
 
 #define FNNAME1(NAME) exec_ ## NAME
 #define FNNAME(NAME) FNNAME1(NAME)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshl.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshl.c
index 821c11e..e8a57a4 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshl.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshl.c
@@ -101,10 +101,8 @@ VECT_VAR_DECL(expected_negative_shift,uint,64,2) [] = { 
0x7ff,
0x7ff };
 
 
-#ifndef INSN_NAME
 #define INSN_NAME vshl
 #define TEST_MSG "VSHL/VSHLQ"
-#endif
 
 #define FNNAME1(NAME) exec_ ## NAME
 #define FNNAME(NAME) FNNAME1(NAME)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
index 7f96540..9e74ffb 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
@@ -32,10 +32,8 @@ VECT_VAR_DECL(expected_unsigned,uint,16,8) [] = { 0x0, 
0x,
 VECT_VAR_DECL(expected_unsigned,uint,32,4) [] = { 0x0, 0x,
  0x0, 0x };
 
-#ifndef INSN_NAME
 #define INSN_NAME vtst
 #define TEST_MSG "VTST/VTSTQ"
-#endif
 
 /* We can't use the standard ref_v_binary_op.c template because vtst
has no 64 bits variant, and outputs are always of uint type.  */
-- 
1.9.1



[Patch ARM/AArch64 10/11] Add missing tests for intrinsics operating on poly64 and poly128 types.

2016-05-11 Thread Christophe Lyon
2016-05-02  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (result):
Add poly64x1_t and poly64x2_t cases if supported.
* gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
(buffer, buffer_pad, buffer_dup, buffer_dup_pad): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/p64_p128.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c: New file.

Change-Id: Ie9bb0c4fd0b8f04fb37668cdb315eaafd06e55c4

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index a2c160c..8664dfc 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -133,6 +133,9 @@ static ARRAY(result, uint, 32, 2);
 static ARRAY(result, uint, 64, 1);
 static ARRAY(result, poly, 8, 8);
 static ARRAY(result, poly, 16, 4);
+#if defined (__ARM_FEATURE_CRYPTO)
+static ARRAY(result, poly, 64, 1);
+#endif
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 static ARRAY(result, float, 16, 4);
 #endif
@@ -147,6 +150,9 @@ static ARRAY(result, uint, 32, 4);
 static ARRAY(result, uint, 64, 2);
 static ARRAY(result, poly, 8, 16);
 static ARRAY(result, poly, 16, 8);
+#if defined (__ARM_FEATURE_CRYPTO)
+static ARRAY(result, poly, 64, 2);
+#endif
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 static ARRAY(result, float, 16, 8);
 #endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
index c8d4336..f8c4aef 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
@@ -118,6 +118,10 @@ VECT_VAR_DECL_INIT(buffer, uint, 32, 2);
 PAD(buffer_pad, uint, 32, 2);
 VECT_VAR_DECL_INIT(buffer, uint, 64, 1);
 PAD(buffer_pad, uint, 64, 1);
+#if defined (__ARM_FEATURE_CRYPTO)
+VECT_VAR_DECL_INIT(buffer, poly, 64, 1);
+PAD(buffer_pad, poly, 64, 1);
+#endif
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 VECT_VAR_DECL_INIT(buffer, float, 16, 4);
 PAD(buffer_pad, float, 16, 4);
@@ -144,6 +148,10 @@ VECT_VAR_DECL_INIT(buffer, poly, 8, 16);
 PAD(buffer_pad, poly, 8, 16);
 VECT_VAR_DECL_INIT(buffer, poly, 16, 8);
 PAD(buffer_pad, poly, 16, 8);
+#if defined (__ARM_FEATURE_CRYPTO)
+VECT_VAR_DECL_INIT(buffer, poly, 64, 2);
+PAD(buffer_pad, poly, 64, 2);
+#endif
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 VECT_VAR_DECL_INIT(buffer, float, 16, 8);
 PAD(buffer_pad, float, 16, 8);
@@ -178,6 +186,10 @@ VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 8);
 VECT_VAR_DECL(buffer_dup_pad, poly, 8, 8);
 VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 4);
 VECT_VAR_DECL(buffer_dup_pad, poly, 16, 4);
+#if defined (__ARM_FEATURE_CRYPTO)
+VECT_VAR_DECL_INIT4(buffer_dup, poly, 64, 1);
+VECT_VAR_DECL(buffer_dup_pad, poly, 64, 1);
+#endif
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 VECT_VAR_DECL_INIT4(buffer_dup, float, 16, 4);
 VECT_VAR_DECL(buffer_dup_pad, float, 16, 4);
@@ -205,6 +217,10 @@ VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 16);
 VECT_VAR_DECL(buffer_dup_pad, poly, 8, 16);
 VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 8);
 VECT_VAR_DECL(buffer_dup_pad, poly, 16, 8);
+#if defined (__ARM_FEATURE_CRYPTO)
+VECT_VAR_DECL_INIT4(buffer_dup, poly, 64, 2);
+VECT_VAR_DECL(buffer_dup_pad, poly, 64, 2);
+#endif
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 VECT_VAR_DECL_INIT(buffer_dup, float, 16, 8);
 VECT_VAR_DECL(buffer_dup_pad, float, 16, 8);
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
new file mode 100644
index 0000000..ced3884
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
@@ -0,0 +1,665 @@
+/* This file contains tests for all the *p64 intrinsics, except for
+   vreinterpret which have their own testcase.  */
+
+/* { dg-require-effective-target arm_crypto_ok } */
+/* { dg-add-options arm_crypto } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results: vbsl.  */
+VECT_VAR_DECL(vbsl_expected,poly,64,1) [] = { 0xfff1 };
+VECT_VAR_DECL(vbsl_expected,poly,64,2) [] = { 0xfff1,
+ 0xfff1 };
+
+/* Expected results: vceq.  */
+VECT_VAR_DECL(vceq_expected,uint,64,1) [] = { 0x0 };
+
+/* Expected results: vcombine.  */
+VECT_VAR_DECL(vcombine_expected,poly,64,2) [] = { 0xfff0, 0x88 };
+
+/* Expected results: vcreate.  */
+VECT_VAR_DECL(vcreate_expected,poly,64,1) [] = { 

[Patch ARM/AArch64 06/11] Add missing vtst_p8 and vtstq_p8 tests.

2016-05-11 Thread Christophe Lyon
2016-05-02  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/vtst.c: Add tests
for vtst_p8 and vtstq_p8.
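
For readers unfamiliar with vtst: each output lane is all ones when the
bitwise AND of the two input lanes is non-zero, and zero otherwise.  A scalar
model of the 8-bit case is sketched below; vtst_u8_model is a name invented
for this illustration and is not the intrinsic itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an 8-bit "test bits" operation over N lanes:
   OUT[i] is 0xff if A[i] and B[i] share at least one set bit, else 0.  */
static void
vtst_u8_model (const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (a[i] & b[i]) ? 0xff : 0x00;
}
```

Assuming the poly8 input buffer holds 0xf0, 0xf1, ... (an assumption based on
the other poly8 tests) and the second operand is 15 in every lane, only the
first lane has no common bits, which gives the 0x0, 0xff, 0xff, ... pattern
in expected_poly.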

Change-Id: Id555a9b3214945506a106e2465b42d38bf76a3a7

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
index 9e74ffb..4c7ee79 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
@@ -32,6 +32,14 @@ VECT_VAR_DECL(expected_unsigned,uint,16,8) [] = { 0x0, 
0x,
 VECT_VAR_DECL(expected_unsigned,uint,32,4) [] = { 0x0, 0x,
  0x0, 0x };
 
+/* Expected results with poly input.  */
+VECT_VAR_DECL(expected_poly,uint,8,8) [] = { 0x0, 0xff, 0xff, 0xff,
+0xff, 0xff, 0xff, 0xff };
+VECT_VAR_DECL(expected_poly,uint,8,16) [] = { 0x0, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff };
+
 #define INSN_NAME vtst
 #define TEST_MSG "VTST/VTSTQ"
 
@@ -71,12 +79,14 @@ FNNAME (INSN_NAME)
   VDUP(vector2, , uint, u, 8, 8, 15);
   VDUP(vector2, , uint, u, 16, 4, 5);
   VDUP(vector2, , uint, u, 32, 2, 1);
+  VDUP(vector2, , poly, p, 8, 8, 15);
   VDUP(vector2, q, int, s, 8, 16, 15);
   VDUP(vector2, q, int, s, 16, 8, 5);
   VDUP(vector2, q, int, s, 32, 4, 1);
   VDUP(vector2, q, uint, u, 8, 16, 15);
   VDUP(vector2, q, uint, u, 16, 8, 5);
   VDUP(vector2, q, uint, u, 32, 4, 1);
+  VDUP(vector2, q, poly, p, 8, 16, 15);
 
 #define TEST_MACRO_NO64BIT_VARIANT_1_5(MACRO, VAR, T1, T2) \
   MACRO(VAR, , T1, T2, 8, 8);  \
@@ -109,6 +119,14 @@ FNNAME (INSN_NAME)
   CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_unsigned, CMT);
   CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_unsigned, CMT);
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_unsigned, CMT);
+
+  /* Now, test the variants with poly8 as input.  */
+#undef CMT
+#define CMT " (poly input)"
+  TEST_BINARY_OP(INSN_NAME, , poly, p, 8, 8);
+  TEST_BINARY_OP(INSN_NAME, q, poly, p, 8, 16);
+  CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_poly, CMT);
+  CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_poly, CMT);
 }
 
 int main (void)
-- 
1.9.1
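
For readers without the NEON headers at hand, the expected_poly values above can be reproduced with a plain C model of the "test bits" operation. This is only a sketch: the toy_* names are hypothetical, and it assumes that the poly8 input lanes hold 0xf0..0xf7 (the pattern that the reference data appears to use) and that the second operand is 15 in every lane, as set by the VDUP lines in the patch.

```c
#include <stdint.h>

/* Per-lane model: the result lane is all-ones when the two inputs
   share at least one set bit, and zero otherwise.  */
static uint8_t toy_tst8(uint8_t a, uint8_t b)
{
  return (a & b) != 0 ? 0xff : 0x00;
}

/* Apply the per-lane model to LANES lanes.  */
static void toy_vtst8(const uint8_t *a, const uint8_t *b,
                      uint8_t *out, int lanes)
{
  for (int i = 0; i < lanes; i++)
    out[i] = toy_tst8(a[i], b[i]);
}
```

Under those assumptions, only the first lane (0xf0 & 0x0f) yields zero, which is consistent with the expected results in the patch.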



[Patch ARM/AArch64 11/11] Add missing tests for vreinterpret, operating of fp16 type.

2016-05-11 Thread Christophe Lyon
2016-05-04  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c: Add fp16 tests.
* gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c: Likewise.

Change-Id: Ic8061f1a5f3e042844a33a70c0f42a5f92c43c98
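
As a reading aid for the new expected values below: the 16-bit patterns 0xcc00, 0xcb80, 0xcb00 and 0xca80 are the fp16 encodings of -16.0, -15.0, -14.0 and -13.0, and the 8-, 32- and 64-bit expectations are the same bytes regrouped. The following is a minimal sketch of that regrouping, assuming a little-endian lane layout; the toy_* helpers are hypothetical and are not part of the testsuite.

```c
#include <stdint.h>

/* Split 16-bit lanes into bytes, least significant byte first.  */
static void toy_lanes16_to_bytes(const uint16_t *in, int lanes,
                                 uint8_t *out)
{
  for (int i = 0; i < lanes; i++)
    {
      out[2 * i] = (uint8_t) (in[i] & 0xff);
      out[2 * i + 1] = (uint8_t) (in[i] >> 8);
    }
}

/* Reassemble WIDTH little-endian bytes (WIDTH <= 8) into one lane.  */
static uint64_t toy_bytes_to_lane(const uint8_t *bytes, int width)
{
  uint64_t v = 0;
  for (int i = width - 1; i >= 0; i--)
    v = (v << 8) | bytes[i];
  return v;
}
```

Feeding the four fp16 patterns through toy_lanes16_to_bytes gives the byte sequence used for the s8/u8 expectations, and toy_bytes_to_lane with a width of 4 or 8 gives the s32/u32 and s64/u64 values.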

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
index 2570f73..0de2ab3 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
@@ -21,6 +21,8 @@ VECT_VAR_DECL(expected_s8_8,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 
0xf3,
0xf4, 0xf5, 0xf6, 0xf7 };
 VECT_VAR_DECL(expected_s8_9,int,8,8) [] = { 0xf0, 0xff, 0xf1, 0xff,
0xf2, 0xff, 0xf3, 0xff };
+VECT_VAR_DECL(expected_s8_10,int,8,8) [] = { 0x00, 0xcc, 0x80, 0xcb,
+0x00, 0xcb, 0x80, 0xca };
 
 /* Expected results for vreinterpret_s16_xx.  */
 VECT_VAR_DECL(expected_s16_1,int,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
@@ -32,6 +34,7 @@ VECT_VAR_DECL(expected_s16_6,int,16,4) [] = { 0xfff0, 0x, 
0xfff1, 0x };
 VECT_VAR_DECL(expected_s16_7,int,16,4) [] = { 0xfff0, 0x, 0x, 0x };
 VECT_VAR_DECL(expected_s16_8,int,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
 VECT_VAR_DECL(expected_s16_9,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_s16_10,int,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 
};
 
 /* Expected results for vreinterpret_s32_xx.  */
 VECT_VAR_DECL(expected_s32_1,int,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
@@ -43,6 +46,7 @@ VECT_VAR_DECL(expected_s32_6,int,32,2) [] = { 0xfff0, 
0xfff1 };
 VECT_VAR_DECL(expected_s32_7,int,32,2) [] = { 0xfff0, 0x };
 VECT_VAR_DECL(expected_s32_8,int,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
 VECT_VAR_DECL(expected_s32_9,int,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
+VECT_VAR_DECL(expected_s32_10,int,32,2) [] = { 0xcb80cc00, 0xca80cb00 };
 
 /* Expected results for vreinterpret_s64_xx.  */
 VECT_VAR_DECL(expected_s64_1,int,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
@@ -54,6 +58,7 @@ VECT_VAR_DECL(expected_s64_6,int,64,1) [] = { 
0xfff1fff0 };
 VECT_VAR_DECL(expected_s64_7,int,64,1) [] = { 0xfff0 };
 VECT_VAR_DECL(expected_s64_8,int,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
 VECT_VAR_DECL(expected_s64_9,int,64,1) [] = { 0xfff3fff2fff1fff0 };
+VECT_VAR_DECL(expected_s64_10,int,64,1) [] = { 0xca80cb00cb80cc00 };
 
 /* Expected results for vreinterpret_u8_xx.  */
 VECT_VAR_DECL(expected_u8_1,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
@@ -74,6 +79,8 @@ VECT_VAR_DECL(expected_u8_8,uint,8,8) [] = { 0xf0, 0xf1, 
0xf2, 0xf3,
 0xf4, 0xf5, 0xf6, 0xf7 };
 VECT_VAR_DECL(expected_u8_9,uint,8,8) [] = { 0xf0, 0xff, 0xf1, 0xff,
 0xf2, 0xff, 0xf3, 0xff };
+VECT_VAR_DECL(expected_u8_10,uint,8,8) [] = { 0x00, 0xcc, 0x80, 0xcb,
+ 0x00, 0xcb, 0x80, 0xca };
 
 /* Expected results for vreinterpret_u16_xx.  */
 VECT_VAR_DECL(expected_u16_1,uint,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 
};
@@ -85,6 +92,7 @@ VECT_VAR_DECL(expected_u16_6,uint,16,4) [] = { 0xfff0, 
0x, 0xfff1, 0x };
 VECT_VAR_DECL(expected_u16_7,uint,16,4) [] = { 0xfff0, 0x, 0x, 0x 
};
 VECT_VAR_DECL(expected_u16_8,uint,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 
};
 VECT_VAR_DECL(expected_u16_9,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 
};
+VECT_VAR_DECL(expected_u16_10,uint,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 
};
 
 /* Expected results for vreinterpret_u32_xx.  */
 VECT_VAR_DECL(expected_u32_1,uint,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
@@ -96,6 +104,7 @@ VECT_VAR_DECL(expected_u32_6,uint,32,2) [] = { 0xfff1fff0, 
0xfff3fff2 };
 VECT_VAR_DECL(expected_u32_7,uint,32,2) [] = { 0xfff0, 0x };
 VECT_VAR_DECL(expected_u32_8,uint,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
 VECT_VAR_DECL(expected_u32_9,uint,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
+VECT_VAR_DECL(expected_u32_10,uint,32,2) [] = { 0xcb80cc00, 0xca80cb00 };
 
 /* Expected results for vreinterpret_u64_xx.  */
 VECT_VAR_DECL(expected_u64_1,uint,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
@@ -107,6 +116,7 @@ VECT_VAR_DECL(expected_u64_6,uint,64,1) [] = { 
0xfff3fff2fff1fff0 };
 VECT_VAR_DECL(expected_u64_7,uint,64,1) [] = { 0xfff1fff0 };
 VECT_VAR_DECL(expected_u64_8,uint,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
 VECT_VAR_DECL(expected_u64_9,uint,64,1) [] = { 0xfff3fff2fff1fff0 };
+VECT_VAR_DECL(expected_u64_10,uint,64,1) [] = { 0xca80cb00cb80cc00 };
 
 /* Expected results for vreinterpret_p8_xx.  */
 VECT_VAR_DECL(expected_p8_1,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
@@ -127,6 +137,8 @@ VECT_VAR_DECL(expected_p8_8,poly,8,8) [] = { 0xf0, 

[Patch ARM/AArch64 00/11][testsuite] AdvSIMD intrinsics update

2016-05-11 Thread Christophe Lyon
Hi,

A few months ago, we decided it was time to remove neon-testgen.ml
and its generated tests. I did so, only to realize too late that
some intrinsics were no longer covered, so I reverted the removal.

This patch series performs a little cleanup and adds the missing
tests to cover everything defined in arm_neon.h for AArch32.

Overall, this consists of adding tests for:
- missing poly8 and poly16 for vreinterpret and vtst
- fp16 tests for vget_lane, vstX_lane and vreinterpret
- armv8 vrnd{,a,m,n,p,x}
- tests for poly64 and poly128 intrinsics

Some intrinsics are not covered in aarch64/advsimd-intrinsics, but in
arm/crypto: vldrq, vstrq, vaes, vsha1, vsha256, vmull_p64,
vmull_high_p64.

Patches 1-4 are cleanup.
Patch 5 adds the missing poly8 and poly16 tests for vreinterpret.
Patch 6 adds the missing tests for vtst_p8 and vtstq_p8.
Patches 7, 8 and 11 add the missing fp16 tests.
Patch 9 adds armv8 vrnd{,a,m,n,p,x} tests.
Patch 10 adds tests for poly64 and poly128 operations.

I've checked the coverage by building the list of intrinsics tested
via neon-testgen.ml, the list of intrinsics defined in arm_neon.h, and
running the advsimd-intrinsics.exp tests with -save-temps to gather
the list of actually tested intrinsics.

This series partly addresses PR 70369 which I created to keep track
of these missing intrinsics tests: several AArch64 AdvSIMD intrinsics
are still missing tests.

Tested with QEMU on arm* and aarch64*, with no regression, and
several new PASSes.

OK for trunk?

Christophe

Christophe Lyon (11):
  Fix typo in vreinterpret.c test comment.
  We can remove useless #ifdefs from these tests: vmul, vshl and vtst.
  AdvSIMD tests: be more verbose.
  Add forgotten vsliq_n_u64 test.
  Add missing vreinterpretq_p{8,16} tests.
  Add missing vtst_p8 and vtstq_p8 tests.
  Add missing vget_lane fp16 tests.
  Add missing vstX_lane fp16 tests.
  Add missing vrnd{,a,m,n,p,x} tests.
  Add missing tests for intrinsics operating on poly64 and poly128
types.
  Add missing tests for vreinterpret, operating of fp16 type.

 .../aarch64/advsimd-intrinsics/arm-neon-ref.h  |  13 +-
 .../aarch64/advsimd-intrinsics/compute-ref-data.h  |  16 +
 .../aarch64/advsimd-intrinsics/p64_p128.c  | 691 +
 .../aarch64/advsimd-intrinsics/vget_lane.c |  19 +
 .../gcc.target/aarch64/advsimd-intrinsics/vmul.c   |   2 -
 .../aarch64/advsimd-intrinsics/vreinterpret.c  | 255 +++-
 .../aarch64/advsimd-intrinsics/vreinterpret_p128.c | 160 +
 .../aarch64/advsimd-intrinsics/vreinterpret_p64.c  | 202 ++
 .../gcc.target/aarch64/advsimd-intrinsics/vrnd.c   |  16 +
 .../aarch64/advsimd-intrinsics/vrndX.inc   |  43 ++
 .../gcc.target/aarch64/advsimd-intrinsics/vrnda.c  |  16 +
 .../gcc.target/aarch64/advsimd-intrinsics/vrndm.c  |  16 +
 .../gcc.target/aarch64/advsimd-intrinsics/vrndn.c  |  16 +
 .../gcc.target/aarch64/advsimd-intrinsics/vrndp.c  |  16 +
 .../gcc.target/aarch64/advsimd-intrinsics/vrndx.c  |  16 +
 .../gcc.target/aarch64/advsimd-intrinsics/vshl.c   |   2 -
 .../gcc.target/aarch64/advsimd-intrinsics/vsli_n.c |   1 +
 .../aarch64/advsimd-intrinsics/vstX_lane.c | 105 +++-
 .../gcc.target/aarch64/advsimd-intrinsics/vtst.c   |  20 +-
 19 files changed, 1612 insertions(+), 13 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c

-- 
1.9.1



[Patch ARM/AArch64 01/11] Fix typo in vreinterpret.c test comment.

2016-05-11 Thread Christophe Lyon
2016-05-02  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c: Fix typo in 
comment.

Change-Id: I7244c0dc0a5ab2dbcec65b40c050f72f92707139

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
index 9e45e25..d4e5768 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
@@ -405,7 +405,7 @@ VECT_VAR_DECL(expected_q_f32_9,hfloat,32,4) [] = { 
0xf3f2f1f0, 0xf7f6f5f4,
 VECT_VAR_DECL(expected_q_f32_10,hfloat,32,4) [] = { 0xfff1fff0, 0xfff3fff2,
0xfff5fff4, 0xfff7fff6 };
 
-/* Expected results for vreinterpretq_xx_f32.  */
+/* Expected results for vreinterpret_xx_f32.  */
 VECT_VAR_DECL(expected_xx_f32_1,int,8,8) [] = { 0x0, 0x0, 0x80, 0xc1,
0x0, 0x0, 0x70, 0xc1 };
 VECT_VAR_DECL(expected_xx_f32_2,int,16,4) [] = { 0x0, 0xc180, 0x0, 0xc170 };
-- 
1.9.1



[committed] Fix up matmul in !$omp workshare (PR fortran/70855)

2016-05-11 Thread Jakub Jelinek
Hi!

The parsing of !$omp workshare allows only a small, precisely defined set of
statements inside the construct, and the inlining of matmul assignments
breaks that.

Fixed by disabling the inlining in !$omp workshare regions.

Long term, it would be nice to inline those in !$omp workshare again and
actually parallelize, but I guess in that case it should not be done during
frontend passes, but during workshare translation into generic - and would
need to arrange for proper worksharing of both of the actions (clearing of
the target array as well as actually computing it).

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
and 6.2.

2016-05-11  Jakub Jelinek  

PR fortran/70855
* frontend-passes.c (inline_matmul_assign): Disable in !$omp workshare.

* gfortran.dg/gomp/pr70855.f90: New test.

--- gcc/fortran/frontend-passes.c.jj2016-02-29 19:33:09.0 +0100
+++ gcc/fortran/frontend-passes.c   2016-05-11 11:54:43.747298277 +0200
@@ -2812,6 +2812,12 @@ inline_matmul_assign (gfc_code **c, int
   if (in_where)
 return 0;
 
+  /* For now don't do anything in OpenMP workshare, it confuses
+ its translation, which expects only the allowed statements in there.
+ We should figure out how to parallelize this eventually.  */
+  if (in_omp_workshare)
+return 0;
+
   expr1 = co->expr1;
   expr2 = co->expr2;
   if (expr2->expr_type != EXPR_FUNCTION
--- gcc/testsuite/gfortran.dg/gomp/pr70855.f90.jj   2016-05-11 
12:03:37.627977013 +0200
+++ gcc/testsuite/gfortran.dg/gomp/pr70855.f90  2016-05-11 12:04:06.927575148 
+0200
@@ -0,0 +1,18 @@
+! PR fortran/70855
+! { dg-do compile }
+! { dg-additional-options "-O2" }
+
+program pr70855
+   integer, parameter :: m = 4
+   integer, parameter :: n = 2
+   real :: a(m,n)
+   real :: x(n)
+   real :: y(m)
+   a = 1.0
+   x = 1.0
+!$omp parallel
+!$omp workshare
+   y(1:m) = matmul ( a(1:m,1:n), x(1:n) )
+!$omp end workshare
+!$omp end parallel
+end program pr70855

Jakub


Re: [PATCH] Fix PR70986

2016-05-11 Thread Richard Biener
On Wed, 11 May 2016, Richard Biener wrote:

> On Tue, 10 May 2016, Richard Biener wrote:
> 
> > 
> > ifcvt is confused about fake edges not being loop exits which the 
> > following fixes.
> > 
> > Bootstrap / regtest pending on x86_64-unknown-linux-gnu.
> 
> Didn't fare well.  The following patch mitigates the issue as well
> but in the end makes it just latent - it is still something I wanted
> to pursue anyway, seeing the sometimes odd choices for added
> fake edges.  The patch reduces the number from two to one
> and adds it to the real dead-end infinite loop only for the
> testcases.
> 
> 2016-05-11  Richard Biener  
> 
>   PR tree-optimization/70986
>   * cfganal.c (dfs_find_deadend): Prefer to take edges exiting
>   loops.
> 
>   * gcc.dg/torture/pr70986-1.c: New testcase.
>   * gcc.dg/torture/pr70986-2.c: Likewise.
>   * gcc.dg/torture/pr70986-3.c: Likewise.
> 
> Index: gcc/testsuite/gcc.dg/torture/pr70986-1.c
> ===
> --- gcc/testsuite/gcc.dg/torture/pr70986-1.c  (revision 0)
> +++ gcc/testsuite/gcc.dg/torture/pr70986-1.c  (working copy)
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +
> +int a, g;
> +char b, c;
> +short d, e, f;
> +
> +char
> +fn1 ()
> +{
> +  return a ? a : 1;
> +}
> +
> +void
> +fn2 ()
> +{
> +  char h;
> +  for (; d;)
> +for (; e; e++)
> +  c = (fn1 () && h) & !(f |= 9 ^ (b > (g = c)));
> +  for (;;)
> +;
> +}
> Index: gcc/testsuite/gcc.dg/torture/pr70986-2.c
> ===
> --- gcc/testsuite/gcc.dg/torture/pr70986-2.c  (revision 0)
> +++ gcc/testsuite/gcc.dg/torture/pr70986-2.c  (working copy)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +
> +int gi, dg;
> +
> +void
> +fe (void)
> +{
> +  int ka = gi;
> +
> +  for (;;)
> +{
> +  if (ka != 0)
> + {
> +   if (dg != 0)
> + gi = 0;
> +   ++ka;
> + }
> +  ++dg;
> +}
> +}
> Index: gcc/testsuite/gcc.dg/torture/pr70986-3.c
> ===
> --- gcc/testsuite/gcc.dg/torture/pr70986-3.c  (revision 0)
> +++ gcc/testsuite/gcc.dg/torture/pr70986-3.c  (working copy)
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +
> +int a, b;
> +int
> +fn1 (int p1)
> +{
> +  return p1 < 0 ? p1 : a;
> +}
> +
> +void
> +fn2 ()
> +{
> +lbl_100:
> +  b = 1;
> +  for (; b != 21; b = fn1 (b))
> +;
> +  goto lbl_100;
> +}
> Index: gcc/cfganal.c
> ===
> *** gcc/cfganal.c (revision 236069)
> --- gcc/cfganal.c (working copy)
> *** dfs_find_deadend (basic_block bb)
> *** 747,753 
> return bb;
>   }
>   
> !   bb = EDGE_SUCC (bb, 0)->dest;
>   }
>   
> gcc_unreachable ();
> --- 751,771 
> return bb;
>   }
>   
> !   /* If we are in an analyzed cycle make sure to try exiting it.
> !  Note this is a heuristic only and expected to work when loop
> !  fixup is needed as well.  */
> !   if (! bb->loop_father
> !   || ! loop_outer (bb->loop_father))
> ! bb = EDGE_SUCC (bb, 0)->dest;
> !   else
> ! {
> !   edge_iterator ei;
> !   edge e;
> !   FOR_EACH_EDGE (e, ei, bb->succs)
> ! if (bb->loop_father != e->dest->loop_father)

Actually I'll test using loop_exit_edge_p (bb->loop_father, e) as the
above may choose to enter a subloop.
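
To illustrate the difference between the two tests, here is a self-contained toy model. The toy_* types and functions are hypothetical stand-ins, not GCC's real data structures; the exit test is written the way I understand loop_exit_edge_p to behave (source inside the loop, destination outside it), so treat it as a sketch under that assumption.

```c
#include <stddef.h>

/* Toy loop tree: each loop points to its enclosing loop.  */
struct toy_loop { struct toy_loop *outer; };
struct toy_block { struct toy_loop *loop_father; };
struct toy_edge { struct toy_block *src, *dest; };

/* Nonzero if INNER is LOOP itself or is nested inside LOOP.  */
static int toy_loop_contains(const struct toy_loop *loop,
                             const struct toy_loop *inner)
{
  for (; inner; inner = inner->outer)
    if (inner == loop)
      return 1;
  return 0;
}

/* Test from the patch as posted: the edge changes loop_father.
   This is also true for an edge that enters a subloop.  */
static int toy_changes_loop(const struct toy_edge *e)
{
  return e->src->loop_father != e->dest->loop_father;
}

/* Exit test: the source is inside LOOP, the destination is not.  */
static int toy_exits_loop(const struct toy_loop *loop,
                          const struct toy_edge *e)
{
  return toy_loop_contains(loop, e->src->loop_father)
         && !toy_loop_contains(loop, e->dest->loop_father);
}
```

An edge from a block in an outer loop to the header of a nested loop satisfies the first test but not the second, which is the case the posted patch could get wrong.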

Richard.


Re: [PATCH] Fix PR70986

2016-05-11 Thread Richard Biener
On Tue, 10 May 2016, Richard Biener wrote:

> 
> ifcvt is confused about fake edges not being loop exits which the 
> following fixes.
> 
> Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Didn't fare well.  The following patch mitigates the issue as well
but in the end makes it just latent - it is still something I wanted
to pursue anyway, seeing the sometimes odd choices for added
fake edges.  The patch reduces the number from two to one
and adds it to the real dead-end infinite loop only for the
testcases.

2016-05-11  Richard Biener  

PR tree-optimization/70986
* cfganal.c (dfs_find_deadend): Prefer to take edges exiting
loops.

* gcc.dg/torture/pr70986-1.c: New testcase.
* gcc.dg/torture/pr70986-2.c: Likewise.
* gcc.dg/torture/pr70986-3.c: Likewise.

Index: gcc/testsuite/gcc.dg/torture/pr70986-1.c
===
--- gcc/testsuite/gcc.dg/torture/pr70986-1.c(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr70986-1.c(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+
+int a, g;
+char b, c;
+short d, e, f;
+
+char
+fn1 ()
+{
+  return a ? a : 1;
+}
+
+void
+fn2 ()
+{
+  char h;
+  for (; d;)
+for (; e; e++)
+  c = (fn1 () && h) & !(f |= 9 ^ (b > (g = c)));
+  for (;;)
+;
+}
Index: gcc/testsuite/gcc.dg/torture/pr70986-2.c
===
--- gcc/testsuite/gcc.dg/torture/pr70986-2.c(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr70986-2.c(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+int gi, dg;
+
+void
+fe (void)
+{
+  int ka = gi;
+
+  for (;;)
+{
+  if (ka != 0)
+   {
+ if (dg != 0)
+   gi = 0;
+ ++ka;
+   }
+  ++dg;
+}
+}
Index: gcc/testsuite/gcc.dg/torture/pr70986-3.c
===
--- gcc/testsuite/gcc.dg/torture/pr70986-3.c(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr70986-3.c(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+
+int a, b;
+int
+fn1 (int p1)
+{
+  return p1 < 0 ? p1 : a;
+}
+
+void
+fn2 ()
+{
+lbl_100:
+  b = 1;
+  for (; b != 21; b = fn1 (b))
+;
+  goto lbl_100;
+}
Index: gcc/cfganal.c
===
*** gcc/cfganal.c   (revision 236069)
--- gcc/cfganal.c   (working copy)
*** dfs_find_deadend (basic_block bb)
*** 747,753 
return bb;
  }
  
!   bb = EDGE_SUCC (bb, 0)->dest;
  }
  
gcc_unreachable ();
--- 751,771 
return bb;
  }
  
!   /* If we are in an analyzed cycle make sure to try exiting it.
!  Note this is a heuristic only and expected to work when loop
!fixup is needed as well.  */
!   if (! bb->loop_father
! || ! loop_outer (bb->loop_father))
!   bb = EDGE_SUCC (bb, 0)->dest;
!   else
!   {
! edge_iterator ei;
! edge e;
! FOR_EACH_EDGE (e, ei, bb->succs)
!   if (bb->loop_father != e->dest->loop_father)
! break;
! bb = e ? e->dest : EDGE_SUCC (bb, 0)->dest;
!   }
  }
  
gcc_unreachable ();



Re: [PATCH] Introduce tests for -fsanitize=use-after-scope

2016-05-11 Thread Martin Liška
On 05/06/2016 01:07 PM, Martin Liška wrote:
> Hi.
> 
> This is a new test coverage for the new sanitizer option.
> 
> Martin

Hello.

This is the second version of the tests. I fixed a test where a variable
overflowed, and a couple of tests were adapted from LLVM's testsuite
(basically rewritten from scratch).

Martin
From 7dd04d12a4bf04ac18dca266f44b18e39e1d711f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 4 May 2016 12:57:05 +0200
Subject: [PATCH 2/2] Introduce tests for -fsanitize=use-after-scope

gcc/testsuite/ChangeLog:

2016-05-10  Martin Liska  

	* g++.dg/asan/use-after-scope-1.C: New test.
	* g++.dg/asan/use-after-scope-2.C: New test.
	* gcc.dg/asan/use-after-scope-1.c: New test.
	* gcc.dg/asan/use-after-scope-2.c: New test.
	* gcc.dg/asan/use-after-scope-3.c: New test.
	* gcc.dg/asan/use-after-scope-4.c: New test.
	* gcc.dg/asan/use-after-scope-5.c: New test.
	* gcc.dg/asan/use-after-scope-goto-1.c: New test.
---
 gcc/testsuite/g++.dg/asan/use-after-scope-1.C  | 22 ++
 gcc/testsuite/g++.dg/asan/use-after-scope-2.C  | 41 ++
 gcc/testsuite/gcc.dg/asan/use-after-scope-1.c  | 19 +
 gcc/testsuite/gcc.dg/asan/use-after-scope-2.c  | 48 ++
 gcc/testsuite/gcc.dg/asan/use-after-scope-3.c  | 21 ++
 gcc/testsuite/gcc.dg/asan/use-after-scope-4.c  | 17 
 gcc/testsuite/gcc.dg/asan/use-after-scope-5.c  | 28 +
 gcc/testsuite/gcc.dg/asan/use-after-scope-goto-1.c | 47 +
 8 files changed, 243 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-1.C
 create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-2.C
 create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-1.c
 create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-2.c
 create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-3.c
 create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-4.c
 create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-5.c
 create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-goto-1.c

diff --git a/gcc/testsuite/g++.dg/asan/use-after-scope-1.C b/gcc/testsuite/g++.dg/asan/use-after-scope-1.C
new file mode 100644
index 000..ed61aed
--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/use-after-scope-1.C
@@ -0,0 +1,22 @@
+// { dg-do run }
+// { dg-additional-options "-fsanitize=use-after-scope" }
+// { dg-shouldfail "asan" }
+
+#include <functional>
+
+int main() {
+  std::function<int()> function;
+  {
+int v = 0;
+function = [&]()
+{
+  return v;
+};
+  }
+  return function();
+}
+
+
+// { dg-output "ERROR: AddressSanitizer: stack-use-after-scope on address.*(\n|\r\n|\r)" }
+// { dg-output "READ of size 4 at.*" }
+// { dg-output ".*'v' <== Memory access at offset \[0-9\]* is inside this variable.*" }
diff --git a/gcc/testsuite/g++.dg/asan/use-after-scope-2.C b/gcc/testsuite/g++.dg/asan/use-after-scope-2.C
new file mode 100644
index 000..d82bc88
--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/use-after-scope-2.C
@@ -0,0 +1,41 @@
+// { dg-do run }
+// { dg-additional-options "-fsanitize=use-after-scope" }
+// { dg-shouldfail "asan" }
+
+#include <stdio.h>
+
+struct Test
+{
+  Test ()
+{
+  my_value = 0;
+}
+
+  ~Test ()
+{
+  fprintf (stderr, "Value: %d\n", *my_value);
+}
+
+  void init (int *v)
+{
+  my_value = v;
+}
+
+  int *my_value;
+};
+
+int main(int argc, char **argv)
+{
+  Test t;
+
+  {
+int x = argc;
+t.init(&x);
+  }
+
+  return 0;
+}
+
+// { dg-output "ERROR: AddressSanitizer: stack-use-after-scope on address.*(\n|\r\n|\r)" }
+// { dg-output "READ of size 4 at.*" }
+// { dg-output ".*'x' <== Memory access at offset \[0-9\]* is inside this variable.*" }
diff --git a/gcc/testsuite/gcc.dg/asan/use-after-scope-1.c b/gcc/testsuite/gcc.dg/asan/use-after-scope-1.c
new file mode 100644
index 000..1420416
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/use-after-scope-1.c
@@ -0,0 +1,19 @@
+// { dg-do run }
+// { dg-additional-options "-fsanitize=use-after-scope" }
+// { dg-shouldfail "asan" }
+
+int
+main (void)
+{
+  char *ptr;
+  {
+char my_char[9];
+ptr = &my_char[0];
+  }
+
+  return *(ptr+8);
+}
+
+// { dg-output "ERROR: AddressSanitizer: stack-use-after-scope on address.*(\n|\r\n|\r)" }
+// { dg-output "READ of size 1 at.*" }
+// { dg-output ".*'my_char' <== Memory access at offset \[0-9\]* is inside this variable.*" }
diff --git a/gcc/testsuite/gcc.dg/asan/use-after-scope-2.c b/gcc/testsuite/gcc.dg/asan/use-after-scope-2.c
new file mode 100644
index 000..96f0082
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/use-after-scope-2.c
@@ -0,0 +1,48 @@
+// { dg-do run }
+// { dg-additional-options "-fsanitize=use-after-scope" }
+// { dg-shouldfail "asan" }
+
+int *bar (int *x, int *y) { return y; }
+
+int foo (void)
+{
+  char *p;
+  {
+char a = 0;
+p = &a;
+  }
+
+  if (*p)
+return 1;
+  else
+return 0;
+}
+
+int
+main (void)
+{
+  char *ptr;
+  {
+

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-11 Thread Martin Liška
On 05/06/2016 02:22 PM, Jakub Jelinek wrote:
> On Fri, May 06, 2016 at 01:04:30PM +0200, Martin Liška wrote:
>> I've started working on the patch a couple of months ago, basically after
>> a brief discussion with Jakub on IRC.
>>
>> I'm sending the initial version which can successfully run instrumented
>> tramp3d, postgresql server and Inkscape. It catches the basic set of
>> examples which are added in following patch.
>>
>> The implementation is quite straightforward as works in following steps:
>>
>> 1) Every local variable stack slot is poisoned at the very beginning of a 
>> function (RTL emission)
>> 2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
>> emitting ASAN_MARK builtin)
>> and the variable is marked as addressable
> 
> Not all vars have DECL_EXPRs though.

Yeah, I've spotted one interesting example which is part of LLVM's testsuite:

struct IntHolder {
  int val;
};

const IntHolder *saved;

void save(const IntHolder &holder) {
  saved = &holder;
}

int main(int argc, char *argv[]) {
  save({10});
  int x = saved->val;  // BOOM
  return x;
}

It would also be good to handle such temporaries. Any suggestions on how to
handle that in the gimplifier?

> 
>> 3) Similarly, BIND_EXPR is the place where we poison the variable (scope 
>> exit)
>> 4) At the very end of the function, we clean up the poisoned memory
>> 5) The builtins are expanded to call to libsanitizer run-time library 
>> (__asan_poison_stack_memory, __asan_unpoison_stack_memory)
>> 6) As the use-after-scope stuff is already included in libsanitizer, no 
>> change is needed for the library
> 
>> As mentioned, it's request for comment as it still has couple of limitations:
>> a) VLA are not supported, which should make sense as we are unable to 
>> allocate a stack slot for that
>> b) we can possibly strip some instrumentation in situations where a variable 
>> is introduced in a very first BB (RTL poisoning is superfluous).
>> Similarly for a very last BB of a function, we can strip end of scope 
>> poisoning (and RTL unpoisoning). I'll do that incrementally.
> 
> Yeah.
> 
>> c) We require -fstack-reuse=none option, maybe it worth to warn a user if 
>> -fsanitize=use-after-scope is provided without the option?
> 
> This should be implicitly set by -fsanitize=use-after-scope.  Only if some
> other -fstack-reuse= option is explicitly set together with
> -fsanitize=use-after-scope, we should warn and reset it anyway.

Handled in v2 of the patch.

> 
>> d) An instrumented binary is quite slow (~20x for tramp3d) as every function 
>> call produces many memory read/writes. I'm wondering whether
>> we should provide a faster alternative (like instrument just variables that 
>> have address taken) ?
> 
> I don't see any point in instrumenting !needs_to_live_in_memory vars,
> at least not those that don't need to live in memory at gimplification time.
> How could one use those after scope?  They can't be accessed by
> dereferencing some pointer, and the symbol itself should be unavailable for
> symbol lookup after the symbol goes out of scope.
> Plus obviously ~20x slowdown isn't acceptable.
> 
> Another thing is what to do with variables that are addressable at
> gimplification time, but generally are made non-addressable afterwards,
> such as due to optimizing away the taking of their address, inlining, etc.
> 
> Perhaps depending on how big slowdown you get after just instrumenting
> needs_to_live_in_memory vars from ~ gimplification time and/or with the
> possible inlining of the poisoning/unpoisoning (again, should be another
> knob), at least with small sized vars, there might be another knob,
> which would tell if vars that are made non-addressable during optimizations
> later on should be instrumented or not.
> E.g. if you ASAN_MARK all needs_to_live_in_memory vars early, you could
> during the addressable determination when the knob says stuff should be
> faster, but less precise, ignore the vars that are addressable just because
> of the ASAN_MARK calls, and if they'd then turn to be non-addressable,
> remove the corresponding ASAN_MARK calls.

By following the instrumentation approach suggested above and emitting the
shadow memory instructions directly, I was able to reduce the tramp3d slowdown
to 3x, and the postgresql server test-suite now runs 2x slower.
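
For anyone not familiar with what direct shadow memory poisoning amounts to, here is a toy model in plain C. It is only a sketch: the toy_* names, the arena size and the flat shadow array are made up for illustration, while the 8-byte granule and the 0xf8 marker are, as far as I know, what libsanitizer uses for use-after-scope. Offsets passed to the poison/unpoison helpers are assumed to be granule-aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_GRANULE = 8, TOY_ARENA = 64, TOY_SCOPE_MAGIC = 0xf8 };

/* One shadow byte describes one 8-byte granule of the toy arena.  */
static uint8_t toy_shadow[TOY_ARENA / TOY_GRANULE];

/* Scope exit: mark every granule touched by the variable as poisoned.  */
static void toy_poison(size_t off, size_t size)
{
  size_t n = (size + TOY_GRANULE - 1) / TOY_GRANULE;
  memset(&toy_shadow[off / TOY_GRANULE], TOY_SCOPE_MAGIC, n);
}

/* Scope entry: full granules become 0 (fully addressable); a trailing
   partial granule stores the number of addressable bytes.  */
static void toy_unpoison(size_t off, size_t size)
{
  size_t full = size / TOY_GRANULE;
  memset(&toy_shadow[off / TOY_GRANULE], 0, full);
  if (size % TOY_GRANULE)
    toy_shadow[off / TOY_GRANULE + full] = (uint8_t) (size % TOY_GRANULE);
}

/* Check for a one-byte access at offset OFF.  */
static int toy_access_ok(size_t off)
{
  int8_t s = (int8_t) toy_shadow[off / TOY_GRANULE];
  if (s == 0)
    return 1;                      /* whole granule addressable */
  if (s < 0)
    return 0;                      /* poisoned */
  return (int) (off % TOY_GRANULE) < s;
}
```

With a 9-byte variable at offset 0, an access to byte 8 passes the check while the variable is unpoisoned and fails once the scope has been left and the range is poisoned again, which mirrors the my_char[9] test in the testsuite patch.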

Apart from that, the second version of the patch makes these changes:
+ fixed issues with missing stack unpoisoning; currently, I mark all VAR_DECLs
that are in ASAN_MARK internal fns, and the stack prologue/epilogue is emitted
just for these vars
+ removed unneeded hunks (tree-vect-patterns.c and asan_poisoning.cc)
+ the LABEL unpoisoning code does a stable sort of the variables that were
already used in the context
+ stack poisoning did not work for -O1+ due to the following guard in asan.c:
 /* Automatic vars in the current function will be always accessible.  */
+ direct shadow memory poisoning/unpoisoning code is introduced - in both 
scenarios (RTL and GIMPLE),
I would appreciate feedback if storing multiple bytes is 

Re: [PATCH, PR middle-end/70807] Free dominance info in CSE pass

2016-05-11 Thread Ilya Enkovich
2016-05-11 15:45 GMT+03:00 H.J. Lu :
> On Wed, May 11, 2016 at 2:26 AM, Ilya Enkovich  wrote:
>> 2016-05-10 21:13 GMT+03:00 H.J. Lu :
>>> On Tue, May 10, 2016 at 9:19 AM, Ilya Enkovich  
>>> wrote:
>>>> Hi,
>>>>
>>>> Currently CSE may modify CFG and leave invalid dominance info.  This patch
>>>> improves track of CFG changes by CSE passes and frees dominance info if
>>>> required.  This allows to remove corresponding workaround from STV pass.
>>>>
>>>> Does it look OK?
>>>>
>>>> Bootstrapped and regtested on x86-64-unknown-linux-gnu.
>>>>
>>>
>>>> diff --git a/gcc/testsuite/gcc.dg/pr70807.c b/gcc/testsuite/gcc.dg/pr70807.c
>>>> new file mode 100644
>>>> index 000..9ef2a4d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/pr70807.c
>>>> @@ -0,0 +1,18 @@
>>>> +/* PR middle-end/70807 */
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2" } */
>>>> +
>>>> +typedef int INT;
>>>> +int a, b, c, d, e, f;
>>>> +void fn1() {
>>>> +  INT g;
>>>> +  if (d && a)
>>>> +;
>>>> +  else if (e && b)
>>>> +;
>>>> +  else if (!a && !b && c)
>>>> +;
>>>> +  else if (b && d || a && e)
>>>> +a = 0;
>>>> +  f = g || d;
>>>> +}
>>>
>>> Does this test fail without the fix?
>>
>> Yes, I creduced it from a libgcc build fail caused by the trigger
>> patch from the tracker.
>>
>
> This test only uses int, which is 32-bit.  How does it trigger
> STV?

I am not fixing any bug in STV.  This test is meant to trigger CSE.

Thanks,
Ilya

>
> --
> H.J.


Re: [PATCH, PR middle-end/70807] Free dominance info in CSE pass

2016-05-11 Thread H.J. Lu
On Wed, May 11, 2016 at 2:26 AM, Ilya Enkovich  wrote:
> 2016-05-10 21:13 GMT+03:00 H.J. Lu :
>> On Tue, May 10, 2016 at 9:19 AM, Ilya Enkovich  
>> wrote:
>>> Hi,
>>>
>>> Currently CSE may modify CFG and leave invalid dominance info.  This patch
>>> improves track of CFG changes by CSE passes and frees dominance info if
>>> required.  This allows to remove corresponding workaround from STV pass.
>>>
>>> Does it look OK?
>>>
>>> Bootstrapped and regtested on x86-64-unknown-linux-gnu.
>>>
>>
>>> diff --git a/gcc/testsuite/gcc.dg/pr70807.c b/gcc/testsuite/gcc.dg/pr70807.c
>>> new file mode 100644
>>> index 000..9ef2a4d
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/pr70807.c
>>> @@ -0,0 +1,18 @@
>>> +/* PR middle-end/70807 */
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2" } */
>>> +
>>> +typedef int INT;
>>> +int a, b, c, d, e, f;
>>> +void fn1() {
>>> +  INT g;
>>> +  if (d && a)
>>> +;
>>> +  else if (e && b)
>>> +;
>>> +  else if (!a && !b && c)
>>> +;
>>> +  else if (b && d || a && e)
>>> +a = 0;
>>> +  f = g || d;
>>> +}
>>
>> Does this test fail without the fix?
>
> Yes, I creduced it from a libgcc build fail caused by the trigger
> patch from the tracker.
>

This test only uses int, which is 32-bit.  How does it trigger
STV?

-- 
H.J.


[PATCH] libstdc++/71049 fix --disable-libstdcxx-dual-abi bootstrap

2016-05-11 Thread Jonathan Wakely

The TM library support broke --disable-libstdcxx-dual-abi. This fixes
it, by only defining the constructors for new strings when the dual
ABI is active.

PR libstdc++/71049
* src/c++11/cow-stdexcept.cc [!_GLIBCXX_USE_DUAL_ABI]: Don't define
exception constructors with __sso_string parameters.

Tested x86_64-linux, committed to trunk, gcc6 backport to follow
shortly.

commit 459dc828d9d731800ad41b9506dff76070e69aaa
Author: redi 
Date:   Wed May 11 12:39:28 2016 +

libstdc++/71049 fix --disable-libstdcxx-dual-abi bootstrap

PR libstdc++/71049
* src/c++11/cow-stdexcept.cc [!_GLIBCXX_USE_DUAL_ABI]: Don't define
exception constructors with __sso_string parameters.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@236118 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc 
b/libstdc++-v3/src/c++11/cow-stdexcept.cc
index a0f505c..31a89df 100644
--- a/libstdc++-v3/src/c++11/cow-stdexcept.cc
+++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc
@@ -292,6 +292,7 @@ _txnal_cow_string_c_str(const void* that)
   return (const char*) txnal_read_ptr((void**)>_M_dataplus._M_p);
 }
 
+#if _GLIBCXX_USE_DUAL_ABI
 const char*
 _txnal_sso_string_c_str(const void* that)
 {
@@ -299,6 +300,7 @@ _txnal_sso_string_c_str(const void* that)
   (void* const*)const_cast(
  &((const std::__sso_string*) that)->_M_s._M_p));
 }
+#endif
 
 void
 _txnal_cow_string_D1_commit(void* data)
@@ -344,9 +346,24 @@ _txnal_runtime_error_get_msg(void* e)
 // result in undefined behavior, which is in this case not initializing this
 // string.
 #if _GLIBCXX_USE_DUAL_ABI
-#define CTORDTORSTRINGCSTR(s) _txnal_sso_string_c_str((s))
+#define CTORS_FROM_SSOSTRING(NAME, CLASS, BASE)\
+void   \
+_ZGTtNSt##NAME##C1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE( \
+CLASS* that, const std::__sso_string& s)   \
+{  \
+  CLASS e(""); \
+  _ITM_memcpyRnWt(that, &e, sizeof(CLASS));\
+  /* Get the C string from the SSO string.  */ \
+  _txnal_cow_string_C1_for_exceptions(_txnal_##BASE##_get_msg(that),   \
+ _txnal_sso_string_c_str(&s), that); \
+}  \
+void   \
+_ZGTtNSt##NAME##C2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE( \
+CLASS*, const std::__sso_string&) __attribute__((alias \
+("_ZGTtNSt" #NAME  \
+  "C1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE")));
 #else
-#define CTORDTORSTRINGCSTR(s) ""
+#define CTORS_FROM_SSOSTRING(NAME, CLASS, BASE)
 #endif
 
 // This macro defines transaction constructors and destructors for a specific
@@ -373,21 +390,7 @@ _ZGTtNSt##NAME##C1EPKc (CLASS* that, const char* s)
\
 void   \
 _ZGTtNSt##NAME##C2EPKc (CLASS*, const char*)   \
   __attribute__((alias ("_ZGTtNSt" #NAME "C1EPKc")));  \
-void   \
-_ZGTtNSt##NAME##C1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE( \
-CLASS* that, const std::__sso_string& s)   \
-{  \
-  CLASS e(""); \
-  _ITM_memcpyRnWt(that, &e, sizeof(CLASS));\
-  /* Get the C string from the SSO string.  */ \
-  _txnal_cow_string_C1_for_exceptions(_txnal_##BASE##_get_msg(that),   \
- CTORDTORSTRINGCSTR(&s), that);\
-}  \
-void   \
-_ZGTtNSt##NAME##C2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE( \
-CLASS*, const std::__sso_string&) __attribute__((alias \
-("_ZGTtNSt" #NAME  \
-  "C1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE")));  \
+CTORS_FROM_SSOSTRING(NAME, CLASS, BASE)
\
 void   \
 _ZGTtNSt##NAME##D1Ev(CLASS* that)  \
 { _txnal_cow_string_D1(_txnal_##BASE##_get_msg(that)); }   \


Re: [SH][committed] Remove SH5 support in compiler

2016-05-11 Thread Oleg Endo
On Wed, 2016-05-04 at 00:42 +0200, Eric Botcazou wrote:
> > Did that.  Looks there are no changes after regeneration.
> 
> Not in the libada directory:
> 
> eric@polaris:~/svn/gcc/libada> svn info configure configure.ac
> Path: configure
> Name: configure
> Working Copy Root Path: /home/eric/svn/gcc
> URL: svn+ssh://gcc.gnu.org/svn/gcc/trunk/libada/configure
> Relative URL: ^/trunk/libada/configure
> Repository Root: svn+ssh://gcc.gnu.org/svn/gcc
> Repository UUID: 138bc75d-0d04-0410-961f-82ee72b054a4
> Revision: 235678
> Node Kind: file
> Schedule: normal
> Last Changed Author: olegendo
> Last Changed Rev: 235676
> Last Changed Date: 2016-04-30 11:11:03 +0200 (Sat, 30 Apr 2016)
> Text Last Updated: 2016-04-30 12:43:53 +0200 (Sat, 30 Apr 2016)
> Checksum: 81a247fa3a0b075a9d5912fc76108ca80958bf4d
> 
> Path: configure.ac
> Name: configure.ac
> Working Copy Root Path: /home/eric/svn/gcc
> URL: svn+ssh://gcc.gnu.org/svn/gcc/trunk/libada/configure.ac
> Relative URL: ^/trunk/libada/configure.ac
> Repository Root: svn+ssh://gcc.gnu.org/svn/gcc
> Repository UUID: 138bc75d-0d04-0410-961f-82ee72b054a4
> Revision: 235678
> Node Kind: file
> Schedule: normal
> Last Changed Author: jakub
> Last Changed Rev: 232055
> Last Changed Date: 2016-01-04 15:30:50 +0100 (Mon, 04 Jan 2016)
> Text Last Updated: 2016-01-06 08:00:38 +0100 (Wed, 06 Jan 2016)
> Checksum: 33d89322bb3530f3c351968d9d127c6af002acf7
> 
> You cannot change any configure files without changing configure.ac.

Ugh, sorry.  Those pieces that I've changed manually are pulled from
the GNU config repo.  At least that's what I understand.  My proposed
changes to remove SH5 support in GNU config were rejected.  I will
revert the corresponding changes in GCC.

Cheers,
Oleg


Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations

2016-05-11 Thread Jiong Wang



On 09/05/16 16:08, Segher Boessenkool wrote:

Hi Christophe,

On Mon, May 09, 2016 at 03:54:26PM +0200, Christophe Lyon wrote:

After this patch, I've noticed that
gcc.target/arm/pr43920-2.c
now fails at:
/* { dg-final { scan-assembler-times "pop" 2 } } */

Before the patch, the generated code was:
[...]
 pop {r3, r4, r5, r6, r7, pc}
.L4:
 mov r0, #-1
.L1:
 pop {r3, r4, r5, r6, r7, pc}

it is now:
[...]
.L1:
 pop {r3, r4, r5, r6, r7, pc}
.L4:
 mov r0, #-1
 b   .L1

The new version does not seem better, as it adds a branch on the path
and it is not smaller.

That looks like bb-reorder isn't doing its job?  Maybe it thinks that
pop is too expensive to copy?

I think so. Filed PR71061

The ARM backend is not setting the length attribute correctly, so the bb
failed the copy_bb_p check.

Unfortunately, I am afraid that even if we fix the backend length issue, this
testcase will keep failing, because it specifies "-Os", under which some bb
copies won't be triggered.




RE: [ARM] mno-pic-data-is-text-relative & msingle-pic-base

2016-05-11 Thread Joey Ye


> -Original Message-
> From: Nathan Sidwell [mailto:nathanmsidw...@gmail.com] On Behalf Of
> Nathan Sidwell
> Sent: 09 May 2016 18:22
> To: Joey Ye; Richard Earnshaw; GCC Patches
> Subject: Re: [ARM] mno-pic-data-is-text-relative & msingle-pic-base
> 
> Joey,
> > This patch will do what you intend it to do. However, I am not sure in part
> related to VxWorks. The logic behind this patch is that -mno-pic-data-is-
> text-relative should enable -msingle-pic-base because otherwise it will be
> useless. The logic itself is orthogonal to OS. So I am not convinced the 'else
> if' shouldn't be just 'if'. It should not change VxWorks behaviour if VxWorks
> enables -msingle-pic-base explicitly. Or otherwise there is at least one use
> case that -mno-pic-data-is-text-relative can be used without -msingle-pic-
> base, which breaks the logic that this whole patch stands on.
> 
> VxWorks has two modes of code generation -- kernel and RTP.  RTPs don't
> have a fixed mapping between code and data (and use special sequence to
> initialize the PIC register, using vxworks-specific relocs).  Kernel mode
> doesn't support PIC code generation -- see config/vxworks.c
> 
> So I don't think there's a problem.
No other comments. OK for me, though I cannot approve it.

- Joey



Please update config.sub from GNU config

2016-05-11 Thread Jakub Sejdak
Hello,

I'm currently working on a port for a new OS, for both binutils and GCC. My
patches depend on config.sub, which has recently been updated for
Phoenix OS.
Please synchronize, so that I can test and send my patches here later on.

Thanks,
Jakub


Re: [PATCH 4/4] Initial version of RTL frontend

2016-05-11 Thread Bernd Schmidt

On 05/11/2016 01:06 PM, Ramana Radhakrishnan wrote:

Is there any reason why this framework cannot be used to replace a
large number of scan-assembler tests in various target testsuites
which are essentially testing for either a peephole, a transformation
or the register allocator eliminating particular moves ? One of the
problems in gcc.target/arm is the myriad of options to make skip
these scan assembler tests depending on multilib options passed in
from the top and I'm wondering if this gives a way out. Obviously I
don't want to keep 2 copies of the md patterns just to test a
peephole or what reload is doing in a test backend !


The idea of having a test backend is that it is stable, and will not 
diverge from the tests, while a real backend may undergo development 
that may periodically invalidate test rtl.


In practice I expect we might end up using both approaches, depending on 
the test. Ideally we'd have some unit tests to verify pass behaviour, 
and these could be tested with the special test backend, while any 
deeply target-specific tests (maybe relying on special reorg passes) 
could live in the normal gcc.target/.



Bernd


Re: [PATCH 4/4] Initial version of RTL frontend

2016-05-11 Thread Ramana Radhakrishnan


On 10/05/16 19:20, Bernd Schmidt wrote:
> On 05/10/2016 08:05 PM, Richard Biener wrote:
>> On May 10, 2016 7:02:33 PM GMT+02:00, Jeff Law 
>> wrote:
>>> Well, not if we take Bernd's idea and create a new backend for
>>> testing purposes.  If we want to know/test what reload's doing, we
>>> cons up the appropriate RTL for that testing backend, set the right
>>> flags and call reload() then look at the result.

>>
>> Hmm, I guess that some test cases rely on specific patterns being
>> (not) present so we'd have one backend per testcase ... (Or .md file
>> snippets per testcase).
> 
> You could have any pattern you want in such a machine description, and if you 
> run into conflicts or if you want to test different variants, you can have -m 
> switches.
> 



Is there any reason why this framework cannot be used to replace a large number 
of scan-assembler tests in various target testsuites which are essentially 
testing for either a peephole, a transformation or the register allocator 
eliminating particular moves ? One of the problems in gcc.target/arm is the 
myriad of options to make skip these scan assembler tests depending on multilib 
options passed in from the top and I'm wondering if this gives a way out. 
Obviously I don't want to keep 2 copies of the md patterns just to test a 
peephole or what reload is doing in a test backend !

regards
Ramana
> 
> Bernd
> 


Re: [PATCH] Preserve GCC 5 behavior for PR71002

2016-05-11 Thread Richard Biener
On Tue, 10 May 2016, Jakub Jelinek wrote:

> On Tue, May 10, 2016 at 10:18:59AM +0200, Richard Biener wrote:
> > There are two options - not apply this folding if it doesn't preserve
> > TBAA behavior (first patch) or preserve TBAA behavior by wrapping
> > the base into a MEM_REF (which also requires fixing 
> > reference_alias_ptr_type, sth which we need to do anyway).
> > See the two patches below.
> > 
> > Incidentially the folding (which I still consider premature) prevents
> > CSE on the GIMPLE level as well:
> > 
> >   :
> >   MEM[(struct  &)this_2(D)] ={v} {CLOBBER};
> >   _4 = _2(D)->m_str;
> >   MEM[(struct  &)this_2(D)] ={v} {CLOBBER};
> >   MEM[(struct short_t &)this_2(D)].h.is_short = 1;
> >   MEM[(struct short_t &)this_2(D)].h.length = 0;
> >   MEM[(struct short_t &)this_2(D)].data[0] = 0;
> >   _19 = BIT_FIELD_REF ;
> >   _20 = _19 & 1;
> >   if (_20 != 0)
> > 
> > Here we can't CSE the is_short access and optimize the compare.
> > So on trunk I am thinking of at least removing the compare
> > against constant case.  [I've removed the whole code twice in
> > GCC history and it always got installed back ...]
> > 
> > Bootstrap and regtest of both patches running on x86_64-unknown-linux-gnu.
> > 
> > Any preference for GCC 6?
> 
> I think I prefer the second patch for the branch.

On trunk this exposed an issue with the gimplify-into-SSA patch and
I caught an issue with passing the original ref.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
so far.

Richard.

2016-05-11  Richard Biener  

PR middle-end/71002
* alias.c (reference_alias_ptr_type): Preserve alias-set zero
if the langhook insists on it.
* fold-const.c (make_bit_field_ref): Add arg for the original
reference and preserve its alias-set.
(decode_field_reference): Take exp by reference and adjust it
to the original memory reference.
(optimize_bit_field_compare): Adjust callers.
(fold_truth_andor_1): Likewise.
* gimplify.c (gimplify_expr): Adjust in-SSA form test.

* g++.dg/torture/pr71002.C: New testcase.

Index: gcc/alias.c
===
*** gcc/alias.c (revision 236032)
--- gcc/alias.c (working copy)
*** reference_alias_ptr_type_1 (tree *t)
*** 769,774 
--- 769,778 
  tree
  reference_alias_ptr_type (tree t)
  {
+   /* If the frontend assigns this alias-set zero, preserve that.  */
+   if (lang_hooks.get_alias_set (t) == 0)
+ return ptr_type_node;
+ 
tree ptype = reference_alias_ptr_type_1 (&t);
/* If there is a given pointer type for aliasing purposes, return it.  */
if (ptype != NULL_TREE)
Index: gcc/fold-const.c
===
*** gcc/fold-const.c(revision 236069)
--- gcc/fold-const.c(working copy)
*** static enum tree_code compcode_to_compar
*** 117,130 
  static int operand_equal_for_comparison_p (tree, tree, tree);
  static int twoval_comparison_p (tree, tree *, tree *, int *);
  static tree eval_subst (location_t, tree, tree, tree, tree, tree);
- static tree make_bit_field_ref (location_t, tree, tree,
-   HOST_WIDE_INT, HOST_WIDE_INT, int, int);
  static tree optimize_bit_field_compare (location_t, enum tree_code,
tree, tree, tree);
- static tree decode_field_reference (location_t, tree, HOST_WIDE_INT *,
-   HOST_WIDE_INT *,
-   machine_mode *, int *, int *, int *,
-   tree *, tree *);
  static int simple_operand_p (const_tree);
  static bool simple_operand_p_2 (tree);
  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
--- 117,124 
*** distribute_real_division (location_t loc
*** 3803,3817 
  
  /* Return a BIT_FIELD_REF of type TYPE to refer to BITSIZE bits of INNER
 starting at BITPOS.  The field is unsigned if UNSIGNEDP is nonzero
!and uses reverse storage order if REVERSEP is nonzero.  */
  
  static tree
! make_bit_field_ref (location_t loc, tree inner, tree type,
HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
int unsignedp, int reversep)
  {
tree result, bftype;
  
if (bitpos == 0 && !reversep)
  {
tree size = TYPE_SIZE (TREE_TYPE (inner));
--- 3797,3819 
  
  /* Return a BIT_FIELD_REF of type TYPE to refer to BITSIZE bits of INNER
 starting at BITPOS.  The field is unsigned if UNSIGNEDP is nonzero
!and uses reverse storage order if REVERSEP is nonzero.  ORIG_INNER
!is the original memory reference used to preserve the alias set of
!the access.  */
  
  static tree
! make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
   

Re: [PATCH] [rtlfe] Barebones implementation of "__RTL"; next steps?

2016-05-11 Thread Richard Biener
On Wed, May 11, 2016 at 3:31 AM, Trevor Saunders  wrote:
> On Tue, May 10, 2016 at 05:01:00PM -0400, David Malcolm wrote:
>> [CCing Prasad since this may be useful for his gimple FE work, by
>> replacing "rtl" with "gimple" in the patch]
>>
>> On Mon, 2016-05-09 at 11:44 +0200, Richard Biener wrote:
>> > On Wed, May 4, 2016 at 10:49 PM, David Malcolm 
>> > wrote:
>>
>> > > This patch kit introduces an RTL frontend, for the purpose
>> > > of unit testing: primarly for unit testing of RTL passes, and
>> > > possibly for unit testing of .md files.
>> > >
>> > > It's very much a work-in-progress; I'm posting it now to get
>> > > feedback.
>>
>> [...snip...]
>>
>> > > * The RTL frontend doesn't have any knowledge of the name of the
>> > > function,
>> > >   of parameters, types, locals, globals, etc.  It creates a single
>> > > function.
>> > >   The function is currently hardcoded to have this signature:
>> > >
>> > >  int test_1 (int, int, int);
>> > >
>> > >   since there's no syntax for specify otherwise, and we need to
>> > > provide
>> > >   a FUNCTION_DECL tree when building a function object (by calling
>> > >   allocate_struct_function).
>> > >
>> > > * Similarly, there are no types beyond the built-in ones; all
>> > > expressions
>> > >   are treated as being of type int.  I suspect that this approach
>> > >   will be too simplistic when it comes to e.g. aliasing.
>> >
>> > To address this and the previous issue I suggest to implement the RTL
>> > FE
>> > similar to how I proposed the GIMPLE FE - piggy-back on the C FE and
>> > thus
>> > allow
>> >
>> > int __RTL foo (int a, int b) // gets you function decl and param
>> > decls
>> > {
>> >  (insn ...)
>> > ...
>> >
>> > }
>> >
>> > int main()
>> > {
>> >   if (foo (1) != 0)
>> > abort ();
>> > }
>> >
>> > That would also allow dg-do run testcases and not rely solely on
>> > scanning
>> > RTL dumps.
>>
>> The following is an attempt at implementing this, by adding a new
>> "__RTL" keyword, and detecting it in the C frontend, switching
>> to a custom parser for the function body.
>>
>> Does this look like the kind of thing you had in mind for the
>> RTL and gimple "frontends"?

Yes!

>> Wiring this up to the existing RTL parser might be awkward: the
>> existing RTL parser in read-md.c etc works at the level of
>> characters (read_char and unread_char from a FILE *), whereas the
>> C frontend is in terms of tokens.
>>
>> I have a patch that ports the RTL parser to using libcpp for
>> location-tracking, and another that updates it to use libcpp
>> for diagnostics.  This adds more dependency on a build-time
>> libcpp to the gen* tools.  Both patches are currently messy.
>> Potentially I could build on them and attempt to update the RTL
>> parser further, to use libcpp's tokenizer.
>>
>> Does that general approach sound sane?  In particular:
>> - is it sane to eliminate errors.c in favor of building
>> diagnostics*.c for the build machine as well as the host machine?
>> - is it sane to rewrite the read-md.c/read-rtl.c code to
>> a token-based approach, using libcpp?
>>
>> Alternatively, the shorter term approach would be to kludge
>> in reading from a FILE * in read-md.c based on where the
>> C parser has got to, with a hybrid of the two approaches
>> (character-based vs token-based).
>
> Another option is to make read-md.c use tokens, but instead of building
> libcpp for the build machine write a new token parser that is text
> compatible with the libcpp one, but just enough to do what read-md.c
> needs.  However that seems silly, and suggests just using libcpp is the
> sane thing to do :)
>
>> Thoughts?
>
> I'm not aware of any pitfalls in using libcpp in build tools, though it
> does seem slightly unfortunate to need to build so much build tool
> stuff.
>
> Thinking about this I wonder if libcpp would be useful in gengtype to
> get around some of the wierdness with headers (and maybe even
> languages?) but that doesn't need to be thought about now.

genmatch also uses libcpp, it's really convenient for the diagnostics as well.

That said, another kludge would be to simply use cpp_token_as_text
(see genmatch.c:c_expr::gen_transform), write the whole function
to a temporary file and parse that back in with read_md ;)

In my mind, having something to continue prototyping with is more important
than cleaning this piece up right now.

Richard.

> Trev
>


Re: [PATCH] Fix PR71039

2016-05-11 Thread Bernhard Reutner-Fischer
On May 11, 2016 11:36:11 AM GMT+02:00, Richard Biener  wrote:
>On Wed, 11 May 2016, Bernhard Reutner-Fischer wrote:
>
>> On May 10, 2016 3:07:12 PM GMT+02:00, Richard Biener
> wrote:
>> >
>> >The following fixes PR71039 - we were failing to verify we can
>> >insert the lhs on the predecessor edges.
>> >
>> >Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>> 
>> >+ /* Verify if *idx is available at *DATA.  */
>> >+ 
>> >+ static bool
>> >+ chk_uses (tree t, tree *idx, void *data)
>> >+ {
>> >+   basic_block dom = (basic_block) data;
>> >+   if (TREE_CODE (*idx) == SSA_NAME)
>> >+ return (SSA_NAME_IS_DEFAULT_DEF (*idx)
>> >+   || ! dominated_by_p (CDI_DOMINATORS,
>> >+gimple_bb (SSA_NAME_DEF_STMT (*idx)), dom));
>> >+   return true;
>> >+ }
>> 
>> Shouldn't this warn about unused t?
>
>I fixed that before committing.

Thanks, Sorry for the noise.



Re: [PATCH][CilkPlus] Merge libcilkrts from upstream

2016-05-11 Thread Ramana Radhakrishnan

> 
> I've looked at the generated code in more details, and for armv6 this 
> generates
> mcr p15, 0, r0, c7, c10, 5
> which is not what __cilkrts_fence uses currently (CP15DSB vs CP15DMB)


Wow, I hadn't noticed that it was a DSB.  A DSB is way too heavyweight; userland
shouldn't need to use it by default IMNSHO.  It's needed if you are working on
non-cacheable memory or performing cache maintenance operations, but I can't
imagine Cilk Plus wanting to do that!

http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf

It's almost like the default definitions need to be in terms of the atomic 
extensions rather than having these written in this form. Folks usually get 
this wrong ! 

> 
> Looking at arm/sync.md it seems that there is no way to generate CP15DSB.

No, there is no way of generating a DSB; DMBs should be sufficient for this
purpose.  Would anyone know which semantics of __cilkrts_fence require this
to be a DSB?

Ramana

> 
> 
>> Christophe
>>
>>> Thanks,
>>>   -- Ilya


Re: [patch] Tidy up RTL libfunc machinery

2016-05-11 Thread Richard Biener
On Tue, May 10, 2016 at 10:12 PM, Eric Botcazou  wrote:
> Hi,
>
> this patch is aimed at cleaning up the mess with the RTL libfunc machinery.
>
> On the one hand you have comments like these in the RTL expander:
>
>   /* It is incorrect to use the libcall calling conventions to call
>  memcpy in this context.  This could be a user call to memcpy and
>  the user may wish to examine the return value from memcpy.  For
>  targets where libcalls and normal calls have different conventions
>  for returning pointers, we could end up generating incorrect code.  */
>
>   /* It is incorrect to use the libcall calling conventions to call
>  memset in this context.  This could be a user call to memset and
>  the user may wish to examine the return value from memset.  For
>  targets where libcalls and normal calls have different conventions
>  for returning pointers, we could end up generating incorrect code.  */
>
> and on the other hand you have at the end of expand_builtin_memcmp:
>
>   emit_library_call_value (memcmp_libfunc, result, LCT_PURE,
>TYPE_MODE (integer_type_node), 3,
>XEXP (arg1_rtx, 0), Pmode,
>XEXP (arg2_rtx, 0), Pmode,
>convert_to_mode (TYPE_MODE (sizetype), arg3_rtx,
> TYPE_UNSIGNED (sizetype)),
>TYPE_MODE (sizetype));
>
> so the RTL expander is not allowed to call memcpy & memset libfuncs on its own
> but it is for __builtin_memcmp, which is rather inconsistent.
>
> So the patch eliminates all the RTL libfuncs and replaces them by calls to the
> corresponding builtins, except for 3 of them:
>   - synchronize_libfunc, which is used by the ARM and MIPS back-ends,
>   - unwind_sjlj_register_libfunc, which is used by the SJLJ machinery,
>   - unwind_sjlj_unregister_libfunc, likewise.
>
> abort_libfunc and memcmp_libfunc are set by VMS to something else than the
> default but this looks like an optimization and can presumably be dropped.
>
> Bootstrapped/regtested on x86-64/Linux and PowerPC/Linux, OK for mainline?

The middle-end parts are ok, I'm leaving backend parts for their
maintainers to comment
but those parts are ok as well if they do not comment within a
reasonable amount of time.

Did you check the internals manual if it needs adjustment?

Thanks,
Richard.

>
> 2016-05-10  Eric Botcazou  
>
> * builtins.c (expand_builtin_memcmp): Do not emit the call here.
> (expand_builtin_trap): Emit a regular call.
> (set_builtin_user_assembler_name): Remove obsolete cases.
> * dse.c (scan_insn): Adjust.
> * except.c: Include calls.h.
> (sjlj_emit_function_enter): If DONT_USE_BUILTIN_SETJMP is defined,
> emit a regular call to setjmp.
> * expr.c (emit_block_move_hints): Call emit_block_copy_via_libcall.
> (block_move_libcall_safe_for_call_parm): Use memcpy builtin.
> (emit_block_move_via_libcall): Delete.
> (block_move_fn): Delete.
> (init_block_move_fn): Likewise.
> (emit_block_move_libcall_fn): Likewise.
> (emit_block_op_via_libcall): New function.
> (set_storage_via_libcall): Tidy up and use memset builtin.
> (block_clear_fn): Delete.
> (init_block_clear_fn): Likewise.
> (clear_storage_libcall_fn): Likewise.
> (expand_assignment): Call emit_block_move_via_libcall.
> Do not include gt-expr.h.
> * expr.h (emit_block_op_via_libcall): Declare.
> (emit_block_copy_via_libcall): New inline function.
> (emit_block_move_via_libcall): Likewise.
> (emit_block_comp_via_libcall): Likewise.
> (block_clear_fn): Delete.
> (init_block_move_fn): Likewise.
> (init_block_clear_fn): Likewise.
> (emit_block_move_via_libcall): Likewise.
> (set_storage_via_libcall): Add default parameter value.
> * libfuncs.h (enum libfunc_index): Remove obsolete values.
> (abort_libfunc): Delete.
> (memcpy_libfunc): Likewise.
> (memmove_libfunc): Likewise.
> (memcmp_libfunc): Likewise.
> (memset_libfunc): Likewise.
> (setbits_libfunc): Likewise.
> (setjmp_libfunc): Likewise.
> (longjmp_libfunc): Likewise.
> (profile_function_entry_libfunc): Likewise.
> (profile_function_exit_libfunc): Likewise.
> (gcov_flush_libfunc): Likewise.
> * optabs-libfuncs.c (build_libfunc_function): Set DECL_ARTIFICIAL
> and DECL_VISIBILITY on the declaration.
> (init_optabs): Do not initialize obsolete libfuncs.
> * optabs.c (prepare_cmp_insn): Call emit_block_comp_via_libcall.
> * tree-core.h (ECF_RET1): Define.
> (ECF_TM_PURE): Adjust.
> (ECF_TM_BUILTIN): Likewise.
> * tree.c (set_call_expr_flags): Deal with ECF_RET1.
>

Re: [PATCH] Fix PR71039

2016-05-11 Thread Richard Biener
On Wed, 11 May 2016, Bernhard Reutner-Fischer wrote:

> On May 10, 2016 3:07:12 PM GMT+02:00, Richard Biener wrote:
> >
> >The following fixes PR71039 - we were failing to verify we can
> >insert the lhs on the predecessor edges.
> >
> >Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> 
> >+ /* Verify if *idx is available at *DATA.  */
> >+ 
> >+ static bool
> >+ chk_uses (tree t, tree *idx, void *data)
> >+ {
> >+   basic_block dom = (basic_block) data;
> >+   if (TREE_CODE (*idx) == SSA_NAME)
> >+ return (SSA_NAME_IS_DEFAULT_DEF (*idx)
> >+|| ! dominated_by_p (CDI_DOMINATORS,
> >+ gimple_bb (SSA_NAME_DEF_STMT (*idx)), dom));
> >+   return true;
> >+ }
> 
> Shouldn't this warn about unused t?

I fixed that before committing.
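
For reference, a minimal sketch of one common way to silence this kind of warning in C. The thread does not show which fix was actually committed; in C++ (which GCC itself is written in) simply omitting the parameter name is another usual choice. The callback type and function names below are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical callback type with a fixed three-argument signature,
   standing in for the walker callback interface.  */
typedef bool (*walk_fn) (int base, int *idx, void *data);

/* This callback has no use for BASE.  The cast to void documents that
   and avoids -Wunused-parameter (enabled by -Wextra).  */
static bool
idx_below_limit (int base, int *idx, void *data)
{
  (void) base;
  int limit = *(int *) data;
  return *idx < limit;
}

/* Invoke a callback the way a walker would.  */
static bool
walk_one (walk_fn fn, int base, int *idx, void *data)
{
  return fn (base, idx, data);
}
```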

Richard.


Re: [PATCH, PR middle-end/70807] Free dominance info in CSE pass

2016-05-11 Thread Ilya Enkovich
2016-05-10 21:13 GMT+03:00 H.J. Lu :
> On Tue, May 10, 2016 at 9:19 AM, Ilya Enkovich  wrote:
>> Hi,
>>
>> Currently CSE may modify the CFG and leave invalid dominance info.  This patch
>> improves tracking of CFG changes by the CSE passes and frees dominance info if
>> required.  This allows removing the corresponding workaround from the STV pass.
>>
>> Does it look OK?
>>
>> Bootstrapped and regtested on x86-64-unknown-linux-gnu.
>>
>
>> diff --git a/gcc/testsuite/gcc.dg/pr70807.c b/gcc/testsuite/gcc.dg/pr70807.c
>> new file mode 100644
>> index 000..9ef2a4d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr70807.c
>> @@ -0,0 +1,18 @@
>> +/* PR middle-end/70807 */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +typedef int INT;
>> +int a, b, c, d, e, f;
>> +void fn1() {
>> +  INT g;
>> +  if (d && a)
>> +;
>> +  else if (e && b)
>> +;
>> +  else if (!a && !b && c)
>> +;
>> +  else if (b && d || a && e)
>> +a = 0;
>> +  f = g || d;
>> +}
>
> Does this test fail without the fix?

Yes, I creduced it from a libgcc build fail caused by the trigger
patch from the tracker.

Thanks,
Ilya

>
> --
> H.J.


Re: [PATCH][CilkPlus] Merge libcilkrts from upstream

2016-05-11 Thread Christophe Lyon
On 10 May 2016 at 21:24, Christophe Lyon  wrote:
> On 10 May 2016 at 19:18, Ilya Verbin  wrote:
>> On Tue, May 10, 2016 at 14:36:36 +0100, Ramana Radhakrishnan wrote:
>>> On Tue, May 10, 2016 at 2:02 PM, Christophe Lyon
>>>  wrote:
>>> > On 9 May 2016 at 15:34, Christophe Lyon wrote:
>>> >> On 9 May 2016 at 15:29, Jeff Law  wrote:
>>> >>> On 05/09/2016 01:37 AM, Christophe Lyon wrote:
>>>  After this merge, I'm seeing lots of timeouts on arm (using QEMU).
>>>  Is this "expected"? (as in: should I increase my timeout value)
>>> >>>
>>> >>> I wouldn't say it's expected; this is the first time Cilk+ has been
>>> >>> supported on ARM.  It could be a bug in the ARM support in the runtime, 
>>> >>> an
>>> >>> ARM compiler bug or even a bug in the ARM QEMU support.
>>> >>>
>>> >>> Probably the first step is to see if it's working properly on real 
>>> >>> hardware.
>>> >>> That would at least allow us to eliminate QEMU from the equation if it's
>>> >>> failing in the same manner on a real machine.
>>> >>>
>>> >> OK, I'll check that.
>>> >> I wanted to know if I was missing something obvious.
>>> >
>>> > I've tested in an armhf chroot on an armv8 machine, and I saw SIGILL 
>>> > errors
>>> > on:
>>> > mcr 15, 0, r3, cr7, cr10, {4}
>>> > which is how __cilkrts_fence is implemented in
>>> > ../libcilkrts/runtime/config/arm/os-fence.h
>>>
>>> At first glance I'd ask why this shouldn't be __atomic_thread_fence or
>>> __atomic_signal_fence ( SEQ_CST)  if that's what they want here and
>>> then it will work (TM) regardless of architecture levels.
>>>
>>> > This instruction is not supported anymore on armv8. Recent arm64 kernels
>>> > have handlers for it.
>>> >
>>> > So we may want the implementation to be conditional, or prefer to rely on
>>> > kernel support.
>>
>> ARM enabling code was taken from community contribution, we haven't tested 
>> it.
>> If someone wants to fix this, it would be appreciated.
>>
>
> Following Ramana's suggestion, I tried:
> # define __cilkrts_fence() __atomic_thread_fence(__ATOMIC_SEQ_CST);
> and the tests now pass.
>

I've looked at the generated code in more detail, and for armv6 this generates
mcr p15, 0, r0, c7, c10, 5
which is CP15DMB, not the CP15DSB that __cilkrts_fence currently uses.

Looking at arm/sync.md it seems that there is no way to generate CP15DSB.
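
A minimal sketch of the builtin-based fence in use, assuming a GCC-compatible compiler. The macro name my_fence and the publish/try_consume helpers are hypothetical and not part of libcilkrts; the trailing semicolon of the define quoted above is dropped here so the macro behaves like an ordinary statement.

```c
/* Hypothetical replacement for a hand-written CP15 barrier: let the
   compiler pick the barrier instruction for the selected architecture.  */
#define my_fence() __atomic_thread_fence (__ATOMIC_SEQ_CST)

static int payload;
static int ready;

/* Publish a value: write the data, fence, then raise the flag.  */
static void
publish (int value)
{
  payload = value;
  my_fence ();
  __atomic_store_n (&ready, 1, __ATOMIC_RELAXED);
}

/* Consume: return 1 and store the value in *OUT if it has been
   published, otherwise return 0 and leave *OUT untouched.  */
static int
try_consume (int *out)
{
  if (!__atomic_load_n (&ready, __ATOMIC_RELAXED))
    return 0;
  my_fence ();
  *out = payload;
  return 1;
}
```

The fence pair orders the plain payload accesses around the relaxed flag accesses, which is the usual job of a memory barrier (DMB class) rather than a synchronization barrier (DSB class).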


> Christophe
>
>> Thanks,
>>   -- Ilya


Re: [PATCH][AArch64] Simplify ashl3 expander for SHORT modes

2016-05-11 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01651.html

Thanks,
Kyrill

On 27/04/16 15:10, Kyrill Tkachov wrote:

Hi all,

The ashl3 expander for QI and HI modes is needlessly obfuscated.
The 2nd operand predicate accepts nonmemory_operand but the expand code
FAILs if it's not a CONST_INT. We can just demand a const_int_operand in
the predicate and remove the extra CONST_INT check.

Looking at git blame, it seems it was written that way as a result of some
other refactoring a few years back for an unrelated change.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?

Thanks,
Kyrill

2016-04-27  Kyrylo Tkachov  

* config/aarch64/aarch64.md (ashl3, SHORT modes):
Use const_int_operand for operand 2 predicate.  Simplify expand code
as a result.




Re: [PATCH][AArch64] Delete obsolete CC_ZESWP and CC_SESWP CC modes

2016-05-11 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01652.html

Thanks,
Kyrill
On 27/04/16 15:12, Kyrill Tkachov wrote:

Hi all,

The CC_ZESWP and CC_SESWP modes are not used anywhere and seem to be a remnant
of some old code that was removed. The various compare+extend patterns in
aarch64.md don't use these modes. So it should be safe to remove them to avoid
future confusion.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2016-04-27  Kyrylo Tkachov  

* config/aarch64/aarch64-modes.def (CC_ZESWP, CC_SESWP): Delete.
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Remove condition
that returns CC_SESWPmode and CC_ZESWPmode.
(aarch64_get_condition_code_1): Remove handling of CC_SESWPmode
and CC_ZESWPmode.
(aarch64_rtx_costs): Likewise.




Re: [PATCH][ARM] Fix costing of sign-extending load in rtx costs

2016-05-11 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01655.html

Thanks,
Kyrill

On 27/04/16 15:13, Kyrill Tkachov wrote:

Hi all,

Another costs issue that came out of the investigation for PR 65932 is that
sign-extending loads get a higher cost than they should in the arm backend.
The problem is that when handling a sign-extend of a MEM we add the cost
of the load_sign_extend cost field and then recursively add the cost of the
inner MEM rtx, which is bogus. This will end up adding an extra load cost on it.

The solution in this patch is to just remove that recursive step.
With this patch, various CSE dumps show much saner costs assigned to these
expressions (such as 12 instead of 32 or higher).

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-04-27  Kyrylo Tkachov  

* config/arm/arm.c (arm_new_rtx_costs, SIGN_EXTEND case):
Don't add cost of inner memory when handling sign-extended
loads.




Re: [PATCH vs] Take known zero bits into account when checking extraction.

2016-05-11 Thread Dominik Vogt
On Wed, May 11, 2016 at 10:40:11AM +0200, Bernd Schmidt wrote:
> On 05/11/2016 09:42 AM, Dominik Vogt wrote:
> >On Tue, May 10, 2016 at 05:05:06PM +0200, Bernd Schmidt wrote:
> >>Earlier in the discussion you mentioned the intention to remove
> >>these costs. Nothing else in the function does cost calculations -
> >>maybe you can try placing a gcc_unreachable into the case where the
> >>costs would prevent the transformation to see if it ever triggers.
> >
> >You mean to try it out locally or as part of the patch?
> 
> I meant try it out locally. I'm almost certain the patch shouldn't
> be trying to use costs here.

That's what I mentioned somewhere during the discussion.  The s390
backend just uses COSTS_N_INSNS(1) for AND as well as ZERO_EXTEND,
so this won't ever trigger.  I just left the rtx_cost call in the
patch for further discussion as Jeff said he liked the approach.
We don't need it to achieve the behaviour we want for s390.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH] Fix PR71039

2016-05-11 Thread Bernhard Reutner-Fischer
On May 10, 2016 3:07:12 PM GMT+02:00, Richard Biener  wrote:
>
>The following fixes PR71039 - we were failing to verify we can
>insert the lhs on the predecessor edges.
>
>Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

>+ /* Verify if *idx is available at *DATA.  */
>+ 
>+ static bool
>+ chk_uses (tree t, tree *idx, void *data)
>+ {
>+   basic_block dom = (basic_block) data;
>+   if (TREE_CODE (*idx) == SSA_NAME)
>+ return (SSA_NAME_IS_DEFAULT_DEF (*idx)
>+  || ! dominated_by_p (CDI_DOMINATORS,
>+   gimple_bb (SSA_NAME_DEF_STMT (*idx)), dom));
>+   return true;
>+ }

Shouldn't this warn about unused t?
Thanks,



Re: [PATCH vs] Take known zero bits into account when checking extraction.

2016-05-11 Thread Bernd Schmidt

On 05/11/2016 09:42 AM, Dominik Vogt wrote:

On Tue, May 10, 2016 at 05:05:06PM +0200, Bernd Schmidt wrote:

Earlier in the discussion you mentioned the intention to remove
these costs. Nothing else in the function does cost calculations -
maybe you can try placing a gcc_unreachable into the case where the
costs would prevent the transformation to see if it ever triggers.


You mean to try it out locally or as part of the patch?


I meant try it out locally. I'm almost certain the patch shouldn't be 
trying to use costs here.



Bernd


Re: [PATCH 2/3] cfgcleanup: Fold jumps and conditional branches with returns

2016-05-11 Thread Christophe Lyon
On 11 May 2016 at 01:26, Segher Boessenkool  wrote:
> On Tue, May 10, 2016 at 09:33:56PM +0200, Christophe Lyon wrote:
>> This patch causes an ICE on gcc.dg/20010822-1.c for target arm-none-eabi
>> --with-cpu=cortex-a9
>
> That is PR71028, I sent a patch to fix it, will commit in a minute.
> (See https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00673.html ).
>
> Sorry for the breakage,
>

OK thanks. Sorry for the delay in reporting this. All the noise caused
by the cilkplus merge meant I had to dig longer to identify real regressions.

I confirm your patch did fix the regression.

Christophe.

>
> Segher
>
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/20010822-1.c:
>> In function 'bar':
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/20010822-1.c:31:1:
>> internal compiler error: in redirect_jump, at jump.c:1560
>> 0x949a27 redirect_jump(rtx_jump_insn*, rtx_def*, int)
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/jump.c:1560
>> 0x10ec689 try_optimize_cfg
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:2899
>> 0x10ec689 cleanup_cfg(int)
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:3150
>> 0x10ed1f6 execute
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:3279


Re: [PATCH, GCC] PR middle-end/55299, fold bitnot through ASR and rotates

2016-05-11 Thread Marc Glisse

On Tue, 10 May 2016, Mikhail Maltsev wrote:


On 05/08/2016 10:57 PM, Marc Glisse wrote:

On Sun, 8 May 2016, Mikhail Maltsev wrote:


Hi!

I decided to revive this patch:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00999.html.
I addressed review comments about sign conversions. Bootstrapped and regtested
on x86_64-linux-gnu {,-m32}. OK for trunk?


Hello,

are you sure that your transformations are safe for any kind of conversion?


Oops, indeed, only narrowing conversions should be allowed. I updated the patch
and added some more test cases.


+/* ~((~X) >> Y) -> X >> Y (for arithmetic shift).  */
+(simplify
+ (bit_not (convert? (rshift (bit_not @0) @1)))
+  (if (!TYPE_UNSIGNED (TREE_TYPE (@0))
+   && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)))
+   (convert (rshift @0 @1)))))

Is there a particular reason to split the converting / non-converting
cases? For rotate, you managed to merge them nicely.

+
+(simplify
+ (bit_not (convert? (rshift (convert@0 (bit_not @1)) @2)))
+  (if (!TYPE_UNSIGNED (TREE_TYPE (@0))
+   && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (TREE_TYPE (@1))
+   && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)))
+   (with
+{ tree shift_type = TREE_TYPE (@0); }
+ (convert (rshift:shift_type (convert @1) @2)))))
+
+/* Same as above, but for rotates.  */
+(for rotate (lrotate rrotate)
+ (simplify
+  (bit_not (convert1?@0 (rotate (convert2?@1 (bit_not @2)) @3)))
+   (if (TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (TREE_TYPE (@2))
+&& TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (TREE_TYPE (@1)))
+(with
+ { tree operand_type = TREE_TYPE (@2); }
+  (convert (rotate:operand_type @2 @3))))))

Is that really safe when the conversion from @2 to @1 is narrowing? I
would expect something closer to
(convert (rotate (convert:type_of_1 @2) @3))
so the rotation is done in a type of the same precision as the original.

Or
(convert (rotate:type_of_1 (convert @2) @3))
if you prefer specifying the type there (I don't), and note that you
need the 'convert' inside or specifying the type on rotate doesn't work.

I have a slight preference for element_precision over TYPE_PRECISION 
(which for vectors is the number of elements), but I don't think it can 
currently cause issues for these particular transformations.


I don't know if we might want some :c / single_use restrictions, maybe 
on the outer convert and the rshift/rotate.


--
Marc Glisse


Re: [PATCH vs] Take known zero bits into account when checking extraction.

2016-05-11 Thread Dominik Vogt
On Tue, May 10, 2016 at 05:05:06PM +0200, Bernd Schmidt wrote:
> On 05/10/2016 03:06 PM, Dominik Vogt wrote:
> >+  int cost_of_and;
> >+  int cost_of_zero_ext;
> >+
> >+  cost_of_and = rtx_cost (x, mode, in_code, 1, speed_p);
> >+  cost_of_zero_ext = rtx_cost (temp, mode, in_code, 1, speed_p);
> >+  if (cost_of_zero_ext <= cost_of_and)
> 
> Earlier in the discussion you mentioned the intention to remove
> these costs. Nothing else in the function does cost calculations -
> maybe you can try placing a gcc_unreachable into the case where the
> costs would prevent the transformation to see if it ever triggers.

You mean to try it out locally or as part of the patch?

> >+/* Test whether an AND mask or'ed with the known zero bits that equals a mode
> >+   mask is a candidate for zero extension.  */
> >+
> >+/* Note: This test requires that char, int and long have different sizes and the
> >+   target has a way to do 32 -> 64 bit zero extension other than AND.  Targets
> >+   that fail the test because they do not satisfy these preconditions can skip
> >+   it.  */
> 
> Hmm, maybe place copies into a few gcc.target subdirectories
> instead? Or add a whitelist of targets (x86, power, aarch64 maybe)?

That's fine with me, but someone needs to find out what targets to
put on the whitelist.  I've only tested s390x and x86_64 so far.

> >+/* { dg-do compile { target lp64 } } */
> 
> I suspect this should be
> 
> /* { dg-do compile } */
> /* { dg-require-effective-target lp64 } */

Er, yes.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH 1/4, libgomp] Resolve deadlock on plugin exit (Ping x2)

2016-05-11 Thread Chung-Lin Tang
Ping x2

On 2016/4/16 3:39 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2016/3/21 06:21 PM, Chung-Lin Tang wrote:
>> Hi, this is the set of patches from 
>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01411.html
>> revised again, this time also with audits for the HSA plugin.
>>
>> The changes are pretty minor, mainly that the unload_image hook now
>> receives similar error handling treatment.
>>
>> Tested again without regressions for nvptx and intelmic, however
>> while I was able to build the toolchain with HSA offloading support, I was
>> unsure how I could test it, as I currently don't have any AMD hardware (not
>> aware if there's an emulator like intelmic).  I would be grateful if
>> the HSA folks can run them for me.
>>
>> Thanks,
>> Chung-Lin
>>
>> ChangeLog for the libgomp proper parts, patch as attached.
>>
>> 2016-03-20  Chung-Lin Tang  
>>
>> * target.c (gomp_device_copy): New function.
>> (gomp_copy_host2dev): Likewise.
>> (gomp_copy_dev2host): Likewise.
>> (gomp_free_device_memory): Likewise.
>> (gomp_map_vars_existing): Adjust to call gomp_copy_host2dev().
>> (gomp_map_pointer): Likewise.
>> (gomp_map_vars): Adjust to call gomp_copy_host2dev(), handle
>> NULL value from alloc_func plugin hook.
>> (gomp_unmap_tgt): Adjust to call gomp_free_device_memory().
>> (gomp_copy_from_async): Adjust to call gomp_copy_dev2host().
>> (gomp_unmap_vars): Likewise.
>> (gomp_update): Adjust to call gomp_copy_dev2host() and
>> gomp_copy_host2dev() functions.
>> (gomp_unload_image_from_device): Handle false value from
>> unload_image_func plugin hook.
>> (gomp_init_device): Handle false value from init_device_func
>> plugin hook.
>> (gomp_exit_data): Adjust to call gomp_copy_dev2host().
>> (omp_target_free): Adjust to call gomp_free_device_memory().
>> (omp_target_memcpy): Handle return values from host2dev_func,
>> dev2host_func, and dev2dev_func plugin hooks.
>> (omp_target_memcpy_rect_worker): Likewise.
>> (gomp_target_fini): Handle false value from fini_device_func
>> plugin hook.
>> * libgomp.h (struct gomp_device_descr): Adjust return type of
>> init_device_func, fini_device_func, unload_image_func, free_func,
>> dev2host_func, host2dev_func, and dev2dev_func plugin hooks to 'bool'.
>> * oacc-host.c (host_init_device): Change return type to bool.
>> (host_fini_device): Likewise.
>> (host_unload_image): Likewise.
>> (host_free): Likewise.
>> (host_dev2host): Likewise.
>> (host_host2dev): Likewise.
>> * oacc-mem.c (acc_free): Handle plugin hook fatal error case.
>> (acc_memcpy_to_device): Likewise.
>> (acc_memcpy_from_device): Likewise.
>> (delete_copyout): Add libfnname parameter, handle free_func
>> hook fatal error case.
>> (acc_delete): Adjust delete_copyout call.
>> (acc_copyout): Likewise.
>> (update_dev_host): Move gomp_mutex_unlock to after
>> host2dev/dev2host hook calls.
>>
> 



Re: [PATCH, libgomp] Fix deadlock in acc_set_device_type (ping x3)

2016-05-11 Thread Chung-Lin Tang
Ping x3

On 2016/4/19 10:30 PM, Chung-Lin Tang wrote:
> Ping x2.
> 
> Hi Jakub,
> This patch is fairly straightforward, and solves an easily encountered
> deadlock. Please approve for trunk and gcc-6-branch.
> 
> Thanks,
> Chung-Lin
> 
> On 2016/4/16 03:39 PM, Chung-Lin Tang wrote:
>> Ping.
>>
>> On 2016/3/28 05:45 PM, Chung-Lin Tang wrote:
>>> Hi Jakub, there's a path for deadlock on acc_device_lock when going
>>> through the acc_set_device_type() OpenACC library function.
>>> Basically, the gomp_init_targets_once() function should not be
>>> called with that held. The attached patch moves it appropriately.
>>>
>>> Also in this patch, there are several cases in acc_* functions
>>> where gomp_init_targets_once() is guarded by a test of
>>> !cached_base_dev. Since that function already uses pthread_once() to
>>> call gomp_target_init(), and technically cached_base_dev
>>> is protected by acc_device_lock, the cleanest way should be to
>>> simply drop those "if(!cached_base_dev)" tests.
>>>
>>> Tested libgomp without regressions on an nvptx offloaded system,
>>> is this okay for trunk?
>>>
>>> Thanks,
>>> Chung-Lin
>>>
>>> 2016-03-28  Chung-Lin Tang  
>>>
>>> * oacc-init.c (acc_init): Remove !cached_base_dev condition on call to
>>> gomp_init_targets_once().
>>> (acc_set_device_type): Remove !cached_base_dev condition on call to
>>> gomp_init_targets_once(), move call to before acc_device_lock acquire,
>>> to avoid deadlock.
>>> (acc_get_device_num): Remove !cached_base_dev condition on call to
>>> gomp_init_targets_once().
>>> (acc_set_device_num): Likewise.
>>>
>>
> 



Re: [PATCH, libgomp] Rewire OpenACC async (Ping x3)

2016-05-11 Thread Chung-Lin Tang
Ping x3

On 2016/4/16 3:40 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2016/4/8 07:02 PM, Chung-Lin Tang wrote:
>> Ping.
>>
>> On 2016/3/29 5:48 PM, Chung-Lin Tang wrote:
>>> I've updated this patch for trunk (as attached), and re-tested without
>>> regressions. This patch is still a fix for 
>>> libgomp.oacc-c-c++-common/asyncwait-1.c,
>>> which FAILs right now.
>>>
>>> ChangeLog is still as before. Is this okay for trunk?
>>>
>>> Thanks,
>>> Chung-Lin
>>>
>>> On 2015/12/22 4:58 PM, Chung-Lin Tang wrote:
>>>> Ping.
>>>>
>>>> On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
> Hi, this patch reworks some of the way that asynchronous copyouts are
> implemented for OpenACC in libgomp.
>
> Before this patch, we had a somewhat confusing way of implementing this
> by having two refcounts for each mapping: refcount and async_refcount,
> which I never got working again after the last wave of async regressions
> showed up.
>
> So this patch implements what I believe to be a simplification: async_refcount
> is removed, and instead of trying to queue the async copyouts during unmapping
> we actually do that during the plugin event handling. This requires the addition
> of the async stream integer as an argument to the register_async_cleanup
> plugin hook, but overall I think this should be more elegant than before.
>
> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
> It also fixed data-[23].c regressions before, but some other recent check-in
> happened to have already fixed those.
>
> Tested without regressions, is this okay for trunk?
>
> Thanks,
> Chung-Lin
>
> 2015-11-24  Chung-Lin Tang  
>
> * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
> * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
> parameter, use to set async stream around call to gomp_unmap_vars,
> call gomp_unmap_vars() with 'do_copyfrom' set to true.
> * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
> (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
> events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
> (event_add): Add int parameter, initialize 'val' field when
> adding new ptx_event struct.
> (nvptx_exec): Adjust event_add() call arguments.
> (nvptx_host2dev): Likewise.
> (nvptx_dev2host): Likewise.
> (nvptx_wait_async): Likewise.
> (nvptx_wait_all_async): Likewise.
> (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
> pass to event_add() call.
> * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
> parameter.
> * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
> call openacc.register_async_cleanup_func() hook.
> * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
> * target.c (gomp_copy_from_async): Delete function.
> (gomp_map_vars): Remove async_refcount.
> (gomp_unmap_vars): Likewise.
> (gomp_load_image_to_device): Likewise.
> (omp_target_associate_ptr): Likewise.
> * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
> (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
> (gomp_copy_from_async): Remove.
>

>>>
>>
>