Re: [PATCH] Add anitizer_linux_x86_64.lo if __x86_64__ is defined by $CC

2017-10-04 Thread Jakub Jelinek
On Sat, Sep 30, 2017 at 06:12:39PM -0700, H.J. Lu wrote:
> Since size of "void *" is 4 bytes for x32, check if __x86_64__ is defined
> by $CC, instead of
> 
> if test x$ac_cv_sizeof_void_p = x8; then
> 
> to decide wether anitizer_linux_x86_64.lo should be used.
> 
> I am testing this on i686 and x86-64.  OK for trunk and GCC 7 branch if
> there are no regression?  Please upstream it for me if appropriate.
> 
> Thanks.
> 
> 
> H.J.
> ---
>   PR sanitizer/82379
>   * configure.tgt (ANITIZER_COMMON_TARGET_DEPENDENT_OBJECTS): Set

s/ANITIZER/SANITIZER/

>   to anitizer_linux_x86_64.lo if __x86_64__ is defined by $CC.

s/anitizer/sanitizer/

Ok with those changes.

> --- a/libsanitizer/configure.tgt
> +++ b/libsanitizer/configure.tgt
> @@ -27,6 +27,8 @@ case "${target}" in
>   TSAN_SUPPORTED=yes
>   LSAN_SUPPORTED=yes
>   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_amd64.lo
> + fi
> + if echo "int x = __x86_64__;" | $CC -c -x c -o /dev/null - > /dev/null 
> 2>&1; then
>   
> SANITIZER_COMMON_TARGET_DEPENDENT_OBJECTS=sanitizer_linux_x86_64.lo
>   fi
>   ;;
> -- 

Jakub


Re: [RFA] [PATCH] Add a warning for invalid function casts

2017-10-04 Thread Eric Gallager
On Tue, Oct 3, 2017 at 3:33 PM, Bernd Edlinger
 wrote:
> Hi!
>
> I have implemented a warning -Wcast-function-type that analyzes
> type casts which change the function signatures.
>
> I would consider function pointers with different result type
> invalid, also if both function types have a non-null TYPE_ARG_TYPES
> I would say this deserves a warning.  As an exception I have
> used for instance in recog.h, the warning allows casting
> a function with the type typedef rtx (*stored_funcptr) (...);
> to any function with the same result type.
>
> I would think a warning like that should be enabled with -Wextra.
>
> Attached is a first version of the patch and as you can see
> the warning found already lots of suspicious type casts.  The worst
> is the splay-tree which always calls functions with uintptr_t
> instead of the correct parameter type.  I was unable to find
> a solution for this, and just silenced the warning with a
> second type-cast.
>
> Note that I also changed one line in libgo, but that is only
> a quick hack which I only did to make the boot-strap with
> all languages succeed.
>
> I'm not sure if this warning may be a bit too strict, but I think
> so far it just triggered on rather questionable code.
>
> Thoughts?
>
>
> Bernd.

Sorry if this is a stupid question, but could you explain how this
warning is different from -Wbad-function-cast? Something about direct
calls to functions vs. passing them as function pointers?


Re: [PATCH] C++ warning on vexing parse

2017-10-04 Thread Eric Gallager
On Tue, Oct 3, 2017 at 1:08 PM, Nathan Sidwell  wrote:
> [non-c++ people on CC, there's a reason ...]
>
> This patch adds a new warning, concerning unnecessary parentheses on a
> declaration.  For instance:
>prettyprinter (pp);
> That's a declaration of a pp variable of type prettyprinter -- not a call of
> a prettyprinter function.  The reason is that in C++, anything that is
> syntactically a declaration, is a declaration.  This can lead to surprising
> unexpected code generation, and the documentation I added provides
> motivation:
>   std::unique_lock (mymutex);
> That is NOT a lock of mymutex.  It's a declaration of a variable called
> mymutex locking no mutex.  If we're in a member function and 'mymutex' is a
> field, Wshadow-parm doesn't trigger either.
>
> This patch notes when a direct-declarator is parenthesized, and then warns
> about it in grokdeclarator.  Obviously, parens are sometimes necessary --
> when declaring pointers to arrays or functions, and we don't warn then.  The
> simplest way to do that was during the parsing, not in grokdeclarator, where
> the cp_declarator object is constant.  We'd either have to change that, or
> have some other local state.
>
> I added this to -Wparentheses (enabled by -Wall).  It triggered in a few
> cases during bootstrap:
>
> 1) in toplev.c we have the above prettyprinter example.  I think that's
> clearly poorly formatted code.
>
> 2) caller-save.c: insert_save has 'HARD_REG_SET (*to_save)', which also
> seems poorly formatted to me -- other cases of ptr-to-HRS don't do this.
>
> 3) A couple of places do:
>T (name  // line break to avoid wrap
>   [LONGEXPR][OTHERLONGEXPR]);
>
> The parens aid the editor's formatter. The patch removes the parens, but I
> can see there may be disagreement.  I suppose we could permit parens at the
> outermost declarator for an array?
> Affects ira-int.h (target_ira_int::x_ira_reg_mode_hard_regset)
>  reload.h (target_reload::x_regno_save_mode)
>
> 3) A set of typedef'd function types (NOT ptr-to-fn).  The name being
> typedef'd is parenthesized and not an idiom I recall seeing before. AFAICT
> these are the only cases of typedefing a plain fn.  We could, I suppose,
> ignore parens on a typedef'd fntype, but that seems a little random.
> Affects gengtype.c (frul_actionrout_t)
> lto-streamer.h (lto_get_section_data_f & to_free_section_data_f)
> tree-ssa-threadedge.c (pfn_simplify)
>
> If you've objections to the changes I've made in instances of #3 & #4 let me
> know.
>
> Jason, do you have concerns about the C++ patch itself?
>
> nathan
> --
> Nathan Sidwell

Could you check and see if this fixes any of the preexisting
currently-open bugs related to most-vexing-parse (or similar) warnings
on Bugzilla? For example:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64679
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25814
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69855


Re: Make tests less istreambuf_iterator implementation dependent

2017-10-04 Thread Jonathan Wakely

On 04/10/17 18:21 +0200, François Dumont wrote:

On 03/10/2017 16:20, Jonathan Wakely wrote:

Doesn't the modified test PASS anyway, even without changing how _M_c
is used?


No it doesn't because the first check for eof capture the 'a' char 
which is never reseted so after string construction it is still 
pointing to this char and is not an eof iterator. Without the _M_c 
assignment the iterator is still "live" and so keeps on reflecting the 
streambuf state.


Ah yes, I must have not adjusted the test properly when I tried to
check that. Your new test does indeed fail with the old
implementation.


Attached patch committed with:


Thanks.




Re: [PATCH] Improve alloca alignment

2017-10-04 Thread Jeff Law
On 10/04/2017 08:53 AM, Eric Botcazou wrote:
>> This seems like a SPARC target problem to me -- essentially it's
>> claiming a higher STACK_BOUNDARY than it really has.
> 
> No, it is not, I can guarantee you that the stack pointer is always aligned 
> to 
> 64-bit boundaries on SPARC, otherwise all hell would break loose...
Then something is inconsistent somehwere.  Either the stack is aligned
prior to that code or it is not.  If it is aligned, then Wilco's patch
ought to keep it aligned.  If is not properly aligned, then well, that's
the problem ISTM.

Am I missing something here?

jeff


Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern

2017-10-04 Thread Steve Ellcey
On Wed, 2017-10-04 at 16:41 +0100, Richard Earnshaw (lists) wrote:
> On 02/10/17 10:05, Sudi Das wrote:
> > 
> > 2017-10-02 Sudakshina Das  
> > 
> > * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New 
> > check type
> > for aarch64_simd_valid_immediate.
> > (aarch64_output_simd_mov_immediate): Update prototype.
> > (aarch64_simd_valid_immediate): Update prototype.
> > 
> > * config/aarch64/aarch64-simd.md (orr3): modified pattern to add
> > support for ORR-immediate.
> > (and3): modified pattern to add support for BIC-immediate.
> > 
> > * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now 
> > checks
> > for valid immediate for BIC and ORR based on new enum argument.
> > (aarch64_output_simd_mov_immediate): Function now used to output 
> > BIC/ORR imm
> > as well based on new enum argument.
> >  
> > * config/aarch64/constraints.md (Do): New vector immediate constraint.
> > (Db) : Likewise.
> > 
> > * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): New
> > predicate.
> > (aarch64_reg_or_bic_imm): Likewise.

I think this patch is causing a bunch of test failures on aarch64.  I
had to apply the patch for PR82396 (that was reverted) in order to
build ToT GCC, but when I did that and ran the testsuite I got a bunch
of failures like:

/home/sellcey/cavium-pr-27386/src/gcc/gcc/testsuite/gcc.c-
torture/compile/pr54713-1.c:45:18: internal compiler error: in
aarch64_simd_valid_immediate, at config/aarch64/aarch64.c:11539
0xf2227b aarch64_simd_valid_immediate(rtx_def*, machine_mode, bool,
simd_immediate_info*, simd_immediate_check)
../../../src/gcc/gcc/config/aarch64/aarch64.c:11539
0x11047b3 aarch64_reg_or_bic_imm(rtx_def*, machine_mode)
../../../src/gcc/gcc/config/aarch64/predicates.md:79
0xab29ab insn_operand_matches(insn_code, unsigned int, rtx_def*)
../../../src/gcc/gcc/optabs.c:6891
0xab29ab maybe_legitimize_operand_same_code
../../../src/gcc/gcc/optabs.c:6919
0xab545f maybe_legitimize_operand
../../../src/gcc/gcc/optabs.c:6990
0xab545f maybe_legitimize_operands(insn_code, unsigned int, unsigned
int, expand_operand*)
../../../src/gcc/gcc/optabs.c:7055
0xab5a8f maybe_gen_insn(insn_code, unsigned int, expand_operand*)
../../../src/gcc/gcc/optabs.c:7073
0xab8503 expand_binop_directly
../../../src/gcc/gcc/optabs.c:1075
0xab87af expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*,
rtx_def*, int, optab_methods)
../../../src/gcc/gcc/optabs.c:1156
0x8736d7 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
../../../src/gcc/gcc/expr.c:9582
0x749027 expand_gimple_stmt_1
../../../src/gcc/gcc/cfgexpand.c:3691
0x749027 expand_gimple_stmt
../../../src/gcc/gcc/cfgexpand.c:3751
0x750387 expand_gimple_basic_block
../../../src/gcc/gcc/cfgexpand.c:5750
0x751ef7 execute
../../../src/gcc/gcc/cfgexpand.c:6357


[PATCH, rs6000] Process deferred rescans between mini-passes

2017-10-04 Thread Bill Schmidt
Hi,

The Power8 swap optimization pass contains a mini-pass to replace certain
patterns prior to swap optimization proper.  In order for this not to
distort the dataflow information for swap optimization, we should process
all the deferred rescans between the two passes.  Currently that is not
done.  This patch fixes that.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is
this ok for trunk?

Thanks,
Bill


2017-10-04  Bill Schmidt  

* config/rs6000/rs6000-p8swap.c (rs6000_analyze_swaps): Process
deferred rescans after the lvx/stvx recombination pre-pass.


Index: gcc/config/rs6000/rs6000-p8swap.c
===
--- gcc/config/rs6000/rs6000-p8swap.c   (revision 253388)
+++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
@@ -1882,6 +1882,7 @@ rs6000_analyze_swaps (function *fun)
 
   /* Pre-pass to recombine lvx and stvx patterns so we don't lose info.  */
   recombine_lvx_stvx_patterns (fun);
+  df_process_deferred_rescans ();
 
   /* Allocate structure to represent webs of insns.  */
   insn_entry = XCNEWVEC (swap_web_entry, get_max_uid ());



[patch, fortran] Set implicit ASYNCHRONOUS attribute

2017-10-04 Thread Thomas Koenig

Hello world,

the attached patch sets the implicit ASYNCHRONPUS according to F2008,
9.6.2.5:

# If a variable is used in an asynchronous data transfer statement as
# • an item in an input/output list,
# • a group object in a namelist, or
# • a SIZE= specifier
# the base object of the data-ref is implicitly given the ASYNCHRONOUS 
attribute in the scoping unit of the
# data transfer statement. This attribute may be confirmed by explicit 
declaration.


At the moment, the only thing this does is show up on the fortran tree
dump. This will hopefully change once asynchronous I/O is implemented.
And if you are wondering why I put setting the global variable into
check_io_constraints: It is because the parsing of the YES/NO
is done there, and I didn't want to duplicate the code.

No test case because we don't (yet) have tests for -fdump-fortran-original.

Regression-tested.  OK for trunk?

Regards

Thomas

2017-10-04  Thomas Koenig  

* gfortran.h (async_io_dt): Add external reference.
* io.c (async_io_dt): Add variable.
(compare_to_allowed_values): Add prototyte. Add optional argument
num. If present, set it to the number of the entry that was
matched.
(check_io_constraints): If this is for an asynchronous I/O
statement, set async_io_dt and set the asynchronous flag for
a SIZE tag.
* resolve.c (resolve_transfer): If async_io_dt is set, set
the asynchronous flag on the variable.
(resolve_fl_namelist): If async_io_dt is set, set the asynchronous
flag on all elements of the namelist.
Index: gfortran.h
===
--- gfortran.h	(Revision 253377)
+++ gfortran.h	(Arbeitskopie)
@@ -3311,6 +3311,7 @@ void gfc_free_dt (gfc_dt *);
 bool gfc_resolve_dt (gfc_dt *, locus *);
 void gfc_free_wait (gfc_wait *);
 bool gfc_resolve_wait (gfc_wait *);
+extern bool async_io_dt;
 
 /* module.c */
 void gfc_module_init_2 (void);
Index: io.c
===
--- io.c	(Revision 253381)
+++ io.c	(Arbeitskopie)
@@ -111,7 +111,10 @@ static gfc_dt *current_dt;
 
 #define RESOLVE_TAG(x, y) if (!resolve_tag (x, y)) return false;
 
+/* Are we currently processing an asynchronous I/O statement? */
 
+bool async_io_dt;
+
 / Fortran 95 FORMAT parser  */
 
 /* FORMAT tokens returned by format_lex().  */
@@ -1944,7 +1947,15 @@ static int
 compare_to_allowed_values (const char *specifier, const char *allowed[],
 			   const char *allowed_f2003[], 
 			   const char *allowed_gnu[], gfc_char_t *value,
-			   const char *statement, bool warn)
+			   const char *statement, bool warn,
+			   int *num = NULL);
+
+
+static int
+compare_to_allowed_values (const char *specifier, const char *allowed[],
+			   const char *allowed_f2003[], 
+			   const char *allowed_gnu[], gfc_char_t *value,
+			   const char *statement, bool warn, int *num)
 {
   int i;
   unsigned int len;
@@ -1961,7 +1972,11 @@ compare_to_allowed_values (const char *specifier,
   for (i = 0; allowed[i]; i++)
 if (len == strlen (allowed[i])
 	&& gfc_wide_strncasecmp (value, allowed[i], strlen (allowed[i])) == 0)
+  {
+	if (num)
+	  *num = i;
   return 1;
+  }
 
   for (i = 0; allowed_f2003 && allowed_f2003[i]; i++)
 if (len == strlen (allowed_f2003[i])
@@ -3719,6 +3734,7 @@ if (condition) \
 
   if (dt->asynchronous) 
 {
+  int num;
   static const char * asynchronous[] = { "YES", "NO", NULL };
 
   if (!gfc_reduce_init_expr (dt->asynchronous))
@@ -3734,9 +3750,15 @@ if (condition) \
   if (!compare_to_allowed_values
 		("ASYNCHRONOUS", asynchronous, NULL, NULL,
 		 dt->asynchronous->value.character.string,
-		 io_kind_name (k), warn))
+		 io_kind_name (k), warn, &num))
 	return MATCH_ERROR;
+
+  async_io_dt = num == 0;
+  if (async_io_dt && dt->size)
+	dt->size->symtree->n.sym->attr.asynchronous = 1;
 }
+  else
+async_io_dt = false;
 
   if (dt->id)
 {
Index: resolve.c
===
--- resolve.c	(Revision 253377)
+++ resolve.c	(Arbeitskopie)
@@ -9196,6 +9196,9 @@ resolve_transfer (gfc_code *code)
 		 "an assumed-size array", &code->loc);
   return;
 }
+
+  if (async_io_dt && exp->expr_type == EXPR_VARIABLE)
+exp->symtree->n.sym->attr.asynchronous = 1;
 }
 
 
@@ -14079,6 +14082,11 @@ resolve_fl_namelist (gfc_symbol *sym)
 	}
 }
 
+  if (async_io_dt)
+{
+  for (nl = sym->namelist; nl; nl = nl->next)
+	nl->sym->attr.asynchronous = 1;
+}
   return true;
 }
 


PING: [libsanitizer, PATCH] Add anitizer_linux_x86_64.lo if __x86_64__ is defined by $CC

2017-10-04 Thread H.J. Lu
On 9/30/17, H.J. Lu  wrote:
> Since size of "void *" is 4 bytes for x32, check if __x86_64__ is defined
> by $CC, instead of
>
> if test x$ac_cv_sizeof_void_p = x8; then
>
> to decide wether anitizer_linux_x86_64.lo should be used.
>
> I am testing this on i686 and x86-64.  OK for trunk and GCC 7 branch if
> there are no regression?  Please upstream it for me if appropriate.
>
> Thanks.
>
>
> H.J.
> ---
>   PR sanitizer/82379
>   * configure.tgt (ANITIZER_COMMON_TARGET_DEPENDENT_OBJECTS): Set
>   to anitizer_linux_x86_64.lo if __x86_64__ is defined by $CC.
> ---
>  libsanitizer/configure.tgt | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
> index 82e8a5513c5..573e3b482e9 100644
> --- a/libsanitizer/configure.tgt
> +++ b/libsanitizer/configure.tgt
> @@ -27,6 +27,8 @@ case "${target}" in
>   TSAN_SUPPORTED=yes
>   LSAN_SUPPORTED=yes
>   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_amd64.lo
> + fi
> + if echo "int x = __x86_64__;" | $CC -c -x c -o /dev/null - > /dev/null
> 2>&1; then
>   
> SANITIZER_COMMON_TARGET_DEPENDENT_OBJECTS=sanitizer_linux_x86_64.lo
>   fi
>   ;;
> --
> 2.13.6
>
>

This file is only used in GCC.   I have tested it on i686 and x86-64.
Any objections or comments?

Thanks.

-- 
H.J.


Re: [PATCH] Improve V?TImode shifts (PR target/82370)

2017-10-04 Thread H.J. Lu
On 10/4/17, Jakub Jelinek  wrote:
> Hi!
>
> The following patch tweaks the TImode vector shifts similarly
> to the earlier vector shift patch, so that for shifts by immediate
> we can accept a memory input.  Additionally, it removes the vec_shl_*

I prefer 2 patches, a separate patch to drop vec_shl_* first. which can
go in now.

Thanks.


> expander, because the middle-end has dropped that a few years ago,
> and merges the left and right shift patterns using code iterators.
> Appart from the code/names that can be handled by mode attributes,
> the only difference was that one of the insns had
> (set_attr "atom_unit" "sishuf")
> and the other didn't.  I hope that is just an error, I'd really expect
> both vpslldq and vpsrldq to use the same atom unit, isn't that the case?
> CCing H.J. for that.  If it is intentional difference, I can of course
> adjust the patch and undo the merging of the 4 define_insns into 2.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-10-04  Jakub Jelinek  
>
>   PR target/82370
>   * config/i386/sse.md (VIMAX_AVX2): Remove V4TImode.
>   (VIMAX_AVX2_AVX512BW, VIMAX_AVX512VL): New mode iterators.
>   (vec_shl_): Remove unused expander.
>   (avx512bw_3): New define_insn.
>   (_ashl3, _lshr3): Replaced by ...
>   (_3): ... this.  New define_insn.
>
>   * gcc.target/i386/pr82370.c: New test.
>
> --- gcc/config/i386/sse.md.jj 2017-10-04 12:18:19.0 +0200
> +++ gcc/config/i386/sse.md2017-10-04 15:34:00.541860351 +0200
> @@ -371,10 +371,17 @@ (define_mode_iterator V16FI
>[V16SF V16SI])
>
>  ;; ??? We should probably use TImode instead.
> -(define_mode_iterator VIMAX_AVX2
> +(define_mode_iterator VIMAX_AVX2_AVX512BW
>[(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") V1TI])
>
> -;; ??? This should probably be dropped in favor of VIMAX_AVX2.
> +;; Suppose TARGET_AVX512BW as baseline
> +(define_mode_iterator VIMAX_AVX512VL
> +  [V4TI (V2TI "TARGET_AVX512VL") (V1TI "TARGET_AVX512VL")])
> +
> +(define_mode_iterator VIMAX_AVX2
> +  [(V2TI "TARGET_AVX2") V1TI])
> +
> +;; ??? This should probably be dropped in favor of VIMAX_AVX2_AVX512BW.
>  (define_mode_iterator SSESCALARMODE
>[(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") TI])
>
> @@ -10792,9 +10799,9 @@ (define_insn "3 (set_attr "mode" "")])
>
>
> -(define_expand "vec_shl_"
> +(define_expand "vec_shr_"
>[(set (match_dup 3)
> - (ashift:V1TI
> + (lshiftrt:V1TI
>(match_operand:VI_128 1 "register_operand")
>(match_operand:SI 2 "const_0_to_255_mul_8_operand")))
> (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> @@ -10805,48 +10812,24 @@ (define_expand "vec_shl_"
>operands[4] = gen_lowpart (mode, operands[3]);
>  })
>
> -(define_insn "_ashl3"
> -  [(set (match_operand:VIMAX_AVX2 0 "register_operand" "=x,v")
> - (ashift:VIMAX_AVX2
> -  (match_operand:VIMAX_AVX2 1 "register_operand" "0,v")
> -  (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n,n")))]
> -  "TARGET_SSE2"
> +(define_insn "avx512bw_3"
> +  [(set (match_operand:VIMAX_AVX512VL 0 "register_operand" "=v")
> + (any_lshift:VIMAX_AVX512VL
> +  (match_operand:VIMAX_AVX512VL 1 "nonimmediate_operand" "vm")
> +  (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n")))]
> +  "TARGET_AVX512BW"
>  {
>operands[2] = GEN_INT (INTVAL (operands[2]) / 8);
> -
> -  switch (which_alternative)
> -{
> -case 0:
> -  return "pslldq\t{%2, %0|%0, %2}";
> -case 1:
> -  return "vpslldq\t{%2, %1, %0|%0, %1, %2}";
> -default:
> -  gcc_unreachable ();
> -}
> +  return "vpdq\t{%2, %1, %0|%0, %1, %2}";
>  }
> -  [(set_attr "isa" "noavx,avx")
> -   (set_attr "type" "sseishft")
> +  [(set_attr "type" "sseishft")
> (set_attr "length_immediate" "1")
> -   (set_attr "prefix_data16" "1,*")
> -   (set_attr "prefix" "orig,vex")
> +   (set_attr "prefix" "maybe_evex")
> (set_attr "mode" "")])
>
> -(define_expand "vec_shr_"
> -  [(set (match_dup 3)
> - (lshiftrt:V1TI
> -  (match_operand:VI_128 1 "register_operand")
> -  (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
> -   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> -  "TARGET_SSE2"
> -{
> -  operands[1] = gen_lowpart (V1TImode, operands[1]);
> -  operands[3] = gen_reg_rtx (V1TImode);
> -  operands[4] = gen_lowpart (mode, operands[3]);
> -})
> -
> -(define_insn "_lshr3"
> +(define_insn "_3"
>[(set (match_operand:VIMAX_AVX2 0 "register_operand" "=x,v")
> - (lshiftrt:VIMAX_AVX2
> + (any_lshift:VIMAX_AVX2
>(match_operand:VIMAX_AVX2 1 "register_operand" "0,v")
>(match_operand:SI 2 "const_0_to_255_mul_8_operand" "n,n")))]
>"TARGET_SSE2"
> @@ -10856,9 +10839,9 @@ (define_insn "_lshr3"
>switch (which_alternative)
>  {
>  case 0:
> -  return "psrldq\t{%2, %0|%0, %2}";
> +  return "pdq\t{%2, %0|%0, %2}";
>  case 1:
> -  return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
> +  return "v

Re: correct attribute ifunc C++ type safety (PR 82301)

2017-10-04 Thread Martin Sebor

On 09/28/2017 08:25 AM, Nathan Sidwell wrote:

On 09/24/2017 06:03 PM, Martin Sebor wrote:

r253041 enhanced type checking for alias and ifunc attributes to
detect declarations of incompatible aliases, or ifunc resolvers
that return pointers to functions of an incompatible type.  More
extensive testing exposed a bug in the implementation of the ifunc
attribute handling in C++ where the checker expected the ifunc
resolver to return a pointer to a member function when the
implementation actually expects it return a pointer to a non-
member function.

In a discussion of the test suite failures, Jakub also suggested
to break the enhanced warning out of -Wattributes and issue it
under a different option.

The attached patch corrects the C++ problem and moves the warning
under -Wincompatible-pointer-types.  Since this is a C-only option,
the patch also enables for it C++.  Since the option is enabled by
default, the patch further requires -Wextra to issue the warning
for ifunc resolvers returning void*.  However, the patched checker
diagnoses other incompatibilities without it.

Martin


I find the maybe_diag_incompatible_alias function confusing.


+/* Check declaration of the type of ALIAS for compatibility with its
TARGET
+   (which may be an ifunc resolver) and issue a diagnostic when they are
+   not compatible according to language rules (plus a C++ extension for
+   non-static member functions).  */
+
+static void
+maybe_diag_incompatible_alias (tree alias, tree target)
+{
+  tree altype = TREE_TYPE (alias);
+  tree targtype = TREE_TYPE (target);
+
+  bool ifunc = lookup_attribute ("ifunc", DECL_ATTRIBUTES (alias));
+  if (ifunc)


I think it might be clearer if this was broken out into a diag_ifunc
function?  But see below ...


Thanks for the review.  I've updated the patch to incorporate
your suggestions.  My responses (mostly agreeing with your
comments or clarifying things, plus one question) are inline.




+{
+  /* Handle attribute ifunc first.  */
+
+  tree funcptr = altype;
+
+  /* Set FUNCPTR to the type of the alias target.  If the type
+ is a non-static member function of class C, construct a type
+ of an ordinary function taking C* as the first argument,
+ followed by the member function argument list, and use it
+ instead to check for compatibilties.  FUNCPTR is used only
+ in diagnostics.  */


This comment is self-contradictory.
  1 Set FUNCPTR
  2 Do some method-type shenanigans
  3 Use it to check for incompatibilites
  4 FUNCPTR is only used in diags

Which of #3 and #4 is true?


Both.  It's used to control diagnostics (as opposed to something
else).  But the comment is from an earlier version of the patch
where the function body was still a part of its caller, so it's
redundant now that all the code in maybe_diag_incompatible_alias
is only used to control diagnostics.


+
+  if (TREE_CODE (altype) == METHOD_TYPE)
+{


IMHO put the description of the METHOD_TYPE chicanery inside the block
doing it?  FWIW, although the change being made works on many (most?)
ABIs, it's not formally correct and I think fails on some where 'this'
is passed specially. You might want to note that?


Sure.  I've added a comment.

Since the original tests (where the resolver returns void*) pass
across the board I assume it must work for all supported ABIs.
Or is there some subtlety between the before and after code that
I'm missing?




+  tree rettype = TREE_TYPE (altype);
+  tree args = TYPE_ARG_TYPES (altype);
+  altype = build_function_type (rettype, args);
+  funcptr = altype;
+}
+



+  if ((!FUNC_OR_METHOD_TYPE_P (targtype)
+   || (prototype_p (altype)
+   && prototype_p (targtype)
+   && !types_compatible_p (altype, targtype
+{
+  funcptr = build_pointer_type (funcptr);
+
+  if (warning_at (DECL_SOURCE_LOCATION (target),
+  OPT_Wincompatible_pointer_types,
+  "% resolver for %qD should return %qT",
+  alias, funcptr))
+inform (DECL_SOURCE_LOCATION (alias),
+"resolver indirect function declared here");
+}


this block is almost the same as the non-ifunc block.  Surely they can
be the same code? (by generalizing one of the cases until it turns into
the other?)


The existing code does that but in this patch I made the warnings
and informational notes distinct.  It feels like a tossup between
parameterizing the code and making the flow more complex and harder
to follow and keeping the two cases (ifunc and alias) separate from
one another.  But I don't feel strongly one way or the other so
I changed it as you suggest.


+  /* Deal with static member function pointers.  */


I do not understand this comment or condition. We seem to have dealt
with pointers already and the conditions seem confused.


+  if (TREE_CODE (targtype) != RECORD_TYPE
+  || TYPE_FIELDS (targtype)
+  || TREE_CODE (TR

[PATCH] Remove useless isa attributes from various sse.md patterns

2017-10-04 Thread Jakub Jelinek
Hi!

While working on the previous patch, I've noticed we have quite a few
seemingly useless isa attributes (first I've noticed isa attribute
which had one value for all alternatives, which IMHO should just
been done in insn condition instead).

fma_avx512f isa is only used on insns that have TARGET_AVX512F && ...
in their conditions, and the isa has been enabled if:
TARGET_FMA || TARGET_AVX512F
so that is clearly satisfied always.

The last hunk had isa avx and TARGET_AVX512BW && in the condition,
that doesn't make any sense to me either.

And the two hunks before that had avx512f isa, and TARGET_AVX512F && ...
in the condition.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-04  Jakub Jelinek  

* config/i386/i386.md (isa): Remove fma_avx512f.
* config/i386/sse.md (_fmadd__mask,
_fmadd__mask3,
_fmsub__mask,
_fmsub__mask3,
_fnmadd__mask,
_fnmadd__mask3,
_fnmsub__mask,
_fnmsub__mask3,
_fmaddsub__mask,
_fmaddsub__mask3,
_fmsubadd__mask,
_fmsubadd__mask3): Remove isa attribute.
(*vec_widen_umult_even_v16si,
*vec_widen_smult_even_v16si): Likewise.
(avx512bw_dbpsadbw): Likewise.

--- gcc/config/i386/i386.md.jj  2017-10-04 09:45:55.0 +0200
+++ gcc/config/i386/i386.md 2017-10-04 16:28:36.954551561 +0200
@@ -798,7 +798,7 @@ (define_attr "movu" "0,1" (const_string
 (define_attr "isa" "base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64,
sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
-   fma_avx512f,avx512bw,noavx512bw,avx512dq,noavx512dq,
+   avx512bw,noavx512bw,avx512dq,noavx512dq,
avx512vl,noavx512vl,x64_avx512dq,x64_avx512bw"
   (const_string "base"))
 
@@ -832,8 +832,6 @@ (define_attr "enabled" ""
 (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
 (eq_attr "isa" "avx512f") (symbol_ref "TARGET_AVX512F")
 (eq_attr "isa" "noavx512f") (symbol_ref "!TARGET_AVX512F")
-(eq_attr "isa" "fma_avx512f")
-  (symbol_ref "TARGET_FMA || TARGET_AVX512F")
 (eq_attr "isa" "avx512bw") (symbol_ref "TARGET_AVX512BW")
 (eq_attr "isa" "noavx512bw") (symbol_ref "!TARGET_AVX512BW")
 (eq_attr "isa" "avx512dq") (symbol_ref "TARGET_AVX512DQ")
--- gcc/config/i386/sse.md.jj   2017-10-04 15:34:00.0 +0200
+++ gcc/config/i386/sse.md  2017-10-04 16:21:41.724535506 +0200
@@ -3720,8 +3720,7 @@ (define_insn "_fmadd__mask
   "@
vfmadd132\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, 
%2}
vfmadd213\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, 
%3}"
-  [(set_attr "isa" "fma_avx512f,fma_avx512f")
-   (set_attr "type" "ssemuladd")
+  [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
 (define_insn "_fmadd__mask3"
@@ -3735,8 +3734,7 @@ (define_insn "_fmadd__mask
  (match_operand: 4 "register_operand" "Yk")))]
   "TARGET_AVX512F"
   "vfmadd231\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, 
%2}"
-  [(set_attr "isa" "fma_avx512f")
-   (set_attr "type" "ssemuladd")
+  [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
 (define_insn "*fma_fmsub_"
@@ -3786,8 +3784,7 @@ (define_insn "_fmsub__mask
   "@
vfmsub132\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, 
%2}
vfmsub213\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, 
%3}"
-  [(set_attr "isa" "fma_avx512f,fma_avx512f")
-   (set_attr "type" "ssemuladd")
+  [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
 (define_insn "_fmsub__mask3"
@@ -3802,8 +3799,7 @@ (define_insn "_fmsub__mask
  (match_operand: 4 "register_operand" "Yk")))]
   "TARGET_AVX512F && "
   "vfmsub231\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, 
%2}"
-  [(set_attr "isa" "fma_avx512f")
-   (set_attr "type" "ssemuladd")
+  [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
 (define_insn "*fma_fnmadd_"
@@ -3853,8 +3849,7 @@ (define_insn "_fnmadd__mas
   "@
vfnmadd132\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, 
%2}
vfnmadd213\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, 
%3}"
-  [(set_attr "isa" "fma_avx512f,fma_avx512f")
-   (set_attr "type" "ssemuladd")
+  [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
 (define_insn "_fnmadd__mask3"
@@ -3869,8 +3864,7 @@ (define_insn "_fnmadd__mas
  (match_operand: 4 "register_operand" "Yk")))]
   "TARGET_AVX512F && "
   "vfnmadd231\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, 
%2}"
-  [(set_attr "isa" "fma_avx512f")
-   (set_attr "type" "ssemuladd")
+  [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
 (define_insn "*fma_fnmsub_"
@@ -3923,8 +3917,7 @@ (define_insn "_fnmsub__mas
   "@
vfnmsub132\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, 
%2}
vfnmsub213\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, 
%3}"
-  [(set_attr "isa" "fma_avx512f,fma_avx512f")
-   (set_attr "type" "ssemuladd")
+  [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
 (define_insn "_fnmsub__mask3"
@@ -3940,8 +3933,7 @@ (define_insn "_fnmsub__mas
  (match_opera

[PATCH] Improve V?TImode shifts (PR target/82370)

2017-10-04 Thread Jakub Jelinek
Hi!

The following patch tweaks the TImode vector shifts similarly
to the earlier vector shift patch, so that for shifts by immediate
we can accept a memory input.  Additionally, it removes the vec_shl_*
expander, because the middle-end has dropped that a few years ago,
and merges the left and right shift patterns using code iterators.
Appart from the code/names that can be handled by mode attributes,
the only difference was that one of the insns had
(set_attr "atom_unit" "sishuf")
and the other didn't.  I hope that is just an error, I'd really expect
both vpslldq and vpsrldq to use the same atom unit, isn't that the case?
CCing H.J. for that.  If it is intentional difference, I can of course
adjust the patch and undo the merging of the 4 define_insns into 2.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-04  Jakub Jelinek  

PR target/82370
* config/i386/sse.md (VIMAX_AVX2): Remove V4TImode.
(VIMAX_AVX2_AVX512BW, VIMAX_AVX512VL): New mode iterators.
(vec_shl_): Remove unused expander.
(avx512bw_3): New define_insn.
(_ashl3, _lshr3): Replaced by ...
(_3): ... this.  New define_insn.

* gcc.target/i386/pr82370.c: New test.

--- gcc/config/i386/sse.md.jj   2017-10-04 12:18:19.0 +0200
+++ gcc/config/i386/sse.md  2017-10-04 15:34:00.541860351 +0200
@@ -371,10 +371,17 @@ (define_mode_iterator V16FI
   [V16SF V16SI])
 
 ;; ??? We should probably use TImode instead.
-(define_mode_iterator VIMAX_AVX2
+(define_mode_iterator VIMAX_AVX2_AVX512BW
   [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") V1TI])
 
-;; ??? This should probably be dropped in favor of VIMAX_AVX2.
+;; Suppose TARGET_AVX512BW as baseline
+(define_mode_iterator VIMAX_AVX512VL
+  [V4TI (V2TI "TARGET_AVX512VL") (V1TI "TARGET_AVX512VL")])
+
+(define_mode_iterator VIMAX_AVX2
+  [(V2TI "TARGET_AVX2") V1TI])
+
+;; ??? This should probably be dropped in favor of VIMAX_AVX2_AVX512BW.
 (define_mode_iterator SSESCALARMODE
   [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") TI])
 
@@ -10792,9 +10799,9 @@ (define_insn "3")])
 
 
-(define_expand "vec_shl_"
+(define_expand "vec_shr_"
   [(set (match_dup 3)
-   (ashift:V1TI
+   (lshiftrt:V1TI
 (match_operand:VI_128 1 "register_operand")
 (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
(set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
@@ -10805,48 +10812,24 @@ (define_expand "vec_shl_"
   operands[4] = gen_lowpart (mode, operands[3]);
 })
 
-(define_insn "_ashl3"
-  [(set (match_operand:VIMAX_AVX2 0 "register_operand" "=x,v")
-   (ashift:VIMAX_AVX2
-(match_operand:VIMAX_AVX2 1 "register_operand" "0,v")
-(match_operand:SI 2 "const_0_to_255_mul_8_operand" "n,n")))]
-  "TARGET_SSE2"
+(define_insn "avx512bw_3"
+  [(set (match_operand:VIMAX_AVX512VL 0 "register_operand" "=v")
+   (any_lshift:VIMAX_AVX512VL
+(match_operand:VIMAX_AVX512VL 1 "nonimmediate_operand" "vm")
+(match_operand:SI 2 "const_0_to_255_mul_8_operand" "n")))]
+  "TARGET_AVX512BW"
 {
   operands[2] = GEN_INT (INTVAL (operands[2]) / 8);
-
-  switch (which_alternative)
-{
-case 0:
-  return "pslldq\t{%2, %0|%0, %2}";
-case 1:
-  return "vpslldq\t{%2, %1, %0|%0, %1, %2}";
-default:
-  gcc_unreachable ();
-}
+  return "vpdq\t{%2, %1, %0|%0, %1, %2}";
 }
-  [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sseishft")
+  [(set_attr "type" "sseishft")
(set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "maybe_evex")
(set_attr "mode" "")])
 
-(define_expand "vec_shr_"
-  [(set (match_dup 3)
-   (lshiftrt:V1TI
-(match_operand:VI_128 1 "register_operand")
-(match_operand:SI 2 "const_0_to_255_mul_8_operand")))
-   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
-  "TARGET_SSE2"
-{
-  operands[1] = gen_lowpart (V1TImode, operands[1]);
-  operands[3] = gen_reg_rtx (V1TImode);
-  operands[4] = gen_lowpart (mode, operands[3]);
-})
-
-(define_insn "_lshr3"
+(define_insn "_3"
   [(set (match_operand:VIMAX_AVX2 0 "register_operand" "=x,v")
-   (lshiftrt:VIMAX_AVX2
+   (any_lshift:VIMAX_AVX2
 (match_operand:VIMAX_AVX2 1 "register_operand" "0,v")
 (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n,n")))]
   "TARGET_SSE2"
@@ -10856,9 +10839,9 @@ (define_insn "_lshr3"
   switch (which_alternative)
 {
 case 0:
-  return "psrldq\t{%2, %0|%0, %2}";
+  return "pdq\t{%2, %0|%0, %2}";
 case 1:
-  return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
+  return "vpdq\t{%2, %1, %0|%0, %1, %2}";
 default:
   gcc_unreachable ();
 }
--- gcc/testsuite/gcc.target/i386/pr82370.c.jj  2017-10-04 16:01:16.350247297 
+0200
+++ gcc/testsuite/gcc.target/i386/pr82370.c 2017-10-04 16:03:06.704922288 
+0200
@@ -0,0 +1,18 @@
+/* PR target/82370 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mav

[PATCH] Improve AVX512 vector shift patterns (PR target/82370)

2017-10-04 Thread Jakub Jelinek
Hi!

EVEX encoded vector shifts by immediate allow memory operand as input.
We handle this right for the sra patterns by having 3 distinct
define_insns, one TARGET_AVX512VL with masking, where the non-masked
insn names start with *, that have (=v,v,v) and (=v,vm,N) alternatives
and nonimmediate_operand for the middle operand, then SSE2 pattern
for the same modes with just the noavx and avx alternatives and finally
a 512-bit vector pattern with masking that also has the nonimmediate_operand
etc.  For the logical shifts we have 3 define_insns too, but with very
different split that makes it not possible to do this.

The following patch reworks the logical vector shifts so that they are
similar to the arithmetic right vector shifts, except for the needed V?DI
differences.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-04  Jakub Jelinek  

PR target/82370
* config/i386/sse.md (VI248_AVX2, VI248_AVX512BW, VI248_AVX512BW_2):
New mode iterators.
(3): Change the last of the 3
define_insns for logical vector shifts to use VI248_AVX512BW
iterator instead of VI48_AVX512, remove 
condition, useless isa and prefix attributes.  Change the first
2 of these define_insns to ...
(3): ... this, new
define_insn for avx512vl.
(3): ... and this, new define_insn without
masking for non-avx512vl.

* gcc.target/i386/avx-pr82370.c: New test.
* gcc.target/i386/avx2-pr82370.c: New test.
* gcc.target/i386/avx512f-pr82370.c: New test.
* gcc.target/i386/avx512bw-pr82370.c: New test.
* gcc.target/i386/avx512vl-pr82370.c: New test.
* gcc.target/i386/avx512vlbw-pr82370.c: New test.

--- gcc/config/i386/sse.md.jj   2017-10-04 09:45:55.0 +0200
+++ gcc/config/i386/sse.md  2017-10-04 12:18:19.163858188 +0200
@@ -403,11 +403,19 @@ (define_mode_iterator VI48_AVX2
   [(V8SI "TARGET_AVX2") V4SI
(V4DI "TARGET_AVX2") V2DI])
 
+(define_mode_iterator VI248_AVX2
+  [(V16HI "TARGET_AVX2") V8HI
+   (V8SI "TARGET_AVX2") V4SI
+   (V4DI "TARGET_AVX2") V2DI])
+
 (define_mode_iterator VI248_AVX2_8_AVX512F_24_AVX512BW
   [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
(V16SI "TARGET_AVX512BW") (V8SI "TARGET_AVX2") V4SI
(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI])
 
+(define_mode_iterator VI248_AVX512BW
+  [(V32HI "TARGET_AVX512BW") V16SI V8DI])
+
 (define_mode_iterator VI248_AVX512BW_AVX512VL
   [(V32HI "TARGET_AVX512BW") 
(V4DI "TARGET_AVX512VL") V16SI V8DI])
@@ -418,6 +426,11 @@ (define_mode_iterator VI248_AVX512BW_1
   V8SI V4SI
   V2DI])

+(define_mode_iterator VI248_AVX512BW_2
+ [(V16HI "TARGET_AVX512BW") (V8HI "TARGET_AVX512BW")
+  V8SI V4SI
+  V4DI V2DI])
+   
 (define_mode_iterator VI48_AVX512F
   [(V16SI "TARGET_AVX512F") V8SI V4SI
(V8DI "TARGET_AVX512F") V4DI V2DI])
@@ -10731,59 +10744,51 @@ (define_insn "ashr3"
(const_string "0")))
(set_attr "mode" "")])
 
-(define_insn "3"
-  [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=x,v")
-   (any_lshift:VI2_AVX2_AVX512BW
- (match_operand:VI2_AVX2_AVX512BW 1 "register_operand" "0,v")
- (match_operand:DI 2 "nonmemory_operand" "xN,vN")))]
-  "TARGET_SSE2 &&  && "
-  "@
-   p\t{%2, %0|%0, %2}
-   vp\t{%2, %1, %0|%0, 
%1, %2}"
-  [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sseishft")
+(define_insn "3"
+  [(set (match_operand:VI248_AVX512BW_2 0 "register_operand" "=v,v")
+   (any_lshift:VI248_AVX512BW_2
+ (match_operand:VI248_AVX512BW_2 1 "nonimmediate_operand" "v,vm")
+ (match_operand:DI 2 "nonmemory_operand" "v,N")))]
+  "TARGET_AVX512VL"
+  "vp\t{%2, %1, %0|%0, 
%1, %2}"
+  [(set_attr "type" "sseishft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand")
(const_string "1")
(const_string "0")))
-   (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,vex")
(set_attr "mode" "")])
 
-(define_insn "3"
-  [(set (match_operand:VI48_AVX2 0 "register_operand" "=x,x,v")
-   (any_lshift:VI48_AVX2
- (match_operand:VI48_AVX2 1 "register_operand" "0,x,v")
- (match_operand:DI 2 "nonmemory_operand" "xN,xN,vN")))]
-  "TARGET_SSE2 && "
+(define_insn "3"
+  [(set (match_operand:VI248_AVX2 0 "register_operand" "=x,x")
+   (any_lshift:VI248_AVX2
+ (match_operand:VI248_AVX2 1 "register_operand" "0,x")
+ (match_operand:DI 2 "nonmemory_operand" "xN,xN")))]
+  "TARGET_SSE2"
   "@
p\t{%2, %0|%0, %2}
-   vp\t{%2, %1, %0|%0, 
%1, %2}
-   vp\t{%2, %1, %0|%0, 
%1, %2}"  
-  [(set_attr "isa" "noavx,avx,avx512bw")
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
(set_attr "type" "sseishft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand")
(const_string "1")
(const_string "0")))
-   (set_attr "prefix_data16" "1,*,*")
-   (set_attr "prefix" "orig,vex,evex")

Re: Transform (x >> cst) != 0 to x >= (1 << cst) and (x >> cst) == 0 to x < (1 << cst)

2017-10-04 Thread Prathamesh Kulkarni
On 3 October 2017 at 14:10, Jeff Law  wrote:
> On 10/03/2017 03:00 PM, Marc Glisse wrote:
>> On Tue, 3 Oct 2017, Jakub Jelinek wrote:
>>
>>> On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote:
 Hi,
 This follow-up patch implements the patterns mentioned in $subject.
 Bootstrap+test in progress on x86_64-unknown-linux-gnu and
 aarch64-linux-gnu.
 OK to commit if passes ?

 Thanks,
 Prathamesh
>>>
 2017-10-03  Prathamesh Kulkarni  

 * match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern.
 ((X >> CST) != 0 -> X >= (1 << CST)): Likewise.
>>
>> build_int_cstu doesn't work for vectors, you want build_one_cst. I never
>> know if we should check single_use or not :-(
Thanks for the pointers, I wasn't aware about build_one_cst.
>>
 testsuite/
 * gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4.
>>>
>>> Why this way and not the other way around?
>>
>> For high level gimple optimizations, X < CST is more convenient (and
>> smaller, just one insn) than (X >> CST) == 0.
> Right.  One could easily argue that Prathamesh's form should be the
> preferred form because it is simpler at the gimple level -- and that
> x86-isms should be dealt with later in the pipeline.
>
>
>>
>>> E.g. i?86/x86_64 and various other targets have shift instructions which
>>> set condition codes, so (X >> 51) == 0 is certainly smaller
>>> smaller and I believe cheaper than the latter.
>>> Try:
>>> void foo (void);
>>>
>>> void
>>> bar (unsigned long long x)
>>> {
>>>  if ((x >> 51) == 0)
>>>foo ();
>>> }
>>>
>>> void
>>> baz (unsigned long long x)
>>> {
>>>  if (x < (1LL << 51))
>>>foo ();
>>> }
>>> with -Os on x86_64, the first function is 4 insns, 12 bytes,
>>> the second one 5 insns, 21 bytes.
Indeed, I can reproduce this behavior. Thanks for pointing it out.
>>>
>>> I wonder if this kind of instruction selection stuff shouldn't be
>>> done in target.pd instead, with input from the target.
> Right, but I think that argues that Prathamesh's patch is the right
> direction and that to move forward what needs to happen is something
> needs to be fixed at the gimple/rtl border to ensure we get good x86 code.
>
>>
>> At a late stage, maybe during an RTL pass or expansion (or just before
>> expansion) it would indeed be good to generate a shift for such
>> comparisons, on targets where that sets a cc. The lack of this
>> transformation could be considered a blocker for the other one, to avoid
>> regressing on bar.
> Right.  In fact, I think Jakub's test ought to be added to this work as
> part of its basic testing.
Um, how to write the above-test case for bar() in DejaGNU format ?
Is it possible to test for number of insns or to use nm to test for
size of a function with dejagnu directive ?
Or should I scan for "cmpq" since the pattern transforms a right shift
and cmp against 0 to cmp between operands ?

Thanks,
Prathamesh
>
> jeff
>


Re: [PATCH] Add sanopt support for UBSAN_PTR.

2017-10-04 Thread Jakub Jelinek
On Wed, Oct 04, 2017 at 11:05:23AM +0200, Martin Liška wrote:
> Following patch adds support for optimizing out unnecessary UBSAN_PTR checks.
> It handles separately positive and negative offsets, zero offset is ignored.
> Apart from that, we utilize get_inner_reference for local and global 
> variables,
> that also helps to reduce some.

Thanks for working on it.

> @@ -148,6 +149,61 @@ struct sanopt_tree_triplet_hash : typed_noop_remove 
> 
>}
>  };
>  
> +/* Tree couple for ptr_check_map.  */
> +struct sanopt_tree_couple
> +{
> +  tree ptr;
> +  bool positive;

Maybe pos_p instead, so that it isn't that long?

> +  static inline bool
> +  equal (const sanopt_tree_couple &ref1, const sanopt_tree_couple &ref2)
> +  {
> +return operand_equal_p (ref1.ptr, ref2.ptr, 0)
> +&& ref1.positive == ref2.positive;

Or this would need to use ()s around the return expression for emacs
users.

> +/* Return true when pointer PTR for a given OFFSET is already sanitized
> +   in a given sanitization context CTX.  */
> +
> +static bool
> +has_dominating_ubsan_ptr_check (sanopt_ctx *ctx, tree ptr, tree cur_offset)

Any reason why you pass cur_offset as a tree rather than wide_int & or
offset_int &?  If the caller makes sure it is sign extended from
pointer/sizetype's precision, then:

> +{
> +  int r = get_range_pos_neg (cur_offset);

here you can just wi::neg_p or similar.

> +  /* We already have recorded a UBSAN_PTR check for this pointer.  Perhaps we
> + can drop this one.  But only if this check doesn't specify larger 
> offset.
> + */
> +  tree offset = gimple_call_arg (g, 1);
> +  gcc_assert (TREE_CODE (offset) == INTEGER_CST);
> +
> +  if (positive && tree_int_cst_le (cur_offset, offset))
> +return true;
> +  else if (!positive && tree_int_cst_le (offset, cur_offset))
> +return true;

And the comparisons here in wide_int would be cheaper too.

> +
> +static void
> +record_ubsan_ptr_check_stmt (sanopt_ctx *ctx, gimple *stmt, tree ptr,
> +  tree offset)
> +{
> +  int r = get_range_pos_neg (offset);

See above.

> +  gcc_assert (r != 3);
> +  bool positive = r == 1;
> +
> +  sanopt_tree_couple couple;
> +  couple.ptr = ptr;
> +  couple.positive = positive;
> +
> +  auto_vec &v = ctx->ptr_check_map.get_or_insert (couple);
> +  v.safe_push (stmt);
> +}
> +
> +/* Optimize away redundant UBSAN_PTR calls.  */
> +
> +static bool
> +maybe_optimize_ubsan_ptr_ifn (sanopt_ctx *ctx, gimple *stmt)
> +{
> +  gcc_assert (gimple_call_num_args (stmt) == 2);
> +  tree ptr = gimple_call_arg (stmt, 0);
> +  tree cur_offset = gimple_call_arg (stmt, 1);
> +
> +  if (TREE_CODE (cur_offset) != INTEGER_CST)
> +return false;
> +
> +  if (integer_zerop (cur_offset))
> +return true;
> +
> +  if (has_dominating_ubsan_ptr_check (ctx, ptr, cur_offset))
> +return true;
> +
> +  tree base = ptr;
> +  if (TREE_CODE (base) == ADDR_EXPR)
> +{
> +  base = TREE_OPERAND (base, 0);
> +
> +  HOST_WIDE_INT bitsize, bitpos;
> +  machine_mode mode;
> +  int volatilep = 0, reversep, unsignedp = 0;
> +  tree offset;
> +  base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode,
> +   &unsignedp, &reversep, &volatilep);
> +  if (DECL_P (base))
> + {
> +   gcc_assert (!DECL_REGISTER (base));
> +   /* If BASE is a fixed size automatic variable or
> +  global variable defined in the current TU and bitpos
> +  fits, don't instrument anything.  */
> +   if ((offset == NULL_TREE || TREE_CODE (offset) == INTEGER_CST)

Do you really need to handle offset != NULL_TREE?
If the bit offset is representable in shwi, then it will just be
in bitpos and offset will be NULL.

> +   && (VAR_P (base)
> +   || TREE_CODE (base) == PARM_DECL
> +   || TREE_CODE (base) == RESULT_DECL)
> +   && DECL_SIZE (base)
> +   && TREE_CODE (DECL_SIZE (base)) == INTEGER_CST

Why are you testing here DECL_SIZE when you actually then use
DECL_SIZE_UNIT?

> +   && (!is_global_var (base) || decl_binds_to_current_def_p (base)))
> + {
> +   HOST_WIDE_INT bytepos = bitpos / BITS_PER_UNIT;
> +   offset_int total_offset = bytepos;
> +   if (offset != NULL_TREE)
> + total_offset += wi::sext (wi::to_offset (offset), POINTER_SIZE);
> +   total_offset += wi::sext (wi::to_offset (cur_offset),
> + POINTER_SIZE);

Why are you sign extending it each time?  Can't it be sign extended after
all the additions?
On the other side, is it safe to add the (bytepos + offset) part with
the cur_offset part unconditionally?
If the former is "positive" and cur_offset is "negative", they cancel each
other and we IMHO shouldn't try to optimize it away.  bytepos could be some
very large integer, outside of bounds of the var, and cur_offset some
negative constant, where both var + bytepos and (var + bytepos) + cur_offset
ove

Re: [C++ PATCH] give builtin types consistent name

2017-10-04 Thread Jason Merrill
On Wed, Oct 4, 2017 at 1:52 PM, Nathan Sidwell  wrote:
> The builtin type machinery is one of the places were we push decls into ::
> under not-their-name.
>
> There's no real reason for this, and the fix is simple -- create a new
> TYPE_DECL if we use a name from the ridpointer table.  I took the
> opportunity of reordering record_builtin_type, which was a series of
> interleaved conditionals dealing with different things.
>
> I add an assert to set_global_binding that the given name is DECL_NAME. Of
> course this raises the opportunity of just making set_global_binding have a
> single arg, like pushdecl and friends.  And then the two macros
> IDENTIFIER_GLOBAL_VALUE and SET_IDENTIFIER_GLOBAL_VALUE could be dispensed
> with and we have simply
>   tree get_global_value (tree identifier);
>   tree set_global_value (tree decl);
> perhaps 's/value/decl/'?
>
> Jason, any preference?

I lean toward "binding".

Jason


[C++ PATCH] give builtin types consistent name

2017-10-04 Thread Nathan Sidwell
The builtin type machinery is one of the places were we push decls into 
:: under not-their-name.


There's no real reason for this, and the fix is simple -- create a new 
TYPE_DECL if we use a name from the ridpointer table.  I took the 
opportunity of reordering record_builtin_type, which was a series of 
interleaved conditionals dealing with different things.


I add an assert to set_global_binding that the given name is DECL_NAME. 
Of course this raises the opportunity of just making set_global_binding 
have a single arg, like pushdecl and friends.  And then the two macros
IDENTIFIER_GLOBAL_VALUE and SET_IDENTIFIER_GLOBAL_VALUE could be 
dispensed with and we have simply

  tree get_global_value (tree identifier);
  tree set_global_value (tree decl);
perhaps 's/value/decl/'?

Jason, any preference?

set_global_value has to do the stat_hack dance, I can't recall the case 
where it happens, but I recall it biting me when I tried to dispense 
with it.


The remaining mismatched name case is the anonymous namespace.  Once 
that's fixed we can change namespace bindings from a map to a hash, and 
halve its size.


Applying to trunk.

nathan
--
Nathan Sidwell
2017-10-04  Nathan Sidwell  

	Give builtin types the correct name.
	* name-lookup.c (set_global_binding): Assert name is DECL_NAME.
	* decl.c (record_builtin_type): Reimplement, use new TYPE_DECL for
	rname.

Index: decl.c
===
--- decl.c	(revision 253413)
+++ decl.c	(working copy)
@@ -3895,47 +3895,47 @@ make_unbound_class_template (tree contex
 /* Push the declarations of builtin types into the global namespace.
RID_INDEX is the index of the builtin type in the array
RID_POINTERS.  NAME is the name used when looking up the builtin
-   type.  TYPE is the _TYPE node for the builtin type.  */
+   type.  TYPE is the _TYPE node for the builtin type.
+
+   The calls to SET_IDENTIFIER_GLOBAL_VALUE below should be
+   eliminated.  Built-in types should not be looked up name; their
+   names are keywords that the parser can recognize.  However, there
+   is code in c-common.c that uses identifier_global_value to look up
+   built-in types by name.  */
 
 void
 record_builtin_type (enum rid rid_index,
 		 const char* name,
 		 tree type)
 {
-  tree rname = NULL_TREE, tname = NULL_TREE;
-  tree tdecl = NULL_TREE;
+  tree decl = NULL_TREE;
 
-  if ((int) rid_index < (int) RID_MAX)
-rname = ridpointers[(int) rid_index];
   if (name)
-tname = get_identifier (name);
-
-  /* The calls to SET_IDENTIFIER_GLOBAL_VALUE below should be
- eliminated.  Built-in types should not be looked up name; their
- names are keywords that the parser can recognize.  However, there
- is code in c-common.c that uses identifier_global_value to look
- up built-in types by name.  */
-  if (tname)
 {
-  tdecl = build_decl (BUILTINS_LOCATION, TYPE_DECL, tname, type);
+  tree tname = get_identifier (name);
+  tree tdecl = build_decl (BUILTINS_LOCATION, TYPE_DECL, tname, type);
   DECL_ARTIFICIAL (tdecl) = 1;
   SET_IDENTIFIER_GLOBAL_VALUE (tname, tdecl);
+  decl = tdecl;
 }
-  if (rname)
-{
-  if (!tdecl)
+
+  if ((int) rid_index < (int) RID_MAX)
+if (tree rname = ridpointers[(int) rid_index])
+  if (!decl || DECL_NAME (decl) != rname)
 	{
-	  tdecl = build_decl (BUILTINS_LOCATION, TYPE_DECL, rname, type);
-	  DECL_ARTIFICIAL (tdecl) = 1;
+	  tree rdecl = build_decl (BUILTINS_LOCATION, TYPE_DECL, rname, type);
+	  DECL_ARTIFICIAL (rdecl) = 1;
+	  SET_IDENTIFIER_GLOBAL_VALUE (rname, rdecl);
+	  if (!decl)
+	decl = rdecl;
 	}
-  SET_IDENTIFIER_GLOBAL_VALUE (rname, tdecl);
-}
-
-  if (!TYPE_NAME (type))
-TYPE_NAME (type) = tdecl;
 
-  if (tdecl)
-debug_hooks->type_decl (tdecl, 0);
+  if (decl)
+{
+  if (!TYPE_NAME (type))
+	TYPE_NAME (type) = decl;
+  debug_hooks->type_decl (decl, 0);
+}
 }
 
 /* Push a type into the namespace so that the back ends ignore it.  */
Index: name-lookup.c
===
--- name-lookup.c	(revision 253421)
+++ name-lookup.c	(working copy)
@@ -4847,6 +4847,7 @@ set_global_binding (tree name, tree val)
 {
   bool subtime = timevar_cond_start (TV_NAME_LOOKUP);
 
+  gcc_checking_assert (name == DECL_NAME (val));
   tree *slot = find_namespace_slot (global_namespace, name, true);
   tree old = MAYBE_STAT_DECL (*slot);
 


[PATCH] remove some unneeded parens

2017-10-04 Thread Nathan Sidwell
As described in 
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00122.html, I found a few 
places where declarations were excessively parenthesized.  These seem to 
be the uncontroversial cases, so fixing thusly.


Applying to trunk.

nathan
--
Nathan Sidwell
2017-10-04  Nathan Sidwell  

	* toplev.c (toplev::main): Remove excess parens on pretty_printer
	decl.
	* caller-save.c (insert_save): Remove excess parens on TO_SAVE parm.

Index: toplev.c
===
--- toplev.c	(revision 253418)
+++ toplev.c	(working copy)
@@ -2186,7 +2186,7 @@ toplev::main (int argc, char **argv)
 {
   gcc_assert (global_dc->edit_context_ptr);
 
-  pretty_printer (pp);
+  pretty_printer pp;
   pp_show_color (&pp) = pp_show_color (global_dc->printer);
   global_dc->edit_context_ptr->print_diff (&pp, true);
   pp_flush (&pp);
Index: caller-save.c
===
--- caller-save.c	(revision 253418)
+++ caller-save.c	(working copy)
@@ -1265,7 +1265,7 @@ insert_restore (struct insn_chain *chain
 
 static int
 insert_save (struct insn_chain *chain, int before_p, int regno,
-	 HARD_REG_SET (*to_save), machine_mode *save_mode)
+	 HARD_REG_SET *to_save, machine_mode *save_mode)
 {
   int i;
   unsigned int k;


[C++ PATCH] Move mangling alias out of ::

2017-10-04 Thread Nathan Sidwell
To deal with changes in the ABI that affected manglings, we like pushing 
decls into :: under their mangled names.  This is one of the 3 places we 
push decls under something not their DECL_NAME, and causes the namespace 
binding map to not be a simple hash table of DECLs.


It also can give rise to confusing duplicate decl errors that are 
followed by a somewhat obscure note that an ABI change will fix it.


This patch reimplements that machinery using a bespoke hash_map.  This 
allows a better diagnostic that talks about mangled names up front.


Applying to trunk.

nathan

--
Nathan Sidwell
2017-10-04  Nathan Sidwell  

	gcc/cp/
	Move mangling aliases out of global namespace.
	* cp-tree.h (record_mangling): New.
	(maybe_remove_implicit_alias): Delete.
	* decl2.c (mangled_decls): New hash map.
	(generate_mangling_alias): Reimplement using mangled_decls.
	(record_mangling): New.
	* mangle.c (decl_implicit_alias_p,
	maybe_remove_implicit_alias): Delete.
	(mangle_decl): Use record_mangling.
	* name-lookup.c (supplement_binding_1): Remove
	maybe_remove_implicit_alias check.

	* call.c (convert_arg_to_ellipsis): Correct comment about passing
	by reference.

	gcc/testsuite/
	* g++.dg/abi/mangle41.C: Adjust diagnostics.

	libcc1/
	* libcp1plugin.cc (supplement_binding): Don't use
	maybe_remove_implicit_alias.

Index: gcc/cp/cp-tree.h
===
--- gcc/cp/cp-tree.h	(revision 253413)
+++ gcc/cp/cp-tree.h	(working copy)
@@ -6142,6 +6142,7 @@ extern tree finish_case_label			(locatio
 extern tree cxx_maybe_build_cleanup		(tree, tsubst_flags_t);
 
 /* in decl2.c */
+extern void record_mangling			(tree, bool);
 extern void note_mangling_alias			(tree, tree);
 extern void generate_mangling_aliases		(void);
 extern tree build_memfn_type			(tree, tree, cp_cv_quals, cp_ref_qualifier);
@@ -7154,7 +7155,6 @@ extern tree add_exception_specifier		(tr
 extern tree merge_exception_specifiers		(tree, tree);
 
 /* in mangle.c */
-extern bool maybe_remove_implicit_alias		(tree);
 extern void init_mangle(void);
 extern void mangle_decl(tree);
 extern const char *mangle_type_string		(tree);
Index: gcc/cp/decl2.c
===
--- gcc/cp/decl2.c	(revision 253413)
+++ gcc/cp/decl2.c	(working copy)
@@ -102,6 +102,10 @@ static GTY(()) vec *no_link
is to be an alias for the former if the former is defined.  */
 static GTY(()) vec *mangling_aliases;
 
+/* A hash table of mangled names to decls.  Used to figure out if we
+   need compatibility aliases.  */
+static GTY(()) hash_map *mangled_decls;
+
 /* Nonzero if we're done parsing and into end-of-file activities.  */
 
 int at_eof;
@@ -4290,25 +4294,34 @@ handle_tls_init (void)
 static void
 generate_mangling_alias (tree decl, tree id2)
 {
+  struct cgraph_node *n = NULL;
+
+  if (TREE_CODE (decl) == FUNCTION_DECL)
+{
+  n = cgraph_node::get (decl);
+  if (!n)
+	/* Don't create an alias to an unreferenced function.  */
+	return;
+}
+
+  bool existed;
+  tree *slot = &mangled_decls->get_or_insert (id2, &existed);
+
   /* If there's a declaration already using this mangled name,
  don't create a compatibility alias that conflicts.  */
-  if (IDENTIFIER_GLOBAL_VALUE (id2))
-return;
-
-  struct cgraph_node *n = NULL;
-  if (TREE_CODE (decl) == FUNCTION_DECL
-  && !(n = cgraph_node::get (decl)))
-/* Don't create an alias to an unreferenced function.  */
+  if (existed)
 return;
 
   tree alias = make_alias_for (decl, id2);
-  SET_IDENTIFIER_GLOBAL_VALUE (id2, alias);
+  *slot = alias;
+
   DECL_IGNORED_P (alias) = 1;
   TREE_PUBLIC (alias) = TREE_PUBLIC (decl);
   DECL_VISIBILITY (alias) = DECL_VISIBILITY (decl);
   if (vague_linkage_p (decl))
 DECL_WEAK (alias) = 1;
-  if (TREE_CODE (decl) == FUNCTION_DECL)
+
+  if (n)
 n->create_same_body_alias (alias, decl);
   else
 varpool_node::create_extra_name_alias (alias, decl);
@@ -4347,6 +4360,50 @@ generate_mangling_aliases ()
   defer_mangling_aliases = false;
 }
 
+/* Record a mangling of DECL, whose DECL_ASSEMBLER_NAME has just been
+   set.  NEED_WARNING is true if we must warn about collisions.  We do
+   this to spot changes in mangling that may require compatibility
+   aliases.  */
+
+void
+record_mangling (tree decl, bool need_warning)
+{
+  if (!mangled_decls)
+mangled_decls = hash_map::create_ggc (499);
+
+  gcc_checking_assert (DECL_ASSEMBLER_NAME_SET_P (decl));
+  tree id = DECL_ASSEMBLER_NAME (decl);
+  bool existed;
+  tree *slot = &mangled_decls->get_or_insert (id, &existed);
+
+  /* If this is already an alias, remove the alias, because the real
+ decl takes presidence.  */
+  if (!existed)
+;
+  else if (DECL_ARTIFICIAL (*slot) && DECL_IGNORED_P (*slot))
+if (symtab_node *n = symtab_node::get (*slot))
+  if (n->cpp_implicit_alias)
+	{
+	  n->remove ();
+	  existed = false;
+	}
+
+  if (!existed)
+*slot = decl;
+  else if (need_warning)
+

Re: [Patch AArch64] Stop generating BSL for simple integer code

2017-10-04 Thread James Greenhalgh

On Thu, Jul 27, 2017 at 06:49:01PM +0100, James Greenhalgh wrote:
>
> On Mon, Jun 12, 2017 at 02:44:40PM +0100, James Greenhalgh wrote:
> > [Sorry for the re-send. I spotted that the attributes were not right for the
> >  new pattern I was adding. The change between this and the first version 
> > was:
> >
> >   +  [(set_attr "type" "neon_bsl,neon_bsl,neon_bsl,multiple")
> >   +   (set_attr "length" "4,4,4,12")]
> > ]
> >
> > ---
> >
> > Hi,
> >
> > In this testcase, all argument registers and the return register
> > will be general purpose registers:
> >
> >   long long
> >   foo (long long a, long long b, long long c)
> >   {
> > return ((a ^ b) & c) ^ b;
> >   }
> >
> > However, due to the implementation of aarch64_simd_bsl_internal
> > we'll match that pattern and emit a BSL, necessitating moving all those
> > arguments and results to the Advanced SIMD registers:
> >
> > fmovd2, x0
> > fmovd0, x2
> > fmovd1, x1
> > bsl v0.8b, v2.8b, v1.8b
> > fmovx0, d0
> >
> > To fix this, we turn aarch64_simd_bsldi_internal in to an insn_and_split 
> > that
> > knows to split back to integer operations if the register allocation
> > falls that way.
> >
> > We could have used an unspec, but then we lose some of the nice
> > simplifications that can be made from explicitly spelling out the semantics
> > of BSL.
>
> Off list, Richard and I found considerable issues with this patch. From
> idioms failing to match in the testcase, to subtle register allocation bugs,
> to potential for suboptimal code generation.
>
> That's not good!
>
> This spin of the patch corrects those issues by adding a second split
> pattern, allocating a temporary register if we're permitted to, or
> properly using tied output operands if we're not, and by generally playing
> things a bit safer around the possibility of register overlaps.
>
> The testcase is expanded to an execute test, hopefully giving a little
> more assurance that we're doing the right thing - now testing the BSL idiom
> generation in both general-purpose and floating-point registers and comparing
> the results. Hopefully we're now a bit more robust!
>
> Bootstrapped and tested on aarch64-none-linux-gnu and cross-tested on
> aarch64-none-elf with no issues.
>
> OK for trunk?

I'd like to ping this patch for review (I would like the explicit review as
an earlier revision had bugs), attached is a rebase of this patch on trunk.

OK? If so, please commit it on my behalf as I will be out of office for a
few weeks for some time off.

Thanks,
James

---
gcc/

2017-10-04  James Greenhalgh  

* config/aarch64/aarch64-simd.md
(aarch64_simd_bsl_internal): Remove DImode.
(*aarch64_simd_bsl_alt): Likewise.
(aarch64_simd_bsldi_internal): New.
(aarch64_simd_bsldi_alt): Likewise.

gcc/testsuite/

2017-10-04  James Greenhalgh  

* gcc.target/aarch64/bsl-idiom.c: New.
* gcc.target/aarch64/copysign-bsl.c: New.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 70e9339..1888d2f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2322,13 +2322,13 @@
 ;; in *aarch64_simd_bsl_alt.
 
 (define_insn "aarch64_simd_bsl_internal"
-  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
-	(xor:VSDQ_I_DI
-	   (and:VSDQ_I_DI
-	 (xor:VSDQ_I_DI
+  [(set (match_operand:VDQ_I 0 "register_operand" "=w,w,w")
+	(xor:VDQ_I
+	   (and:VDQ_I
+	 (xor:VDQ_I
 	   (match_operand: 3 "register_operand" "w,0,w")
-	   (match_operand:VSDQ_I_DI 2 "register_operand" "w,w,0"))
-	 (match_operand:VSDQ_I_DI 1 "register_operand" "0,w,w"))
+	   (match_operand:VDQ_I 2 "register_operand" "w,w,0"))
+	 (match_operand:VDQ_I 1 "register_operand" "0,w,w"))
 	  (match_dup: 3)
 	))]
   "TARGET_SIMD"
@@ -2346,14 +2346,14 @@
 ;; permutations of commutative operations, we have to have a separate pattern.
 
 (define_insn "*aarch64_simd_bsl_alt"
-  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
-	(xor:VSDQ_I_DI
-	   (and:VSDQ_I_DI
-	 (xor:VSDQ_I_DI
-	   (match_operand:VSDQ_I_DI 3 "register_operand" "w,w,0")
-	   (match_operand:VSDQ_I_DI 2 "register_operand" "w,0,w"))
-	  (match_operand:VSDQ_I_DI 1 "register_operand" "0,w,w"))
-	  (match_dup:VSDQ_I_DI 2)))]
+  [(set (match_operand:VDQ_I 0 "register_operand" "=w,w,w")
+	(xor:VDQ_I
+	   (and:VDQ_I
+	 (xor:VDQ_I
+	   (match_operand:VDQ_I 3 "register_operand" "w,w,0")
+	   (match_operand: 2 "register_operand" "w,0,w"))
+	  (match_operand:VDQ_I 1 "register_operand" "0,w,w"))
+	  (match_dup: 2)))]
   "TARGET_SIMD"
   "@
   bsl\\t%0., %3., %2.
@@ -2362,6 +2362,100 @@
   [(set_attr "type" "neon_bsl")]
 )
 
+;; DImode is special, we want to avoid computing operations which are
+;; more naturally computed in general purpose registers in the vector
+;; registers.  If we do that, we need to move all three operands from general
+;; purpose registers to 

[Committed] S/390: Fix mode in vector merge pattern.

2017-10-04 Thread Andreas Krebbel
vec_unpacks_hi_v4sf/vec_unpacks_lo_v4sf expand vec_mergeh and vec_mergel
patterns also for z13 with V4SF modes so the patterns should better
accept this.  Fixed by changing the mode iterator to V_128_NOSINGLE
which accepts V4SF unconditionally.

Committed to mainline.

gcc/ChangeLog:

2017-10-04  Andreas Krebbel  

* config/s390/vx-builtins.md ("vec_mergeh")
("vec_mergel"): Change mode iterator to V_128_NOSINGLE.
---
 gcc/config/s390/vx-builtins.md | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 7fb176c..1149355 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -211,9 +211,9 @@
 ; (vec_select op0) (vec_select op1)
 ; vmrhb, vmrhh, vmrhf, vmrhg
 (define_insn "vec_mergeh"
-  [(set (match_operand:VEC_HW 0 "register_operand" "=v")
-   (unspec:VEC_HW [(match_operand:VEC_HW 1 "register_operand"  "v")
-   (match_operand:VEC_HW 2 "register_operand"  "v")]
+  [(set (match_operand:V_128_NOSINGLE 0 
"register_operand" "=v")
+   (unspec:V_128_NOSINGLE [(match_operand:V_128_NOSINGLE 1 
"register_operand"  "v")
+   (match_operand:V_128_NOSINGLE 2 
"register_operand"  "v")]
   UNSPEC_VEC_MERGEH))]
   "TARGET_VX"
   "vmrh\t%v0,%1,%2"
@@ -221,9 +221,9 @@
 
 ; vmrlb, vmrlh, vmrlf, vmrlg
 (define_insn "vec_mergel"
-  [(set (match_operand:VEC_HW 0 "register_operand" "=v")
-   (unspec:VEC_HW [(match_operand:VEC_HW 1 "register_operand"  "v")
-   (match_operand:VEC_HW 2 "register_operand"  "v")]
+  [(set (match_operand:V_128_NOSINGLE 0 
"register_operand" "=v")
+   (unspec:V_128_NOSINGLE [(match_operand:V_128_NOSINGLE 1 
"register_operand"  "v")
+   (match_operand:V_128_NOSINGLE 2 
"register_operand"  "v")]
 UNSPEC_VEC_MERGEL))]
   "TARGET_VX"
   "vmrl\t%v0,%1,%2"
-- 
2.9.1



Re: Make tests less istreambuf_iterator implementation dependent

2017-10-04 Thread François Dumont

On 03/10/2017 16:20, Jonathan Wakely wrote:

On 02/10/17 07:43 +0200, François Dumont wrote:

On 28/09/2017 23:56, Jonathan Wakely wrote:

On 28/09/17 21:59 +0200, François Dumont wrote:


The current istreambuf_iterator implementation capture the current 
streambuf state each time it is tested for eof or evaluated. This 
is why I considered those tests as fragile.


Yes, and I think that's non-conforming.



Good, then we have something to fix.


As I said in the other thread, I'd really like to see references to
the standard used to justify any changes to our istreambuf_iterator


I think _M_c has been introduced to cover this paragraph of the 
Standard:


24.6.3.1

"1. Class istreambuf_iterator::proxy is for exposition 
only. An implementation is permit-
ted to provide equivalent functionality without providing a class 
with this name. Class istreambuf_-
iterator::proxy provides a temporary placeholder as 
the return value of the post-
increment operator (operator++). It keeps the character pointed to by 
the previous value of the iterator for

some possible future access to get the character."

This is why it is being set in operator++(int):

    istreambuf_iterator __old = *this;
    __old._M_c = _M_sbuf->sbumpc();

It is also the reason why libstdc++ fails:
http://llvm.org/svn/llvm-project/libcxx/trunk/test/std/iterators/stream.iterators/istreambuf.iterator/istreambuf.iterator.cons/proxy.pass.cpp 




It looks like this test is simply not Standard conformant.


I think you're right, because it relies on a property that may not be
true:

 any copies of the previous
 value of r are no longer
 required either to be
 dereferenceable or to be in
 the domain of ==.

I guess at some point _M_c started to be used also to cache the 
streambuf::sgetc resulting in current situation.


And arguably that is not "equivalent functionality" to returning a
proxy object. Although the standard seems a bit underspecified.

In attached patch I limit usage of _M_c to cover 24.6.3.1.1 point and 
so made additional changes to 2.cc test case to demonstrate it.


Doesn't the modified test PASS anyway, even without changing how _M_c
is used?


No it doesn't because the first check for eof capture the 'a' char which 
is never reseted so after string construction it is still pointing to 
this char and is not an eof iterator. Without the _M_c assignment the 
iterator is still "live" and so keeps on reflecting the streambuf state.


I'll show you some other side effects of this _M_c later.



Generally this looks good (I'm not sure if it addresses Petr's
concerns, but I don't think his "stop and go" usage is valid anyway -
it's certainly not portable or guaranteed by the standard).
No it doesn't address Petr's concern who is more focus on the _M_sbuf 
nullification part.



@@ -103,7 +103,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  : _M_sbuf(0), _M_c(traits_type::eof()) { }

#if __cplusplus >= 201103L
-  istreambuf_iterator(const istreambuf_iterator&) noexcept = 
default;

+  istreambuf_iterator(const istreambuf_iterator&) = default;


Please don't remove this.

It's present in the standard, and it ensures we don't accidentally add
a member that makes the default noexcept(false).

Ok



)),
    _M_message(__gnu_debug::__msg_inc_istreambuf)
    ._M_iterator(*this));
-    if (_M_sbuf)
-  {


This check should be unnecessary, because it's undefined to increment
and end-of-stream iterator, but I wonder if anyone is relying on it
being currently a no-op, rather than crashing.

Let's remove the check, and if it causes too many problems we can
restore it as:

 if (__builtin_expect(_M_sbuf != 0, 1))


-    else if (!traits_type::eq_int_type((__ret = _M_sbuf->sgetc()),
-   __eof))
-  _M_c = __ret;


Removing this assignment means that we will call _M_sbuf->sgetc() for
every dereference operation, instead of caching the current value in
_M_c. I think that's probably OK, because it's unlikely that
performance critical code is dereferencing the same iterator
repeatedly without incrementing it.

And getting rid of the mutable member is a Good Thing.

An alternative would be to cache it in operator*() instead of in
_M_get(), which would still give the desired change of not updating it
when _M_get() is used elsewhere. But would still involve setting a
mutable member in a const function.


-    else
+    int_type __ret = _M_c;
+    if (_M_sbuf && _S_at_eof(__ret) && _S_at_eof(__ret = 
_M_sbuf->sgetc()))

  _M_sbuf = 0;


It's unfortunate that we still have this assignment to a mutable
member in a const function, but I think it's necessary to avoid an
end-of-stream iterator becoming valid again.
Yes, an alternative could be to reset it in increment operators but it 
had other side effects. I'll try to send you the result of this approach.



-  }
return __ret;
  }

  bool
  _M_at_eof() const
+  { return _S_at_eof(_M_get()); }
+
+  static bo

Re: [C++ PATCH] Fix comment

2017-10-04 Thread Nathan Sidwell

On 10/04/2017 11:29 AM, Jason Merrill wrote:

On Wed, Oct 4, 2017 at 10:12 AM, Nathan Sidwell  wrote:

In answering a question about passing non-trivial types through ..., I
discovered a misleading comment.  It is NOT just like a value parm, because
we don't locally copy it.


Hmm, I think the error is in the behavior, not the comment.  :)


I wouldn't disagree.

nathan

--
Nathan Sidwell


Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern

2017-10-04 Thread Richard Earnshaw (lists)
On 02/10/17 10:05, Sudi Das wrote:
> 
> Hi Richard
> 
> Thanks, I have made the change to the patch.
> 
> 
> 2017-10-02 Sudakshina Das  
> 
>   * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New 
> check type
>   for aarch64_simd_valid_immediate.
>   (aarch64_output_simd_mov_immediate): Update prototype.
>   (aarch64_simd_valid_immediate): Update prototype.
> 
>   * config/aarch64/aarch64-simd.md (orr3): modified pattern to add
>   support for ORR-immediate.
>   (and3): modified pattern to add support for BIC-immediate.
> 
>   * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now 
> checks
>   for valid immediate for BIC and ORR based on new enum argument.
>   (aarch64_output_simd_mov_immediate): Function now used to output 
> BIC/ORR imm
>   as well based on new enum argument.
>  
>   * config/aarch64/constraints.md (Do): New vector immediate constraint.
>   (Db) : Likewise.
> 
>   * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): New predicate.
>   (aarch64_reg_or_bic_imm): Likewise.
> 
> 
> 2017-10-02 Sudakshina Das  
> 
>   * gcc.target/aarch64/bic_imm_1.c: New test.
>   * gcc.target/aarch64/orr_imm_1.c: Likewise.
> 

OK.

R.

> 
> From: Richard Earnshaw (lists) 
> Sent: Thursday, September 28, 2017 9:55 AM
> To: Sudi Das; James Greenhalgh
> Cc: gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft
> Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
> 
> On 27/09/17 18:57, Sudi Das wrote:
>>
>>
>> Hi James
>>
>> I have made the requested changes to the patch.
>>
>>
>> 2017-09-27 Sudakshina Das  
>>
>> * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New 
>> check type
>> for aarch64_simd_valid_immediate.
>> (aarch64_output_simd_mov_immediate): Update prototype.
>> (aarch64_simd_valid_immediate): Update prototype.
>>
>> * config/aarch64/aarch64-simd.md (orr3): modified pattern to 
>> add
>> support for ORR-immediate.
>> (and3): modified pattern to add support for BIC-immediate.
>>
>> * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function 
>> now checks
>> for valid immediate for BIC and ORR based on new enum argument.
>> (aarch64_output_simd_mov_immediate): Function now used to output 
>> BIC/ORR imm
>> as well based on new enum argument.
>>   
>> * config/aarch64/constraints.md (Do): New vector immediate 
>> constraint.
>> (Db): Likewise.
>>
>> 2017-09-27 Sudakshina Das  
>>
>> * gcc.target/aarch64/bic_imm_1.c: New test.
>> * gcc.target/aarch64/orr_imm_1.c: Likewise.
>>
>>
>> Thanks
>> Sudi
>>
>>
>> From: James Greenhalgh 
>> Sent: Tuesday, September 26, 2017 8:04:38 PM
>> To: Sudi Das
>> Cc: Richard Earnshaw; gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft
>> Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
>>  
>> On Mon, Sep 25, 2017 at 11:13:57AM +0100, Sudi Das wrote:
>>>
>>> Hi James
>>>
>>> I put aarch64_output_simd_general_immediate looking at the similarities of
>>> the immediates for mov/mvni and orr/bic. The CHECK macro in
>>> aarch64_simd_valid_immediate both checks
>>> and converts the immediates in a manner that are needed for the 
>>> instructions.
>>>
>>> Having said that, I agree that maybe I could have refactored
>>> aarch64_output_simd_mov_immediate to do the work rather than creating a new
>>> functions to do similar things. I have done so in this patch.
>>
>> Thanks, this looks much neater.
>>
>>> I have also changed the names of the enum simd_immediate_check to be better
>>> indicative of what they are doing. 
>>
>> Thanks, I'd tweak them to look more like the bitmasks you use them as, but
>> that is a small change for my personal preference.
>>
>>> Lastly I have added more cases in the tests (according to all the possible
>>> CHECKs) and made them dg-do assemble (although I had to add --save-temps so
>>> that the scan-assembler would work). Do you think I should not put that
>>> option and rather create separate tests?
>>
>> This is good - thanks.
>>
>> I think clean up the enum definitions and this patch will be good.
>>
>>> @@ -308,6 +308,16 @@ enum aarch64_parse_opt_result
>>>  AARCH64_PARSE_INVALID_ARG  /* Invalid arch, tune, cpu arg.  */
>>>};
>>>
>>> +/* Enum to distinguish which type of check is to be done in
>>> +   aarch64_simd_valid_immediate.  This is used as a bitmask where
>>> +   AARCH64_CHECK_MOV has both bits set.  Thus AARCH64_CHECK_MOV will
>>> +   perform all checks.  Adding new types would require changes 
>>> accordingly.  */
>>> +enum simd_immediate_check {
>>> +  AARCH64_CHECK_ORR  = 1,/* Perform immediate checks for ORR.  */
>>> +  AARCH64_CHECK_BIC  = 2,/* Perform immediate checks for BIC.  */
>>> +  AARCH64_CHECK_MOV  = 3 /* Perform all checks (used for MOVI/MNVI).  
>>> */
>>
>> These are used in bit-mask style, so how about:
>>
>>   

Re: C++ PATCH for c++/81525, auto and generic lambda

2017-10-04 Thread Jason Merrill
On Wed, Aug 9, 2017 at 3:20 PM, Jason Merrill  wrote:
> In this testcase, when building up an extra version of N to refer to
> when instantiating the generic lambda, we were mistakenly replacing
> the 'auto' with a template argument for the generic lambda.
>
> Tested x86_64-pc-linux-gnu, applying to trunk and 7.

Recent testing found a bug in this patch, whereby we would mistakenly
clobber the TREE_VEC_LENGTH of a single-level argument set, or
increase the length rather than decreasing it.  Fixed by using
strip_innermost_template_args and calculating how many levels we
actually want to remove.  On the trunk I've also added an assert that
we shouldn't get into this situation anymore.

Tested x86_64-pc-linux-gnu, applying to trunk and 7.
commit 2e694fe9477fefb4d746943c6996785109f61aa6
Author: Jason Merrill 
Date:   Thu Sep 28 16:07:34 2017 -0400

PR c++/81525 - broken handling of auto in generic lambda.

* pt.c (tsubst_decl) [VAR_DECL]: Use strip_innermost_template_args.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f3ad6083190..38d7b45eb9b 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -12896,15 +12896,17 @@ tsubst_decl (tree t, tree args, tsubst_flags_t 
complain)
&& VAR_HAD_UNKNOWN_BOUND (t)
&& type != error_mark_node)
  type = strip_array_domain (type);
-   tree auto_node = type_uses_auto (type);
-   int len = TREE_VEC_LENGTH (args);
-   if (auto_node)
- /* Mask off any template args past the variable's context so we
-don't replace the auto with an unrelated argument.  */
- TREE_VEC_LENGTH (args) = TEMPLATE_TYPE_LEVEL (auto_node) - 1;
-   type = tsubst (type, args, complain, in_decl);
-   if (auto_node)
- TREE_VEC_LENGTH (args) = len;
+   tree sub_args = args;
+   if (tree auto_node = type_uses_auto (type))
+ {
+   /* Mask off any template args past the variable's context so we
+  don't replace the auto with an unrelated argument.  */
+   int nouter = TEMPLATE_TYPE_LEVEL (auto_node) - 1;
+   int extra = TMPL_ARGS_DEPTH (args) - nouter;
+   if (extra > 0)
+ sub_args = strip_innermost_template_args (args, extra);
+ }
+   type = tsubst (type, sub_args, complain, in_decl);
  }
if (VAR_P (r))
  {
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-auto1.C 
b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-auto1.C
new file mode 100644
index 000..b9e98c551c0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-auto1.C
@@ -0,0 +1,19 @@
+// Related to c++/81525
+// { dg-do compile { target c++14 } }
+
+template 
+struct A
+{
+  template 
+  static void f()
+  {
+[](auto b) {
+  auto c = +b;
+}(42);
+  }
+};
+
+int main()
+{
+  A::f();
+}
commit c6c77e3c274bc8a1bdd33be58895e1ae08a453a7
Author: Jason Merrill 
Date:   Thu Sep 28 16:07:34 2017 -0400

PR c++/81525 - broken handling of auto in generic lambda.

* pt.c (tsubst_decl) [VAR_DECL]: Use strip_innermost_template_args.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c29c779a147..36c8c106439 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13042,15 +13042,20 @@ tsubst_decl (tree t, tree args, tsubst_flags_t 
complain)
&& VAR_HAD_UNKNOWN_BOUND (t)
&& type != error_mark_node)
  type = strip_array_domain (type);
-   tree auto_node = type_uses_auto (type);
-   int len = TREE_VEC_LENGTH (args);
-   if (auto_node)
- /* Mask off any template args past the variable's context so we
-don't replace the auto with an unrelated argument.  */
- TREE_VEC_LENGTH (args) = TEMPLATE_TYPE_LEVEL (auto_node) - 1;
-   type = tsubst (type, args, complain, in_decl);
-   if (auto_node)
- TREE_VEC_LENGTH (args) = len;
+   tree sub_args = args;
+   if (tree auto_node = type_uses_auto (type))
+ {
+   /* Mask off any template args past the variable's context so we
+  don't replace the auto with an unrelated argument.  */
+   int nouter = TEMPLATE_TYPE_LEVEL (auto_node) - 1;
+   int extra = TMPL_ARGS_DEPTH (args) - nouter;
+   if (extra > 0)
+ /* This should never happen with the new lambda instantiation
+model, but keep the handling just in case.  */
+ gcc_assert (!CHECKING_P),
+ sub_args = strip_innermost_template_args (args, extra);
+ }
+   type = tsubst (type, sub_args, complain, in_decl);
  }
if (VAR_P (r))
  {
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-auto1.C 
b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-auto1.C
new file mode 100644
index 000..b9e98c551c0
--- /dev/null
+++ b/g

Re: [C++ Patch] PR 71946 ("asm in toplevel lambda function rejected")

2017-10-04 Thread Jason Merrill
OK.

On Wed, Oct 4, 2017 at 8:50 AM, Paolo Carlini  wrote:
> Hi,
>
> Andrew noticed that the reason we reject entities like inline-asm or GNU's
> statement-expressions in the lambda body is simply that we don't set the
> parser->in_function_body flag. Thus the below, which appears to work
> perfectly for that. Tested x86_64-linux.
>
> Thanks, Paolo.
>
> ///
>
>


Re: [C++ PATCH] Fix comment

2017-10-04 Thread Jason Merrill
On Wed, Oct 4, 2017 at 10:12 AM, Nathan Sidwell  wrote:
> In answering a question about passing non-trivial types through ..., I
> discovered a misleading comment.  It is NOT just like a value parm, because
> we don't locally copy it.

Hmm, I think the error is in the behavior, not the comment.  :)

Jason


Re: [PATCH] Pretty-print GOACC_REDUCTION arguments

2017-10-04 Thread Thomas Schwinge
Hi Tom!

On Tue, 3 Oct 2017 10:56:59 +0200, Tom de Vries  wrote:
> On 09/27/2017 03:46 PM, Thomas Schwinge wrote:
> > On Mon, 25 Sep 2017 16:57:52 +0200, Tom de Vries  
> > wrote:
> >> currently for a GOACC_REDUCTION internal fn call we print:
> >> ...
> >> sum_5 = GOACC_REDUCTION (SETUP, _3, 0, 0, 67, 0);
> >> ...
> >>
> >> This patch adds a comment for some arguments explaining the meaning of
> >> the argument:
> >> ...
> >> sum_5 = GOACC_REDUCTION (SETUP, _3, 0, 0 /*gang*/, 67 /*+*/, 0);
> >> ...
> > 
> > ACK to the general idea.
> > 
> > However, I note that for the "CODE" we currently *only* print the textual
> > variant ("SETUP") (just like we're also doing for a few other "IFN_*"),
> > whereas for "LEVEL" and "OP" you now print both.  Should these really be
> > different?  I think I actually do prefer the style you're using (print
> > both).
> 
> I chose the c-like syntax to make it easier to copy gimple to test-case 
> ( based on comment 
> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg02180.html ). But 
> reflecting on it a bit more, I realized that it's an internal function 
> and as such won't work anyway in test-cases, so I'm not sure how 
> relevant it is in this case.

The way you're doing it still makes sense to me: we might (at some later
point...) read such dumps back in, using the GIMPLE front end or similar.
(Clearly nothing to worry about right now.)


> >> +case IFN_GOACC_REDUCTION:
> >> +  switch (i)
> >> +{
> >> +case 3:
> >> +  switch (tree_to_uhwi (gimple_call_arg (gs, i)))
> > 
> > Something ;-) is wrong.  Running this on
> > libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c, I run into:
> > 
> >  during GIMPLE pass: omplower
> >  dump file: reduction-1.c.006t.omplower
> >  
> >  In file included from 
> > source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c:9:0:
> >  source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c: 
> > In function 'test_reductions':
> >  
> > source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h:20:7: 
> > internal compiler error: in tree_to_uhwi, at tree.c:6606
> > abort ();\
> 
> > This seems to be the "LEVEL" being "-1" -- probably meaning "not yet
> > decided"?  (Haven't looked that up.)

As far as I can tell, initially all these "LEVEL" arguments are set to
"-1".  See the "place" argument in the "lower_oacc_reductions" call in
"lower_oacc_head_tail".  So, I understand this to mean "not yet decided",
from here on, until some real level of parallelism gets set.

> In execute_oacc_device_lower we find:
> ...
>case IFN_GOACC_REDUCTION:
>  /* Mark the function for SSA renaming.  */
>   mark_virtual_operands_for_renaming (cfun);
> 
>  /* If the level is -1, this ended up being an unused 
> 
> axis.  Handle as a default.  */
>  if (integer_minus_onep (gimple_call_arg (call, 3)))
>default_goacc_reduction (call);
>  else
>targetm.goacc.reduction (call);
> ...
> 
> I'm printing it now as GOMP_DIM_NONE.

But after this processing, there are no "IFN_GOACC_REDUCTION" anymore, so
we should rather print something like "unknown" instead of
"GOMP_DIM_NONE" (which doesn't exist anyway).


With that changed, we're good as far as I'm concerned, thanks!  But I
can't formally approve, of course.  Reviewed-by: Thomas Schwinge
 (See
.)


Grüße
 Thomas


> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk?
> 
> Thanks,
> - Tom
> Pretty-print GOACC_REDUCTION arguments
> 
> Prints
>   sum_5 = GOACC_REDUCTION (SETUP, _3, 0, 0 /*GOMP_DIM_GANG*/, 67 /*+*/, 0);
> instead of
>   sum_5 = GOACC_REDUCTION (SETUP, _3, 0, 0, 67, 0);
> 
> 2017-09-25  Tom de Vries  
> 
>   * gimple-pretty-print.c (dump_gimple_call_args): Pretty-print
>   GOACC_REDUCTION arguments.
> 
> ---
>  gcc/gimple-pretty-print.c | 42 ++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
> index ed8e51c..33ca45d 100644
> --- a/gcc/gimple-pretty-print.c
> +++ b/gcc/gimple-pretty-print.c
> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "asan.h"
> +#include "gomp-constants.h"
>  
>  #define INDENT(SPACE)
> \
>do { int i; for (i = 0; i < SPACE; i++) pp_space (buffer); } while (0)
> @@ -765,6 +766,47 @@ dump_gimple_call_args (pretty_printer *buffer, gcall 
> *gs, dump_flags_t flags)
>if (i)
>   pp_string (buffer, ", ");
>dump_generic_node (buffer, gimple_call_arg (gs, i), 0, flags, false);
> +
> +  if (gimple_call_internal_p (gs))
> + switch (gimple_call_internal_fn (gs))
> +   {
> +   case IFN_GOACC_REDUCTION:
> + 

Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-04 Thread Eric Botcazou
> We have now had 5 days of no builds for a major target, which is a huge
> inconvenience. So I don't think it is reasonable to wait any longer.

well, to put things in perspective, you broke the SPARC compiler about one 
month ago and have done anything AFAIK since then to repair the breakage so...

-- 
Eric Botcazou


Re: [PATCH] Improve alloca alignment

2017-10-04 Thread Eric Botcazou
> This seems like a SPARC target problem to me -- essentially it's
> claiming a higher STACK_BOUNDARY than it really has.

No, it is not, I can guarantee you that the stack pointer is always aligned to 
64-bit boundaries on SPARC, otherwise all hell would break loose...

> Presumably there's a good reason for this and some kind of hack may be
> needed to deal with it in dynamically allocated space.  But it does not
> seem like we should be forcing all targets to allocate unnecessary space
> to deal with this.

I agree but SPARC is presumably not the only affected platform, so I think 
that it's wrong to sureptitiously change the interface with the ~50 back-ends 
and hope that the maintainers will repair the damage; they won't and we'll 
have introduced very nasty bugs for a few wasted bytes on the stack.

-- 
Eric Botcazou


Re: [PATCH] C++ warning on vexing parse

2017-10-04 Thread Jakub Jelinek
On Wed, Oct 04, 2017 at 10:44:11AM -0400, Jason Merrill wrote:
> > 3) A couple of places do:
> >T (name  // line break to avoid wrap
> >   [LONGEXPR][OTHERLONGEXPR]);
> >
> > The parens aid the editor's formatter. The patch removes the parens, but I
> > can see there may be disagreement.  I suppose we could permit parens at the
> > outermost declarator for an array?
> > Affects ira-int.h (target_ira_int::x_ira_reg_mode_hard_regset)
> >  reload.h (target_reload::x_regno_save_mode)
> 
> We could also check whether the declarator spans lines to detect this case.

I think it is better to just fix it up so that no line break is needed
there.  E.g. define a macro, enum or const int with the sizes and use that
in the declaration.

> > 4) A set of typedef'd function types (NOT ptr-to-fn).  The name being
> > typedef'd is parenthesized and not an idiom I recall seeing before. AFAICT
> > these are the only cases of typedefing a plain fn.  We could, I suppose,
> > ignore parens on a typedef'd fntype, but that seems a little random.
> > Affects gengtype.c (frul_actionrout_t)
> > lto-streamer.h (lto_get_section_data_f & to_free_section_data_f)
> > tree-ssa-threadedge.c (pfn_simplify)
> 
> Hmm, I don't think we want to diagnose these; if a parameter-list
> follows the parenthesized declarator, it isn't ambiguous.
> 
> 3 and 4 seem like false positives.  The problematic cases are all ones
> where the parenthesized name is at the end of the declarator; if the
> declarator continues after the name, either within or without the
> parens, this
> 
> > Jason, do you have concerns about the C++ patch itself?
> 
> Nope.

Jakub


Re: [PATCH] C++ warning on vexing parse

2017-10-04 Thread Jason Merrill
On Tue, Oct 3, 2017 at 1:08 PM, Nathan Sidwell  wrote:
> [non-c++ people on CC, there's a reason ...]
>
> This patch adds a new warning, concerning unnecessary parentheses on a
> declaration.  For instance:
>prettyprinter (pp);
> That's a declaration of a pp variable of type prettyprinter -- not a call of
> a prettyprinter function.  The reason is that in C++, anything that is
> syntactically a declaration, is a declaration.  This can lead to surprising
> unexpected code generation, and the documentation I added provides
> motivation:
>   std::unique_lock (mymutex);
> That is NOT a lock of mymutex.  It's a declaration of a variable called
> mymutex locking no mutex.  If we're in a member function and 'mymutex' is a
> field, Wshadow-parm doesn't trigger either.
>
> This patch notes when a direct-declarator is parenthesized, and then warns
> about it in grokdeclarator.  Obviously, parens are sometimes necessary --
> when declaring pointers to arrays or functions, and we don't warn then.  The
> simplest way to do that was during the parsing, not in grokdeclarator, where
> the cp_declarator object is constant.  We'd either have to change that, or
> have some other local state.
>
> I added this to -Wparentheses (enabled by -Wall).  It triggered in a few
> cases during bootstrap:
>
> 1) in toplev.c we have the above prettyprinter example.  I think that's
> clearly poorly formatted code.

Definitely.

> 2) caller-save.c: insert_save has 'HARD_REG_SET (*to_save)', which also
> seems poorly formatted to me -- other cases of ptr-to-HRS don't do this.

Probably autopilot from someone used to parenthesizing declarators for
pointer-to-function or pointer-to-array types.

> 3) A couple of places do:
>T (name  // line break to avoid wrap
>   [LONGEXPR][OTHERLONGEXPR]);
>
> The parens aid the editor's formatter. The patch removes the parens, but I
> can see there may be disagreement.  I suppose we could permit parens at the
> outermost declarator for an array?
> Affects ira-int.h (target_ira_int::x_ira_reg_mode_hard_regset)
>  reload.h (target_reload::x_regno_save_mode)

We could also check whether the declarator spans lines to detect this case.

> 4) A set of typedef'd function types (NOT ptr-to-fn).  The name being
> typedef'd is parenthesized and not an idiom I recall seeing before. AFAICT
> these are the only cases of typedefing a plain fn.  We could, I suppose,
> ignore parens on a typedef'd fntype, but that seems a little random.
> Affects gengtype.c (frul_actionrout_t)
> lto-streamer.h (lto_get_section_data_f & to_free_section_data_f)
> tree-ssa-threadedge.c (pfn_simplify)

Hmm, I don't think we want to diagnose these; if a parameter-list
follows the parenthesized declarator, it isn't ambiguous.

3 and 4 seem like false positives.  The problematic cases are all ones
where the parenthesized name is at the end of the declarator; if the
declarator continues after the name, either within or without the
parens, this

> Jason, do you have concerns about the C++ patch itself?

Nope.

Jason


[PATCH][GRAPHITE] Rewrite PHI handling in code-gen

2017-10-04 Thread Richard Biener

The following patch completely re-does PHI handling during 
code-generation.  PHI handling is currently responsible for 99% of
all code-generation issues.  With the patch the number of code-generation
issues in SPEC 2k6 decreases from 180 to 5, similar adjustments happen
to the testsuite - only gfortran.dg/graphite has some expected code-gen
issues left.

The current idea of categorizing PHIs and doing code-gen based on
pattern matching with the original GIMPLE IL isn't feasible given
ISL can do transforms like peeling, optimizing away conditions and
creating arbitrary number of GBB copies.  The current code fences
off a lot of cases by simply giving up.

To fix the current code one would need to basically replicate the
update-SSA machinery we already have (and pointlessly exercise
from the graphite code-gen at the moment).

Thus the patch rips out all manual handling of PHIs during code-generation
and leaves all cross-BB scalar updates to update-SSA.

This means "going out-of-SSA" again, but instead of applying out-of-SSA
on the original GIMPLE IL I'm just doing this on-the-fly during
scalar dependence generation and code generation.

 bb3:
  goto bb5;

 bb4:

 bb5:
  _2 = PHI <_3(3), _4(4)>

becomes (for an identity rewrite) before update-SSA:

 bb3':
  D.1234 = _3;

 bb4':
  D.1234 = _4;

 bb5':
  _5 = D.1234;

with _5 being a new def for _2.  update-SSA then re-writes the
_3 and _4 uses with available new defs we have registered during
code generation of the _3 and _4 def copies and rewrites D.1234
into SSA, inserting PHIs where necessary.

This scheme of course relies on ISL outputting a correct schedule
which of course relies on us setting proper dependence constraints.
I've fixed quite a few issues there, for example missing constraints
for the SESE liveout variables.

One awkward thing is that to not confuse ISL with PHI edge copies
placed in latch blocks, like

  for (int c0 = 0; c0 < P_22; c0 += 1) {
S_6(0, c0);
if (P_22 >= c0 + 2)
  S_7(0, c0);
  }

and ISL then happilly peeling off the last iteration where the latch S_7
containing only the out-of-SSA copy is not needed.  So I'm trying to
detect empty latches and instead insert the out-of-SSA copy in its
predecessor instead (I know it doesn't matter if we execute the stmt
in the last iteration).

The patch as-is ends up with quite some useless SSA copies which
is cleaned up by the copyprop pass inside the graphite pipeline
but can also be improved by teaching the into-SSA rewrite to
eliminate copies.

I do expect issues with the patch (I'm seeing CE 416.gamess, but not
sure why and 403.gcc miscompare), but it's already become somewhat too big 
to handle.

Currently re-bootstrapping and testing after some cosmetic changes,
testsuites ran successfully, SPEC CPU 2006 built and run (with test data).
Statistics (with all graphite params set to unlimited) are

loop nest optimized: 119
loop nest not optimized, code generation error: 5
loop nest not optimized, optimized schedule is identical to original 
schedule: 110
loop nest not optimized, optimization timed out: 31
loop nest not optimized, ISL signalled an error: 6
loop nest: 271

Ok for trunk?

Thanks,
Richard.

2017-10-04  Richard Biener  

* graphite-isl-ast-to-gimple.c: Include ssa.h and tree-ssa.h.
(translate_isl_ast_to_gimple::translate_pending_phi_nodes,
translate_isl_ast_to_gimple::is_valid_rename,
translate_isl_ast_to_gimple::get_rename,
translate_isl_ast_to_gimple::get_def_bb_for_const,
translate_isl_ast_to_gimple::get_new_name,
translate_isl_ast_to_gimple::collect_all_ssa_names,
translate_isl_ast_to_gimple::copy_loop_phi_args,
translate_isl_ast_to_gimple::collect_all_ssa_names,
translate_isl_ast_to_gimple::copy_loop_phi_args,
translate_isl_ast_to_gimple::copy_loop_phi_nodes,
translate_isl_ast_to_gimple::add_close_phis_to_merge_points,
translate_isl_ast_to_gimple::add_close_phis_to_outer_loops,
translate_isl_ast_to_gimple::copy_loop_close_phi_args,
translate_isl_ast_to_gimple::copy_loop_close_phi_nodes,
translate_isl_ast_to_gimple::copy_cond_phi_args,
translate_isl_ast_to_gimple::copy_cond_phi_nodes,
translate_isl_ast_to_gimple::edge_for_new_close_phis,
translate_isl_ast_to_gimple::add_phi_arg_for_new_expr,
translate_isl_ast_to_gimple::rename_uses,
translate_isl_ast_to_gimple::rename_all_uses): Remove.
(translate_isl_ast_to_gimple::get_rename_from_scev): Simplify.
(set_rename_for_each_def): Likewise.
(graphite_copy_stmts_from_block): Handle debug stmt resetting
here.  Handle rewriting SCEV analyzable uses here.
(copy_bb_and_scalar_dependences): Generate code for PHI
copy-in/outs.
(graphite_regenerate_ast_isl): Adjust.
* graphite-scop-detection.c (trivially_empty_bb_p): Move to sese.[ch].
(add_write, add_read): New functions.
(build_c

[C++ PATCH] Fix comment

2017-10-04 Thread Nathan Sidwell
In answering a question about passing non-trivial types through ..., I 
discovered a misleading comment.  It is NOT just like a value parm, 
because we don't locally copy it.


class X { ..,non-poddy... };
void Foo (X, ...);
void bin (X &p)
{
  Foo (p, p);
}
Only the first arg to Foo goes via a copy-constructor.

Fixed thusly and applied.

nathan
--
Nathan Sidwell
2017-10-04  Nathan Sidwell  

	* call.c (convert_arg_to_ellipsis): Correct comment about passing
	by reference.

Index: call.c
===
--- call.c	(revision 253409)
+++ call.c	(working copy)
@@ -7178,8 +7178,9 @@ convert_arg_to_ellipsis (tree arg, tsubs
 	 with no corresponding parameter is conditionally-supported, with
 	 implementation-defined semantics.
 
-	 We support it as pass-by-invisible-reference, just like a normal
-	 value parameter.
+	 We support it as pass-by-invisible-reference to the caller's
+	 object.  That's different to named by-value parameters, which
+	 construct a copy and pass a reference to that.
 
 	 If the call appears in the context of a sizeof expression,
 	 it is not potentially-evaluated.  */


Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-04 Thread Jakub Jelinek
On Wed, Oct 04, 2017 at 02:12:11PM +0100, Ramana Radhakrishnan wrote:
> On Wed, Oct 4, 2017 at 12:01 PM, Richard Sandiford
>  wrote:
> > Wilco Dijkstra  writes:
> >> Christophe Lyon wrote:
> >>
> >>> FWIW, I confirm that this patch fixes the build failure I reported at:
> >>> https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01980.html
> >>
> >> Thanks, now committed as r253399.
> >
> > I don't think it's reasonable to commit this as obvious.  You said
> > yourself in the covering message that "it doesn't at all restore
> > the original behaviour since we no longer compare the base address".
> > So even with the bootstrap failure, I think the patch needed review
> > before going in.
> 
> I agree that this patch shouldn't have gone in without review from a
> maintainer of the appropriate area regardless of whether this fixes a
> bootstrap failure or not.
> 
> However we need a scheduler maintainer or global reviewer to please
> help review this patch or help come up with an alternative patch. A
> primary platform broken for 5 days with a commit and no public
> response from the original poster is really not appropriate behaviour
> in this community. If not, the original patch should have been
> considered for reverting under the 48 hour rule .

A workaround that could be committed as obvious would be wrapping
some qsort call affected by this into (), i.e. instead of doing
qsort (...); do (qsort) (...);
That way you revert to previous behavior for the problematic qsort and not
change anything else.
The committed patch isn't in any way obvious and really needs to be reviewed
by a scheduler maintainer.

Jakub


[committed] jit: implement gcc_jit_context_new_rvalue_from_vector

2017-10-04 Thread David Malcolm
This patch implements a new API entrypoint:

/* Build a vector rvalue from an array of elements.

   "vec_type" should be a vector type, created using gcc_jit_type_get_vector.

   This API entrypoint was added in LIBGCCJIT_ABI_10; you can test for its
   presence using
 #ifdef LIBGCCJIT_HAVE_gcc_jit_context_new_rvalue_from_vector
*/
extern gcc_jit_rvalue *
gcc_jit_context_new_rvalue_from_vector (gcc_jit_context *ctxt,
gcc_jit_location *loc,
gcc_jit_type *vec_type,
size_t num_elements,
gcc_jit_rvalue **elements);

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu;
takes jit.sum from 9909 to 10294 PASS results.

Committed to trunk as r253409.

gcc/jit/ChangeLog:
* docs/cp/topics/expressions.rst (Vector expressions): New
section.
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_10): New ABI tag.
* docs/topics/expressions.rst (Vector expressions): New section.
* docs/topics/types.rst (gcc_jit_type_get_vector): Add link to
gcc_jit_context_new_rvalue_from_vector.
* jit-common.h (gcc::jit:recording::vector_type): New forward
decl.
* jit-playback.c
(gcc::jit::playback::context::new_rvalue_from_vector): New method.
* jit-playback.h
(gcc::jit::playback::context::new_rvalue_from_vector): New method.
* jit-recording.c: In namespace gcc::jit::
(class comma_separated_string): New class.
(comma_separated_string::comma_separated_string): New ctor,
adapted from recording::call::make_debug_string.
(comma_separated_string::~comma_separated_string): New dtor.
In namespace gcc::jit::recording::
(context::new_rvalue_from_vector): New method.
(type::get_vector): Update for renaming of memento_of_get_vector.
(class memento_of_get_vector): Rename to...
(class vector_type): ..this.
(memento_of_new_rvalue_from_vector::memento_of_new_rvalue_from_vector):
New ctor.
(memento_of_new_rvalue_from_vector::replay_into): New method.
(memento_of_new_rvalue_from_vector::visit_children): New method.
(memento_of_new_rvalue_from_vector::make_debug_string): New
method.
(memento_of_new_rvalue_from_vector::write_reproducer): New method.
(call::make_debug_string): Split out arg-printing code into ctor
for comma_separated_string.
* jit-recording.h: In namespace gcc::jit::recording::
(context::new_rvalue_from_vector): New method.
(type::dyn_cast_vector_type): New virtual function.
(class memento_of_get_vector): Rename to...
(class vector_type): ...this.
(vector_type::unqualified): Remove this vfunc override in favor
of...
(vector_type::get_element_type): ...this new method.
(vector_type::get_num_units): New method.
(vector_type::dyn_cast_vector_type): New vfunc override.
(class memento_of_new_rvalue_from_vector): New class.
* libgccjit++.h (gccjit::context::new_rvalue): Add overload for
vector of rvalue.
* libgccjit.c (gcc_jit_context_new_binary_op): Strip off type
qualifications when checking that both operands have same type.
(gcc_jit_context_new_rvalue_from_vector): New API entrypoint.
* libgccjit.h
(LIBGCCJIT_HAVE_gcc_jit_context_new_rvalue_from_vector): New
macro.
(gcc_jit_context_new_rvalue_from_vector): New API entrypoint.
* libgccjit.map (LIBGCCJIT_ABI_10): New ABI tag.

gcc/testsuite/ChangeLog:
* jit.dg/test-expressions.c (make_test_of_vectors): New function.
(create_code): Call it.
* jit.dg/test-vector-rvalues.cc: New test case.
---
 gcc/jit/docs/cp/topics/expressions.rst  |  11 ++
 gcc/jit/docs/topics/compatibility.rst   |   8 ++
 gcc/jit/docs/topics/expressions.rst |  24 
 gcc/jit/docs/topics/types.rst   |   3 +
 gcc/jit/jit-common.h|   1 +
 gcc/jit/jit-playback.c  |  16 +++
 gcc/jit/jit-playback.h  |   5 +
 gcc/jit/jit-recording.c | 207 ++-
 gcc/jit/jit-recording.h |  42 +-
 gcc/jit/libgccjit++.h   |  22 +++
 gcc/jit/libgccjit.c |  60 +++-
 gcc/jit/libgccjit.h |  17 +++
 gcc/jit/libgccjit.map   |   5 +
 gcc/testsuite/jit.dg/test-expressions.c |  30 
 gcc/testsuite/jit.dg/test-vector-rvalues.cc | 211 
 15 files changed, 622 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/jit.dg/test-vector-rvalues.cc

diff --git a/gcc/jit/docs/cp/topics/expressions.rst 
b/gcc/jit/docs/cp/topics/expressions.rst
index 147d065..b0081f6 100644
--- a/gcc/ji

Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-04 Thread Ramana Radhakrishnan
On Wed, Oct 4, 2017 at 2:39 PM, Alexander Monakov  wrote:
> On Wed, 4 Oct 2017, Ramana Radhakrishnan wrote:
>> However we need a scheduler maintainer or global reviewer to please
>> help review this patch or help come up with an alternative patch. A
>> primary platform broken for 5 days with a commit and no public
>> response from the original poster is really not appropriate behaviour
>> in this community. If not, the original patch should have been
>> considered for reverting under the 48 hour rule .
>
> I responded within 30 minutes of the original report by Christophe:
> https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01984.html
>

My apologies.

I tried looking in my theading in my gmail and was surprised not to
see a response and somehow I missed this in the archives.

Ramana


Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-04 Thread Alexander Monakov
On Wed, 4 Oct 2017, Ramana Radhakrishnan wrote:
> However we need a scheduler maintainer or global reviewer to please
> help review this patch or help come up with an alternative patch. A
> primary platform broken for 5 days with a commit and no public
> response from the original poster is really not appropriate behaviour
> in this community. If not, the original patch should have been
> considered for reverting under the 48 hour rule .

I responded within 30 minutes of the original report by Christophe:
https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01984.html

I think Wilco's patch is reasonable. A potentially simpler alternative
to just restore bootstrap is to take only the second hunk of my patch
above (i.e. without removal of autopref_rank_data and associated cleanups).

FWIW in the previous thread for a closely related issue I Cc'ed Maxim and
Kyrill: https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01229.html

Alexander


[AArch64] Backport to gcc-7 PR71727 fix -mstrict-align

2017-10-04 Thread Christophe Lyon
Hi,

I've recently committed a follow-up fix for PR71727 for -mstrict-align
on aarch64 (r253242).
I think it would be appropriate to apply it to gcc-7-branch. The patch
from trunk applies cleanly to gcc-7-branch.

Although the original bug was reported against 4.9.4, 5.3.1, 6.1.0,
Naveen's patch was not backported to these branches, so it's not
appropriate to backport my patch there.

OK?

Thanks,

Christophe
2017-09-20  Christophe Lyon  

PR target/71727
gcc/
* config/aarch64/aarch64.c
(aarch64_builtin_support_vector_misalignment): Always return false
when misalignment is unknown.

gcc/testsuite/
* gcc.target/aarch64/pr71727-2.c: New test.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 799989a..7cc67ec 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11757,19 +11757,9 @@ aarch64_builtin_support_vector_misalignment 
(machine_mode mode,
   if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
 return false;
 
+  /* Misalignment factor is unknown at compile time.  */
   if (misalignment == -1)
-   {
- /* Misalignment factor is unknown at compile time but we know
-it's word aligned.  */
- if (aarch64_simd_vector_alignment_reachable (type, is_packed))
-{
-  int element_size = TREE_INT_CST_LOW (TYPE_SIZE (type));
-
-  if (element_size != 64)
-return true;
-}
- return false;
-   }
+   return false;
 }
   return default_builtin_support_vector_misalignment (mode, type, misalignment,
  is_packed);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr71727-2.c 
b/gcc/testsuite/gcc.target/aarch64/pr71727-2.c
new file mode 100644
index 000..2bc803a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr71727-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mstrict-align -O3" } */
+
+unsigned char foo(const unsigned char *buffer, unsigned int length)
+{
+  unsigned char sum;
+  unsigned int  count;
+
+  for (sum = 0, count = 0; count < length; count++) {
+sum = (unsigned char) (sum + *(buffer + count));
+  }
+
+  return sum;
+}
+
+/* { dg-final { scan-assembler-times "and\tw\[0-9\]+, w\[0-9\]+, 15" 1 } } */


Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-04 Thread Jakub Jelinek
On Wed, Oct 04, 2017 at 01:24:15PM +, Wilco Dijkstra wrote:
> Well my understanding was that it is OK to fix a bootstrap failure. I believe 
> my
> patch is trivial since it mostly removes redundant code. Also I took Jakub's
> review as an OK as there were no technical objections. However since you

I've only commented (off-list) on the ChangeLog nits, not on anything else.

Jakub


Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-04 Thread Wilco Dijkstra
Richard Sandiford wrote:
>
> I don't think it's reasonable to commit this as obvious.  You said
> yourself in the covering message that "it doesn't at all restore
> the original behaviour since we no longer compare the base address".
> So even with the bootstrap failure, I think the patch needed review
> before going in.
>
> Christophe's message doesn't change anything because you knew when you
> posted the patch that it fixed the failure.

Well my understanding was that it is OK to fix a bootstrap failure. I believe my
patch is trivial since it mostly removes redundant code. Also I took Jakub's
review as an OK as there were no technical objections. However since you
seem to disagree, I will revert it.

We have now had 5 days of no builds for a major target, which is a huge
inconvenience. So I don't think it is reasonable to wait any longer.
The alternative is to revert the original patch that caused the bootstrap 
failure
plus the patch(es) that unexpectedly changed the behaviour of the scheduler
(I don't think there was any testing as to what effect those had on the 
schedule).

So the question is who will do that and when?

Wilco

Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-04 Thread Ramana Radhakrishnan
On Wed, Oct 4, 2017 at 12:01 PM, Richard Sandiford
 wrote:
> Wilco Dijkstra  writes:
>> Christophe Lyon wrote:
>>
>>> FWIW, I confirm that this patch fixes the build failure I reported at:
>>> https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01980.html
>>
>> Thanks, now committed as r253399.
>
> I don't think it's reasonable to commit this as obvious.  You said
> yourself in the covering message that "it doesn't at all restore
> the original behaviour since we no longer compare the base address".
> So even with the bootstrap failure, I think the patch needed review
> before going in.

I agree that this patch shouldn't have gone in without review from a
maintainer of the appropriate area regardless of whether this fixes a
bootstrap failure or not.

However we need a scheduler maintainer or global reviewer to please
help review this patch or help come up with an alternative patch. A
primary platform broken for 5 days with a commit and no public
response from the original poster is really not appropriate behaviour
in this community. If not, the original patch should have been
considered for reverting under the 48 hour rule .

regards
Ramana

>
> Christophe's message doesn't change anything because you knew when you
> posted the patch that it fixed the failure.
>
> Richard


Re: [PATCH #2], Define __FP_FAST_FMAF128 on PowerPC ISA 3.0

2017-10-04 Thread Segher Boessenkool
Hi!

On Mon, Oct 02, 2017 at 07:52:50PM -0400, Michael Meissner wrote:
> Whoops, I forgot to attach the patch.

Heh.  The rs6000 parts are of course okay (trivial / obvious, but maybe
you are waiting for an ack).

Thanks,


Segher


[C++ Patch] PR 71946 ("asm in toplevel lambda function rejected")

2017-10-04 Thread Paolo Carlini

Hi,

Andrew noticed that the reason we reject entities like inline-asm or 
GNU's statement-expressions in the lambda body is simply that we don't 
set the parser->in_function_body flag. Thus the below, which appears to 
work perfectly for that. Tested x86_64-linux.


Thanks, Paolo.

///


/cp
2017-10-04  Paolo Carlini  
Andrew Pinski  

PR c++/71946
* parser.c (cp_parser_lambda_body): Set parser->in_function_body.

/testsuite
2017-10-04  Paolo Carlini  
Andrew Pinski  

PR c++/71946
* g++.dg/cpp0x/lambda/lambda-asm1.C: New.
* g++.dg/cpp0x/lambda/lambda-stmtexpr1.C: Likewise.
Index: cp/parser.c
===
--- cp/parser.c (revision 253398)
+++ cp/parser.c (working copy)
@@ -10557,6 +10557,7 @@ cp_parser_lambda_body (cp_parser* parser, tree lam
 {
   bool nested = (current_function_decl != NULL_TREE);
   bool local_variables_forbidden_p = parser->local_variables_forbidden_p;
+  bool in_function_body = parser->in_function_body;
   if (nested)
 push_function_context ();
   else
@@ -10567,6 +10568,7 @@ cp_parser_lambda_body (cp_parser* parser, tree lam
   save_omp_privatization_clauses (omp_privatization_save);
   /* Clear this in case we're in the middle of a default argument.  */
   parser->local_variables_forbidden_p = false;
+  parser->in_function_body = true;
 
   /* Finish the function call operator
  - class_specifier
@@ -10653,6 +10655,7 @@ cp_parser_lambda_body (cp_parser* parser, tree lam
 
   restore_omp_privatization_clauses (omp_privatization_save);
   parser->local_variables_forbidden_p = local_variables_forbidden_p;
+  parser->in_function_body = in_function_body;
   if (nested)
 pop_function_context();
   else
Index: testsuite/g++.dg/cpp0x/lambda/lambda-asm1.C
===
--- testsuite/g++.dg/cpp0x/lambda/lambda-asm1.C (revision 0)
+++ testsuite/g++.dg/cpp0x/lambda/lambda-asm1.C (working copy)
@@ -0,0 +1,4 @@
+// PR c++/71946
+// { dg-do compile { target c++11 } }
+
+auto test = []{ __asm__ __volatile__ ("" : : "r" (0) ); };
Index: testsuite/g++.dg/cpp0x/lambda/lambda-stmtexpr1.C
===
--- testsuite/g++.dg/cpp0x/lambda/lambda-stmtexpr1.C(revision 0)
+++ testsuite/g++.dg/cpp0x/lambda/lambda-stmtexpr1.C(working copy)
@@ -0,0 +1,5 @@
+// PR c++/71946
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+auto test = []{ int t = ({ int t1; t1 = 7; t1; }); };


Re: [C++ PATCH] Fix pretty-printing of auto returning functions (PR c++/82373)

2017-10-04 Thread Nathan Sidwell

On 10/04/2017 04:13 AM, Jakub Jelinek wrote:

Hi!

If a function has auto return value that is deducted to return function
pointer (or pointer array?), we pretty print it as e.g.
auto N::foo)(int)
The problem is that if show_return we use fndecl_declared_return_type (t)
for the type prefix, but TREE_TYPE (fntype) for the type suffix,
and if those types aren't the same, we can emit something that syntactically
makes no sense.

Fixed by making sure that type suffix is called on whatever we called type
prefix on.


ok thanks

nathan

--
Nathan Sidwell


Re: [PATCH] Prefer shorter VEX encoding of VP{AND,OR,XOR,ANDN} over EVEX when possible (PR target/82370)

2017-10-04 Thread Uros Bizjak
On Wed, Oct 4, 2017 at 10:33 AM, Jakub Jelinek  wrote:
> Hi!
>
> Most AVX* instructions have the same insn name between VEX and EVEX
> encoded insns and whether it is EVEX or VEX encoded is determined by
> the operands by the assembler (if there is masking/broadcast, or
> %[xy]mm16+ operands are present, or when using 512-bit vectors).
> This is not the case for the logical instruction, we have VEX
> encoded VP{AND,OR,XOR,ANDN} and EVEX encoded VP{AND,OR,XOR,ANDN}{D,Q}.
>
> Right now we use the d or q suffixes for -mavx512vl even when we could
> use VEX encoded insn, which results in larger opcode.
>
> The following patch fixes that, by emitting the suffix only when needed.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-10-04  Jakub Jelinek  
>
> PR target/82370
> * config/i386/sse.md (*andnot3,
> 3, *3): Split
> (=v,v,vm) alternative into (=x,x,xm) and (=v,v,vm), for 128-bit
> and 256-bit vectors, the (=x,x,xm) alternative and when mask is
> not applied use empty suffix even for TARGET_AVX512VL.
> * config/i386/subst.md (mask_prefix3, mask_prefix4): When mask
> is applied, supply evex,evex or evex,evex,evex instead of just
> evex.

LGTM, but Kirill should give the final approval for this patch.

Uros.

> --- gcc/config/i386/sse.md.jj   2017-09-22 20:51:51.0 +0200
> +++ gcc/config/i386/sse.md  2017-10-03 18:25:24.289527626 +0200
> @@ -11568,10 +11568,10 @@ (define_expand "_andnot
>"TARGET_AVX512BW")
>
>  (define_insn "*andnot3"
> -  [(set (match_operand:VI 0 "register_operand" "=x,v")
> +  [(set (match_operand:VI 0 "register_operand" "=x,x,v")
> (and:VI
> - (not:VI (match_operand:VI 1 "register_operand" "0,v"))
> - (match_operand:VI 2 "vector_operand" "xBm,vm")))]
> + (not:VI (match_operand:VI 1 "register_operand" "0,x,v"))
> + (match_operand:VI 2 "vector_operand" "xBm,xm,vm")))]
>"TARGET_SSE"
>  {
>static char buf[64];
> @@ -11606,10 +11606,11 @@ (define_insn "*andnot3"
> case E_V4DImode:
> case E_V4SImode:
> case E_V2DImode:
> - ssesuffix = TARGET_AVX512VL ? "" : "";
> + ssesuffix = (TARGET_AVX512VL && which_alternative == 2
> +  ? "" : "");
>   break;
> default:
> - ssesuffix = TARGET_AVX512VL ? "q" : "";
> + ssesuffix = TARGET_AVX512VL && which_alternative == 2 ? "q" : "";
> }
>break;
>
> @@ -11635,6 +11636,7 @@ (define_insn "*andnot3"
>ops = "%s%s\t{%%2, %%0|%%0, %%2}";
>break;
>  case 1:
> +case 2:
>ops = "v%s%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
>break;
>  default:
> @@ -11644,7 +11646,7 @@ (define_insn "*andnot3"
>snprintf (buf, sizeof (buf), ops, tmp, ssesuffix);
>return buf;
>  }
> -  [(set_attr "isa" "noavx,avx")
> +  [(set_attr "isa" "noavx,avx,avx")
> (set_attr "type" "sselog")
> (set (attr "prefix_data16")
>   (if_then_else
> @@ -11652,7 +11654,7 @@ (define_insn "*andnot3"
> (eq_attr "mode" "TI"))
> (const_string "1")
> (const_string "*")))
> -   (set_attr "prefix" "orig,vex")
> +   (set_attr "prefix" "orig,vex,evex")
> (set (attr "mode")
> (cond [(and (match_test " == 16")
> (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
> @@ -11697,10 +11699,10 @@ (define_expand "3"
>  })
>
>  (define_insn "3"
> -  [(set (match_operand:VI48_AVX_AVX512F 0 "register_operand" "=x,v")
> +  [(set (match_operand:VI48_AVX_AVX512F 0 "register_operand" "=x,x,v")
> (any_logic:VI48_AVX_AVX512F
> - (match_operand:VI48_AVX_AVX512F 1 "vector_operand" "%0,v")
> - (match_operand:VI48_AVX_AVX512F 2 "vector_operand" "xBm,vm")))]
> + (match_operand:VI48_AVX_AVX512F 1 "vector_operand" "%0,x,v")
> + (match_operand:VI48_AVX_AVX512F 2 "vector_operand" "xBm,xm,vm")))]
>"TARGET_SSE && 
> && ix86_binary_operator_ok (, mode, operands)"
>  {
> @@ -11730,7 +11732,9 @@ (define_insn "
> case E_V4DImode:
> case E_V4SImode:
> case E_V2DImode:
> - ssesuffix = TARGET_AVX512VL ? "" : "";
> + ssesuffix = (TARGET_AVX512VL
> +  && ( || which_alternative == 2)
> +  ? "" : "");
>   break;
> default:
>   gcc_unreachable ();
> @@ -11759,6 +11763,7 @@ (define_insn "
>  ops = "%s%s\t{%%2, %%0|%%0, %%2}";
>break;
>  case 1:
> +case 2:
>ops = "v%s%s\t{%%2, %%1, %%0|%%0, 
> %%1, %%2}";
>break;
>  default:
> @@ -11768,7 +11773,7 @@ (define_insn "
>snprintf (buf, sizeof (buf), ops, tmp, ssesuffix);
>return buf;
>  }
> -  [(set_attr "isa" "noavx,avx")
> +  [(set_attr "isa" "noavx,avx,avx")
> (set_attr "type" "sselog")
> (set (attr "prefix_data16")
>   (if_then_else
> @@ -11776,7 +11781,7 @@ (define_insn "
> (eq_attr "mode" "TI"))
>  

Re: [committed] Propagate attributes, including optimization and target node, to OMP outlined regions (PR tree-optimization/82374)

2017-10-04 Thread Thomas Schwinge
Hi!

On Wed, 4 Oct 2017 10:01:10 +0200, Jakub Jelinek  wrote:
> This patch propagates attributes, including opt and target nodes, from
> the containing function to the OMP/OACC outlined region functions.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

Hmm, in your testing, why didn't you observe this breaking a few OpenACC
test cases?  Committed to trunk r253402, as obvious:

commit da7a1a683a84d32e9d7be6a5d00925fdfa125430
Author: tschwinge 
Date:   Wed Oct 4 11:13:24 2017 +

Adjust test cases for attributes propagation changes for OMP outlined 
regions

PR tree-optimization/82374
* c-c++-common/goacc/kernels-double-reduction-n.c: Adjust for
attributes propagation changes for OMP outlined regions.
* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
* c-c++-common/goacc/kernels-reduction.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253402 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog   | 8 
 gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c   | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-reduction.c  | 2 +-
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index 2c96f3a..4ac50d0 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,11 @@
+2017-10-04  Thomas Schwinge  
+
+   PR tree-optimization/82374
+   * c-c++-common/goacc/kernels-double-reduction-n.c: Adjust for
+   attributes propagation changes for OMP outlined regions.
+   * c-c++-common/goacc/kernels-double-reduction.c: Likewise.
+   * c-c++-common/goacc/kernels-reduction.c: Likewise.
+
 2017-10-04  Richard Sandiford  
 
PR tree-optimization/82413
diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c 
gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
index 27ea2e9..10b364b 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
@@ -27,7 +27,7 @@ foo (unsigned int n)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 
"parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels 
parallelized, oacc function \\(, , \\), oacc kernels, omp target 
entrypoint\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels 
parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, 
noclone, noinline\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops1" 
} } */
 
diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c 
gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
index 0841e90..c026346 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
@@ -27,7 +27,7 @@ foo (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 
"parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels 
parallelized, oacc function \\(, , \\), oacc kernels, omp target 
entrypoint\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels 
parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, 
noclone, noinline\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops1" 
} } */
 
diff --git gcc/testsuite/c-c++-common/goacc/kernels-reduction.c 
gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
index 4a18272..5921b88 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
@@ -26,7 +26,7 @@ foo (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 
"parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels 
parallelized, oacc function \\(, , \\), oacc kernels, omp target 
entrypoint\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels 
parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, 
noclone, noinline\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */


Grüße
 Thomas


> 201

Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-04 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Christophe Lyon wrote:
>
>> FWIW, I confirm that this patch fixes the build failure I reported at:
>> https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01980.html
>
> Thanks, now committed as r253399.

I don't think it's reasonable to commit this as obvious.  You said
yourself in the covering message that "it doesn't at all restore
the original behaviour since we no longer compare the base address".
So even with the bootstrap failure, I think the patch needed review
before going in.

Christophe's message doesn't change anything because you knew when you
posted the patch that it fixed the failure.

Richard


[committed] PR82413: Mismatched precisions in build_range_check

2017-10-04 Thread Richard Sandiford
build_range_check explicitly allows LOW and HIGH to be a different type
from EXP, so we need to use w::to_widest when comparing a value based on
HIGH with a value based on EXP's type.

Tested on powerpc64le-linux-gnu and installed as obvious.

Richard


2017-10-04  Richard Sandiford  

gcc/
PR tree-optimization/82413
* fold-const.c (build_range_check): Use widest_int when comparing
the maximum ETYPE value with HIGH.

gcc/testsuite/
PR tree-optimization/82413
* g++.dg/pr82413.C: New test.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c2017-10-04 11:43:11.653350269 +0100
+++ gcc/fold-const.c2017-10-04 11:43:49.030813918 +0100
@@ -4851,7 +4851,7 @@ build_range_check (location_t loc, tree
 {
   int prec = TYPE_PRECISION (etype);
 
-  if (wi::mask (prec - 1, false, prec) == high)
+  if (wi::mask  (prec - 1, false) == wi::to_widest (high))
{
  if (TYPE_UNSIGNED (etype))
{
Index: gcc/testsuite/g++.dg/pr82413.C
===
--- /dev/null   2017-10-04 10:21:58.939451407 +0100
+++ gcc/testsuite/g++.dg/pr82413.C  2017-10-04 11:43:49.030813918 +0100
@@ -0,0 +1,3 @@
+bool a;
+int b;
+void c() { b &&a <= 0; }


Re: [Patch, fortran] PR77296 - [F03] Compiler Error with allocatable string and associate

2017-10-04 Thread Paul Richard Thomas
Committed as revision 253400.

Thanks for the review and the test.

Paul

On 1 October 2017 at 14:24, Thomas Koenig  wrote:
> Hi Paul,
>
>> The attached patch fixes the PR and most of the remaining, if not all,
>> problems associated with deferred string length targets in the
>> associate construct.
>>
>> Bootstraps and regtests on FC23/x86_64 - OK for trunk?
>
>
> Yes.
>
> Thanks a lot for working on this!
>
> Regards
>
> Thomas



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-04 Thread Wilco Dijkstra
Christophe Lyon wrote:

> FWIW, I confirm that this patch fixes the build failure I reported at:
> https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01980.html

Thanks, now committed as r253399.

Wilco

Re: [PING][PATCH][Aarch64] Improve int<->FP conversions

2017-10-04 Thread James Greenhalgh
On Sun, Oct 01, 2017 at 02:07:57AM +0100, Michael Collison wrote:
> Sorry. Here is the patch.

I think this needs a small amount fo rework in iterators.md - the names
you've used don't follow conventions in that file (e.g. "V" normally has
something to do with vectors) so could do with patching up.


> -(define_insn "_trunc2"
> +(define_insn "_trunc2"
> +  [(set (match_operand:GPI 0 "register_operand" "=?r,w")
> + (FIXUORS:GPI (match_operand: 1 "register_operand" "w,w")))]
> +  "TARGET_FLOAT"
> +  "@
> +   fcvtz\t%0, %1
> +   fcvtz\t%0, %1"
> +  [(set_attr "type" "f_cvtf2i,neon_fp_to_int_s")]
> +)

So the point here is that we need to fork the pattern for two reasons.
Before we were iterating over both floating-point modes as the input to any
integer-modes as output. Because only the same-sized instructions have
vector-register to vector-register forms we need two patterns. One for
same-size, one for cross-size. And one more special pattern for HFmode.

This makes sense to me. A comment explaining why we need the two patterns
would be even easier to read.

This pattern gives us SFmode to SImode and DFmode to DImode.

> +(define_insn "_trunchf2"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> + (FIXUORS:GPI (match_operand:HF 1 "register_operand" "w")))]
> +  "TARGET_FP_F16INST"
> +  "fcvtz\t%0, %h1"
> +  [(set_attr "type" "f_cvtf2i")]

This pattern we need for HFmode to SImode.

> +(define_insn "_trunc2"
>[(set (match_operand:GPI 0 "register_operand" "=r")
> - (FIXUORS:GPI (match_operand:GPF_F16 1 "register_operand" "w")))]
> + (FIXUORS:GPI (match_operand: 1 "register_operand" "w")))]
>"TARGET_FLOAT"
> -  "fcvtz\t%0, %1"
> +  "fcvtz\t%0, %1"
>[(set_attr "type" "f_cvtf2i")]
>  )

And this pattern gives SFmode to DImode and DFmode to SImode.

Comments would definitely help here.

> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index cceb575..166a044 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -391,6 +391,9 @@
>  (define_mode_attr w1 [(HF "w") (SF "w") (DF "x")])
>  (define_mode_attr w2 [(HF "x") (SF "x") (DF "w")])
>  
> +;; For inequal width float to int conversion
> +(define_mode_attr wv [(DI "s") (SI "d")])

Could you invent a more descriptive name for this?

> +
>  (define_mode_attr short_mask [(HI "65535") (QI "255")])
>  
>  ;; For constraints used in scalar immediate vector moves
> @@ -399,6 +402,14 @@
>  ;; For doubling width of an integer mode
>  (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI")])
>  
> +(define_mode_attr vf [(SI "sf") (DI "df")])
> +
> +(define_mode_attr VF [(SI "SF") (DI "DF")])

These two are fcvt_target and FCVT_TARGET ?

> +(define_mode_attr vgp [(SI "df") (DI "sf")])
> +
> +(define_mode_attr VGP [(SI "DF") (DI "SF")])

These names don't make sense to me - V is usually vector and GP sounds
like general purpose. Maybe something like fcvt_change_mode ? Try to be
more descriptive.

> +
>  ;; For scalar usage of vector/FP registers
>  (define_mode_attr v [(QI "b") (HI "h") (SI "s") (DI "d")
>   (HF  "h") (SF "s") (DF "d")
> @@ -432,7 +443,7 @@
>  (define_mode_attr vas [(DI "") (SI ".2s")])
>  
>  ;; Map a floating point mode to the appropriate register name prefix

This comment is out of date after your changes, please update it.

> -(define_mode_attr s [(HF "h") (SF "s") (DF "d")])
> +(define_mode_attr s [(HF "h") (SF "s") (DF "d") (SI "s") (DI "d")])
>  
>  ;; Give the length suffix letter for a sign- or zero-extension.
>  (define_mode_attr size [(QI "b") (HI "h") (SI "w")])


Thanks,
James



[PATCH] Add sanopt support for UBSAN_PTR.

2017-10-04 Thread Martin Liška
Hello.

Following patch adds support for optimizing out unnecessary UBSAN_PTR checks.
It handles separately positive and negative offsets, zero offset is ignored.
Apart from that, we utilize get_inner_reference for local and global variables,
that also helps to reduce some.

Some numbers:

1) postgres:

bloaty /tmp/after2 -- /tmp/before2
 VM SIZE  FILE SIZE
 ++ GROWING++
  [ = ]   0 .debug_abbrev  +1.84Ki  +0.3%

 -- SHRINKING  --
 -30.3% -3.98Mi .text  -3.98Mi -30.3%
  [ = ]   0 .debug_info-3.69Mi -17.2%
  [ = ]   0 .debug_loc -2.02Mi -13.4%
 -43.1% -1.37Mi .data  -1.37Mi -43.1%
  [ = ]   0 .debug_ranges   -390Ki -14.3%
  [ = ]   0 .debug_line -295Ki -11.6%
  -4.0% -26.3Ki .eh_frame  -26.3Ki  -4.0%
  [ = ]   0 [Unmapped] -1.61Ki -62.3%
  [ = ]   0 .strtab-1.15Ki  -0.4%
  [ = ]   0 .symtab-1.08Ki  -0.3%
  -0.4%-368 .eh_frame_hdr -368  -0.4%
  [ = ]   0 .debug_aranges-256  -0.7%
  [DEL] -16 [None]   0  [ = ]

 -28.0% -5.37Mi TOTAL  -11.8Mi -18.8%

Left checks:
261039
Optimized out:
85643

2) tramp3d:

bloaty after -- before
 VM SIZE FILE SIZE
 ++ GROWING   ++
  +167% +30 [Unmapped]+1.01Ki   +39%

 -- SHRINKING --
 -58.5% -2.52Mi .text -2.52Mi -58.5%
 -64.2%  -574Ki .data  -574Ki -64.2%
  -5.7% -4.27Ki .eh_frame -4.27Ki  -5.7%
  -6.4% -1.06Ki .gcc_except_table -1.06Ki  -6.4%
  -7.2%-192 .bss0  [ = ]
  -0.1% -32 .rodata   -32  -0.1%
  [ = ]   0 .strtab   -29  -0.0%
  -1.1% -24 .dynsym   -24  -1.1%
  -1.5% -24 .rela.plt -24  -1.5%
  [ = ]   0 .symtab   -24  -0.1%
  -0.6% -16 .dynstr   -16  -0.6%
  -1.5% -16 .plt  -16  -1.5%
  -1.4%  -8 .got.plt   -8  -1.4%
  -0.6%  -4 .hash  -4  -0.6%
  -1.1%  -2 .gnu.version   -2  -1.1%

 -58.0% -3.09Mi TOTAL -3.08Mi -55.0%

Left checks:
31131
Optimized out:
36752

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2017-10-04  Martin Liska  

* sanopt.c (struct sanopt_tree_couple): New struct.
(struct sanopt_tree_couple_hash): Likewise.
(struct sanopt_ctx): Add ptr_check_map.
(has_dominating_ubsan_ptr_check): New function.
(record_ubsan_ptr_check_stmt): Likewise.
(maybe_optimize_ubsan_ptr_ifn): Likewise.
(sanopt_optimize_walker): Handle IFN_UBSAN_PTR.  Dump info
inline and newly print stmts that are left in code stream.
(pass_sanopt::execute): Handle also SANITIZE_POINTER_OVERFLOW.

gcc/testsuite/ChangeLog:

2017-10-04  Martin Liska  

* c-c++-common/ubsan/ptr-overflow-sanitization-1.c: New test.
---
 gcc/sanopt.c   | 212 -
 .../ubsan/ptr-overflow-sanitization-1.c|  38 
 2 files changed, 244 insertions(+), 6 deletions(-)
 create mode 100644 
gcc/testsuite/c-c++-common/ubsan/ptr-overflow-sanitization-1.c


diff --git a/gcc/sanopt.c b/gcc/sanopt.c
index d17c7db3321..7d4a2029377 100644
--- a/gcc/sanopt.c
+++ b/gcc/sanopt.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfghooks.h"
 #include "tree-dfa.h"
 #include "tree-ssa.h"
+#include "varasm.h"
 
 /* This is used to carry information about basic blocks.  It is
attached to the AUX field of the standard CFG block.  */
@@ -148,6 +149,61 @@ struct sanopt_tree_triplet_hash : typed_noop_remove 
   }
 };
 
+/* Tree couple for ptr_check_map.  */
+struct sanopt_tree_couple
+{
+  tree ptr;
+  bool positive;
+};
+
+/* Traits class for tree triplet hash maps below.  */
+
+struct sanopt_tree_couple_hash : typed_noop_remove 
+{
+  typedef sanopt_tree_couple value_type;
+  typedef sanopt_tree_couple compare_type;
+
+  static inline hashval_t
+  hash (const sanopt_tree_couple &ref)
+  {
+inchash::hash hstate (0);
+inchash::add_expr (ref.ptr, hstate);
+hstate.add_int (ref.positive);
+return hstate.end ();
+  }
+
+  static inline bool
+  equal (const sanopt_tree_couple &ref1, const sanopt_tree_couple &ref2)
+  {
+return operand_equal_p (ref1.ptr, ref2.ptr, 0)
+	   && ref1.positive == ref2.positive;
+  }
+
+  static inline void
+  mark_deleted (sanopt_tree_couple &ref)
+  {
+ref.ptr = reinterpret_cast (1);
+  }
+
+  static inline void
+  mark_empty (sanopt_tree_couple &ref)
+  {
+ref.ptr = NULL;
+  }
+
+  static inline bool
+  is_deleted (const sanopt_tree_couple &ref)
+  {
+return ref.ptr == (void *) 1;
+  }
+
+  static inline bool
+  is_empty (const sanopt_tree_couple &ref)
+  {
+return ref.ptr == NULL;
+  }
+};
+
 /* 

[openacc, testsuite, committed] Fix openacc float reduction testcases

2017-10-04 Thread Tom de Vries

Hi,

I found 3 test-cases in libgomp/testsuite/libgomp.oacc-c-c++-common that 
do a 32-bit floating point reduction on integer values and verify the 
result using an equality test, while the result is larger than the 
largest integer that can be exactly represented in 32-bit fp (16777216). 
 This makes the test-cases sensitive to changes in computation order.


Fixed by modifying the arrays that are being reduced to contain smaller 
values.


Committed.

Thanks,
- Tom
Fix openacc float reduction testcases

2017-10-04  Tom de Vries  

	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
	(main): Reduce sum of arr elements.  Assert that hres is exactly
	representable in 32-bit floating point.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
	(main): Reduce sum of arr elements.  Assert that hres and hmres are
	exactly representable in 32-bit floating point.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c (gwv_np_4): Same.

---
 .../testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c  | 3 ++-
 .../testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c  | 5 -
 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c| 5 -
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
index 8d85fed..6369d7f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
@@ -11,7 +11,7 @@ main (int argc, char *argv[])
   float res = 0, hres = 0;
 
   for (i = 0; i < 32768; i++)
-arr[i] = i;
+arr[i] = i % (32768 / 64);
 
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
 reduction(+:res) copy(res)
@@ -36,6 +36,7 @@ main (int argc, char *argv[])
 	hres += arr[j * 1024 + (1023 - i)];
   }
 
+  assert (hres <= 16777216);
   assert (res == hres);
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
index 1904b4a..140c322 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
@@ -11,7 +11,7 @@ main (int argc, char *argv[])
   float res = 0, mres = 0, hres = 0, hmres = 0;
 
   for (i = 0; i < 32768; i++)
-arr[i] = i;
+arr[i] = i % (32768 / 64);
 
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
 reduction(+:res) reduction(max:mres) copy(res, mres)
@@ -48,7 +48,10 @@ main (int argc, char *argv[])
 	  hmres = arr[j * 1024 + (1023 - i)];
   }
 
+  assert (hres <= 16777216);
   assert (res == hres);
+
+  assert (hmres <= 16777216);
   assert (mres == hmres);
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
index cc3cd07..c4940b8 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
@@ -183,7 +183,7 @@ void gwv_np_4()
   float res = 0, mres = 0, hres = 0, hmres = 0;
 
   for (i = 0; i < 32768; i++)
-arr[i] = i;
+arr[i] = i % (32768 / 64);
 
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
@@ -219,7 +219,10 @@ void gwv_np_4()
 	  hmres = arr[j * 1024 + (1023 - i)];
   }
 
+  assert (hres <= 16777216);
   assert (res == hres);
+
+  assert (hmres <= 16777216);
   assert (mres == hmres);
 }
 


Re: [PATCH, ARM] fix .cfi inconsistency out of builtin_eh_return

2017-10-04 Thread Olivier Hainque
Hello Ramana,

> On Oct 4, 2017, at 10:21 , Ramana Radhakrishnan  
> wrote:

> This is OK .

Great :-)

> Sorry about the delay - I've been traveling a bit.

No worries, thanks for the review!

With Kind Regards,

Olivier



[PATCH] Prefer shorter VEX encoding of VP{AND,OR,XOR,ANDN} over EVEX when possible (PR target/82370)

2017-10-04 Thread Jakub Jelinek
Hi!

Most AVX* instructions have the same insn name between VEX and EVEX
encoded insns and whether it is EVEX or VEX encoded is determined by
the operands by the assembler (if there is masking/broadcast, or
%[xy]mm16+ operands are present, or when using 512-bit vectors).
This is not the case for the logical instruction, we have VEX
encoded VP{AND,OR,XOR,ANDN} and EVEX encoded VP{AND,OR,XOR,ANDN}{D,Q}.

Right now we use the d or q suffixes for -mavx512vl even when we could
use VEX encoded insn, which results in larger opcode.

The following patch fixes that, by emitting the suffix only when needed.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-04  Jakub Jelinek  

PR target/82370
* config/i386/sse.md (*andnot3,
3, *3): Split
(=v,v,vm) alternative into (=x,x,xm) and (=v,v,vm), for 128-bit
and 256-bit vectors, the (=x,x,xm) alternative and when mask is
not applied use empty suffix even for TARGET_AVX512VL.
* config/i386/subst.md (mask_prefix3, mask_prefix4): When mask
is applied, supply evex,evex or evex,evex,evex instead of just
evex.

--- gcc/config/i386/sse.md.jj   2017-09-22 20:51:51.0 +0200
+++ gcc/config/i386/sse.md  2017-10-03 18:25:24.289527626 +0200
@@ -11568,10 +11568,10 @@ (define_expand "_andnot
   "TARGET_AVX512BW")
 
 (define_insn "*andnot3"
-  [(set (match_operand:VI 0 "register_operand" "=x,v")
+  [(set (match_operand:VI 0 "register_operand" "=x,x,v")
(and:VI
- (not:VI (match_operand:VI 1 "register_operand" "0,v"))
- (match_operand:VI 2 "vector_operand" "xBm,vm")))]
+ (not:VI (match_operand:VI 1 "register_operand" "0,x,v"))
+ (match_operand:VI 2 "vector_operand" "xBm,xm,vm")))]
   "TARGET_SSE"
 {
   static char buf[64];
@@ -11606,10 +11606,11 @@ (define_insn "*andnot3"
case E_V4DImode:
case E_V4SImode:
case E_V2DImode:
- ssesuffix = TARGET_AVX512VL ? "" : "";
+ ssesuffix = (TARGET_AVX512VL && which_alternative == 2
+  ? "" : "");
  break;
default:
- ssesuffix = TARGET_AVX512VL ? "q" : "";
+ ssesuffix = TARGET_AVX512VL && which_alternative == 2 ? "q" : "";
}
   break;
 
@@ -11635,6 +11636,7 @@ (define_insn "*andnot3"
   ops = "%s%s\t{%%2, %%0|%%0, %%2}";
   break;
 case 1:
+case 2:
   ops = "v%s%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
   break;
 default:
@@ -11644,7 +11646,7 @@ (define_insn "*andnot3"
   snprintf (buf, sizeof (buf), ops, tmp, ssesuffix);
   return buf;
 }
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx")
(set_attr "type" "sselog")
(set (attr "prefix_data16")
  (if_then_else
@@ -11652,7 +11654,7 @@ (define_insn "*andnot3"
(eq_attr "mode" "TI"))
(const_string "1")
(const_string "*")))
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,vex,evex")
(set (attr "mode")
(cond [(and (match_test " == 16")
(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
@@ -11697,10 +11699,10 @@ (define_expand "3"
 })
 
 (define_insn "3"
-  [(set (match_operand:VI48_AVX_AVX512F 0 "register_operand" "=x,v")
+  [(set (match_operand:VI48_AVX_AVX512F 0 "register_operand" "=x,x,v")
(any_logic:VI48_AVX_AVX512F
- (match_operand:VI48_AVX_AVX512F 1 "vector_operand" "%0,v")
- (match_operand:VI48_AVX_AVX512F 2 "vector_operand" "xBm,vm")))]
+ (match_operand:VI48_AVX_AVX512F 1 "vector_operand" "%0,x,v")
+ (match_operand:VI48_AVX_AVX512F 2 "vector_operand" "xBm,xm,vm")))]
   "TARGET_SSE && 
&& ix86_binary_operator_ok (, mode, operands)"
 {
@@ -11730,7 +11732,9 @@ (define_insn "
case E_V4DImode:
case E_V4SImode:
case E_V2DImode:
- ssesuffix = TARGET_AVX512VL ? "" : "";
+ ssesuffix = (TARGET_AVX512VL
+  && ( || which_alternative == 2)
+  ? "" : "");
  break;
default:
  gcc_unreachable ();
@@ -11759,6 +11763,7 @@ (define_insn "
 ops = "%s%s\t{%%2, %%0|%%0, %%2}";
   break;
 case 1:
+case 2:
   ops = "v%s%s\t{%%2, %%1, %%0|%%0, %%1, 
%%2}";
   break;
 default:
@@ -11768,7 +11773,7 @@ (define_insn "
   snprintf (buf, sizeof (buf), ops, tmp, ssesuffix);
   return buf;
 }
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx")
(set_attr "type" "sselog")
(set (attr "prefix_data16")
  (if_then_else
@@ -11776,7 +11781,7 @@ (define_insn "
(eq_attr "mode" "TI"))
(const_string "1")
(const_string "*")))
-   (set_attr "prefix" "")
+   (set_attr "prefix" ",evex")
(set (attr "mode")
(cond [(and (match_test " == 16")
(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
@@ -11795,10 +11800,10 @@ (define_insn "
  (const_string "")))])
 
 (define_insn "*3"
-  [(set (match_operan

Re: [PATCH, ARM] fix .cfi inconsistency out of builtin_eh_return

2017-10-04 Thread Ramana Radhakrishnan
On Thu, Aug 3, 2017 at 10:02 AM, Olivier Hainque  wrote:
> Hello,
>
> Activating dwarf2 based eh for real on VxWorks6 (patch to come) triggers a
> libgcc build failure, where most functions resorting to __builtin_eh_return
> fail this check from maybe_record_trace_start in dwarf2cfi.c:
>
>  /* We ought to have the same state incoming to a given trace no
>  matter how we arrive at the trace.  Anything else means we've
>  got some kind of optimization error.  */
>  #if CHECKING_P
> if (!cfi_row_equal_p (cur_row, ti->beg_row))
> ...
>
> The inconsistency is introduced by a store inserted in the middle of the insn
> stream by arm_set_return_address for __builtin_eh_return, marked FRAME_RELATED
> with:
>
>  /* The store needs to be marked as frame related in order to prevent
> DSE from deleting it as dead if it is based on fp.  */
>  rtx insn = emit_move_insn (gen_frame_mem (Pmode, addr), source);
>  RTX_FRAME_RELATED_P (insn) = 1;
>  add_reg_note (insn, REG_CFA_RESTORE, gen_rtx_REG (Pmode, LR_REGNUM));
>
> The FRAME_RELATED ness was setup to fixed DWARF2 unwinding at
> the time already, indeed broken by DSE removing the store - see
> https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01669.html.
>
> The attached patch is a proposal to fix this by setting MEM_VOLATILE_P on
> the store destination mem instead of setting RTX_FRAME_RELATED_P on the insn,
> which alleviates the .cfi complications and is as effective in preventing
> the store removal by DSE, from:
>
>   /* Assuming that there are sets in these insns, we cannot delete
>  them.  */
>   if ((GET_CODE (PATTERN (insn)) == CLOBBER)
>   || volatile_refs_p (PATTERN (insn))
>   || (!cfun->can_delete_dead_exceptions && !insn_nothrow_p (insn))
>   || (RTX_FRAME_RELATED_P (insn))
>   || find_reg_note (insn, REG_FRAME_RELATED_EXPR, NULL_RTX))
> insn_info->cannot_delete = true;
>
> For testing, I verified that
>
> - an arm-wrs-vxworks build with the extra patch to activate DWARF2
>   eh proceeds to completion on mainline,
>
> - the same symptoms were visible and cured on our in-house
>   gcc-7 based toolchain, and that
>
> - ACATS are clean for Ada in this configuration after the patch,
>   confirming proper behavior of the DWARF2 exception propagation
>   circuitry.
>
> OK to commit ?
>

This is OK . Sorry about the delay - I've been traveling a bit.

Thanks
Ramana

> Thanks much in advance for your feedback,
>
> With Kind Regards,
>
> Olivier
>


Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-04 Thread Christophe Lyon
On 3 October 2017 at 18:34, Wilco Dijkstra  wrote:
> r253236 broke AArch64 bootstrap. Earlier revision r253071 changed scheduling
> behaviour on AArch64 as autopref scheduling no longer checks the base.
>
> This patch fixes the bootstrap failure and cleans up autopref scheduling.
> The code is greatly simplified.  Sort accesses on the offset first, and
> only if the offsets are the same fall back to other comparisons in
> rank_for_schedule.  This doesn't at all restore the original behaviour
> since we no longer compare the base address, but it now defines a total
> sorting order.  More work will be required to improve the sorting so
> that only loads/stores with the same base are affected.
>
> AArch64 bootstrap completes. I'd like to commit this ASAP as the AArch64
> bootstrap has been broken for days now.
>
> ChangeLog:
> 2017-10-03  Wilco Dijkstra  
>
> PR rtl-optimization/82396
> * gcc/haifa-sched.c (autopref_multipass_init): Simplify
> initialization.
> (autopref_rank_data): Simplify sort order.
> * gcc/sched-int.h (autopref_multipass_data_): Remove
> multi_mem_insn_p, min_offset and max_offset.
> --
>

Hi,

FWIW, I confirm that this patch fixes the build failure I reported at:
https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01980.html

Thanks,

Christophe

> diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
> index 
> 549e8961411ecd0a04ac3b24ba78b5d53e63258a..77ba8c5292a0801bbbcb35ed61ab3040015f6677
>  100644
> --- a/gcc/haifa-sched.c
> +++ b/gcc/haifa-sched.c
> @@ -5568,9 +5568,7 @@ autopref_multipass_init (const rtx_insn *insn, int 
> write)
>
>gcc_assert (data->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED);
>data->base = NULL_RTX;
> -  data->min_offset = 0;
> -  data->max_offset = 0;
> -  data->multi_mem_insn_p = false;
> +  data->offset = 0;
>/* Set insn entry initialized, but not relevant for auto-prefetcher.  */
>data->status = AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
>
> @@ -5585,10 +5583,9 @@ autopref_multipass_init (const rtx_insn *insn, int 
> write)
>  {
>int n_elems = XVECLEN (pat, 0);
>
> -  int i = 0;
> -  rtx prev_base = NULL_RTX;
> -  int min_offset = 0;
> -  int max_offset = 0;
> +  int i, offset;
> +  rtx base, prev_base = NULL_RTX;
> +  int min_offset = INT_MAX;
>
>for (i = 0; i < n_elems; i++)
> {
> @@ -5596,38 +5593,23 @@ autopref_multipass_init (const rtx_insn *insn, int 
> write)
>   if (GET_CODE (set) != SET)
> return;
>
> - rtx base = NULL_RTX;
> - int offset = 0;
>   if (!analyze_set_insn_for_autopref (set, write, &base, &offset))
> return;
>
> - if (i == 0)
> -   {
> - prev_base = base;
> - min_offset = offset;
> - max_offset = offset;
> -   }
>   /* Ensure that all memory operations in the PARALLEL use the same
>  base register.  */
> - else if (REGNO (base) != REGNO (prev_base))
> + if (i > 0 && REGNO (base) != REGNO (prev_base))
> return;
> - else
> -   {
> - min_offset = MIN (min_offset, offset);
> - max_offset = MAX (max_offset, offset);
> -   }
> + prev_base = base;
> + min_offset = MIN (min_offset, offset);
> }
>
> -  /* If we reached here then we have a valid PARALLEL of multiple memory
> -ops with prev_base as the base and min_offset and max_offset
> -containing the offsets range.  */
> +  /* If we reached here then we have a valid PARALLEL of multiple memory 
> ops
> +with prev_base as the base and min_offset containing the offset.  */
>gcc_assert (prev_base);
>data->base = prev_base;
> -  data->min_offset = min_offset;
> -  data->max_offset = max_offset;
> -  data->multi_mem_insn_p = true;
> +  data->offset = min_offset;
>data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
> -
>return;
>  }
>
> @@ -5637,7 +5619,7 @@ autopref_multipass_init (const rtx_insn *insn, int 
> write)
>  return;
>
>if (!analyze_set_insn_for_autopref (set, write, &data->base,
> -  &data->min_offset))
> +  &data->offset))
>  return;
>
>/* This insn is relevant for the auto-prefetcher.
> @@ -5646,63 +5628,6 @@ autopref_multipass_init (const rtx_insn *insn, int 
> write)
>data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
>  }
>
> -
> -/* Helper for autopref_rank_for_schedule.  Given the data of two
> -   insns relevant to the auto-prefetcher modelling code DATA1 and DATA2
> -   return their comparison result.  Return 0 if there is no sensible
> -   ranking order for the two insns.  */
> -
> -static int
> -autopref_rank_data (autopref_multipass_data_t data1,
> -autopref_multipass_data_t data2)
> -{
> -  /* Simple case when both insns are simple single memory ops.

[C++ PATCH] Fix pretty-printing of auto returning functions (PR c++/82373)

2017-10-04 Thread Jakub Jelinek
Hi!

If a function has auto return value that is deducted to return function
pointer (or pointer array?), we pretty print it as e.g.
auto N::foo)(int)
The problem is that if show_return we use fndecl_declared_return_type (t)
for the type prefix, but TREE_TYPE (fntype) for the type suffix,
and if those types aren't the same, we can emit something that syntactically
makes no sense.

Fixed by making sure that type suffix is called on whatever we called type
prefix on.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-04  Jakub Jelinek  

PR c++/82373
* error.c (dump_function_decl): If show_return, call dump_type_suffix
on the same return type dump_type_prefix has been called on.

* g++.dg/cpp1y/pr82373.C: New test.

--- gcc/cp/error.c.jj   2017-09-01 09:26:24.0 +0200
+++ gcc/cp/error.c  2017-10-03 15:17:34.399489001 +0200
@@ -1574,6 +1574,7 @@ dump_function_decl (cxx_pretty_printer *
   int do_outer_scope = ! (flags & TFF_UNQUALIFIED_NAME);
   tree exceptions;
   bool constexpr_p;
+  tree ret = NULL_TREE;
 
   flags &= ~(TFF_UNQUALIFIED_NAME | TFF_TEMPLATE_NAME);
   if (TREE_CODE (t) == TEMPLATE_DECL)
@@ -1636,7 +1637,7 @@ dump_function_decl (cxx_pretty_printer *
   && !DECL_DESTRUCTOR_P (t) && !deduction_guide_p (t));
   if (show_return)
 {
-  tree ret = fndecl_declared_return_type (t);
+  ret = fndecl_declared_return_type (t);
   dump_type_prefix (pp, ret, flags);
 }
 
@@ -1677,7 +1678,7 @@ dump_function_decl (cxx_pretty_printer *
}
 
   if (show_return)
-   dump_type_suffix (pp, TREE_TYPE (fntype), flags);
+   dump_type_suffix (pp, ret, flags);
   else if (deduction_guide_p (t))
{
  pp_cxx_ws_string (pp, "->");
--- gcc/testsuite/g++.dg/cpp1y/pr82373.C.jj 2017-10-03 15:52:23.486162763 
+0200
+++ gcc/testsuite/g++.dg/cpp1y/pr82373.C2017-10-03 15:54:34.751576786 
+0200
@@ -0,0 +1,20 @@
+// PR c++/82373
+// { dg-do compile { target c++14 } }
+
+namespace N
+{
+  int (*fp)(int);
+  auto foo(int a)  // { dg-message "In function 'auto N::foo\\(int\\)'" "" 
{ target *-*-* } 0 }
+  {
+if (a)
+  return fp;
+return nullptr;// { dg-error "inconsistent deduction for auto return 
type: 'int \\(\\*\\)\\(int\\)' and then 'std::nullptr_t'" } */
+  }
+}
+int (*fp2)(int);
+auto bar(int a)// { dg-message "In function 'auto 
bar\\(int\\)'" "" { target *-*-* } 0 }
+{
+  if (a)
+return fp2;
+  return nullptr;  // { dg-error "inconsistent deduction for auto return 
type: 'int \\(\\*\\)\\(int\\)' and then 'std::nullptr_t'" } */
+}

Jakub


[committed] Propagate attributes, including optimization and target node, to OMP outlined regions (PR tree-optimization/82374)

2017-10-04 Thread Jakub Jelinek
Hi!

This patch propagates attributes, including opt and target nodes, from
the containing function to the OMP/OACC outlined region functions.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2017-10-04  Jakub Jelinek  

PR tree-optimization/82374
* omp-low.c (create_omp_child_function): Copy DECL_ATTRIBUTES,
DECL_FUNCTION_SPECIFIC_OPTIMIZATION,
DECL_FUNCTION_SPECIFIC_TARGET and DECL_FUNCTION_VERSIONED from
current_function_decl to the new decl.

* gcc.dg/gomp/pr82374.c: New test.

--- gcc/omp-low.c.jj2017-09-05 23:32:02.0 +0200
+++ gcc/omp-low.c   2017-10-03 12:25:13.956261522 +0200
@@ -1626,6 +1626,14 @@ create_omp_child_function (omp_context *
   DECL_CONTEXT (decl) = NULL_TREE;
   DECL_INITIAL (decl) = make_node (BLOCK);
   BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
+  DECL_ATTRIBUTES (decl) = DECL_ATTRIBUTES (current_function_decl);
+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
+= DECL_FUNCTION_SPECIFIC_OPTIMIZATION (current_function_decl);
+  DECL_FUNCTION_SPECIFIC_TARGET (decl)
+= DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl);
+  DECL_FUNCTION_VERSIONED (decl)
+= DECL_FUNCTION_VERSIONED (current_function_decl);
+
   if (omp_maybe_offloaded_ctx (ctx))
 {
   cgraph_node::get_create (decl)->offloadable = 1;
--- gcc/testsuite/gcc.dg/gomp/pr82374.c.jj  2017-10-03 13:09:15.515879118 
+0200
+++ gcc/testsuite/gcc.dg/gomp/pr82374.c 2017-10-03 13:08:56.0 +0200
@@ -0,0 +1,31 @@
+/* PR tree-optimization/82374 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-vectorize -fdump-tree-vect-details" } */
+/* { dg-additional-options "-mavx -mno-avx2" { target i?86-*-* x86_64-*-* } } 
*/
+/* { dg-additional-options "-mvsx" { target powerpc_vsx_ok } } */
+
+#define SIZE (1024 * 1024 * 1024)
+
+float a[SIZE];
+float b[SIZE];
+float c[SIZE];
+float d[SIZE];
+
+__attribute__((optimize ("O2", "tree-vectorize"))) void
+foo (void)
+{
+  int i;
+#pragma omp parallel for
+  for (i = 0; i < SIZE; i++)
+c[i] = a[i] + b[i];
+}
+
+__attribute__((optimize ("O2", "tree-vectorize"))) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < SIZE; i++)
+d[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { 
{ i?86-*-* x86_64-*-* } || { powerpc_vsx_ok } } } } } */

Jakub


[PING#3, Makefile] improve libsubdir variable transmission to sub-makes on Windows

2017-10-04 Thread Olivier Hainque
Hello,

Ping #3 for https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00017.html

please.

Took the liberty to cc a maintainer.

Thanks much in advance!

With Kind Regards,

Olivier

> On Sep 25, 2017, at 08:49 , Olivier Hainque  wrote:
> 
> Hello,
> 
> Ping #2 for https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00017.html
> 
>> Makefile.in:
>> ...
>> export libsubdir
>> 
>> This is not working well on cygwin environments where environment
>> variable names are translated to uppercase (so sub-makes evaluating
>> the variable with the lowercase name don't get the value).
> 
>> 2017-09-01  Jerome Lambourg  
>> 
>>  * Makefile.in (FLAGS_TO_PASS): Add libsubdir.
> 
> Thanks much in advance!
> 
> With Kind Regards,
> 
> Olivier
> 



[PING#2, ARM] fix .cfi inconsistency out of builtin_eh_return

2017-10-04 Thread Olivier Hainque
Hello,

Ping # 2 for https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00271.html

   * config/arm/arm.c (arm_set_return_address): Use MEM_VOLATILE_P
   on the target mem instead of RTX_FRAME_RELATED_P on the insn to
   prevent DSE.
   (thumb_set_return_address): Likewise.

This one fixes ICEs during the build of libgcc with dwarf2 unwind info
activated. The explanation message is longish but the patch is short.

Thanks in advance,

With Kind Regards,

Olivier



Re: [PING**3, Makefile] improve default cpu selection for e500v2

2017-10-04 Thread Olivier Hainque
Hello,

Ping # 3  for https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01783.html
please.

Took the liberty to cc a maintainer.

Thanks much in advance!

With Kind Regards,

Olivier

> On Sep 25, 2017, at 08:51 , Olivier Hainque  wrote:
> 
> Hello,
> 
> Ping #2 for https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01783.html
> 
>> config.gcc already has a provision for a good default
>> cpu selection for SPE with double precision floats
>> when the latter is explicitly requested with an explicit
>> --enable command line option.
> ...
>> The attached patch is a proposal to refine this to select 8548 also
>> when we were configured for an e500v2 target cpu (canonicalized to
>> match powerpc-*spe), regardless of enable_e500_double.
> 
>> 2017-08-31  Olivier Hainque  
>> 
>>  * gcc/config.gcc (powerpc*-*-*spe*): Pick 8548 as the
>>  default with_cpu for an e500v2 target cpu name, in addition
>>  to --enable-e500-double.
> 
> 
> Thanks much in advance!
> 
> With Kind Regards,
> 
> Olivier
> 



[PATCH][GRAPHITE] Adjust CASE_CONVERT in extract_affine

2017-10-04 Thread Richard Biener

While my last change involving signed types was correct it wasn't optimal.
We can avoid the modulo constraints if the conversion is widening
(thus all values fit in the new type).

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok?

Thanks,
Richard.

2017-10-04  Richard Biener  

* graphite-sese-to-poly.c (extract_affine): For casts increasing
precision do not perform modulo reduction.

Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 253336)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -299,11 +299,18 @@ extract_affine (scop_p s, tree e, __isl_
   return res;
 
 CASE_CONVERT:
-  res = extract_affine (s, TREE_OPERAND (e, 0), space);
-  /* signed values, even if overflow is undefined, get modulo-reduced.  */
-  if (! TYPE_UNSIGNED (type))
-   res = wrap (res, TYPE_PRECISION (type) - 1);
-  break;
+  {
+   tree itype = TREE_TYPE (TREE_OPERAND (e, 0));
+   res = extract_affine (s, TREE_OPERAND (e, 0), space);
+   /* Signed values, even if overflow is undefined, get modulo-reduced.
+  But only if not all values of the old type fit in the new.  */
+   if (! TYPE_UNSIGNED (type)
+   && ((TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (e, 0)))
+&& TYPE_PRECISION (type) <= TYPE_PRECISION (itype))
+   || TYPE_PRECISION (type) < TYPE_PRECISION (itype)))
+ res = wrap (res, TYPE_PRECISION (type) - 1);
+   break;
+  }
 
 case NON_LVALUE_EXPR:
   res = extract_affine (s, TREE_OPERAND (e, 0), space);



Re: [PATCH][GRAPHITE] Test for code generation errors

2017-10-04 Thread Richard Biener
uOn Tue, 3 Oct 2017, Rainer Orth wrote:

> Hi Richard,
> 
> > What ISL Versions are affected? 
> 
> it's 0.18.
> 
> >>Besides, there's
> >>
> >>+UNRESOLVED: gfortran.dg/graphite/pr42393-1.f90   -O  
> >>scan-tree-dump-times graphite "code generation error" 1
> >>
> >>for both 32 and 64-bit.  The log indicates
> >>
> >>gfortran.dg/graphite/pr42393-1.f90   -O  : dump file does not exist
> >>
> >>and indeed the test lacks -fdump-tree-graphite-details contrary to
> >>ChangeLog.
> >
> > Oops, I must have missed that one. Will fix tomorrow. 
> 
> Fine, thanks.

Fixed as follows, tested on x86_64-unknown-linux-gnu.

Richard.

2017-10-04  Richard Biener  

* gfortran.dg/graphite/id-17.f: For ilp32 allow graphite codegen
errors and scan for one.
* gfortran.dg/graphite/id-19.f: Likewise.
* gfortran.dg/graphite/pr29832.f90: Likewise.
* gfortran.dg/graphite/pr42326-1.f90: Likewise.
* gfortran.dg/graphite/pr42326.f90: Likewise.
* gfortran.dg/graphite/pr68550-2.f90: Likewise.
* gfortran.dg/graphite/run-id-2.f90: Likewise.
* gfortran.dg/graphite/run-id-3.f90: Likewise.
* gfortran.dg/graphite/pr42393-1.f90: Dump graphite.


Index: gcc/testsuite/gfortran.dg/graphite/id-17.f
===
--- gcc/testsuite/gfortran.dg/graphite/id-17.f  (revision 253393)
+++ gcc/testsuite/gfortran.dg/graphite/id-17.f  (working copy)
@@ -1,3 +1,4 @@
+! { dg-additional-options "-fdump-tree-graphite-details --param 
graphite-allow-codegen-errors=1" { target ilp32 } }
   SUBROUTINE SPECTOP(Dr,N)
   DIMENSION d1(0:32,0:32) , Dr(0:32,0:32) , x(0:32)
   DO k = 0 , N
@@ -14,3 +15,4 @@
  ENDDO
   ENDDO
   END
+! { dg-final { scan-tree-dump-times "code generation error" 1 " graphite" { 
target ilp32 } } }
Index: gcc/testsuite/gfortran.dg/graphite/id-19.f
===
--- gcc/testsuite/gfortran.dg/graphite/id-19.f  (revision 253393)
+++ gcc/testsuite/gfortran.dg/graphite/id-19.f  (working copy)
@@ -1,3 +1,4 @@
+! { dg-additional-options "-fdump-tree-graphite-details --param 
graphite-allow-codegen-errors=1" { target ilp32 } }
   SUBROUTINE ECCODR(FPQR)
   DIMENSION FPQR(25,25,25)
   INTEGER P,Q,R
@@ -13,3 +14,4 @@
   140QM2= QM2+TWO
   150 PM2= PM2+TWO
   END
+! { dg-final { scan-tree-dump-times "code generation error" 1 " graphite" { 
target ilp32 } } }
Index: gcc/testsuite/gfortran.dg/graphite/pr29832.f90
===
--- gcc/testsuite/gfortran.dg/graphite/pr29832.f90  (revision 253393)
+++ gcc/testsuite/gfortran.dg/graphite/pr29832.f90  (working copy)
@@ -1,5 +1,6 @@
 ! { dg-do run }
 ! { dg-options "-O2 -ftree-loop-linear" }
+! { dg-additional-options "-fdump-tree-graphite-details --param 
graphite-allow-codegen-errors=1" { target ilp32 } }
 
 ! Program to test the scalarizer
 program testarray
@@ -24,3 +25,4 @@ program testarray
end do
 end program
 
+! { dg-final { scan-tree-dump-times "code generation error" 1 " graphite" { 
target ilp32 } } }
Index: gcc/testsuite/gfortran.dg/graphite/pr42326-1.f90
===
--- gcc/testsuite/gfortran.dg/graphite/pr42326-1.f90(revision 253393)
+++ gcc/testsuite/gfortran.dg/graphite/pr42326-1.f90(working copy)
@@ -1,7 +1,7 @@
 ! { dg-do compile { target i?86-*-* x86_64-*-* } }
 ! { dg-require-effective-target ilp32 }
 ! { dg-require-effective-target sse2 }
-! { dg-options "-O2 -floop-parallelize-all -fprefetch-loop-arrays -msse2" }
+! { dg-options "-O2 -floop-parallelize-all -fprefetch-loop-arrays -msse2 
-fdump-tree-graphite-details --param graphite-allow-codegen-errors=1" }
 
 subroutine phasad(t,i,ium)
   implicit none
@@ -17,3 +17,4 @@ subroutine phasad(t,i,ium)
   return
 end subroutine phasad
 
+! { dg-final { scan-tree-dump-times "code generation error" 1 " graphite" } }
Index: gcc/testsuite/gfortran.dg/graphite/pr42326.f90
===
--- gcc/testsuite/gfortran.dg/graphite/pr42326.f90  (revision 253393)
+++ gcc/testsuite/gfortran.dg/graphite/pr42326.f90  (working copy)
@@ -1,7 +1,7 @@
 ! { dg-do compile { target i?86-*-* x86_64-*-* } }
 ! { dg-require-effective-target ilp32 }
 ! { dg-require-effective-target sse2 }
-! { dg-options "-O2 -floop-strip-mine -fprefetch-loop-arrays -msse2" }
+! { dg-options "-O2 -floop-strip-mine -fprefetch-loop-arrays -msse2 
-fdump-tree-graphite-details --param graphite-allow-codegen-errors=1" }
 
 subroutine blts ( ldmx, ldmy, v, tmp1, i, j, k)
   implicit none
@@ -34,3 +34,4 @@ subroutine phasad(t,i,ium)
   return
 end subroutine phasad
 
+! { dg-final { scan-tree-dump-times "code generation error" 1 " graphite" } }
Index: gcc/testsuite/gfortran.dg/graphite/pr42393-1.f90
===
--- gcc/te