date:20170131

Re: [PATCH] Fix gcc-5-branch build with libc++

2017-01-31 Thread Dimitry Andric

On 01 Feb 2017, at 01:23, David Edelsohn  wrote:
...
>> In trunk r235362, most gcc C++ sources were updated to define
>> INCLUDE_xxx macros before including gcc/system.h, which fixes the
>> incompatibility with libc++.  However, this revision is most likely too
>> disruptive to backport to the gcc-5-branch.
> 
> As discussed in the thread to define INCLUDE_xxx macros, no headers
> may be included before GCC system.h because it breaks the GCC build on
> some systems.

Ah, sorry about that.  I was completely unaware about this thread.


> Anyone who suggested to the FreeBSD community that it would be
> acceptable to include C++ headers early was severely mistaken.  Sorry.

There was no suggestion, I'm the only one to blame. :)  That said, for
the FreeBSD port this patch doesn't seem to cause any further trouble.
So Gerald has accepted it as a stopgap measure for fixing the port.

-Dimitry



signature.asc
Description: Message signed with OpenPGP using GPGMail

Re: [PATCH] Actually fix libhsail-rt build on x86_64/i?86 32-bit (take 2)

2017-01-31 Thread Pekka Jääskeläinen

Hi Jakub,

Thanks for the fix. I've tested this patch with the HSA PRM suite and
no bri{ck,g}ing,
so please go ahead and commit.

BR,
Pekka



On Mon, Jan 30, 2017 at 10:17 PM, Jakub Jelinek  wrote:
> On Mon, Jan 30, 2017 at 08:53:18PM +0100, Bernhard Reutner-Fischer wrote:
>> As this will likely bri{ck,g} horribly I'll leave these to Martin and Pekka.
>> Thanks for fixing but note that you attached the wrong patch below.
>
> Oops, here is the right one (bootstrapped/regtested on x86_64-linux and
> i686-linux again):
>
> 2017-01-30  Jakub Jelinek  
>
> * configure.tgt: Fix i?86-*-linux* entry.
> * rt/sat_arithmetic.c (__hsail_sat_add_u32, __hsail_sat_add_u64,
> __hsail_sat_add_s32, __hsail_sat_add_s64): Use __builtin_add_overflow.
> (__hsail_sat_sub_u8, __hsail_sat_sub_u16): Remove pointless for 
> overflow
> over maximum.
> (__hsail_sat_sub_u32, __hsail_sat_sub_u64, __hsail_sat_sub_s32,
> __hsail_sat_sub_s64): Use __builtin_sub_overflow.
> (__hsail_sat_mul_u32, __hsail_sat_mul_u64, __hsail_sat_mul_s32,
> __hsail_sat_mul_s64): Use __builtin_mul_overflow.
> * rt/arithmetic.c (__hsail_borrow_u32, __hsail_borrow_u64): Use
> __builtin_sub_overflow_p.
> (__hsail_carry_u32, __hsail_carry_u64): Use __builtin_add_overflow_p.
> * rt/misc.c (__hsail_groupbaseptr, __hsail_kernargbaseptr_u64):
> Cast pointers to uintptr_t first before casting to some other integral
> type.
> * rt/segment.c (__hsail_segmentp_private, __hsail_segmentp_group): 
> Likewise.
> * rt/queue.c (__hsail_ldqueuereadindex, __hsail_ldqueuewriteindex,
> __hsail_addqueuewriteindex, __hsail_casqueuewriteindex,
> __hsail_stqueuereadindex, __hsail_stqueuewriteindex): Cast integral 
> value
> to uintptr_t first before casting to pointer.
> * rt/workitems.c (__hsail_alloca_pop_frame): Cast memcpy first 
> argument to
> void * to avoid warning.
>
> --- libhsail-rt/configure.tgt.jj2017-01-30 09:31:51.0 +0100
> +++ libhsail-rt/configure.tgt   2017-01-30 09:35:04.402926654 +0100
> @@ -28,9 +28,7 @@
>  # broken systems. Currently it has been tested only on x86_64 Linux
>  # of the upstream gcc targets. More targets shall be added after testing.
>  case "${target}" in
> -  i[[3456789]]86-*linux*)
> -;;
> -  x86_64-*-linux*)
> +  x86_64-*-linux* | i?86-*-linux*)
>  ;;
>  *)
>  UNSUPPORTED=1
> --- libhsail-rt/rt/sat_arithmetic.c.jj  2017-01-26 09:17:46.0 +0100
> +++ libhsail-rt/rt/sat_arithmetic.c 2017-01-30 10:27:27.861325330 +0100
> @@ -49,21 +49,19 @@ __hsail_sat_add_u16 (uint16_t a, uint16_
>  uint32_t
>  __hsail_sat_add_u32 (uint32_t a, uint32_t b)
>  {
> -  uint64_t c = (uint64_t) a + (uint64_t) b;
> -  if (c > UINT32_MAX)
> +  uint32_t c;
> +  if (__builtin_add_overflow (a, b, ))
>  return UINT32_MAX;
> -  else
> -return c;
> +  return c;
>  }
>
>  uint64_t
>  __hsail_sat_add_u64 (uint64_t a, uint64_t b)
>  {
> -  __uint128_t c = (__uint128_t) a + (__uint128_t) b;
> -  if (c > UINT64_MAX)
> +  uint64_t c;
> +  if (__builtin_add_overflow (a, b, ))
>  return UINT64_MAX;
> -  else
> -return c;
> +  return c;
>  }
>
>  int8_t
> @@ -93,25 +91,19 @@ __hsail_sat_add_s16 (int16_t a, int16_t
>  int32_t
>  __hsail_sat_add_s32 (int32_t a, int32_t b)
>  {
> -  int64_t c = (int64_t) a + (int64_t) b;
> -  if (c > INT32_MAX)
> -return INT32_MAX;
> -  else if (c < INT32_MIN)
> -return INT32_MIN;
> -  else
> -return c;
> +  int32_t c;
> +  if (__builtin_add_overflow (a, b, ))
> +return b < 0 ? INT32_MIN : INT32_MAX;
> +  return c;
>  }
>
>  int64_t
>  __hsail_sat_add_s64 (int64_t a, int64_t b)
>  {
> -  __int128_t c = (__int128_t) a + (__int128_t) b;
> -  if (c > INT64_MAX)
> -return INT64_MAX;
> -  else if (c < INT64_MIN)
> -return INT64_MIN;
> -  else
> -return c;
> +  int64_t c;
> +  if (__builtin_add_overflow (a, b, ))
> +return b < 0 ? INT64_MIN : INT64_MAX;
> +  return c;
>  }
>
>  uint8_t
> @@ -120,8 +112,6 @@ __hsail_sat_sub_u8 (uint8_t a, uint8_t b
>int16_t c = (uint16_t) a - (uint16_t) b;
>if (c < 0)
>  return 0;
> -  else if (c > UINT8_MAX)
> -return UINT8_MAX;
>else
>  return c;
>  }
> @@ -132,8 +122,6 @@ __hsail_sat_sub_u16 (uint16_t a, uint16_
>int32_t c = (uint32_t) a - (uint32_t) b;
>if (c < 0)
>  return 0;
> -  else if (c > UINT16_MAX)
> -return UINT16_MAX;
>else
>  return c;
>  }
> @@ -141,25 +129,19 @@ __hsail_sat_sub_u16 (uint16_t a, uint16_
>  uint32_t
>  __hsail_sat_sub_u32 (uint32_t a, uint32_t b)
>  {
> -  int64_t c = (uint64_t) a - (uint64_t) b;
> -  if (c < 0)
> +  uint32_t c;
> +  if (__builtin_sub_overflow (a, b, ))
>  return 0;
> -  else if (c > UINT32_MAX)
> -return UINT32_MAX;
> -  else
> -return c;
> +  return c;
>  }
>
>  uint64_t
>  __hsail_sat_sub_u64 (uint64_t a, uint64_t b)
>

Re: [PATCH] Fix gcc-5-branch build with libc++

2017-01-31 Thread David Edelsohn

> As discussed with the FreeBSD gcc ports maintainer, building the
> gcc-5-branch with libc++ requires standard C++ headers to be included
> *before* gcc/system.h, otherwise the redefinition of abort() to
> fancy_abort() will cause trouble.
>
> In trunk r235362, most gcc C++ sources were updated to define
> INCLUDE_xxx macros before including gcc/system.h, which fixes the
> incompatibility with libc++.  However, this revision is most likely too
> disruptive to backport to the gcc-5-branch.

As discussed in the thread to define INCLUDE_xxx macros, no headers
may be included before GCC system.h because it breaks the GCC build on
some systems.  The inclusion of C++ headers prior to system.h pulls in
system headers that define global macros in conflict with system.h and
system headers included later in the sequence.

The developers who slipped in C++ headers did so in error, and it has
been corrected.

Anyone who suggested to the FreeBSD community that it would be
acceptable to include C++ headers early was severely mistaken.  Sorry.

Thanks, David

Re: [committed] move constant to the right of relational operators (Re: [PATCH 4/5] distinguish likely and unlikely results (PR 78703))

2017-01-31 Thread Martin Sebor


On 01/31/2017 01:49 PM, Bernhard Reutner-Fischer wrote:

On 31 January 2017 00:19:46 CET, Martin Sebor  wrote:

So I see the introduction of many

if (const OP object) expressions

Can you please fix those as an independent patch after #4 and #5 are
installed on the trunk?  Consider that patch pre-approved, but please
post it here for the historical record.

I think a regexp of paren followed by a constant would probably take

you

to them pretty quickly.


I committed the attached patch in r245040.


The majority of these are not equivalent and I'm curious why they aren't?


All the changes in that revision should be equivalent to the original
code and thus no-ops.  If you spotted some that aren't that wouldn't
be good (and should cause test failures).  If you see some I messed
up please point them out.

Thanks
Martin

Re: [wwwdocs] changes.html - document new warning options

2017-01-31 Thread Martin Sebor


On 01/31/2017 02:16 PM, Gerald Pfeifer wrote:

Wow, that's quite a patch.  And quite some contributions behind that! :-)

Let me offer some comments, and then I suggest you commit what you
have (taking these comments into account), and we can still improve
things then if there is further feedback.


Sounds good.



Index: gcc-7/changes.html
===
+  GCC 7 can determine the return value or range of return values of
+some calls to the sprintf family of functions and make
+it available to other optimization passes.  Some calls to the 
+  snprintf function with a zero size argument can be folded

Formatting here appears a little odd?  I wouldn't have that line break
afer .


Sure.



+into constants.  The optimization is included in -O1
+and can be selectively controlled by the

"This optimization"


Okay.



+GCC 7 contains a number of enhancements that help detect buffer overflow
+  and other forms of invalid memory accesses.

"buffer overflows" (plural) ?


Both are common in literature.  Google returns about 5,220 hits for
the string "detect buffer overflow and" and about 2,980 for the
singular, so I kept it as is.



+   errors.  Such bugs have been known to be the source of
+   vulnerabilities and a target of exploits.

Perhaps say "security vulnerabilities"?  Not sure about that.


Sure.


+  -Walloc-size-larger-than=PTRDIFF_MAX is included
+  in -Wall.

PTRDIFF_MAX without ... I would say, since it's not a variable.


Okay.



+  For example, the following call to malloc incorrectly tries

malloc


Done.



+void* f (int n)
+{
+  return malloc (n > 0 ? 0 : n);
+}

Great example! :-)

+The -Walloc-zero option detects calls to standard
+   and user-defined memory allocation functions decorated with attribute
+   alloc_size with a zero argument.  -Walloc-zero
+   is not included in either -Wall or -Wextra
+   and must be explicitly enabled.

Why are you adding  within ?  This should not be necessary.


Only for consistency with the other list items where it separates
paragraphs (it made it easier for me to spot missing tags and such).
I removed from where it was redundant.



+  sprintf is diagnosed because even though its
   ^^^

Why ... here?


It was a copy and paste typo.  I removed the tags.



+  output has been constrained using the modulo operation it could
+  result in as many as three bytes if mday were negative.
+  The solution is to either allocate a larger buffer or make sure
+  the argument is not negative, for example by changing mday's
+  type to unsigned or by making the type of the second operand
+  of the modulo expression to unsigned: 100U.

"changing the type"



Right, thanks.


+   -Wformat-overflow=1 is included in

No ... around 1, since it's a constant (not a variable).


Done.



+   nonnull). By taking advantage of optimization the option
+   can detect many more cases of the problem than in prior GCC
+   versions.

"optimizations"  (Or "compiler optimizations", to avoid ambiguity
whether the option was optimized or is now leveraging compiler
optimizations?)

+The -Wstringop-overflow=type option detects
+   buffer overflow in calls to string manipulation functions like

"overflows"

+   memcpy and strcpy. The option relies

Is memcpy really a string manipulation function?


The C standard calls all the functions in  (including memcpy)
string handling functions.  I'm not sure where I got "manipulation"
from.  I changed it to string handling though I'm not sure it's a big
improvement given that memcpy arguments need not be strings.



+   on Object Size Checking and has an effect similar to defining
+   the _FORTIFY_SOURCE macro.

Naive question: What is "Object Size Checking", and where is it
introduced or desribed?


It's described in the GCC manual.  I added a link to it.



+   -Wstringop-overflow=1 is enabled by default.

No s around constants.

+  For example, in the following snippet, because the call to
+   strncat specifies a maximum that allows the function to
+   write past the end of the destination, it is diagnosed.  To correct
+   the problem and avoid the overflow the function should be called
+   with a size of at most sizeof d - strlen(d).

I'm getting tired this evening, but is this taking care of the \0
being added?  Or would that require sizeof d - strlen(d) - 1?


It sure would!

I've committed the patch with the changes above (plus another example
to make Aldy extra happy).  Please let me know if you spot something
else that needs adjusting.

Thanks for the careful review (and debugging)!

Martin

GCC patch committed: fix unwind info for 32-bit x86 split stack

2017-01-31 Thread Ian Lance Taylor

With a new libgo patch (waiting to be committed until we are out of
stage 4) I noticed that the runtime/pprof tests were failing about 2%
of the time on 32-bit x86.  The problem turned out to be testing
profiling with a very call-heavy benchmark.  The 32-bit x86 split
stack sequence pushes two numbers on the stack as arguments to the
__morestack function.  GCC was not generating unwind information for
those push instructions.  When the SIGPROF signal was delivered while
executing the second push instruction or the call instruction, the
unwinder would be unable to locate the PC and would crash.

Fortunately this is easy to fix as the instructions always run at the
very beginning of the function, before any stack manipulation has
occurred.  I have committed this patch to fix the problem.
Bootstrapped on x86_64-pc-linux-gnu.  Ran the Go testsuite and all the
split-stack tests in both 32-bit and 64-bit mode.

Ian

2017-01-31  Ian Lance Taylor  

* config/i386/i386.c (ix86_expand_split_stack_prologue): Add
REG_ARGS_SIZE note to 32-bit push insns and call insn.
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 244456)
+++ gcc/config/i386/i386.c  (working copy)
@@ -14944,6 +14944,7 @@ ix86_expand_split_stack_prologue (void)
   allocate_rtx = GEN_INT (allocate);
   args_size = crtl->args.size >= 0 ? crtl->args.size : 0;
   call_fusage = NULL_RTX;
+  rtx pop = NULL_RTX;
   if (TARGET_64BIT)
 {
   rtx reg10, reg11;
@@ -15021,13 +15022,18 @@ ix86_expand_split_stack_prologue (void)
 }
   else
 {
-  emit_insn (gen_push (GEN_INT (args_size)));
-  emit_insn (gen_push (allocate_rtx));
+  rtx_insn *insn = emit_insn (gen_push (GEN_INT (args_size)));
+  add_reg_note (insn, REG_ARGS_SIZE, GEN_INT (UNITS_PER_WORD));
+  insn = emit_insn (gen_push (allocate_rtx));
+  add_reg_note (insn, REG_ARGS_SIZE, GEN_INT (2 * UNITS_PER_WORD));
+  pop = GEN_INT (2 * UNITS_PER_WORD);
 }
   call_insn = ix86_expand_call (NULL_RTX, gen_rtx_MEM (QImode, fn),
GEN_INT (UNITS_PER_WORD), constm1_rtx,
-   NULL_RTX, false);
+   pop, false);
   add_function_usage_to (call_insn, call_fusage);
+  if (!TARGET_64BIT)
+add_reg_note (call_insn, REG_ARGS_SIZE, GEN_INT (0));
 
   /* In order to make call/return prediction work right, we now need
  to execute a return instruction.  See

Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-01-31 Thread Jeff Law


On 01/31/2017 04:22 PM, Jakub Jelinek wrote:

On Tue, Jan 31, 2017 at 03:52:10PM -0700, Jeff Law wrote:

Which makes your patch safe -- but introduces a non-obvious dependency
between useless_type_conversion_p and your definition of
INTEGRAL_BOOLEAN_TYPE and how it's used in the vectorizer.


The predicate is simply true for all BOOLEAN_TYPEs and all types that are
compatible with it in the middle-end.  BOOLEAN_TYPEs with different
precisions are not considered compatible types, therefore they won't appear
together without explicit casts in between.
I understand that.  My objection is that there's a highly non-obvious 
dependency between useless_type_conversion, INTEGRAL_TYPE_P (in 
particular how its used in the tree-vect-patterns).


If someone came along and changed useless_type_conversion, they'd have 
no way to know they need to go fix up INTEGRAL_TYPE_P and/or the 
vectorizer at the same time.


Thus my suggestion that you explicitly check the precision and rename 
the macro.  THough I don't offhand have a better suggestion.


Jeff

Re: [PATCH] Fix PR79278, amend ADJUST_FIELD_ALIGN interface

2017-01-31 Thread Jeff Law


On 01/31/2017 02:01 AM, Richard Biener wrote:


This amends ADJUST_FIELD_ALIGN to always get the type of the field
as argument but make the field itself optional.  All actual target
macro implementations only look at the type of the field but FRV
(which seems to misuse ADJUST_FIELD_ALIGN to do bitfield layout
rather than using one of the three standard ways - Alex/Nick?).

Didn't we deprecate FRV?  Oh, that was MEP..  Nevermind.




This speeds up min_align_of_type (no longer needs to build a FIELD_DECL)
and thus (IMHO) makes it usable from get_object_alignment.  This
causes us no longer to return bogus answers for indirect accesses to
doubles on i?86 and expand to RTL with proper MEM_ALIGN.  (it also
makes the previous fix for PR79256 no longer necessary)

Bootstrap and regtest running on x86_64-unknown-linux-gnu - is this ok
for trunk at this stage?

grep found a ADJUST_FIELD_ALIGN use in libobjc/encoding.c but that
is fed a C string(!?) as FIELD_DECL so I discounted it as unrelated
(and grep didn't find a way this macro could be defined there)?
Presumably this is the code that takes the structure and encodes 
information about it for the runtime.  Though taking a string sounds 
horribly broken.


I suspect it gets included via tm.h.

I bet if someone built a cross far enough to build libobjc we could see 
it in action.  It does make one wonder how this part of libobjc could 
possibly be working on targets that define ADJUST_FIELD_ALIGN.


I'll note it's been that way since libobjc was moved into its own 
directory, but wasn't like that prior to moving into its own directory.


I have no idea what to do here...


Jeff

Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-01-31 Thread Jakub Jelinek

On Tue, Jan 31, 2017 at 03:52:10PM -0700, Jeff Law wrote:
> Which makes your patch safe -- but introduces a non-obvious dependency
> between useless_type_conversion_p and your definition of
> INTEGRAL_BOOLEAN_TYPE and how it's used in the vectorizer.

The predicate is simply true for all BOOLEAN_TYPEs and all types that are
compatible with it in the middle-end.  BOOLEAN_TYPEs with different
precisions are not considered compatible types, therefore they won't appear
together without explicit casts in between.

> I think if you checked for a TYPE_PRECISION of 1 in INTEGRAL_BOOLEAN_TYPE,
> that would help -- but leaves INTEGRAL_BOOLEAN_TYPE poorly named.

I used BOOLEAN_TYPE_P initially, INTEGRAL_BOOLEAN_TYPE_P has been Richard's
suggestion on IRC.  I'm fine with other names as long as it is not too long.

Jakub

Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-01-31 Thread Joseph Myers

On Tue, 31 Jan 2017, Jeff Law wrote:

> My general inclination is to ask this to wait for gcc-8 as it is not a
> regression, but instead a false positive in a new warning.

I'd hope it would be possible for current releases of GCC and glibc to 
build with each other.  As this seems to be a case where the warning is 
clearly bogus (and started appearing during GCC stage 4), a fix in GCC 
seems more appropriate than disabling the warning for that code in glibc.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-01-31 Thread Jeff Law


On 01/31/2017 03:42 PM, Jakub Jelinek wrote:

On Tue, Jan 31, 2017 at 03:33:04PM -0700, Jeff Law wrote:

My general inclination is to ask this to wait for gcc-8 as it is not a
regression, but instead a false positive in a new warning.


Well, as the warning is enabled by -Wall, the false positives in it are
regressions (while false negatives are not, as it is a new warning).

I wouldn't necessarily call that a regression though. But I see your point.




However, if we see this triggering with any significant frequency, then we
should reevaluate.  Getting Marek's build logs and grepping through them
might guide us a bit on this...


I'm not sure what the rationale is for length of zero at level 1 and length
of one at higher levels for unknown strings.  I guess I can kindof see the
former, though I suspect if we looked at the actual strings, zero length
would actually be uncommon.

For level 1 and above assuming a single character seems way too tolerant.
We're issuing a "may" warning, so ISTM we ought to be assuming a much larger
length here.  I realize that makes a lot more noise for the warning, but
doesn't that better reflect what may happen?


If the compiler doesn't know anything useful about the string lengths
(and that is the case when the value range is VARYING or just extremely
large), then using the maximum bounds is really not useful at all.

We would then have to complain about every "%s %s" because in theory that
would not fit into address space and similar.
I'm well aware of that.  But ISTM the choice of "1" is so lenient that 
it makes tracking in those cases largely pointless -- at which point I 
have to ask why bother tracking at all when we've got an unconstrained 
string like that.


jeff

Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-01-31 Thread Jeff Law


On 01/31/2017 03:46 PM, Jakub Jelinek wrote:

On Tue, Jan 31, 2017 at 03:41:27PM -0700, Jeff Law wrote:

useless_type_conversion_p says that precision 1 unsigned BOOLEAN_TYPE
conversion to/from precision 1 unsigned integral type is useless,
but apparently the vectorizer heavily relies on the BOOLEAN_TYPE vs.
unsigned integral:1 distinction (uses VECTOR_BOOLEAN_TYPE_P types
only for real BOOLEAN_TYPE).  As the conversion is useless, we can end up
as in the following testcase with e.g. binary operations which have boolean
on one side and unsigned int:1 on another.

The following patch replaces all explicit BOOLEAN_TYPE checks in the
vectorizer with a new macro that handles unsigned integral:1 the same
as BOOLEAN_TYPE.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-01-31  Jakub Jelinek  

PR tree-optimization/79284
* tree.h (INTEGRAL_BOOLEAN_TYPE_P): Define.
* tree-vect-stmts.c (vect_get_vec_def_for_operand,
vectorizable_mask_load_store, vectorizable_operation,
vect_is_simple_cond, get_same_sized_vectype): Use it instead
of comparing TREE_CODE of a type against BOOLEAN_TYPE.
* tree-vect-patterns.c (check_bool_pattern, search_type_for_mask_1,
vect_recog_bool_pattern, vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_get_constant_vectors): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.

Didn't we determine that the Ada front end generated boolean types with a
precision greater than 1?   And if so, does that suggest that
INTEGRAL_BOOLEAN_TYPE_P might need further adjustment (and that we likely
have latent bugs in the vectorizer if it was presented with such types?)


Some BOOLEAN_TYPE types (I think in Fortran too, though it might have
already changed) have bigger precision than one, but still have only 2 valid
values, 0 and 1 and I'm not aware of the vectorizer generating something
wrong for those.  For the non-BOOLEAN_TYPEs, only unsigned:1 is special:

  if (INTEGRAL_TYPE_P (inner_type)
  && INTEGRAL_TYPE_P (outer_type))
{
  /* Preserve changes in signedness or precision.  */
  if (TYPE_UNSIGNED (inner_type) != TYPE_UNSIGNED (outer_type)
  || TYPE_PRECISION (inner_type) != TYPE_PRECISION (outer_type))
return false;

  /* Preserve conversions to/from BOOLEAN_TYPE if types are not
 of precision one.  */
  if (((TREE_CODE (inner_type) == BOOLEAN_TYPE)
   != (TREE_CODE (outer_type) == BOOLEAN_TYPE))
  && TYPE_PRECISION (outer_type) != 1)
return false;

  /* We don't need to preserve changes in the types minimum or
 maximum value in general as these do not generate code
 unless the types precisions are different.  */
  return true;
Which makes your patch safe -- but introduces a non-obvious dependency 
between useless_type_conversion_p and your definition of 
INTEGRAL_BOOLEAN_TYPE and how it's used in the vectorizer.


I think if you checked for a TYPE_PRECISION of 1 in 
INTEGRAL_BOOLEAN_TYPE, that would help -- but leaves 
INTEGRAL_BOOLEAN_TYPE poorly named.


Am I missing something here?

jeff

Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-01-31 Thread Jakub Jelinek

On Tue, Jan 31, 2017 at 03:41:27PM -0700, Jeff Law wrote:
> > useless_type_conversion_p says that precision 1 unsigned BOOLEAN_TYPE
> > conversion to/from precision 1 unsigned integral type is useless,
> > but apparently the vectorizer heavily relies on the BOOLEAN_TYPE vs.
> > unsigned integral:1 distinction (uses VECTOR_BOOLEAN_TYPE_P types
> > only for real BOOLEAN_TYPE).  As the conversion is useless, we can end up
> > as in the following testcase with e.g. binary operations which have boolean
> > on one side and unsigned int:1 on another.
> > 
> > The following patch replaces all explicit BOOLEAN_TYPE checks in the
> > vectorizer with a new macro that handles unsigned integral:1 the same
> > as BOOLEAN_TYPE.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > 2017-01-31  Jakub Jelinek  
> > 
> > PR tree-optimization/79284
> > * tree.h (INTEGRAL_BOOLEAN_TYPE_P): Define.
> > * tree-vect-stmts.c (vect_get_vec_def_for_operand,
> > vectorizable_mask_load_store, vectorizable_operation,
> > vect_is_simple_cond, get_same_sized_vectype): Use it instead
> > of comparing TREE_CODE of a type against BOOLEAN_TYPE.
> > * tree-vect-patterns.c (check_bool_pattern, search_type_for_mask_1,
> > vect_recog_bool_pattern, vect_recog_mask_conversion_pattern): Likewise.
> > * tree-vect-slp.c (vect_get_constant_vectors): Likewise.
> > * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
> Didn't we determine that the Ada front end generated boolean types with a
> precision greater than 1?   And if so, does that suggest that
> INTEGRAL_BOOLEAN_TYPE_P might need further adjustment (and that we likely
> have latent bugs in the vectorizer if it was presented with such types?)

Some BOOLEAN_TYPE types (I think in Fortran too, though it might have
already changed) have bigger precision than one, but still have only 2 valid
values, 0 and 1 and I'm not aware of the vectorizer generating something
wrong for those.  For the non-BOOLEAN_TYPEs, only unsigned:1 is special:

  if (INTEGRAL_TYPE_P (inner_type)
  && INTEGRAL_TYPE_P (outer_type))
{
  /* Preserve changes in signedness or precision.  */
  if (TYPE_UNSIGNED (inner_type) != TYPE_UNSIGNED (outer_type)
  || TYPE_PRECISION (inner_type) != TYPE_PRECISION (outer_type))
return false;

  /* Preserve conversions to/from BOOLEAN_TYPE if types are not
 of precision one.  */
  if (((TREE_CODE (inner_type) == BOOLEAN_TYPE)
   != (TREE_CODE (outer_type) == BOOLEAN_TYPE))
  && TYPE_PRECISION (outer_type) != 1)
return false;

  /* We don't need to preserve changes in the types minimum or
 maximum value in general as these do not generate code
 unless the types precisions are different.  */
  return true;
}

Jakub

Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-01-31 Thread Jakub Jelinek

On Tue, Jan 31, 2017 at 03:33:04PM -0700, Jeff Law wrote:
> My general inclination is to ask this to wait for gcc-8 as it is not a
> regression, but instead a false positive in a new warning.

Well, as the warning is enabled by -Wall, the false positives in it are
regressions (while false negatives are not, as it is a new warning).

> However, if we see this triggering with any significant frequency, then we
> should reevaluate.  Getting Marek's build logs and grepping through them
> might guide us a bit on this...
> 
> 
> I'm not sure what the rationale is for length of zero at level 1 and length
> of one at higher levels for unknown strings.  I guess I can kindof see the
> former, though I suspect if we looked at the actual strings, zero length
> would actually be uncommon.
> 
> For level 1 and above assuming a single character seems way too tolerant.
> We're issuing a "may" warning, so ISTM we ought to be assuming a much larger
> length here.  I realize that makes a lot more noise for the warning, but
> doesn't that better reflect what may happen?

If the compiler doesn't know anything useful about the string lengths
(and that is the case when the value range is VARYING or just extremely
large), then using the maximum bounds is really not useful at all.

We would then have to complain about every "%s %s" because in theory that
would not fit into address space and similar.

Jakub

Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-01-31 Thread Jeff Law


On 01/31/2017 11:26 AM, Jakub Jelinek wrote:

Hi!

useless_type_conversion_p says that precision 1 unsigned BOOLEAN_TYPE
conversion to/from precision 1 unsigned integral type is useless,
but apparently the vectorizer heavily relies on the BOOLEAN_TYPE vs.
unsigned integral:1 distinction (uses VECTOR_BOOLEAN_TYPE_P types
only for real BOOLEAN_TYPE).  As the conversion is useless, we can end up
as in the following testcase with e.g. binary operations which have boolean
on one side and unsigned int:1 on another.

The following patch replaces all explicit BOOLEAN_TYPE checks in the
vectorizer with a new macro that handles unsigned integral:1 the same
as BOOLEAN_TYPE.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-01-31  Jakub Jelinek  

PR tree-optimization/79284
* tree.h (INTEGRAL_BOOLEAN_TYPE_P): Define.
* tree-vect-stmts.c (vect_get_vec_def_for_operand,
vectorizable_mask_load_store, vectorizable_operation,
vect_is_simple_cond, get_same_sized_vectype): Use it instead
of comparing TREE_CODE of a type against BOOLEAN_TYPE.
* tree-vect-patterns.c (check_bool_pattern, search_type_for_mask_1,
vect_recog_bool_pattern, vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_get_constant_vectors): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
Didn't we determine that the Ada front end generated boolean types with 
a precision greater than 1?   And if so, does that suggest that 
INTEGRAL_BOOLEAN_TYPE_P might need further adjustment (and that we 
likely have latent bugs in the vectorizer if it was presented with such 
types?)


jeff

Re: [PATCH/VECT/AARCH64] Improve cost model for ThunderX2 CN99xx

2017-01-31 Thread Andrew Pinski

On Sat, Jan 28, 2017 at 12:34 PM, Andrew Pinski  wrote:
> Hi,
>   On some (most) AARCH64 cores, it is not always profitable to
> vectorize some integer loops.  This patch does two things (I can split
> it into different patches if needed).
> 1) It splits the aarch64 back-end's vector cost model's vector and
> scalar costs into int and fp fields
> 1a) For thunderx2t99, models correctly the integer vector/scalar costs.
> 2) Fixes/Improves a few calls to record_stmt_cost in tree-vect-loop.c
> where stmt_info was not being passed.
>
> OK?  Bootstrapped and tested on aarch64-linux-gnu and provides 20% on
> libquantum and ~1% overall on SPEC CPU 2006 int.

Here is the updated patch with the fixes requested by both Richards.
Still the same performance as above.

OK?

Thanks,
Andrew

ChangLog:
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Pass
stmt_info to record_stmt_cost.
(vect_get_known_peeling_cost): Pass stmt_info if known to record_stmt_cost.

* config/aarch64/aarch64-protos.h (cpu_vector_cost): Split
cpu_vector_cost field into
scalar_int_stmt_cost and scalar_fp_stmt_cost.  Split vec_stmt_cost
field into vec_int_stmt_cost and vec_fp_stmt_cost.
* config/aarch64/aarch64.c (generic_vector_cost): Update for the
splitting of scalar_stmt_cost and vec_stmt_cost.
(thunderx_vector_cost): Likewise.
(cortexa57_vector_cost): LIkewise.
(exynosm1_vector_cost): Likewise.
(xgene1_vector_cost): Likewise.
(thunderx2t99_vector_cost): Improve after the splitting of the two fields.
(aarch64_builtin_vectorization_cost): Update for the splitting of
scalar_stmt_cost and vec_stmt_cost.

>
> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Pass
> stmt_info to record_stmt_cost.
> (vect_get_known_peeling_cost): Pass stmt_info if known to record_stmt_cost.
>
> * config/aarch64/aarch64-protos.h (cpu_vector_cost): Split
> cpu_vector_cost field into
> scalar_int_stmt_cost and scalar_fp_stmt_cost.  Split vec_stmt_cost
> field into vec_int_stmt_cost and vec_fp_stmt_cost.
> * config/aarch64/aarch64.c (generic_vector_cost): Update for the
> splitting of scalar_stmt_cost and vec_stmt_cost.
> (thunderx_vector_cost): Likewise.
> (cortexa57_vector_cost): LIkewise.
> (exynosm1_vector_cost): Likewise.
> (xgene1_vector_cost): Likewise.
> (thunderx2t99_vector_cost): Improve after the splitting of the two fields.
> (aarch64_builtin_vectorization_cost): Update for the splitting of
> scalar_stmt_cost and vec_stmt_cost.
Index: config/aarch64/aarch64-protos.h
===
--- config/aarch64/aarch64-protos.h (revision 245070)
+++ config/aarch64/aarch64-protos.h (working copy)
@@ -151,11 +151,17 @@ struct cpu_regmove_cost
 /* Cost for vector insn classes.  */
 struct cpu_vector_cost
 {
-  const int scalar_stmt_cost;   /* Cost of any scalar operation,
+  const int scalar_int_stmt_cost;   /* Cost of any int scalar operation,
+   excluding load and store.  */
+  const int scalar_fp_stmt_cost;/* Cost of any fp scalar operation,
excluding load and store.  */
   const int scalar_load_cost;   /* Cost of scalar load.  */
   const int scalar_store_cost;  /* Cost of scalar store.  */
-  const int vec_stmt_cost;  /* Cost of any vector operation,
+  const int vec_int_stmt_cost;  /* Cost of any int vector operation,
+   excluding load, store, permute,
+   vector-to-scalar and
+   scalar-to-vector operation.  */
+  const int vec_fp_stmt_cost;   /* Cost of any fp vector operation,
excluding load, store, permute,
vector-to-scalar and
scalar-to-vector operation.  */
Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c(revision 245070)
+++ config/aarch64/aarch64.c(working copy)
@@ -365,10 +365,12 @@ static const struct cpu_regmove_cost thu
 /* Generic costs for vector insn classes.  */
 static const struct cpu_vector_cost generic_vector_cost =
 {
-  1, /* scalar_stmt_cost  */
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
   1, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
-  1, /* vec_stmt_cost  */
+  1, /* vec_int_stmt_cost  */
+  1, /* vec_fp_stmt_cost  */
   2, /* vec_permute_cost  */
   1, /* vec_to_scalar_cost  */
   1, /* scalar_to_vec_cost  */
@@ -383,10 +385,12 @@ static const struct cpu_vector_cost gene
 /* ThunderX costs for vector insn classes.  */
 static const struct cpu_vector_cost thunderx_vector_cost =
 {
-  1, /* scalar_stmt_cost  */
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */

Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-01-31 Thread Jeff Law


On 01/30/2017 02:28 PM, Martin Sebor wrote:

Bug 79275 - -Wformat-overflow false positive exceeding INT_MAX in
glibc sysdeps/posix/tempname.c points out a false positive found
during a Glibc build and caused by the checker using the upper
bound of a range of precisions in string directives with string
arguments of non-constant length.  The attached patch relaxes
the checker to use the lower bound instead when appropriate.

Martin

gcc-79275.diff


PR middle-end/79275 -  -Wformat-overflow false positive exceeding INT_MAX in 
glibc sysdeps/posix/tempname.c

gcc/testsuite/ChangeLog:

PR middle-end/79275
* gcc.dg/tree-ssa/builtin-sprintf-warn-11.c: New test.
* gcc.dg/tree-ssa/pr79275.c: New test.

gcc/ChangeLog:

PR middle-end/79275
* gimple-ssa-sprintf.c (get_string_length): Set lower bound to zero.
(format_string): Tighten up the range of output for non-constant
strings and correct the expected range for wide non-constant strings.
My general inclination is to ask this to wait for gcc-8 as it is not a 
regression, but instead a false positive in a new warning.


However, if we see this triggering with any significant frequency, then 
we should reevaluate.  Getting Marek's build logs and grepping through 
them might guide us a bit on this...



I'm not sure what the rationale is for length of zero at level 1 and 
length of one at higher levels for unknown strings.  I guess I can 
kindof see the former, though I suspect if we looked at the actual 
strings, zero length would actually be uncommon.


For level 1 and above assuming a single character seems way too 
tolerant.  We're issuing a "may" warning, so ISTM we ought to be 
assuming a much larger length here.  I realize that makes a lot more 
noise for the warning, but doesn't that better reflect what may happen?







diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 8261a44..c0c0a5f 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -1986,43 +1987,89 @@ format_string (const directive , tree arg)
 }
   else
 {
-  /* For a '%s' and '%ls' directive with a non-constant string,
-the minimum number of characters is the greater of WIDTH
-and either 0 in mode 1 or the smaller of PRECISION and 1
-in mode 2, and the maximum is PRECISION or -1 to disable
-tracking.  */
+  /* For a '%s' and '%ls' directive with a non-constant string (either
+one of a number of strings of known length or an unknown string)
+the minimum number of characters is lesser of PRECISION[0] and
+the length of the shortest known string or zero, and the maximum
+is the lessser of the length of the longest known string or

s/lessser/lesser/

Jeff

Re: [PATCH 1/6] RISC-V Port: gcc/config/riscv/riscv.c

2017-01-31 Thread Andrew Waterman

On Tue, Jan 31, 2017 at 10:01 AM, Richard Henderson  wrote:
> On 01/30/2017 04:53 PM, Andrew Waterman wrote:
>> The ISA spec references an out-of-date calling convention, and will be
>> removed in the next revision to orthogonalize the ABI from the ISA.
>> We are in the process of drafting a RISC-V ELF psABI spec, which the
>> GCC port targets.
>> https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md
>> In this calling convention, the GPRs and FPRs are allocated
>> separately, which is more performant, especially when XLEN and FLEN
>> are not equal.
>
> Good to know.
>
> Re the new spec, you don't require arguments stored in two fp registers to be
> aligned, but you do require arguments stored in two int registers to be
> aligned?  Why?
>
> I see that as a waste when it comes to argument lists (for xlen=32) like
>
>   void foo(class *x, int64_t y, int a, int b, int c, int d)
>
> It's not as if the ISA has a store-multiple-register instruction that requires
> even register numbers...

We will reconsider only aligning arguments in the varargs case.

>
>
> r~

[PATCH] Fix gcc-5-branch build with libc++

2017-01-31 Thread Dimitry Andric

As discussed with the FreeBSD gcc ports maintainer, building the
gcc-5-branch with libc++ requires standard C++ headers to be included
*before* gcc/system.h, otherwise the redefinition of abort() to
fancy_abort() will cause trouble.

In trunk r235362, most gcc C++ sources were updated to define
INCLUDE_xxx macros before including gcc/system.h, which fixes the
incompatibility with libc++.  However, this revision is most likely too
disruptive to backport to the gcc-5-branch.

Attached is the patch which has been used in the FreeBSD gcc5 port (see
also https://bugs.freebsd.org/216266 ).  Proposed ChangeLog entry:

2017-01-31  Dimitry Andric 

* system.h: For C++, always include .
* graphite-isl-ast-to-gimple.c: Include standard C++ headers before
system.h.
* auto-profile.c: Likewise.  Also remove string.h, since it is already
included in system.h.

-Dimitry


gcc5-cxx-includes-1.diff
Description: Binary data


signature.asc
Description: Message signed with OpenPGP using GPGMail

Re: [PATCH] Fix -masm=intel output for AVX512{F,VL} gathers (PR target/79299)

2017-01-31 Thread Jakub Jelinek

On Tue, Jan 31, 2017 at 08:07:05PM +0100, Rainer Orth wrote:
> Hi Jakub,
> 
> > --- gcc/testsuite/gcc.target/i386/avx512vl-pr79299-1.c.jj   2017-01-31 
> > 13:15:20.919592886 +0100
> > +++ gcc/testsuite/gcc.target/i386/avx512vl-pr79299-1.c  2017-01-31 
> > 14:33:38.425028304 +0100
> > @@ -0,0 +1,91 @@
> > +/* PR target/79299 */
> > +/* { dg-do assemble { target avx512vl } } */
> > +/* { dg-options "-Ofast -mavx512vl -masm=intel" } */
> 
> both tests need
> 
> /* { dg-require-effective-target masm_intel } */
> 
> I believe.

Added in my copy.

Jakub

Re: [PATCH] Fix ICE with spelling hints within explicit namespace aliases (PR c++/79298)

2017-01-31 Thread Jason Merrill

OK.

On Tue, Jan 31, 2017 at 3:10 PM, David Malcolm  wrote:
> PR c++/79298 reports a segfault when handling a qualified name lookup error,
> where consider_binding_level dereferences a NULL cp_binding_level *.
>
> The root cause is that suggest_alternative_in_explicit_scope assumes that
> NAMESPACE_LEVEL (scope) is non-NULL, but this can be NULL for namespace 
> aliases.
>
> This patch adds a use of ORIGINAL_NAMESPACE to look through any namespace
> aliases.
>
> Successfully bootstrapped on x86_64-pc-linux-gnu.
>
> OK for trunk?
>
> gcc/cp/ChangeLog:
> PR c++/79298
> * name-lookup.c (suggest_alternative_in_explicit_scope): Resolve
> any namespace aliases.
>
> gcc/testsuite/ChangeLog:
> PR c++/79298
> * g++.dg/spellcheck-pr79298.C: New test case.
> ---
>  gcc/cp/name-lookup.c  |  3 +++
>  gcc/testsuite/g++.dg/spellcheck-pr79298.C | 17 +
>  2 files changed, 20 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/spellcheck-pr79298.C
>
> diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
> index a3cb7ee..994f7f0 100644
> --- a/gcc/cp/name-lookup.c
> +++ b/gcc/cp/name-lookup.c
> @@ -4545,6 +4545,9 @@ bool
>  suggest_alternative_in_explicit_scope (location_t location, tree name,
>tree scope)
>  {
> +  /* Resolve any namespace aliases.  */
> +  scope = ORIGINAL_NAMESPACE (scope);
> +
>cp_binding_level *level = NAMESPACE_LEVEL (scope);
>
>best_match  bm (name);
> diff --git a/gcc/testsuite/g++.dg/spellcheck-pr79298.C 
> b/gcc/testsuite/g++.dg/spellcheck-pr79298.C
> new file mode 100644
> index 000..4d7bbf9
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/spellcheck-pr79298.C
> @@ -0,0 +1,17 @@
> +// Ensure that we can offer suggestions for misspellings via a
> +// namespace alias.
> +
> +namespace N { int x; int color; }
> +namespace M = N;
> +namespace O = M;
> +
> +int foo ()
> +{
> +  return M::y; // { dg-error ".y. is not a member of .M." }
> +}
> +
> +int bar ()
> +{
> +  return O::colour; // { dg-error ".colour. is not a member of .O." }
> +  // { dg-message "suggested alternative: .color." "" { target *-*-* } .-1 }
> +}
> --
> 1.8.5.3
>

Re: [wwwdocs] changes.html - document new warning options

2017-01-31 Thread Gerald Pfeifer

Wow, that's quite a patch.  And quite some contributions behind that! :-)  

Let me offer some comments, and then I suggest you commit what you 
have (taking these comments into account), and we can still improve 
things then if there is further feedback.

Index: gcc-7/changes.html
===
+  GCC 7 can determine the return value or range of return values of
+some calls to the sprintf family of functions and make
+it available to other optimization passes.  Some calls to the 
+  snprintf function with a zero size argument can be folded

Formatting here appears a little odd?  I wouldn't have that line break
afer .

+into constants.  The optimization is included in -O1
+and can be selectively controlled by the

"This optimization"

+GCC 7 contains a number of enhancements that help detect buffer overflow
+  and other forms of invalid memory accesses.

"buffer overflows" (plural) ?

+   errors.  Such bugs have been known to be the source of
+   vulnerabilities and a target of exploits.

Perhaps say "security vulnerabilities"?  Not sure about that.

+  -Walloc-size-larger-than=PTRDIFF_MAX is included
+  in -Wall.

PTRDIFF_MAX without ... I would say, since it's not a variable.

+  For example, the following call to malloc incorrectly tries

malloc

+void* f (int n)
+{
+  return malloc (n > 0 ? 0 : n);
+}

Great example! :-)

+The -Walloc-zero option detects calls to standard
+   and user-defined memory allocation functions decorated with attribute
+   alloc_size with a zero argument.  -Walloc-zero
+   is not included in either -Wall or -Wextra
+   and must be explicitly enabled.

Why are you adding  within ?  This should not be necessary.

+  sprintf is diagnosed because even though its
   ^^^

Why ... here?

+  output has been constrained using the modulo operation it could
+  result in as many as three bytes if mday were negative.
+  The solution is to either allocate a larger buffer or make sure
+  the argument is not negative, for example by changing mday's
+  type to unsigned or by making the type of the second operand
+  of the modulo expression to unsigned: 100U.

"changing the type"

+   -Wformat-overflow=1 is included in

No ... around 1, since it's a constant (not a variable).

+   nonnull). By taking advantage of optimization the option
+   can detect many more cases of the problem than in prior GCC
+   versions.

"optimizations"  (Or "compiler optimizations", to avoid ambiguity
whether the option was optimized or is now leveraging compiler
optimizations?)

+The -Wstringop-overflow=type option detects
+   buffer overflow in calls to string manipulation functions like

"overflows"

+   memcpy and strcpy. The option relies

Is memcpy really a string manipulation function?

+   on Object Size Checking and has an effect similar to defining
+   the _FORTIFY_SOURCE macro.

Naive question: What is "Object Size Checking", and where is it
introduced or desribed?

+   -Wstringop-overflow=1 is enabled by default.

No s around constants.

+  For example, in the following snippet, because the call to
+   strncat specifies a maximum that allows the function to
+   write past the end of the destination, it is diagnosed.  To correct
+   the problem and avoid the overflow the function should be called
+   with a size of at most sizeof d - strlen(d).

I'm getting tired this evening, but is this taking care of the \0
being added?  Or would that require sizeof d - strlen(d) - 1?

Gerald

Re: [C++ PATCH] Fix diagnostic printing of COMPONENT_REF with ARROW_EXPR first operand (PR c++/79304)

2017-01-31 Thread Jason Merrill

OK.

On Tue, Jan 31, 2017 at 1:29 PM, Jakub Jelinek  wrote:
> Hi!
>
> For dependent expression followed by -> and identifier or ~ identifier,
> the parser creates COMPONENT_REF with ARROW_EXPR as its first argument.
> When printing that, we actually print expr->.identifier , which is not
> what the user typed.  The following patch omits the undesirable dot in
> that case, we know that dump_expr on the ARROW_EXPR already printed ->
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-01-31  Jakub Jelinek  
>
> PR c++/79304
> * error.c (dump_expr) : Don't print .
> after ARROW_EXPR.
>
> * g++.dg/diagnostic/pr79304.C: New test.
>
> --- gcc/cp/error.c.jj   2017-01-21 02:26:06.0 +0100
> +++ gcc/cp/error.c  2017-01-31 17:03:17.737123973 +0100
> @@ -2247,7 +2247,8 @@ dump_expr (cxx_pretty_printer *pp, tree
> else
>   {
> dump_expr (pp, ob, flags | TFF_EXPR_IN_PARENS);
> -   pp_cxx_dot (pp);
> +   if (TREE_CODE (ob) != ARROW_EXPR)
> + pp_cxx_dot (pp);
>   }
> dump_expr (pp, TREE_OPERAND (t, 1), flags & ~TFF_EXPR_IN_PARENS);
>}
> --- gcc/testsuite/g++.dg/diagnostic/pr79304.C.jj2017-01-31 
> 17:08:17.347318113 +0100
> +++ gcc/testsuite/g++.dg/diagnostic/pr79304.C   2017-01-31 17:08:01.0 
> +0100
> @@ -0,0 +1,20 @@
> +// PR c++/79304
> +// { dg-do compile }
> +
> +struct C { };
> +
> +template
> +struct X
> +{
> +  C* c;
> +
> +  void f() {
> +this->c.s();   // { dg-error "->c" }
> +  }
> +};
> +
> +int main()
> +{
> +  X x;
> +  x.f();
> +}
>
> Jakub

Re: [patch 79279] combine/simplify_set issue

2017-01-31 Thread Segher Boessenkool

Hello,

On Mon, Jan 30, 2017 at 10:43:23AM +0100, Aurelien Buhrig wrote:
> This patch fixes a combiner bug in simplify_set which calls
> CANNOT_CHANGE_MODE_CLASS with wrong mode params.
> It occurs when trying to simplify paradoxical subregs of hw regs (whose
> natural mode is lower than a word).
> 
> In fact, changing from (set x:m1 (subreg:m1 (op:m2))) to (set (subreg:m2
> x)  op2) is valid if REG_CANNOT_CHANGE_MODE_P (x, m1, m2) is false
> and REG_CANNOT_CHANGE_MODE_P (x, GET_MODE (src), GET_MODE (SUBREG_REG
> (src))

r62212 (in 2003) changed it to what we have now, it used to be what you
want to change it back to.

You say m1 > m2, which means you have WORD_REGISTER_OPERATIONS defined.

Where does this transformation go wrong?  Why is the resulting insn
recognised at all?  For example, general_operand should refuse it.
Maybe you have custom *_operand that do not handle subreg correctly?

The existing code looks correct: what we want to know is if an m2
interpreted as an m1 yields the correct value.  We might *also* want
your condition, but I'm not sure about that.

>See:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79279
> 
> Validated on a private target,
> bootstraped on x86_64-pc-linux-gnu, but I'm not sure which target is the
> most relevant for this patch though ...
> 
> OK to commit?

Sorry, no.  We're currently in development stage 4, and this is not a
regression (see ).  But we can
of course discuss this further, and you can repost the patch when stage 1
opens (a few months from now) if you still want it.

Segher

Re: [PATCH 2/6] RISC-V Port: gcc

2017-01-31 Thread Andrew Waterman

On Tue, Jan 31, 2017 at 10:32 AM, Richard Henderson  wrote:
> On 01/30/2017 05:10 PM, Andrew Waterman wrote:
 +(define_expand "clear_cache"
 +  [(match_operand 0 "pmode_register_operand")
 +   (match_operand 1 "pmode_register_operand")]
 +  ""
 +  "
 +{
 +  emit_insn (gen_fence_i ());
 +  DONE;
 +}")
>>>
>>>
>>> Do you need a FENCE before the FENCE.I?
>>
>> It's actually not clear to me what the semantics of clear_cache are
>> for multiprocessors.  Can you shed some light?
>>
>> If thread A does modifies code then sets a flag, then thread B reads
>> the flag and executes a FENCE.I, then thread A needs a FENCE before
>> setting the flag and thread B needs a fence before the FENCE.I.  But,
>> is it not the software's responsibility to insert both fences, rather
>> than assuming one of the fences is folded into clear_cache?
>
> Your introduction of "flag" confuses the issue.
>
> Having re-read the description in section 2.7, I see that FENCE.I is
> thread-local and is all that is required for a single thread to sync its own I
> and D caches.  I think perhaps I'd mis-read or mis-remembered before.
>
> Practically speaking, I'm not sure we have put any real thought about what
> needs to happen for threads using on-stack trampolines.  Certainly no other 
> gcc
> port attempts to broadcast the need for an icache flush to other cpus.
>
> So just leave as-is -- __builtin_clear_cache works properly for the local 
> thread.

Sorry, I jumped right to the assumption that you were suggesting a
FENCE for multiprocessor synchronization.  Indeed, it's not necessary
for enforcing ordering between local stores and FENCE.I.

>
>
 +(define_insn "call_value_multiple_internal"
 +  [(set (match_operand 0 "register_operand" "")
 +   (call (mem:SI (match_operand 1 "call_insn_operand" "l,S"))
 + (match_operand 2 "" "")))
 +   (set (match_operand 3 "register_operand" "")
 +   (call (mem:SI (match_dup 1))
 + (match_dup 2)))
>>>
>>>
>>> Any reason for this?  Your return value registers are sequential.  The
>>> normal thing to do is just use e.g. (reg:TI 10).
>>
>> I think we'd need different patterns for mixed int/FP struct returns
>> (which use a0 and fa0) if we took that approach.
>
> Ah.  Other targets get away with using a PARALLEL.
>
> >From sparc.c, function_arg_record_value:
>
>   data.ret = gen_rtx_PARALLEL (mode, rtvec_alloc (data.stack + nregs));
>
> and sparc.md:
>
> (define_insn "*call_value_symbolic_sp64"
>   [(set (match_operand 0 "" "")
> (call (mem:DI (match_operand:DI 1 "symbolic_operand" "s"))
>   (match_operand 2 "" "")))
>(clobber (reg:DI O7_REG))]
>
> So you wind up with things like
>
>   (set (parallel [
>  (reg:DI o0)
>  (reg:DF fp0)
>  (reg:DI o1)
>  (reg:DF fp1)
> ])
>(call (mem:DI (symbol_ref:DI "function"))
>  (const_int 0)))
>
> We do the same for x86_64 -- in function_value_64:
>
>   ret = construct_container (mode, orig_mode, valtype, 1,
>  X86_64_REGPARM_MAX, X86_64_SSE_REGPARM_MAX,
>  x86_64_int_return_registers, 0);

Thanks--these pointers were quite helpful.

>
>
> r~

Re: [committed] move constant to the right of relational operators (Re: [PATCH 4/5] distinguish likely and unlikely results (PR 78703))

2017-01-31 Thread Bernhard Reutner-Fischer

On 31 January 2017 00:19:46 CET, Martin Sebor  wrote:
>> So I see the introduction of many
>>
>> if (const OP object) expressions
>>
>> Can you please fix those as an independent patch after #4 and #5 are
>> installed on the trunk?  Consider that patch pre-approved, but please
>> post it here for the historical record.
>>
>> I think a regexp of paren followed by a constant would probably take
>you
>> to them pretty quickly.
>
>I committed the attached patch in r245040.

The majority of these are not equivalent and I'm curious why they aren't?
TIA,

[committed] Prevent ICEs due to bogus substring locations (PR preprocessor/79210)

2017-01-31 Thread David Malcolm

PR preprocessor/79210 identifies a ICE due to a failing assertion
within get_substring_ranges_for_loc when re-lexing strings, due to
the wrong location being used for a string token.

The root cause of the bogus string location is due to
(A) a bogus location being generated when pasting together two tokens
during macro expansion.  This bogus location gets inherited by
subsequent temporary tokens during the expansion, including
(B) that of a stringized macro argument - the
assertion failure happens when attempting to re-lex the string token
obtained from this stringized macro argument, used within a string
concatenation.

It doesn't appear possible to give (A) a full
source_location/location_t since such a value can only cover one range
(rather than the multiple ranges involved in pasted tokens).

The location of (B) could perhaps be addressed by giving the stringized
token the location of the macro argument.  However, this would not
address the issue with re-lexing the string, which expects C/C++ syntax
(quotes, etc).

So the simplest fix at stage 4 is to convert the assertion into an
error-checking conditional, so that when this situation occurs, we
don't provide a location_t within the string, and instead fail
gracefully by providing the location for the start of the string.

Successfully bootstrapped on x86_64-pc-linux-gnu.

Committed to trunk as r245070.

gcc/ChangeLog:
PR preprocessor/79210
* input.c (get_substring_ranges_for_loc): Replace line_width
assertion with error-handling.

gcc/testsuite/ChangeLog:
PR preprocessor/79210
* gcc.dg/format/pr79210.c: New test case.
* gcc.dg/plugin/diagnostic-test-string-literals-2.c (test_pr79210):
New function.
---
 gcc/input.c|  5 -
 gcc/testsuite/gcc.dg/format/pr79210.c  | 23 ++
 .../plugin/diagnostic-test-string-literals-2.c | 23 ++
 3 files changed, 50 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/format/pr79210.c

diff --git a/gcc/input.c b/gcc/input.c
index 3e67314..38deb62 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -1395,7 +1395,10 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
   const char *literal = line + start.column - 1;
   int literal_length = finish.column - start.column + 1;
 
-  gcc_assert (line_width >= (start.column - 1 + literal_length));
+  /* Ensure that we don't crash if we got the wrong location.  */
+  if (line_width < (start.column - 1 + literal_length))
+   return "line is not wide enough";
+
   cpp_string from;
   from.len = literal_length;
   /* Make a copy of the literal, to avoid having to rely on
diff --git a/gcc/testsuite/gcc.dg/format/pr79210.c 
b/gcc/testsuite/gcc.dg/format/pr79210.c
new file mode 100644
index 000..71f5dd6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/format/pr79210.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-Wformat -Wformat-signedness" } */
+
+__attribute__((format(printf, 3, 4)))
+void dev_printk(const char *level, void *dev, const char *fmt, ...);
+
+#define lpfc_vport_param_init(attr)\
+void lpfc_##attr##_init(void *vport, unsigned int val) \
+{ \
+   dev_printk("3", (void *)0, \
+  "0423 lpfc_"#attr" attribute cannot be set to %d, "\
+  "allowed range is [0, 1]\n", val); \
+}
+
+#define LPFC_VPORT_ATTR_R(name, desc) \
+unsigned int lpfc_##name;\
+lpfc_vport_param_init(name)\
+
+LPFC_VPORT_ATTR_R(peer_port_login,
+ "Allow peer ports on the same physical port to login to each "
+ "other.");
+
+/* { dg-warning "6: format .%d. expects argument of type .int., but argument 4 
has type .unsigned int. " "" { target *-*-* } .-12 } */
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-2.c 
b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-2.c
index 25cb3f0..e916b93 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-2.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-2.c
@@ -51,3 +51,26 @@ test_stringified_token_3 (int x)
 #undef FOO
 }
 
+/* Test of a stringified macro argument within a concatenation.  */
+
+void
+test_pr79210 (void)
+{
+#define lpfc_vport_param_init(attr)\
+   __emit_string_literal_range ( \
+  "0423 lpfc_"#attr" attribute cannot be set to %d, "\
+  "allowed range is [0, 1]\n", 54, 53, 54) \
+
+#define LPFC_VPORT_ATTR_R(name, decc)  \
+  unsigned int lpfc_##name;\
+  lpfc_vport_param_init(name) \
+
+  LPFC_VPORT_ATTR_R(peer_port_login,
+  "some multiline blurb with a short final line "
+  "here");
+
+  /* { dg-error "19: unable to read substring location: line is not wide 
enough" "" { target *-*-* } .-11 } */
+
+#undef LPFC_VPORT_ATTR_R
+#undef lpfc_vport_param_init
+}
-- 
1.8.5.3

Re: [PATCH v2] aarch64: Add split-stack initial support

2017-01-31 Thread Adhemerval Zanella

On 25/01/2017 10:10, Jiong Wang wrote:
> On 24/01/17 18:05, Adhemerval Zanella wrote:
>>
>> On 03/01/2017 13:13, Wilco Dijkstra wrote:
>>
>>> +  /* If function uses stacked arguments save the old stack value so 
>>> morestack
>>> + can return it.  */
>>> +  reg11 = gen_rtx_REG (Pmode, R11_REGNUM);
>>> +  if (cfun->machine->frame.saved_regs_size
>>> +  || cfun->machine->frame.saved_varargs_size)
>>> +emit_move_insn (reg11, stack_pointer_rtx);
>>>
>>> This doesn't look right - we could have many arguments even without varargs 
>>> or
>>> saved regs.  This would need to check varargs as well as ctrl->args.size (I 
>>> believe
>>> that is the size of the arguments on the stack). It's fine to omit this 
>>> optimization
>>> in the first version - we already emit 2-3 extra instructions for the check 
>>> anyway.
>> I will check for a better solution.
> 
> Hi Adhemerval
> 
>   My only concern on this this patch is the initialization of R11 (internal 
> arg
> pointer).  The current implementation looks to me is generating wrong code 
> for a
> testcase simply return the sum of ten int param, I see the function body is
> using R11 while there is no initialization of it in split prologue,  so if the
> execution flow is *not* through __morestack, then R11 is not initialized.
> As Wilco suggested, I feel using crtl->args.size instead of
> cfun->machine->frame.saved_regs_size might be the correct approach after
> checking assign_parms in function.c.
> 

Hi Jiong,

Indeed the previous version which used 'saved_regs_size' is wrong for stacked
parameters.  A simple 10 arguments function call shows when the reg11 is
evaluated:

cfun->machine->frame.saved_regs_size= 0
cfun->machine->frame.saved_varargs_size = 0
crtl->args.size = 16

So indeed 'ctrl->args.size' seems the correct argument to used in this case
(which will trigger the correct reg11 set for split-stack case).  In this 
version
I also removed some unused variables I left in previous patch.

>From 30cbedc303d364dd94f0d35abee1d237a3671cdb Mon Sep 17 00:00:00 2001
From: Adhemerval Zanella 
Date: Wed, 4 May 2016 21:13:39 +
Subject: [PATCH] aarch64: Add split-stack initial support

This patch adds the split-stack support on aarch64 (PR #67877).  As for
other ports this patch should be used along with glibc and gold support.

The support is done similar to other architectures: a __private_ss field is
added on TCB in glibc, a target-specific __morestack implementation and
helper functions are added in libgcc and compiler supported in adjusted
(split-stack prologue, va_start for argument handling).  I also plan to
send the gold support to adjust stack allocation acrosss split-stack
and default code calls.

Current approach is similar to powerpc one: at most 2 GB of stack allocation
is support so stack adjustments can be done with 2 instructions (either just
a movn plus nop or a movn followed by movk).  The morestack call is non
standard with x10 hollding the requested stack pointer, x11 the argument
pointer, and x12 to return continuation address.  Unwinding is handled by a
personality routine that knows how to find stack segments.

Split-stack prologue on function entry is as follow (this goes before the
usual function prologue):

function:
	mrsx9, tpidr_el0
	movx10, -
	movk   0x0
	addx10, sp, x10
	movx11, sp   	# if function has stacked arguments
	adrp   x12, main_fn_entry
	addx12, x12, :lo12:.L2
	cmpx9, x10
	b.lt   
	b  __morestack
main_fn_entry:
	[function prologue]

Notes:

1. Even if a function does not allocate a stack frame, a split-stack prologue
   is created.  It is to avoid issues with tail call for external symbols
   which might require linker adjustment (libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move split-stack TCB ldr
   to after the required stack calculation.

3. Similar to powerpc, When the linker detects a call from split-stack to
   non-split-stack code, it adds 16k (or more) to the value found in "allocate"
   instructions (so non-split-stack code gets a larger stack).  The amount is
   tunable by a linker option.  The edit means aarch64 does not need to
   implement __morestack_non_split, necessary on x86 because insufficient
   space is available there to edit the stack comparison code.  This feature
   is only implemented in the GNU gold linker.

4. AArch64 does not handle >4G stack initially and although it is possible
   to implement it, limiting to 4G allows to materize the allocation with
   only 2 instructions (mov + movk) and thus simplifying the linker
   adjustments required.  Supporting multiple threads each requiring more
   than 4G of stack is probably not that important, and likely to OOM at
   run time.

5. The TCB support on GLIBC is meant to be included in version 2.25.

6. The continuation address materialized on x12 is done using 'adrp'
   plus add and a static relocation.

[C++ PATCH] fix c++79290 bogus error with -Wall

2017-01-31 Thread Nathan Sidwell

This patch fixes 79290, a bogus error caused by -Wall trying to figure 
out if a component ref expr results a thing that can never be null.


unfortunately, it was looking at a PMF record access, and even though 
that;s a RECORD_TYPE, it's not CLASSTYPE because the relevant flag is clear.


Fixed by setting TREE_NO_WARN on the COMPONENT_REF, so we don't go 
trying to warn about it.


applied to trunk.

nathan
--
Nathan Sidwell
2017-01-31  Nathan Sidwell  

	PR c++/79290
	* typeck.c (build_ptrmemfunc_access_expr): Set TREE_NO_WARNING.

	PR c++/79290
	* g++.dg/warn/pr79290.C: New.

Index: cp/typeck.c
===
--- cp/typeck.c	(revision 245067)
+++ cp/typeck.c	(working copy)
@@ -2950,7 +2950,10 @@ build_ptrmemfunc_access_expr (tree ptrme
member = DECL_CHAIN (member))
 if (DECL_NAME (member) == member_name)
   break;
-  return build_simple_component_ref (ptrmem, member);
+  tree res = build_simple_component_ref (ptrmem, member);
+
+  TREE_NO_WARNING (res) = 1;
+  return res;
 }
 
 /* Given an expression PTR for a pointer, return an expression
Index: testsuite/g++.dg/warn/pr79290.C
===
--- testsuite/g++.dg/warn/pr79290.C	(revision 0)
+++ testsuite/g++.dg/warn/pr79290.C	(working copy)
@@ -0,0 +1,25 @@
+// { dg-additional-options "-Wall" }
+// PR 79290, bogus warning looking inside PMF
+
+struct Song {
+  int get() const ;
+};
+
+typedef int (Song::*PMF_t)() const;
+
+struct SongTag {
+  PMF_t function () const;
+};
+
+
+template
+struct Printer {
+  bool Foo(const SongTag ) {
+return st.function () == ::get;
+  }
+};
+
+void Baz (Printer *p, SongTag const )
+{
+  p->Foo (st);
+}

[PATCH] Fix ICE with spelling hints within explicit namespace aliases (PR c++/79298)

2017-01-31 Thread David Malcolm

PR c++/79298 reports a segfault when handling a qualified name lookup error,
where consider_binding_level dereferences a NULL cp_binding_level *.

The root cause is that suggest_alternative_in_explicit_scope assumes that
NAMESPACE_LEVEL (scope) is non-NULL, but this can be NULL for namespace aliases.

This patch adds a use of ORIGINAL_NAMESPACE to look through any namespace
aliases.

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/cp/ChangeLog:
PR c++/79298
* name-lookup.c (suggest_alternative_in_explicit_scope): Resolve
any namespace aliases.

gcc/testsuite/ChangeLog:
PR c++/79298
* g++.dg/spellcheck-pr79298.C: New test case.
---
 gcc/cp/name-lookup.c  |  3 +++
 gcc/testsuite/g++.dg/spellcheck-pr79298.C | 17 +
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-pr79298.C

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index a3cb7ee..994f7f0 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -4545,6 +4545,9 @@ bool
 suggest_alternative_in_explicit_scope (location_t location, tree name,
   tree scope)
 {
+  /* Resolve any namespace aliases.  */
+  scope = ORIGINAL_NAMESPACE (scope);
+
   cp_binding_level *level = NAMESPACE_LEVEL (scope);
 
   best_match  bm (name);
diff --git a/gcc/testsuite/g++.dg/spellcheck-pr79298.C 
b/gcc/testsuite/g++.dg/spellcheck-pr79298.C
new file mode 100644
index 000..4d7bbf9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/spellcheck-pr79298.C
@@ -0,0 +1,17 @@
+// Ensure that we can offer suggestions for misspellings via a
+// namespace alias.
+
+namespace N { int x; int color; }
+namespace M = N; 
+namespace O = M; 
+
+int foo () 
+{
+  return M::y; // { dg-error ".y. is not a member of .M." }
+}
+
+int bar () 
+{
+  return O::colour; // { dg-error ".colour. is not a member of .O." }
+  // { dg-message "suggested alternative: .color." "" { target *-*-* } .-1 }
+}
-- 
1.8.5.3

Re: [PATCH] Fix -masm=intel output for AVX512{F,VL} gathers (PR target/79299)

2017-01-31 Thread Rainer Orth

Hi Jakub,

> --- gcc/testsuite/gcc.target/i386/avx512vl-pr79299-1.c.jj 2017-01-31 
> 13:15:20.919592886 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr79299-1.c2017-01-31 
> 14:33:38.425028304 +0100
> @@ -0,0 +1,91 @@
> +/* PR target/79299 */
> +/* { dg-do assemble { target avx512vl } } */
> +/* { dg-options "-Ofast -mavx512vl -masm=intel" } */

both tests need

/* { dg-require-effective-target masm_intel } */

I believe.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [wwwdocs] changes.html - document new warning options

2017-01-31 Thread Martin Sebor


On 01/31/2017 10:30 AM, Aldy Hernandez wrote:

On 01/31/2017 10:57 AM, Martin Sebor wrote:

On 01/31/2017 03:50 AM, Aldy Hernandez wrote:

On 01/24/2017 03:07 PM, Martin Sebor wrote:

Hi Martin.

Thank you for taking care of this.


+The -Walloca-larger-than=size option
detects
+calls to the alloca function whose argument may
exceed
+the specified size.
+-Walloca-larger-than is not included in either
+-Wall or -Wextra and must be explicitly
+enabled.


You should probably document that we warn, not just on arguments that
exceed a specific threshold, but arguments that are unbounded or unknown
at compile time:

foo (size_t n)
{
...
p = alloca(n);
...
}


Sure.  I added a bit of text to the end of the sentence that should
hopefully make that clearer.  The updated patch is attached.


The -Walloca-larger-than=size option

detects

calls to the alloca function whose argument either may
exceed the specified size, or that is not known
to be sufficiently constrained to avoid exceeding it.


I'd like to see an example, if possible.  I'm a big fan of them :).

Pretty please :).


Since you ask so nicely I added another example but I'm afraid it
isn't terribly interesting:

  In contrast, a call to alloca that isn't bounded at all such as
  in the following function will elicit the warning below regardless
  of the size argument to the option.

  void f (size_t n)
  {
char *d = alloca (n)
...
  }

  warning: unbounded use of 'alloca' [-Walloca-larger-than=]

Martin
Index: gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.49
diff -u -r1.49 changes.html
--- gcc-7/changes.html	30 Jan 2017 14:04:20 -	1.49
+++ gcc-7/changes.html	31 Jan 2017 18:54:54 -
@@ -40,8 +40,14 @@
 
 
 General Optimizer Improvements
-
 
+  GCC 7 can determine the return value or range of return values of
+some calls to the sprintf family of functions and make
+it available to other optimization passes.  Some calls to the 
+  snprintf function with a zero size argument can be folded
+into constants.  The optimization is included in -O1
+and can be selectively controlled by the
+-fprintf-return-value option.
   A new store merging pass has been added.  It merges constant stores to
   adjacent memory locations into fewer, wider, stores.
   It can be enabled by using the -fstore-merging option and is
@@ -283,6 +289,138 @@
  enum operation { add, count };
^
 
+GCC 7 contains a number of enhancements that help detect buffer overflow
+  and other forms of invalid memory accesses.
+  
+The -Walloc-size-larger-than=size option
+	detects calls to standard and user-defined memory allocation
+	functions decorated with attribute alloc_size whose
+	argument exceeds the specified size
+	(PTRDIFF_MAX by default).  The option also detects
+	arithmetic overflow in the computation of the size in two-argument
+	allocation functions like calloc where the total size
+	is the product of the two arguments.  Since calls with an excessive
+	size cannot succeed they are typically the result of programming
+	errors.  Such bugs have been known to be the source of
+	vulnerabilities and a target of exploits.
+  -Walloc-size-larger-than=PTRDIFF_MAX is included
+  in -Wall.
+  For example, the following call to malloc incorrectly tries
+	to avoid passing a negative argument to the function and instead ends
+	up unconditionally invoking it with an argument less than or equal
+	to zero.  Since after conversion to the type of the argument of the
+  function (size_t) a negative argument results in a value
+  in excess of the maximum PTRDIFF_MAX the call is diagnosed.
+  
+void* f (int n)
+{
+  return malloc (n > 0 ? 0 : n);
+}
+
+warning: argument 1 range [2147483648, 4294967295] exceeds maximum object size 2147483647 [-Walloc-size-larger-than=]
+The -Walloc-zero option detects calls to standard
+	and user-defined memory allocation functions decorated with attribute
+	alloc_size with a zero argument.  -Walloc-zero
+	is not included in either -Wall or -Wextra
+	and must be explicitly enabled.
+The -Walloca option detects all calls to the
+	alloca function in the program.  -Walloca
+	is not included in either -Wall or -Wextra
+	and must be explicitly enabled.
+The -Walloca-larger-than=size option detects
+	calls to the alloca function whose argument either may
+	exceed the specified size, or that is not known
+	to be sufficiently constrained to avoid exceeding it.
+	-Walloca-larger-than is not included in either
+	-Wall or -Wextra and must be explicitly
+	enabled.
+  For example, compiling the following snippet with
+	-Walloca-larger-than=1024 results in a warning because
+	even though the code appears to call alloca only with
+	sizes of 1kb and less, since n is signed, a negative
+	value would result in a call

Re: [PATCH] PR fortran/79305 -- Fix spelling in #if defined() condition

2017-01-31 Thread Steve Kargl

On Tue, Jan 31, 2017 at 07:41:27PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 31, 2017 at 10:35:14AM -0800, Steve Kargl wrote:
> > I would like to apply the following patch to trunk.  It technically
> > isn't a regression or documentation fix.  Is this ok with the release
> > managers?
> 
> Ok.
> > 
> > 2017-01-31  Steven G. Kargl  
> > 
> > PR fortran/79305
> > * libgfortran/c99_protos.h: Spell HAVE_EXPL correctly.
> > * libgfortran/intrinsics/c99_functions.c: Ditto.
> 
> But please no libgfortran/ prefix in libgfortran/ChangeLog.
> 

Thanks.  I caught the libgfortran/ prefix issue, but copied
from the the wrong edit window.

-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

Re: [PATCH] PR fortran/79305 -- Fix spelling in #if defined() condition

2017-01-31 Thread Jakub Jelinek

On Tue, Jan 31, 2017 at 10:35:14AM -0800, Steve Kargl wrote:
> I would like to apply the following patch to trunk.  It technically
> isn't a regression or documentation fix.  Is this ok with the release
> managers?

Ok.
> 
> 2017-01-31  Steven G. Kargl  
> 
>   PR fortran/79305
>   * libgfortran/c99_protos.h: Spell HAVE_EXPL correctly.
>   * libgfortran/intrinsics/c99_functions.c: Ditto.

But please no libgfortran/ prefix in libgfortran/ChangeLog.

Jakub

[PATCH] PR fortran/79305 -- Fix spelling in #if defined() condition

2017-01-31 Thread Steve Kargl

I would like to apply the following patch to trunk.  It technically
isn't a regression or documentation fix.  Is this ok with the release
managers?

2017-01-31  Steven G. Kargl  

PR fortran/79305
* libgfortran/c99_protos.h: Spell HAVE_EXPL correctly.
* libgfortran/intrinsics/c99_functions.c: Ditto.

Index: libgfortran/c99_protos.h
===
--- libgfortran/c99_protos.h(revision 244065)
+++ libgfortran/c99_protos.h(working copy)
@@ -332,7 +332,7 @@ extern float complex cexpf (float comple
 extern double complex cexp (double complex);
 #endif
 
-#if !defined(HAVE_CEXPL) && defined(HAVE_COSL) && defined(HAVE_SINL) && 
defined(EXPL)
+#if !defined(HAVE_CEXPL) && defined(HAVE_COSL) && defined(HAVE_SINL) && 
defined(HAVE_EXPL)
 #define HAVE_CEXPL 1
 extern long double complex cexpl (long double complex);
 #endif
Index: libgfortran/intrinsics/c99_functions.c
===
--- libgfortran/intrinsics/c99_functions.c  (revision 244065)
+++ libgfortran/intrinsics/c99_functions.c  (working copy)
@@ -913,7 +913,7 @@ cexp (double complex z)
 }
 #endif
 
-#if !defined(HAVE_CEXPL) && defined(HAVE_COSL) && defined(HAVE_SINL) && 
defined(EXPL)
+#if !defined(HAVE_CEXPL) && defined(HAVE_COSL) && defined(HAVE_SINL) && 
defined(HAVE_EXPL)
 #define HAVE_CEXPL 1
 long double complex cexpl (long double complex z);
 

-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

Re: [PATCH 2/6] RISC-V Port: gcc

2017-01-31 Thread Richard Henderson

On 01/30/2017 05:10 PM, Andrew Waterman wrote:
>>> +(define_expand "clear_cache"
>>> +  [(match_operand 0 "pmode_register_operand")
>>> +   (match_operand 1 "pmode_register_operand")]
>>> +  ""
>>> +  "
>>> +{
>>> +  emit_insn (gen_fence_i ());
>>> +  DONE;
>>> +}")
>>
>>
>> Do you need a FENCE before the FENCE.I?
> 
> It's actually not clear to me what the semantics of clear_cache are
> for multiprocessors.  Can you shed some light?
> 
> If thread A does modifies code then sets a flag, then thread B reads
> the flag and executes a FENCE.I, then thread A needs a FENCE before
> setting the flag and thread B needs a fence before the FENCE.I.  But,
> is it not the software's responsibility to insert both fences, rather
> than assuming one of the fences is folded into clear_cache?

Your introduction of "flag" confuses the issue.

Having re-read the description in section 2.7, I see that FENCE.I is
thread-local and is all that is required for a single thread to sync its own I
and D caches.  I think perhaps I'd mis-read or mis-remembered before.

Practically speaking, I'm not sure we have put any real thought about what
needs to happen for threads using on-stack trampolines.  Certainly no other gcc
port attempts to broadcast the need for an icache flush to other cpus.

So just leave as-is -- __builtin_clear_cache works properly for the local 
thread.


>>> +(define_insn "call_value_multiple_internal"
>>> +  [(set (match_operand 0 "register_operand" "")
>>> +   (call (mem:SI (match_operand 1 "call_insn_operand" "l,S"))
>>> + (match_operand 2 "" "")))
>>> +   (set (match_operand 3 "register_operand" "")
>>> +   (call (mem:SI (match_dup 1))
>>> + (match_dup 2)))
>>
>>
>> Any reason for this?  Your return value registers are sequential.  The
>> normal thing to do is just use e.g. (reg:TI 10).
> 
> I think we'd need different patterns for mixed int/FP struct returns
> (which use a0 and fa0) if we took that approach.

Ah.  Other targets get away with using a PARALLEL.

>From sparc.c, function_arg_record_value:

  data.ret = gen_rtx_PARALLEL (mode, rtvec_alloc (data.stack + nregs));

and sparc.md:

(define_insn "*call_value_symbolic_sp64"
  [(set (match_operand 0 "" "")
(call (mem:DI (match_operand:DI 1 "symbolic_operand" "s"))
  (match_operand 2 "" "")))
   (clobber (reg:DI O7_REG))]

So you wind up with things like

  (set (parallel [
 (reg:DI o0)
 (reg:DF fp0)
 (reg:DI o1)
 (reg:DF fp1)
])
   (call (mem:DI (symbol_ref:DI "function"))
 (const_int 0)))

We do the same for x86_64 -- in function_value_64:

  ret = construct_container (mode, orig_mode, valtype, 1,
 X86_64_REGPARM_MAX, X86_64_SSE_REGPARM_MAX,
 x86_64_int_return_registers, 0);


r~

[C++ PATCH] Fix diagnostic printing of COMPONENT_REF with ARROW_EXPR first operand (PR c++/79304)

2017-01-31 Thread Jakub Jelinek

Hi!

For dependent expression followed by -> and identifier or ~ identifier,
the parser creates COMPONENT_REF with ARROW_EXPR as its first argument.
When printing that, we actually print expr->.identifier , which is not
what the user typed.  The following patch omits the undesirable dot in
that case, we know that dump_expr on the ARROW_EXPR already printed ->

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-01-31  Jakub Jelinek  

PR c++/79304
* error.c (dump_expr) : Don't print .
after ARROW_EXPR.

* g++.dg/diagnostic/pr79304.C: New test.

--- gcc/cp/error.c.jj   2017-01-21 02:26:06.0 +0100
+++ gcc/cp/error.c  2017-01-31 17:03:17.737123973 +0100
@@ -2247,7 +2247,8 @@ dump_expr (cxx_pretty_printer *pp, tree
else
  {
dump_expr (pp, ob, flags | TFF_EXPR_IN_PARENS);
-   pp_cxx_dot (pp);
+   if (TREE_CODE (ob) != ARROW_EXPR)
+ pp_cxx_dot (pp);
  }
dump_expr (pp, TREE_OPERAND (t, 1), flags & ~TFF_EXPR_IN_PARENS);
   }
--- gcc/testsuite/g++.dg/diagnostic/pr79304.C.jj2017-01-31 
17:08:17.347318113 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr79304.C   2017-01-31 17:08:01.0 
+0100
@@ -0,0 +1,20 @@
+// PR c++/79304
+// { dg-do compile }
+
+struct C { };
+
+template
+struct X
+{
+  C* c;
+
+  void f() {
+this->c.s();   // { dg-error "->c" }
+  }
+};
+
+int main()
+{
+  X x;
+  x.f();
+}

Jakub

[PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-01-31 Thread Jakub Jelinek

Hi!

useless_type_conversion_p says that precision 1 unsigned BOOLEAN_TYPE
conversion to/from precision 1 unsigned integral type is useless,
but apparently the vectorizer heavily relies on the BOOLEAN_TYPE vs.
unsigned integral:1 distinction (uses VECTOR_BOOLEAN_TYPE_P types
only for real BOOLEAN_TYPE).  As the conversion is useless, we can end up
as in the following testcase with e.g. binary operations which have boolean
on one side and unsigned int:1 on another.

The following patch replaces all explicit BOOLEAN_TYPE checks in the
vectorizer with a new macro that handles unsigned integral:1 the same
as BOOLEAN_TYPE.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-01-31  Jakub Jelinek  

PR tree-optimization/79284
* tree.h (INTEGRAL_BOOLEAN_TYPE_P): Define.
* tree-vect-stmts.c (vect_get_vec_def_for_operand,
vectorizable_mask_load_store, vectorizable_operation,
vect_is_simple_cond, get_same_sized_vectype): Use it instead
of comparing TREE_CODE of a type against BOOLEAN_TYPE.
* tree-vect-patterns.c (check_bool_pattern, search_type_for_mask_1,
vect_recog_bool_pattern, vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_get_constant_vectors): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
testsuite/
* gcc.c-torture/compile/pr79284.c: New test.

--- gcc/tree.h.jj   2017-01-23 23:57:41.0 +0100
+++ gcc/tree.h  2017-01-31 16:16:28.969224648 +0100
@@ -532,6 +532,16 @@ extern void omp_clause_range_check_faile
 || VECTOR_TYPE_P (TYPE))   \
&& INTEGRAL_TYPE_P (TREE_TYPE (TYPE
 
+/* Nonzero if TYPE represents a (scalar) boolean type or type
+   in the middle-end compatible with it.  */
+
+#define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
+  (TREE_CODE (TYPE) == BOOLEAN_TYPE\
+   || ((TREE_CODE (TYPE) == INTEGER_TYPE   \
+   || TREE_CODE (TYPE) == ENUMERAL_TYPE)   \
+   && TYPE_PRECISION (TYPE) == 1   \
+   && TYPE_UNSIGNED (TYPE)))
+
 /* Nonzero if TYPE represents a non-saturating fixed-point type.  */
 
 #define NON_SAT_FIXED_POINT_TYPE_P(TYPE) \
--- gcc/tree-vect-stmts.c.jj2017-01-09 12:44:14.0 +0100
+++ gcc/tree-vect-stmts.c   2017-01-31 16:18:51.855369244 +0100
@@ -1420,7 +1420,7 @@ vect_get_vec_def_for_operand (tree op, g
 
   if (vectype)
vector_type = vectype;
-  else if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE
+  else if (INTEGRAL_BOOLEAN_TYPE_P (TREE_TYPE (op))
   && VECTOR_BOOLEAN_TYPE_P (stmt_vectype))
vector_type = build_same_sized_truth_vector_type (stmt_vectype);
   else
@@ -2029,7 +2029,7 @@ vectorizable_mask_load_store (gimple *st
 
   mask = gimple_call_arg (stmt, 2);
 
-  if (TREE_CODE (TREE_TYPE (mask)) != BOOLEAN_TYPE)
+  if (!INTEGRAL_BOOLEAN_TYPE_P (TREE_TYPE (mask)))
 return false;
 
   /* FORNOW. This restriction should be relaxed.  */
@@ -5275,9 +5275,9 @@ vectorizable_operation (gimple *stmt, gi
 of booleans or vector of integers).  We use output
 vectype because operations on boolean don't change
 type.  */
-  if (TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE)
+  if (INTEGRAL_BOOLEAN_TYPE_P (TREE_TYPE (op0)))
{
- if (TREE_CODE (TREE_TYPE (scalar_dest)) != BOOLEAN_TYPE)
+ if (!INTEGRAL_BOOLEAN_TYPE_P (TREE_TYPE (scalar_dest)))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -7666,7 +7666,7 @@ vect_is_simple_cond (tree cond, vec_info
 
   /* Mask case.  */
   if (TREE_CODE (cond) == SSA_NAME
-  && TREE_CODE (TREE_TYPE (cond)) == BOOLEAN_TYPE)
+  && INTEGRAL_BOOLEAN_TYPE_P (TREE_TYPE (cond)))
 {
   gimple *lhs_def_stmt = SSA_NAME_DEF_STMT (cond);
   if (!vect_is_simple_use (cond, vinfo, _def_stmt,
@@ -9059,7 +9059,7 @@ get_mask_type_for_scalar_type (tree scal
 tree
 get_same_sized_vectype (tree scalar_type, tree vector_type)
 {
-  if (TREE_CODE (scalar_type) == BOOLEAN_TYPE)
+  if (INTEGRAL_BOOLEAN_TYPE_P (scalar_type))
 return build_same_sized_truth_vector_type (vector_type);
 
   return get_vectype_for_scalar_type_and_size
--- gcc/tree-vect-patterns.c.jj 2017-01-01 12:45:37.0 +0100
+++ gcc/tree-vect-patterns.c2017-01-31 16:17:48.082197351 +0100
@@ -3158,9 +3158,7 @@ check_bool_pattern (tree var, vec_info *
   break;
 
 CASE_CONVERT:
-  if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1
-  || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
- && TREE_CODE (TREE_TYPE (rhs1)) != BOOLEAN_TYPE)
+  if (!INTEGRAL_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
return false;
   if (! check_bool_pattern (rhs1, vinfo, stmts))
return false;
@@ -3474,9 +3472,7 @@ search_type_for_mask_1 (tree var, vec_in
   if (TREE_CODE (var) != SSA_NAME)
 return NULL_TREE;
 
-  if ((TYPE_PRECISION (TREE_TYPE (var))

[PATCH] Fix -masm=intel output for AVX512{F,VL} gathers (PR target/79299)

2017-01-31 Thread Jakub Jelinek

Hi!

As mentioned in the PR and shown by the testcases, many of the AVX512{F,VL}
gathers fail to assemble with binutils.

Unlike AVX2 gathers, gas for some strange reason requires the memory operand
to be {Q,XMM,YMM,ZMM}WORD depending on the sizes of all the memory locations
together (and when not masked), rather then just using always DWORD or QWORD
depending on whether the argument is described as vm32{x,y,z} or
vm64{x,y,z} and for some AVX512{F,VL} that is what is actually emitted.
I have only a year old ICC around and that emits DWORD/QWORD instead.

Anyway, the following patch honors what gas accepts, if we want to change
gas, we likely want to do that afterwards (but keep compatibility), then add a
configure check whether gas supports the correct stuff and then decide based
on that.

There is also a fix for an inconsistency in the destination register:
%5, %x0%{%1%}|%t0%{%1%}, %...
where we were using %xmm? in AT, but %ymm? in Intel mode.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-01-31  Jakub Jelinek  

PR target/79299
* config/i386/sse.md (xtg_mode, gatherq_mode): New mode attrs.
(*avx512f_gathersi, *avx512f_gathersi_2,
*avx512f_gatherdi, *avx512f_gatherdi_2): Use them,
fix -masm=intel patterns.

* gcc.target/i386/avx512vl-pr79299-1.c: New test.
* gcc.target/i386/avx512vl-pr79299-2.c: New test.

--- gcc/config/i386/sse.md.jj   2017-01-26 13:22:55.0 +0100
+++ gcc/config/i386/sse.md  2017-01-31 14:33:15.389332480 +0100
@@ -811,6 +811,12 @@ (define_mode_attr concat_tg_mode
   [(V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") (V8SF "t") (V4DF "t")
(V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") (V16SF "g") (V8DF "g")])
 
+;; Tie mode of assembler operand to mode iterator
+(define_mode_attr xtg_mode
+  [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x") (V4SF "x") (V2DF "x")
+   (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") (V8SF "t") (V4DF "t")
+   (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") (V16SF "g") (V8DF "g")])
+
 ;; Half mask mode for unpacks
 (define_mode_attr HALFMASKMODE
   [(DI "SI") (SI "HI")])
@@ -19041,6 +19047,12 @@ (define_insn "*avx2_gatherdi_4"
(set_attr "prefix" "vex")
(set_attr "mode" "")])
 
+;; Memory operand override for -masm=intel of the v*gatherq* patterns.
+(define_mode_attr gatherq_mode
+  [(V4SI "q") (V2DI "x") (V4SF "q") (V2DF "x")
+   (V8SI "x") (V4DI "t") (V8SF "x") (V4DF "t")
+   (V16SI "t") (V8DI "g") (V16SF "t") (V8DF "g")])
+
 (define_expand "_gathersi"
   [(parallel [(set (match_operand:VI48F 0 "register_operand")
   (unspec:VI48F
@@ -19074,7 +19086,7 @@ (define_insn "*avx512f_gathersi"
  UNSPEC_GATHER))
(clobber (match_scratch: 2 "="))]
   "TARGET_AVX512F"
-  "vgatherd\t{%6, %0%{%2%}|%0%{%2%}, %g6}"
+  "vgatherd\t{%6, %0%{%2%}|%0%{%2%}, 
%6}"
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -19093,7 +19105,7 @@ (define_insn "*avx512f_gathersi_2"
  UNSPEC_GATHER))
(clobber (match_scratch: 1 "="))]
   "TARGET_AVX512F"
-  "vgatherd\t{%5, %0%{%1%}|%0%{%1%}, %g5}"
+  "vgatherd\t{%5, %0%{%1%}|%0%{%1%}, 
%5}"
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -19133,9 +19145,7 @@ (define_insn "*avx512f_gatherdi"
(clobber (match_scratch:QI 2 "="))]
   "TARGET_AVX512F"
 {
-  if (GET_MODE_SIZE (GET_MODE_INNER (mode)) == 4)
-return "vgatherq\t{%6, %1%{%2%}|%1%{%2%}, 
%t6}";
-  return "vgatherq\t{%6, %1%{%2%}|%1%{%2%}, %g6}";
+  return "vgatherq\t{%6, %1%{%2%}|%1%{%2%}, 
%6}";
 }
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -19159,11 +19169,11 @@ (define_insn "*avx512f_gatherdi_2"
   if (mode != mode)
 {
   if ( != 64)
-   return "vgatherq\t{%5, 
%x0%{%1%}|%t0%{%1%}, %g5}";
+   return "vgatherq\t{%5, 
%x0%{%1%}|%x0%{%1%}, %5}";
   else
-   return "vgatherq\t{%5, 
%t0%{%1%}|%t0%{%1%}, %g5}";
+   return "vgatherq\t{%5, 
%t0%{%1%}|%t0%{%1%}, %t5}";
 }
-  return "vgatherq\t{%5, %0%{%1%}|%0%{%1%}, %g5}";
+  return "vgatherq\t{%5, %0%{%1%}|%0%{%1%}, 
%5}";
 }
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
--- gcc/testsuite/gcc.target/i386/avx512vl-pr79299-1.c.jj   2017-01-31 
13:15:20.919592886 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr79299-1.c  2017-01-31 
14:33:38.425028304 +0100
@@ -0,0 +1,91 @@
+/* PR target/79299 */
+/* { dg-do assemble { target avx512vl } } */
+/* { dg-options "-Ofast -mavx512vl -masm=intel" } */
+
+#define N 1024
+
+unsigned long long a[N];
+unsigned int b[N], c[N], d[N], e[N], f[N];
+unsigned long long g[N], h[N], j[N], k[N];
+float l[N], m[N], n[N], o[N];
+double p[N], q[N], r[N], s[N];
+
+void
+f1 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+d[i] = c[a[i]];
+  for (i = 0; i < N; i++)
+e[i] = f[i] ? f[i] : c[a[i]];
+}
+
+void
+f2 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+d[i] = c[b[i]];
+  for (i = 0; i <

Re: [C++ PATCH] pr 67273 & 79253 lambdas, warnings & ICEs

2017-01-31 Thread Nathan Sidwell


On 01/31/2017 10:04 AM, Jason Merrill wrote:


Agreed.  As I was suggesting in response to one of Adam's patches, I
think we need to defer creating the closure until f is instantiated;
at that point we can resolve all names from f and so we should be able
to always push to top when instantiating the lambda.


Yup.


Perhaps

if (!fn_context || fn_context != current_function_decl)


That works fine, thanks.   It also makes it clearer that 'nested' must 
be true if !push_to_top (but not vice-versa), which allowed a little 
more simplification.


Committed the attached to trunk.

nathan
--
Nathan Sidwell
2017-01-31  Nathan Sidwell  

	PR c++/67273
	PR c++/79253
	* pt.c: (instantiate_decl): Push to top level when current
	function scope doesn't match.  Only push lmabda scope stack when
	pushing to top.

	PR c++/67273
	PR c++/79253
	* g++.dg/cpp1y/pr67273.C: New.
	* g++.dg/cpp1y/pr79253.C: New.

Index: cp/pt.c
===
--- cp/pt.c	(revision 245066)
+++ cp/pt.c	(working copy)
@@ -22666,20 +22666,21 @@ instantiate_decl (tree d, bool defer_ok,
 	goto out;
 }
 
-  bool nested;
+  bool push_to_top, nested;
   tree fn_context;
   fn_context = decl_function_context (d);
-  nested = (current_function_decl != NULL_TREE);
+  nested = current_function_decl != NULL_TREE;
+  push_to_top = !(nested && fn_context == current_function_decl);
+
   vec omp_privatization_save;
   if (nested)
 save_omp_privatization_clauses (omp_privatization_save);
 
-  if (!fn_context)
+  if (push_to_top)
 push_to_top_level ();
   else
 {
-  if (nested)
-	push_function_context ();
+  push_function_context ();
   cp_unevaluated_operand = 0;
   c_inhibit_evaluation_warnings = 0;
 }
@@ -22756,7 +22757,7 @@ instantiate_decl (tree d, bool defer_ok,
 	block = push_stmt_list ();
   else
 	{
-	  if (LAMBDA_FUNCTION_P (d))
+	  if (push_to_top && LAMBDA_FUNCTION_P (d))
 	{
 	  /* When instantiating a lambda's templated function
 		 operator, we need to push the non-lambda class scope
@@ -22849,9 +22850,9 @@ instantiate_decl (tree d, bool defer_ok,
   /* We're not deferring instantiation any more.  */
   TI_PENDING_TEMPLATE_FLAG (DECL_TEMPLATE_INFO (d)) = 0;
 
-  if (!fn_context)
+  if (push_to_top)
 pop_from_top_level ();
-  else if (nested)
+  else
 pop_function_context ();
 
   if (nested)
Index: testsuite/g++.dg/cpp1y/pr67273.C
===
--- testsuite/g++.dg/cpp1y/pr67273.C	(revision 0)
+++ testsuite/g++.dg/cpp1y/pr67273.C	(working copy)
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++14 } }
+// { dg-additional-options "-Wshadow" }
+
+// pr67273 bogus warning about shadowing.
+
+
+template  void Foo (T &)
+{
+  int ARG = 2;
+  lambda (1);
+}
+
+void Baz ()
+{
+  Foo ([] (auto &) {});
+}
Index: testsuite/g++.dg/cpp1y/pr79253.C
===
--- testsuite/g++.dg/cpp1y/pr79253.C	(revision 0)
+++ testsuite/g++.dg/cpp1y/pr79253.C	(working copy)
@@ -0,0 +1,33 @@
+// { dg-do compile { target c++14 } }
+// PR 79253 ICE instantiating lambda body.
+
+template  struct A;
+template > class B {};
+template  void foo (U, V) { T (0, 0); }
+struct C {};
+template  class F, class>
+void
+bar ()
+{
+  F<0, 0, 0>::baz;
+}
+struct G { int l; };
+template  struct E : C
+{
+  E (int, int) : e (0)
+  {
+auto  = this->m ();
+auto c = [&] { m.l; };
+  }
+  G  ();
+  int e;
+};
+struct D
+{
+  void
+  baz () { bar>; }
+  template  struct F
+  {
+static B<> baz () { foo> (0, 0); }
+  };
+};

Re: [PATCH 1/6] RISC-V Port: gcc/config/riscv/riscv.c

2017-01-31 Thread Richard Henderson

On 01/30/2017 04:53 PM, Andrew Waterman wrote:
> The ISA spec references an out-of-date calling convention, and will be
> removed in the next revision to orthogonalize the ABI from the ISA.
> We are in the process of drafting a RISC-V ELF psABI spec, which the
> GCC port targets.
> https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md
> In this calling convention, the GPRs and FPRs are allocated
> separately, which is more performant, especially when XLEN and FLEN
> are not equal.

Good to know.

Re the new spec, you don't require arguments stored in two fp registers to be
aligned, but you do require arguments stored in two int registers to be
aligned?  Why?

I see that as a waste when it comes to argument lists (for xlen=32) like

  void foo(class *x, int64_t y, int a, int b, int c, int d)

It's not as if the ISA has a store-multiple-register instruction that requires
even register numbers...

r~

Re: [wwwdocs] changes.html - document new warning options

2017-01-31 Thread Aldy Hernandez


On 01/31/2017 10:57 AM, Martin Sebor wrote:

On 01/31/2017 03:50 AM, Aldy Hernandez wrote:

On 01/24/2017 03:07 PM, Martin Sebor wrote:

Hi Martin.

Thank you for taking care of this.


+The -Walloca-larger-than=size option
detects
+calls to the alloca function whose argument may exceed
+the specified size.
+-Walloca-larger-than is not included in either
+-Wall or -Wextra and must be explicitly
+enabled.


You should probably document that we warn, not just on arguments that
exceed a specific threshold, but arguments that are unbounded or unknown
at compile time:

foo (size_t n)
{
...
p = alloca(n);
...
}


Sure.  I added a bit of text to the end of the sentence that should
hopefully make that clearer.  The updated patch is attached.


The -Walloca-larger-than=size option

detects

calls to the alloca function whose argument either may
exceed the specified size, or that is not known
to be sufficiently constrained to avoid exceeding it.


I'd like to see an example, if possible.  I'm a big fan of them :).

Pretty please :).

Re: [wwwdocs] Document in changes.html -fcode-hoisting, -fipa-bit-cp, -fipa-vrp, -fsplit-loops, GCJ removal, x86 ISA additions, -fshrink-wrap-separate etc.

2017-01-31 Thread Martin Jambor

On Tue, Jan 31, 2017 at 05:31:41PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 31, 2017 at 12:15:36PM +0100, Martin Jambor wrote:
> > > I've committed it as is, feel free to propose changes.  But before that, 
> > > we
> > > should figure out what to do with -fipa-cp-alignment.  Should it have
> > > Ignore flag and
> > > Does nothing. Preserved for backward compatibility.
> > > description in common.opt, and
> > > 
> > > @item -fipa-cp-alignment
> > > @opindex -fipa-cp-alignment
> > > When enabled, this optimization propagates alignment of function
> > > parameters to support better vectorization and string operations.
> > > 
> > > This flag is enabled by default at @option{-O2} and @option{-Os}.  It
> > > requires that @option{-fipa-cp} is enabled.
> > > @option{-fipa-cp-alignment} is obsolete, use @option{-fipa-bit-cp} 
> > > instead.
> > > 
> > > removed altogether, or should it say be alias for -fipa-bit-cp?
> > 
> > Well, -fipa-bit-cp is a superset of the old -fipa-cp-alignment in the
> > sense that the latter worked only for pointers whereas the former now
> > also operates on integers.
> > 
> > We could still refuse to store results of the analysis to pointers if
> > user provided -fno-ipa-cp-alignment, but I do not think it is
> > reasonable.  My preference would be to mark it obsolete and document
> > that -fipa-bit-cp should be used instead.
> 
> What are obsolete options?  Do you mean deprecated or removed?

I am sorry, I got confused by the mail I was quoting.  I meant
deprecated, because I thought that was the appropriate thing to do.
However, I personally will not object if we remove it.

> We have ignored options retained just for compatibility that do nothing.

I'll have a look at how it is done and prepare a patch, it may take a
few days though because I am traveling now.

Martin

> -fipa-cp-alignment does nothing right now (nothing ever checks it), but
> it isn't declared as Ignore and is documented (ignored options aren't
> documented).  The current state is that e.g. x_flag_ipa_cp_alignment is
> still a field in global_options* etc.
> 
>   Jakub

[C++ patch] instantiate_decl cleanup

2017-01-31 Thread Nathan Sidwell

I've committed this patch, which is the tidying up I've done for the 
67273 and 79253 patch on its way.


instantiate_decl's cleanup is a little untidy because of some 'goto out' 
statements, that jump into the middle of the cleanup.


1) finally make defer_ok a bool
2) sort the state restoration to be in reverse order.
3) move the omp_privatization restoration to just before the out label. 
(it's saved after the last goto).


nathan
--
Nathan Sidwell
2017-01-31  Nathan Sidwell  

	* cp-tree.h (instantiate_decl): Make defer_ok bool.
	* pt.c: Fix instantiate_decl calls to pass true/false not 0/1
	(instantiate_decl): Simplify and reorder state saving and restoration.

Index: cp/cp-tree.h
===
--- cp/cp-tree.h	(revision 245065)
+++ cp/cp-tree.h	(working copy)
@@ -6189,7 +6189,7 @@ extern void do_decl_instantiation		(tree
 extern void do_type_instantiation		(tree, tree, tsubst_flags_t);
 extern bool always_instantiate_p		(tree);
 extern void maybe_instantiate_noexcept		(tree);
-extern tree instantiate_decl			(tree, int, bool);
+extern tree instantiate_decl			(tree, bool, bool);
 extern int comp_template_parms			(const_tree, const_tree);
 extern bool uses_parameter_packs(tree);
 extern bool template_parameter_pack_p   (const_tree);
Index: cp/pt.c
===
--- cp/pt.c	(revision 245065)
+++ cp/pt.c	(working copy)
@@ -10676,7 +10676,7 @@ instantiate_class_template_1 (tree type)
 	{
 	  /* Set function_depth to avoid garbage collection.  */
 	  ++function_depth;
-	  instantiate_decl (decl, false, false);
+	  instantiate_decl (decl, /*defer_ok=*/false, false);
 	  --function_depth;
 	}
 
@@ -16022,7 +16022,8 @@ tsubst_expr (tree t, tree args, tsubst_f
 	  complete_type (tmp);
 	  for (fn = TYPE_METHODS (tmp); fn; fn = DECL_CHAIN (fn))
 	if (!DECL_ARTIFICIAL (fn))
-	  instantiate_decl (fn, /*defer_ok*/0, /*expl_inst_class*/false);
+	  instantiate_decl (fn, /*defer_ok=*/false,
+/*expl_inst_class=*/false);
 	}
   break;
 
@@ -21946,7 +21947,7 @@ do_decl_instantiation (tree decl, tree s
   check_explicit_instantiation_namespace (result);
   mark_decl_instantiated (result, extern_p);
   if (! extern_p)
-instantiate_decl (result, /*defer_ok=*/1,
+instantiate_decl (result, /*defer_ok=*/true,
 		  /*expl_inst_class_mem_p=*/false);
 }
 
@@ -21984,7 +21985,7 @@ instantiate_class_member (tree decl, int
 {
   mark_decl_instantiated (decl, extern_p);
   if (! extern_p)
-instantiate_decl (decl, /*defer_ok=*/1,
+instantiate_decl (decl, /*defer_ok=*/true,
 		  /*expl_inst_class_mem_p=*/true);
 }
 
@@ -22405,15 +22406,14 @@ maybe_instantiate_noexcept (tree fn)
 }
 
 /* Produce the definition of D, a _DECL generated from a template.  If
-   DEFER_OK is nonzero, then we don't have to actually do the
+   DEFER_OK is true, then we don't have to actually do the
instantiation now; we just have to do it sometime.  Normally it is
an error if this is an explicit instantiation but D is undefined.
-   EXPL_INST_CLASS_MEM_P is true iff D is a member of an
-   explicitly instantiated class template.  */
+   EXPL_INST_CLASS_MEM_P is true iff D is a member of an explicitly
+   instantiated class template.  */
 
 tree
-instantiate_decl (tree d, int defer_ok,
-		  bool expl_inst_class_mem_p)
+instantiate_decl (tree d, bool defer_ok, bool expl_inst_class_mem_p)
 {
   tree tmpl = DECL_TI_TEMPLATE (d);
   tree gen_args;
@@ -22428,8 +22428,6 @@ instantiate_decl (tree d, int defer_ok,
   int saved_inhibit_evaluation_warnings = c_inhibit_evaluation_warnings;
   bool external_p;
   bool deleted_p;
-  tree fn_context;
-  bool nested = false;
 
   /* This function should only be used to instantiate templates for
  functions and static member variables.  */
@@ -22444,7 +22442,7 @@ instantiate_decl (tree d, int defer_ok,
  if the variable has a constant value the referring expression can
  take advantage of that fact.  */
   if (VAR_P (d))
-defer_ok = 0;
+defer_ok = false;
 
   /* Don't instantiate cloned functions.  Instead, instantiate the
  functions they cloned.  */
@@ -22668,6 +22666,8 @@ instantiate_decl (tree d, int defer_ok,
 	goto out;
 }
 
+  bool nested;
+  tree fn_context;
   fn_context = decl_function_context (d);
   nested = (current_function_decl != NULL_TREE);
   vec omp_privatization_save;
@@ -22854,16 +22854,16 @@ instantiate_decl (tree d, int defer_ok,
   else if (nested)
 pop_function_context ();
 
-out:
-  input_location = saved_loc;
-  cp_unevaluated_operand = saved_unevaluated_operand;
-  c_inhibit_evaluation_warnings = saved_inhibit_evaluation_warnings;
-  pop_deferring_access_checks ();
-  pop_tinst_level ();
   if (nested)
 restore_omp_privatization_clauses (omp_privatization_save);
 
+out:
+  pop_deferring_access_checks ();
   timevar_pop (TV_TEMPLATE_INST);
+

Re: [PATCH] Fix PGO bootstrap on x390x (PR bootstrap/78985).

2017-01-31 Thread Jeff Law


On 01/30/2017 06:32 AM, Martin Liška wrote:

On 01/30/2017 12:27 PM, Martin Liška wrote:

Hi.

Following patch simply fixes issues reported by -Wmaybe-unitialized. That 
enables PGO bootstrap
on a s390x machine.

Ready to be installed?
Martin


There's second version that adds one more hunk for s390 target.

Martin


0001-Fix-PGO-bootstrap-on-x390x-PR-bootstrap-78985-v2.patch


From 598d0a59b91070211b09056195bc0f971bc57ae1 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 30 Jan 2017 11:09:29 +0100
Subject: [PATCH] Fix PGO bootstrap on x390x (PR bootstrap/78985).

gcc/ChangeLog:

2017-01-30  Martin Liska  

PR bootstrap/78985
* config/s390/s390.c (s390_gimplify_va_arg): Initialize local
variable to NULL.
(print_operand_address): Initialize a struct to zero.
Presumably the issue with print_operand_address is that there are paths 
where s390_decompose_address can return without initializing AD/OUT. 
But AFAICT those are invalid addresses that presumably shouldn't be 
showing up in print_operand_address.


Can you add an assert in print_operand_address to ensure decomposition 
never returns false?


OK with that addition.

jeff

Re: [PATCH] Fix PGO bootstrap on x390x (PR bootstrap/78985).

2017-01-31 Thread Jeff Law


On 01/30/2017 04:27 AM, Martin Liška wrote:

Hi.

Following patch simply fixes issues reported by -Wmaybe-unitialized. That 
enables PGO bootstrap
on a s390x machine.

Ready to be installed?
Martin


0001-Fix-PGO-bootstrap-on-x390x-PR-bootstrap-78985.patch


From 3f3c3fe790ebffd038a033b6946de663e2305574 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 30 Jan 2017 11:09:29 +0100
Subject: [PATCH] Fix PGO bootstrap on x390x (PR bootstrap/78985).

gcc/ChangeLog:

2017-01-30  Martin Liska  

PR bootstrap/78985
* config/s390/s390.c (s390_gimplify_va_arg): Initialize local
variable to NULL.
Odds are jump threading didn't want do duplicate this code due to better 
CFG profile data:


 /* ... Otherwise out of the overflow area.  */

  t = ovf;
  if (size < UNITS_PER_LONG && !left_align_p)
t = fold_build_pointer_plus_hwi (t, UNITS_PER_LONG - size);

  gimplify_expr (, pre_p, NULL, is_gimple_val, fb_rvalue);

  gimplify_assign (addr, t, pre_p);

  if (size < UNITS_PER_LONG && left_align_p)
t = fold_build_pointer_plus_hwi (t, UNITS_PER_LONG);
  else
t = fold_build_pointer_plus_hwi (t, size);

  gimplify_assign (ovf, t, pre_p);


As a result of not duplicating that code we don't thread the reg != 
NULL_TREE that guards the lab_over use.



OK for the trunk,
jeff

Re: [wwwdocs] Document in changes.html -fcode-hoisting, -fipa-bit-cp, -fipa-vrp, -fsplit-loops, GCJ removal, x86 ISA additions, -fshrink-wrap-separate etc.

2017-01-31 Thread Jakub Jelinek

On Tue, Jan 31, 2017 at 12:15:36PM +0100, Martin Jambor wrote:
> > I've committed it as is, feel free to propose changes.  But before that, we
> > should figure out what to do with -fipa-cp-alignment.  Should it have
> > Ignore flag and
> > Does nothing. Preserved for backward compatibility.
> > description in common.opt, and
> > 
> > @item -fipa-cp-alignment
> > @opindex -fipa-cp-alignment
> > When enabled, this optimization propagates alignment of function
> > parameters to support better vectorization and string operations.
> > 
> > This flag is enabled by default at @option{-O2} and @option{-Os}.  It
> > requires that @option{-fipa-cp} is enabled.
> > @option{-fipa-cp-alignment} is obsolete, use @option{-fipa-bit-cp} instead.
> > 
> > removed altogether, or should it say be alias for -fipa-bit-cp?
> 
> Well, -fipa-bit-cp is a superset of the old -fipa-cp-alignment in the
> sense that the latter worked only for pointers whereas the former now
> also operates on integers.
> 
> We could still refuse to store results of the analysis to pointers if
> user provided -fno-ipa-cp-alignment, but I do not think it is
> reasonable.  My preference would be to mark it obsolete and document
> that -fipa-bit-cp should be used instead.

What are obsolete options?  Do you mean deprecated or removed?
We have ignored options retained just for compatibility that do nothing.
-fipa-cp-alignment does nothing right now (nothing ever checks it), but
it isn't declared as Ignore and is documented (ignored options aren't
documented).  The current state is that e.g. x_flag_ipa_cp_alignment is
still a field in global_options* etc.

Jakub

Re: [wwwdocs] changes.html - document new warning options

2017-01-31 Thread Martin Sebor


On 01/31/2017 03:50 AM, Aldy Hernandez wrote:

On 01/24/2017 03:07 PM, Martin Sebor wrote:

Hi Martin.

Thank you for taking care of this.


+The -Walloca-larger-than=size option
detects
+calls to the alloca function whose argument may exceed
+the specified size.
+-Walloca-larger-than is not included in either
+-Wall or -Wextra and must be explicitly
+enabled.


You should probably document that we warn, not just on arguments that
exceed a specific threshold, but arguments that are unbounded or unknown
at compile time:

foo (size_t n)
{
...
p = alloca(n);
...
}


Sure.  I added a bit of text to the end of the sentence that should
hopefully make that clearer.  The updated patch is attached.

> The -Walloca-larger-than=size option 
detects

>calls to the alloca function whose argument either may
>exceed the specified size, or that is not known
>to be sufficiently constrained to avoid exceeding it.


Let me know if you think it needs more work.

Martin
Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.49
diff -r1.49 changes.html
43d42
< 
44a44,50
>   GCC 7 can determine the return value or range of return values of
> some calls to the sprintf family of functions and make
> it available to other optimization passes.  Some calls to the 
>   snprintf function with a zero size argument can be folded
> into constants.  The optimization is included in -O1
> and can be selectively controlled by the
> -fprintf-return-value option.
285a292,412
> GCC 7 contains a number of enhancements that help detect buffer overflow
>   and other forms of invalid memory accesses.
>   
> The -Walloc-size-larger-than=size option
> 	detects calls to standard and user-defined memory allocation
> 	functions decorated with attribute alloc_size whose
> 	argument exceeds the specified size
> 	(PTRDIFF_MAX by default).  The option also detects
> 	arithmetic overflow in the computation of the size in two-argument
> 	allocation functions like calloc where the total size
> 	is the product of the two arguments.  Since calls with an excessive
> 	size cannot succeed they are typically the result of programming
> 	errors.  Such bugs have been known to be the source of
> 	vulnerabilities and a target of exploits.
>   -Walloc-size-larger-than=PTRDIFF_MAX is included
>   in -Wall.
>   For example, the following call to malloc incorrectly tries
> 	to avoid passing a negative argument to the function and instead ends
> 	up unconditionally invoking it with an argument less than or equal
> 	to zero.  Since after conversion to the type of the argument of the
>   function (size_t) a negative argument results in a value
>   in excess of the maximum PTRDIFF_MAX the call is diagnosed.
>   
> void* f (int n)
> {
>   return malloc (n > 0 ? 0 : n);
> }
> 
> warning: argument 1 range [2147483648, 4294967295] exceeds maximum object size 2147483647 [-Walloc-size-larger-than=]
> The -Walloc-zero option detects calls to standard
> 	and user-defined memory allocation functions decorated with attribute
> 	alloc_size with a zero argument.  -Walloc-zero
> 	is not included in either -Wall or -Wextra
> 	and must be explicitly enabled.
> The -Walloca option detects all calls to the
> 	alloca function in the program.  -Walloca
> 	is not included in either -Wall or -Wextra
> 	and must be explicitly enabled.
> The -Walloca-larger-than=size option detects
> 	calls to the alloca function whose argument either may
> 	exceed the specified size, or that is not known
> 	to be sufficiently constrained to avoid exceeding it.
> 	-Walloca-larger-than is not included in either
> 	-Wall or -Wextra and must be explicitly
> 	enabled.
>   For example, compiling the following snippet with
> 	-Walloca-larger-than=1024 results in a warning because
> 	even though the code appears to call alloca only with
> 	sizes of 1kb and less, since n is signed, a negative
> 	value would result in a call to the function well in excess of
> 	the limit.
>   
> void f (int n)
> {
>   char *d;
>   if (n < 1025)
> d = alloca (n);
>   else
> d = malloc (n);
>   
> }
> 
> warning: argument to 'alloca may be too large due to conversion from 'int' to 'long unsigned int' [-Walloca-larger-than=]
> The -Wformat-overflow=level option detects
> 	certain and likely buffer overflow in calls to the sprintf
> 	family of formatted output functions.  Although the option is enabled
> 	even without optimization it works best with -O2 and
> 	higher.
>   For example, in the following snippet the call to
>   sprintf is diagnosed because even though its
>   output has been constrained using the modulo operation it could
>   result in as many as three bytes if mday were negative.
>   The solution is to either allocate a larger buffer or make sure
>

Re: [PATCH] Fix PR71824

2017-01-31 Thread Sebastian Pop

On Tue, Jan 31, 2017 at 9:11 AM, Richard Biener  wrote:
> On Tue, 31 Jan 2017, Richard Biener wrote:
>
>> On Tue, 31 Jan 2017, Sebastian Pop wrote:
>>
>> > Resend as plain text to please gcc-patches@
>> >
>> > On Tue, Jan 31, 2017 at 8:39 AM, Sebastian Pop  wrote:
>> > >
>> > >
>> > > On Tue, Jan 31, 2017 at 7:49 AM, Richard Biener  
>> > > wrote:
>> > >>
>> > >>
>> > >> The following fixes an ICE that happens because instantiate_scev
>> > >> doesn't really work as expected for SESE regions (a FIXME comment
>> > >> hints at this already).  So instead of asserting all goes well
>> > >> just bail out (add_loop_constraints seems to add constraints not
>> > >> required for correctness?).
>> > >
>> > >
>> > > The conditions under which a loop is executed are required for 
>> > > correctness.
>> > > There is a similar check in scop_detection::can_represent_loop_1
>> > >
>> > > && (niter = number_of_latch_executions (loop))
>> > > && !chrec_contains_undetermined (niter)
>> > >
>> > > that is supposed to filter out all these loops where this assert does not
>> > > hold.
>> > > The question is: why scop detection has not rejected this loop?
>> > >
>> > > Well, I see that we do not check that niter can be analyzed in the 
>> > > region:
>> > > so we would need another check like this:
>> > >
>> > > diff --git a/gcc/graphite-scop-detection.c 
>> > > b/gcc/graphite-scop-detection.c
>> > > index 3860693..8e14412 100644
>> > > --- a/gcc/graphite-scop-detection.c
>> > > +++ b/gcc/graphite-scop-detection.c
>> > > @@ -931,6 +931,7 @@ scop_detection::can_represent_loop_1 (loop_p loop,
>> > > sese_l scop)
>> > >  && niter_desc.control.no_overflow
>> > >  && (niter = number_of_latch_executions (loop))
>> > >  && !chrec_contains_undetermined (niter)
>> > > +&& !chrec_contains_undetermined (scalar_evolution_in_region (scop,
>> > > loop, niter))
>> > >  && graphite_can_represent_expr (scop, loop, niter);
>> > >  }
>> > >
>> > > Could you please try this patch and see whether it fixes the problem?
>>
>> It doesn't.  It seems we test the above before the regions are
>> eventually merged?  That is, the above enters with
>>
>> $46 = (const sese_l &) @0x7fffd6f0: {
>>   entry =  7)>,
>>   exit =  8)>}
>>
>> but the failing case with
>>
>> $15 = (const sese_l &) @0x298b420: {entry = 
>> 3)>,
>>   exit =  15)>}
>
> Index: graphite-scop-detection.c
> ===
> --- graphite-scop-detection.c   (revision 245064)
> +++ graphite-scop-detection.c   (working copy)
> @@ -905,7 +905,9 @@ scop_detection::build_scop_breadth (sese
>
>sese_l combined = merge_sese (s1, s2);
>
> -  if (combined)
> +  if (combined
> +  && loop_is_valid_in_scop (loop, combined)
> +  && loop_is_valid_in_scop (loop->next, combined))

Looks good.  Thanks for the fix!

Sebastian

>  s1 = combined;
>else
>  add_scop (s2);
> @@ -931,6 +933,8 @@ scop_detection::can_represent_loop_1 (lo
>  && niter_desc.control.no_overflow
>  && (niter = number_of_latch_executions (loop))
>  && !chrec_contains_undetermined (niter)
> +&& !chrec_contains_undetermined (scalar_evolution_in_region (scop,
> +loop,
> niter))
>  && graphite_can_represent_expr (scop, loop, niter);
>  }
>
>
> seems to fix it.
>
> Richard.

[C++ PATCH] PR c++/79264 another lambda capture

2017-01-31 Thread Nathan Sidwell

PR 79264 is an ICE with concepts and lambdas, but underneath is another 
case of failing to notice this capture.


My fix for 61636 had ignored the 'g ()' case of calling a 
template member fn.


This patch adds the smarts for that, and also an assert in 
finish_member_declaration that we're inserting into being-defined 
classes.  The assert in name-lookup used to be good enough, but not anymore.


Committed to trunk.

nathan
--
Nathan Sidwell
2017-01-31  Nathan Sidwell  

	PR c++/79264
	* lambda.c (maybe_generic_this_capture): Deal with template-id-exprs.
	* semantics.c (finish_member_declaration): Assert class is being
	defined.

	PR c++/79264
	* g++.dg/cpp1y/pr61636-1.C: Augment.

Index: cp/lambda.c
===
--- cp/lambda.c	(revision 245028)
+++ cp/lambda.c	(working copy)
@@ -849,13 +849,21 @@ maybe_generic_this_capture (tree object,
 	   interest.  */
 	if (BASELINK_P (fns))
 	  fns = BASELINK_FUNCTIONS (fns);
+	bool id_expr = TREE_CODE (fns) == TEMPLATE_ID_EXPR;
+	if (id_expr)
+	  fns = TREE_OPERAND (fns, 0);
 	for (; fns; fns = OVL_NEXT (fns))
-	  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (OVL_CURRENT (fns)))
-	{
-	  /* Found a non-static member.  Capture this.  */
-	  lambda_expr_this_capture (lam, true);
-	  break;
-	}
+	  {
+	tree fn = OVL_CURRENT (fns);
+
+	if ((!id_expr || TREE_CODE (fn) == TEMPLATE_DECL)
+		&& DECL_NONSTATIC_MEMBER_FUNCTION_P (fn))
+	  {
+		/* Found a non-static member.  Capture this.  */
+		lambda_expr_this_capture (lam, true);
+		break;
+	  }
+	  }
   }
 }
 
Index: cp/semantics.c
===
--- cp/semantics.c	(revision 245028)
+++ cp/semantics.c	(working copy)
@@ -2962,6 +2962,12 @@ finish_member_declaration (tree decl)
   /* We should see only one DECL at a time.  */
   gcc_assert (DECL_CHAIN (decl) == NULL_TREE);
 
+  /* Don't add decls after definition.  */
+  gcc_assert (TYPE_BEING_DEFINED (current_class_type)
+	  /* We can add lambda types when late parsing default
+		 arguments.  */
+	  || LAMBDA_TYPE_P (TREE_TYPE (decl)));
+
   /* Set up access control for DECL.  */
   TREE_PRIVATE (decl)
 = (current_access_specifier == access_private_node);
Index: testsuite/g++.dg/cpp1y/pr61636-1.C
===
--- testsuite/g++.dg/cpp1y/pr61636-1.C	(revision 245028)
+++ testsuite/g++.dg/cpp1y/pr61636-1.C	(working copy)
@@ -1,4 +1,5 @@
 // PR c++/61636
+// PR c++/79264
 // { dg-do compile { target c++14 } }
 
 // ICE because we figure this capture too late.
@@ -28,4 +29,8 @@ void A::b() {
   auto lam2 = [&](auto asdf) { Baz (asdf); };
 
   lam2 (0);
+
+  auto lam3 = [&](auto asdf) { Baz (asdf); };
+
+  lam3 (0);
 }

Re: [PATCH] Fix PR71824

2017-01-31 Thread Sebastian Pop

On Tue, Jan 31, 2017 at 9:06 AM, Richard Biener  wrote:
> On Tue, 31 Jan 2017, Sebastian Pop wrote:
>
>> On Tue, Jan 31, 2017 at 7:49 AM, Richard Biener  wrote:
>>
>> >
>> > The following fixes an ICE that happens because instantiate_scev
>> > doesn't really work as expected for SESE regions (a FIXME comment
>> > hints at this already).  So instead of asserting all goes well
>> > just bail out (add_loop_constraints seems to add constraints not
>> > required for correctness?).
>> >
>>
>> The conditions under which a loop is executed are required for correctness.
>> There is a similar check in scop_detection::can_represent_loop_1
>>
>> && (niter = number_of_latch_executions (loop))
>> && !chrec_contains_undetermined (niter)
>>
>> that is supposed to filter out all these loops where this assert does not
>> hold.
>> The question is: why scop detection has not rejected this loop?
>>
>> Well, I see that we do not check that niter can be analyzed in the region:
>> so we would need another check like this:
>>
>> diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
>> index 3860693..8e14412 100644
>> --- a/gcc/graphite-scop-detection.c
>> +++ b/gcc/graphite-scop-detection.c
>> @@ -931,6 +931,7 @@ scop_detection::can_represent_loop_1 (loop_p loop,
>> sese_l scop)
>>  && niter_desc.control.no_overflow
>>  && (niter = number_of_latch_executions (loop))
>>  && !chrec_contains_undetermined (niter)
>> +&& !chrec_contains_undetermined (scalar_evolution_in_region (scop,
>> loop, niter))
>>  && graphite_can_represent_expr (scop, loop, niter);
>>  }
>>
>> Could you please try this patch and see whether it fixes the problem?
>
> It doesn't.  It seems we test the above before the regions are
> eventually merged?

We are supposed to verify that the merged regions are still a valid scop.
So we are probably missing in merge_sese an extra call to verify that all
the loops can be represented in the combined region.

Re: [PATCH] Fix PR71824

2017-01-31 Thread Richard Biener

On Tue, 31 Jan 2017, Richard Biener wrote:

> On Tue, 31 Jan 2017, Sebastian Pop wrote:
> 
> > Resend as plain text to please gcc-patches@
> > 
> > On Tue, Jan 31, 2017 at 8:39 AM, Sebastian Pop  wrote:
> > >
> > >
> > > On Tue, Jan 31, 2017 at 7:49 AM, Richard Biener  wrote:
> > >>
> > >>
> > >> The following fixes an ICE that happens because instantiate_scev
> > >> doesn't really work as expected for SESE regions (a FIXME comment
> > >> hints at this already).  So instead of asserting all goes well
> > >> just bail out (add_loop_constraints seems to add constraints not
> > >> required for correctness?).
> > >
> > >
> > > The conditions under which a loop is executed are required for 
> > > correctness.
> > > There is a similar check in scop_detection::can_represent_loop_1
> > >
> > > && (niter = number_of_latch_executions (loop))
> > > && !chrec_contains_undetermined (niter)
> > >
> > > that is supposed to filter out all these loops where this assert does not
> > > hold.
> > > The question is: why scop detection has not rejected this loop?
> > >
> > > Well, I see that we do not check that niter can be analyzed in the region:
> > > so we would need another check like this:
> > >
> > > diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
> > > index 3860693..8e14412 100644
> > > --- a/gcc/graphite-scop-detection.c
> > > +++ b/gcc/graphite-scop-detection.c
> > > @@ -931,6 +931,7 @@ scop_detection::can_represent_loop_1 (loop_p loop,
> > > sese_l scop)
> > >  && niter_desc.control.no_overflow
> > >  && (niter = number_of_latch_executions (loop))
> > >  && !chrec_contains_undetermined (niter)
> > > +&& !chrec_contains_undetermined (scalar_evolution_in_region (scop,
> > > loop, niter))
> > >  && graphite_can_represent_expr (scop, loop, niter);
> > >  }
> > >
> > > Could you please try this patch and see whether it fixes the problem?
> 
> It doesn't.  It seems we test the above before the regions are
> eventually merged?  That is, the above enters with
> 
> $46 = (const sese_l &) @0x7fffd6f0: {
>   entry =  7)>, 
>   exit =  8)>}
> 
> but the failing case with
> 
> $15 = (const sese_l &) @0x298b420: {entry =  
> 3)>, 
>   exit =  15)>}

Index: graphite-scop-detection.c
===
--- graphite-scop-detection.c   (revision 245064)
+++ graphite-scop-detection.c   (working copy)
@@ -905,7 +905,9 @@ scop_detection::build_scop_breadth (sese
 
   sese_l combined = merge_sese (s1, s2);
 
-  if (combined)
+  if (combined
+  && loop_is_valid_in_scop (loop, combined)
+  && loop_is_valid_in_scop (loop->next, combined))
 s1 = combined;
   else
 add_scop (s2);
@@ -931,6 +933,8 @@ scop_detection::can_represent_loop_1 (lo
 && niter_desc.control.no_overflow
 && (niter = number_of_latch_executions (loop))
 && !chrec_contains_undetermined (niter)
+&& !chrec_contains_undetermined (scalar_evolution_in_region (scop,
+loop, 
niter))
 && graphite_can_represent_expr (scop, loop, niter);
 }
 

seems to fix it.

Richard.

Re: [PATCH] Fix PR71824

2017-01-31 Thread Richard Biener

On Tue, 31 Jan 2017, Sebastian Pop wrote:

> On Tue, Jan 31, 2017 at 7:49 AM, Richard Biener  wrote:
> 
> >
> > The following fixes an ICE that happens because instantiate_scev
> > doesn't really work as expected for SESE regions (a FIXME comment
> > hints at this already).  So instead of asserting all goes well
> > just bail out (add_loop_constraints seems to add constraints not
> > required for correctness?).
> >
> 
> The conditions under which a loop is executed are required for correctness.
> There is a similar check in scop_detection::can_represent_loop_1
> 
> && (niter = number_of_latch_executions (loop))
> && !chrec_contains_undetermined (niter)
> 
> that is supposed to filter out all these loops where this assert does not
> hold.
> The question is: why scop detection has not rejected this loop?
> 
> Well, I see that we do not check that niter can be analyzed in the region:
> so we would need another check like this:
> 
> diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
> index 3860693..8e14412 100644
> --- a/gcc/graphite-scop-detection.c
> +++ b/gcc/graphite-scop-detection.c
> @@ -931,6 +931,7 @@ scop_detection::can_represent_loop_1 (loop_p loop,
> sese_l scop)
>  && niter_desc.control.no_overflow
>  && (niter = number_of_latch_executions (loop))
>  && !chrec_contains_undetermined (niter)
> +&& !chrec_contains_undetermined (scalar_evolution_in_region (scop,
> loop, niter))
>  && graphite_can_represent_expr (scop, loop, niter);
>  }
> 
> Could you please try this patch and see whether it fixes the problem?

It doesn't.  It seems we test the above before the regions are
eventually merged?  That is, the above enters with

$46 = (const sese_l &) @0x7fffd6f0: {
  entry =  7)>, 
  exit =  8)>}

but the failing case with

$15 = (const sese_l &) @0x298b420: {entry =  
3)>, 
  exit =  15)>}

Richard.

> Thanks,
> Sebastian
> 
> 
> > Bootstrap & regtest in progress (though the patch can at most turn
> > an ICE into wrong-code).
> >
> > This fixes build of 445.gobmk with -floop-nest-optimize.
> >
> > Ok?
> >
> > Thanks,
> > Richard
> >
> > 2017-01-31  Richard Biener  
> >
> > PR tree-optimization/71824
> > * graphite-sese-to-poly.c (add_loop_constraints): Bail out
> > instead of asserting.
> >
> > * gcc.dg/graphite/pr71824.c: New testcase.
> >
> > Index: gcc/graphite-sese-to-poly.c
> > ===
> > --- gcc/graphite-sese-to-poly.c (revision 245058)
> > +++ gcc/graphite-sese-to-poly.c (working copy)
> > @@ -930,7 +931,11 @@ add_loop_constraints (scop_p scop, __isl
> >/* loop_i <= expr_nb_iters */
> >gcc_assert (!chrec_contains_undetermined (nb_iters));
> >nb_iters = scalar_evolution_in_region (region, loop, nb_iters);
> > -  gcc_assert (!chrec_contains_undetermined (nb_iters));
> > +  if (chrec_contains_undetermined (nb_iters))
> > +{
> > +  isl_space_free (space);
> > +  return domain;
> > +}
> >
> >isl_pw_aff *aff_nb_iters = extract_affine (scop, nb_iters,
> >  isl_space_copy (space));
> > Index: gcc/testsuite/gcc.dg/graphite/pr71824.c
> > ===
> > --- gcc/testsuite/gcc.dg/graphite/pr71824.c (nonexistent)
> > +++ gcc/testsuite/gcc.dg/graphite/pr71824.c (working copy)
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -floop-nest-optimize" } */
> > +
> > +int a, b, d;
> > +int **c;
> > +int fn1() {
> > +while (a)
> > +  if (d) {
> > + int e = -d;
> > + for (; b < e; b++)
> > +   c[b] = 
> > +  } else {
> > + for (; b; b++)
> > +   c[b] = 
> > + d = 0;
> > +  }
> > +}
> >
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH] Fix PR71824

2017-01-31 Thread Richard Biener

On Tue, 31 Jan 2017, Sebastian Pop wrote:

> Resend as plain text to please gcc-patches@
> 
> On Tue, Jan 31, 2017 at 8:39 AM, Sebastian Pop  wrote:
> >
> >
> > On Tue, Jan 31, 2017 at 7:49 AM, Richard Biener  wrote:
> >>
> >>
> >> The following fixes an ICE that happens because instantiate_scev
> >> doesn't really work as expected for SESE regions (a FIXME comment
> >> hints at this already).  So instead of asserting all goes well
> >> just bail out (add_loop_constraints seems to add constraints not
> >> required for correctness?).
> >
> >
> > The conditions under which a loop is executed are required for correctness.
> > There is a similar check in scop_detection::can_represent_loop_1
> >
> > && (niter = number_of_latch_executions (loop))
> > && !chrec_contains_undetermined (niter)
> >
> > that is supposed to filter out all these loops where this assert does not
> > hold.
> > The question is: why scop detection has not rejected this loop?
> >
> > Well, I see that we do not check that niter can be analyzed in the region:
> > so we would need another check like this:
> >
> > diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
> > index 3860693..8e14412 100644
> > --- a/gcc/graphite-scop-detection.c
> > +++ b/gcc/graphite-scop-detection.c
> > @@ -931,6 +931,7 @@ scop_detection::can_represent_loop_1 (loop_p loop,
> > sese_l scop)
> >  && niter_desc.control.no_overflow
> >  && (niter = number_of_latch_executions (loop))
> >  && !chrec_contains_undetermined (niter)
> > +&& !chrec_contains_undetermined (scalar_evolution_in_region (scop,
> > loop, niter))
> >  && graphite_can_represent_expr (scop, loop, niter);
> >  }
> >
> > Could you please try this patch and see whether it fixes the problem?

It doesn't.  It seems we test the above before the regions are
eventually merged?  That is, the above enters with

$46 = (const sese_l &) @0x7fffd6f0: {
  entry =  7)>, 
  exit =  8)>}

but the failing case with

$15 = (const sese_l &) @0x298b420: {entry =  
3)>, 
  exit =  15)>}

Richard.

Re: [C++ PATCH] pr 67273 & 79253 lambdas, warnings & ICEs

2017-01-31 Thread Jason Merrill

On Tue, Jan 31, 2017 at 8:31 AM, Nathan Sidwell  wrote:
> On 01/30/2017 03:48 PM, Jason Merrill wrote:
>> Why can't it figure that out for itself?  We should be able to tell
>> whether its containing function is currently open.
>
> It doesn't have sufficient information (but that may not matter, see below).
>
> template  void Foo (T lam)
> {
>   lam (1u); // #1
> }
>
> template 
> void f(T x)
> {
>   auto lam = [](auto x) { return  (x); };
>
>   lam (1); // #2
>   Foo (lam);
> }
>
> void Bar ()
> {
>   f(1);
> }
>
> at #1 and #2 we end up via maybe_instantiate in instantiate_decl (to
> determine the return type when building the CALL_EXPR).  At #1
> current_function_decl is the template Foo, which is not the context of the
> closure type.
>
> At #2 we go via the same path, but current_function_decl is 'f', which is
> the context of the closure type.
>
> It seems wrong to me to push to top in one of those cases but not in the
> other.  Again, absent of generic lambdas, we'd always push to top in these
> circumstances.

Agreed.  As I was suggesting in response to one of Adam's patches, I
think we need to defer creating the closure until f is instantiated;
at that point we can resolve all names from f and so we should be able
to always push to top when instantiating the lambda.

> That said, using the predicate:
>current_function_decl
> == DECL_CONTEXT (TYPE_NAME (CP_DECL_CONTEXT (d)))
> to determine whether a lambda instantiate should push to top or not doesn't
> cause any test failures. (and does resolve the shadowing bug).

Perhaps

if (!fn_context || fn_context != current_function_decl)

?

Jason

Re: [PATCH] Fix PR71824

2017-01-31 Thread Sebastian Pop

Resend as plain text to please gcc-patches@

On Tue, Jan 31, 2017 at 8:39 AM, Sebastian Pop  wrote:
>
>
> On Tue, Jan 31, 2017 at 7:49 AM, Richard Biener  wrote:
>>
>>
>> The following fixes an ICE that happens because instantiate_scev
>> doesn't really work as expected for SESE regions (a FIXME comment
>> hints at this already).  So instead of asserting all goes well
>> just bail out (add_loop_constraints seems to add constraints not
>> required for correctness?).
>
>
> The conditions under which a loop is executed are required for correctness.
> There is a similar check in scop_detection::can_represent_loop_1
>
> && (niter = number_of_latch_executions (loop))
> && !chrec_contains_undetermined (niter)
>
> that is supposed to filter out all these loops where this assert does not
> hold.
> The question is: why scop detection has not rejected this loop?
>
> Well, I see that we do not check that niter can be analyzed in the region:
> so we would need another check like this:
>
> diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
> index 3860693..8e14412 100644
> --- a/gcc/graphite-scop-detection.c
> +++ b/gcc/graphite-scop-detection.c
> @@ -931,6 +931,7 @@ scop_detection::can_represent_loop_1 (loop_p loop,
> sese_l scop)
>  && niter_desc.control.no_overflow
>  && (niter = number_of_latch_executions (loop))
>  && !chrec_contains_undetermined (niter)
> +&& !chrec_contains_undetermined (scalar_evolution_in_region (scop,
> loop, niter))
>  && graphite_can_represent_expr (scop, loop, niter);
>  }
>
> Could you please try this patch and see whether it fixes the problem?
>
> Thanks,
> Sebastian
>

Re: libgomp: Provide prototypes for functions implemented by libgomp plugins

2017-01-31 Thread Thomas Schwinge

Hi!

On Fri, 27 Jan 2017 10:28:42 +0100, Jakub Jelinek  wrote:
> On Fri, Jan 27, 2017 at 10:19:18AM +0100, Thomas Schwinge wrote:
> > During development, I had been changing the libgomp plugin API, which
> > should have caused build failures in unmodified plugins -- but it didn't.
> > Here is patch to address that.  OK for trunk?  Should this also go into
> > release branches?

> For the prototypes, just use
>   (GOMP_OFFLOAD_get_name, GOMP_OFFLOAD_get_caps,
>   
>   GOMP_OFFLOAD_openacc_set_stream): New prototype.

Well, even if regularly done that way, it's not technically quite right:
I'm not modifying "GOMP_OFFLOAD_get_name" et al., but instead I'm adding
them (modifying the "top level" scope).  But, I changed that anyway.

> > --- libgomp/libgomp.h
> > +++ libgomp/libgomp.h

> > -  void (*exec_func) (void (*) (void *), size_t, void **, void **, int,
> > -unsigned *, void *);
> > +  typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
> 
> Please use __typeof instead of typeof, that is the keyword spelling rest of
> libgomp uses.

ACK.

> > -  void (*register_async_cleanup_func) (void *, int);
> > +  typeof (GOMP_OFFLOAD_openacc_register_async_cleanup) 
> > *register_async_cleanup_func;
> 
> Many lines are too long, plus adding 2 __s might make some just below limit
> lines over it.  Please use
>   __typeof (GOMP_OFFLOAD_openacc_register_async_cleanup)
> *register_async_cleanup_func;
> in that case.

Ugly, but I guess we still have this 80 columns limit...  ;-)

> Ok with those changes for trunk, please leave release branches as is.

Thanks for the review; committed to trunk in r245062:

commit 468af399108a74c00c9d16cbe02a780aea1b2987
Author: tschwinge 
Date:   Tue Jan 31 14:32:58 2017 +

libgomp: Provide prototypes for functions implemented by libgomp plugins

libgomp/
* libgomp-plugin.h: #include .
(GOMP_OFFLOAD_get_name, GOMP_OFFLOAD_get_caps)
(GOMP_OFFLOAD_get_type, GOMP_OFFLOAD_get_num_devices)
(GOMP_OFFLOAD_init_device, GOMP_OFFLOAD_fini_device)
(GOMP_OFFLOAD_version, GOMP_OFFLOAD_load_image)
(GOMP_OFFLOAD_unload_image, GOMP_OFFLOAD_alloc, GOMP_OFFLOAD_free)
(GOMP_OFFLOAD_dev2host, GOMP_OFFLOAD_host2dev)
(GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_can_run, GOMP_OFFLOAD_run)
(GOMP_OFFLOAD_async_run, GOMP_OFFLOAD_openacc_parallel)
(GOMP_OFFLOAD_openacc_register_async_cleanup)
(GOMP_OFFLOAD_openacc_async_test)
(GOMP_OFFLOAD_openacc_async_test_all)
(GOMP_OFFLOAD_openacc_async_wait)
(GOMP_OFFLOAD_openacc_async_wait_async)
(GOMP_OFFLOAD_openacc_async_wait_all)
(GOMP_OFFLOAD_openacc_async_wait_all_async)
(GOMP_OFFLOAD_openacc_async_set_async)
(GOMP_OFFLOAD_openacc_create_thread_data)
(GOMP_OFFLOAD_openacc_destroy_thread_data)
(GOMP_OFFLOAD_openacc_get_current_cuda_device)
(GOMP_OFFLOAD_openacc_get_current_cuda_context)
(GOMP_OFFLOAD_openacc_get_cuda_stream)
(GOMP_OFFLOAD_openacc_set_cuda_stream): New prototypes.
* libgomp.h (struct acc_dispatch_t, struct gomp_device_descr): Use
these.
* plugin/plugin-hsa.c (GOMP_OFFLOAD_load_image)
(GOMP_OFFLOAD_unload_image): Fix argument types.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_type): Fix
return type.
(GOMP_OFFLOAD_load_image): Fix argument types.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@245062 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog| 30 ++
 libgomp/libgomp-plugin.h | 43 ++-
 libgomp/libgomp.h| 70 +---
 libgomp/plugin/plugin-hsa.c  |  4 +-
 liboffloadmic/ChangeLog  | 12 
 liboffloadmic/plugin/libgomp-plugin-intelmic.cpp |  4 +-
 6 files changed, 123 insertions(+), 40 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index b4e7d6e..829a30f 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,33 @@
+2017-01-31  Thomas Schwinge  
+
+   * libgomp-plugin.h: #include .
+   (GOMP_OFFLOAD_get_name, GOMP_OFFLOAD_get_caps)
+   (GOMP_OFFLOAD_get_type, GOMP_OFFLOAD_get_num_devices)
+   (GOMP_OFFLOAD_init_device, GOMP_OFFLOAD_fini_device)
+   (GOMP_OFFLOAD_version, GOMP_OFFLOAD_load_image)
+   (GOMP_OFFLOAD_unload_image, GOMP_OFFLOAD_alloc, GOMP_OFFLOAD_free)
+   (GOMP_OFFLOAD_dev2host, GOMP_OFFLOAD_host2dev)
+   (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_can_run, GOMP_OFFLOAD_run)
+   (GOMP_OFFLOAD_async_run, GOMP_OFFLOAD_openacc_parallel)
+

Re: [PATCH] Fix ICEs with power8 fixuns_truncdi2 patterns (PR target/79197)

2017-01-31 Thread Jakub Jelinek

On Tue, Jan 31, 2017 at 09:11:11AM +0100, Jakub Jelinek wrote:
> But I don't see an iterator that would handle SFmode in vsx.md, only
> VSX_B is only
> (define_mode_iterator VSX_B [DF V4SF V2DF])
> and even most of the iterators the vsx.md insn uses don't handle SFmode.
> So, shall the condition actually be || (mode != SFmode && 
> VECTOR_UNIT_VSX_P (mode)
> ?

Variant with only allowing DF->DI for VECTOR_UNIT_VSX_P.
If vsx.md is changed to support it also for SFmode (which I'm afraid would
be far more changes), then that condition can be simplified again:

2017-01-31  Jakub Jelinek  

PR target/79197
* config/rs6000/rs6000.md (fixuns_truncdi2): Use gpc_reg_operand
instead of register_operand.  Add all the conditions of
*fixuns_truncdi2_fctiduz define_insn into condition.
(*fixuns_truncdi2_fctiduz): Put all conditions on a single line.
* config/vsx/vsx.md (vsx_fixuns_trunc2): Use VSX_B instead
of VSX_F.

* gcc.target/powerpc/pr79197.c: New test.
* gcc.c-torture/compile/pr79197.c: New test.

--- gcc/config/rs6000/rs6000.md.jj  2017-01-31 08:48:22.645210016 +0100
+++ gcc/config/rs6000/rs6000.md 2017-01-31 08:49:53.491028737 +0100
@@ -5742,16 +5742,17 @@
(set_attr "type" "fp")])
 
 (define_expand "fixuns_truncdi2"
-  [(set (match_operand:DI 0 "register_operand" "")
-   (unsigned_fix:DI (match_operand:SFDF 1 "register_operand" "")))]
-  "TARGET_HARD_FLOAT && (TARGET_FCTIDUZ || VECTOR_UNIT_VSX_P (mode))"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "")
+   (unsigned_fix:DI (match_operand:SFDF 1 "gpc_reg_operand" "")))]
+  "TARGET_HARD_FLOAT && ((TARGET_DOUBLE_FLOAT && TARGET_FPRS && TARGET_FCTIDUZ)
+|| (mode != SFmode
+&& VECTOR_UNIT_VSX_P (mode)))"
   "")
 
 (define_insn "*fixuns_truncdi2_fctiduz"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wi")
(unsigned_fix:DI (match_operand:SFDF 1 "gpc_reg_operand" ",")))]
-  "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT && TARGET_FPRS
-&& TARGET_FCTIDUZ"
+  "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT && TARGET_FPRS && TARGET_FCTIDUZ"
   "@
fctiduz %0,%1
xscvdpuxds %x0,%x1"
--- gcc/config/rs6000/vsx.md.jj 2017-01-26 09:14:27.0 +0100
+++ gcc/config/rs6000/vsx.md2017-01-31 08:45:30.164452804 +0100
@@ -1650,7 +1650,7 @@ (define_insn "vsx_fix_trunc2"
 
 (define_insn "vsx_fixuns_trunc2"
   [(set (match_operand: 0 "gpc_reg_operand" "=,?")
-   (unsigned_fix: (match_operand:VSX_F 1 "gpc_reg_operand" 
",")))]
+   (unsigned_fix: (match_operand:VSX_B 1 "gpc_reg_operand" 
",")))]
   "VECTOR_UNIT_VSX_P (mode)"
   "xcvuxs %x0,%x1"
   [(set_attr "type" "")
--- gcc/testsuite/gcc.target/powerpc/pr79197.c.jj   2017-01-31 
08:47:30.976881865 +0100
+++ gcc/testsuite/gcc.target/powerpc/pr79197.c  2017-01-31 08:47:30.976881865 
+0100
@@ -0,0 +1,11 @@
+/* PR target/79197 */
+/* { dg-do compile } */
+/* { dg-options "-O0 -mno-popcntd" } */
+
+unsigned a;
+
+void
+foo (void)
+{
+  a = *(double *) (__UINTPTR_TYPE__) 0x40;
+}
--- gcc/testsuite/gcc.c-torture/compile/pr79197.c.jj2017-01-31 
08:47:30.976881865 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr79197.c   2017-01-31 
08:47:30.976881865 +0100
@@ -0,0 +1,10 @@
+/* PR target/79197 */
+
+unsigned long b;
+
+unsigned long
+foo (float *a, float *x)
+{
+  __builtin_memcpy (a, x, sizeof (float));
+  return *a;
+}


Jakub

[PATCH] Fix PR71824

2017-01-31 Thread Richard Biener


The following fixes an ICE that happens because instantiate_scev
doesn't really work as expected for SESE regions (a FIXME comment
hints at this already).  So instead of asserting all goes well
just bail out (add_loop_constraints seems to add constraints not
required for correctness?).

Bootstrap & regtest in progress (though the patch can at most turn
an ICE into wrong-code).

This fixes build of 445.gobmk with -floop-nest-optimize.

Ok?

Thanks,
Richard

2017-01-31  Richard Biener  

PR tree-optimization/71824
* graphite-sese-to-poly.c (add_loop_constraints): Bail out
instead of asserting.

* gcc.dg/graphite/pr71824.c: New testcase.

Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 245058)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -930,7 +931,11 @@ add_loop_constraints (scop_p scop, __isl
   /* loop_i <= expr_nb_iters */
   gcc_assert (!chrec_contains_undetermined (nb_iters));
   nb_iters = scalar_evolution_in_region (region, loop, nb_iters);
-  gcc_assert (!chrec_contains_undetermined (nb_iters));
+  if (chrec_contains_undetermined (nb_iters))
+{
+  isl_space_free (space);
+  return domain;
+}
 
   isl_pw_aff *aff_nb_iters = extract_affine (scop, nb_iters,
 isl_space_copy (space));
Index: gcc/testsuite/gcc.dg/graphite/pr71824.c
===
--- gcc/testsuite/gcc.dg/graphite/pr71824.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr71824.c (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -floop-nest-optimize" } */
+
+int a, b, d;
+int **c;
+int fn1() {
+while (a)
+  if (d) {
+ int e = -d;
+ for (; b < e; b++)
+   c[b] = 
+  } else {
+ for (; b; b++)
+   c[b] = 
+ d = 0;
+  }
+}

Re: [C++ PATCH] pr 67273 & 79253 lambdas, warnings & ICEs

2017-01-31 Thread Nathan Sidwell


On 01/30/2017 03:48 PM, Jason Merrill wrote:


Why can't it figure that out for itself?  We should be able to tell
whether its containing function is currently open.


It doesn't have sufficient information (but that may not matter, see below).

template  void Foo (T lam)
{
  lam (1u); // #1
}

template 
void f(T x)
{
  auto lam = [](auto x) { return  (x); };

  lam (1); // #2
  Foo (lam);
}

void Bar ()
{
  f(1);
}

at #1 and #2 we end up via maybe_instantiate in instantiate_decl (to 
determine the return type when building the CALL_EXPR).  At #1 
current_function_decl is the template Foo, which is not the context of 
the closure type.


At #2 we go via the same path, but current_function_decl is 'f', which 
is the context of the closure type.


It seems wrong to me to push to top in one of those cases but not in the 
other.  Again, absent of generic lambdas, we'd always push to top in 
these circumstances.


That said, using the predicate:
   current_function_decl
== DECL_CONTEXT (TYPE_NAME (CP_DECL_CONTEXT (d)))
to determine whether a lambda instantiate should push to top or not 
doesn't cause any test failures. (and does resolve the shadowing bug).


In case you're wondering, if we always push to top for a lambda a bunch 
of tests fail.  It seems (at least) the access checking breaks.


nathan

--
Nathan Sidwell

Re: [PATCH][PR target/79170] fix memcmp builtin expansion sequence for rs6000 target.

2017-01-31 Thread Segher Boessenkool

On Tue, Jan 31, 2017 at 01:42:31PM +0100, Christophe Lyon wrote:
> The updated test does not link when using newlib, missing random().
> 
> The small attached patch fixes this by calling rand() instead.
> Tested on arm-none-eabi, aarch64-elf.
> 
> OK?
> 
> Christophe

Okay, thanks!


Segher


> 2017-01-31  Christophe Lyon  
> 
>   * gcc.dg/memcmp-1.c (static void test_driver_memcmp): Call
>   rand() instead of random().

Re: [PATCH][PR target/79170] fix memcmp builtin expansion sequence for rs6000 target.

2017-01-31 Thread Christophe Lyon

Hi,

On 30 January 2017 at 17:00, Peter Bergner  wrote:
> On 1/27/17 5:43 PM, Segher Boessenkool wrote:
>>
>> On Fri, Jan 27, 2017 at 12:11:05PM -0600, Aaron Sawdey wrote:
>>>
>>> +addi 9,4,7
>>> +lwbrx 10,0,9
>>> +addi 9,5,7
>>> +lwbrx 9,0,9
>>
>>
>> It would be nice if this was
>>
>> li 9,7
>> lwbrx 10,9,4
>> lwbrx 9,9,5
>
>
> Nicer still, we want the base address as the RA operand
> and the offset as the RB operand, so like so:
>
> li 9,7
> lwbrx 10,4,9
> lwbrx 9,5,9
>
> On some processors, it matters performance wise.
>
> Peter
>

The updated test does not link when using newlib, missing random().

The small attached patch fixes this by calling rand() instead.
Tested on arm-none-eabi, aarch64-elf.

OK?

Christophe
gcc/testsuite/ChangeLog:

2017-01-31  Christophe Lyon  

* gcc.dg/memcmp-1.c (static void test_driver_memcmp): Call
  rand() instead of random().


diff --git a/gcc/testsuite/gcc.dg/memcmp-1.c b/gcc/testsuite/gcc.dg/memcmp-1.c
index b4fd780..828a0ca 100644
--- a/gcc/testsuite/gcc.dg/memcmp-1.c
+++ b/gcc/testsuite/gcc.dg/memcmp-1.c
@@ -28,12 +28,12 @@ static void test_driver_memcmp (void (test_memcmp)(const 
char *, const char *, i
   for(l=0;l

Re: [PATCH, wwwdocs, committed] Update gcc-7/changes.html for LRA

2017-01-31 Thread Segher Boessenkool

On Sat, Jan 28, 2017 at 10:51:34PM +0100, Gerald Pfeifer wrote:
> On Mon, 19 Sep 2016, Segher Boessenkool wrote:
> > Index: htdocs/gcc-7/changes.html
> > ===
> > +  GCC now uses LRA by default for new targets.
> 
> > +  The PowerPC port now uses LRA by default.
> 
> All of us (on this list) know what LRA is, why this is a good
> change, etc.
> 
> Since our release notes in changes.html are also consumed by
> users and media, would it make sense to provide a little more
> color?
> 
> I applied the patch below, but there may be more we could do?

Not sure.  This really does not change anything for users.  But we
have no "news for developers" page.


Segher

Re: Patch ping Re: [PATCH] -fsanitize=address,undefined support on s390x

2017-01-31 Thread Jakub Jelinek

On Tue, Jan 31, 2017 at 01:15:20PM +0100, Andreas Krebbel wrote:
> >> 2017-01-23  Jakub Jelinek  
> >>
> >> gcc/
> >>* config/s390/s390.c (s390_asan_shadow_offset): New function.
> >>(TARGET_ASAN_SHADOW_OFFSET): Redefine.
> >> libsanitizer/
> >>* configure.tgt: Enable asan and ubsan on 64-bit s390*-*-linux*.
> 
> Ok. Thanks!
> 
> >> +/* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
> >> +
> >> +static unsigned HOST_WIDE_INT
> >> +s390_asan_shadow_offset (void)
> >> +{
> >> +  return TARGET_64BIT ? HOST_WIDE_INT_1U << 52 : HOST_WIDE_INT_UC 
> >> (0x2000);
> >> +}
> 
> These values probably come from the LLVM implementation?!
> The 64 bit offset appears to reside beyond the default task size which is 4TB 
> (1<<42). So it will
> trigger the task to be upgraded to an additional level of page tables. Sounds 
> reasonable in order to
> avoid collisions for most of the cases. The task size in that case will end 
> up as 1<<53 so the
> shadow map should end up near to the top of it.
> The 32 bit value should not be used since 32 bit is disabled - right? This 
> appears to be the default
> value for 32 bit targets. I'm not sure if we will have adjust it since we 
> only have 31 bit address
> space.

Both values reflect on what is right now in libsanitizer/asan/asan_mapping.h
(the library has to agree with the compiler on the shift).
That header has for s390{,x}:

// Default Linux/S390 mapping:
// || `[0x3000, 0x7fff]` || HighMem||
// || `[0x2600, 0x2fff]` || HighShadow ||
// || `[0x2400, 0x25ff]` || ShadowGap  ||
// || `[0x2000, 0x23ff]` || LowShadow  ||
// || `[0x, 0x1fff]` || LowMem ||
//
// Default Linux/SystemZ mapping:
// || `[0x14, 0x1f]` || HighMem||
// || `[0x128000, 0x13]` || HighShadow ||
// || `[0x12, 0x127fff]` || ShadowGap  ||
// || `[0x10, 0x11]` || LowShadow  ||
// || `[0x00, 0x0f]` || LowMem ||
...
static const u64 kDefaultShadowOffset32 = 1ULL << 29;  // 0x2000
...
static const u64 kSystemZ_ShadowOffset64 = 1ULL << 52;
...
#if SANITIZER_WORDSIZE == 32
...
#  else
#define SHADOW_OFFSET kDefaultShadowOffset32
#  endif
#else
...
#  elif defined(__s390x__)
#define SHADOW_OFFSET kSystemZ_ShadowOffset64
...

If you think some of those aren't appropriate for s390 64-bit or 32-bit,
then it would need to be changed in upstream too.
The if:

> >> --- libsanitizer/configure.tgt.jj  2017-01-23 15:25:21.0 +0100
> >> +++ libsanitizer/configure.tgt 2017-01-23 15:36:40.787456320 +0100
> >> @@ -39,6 +39,11 @@ case "${target}" in
> >>;;
> >>sparc*-*-linux*)
> >>;;
> >> +  s390*-*-linux*)
> >> +  if test x$ac_cv_sizeof_void_p = x4; then
> >> +  UNSUPPORTED=1
> >> +  fi
> >> +  ;;

part is purely because in Fedora s390 31-bit is not available anymore (only
64-bit s390x), so I can't test that.  If you could test it somewhere and
verify even 31-bit works fine, then the
  if test x$ac_cv_sizeof_void_p = x4; then
UNSUPPORTED=1
  fi
lines could be removed.

Jakub

Re: Patch ping Re: [PATCH] -fsanitize=address,undefined support on s390x

2017-01-31 Thread Andreas Krebbel

On 01/30/2017 10:31 PM, Jakub Jelinek wrote:
> Hi!
> 
> On Mon, Jan 23, 2017 at 09:36:29PM +0100, Jakub Jelinek wrote:
>> So, I've bootstrapped/regtested s390x-linux (64-bit only, don't have 32-bit
>> userland around anymore to test 31-bit) with the attached patch (and on top
>> of the PR79168 patch I'll post soon) and the
>> only regressions I got are:
>> FAIL: c-c++-common/asan/null-deref-1.c   {-O2,-O2 -flto 
>> -fno-use-linker-plugin -flto-partition=none,-O2 -flto -fuse-linker-plugin 
>> -fno-fat-lto-objects,-O3 -g,-Os}  output pattern test
>> FAIL: g++.dg/asan/deep-stack-uaf-1.C   {-O0,-O1,-O2,-O3 -g,-Os}  output 
>> pattern test
>> FAIL: c-c++-common/ubsan/overflow-vec-1.c   {-O0,-O1,-O2,-O2 -flto 
>> -fno-use-linker-plugin -flto-partition=none,-O2 -flto -fuse-linker-plugin 
>> -fno-fat-lto-objects,-O3 -g,-Os}  execution test
>> FAIL: c-c++-common/ubsan/overflow-vec-2.c   {-O0,-O1,-O2,-O2 -flto 
>> -fno-use-linker-plugin -flto-partition=none,-O2 -flto -fuse-linker-plugin 
>> -fno-fat-lto-objects,-O3 -g,-Os}  execution test
>> All but deep-stack-uaf-1.C in both check-gcc and check-g++.
>>
>> In null-deref-1.c it seems the problem is in the line for the deref,
>> the testcase is expecting runtime error on line 10, while
>> #0 0x8a6d in NullDeref c-c++-common/asan/null-deref-1.c:11   
>>   
>> #1 0x88f1 in main c-c++-common/asan/null-deref-1.c:15
>>   
>> #2 0x3ff93022c5f in __libc_start_main (/lib64/libc.so.6+0x22c5f) 
>>   
>> #3 0x896d  
>> (gcc-7.0.1-20170120/obj-s390x-redhat-linux/gcc/testsuite/g++/null-deref-1.exe+0x896d)
>>
>> is reported.
>> The second test fails
>> ERROR: AddressSanitizer: heap-use-after-free on address 0x61500205 at pc 
>> 0x8b12 bp 0x03fff8378928 sp 0x03fff8378918   
>> READ of size 1 at 0x61500205 thread T0   
>>   
>> #0 0x8b11 in main g++.dg/asan/deep-stack-uaf-1.C:33  
>>   
>> #1 0x3ffabe22c5f in __libc_start_main (/lib64/libc.so.6+0x22c5f) 
>>   
>> #2 0x89cd  
>> (gcc-7.0.1-20170120/obj-s390x-redhat-linux/gcc/testsuite/g++/deep-stack-uaf-1.exe+0x89cd)
>> will need to debug if we don't need to add further options on s390x to
>> make sure it has all the frames it is expecting.
>> The last 2 tests aren't really asan related, will debug.
>>
>> Note apparently asan_test.C isn't enabled on non-i?86/x86_64, which
>> is a big mistake, will try to change it separately.
> 
> I'd like to ping this patch, since then bootstrapped/regtested again
> several times on s390x-linux.  asan_test.C is since then enabled
> on all architectures and passes with the patch (lots of small tests),
> the overflow-vec-*.c tests fail even on powerpc*-linux, so I think the
> port is in quite good shape and for feature parity it would be nice to
> have this feature on s390x-linux.
> 
>> 2017-01-23  Jakub Jelinek  
>>
>> gcc/
>>  * config/s390/s390.c (s390_asan_shadow_offset): New function.
>>  (TARGET_ASAN_SHADOW_OFFSET): Redefine.
>> libsanitizer/
>>  * configure.tgt: Enable asan and ubsan on 64-bit s390*-*-linux*.

Ok. Thanks!

>> +/* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
>> +
>> +static unsigned HOST_WIDE_INT
>> +s390_asan_shadow_offset (void)
>> +{
>> +  return TARGET_64BIT ? HOST_WIDE_INT_1U << 52 : HOST_WIDE_INT_UC 
>> (0x2000);
>> +}

These values probably come from the LLVM implementation?!
The 64 bit offset appears to reside beyond the default task size which is 4TB 
(1<<42). So it will
trigger the task to be upgraded to an additional level of page tables. Sounds 
reasonable in order to
avoid collisions for most of the cases. The task size in that case will end up 
as 1<<53 so the
shadow map should end up near to the top of it.
The 32 bit value should not be used since 32 bit is disabled - right? This 
appears to be the default
value for 32 bit targets. I'm not sure if we will have adjust it since we only 
have 31 bit address
space.

>> +
>>  /* Initialize GCC target structure.  */
>>  
>>  #undef  TARGET_ASM_ALIGNED_HI_OP
>> @@ -15536,6 +15544,8 @@ s390_excess_precision (enum excess_preci
>>  #define TARGET_BUILD_BUILTIN_VA_LIST s390_build_builtin_va_list
>>  #undef TARGET_EXPAND_BUILTIN_VA_START
>>  #define TARGET_EXPAND_BUILTIN_VA_START s390_va_start
>> +#undef TARGET_ASAN_SHADOW_OFFSET
>> +#define TARGET_ASAN_SHADOW_OFFSET s390_asan_shadow_offset
>>  #undef TARGET_GIMPLIFY_VA_ARG_EXPR
>>  #define TARGET_GIMPLIFY_VA_ARG_EXPR

RE: [PATCH] MIPS: Fix mode mismatch error between Loongson builtin arguments and insn operands.

2017-01-31 Thread Matthew Fortune

Toma Tabacu  writes:
> The builtins for the pshufh, psllh, psllw, psrah, psraw, psrlh, psrlw
> Loongson instructions have the third argument's type set to UQI while
> its corresponding insn operand is in SImode.
> 
> This results in the following error when matching insn operands:
> 
> ../gcc/gcc/include/loongson.h: In function 'test_psllw_s':
> ../gcc/gcc/include/loongson.h:483:10: error: invalid argument to built-
> in function
>return __builtin_loongson_psllw_s (s, amount);
>   ^~
> 
> This causes the loongson-simd.c and loongson-shift-count-truncated-1.c
> tests to fail.
> 
> This patch fixes this by wrapping the QImode builtin argument inside a
> paradoxical SUBREG with SImode, which will successfully match against
> the insn operand.
> 
> Tested with mips-mti-elf.
> 
> Regards,
> Toma
> 
> gcc/
> 
>   * config/mips/mips.c (mips_expand_builtin_insn): Put the QImode
>   argument of the pshufh, psllh, psllw, psrah, psraw, psrlh, psrlw
>   builtins into an SImode paradoxical SUBREG.
> 
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index
> da7fa8f..f1ca6e2 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -16574,6 +16574,20 @@ mips_expand_builtin_insn (enum insn_code icode,
> unsigned int nops,
> 
>switch (icode)
>  {
> +/* The third argument needs to be in SImode in order to succesfully
> match
> +   the operand from the insn definition.  */

Please refer to operand here not argument as it is the second argument
to the builtin but third operand of the instruction.  Also double ss in 
successfully.

> +case CODE_FOR_loongson_pshufh:
> +case CODE_FOR_loongson_psllh:
> +case CODE_FOR_loongson_psllw:
> +case CODE_FOR_loongson_psrah:
> +case CODE_FOR_loongson_psraw:
> +case CODE_FOR_loongson_psrlh:
> +case CODE_FOR_loongson_psrlw:
> +  gcc_assert (has_target_p && nops == 3 && ops[2].mode == QImode);
> +  ops[2].value = lowpart_subreg (SImode, ops[2].value, QImode);
> +  ops[2].mode = SImode;
> +  break;
> +
>  case CODE_FOR_msa_addvi_b:
>  case CODE_FOR_msa_addvi_h:
>  case CODE_FOR_msa_addvi_w:

For the record, given paradoxical subregs are a headache...
I am OK with this on the basis that the argument to psllh etc is actually
a uint8_t which means that bits 8 upwards are guaranteed to be zero so
the subreg can be eliminated without any explicit sign or zero extension
inserted.  This is the same kind of optimisation that combine would
perform when eliminating zero extension.

Please can you check that a zero extension is inserted for the following
case with -O2 or above:

int16x4_t testme(int16x4_t s, int amount)
{
  return psllh_s (s, amount);
}

If my understanding is correct there should be an ANDI 0xff inserted
or similar.

OK with those changes and confirmation of the test above.

Thanks,
Matthew

Re: [wwwdocs] Document in changes.html -fcode-hoisting, -fipa-bit-cp, -fipa-vrp, -fsplit-loops, GCJ removal, x86 ISA additions, -fshrink-wrap-separate etc.

2017-01-31 Thread Martin Jambor

Hi,

On Sun, Jan 29, 2017 at 07:55:18PM +0100, Jakub Jelinek wrote:
> On Sun, Jan 29, 2017 at 06:08:32PM +0530, Prathamesh Kulkarni wrote:
> > > +  A new interprocedural bitwise constant propagation optimization
> > > +  has been added, which propagates knowledge about which bits of 
> > > variables
> > > +  are known to be zero (including pointer alignment information) across
> > > +  the call graph.  It can be enabled by using the 
> > > -fipa-bit-cp
> > > +  option if -fipa-cp is enabled as well, and is enabled by
> > > +  default at the -O2 optimization level and higher.
> > Hi,
> > Just a small comment:  -fipa-bit-cp makes -fipa-cp-alignment deprecated.
> 
> I've committed it as is, feel free to propose changes.  But before that, we
> should figure out what to do with -fipa-cp-alignment.  Should it have
> Ignore flag and
> Does nothing. Preserved for backward compatibility.
> description in common.opt, and
> 
> @item -fipa-cp-alignment
> @opindex -fipa-cp-alignment
> When enabled, this optimization propagates alignment of function
> parameters to support better vectorization and string operations.
> 
> This flag is enabled by default at @option{-O2} and @option{-Os}.  It
> requires that @option{-fipa-cp} is enabled.
> @option{-fipa-cp-alignment} is obsolete, use @option{-fipa-bit-cp} instead.
> 
> removed altogether, or should it say be alias for -fipa-bit-cp?

Well, -fipa-bit-cp is a superset of the old -fipa-cp-alignment in the
sense that the latter worked only for pointers whereas the former now
also operates on integers.

We could still refuse to store results of the analysis to pointers if
user provided -fno-ipa-cp-alignment, but I do not think it is
reasonable.  My preference would be to mark it obsolete and document
that -fipa-bit-cp should be used instead.

> The documentation certainly doesn't document what it does, because nothing
> ever looks at that flag, so it is effectively ignored, not just obsolete.

I'm not sure I understand, it was used by gcc 6 and the documentation
describes what it did.  The documentation of course should be updated.

Martin

[PATCH] Fix FAIL: gfortran.dg/graphite/pr68279.f90 -O (internal compiler error) (PR77318)

2017-01-31 Thread Richard Biener


The following patch fixes the graphite ICE in the fortran testsuite.
After digging and reverse engineering I concluded that the loop
passed to scalar_evolution_in_region via add_condition_to_pbb
and create_pw_aff_from_tree is bogus and instead the loop of the
condition stmt should be used to not get chrecs with evolutions in
loops that are not inside the SESE region of the SCOP.  This in
turn causes to trigger the assert because scalar_evolution_in_region
doesn't "instantiate" defs in the region (but not defined in a loop
fully contained in the region) up to parameters of the region.
So what we are really interested in is that SSA names _not_ defined
in the SESE region are registered as parameters.

Bootstrap and regtest running on x86_64-unknown-linux-gnu,
preliminary testing on all graphite.exp with {,-m32} passed
(I guess that's what we have in test coverage).

I've come up with this through reverse-engineering only, knowing
nothing about graphite so I'd appreciate a OK from the CCed
maintainers (well, "maintainers"...)

I'll try a quick before/after test of -floop-nest-optimize on SPEC 2k6.

Thanks,
Richard.

2017-01-31  Richard Biener  

PR tree-optimization/77318
* graphite-sese-to-poly.c (extract_affine): Fix assert.
(create_pw_aff_from_tree): Take loop parameter.
(add_condition_to_pbb): Pass loop of the condition to
create_pw_aff_from_tree.

Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 245052)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -407,7 +407,7 @@ extract_affine (scop_p s, tree e, __isl_
 
 case SSA_NAME:
   gcc_assert (-1 != parameter_index_in_region_1 (e, s->scop_info)
- || !invariant_in_sese_p_rec (e, s->scop_info->region, NULL));
+ || defined_in_sese_p (e, s->scop_info->region));
   res = extract_affine_name (s, e, space);
   break;
 
@@ -436,11 +436,11 @@ extract_affine (scop_p s, tree e, __isl_
 /* Returns a linear expression for tree T evaluated in PBB.  */
 
 static isl_pw_aff *
-create_pw_aff_from_tree (poly_bb_p pbb, tree t)
+create_pw_aff_from_tree (poly_bb_p pbb, loop_p loop, tree t)
 {
   scop_p scop = PBB_SCOP (pbb);
 
-  t = scalar_evolution_in_region (scop->scop_info->region, pbb_loop (pbb), t);
+  t = scalar_evolution_in_region (scop->scop_info->region, loop, t);
 
   gcc_assert (!chrec_contains_undetermined (t));
   gcc_assert (!automatically_generated_chrec_p (t));
@@ -455,8 +455,9 @@ create_pw_aff_from_tree (poly_bb_p pbb,
 static void
 add_condition_to_pbb (poly_bb_p pbb, gcond *stmt, enum tree_code code)
 {
-  isl_pw_aff *lhs = create_pw_aff_from_tree (pbb, gimple_cond_lhs (stmt));
-  isl_pw_aff *rhs = create_pw_aff_from_tree (pbb, gimple_cond_rhs (stmt));
+  loop_p loop = gimple_bb (stmt)->loop_father;
+  isl_pw_aff *lhs = create_pw_aff_from_tree (pbb, loop, gimple_cond_lhs 
(stmt));
+  isl_pw_aff *rhs = create_pw_aff_from_tree (pbb, loop, gimple_cond_rhs 
(stmt));
 
   isl_set *cond;
   switch (code)

Re: [PATCH], PR target/78597, Fix PowerPC fp-int-convert-{float128-ieee,float64x}

2017-01-31 Thread Segher Boessenkool

On Mon, Jan 30, 2017 at 08:21:00PM -0500, Michael Meissner wrote:
> This patch fixes PR target/78597 on PowerPC.  The basic problem is conversion
> between unsigned int and _Float128 fails for 0x8000.  Both power{7,8} 
> using
> simulated IEEE 128-bit floating point and power9 using hardware IEEE 128-bit
> failed in the same test.
> 
> I cut down the patches I had developed for 79038 that are waiting for GCC 8 to
> open up to include the patches that fix the problem, but don't do additional
> improvements (optimizing conversions between char/short and _Float128, and
> optimizing converting _Float128 to int/short/char and storing the result).
> 
> This patch is a little on the big side, because I deleted the two functions
> (convert_float128_to_int and (convert_int_to_float128) that were doing the
> integer/_Float128 conversions, and instead implemented them directly.  I also
> deleted the various insns that those two functions called.  It only affects
> _Float128/__float128 conversions.

Yes, it looks quite safe like this.  Thanks for the rework.

Just two very very minor things:

>  (define_expand "floatsi2"

> +  rtx op1 = operands[1]; 

Trailing space here.

> +;; Conersion between IEEE 128-bit and integer types
> +(define_insn "fix_di2_hw"

Typo (conversion).

Okay for trunk with those fixed.  Thanks!


Segher

Re: [wwwdocs] changes.html - document new warning options

2017-01-31 Thread Aldy Hernandez


On 01/24/2017 03:07 PM, Martin Sebor wrote:

Hi Martin.

Thank you for taking care of this.


+The -Walloca-larger-than=size option detects
+   calls to the alloca function whose argument may exceed
+   the specified size.
+   -Walloca-larger-than is not included in either
+   -Wall or -Wextra and must be explicitly
+   enabled.


You should probably document that we warn, not just on arguments that 
exceed a specific threshold, but arguments that are unbounded or unknown 
at compile time:


foo (size_t n)
{
...
p = alloca(n);
...
}

Thanks.
Aldy

Re: Patch to fix PR79131

2017-01-31 Thread Christophe Lyon

On 31 January 2017 at 10:05, Kyrill Tkachov  wrote:
> Hi Christophe,
>
> On 30/01/17 20:59, Christophe Lyon wrote:
>>
>> Hi Vladimir,
>>
>> On 26 January 2017 at 18:09, Vladimir Makarov  wrote:
>>>
>>> The following patch fixes
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79131
>>>
>>> The patch also adapts IP IRA in LRA because without it GCC IP RA tests
>>> become broken (it was just a luck that the tests worked before the
>>> patch).
>>>
>>> The patch was successfully bootstrapped and tested on x86-64.
>>>
>>> Committed as rev. 244942.
>>>
>>>
>> After this patch, I've noticed regressions on arm:
>> FAIL:  gcc.target/arm/neon-for-64bits-1.c scan-assembler-times vshr 0
>> Your follow-up patch didn't fix this.
>> There are now 6 vshr instructions generated.
>>
>> Can you check ?
>
>
> I've opened PR 79282.
>
> I think it may be an issue exposed by Vladimir's patch.

That's my feeling too.
Thanks for the heads-up, for some reason I did not receive an email
from bugzilla.

>
> Kyrill
>
>> Thanks,
>> Christophe
>
>

Re: [PATCH] Call symbol_summary<>::release instead of ~symbol_summary (PR, ipa/79285).

2017-01-31 Thread Richard Biener

On Tue, Jan 31, 2017 at 9:34 AM, Martin Liška  wrote:
> Hello.
>
> Calling twice ~symbol_summary (once from ipa_free_all_node_params, second 
> time from ggc free page) causes
> double calling ~hash_map, which triggers the valgrind issue. Proper fix is to 
> call ::release from
> ipa_free_all_node_params.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Ok.

Richard.

> Martin

[Committed] S/390: Rename __S390_ARCH_LEVEL to ARCH__.

2017-01-31 Thread Andreas Krebbel

We have recently added the predefined macro __S390_ARCH_LEVEL__ to
expand to a numerical value representing the architecture document
level currently in use.  However, the IBM XL compiler already has a
macro named __ARCH__ for that purpose.  So we change our macro to that
in order to improve portability.

gcc/ChangeLog:

2017-01-31  Andreas Krebbel  

* config/s390/s390-c.c (s390_cpu_cpp_builtins_internal): Rename
__S390_ARCH_LEVEL__ to __ARCH__.

gcc/testsuite/ChangeLog:

2017-01-31  Andreas Krebbel  

* gcc.target/s390/s390.exp: Rename __S390_ARCH_LEVEL__ to
__ARCH__.
---
 gcc/config/s390/s390-c.c   |  4 ++--
 gcc/testsuite/gcc.target/s390/s390.exp | 14 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c
index bf84269..8f184ea 100644
--- a/gcc/config/s390/s390-c.c
+++ b/gcc/config/s390/s390-c.c
@@ -340,8 +340,8 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
   arch_level--;
 /* Review when a new arch is added and increase the value.  */
 char dummy[23 - 2 * PROCESSOR_max] __attribute__((unused));
-sprintf (macro_def, "__S390_ARCH_LEVEL__=%d", arch_level);
-cpp_undef (pfile, "__S390_ARCH_LEVEL__");
+sprintf (macro_def, "__ARCH__=%d", arch_level);
+cpp_undef (pfile, "__ARCH__");
 cpp_define (pfile, macro_def);
   }
 
diff --git a/gcc/testsuite/gcc.target/s390/s390.exp 
b/gcc/testsuite/gcc.target/s390/s390.exp
index c74d659..107ba98 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -100,19 +100,19 @@ proc check_effective_target_s390_useable_hw { } {
int main (void)
{
asm (".machinemode zarch" : : );
-   #if __S390_ARCH_LEVEL__ >= 11
+   #if __ARCH__ >= 11
asm ("lcbb %%r2,0(%%r15),0" : : );
-   #elif __S390_ARCH_LEVEL__ >= 10
+   #elif __ARCH__ >= 10
asm ("risbgn %%r2,%%r2,0,0,0" : : );
-   #elif __S390_ARCH_LEVEL__ >= 9
+   #elif __ARCH__ >= 9
asm ("sgrk %%r2,%%r2,%%r2" : : );
-   #elif __S390_ARCH_LEVEL__ >= 8
+   #elif __ARCH__ >= 8
asm ("rosbg %%r2,%%r2,0,0,0" : : );
-   #elif __S390_ARCH_LEVEL__ >= 7
+   #elif __ARCH__ >= 7
asm ("nilf %%r2,0" : : );
-   #elif __S390_ARCH_LEVEL__ >= 6
+   #elif __ARCH__ >= 6
asm ("lay %%r2,0(%%r15)" : : );
-   #elif __S390_ARCH_LEVEL__ >= 5
+   #elif __ARCH__ >= 5
asm ("tam" : : );
#endif
#ifdef __HTM__
-- 
2.9.1

Re: Patch to fix PR79131

2017-01-31 Thread Kyrill Tkachov


Hi Christophe,

On 30/01/17 20:59, Christophe Lyon wrote:

Hi Vladimir,

On 26 January 2017 at 18:09, Vladimir Makarov  wrote:

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79131

The patch also adapts IP IRA in LRA because without it GCC IP RA tests
become broken (it was just a luck that the tests worked before the patch).

The patch was successfully bootstrapped and tested on x86-64.

Committed as rev. 244942.



After this patch, I've noticed regressions on arm:
FAIL:  gcc.target/arm/neon-for-64bits-1.c scan-assembler-times vshr 0
Your follow-up patch didn't fix this.
There are now 6 vshr instructions generated.

Can you check ?


I've opened PR 79282.

I think it may be an issue exposed by Vladimir's patch.

Kyrill


Thanks,
Christophe

[PATCH] Fix PR79278, amend ADJUST_FIELD_ALIGN interface

2017-01-31 Thread Richard Biener


This amends ADJUST_FIELD_ALIGN to always get the type of the field
as argument but make the field itself optional.  All actual target
macro implementations only look at the type of the field but FRV
(which seems to misuse ADJUST_FIELD_ALIGN to do bitfield layout
rather than using one of the three standard ways - Alex/Nick?).

This speeds up min_align_of_type (no longer needs to build a FIELD_DECL)
and thus (IMHO) makes it usable from get_object_alignment.  This
causes us no longer to return bogus answers for indirect accesses to
doubles on i?86 and expand to RTL with proper MEM_ALIGN.  (it also
makes the previous fix for PR79256 no longer necessary)

Bootstrap and regtest running on x86_64-unknown-linux-gnu - is this ok
for trunk at this stage?

grep found a ADJUST_FIELD_ALIGN use in libobjc/encoding.c but that
is fed a C string(!?) as FIELD_DECL so I discounted it as unrelated
(and grep didn't find a way this macro could be defined there)?

Ian, go_field_alignment seems to be basically a copy of min_align_of_type.

CCing maintainers (though the changes are really obvious).  If we were
to solve the FRV "issue" we could get rid of the field-decl arg instead.

Thanks,
Richard.

2017-01-31  Richard Biener  

PR tree-optimization/79256
PR middle-end/79278
* builtins.c (get_object_alignment_2): Use min_align_of_type
to extract alignment for MEM_REFs to honor BIGGEST_FIELD_ALIGNMENT
and ADJUST_FIELD_ALIGN.

* doc/tm.texi.in (ADJUST_FIELD_ALIGN): Adjust to take additional
type parameter.
* doc/tm.texi: Regenerate.
* stor-layout.c (layout_decl): Adjust.
(update_alignment_for_field): Likewise.
(place_field): Likewise.
(min_align_of_type): Likewise.
* config/arc/arc.h (ADJUST_FIELD_ALIGN): Adjust.
* config/epiphany/epiphany.h (ADJUST_FIELD_ALIGN): Likewise.
* config/epiphany/epiphany.c (epiphany_adjust_field_align): Likewise.
* config/frv/frv.h (ADJUST_FIELD_ALIGN): Likewise.
* config/frv/frv.c (frv_adjust_field_align): Likewise.
* config/i386/i386.h (ADJUST_FIELD_ALIGN): Likewise.
* config/i386/i386.c (x86_field_alignment): Likewise.
* config/rs6000/aix.h (ADJUST_FIELD_ALIGN): Likewise.
* config/rs6000/darwin.h (ADJUST_FIELD_ALIGN): Likewise.
* config/rs6000/freebsd64.h (ADJUST_FIELD_ALIGN): Likewise.
* config/rs6000/linux64.h (ADJUST_FIELD_ALIGN): Likewise.
* config/rs6000/sysv4.h (ADJUST_FIELD_ALIGN): Likewise.
* config/rs6000/rs6000.c (rs6000_special_adjust_field_align_p):
 Likewise.

go/
* go-backend.c (go_field_alignment): Adjust.

Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 245052)
+++ gcc/builtins.c  (working copy)
@@ -334,9 +334,11 @@ get_object_alignment_2 (tree exp, unsign
 Do so only if get_pointer_alignment_1 did not reveal absolute
 alignment knowledge and if using that alignment would
 improve the situation.  */
+  unsigned int talign;
   if (!addr_p && !known_alignment
- && TYPE_ALIGN (TREE_TYPE (exp)) > align)
-   align = TYPE_ALIGN (TREE_TYPE (exp));
+ && (talign = min_align_of_type (TREE_TYPE (exp)) * BITS_PER_UNIT)
+ && talign > align)
+   align = talign;
   else
{
  /* Else adjust bitpos accordingly.  */
Index: gcc/config/arc/arc.h
===
--- gcc/config/arc/arc.h(revision 245052)
+++ gcc/config/arc/arc.h(working copy)
@@ -317,8 +317,8 @@ if (GET_MODE_CLASS (MODE) == MODE_INT   \
construct.
 */
 
-#define ADJUST_FIELD_ALIGN(FIELD, COMPUTED) \
-(TYPE_MODE (strip_array_types (TREE_TYPE (FIELD))) == DFmode \
+#define ADJUST_FIELD_ALIGN(FIELD, TYPE, COMPUTED) \
+(TYPE_MODE (strip_array_types (TYPE)) == DFmode \
  ? MIN ((COMPUTED), 32) : (COMPUTED))
 
 
Index: gcc/config/epiphany/epiphany.c
===
--- gcc/config/epiphany/epiphany.c  (revision 245052)
+++ gcc/config/epiphany/epiphany.c  (working copy)
@@ -2855,12 +2855,12 @@ epiphany_special_round_type_align (tree
arrays-at-the-end-of-structs work, like for struct gcov_fn_info in
libgcov.c .  */
 unsigned
-epiphany_adjust_field_align (tree field, unsigned computed)
+epiphany_adjust_field_align (tree type, unsigned computed)
 {
   if (computed == 32
-  && TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE)
+  && TREE_CODE (type) == ARRAY_TYPE)
 {
-  tree elmsz = TYPE_SIZE (TREE_TYPE (TREE_TYPE (field)));
+  tree elmsz = TYPE_SIZE (TREE_TYPE (type));
 
   if (!tree_fits_uhwi_p (elmsz) || tree_to_uhwi (elmsz) >= 32)
return 64;
Index: gcc/config/epiphany/epiphany.h
===
---

Re: [patch] Fix PR middle-end/78468

2017-01-31 Thread Eric Botcazou

[Dave replied privately that PA-RISC is OK too].

> I'd say let's not have a middle ground - this stuff is sufficiently
> brain-twisting that I'd rather go back to a known working state. If
> there was an error in the previous patch, let's roll it back until we
> fully understand the situation.

See the audit trail of the PR, in particular comment #24:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78468#c24

In other words, the issue is more general than just STACK_DYNAMIC_OFFSET.

-- 
Eric Botcazou

Re: [PATCH v3][PR lto/79061] Fix LTO plus ASAN fails with "AddressSanitizer: initialization-order-fiasco".

2017-01-31 Thread Jakub Jelinek

On Tue, Jan 31, 2017 at 08:59:27AM +0100, Richard Biener wrote:
> > The problem is that at least right now the handling of the init-order is
> > done in 2 separate places.
> > The C++ FE emits the special libasan calls with the name of the TU at the
> > beginning and end of the static initialization.
> > And then the variables need to be registered with the libasan runtime from
> > an even earlier constructor, and this is something that is done very late
> > (where we collect all the variables).
> 
> So we are basically telling libasan a list of dynamic initialized vars plus
> when they start/end being constructed so it can catch access to uninitialized
> vars during init of others?

We are telling libasan a list of all sanitized variables (those where we
managed to insert the padding around them in the end) and in that list mark
variables with dynamic initialization.  So, we can't e.g. register vars
where we for whatever reason can't add the padding around them, and this is
something that is decided when the variables are finalized.
Thus, constructing the list early (as a VAR_DECL with initializer) means
we'd then need to be able to remove stuff from it etc.
It looks much easier to me to carry this list only in the LTO data
structures (basically remember which TU owns each var).

> 
> I don't see why we need to compose that list of vars so late then.  Just
> generate it early, say, after the first swoop of unused var removal.  Use
> weak references so later removed ones get NULL.  Or simply bite the
> bullet of asan changing code gen (well, it does that anyway) and thus
> "pin" all vars life at that point as used.
> 
> Sounds much easier to me than carrying this over LTO...
> 
> And I suppose the TU name is only used for diagnostics?  Otherwise
> the symbol name (DECL_ASSEMBLER_NAME) of the symbol could
> be used as in C++ globals need to follow ODR?

The way it works is that you register the sanitized variables and each var
has a string for the owning TU (it is used also for diagnostics, so it is
desirable to not use random strings).  Then when the start of dynamic
initialization for some TU is called, libasan poisons all dynamic_init
global variables except those that have the owning TU string equal to the
current TU.  Then the construction is run (which means it will fail
if it accesses a dynamically initialized variable from some other TU),
and finally another libasan routine is called which will unpoison all the
variables it poisoned earlier.  So, for this use it is desirable the
names are actually the TU names, what you pass in corresponding static
initialization to the libasan functions.

Jakub

[PATCH] Call symbol_summary<>::release instead of ~symbol_summary (PR, ipa/79285).

2017-01-31 Thread Martin Liška

Hello.

Calling twice ~symbol_summary (once from ipa_free_all_node_params, second time 
from ggc free page) causes
double calling ~hash_map, which triggers the valgrind issue. Proper fix is to 
call ::release from
ipa_free_all_node_params.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From ac5b1e90271a83139cbb71486ddf67e0a05df9cc Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 31 Jan 2017 06:31:29 +0100
Subject: [PATCH] Call symbol_summary<>::release instead of ~symbol_summary (PR
 ipa/79285).

gcc/ChangeLog:

2017-01-31  Martin Liska  

	PR ipa/79285
	* ipa-prop.c (ipa_free_all_node_params): Call release method
	instead of ~sumbol_summary to not to trigger double times
	dtor of hash_map.
---
 gcc/ipa-prop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 834c27d100e..3ef3d4fae9e 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3574,7 +3574,7 @@ ipa_free_all_edge_args (void)
 void
 ipa_free_all_node_params (void)
 {
-  ipa_node_params_sum->~ipa_node_params_t ();
+  ipa_node_params_sum->release ();
   ipa_node_params_sum = NULL;
 }
 
-- 
2.11.0

Re: [PATCH] Fix ICEs with power8 fixuns_truncdi2 patterns (PR target/79197)

2017-01-31 Thread Jakub Jelinek

On Mon, Jan 30, 2017 at 06:43:09PM -0600, Segher Boessenkool wrote:
> Hi Jakub, Mike,
> 
> On Mon, Jan 30, 2017 at 10:27:15PM +0100, Jakub Jelinek wrote:
> > Accoring to make mddump generated tmp-mddump.md, on powerpc the only pattern
> > with unsigned_fix:DI where the inner operand is SF or DFmode is the
> > *fixuns_truncdi2_fctiduz.
> 
> It seems like vsx_fixuns_trunc2 (in vsx.md) also wants to
> handle this.  Mike, do you remember?

Then we'd need something like (incomplete) following patch instead.
gpc_reg_operand is actually used both in *fixuns_truncdi2_fctiduz
and in vsx_fixuns_trunc2.

What I think this patch doesn't handle properly is SFmode -> DImode
unsigned_fix.  I have no idea how SFmode is represented in vector registers,
so have no idea if xscvdpuxds actually handles SFmode as well.

But I don't see an iterator that would handle SFmode in vsx.md, only
VSX_B is only
(define_mode_iterator VSX_B [DF V4SF V2DF])
and even most of the iterators the vsx.md insn uses don't handle SFmode.
So, shall the condition actually be || (mode != SFmode && 
VECTOR_UNIT_VSX_P (mode)
?

2017-01-31  Jakub Jelinek  

PR target/79197
* config/rs6000/rs6000.md (fixuns_truncdi2): Use gpc_reg_operand
instead of register_operand.  Add all the conditions of
*fixuns_truncdi2_fctiduz define_insn into condition.
(*fixuns_truncdi2_fctiduz): Put all conditions on a single line.
* config/vsx/vsx.md (vsx_fixuns_trunc2): Use VSX_B instead
of VSX_F.

* gcc.target/powerpc/pr79197.c: New test.
* gcc.c-torture/compile/pr79197.c: New test.

--- gcc/config/rs6000/rs6000.md.jj  2017-01-31 08:48:22.645210016 +0100
+++ gcc/config/rs6000/rs6000.md 2017-01-31 08:49:53.491028737 +0100
@@ -5742,16 +5742,16 @@ (define_insn_and_split "fixuns_truncdi2"
-  [(set (match_operand:DI 0 "register_operand" "")
-   (unsigned_fix:DI (match_operand:SFDF 1 "register_operand" "")))]
-  "TARGET_HARD_FLOAT && (TARGET_FCTIDUZ || VECTOR_UNIT_VSX_P (mode))"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "")
+   (unsigned_fix:DI (match_operand:SFDF 1 "gpc_reg_operand" "")))]
+  "TARGET_HARD_FLOAT && ((TARGET_DOUBLE_FLOAT && TARGET_FPRS && TARGET_FCTIDUZ)
+|| VECTOR_UNIT_VSX_P (mode))"
   "")
 
 (define_insn "*fixuns_truncdi2_fctiduz"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wi")
(unsigned_fix:DI (match_operand:SFDF 1 "gpc_reg_operand" ",")))]
-  "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT && TARGET_FPRS
-&& TARGET_FCTIDUZ"
+  "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT && TARGET_FPRS && TARGET_FCTIDUZ"
   "@
fctiduz %0,%1
xscvdpuxds %x0,%x1"
--- gcc/config/rs6000/vsx.md.jj 2017-01-26 09:14:27.0 +0100
+++ gcc/config/rs6000/vsx.md2017-01-31 08:45:30.164452804 +0100
@@ -1650,7 +1650,7 @@ (define_insn "vsx_fix_trunc2"
 
 (define_insn "vsx_fixuns_trunc2"
   [(set (match_operand: 0 "gpc_reg_operand" "=,?")
-   (unsigned_fix: (match_operand:VSX_F 1 "gpc_reg_operand" 
",")))]
+   (unsigned_fix: (match_operand:VSX_B 1 "gpc_reg_operand" 
",")))]
   "VECTOR_UNIT_VSX_P (mode)"
   "xcvuxs %x0,%x1"
   [(set_attr "type" "")
--- gcc/testsuite/gcc.target/powerpc/pr79197.c.jj   2017-01-31 
08:47:30.976881865 +0100
+++ gcc/testsuite/gcc.target/powerpc/pr79197.c  2017-01-31 08:47:30.976881865 
+0100
@@ -0,0 +1,11 @@
+/* PR target/79197 */
+/* { dg-do compile } */
+/* { dg-options "-O0 -mno-popcntd" } */
+
+unsigned a;
+
+void
+foo (void)
+{
+  a = *(double *) (__UINTPTR_TYPE__) 0x40;
+}
--- gcc/testsuite/gcc.c-torture/compile/pr79197.c.jj2017-01-31 
08:47:30.976881865 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr79197.c   2017-01-31 
08:47:30.976881865 +0100
@@ -0,0 +1,10 @@
+/* PR target/79197 */
+
+unsigned long b;
+
+unsigned long
+foo (float *a, float *x)
+{
+  __builtin_memcpy (a, x, sizeof (float));
+  return *a;
+}


Jakub

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-31 Thread Richard Biener

On Mon, Jan 30, 2017 at 6:19 PM, Aldy Hernandez  wrote:
> On 01/30/2017 10:03 AM, Richard Biener wrote:
>>
>> On Fri, Jan 27, 2017 at 12:20 PM, Aldy Hernandez  wrote:
>>>
>>> On 01/26/2017 07:29 AM, Richard Biener wrote:


 On Thu, Jan 26, 2017 at 1:04 PM, Aldy Hernandez 
 wrote:
>
>
> On 01/24/2017 07:23 AM, Richard Biener wrote:
>>>
>>>
>>>
 Your initial on-demand approach is fine to catch some of the cases but
 it
 will not handle those for which we'd need post-dominance.

 I guess we can incrementally add that.
>>>
>>>
>>>
>>> No complaints from me.
>>>
>>> This is my initial on-demand approach, with a few fixes you've commented
>>> on
>>> throughout.
>>>
>>> As you can see, there is still an #if 0 wrt to using your suggested
>>> conservative handling of memory loads, which I'm not entirely convinced
>>> of,
>>> as it pessimizes gcc.dg/loop-unswitch-1.c.  If you feel strongly about
>>> it, I
>>> can enable the code again.
>>
>>
>> It is really required -- fortunately loop-unswitch-1.c is one of the cases
>> where
>> the post-dom / always-executed bits help .  The comparison is inside the
>> loop header and thus always executed when the loop enters, so inserrting
>> it
>> on the preheader edge is fine.
>
>
> Left as is then.
>
>>
>>> Also, I enhanced gcc.dg/loop-unswitch-1.c to verify that we're actually
>>> unswitching something.  It seems kinda silly that we have various
>>> unswitch
>>> tests, but we don't actually check whether we have unswitched anything.
>>
>>
>> Heh.  It probably was added for an ICE...
>>
>>> This test was the only one in the *unswitch*.c set that I saw was
>>> actually
>>> being unswitched.  Of course, if you don't agree with my #if 0 above, I
>>> will
>>> remove this change to the test.
>>>
>>> Finally, are we even guaranteed to unswitch in loop-unswitch-1.c across
>>> architectures?  If not, again, I can remove the one-liner.
>>
>>
>> I think so.
>
>
> Left as well.
>
>>
>>>
>>> How does this look for trunk?
>>
>>
>> With a unswitch-local solution I meant to not add new files but put the
>> defined_or_undefined class (well, or rather a single function...) into
>> tree-ssa-loop-unswitch.c.
>
>
> Done.
>
>>
>> @@ -138,7 +141,7 @@ tree_may_unswitch_on (basic_block bb, struct loop
>> *loop)
>>  {
>>/* Unswitching on undefined values would introduce undefined
>>  behavior that the original program might never exercise.  */
>> -  if (ssa_undefined_value_p (use, true))
>> +  if (defined_or_undefined->is_maybe_undefined (use))
>> return NULL_TREE;
>>def = SSA_NAME_DEF_STMT (use);
>>def_bb = gimple_bb (def);
>>
>> as I said, moving this (now more expensive check) after
>>
>>   if (def_bb
>>   && flow_bb_inside_loop_p (loop, def_bb))
>> return NULL_TREE;
>>
>> this cheap check would be better.  It should avoid 99% of all calls I bet.
>
>
> Done.
>
>>
>> You can recover the loop-unswitch-1.c case by passing down
>> the using stmt and checking its BB against loop_header (the only
>> block that we trivially know is always executed when entering the region).
>> Or do that check in the caller, like
>>
>> if (bb != loop->header
>>&& ssa_undefined_value_p (use, true) /
>> defined_or_undefined->is_maybe_undefined (use))
>
>
> Done in callee.
>
>>
>> +  gimple *def = SSA_NAME_DEF_STMT (t);
>> +
>> +  /* Check that all the PHI args are fully defined.  */
>> +  if (gphi *phi = dyn_cast  (def))
>> +   {
>> + if (virtual_operand_p (PHI_RESULT (phi)))
>> +   continue;
>>
>> You should never run into virtual operands (you only walk SSA_OP_USEs).
>
>
> Done.
>
>>
>> You can stop walking at stmts that dominate the region header,
>> like with
>>
>> +  gimple *def = SSA_NAME_DEF_STMT (t);
>> /* Uses in stmts always executed when the region header
>> executes are fine.  */
>> if (dominated_by_p (CDI_DOMINATORS, loop_header, gimple_bb (def)))
>>   continue;
>
>
> H... doing this causes the PR testcase (gcc.dg/loop-unswitch-5.c in the
> attached patch to FAIL).  I haven't looked at it, but I seem to recall in
> the testcase that we could have a DEF that dominated the loop but was a mess
> of PHI's, some of whose args were undefined.
>
> Did you perhaps want to put that dominated_by_p call after the PHI arg
> checks (which seems to work)?:

Ah, yes - of course.  PHIs are not really "defs".

>   /* Check that all the PHI args are fully defined.  */
>   if (gphi *phi = dyn_cast  (def))
> ...
> ...
> ...
>
> +  /* Uses in stmts always executed when the region header executes
> +are fine.  */
> +  if (dominated_by_p (CDI_DOMINATORS, loop->header, gimple_bb (def)))
> +   continue;
> +
>/* Handle calls and memory loads conservatively.  */
>if (!is_gimple_assign (def)
>   || (gimple_assign_single_p (def)
>
>

Re: [PATCH v3][PR lto/79061] Fix LTO plus ASAN fails with "AddressSanitizer: initialization-order-fiasco".

2017-01-31 Thread Richard Biener

On Mon, Jan 30, 2017 at 4:26 PM, Jakub Jelinek  wrote:
> On Mon, Jan 30, 2017 at 04:14:40PM +0100, Richard Biener wrote:
>> > as was figured out in PR, using DECL_NAME (TRANSLATION_UNIT_DECL) does not
>> > always give us a correct module name in LTO mode because e.g. DECL_CONTEXT 
>> > of
>> > some variables can be NAMESPACE_DECL and LTO merges NAMESPACE_DECLs.
>>
>> Yes, it indeed does.  Note that GCC8+ both TU decls and NAMESPACE_DECLs
>> should no longer be neccessary and eventually vanish completely...
>> (in lto1, that is).  Can we code-gen the init order stuff early before
>> LTO write-out?
>
> The problem is that at least right now the handling of the init-order is
> done in 2 separate places.
> The C++ FE emits the special libasan calls with the name of the TU at the
> beginning and end of the static initialization.
> And then the variables need to be registered with the libasan runtime from
> an even earlier constructor, and this is something that is done very late
> (where we collect all the variables).

So we are basically telling libasan a list of dynamic initialized vars plus
when they start/end being constructed so it can catch access to uninitialized
vars during init of others?

I don't see why we need to compose that list of vars so late then.  Just
generate it early, say, after the first swoop of unused var removal.  Use
weak references so later removed ones get NULL.  Or simply bite the
bullet of asan changing code gen (well, it does that anyway) and thus
"pin" all vars life at that point as used.

Sounds much easier to me than carrying this over LTO...

And I suppose the TU name is only used for diagnostics?  Otherwise
the symbol name (DECL_ASSEMBLER_NAME) of the symbol could
be used as in C++ globals need to follow ODR?

> So the options for LTO are:
> 1) somewhere preserve (at least for dynamically_initialized vars) the TU
> it came originally from, if some dynamically_initialized var is owned by
> more TUs, just drop that flag
> 2) rewrite in LTO the dynamic_init libasan calls (register with the whole
> LTO partition name rather than individual original TUs); this has the major
> disadvantage that it will not diagnose initialization order bugs between
> vars from TUs from the same LTO partition
> 3) create the table of global vars for dynamically_initialized vars early
> (save the artificial array with the var addresses + ctor into LTO bytecode),
> at least for LTO, and then just register the non-dynamically_initialized
> vars later (not really good idea for non-LTO, we want to register all the
> vars from the whole TU together
>
> 1) looks easiest to mebut can grow varpool_node struct by a pointer size
>
> Jakub

81 matches

Mail list logo