Re: [RFC,PATCH] Introduce -msdata=explicit for powerpc

2018-08-07 Thread Alexandre Oliva
On Aug  7, 2018, Segher Boessenkool  wrote:

> Hi!

Hi!

> On Tue, Aug 07, 2018 at 02:18:59AM -0300, Alexandre Oliva wrote:

>> I saw comments, docs and init code that suggested the possibility of
>> using r2/.sdata2 for small data, but I couldn't get code to be generated
>> for such access, even with very old toolchains.  I'm not sure I'm just
>> missing how this magic comes about, or whether it hasn't been available
>> for a very long time but nobody removed the comments/docs.  Assuming I'm
>> missing something, I put in the possibility of using r2 in the test in
>> the patch, but I'm sure I have not exercised it to make sure I got it
>> right.  Help?

> attribute((section("sdata2")))  works, perhaps.  Nothing is put there
> implicitly afaics.

Yeah, that's the point.  Docs say:

@item -msdata=eabi
@opindex msdata=eabi
On System V.4 and embedded PowerPC systems, put small initialized
@code{const} global and static data in the @code{.sdata2} section, which
is pointed to by register @code{r2}.  Put small initialized
non-@code{const} global and static data in the @code{.sdata} section,
which is pointed to by register @code{r13}.  [...]

and rs6000.c contains:

rs6000_elf_asm_init_sections (void)
{
[...]
  sdata2_section
= get_unnamed_section (SECTION_WRITE, output_section_asm_op,
   SDATA2_SECTION_ASM_OP);

but sdata2_section seems to be unused, and I don't see anything that
selects r2 over SMALL_DATA_REG.


>> I have not YET given this much testing.  I'm posting it so as to give
>> ppc maintainers an opportunity to comment on the proposed approach, in
>> hopes of getting buy-in for the idea, if not necessarily for the patch,
>> but I welcome alternate suggestions to enable users to choose what goes
>> in faster sdata when there's too much data for size-based assignment to
>> place interesting variables in sdata without overflowing its range.

> There is 64kB of sdata, and the maximum size of an object to be put there
> is 8 bytes by default.  That will require an awful lot of manual markup.

In this case, it's machine-generated code we're talking about.  IIUC one
large-ish data set is performance critical, and another data set that
would take them both over the edge isn't, though they're both used
together in the same translation units.

I suppose defining a fastint C macro / Fast_Integer Ada subtype in some
C header file or Ada package could make it a lot more appealing and easy
to use ;-)

> Well I'm sure you try it out on a reasonably sized
> project, and you'll find out if it is handy or not :-)

I wanted to run this design through port maintainers to check for
objections before offering it to the interested customer.  If you tell
me there's no fundamental objection, I will have it run through the
customer to confirm it will indeed meet their needs.

Thanks,

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist


Re: [PATCH] Make strlen range computations more conservative

2018-08-07 Thread Martin Sebor

On 08/07/2018 11:44 AM, Richard Biener wrote:

On August 7, 2018 4:37:00 PM GMT+02:00, Martin Sebor  wrote:

On 08/07/2018 02:51 AM, Richard Biener wrote:

On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor wrote:

On 07/26/2018 02:55 AM, Richard Biener wrote:

On Wed, 25 Jul 2018, Martin Sebor wrote:


BUT - for the string_constant and c_strlen functions we are,
in all cases we return something interesting, able to look
at an initializer which then determines that type.  Hopefully.
I think the strlen() folding code when it sets SSA ranges
now looks at types ...?

Consider

struct X { int i; char c[4]; int j;};
struct Y { char c[16]; };

void foo (struct X *p, struct Y *q)
{
  memcpy (p, q, sizeof (struct Y));
  if (strlen ((char *)(struct Y *)p + 4) < 7)
abort ();
}

here the GIMPLE IL looks like

  const char * _1;

  <bb 2> [local count: 1073741825]:
  _5 = MEM[(char * {ref-all})q_4(D)];
  MEM[(char * {ref-all})p_6(D)] = _5;
  _1 = p_6(D) + 4;
  _2 = __builtin_strlen (_1);

and I guess Martin would argue that since p is of type struct X
+ 4 gets you to c[4] and thus strlen of that cannot be larger
than 3.  But of course the middle-end doesn't work like that
and luckily we do not try to draw such conclusions or we
are somehow lucky that for the testcase as written above we do

not

(I'm not sure whether Martins changes in this area would derive
such conclusions in principle).


Only if the strlen argument were p->c.


NOTE - we do not know the dynamic type here since we do not know
the dynamic type of the memory pointed-to by q!  We can only
derive that at q+4 there must be some object that we can
validly call strlen on (where Martin again thinks strlen
imposes constrains that memchr does not - sth I do not agree
with from a QOI perspective)


The dynamic type is a murky area.


It's well-specified in the middle-end.  A store changes the
dynamic type of the stored-to object.  If that type is
compatible with the surrounding objects dynamic type that one
is not affected, if not then the surrounding objects dynamic
type becomes unspecified.  There is TYPE_TYPELESS_STORAGE
to somewhat control "compatibility" of subobjects.


I never responded to this.  Using a dynamic (effective) type as
you describe it would invalidate the aggressive loop optimization
in the following:

  void foo (struct X *p)
  {
  struct Y y = { "12345678" };
  memcpy (p, &y, sizeof (struct Y));

  // *p effective type is now struct Y

  int n = 0;
  while (p->c[n])
++n;

  if (n < 7)
abort ();
  }

GCC unconditionally aborts, just as it does with strlen(p->c).
Why is that not wrong (in either case)?

Because the code is invalid either way, for two reasons:


No, because the storage has only 4 non-null characters starting at

offset 4?

No, for the reasons below.  I made a mistake of making
the initializer string too short.  If we make it longer it
still aborts.  Say with this

  struct Y y = { "123456789012345" };

we end up with this DCE:

 struct Y y;

  <bb 2> :
  MEM[(char * {ref-all})p_6(D)] = 0x353433323130393837363534333231;
  __builtin_abort ();

With -fdump-tree-cddce1-details (and a patch to show the upper
bound) we see:

  Found loop 1 to be finite: upper bound found: 3.

With -fno-aggressive-loop-optimizations the abort becomes
conditional because the array bound isn't considered. I would
expect you to know this since you implemented the feature.


Honza added the array indexing part and it may very well be too aggressive. I 
have to take a closer look after vacation to tell. Can you open a PR and CC me 
there?


I opened bug 86884.

Martin



Richard.



Martin



1) it accesses an object of (effective) type struct Y via
   an lvalue of type struct X (specifically via (*p).c)
2) it relies on p->c

The loop optimization relies on the exact same requirement
as the strlen one.  Either they are both valid or neither is.

Martin








Re: [PATCH, rs6000] Correct descriptions of __builtin_bcdadd* and _builtin_bcdsub* functions

2018-08-07 Thread Segher Boessenkool
Hi Kelvin,

On Tue, Aug 07, 2018 at 05:42:48PM -0500, Kelvin Nilsen wrote:
> 
> My "consistency" check was against the implementation.
> 
> On 8/2/18 11:38 AM, Segher Boessenkool wrote:
> > On Wed, Aug 01, 2018 at 02:55:22PM -0500, Kelvin Nilsen wrote:
> >> Several errors were discovered in the descriptions of the 
> >> __builtin_bcdadd, __builtin_bcdadd_lt, __builtin_bcdadd_eq, 
> >> __builtin_bcdadd_gt, __builtin_bcdadd_ov, __builtin_bcdsub, 
> >> __builtin_bcdsub_lt, __builtin_bcdsub_eq, __builtin_bcdsub_gt, and 
> >> __builtin_bcdsub_ov functions.  This patch corrects these documentation 
> >> errors.
> > 
> > What did you check this against?  The ABI doc, or what is currently
> > implemented?  Neither is very clear to me :-/

The ABI calls for "const int" here.  Ah, the code only checks for 0 or
1 I think?  Please change the documentation to const int then.  Or,
alternatively, what do I miss?

Okay for trunk with const int.  Thanks!


Segher


Re: [PATCH] convert braced initializers to strings (PR 71625)

2018-08-07 Thread Martin Sebor

On 08/07/2018 05:31 AM, Joseph Myers wrote:

On Tue, 7 Aug 2018, Martin Sebor wrote:


2) skipping embedded nuls made it possible to create a string
with fewer elements than the initializer array, which caused
arrays with unspecified bound to be smaller than they would
have been otherwise


I think there should be explicit tests of sizeof for arrays with
unspecified bound - to make sure both that it isn't any smaller than it
should be, but also that any NULs implicitly added for a STRING_CST don't
make the arrays any larger than their size should be for the originally
given initializer that doesn't have a 0 as the last element.


I added some more tests to the latest revision of the patch.
Please see it in my other response.

Thanks
Martin



Re: [PATCH] convert braced initializers to strings (PR 71625)

2018-08-07 Thread Martin Sebor

On 08/07/2018 02:57 AM, Jason Merrill wrote:

On Wed, Aug 1, 2018 at 12:49 AM, Martin Sebor  wrote:

On 07/31/2018 07:38 AM, Jason Merrill wrote:


On Tue, Jul 31, 2018 at 9:51 AM, Martin Sebor  wrote:


The middle-end contains code to determine the lengths of constant
character arrays initialized by string literals.  The code is used
in a number of optimizations and warnings.

However, the code is unable to deal with constant arrays initialized
using the braced initializer syntax, as in

  const char a[] = { '1', '2', '\0' };

The attached patch extends the C and C++ front-ends to convert such
initializers into a STRING_CST form.

The goal of this work is to both enable existing optimizations for
such arrays, and to help detect bugs due to using non-nul terminated
arrays where nul-terminated strings are expected.  The latter is
an extension of the GCC 8 _Wstringop-overflow and
-Wstringop-truncation warnings that help detect or prevent reading
past the end of dynamically created character arrays.  Future work
includes detecting potential past-the-end reads from uninitialized
local character arrays.




  && TYPE_MAIN_VARIANT (TREE_TYPE (valtype)) == char_type_node)



Why? Don't we want this for other character types as well?


It suppresses narrowing warnings for things like

  signed char a[] = { 0xff };

(there are a couple of tests that exercise this).


Why is plain char different in this respect?  Presumably one of

char a[] = { -1 };
char b[] = { 0xff };

should give the same narrowing warning, depending on whether char is signed.


Right.  I've added more tests to verify that it does.


At the same time, STRING_CST is supposed to be able to represent
strings of any integer type so there should be a way to make it
work.  On the flip side, recent discussions of changes in this
area suggest there may be bugs in the wide character handling of
STRING_CST so those would need to be fixed before relying on it
for robust support.

In any case, if you have a suggestion for how to make it work for
at least the narrow character types I'll adjust the patch.


I suppose braced_list_to_string should call check_narrowing for C++.


I see.  I've made that change.  That has made it possible to
convert arrays of all character types.  Thanks!


Currently it uses tree_fits_shwi_p (signed host_wide_int) and then
stores the extracted value in a host unsigned int, which is then
converted to host char.  Does the right thing happen for -fsigned-char
or targets with a different character set?


I believe so.  I've added tests for these too (ASCII and EBCDIC)
and also changed the type of the extracted value to HWI to match
(it doesn't change the results of the tests).

Attached is an updated patch with these changes plus more tests
as suggested by Joseph.

Martin
PR tree-optimization/71625 - missing strlen optimization on different array initialization style

gcc/c/ChangeLog:

	PR tree-optimization/71625
	* c-parser.c (c_parser_declaration_or_fndef): Call
	braced_list_to_string.

gcc/c-family/ChangeLog:

	PR tree-optimization/71625
	* c-common.c (braced_list_to_string): New function.
	* c-common.h (braced_list_to_string): Declare it.

gcc/cp/ChangeLog:

	PR tree-optimization/71625
	* parser.c (cp_parser_init_declarator):  Call braced_list_to_string.
	(eval_check_narrowing): New function.

gcc/testsuite/ChangeLog:

	PR tree-optimization/71625
	* g++.dg/init/string2.C: New test.
	* g++.dg/init/string3.C: New test.
	* gcc.dg/strlenopt-55.c: New test.
	* gcc.dg/strlenopt-56.c: New test.

Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c	(revision 263372)
+++ gcc/c/c-parser.c	(working copy)
@@ -2126,6 +2126,15 @@ c_parser_declaration_or_fndef (c_parser *parser, b
 	  if (d != error_mark_node)
 		{
 		  maybe_warn_string_init (init_loc, TREE_TYPE (d), init);
+
+		  /* Try to convert a string CONSTRUCTOR into a STRING_CST.  */
+		  tree valtype = TREE_TYPE (init.value);
+		  if (TREE_CODE (init.value) == CONSTRUCTOR
+		  && TREE_CODE (valtype) == ARRAY_TYPE
+		  && TYPE_STRING_FLAG (TREE_TYPE (valtype)))
+		if (tree str = braced_list_to_string (valtype, init.value))
+		  init.value = str;
+
 		  finish_decl (d, init_loc, init.value,
 			   init.original_type, asm_name);
 		}
Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 263372)
+++ gcc/c-family/c-common.c	(working copy)
@@ -8509,4 +8509,84 @@ maybe_add_include_fixit (rich_location *richloc, c
   free (text);
 }
 
+/* Attempt to convert a braced array initializer list CTOR for array
+   TYPE into a STRING_CST for convenience and efficiency.  When non-null,
+   use EVAL to attempt to evaluate constants (used by C++).  Return
+   the converted string on success or null on failure.  */
+
+tree
+braced_list_to_string (tree type, tree ctor, tree (*eval)(tree, tree))
+{
+  /* If the array has an explicit bound, use 

Re: [PATCH, rs6000] Early gimple folding of vec_mergeh and vec_mergel for float

2018-08-07 Thread Segher Boessenkool
Hi!

On Tue, Aug 07, 2018 at 02:24:58PM -0500, Will Schmidt wrote:
>This adds support for gimple folding of vec_mergeh and vec_mergel
> for float and double types.   Support for the integral types is already
> in-tree.

> +  /* The permute_type will match the lhs for integral types.  For double and
> + float types, the permute type needs to map to the V2 or V4 type that
> + matches size.  */
> +  tree permute_type;
> +  if (INTEGRAL_TYPE_P (TREE_TYPE (lhs_type)))
> +permute_type = lhs_type;
> +  else
> +if (TREE_TYPE (lhs_type) == TREE_TYPE (V2DF_type_node))
> +  permute_type = V2DI_type_node;
> +else if (TREE_TYPE (lhs_type) == TREE_TYPE (V4SF_type_node))
> +  permute_type = V4SI_type_node;
> +else
> +  gcc_unreachable ();

Please write this as

  if (INTEGRAL_TYPE_P (TREE_TYPE (lhs_type)))
permute_type = lhs_type;
  else if (TREE_TYPE (lhs_type) == TREE_TYPE (V2DF_type_node))
permute_type = V2DI_type_node;
  else if (TREE_TYPE (lhs_type) == TREE_TYPE (V4SF_type_node))
permute_type = V4SI_type_node;
  else
gcc_unreachable ();

or, if you want to emphasize integer vs. float:

  if (INTEGRAL_TYPE_P (TREE_TYPE (lhs_type)))
permute_type = lhs_type;
  else
{
  if (TREE_TYPE (lhs_type) == TREE_TYPE (V2DF_type_node))
permute_type = V2DI_type_node;
  else if (TREE_TYPE (lhs_type) == TREE_TYPE (V4SF_type_node))
permute_type = V4SI_type_node;
  else
gcc_unreachable ();
}

Okay for trunk with that changed.  Thanks!


Segher


Re: [PATCH v2 01/10] Initial TI PRU GCC port

2018-08-07 Thread Sandra Loosemore

On 07/28/2018 07:44 AM, Dimitar Dimitrov wrote:

ChangeLog:

2018-07-27  Dimitar Dimitrov  

* configure: Regenerate.
* configure.ac: Add PRU target.

gcc/ChangeLog:

2018-07-27  Dimitar Dimitrov  

* common/config/pru/pru-common.c: New file.
* config.gcc: Add PRU target.
* config/pru/alu-zext.md: New file.
* config/pru/constraints.md: New file.
* config/pru/predicates.md: New file.
* config/pru/pru-opts.h: New file.
* config/pru/pru-passes.c: New file.
* config/pru/pru-pragma.c: New file.
* config/pru/pru-protos.h: New file.
* config/pru/pru.c: New file.
* config/pru/pru.h: New file.
* config/pru/pru.md: New file.
* config/pru/pru.opt: New file.
* config/pru/t-pru: New file.
* doc/extend.texi: Document PRU pragmas.
* doc/invoke.texi: Document PRU-specific options.
* doc/md.texi: Document PRU asm constraints.


I have a few nit-picky comments about the documentation parts.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1dcdfb51c47..ff3245ed3eb 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1041,6 +1041,10 @@ See RS/6000 and PowerPC Options.
  -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg} @gol
  -mstack-protector-guard-offset=@var{offset}}
  
+@emph{PRU Options}

+@gccoptlist{-mmcu=@var{mcu}  -minrt -mno-relax -mloop @gol
+-mabi=@var{variant} @gol}
+
  @emph{RISC-V Options}
  @gccoptlist{-mbranch-cost=@var{N-instruction} @gol
  -mplt  -mno-plt @gol


I think all the @gccoptlist items in this section are uniformly
formatted with two spaces between each option, or they should be.  :-S




@@ -23093,6 +23098,56 @@ the offset with a symbol reference to a canary in the 
TLS block.
  @end table
  
  
+@node PRU Options

+@subsection PRU Options
+@cindex PRU Options
+
+These command-line options are defined for PRU target:
+
+@table @gcctabopt
+@item -minrt
+@opindex minrt
+Enable the use of a minimum runtime environment - no static


Format long dashes like "environment---no"; that is, three dashes, and 
no space around them.



+initializers or constructors.  Results in significant code size
+reduction of final ELF binaries.
+
+@item -mmcu=@var{mcu}
+@opindex mmcu
+Specify the PRU MCU variant to use.  Check newlib for exact list of options.


s/newlib/Newlib/


+
+@item -mno-relax
+@opindex mno-relax
+Pass on (or do not pass on) the @option{-mrelax} command-line option
+to the linker.
+
+@item -mloop
+@opindex mloop
+Allow (or do not allow) gcc to use the LOOP instruction.


s/gcc/GCC/


+
+@item -mabi=@var{variant}
+@opindex mabi
+Specify the ABI variant to output code for.  Permissible values are @samp{gnu}
+for GCC, and @samp{ti} for fully conformant TI ABI.  These are the differences:
+
+@table @samp
+@item Function Pointer Size
+TI ABI specifies that function (code) pointers are 16-bit, whereas GCC
+supports only 32-bit data and code pointers.
+
+@item Optional Return Value Pointer
+Function return values larger than 64-bits are passed by using a hidden


s/64-bits/64 bits/


+pointer as the first argument of the function.  TI ABI, though, mandates that
+the pointer can be NULL in case the caller is not using the returned value.
+GCC always passes and expects a valid return value pointer.
+
+@end table
+
+The current @samp{mabi=ti} implementation will simply raise a compile error
+when any of the above code constructs is detected.


s/will simply raise/simply raises/


+
+@end table
+
+
  @node RISC-V Options
  @subsection RISC-V Options
  @cindex RISC-V Options


-Sandra


Re: [PATCH, rs6000] Correct descriptions of __builtin_bcdadd* and _builtin_bcdsub* functions

2018-08-07 Thread Kelvin Nilsen


My "consistency" check was against the implementation.

On 8/2/18 11:38 AM, Segher Boessenkool wrote:
> Hi Kelvin,
> 
> On Wed, Aug 01, 2018 at 02:55:22PM -0500, Kelvin Nilsen wrote:
>> Several errors were discovered in the descriptions of the __builtin_bcdadd, 
>> __builtin_bcdadd_lt, __builtin_bcdadd_eq, __builtin_bcdadd_gt, 
>> __builtin_bcdadd_ov, __builtin_bcdsub, __builtin_bcdsub_lt, 
>> __builtin_bcdsub_eq, __builtin_bcdsub_gt, and __builtin_bcdsub_ov functions. 
>>  This patch corrects these documentation errors.
> 
> What did you check this against?  The ABI doc, or what is currently
> implemented?  Neither is very clear to me :-/
> 
> 
> Segher
> 
> 



Re: PR libstdc++/68222 Hide safe iterator operators

2018-08-07 Thread Jonathan Wakely

On 07/08/18 14:47 +0100, Jonathan Wakely wrote:

On 02/08/18 22:16 +0200, François Dumont wrote:

Hi

    Here is a patch to avoid definition of invalid operators on the 
Debug mode safe iterator type depending on its category.


    Even if it is limited to the Debug mode code I would like to 
have a feedback before committing. Especially on the following 
points:


- _Safe_tagged_iterator: Is the name ok ?


Hmm, maybe "strict" instead of tagged?

But do we need a new name? Can we just change _Safe_iterator instead
of adding a new type?

Where is _Safe_iterator still used? Just local iterators in unordered
containers?  Is it OK to remove most of the functions that used to
support it? (__niter_base etc).

Could we add a new type for the local iterators, and just change
_Safe_iterator directly so it doesn't expose unsupported operations?

That would make the patch *much* smaller, as you wouldn't need to
change all the uses of _Safe_iterator.


Another approach would be to use mixins to expose the operations:

template<typename _Iter, typename _Cat>
struct _Safe_iterator_mixin
{
typename iterator_traits<_Iter>::reference
operator*()
{ return static_cast<_Iter*>(this)->_M_deref(); }
};

template<typename _Iter>
struct _Safe_iterator_mixin<_Iter, forward_iterator_tag>
{
_Iter& operator++()
{ return static_cast<_Iter*>(this)->_M_preinc(); }
_Iter operator++(int)
{ return static_cast<_Iter*>(this)->_M_postinc(); }
};

template<typename _Iter>
struct _Safe_iterator_mixin<_Iter, bidirectional_iterator_tag>
{
_Iter& operator--()
{ return static_cast<_Iter*>(this)->_M_predec(); }
_Iter operator--(int)
{ return static_cast<_Iter*>(this)->_M_postdec(); }
};

etc.

then in _Safe_iterator rename the operator functions, so operator*
becomes _M_deref, operator++ becomes _M_preinc etc. and then derive
from _Safe_iterator_mixin which declares the operators.


FWIW I think your proposal with partial specializations for each
iterator category is probably better than this mixins idea (although I
think it requires a lot more code to implement it).

But I would like to know if it's possible to just change
_Safe_iterator instead of introducing a new _Safe_tagged_iterator
type.



Re: PR libstdc++/68222 Hide safe iterator operators

2018-08-07 Thread Jonathan Wakely

On 02/08/18 22:16 +0200, François Dumont wrote:

+#if __cplusplus >= 201103L
+  /** @brief Copy assignment. */
+  _Safe_tagged_iterator&
+  operator=(const _Safe_tagged_iterator&) = default;
+
+  /** @brief Move assignment. */
+  _Safe_tagged_iterator&
+  operator=(_Safe_tagged_iterator&&) = default;
+#else
+  /**
+   * @brief Copy assignment.
+   */
+  _Safe_tagged_iterator&
+  operator=(const _Safe_tagged_iterator& __x) _GLIBCXX_NOEXCEPT


This _GLIBCXX_NOEXCEPT can be removed, because it expands to nothing
for C++98 mode.



+  {
+   _Safe_base::operator=(__x);
+   return *this;
+  }
+#endif


[PATCH][OpenACC] Update deviceptr handling during gimplification

2018-08-07 Thread Cesar Philippidis
I had previously posted this patch as part of a monster deviceptr patch
here . This
patch breaks out the generic gimplifier changes. Essentially, with this
patch, the gimplifier will now transfer deviceptr data clauses using
GOMP_MAP_FORCE_DEVICEPTR.

Is this patch OK for trunk? It bootstrapped / regression tested cleanly
for x86_64 with nvptx offloading.

Thanks,
Cesar
From b5cf37b795ce78c78f3f434ac6999f7094bd86aa Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Mon, 7 May 2018 08:23:48 -0700
Subject: [PATCH] [OpenACC] Update deviceptr handling

2018-XX-YY  Cesar Philippidis  

	gcc/fortran/
	* trans-openmp.c (gfc_omp_finish_clause): Don't create pointer data
	mappings for deviceptr clauses.
	(gfc_trans_omp_clauses): Likewise.
	gcc/
	* gimplify.c (enum gimplify_omp_var_data): Add GOVD_DEVICETPR.
	(omp_notice_variable): Add GOVD_DEVICEPTR attribute when appropriate.
	(gimplify_scan_omp_clauses): Likewise.
	(gimplify_adjust_omp_clauses_1): Set GOMP_MAP_FORCE_DEVICEPTR for
	implicit deviceptr mappings.
	gcc/testsuite/
	* c-c++-common/goacc/deviceptr-4.c: Update expected data mapping.

(cherry picked from openacc-gcc-7-branch commit
d3de16b461545aac1925f0d7c2851c8c49a07d06 and commit
f0514fe1899666bb5b8ee52601f5d4263d4c4646)
---
 gcc/fortran/trans-openmp.c |  9 +
 gcc/gimplify.c | 12 +++-
 gcc/testsuite/c-c++-common/goacc/deviceptr-4.c |  2 +-
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index f038f4c..ca31c88 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1060,6 +1060,8 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
 }
 
   tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE;
+  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
+return;
   if (POINTER_TYPE_P (TREE_TYPE (decl)))
 {
   if (!gfc_omp_privatize_by_reference (decl)
@@ -2111,6 +2113,12 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 	  if (n->expr == NULL || n->expr->ref->u.ar.type == AR_FULL)
 		{
 		  if (POINTER_TYPE_P (TREE_TYPE (decl))
+		  && n->u.map_op == OMP_MAP_FORCE_DEVICEPTR)
+		{
+		  OMP_CLAUSE_DECL (node) = decl;
+		  goto finalize_map_clause;
+		}
+		  else if (POINTER_TYPE_P (TREE_TYPE (decl))
 		  && (gfc_omp_privatize_by_reference (decl)
 			  || GFC_DECL_GET_SCALAR_POINTER (decl)
 			  || GFC_DECL_GET_SCALAR_ALLOCATABLE (decl)
@@ -2282,6 +2290,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		  ptr2 = fold_convert (sizetype, ptr2);
 		  OMP_CLAUSE_SIZE (node3)
 		= fold_build2 (MINUS_EXPR, sizetype, ptr, ptr2);
+		finalize_map_clause:;
 		}
 	  switch (n->u.map_op)
 		{
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 4a109ae..bcf862f 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -105,6 +105,9 @@ enum gimplify_omp_var_data
   /* Flag for GOVD_MAP: must be present already.  */
   GOVD_MAP_FORCE_PRESENT = 524288,
 
+  /* Flag for OpenACC deviceptrs.  */
+  GOVD_DEVICEPTR = (1<<21),
+
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
 			   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
 			   | GOVD_LOCAL)
@@ -7232,6 +7235,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
 		error ("variable %qE declared in enclosing "
 			   "% region", DECL_NAME (decl));
 		  nflags |= GOVD_MAP;
+		  nflags |= (n2->value & GOVD_DEVICEPTR);
 		  if (octx->region_type == ORT_ACC_DATA
 			  && (n2->value & GOVD_MAP_0LEN_ARRAY))
 			nflags |= GOVD_MAP_0LEN_ARRAY;
@@ -8213,6 +8217,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_TO
 	  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_TOFROM)
 	flags |= GOVD_MAP_ALWAYS_TO;
+	  else if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
+	flags |= GOVD_DEVICEPTR;
 	  goto do_add;
 
 	case OMP_CLAUSE_DEPEND:
@@ -8828,7 +8834,8 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
   /* Not all combinations of these GOVD_MAP flags are actually valid.  */
   switch (flags & (GOVD_MAP_TO_ONLY
 		   | GOVD_MAP_FORCE
-		   | GOVD_MAP_FORCE_PRESENT))
+		   | GOVD_MAP_FORCE_PRESENT
+		   | GOVD_DEVICEPTR))
 	{
 	case 0:
 	  kind = GOMP_MAP_TOFROM;
@@ -8845,6 +8852,9 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
 	case GOVD_MAP_FORCE_PRESENT:
 	  kind = GOMP_MAP_FORCE_PRESENT;
 	  break;
+	case GOVD_DEVICEPTR:
+	  kind = GOMP_MAP_FORCE_DEVICEPTR;
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
diff --git a/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c b/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c
index db1b916..79a5162 100644
--- a/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c
@@ 

[PATCH][OpenACC] Don't error on implicitly private induction variables in gfortran

2018-08-07 Thread Cesar Philippidis
At present, the fortran FE reports an error if the user adds an explicit
private clause to an induction variable used by an acc loop. This patch
teaches the fortran acc block resolver how to cope with "duplicate"
private clauses, so that it doesn't error anymore.

Is this patch OK for trunk? I bootstrapped and regression tested it for
x86_64 with nvptx offloading.

Thanks,
Cesar
From 576b2a7d5574400f067ec309929b38b324d8c6f6 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Fri, 27 Jan 2017 14:58:16 +
Subject: [PATCH] [OpenACC] Don't error on implicitly private induction
 variables in gfortran

2018-XX-YY  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (gfc_resolve_oacc_blocks): Populate list of private
	variables.

	gcc/testsuite/
	* gfortran.dg/goacc/implicitly-private.f90: New test.

---
 gcc/fortran/openmp.c   |  5 +
 gcc/testsuite/gfortran.dg/goacc/implicitly-private.f90 | 12 
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/implicitly-private.f90

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index b346b51..798c5fa 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5951,6 +5951,7 @@ void
 gfc_resolve_oacc_blocks (gfc_code *code, gfc_namespace *ns)
 {
   fortran_omp_context ctx;
+  gfc_omp_namelist *n;
 
   resolve_oacc_loop_blocks (code);
 
@@ -5961,6 +5962,10 @@ gfc_resolve_oacc_blocks (gfc_code *code, gfc_namespace *ns)
   ctx.is_openmp = false;
   omp_current_ctx = &ctx;
 
+  if (code->ext.omp_clauses)
+for (n = code->ext.omp_clauses->lists[OMP_LIST_PRIVATE]; n; n = n->next)
+  ctx.private_iterators->add (n->sym);
+
   gfc_resolve_blocks (code->block, ns);
 
   omp_current_ctx = ctx.previous;
diff --git a/gcc/testsuite/gfortran.dg/goacc/implicitly-private.f90 b/gcc/testsuite/gfortran.dg/goacc/implicitly-private.f90
new file mode 100644
index 000..a687d8a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/implicitly-private.f90
@@ -0,0 +1,12 @@
+! Ensure that implicitly private variables do not clash with those
+! that are explicitly private.
+
+program main
+  implicit none
+
+  integer i
+
+  !$acc parallel loop private(i)
+  do i = 1, 100
+  end do
+end program main
-- 
2.7.4



[PATCH][OpenACC] Add support for firstprivate Fortran allocatable scalars

2018-08-07 Thread Cesar Philippidis
This patch updates the way that lower_omp_target uses firstprivate
pointers in OpenACC offloaded regions. On the host side, when preparing
a firstprivate data mapping for pointer type objects, not to be confused
with GOMP_MAP_FIRSTPRIVATE_POINTER, the compiler passes the
address of the value being pointed to, and not the address of the pointer
itself, to the runtime. Correspondingly, on the device side, the compiler
generates code to dereference the remapped pointer once to copy the
data to a local buffer.

While this behavior looks like it would break things, it will not affect
C or C++ data mappings, because those languages transfer pointers via
GOMP_MAP_FIRSTPRIVATE_POINTER. In addition, this will not cause
problems with array types, because the default remapping rules for
OpenACC is to transfer them in via copy. Besides it really doesn't
make sense to allow arrays to be transferred in via firstprivate
because that would use up a lot of memory on the accelerator.

Is this OK for trunk? I bootstrapped and regtested it for x86_64 with
nvptx offloading.

Thanks,
Cesar
From b8fb83b36d0f96b12af9a1f5596f31b3c6b72ef0 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Mon, 6 Aug 2018 09:19:28 -0700
Subject: [PATCH] [OpenACC] Add support for firstprivate Fortran allocatable
 scalars

This patch updates the way that lower_omp_target uses firstprivate
pointers in OpenACC offloaded regions. On the host side, when preparing
a pointer type firstprivate data mapping, not to be confused with
GOMP_MAP_FIRSTPRIVATE_POINTER, the compiler passes the address
of the value being pointed to, not the address of the pointer
itself. Correspondingly, on the device side, the compiler generates
code to dereference the remapped pointer once and copy the data to a
local buffer.

While this behavior looks like it would break things, it will not affect C
or C++ data mappings, because those languages transfer pointers via
GOMP_MAP_FIRSTPRIVATE_POINTER. In addition, this will not cause
problems with array types, because the default remapping rules for
OpenACC is to transfer them in via copy. Besides it really doesn't
make sense to allow arrays to be transferred in via firstprivate
because that would use up a lot of memory on the accelerator.

2018-XX-YY  Cesar Philippidis  

	gcc/
	* omp-low.c (lower_omp_target): Update OpenACC handling of
	pointer variables with GOMP_MAP_FIRSTPRIVATE mappings.

	libgomp/
	* testsuite/libgomp.oacc-fortran/allocatable-scalar.f90: New
	test.
---
 gcc/omp-low.c  | 18 
 .../libgomp.oacc-fortran/allocatable-scalar.f90| 33 ++
 2 files changed, 46 insertions(+), 5 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/allocatable-scalar.f90

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 843c66f..47603c4 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -7643,15 +7643,21 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
 	  {
 		gcc_assert (is_gimple_omp_oacc (ctx->stmt));
-		if (omp_is_reference (new_var)
-		&& TREE_CODE (TREE_TYPE (new_var)) != POINTER_TYPE)
+		if (omp_is_reference (new_var))
 		  {
 		/* Create a local object to hold the instance
 		   value.  */
-		tree type = TREE_TYPE (TREE_TYPE (new_var));
+		tree type = TREE_TYPE (new_var);
+		/* Pointer types are mapped onto the device via a
+		   single level of indirection.  */
+		if (TREE_CODE (type) != POINTER_TYPE)
+		  type = TREE_TYPE (type);
 		const char *id = IDENTIFIER_POINTER (DECL_NAME (new_var));
 		tree inst = create_tmp_var (type, id);
-		gimplify_assign (inst, fold_indirect_ref (x), );
+		if (TREE_CODE (TREE_TYPE (new_var)) == POINTER_TYPE)
+		  gimplify_assign (inst, fold_indirect_ref (x), );
+		else
+		  gimplify_assign (inst, fold_indirect_ref (x), );
 		x = build_fold_addr_expr (inst);
 		  }
 		gimplify_assign (new_var, x, );
@@ -7879,7 +7885,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
 		  {
 		gcc_assert (is_gimple_omp_oacc (ctx->stmt));
-		if (!omp_is_reference (var))
+		/* Handle Fortran allocatable scalars.  */
+		if (!omp_is_reference (var)
+			&& TREE_CODE (TREE_TYPE (var)) != POINTER_TYPE)
 		  {
 			if (is_gimple_reg (var)
 			&& OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT (c))
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/allocatable-scalar.f90 b/libgomp/testsuite/libgomp.oacc-fortran/allocatable-scalar.f90
new file mode 100644
index 000..be86d14
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/allocatable-scalar.f90
@@ -0,0 +1,33 @@
+! Test non-declared allocatable scalars in OpenACC data clauses.
+
+! { dg-do run }
+
+program main
+  implicit none
+  integer, parameter :: n = 100
+  integer, allocatable :: a, c
+  integer :: i, b(n)
+
+  allocate (a)
+
+  a = 50
+
+  !$acc 

Re: [PATCH, rs6000] testcases for GIMPLE folding of vec_splat builtin.

2018-08-07 Thread Segher Boessenkool
Hi!

On Tue, Aug 07, 2018 at 02:25:06PM -0500, Will Schmidt wrote:
>   Some testcases to exercise the vec_splat() built-in.

> In building and updating these tests I consciously violated the
> 80 char per line rule, as I was doing experimentation with the
> assorted values and making lots of local updates.  I've left them as-is
> just because, but can update that if so desired.

That is fine in testcases (and even practically unavoidable in many
dg statements).

> +// vec_splat() using variable vectors should generate the vspltb instruction.
> +/* { dg-final { scan-assembler-times "vspltb" 24 } } */
> +// vec_splat() using a constant vector should generate a load.
> +/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvw4x\M} 3 } } */

You may want to use {} and \m\M everywhere, for symmetry and for future-
proofing things a bit.  It's not necessary of course.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c
> @@ -0,0 +1,54 @@
> +/* Verify that overloaded built-ins for vec_splat with float and
> +   double inputs produce the right code.  */
> +
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-maltivec -mvsx -O1" } */

-mvsx implies -maltivec.

Is there any reason some of these tests use -O2 and some use -O1?

> new file mode 100644
> index 000..729b0cc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-longlong.c
> @@ -0,0 +1,62 @@
> +/* Verify that overloaded built-ins for vec_splat with long long
> +   inputs produce the right code.  */
> +
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mvsx -O2" } */

What is that p8vector_ok for?  You don't force p8 vector support on.
I think you can remove this.

Okay for trunk with that last one resolved.  Thanks!


Segher


[PATCH][OpenACC] update gfortran's tile clause error handling

2018-08-07 Thread Cesar Philippidis
This patch updates how the OpenACC tile clause is handled in the
Fortran FE to match its behavior in C/C++. Specifically, the tile
clause now errors on negative integer arguments instead of emitting a
warning.

Is this OK for trunk?

Thanks,
Cesar
>From af39a6d65cfb46397fa62c88521189002fb3d705 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Mon, 3 Oct 2016 13:58:59 +
Subject: [PATCH] [OpenACC] update gfortran's tile clause error handling

2018-XX-YY  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (resolve_positive_int_expr): Promote the warning to an
	error.

	gcc/testsuite/
	* gfortran.dg/goacc/loop-2.f95: Change expected tile clause
	warnings to errors.
	* gfortran.dg/goacc/loop-5.f95: Likewise.
	* gfortran.dg/goacc/sie.f95: Likewise.
	* gfortran.dg/goacc/tile-1.f90: New test.
	* gfortran.dg/goacc/tile-2.f90: New test.

---
 gcc/fortran/openmp.c   |  4 ++--
 gcc/testsuite/gfortran.dg/goacc/loop-2.f95 |  8 +++
 gcc/testsuite/gfortran.dg/goacc/loop-5.f95 | 12 --
 gcc/testsuite/gfortran.dg/goacc/sie.f95| 36 +++---
 gcc/testsuite/gfortran.dg/goacc/tile-1.f90 | 16 ++---
 gcc/testsuite/gfortran.dg/gomp/pr77516.f90 |  2 +-
 6 files changed, 33 insertions(+), 45 deletions(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 5c0ae45..b346b51 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -3719,8 +3719,8 @@ resolve_positive_int_expr (gfc_expr *expr, const char *clause)
   if (expr->expr_type == EXPR_CONSTANT
   && expr->ts.type == BT_INTEGER
   && mpz_sgn (expr->value.integer) <= 0)
-gfc_warning (0, "INTEGER expression of %s clause at %L must be positive",
-		 clause, >where);
+gfc_error ("INTEGER expression of %s clause at %L must be positive",
+	   clause, >where);
 }
 
 static void
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-2.f95 b/gcc/testsuite/gfortran.dg/goacc/loop-2.f95
index 0c902b2..d4c6273 100644
--- a/gcc/testsuite/gfortran.dg/goacc/loop-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-2.f95
@@ -143,7 +143,7 @@ program test
   DO j = 1,10
   ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
+!$acc loop tile(-1) ! { dg-error "must be positive" }
 do i = 1,10
 enddo
 !$acc loop tile(i) ! { dg-error "constant expression" }
@@ -307,7 +307,7 @@ program test
   DO j = 1,10
   ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
+!$acc loop tile(-1) ! { dg-error "must be positive" }
 do i = 1,10
 enddo
 !$acc loop tile(i) ! { dg-error "constant expression" }
@@ -460,7 +460,7 @@ program test
 DO j = 1,10
 ENDDO
   ENDDO
-  !$acc kernels loop tile(-1) ! { dg-warning "must be positive" }
+  !$acc kernels loop tile(-1) ! { dg-error "must be positive" }
   do i = 1,10
   enddo
   !$acc kernels loop tile(i) ! { dg-error "constant expression" }
@@ -612,7 +612,7 @@ program test
 DO j = 1,10
 ENDDO
   ENDDO
-  !$acc parallel loop tile(-1) ! { dg-warning "must be positive" }
+  !$acc parallel loop tile(-1) ! { dg-error "must be positive" }
   do i = 1,10
   enddo
   !$acc parallel loop tile(i) ! { dg-error "constant expression" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-5.f95 b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
index d059cf7..fe137d5 100644
--- a/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
@@ -93,9 +93,6 @@ program test
   DO j = 1,10
   ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
-do i = 1,10
-enddo
 !$acc loop vector tile(*)
 DO i = 1,10
 ENDDO
@@ -129,9 +126,6 @@ program test
   DO j = 1,10
   ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
-do i = 1,10
-enddo
 !$acc loop vector tile(*)
 DO i = 1,10
 ENDDO
@@ -242,9 +236,6 @@ program test
 DO j = 1,10
 ENDDO
   ENDDO
-  !$acc kernels loop tile(-1) ! { dg-warning "must be positive" }
-  do i = 1,10
-  enddo
   !$acc kernels loop vector tile(*)
   DO i = 1,10
   ENDDO
@@ -333,9 +324,6 @@ program test
 DO j = 1,10
 ENDDO
   ENDDO
-  !$acc parallel loop tile(-1) ! { dg-warning "must be positive" }
-  do i = 1,10
-  enddo
   !$acc parallel loop vector tile(*)
   DO i = 1,10
   ENDDO
diff --git a/gcc/testsuite/gfortran.dg/goacc/sie.f95 b/gcc/testsuite/gfortran.dg/goacc/sie.f95
index abfe28b..3abf2c8 100644
--- a/gcc/testsuite/gfortran.dg/goacc/sie.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/sie.f95
@@ -78,10 +78,10 @@ program test
   !$acc parallel num_gangs(i+1)
   !$acc end parallel
 
-  !$acc parallel num_gangs(-1) ! { dg-warning "must be positive" }
+  !$acc parallel num_gangs(-1) ! { dg-error "must be positive" }
   !$acc end parallel
 
-  !$acc parallel num_gangs(0) ! { dg-warning "must be positive" }
+  !$acc parallel num_gangs(0) ! { dg-error "must be positive" }
   !$acc end parallel
 
   !$acc parallel num_gangs() 

[PATCH][OpenACC] cleanup trans-stmt.h

2018-08-07 Thread Cesar Philippidis
This patch removes a stale reference to trans-openacc.c in
gcc/fortran/trans-statement.h. I'll apply it to trunk as obvious shortly.

Cesar
>From a08fe168c3f3ca4d446915ad26027786cda58394 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Tue, 14 Mar 2017 22:33:00 +
Subject: [PATCH] [OpenACC] cleanup trans-stmt.h

2018-08-07  Cesar Philippidis  

	gcc/fortran/
	* trans-stmt.h: Remove stale reference to trans-openacc.c.

---
 gcc/fortran/trans-stmt.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/fortran/trans-stmt.h b/gcc/fortran/trans-stmt.h
index c798c80..848c7d9 100644
--- a/gcc/fortran/trans-stmt.h
+++ b/gcc/fortran/trans-stmt.h
@@ -70,8 +70,6 @@ tree gfc_trans_deallocate_array (tree);
 /* trans-openmp.c */
 tree gfc_trans_omp_directive (gfc_code *);
 void gfc_trans_omp_declare_simd (gfc_namespace *);
-
-/* trans-openacc.c */
 tree gfc_trans_oacc_directive (gfc_code *);
 tree gfc_trans_oacc_declare (gfc_namespace *);
 
-- 
2.7.4



Re: [PATCH] Line map table allocation

2018-08-07 Thread Nathan Sidwell

On 08/07/2018 03:43 PM, David Malcolm wrote:


OK for trunk, with the nits noted above fixed.


Committing this, thanks

nathan

--
Nathan Sidwell
2018-08-07  Nathan Sidwell  

	* line-map.c (linemap_init): Set default allocator here.
	(new_linemap): Rather than here.  Refactor allocation logic.

Index: line-map.c
===
--- line-map.c	(revision 263365)
+++ line-map.c	(working copy)
@@ -346,6 +346,8 @@ linemap_init (struct line_maps *set,
 #else
   new (set) line_maps();
 #endif
+  /* Set default reallocator (used for initial alloc too).  */
+  set->reallocator = xrealloc;
   set->highest_location = RESERVED_LOCATION_COUNT - 1;
   set->highest_line = RESERVED_LOCATION_COUNT - 1;
   set->location_adhoc_data_map.htab =
@@ -376,81 +378,59 @@ linemap_check_files_exited (struct line_
 static struct line_map *
 new_linemap (struct line_maps *set,  source_location start_location)
 {
-  struct line_map *result;
-  bool macro_map_p = start_location >= LINE_MAP_MAX_LOCATION;
+  bool macro_p = start_location >= LINE_MAP_MAX_LOCATION;
+  unsigned num_maps_allocated = LINEMAPS_ALLOCATED (set, macro_p);
+  unsigned num_maps_used = LINEMAPS_USED (set, macro_p);
 
-  if (LINEMAPS_USED (set, macro_map_p) == LINEMAPS_ALLOCATED (set, macro_map_p))
+  if (num_maps_used == num_maps_allocated)
 {
-  /* We ran out of allocated line maps. Let's allocate more.  */
-  size_t alloc_size;
-
-  /* Cast away extern "C" from the type of xrealloc.  */
-  line_map_realloc reallocator = (set->reallocator
-  ? set->reallocator
-  : (line_map_realloc) xrealloc);
-  line_map_round_alloc_size_func round_alloc_size =
-	set->round_alloc_size;
-
-  size_t map_size = (macro_map_p
-			 ? sizeof (line_map_macro)
-			 : sizeof (line_map_ordinary));
+  /* We need more space!  */
+  if (!num_maps_allocated)
+	num_maps_allocated = 128;
+  num_maps_allocated *= 2;
+
+  size_t size_of_a_map;
+  void *buffer;
+  if (macro_p)
+	{
+	  size_of_a_map = sizeof (line_map_macro);
+	  buffer = set->info_macro.maps;
+	}
+  else
+	{
+	  size_of_a_map = sizeof (line_map_ordinary);
+	  buffer = set->info_ordinary.maps;
+	}
 
   /* We are going to execute some dance to try to reduce the
 	 overhead of the memory allocator, in case we are using the
 	 ggc-page.c one.
 	 
 	 The actual size of memory we are going to get back from the
-	 allocator is the smallest power of 2 that is greater than the
-	 size we requested.  So let's consider that size then.  */
-
-  alloc_size =
-	(2 * LINEMAPS_ALLOCATED (set, macro_map_p) +  256)
-	* map_size;
-
-  /* Get the actual size of memory that is going to be allocated
-	 by the allocator.  */
-  alloc_size = round_alloc_size (alloc_size);
+	 allocator may well be larger than what we ask for.  Use this
+	 hook to find what that size is.  */
+  size_t alloc_size
+	= set->round_alloc_size (num_maps_allocated * size_of_a_map);
 
   /* Now alloc_size contains the exact memory size we would get if
 	 we have asked for the initial alloc_size amount of memory.
-	 Let's get back to the number of macro map that amounts
-	 to.  */
-  LINEMAPS_ALLOCATED (set, macro_map_p) =
-	alloc_size / map_size;
-
-  /* And now let's really do the re-allocation.  */
-  if (macro_map_p)
-	{
-	  set->info_macro.maps
-	= (line_map_macro *) (*reallocator) (set->info_macro.maps,
-		 (LINEMAPS_ALLOCATED (set, macro_map_p)
-		  * map_size));
-	  result = >info_macro.maps[LINEMAPS_USED (set, macro_map_p)];
-	}
-  else
-	{
-	  set->info_ordinary.maps =
-	(line_map_ordinary *) (*reallocator) (set->info_ordinary.maps,
-		  (LINEMAPS_ALLOCATED (set, macro_map_p)
-		   * map_size));
-	  result = >info_ordinary.maps[LINEMAPS_USED (set, macro_map_p)];
-	}
-  memset (result, 0,
-	  ((LINEMAPS_ALLOCATED (set, macro_map_p)
-		- LINEMAPS_USED (set, macro_map_p))
-	   * map_size));
-}
-  else
-{
-  if (macro_map_p)
-	result = >info_macro.maps[LINEMAPS_USED (set, macro_map_p)];
+	 Let's get back to the number of map that amounts to.  */
+  unsigned num_maps = alloc_size / size_of_a_map;
+  buffer = set->reallocator (buffer, num_maps * size_of_a_map);
+  memset ((char *)buffer + num_maps_used * size_of_a_map, 0,
+	  (num_maps - num_maps_used) * size_of_a_map);
+  if (macro_p)
+	set->info_macro.maps = (line_map_macro *)buffer;
   else
-	result = >info_ordinary.maps[LINEMAPS_USED (set, macro_map_p)];
+	set->info_ordinary.maps = (line_map_ordinary *)buffer;
+  LINEMAPS_ALLOCATED (set, macro_p) = num_maps;
 }
 
-  result->start_location = start_location;
+  line_map *result = (macro_p ? (line_map *)>info_macro.maps[num_maps_used]
+		  : (line_map *)>info_ordinary.maps[num_maps_used]);
+  LINEMAPS_USED (set, macro_p)++;
 
-  LINEMAPS_USED (set, macro_map_p)++;
+  result->start_location = start_location;
 
   return result;
 }


Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-08-07 Thread Giuliano Augusto Faulin Belinassi
That is a good question, because I didn't know that such targets
exist.  Any suggestion?


On Tue, Aug 7, 2018 at 5:29 PM, Paul Koning  wrote:
>
>
>> On Aug 7, 2018, at 4:00 PM, Giuliano Augusto Faulin Belinassi 
>>  wrote:
>>
>> Related to bug 86829, but for hyperbolic trigonometric functions.
>> This patch adds substitution rules for both sinh(atanh(x)) -> x / sqrt(1
>> - x*x) and cosh(atanh(x)) -> 1 / sqrt(1 - x*x). Notice that both
>> formulas have division by 0, but it causes no harm because 1/(+0) ->
>> +infinity, so the math is still safe.
>
> What about non-IEEE targets that don't have "infinite" in their float 
> representation?
>
> paul
>
>


Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-08-07 Thread Paul Koning



> On Aug 7, 2018, at 4:00 PM, Giuliano Augusto Faulin Belinassi 
>  wrote:
> 
> Related to bug 86829, but for hyperbolic trigonometric functions.
> This patch adds substitution rules for both sinh(atanh(x)) -> x / sqrt(1
> - x*x) and cosh(atanh(x)) -> 1 / sqrt(1 - x*x). Notice that both
> formulas have division by 0, but it causes no harm because 1/(+0) ->
> +infinity, so the math is still safe.

What about non-IEEE targets that don't have "infinite" in their float 
representation?

paul




Re: [RFC][PATCH][mid-end] Optimize immediate choice in comparisons.

2018-08-07 Thread Richard Sandiford
Hi Vlad,

Thanks for the patch.

Vlad Lazar  writes:
> Hi.
>
> This patch optimises the choice of immediates in integer comparisons. Integer
> comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
> and there are cases where an incremented/decremented immediate can be loaded 
> into a
> register in fewer instructions. The cases are as follows:
>
> i)   a >  b or a >= b + 1
> ii)  a <= b or a <  b + 1
> iii) a >= b or a >  b - 1
> iv)  a <  b or a <= b - 1
>
> For each comparison we check whether the equivalent can be performed in fewer 
> instructions.
> This is done on a statement by statement basis, right before the GIMPLE 
> statement is expanded
> to RTL. Therefore, it requires a correct implementation of the 
> TARGET_INSN_COST hook.
> The change is general and it applies to any integer comparison, regardless of 
> types or location.
>
> For example, on AArch64 for the following code:
>
> int foo (int x)
> {
>return x > 0xfefe;
> }
>
> it generates:
>
> mov   w1, -16777217
> cmp   w0, w1
> cset  w0, cs
>
> instead of:
>
> mov   w1, 65534
> movk  w1, 0xfeff, lsl 16
> cmp   w0, w1
> cset  w0, hi
>
> Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and 
> there are no regressions.

Looks like a useful feature.  I'm playing devil's advocate to some
extent here, but did you consider instead doing this during the
expansion functions themselves?  In some ways that feels more
natural because we're already operating on rtxes at that point.
It also has the advantage that it would trap comparisons that didn't
exist at the gimple level, such as when computing the carry for a
doubleword add/sub.

I think the two main functions that would need to change are:

- prepare_cmp_insn
- emit_store_flag_1

We could have something like:

  void canonicalize_comparison_for_target (machine_mode, rtx_code *, rtx *);

which changes the comparison code and second operand as in your patch.
In the case of emit_store_flag_1, it should happen after:

  if (swap_commutative_operands_p (op0, op1))
{
  std::swap (op0, op1);
  code = swap_condition (code);
}

(In the case of prepare_cmp_insn, the callers have already done this.)

Note that the patch as its stands won't handle comparisons nested
inside the first operand of a COND_EXPR gassign (which unfortunately
are still a thing).  E.g.:

  _1 = _2 > 0xfefe ? _3 : _4;

FWIW, I agree this is a target-independent technique that should be
done in target-independent code, rather than something that should
be duplicated in each individual target that benefits from it.

Thanks,
Richard

>
> Thanks,
> Vlad
>
> gcc/testsuite/
> Changelog for gcc/testsuite/Changelog
> 2018-07-30  Vlad Lazar  
>
>   * gcc.target/aarch64/imm_choice_comparison.c: New.
>
> gcc/
> Changelog for gcc/Changelog
> 2018-07-30  Vlad Lazar  
>
>   * cfgexpand.c (optimize_immediate_choice): New.
>   (can_optimize_immediate_p): Likewise.
>   (validate_immediate_optimization): Likewise.
>   (update_immediate): Likewise.
>   (immediate_optimized_code): Likewise.
>
> ---
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 
> 9b91279282e1c6956c8b3699f13036c401ea1dcd..5b0a2e0cdb23f649336844506c8241428b3e6e7d
>  100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -5423,6 +5423,157 @@ reorder_operands (basic_block bb)
> XDELETE (lattice);
>   }
>   
> +/* Helper function for update_immediate.  Returns the adjusted conditional
> +   code for the immediate choice optimization.  See optimize_immediate_choice
> +   for cases.  */
> +
> +static enum tree_code
> +immediate_optimized_code (enum tree_code code)
> +{
> +  switch (code)
> +{
> +case GT_EXPR:
> +  return GE_EXPR;
> +case GE_EXPR:
> +  return GT_EXPR;
> +case LT_EXPR:
> +  return LE_EXPR;
> +case LE_EXPR:
> +  return LT_EXPR;
> +default:
> +  return code;
> +}
> +}
> +
> +/* Helper function for optimize_immediate_choice.  It updates the immediate
> +   and the subcode of the gimple statement.  At the point of calling
> +   the function we know that the optimization leads to fewer instructions.
> +   stmt points to the gimple statement containing the comparison we wish to
> +   optimize and to_add is the amount by which the immediate needs to be
> +   adjusted (1 or -1).  */
> +
> +static void
> +update_immediate (gimple *stmt, int to_add)
> +{
> +  tree inc_dec = to_add == 1 ? build_one_cst (integer_type_node) :
> +build_minus_one_cst (integer_type_node);
> +
> +  enum gimple_code code = gimple_code (stmt);
> +  if (code == GIMPLE_COND)
> +{
> +  gcond *gc = GIMPLE_CHECK2 (stmt);
> +  tree rhs = gimple_cond_rhs (gc);
> +
> +  /* Update the statement.  */
> +  tree new_rhs = fold_build2 (PLUS_EXPR, TREE_TYPE (rhs), rhs, inc_dec);
> +  gimple_cond_set_rhs (gc, new_rhs);
> +  gc->subcode = immediate_optimized_code ((enum tree_code) gc->subcode);
> +}

[PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-08-07 Thread Giuliano Augusto Faulin Belinassi
Related to bug 86829, but for hyperbolic trigonometric functions.
This patch adds substitution rules for both sinh(atanh(x)) -> x / sqrt(1
- x*x) and cosh(atanh(x)) -> 1 / sqrt(1 - x*x). Notice that both
formulas have division by 0, but it causes no harm because 1/(+0) ->
+infinity, so the math is still safe.

Changelog:
2018-08-07  Giuliano Belinassi 

* match.pd: Add simplification rules for sinh(atanh(x)) and cosh(atanh(x)).

All tests added by this patch run without errors on trunk; however,
there are tests unrelated to this patch that fail on my x86_64
Ubuntu 18.04.
Index: gcc/match.pd
===
--- gcc/match.pd	(revision 263359)
+++ gcc/match.pd	(working copy)
@@ -4219,6 +4219,25 @@
   (mult:c (TAN:s @0) (COS:s @0))
(SIN @0))
 
+ /* Simplify sinh(atanh(x)) -> x / sqrt(1 - x*x). */
+ (for sins (SINH)
+  atans (ATANH)
+  sqrts (SQRT)
+  (simplify
+   (sins (atans:s @0))
+   (rdiv @0 (sqrts (minus {build_one_cst (type);} 
+   (mult @0 @0))
+ 
+ /* Simplify cosh(atanh(x)) -> 1 / sqrt(1 - x*x). */
+ (for coss (COSH)
+  atans (ATANH)
+  sqrts (SQRT)
+  (simplify
+   (coss (atans:s @0))
+   (rdiv {build_one_cst (type);} 
+   (sqrts (minus {build_one_cst (type);} 
+(mult @0 @0))
+
  /* Simplify x * pow(x,c) -> pow(x,c+1). */
  (simplify
   (mult:c @0 (POW:s @0 REAL_CST@1))
Index: gcc/testsuite/gcc.dg/sinhtanh-1.c
===
--- gcc/testsuite/gcc.dg/sinhtanh-1.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/sinhtanh-1.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+extern double sinh(double x);
+extern double atanh(double x);
+
+double __attribute__ ((noinline)) 
+sinhatanh_(double x)
+{
+return sinh(atanh(x));
+}
+
+/* There should be no calls to sinh nor atanh */
+/* { dg-final { scan-tree-dump-not "sinh " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "atanh " "optimized" } } */
Index: gcc/testsuite/gcc.dg/sinhtanh-2.c
===
--- gcc/testsuite/gcc.dg/sinhtanh-2.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/sinhtanh-2.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+extern double cosh(double x);
+extern double atanh(double x);
+
+double __attribute__ ((noinline)) 
+coshatanh_(double x)
+{
+return cosh(atanh(x));
+}
+
+/* There should be no calls to cosh nor atanh */
+/* { dg-final { scan-tree-dump-not "cosh " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "atanh " "optimized" } } */
Index: gcc/testsuite/gcc.dg/sinhtanh-3.c
===
--- gcc/testsuite/gcc.dg/sinhtanh-3.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/sinhtanh-3.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+extern double sinh(double x);
+extern double atanh(double x);
+
+double __attribute__ ((noinline)) 
+sinhatanh_(double x)
+{
+double atgh = atanh(x);
+return sinh(atgh) + atgh;
+}
+
+/* There should be calls to both sinh and atanh */
+/* { dg-final { scan-tree-dump "sinh " "optimized" } } */
+/* { dg-final { scan-tree-dump "atanh " "optimized" } } */
Index: gcc/testsuite/gcc.dg/sinhtanh-4.c
===
--- gcc/testsuite/gcc.dg/sinhtanh-4.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/sinhtanh-4.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+extern double cosh(double x);
+extern double atanh(double x);
+
+double __attribute__ ((noinline)) 
+coshatanh_(double x)
+{
+double atgh = atanh(x);
+return cosh(atgh) + atgh;
+}
+
+/* There should be calls to both cosh and atanh */
+/* { dg-final { scan-tree-dump "cosh " "optimized" } } */
+/* { dg-final { scan-tree-dump "atanh " "optimized" } } */


Re: [PATCH, testsuite]: Fix g++.dg/torture/pr86763.C link failure for glibc < 2.17

2018-08-07 Thread Rainer Orth
Hi Andreas,

> How about replacing clock_gettime with something else?  It's not needed
> for the particular test.

that would certainly be useful: Solaris 10 also needs librt or
clock_gettime, while Solaris 11 has folded it into libc.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Line map table allocation

2018-08-07 Thread David Malcolm
On Mon, 2018-08-06 at 12:58 -0400, Nathan Sidwell wrote:
> Here's a line-map patch to make the new_linemap logic simpler.  On
> the 
> modules branch I need to allocate blocks of linemaps, and found the 
> current allocation scheme a little confusing to adjust.  This'll
> make 
> that subsequent change simpler.
> 
> While there, I set the default allocator (xmalloc) in the init
> routine, 
> rather than check it for each allocation.  I doubt the loss of a 
> devirtualization possibility is significant (we're doing allocation 
> wrong if it is).
> 
> booted & tested on x86_64-linux
> 
> ok?

> 2018-08-06  Nathan Sidwell  
> 
>   * line-map.c: (linemap_init): Set default allocator here.
>   (new_linemap): Rather than here.  Refactor allocation logic.
> 
> Index: line-map.c
> ===
> --- line-map.c(revision 263332)
> +++ line-map.c(working copy)
> @@ -346,6 +346,8 @@ linemap_init (struct line_maps *set,
>  #else
>new (set) line_maps();
>  #endif
> +  /* Set default allocator.  */
> +  set->reallocator = (line_map_realloc) xrealloc;

Set default *reallocator*, surely?

I wonder if we still need that cast.  I see include/libiberty.h has:

  extern void *xrealloc (void *, size_t) ATTRIBUTE_RETURNS_NONNULL;

and libcpp/include/linemap.h has:

  typedef void *(*line_map_realloc) (void *, size_t);

which appear to be identical to me.  But there's enough macro cruft
elsewhere involving realloc that I'm nervous about removing the cast.

>set->highest_location = RESERVED_LOCATION_COUNT - 1;
>set->highest_line = RESERVED_LOCATION_COUNT - 1;
>set->location_adhoc_data_map.htab =
> @@ -376,81 +378,58 @@ linemap_check_files_exited (struct line_
>  static struct line_map *
>  new_linemap (struct line_maps *set,  source_location start_location)
>  {
> -  struct line_map *result;
> -  bool macro_map_p = start_location >= LINE_MAP_MAX_LOCATION;
> +  bool macro_p = start_location >= LINE_MAP_MAX_LOCATION;


> +  unsigned alloc = LINEMAPS_ALLOCATED (set, macro_p);
> +  unsigned used = LINEMAPS_USED (set, macro_p);

These two names are too terse for my taste; whilst reading the rest of
the patch I had to double-check "is this a count of structs or of
bytes?".

How about "num_alloc_maps" and "num_used_maps"?

> -  if (LINEMAPS_USED (set, macro_map_p) == LINEMAPS_ALLOCATED (set, 
> macro_map_p))
> +  if (used == alloc)
>  {
> -  /* We ran out of allocated line maps. Let's allocate more.  */
> -  size_t alloc_size;
> -
> -  /* Cast away extern "C" from the type of xrealloc.  */
> -  line_map_realloc reallocator = (set->reallocator
> -   ? set->reallocator
> -   : (line_map_realloc) xrealloc);
> -  line_map_round_alloc_size_func round_alloc_size =
> - set->round_alloc_size;
> -
> -  size_t map_size = (macro_map_p
> -  ? sizeof (line_map_macro)
> -  : sizeof (line_map_ordinary));
> +  /* We need more space!  */
> +  if (!alloc)
> + alloc = 128;
> +  alloc *= 2;
> +
> +  size_t map_size;

Whilst we're refactoring, please rename this to "size_per_map".

> +  void *buffer;
> +  if (macro_p)
> + {
> +   map_size = sizeof (line_map_macro);
> +   buffer = set->info_macro.maps;
> + }
> +  else
> + {
> +   map_size = sizeof (line_map_ordinary);
> +   buffer = set->info_ordinary.maps;
> + }
>  
>/* We are going to execute some dance to try to reduce the
>overhead of the memory allocator, in case we are using the
>ggc-page.c one.
>
>The actual size of memory we are going to get back from the
> -  allocator is the smallest power of 2 that is greater than the
> -  size we requested.  So let's consider that size then.  */
> -
> -  alloc_size =
> - (2 * LINEMAPS_ALLOCATED (set, macro_map_p) +  256)
> - * map_size;
> -
> -  /* Get the actual size of memory that is going to be allocated
> -  by the allocator.  */
> -  alloc_size = round_alloc_size (alloc_size);
> +  allocator may well be larger than what we ask for.  Use this
> +  hook to find what that size is.  */
> +  size_t alloc_size = set->round_alloc_size (alloc * map_size);

If I'm reading this right, there's a slight change here in how we grow
the buffers: previously, num_alloc_maps grew to:

  2 * num_alloc_maps + 256

whereas now it grows to:

  2 * num_alloc_maps, with an initial size of 256.

(That addition of 256 in the old behavior appears to date back to
r44584 on 2001-08-03, which created line-map.c)

I think this growth change is OK.

>  
>/* Now alloc_size contains the exact memory size we would get if
>we have asked for the initial alloc_size amount of memory.
>Let's get back to the number of macro map that amounts
>to.  */

The above comment contains a pre-existing 

[PATCH, rs6000] testcases for GIMPLE folding of vec_splat builtin.

2018-08-07 Thread Will Schmidt
Hi,
  Some testcases to exercise the vec_splat() built-in.

Tested successfully on assorted powerpc Linux systems, in conjunction
with the vec-splat() folding patch, which is being posted separately.

In building and updating these tests I consciously violated the
80 char per line rule, as I was doing experimentation with the
assorted values and making lots of local updates.  I've left them as-is
just because, but can update that if so desired.

OK for trunk?

Thanks,
-Will

[testsuite]

2018-08-07  Will Schmidt  

* gcc.target/powerpc/fold-vec-splat-char.c: New.
* gcc.target/powerpc/fold-vec-splat-floatdouble.c: New.
* gcc.target/powerpc/fold-vec-splat-int.c: New.
* gcc.target/powerpc/fold-vec-splat-longlong.c: New.
* gcc.target/powerpc/fold-vec-splat-pixel.c: New.
* gcc.target/powerpc/fold-vec-splat-short.c: New.

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-char.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-char.c
new file mode 100644
index 000..7c4c784
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-char.c
@@ -0,0 +1,60 @@
+/* Verify that overloaded built-ins for vec_splat with char
+   inputs produce the right code.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2" } */
+
+#include 
+
+vector bool char testb_0  (vector bool char x) { return vec_splat (x, 0b00000); }
+vector bool char testb_1  (vector bool char x) { return vec_splat (x, 0b00001); }
+vector bool char testb_2  (vector bool char x) { return vec_splat (x, 0b00010); }
+vector bool char testb_4  (vector bool char x) { return vec_splat (x, 0b00100); }
+vector bool char testb_8  (vector bool char x) { return vec_splat (x, 0b01000); }
+vector bool char testb_10 (vector bool char x) { return vec_splat (x, 0b10000); }
+vector bool char testb_1e (vector bool char x) { return vec_splat (x, 0b11110); }
+vector bool char testb_1f (vector bool char x) { return vec_splat (x, 0b11111); }
+
+vector signed char tests_0  (vector signed char x) { return vec_splat (x, 0b00000); }
+vector signed char tests_1  (vector signed char x) { return vec_splat (x, 0b00001); }
+vector signed char tests_2  (vector signed char x) { return vec_splat (x, 0b00010); }
+vector signed char tests_4  (vector signed char x) { return vec_splat (x, 0b00100); }
+vector signed char tests_8  (vector signed char x) { return vec_splat (x, 0b01000); }
+vector signed char tests_10 (vector signed char x) { return vec_splat (x, 0b10000); }
+vector signed char tests_1e (vector signed char x) { return vec_splat (x, 0b11110); }
+vector signed char tests_1f (vector signed char x) { return vec_splat (x, 0b11111); }
+
+vector unsigned char testu_0  (vector unsigned char x) { return vec_splat (x, 0b00000); }
+vector unsigned char testu_1  (vector unsigned char x) { return vec_splat (x, 0b00001); }
+vector unsigned char testu_2  (vector unsigned char x) { return vec_splat (x, 0b00010); }
+vector unsigned char testu_4  (vector unsigned char x) { return vec_splat (x, 0b00100); }
+vector unsigned char testu_8  (vector unsigned char x) { return vec_splat (x, 0b01000); }
+vector unsigned char testu_10 (vector unsigned char x) { return vec_splat (x, 0b10000); }
+vector unsigned char testu_1e (vector unsigned char x) { return vec_splat (x, 0b11110); }
+vector unsigned char testu_1f (vector unsigned char x) { return vec_splat (x, 0b11111); }
+
+/* Similar test as above, but the source vector is a known constant. */
+const vector bool char by = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p'};
+const vector signed char sy = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p'};
+const vector unsigned char uy = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p'};
+vector bool char
+test_bc (vector bool char x)
+{
+  return vec_splat (by, 0b00010);
+}
+vector signed char
+test_sc (vector signed char x)
+{
+  return vec_splat (sy, 0b00011);
+}
+vector unsigned char
+test_uc (vector unsigned char x)
+{
+  return vec_splat (uy, 0b00110);
+}
+
+// vec_splat() using variable vectors should generate the vspltb instruction.
+/* { dg-final { scan-assembler-times "vspltb" 24 } } */
+// vec_splat() using a constant vector should generate a load.
+/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvw4x\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c
new file mode 100644
index 0000000..a6eaa72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c
@@ -0,0 +1,54 @@
+/* Verify that overloaded built-ins for vec_splat with float and
+   double inputs produce the right code.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-maltivec -mvsx -O1" } */
+
+#include 
+
+vector float testf_00 

[PATCH, rs6000] Early gimple folding of vec_mergeh and vec_mergel for float

2018-08-07 Thread Will Schmidt
Hi,
   This adds support for gimple folding of vec_mergeh and vec_mergel
for float and double types.  Support for the integral types is already
in-tree.

This change includes updates to the fold_mergehl_helper function to handle
creating the (integral) permute vector when the vector type is not an
integral type.
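
To illustrate what the reworked helper computes, here is a standalone sketch of the selector it feeds to VEC_PERM_EXPR (function and parameter names are mine, not GCC's):

```cpp
#include <vector>

// Build the VEC_PERM_EXPR selector the helper constructs: interleave
// element (offset + i) of the first operand with element
// (offset + n_elts + i) of the second, where offset is n_elts / 2 for
// the "low" merge and 0 for the "high" merge.
static std::vector<int>
merge_selector (int n_elts, bool low_half)
{
  int midpoint = n_elts / 2;
  int offset = low_half ? midpoint : 0;
  std::vector<int> sel;
  for (int i = 0; i < midpoint; i++)
    {
      sel.push_back (offset + i);           // element of the first operand
      sel.push_back (offset + n_elts + i);  // element of the second operand
    }
  return sel;
}
```

For a 4-element vector this yields {0, 4, 1, 5} for the high merge and {2, 6, 3, 7} for the low merge, matching the vec_mergeh/vec_mergel interleaving.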

Relevant testcases are already in-tree.

OK for trunk?

Thanks,
-Will

[gcc]

2018-08-07 Will Schmidt  

* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add entries to
allow folding of mergeh() and mergel() for the float and double types.
(fold_mergehl_helper): Rework to handle building a permute tree
for float vectors.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6cb5c87..35c32be 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15119,24 +15119,39 @@ fold_mergehl_helper (gimple_stmt_iterator *gsi, gimple *stmt, int use_high)
 {
   tree arg0 = gimple_call_arg (stmt, 0);
   tree arg1 = gimple_call_arg (stmt, 1);
   tree lhs = gimple_call_lhs (stmt);
   tree lhs_type = TREE_TYPE (lhs);
-  tree lhs_type_type = TREE_TYPE (lhs_type);
   int n_elts = TYPE_VECTOR_SUBPARTS (lhs_type);
   int midpoint = n_elts / 2;
   int offset = 0;
 
   if (use_high == 1)
 offset = midpoint;
 
-  tree_vector_builder elts (lhs_type, VECTOR_CST_NELTS (arg0), 1);
+  /* The permute_type will match the lhs for integral types.  For double and
+ float types, the permute type needs to map to the V2 or V4 type that
+ matches size.  */
+  tree permute_type;
+  if (INTEGRAL_TYPE_P (TREE_TYPE (lhs_type)))
+permute_type = lhs_type;
+  else
+if (TREE_TYPE (lhs_type) == TREE_TYPE (V2DF_type_node))
+  permute_type = V2DI_type_node;
+else if (TREE_TYPE (lhs_type) == TREE_TYPE (V4SF_type_node))
+  permute_type = V4SI_type_node;
+else
+  gcc_unreachable ();
+
+  tree_vector_builder elts (permute_type, VECTOR_CST_NELTS (arg0), 1);
 
   for (int i = 0; i < midpoint; i++)
 {
-  elts.safe_push (build_int_cst (lhs_type_type, offset + i));
-  elts.safe_push (build_int_cst (lhs_type_type, offset + n_elts + i));
+  elts.safe_push (build_int_cst (TREE_TYPE (permute_type),
+offset + i));
+  elts.safe_push (build_int_cst (TREE_TYPE (permute_type),
+offset + n_elts + i));
 }
 
   tree permute = elts.build ();
 
   gimple *g = gimple_build_assign (lhs, VEC_PERM_EXPR, arg0, arg1, permute);
@@ -15757,18 +15772,22 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case ALTIVEC_BUILTIN_VMRGLH:
 case ALTIVEC_BUILTIN_VMRGLW:
 case VSX_BUILTIN_XXMRGLW_4SI:
 case ALTIVEC_BUILTIN_VMRGLB:
 case VSX_BUILTIN_VEC_MERGEL_V2DI:
+case VSX_BUILTIN_XXMRGLW_4SF:
+case VSX_BUILTIN_VEC_MERGEL_V2DF:
fold_mergehl_helper (gsi, stmt, 1);
return true;
 /* vec_mergeh (integrals).  */
 case ALTIVEC_BUILTIN_VMRGHH:
 case ALTIVEC_BUILTIN_VMRGHW:
 case VSX_BUILTIN_XXMRGHW_4SI:
 case ALTIVEC_BUILTIN_VMRGHB:
 case VSX_BUILTIN_VEC_MERGEH_V2DI:
+case VSX_BUILTIN_XXMRGHW_4SF:
+case VSX_BUILTIN_VEC_MERGEH_V2DF:
fold_mergehl_helper (gsi, stmt, 0);
return true;
 
 /* d = vec_pack (a, b) */
 case P8V_BUILTIN_VPKUDUM:




[PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat

2018-08-07 Thread Will Schmidt
Hi
Enable GIMPLE folding of the vec_splat() intrinsic.

For review.. feedback is expected. :-)

I came up with the following after spending some time poking around
at the tree_vec_extract() and vector_element() functions as seen
in tree-vect-generic.c looking for insights.  Some of this seems a
bit clunky to me yet, but this is functional as far as make-check
can tell, and is as far as I can get without additional input.

This uses the tree_vec_extract() function out of tree-vect-generic.c
to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
made non-static as part of this change.

In review of the .gimple output, this folding takes a sample testcase
of 
vector bool int testb_0  (vector bool int x)
{
  return vec_splat (x, 0b0); 
}
from:
testb_0 (__vector __bool int x)
{
  __vector __bool intD.1486 D.2855;

  _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
  _2 = __builtin_altivec_vspltwD.1919 (_1, 0);
  D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
  return D.2855;
}
to:
 testb_0 (__vector __bool int x)
{
  __vector __bool intD.1486 D.2855;

  _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
  D.2856 = BIT_FIELD_REF <_1, 32, 0>;
  _2 = {D.2856, D.2856, D.2856, D.2856};
  D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
  return D.2855;
}
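
The transformation above amounts to the following scalar model (an illustrative sketch, not the compiler code): element k is pulled out of the source once, then replicated across the result.

```cpp
#include <array>
#include <cstddef>

// Scalar model of the folded form: the BIT_FIELD_REF extracts element
// k of the source vector, and the vector constructor replicates it
// across every lane of the result.
template <typename T, std::size_t N>
static std::array<T, N>
splat (const std::array<T, N> &x, std::size_t k)
{
  std::array<T, N> r;
  r.fill (x[k]);   // {x[k], x[k], x[k], ...}
  return r;
}
```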


Testcases are being posted as a separate patch.

OK for trunk?   
Thanks,
-Will

[gcc]

2018-08-07  Will Schmidt  

* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
  early gimple folding of vec_splat().
* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
* gimple-fold.h:  Add an extern declaration for tree_vec_extract().

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 35c32be..acc6b49 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15764,10 +15764,56 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
 g = gimple_build_assign (lhs, splat_tree);
 gimple_set_location (g, gimple_location (stmt));
 gsi_replace (gsi, g, true);
 return true;
+   }
+
+/* Flavors of vec_splat.  */
+// a = vec_splat (b, 0x3) ;  // a = { b[3],b[3],b[3],...};
+case ALTIVEC_BUILTIN_VSPLTB:
+case ALTIVEC_BUILTIN_VSPLTH:
+case ALTIVEC_BUILTIN_VSPLTW:
+case VSX_BUILTIN_XXSPLTD_V2DI:
+case VSX_BUILTIN_XXSPLTD_V2DF:
+  {
+   arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
+   arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
+   /* Only fold the vec_splat_*() if arg1 is a constant
+  5-bit unsigned literal.  */
+   if (TREE_CODE (arg1) != INTEGER_CST || TREE_INT_CST_LOW (arg1) > 0x1f)
+ return false;
+
+   lhs = gimple_call_lhs (stmt);
+   tree lhs_type = TREE_TYPE (lhs);
+
+   tree splat;
+   if (TREE_CODE (arg0) == VECTOR_CST)
+ splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (arg1));
+   else
+ {
+   /* Determine (in bits) the length and start location of the
+  splat value for a call to the tree_vec_extract helper.  */
+   int tree_size_in_bits = TREE_INT_CST_LOW (size_in_bytes (lhs_type))
+   * BITS_PER_UNIT;
+   int splat_elem_size = tree_size_in_bits / VECTOR_CST_NELTS (arg0);
+   int splat_start_bit = TREE_INT_CST_LOW (arg1) * splat_elem_size;
+   /* Do not attempt to early-fold if the size + specified offset into
+  the vector would touch outside of the source vector.  */
+   if ((splat_start_bit + splat_elem_size) > tree_size_in_bits)
+ return false;
+   tree len = build_int_cst (bitsizetype, splat_elem_size);
+   tree start = build_int_cst (bitsizetype, splat_start_bit);
+   splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
+ len, start);
+   }
+   /* And finally, build the new vector.  */
+   tree splat_tree = build_vector_from_val (lhs_type, splat);
+   g = gimple_build_assign (lhs, splat_tree);
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi, g, true);
+   return true;
   }
 
 /* vec_mergel (integrals).  */
 case ALTIVEC_BUILTIN_VMRGLH:
 case ALTIVEC_BUILTIN_VMRGLW:
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 04e9bfa..e634180 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -59,10 +59,11 @@ extern tree gimple_fold_indirect_ref (tree);
 extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
 extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
 extern bool arith_code_with_undefined_signed_overflow (tree_code);
 extern gimple_seq rewrite_to_defined_overflow (gimple *);
 extern void replace_call_with_value 

Re: [PATCH] fix std::variant::swap for trivially-move-assignable types

2018-08-07 Thread Jonathan Wakely

On 07/08/18 17:57 +0300, Ville Voutilainen wrote:

On 7 August 2018 at 17:29, Jonathan Wakely  wrote:

On 07/08/18 15:24 +0100, Jonathan Wakely wrote:


This patch fixes the bug, but is it correct?

IIUC the _M_destructive_move effects don't depend on whether move
assignment is trivial, so should be defined the same way in both
specializations. It also looks like we can use it in the non-trivial
move assignment.

Should we define _M_destructive_move on _Move_ctor_base instead of
_Move_assign_base, so the duplication could be avoided?



Or maybe into _Move_ctor_base as in the attached patch. That allows it
to be used in _Copy_assign_base, and means we can omit the try-catch
block when the move construction is trivial.



_Move_ctor_base seems fine to me. I plan to revamp our variant to
bring it up to the changes done before C++17
was done, to fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85517,
and I plan to do it shortly.


OK, here's what I've committed after testing.
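
In isolation, the destroy-and-reconstruct pattern this change centralizes looks roughly like this (a minimal sketch with invented names; std::variant's real base-class layering is more involved):

```cpp
#include <new>
#include <utility>

// Sketch of the "destructive move": destroy *this, then placement-new a
// moved-from copy in the same storage.  If the move constructor can
// throw, a failed reconstruction must leave the object in a
// distinguishable empty state (the role variant_npos plays in
// std::variant).  Here -1 stands in for that sentinel.
struct Holder
{
  int index;   // stands in for _M_index
  int value;

  Holder (int v) : index (0), value (v) {}
  Holder (Holder &&rhs) = default;
  ~Holder () = default;

  void destructive_move (Holder &&rhs)
  {
    this->~Holder ();
    try
      {
        ::new (this) Holder (std::move (rhs));
      }
    catch (...)
      {
        index = -1;   // mark "valueless" rather than leave a dead object
        throw;
      }
  }
};
```

When the move constructor is known not to throw, the catch block can be omitted entirely, which is what the specialization for trivially move-constructible alternatives does.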


commit 15e32f819c03a268c11ad48e45c87e996be02479
Author: Jonathan Wakely 
Date:   Tue Aug 7 19:06:08 2018 +0100

PR libstdc++/86874 fix std::variant::swap regression

PR libstdc++/86874
* include/std/variant (_Copy_ctor_base::_M_destructive_move): Define
here instead of in _Move_assign_base.
(_Copy_ctor_base::_M_destructive_move): Define.
(_Copy_assign_base::operator=): Use _M_destructive_move when changing
the contained value to another alternative.
(_Move_assign_base::operator=): Likewise.
(_Move_assign_base::_M_destructive_move): Remove.
* testsuite/20_util/variant/86874.cc: New test.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 66d878142a4..2d86a704c63 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -506,6 +506,20 @@ namespace __variant
 	  }
   }
 
+  void _M_destructive_move(_Move_ctor_base&& __rhs)
+  {
+	this->~_Move_ctor_base();
+	__try
+	  {
+	::new (this) _Move_ctor_base(std::move(__rhs));
+	  }
+	__catch (...)
+	  {
+	this->_M_index = variant_npos;
+	__throw_exception_again;
+	  }
+  }
+
   _Move_ctor_base(const _Move_ctor_base&) = default;
   _Move_ctor_base& operator=(const _Move_ctor_base&) = default;
   _Move_ctor_base& operator=(_Move_ctor_base&&) = default;
@@ -516,6 +530,12 @@ namespace __variant
 {
   using _Base = _Copy_ctor_alias<_Types...>;
   using _Base::_Base;
+
+  void _M_destructive_move(_Move_ctor_base&& __rhs)
+  {
+	this->~_Move_ctor_base();
+	::new (this) _Move_ctor_base(std::move(__rhs));
+  }
 };
 
   template
@@ -538,22 +558,14 @@ namespace __variant
 	  {
 		static constexpr void (*_S_vtable[])(void*, void*) =
 		  { &__erased_assign<_Types&, const _Types&>... };
-		_S_vtable[__rhs._M_index](this->_M_storage(), __rhs._M_storage());
+		_S_vtable[__rhs._M_index](this->_M_storage(),
+	  __rhs._M_storage());
 	  }
 	  }
 	else
 	  {
 	_Copy_assign_base __tmp(__rhs);
-	this->~_Copy_assign_base();
-	__try
-	  {
-		::new (this) _Copy_assign_base(std::move(__tmp));
-	  }
-	__catch (...)
-	  {
-		this->_M_index = variant_npos;
-		__throw_exception_again;
-	  }
+	this->_M_destructive_move(std::move(__tmp));
 	  }
 	__glibcxx_assert(this->_M_index == __rhs._M_index);
 	return *this;
@@ -582,20 +594,6 @@ namespace __variant
   using _Base = _Copy_assign_alias<_Types...>;
   using _Base::_Base;
 
-  void _M_destructive_move(_Move_assign_base&& __rhs)
-  {
-	this->~_Move_assign_base();
-	__try
-	  {
-	::new (this) _Move_assign_base(std::move(__rhs));
-	  }
-	__catch (...)
-	  {
-	this->_M_index = variant_npos;
-	__throw_exception_again;
-	  }
-  }
-
   _Move_assign_base&
   operator=(_Move_assign_base&& __rhs)
 	  noexcept(_Traits<_Types...>::_S_nothrow_move_assign)
@@ -613,16 +611,7 @@ namespace __variant
 	else
 	  {
 	_Move_assign_base __tmp(std::move(__rhs));
-	this->~_Move_assign_base();
-	__try
-	  {
-		::new (this) _Move_assign_base(std::move(__tmp));
-	  }
-	__catch (...)
-	  {
-		this->_M_index = variant_npos;
-		__throw_exception_again;
-	  }
+	this->_M_destructive_move(std::move(__tmp));
 	  }
 	__glibcxx_assert(this->_M_index == __rhs._M_index);
 	return *this;
diff --git a/libstdc++-v3/testsuite/20_util/variant/86874.cc b/libstdc++-v3/testsuite/20_util/variant/86874.cc
new file mode 100644
index 00000000000..b595f9581a1
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/variant/86874.cc
@@ -0,0 +1,55 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your 

Re: [PATCH] Make strlen range computations more conservative

2018-08-07 Thread Richard Biener
On August 7, 2018 6:31:36 PM GMT+02:00, Martin Sebor  wrote:
>On 08/07/2018 09:33 AM, Bernd Edlinger wrote:
>> On 08/07/18 17:02, Martin Sebor wrote:
>>> On 08/06/2018 11:45 PM, Richard Biener wrote:
 On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor
> wrote:
> On 08/06/2018 11:40 AM, Jeff Law wrote:
>> On 08/06/2018 11:15 AM, Martin Sebor wrote:
> These examples do not aim to be valid C, they just point out
> limitations
> of the middle-end design, and a good deal of the problems are
>due
> to trying to do things that are not safe within the boundaries
> given
> by the middle-end design.
 I really think this is important -- and as such I think we need
>to
> move
 away from trying to describe scenarios in C because doing so
>keeps
 bringing us back to the "C doesn't allow XYZ" kinds of
>arguments
> when
 what we're really discussing are GIMPLE semantic issues.

 So examples should be GIMPLE.  You might start with (possibly
> invalid) C
 code to generate the GIMPLE, but the actual discussion needs to
>be
 looking at GIMPLE.  We might include the C code in case someone
> wants to
 look at things in a debugger, but bringing the focus to GIMPLE
>is
> really
 important here.
>>>
>>> I don't understand the goal of this exercise.  Unless the GIMPLE
>>> code is the result of a valid test case (in some language GCC
>>> supports), what does it matter what it looks like?  The basis of
>>> every single transformation done by a compiler is that the
>source
>>> code is correct.  If it isn't then all bets are off.  I'm no
>GIMPLE
>>> expert but even I can come up with any number of GIMPLE
>expressions
>>> that have undefined behavior.  What would that prove?
>> The GIMPLE IL is less restrictive than the original source
>language.
>> The process of translation into GIMPLE and optimization can
>create
>> situations in the GIMPLE IL that can't be validly represented in
>the
>> original source language.  Subobject crossing being one such
>case,
> there
>> are certainly others.  We have to handle these scenarios
>correctly.
>
> Sure, but a valid C test case still needs to exist to show that
> such a transformation is possible.  Until someone comes up with
> one it's all speculation.

 Jakub showed you one wrt CSE of addresses.
>>>
>>> Sorry, there have been so many examples I've lost track.  Can
>>> you please copy it here or point to it in the archive?
>>>
>>
>> This is based on Jakub's example here:
>https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00260.html
>>
>> $ cat y.cc
>> typedef __typeof__ (sizeof 0) size_t;
>> void *operator new (size_t, void *p) { return p; }
>> void *operator new[] (size_t, void *p) { return p; }
>> struct S { int x; unsigned char a[1]; char b[64]; };
>> void baz (char *);
>>
>> size_t
>> foo (S *p)
>> {
>>char *q = new ((char*)p->a) char [16];
>>baz (q);
>>size_t x = __builtin_strlen (q);
>>if (x==0)
>>  __builtin_abort();
>>return x;
>> }
>>
>> $ gcc -O3 -S y.cc
>> $ cat y.s
>> .LFB2:
>>  .cfi_startproc
>>  subq	$8, %rsp
>>  .cfi_def_cfa_offset 16
>>  addq	$4, %rdi
>>  call	_Z3bazPc
>>  call	abort
>>  .cfi_endproc
>>
>>
>>
>> I think this is not a correct optimization.
>
>I see.  This narrows it down to the placement new expression
>exposing the type of the original object rather than that of
>the newly constructed object.  We end up with strlen (_2)
>where _2 = _1(D)->a.
>
>The aggressive loop optimization trigger in this case because
>the access has been transformed to MEM[(char *)p_5(D) + 4B]
>early on which obviates the structure of the accessed type.
>
>This is the case that I think is worth solving -- ideally not
>by removing the optimization but by preserving the conversion
>to the type of the newly constructed object. 

Pointer types carry no information in GIMPLE. 

Richard. 
>
>Martin



Re: [PATCH] Make strlen range computations more conservative

2018-08-07 Thread Richard Biener
On August 7, 2018 4:37:00 PM GMT+02:00, Martin Sebor  wrote:
>On 08/07/2018 02:51 AM, Richard Biener wrote:
>> On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor
> wrote:
>>> On 07/26/2018 02:55 AM, Richard Biener wrote:
 On Wed, 25 Jul 2018, Martin Sebor wrote:

>> BUT - for the string_constant and c_strlen functions we are,
>> in all cases we return something interesting, able to look
>> at an initializer which then determines that type.  Hopefully.
>> I think the strlen() folding code when it sets SSA ranges
>> now looks at types ...?
>>
>> Consider
>>
>> struct X { int i; char c[4]; int j;};
>> struct Y { char c[16]; };
>>
>> void foo (struct X *p, struct Y *q)
>> {
>>   memcpy (p, q, sizeof (struct Y));
>>   if (strlen ((char *)(struct Y *)p + 4) < 7)
>> abort ();
>> }
>>
>> here the GIMPLE IL looks like
>>
>>   const char * _1;
>>
>   <bb 2> [local count: 1073741825]:
>>   _5 = MEM[(char * {ref-all})q_4(D)];
>>   MEM[(char * {ref-all})p_6(D)] = _5;
>>   _1 = p_6(D) + 4;
>>   _2 = __builtin_strlen (_1);
>>
>> and I guess Martin would argue that since p is of type struct X
>> + 4 gets you to c[4] and thus strlen of that cannot be larger
>> than 3.  But of course the middle-end doesn't work like that
>> and luckily we do not try to draw such conclusions or we
>> are somehow lucky that for the testcase as written above we do
>not
>> (I'm not sure whether Martins changes in this area would derive
>> such conclusions in principle).
>
> Only if the strlen argument were p->c.
>
>> NOTE - we do not know the dynamic type here since we do not know
>> the dynamic type of the memory pointed-to by q!  We can only
>> derive that at q+4 there must be some object that we can
>> validly call strlen on (where Martin again thinks strlen
>> imposes constrains that memchr does not - sth I do not agree
>> with from a QOI perspective)
>
> The dynamic type is a murky area.

 It's well-specified in the middle-end.  A store changes the
 dynamic type of the stored-to object.  If that type is
 compatible with the surrounding objects dynamic type that one
 is not affected, if not then the surrounding objects dynamic
 type becomes unspecified.  There is TYPE_TYPELESS_STORAGE
 to somewhat control "compatibility" of subobjects.
>>>
>>> I never responded to this.  Using a dynamic (effective) type as
>>> you describe it would invalidate the aggressive loop optimization
>>> in the following:
>>>
>>>   void foo (struct X *p)
>>>   {
>>>   struct Y y = { "12345678" };
>>>   memcpy (p, , sizeof (struct Y));
>>>
>>>   // *p effective type is now struct Y
>>>
>>>   int n = 0;
>>>   while (p->c[n])
>>> ++n;
>>>
>>>   if (n < 7)
>>> abort ();
>>>   }
>>>
>>> GCC unconditionally aborts, just as it does with strlen(p->c).
>>> Why is that not wrong (in either case)?
>>>
>>> Because the code is invalid either way, for two reasons:
>>
>> No, because the storage has only 4 non-null characters starting at
>offset 4?
>
>No, for the reasons below.  I made a mistake of making
>the initializer string too short.  If we make it longer it
>still aborts.  Say with this
>
>   struct Y y = { "123456789012345" };
>
>we end up with this DCE:
>
>  struct Y y;
>
>   <bb 2>:
>   MEM[(char * {ref-all})p_6(D)] = 0x353433323130393837363534333231;
>   __builtin_abort ();
>
>With -fdump-tree-cddce1-details (and a patch to show the upper
>bound) we see:
>
>   Found loop 1 to be finite: upper bound found: 3.
>
>With -fno-aggressive-loop-optimizations the abort becomes
>conditional because the array bound isn't considered. I would
>expect you to know this since you implemented the feature.

Honza added the array indexing part and it may very well be too aggressive. I 
have to take a closer look after vacation to tell. Can you open a PR and CC me 
there? 

Richard. 

>
>Martin
>>
>>> 1) it accesses an object of (effective) type struct Y via
>>>an lvalue of type struct X (specifically via (*p).c)
>>> 2) it relies on p->c
>>>
>>> The loop optimization relies on the exact same requirement
>>> as the strlen one.  Either they are both valid or neither is.
>>>
>>> Martin
>>



Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-07 Thread Segher Boessenkool
Hi!

Some very trivial comments...

On Mon, Aug 06, 2018 at 03:13:52PM -0700, Steve Ellcey wrote:
>   (aarch64_components_for_bb): Check for simd function.
>   (aarch64_epilogue_uses): New function.
>   (aarch64_process_components): Ditto.
>   (aarch64_expand_prologue): Ditto.
>   (aarch64_expand_epilogue): Ditto.
>   (aarch64_expand_call): Ditto.

Those "Ditto"s are meant to read "Check for simd function", not "New
function".

> +int
> +aarch64_epilogue_uses (int regno)
> +{
> +  if (epilogue_completed && (regno) == LR_REGNUM)

Those parens around "regno" are a bit superfluous, makes the reader think
there is more going on than there is ;-)

> +(define_insn "store_pair_dw_tftf"
> +  [(set (match_operand:TF 0 "aarch64_mem_pair_operand" "=Ump")
> + (match_operand:TF 1 "register_operand" "w"))
> +   (set (match_operand:TF 2 "memory_operand" "=m")
> + (match_operand:TF 3 "register_operand" "w"))]
> +   "TARGET_SIMD &&
> +rtx_equal_p (XEXP (operands[2], 0),

The && should be on the continuation line.

> +(define_insn "loadwb_pair_"
> +  [(parallel
> +[(set (match_operand:P 0 "register_operand" "=k")
> +  (plus:P (match_operand:P 1 "register_operand" "0")
> +  (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> + (set (match_operand:TX 2 "register_operand" "=w")
> +  (mem:TX (match_dup 1)))
> + (set (match_operand:TX 3 "register_operand" "=w")
> +  (mem:TX (plus:P (match_dup 1)
> +  (match_operand:P 5 "const_int_operand" "n"])]
> +  "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (mode)"
> +  "ldp\\t%q2, %q3, [%1], %4"
> +  [(set_attr "type" "neon_ldp_q")]
> +)

Here you don't indent with tabs.

> +/* { dg-final { scan-assembler "\[ \t\]stp\tq10, q11" } } */

If you use {} instead of "" you don't need to backslash everything.
Also, instead of [ \t] you can probably use [[:space:]] which has
shorthand \s .

> +/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[01234567\]" } } */

That's [0-7] but maybe you find [01234567] more readable here.

> +/* { dg-final { scan-assembler-not "\[ \t\]str\t" } } */

You can also use \m and \M for start resp. end of word:

/* { dg-final { scan-assembler-not {\mstr\M} } } */

(or if you like double quotes better that is:

/* { dg-final { scan-assembler-not "\\mstr\\M" } } */

but why would you want that ;-) )


Segher


Re: [PING 3][PATCH] [v4][aarch64] Avoid tag collisions for loads falkor

2018-08-07 Thread Siddhesh Poyarekar

On 08/07/2018 10:30 PM, James Greenhalgh wrote:

To help set expectations here. I'm currently only able to dedicate a couple
of hours of time to review each week. Tamar's Stack Clash has been taking
a big chunk of that time recently as we push it to a final state for trunk.

This pass is... large and complex. I'm aware that it needs some attention,
and will try to get to it within the coming weeks.


Thank you, I totally understand.  I'll keep pinging on a weekly basis 
though if that's OK; it helps my workflow because otherwise I might 
forget about the patch for weeks at a time.



All the review from Kyrill in the interim is extremely helpful.


It definitely is.

Thanks,
Siddhesh


libgo patch committed: Uncomment trace.Stop call in testing package

2018-08-07 Thread Ian Lance Taylor
This patch by Than McIntosh uncomments the call to trace.Stop in the
testing package.  It was commented out when the runtime/trace package
did not build with gccgo, but now it does and has for some time.  This
fixes the go test -test.trace option.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 263362)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-8997a3afcc746824cb70b48b32d9c86b4814807d
+274c88df4d6f9360dcd657b6e069a3b5a1d37a90
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/testing/testing.go
===
--- libgo/go/testing/testing.go (revision 263362)
+++ libgo/go/testing/testing.go (working copy)
@@ -1159,7 +1159,7 @@ func (m *M) writeProfiles() {
m.deps.StopCPUProfile() // flushes profile to disk
}
if *traceFile != "" {
-   // trace.Stop() // flushes trace to disk
+   trace.Stop() // flushes trace to disk
}
if *memProfile != "" {
f, err := os.Create(toOutputDir(*memProfile))


Re: [PATCH] [Ada] Make middle-end string literals NUL terminated

2018-08-07 Thread Bernd Edlinger
On 08/07/18 18:40, Olivier Hainque wrote:
> Hi Bernd,
> 
> (Thanks for your interest in the Ada case as well :)
> 
>> On 7 Aug 2018, at 15:07, Bernd Edlinger  wrote:
> 
>> When I try this example:
>>
>> $ cat array9.adb
>> -- { dg-do run }
>>
>> procedure Array9 is
>>
>>V1 : String(1..10) := "1234567890";
>>V2 : String(1..-1) := "";
>>
>>procedure Compare (S : String) is
>>begin
>>  if S'Size /= 8*S'Length then
>>raise Program_Error;
>>  end if;
>>end;
>>
>> begin
>>Compare ("");
>>Compare ("1234");
>>Compare (V1);
>>Compare (V2);
>> end;
>>
>> I see that "1234" is put in the merge section,
>> but V1 is not.  Maybe because of the alignment requirement?
>>
>> But it sould not be much different from the C test case,
>> which is now able to merge the non-zero terminated strings.
> 
> I'm happy to have a look. I'd just like to make
> sure I'll be looking at the right thing:
> 
> I would need to start from trunk + the patches you
> referenced at
> 
>https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00339.html
> 
> + the last one you sent at
> 
>https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00481.html
> 
> Correct ?
> 

Yes,

Please use the latest patch patch here:
[PATCH] Check the STRING_CSTs in varasm.c
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00361.html

There is also a Fortran patch:
[PATCH] Create internally nul terminated string literals in fortan FE
https://gcc.gnu.org/ml/fortran/2018-08/msg0.html

And a JIT FE patch:
[PATCH] Fix not properly nul-terminated string constants in JIT
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00370.html


My host triplet is x86_64-pc-linux-gnu (I let config.guess determine it).
I use gmp, mpfr, mpc, isl in-tree.
And configure simply with ../gcc-trunk/configure --prefix=/home/ed/gnu/install 
--enable-languages=all


Thanks
Bernd.

> Can you then confirm the target triplet and
> compilation options ?
> 
> Thanks in advance!
> 
> Olivier
> 
> 
> 
> 


Re: [PATCH][Middle-end] disable strcmp/strncmp inlining with O2 below and Os

2018-08-07 Thread Qing Zhao
Hi, Christophe,

I have attached a patch in PR86519, could you please download it and test it, 
and let me know the result.

thanks.

Qing
> On Jul 30, 2018, at 8:45 AM, Christophe Lyon  
> wrote:
> 
> On Wed, 25 Jul 2018 at 19:08, Qing Zhao  > wrote:
>> 
>> Hi,
>> 
>> As Wilco suggested, the new added strcmp/strncmp inlining should be only 
>> enabled with O2 and above.
>> 
>> this is the simple patch for this change.
>> 
>> tested on both X86 and aarch64.
>> 
>> Okay for thunk?
>> 
>> Qing
>> 
>> gcc/ChangeLog:
>> 
>> +2018-07-25  Qing Zhao  
>> +
>> +   * builtins.c (inline_expand_builtin_string_cmp): Disable inlining
>> +   when optimization level is lower than 2 or optimize for size.
>> +
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> +2018-07-25  Qing Zhao  
>> +
>> +   * gcc.dg/strcmpopt_5.c: Change to O2 to enable the transformation.
>> +   * gcc.dg/strcmpopt_6.c: Likewise.
>> +
>> 
> 
> Hi,
> 
> After this change, I've noticed that:
> FAIL: gcc.dg/strcmpopt_6.c scan-rtl-dump-times expand "__builtin_memcmp" 6
> on arm-none-linux-gnueabi
> --with-mode thumb
> --with-cpu cortex-a9
> and forcing -march=armv5t via RUNTESTFLAGS
> 
> The log says:
> gcc.dg/strcmpopt_6.c: pattern found 4 times
> FAIL: gcc.dg/strcmpopt_6.c scan-rtl-dump-times expand "__builtin_memcmp" 6
> 
> Christophe



Re: [PING 3][PATCH] [v4][aarch64] Avoid tag collisions for loads falkor

2018-08-07 Thread James Greenhalgh
On Tue, Aug 07, 2018 at 03:01:28AM -0500, Siddhesh Poyarekar wrote:
> Hello,
> 
> Ping!

To help set expectations here: I'm currently only able to dedicate a couple
of hours of time to review each week. Tamar's Stack Clash has been taking
a big chunk of that time recently as we push it to a final state for trunk.

This pass is... large and complex. I'm aware that it needs some attention,
and will try to get to it within the coming weeks.

All the review from Kyrill in the interim is extremely helpful.

Thanks,
James

> 
> Siddhesh
> 
> On 07/24/2018 12:37 PM, Siddhesh Poyarekar wrote:
> > Hi,
> > 
> > This is a rewrite of the tag collision avoidance patch that Kugan had
> > written as a machine reorg pass back in February.
> > 
> > The falkor hardware prefetching system uses a combination of the
> > source, destination and offset to decide which prefetcher unit to
> > train with the load.  This is great when loads in a loop are
> > sequential but sub-optimal if there are unrelated loads in a loop that
> > tag to the same prefetcher unit.
> > 
> > This pass attempts to rename the destination register of such colliding
> > loads using routines available in regrename.c so that their tags do
> > not collide.  This shows some performance gains with mcf and xalancbmk
> > (~5% each) and will be tweaked further.  The pass is placed near the
> > fag end of the pass list so that subsequent passes don't inadvertently
> > end up undoing the renames.
> > 
> > A full gcc bootstrap and testsuite ran successfully on aarch64, i.e. it
> > did not introduce any new regressions.  I also did a make-check with
> > -mcpu=falkor to ensure that there were no regressions.  The couple of
> > regressions I found were target-specific and were related to scheduling
> > and cost differences and are not correctness issues.
> > 
> > Changes from v3:
> > - Avoid renaming argument/return registers and registers that have a
> >specific architectural meaning, i.e. stack pointer, frame pointer,
> >etc.  Try renaming their aliases instead.
> > 
> > Changes from v2:
> > - Ignore SVE instead of asserting that falkor does not support sve
> > 
> > Changes from v1:
> > 
> > - Fixed up issues pointed out by Kyrill
> > - Avoid renaming R0/V0 since they could be return values
> > - Fixed minor formatting issues.
> > 
> > 2018-07-02  Siddhesh Poyarekar  
> > Kugan Vivekanandarajah  
> > 
> > * config/aarch64/falkor-tag-collision-avoidance.c: New file.
> > * config.gcc (extra_objs): Build it.
> > * config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o):
> > Likewise.
> > * config/aarch64/aarch64-passes.def
> > (pass_tag_collision_avoidance): New pass.
> > * config/aarch64/aarch64.c (qdf24xx_tunings): Add
> > AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS to tuning_flags.
> > (aarch64_classify_address): Remove static qualifier.
> > (aarch64_address_info, aarch64_address_type): Move to...
> > * config/aarch64/aarch64-protos.h: ... here.
> > (make_pass_tag_collision_avoidance): New function.
> > * config/aarch64/aarch64-tuning-flags.def (rename_load_regs):
> > New tuning flag.
> > 
> > CC: james.greenha...@arm.com
> > CC: kyrylo.tkac...@foss.arm.com
> > ---
> >   gcc/config.gcc|   2 +-
> >   gcc/config/aarch64/aarch64-passes.def |   1 +
> >   gcc/config/aarch64/aarch64-protos.h   |  49 +
> >   gcc/config/aarch64/aarch64-tuning-flags.def   |   2 +
> >   gcc/config/aarch64/aarch64.c  |  48 +-
> >   .../aarch64/falkor-tag-collision-avoidance.c  | 881 ++
> >   gcc/config/aarch64/t-aarch64  |   9 +
> >   7 files changed, 946 insertions(+), 46 deletions(-)
> >   create mode 100644 gcc/config/aarch64/falkor-tag-collision-avoidance.c
> > 
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 78e84c2b864..8f5e458e8a6 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -304,7 +304,7 @@ aarch64*-*-*)
> > extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
> > c_target_objs="aarch64-c.o"
> > cxx_target_objs="aarch64-c.o"
> > -   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
> > +   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o falkor-tag-collision-avoidance.o"
> > target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
> > target_has_targetm_common=yes
> > ;;
> > diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
> > index 87747b420b0..f61a8870aa1 100644
> > --- a/gcc/config/aarch64/aarch64-passes.def
> > +++ b/gcc/config/aarch64/aarch64-passes.def
> > @@ -19,3 +19,4 @@
> >  .  */
> >   
> >   INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
> > +INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
> > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> > index 

Re: [PATCH] [aarch64][v2] Fix falkor pipeline description for dup

2018-08-07 Thread James Greenhalgh
On Thu, Aug 02, 2018 at 09:02:05PM -0500, Siddhesh Poyarekar wrote:
> On 08/03/2018 12:02 AM, James Greenhalgh wrote:
> > On Thu, Aug 02, 2018 at 11:58:37AM -0500, Siddhesh Poyarekar wrote:
> >> There was a typo in the pipeline description where DUP was assigned to
> >> the vector pipes for quad mode ops when it really only uses the VTOG
> >> pipes.  Fixing this does not show any noticeable difference in
> >> performance (there's a very small bump of 1.7% in x264 but that's
> >> about it) in my tests but is the more precise description of operations
> >> for falkor.
> >>
> >> Bootstrapped and tested with --with-cpu=falkor to confirm that there
> >> are no regressions.
> > 
> > OK.
> 
> Thanks, may I backport this to gcc 8.x as well given that it has zero 
> impact on anything except falkor?  We are targeting gcc 8.x as the 
> toolchain for falkor and so would like to at least get all falkor 
> specific optimizations and fixes back into the gcc 8.x branch.

This one is OK for backport.

Thanks,
James



Re: [PATCH] [Ada] Make middle-end string literals NUL terminated

2018-08-07 Thread Olivier Hainque
Hi Bernd,

(Thanks for your interest in the Ada case as well :)

> On 7 Aug 2018, at 15:07, Bernd Edlinger  wrote:

> When I try this example:
> 
> $ cat array9.adb
> -- { dg-do run }
> 
> procedure Array9 is
> 
>   V1 : String(1..10) := "1234567890";
>   V2 : String(1..-1) := "";
> 
>   procedure Compare (S : String) is
>   begin
> if S'Size /= 8*S'Length then
>   raise Program_Error;
> end if;
>   end;
> 
> begin
>   Compare ("");
>   Compare ("1234");
>   Compare (V1);
>   Compare (V2);
> end;
> 
> I see that "1234" is put in the merge section,
> but V1 is not.  Maybe because of the alignment requirement?
> 
> But it sould not be much different from the C test case,
> which is now able to merge the non-zero terminated strings.

I'm happy to have a look. I'd just like to make
sure I'll be looking at the right thing:

I would need to start from trunk + the patches you
referenced at

  https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00339.html

+ the last one you sent at

  https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00481.html

Correct ?

Can you then confirm the target triplet and
compilation options ?

Thanks in advance!

Olivier






Re: dejagnu version update?

2018-08-07 Thread Segher Boessenkool
On Mon, Aug 06, 2018 at 08:25:49AM -0700, Mike Stump wrote:
> Since g++ already requires 1.5.3, it make no sense to bump to anything older 
> that 1.5.3, so let's bump to 1.5.3.  Those packaging systems and OSes that 
> wanted to update by now, have had their chance to update.  Those that punt 
> until we bump the requirement, well, they will now have to bump.  :-)

"g++ requires it"?  In what way?  I haven't seen any issues with older
dejagnu versions.

> Ok to update to 1.5.3.

1.5.3 is only three years old, and not all distros carry it.  This is
rather aggressive...


Segher


Re: [PATCH] Make strlen range computations more conservative

2018-08-07 Thread Martin Sebor

On 08/07/2018 09:33 AM, Bernd Edlinger wrote:

On 08/07/18 17:02, Martin Sebor wrote:

On 08/06/2018 11:45 PM, Richard Biener wrote:

On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor  wrote:

On 08/06/2018 11:40 AM, Jeff Law wrote:

On 08/06/2018 11:15 AM, Martin Sebor wrote:

These examples do not aim to be valid C, they just point out limitations
of the middle-end design, and a good deal of the problems are due
to trying to do things that are not safe within the boundaries given
by the middle-end design.

I really think this is important -- and as such I think we need to move
away from trying to describe scenarios in C because doing so keeps
bringing us back to the "C doesn't allow XYZ" kinds of arguments when
what we're really discussing are GIMPLE semantic issues.

So examples should be GIMPLE.  You might start with (possibly invalid) C
code to generate the GIMPLE, but the actual discussion needs to be
looking at GIMPLE.  We might include the C code in case someone wants to
look at things in a debugger, but bringing the focus to GIMPLE is really
important here.


I don't understand the goal of this exercise.  Unless the GIMPLE
code is the result of a valid test case (in some language GCC
supports), what does it matter what it looks like?  The basis of
every single transformation done by a compiler is that the source
code is correct.  If it isn't then all bets are off.  I'm no GIMPLE
expert but even I can come up with any number of GIMPLE expressions
that have undefined behavior.  What would that prove?

The GIMPLE IL is less restrictive than the original source language.
The process of translation into GIMPLE and optimization can create
situations in the GIMPLE IL that can't be validly represented in the
original source language.  Subobject crossing being one such case, there
are certainly others.  We have to handle these scenarios correctly.


Sure, but a valid C test case still needs to exist to show that
such a transformation is possible.  Until someone comes up with
one it's all speculation.


Jakub showed you one wrt CSE of addresses.


Sorry, there have been so many examples I've lost track.  Can
you please copy it here or point to it in the archive?



This is based on Jakub's example here: 
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00260.html

$ cat y.cc
typedef __typeof__ (sizeof 0) size_t;
void *operator new (size_t, void *p) { return p; }
void *operator new[] (size_t, void *p) { return p; }
struct S { int x; unsigned char a[1]; char b[64]; };
void baz (char *);

size_t
foo (S *p)
{
   char *q = new ((char*)p->a) char [16];
   baz (q);
   size_t x = __builtin_strlen (q);
   if (x==0)
 __builtin_abort();
   return x;
}

$ gcc -O3 -S y.cc
$ cat y.s
.LFB2:
	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	addq	$4, %rdi
	call	_Z3bazPc
	call	abort
	.cfi_endproc



I think this is not a correct optimization.


I see.  This narrows it down to the placement new expression
exposing the type of the original object rather than that of
the newly constructed object.  We end up with strlen (_2)
where _2 = _1(D)->a.

The aggressive loop optimization triggers in this case because
the access has been transformed to MEM[(char *)p_5(D) + 4B]
early on which obviates the structure of the accessed type.

This is the case that I think is worth solving -- ideally not
by removing the optimization but by preserving the conversion
to the type of the newly constructed object.

Martin


Re: [PATCH][GCC][AArch64] Validate and set default parameters for stack-clash. [Patch (3/3)]

2018-08-07 Thread James Greenhalgh
On Tue, Aug 07, 2018 at 05:09:30AM -0500, Tamar Christina wrote:
> Hi All,
> 
> This is an updated patch which applies the same style changes as requested in 
> patch 5/6.


No full stop on an error message IIRC.

Otherwise, OK.

Thanks,
James

> gcc/
> 2018-08-07  Tamar Christina  
> 
>   * common/config/aarch64/aarch64-common.c (TARGET_OPTION_DEFAULT_PARAM,
>   aarch64_option_default_param):  New.
>   (params.h): Include.
>   (TARGET_OPTION_VALIDATE_PARAM, aarch64_option_validate_param): New.
>   * config/aarch64/aarch64.c (aarch64_override_options_internal): Simplify
>   stack-clash protection validation code.


Re: [PATCH][GCC][AArch64] Ensure that outgoing argument size is at least 8 bytes when alloca and stack-clash. [Patch (3/6)]

2018-08-07 Thread James Greenhalgh
On Tue, Aug 07, 2018 at 05:09:34AM -0500, Tamar Christina wrote:
> Hi All,
> 
> This is a re-spin to address review comments. No code change aside from a 
> variable rename.
> 
> Ok for trunk?

OK.

Thanks,
James

> gcc/
> 2018-08-07  Tamar Christina  
> 
>   PR target/86486
>   * config/aarch64/aarch64.h (STACK_CLASH_MIN_BYTES_OUTGOING_ARGS,
>   STACK_DYNAMIC_OFFSET): New.
>   * config/aarch64/aarch64.c (aarch64_layout_frame):
>   Update outgoing args size.
>   (aarch64_stack_clash_protection_alloca_probe_range,
>   TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE): New.
> 
> gcc/testsuite/
> 2018-08-07  Tamar Christina  
> 
>   PR target/86486
>   * gcc.target/aarch64/stack-check-alloca-1.c: New.
>   * gcc.target/aarch64/stack-check-alloca-10.c: New.
>   * gcc.target/aarch64/stack-check-alloca-2.c: New.
>   * gcc.target/aarch64/stack-check-alloca-3.c: New.
>   * gcc.target/aarch64/stack-check-alloca-4.c: New.
>   * gcc.target/aarch64/stack-check-alloca-5.c: New.
>   * gcc.target/aarch64/stack-check-alloca-6.c: New.
>   * gcc.target/aarch64/stack-check-alloca-7.c: New.
>   * gcc.target/aarch64/stack-check-alloca-8.c: New.
>   * gcc.target/aarch64/stack-check-alloca-9.c: New.
>   * gcc.target/aarch64/stack-check-alloca.h: New.
>   * gcc.target/aarch64/stack-check-14.c: New.
>   * gcc.target/aarch64/stack-check-15.c: New.


[PATCH] PR libstdc++/86861 Meet precondition for Solaris memalign

2018-08-07 Thread Jonathan Wakely

Solaris memalign requires alignment to be at least sizeof(int), so
increase it as needed.

Also move the check for valid alignments from the fallback
implementation of aligned_alloc into operator new, as it's required for
all of aligned_alloc, memalign, posix_memalign and __aligned_malloc.
This adds a branch to check for undefined behaviour which we could just
ignore, so the check could just be removed. It should certainly be
removed if PR 86878 is implemented to issue a warning about calls with
invalid alignments.

PR libstdc++/86861
* libsupc++/new_opa.cc [_GLIBCXX_HAVE_MEMALIGN] (aligned_alloc):
Replace macro with inline function.
[__sun]: Increase alignment to meet memalign precondition.
[!HAVE__ALIGNED_MALLOC && !HAVE_POSIX_MEMALIGN && !HAVE_MEMALIGN]
(aligned_alloc): Move check for valid alignment to operator new.
Remove redundant check for non-zero size, it's enforced by the caller.
(operator new): Move check for valid alignment here. Use
__builtin_expect on check for zero size.

Tested powerpc64le-linux, committed to trunk.



commit 926a15251b85d0b1a8b6bd08a138ad90ec9c83df
Author: Jonathan Wakely 
Date:   Mon Aug 6 15:43:43 2018 +0100

PR libstdc++/86861 Meet precondition for Solaris memalign

Solaris memalign requires alignment to be at least sizeof(int), so
increase it as needed.

Also move the check for valid alignments from the fallback
implementation of aligned_alloc into operator new, as it's required for
all of aligned_alloc, memalign, posix_memalign and __aligned_malloc.
This adds a branch to check for undefined behaviour which we could just
ignore, so the check could just be removed. It should certainly be
removed if PR 86878 is implemented to issue a warning about calls with
invalid alignments.

PR libstdc++/86861
* libsupc++/new_opa.cc [_GLIBCXX_HAVE_MEMALIGN] (aligned_alloc):
Replace macro with inline function.
[__sun]: Increase alignment to meet memalign precondition.
[!HAVE__ALIGNED_MALLOC && !HAVE_POSIX_MEMALIGN && !HAVE_MEMALIGN]
(aligned_alloc): Move check for valid alignment to operator new.
Remove redundant check for non-zero size, it's enforced by the caller.
(operator new): Move check for valid alignment here. Use
__builtin_expect on check for zero size.

diff --git a/libstdc++-v3/libsupc++/new_opa.cc b/libstdc++-v3/libsupc++/new_opa.cc
index 7c4bb79cdab..5be0cc2ca65 100644
--- a/libstdc++-v3/libsupc++/new_opa.cc
+++ b/libstdc++-v3/libsupc++/new_opa.cc
@@ -39,6 +39,7 @@ static inline void*
 aligned_alloc (std::size_t al, std::size_t sz)
 {
   void *ptr;
+  // posix_memalign has additional requirement, not present on aligned_alloc:
   // The value of alignment shall be a power of two multiple of sizeof(void *).
   if (al < sizeof(void*))
 al = sizeof(void*);
@@ -53,20 +54,24 @@ aligned_alloc (std::size_t al, std::size_t sz)
 #else
 extern "C" void *memalign(std::size_t boundary, std::size_t size);
 #endif
-#define aligned_alloc memalign
-#else
+static inline void*
+aligned_alloc (std::size_t al, std::size_t sz)
+{
+#ifdef __sun
+  // Solaris 10 memalign requires that alignment is greater than or equal to
+  // the size of a word.
+  if (al < sizeof(int))
+al = sizeof(int);
+#endif
+  return memalign (al, sz);
+}
+#else // !HAVE__ALIGNED_MALLOC && !HAVE_POSIX_MEMALIGN && !HAVE_MEMALIGN
 #include 
 // The C library doesn't provide any aligned allocation functions, define one.
 // This is a modified version of code from gcc/config/i386/gmm_malloc.h
 static inline void*
 aligned_alloc (std::size_t al, std::size_t sz)
 {
-  // Alignment must be a power of two.
-  if (al & (al - 1))
-return nullptr;
-  else if (!sz)
-return nullptr;
-
   // We need extra bytes to store the original value returned by malloc.
   if (al < sizeof(void*))
 al = sizeof(void*);
@@ -90,8 +95,13 @@ operator new (std::size_t sz, std::align_val_t al)
   void *p;
   std::size_t align = (std::size_t)al;
 
+  /* Alignment must be a power of two.  */
+  /* XXX This should be checked by the compiler (PR 86878).  */
+  if (__builtin_expect (align & (align - 1), false))
+_GLIBCXX_THROW_OR_ABORT(bad_alloc());
+
   /* malloc (0) is unpredictable; avoid it.  */
-  if (sz == 0)
+  if (__builtin_expect (sz == 0, false))
 sz = 1;
 
 #if _GLIBCXX_HAVE_ALIGNED_ALLOC


Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-07 Thread Steve Ellcey
I have a question about my own patch.  In doing testing I realized
that if I use the aarch64_vector_pcs attribute on a platform without
SIMD (i.e. -march=armv8.1-a+nosimd) I get an ICE.  That is obviously
not what we want but I was wondering what the right behaviour is.

We certainly don't want to generate code for a SIMD function on a
target that does not support SIMD, but is it OK to declare something
with the simd or aarch64_vector_pcs attribute if it is never used or called?
Or should even the declaration of a function with the simd/aarch64_vector_pcs
attribute result in a compilation error?  I don't think we want to
complain about simd declarations because those could show up in the system
headers like mathcalls.h even when not used.  Or should those be ifdef'ed
in some way based on the __ARM_NEON macro so they don't show up when not
compiling for SIMD?

Any ideas on where should this check be done, I thought the
TARGET_OPTION_VALID_ATTRIBUTE_P hook might be the right place, but that
seems to be specific to the 'target' attribute only, not attributes
in general.

Steve Ellcey
sell...@cavium.com


Re: [PATCH] Make strlen range computations more conservative

2018-08-07 Thread Jeff Law
On 08/06/2018 05:59 PM, Martin Sebor wrote:
>>> But as I said, far more essential than the optimization is
>>> the ability to detect these invalid access (both reads and
>>> writes), such as in:
>>
>> The essential thing is to not introduce latent wrong code issues
>> because you exploit C language constraints that are not preserved by
>> GIMPLE transforms because they are not constraints in the GIMPLE IL
>> _WHICH_ _IS_ _NOT_ _C_.
> 
> You misunderstood my point: I'm saying "if you must, disable
> the strlen optimization but please don't compromise the bug
> detection."
And to be clear, you should be involved in that process.

Note if fixing the codegen issues loses warnings and restoring the
warnings is a major effort, then the warnings may have to regress.

jeff


Re: [PATCH][GCC][AARCH64] Use STLUR for atomic_store

2018-08-07 Thread Matthew Malcomson

Hello everyone,

This is an updated patch where I've made what was a predicate into a 
pure C function based on some feedback outside the mailing list.


Ok for trunk?

Matthew

gcc/
2018-08-07  Matthew Malcomson  

    * config/aarch64/aarch64-protos.h
    (aarch64_armv8_4_offset_memory_operand): New declaration.
    * config/aarch64/aarch64.c
    (aarch64_armv8_4_offset_memory_operand): New.
    * config/aarch64/aarch64.h (TARGET_ARMV8_4): Add feature macro.
    * config/aarch64/atomics.md (atomic_store): Allow offset
    and use stlur.
    * config/aarch64/constraints.md (Ust): New constraint.
    * config/aarch64/predicates.md.
    (aarch64_sync_or_offset_memory_operand): New predicate.

gcc/testsuite/
2018-08-07  Matthew Malcomson  

    * gcc.target/aarch64/atomic-store.c: New.



### Attachment also inlined for ease of reply 
###



diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ef95fc829b83886e2ff00e4664e31af916e99b0c..8a1c1eac75ad486777804cec9c313c49e129cc4d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -393,6 +393,7 @@ void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);

 bool aarch64_mov_operand_p (rtx, machine_mode);
 rtx aarch64_reverse_mask (machine_mode, unsigned int);
 bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
+bool aarch64_armv8_4_offset_memory_operand (rtx, machine_mode);
 char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx);
 char *aarch64_output_sve_addvl_addpl (rtx, rtx, rtx);
 char *aarch64_output_sve_inc_dec_immediate (const char *, rtx);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index c1218503bab19323eee1cca8b7e4bea8fbfcf573..328512e11f4230e24223bc51e55bdca8b31f6a20 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -237,6 +237,9 @@ extern unsigned aarch64_architecture_version;
 /* ARMv8.3-A features.  */
 #define TARGET_ARMV8_3    (AARCH64_ISA_V8_3)

+/* ARMv8.4-A features.  */
+#define TARGET_ARMV8_4    (AARCH64_ISA_V8_4)
+
 /* Make sure this is always defined so we don't have to check for ifdefs
    but rather use normal ifs.  */
 #ifndef TARGET_FIX_ERR_A53_835769_DEFAULT
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 13b5448aca88555222481f0955237b6fdcbb38b9..607c4f8fc4786857db8f4c2848df18035ef42495 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4501,6 +4501,38 @@ offset_12bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)

   && IN_RANGE (multiple, 0, 4095));
 }

+/* Return true if the rtx describes a memory operand consisting of a DImode
+   register offset with a 9 bit signed unscaled constant and we're targeting
+   Armv8.4.
+   This function created to test for a case where the STLUR instruction will be
+   used.  */
+bool
+aarch64_armv8_4_offset_memory_operand (rtx op, machine_mode mode)
+{
+  if (!TARGET_ARMV8_4)
+    return false;
+
+  if (!MEM_P (op))
+    return false;
+  rtx mem_op = XEXP (op, 0);
+
+  if (GET_CODE (mem_op) != PLUS)
+    return false;
+  rtx plus_op0 = XEXP (mem_op, 0);
+  rtx plus_op1 = XEXP (mem_op, 1);
+
+  /* STLUR instruction requires DImode register.  */
+  if (GET_MODE (plus_op0) != DImode
+  || !REG_P (plus_op0))
+    return false;
+
+  poly_int64 offset;
+  if (!poly_int_rtx_p (plus_op1, &offset))
+    return false;
+
+  return offset_9bit_signed_unscaled_p (mode, offset);
+}
+
 /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */

 static sbitmap
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 36c06756a1f94cadae097b3aad654fbeba1cf2f3..41b9845db00fccb3781d91cb3b95680b5c51eb11 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -481,9 +481,9 @@
 )

 (define_insn "atomic_store"
-  [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "=Q")
+  [(set (match_operand:ALLI 0 "aarch64_sync_or_offset_memory_operand" "=Q,Ust")

 (unspec_volatile:ALLI
-  [(match_operand:ALLI 1 "general_operand" "rZ")
+  [(match_operand:ALLI 1 "general_operand" "rZ,rZ")
    (match_operand:SI 2 "const_int_operand")]            ;; model
   UNSPECV_STL))]
   ""
@@ -491,8 +491,10 @@
 enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
 if (is_mm_relaxed (model) || is_mm_consume (model) || 
is_mm_acquire (model))

   return "str\t%1, %0";
-    else
+    else if (which_alternative == 0)
   return "stlr\t%1, %0";
+    else
+  return "stlur\t%1, %0";
   }
 )

diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 72cacdabdac52dcb40b480f7a5bfbf4997c742d8..40cc2f0143221aa2cd9ee0e5abb79640467ab03e 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -218,6 +218,11 @@
  (and (match_code "mem")
   

Re: [PATCH] Make strlen range computations more conservative

2018-08-07 Thread Bernd Edlinger
On 08/07/18 17:02, Martin Sebor wrote:
> On 08/06/2018 11:45 PM, Richard Biener wrote:
>> On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor  
>> wrote:
>>> On 08/06/2018 11:40 AM, Jeff Law wrote:
 On 08/06/2018 11:15 AM, Martin Sebor wrote:
>>> These examples do not aim to be valid C, they just point out
>>> limitations
>>> of the middle-end design, and a good deal of the problems are due
>>> to trying to do things that are not safe within the boundaries
>>> given
>>> by the middle-end design.
>> I really think this is important -- and as such I think we need to
>>> move
>> away from trying to describe scenarios in C because doing so keeps
>> bringing us back to the "C doesn't allow XYZ" kinds of arguments
>>> when
>> what we're really discussing are GIMPLE semantic issues.
>>
>> So examples should be GIMPLE.  You might start with (possibly
>>> invalid) C
>> code to generate the GIMPLE, but the actual discussion needs to be
>> looking at GIMPLE.  We might include the C code in case someone
>>> wants to
>> look at things in a debugger, but bringing the focus to GIMPLE is
>>> really
>> important here.
>
> I don't understand the goal of this exercise.  Unless the GIMPLE
> code is the result of a valid test case (in some language GCC
> supports), what does it matter what it looks like?  The basis of
> every single transformation done by a compiler is that the source
> code is correct.  If it isn't then all bets are off.  I'm no GIMPLE
> expert but even I can come up with any number of GIMPLE expressions
> that have undefined behavior.  What would that prove?
 The GIMPLE IL is less restrictive than the original source language.
 The process of translation into GIMPLE and optimization can create
 situations in the GIMPLE IL that can't be validly represented in the
 original source language.  Subobject crossing being one such case,
>>> there
 are certainly others.  We have to handle these scenarios correctly.
>>>
>>> Sure, but a valid C test case still needs to exist to show that
>>> such a transformation is possible.  Until someone comes up with
>>> one it's all speculation.
>>
>> Jakub showed you one wrt CSE of addresses.
> 
> Sorry, there have been so many examples I've lost track.  Can
> you please copy it here or point to it in the archive?
> 

This is based on Jakub's example here: 
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00260.html

$ cat y.cc
typedef __typeof__ (sizeof 0) size_t;
void *operator new (size_t, void *p) { return p; }
void *operator new[] (size_t, void *p) { return p; }
struct S { int x; unsigned char a[1]; char b[64]; };
void baz (char *);

size_t
foo (S *p)
{
   char *q = new ((char*)p->a) char [16];
   baz (q);
   size_t x = __builtin_strlen (q);
   if (x==0)
 __builtin_abort();
   return x;
}

$ gcc -O3 -S y.cc
$ cat y.s
.LFB2:
	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	addq	$4, %rdi
	call	_Z3bazPc
	call	abort
	.cfi_endproc



I think this is not a correct optimization.

Bernd.


> In any event, I would find it reasonable for the strlen
> optimization to be subject to the same constraints as
> the aggressive loop optimization.  If there are valid test
> cases where the strlen optimization goes beyond that then
> let's throttle those.  Doing more than that would be
> arbitrary and result in confusing inconsistencies (as
> the proposed patch does).  For example, these two equivalent
> functions should continue to result in the same optimal code:
> 
>    extern char b[2][4];
> 
>    void f (int i)
>    {
>      if (__builtin_strlen (b[i]) >= sizeof b[i])
>    __builtin_abort ();
>    }
> 
>    void g (int i)
>    {
>      unsigned n = 0;
>      while (b[i][n])
>    ++n;
>      if (n >= sizeof b[i])
>    __builtin_abort ();
>    }
> 
> Martin


Re: [PATCH] Make strlen range computations more conservative

2018-08-07 Thread Jeff Law
On 08/06/2018 09:38 PM, Martin Sebor wrote:
> On 08/06/2018 11:40 AM, Jeff Law wrote:
>> On 08/06/2018 11:15 AM, Martin Sebor wrote:
> These examples do not aim to be valid C, they just point out
> limitations
> of the middle-end design, and a good deal of the problems are due
> to trying to do things that are not safe within the boundaries given
> by the middle-end design.
 I really think this is important -- and as such I think we need to move
 away from trying to describe scenarios in C because doing so keeps
 bringing us back to the "C doesn't allow XYZ" kinds of arguments when
 what we're really discussing are GIMPLE semantic issues.

 So examples should be GIMPLE.  You might start with (possibly
 invalid) C
 code to generate the GIMPLE, but the actual discussion needs to be
 looking at GIMPLE.  We might include the C code in case someone
 wants to
 look at things in a debugger, but bringing the focus to GIMPLE is
 really
 important here.
>>>
>>> I don't understand the goal of this exercise.  Unless the GIMPLE
>>> code is the result of a valid test case (in some language GCC
>>> supports), what does it matter what it looks like?  The basis of
>>> every single transformation done by a compiler is that the source
>>> code is correct.  If it isn't then all bets are off.  I'm no GIMPLE
>>> expert but even I can come up with any number of GIMPLE expressions
>>> that have undefined behavior.  What would that prove?
>> The GIMPLE IL is less restrictive than the original source language.
>> The process of translation into GIMPLE and optimization can create
>> situations in the GIMPLE IL that can't be validly represented in the
>> original source language.  Subobject crossing being one such case, there
>> are certainly others.  We have to handle these scenarios correctly.
> 
> Sure, but a valid C test case still needs to exist to show that
> such a transformation is possible.  Until someone comes up with
> one it's all speculation.
No, not at all.  The defined semantics in this space come from actually
bumping against these problems in this past -- resulting in defining the
semantics in the way we have.


> 
> Under normal circumstances the burden of proof that there is
> a problem is on the reporter.  In this case, the requirement
> has turned into one to prove a negative.  Effectively, you
> are asking for a proof that there is no bug, either in
> the assumptions behind the strlen optimization, or somewhere
> else in GCC that would lead the optimization to invalidate
> a valid piece of code.  That's impossible.
I disagree strongly.  We have *defined* a set of semantics in GIMPLE
based on the language lowering processes and needs of the optimizers.
For anything which transforms the IL, you must adhere to the semantics
of GIMPLE.  It's that simple.

I am sympathetic to the desire to use C semantics to get better refined
ranges, but that's just wrong for anything which impacts code generation.

This discussion doesn't seem to be moving beyond that basic point, which
is concerning.  It really feels like we should be moving towards how we
avoid violating GIMPLE semantics for codegen/optimization purposes while
still getting good warnings.



Jeff


Re: [AArch64] Fix -mlow-precision-div (PR 86838)

2018-08-07 Thread James Greenhalgh
On Fri, Aug 03, 2018 at 10:34:37AM -0500, Richard Sandiford wrote:
> The "@" handling broke -mlow-precision-div, because the scalar forms of
> the instruction were provided by a pattern that also provided FRECPX
> (and so were parameterised on an unspec code as well as a mode),
> while the SIMD versions had a dedicated FRECPE pattern.  This patch
> moves the scalar FRECPE handling to the SIMD pattern too (as for FRECPS)
> and uses a separate pattern for FRECPX.
> 
> The convention in aarch64-simd-builtins.def seemed to be to add
> comments only if the mapping wasn't obvious (i.e. not just sticking
> "aarch64_" on the beginning and "" on the end), so the patch
> deletes the reference to the combined pattern instead of rewording it.
> 
> There didn't seem to be any coverage of -mlow-precision-div in the
> testsuite, so the patch adds some tests for it.
> 
> Tested on aarch64-linux-gnu.  OK to install?

OK.

Thanks,
James

> 
> Richard
> 
> 
> 2018-08-03  Richard Sandiford  
> 
> gcc/
>   PR target/86838
>   * config/aarch64/iterators.md (FRECP, frecp_suffix): Delete.
>   * config/aarch64/aarch64-simd.md
>   (aarch64_frecp): Fold FRECPE into...
>   (@aarch64_frecpe): ...here and move FRECPX to...
>   (aarch64_frecpx): ...this new pattern.
>   * config/aarch64/aarch64-simd-builtins.def: Remove comment
>   about aarch64_frecp.
> 
> gcc/testsuite/
>   PR target/86838
>   * gcc.target/aarch64/frecpe_1.c: New test.
>   * gcc.target/aarch64/frecpe_2.c: Likewise.
> 


Re: [PATCH] Make strlen range computations more conservative

2018-08-07 Thread Martin Sebor

On 08/06/2018 11:45 PM, Richard Biener wrote:

On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor  wrote:

On 08/06/2018 11:40 AM, Jeff Law wrote:

On 08/06/2018 11:15 AM, Martin Sebor wrote:

These examples do not aim to be valid C, they just point out
limitations of the middle-end design, and a good deal of the problems
are due to trying to do things that are not safe within the boundaries
given by the middle-end design.

I really think this is important -- and as such I think we need to
move away from trying to describe scenarios in C because doing so keeps
bringing us back to the "C doesn't allow XYZ" kinds of arguments when
what we're really discussing are GIMPLE semantic issues.

So examples should be GIMPLE.  You might start with (possibly
invalid) C code to generate the GIMPLE, but the actual discussion needs
to be looking at GIMPLE.  We might include the C code in case someone
wants to look at things in a debugger, but bringing the focus to GIMPLE
is really important here.


I don't understand the goal of this exercise.  Unless the GIMPLE
code is the result of a valid test case (in some language GCC
supports), what does it matter what it looks like?  The basis of
every single transformation done by a compiler is that the source
code is correct.  If it isn't then all bets are off.  I'm no GIMPLE
expert but even I can come up with any number of GIMPLE expressions
that have undefined behavior.  What would that prove?

The GIMPLE IL is less restrictive than the original source language.
The process of translation into GIMPLE and optimization can create
situations in the GIMPLE IL that can't be validly represented in the
original source language.  Subobject crossing being one such case, there
are certainly others.  We have to handle these scenarios correctly.


Sure, but a valid C test case still needs to exist to show that
such a transformation is possible.  Until someone comes up with
one it's all speculation.


Jakub showed you one wrt CSE of addresses.


Sorry, there have been so many examples I've lost track.  Can
you please copy it here or point to it in the archive?

In any event, I would find it reasonable for the strlen
optimization to be subject to the same constraints as
the aggressive loop optimization.  If there are valid test
cases where the strlen optimization goes beyond that then
let's throttle those.  Doing more than that would be
arbitrary and result in confusing inconsistencies (as
the proposed patch does).  For example, these two equivalent
functions should continue to result in the same optimal code:

  extern char b[2][4];

  void f (int i)
  {
if (__builtin_strlen (b[i]) >= sizeof b[i])
  __builtin_abort ();
  }

  void g (int i)
  {
unsigned n = 0;
while (b[i][n])
  ++n;
if (n >= sizeof b[i])
  __builtin_abort ();
  }

Martin


Re: [PATCH] fix std::variant::swap for trivially-move-assignable types

2018-08-07 Thread Ville Voutilainen
On 7 August 2018 at 17:29, Jonathan Wakely  wrote:
> On 07/08/18 15:24 +0100, Jonathan Wakely wrote:
>>
>> This patch fixes the bug, but is it correct?
>>
>> IIUC the _M_destructive_move effects don't depend on whether move
>> assignment is trivial, so should be defined the same way in both
>> specializations. It also looks like we can use it in the non-trivial
>> move assignment.
>>
>> Should we define _M_destructive_move on _Move_ctor_base instead of
>> _Move_assign_base, so the duplication could be avoided?
>
>
> Or maybe into _Move_ctor_base as in the attached patch. That allows it
> to be used in _Copy_assign_base, and means we can omit the try-catch
> block when the move construction is trivial.
>

_Move_ctor_base seems fine to me. I plan to revamp our variant to
bring it up to date with the changes done before C++17 was finalized,
to fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85517, and I plan
to do that shortly.


Re: [PATCH 00/11] (v2) Mitigation against unsafe data speculation (CVE-2017-5753)

2018-08-07 Thread John David Anglin

On 2018-08-07 10:05 AM, Richard Earnshaw (lists) wrote:

Thanks.  Wrong PR, though: that was for the SPU port.  The hppa PR is 86785.

Oops, sorry for extra work.

Dave

--
John David Anglin  dave.ang...@bell.net



Re: [PATCH] Make strlen range computations more conservative

2018-08-07 Thread Martin Sebor

On 08/07/2018 02:51 AM, Richard Biener wrote:

On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor  wrote:

On 07/26/2018 02:55 AM, Richard Biener wrote:

On Wed, 25 Jul 2018, Martin Sebor wrote:


BUT - for the string_constant and c_strlen functions we are,
in all cases we return something interesting, able to look
at an initializer which then determines that type.  Hopefully.
I think the strlen() folding code when it sets SSA ranges
now looks at types ...?

Consider

struct X { int i; char c[4]; int j;};
struct Y { char c[16]; };

void foo (struct X *p, struct Y *q)
{
  memcpy (p, q, sizeof (struct Y));
  if (strlen ((char *)(struct Y *)p + 4) < 7)
abort ();
}

here the GIMPLE IL looks like

  const char * _1;

   [local count: 1073741825]:
  _5 = MEM[(char * {ref-all})q_4(D)];
  MEM[(char * {ref-all})p_6(D)] = _5;
  _1 = p_6(D) + 4;
  _2 = __builtin_strlen (_1);

and I guess Martin would argue that since p is of type struct X
+ 4 gets you to c[4] and thus strlen of that cannot be larger
than 3.  But of course the middle-end doesn't work like that
and luckily we do not try to draw such conclusions or we
are somehow lucky that for the testcase as written above we do not
(I'm not sure whether Martin's changes in this area would derive
such conclusions in principle).


Only if the strlen argument were p->c.


NOTE - we do not know the dynamic type here since we do not know
the dynamic type of the memory pointed-to by q!  We can only
derive that at q+4 there must be some object that we can
validly call strlen on (where Martin again thinks strlen
imposes constrains that memchr does not - sth I do not agree
with from a QOI perspective)


The dynamic type is a murky area.


It's well-specified in the middle-end.  A store changes the
dynamic type of the stored-to object.  If that type is
compatible with the surrounding object's dynamic type, that one
is not affected; if not, then the surrounding object's dynamic
type becomes unspecified.  There is TYPE_TYPELESS_STORAGE
to somewhat control "compatibility" of subobjects.


I never responded to this.  Using a dynamic (effective) type as
you describe it would invalidate the aggressive loop optimization
in the following:

  void foo (struct X *p)
  {
  struct Y y = { "12345678" };
  memcpy (p, &y, sizeof (struct Y));

  // *p effective type is now struct Y

  int n = 0;
  while (p->c[n])
++n;

  if (n < 7)
abort ();
  }

GCC unconditionally aborts, just as it does with strlen(p->c).
Why is that not wrong (in either case)?

Because the code is invalid either way, for two reasons:


No, because the storage has only 4 non-null characters starting at offset 4?


No, for the reasons below.  I made the mistake of making
the initializer string too short.  If we make it longer it
still aborts.  Say with this

  struct Y y = { "123456789012345" };

we end up with this DCE:

 struct Y y;

   :
  MEM[(char * {ref-all})p_6(D)] = 0x353433323130393837363534333231;
  __builtin_abort ();

With -fdump-tree-cddce1-details (and a patch to show the upper
bound) we see:

  Found loop 1 to be finite: upper bound found: 3.

With -fno-aggressive-loop-optimizations the abort becomes
conditional because the array bound isn't considered. I would
expect you to know this since you implemented the feature.

Martin



1) it accesses an object of (effective) type struct Y via
   an lvalue of type struct X (specifically via (*p).c)
2) it relies on p->c

The loop optimization relies on the exact same requirement
as the strlen one.  Either they are both valid or neither is.

Martin






Fix PR number in hppa commit for CVE-2017-5753)

2018-08-07 Thread Richard Earnshaw (lists)
This just fixes the PR number in the ChangeLog.  Nothing we can do about
the SVN history.

gcc/ChangeLog

Committed as obvious.
Index: ChangeLog
===
--- ChangeLog	(revision 263357)
+++ ChangeLog	(working copy)
@@ -9,7 +9,7 @@
 
 2018-08-06  John David Anglin  
 
-	PR target/86807
+	PR target/86785
 	* config/pa/pa.c (TARGET_HAVE_SPECULATION_SAFE_VALUE):
 	Define to speculation_safe_value_not_needed.
 


Re: [PATCH] fix std::variant::swap for trivially-move-assignable types

2018-08-07 Thread Jonathan Wakely

On 07/08/18 15:24 +0100, Jonathan Wakely wrote:

This patch fixes the bug, but is it correct?

IIUC the _M_destructive_move effects don't depend on whether move
assignment is trivial, so should be defined the same way in both
specializations. It also looks like we can use it in the non-trivial
move assignment.

Should we define _M_destructive_move on _Move_ctor_base instead of
_Move_assign_base, so the duplication could be avoided?


Or maybe into _Move_ctor_base as in the attached patch. That allows it
to be used in _Copy_assign_base, and means we can omit the try-catch
block when the move construction is trivial.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 66d878142a4..5dd00b05f1f 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -506,6 +506,20 @@ namespace __variant
 	  }
   }
 
+  void _M_destructive_move(_Move_ctor_base&& __rhs)
+  {
+	this->~_Move_ctor_base();
+	__try
+	  {
+	::new (this) _Move_ctor_base(std::move(__rhs));
+	  }
+	__catch (...)
+	  {
+	this->_M_index = variant_npos;
+	__throw_exception_again;
+	  }
+  }
+
   _Move_ctor_base(const _Move_ctor_base&) = default;
   _Move_ctor_base& operator=(const _Move_ctor_base&) = default;
   _Move_ctor_base& operator=(_Move_ctor_base&&) = default;
@@ -516,6 +530,12 @@ namespace __variant
 {
   using _Base = _Copy_ctor_alias<_Types...>;
   using _Base::_Base;
+
+  void _M_destructive_move(_Move_ctor_base&& __rhs)
+  {
+	this->~_Move_ctor_base();
+	::new (this) _Move_ctor_base(std::move(__rhs));
+  }
 };
 
   template
@@ -544,16 +564,7 @@ namespace __variant
 	else
 	  {
 	_Copy_assign_base __tmp(__rhs);
-	this->~_Copy_assign_base();
-	__try
-	  {
-		::new (this) _Copy_assign_base(std::move(__tmp));
-	  }
-	__catch (...)
-	  {
-		this->_M_index = variant_npos;
-		__throw_exception_again;
-	  }
+	this->_M_destructive_move(std::move(__tmp));
 	  }
 	__glibcxx_assert(this->_M_index == __rhs._M_index);
 	return *this;
@@ -582,20 +593,6 @@ namespace __variant
   using _Base = _Copy_assign_alias<_Types...>;
   using _Base::_Base;
 
-  void _M_destructive_move(_Move_assign_base&& __rhs)
-  {
-	this->~_Move_assign_base();
-	__try
-	  {
-	::new (this) _Move_assign_base(std::move(__rhs));
-	  }
-	__catch (...)
-	  {
-	this->_M_index = variant_npos;
-	__throw_exception_again;
-	  }
-  }
-
   _Move_assign_base&
   operator=(_Move_assign_base&& __rhs)
 	  noexcept(_Traits<_Types...>::_S_nothrow_move_assign)
@@ -613,16 +610,7 @@ namespace __variant
 	else
 	  {
 	_Move_assign_base __tmp(std::move(__rhs));
-	this->~_Move_assign_base();
-	__try
-	  {
-		::new (this) _Move_assign_base(std::move(__tmp));
-	  }
-	__catch (...)
-	  {
-		this->_M_index = variant_npos;
-		__throw_exception_again;
-	  }
+	this->_M_destructive_move(std::move(__tmp));
 	  }
 	__glibcxx_assert(this->_M_index == __rhs._M_index);
 	return *this;
@@ -638,6 +626,20 @@ namespace __variant
 {
   using _Base = _Copy_assign_alias<_Types...>;
   using _Base::_Base;
+
+  void _M_destructive_move(_Move_assign_base&& __rhs)
+  {
+	this->~_Move_assign_base();
+	__try
+	  {
+	::new (this) _Move_assign_base(std::move(__rhs));
+	  }
+	__catch (...)
+	  {
+	this->_M_index = variant_npos;
+	__throw_exception_again;
+	  }
+  }
 };
 
   template


[PATCH] fix std::variant::swap for trivially-move-assignable types

2018-08-07 Thread Jonathan Wakely

This patch fixes the bug, but is it correct?

IIUC the _M_destructive_move effects don't depend on whether move
assignment is trivial, so should be defined the same way in both
specializations. It also looks like we can use it in the non-trivial
move assignment.

Should we define _M_destructive_move on _Move_ctor_base instead of
_Move_assign_base, so the duplication could be avoided?

Obviously this needs a ChangeLog entry and tests.



diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 66d878142a4..7d691c487fd 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -613,16 +613,7 @@ namespace __variant
 	else
 	  {
 	_Move_assign_base __tmp(std::move(__rhs));
-	this->~_Move_assign_base();
-	__try
-	  {
-		::new (this) _Move_assign_base(std::move(__tmp));
-	  }
-	__catch (...)
-	  {
-		this->_M_index = variant_npos;
-		__throw_exception_again;
-	  }
+	_M_destructive_move(std::move(__tmp));
 	  }
 	__glibcxx_assert(this->_M_index == __rhs._M_index);
 	return *this;
@@ -638,6 +629,19 @@ namespace __variant
 {
   using _Base = _Copy_assign_alias<_Types...>;
   using _Base::_Base;
+  void _M_destructive_move(_Move_assign_base&& __rhs)
+  {
+	this->~_Move_assign_base();
+	__try
+	  {
+	::new (this) _Move_assign_base(std::move(__rhs));
+	  }
+	__catch (...)
+	  {
+	this->_M_index = variant_npos;
+	__throw_exception_again;
+	  }
+  }
 };
 
   template


Re: Improve std::rotate usages

2018-08-07 Thread Jonathan Wakely

On 27/05/18 19:25 +0200, François Dumont wrote:

Still no chance to review it ?

I'd like this one to go in before submitting other algo related patches.

    * include/bits/stl_algo.h
    (__rotate(_Ite, _Ite, _Ite, forward_iterator_tag))
    (__rotate(_Ite, _Ite, _Ite, bidirectional_iterator_tag))
    (__rotate(_Ite, _Ite, _Ite, random_access_iterator_tag)): Move 
code duplication...

    (rotate(_Ite, _Ite, _Ite)): ...here.
    (__stable_partition_adaptive(_FIt, _FIt, _Pred, _Dist, _Pointer, 
_Dist)):

    Simplify rotate call.
    (__rotate_adaptive(_BIt1, _BIt1, _BIt1, _Dist, _Dist, _Bit2, _Dist)):
    Likewise.
    (__merge_without_buffer(_BIt, _BIt, _BIt, _Dist, _Dist, _Comp)):
    Likewise.

François

On 14/05/2018 22:14, François Dumont wrote:

Any feedback regarding this patch ?


On 02/05/2018 07:26, François Dumont wrote:

Hi

    std::rotate already returns the expected iterator so there is 
no need for calls to std::advance/std::distance.


Yes, looks like that code predated DR 488 which changed the return
type of std::rotate.

OK for trunk, thanks.




Re: [PATCH 00/11] (v2) Mitigation against unsafe data speculation (CVE-2017-5753)

2018-08-07 Thread Richard Earnshaw (lists)
On 06/08/18 22:52, John David Anglin wrote:
> On 2018-08-03 5:06 AM, Richard Earnshaw (lists) wrote:
>>> I don't think there's a suitable barrier.  The sync instruction seems
>>> like overkill.
>>>
>>> So, I'm going to install attached change after testing is complete.
>>>
>> It's your call as port maintainers.
> I committed the attached change after testing on hppa-unknown-linux-gnu.
> 

Thanks.  Wrong PR, though: that was for the SPU port.  The hppa PR is 86785.

R.

> Dave
> 
> 
> pa-spectre.d
> 
> 
> 2018-08-06  John David Anglin  
> 
>   PR target/86807
>   * config/pa/pa.c (TARGET_HAVE_SPECULATION_SAFE_VALUE):
>   Define to speculation_safe_value_not_needed.
> 
> Index: config/pa/pa.c
> ===
> --- config/pa/pa.c(revision 263228)
> +++ config/pa/pa.c(working copy)
> @@ -428,6 +428,9 @@
>  #undef TARGET_STARTING_FRAME_OFFSET
>  #define TARGET_STARTING_FRAME_OFFSET pa_starting_frame_offset
>  
> +#undef TARGET_HAVE_SPECULATION_SAFE_VALUE
> +#define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>  
>  /* Parse the -mfixed-range= option string.  */
> 



Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch, geometry

2018-08-07 Thread Cesar Philippidis
On 08/06/2018 11:08 PM, Tom de Vries wrote:
> On 08/01/2018 12:18 PM, Tom de Vries wrote:
> 
>> I think we need to add and handle:
>> ...
>>   CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
>> ...
>>
> 
> I realized that the patch I posted introducing CUDA_ONE_CALL_MAYBE_NULL
> was incomplete, and needed to use the weak attribute in case of linking
> against a concrete libcuda.so.
> 
> So, I've now committed a patch implementing just CUDA_ONE_CALL_MAYBE_NULL:
> "[libgomp, nvptx] Handle CUDA_ONE_CALL_MAYBE_NULL" @
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00447.html . You can use
> "CUDA_CALL_EXISTS (cuOccupancyMaxPotentialBlockSize)" to test for
> existence of the function in the cuda driver API.

Sorry for taking so long getting this patch updated. It's a slow build
and test cycle getting older versions of CUDA to play nicely. So far,
I've managed to get CUDA 5.5 partially working with Nvidia driver
331.113 (which supports CUDA 6.0), in the sense that I spotted an error
with the patch: I realized that the cuda.h that ships with libgomp
emulates CUDA 8.0. That led to problems using cuLinkAddData, because
that function gets remapped to cuLinkAddData_v2 in CUDA 6.5 and newer.

That leads me to a question, do we really want to support older versions
of CUDA without using the system's CUDA header files?

>> The patch doesn't build in a setup with
>> --enable-offload-targets=nvptx-none and without cuda, that enables usage
>> of plugin/cuda/cuda.h:
>> ...
>> /data/offload-nvptx/src/libgomp/plugin/plugin-nvptx.c:98:16: error:
>> ‘cuOccupancyMaxPotentialBlockSize’ undeclared here (not in a function);
>> did you mean ‘cuOccupancyMaxPotentialBlockSizeWithFlags’?
>>  CUDA_ONE_CALL (cuOccupancyMaxPotentialBlockSize) \
>> ...
>>
> 
> I've committed a patch "[libgomp, nvptx, --without-cuda-driver] Don't
> use system cuda driver" @
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00348.html .
> 
> Using --without-cuda-driver should make it easy to build using the
> dlopen interface without having to de-install the system libcuda.so.

I attached an updated version of the CUDA driver patch, although I
haven't rebased it against your changes yet. It still needs to be tested
against CUDA 5.5 using the systems/Nvidia's cuda.h. But I wanted to give
you an update.

Does this patch look OK, at least after testing completes? I removed the
tests for CUDA_ONE_CALL_MAYBE_NULL, because the newer CUDA API isn't
supported in the older drivers.

Cesar

From 7fc093da173543b43e1d83dd5fb9e00e2b92eb09 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Thu, 26 Jul 2018 11:47:35 -0700
Subject: [PATCH] [nvptx] Use CUDA driver API to select default runtime launch
 geometry

	libgomp/
	plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
	(cuDriverGetVersion): Declare.
	(cuOccupancyMaxPotentialBlockSizeWithFlags): Declare.
	plugin/plugin-nvptx.c (CUDA_ONE_CALL): Add entries for
	cuDriverGetVersion and cuOccupancyMaxPotentialBlockSize.
	(ptx_device): Add driver_version member.
	(nvptx_open_device): Initialize it.
	(nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
	default num_gangs and num_workers when the driver supports it.
---
 libgomp/plugin/cuda-lib.def   |  2 ++
 libgomp/plugin/cuda/cuda.h|  4 
 libgomp/plugin/plugin-nvptx.c | 41 +--
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
index be8e3b3ec4d..f2433e1f0a9 100644
--- a/libgomp/plugin/cuda-lib.def
+++ b/libgomp/plugin/cuda-lib.def
@@ -2,6 +2,7 @@ CUDA_ONE_CALL (cuCtxCreate)
 CUDA_ONE_CALL (cuCtxDestroy)
 CUDA_ONE_CALL (cuCtxGetCurrent)
 CUDA_ONE_CALL (cuCtxGetDevice)
+CUDA_ONE_CALL (cuDriverGetVersion)
 CUDA_ONE_CALL (cuCtxPopCurrent)
 CUDA_ONE_CALL (cuCtxPushCurrent)
 CUDA_ONE_CALL (cuCtxSynchronize)
@@ -39,6 +40,7 @@ CUDA_ONE_CALL (cuModuleGetGlobal)
 CUDA_ONE_CALL (cuModuleLoad)
 CUDA_ONE_CALL (cuModuleLoadData)
 CUDA_ONE_CALL (cuModuleUnload)
+CUDA_ONE_CALL (cuOccupancyMaxPotentialBlockSize)
 CUDA_ONE_CALL (cuStreamCreate)
 CUDA_ONE_CALL (cuStreamDestroy)
 CUDA_ONE_CALL (cuStreamQuery)
diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 4799825bda2..3a790e688e0 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -44,6 +44,7 @@ typedef void *CUevent;
 typedef void *CUfunction;
 typedef void *CUlinkState;
 typedef void *CUmodule;
+typedef size_t (*CUoccupancyB2DSize)(int);
 typedef void *CUstream;
 
 typedef enum {
@@ -123,6 +124,7 @@ CUresult cuCtxSynchronize (void);
 CUresult cuDeviceGet (CUdevice *, int);
 CUresult cuDeviceGetAttribute (int *, CUdevice_attribute, CUdevice);
 CUresult cuDeviceGetCount (int *);
+CUresult cuDriverGetVersion(int *);
 CUresult cuEventCreate (CUevent *, unsigned);
 #define cuEventDestroy cuEventDestroy_v2
 CUresult cuEventDestroy (CUevent);
@@ -170,6 +172,8 @@ CUresult cuModuleGetGlobal (CUdeviceptr *, size_t *, CUmodule, const char *);
 CUresult 

Re: PR libstdc++/68222 Hide safe iterator operators

2018-08-07 Thread Jonathan Wakely

On 02/08/18 22:16 +0200, François Dumont wrote:

Hi

    Here is a patch to avoid definition of invalid operators on the 
Debug mode safe iterator type depending on its category.


    Even if it is limited to the Debug mode code I would like to have 
a feedback before committing. Especially on the following points:


- _Safe_tagged_iterator: Is the name ok ?


Hmm, maybe "strict" instead of tagged?

But do we need a new name? Can we just change _Safe_iterator instead
of adding a new type?

Where is _Safe_iterator still used? Just local iterators in unordered
containers?  Is it OK to remove most of the functions that used to
support it? (__niter_base etc).

Could we add a new type for the local iterators, and just change
_Safe_iterator directly so it doesn't expose unsupported operations?

That would make the patch *much* smaller, as you wouldn't need to
change all the uses of _Safe_iterator.


Another approach would be to use mixins to expose the operations:

template<typename _Iter, typename _Cat>
struct _Safe_iterator_mixin<_Iter, _Cat>
{
 typename iterator_traits<_Iter>::reference
 operator*()
 { return static_cast<_Iter*>(this)->_M_deref(); }
};

template<typename _Iter>
struct _Safe_iterator_mixin<_Iter, forward_iterator_tag>
{
 _Iter& operator++()
 { return static_cast<_Iter*>(this)->_M_preinc(); }
 _Iter operator++(int)
 { return static_cast<_Iter*>(this)->_M_postinc(); }
};

template<typename _Iter>
struct _Safe_iterator_mixin<_Iter, bidirectional_iterator_tag>
{
 _Iter& operator--()
 { return static_cast<_Iter*>(this)->_M_predec(); }
 _Iter operator--(int)
 { return static_cast<_Iter*>(this)->_M_postdec(); }
};

etc.

then in _Safe_iterator rename the operator functions, so operator*
becomes _M_deref, operator++ becomes _M_preinc etc. and then derive
from _Safe_iterator_mixin which declares the operators.


- Inheritance between the different _Safe_tagged_iterator 
instantiations. I am already working on making the operators friends 
as we discuss in another thread so I might review this design at this 
moment.


- Are concept checks I added to testsuite_containers.h citerator ok ?


Yes, they look useful.

    This patch also does some cleanup on Debug functions. 
__check_dereferenceable was not used (anymore maybe) so I removed it. 


It would have made the patch a bit more manageable to keep that as a
separate patch, but nevermind.


diff --git a/libstdc++-v3/include/debug/deque b/libstdc++-v3/include/debug/deque
index 93b82cf..213e23b 100644
--- a/libstdc++-v3/include/debug/deque
+++ b/libstdc++-v3/include/debug/deque
@@ -57,12 +57,14 @@ namespace __debug
  typedef __gnu_debug::_Equal_to<_Base_const_iterator> _Equal;

public:
+  typedef _Base normal_type;


normal_type is not a reserved name, so can't be used.



diff --git a/libstdc++-v3/include/debug/stl_iterator.h 
b/libstdc++-v3/include/debug/stl_iterator.h
index f20b000..fd20160 100644
--- a/libstdc++-v3/include/debug/stl_iterator.h
+++ b/libstdc++-v3/include/debug/stl_iterator.h
@@ -75,12 +76,6 @@ namespace __gnu_debug
#else
  template<typename _Iterator>
inline auto
-__base(const std::reverse_iterator<_Iterator>& __it)
--> decltype(std::__make_reverse_iterator(__base(__it.base(
-{ return std::__make_reverse_iterator(__base(__it.base())); }


This removal doesn't seem to be mentioned in the ChangeLog:


* include/debug/stl_iterator.h
(__is_safe_random_iterator>): Remove.
(__base(const std::reverse_iterator<_Safe_tagged_iterator<_It, _Sq,
std::random_access_iterator_tag>)): New overload.




[RFC][PATCH][mid-end] Optimize immediate choice in comparisons.

2018-08-07 Thread Vlad Lazar

Hi.

This patch optimises the choice of immediates in integer comparisons.  Integer
comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
and there are cases where an incremented/decremented immediate can be loaded
into a register in fewer instructions.  The cases are as follows:

i)   a >  b or a >= b + 1
ii)  a <= b or a <  b + 1
iii) a >= b or a >  b - 1
iv)  a <  b or a <= b - 1

For each comparison we check whether the equivalent form can be performed in
fewer instructions.  This is done on a statement by statement basis, right
before the GIMPLE statement is expanded to RTL.  Therefore, it requires a
correct implementation of the TARGET_INSN_COST hook.  The change is general
and it applies to any integer comparison, regardless of types or location.

For example, on AArch64 for the following code:

int foo (int x)
{
  return x > 0xfefffffe;
}

it generates:

mov w1, -16777217
cmp w0, w1
cset w0, cs

instead of:

mov w1, 65534
movk w1, 0xfeff, lsl 16
cmp w0, w1
cset w0, hi

Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and 
there are no regressions.

Thanks,
Vlad

gcc/testsuite/
Changelog for gcc/testsuite/Changelog
2018-07-30  Vlad Lazar  

* gcc.target/aarch64/imm_choice_comparison.c: New.

gcc/
Changelog for gcc/Changelog
2018-07-30  Vlad Lazar  

* cfgexpand.c (optimize_immediate_choice): New.
(can_optimize_immediate_p): Likewise.
(validate_immediate_optimization): Likewise.
(update_immediate): Likewise.
(immediate_optimized_code): Likewise.

---

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 
9b91279282e1c6956c8b3699f13036c401ea1dcd..5b0a2e0cdb23f649336844506c8241428b3e6e7d
 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5423,6 +5423,157 @@ reorder_operands (basic_block bb)
   XDELETE (lattice);
 }
 
+/* Helper function for update_immediate.  Returns the adjusted conditional
+   code for the immediate choice optimization.  See optimize_immediate_choice
+   for cases.  */
+
+static enum tree_code
+immediate_optimized_code (enum tree_code code)
+{
+  switch (code)
+{
+case GT_EXPR:
+  return GE_EXPR;
+case GE_EXPR:
+  return GT_EXPR;
+case LT_EXPR:
+  return LE_EXPR;
+case LE_EXPR:
+  return LT_EXPR;
+default:
+  return code;
+}
+}
+
+/* Helper function for optimize_immediate_choice.  It updates the immediate
+   and the subcode of the gimple statement.  At the point of calling
+   the function we know that the optimization leads to fewer instructions.
+   stmt points to the gimple statement containing the comparison we wish to
+   optimize and to_add is the amount by which the immediate needs to be
+   adjusted (1 or -1).  */
+
+static void
+update_immediate (gimple *stmt, int to_add)
+{
+  tree inc_dec = to_add == 1 ? build_one_cst (integer_type_node) :
+  build_minus_one_cst (integer_type_node);
+
+  enum gimple_code code = gimple_code (stmt);
+  if (code == GIMPLE_COND)
+{
+  gcond *gc = GIMPLE_CHECK2<gcond *> (stmt);
+  tree rhs = gimple_cond_rhs (gc);
+
+  /* Update the statement.  */
+  tree new_rhs = fold_build2 (PLUS_EXPR, TREE_TYPE (rhs), rhs, inc_dec);
+  gimple_cond_set_rhs (gc, new_rhs);
+  gc->subcode = immediate_optimized_code ((enum tree_code) gc->subcode);
+}
+  if (code == GIMPLE_ASSIGN)
+{
+  gassign *ga = GIMPLE_CHECK2<gassign *> (stmt);
+  tree rhs2 = gimple_assign_rhs2 (ga);
+
+  tree new_rhs2 = fold_build2 (PLUS_EXPR, TREE_TYPE (rhs2), rhs2, inc_dec);
+  gimple_assign_set_rhs2 (ga, new_rhs2);
+  ga->subcode = immediate_optimized_code ((enum tree_code) ga->subcode);
+}
+}
+
+/* Helper function for optimize_immediate_choice.  It checks if the other
+   possible immediate leads to a lower rtx cost.  to_add is the amount by
+   which the immediate needs to be adjusted (1 or -1).  The function
+   returns 0 if there is no improvement and the amount by which the immediate
+   needs to be adjusted (1 or -1) otherwise.  */
+
+static int
+validate_immediate_optimization (gimple *stmt, int to_add)
+{
+  tree tree_imm = gimple_code (stmt) == GIMPLE_COND ? gimple_cond_rhs (stmt)
+   : gimple_assign_rhs2 (stmt);
+  const_tree type = TREE_TYPE (tree_imm);
+  widest_int imm = wi::to_widest (tree_imm);
+  enum signop sgn = TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED;
+
+  /* Check for overflow/underflow in the case of signed values and
+ wrapping around in the case of unsigned values.  If any occur
+ cancel the optimization.  */
+
+  widest_int max_type = wi::to_widest (TYPE_MAX_VALUE (type));
+  widest_int min_type = wi::to_widest (TYPE_MIN_VALUE (type));
+  if ((wi::cmp (imm, max_type, sgn) == 0 && to_add == 1)
+  || (wi::cmp (imm, min_type, sgn) == 0 && to_add == -1))
+return 0;
+
+  widest_int imm_modif = wi::add (imm, to_add);
+
+  rtx reg = gen_reg_rtx (DImode);
+  rtx old_imm = 

Re: [PATCH] [Ada] Make middle-end string literals NUL terminated

2018-08-07 Thread Bernd Edlinger
Hi Olivier,

On 08/06/18 20:01, Olivier Hainque wrote:
> Hi Bernd,
> 
> Things are moving fast so I'll answer your
> messages incrementally :-)
> 
>> On 3 Aug 2018, at 20:37, Bernd Edlinger  wrote:
> [...]
>> Oh, how interesting.
>>
>> My patch only intends to add a dummy byte that is stripped again
>> in the assembly.  The purpose is to make it trivial to find out if
>> a string constant is NUL-terminated in the middle-end by comparing
>> TYPE_SIZE_UNIT (TREE_TYPE (decl)) with TREE_STRING_LENGTH (decl).
> 
> Ah, Ok. I hadn't quite realized that yet.
> 
>> TYPE_SIZE_UNIT >= TREE_STRING_LENGTH means zero terminated
>> TYPE_SIZE_UNIT < TREE_STRING_LENGTH means not necessarily zero terminated,
> 
> That is opposite to the situation we were having
> for Ada before the patch series:
> 
>> Such strings shall never chop off more than one nul character.
>> Ada strings are generally not zero terminated, right?
> 
> Right. Strings have explicit bounds in Ada, and a NUL character
> is like any other. It can appear anywhere. For example, the
> excerpt below constructs a string of 5 characters, bounds 1 to 5,
> with a nul character in second position:
> 
>   X : String (1 .. 5) := "1" & Ascii.Nul & "345";
> 
> ('&' is the catenation operator)
> 
> The problem we had was that string literals not explicitly
> NUL terminated (as in X : String := "123" & Ascii.NUL) would
> never go in mergeable sections. For example:
> 
> Ada.Text_IO.Put_Line ("Hello World!");
> 
> Ada has so-called "generic" units that are sometimes
> instantiated many, many times, so there is real interest
> in improving the situation there.
> 
>> So in your search loop you would not have to look after
>> type_len, because that byte would be guaranteed to be zero.
>>
>> That is if we agree that we want to restrict the string constants
>> in that way as I proposed.
>>
>> In the case str_len > type_len I do not understand if that is right.
> 
> In your new model, I don't know what sense it would make, indeed.
> 
> In the previous model, it was clearly expected to be a possibility.
> 
>> Because varasm will only output type_size bytes,
>> this seems only to select a string section
>> but the last nul byte will not be assembled.
>> Something that is not contained in the type_size
>> should not be assembled.
>>
>> What exactly do you want to accomplish?
> 
> Based on the previous (say, gcc-7 or gcc-8) code base, arrange
> for most Ada string literals to end up in a mergeable section
> on ELF targets.
> 
> In at least gcc-7 and gcc-8, our gigi adjustment + the
> patch I posted achieve this effect.
> 
> For example, for the following source:
> 
> procedure Hello is
>procedure Process (S : String) with Import, Convention => C;
> begin
>   Process ("1234");
> end;
> 

Yes, thanks for this example,

$ cat hello.adb
procedure Hello is
procedure puts  (S : String) with Import, Convention => C;
X : String(1..8) := "abcdefg" & Ascii.Nul;
begin
   puts ("1234" & Ascii.Nul );
   puts (X);
end;

produces identical output again with the latest patch that I posted
just now.



> Without the gigi change, I get (x86_64-linux,
> -O2 -fmerge-all-constants):
> 
>.section .rodata
>.LC0:
>.ascii "1234"
> 
> Regular section, no terminating zero.
> 
> The gigi change alone gets us the terminating nul (*),
> but not yet a mergeable section, because the extra nul
> isn't covered by the type bounds:
> 
>.section .rodata
>.LC0:
>.string "1234"
> 
> With the varasm change on top, checking beyond the
> type bounds when the string constant isn't an initializer,
> we get:
> 
>.section .rodata.str1.1,"aMS",@progbits,1
>.LC0:
>.string "1234"
> 
> (*) The terminating zero, part of the string value,
> not spanned by the type bounds, gets assembled through:
> 
> assemble_constant_contents (...)
> ...
> size = get_constant_size (exp);
> 
> then:
> 
> get_constant_size (tree exp)
> 
>size = int_size_in_bytes (TREE_TYPE (exp));
>if (TREE_CODE (exp) == STRING_CST)
>  size = MAX (TREE_STRING_LENGTH (exp), size);
> 
> Again, this is just providing more context on the
> original patch I posted, based on your questions.
> 

Ah, good point. That needs to be changed...
Actually I would like to have get_constant_size
return just the type size.

> I understand there are proposals to change things
> pretty significantly in this area, so ...
> 
> now on to your next messages and ideas :-)
> 
> 

When I try this example:

$ cat array9.adb
-- { dg-do run }

procedure Array9 is

   V1 : String(1..10) := "1234567890";
   V2 : String(1..-1) := "";

   procedure Compare (S : String) is
   begin
 if S'Size /= 8*S'Length then
   raise Program_Error;
 end if;
   end;

begin
   Compare ("");
   Compare ("1234");
   Compare (V1);
   Compare (V2);
end;

I see that "1234" is put in the merge section,
but V1 is not.  Maybe because of the alignment requirement?

But it should not be much different from the C test case,
which is now able to 

Re: [PATCH] Handle not explicitly zero terminated strings in merge sections

2018-08-07 Thread Bernd Edlinger
Hi,

I realized that the previous version did not handle
zero terminated Ada constants correctly as in

$ cat hello.adb
procedure Hello is
procedure puts  (S : String) with Import, Convention => C;
X : String(1..8) := "abcdefg" & Ascii.Nul;
begin
   puts ("1234" & Ascii.Nul );
   puts (X);
end;

Strings were accidentally doubly NUL-terminated, because I forgot to
handle merge string sections in assemble_constant_contents, and
because get_constant_size increased the string literal's size by one.

Fixed, and added a positive test case for merging non-zero-terminated
strings in C.


Bootstrapped and regtested on x86_64-pc-linux-gnu.
Is it OK for trunk (after pre-condition patches)?


Thanks
Bernd.
gcc:
2018-08-04  Bernd Edlinger  

	* varasm.c (output_constant): Add new parameter merge_strings.
	Make strings properly zero terminated in merge string sections.
	(mergeable_string_section): Don't fail if the last char is non-zero.
	(assemble_variable_contents): Handle merge string sections.
	(assemble_variable): Likewise.
	(assemble_constant_contents): Likewise.
	(output_constant_def_contents): Likewise.
	(get_constant_size): Remove special handling of STRING_CSTs.
	(redundant_nul_term_p): New helper function.

testsuite:
2018-08-04  Bernd Edlinger  

	* gcc.dg/merge-all-constants-1.c: Adjust test.
	* gcc.dg/merge-all-constants-2.c: New test.

Index: gcc/varasm.c
===
--- gcc/varasm.c	(revision 263072)
+++ gcc/varasm.c	(working copy)
@@ -111,7 +111,8 @@ static int compare_constant (const tree, const tre
 static void output_constant_def_contents (rtx);
 static void output_addressed_constants (tree);
 static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
-	   unsigned int, bool);
+	   unsigned int, bool,
+	   bool = false);
 static void globalize_decl (tree);
 static bool decl_readonly_section_1 (enum section_category);
 #ifdef BSS_SECTION_ASM_OP
@@ -827,7 +828,7 @@ mergeable_string_section (tree decl ATTRIBUTE_UNUS
 	  unit = GET_MODE_SIZE (mode);
 
 	  /* Check for embedded NUL characters.  */
-	  for (i = 0; i < len; i += unit)
+	  for (i = 0; i < len - unit; i += unit)
 	{
 	  for (j = 0; j < unit; j++)
 		if (str[i + j] != '\0')
@@ -2117,7 +2118,7 @@ assemble_noswitch_variable (tree decl, const char
 
 static void
 assemble_variable_contents (tree decl, const char *name,
-			bool dont_output_data)
+			bool dont_output_data, bool merge_strings = false)
 {
   /* Do any machine/system dependent processing of the object.  */
 #ifdef ASM_DECLARE_OBJECT_NAME
@@ -2140,7 +2141,7 @@ assemble_variable_contents (tree decl, const char
 	output_constant (DECL_INITIAL (decl),
 			 tree_to_uhwi (DECL_SIZE_UNIT (decl)),
 			 get_variable_align (decl),
-			 false);
+			 false, merge_strings);
   else
 	/* Leave space for it.  */
 	assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl)));
@@ -2316,7 +2317,9 @@ assemble_variable (tree decl, int top_level ATTRIB
 	switch_to_section (sect);
   if (align > BITS_PER_UNIT)
 	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
-  assemble_variable_contents (decl, name, dont_output_data);
+  assemble_variable_contents (decl, name, dont_output_data,
+  sect->common.flags & SECTION_MERGE
+  && sect->common.flags & SECTION_STRINGS);
   if (asan_protected)
 	{
 	  unsigned HOST_WIDE_INT int size
@@ -3300,12 +3303,7 @@ get_constant_section (tree exp, unsigned int align
 static HOST_WIDE_INT
 get_constant_size (tree exp)
 {
-  HOST_WIDE_INT size;
-
-  size = int_size_in_bytes (TREE_TYPE (exp));
-  if (TREE_CODE (exp) == STRING_CST)
-size = MAX (TREE_STRING_LENGTH (exp), size);
-  return size;
+  return int_size_in_bytes (TREE_TYPE (exp));
 }
 
 /* Subroutine of output_constant_def:
@@ -3468,7 +3466,8 @@ maybe_output_constant_def_contents (struct constan
constant's alignment in bits.  */
 
 static void
-assemble_constant_contents (tree exp, const char *label, unsigned int align)
+assemble_constant_contents (tree exp, const char *label, unsigned int align,
+			bool merge_strings = false)
 {
   HOST_WIDE_INT size;
 
@@ -3478,7 +3477,7 @@ static void
   targetm.asm_out.declare_constant_name (asm_out_file, label, exp, size);
 
   /* Output the value of EXP.  */
-  output_constant (exp, size, align, false);
+  output_constant (exp, size, align, false, merge_strings);
 
   targetm.asm_out.decl_end ();
 }
@@ -3519,10 +3518,13 @@ output_constant_def_contents (rtx symbol)
 		   || (VAR_P (decl) && DECL_IN_CONSTANT_POOL (decl))
 		   ? DECL_ALIGN (decl)
 		   : symtab_node::get (decl)->definition_alignment ());
-  switch_to_section (get_constant_section (exp, align));
+  section *sect = get_constant_section (exp, align);
+  switch_to_section (sect);
   if (align > BITS_PER_UNIT)
 	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
-  assemble_constant_contents (exp, XSTR (symbol, 

Re: [PATCH] Introduce __builtin_expect_with_probability (PR target/83610).

2018-08-07 Thread Jan Hubicka
> 2018-07-24  Martin Liska  
> 
> PR target/83610
>   * builtin-types.def (BT_FN_LONG_LONG_LONG_DOUBLE): New type.
>   * builtins.c (expand_builtin_expect_with_probability):
> New function.
>   (expand_builtin): Handle also BUILT_IN_EXPECT_WITH_PROBABILITY.
>   (build_builtin_expect_predicate): Likewise.
>   (fold_builtin_expect): Likewise.
>   (fold_builtin_2): Likewise.
>   (fold_builtin_3): Likewise.
>   * builtins.def (BUILT_IN_EXPECT_WITH_PROBABILITY): Define new
> builtin.
>   * builtins.h (fold_builtin_expect): Add new argument
> (probability).
>   * doc/extend.texi: Document the new builtin.
>   * doc/invoke.texi: Likewise.
>   * gimple-fold.c (gimple_fold_call): Pass new argument.
>   * ipa-fnsummary.c (find_foldable_builtin_expect):
> Handle also BUILT_IN_EXPECT_WITH_PROBABILITY.
>   * predict.c (expr_expected_value): Add new out argument which
> is probability.
>   (expr_expected_value_1): Likewise.
>   (tree_predict_by_opcode): Predict edge based on
> provided probability.
>   (pass_strip_predict_hints::execute): Use newly added
> DECL_BUILT_IN_P macro.
>   * predict.def (PRED_BUILTIN_EXPECT_WITH_PROBABILITY):
> Define new predictor.
>   * tree.h (DECL_BUILT_IN_P): Define.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-07-24  Martin Liska  
> 
>   * gcc.dg/predict-16.c: New test.
>   * gcc.dg/predict-17.c: New test.

OK
> +@deftypefn {Built-in Function} long __builtin_expect_with_probability
> +(long @var{exp}, long @var{c}, long @var{probability})
> +
> +The built-in has the same semantics as @code{__builtin_expect},
> +but user can provide expected probability (in percent) for value of 
> @var{exp}.
> +Last argument @var{probability} is of flaot type and valid values
flaot->float :)
> @@ -2426,10 +2444,10 @@ expr_expected_value_1 (tree type, tree op0, enum 
> tree_code code,
>  {
>tree res;
>enum br_predictor predictor2;
> -  op0 = expr_expected_value (op0, visited, predictor);
> +  op0 = expr_expected_value (op0, visited, predictor, probability);
>if (!op0)
>   return NULL;
> -  op1 = expr_expected_value (op1, visited, );
> +  op1 = expr_expected_value (op1, visited, , probability);
>if (predictor && *predictor < predictor2)
>   *predictor = predictor2;
>if (!op1)

Here you need to combine probabilities together.

I would probably convert it early into profile_probability type (at the time
builtin is handled) because then combination is better defined.

If probability is known only over one path I guess you will need to turn
the predictor into a probability and then do the combination.  The
predictor-combining logic simply takes the weaker one (it assumes they
are all first-match); perhaps with probabilities one can just combine
them and throw away the predictor info in this case.

It would still be very nice to arrange for the loop code to set
loop->nb_iterations_estimate accordingly in this case.  That would be
useful for loop opts as a reliable hint that the number of iterations
is about that much.

Honza


Re: [PATCH] Add malloc predictor (PR middle-end/83023).

2018-08-07 Thread Martin Liška
On 08/07/2018 01:51 PM, Jan Hubicka wrote:
>>
>> 2018-07-26  Martin Liska  
>>
>> PR middle-end/83023
>>  * predict.c (expr_expected_value_1): Handle DECL_IS_MALLOC,
>> BUILT_IN_REALLOC and DECL_IS_OPERATOR_NEW.
>>  * predict.def (PRED_MALLOC_NONNULL): New predictor.
> 
> Patch is OK. I am still somewhat worried that we will run into functions that
> do return NULL most of the time and otherwise return a newly mallocated block.

Thanks.

> 
> For this reason please simply document the behaviour in extend.texi.
> For auto-detected malloc attribute I guess we may invent extra flag about
> probability of NULL return value later if we run into interesting testcases.

I documented that and installed patch as r263355.

Martin

> 
> I think it is a mistake that we don't detect malloc attribute early. It has
> good chance of making the simplifications in early opts to cascade.  I will
> look into this.
> 
> Honza
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2018-07-26  Martin Liska  
>>
>> PR middle-end/83023
>>  * gcc.dg/predict-16.c: New test.
>>  * g++.dg/predict-1.C: New test.
>> ---
>>  gcc/predict.c | 12 +++
>>  gcc/predict.def   |  5 -
>>  gcc/testsuite/g++.dg/predict-1.C  | 15 +
>>  gcc/testsuite/gcc.dg/predict-16.c | 36 +++
>>  4 files changed, 67 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/g++.dg/predict-1.C
>>  create mode 100644 gcc/testsuite/gcc.dg/predict-16.c
>>
>> diff --git a/gcc/predict.c b/gcc/predict.c
>> index 2ee8274fe61..ef6794edda5 100644
>> --- a/gcc/predict.c
>> +++ b/gcc/predict.c
>> @@ -2380,6 +2380,14 @@ expr_expected_value_1 (tree type, tree op0, enum 
>> tree_code code,
>>  }
>>return NULL;
>>  }
>> +
>> +  if (DECL_IS_MALLOC (decl) || DECL_IS_OPERATOR_NEW (decl))
>> +{
>> +  if (predictor)
>> +*predictor = PRED_MALLOC_NONNULL;
>> +  return boolean_true_node;
>> +}
>> +
>>if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
>>  switch (DECL_FUNCTION_CODE (decl))
>>{
>> @@ -2420,6 +2428,10 @@ expr_expected_value_1 (tree type, tree op0, enum 
>> tree_code code,
>>  if (predictor)
>>*predictor = PRED_COMPARE_AND_SWAP;
>>  return boolean_true_node;
>> +  case BUILT_IN_REALLOC:
>> +if (predictor)
>> +  *predictor = PRED_MALLOC_NONNULL;
>> +return boolean_true_node;
>>default:
>>  break;
>>  }
>> diff --git a/gcc/predict.def b/gcc/predict.def
>> index 03900bf89b3..bf659704bfc 100644
>> --- a/gcc/predict.def
>> +++ b/gcc/predict.def
>> @@ -55,6 +55,10 @@ DEF_PREDICTOR (PRED_UNCONDITIONAL, "unconditional jump", 
>> PROB_ALWAYS,
>>  DEF_PREDICTOR (PRED_BUILTIN_UNPREDICTABLE, "__builtin_unpredictable", 
>> PROB_EVEN,
>>PRED_FLAG_FIRST_MATCH)
>>  
>> +/* Return value of malloc function is almost always non-null.  */
>> +DEF_PREDICTOR (PRED_MALLOC_NONNULL, "malloc returned non-NULL", \
>> +   PROB_VERY_LIKELY, PRED_FLAG_FIRST_MATCH)
>> +
>>  /* Use number of loop iterations determined by # of iterations
>> analysis to set probability.  We don't want to use Dempster-Shaffer
>> theory here, as the predictions is exact.  */
>> @@ -173,7 +177,6 @@ DEF_PREDICTOR (PRED_HOT_LABEL, "hot label", HITRATE 
>> (85), 0)
>>  DEF_PREDICTOR (PRED_COLD_LABEL, "cold label", PROB_VERY_LIKELY,
>> PRED_FLAG_FIRST_MATCH)
>>  
>> -
>>  /* The following predictors are used in Fortran. */
>>  
>>  /* Branch leading to an integer overflow are extremely unlikely.  */
>> diff --git a/gcc/testsuite/g++.dg/predict-1.C 
>> b/gcc/testsuite/g++.dg/predict-1.C
>> new file mode 100644
>> index 000..8e2032f33a4
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/predict-1.C
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
>> +
>> +#include <new>
>> +
>> +int *r;
>> +
>> +void test()
>> +{
>> +  r = new(std::nothrow) int;
>> +  if (r)
>> +__builtin_memset (r, 0, sizeof(int));
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "malloc returned non-NULL heuristics of 
>> edge\[^:\]*: 99.96%" "profile_estimate"} } */
>> diff --git a/gcc/testsuite/gcc.dg/predict-16.c 
>> b/gcc/testsuite/gcc.dg/predict-16.c
>> new file mode 100644
>> index 000..e1f331b909a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/predict-16.c
>> @@ -0,0 +1,36 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
>> +
>> +#include <stdlib.h>
>> +#include <string.h>
>> +
>> +void *r;
>> +void *r2;
>> +void *r3;
>> +void *r4;
>> +void *r5;
>> +
>> +void *m (size_t s, int c)
>> +{
>> +  r = malloc (s);
>> +  if (r)
>> +memset (r, 0, s);
>> +
>> +  r2 = calloc (s, 0);
>> +  if (r2)
>> +memset (r2, 0, s);
>> +
>> +  r3 = __builtin_malloc (s);
>> +  if (r3)
>> +memset (r3, 0, 

Re: [C++ PATCH] PR c++/79133

2018-08-07 Thread Ville Voutilainen
On 7 August 2018 at 14:50, Jason Merrill  wrote:
> OK.

Trunk only, no backports, presumably. I asked about backports in the
second-to-last submission, but I am gravitating away from backporting
this; it's a non-regression fix dealing with a fairly recent change, so
I'm going to operate with a trunk-only agenda. :)


Re: [PATCH] Add malloc predictor (PR middle-end/83023).

2018-08-07 Thread Jan Hubicka
> 
> 2018-07-26  Martin Liska  
> 
> PR middle-end/83023
>   * predict.c (expr_expected_value_1): Handle DECL_IS_MALLOC,
> BUILT_IN_REALLOC and DECL_IS_OPERATOR_NEW.
>   * predict.def (PRED_MALLOC_NONNULL): New predictor.

Patch is OK. I am still somewhat worried that we will run into functions that
do return NULL most of the time and otherwise return a newly mallocated block.

For this reason please simply document the behaviour in extend.texi.
For auto-detected malloc attribute I guess we may invent extra flag about
probability of NULL return value later if we run into interesting testcases.

I think it is a mistake that we don't detect malloc attribute early. It has
good chance of making the simplifications in early opts to cascade.  I will
look into this.

Honza
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-07-26  Martin Liska  
> 
> PR middle-end/83023
>   * gcc.dg/predict-16.c: New test.
>   * g++.dg/predict-1.C: New test.
> ---
>  gcc/predict.c | 12 +++
>  gcc/predict.def   |  5 -
>  gcc/testsuite/g++.dg/predict-1.C  | 15 +
>  gcc/testsuite/gcc.dg/predict-16.c | 36 +++
>  4 files changed, 67 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/predict-1.C
>  create mode 100644 gcc/testsuite/gcc.dg/predict-16.c
> 
> diff --git a/gcc/predict.c b/gcc/predict.c
> index 2ee8274fe61..ef6794edda5 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -2380,6 +2380,14 @@ expr_expected_value_1 (tree type, tree op0, enum 
> tree_code code,
>   }
> return NULL;
>   }
> +
> +   if (DECL_IS_MALLOC (decl) || DECL_IS_OPERATOR_NEW (decl))
> + {
> +   if (predictor)
> + *predictor = PRED_MALLOC_NONNULL;
> +   return boolean_true_node;
> + }
> +
> if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
>   switch (DECL_FUNCTION_CODE (decl))
> {
> @@ -2420,6 +2428,10 @@ expr_expected_value_1 (tree type, tree op0, enum 
> tree_code code,
>   if (predictor)
> *predictor = PRED_COMPARE_AND_SWAP;
>   return boolean_true_node;
> +   case BUILT_IN_REALLOC:
> + if (predictor)
> +   *predictor = PRED_MALLOC_NONNULL;
> + return boolean_true_node;
> default:
>   break;
>   }
> diff --git a/gcc/predict.def b/gcc/predict.def
> index 03900bf89b3..bf659704bfc 100644
> --- a/gcc/predict.def
> +++ b/gcc/predict.def
> @@ -55,6 +55,10 @@ DEF_PREDICTOR (PRED_UNCONDITIONAL, "unconditional jump", 
> PROB_ALWAYS,
>  DEF_PREDICTOR (PRED_BUILTIN_UNPREDICTABLE, "__builtin_unpredictable", 
> PROB_EVEN,
>PRED_FLAG_FIRST_MATCH)
>  
> +/* Return value of malloc function is almost always non-null.  */
> +DEF_PREDICTOR (PRED_MALLOC_NONNULL, "malloc returned non-NULL", \
> +PROB_VERY_LIKELY, PRED_FLAG_FIRST_MATCH)
> +
>  /* Use number of loop iterations determined by # of iterations
> analysis to set probability.  We don't want to use Dempster-Shaffer
> theory here, as the predictions is exact.  */
> @@ -173,7 +177,6 @@ DEF_PREDICTOR (PRED_HOT_LABEL, "hot label", HITRATE (85), 
> 0)
>  DEF_PREDICTOR (PRED_COLD_LABEL, "cold label", PROB_VERY_LIKELY,
>  PRED_FLAG_FIRST_MATCH)
>  
> -
>  /* The following predictors are used in Fortran. */
>  
>  /* Branch leading to an integer overflow are extremely unlikely.  */
> diff --git a/gcc/testsuite/g++.dg/predict-1.C 
> b/gcc/testsuite/g++.dg/predict-1.C
> new file mode 100644
> index 000..8e2032f33a4
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/predict-1.C
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
> +
> +#include <new>
> +
> +int *r;
> +
> +void test()
> +{
> +  r = new(std::nothrow) int;
> +  if (r)
> +__builtin_memset (r, 0, sizeof(int));
> +}
> +
> +/* { dg-final { scan-tree-dump "malloc returned non-NULL heuristics of 
> edge\[^:\]*: 99.96%" "profile_estimate"} } */
> diff --git a/gcc/testsuite/gcc.dg/predict-16.c 
> b/gcc/testsuite/gcc.dg/predict-16.c
> new file mode 100644
> index 000..e1f331b909a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/predict-16.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
> +
> +#include <stdlib.h>
> +#include <string.h>
> +
> +void *r;
> +void *r2;
> +void *r3;
> +void *r4;
> +void *r5;
> +
> +void *m (size_t s, int c)
> +{
> +  r = malloc (s);
> +  if (r)
> +memset (r, 0, s);
> +
> +  r2 = calloc (s, 0);
> +  if (r2)
> +memset (r2, 0, s);
> +
> +  r3 = __builtin_malloc (s);
> +  if (r3)
> +memset (r3, 0, s);
> +
> +  r4 = __builtin_calloc (s, 0);
> +  if (r4)
> +memset (r4, 0, s);
> +
> +  r5 = __builtin_realloc (r4, s);
> +  if (r5)
> +memset (r4, 0, s);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "malloc returned non-NULL 

Re: [C++ PATCH] PR c++/79133

2018-08-07 Thread Jason Merrill
OK.

On Tue, Aug 7, 2018 at 8:43 PM, Ville Voutilainen
 wrote:
> On 7 August 2018 at 13:12, Jason Merrill  wrote:
>> Maybe put this block first rather than add !is_capture_proxy to the if
>> condition?
>
> Thus?


Re: [C++ Patch] PR 59480 ("Missing error diagnostic: friend declaration specifying a default argument must be a definition")​ (Take 2)

2018-08-07 Thread Jason Merrill
OK.


On Wed, Aug 1, 2018 at 8:38 PM, Paolo Carlini  wrote:
> Hi,
>
> thus, as you may or may not have noticed I reverted my first try, when
> Tobias noticed that in his large codebase we were rejecting code like:
>
> class Matrix;
>
> Matrix rot90 (const Matrix& a, int k = 1);
>
> class Matrix {
>   friend Matrix rot90 (const Matrix&, int);
> };
>
> Matrix rot90 (const Matrix& a, int k) { return Matrix(); }
>
> this was happening because, when duplicate_decls saw the friend declaration,
> it smashed together the first two rot90 declarations and we ended up with a
> friend declaration with defaults (taken from the first rot90 declaration),
> which was thus rejected the third time we saw rot90. I think we can elegantly
> handle this kind of problem by exploiting the DECL_HIDDEN_FRIEND_P
> information, thus, in check_no_redeclaration_friend_default_args, as called
> by duplicate_decls, if the newdecl doesn't have DECL_FRIEND_P set, disregard
> an olddecl which doesn't have DECL_HIDDEN_FRIEND_P set (we can't do this
> directly because duplicate_decls is already smashing decls together at that
> point, thus we need to save the info and pass it as an argument). I added
> quite a few additional tests an also asked Tobias to double check on his
> code base. All good.
>
> As a general argument for why I think moving from DECL_FRIEND_P to
> DECL_HIDDEN_FRIEND_P for the olddecl is the right thing to do, if the
> olddecl is a friend but doesn't have DECL_HIDDEN_FRIEND_P set, there was a
> declaration preceding it, thus, either the friend declaration has default
> arguments and we already diagnosed that, or it doesn't, and thus isn't
> interesting anymore for the diagnostic at issue.
>
> Thanks! Paolo.
>
> ///


[PATCH] PR86844: Fix for store merging

2018-08-07 Thread Andreas Krebbel
From: Andreas Krebbel 

Bootstrapped and regtested on s390x and x86_64.

gcc/ChangeLog:

2018-08-07  Andreas Krebbel  

PR tree-optimization/86844
* gimple-ssa-store-merging.c (check_no_overlap): Add a check to
reject overlaps if it has seen a non-constant store in between.

gcc/testsuite/ChangeLog:

2018-08-07  Andreas Krebbel  

PR tree-optimization/86844
* gcc.dg/pr86844.c: New test.
---
 gcc/gimple-ssa-store-merging.c |  8 +++-
 gcc/testsuite/gcc.dg/pr86844.c | 42 ++
 2 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr86844.c

diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index 0ae4581..2abef2e 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -2401,13 +2401,19 @@ check_no_overlap (vec<store_immediate_info *> m_store_info, unsigned int i,
  unsigned HOST_WIDE_INT end)
 {
   unsigned int len = m_store_info.length ();
+  bool seen_group_end_store_p = false;
+
   for (++i; i < len; ++i)
 {
   store_immediate_info *info = m_store_info[i];
   if (info->bitpos >= end)
break;
+  if (info->rhs_code != INTEGER_CST)
+   seen_group_end_store_p = true;
   if (info->order < last_order
- && (rhs_code != INTEGER_CST || info->rhs_code != INTEGER_CST))
+ && (rhs_code != INTEGER_CST
+ || info->rhs_code != INTEGER_CST
+ || seen_group_end_store_p))
return false;
 }
   return true;
diff --git a/gcc/testsuite/gcc.dg/pr86844.c b/gcc/testsuite/gcc.dg/pr86844.c
new file mode 100644
index 000..9ef08e9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr86844.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-options "-O1 -fstore-merging" } */
+
+#include <stdint.h>
+
+struct foo
+{
+  union
+  {
+uint32_t u4i;
+
+struct
+{
+  uint8_t x;
+  uint8_t y;
+  uint8_t z;
+  uint8_t w;
+} s;
+  } u;
+  uint8_t v;
+};
+
+void __attribute__((noinline,noclone))
+f (struct foo *a)
+{
+  a->u.u4i = 0;
+  a->u.s.w = 222;
+  a->u.s.y = 129;
+  a->u.s.z = a->v;
+}
+
+int
+main ()
+{
+  struct foo s;
+
+  f (&s);
+
+  if (s.u.s.w != 222)
+__builtin_abort ();
+}
-- 
2.9.1



[PATCH] Define monotonic_buffer_resource members out-of-line

2018-08-07 Thread Jonathan Wakely

Move the allocation logic into libstdc++.so so that it can be changed
without worrying about inlined code in existing binaries.

Leave do_allocate inline so that calls to it can be devirtualized, and
only the slow path needs to call into the library.

* config/abi/pre/gnu.ver: Export monotonic_buffer_resource members.
* include/std/memory_resource (monotonic_buffer_resource::release):
Call _M_release_buffers to free buffers.
(monotonic_buffer_resource::do_allocate): Call _M_new_buffer to
allocate a new buffer from upstream.
(monotonic_buffer_resource::_M_new_buffer): Declare.
(monotonic_buffer_resource::_M_release_buffers): Declare.
(monotonic_buffer_resource::_Chunk): Replace definition with
declaration as opaque type.
* src/c++17/memory_resource.cc (monotonic_buffer_resource::_Chunk):
Define.
(monotonic_buffer_resource::_M_new_buffer): Define.
(monotonic_buffer_resource::_M_release_buffers): Define.

Tested powerpc64le-linux, committed to trunk.


commit 736856a34ec11bdf4a4019eccead58b5c6a6b0cc
Author: Jonathan Wakely 
Date:   Mon Aug 6 23:50:54 2018 +0100


diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 36459e88b6a..593783da1aa 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2043,6 +2043,8 @@ GLIBCXX_3.4.26 {
 _ZNSt3pmr20null_memory_resourceEv;
 _ZNSt3pmr20get_default_resourceEv;
 _ZNSt3pmr20set_default_resourceEPNS_15memory_resourceE;
+_ZNSt3pmr25monotonic_buffer_resource13_M_new_bufferE[jmy][jmy];
+_ZNSt3pmr25monotonic_buffer_resource18_M_release_buffersEv;
 
 } GLIBCXX_3.4.25;
 
diff --git a/libstdc++-v3/include/std/memory_resource 
b/libstdc++-v3/include/std/memory_resource
index b3f8f7d9477..bb4e31551e6 100644
--- a/libstdc++-v3/include/std/memory_resource
+++ b/libstdc++-v3/include/std/memory_resource
@@ -365,7 +365,8 @@ namespace pmr
 void
 release() noexcept
 {
-  _Chunk::release(_M_head, _M_upstream);
+  if (_M_head)
+   _M_release_buffers();
 
   // reset to initial state at contruction:
   if ((_M_current_buf = _M_orig_buf))
@@ -392,19 +393,14 @@ namespace pmr
   if (__bytes == 0)
__bytes = 1; // Ensures we don't return the same pointer twice.
 
-  if (auto __p = std::align(__alignment, __bytes, _M_current_buf, 
_M_avail))
+  void* __p = std::align(__alignment, __bytes, _M_current_buf, _M_avail);
+  if (!__p)
{
- _M_current_buf = (char*)_M_current_buf + __bytes;
- _M_avail -= __bytes;
- return __p;
+ _M_new_buffer(__bytes, __alignment);
+ __p = _M_current_buf;
}
-
-  const size_t __n = std::max(__bytes, _M_next_bufsiz);
-  const size_t __m = std::max(__alignment, alignof(std::max_align_t));
-  auto [__p, __size] = _Chunk::allocate(_M_upstream, __n, __m, _M_head);
-  _M_current_buf = (char*)__p + __bytes;
-  _M_avail = __size - __bytes;
-  _M_next_bufsiz *= _S_growth_factor;
+  _M_current_buf = (char*)_M_current_buf + __bytes;
+  _M_avail -= __bytes;
   return __p;
 }
 
@@ -417,6 +413,15 @@ namespace pmr
 { return this == &__other; }
 
   private:
+// Update _M_current_buf and _M_avail to refer to a new buffer with
+// at least the specified size and alignment, allocated from upstream.
+void
+_M_new_buffer(size_t __bytes, size_t __alignment);
+
+// Deallocate all buffers obtained from upstream.
+void
+_M_release_buffers() noexcept;
+
 static size_t
 _S_next_bufsize(size_t __buffer_size) noexcept
 {
@@ -437,68 +442,7 @@ namespace pmr
 void* const_M_orig_buf = nullptr;
 size_t const   _M_orig_size = 

Re: [PATCH] convert braced initializers to strings (PR 71625)

2018-08-07 Thread Joseph Myers
On Tue, 7 Aug 2018, Martin Sebor wrote:

> 2) skipping embedded nuls made it possible to create a string
> with fewer elements than the initializer array, which caused
> arrays with unspecified bound to be smaller than they would
> have been otherwise

I think there should be explicit tests of sizeof for arrays with 
unspecified bound - to make sure both that it isn't any smaller than it 
should be, but also that any NULs implicitly added for a STRING_CST don't 
make the arrays any larger than their size should be for the originally 
given initializer that doesn't have a 0 as the last element.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC,PATCH] Introduce -msdata=explicit for powerpc

2018-08-07 Thread Segher Boessenkool
Hi!

On Tue, Aug 07, 2018 at 02:18:59AM -0300, Alexandre Oliva wrote:
> This option allows users to manually select what goes in the limited
> small data range, and still get smaller and faster small data access
> sequences for the selected data.

> We've considered adding a new attribute, say "sdata", to let the
> compiler pick the sdata/sbss section name itself, but that's not
> strictly necessary since attribute section already does what we need.
> I'm not sure how the semantics of conflicting attributes should be
> implemented, but if others think it's a good idea, I could give it a
> shot.  Like, if attribute((sdata, section("data"))) is given, should the
> variable be placed in .data but be accessed using sdata addressing
> modes?  Should we reject that with an error?  Should we warn and ignore
> one of the attributes?  Something else?

> I saw comments, docs and init code that suggested the possibility of
> using r2/.sdata2 for small data, but I couldn't get code to be generated
> for such access, even with very old toolchains.  I'm not sure I'm just
> missing how this magic comes about, or whether it hasn't been available
> for a very long time but nobody removed the comments/docs.  Assuming I'm
> missing something, I put in the possibility of using r2 in the test in
> the patch, but I'm sure I have not exercised it to make sure I got it
> right.  Help?

attribute((section("sdata2")))  works, perhaps.  Nothing is put there
implicitly afaics.  What do the ABIs say?

> I have not YET given this much testing.  I'm posting it so as to give
> ppc maintainers an opportunity to comment on the proposed approach, in
> hopes of getting buy-in for the idea, if not necessarily for the patch,
> but I welcome alternate suggestions to enable users to choose what goes
> in faster sdata when there's too much data for size-based assignment to
> place interesting variables in sdata without overflowing its range.

There is 64kB of sdata, and the maximum size of an object to be put there
is 8 bytes by default.  That will require an awful lot of manual markup.

You can use different -msdata options per object file, maybe tune -G a
bit, and change that section attribute (or the target attribute) where
you want to (or does that not work?)
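The per-variable alternative Segher mentions relies on the plain section attribute.  A hedged sketch of what user code would look like (the `.sdata` section name and the short-form r13-relative addressing are powerpc/ELF specifics; on other ELF targets the variable still lands in the named section, just without small-data addressing, and the proposed -msdata=explicit is what would make the compiler emit the short sequences for it):

```cpp
/* Explicitly place one hot variable in the small-data section; with the
   proposed -msdata=explicit the powerpc back end would then use
   small-data (r13-relative) access sequences for it.  */
__attribute__ ((section (".sdata"))) int hot_counter = 42;

int
bump_hot_counter (void)
{
  return ++hot_counter;
}
```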

The approach looks like it should work, but it does not seem all that
convenient to me.  Well, I'm sure you'll try it out on a reasonably sized
project, and you'll find out whether it is handy or not :-)


Segher


Re: [C++ PATCH] PR c++/79133

2018-08-07 Thread Ville Voutilainen
On 7 August 2018 at 13:12, Jason Merrill  wrote:
> Maybe put this block first rather than add !is_capture_proxy to the if
> condition?

Thus?
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 3aafb0f..0faf739 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -2640,13 +2640,29 @@ check_local_shadow (tree decl)
 		  || TREE_CODE (decl) == TYPE_DECL)))
   && DECL_FUNCTION_SCOPE_P (old)
   && (!DECL_ARTIFICIAL (decl)
+	  || is_capture_proxy (decl)
 	  || DECL_IMPLICIT_TYPEDEF_P (decl)
 	  || (VAR_P (decl) && DECL_ANON_UNION_VAR_P (decl
 {
   /* DECL shadows a local thing possibly of interest.  */
 
+  /* DR 2211: check that captures and parameters
+	 do not have the same name. */
+  if (is_capture_proxy (decl))
+	{
+	  if (current_lambda_expr ()
+	  && DECL_CONTEXT (old) == lambda_function (current_lambda_expr ())
+	  && TREE_CODE (old) == PARM_DECL
+	  && DECL_NAME (decl) != this_identifier)
+	{
+	  error_at (DECL_SOURCE_LOCATION (old),
+			"lambda parameter %qD "
+			"previously declared as a capture", old);
+	}
+	  return;
+	}
   /* Don't complain if it's from an enclosing function.  */
-  if (DECL_CONTEXT (old) == current_function_decl
+  else if (DECL_CONTEXT (old) == current_function_decl
 	  && TREE_CODE (decl) != PARM_DECL
 	  && TREE_CODE (old) == PARM_DECL)
 	{
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-shadow3.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-shadow3.C
new file mode 100644
index 000..8364321
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-shadow3.C
@@ -0,0 +1,6 @@
+// { dg-do compile { target c++11 } }
+
+int main() {
+  int x = 42;
+  auto lambda = [x](int x) {}; // { dg-error "previously declared as a capture" }
+}
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic18.C b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic18.C
new file mode 100644
index 000..1eb9cce
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic18.C
@@ -0,0 +1,11 @@
+// { dg-do compile { target c++14 } }
+
+int main() {
+  int x = 42;
+  auto lambda2 = [x=x](int x) {}; // { dg-error "previously declared as a capture" }
+  auto lambda3 = [x](auto... x) {}; // { dg-error "previously declared as a capture" }
+  auto lambda4 = [](auto... x) {
+auto lambda5 = [x...](auto... x) {};  // { dg-error "previously declared as a capture" }
+auto lambda6 = [x...](int x) {};  // { dg-error "previously declared as a capture" }
+  };
+}


Re: [C++ PATCH] PR c++/79133

2018-08-07 Thread Jason Merrill

On 08/07/2018 07:12 AM, Ville Voutilainen wrote:

On 8 July 2018 at 02:08, Ville Voutilainen  wrote:

On 8 July 2018 at 01:54, Paolo Carlini  wrote:

That would make this more consistent with such a shadow warning, but I
don't want
to use the shadowing wording (which would be easy to do; just set
'shadowed' and do
a 'goto inform'), because this isn't shadowing in the precise sense;
the shadowing cases
are warnings, whereas this is more like the redeclaration errors in
the same function.


... indeed, and that annoys me a bit. Not having studied c++/79133 at all so
far (sorry), it seems a little weird to me that according to the standard we
have to handle the two types of "shadowing" in different ways, one more
strict, one less. Thus I would suggest double-checking the details of that,
possibly with Jason too, in terms of the actual patch you would like to
apply.


Well. The PR is about DR 2211 which, in simple terms, says that lambda
parameters
and captures cannot have the same name. See
http://open-std.org/JTC1/SC22/WG21/docs/cwg_defects.html#2211

That's stricter than -Wshadow, but otherwise equally strict as the
other error cases already handled
in check_local_shadow. So I'll make this error case more consistent
with the others. We already
handle redeclaration errors slightly differently from shadowing
warnings in that function.


Here's an updated patch. Tested on Linux-PPC64, OK for trunk?

Backports?

2018-08-06  Ville Voutilainen  

 gcc/cp/

 PR c++/79133
 * name-lookup.c (check_local_shadow): Reject captures and parameters
 with the same name.



   if (DECL_CONTEXT (old) == current_function_decl
  && TREE_CODE (decl) != PARM_DECL
- && TREE_CODE (old) == PARM_DECL)
+ && TREE_CODE (old) == PARM_DECL
+ && !is_capture_proxy (decl))
{
  /* Go to where the parms should be and see if we find
 them there.  */
@@ -2665,6 +2667,21 @@ check_local_shadow (tree decl)
  return;
}
}
+  /* DR 2211: check that captures and parameters
+do not have the same name. */
+  else if (is_capture_proxy (decl))


Maybe put this block first rather than add !is_capture_proxy to the if 
condition?


Jason


RE: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/6)]

2018-08-07 Thread Tamar Christina
Hi All,

This is a re-spin of the patch to address review comments.
It mostly just adds more comments and corrects typos.


Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-07  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* config/aarch64/aarch64.md (cmp,
probe_stack_range): Add k (SP) constraint.
* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
STACK_CLASH_MAX_UNROLL_PAGES): New.
* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
stack probes for stack clash.
(aarch64_allocate_and_probe_stack_space): New.
(aarch64_expand_prologue): Use it.
(aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
(aarch64_sub_sp): Add emit_move_imm optional param.

gcc/testsuite/
2018-08-07  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* gcc.target/aarch64/stack-check-12.c: New.
* gcc.target/aarch64/stack-check-13.c: New.
* gcc.target/aarch64/stack-check-cfa-1.c: New.
* gcc.target/aarch64/stack-check-cfa-2.c: New.
* gcc.target/aarch64/stack-check-prologue-1.c: New.
* gcc.target/aarch64/stack-check-prologue-10.c: New.
* gcc.target/aarch64/stack-check-prologue-11.c: New.
* gcc.target/aarch64/stack-check-prologue-2.c: New.
* gcc.target/aarch64/stack-check-prologue-3.c: New.
* gcc.target/aarch64/stack-check-prologue-4.c: New.
* gcc.target/aarch64/stack-check-prologue-5.c: New.
* gcc.target/aarch64/stack-check-prologue-6.c: New.
* gcc.target/aarch64/stack-check-prologue-7.c: New.
* gcc.target/aarch64/stack-check-prologue-8.c: New.
* gcc.target/aarch64/stack-check-prologue-9.c: New.
* gcc.target/aarch64/stack-check-prologue.h: New.
* lib/target-supports.exp
(check_effective_target_supports_stack_clash_protection): Add AArch64.

> -Original Message-
> From: Jeff Law 
> Sent: Friday, August 3, 2018 19:05
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh
> ; Richard Earnshaw
> ; Marcus Shawcroft
> 
> Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
> supporting 64k probes. [patch (1/6)]
> 
> On 07/25/2018 05:09 AM, Tamar Christina wrote:
> > Hi All,
> >
> > Attached is an updated patch that clarifies some of the comments in
> > the patch and adds comments to the individual testcases as requested.
> >
> > Ok for trunk?
> >
> > Thanks,
> > Tamar
> >
> > gcc/
> > 2018-07-25  Jeff Law  
> > Richard Sandiford 
> > Tamar Christina  
> >
> > PR target/86486
> > * config/aarch64/aarch64.md (cmp,
> > probe_stack_range): Add k (SP) constraint.
> > * config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
> > STACK_CLASH_MAX_UNROLL_PAGES): New.
> > * config/aarch64/aarch64.c (aarch64_output_probe_stack_range):
> Emit
> > stack probes for stack clash.
> > (aarch64_allocate_and_probe_stack_space): New.
> > (aarch64_expand_prologue): Use it.
> > (aarch64_expand_epilogue): Likewise and update IP regs re-use
> criteria.
> > (aarch64_sub_sp): Add emit_move_imm optional param.
> >
> > gcc/testsuite/
> > 2018-07-25  Jeff Law  
> > Richard Sandiford 
> > Tamar Christina  
> >
> > PR target/86486
> > * gcc.target/aarch64/stack-check-12.c: New.
> > * gcc.target/aarch64/stack-check-13.c: New.
> > * gcc.target/aarch64/stack-check-cfa-1.c: New.
> > * gcc.target/aarch64/stack-check-cfa-2.c: New.
> > * gcc.target/aarch64/stack-check-prologue-1.c: New.
> > * gcc.target/aarch64/stack-check-prologue-10.c: New.
> > * gcc.target/aarch64/stack-check-prologue-11.c: New.
> > * gcc.target/aarch64/stack-check-prologue-2.c: New.
> > * gcc.target/aarch64/stack-check-prologue-3.c: New.
> > * gcc.target/aarch64/stack-check-prologue-4.c: New.
> > * gcc.target/aarch64/stack-check-prologue-5.c: New.
> > * gcc.target/aarch64/stack-check-prologue-6.c: New.
> > * gcc.target/aarch64/stack-check-prologue-7.c: New.
> > * gcc.target/aarch64/stack-check-prologue-8.c: New.
> > * gcc.target/aarch64/stack-check-prologue-9.c: New.
> > * gcc.target/aarch64/stack-check-prologue.h: New.
> > * lib/target-supports.exp
> > (check_effective_target_supports_stack_clash_protection): Add
> AArch64.
> OK on my end.  AArch64 maintainers have the final say since this is all
> AArch64 specific bits.
> 
> jeff
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 976f9afae54c1c98c22a4d5002d8d94e33b3190a..1345f0eb171d05e2b833935c0a32f79c3db03f99 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -84,6 +84,14 @@
 
 #define LONG_DOUBLE_TYPE_SIZE	128
 
+/* This value is the amount of bytes a caller is allowed to drop the stack
+   before probing has to be done for stack clash protection.  */

RE: [PATCH][GCC][AArch64] Ensure that outgoing argument size is at least 8 bytes when alloca and stack-clash. [Patch (3/6)]

2018-08-07 Thread Tamar Christina
Hi All,

This is a re-spin to address review comments. No code change aside from a 
variable rename.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-07  Tamar Christina  

PR target/86486
* config/aarch64/aarch64.h (STACK_CLASH_MIN_BYTES_OUTGOING_ARGS,
STACK_DYNAMIC_OFFSET): New.
* config/aarch64/aarch64.c (aarch64_layout_frame):
Update outgoing args size.
(aarch64_stack_clash_protection_alloca_probe_range,
TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE): New.

gcc/testsuite/
2018-08-07  Tamar Christina  

PR target/86486
* gcc.target/aarch64/stack-check-alloca-1.c: New.
* gcc.target/aarch64/stack-check-alloca-10.c: New.
* gcc.target/aarch64/stack-check-alloca-2.c: New.
* gcc.target/aarch64/stack-check-alloca-3.c: New.
* gcc.target/aarch64/stack-check-alloca-4.c: New.
* gcc.target/aarch64/stack-check-alloca-5.c: New.
* gcc.target/aarch64/stack-check-alloca-6.c: New.
* gcc.target/aarch64/stack-check-alloca-7.c: New.
* gcc.target/aarch64/stack-check-alloca-8.c: New.
* gcc.target/aarch64/stack-check-alloca-9.c: New.
* gcc.target/aarch64/stack-check-alloca.h: New.
* gcc.target/aarch64/stack-check-14.c: New.
* gcc.target/aarch64/stack-check-15.c: New.

> -Original Message-
> From: Jeff Law 
> Sent: Friday, August 3, 2018 19:05
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; James Greenhalgh ;
> Richard Earnshaw ; Marcus Shawcroft
> 
> Subject: Re: [PATCH][GCC][AArch64] Ensure that outgoing argument size is at
> least 8 bytes when alloca and stack-clash. [Patch (3/6)]
> 
> On 07/25/2018 05:09 AM, Tamar Christina wrote:
> > Hi All,
> >
> > Attached an updated patch which documents what the test cases are
> expecting as requested.
> >
> > Ok for trunk?
> >
> > Thanks,
> > Tamar
> >
> > gcc/
> > 2018-07-25  Tamar Christina  
> >
> > PR target/86486
> > * config/aarch64/aarch64.h (STACK_CLASH_OUTGOING_ARGS,
> > STACK_DYNAMIC_OFFSET): New.
> > * config/aarch64/aarch64.c (aarch64_layout_frame):
> > Update outgoing args size.
> > (aarch64_stack_clash_protection_alloca_probe_range,
> > TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE):
> New.
> >
> > gcc/testsuite/
> > 2018-07-25  Tamar Christina  
> >
> > PR target/86486
> > * gcc.target/aarch64/stack-check-alloca-1.c: New.
> > * gcc.target/aarch64/stack-check-alloca-10.c: New.
> > * gcc.target/aarch64/stack-check-alloca-2.c: New.
> > * gcc.target/aarch64/stack-check-alloca-3.c: New.
> > * gcc.target/aarch64/stack-check-alloca-4.c: New.
> > * gcc.target/aarch64/stack-check-alloca-5.c: New.
> > * gcc.target/aarch64/stack-check-alloca-6.c: New.
> > * gcc.target/aarch64/stack-check-alloca-7.c: New.
> > * gcc.target/aarch64/stack-check-alloca-8.c: New.
> > * gcc.target/aarch64/stack-check-alloca-9.c: New.
> > * gcc.target/aarch64/stack-check-alloca.h: New.
> > * gcc.target/aarch64/stack-check-14.c: New.
> > * gcc.target/aarch64/stack-check-15.c: New.
> >
> >> -Original Message-
> >> From: Tamar Christina
> >> Sent: Friday, July 13, 2018 17:36
> >> To: Tamar Christina ;
> >> gcc-patches@gcc.gnu.org
> >> Cc: nd ; James Greenhalgh
> ;
> >> Richard Earnshaw ; Marcus Shawcroft
> >> 
> >> Subject: RE: [PATCH][GCC][AArch64] Ensure that outgoing argument size
> >> is at least 8 bytes when alloca and stack-clash. [Patch (3/6)]
> >>
> >> Hi All,
> >>
> >> I'm sending an updated patch which updates a testcase that  hits one
> >> of our corner cases.
> >> This is an assembler scan only update in a testcase.
> >>
> >> Regards,
> >> Tamar
> >>
> >>> -Original Message-
> >>> From: Tamar Christina 
> >>> Sent: Wednesday, July 11, 2018 12:21
> >>> To: gcc-patches@gcc.gnu.org
> >>> Cc: nd ; James Greenhalgh
> ;
> >>> Richard Earnshaw ; Marcus Shawcroft
> >>> 
> >>> Subject: [PATCH][GCC][AArch64] Ensure that outgoing argument size is
> >>> at least 8 bytes when alloca and stack-clash. [Patch (3/6)]
> >>>
> >>> Hi All,
> >>>
> >>> This patch adds a requirement that the number of outgoing arguments
> >>> for a function is at least 8 bytes when using stack-clash protection.
> >>>
> >>> By using this condition we can avoid a check in the alloca code and
> >>> so have smaller and simpler code there.
> >>>
> >>> A simplified version of the AArch64 stack frames is:
> >>>
> >>>+---+
> >>>|   |
> >>>|   |
> >>>|   |
> >>>+---+
> >>>|LR |
> >>>+---+
> >>>|FP |
> >>>+---+
> >>>|dynamic allocations|   expanding area which will push the
> outgoing
> >>>+---+   args down during each allocation.
> >>>|padding|
> >>>+---+
> >>>

RE: [PATCH][GCC][AArch64] Validate and set default parameters for stack-clash. [Patch (3/3)]

2018-08-07 Thread Tamar Christina
Hi All,

This is an updated patch which applies the same style changes as requested in 
patch 5/6.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-07  Tamar Christina  

* common/config/aarch64/aarch64-common.c (TARGET_OPTION_DEFAULT_PARAM,
aarch64_option_default_param):  New.
(params.h): Include.
(TARGET_OPTION_VALIDATE_PARAM, aarch64_option_validate_param): New.
* config/aarch64/aarch64.c (aarch64_override_options_internal): Simplify
stack-clash protection validation code.

> -Original Message-
> From: Tamar Christina
> Sent: Friday, July 13, 2018 17:36
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; James Greenhalgh ;
> Richard Earnshaw ; Marcus Shawcroft
> 
> Subject: RE: [PATCH][GCC][AArch64] Validate and set default parameters for
> stack-clash. [Patch (3/3)]
> 
> Hi All,
> 
> I am sending an updated patch which takes into account a case where the set
> parameter value would not be safe to call.
> 
> No change in the cover letter.
> 
> Regards,
> Tamar
> 
> > -Original Message-
> > From: Tamar Christina 
> > Sent: Wednesday, July 11, 2018 12:25
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; James Greenhalgh ;
> > Richard Earnshaw ; Marcus Shawcroft
> > 
> > Subject: [PATCH][GCC][AArch64] Validate and set default parameters for
> > stack-clash. [Patch (3/3)]
> >
> > Hi All,
> >
> > This patch defines the default parameters and validation for the
> > aarch64 stack clash probing interval and guard sizes.  It cleans up
> > the previous implementation and ensures that at no point are invalid
> > arguments present in the pipeline for AArch64.
> > Currently they are only corrected once
> > cc1 initializes the back-end.
> >
> > The default for AArch64 is 64 KB for both of these and we only support
> > 4 KB and 64 KB probes.  We also enforce that any value you set here
> > for the parameters must be in sync.
> >
> > If an invalid value is specified an error will be generated and
> > compilation aborted.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > Target was tested with stack clash on and off by default.
> >
> > Ok for trunk?
> >
> > Thanks,
> > Tamar
> >
> > gcc/
> > 2018-07-11  Tamar Christina  
> >
> > * common/config/aarch64/aarch64-common.c
> > (TARGET_OPTION_DEFAULT_PARAM,
> > aarch64_option_default_param):  New.
> > (params.h): Include.
> > (TARGET_OPTION_VALIDATE_PARAM,
> > aarch64_option_validate_param): New.
> > * config/aarch64/aarch64.c (aarch64_override_options_internal):
> > Simplify
> > stack-clash protection validation code.
> >
> > --
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 292fb818705d4650113da59a6d88cf2aa7c9e57d..97d7770190b8388ba277db96c07609f7b6f46817 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -30,6 +30,7 @@
 #include "opts.h"
 #include "flags.h"
 #include "diagnostic.h"
+#include "params.h"
 
 #ifdef  TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
@@ -41,6 +42,10 @@
 
 #undef	TARGET_OPTION_OPTIMIZATION_TABLE
 #define TARGET_OPTION_OPTIMIZATION_TABLE aarch_option_optimization_table
+#undef TARGET_OPTION_DEFAULT_PARAMS
+#define TARGET_OPTION_DEFAULT_PARAMS aarch64_option_default_params
+#undef TARGET_OPTION_VALIDATE_PARAM
+#define TARGET_OPTION_VALIDATE_PARAM aarch64_option_validate_param
 
 /* Set default optimization options.  */
 static const struct default_options aarch_option_optimization_table[] =
@@ -60,6 +65,49 @@ static const struct default_options aarch_option_optimization_table[] =
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
+/* Implement target validation TARGET_OPTION_DEFAULT_PARAM.  */
+
+static bool
+aarch64_option_validate_param (const int value, const int param)
+{
+  /* Check that both parameters are the same.  */
+  if (param == (int) PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE)
+{
+  if (value != 12 && value != 16)
+	{
+	  error ("only values 12 (4 KB) and 16 (64 KB) are supported for guard "
+		 "size.  Given value %d (%llu KB) is out of range.",
+		 value, (1ULL << value) / 1024ULL);
+	  return false;
+	}
+}
+
+  return true;
+}
+
+/* Implement TARGET_OPTION_DEFAULT_PARAMS.  */
+
+static void
+aarch64_option_default_params (void)
+{
+  /* We assume the guard page is 64k.  */
+  int index = (int) PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE;
+  set_default_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE,
+			   DEFAULT_STK_CLASH_GUARD_SIZE == 0
+			 ? 16 : DEFAULT_STK_CLASH_GUARD_SIZE);
+
+  int guard_size
+= default_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
+
+  /* Set the interval parameter to be the same as the guard size.  This way the
+ mid-end code does the right thing for us.  */
+  set_default_param_value (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL,
+			   guard_size);
+
+  /* Validate the options.  */
+  aarch64_option_validate_param (guard_size, 

Re: [PATCH] convert braced initializers to strings (PR 71625)

2018-08-07 Thread Jason Merrill
On Wed, Aug 1, 2018 at 12:49 AM, Martin Sebor  wrote:
> On 07/31/2018 07:38 AM, Jason Merrill wrote:
>>
>> On Tue, Jul 31, 2018 at 9:51 AM, Martin Sebor  wrote:
>>>
>>> The middle-end contains code to determine the lengths of constant
>>> character arrays initialized by string literals.  The code is used
>>> in a number of optimizations and warnings.
>>>
>>> However, the code is unable to deal with constant arrays initialized
>>> using the braced initializer syntax, as in
>>>
>>>   const char a[] = { '1', '2', '\0' };
>>>
>>> The attached patch extends the C and C++ front-ends to convert such
>>> initializers into a STRING_CST form.
>>>
>>> The goal of this work is to both enable existing optimizations for
>>> such arrays, and to help detect bugs due to using non-nul terminated
>>> arrays where nul-terminated strings are expected.  The latter is
>>> an extension of the GCC 8 -Wstringop-overflow and
>>> -Wstringop-truncation warnings that help detect or prevent reading
>>> past the end of dynamically created character arrays.  Future work
>>> includes detecting potential past-the-end reads from uninitialized
>>> local character arrays.
>>
>>
>>>   && TYPE_MAIN_VARIANT (TREE_TYPE (valtype)) == char_type_node)
>>
>>
>> Why? Don't we want this for other character types as well?
>
> It suppresses narrowing warnings for things like
>
>   signed char a[] = { 0xff };
>
> (there are a couple of tests that exercise this).

Why is plain char different in this respect?  Presumably one of

char a[] = { -1 };
char b[] = { 0xff };

should give the same narrowing warning, depending on whether char is signed.
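A hedged illustration of Jason's point: the two spellings differ only in which one triggers the C++11 narrowing diagnostic, not in the stored byte.  The casts below are needed so the snippet compiles regardless of plain char's signedness; without them, exactly one of the two initializers is ill-formed:

```cpp
// Without the casts, { -1 } is a narrowing error when plain char is
// unsigned, and { 0xff } is a narrowing error when plain char is signed.
// The casts keep this well-formed either way; the stored object
// representation is the byte 0xff in both cases.
char from_neg[] = { char (-1) };
char from_hex[] = { char (0xff) };
```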

> At the same time, STRING_CST is supposed to be able to represent
> strings of any integer type so there should be a way to make it
> work.  On the flip side, recent discussions of changes in this
> area suggest there may be bugs in the wide character handling of
> STRING_CST so those would need to be fixed before relying on it
> for robust support.
>
> In any case, if you have a suggestion for how to make it work for
> at least the narrow character types I'll adjust the patch.

I suppose braced_list_to_string should call check_narrowing for C++.

Currently it uses tree_fits_shwi_p (signed host_wide_int) and then
stores the extracted value in a host unsigned int, which is then
converted to host char.  Does the right thing happen for -fsigned-char
or targets with a different character set?

Jason


Re: [PATCH] Make strlen range computations more conservative

2018-08-07 Thread Richard Biener
On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor  wrote:
>On 07/26/2018 02:55 AM, Richard Biener wrote:
>> On Wed, 25 Jul 2018, Martin Sebor wrote:
>>
 BUT - for the string_constant and c_strlen functions we are,
 in all cases we return something interesting, able to look
 at an initializer which then determines that type.  Hopefully.
 I think the strlen() folding code when it sets SSA ranges
 now looks at types ...?

 Consider

 struct X { int i; char c[4]; int j;};
 struct Y { char c[16]; };

 void foo (struct X *p, struct Y *q)
 {
   memcpy (p, q, sizeof (struct Y));
   if (strlen ((char *)(struct Y *)p + 4) < 7)
 abort ();
 }

 here the GIMPLE IL looks like

   const char * _1;

[local count: 1073741825]:
   _5 = MEM[(char * {ref-all})q_4(D)];
   MEM[(char * {ref-all})p_6(D)] = _5;
   _1 = p_6(D) + 4;
   _2 = __builtin_strlen (_1);

 and I guess Martin would argue that since p is of type struct X
 + 4 gets you to c[4] and thus strlen of that cannot be larger
 than 3.  But of course the middle-end doesn't work like that
 and luckily we do not try to draw such conclusions or we
 are somehow lucky that for the testcase as written above we do not
 (I'm not sure whether Martins changes in this area would derive
 such conclusions in principle).
>>>
>>> Only if the strlen argument were p->c.
>>>
 NOTE - we do not know the dynamic type here since we do not know
 the dynamic type of the memory pointed-to by q!  We can only
 derive that at q+4 there must be some object that we can
 validly call strlen on (where Martin again thinks strlen
 imposes constrains that memchr does not - sth I do not agree
 with from a QOI perspective)
>>>
>>> The dynamic type is a murky area.
>>
>> It's well-specified in the middle-end.  A store changes the
>> dynamic type of the stored-to object.  If that type is
>> compatible with the surrounding objects dynamic type that one
>> is not affected, if not then the surrounding objects dynamic
>> type becomes unspecified.  There is TYPE_TYPELESS_STORAGE
>> to somewhat control "compatibility" of subobjects.
>
>I never responded to this.  Using a dynamic (effective) type as
>you describe it would invalidate the aggressive loop optimization
>in the following:
>
>   void foo (struct X *p)
>   {
>   struct Y y = { "12345678" };
>   memcpy (p, , sizeof (struct Y));
>
>   // *p effective type is now struct Y
>
>   int n = 0;
>   while (p->c[n])
> ++n;
>
>   if (n < 7)
> abort ();
>   }
>
>GCC unconditionally aborts, just as it does with strlen(p->c).
>Why is that not wrong (in either case)?
>
>Because the code is invalid either way, for two reasons:

No, because the storage has only 4 non-null characters starting at offset 4?

>1) it accesses an object of (effective) type struct Y via
>an lvalue of type struct X (specifically via (*p).c)
>2) it relies on p->c
>
>The loop optimization relies on the exact same requirement
>as the strlen one.  Either they are both valid or neither is.
>
>Martin
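Richard's point — that the copied storage holds exactly four non-NUL characters starting at offset 4 — can be checked directly.  A sketch using a plain char buffer as the memcpy destination, sidestepping the struct X effective-type question entirely:

```cpp
#include <cstring>

struct Y { char c[16]; };

// Y's bytes are "12345678" followed by zeros; at offset 4 the string
// "5678" (length 4) begins, regardless of any struct X overlay.
int strlen_at_offset_4 ()
{
  Y y = { "12345678" };
  char buf[sizeof (Y)];
  std::memcpy (buf, &y, sizeof y);
  return static_cast<int> (std::strlen (buf + 4));
}
```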



[PATCH 2/2] S/390: Remove TARGET_CPU_ZARCH

2018-08-07 Thread Ilya Leoshkevich
TARGET_CPU_ZARCH made it possible to distinguish between g5/g6 and newer
machines.  Since the former are now gone, we can assume that
TARGET_CPU_ZARCH is always true.  As a side-effect, branch splitting
is now completely gone.  Some parts of literal pool splitting are
also gone, but the mechanism itself remains: we need to support it because
floating point and vector instructions still cannot use relative
addressing.

gcc/ChangeLog:

2018-08-07  Ilya Leoshkevich  

* config/s390/s390.c (s390_loadrelative_operand_p):
Remove TARGET_CPU_ZARCH usages.
(s390_rtx_costs): Likewise.
(s390_legitimate_constant_p): Likewise.
(s390_cannot_force_const_mem): Likewise.
(legitimate_reload_constant_p): Likewise.
(s390_preferred_reload_class): Likewise.
(legitimize_pic_address): Likewise.
(legitimize_tls_address): Likewise.
(s390_split_branches): Removed.
(s390_add_execute): Removed.
(s390_dump_pool): Remove TARGET_CPU_ZARCH usages.
(s390_mainpool_start): Likewise.
(s390_mainpool_finish): Likewise.
(s390_mainpool_cancel): Removed.
(s390_chunkify_start): Remove TARGET_CPU_ZARCH usages.
(s390_chunkify_cancel): Likewise.
(s390_return_addr_rtx): Likewise.
(s390_register_info): Remove split_branches_pending_p usages.
(s390_optimize_register_info): Likewise.
(s390_init_frame_layout): Remove TARGET_CPU_ZARCH and
split_branches_pending_p usages.
(s390_can_eliminate): Remove TARGET_CPU_ZARCH usages.
(s390_load_got): Likewise.
(s390_expand_split_stack_prologue): Likewise.
(output_asm_nops): Likewise.
(s390_function_profiler): Likewise.
(s390_emit_call): Likewise.
(s390_conditional_register_usage): Likewise.
(s390_optimize_prologue): Likewise.
(s390_reorg): Remove TARGET_CPU_ZARCH and
split_branches_pending_p usages.
(s390_option_override_internal): Remove TARGET_CPU_ZARCH
usages.
(s390_output_indirect_thunk_function): Likewise.
* config/s390/s390.h (TARGET_CPU_ZARCH): Removed.
(TARGET_CPU_ZARCH_P): Removed.
(struct machine_function): Remove split_branches_pending_p.
* config/s390/s390.md: Remove TARGET_CPU_ZARCH usages.
---
 gcc/config/s390/s390.c  | 762 
 gcc/config/s390/s390.h  |   7 -
 gcc/config/s390/s390.md | 495 +++---
 3 files changed, 123 insertions(+), 1141 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 930158931df..7e5b497ce6d 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -3121,7 +3121,7 @@ s390_loadrelative_operand_p (rtx addr, rtx *symref, 
HOST_WIDE_INT *addend)
   if ((GET_CODE (addr) == SYMBOL_REF && !CONSTANT_POOL_ADDRESS_P (addr))
   || (GET_CODE (addr) == UNSPEC
  && (XINT (addr, 1) == UNSPEC_GOTENT
- || (TARGET_CPU_ZARCH && XINT (addr, 1) == UNSPEC_PLT
+ || XINT (addr, 1) == UNSPEC_PLT)))
 {
   if (symref)
*symref = addr;
@@ -3565,8 +3565,7 @@ s390_rtx_costs (rtx x, machine_mode mode, int outer_code,
  /* mulsidi case: mr, m */
  *total = s390_cost->m;
else if (GET_CODE (left) == ZERO_EXTEND
-&& GET_CODE (right) == ZERO_EXTEND
-&& TARGET_CPU_ZARCH)
+&& GET_CODE (right) == ZERO_EXTEND)
  /* umulsidi case: ml, mlr */
  *total = s390_cost->ml;
else
@@ -3874,7 +3873,7 @@ s390_legitimate_constant_p (machine_mode mode, rtx op)
 return 1;
 
   /* Accept immediate LARL operands.  */
-  if (TARGET_CPU_ZARCH && larl_operand (op, mode))
+  if (larl_operand (op, mode))
 return 1;
 
   /* Thread-local symbols are never legal constants.  This is
@@ -3948,8 +3947,6 @@ s390_cannot_force_const_mem (machine_mode mode, rtx x)
/* If the literal pool shares the code section, be put
   execute template placeholders into the pool as well.  */
case UNSPEC_INSN:
- return TARGET_CPU_ZARCH;
-
default:
  return true;
}
@@ -3995,8 +3992,7 @@ legitimate_reload_constant_p (rtx op)
 return true;
 
   /* Accept larl operands.  */
-  if (TARGET_CPU_ZARCH
-  && larl_operand (op, VOIDmode))
+  if (larl_operand (op, VOIDmode))
 return true;
 
   /* Accept floating-point zero operands that fit into a single GPR.  */
@@ -4103,8 +4099,7 @@ s390_preferred_reload_class (rtx op, reg_class_t rclass)
   handled via secondary reload but this does not happen if
   they are used as literal pool slot replacement in reload
   inheritance (see emit_input_reload_insns).  */
-   if (TARGET_CPU_ZARCH
-   && GET_CODE (XEXP (op, 0)) == PLUS
+   if (GET_CODE (XEXP (op, 0)) == PLUS
&& GET_CODE (XEXP (XEXP(op, 0), 0)) == 

[PATCH 1/2] S/390: Remove support for g5 and g6 machines

2018-08-07 Thread Ilya Leoshkevich
g5 and g6 have been deprecated since gcc 6.1.0 (commit 3bd8520f).

gcc/ChangeLog:

2018-08-02  Ilya Leoshkevich  

* common/config/s390/s390-common.c (processor_flags_table):
Remove flags.
* config.gcc: Remove with_arch/with_tune support.
* config/s390/2064.md: Remove cpu attribute comparisons.
* config/s390/driver-native.c (s390_host_detect_local_cpu):
Remove MTN.
* config/s390/linux.h (ASM_SPEC):
Remove -march support.
* config/s390/s390-c.c (s390_cpu_cpp_builtins_internal):
Use a table to get an arch level.
* config/s390/s390-opts.h (enum processor_type):
Remove enum values.
* config/s390/s390.c
(processor_table): Remove entries, add arch_level values.
(s390_issue_rate): Remove cases.
(s390_option_override): Adjust
s390_option_override_internal() call.
(s390_option_override_internal): Remove deprecation warning.
(s390_valid_target_attribute_tree): Adjust
s390_option_override_internal() call.
* config/s390/s390.h (struct s390_processor):
Share with s390-c.c, add arch_level field.
* config/s390/s390.md:
Remove occurrences in cpu attribute.
* config/s390/s390.opt: Remove -march/-mtune support.
* config/s390/tpf.h (ASM_SPEC): Remove -march support.
* doc/invoke.texi: Remove deprecation warning.

gcc/testsuite/ChangeLog:

2018-08-02  Ilya Leoshkevich  

* gcc.target/s390/hotpatch-8.c: Remove.
* gcc.target/s390/hotpatch-9.c: Remove.
* gcc.target/s390/mnop-mcount-m31-fpic.c: Remove.
* gcc.target/s390/mnop-mcount-m31.c: Remove.
---
 gcc/common/config/s390/s390-common.c  |  2 -
 gcc/config.gcc|  2 +-
 gcc/config/s390/2064.md   | 22 +++---
 gcc/config/s390/driver-native.c   |  2 -
 gcc/config/s390/linux.h   |  1 -
 gcc/config/s390/s390-c.c  |  9 +--
 gcc/config/s390/s390-opts.h   | 10 +--
 gcc/config/s390/s390.c| 73 ---
 gcc/config/s390/s390.h| 13 
 gcc/config/s390/s390.md   |  9 +--
 gcc/config/s390/s390.opt  |  9 ---
 gcc/config/s390/tpf.h |  1 -
 gcc/doc/invoke.texi   |  3 +-
 gcc/testsuite/gcc.target/s390/hotpatch-8.c| 20 -
 gcc/testsuite/gcc.target/s390/hotpatch-9.c| 19 -
 .../gcc.target/s390/mnop-mcount-m31-fpic.c|  8 --
 .../gcc.target/s390/mnop-mcount-m31.c |  8 --
 17 files changed, 48 insertions(+), 163 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-8.c
 delete mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-9.c
 delete mode 100644 gcc/testsuite/gcc.target/s390/mnop-mcount-m31-fpic.c
 delete mode 100644 gcc/testsuite/gcc.target/s390/mnop-mcount-m31.c

diff --git a/gcc/common/config/s390/s390-common.c 
b/gcc/common/config/s390/s390-common.c
index d38e47f3b99..a56443c0339 100644
--- a/gcc/common/config/s390/s390-common.c
+++ b/gcc/common/config/s390/s390-common.c
@@ -29,8 +29,6 @@ along with GCC; see the file COPYING3.  If not see
 
 EXPORTED_CONST int processor_flags_table[] =
   {
-/* g5 */ PF_IEEE_FLOAT,
-/* g6 */ PF_IEEE_FLOAT,
 /* z900 */   PF_IEEE_FLOAT | PF_ZARCH,
 /* z990 */   PF_IEEE_FLOAT | PF_ZARCH | PF_LONG_DISPLACEMENT,
 /* z9-109 */ PF_IEEE_FLOAT | PF_ZARCH | PF_LONG_DISPLACEMENT
diff --git a/gcc/config.gcc b/gcc/config.gcc
index b17fdbad1e5..f777a28e3ad 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4574,7 +4574,7 @@ case "${target}" in
for which in arch tune; do
eval "val=\$with_$which"
case ${val} in
-   "" | native | g5 | g6 | z900 | z990 | z9-109 | z9-ec | 
z10 | z196 | zEC12 | z13 | z14 | arch3 | arch5 | arch6 | arch7 | arch8 | arch9 
| arch10 | arch11 | arch12)
+   "" | native | z900 | z990 | z9-109 | z9-ec | z10 | z196 
| zEC12 | z13 | z14 | arch5 | arch6 | arch7 | arch8 | arch9 | arch10 | arch11 | 
arch12)
# OK
;;
*)
diff --git a/gcc/config/s390/2064.md b/gcc/config/s390/2064.md
index 9c9d62bd8a5..4a73446ce37 100644
--- a/gcc/config/s390/2064.md
+++ b/gcc/config/s390/2064.md
@@ -39,63 +39,61 @@
 ;;  |
 ;;  wr
 
-;; This scheduler description is also used for the g5 and g6.
-
 (define_automaton "z_ipu")
 (define_cpu_unit "z_e1"   "z_ipu")
 (define_cpu_unit "z_wr"   "z_ipu")
 
 
 (define_insn_reservation "z_la" 1
-  (and (eq_attr "cpu" "z900,g5,g6")
+  (and (eq_attr "cpu" "z900")
(eq_attr "type" "la"))
   "z_e1,z_wr")
 
 (define_insn_reservation "z_larl" 1
-  (and (eq_attr "cpu" "z900,g5,g6")
+  (and (eq_attr "cpu" "z900")
(eq_attr "type" 

[PATCH 0/2] S/390: Remove deprecated machines

2018-08-07 Thread Ilya Leoshkevich
Remove g5 and g6 machines as well as the corresponding lower-level
features.




[PING 3][PATCH] [v4][aarch64] Avoid tag collisions for loads falkor

2018-08-07 Thread Siddhesh Poyarekar

Hello,

Ping!

Siddhesh

On 07/24/2018 12:37 PM, Siddhesh Poyarekar wrote:

Hi,

This is a rewrite of the tag collision avoidance patch that Kugan had
written as a machine reorg pass back in February.

The falkor hardware prefetching system uses a combination of the
source, destination and offset to decide which prefetcher unit to
train with the load.  This is great when loads in a loop are
sequential but sub-optimal if there are unrelated loads in a loop that
tag to the same prefetcher unit.

This pass attempts to rename the destination register of such colliding
loads using routines available in regrename.c so that their tags do
not collide.  This shows some performance gains with mcf and xalancbmk
(~5% each) and will be tweaked further.  The pass is placed near the
tail end of the pass list so that subsequent passes don't inadvertently
end up undoing the renames.
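To illustrate the idea, here is a small sketch of the renaming strategy. The actual Falkor tag hash is not spelled out in this message, so the `tag` function below is a made-up stand-in; only the shape of the fix (rename a colliding destination to a free register whose tag is unused) mirrors the pass.

```python
def tag(dest, base, offset):
    # Hypothetical tag function: combines destination register, base
    # register and offset into one of 16 prefetcher units.  The real
    # hardware hash differs; this is purely illustrative.
    return (dest ^ base ^ (offset >> 3)) & 0xF

def rename_to_avoid_collisions(loads, free_regs):
    """Rename destination registers so no two loads share a tag.

    loads: list of (dest, base, offset) tuples.
    free_regs: registers a regrename-style pass may substitute in.
    """
    seen = set()
    result = []
    for dest, base, offset in loads:
        t = tag(dest, base, offset)
        if t in seen:
            # Collision: pick a free register whose tag is still unused.
            for r in free_regs:
                if tag(r, base, offset) not in seen:
                    dest, t = r, tag(r, base, offset)
                    break
        seen.add(t)
        result.append((dest, base, offset))
    return result

# Two identical loads train the same prefetcher unit...
loads = [(0, 5, 8), (0, 5, 8)]
renamed = rename_to_avoid_collisions(loads, free_regs=[1, 2, 3])
tags = [tag(d, b, o) for d, b, o in renamed]
print(len(set(tags)))  # 2: the second load was renamed to a fresh tag
```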

A full gcc bootstrap and testsuite ran successfully on aarch64, i.e. it
did not introduce any new regressions.  I also ran make check with
-mcpu=falkor to confirm there were no regressions.  The couple of
regressions I found were target-specific and were related to scheduling
and cost differences and are not correctness issues.

Changes from v3:
- Avoid renaming argument/return registers and registers that have a
   specific architectural meaning, i.e. stack pointer, frame pointer,
   etc.  Try renaming their aliases instead.

Changes from v2:
- Ignore SVE instead of asserting that falkor does not support SVE

Changes from v1:

- Fixed up issues pointed out by Kyrill
- Avoid renaming R0/V0 since they could be return values
- Fixed minor formatting issues.

2018-07-02  Siddhesh Poyarekar  
Kugan Vivekanandarajah  

* config/aarch64/falkor-tag-collision-avoidance.c: New file.
* config.gcc (extra_objs): Build it.
* config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o):
Likewise.
* config/aarch64/aarch64-passes.def
(pass_tag_collision_avoidance): New pass.
* config/aarch64/aarch64.c (qdf24xx_tunings): Add
AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS to tuning_flags.
(aarch64_classify_address): Remove static qualifier.
(aarch64_address_info, aarch64_address_type): Move to...
* config/aarch64/aarch64-protos.h: ... here.
(make_pass_tag_collision_avoidance): New function.
* config/aarch64/aarch64-tuning-flags.def (rename_load_regs):
New tuning flag.

CC: james.greenha...@arm.com
CC: kyrylo.tkac...@foss.arm.com
---
  gcc/config.gcc|   2 +-
  gcc/config/aarch64/aarch64-passes.def |   1 +
  gcc/config/aarch64/aarch64-protos.h   |  49 +
  gcc/config/aarch64/aarch64-tuning-flags.def   |   2 +
  gcc/config/aarch64/aarch64.c  |  48 +-
  .../aarch64/falkor-tag-collision-avoidance.c  | 881 ++
  gcc/config/aarch64/t-aarch64  |   9 +
  7 files changed, 946 insertions(+), 46 deletions(-)
  create mode 100644 gcc/config/aarch64/falkor-tag-collision-avoidance.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 78e84c2b864..8f5e458e8a6 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -304,7 +304,7 @@ aarch64*-*-*)
extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
c_target_objs="aarch64-c.o"
cxx_target_objs="aarch64-c.o"
-   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
+   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o 
falkor-tag-collision-avoidance.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/aarch64-passes.def 
b/gcc/config/aarch64/aarch64-passes.def
index 87747b420b0..f61a8870aa1 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -19,3 +19,4 @@
 .  */
  
  INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);

+INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index af5db9c5953..647ad7a9c37 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -288,6 +288,49 @@ struct tune_params
const struct cpu_prefetch_tune *prefetch;
  };
  
+/* Classifies an address.

+
+   ADDRESS_REG_IMM
+   A simple base register plus immediate offset.
+
+   ADDRESS_REG_WB
+   A base register indexed by immediate offset with writeback.
+
+   ADDRESS_REG_REG
+   A base register indexed by (optionally scaled) register.
+
+   ADDRESS_REG_UXTW
+   A base register indexed by (optionally scaled) zero-extended register.
+
+   ADDRESS_REG_SXTW
+   A base register indexed by (optionally scaled) sign-extended register.
+
+   ADDRESS_LO_SUM
+   A LO_SUM rtx with a base register and "LO12" symbol relocation.
+
+   

Re: [Patch, avr, PR85624] - Fix ICE when initializing 128-byte aligned array

2018-08-07 Thread Senthil Kumar Selvaraj


Jeff Law writes:

> On 08/06/2018 04:53 AM, Senthil Kumar Selvaraj wrote:
>> Ping!
>> 
>> Regards
>> Senthil
>> 
>> Senthil Kumar Selvaraj writes:
>> 
>>> Hi,
>>>
>>> The below patch fixes an ICE for the avr target when the setmemhi
>>> expander is involved.
>>>
>>> The RTL generated by the setmemhi expander ends up as an unrecognized
>>> insn if the alignment of the destination exceeds the maximum value a
>>> QImode const_int can hold (127), AND the number of bytes to set fits
>>> in a QImode const_int. The second condition prevents *clrmemhi from
>>> matching, and *clrmemqi does not match because it expects operand 3
>>> (the alignment const_int rtx) to be QImode, and a value of 128 or
>>> greater does not fit.
>>>   
>>> The patch fixes this by changing the *clrmemqi pattern to match a HI
>>> mode const_int, and also adds a testcase.
>>>
>>> Regression test showed no new failures, ok to commit to trunk?
>>>
>>> Regards
>>> Senthil
>>>
>>> gcc/ChangeLog:
>>> 
>>> 2018-07-18  Senthil Kumar Selvaraj  
>>> 
>>> PR target/85624
>>> * config/avr/avr.md (*clrmemqi): Change mode of operands[2]
>>> from QI to HI.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>> 2018-07-18  Senthil Kumar Selvaraj  
>>> 
>>> PR target/85624
>>> * gcc.target/avr/pr85624.c: New test.
> Given there's also pattern clrmemhi it feels like you're papering over a
> bug elsewhere, possibly in the expanders.

clrmemhi and clrmemqi differ on the mode of the register used to hold
the number of bytes to clear. The setmemhi expander picks a QI or HI
mode reg depending on the width of the size operand, and the
clrmem{qi,hi} match on that.

The operand whose mode I modified represents the alignment of the
destination, and isn't actually used in the assembler template.
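The range check at the heart of the ICE can be sketched as follows. Mode widths follow AVR's 8-bit QImode and 16-bit HImode; the signed two's-complement range convention here is an assumption for illustration.

```python
def fits_in_mode(value, bits):
    # Signed two's-complement range of a `bits`-bit machine mode.
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

QI_BITS, HI_BITS = 8, 16

print(fits_in_mode(127, QI_BITS))  # True: largest alignment *clrmemqi accepted
print(fits_in_mode(128, QI_BITS))  # False: 128-byte alignment -> no match -> ICE
print(fits_in_mode(128, HI_BITS))  # True: the patched HImode operand matches
```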

Regards
Senthil
>
> Ultimately I'll leave the decision here to the AVR maintainers, though.
>
> jeff



Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch, geometry

2018-08-07 Thread Tom de Vries
On 08/01/2018 12:18 PM, Tom de Vries wrote:

> I think we need to add and handle:
> ...
>   CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
> ...
> 

I realized that the patch I posted introducing CUDA_ONE_CALL_MAYBE_NULL
was incomplete, and needed to use the weak attribute in case of linking
against a concrete libcuda.so.

So, I've now committed a patch implementing just CUDA_ONE_CALL_MAYBE_NULL:
"[libgomp, nvptx] Handle CUDA_ONE_CALL_MAYBE_NULL" @
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00447.html . You can use
"CUDA_CALL_EXISTS (cuOccupancyMaxPotentialBlockSize)" to test for
existence of the function in the cuda driver API.
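The pattern behind CUDA_CALL_EXISTS can be modeled with run-time symbol lookup: resolve a symbol and treat "not found" as "feature unavailable" rather than a hard link error. The sketch below uses libc's printf as a stand-in symbol, not the real cuOccupancyMaxPotentialBlockSize, and assumes a POSIX system where dlopen-style lookup is available.

```python
import ctypes

def call_exists(lib, name):
    """Return True if `name` resolves in `lib`, mimicking CUDA_CALL_EXISTS."""
    try:
        getattr(lib, name)  # ctypes resolves the symbol lazily here
        return True
    except AttributeError:
        return False

# Handle to the current process, analogous to dlopen(NULL, ...).
lib = ctypes.CDLL(None)
print(call_exists(lib, "printf"))                     # True on POSIX systems
print(call_exists(lib, "cuDefinitelyMissingSymbol"))  # False
```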

> The patch doesn't build in a setup with
> --enable-offload-targets=nvptx-none and without cuda, that enables usage
> of plugin/cuda/cuda.h:
> ...
> /data/offload-nvptx/src/libgomp/plugin/plugin-nvptx.c:98:16: error:
> ‘cuOccupancyMaxPotentialBlockSize’ undeclared here (not in a function);
> did you mean ‘cuOccupancyMaxPotentialBlockSizeWithFlags’?
>  CUDA_ONE_CALL (cuOccupancyMaxPotentialBlockSize) \
> ...
> 

I've committed a patch "[libgomp, nvptx, --without-cuda-driver] Don't
use system cuda driver" @
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00348.html .

Using --without-cuda-driver should make it easy to build using the
dlopen interface without having to uninstall the system libcuda.so.

Thanks,
- Tom