Re: [PATCH] Add --with-diagnostics-urls configuration option and GCC_URLS/TERM_URLS env var

2020-01-31 Thread Bernd Edlinger


On 1/31/20 11:06 PM, David Malcolm wrote:
> On Fri, 2020-01-31 at 16:59 +, Bernd Edlinger wrote:
>> Hi,
>>
>> this is patch is heavily based on David's original patch here:
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01409.html
>>
>> and addresses Jakub's review comments here:
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01412.html
>>
>> So I hope you don't mind, when I pick up this patch since there
>> was not much activity recently on this issue, so I assumed you
>> would appreciate some help.
> 
> Thanks Bernd; sorry, I got distracted by analyzer bug-fixing.  It's
> appreciated.
> 
>> I will try to improve the patch a bit, and hope you are gonna like
>> it. I agree that this feature is fine, and should be enabled by
>> default, and just if it is positively clear that it won't work,
>> disabled in the auto mode.
>>
>> Also as requested by Jakub this tries to be more compatible to
>> GCC_COLORS define, and adds the capability to switch between ST
>> and BEL string termination and also have a way to disable urls
>> even with -fdiagnostics-urls=always (like GCC_COLORS= disables
>> colors).
>>
>> In addition to that I propose to use GCC_URLS and if that
>> is not defined use TERM_URLS to control that feature for
>> all applications willing to support that.
> 
> I think I like the overall idea.
> 

Thanks!

>> Since I have seen much garbage from the URLs in the xfce4-terminal
>> 0.6.3
>> admittedly an old Ubuntu installation, but still in LTS status,
>> and no attempt from the xfc4-terminal to _ever_ implement URLs, I
>> would
>> like to detect the xfce4-terminal, and use that in auto mode
>> to switch off the URLs feature, since in best case the URLs will
>> just be ignored, but can and will often create garbage on the screen.
>>
>> Likewise on MinGW, since the windows 10 cmd prompt does at best
>> ignore the URLs, but windows terminal and windows 7 cmd prompt
>> print garbage when URLs are output.
> 
> That sounds reasonable, but we should document those exclusions, I
> think.
> 

done.

> I think the ChangeLog should also refer to "PR other/93168":
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93168
> 

done.

>> diff --git a/gcc/diagnostic-color.c b/gcc/diagnostic-color.c
>> index d554795..ad84d1f 100644
>> --- a/gcc/diagnostic-color.c
>> +++ b/gcc/diagnostic-color.c
>> @@ -239,6 +239,56 @@ colorize_init (diagnostic_color_rule_t rule)
>>  }
>>  }
>>  
>> +bool diagnostic_urls_use_st = false;
> 
> I don't like global variables (a pet peeve of mine); can you make this
> be a field of the pretty_printer instead?
> 
> Even better: convert pretty_printer::show_urls from a bool to a new
> enum; something like:
> 
> enum diagnostic_url_format
> {
>   URL_FORMAT_NONE,
>   URL_FORMAT_ST,
>   URL_FORMAT_BEL  
> };
> 
> with suitable leading comments (probably a good place to summarize some
> of the compatibility info within the source).
> 
> Then we could verify the differences between ST and BEL in the
> selftests, and also not have the selftests be affected by env vars.
> 

Okay, I have to include diagnostic-url.h from pretty-print.h in that case,
since both files are included from the generated options.c in alphabetic order 
/-).
I was first a bit reluctant to do so, since we usually don't have recursive
header files but probably that is okay then, alternatively I could use a 
bitfield
in pretty-print.h and put defines for URL_FORMAT_xxx in diagnostic-url.h.

> 
>> +/* Return true if we should use urls when in always mode, false otherwise.
>> +   Additionally initialize diagnostic_urls_use_st to true if ST escapes
>> +   should be used and false for BEL escapes.  */
> 
> Maybe have this return that 3-way enum instead?
> 

okay, done.

>> +
>> +static bool
>> +parse_gcc_urls()
>> +{
>> +  const char *p;
>> +
>> +  p = getenv ("GCC_URLS"); /* Plural! */
>> +  if (p == NULL)
>> +p = getenv ("TERM_URLS");
>> +
>> +  if (p == NULL)
>> +return true;
>> +  if (*p == '\0')
>> +return false;
>> +
>> +  if (!strcmp (p, "no"))
>> +return false;
>> +  if (!strcmp (p, "st"))
>> +diagnostic_urls_use_st = true;
>> +  else if (!strcmp (p, "bel"))
>> +diagnostic_urls_use_st = false;
>> +  return true;
>> +}
>> +
>> +/* Return true if we should use urls when in auto mode, false otherwise.  */
>> +
>> +static bool
>> +auto_enable_urls ()
>> +{
>> +#ifdef __MINGW32__
>> +  return false;
>> +#else
>> +  const char *p;
>> +
>> +  if (!should_colorize ())
>> +return false;
>> +  p = getenv ("COLORTERM");
>> +  if (p == NULL)
>> +return true;
>> +  if (!strcmp (p, "xfce4-terminal"))
>> +return false;
>> +  return true;
>> +#endif
>> +}
> 
> I think this logic warrants a comment for each of the exclusions: why
> are we excluding them?  I'd prefer to capture that in the source,
> rather than just in the mailing list.
> 

done.

>>  /* Determine if URLs should be enabled, based on RULE.
>> This reuses the logic for colorization.  */
>>  
>> @@ -250,9 +300,12 @@ diagnostic_urls_

Re: [PATCH] Add --with-diagnostics-urls configuration option and GCC_URLS/TERM_URLS env var

2020-01-31 Thread Bernd Edlinger
Hi Segher,

On 2/1/20 2:32 AM, Segher Boessenkool wrote:
> On Fri, Jan 31, 2020 at 11:38:04PM +, Bernd Edlinger wrote:
>> On 1/31/20 11:54 PM, Segher Boessenkool wrote:
>>> about most, which caused me to open PR93168, is TERM=screen (which is
>>> what tmux uses), so at least exclude that one?  And doing all this
>>
>> Definitely, if the situation with tmux is like xfce4-terminal (reportedly
>> the 0.8 version switched to a fixed VTE, which makes the URLs invisible,
>> but the URL feature request is pending sine 2017, with no activity 
>> whatsoever),
>> then detecting that and disabling the URLs until they finally implement
>> the URLs is straight forward.  I can add that to my patch if you confirm,
>> the right detection logic:
> 
> The situation is that it is a terminal multiplexer; you can connect any
> terminal to it, and swap those out, etc.  You might have one that has
> support for the url thing at one time, but when you look at that session
> from a different machine (from your phone, say), not anymore (or the
> other way around).  You can also connect multiple actual terminals at
> the same time.
> 
> In short, even *if* you could detect whether the terminal supports this
> url thing (and you cannot), you cannot detect whether it will support it
> two seconds from now, and even if the url thing you already displayed
> will misbehave *in the future*!
> 

Ah, okay, this is a totally wrong assumption then.

So if I understand you right, I should add a check for
tmux in should_colorize, where the TERM=dumb thing is,
and add TERM=screen, and TMUX=anything, to switch of
auto-color off, which will take down URLs at the same time?

Bernd.

>> >From 
>> >https://unix.stackexchange.com/questions/10689/how-can-i-tell-if-im-in-a-tmux-session-from-a-bash-script
>> I read the safest way to find out if you are in a tmux session, is
>> to look for TERM=screen and TMUX is set.
> 
> Yes, but the same considerations are true for all screen-like programs.
> I happen to like tmux best, some people swear by old-fashioned screen,
> and there are other alternatives too I think.
> 
>> Is that the case for your environment?
> 
> It's true on at least six machines I tested just now, but all of those
> are pretty similar anyway, so that doesn't say much.
> 
>> Note that it is well possible that this environment values are
>> not preserved in a ssh session, but nobody is perfect.
> 
> The session you are looking at is not the ssh session; it will keep
> running even if no actual terminal is connected, that's pretty much
> the point of running it, in many cases :-)
> 
 Since I have seen much garbage from the URLs in the xfce4-terminal 0.6.3
 admittedly an old Ubuntu installation, but still in LTS status,
 and no attempt from the xfc4-terminal to _ever_ implement URLs,
>>>
>>> This is true for *most* terminal emulators.
>>
>> Sadly, I would not do this if there is a chance that someone looses a
>> working feature, so I was told that a more aggressive approach would
>> "be straight harmful to the current state of things
>>  (i.e. hyperlinks at least not causing any problem, plus working
>>  out of the box wherever supported, for the vast majority of people)"
>> https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#gistcomment-3160953
> 
> I have seen not a *single* terminal emulator where this works 100%
> correctly.  I have seen many where it screws up display spectacularly
> (or sound even :-/ )
> 
>> So I would try to detect known terminals which are so buggy that they
>> print garbage instead of silently ignoring the URL escapes AND
>> which may or may not have fixed the visual glitches but have not plan
>> to implement URLs at all.
> 
> That is not buggy at all.  It is a bug to output the wrong escape codes,
> instead.
> 
>>> I have nothing against this feature, I just wish it wouldn't annoy me
>>> on pretty much every system I use.  None of which use "TERM=dumb", but
>>> none of which use "TERM=fancy-pants-term-v42" either.  (Did anyone ever
>>> use "dumb", anyway?)
>>
>> The dumb thing was old code, I only took the freedom to document it ;-)
> 
> I know :-)  I mean, does anyone have "TERM=dumb" in the environment?
> "TERM=ansi", sure, and I've used "TERM=vt220" many times, too, but I've
> never seen "TERM=dumb" used.
> 
> 
> Segher
> 


Re: [PATCH] Add --with-diagnostics-urls configuration option and GCC_URLS/TERM_URLS env var

2020-01-31 Thread Segher Boessenkool
On Fri, Jan 31, 2020 at 11:38:04PM +, Bernd Edlinger wrote:
> On 1/31/20 11:54 PM, Segher Boessenkool wrote:
> > about most, which caused me to open PR93168, is TERM=screen (which is
> > what tmux uses), so at least exclude that one?  And doing all this
> 
> Definitely, if the situation with tmux is like xfce4-terminal (reportedly
> the 0.8 version switched to a fixed VTE, which makes the URLs invisible,
> but the URL feature request is pending sine 2017, with no activity 
> whatsoever),
> then detecting that and disabling the URLs until they finally implement
> the URLs is straight forward.  I can add that to my patch if you confirm,
> the right detection logic:

The situation is that it is a terminal multiplexer; you can connect any
terminal to it, and swap those out, etc.  You might have one that has
support for the url thing at one time, but when you look at that session
from a different machine (from your phone, say), not anymore (or the
other way around).  You can also connect multiple actual terminals at
the same time.

In short, even *if* you could detect whether the terminal supports this
url thing (and you cannot), you cannot detect whether it will support it
two seconds from now, and even if the url thing you already displayed
will misbehave *in the future*!

> >From 
> >https://unix.stackexchange.com/questions/10689/how-can-i-tell-if-im-in-a-tmux-session-from-a-bash-script
> I read the safest way to find out if you are in a tmux session, is
> to look for TERM=screen and TMUX is set.

Yes, but the same considerations are true for all screen-like programs.
I happen to like tmux best, some people swear by old-fashioned screen,
and there are other alternatives too I think.

> Is that the case for your environment?

It's true on at least six machines I tested just now, but all of those
are pretty similar anyway, so that doesn't say much.

> Note that it is well possible that this environment values are
> not preserved in a ssh session, but nobody is perfect.

The session you are looking at is not the ssh session; it will keep
running even if no actual terminal is connected, that's pretty much
the point of running it, in many cases :-)

> >> Since I have seen much garbage from the URLs in the xfce4-terminal 0.6.3
> >> admittedly an old Ubuntu installation, but still in LTS status,
> >> and no attempt from the xfc4-terminal to _ever_ implement URLs,
> > 
> > This is true for *most* terminal emulators.
> 
> Sadly, I would not do this if there is a chance that someone looses a
> working feature, so I was told that a more aggressive approach would
> "be straight harmful to the current state of things
>  (i.e. hyperlinks at least not causing any problem, plus working
>  out of the box wherever supported, for the vast majority of people)"
> https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#gistcomment-3160953

I have seen not a *single* terminal emulator where this works 100%
correctly.  I have seen many where it screws up display spectacularly
(or sound even :-/ )

> So I would try to detect known terminals which are so buggy that they
> print garbage instead of silently ignoring the URL escapes AND
> which may or may not have fixed the visual glitches but have not plan
> to implement URLs at all.

That is not buggy at all.  It is a bug to output the wrong escape codes,
instead.

> > I have nothing against this feature, I just wish it wouldn't annoy me
> > on pretty much every system I use.  None of which use "TERM=dumb", but
> > none of which use "TERM=fancy-pants-term-v42" either.  (Did anyone ever
> > use "dumb", anyway?)
> 
> The dumb thing was old code, I only took the freedom to document it ;-)

I know :-)  I mean, does anyone have "TERM=dumb" in the environment?
"TERM=ansi", sure, and I've used "TERM=vt220" many times, too, but I've
never seen "TERM=dumb" used.


Segher


Re: [PATCH] V12 patch #5 of 14, Make -mpcrel default for -mcpu=future on little endian Linux 64-bit systems

2020-01-31 Thread Segher Boessenkool
Hi!

On Thu, Jan 09, 2020 at 07:40:08PM -0500, Michael Meissner wrote:
>   * config/rs6000/linux64.h (PREFIXED_ADDR_SUPPORTED_BY_OS): Set to
>   1 to enable prefixed addressing if -mcpu=future.
>   (PCREL_SUPPORTED_BY_OS): Set to 1 to enable PC-relative addressing
>   if -mcpu=future.
>   * config/rs6000/rs6000-cpus.h (ISA_FUTURE_MASKS_SERVER): Do not
>   enable -mprefixed-addr or -mpcrel by default.

I understand why this is needed for pcrel (or useful at least), but why
for prefixed addressing in general as well?  What OS support is needed
for that?

Put another way, is this just carefulness, or do you run into actual
problems without it?

> +/* Enable support for pc-relative and numeric prefixed addressing on the
> +   'future' system.  */
> +#undef  PREFIXED_ADDR_SUPPORTED_BY_OS
> +#define PREFIXED_ADDR_SUPPORTED_BY_OS1
> +
> +#undef  PCREL_SUPPORTED_BY_OS
> +#define PCREL_SUPPORTED_BY_OS1

"Numeric prefixed addressing"?  What's that?  Just "and other prefixed
addressing", maybe?

(Is it useful to have those two separate at all, btw?  Now, that is while
we are still developing the code, but also in the future?)

> +/* Addressing related flags on a future processor.  These are options that 
> need
> +   to be cleared if the target OS is not capable of supporting prefixed
> +   addressing at all (such as 32-bit mode or if the object file format is not
> +   ELF v2).  */

Ah.  If we are missing the needed relocations (or other as/ld support).
So it is not about OS really, missing toolchain support instead?

> +  /* Only ELFv2 currently supports prefixed/pcrel addressing.  */
> +  else if (rs6000_current_abi != ABI_ELFv2)
> + {
> +   if (TARGET_PCREL && explicit_pcrel)
> + error ("%qs requires %qs", "-mpcrel", "-mabi=elfv2");
> +
> +   else if (TARGET_PREFIXED_ADDR && explicit_prefixed)
> + error ("%qs requires %qs", "-mprefixed-addr", "-mabi=elfv2");

It would be good if the error messages also said "currently" somehow (it
is not an actual limitation, it's just a matter of code).  "Is only
supported with -mabi=elfv2", perhaps?


Segher


patch, fortan] PR83113 Bogus "duplicate allocatable attribute" error for submodule character function

2020-01-31 Thread Andrew Benson
I've attached on updated patch for PR83113. The now removes the ICE when a 
duplicate DIMENSION attribute is specified in a submodule function.  The patch 
reg tests cleanly.

OK to commit?

Note that this patch doesn't check that the attributes of duplicate 
declarations in a submodule function are valid or consistent with those 
declared in the module. For example both:

module mm
  implicit none
  interface
 module function c()
   integer, dimension(2)  :: c !! Dimension 2 here
 end function c
  end interface
end module mm

submodule (mm) oo
  implicit none
contains
  module function c()
integer, dimension(3)  :: c  !! Inconsistent dimension 3 here
  end function c
end submodule oo


and


module mm
  implicit none
  interface
 module function c()
   integer, dimension(2)  :: c !! Dimension 2 here
 end function c
  end interface
end module mm

submodule (mm) oo
  implicit none
contains
  module function c()
integer, dimension(:)  :: c !! Invalid cannot have a deferred shape
  end function c
end submodule oo


compile successfully, but should be rejected. Presumably we should add some 
check that the attributes of the declaration in the submodule match those in 
the module?

If so, I can go ahead and open a PR for this problem.

-Andrew

On Friday, January 24, 2020 10:54:24 AM PST Andrew Benson wrote:
> Below is a partial patch for PR 83113
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83113
> 
> It simply circumvents the check for duplicate attributes when looking at
> declarations in a module procedure during compilation of a submodule.
> 
> As it stands, the patch handles the cases of "allocatable", "dimension", and
> "pointer" attributes (corresponding to the two test cases in the PR plus a
> case that showed up in my own code). There are other attributes which
> should be handled similarly but before I make those changes I wanted to
> seek some opinions on whether this is the correct/best way to fix this
> issue.
> 
> The patch regression tests cleanly - however,  while the patch solves the
> "duplicate attribute" problem for the test case in comment #4:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83113#c4
> 
> that test case then ICEs with:
> 
> $ gfortran -c pr83113_2.F90 -o pr83113_2.o
> f951: internal compiler error: in gfc_set_array_spec, at fortran/array.c:879
> 0x5fd22f gfc_set_array_spec(gfc_symbol*, gfc_array_spec*, locus*)
> ../../gcc-git/gcc/fortran/array.c:879
> 0x7e868e build_sym
> ../../gcc-git/gcc/fortran/decl.c:1670
> 0x7f0ada variable_decl
> ../../gcc-git/gcc/fortran/decl.c:2760
> 0x7f0ada gfc_match_data_decl()
> ../../gcc-git/gcc/fortran/decl.c:6120
> 0x85b66d match_word
> ../../gcc-git/gcc/fortran/parse.c:65
> 0x85b66d decode_statement
> ../../gcc-git/gcc/fortran/parse.c:376
> 0x861094 next_free
> ../../gcc-git/gcc/fortran/parse.c:1279
> 0x861094 next_statement
> ../../gcc-git/gcc/fortran/parse.c:1511
> 0x8638ac parse_spec
> ../../gcc-git/gcc/fortran/parse.c:3738
> 0x86591c parse_progunit
> ../../gcc-git/gcc/fortran/parse.c:5851
> 0x865df9 parse_contained
> ../../gcc-git/gcc/fortran/parse.c:5752
> 0x866b5e parse_module
> ../../gcc-git/gcc/fortran/parse.c:6124
> 0x86709c gfc_parse_file()
> ../../gcc-git/gcc/fortran/parse.c:6427
> 0x8b756f gfc_be_parse_file
> ../../gcc-git/gcc/fortran/f95-lang.c:210
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> 
> 
> This looks like a separate problem to me, but looking quickly at where the
> ICE occurs it seems to be related to coarrays of which my understanding is
> very limited - so any insights on this would be welcome.
> 
> Patch is appended below.
> 
> -Andrew


-- 

* Andrew Benson: http://users.obs.carnegiescience.edu/abenson/contact.html

* Galacticus: https://github.com/galacticusorg/galacticus
diff --git a/gcc/fortran/array.c b/gcc/fortran/array.c
index c873cf2..ff1ff72 100644
--- a/gcc/fortran/array.c
+++ b/gcc/fortran/array.c
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "options.h"
 #include "gfortran.h"
+#include "parse.h"
 #include "match.h"
 #include "constructor.h"
 
@@ -835,6 +836,13 @@ gfc_set_array_spec (gfc_symbol *sym, gfc_array_spec *as, locus *error_loc)
   if (as == NULL)
 return true;
 
+  /* If the symbol corresponds to a submodule module procedure the array spec is
+ already set, so do not attempt to set it again here. */
+  if (gfc_state_stack->previous && gfc_state_stack->previous->previous
+  && gfc_state_stack->previous->previous->state == COMP_SUBMODULE
+  && sym->attr.module_procedure)
+return true;
+  
   if (as->rank
   && !gfc_add_dimension (&sym->attr, sym->name, error_loc))
 return false;
diff --git a/gcc/fortran/symbol.c b/gcc/for

[PATCH] c++: Fix ICE on invalid alignas in a template [PR93530]

2020-01-31 Thread Marek Polacek
This fixes an ICE taking place in cp_default_conversion because we got
a SCOPE_REF that doesn't have a type and so checking
INTEGRAL_OR_UNSCOPED_ENUMERATION_TYPE_P (TREE_TYPE (exp)) will crash.
This happens since the recent Joseph's change in decl_attributes whereby
we don't skip C++11 attributes on types.

[dcl.align] is clear that alignas applied to a function is ill-formed.
That should be fixed, and we have PR90847 for that.  But I think a more
appropriate fix at this stage would be the following: in a template we
want to splice dependent attributes and save them for later, and by
doing so avoid this crash.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

PR c++/93530 - ICE on invalid alignas in a template.
* decl.c (grokdeclarator): Call cplus_decl_attributes instead of
decl_attributes.

* g++.dg/cpp0x/alignas18.C: New test.
---
 gcc/cp/decl.c  | 2 +-
 gcc/testsuite/g++.dg/cpp0x/alignas18.C | 8 
 2 files changed, 9 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alignas18.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 6ad558eef9e..576cbd8643a 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -12300,7 +12300,7 @@ grokdeclarator (const cp_declarator *declarator,
 
 The optional attribute-specifier-seq appertains to
 the function type.  */
- decl_attributes (&type, attrs, 0);
+ cplus_decl_attributes (&type, attrs, 0);
 
if (raises)
  type = build_exception_variant (type, raises);
diff --git a/gcc/testsuite/g++.dg/cpp0x/alignas18.C 
b/gcc/testsuite/g++.dg/cpp0x/alignas18.C
new file mode 100644
index 000..820bdd2d7ca
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alignas18.C
@@ -0,0 +1,8 @@
+// PR c++/93530 - ICE on invalid alignas in a template.
+// { dg-do compile { target c++11 } }
+
+template  struct S {
+  using U = S;
+  // FIXME: This is ill-formed; see PR90847.
+  void fn() alignas(U::X);
+};

base-commit: 2a07345c4f8dabc286fc470e76c53473e5bc3eb7
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



[committed] nios2: Support for GOT-relative DW_EH_PE_datarel encoding.

2020-01-31 Thread Sandra Loosemore
On nios2-linux-gnu, there has been a long-standing bug in C++ exception
handling that sometimes resulted in link errors like

../nios2-linux-gnu/bin/ld: FDE encoding in /tmp/cccfpQ2l.o(.eh_frame) prevents 
.eh_frame_hdr table being created

when building some shared libraries or PIE executables.  The root of
the problem is that GCC was incorrectly emitting an absolute encoding
in EH tables for PIC.  This patch changes it to use either
DW_EH_PE_indirect (for global) or DW_EH_PE_datarel (for local), and
fixes libgcc so it can find the address of the GOT as the base address
for DW_EH_PE_datarel.

Complicating matters somewhat, GAS was missing support for
%gotoff(symbol) relocation syntax.  I have just pushed a fix for that,
but I've added a configure check to test for presence of the binutils
support and fall back to the current absolute encoding (which works
most of the time) if it is not available.  Once the fix makes it into
an official binutils release it might be appropriate to make this
error out instead.

Since this is a wrong-code bug and affects only nios2 target, I think
this is appropriate for Stage 4.  I regression-tested on both
nios2-linux-gnu and nios2-elf, with and without the binutils support
present, before committing this.

2020-01-31  Sandra Loosemore  

gcc/
* configure.ac [nios2-*-*]: Check HAVE_AS_NIOS2_GOTOFF_RELOCATION.
* config.in: Regenerated.
* configure: Regenerated.
* config/nios2/nios2.h (ASM_PREFERRED_EH_DATA_FORMAT): Fix handling
for PIC when HAVE_AS_NIOS2_GOTOFF_RELOCATION.
(ASM_MAYBE_OUTPUT_ENCODED_ADDR_RTX): New.

gcc/testsuite/
* g++.target/nios2/hello-pie.C: New.
* g++.target/nios2/nios2.exp: New.

libgcc/
* config.host [nios2-*-linux*] (tmake_file, tm_file): Adjust.
* config/nios2-elf-lib.h: New.
* unwind-dw2-fde-dip.c (_Unwind_IteratePhdrCallback): Use existing
code for finding GOT base for nios2.
---
 gcc/ChangeLog  | 11 
 gcc/config.in  |  6 +
 gcc/config/nios2/nios2.h   | 40 --
 gcc/configure  | 37 +++
 gcc/configure.ac   | 11 
 gcc/testsuite/ChangeLog|  7 ++
 gcc/testsuite/g++.target/nios2/hello-pie.C | 14 +++
 gcc/testsuite/g++.target/nios2/nios2.exp   | 34 +
 libgcc/ChangeLog   |  9 +++
 libgcc/config.host |  3 ++-
 libgcc/config/nios2/elf-lib.h  | 24 ++
 libgcc/unwind-dw2-fde-dip.c|  2 +-
 12 files changed, 189 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/nios2/hello-pie.C
 create mode 100644 gcc/testsuite/g++.target/nios2/nios2.exp
 create mode 100644 libgcc/config/nios2/elf-lib.h

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2d1488c..b2031ee 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2020-01-31  Sandra Loosemore  
+
+   nios2: Support for GOT-relative DW_EH_PE_datarel encoding.
+
+   * configure.ac [nios2-*-*]: Check HAVE_AS_NIOS2_GOTOFF_RELOCATION.
+   * config.in: Regenerated.
+   * configure: Regenerated.
+   * config/nios2/nios2.h (ASM_PREFERRED_EH_DATA_FORMAT): Fix handling
+   for PIC when HAVE_AS_NIOS2_GOTOFF_RELOCATION.
+   (ASM_MAYBE_OUTPUT_ENCODED_ADDR_RTX): New.
+
 2020-02-01  Andrew Burgess  
 
* configure: Regenerate.
diff --git a/gcc/config.in b/gcc/config.in
index 1110492..4829286 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -571,6 +571,12 @@
 #endif
 
 
+/* Define if your assembler supports %gotoff relocation syntax. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_NIOS2_GOTOFF_RELOCATION
+#endif
+
+
 /* Define if your assembler supports the -no-mul-bug-abort option. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_NO_MUL_BUG_ABORT_OPTION
diff --git a/gcc/config/nios2/nios2.h b/gcc/config/nios2/nios2.h
index 78b538b..e81b928 100644
--- a/gcc/config/nios2/nios2.h
+++ b/gcc/config/nios2/nios2.h
@@ -494,14 +494,40 @@ do {  
  \
 #define EH_RETURN_DATA_REGNO(N) ((N) <= (LAST_ARG_REGNO - FIRST_ARG_REGNO) \
 ? (N) + FIRST_ARG_REGNO : INVALID_REGNUM)
 
-/* Nios II has no appropriate relocations for a 32-bit PC-relative or
-   section-relative pointer encoding.  This therefore always chooses an
-   absolute representation for pointers.  An unfortunate consequence of
-   this is that ld complains about the absolute fde encoding when linking
-   with -shared or -fpie, but the warning is harmless and there seems to
-   be no good way to suppress it.  */
+/* For PIC, use indirect for global references; it'll end up using a dynamic
+   relocation, which we want to keep out of read-only EH sections.
+   For local referen

Re: [PATCH] gcc: Add new configure options to allow static libraries to be selected

2020-01-31 Thread Andrew Burgess
* Iain Sandoe  [2020-01-31 20:18:58 +]:

> Hello Andrew,
> 
> Andrew Burgess  wrote:
> 
> > Here's a cleaned up version of the previous patch I posted.  If Iain
> > reports this fixes the regressions he saw then I will push this.
> 
> I applied this to r10-6364 and tested on a bunch of Darwin platforms.
> 
> AFAICT, the configuration now reports consistent values between
> libcpp, gcc, intl, and libstdc++-v3.
> 
> Bootstrap succeeded with my “normal” defaults, and testing appears
> nominal.
> 
> Notes:
> 
> * I have only tested on Darwin, I’m assuming you have tested on other
>  platforms.
> 
> * there are now a number of interacting ICONV options, I made no
>  attempt to test multiple permutations.

Thanks, I've gone ahead and pushed this fix.  I tested this on my
local machine with no issues, and with your positive results I think I
should get this in sooner rather than later.

Once again, sorry for the breakage.

Thanks,
Andrew


[COMMITTED] c++: Fix sizeof VLA lambda capture.

2020-01-31 Thread Jason Merrill
sizeof a VLA type is not a constant in C or the GNU C++ extension, so we
need to capture the VLA even in unevaluated context.  For PR60855 we stopped
looking through a previous capture, but we also need to capture the first
time the variable is mentioned.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/86216
* semantics.c (process_outer_var_ref): Capture VLAs even in
unevaluated context.
---
 gcc/cp/semantics.c  | 11 +--
 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla5.C | 13 +
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla5.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index a489e2cf399..90f1e18e48a 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3524,8 +3524,15 @@ tree
 process_outer_var_ref (tree decl, tsubst_flags_t complain, bool odr_use)
 {
   if (cp_unevaluated_operand)
-/* It's not a use (3.2) if we're in an unevaluated context.  */
-return decl;
+{
+  tree type = TREE_TYPE (decl);
+  if (!dependent_type_p (type)
+ && variably_modified_type_p (type, NULL_TREE))
+   /* VLAs are used even in unevaluated context.  */;
+  else
+   /* It's not a use (3.2) if we're in an unevaluated context.  */
+   return decl;
+}
   if (decl == error_mark_node)
 return decl;
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla5.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla5.C
new file mode 100644
index 000..f3390b2d09f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla5.C
@@ -0,0 +1,13 @@
+// PR c++/86216
+// { dg-do compile { target c++11 } }
+// { dg-additional-options -Wno-vla }
+
+template  void b(int n, T arg) {
+  int buffer[arg];
+  int buffer2[arg][arg];
+  [&] {
+n = sizeof(buffer);
+n = sizeof(buffer2);   // { dg-bogus "sorry" "" { xfail *-*-* } }
+  }();
+}
+int main() { b(2, 3); }

base-commit: e98ebda074bf8fc5f630a93085af81f52437d851
-- 
2.18.1



[RFC PATCH] __builtin_escape/__builtin_bless

2020-01-31 Thread Uecker, Martin


Hi Richard and Joseph,

for discussion: here is my simple patch
for __builtin_escape/__builtin_bless.

Maybe it does something stupid. 


Best,
Martin


diff --git a/gcc/builtins.c b/gcc/builtins.c
index e4a8694054e..d0046135213 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6014,6 +6014,31 @@ expand_builtin_unreachable (void)
   emit_barrier ();
 }
 
+
+static rtx
+expand_builtin_escape (tree exp, rtx target)
+{
+  if (call_expr_nargs (exp) < 1)
+return const0_rtx;
+
+  target = expand_expr (CALL_EXPR_ARG (exp, 0), target, Pmode,
+   EXPAND_NORMAL);
+
+  return target;
+}
+
+static rtx
+expand_builtin_bless (tree exp, rtx target)
+{
+  if (call_expr_nargs (exp) < 1)
+return const0_rtx;
+
+  target = expand_expr (CALL_EXPR_ARG (exp, 0), target, Pmode,
+   EXPAND_NORMAL);
+
+  return target;
+}
+
 /* Expand EXP, a call to fabs, fabsf or fabsl.
Return NULL_RTX if a normal call should be emitted rather than expanding
the function inline.  If convenient, the result should be placed
@@ -8304,6 +8329,12 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
machine_mode mode,
   expand_builtin_unreachable ();
   return const0_rtx;
 
+case BUILT_IN_ESCAPE:
+  return expand_builtin_escape (exp, target);
+
+case BUILT_IN_BLESS:
+  return expand_builtin_bless (exp, target);
+
 CASE_FLT_FN (BUILT_IN_SIGNBIT):
 case BUILT_IN_SIGNBITD32:
 case BUILT_IN_SIGNBITD64:
diff --git a/gcc/builtins.def b/gcc/builtins.def
index fa8b0641ab1..9264a0fdaab 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -865,6 +865,8 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_EXECVP, "execvp", 
BT_FN_INT_CONST_STRING_PT
 DEF_EXT_LIB_BUILTIN(BUILT_IN_EXECVE, "execve", 
BT_FN_INT_CONST_STRING_PTR_CONST_STRING_PTR_CONST_STRING, ATTR_NOTHROW_LIST)
 DEF_LIB_BUILTIN(BUILT_IN_EXIT, "exit", BT_FN_VOID_INT, 
ATTR_NORETURN_NOTHROW_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_EXPECT, "expect", BT_FN_LONG_LONG_LONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_ESCAPE, "escape", BT_FN_PTR_PTR, 
ATTR_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_BLESS, "bless", BT_FN_PTR_PTR, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_EXPECT_WITH_PROBABILITY, 
"expect_with_probability", BT_FN_LONG_LONG_LONG_DOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_ASSUME_ALIGNED, "assume_aligned", 
BT_FN_PTR_CONST_PTR_SIZE_VAR, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_EXTEND_POINTER, "extend_pointer", 
BT_FN_UNWINDWORD_PTR, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/testsuite/gcc.dg/alias-17.c b/gcc/testsuite/gcc.dg/alias-17.c
new file mode 100644
index 000..c375e4027ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/alias-17.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+
+static
+int test(int a, int b, int c)
+{
+   int x, y;
+   int d = (&x < &y) ? 1 : -1;
+   if (a) __builtin_escape(&y);
+   y = 0; //!
+   int *q = &x;
+   q += d;
+   int *r = q;
+   if (b) r = __builtin_bless(q);
+   *(c ? r : q) = 1;
+   return y;
+}
+
+int main()
+{
+   if (0 != test(0, 0, 1)) __builtin_abort();
+   if (0 != test(0, 1, 0)) __builtin_abort();
+   if (0 != test(0, 1, 1)) __builtin_abort();
+   if (0 != test(1, 0, 1)) __builtin_abort();
+// if (0 != test(1, 1, 0)) __builtin_abort();
+   if (0 == test(1, 1, 1)) __builtin_abort();
+   return 0;
+}
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 670676f20c3..d5befff71a6 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -2462,6 +2462,8 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, 
bool tbaa_p)
  }
 
/* The following builtins do not read from memory.  */
+   case BUILT_IN_ESCAPE:
+   case BUILT_IN_BLESS:
case BUILT_IN_FREE:
case BUILT_IN_MALLOC:
case BUILT_IN_POSIX_MEMALIGN:
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index be6647db894..53a38a04ebb 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -1963,6 +1963,8 @@ evaluate_stmt (gimple *stmt)
  break;
 
/* These builtins return their first argument, unmodified.  */
+   case BUILT_IN_ESCAPE:
+   case BUILT_IN_BLESS:
case BUILT_IN_MEMCPY:
case BUILT_IN_MEMMOVE:
case BUILT_IN_MEMSET:
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 416a26c996c..003349c7278 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4381,6 +4381,24 @@ find_func_aliases_for_builtin_call (struct function *fn, 
gcall *t)
in the alias-oracle query functions explicitly!  */
 switch (DECL_FUNCTION_CODE (fndecl))
   {
+  case BUILT_IN_ESCAPE:
+   {
+  make_escape_constraint (gimple_call_arg (t, 0));
+  return true;
+   }
+  case BUILT_IN_BLESS:
+   {
+ 

Re: [PATCH] V12 patch #4 of 14, Optimize adjusting PC-relative vector addresses

2020-01-31 Thread Segher Boessenkool
Hi!

On Thu, Jan 09, 2020 at 07:34:31PM -0500, Michael Meissner wrote:
> +static rtx
> +adjust_vec_address_pcrel (rtx addr, rtx element_offset, rtx base_tmp)
> +{
> +  rtx new_addr = NULL;

You don't need this variable, you can just return the value directly in
every case, and then you don't need all these "else"s either, or all
these nested blocks.  The NULL here is never used, either (you call
gcc_unreachable instead of returning it).

But, okay for trunk.  Thanks!


Segher


[PATCH] fortran: Fix up TYPE_ARG_TYPES of procs with scalar VALUE optional args [PR92305]

2020-01-31 Thread Jakub Jelinek
Hi!

The following patch fixes
-FAIL: libgomp.fortran/use_device_addr-1.f90   -O0  execution test
-FAIL: libgomp.fortran/use_device_addr-2.f90   -O0  execution test
that has been FAILing for several months on powerpc64le-linux.
The problem is in the Fortran FE, which adds the artificial arguments
for scalar VALUE OPTIONAL dummy args only to DECL_ARGUMENTS where the
current function can see them, but not to TYPE_ARG_TYPES; if those functions
aren't varargs, this confuses calls.c to pass the remaining arguments
(which aren't named (== not covered by TYPE_ARG_TYPES) and aren't varargs
either) in a different spot from what the callee (which has proper
DECL_ARGUMENTS for all args) expects.  For the artificial length arguments
for character dummy args we already put them in both DECL_ARGUMENTS and
TYPE_ARG_TYPES.

Fixed thusly, bootstrapped/regtested on x86_64-linux, i686-linux,
powerpc64le-linux and powerpc64-linux (the last one with both -m32/-m64).
Ok for trunk?

2020-01-31  Jakub Jelinek  

PR fortran/92305
* trans-types.c (gfc_get_function_type): Also push boolean_type_node
types for non-character scalar VALUE optional dummy arguments.
* trans-decl.c (create_function_arglist): Skip those in
hidden_typelist.  Formatting fix.

--- gcc/fortran/trans-types.c.jj2020-01-12 11:54:36.0 +0100
+++ gcc/fortran/trans-types.c   2020-01-31 21:26:34.199188677 +0100
@@ -3098,6 +3098,16 @@ gfc_get_function_type (gfc_symbol * sym,
 
  vec_safe_push (typelist, type);
}
+  /* For noncharacter scalar intrinsic types, VALUE passes the value,
+hence, the optional status cannot be transferred via a NULL pointer.
+Thus, we will use a hidden argument in that case.  */
+  else if (arg
+  && arg->attr.optional
+  && arg->attr.value
+  && !arg->attr.dimension
+  && arg->ts.type != BT_CLASS
+  && !gfc_bt_struct (arg->ts.type))
+   vec_safe_push (typelist, boolean_type_node);
 }
 
   if (!vec_safe_is_empty (typelist)
--- gcc/fortran/trans-decl.c.jj 2020-01-30 09:34:43.207088430 +0100
+++ gcc/fortran/trans-decl.c2020-01-31 21:28:49.272197084 +0100
@@ -2645,8 +2645,8 @@ create_function_arglist (gfc_symbol * sy
  || f->sym->ts.u.cl->backend_decl == length)
{
  if (POINTER_TYPE_P (len_type))
-   f->sym->ts.u.cl->backend_decl =
-   build_fold_indirect_ref_loc (input_location, length);
+   f->sym->ts.u.cl->backend_decl
+ = build_fold_indirect_ref_loc (input_location, length);
  else if (f->sym->ts.u.cl->backend_decl == NULL)
gfc_create_string_length (f->sym);
 
@@ -2677,6 +2677,8 @@ create_function_arglist (gfc_symbol * sy
   DECL_ARG_TYPE (tmp) = boolean_type_node;
   TREE_READONLY (tmp) = 1;
   gfc_finish_decl (tmp);
+
+ hidden_typelist = TREE_CHAIN (hidden_typelist);
}
 
   /* For non-constant length array arguments, make sure they use

Jakub



Re: [PATCH] V12 patch #3 of 14, Improve address validation in rs6000_adjust_vec_address

2020-01-31 Thread Segher Boessenkool
Hi!

On Thu, Jan 09, 2020 at 07:27:58PM -0500, Michael Meissner wrote:
>   * config/rs6000/rs6000.c (reg_to_non_prefixed): Add forward
>   reference.

FWIW, it is better to just reorder the code, in most cases.

>   (hard_reg_and_mode_to_addr_mask): Delete, no longer used.

Just "Delete.".  Changelogs say what, not why; you have the commit
message for that.

> +   new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);

So this depends on op0 not being r0 here.  Do we guarantee that somehow?
It isn't obvious, so add an assert for this please?  (Or do I miss
something obvious?  :-) )

> +/* If the address isn't valid, move the address into the temporary base
> +   register.  Some reasons it could not be valid include:
> +   The address offset overflowed the 16 or 34 bit offset size;
> +   We need to use a DS-FORM load, and the bottom 2 bits are non-zero;
> +   We need to use a DQ-FORM load, and the bottom 2 bits are non-zero;
> +   Only X_FORM loads can be done, and the address is D_FORM.  */

4 bits for DQ-form?

Okay for trunk with those tweaks.  Thanks!


Segher


Re: [PATCH] Add --with-diagnostics-urls configuration option and GCC_URLS/TERM_URLS env var

2020-01-31 Thread Bernd Edlinger



On 1/31/20 11:54 PM, Segher Boessenkool wrote:
> Hi!
> 
> Thanks for working on this.
> 
> On Fri, Jan 31, 2020 at 04:59:02PM +, Bernd Edlinger wrote:
>> I will try to improve the patch a bit, and hope you are gonna like
>> it. I agree that this feature is fine, and should be enabled by
>> default, and just if it is positively clear that it won't work,
>> disabled in the auto mode.
> 
> Why does it not ask the terminfo, instead of only disabling this for
> "dumb"?  It should disable it for *most* terminals!  The one I care

Unfortunately I was told that terminfo is not willing to support URLs
either, and it need to be administered centrally, which is per definition
an impossible task...


> about most, which caused me to open PR93168, is TERM=screen (which is
> what tmux uses), so at least exclude that one?  And doing all this


Definitely, if the situation with tmux is like xfce4-terminal (reportedly
the 0.8 version switched to a fixed VTE, which makes the URLs invisible,
but the URL feature request is pending sine 2017, with no activity whatsoever),
then detecting that and disabling the URLs until they finally implement
the URLs is straight forward.  I can add that to my patch if you confirm,
the right detection logic:

>From 
>https://unix.stackexchange.com/questions/10689/how-can-i-tell-if-im-in-a-tmux-session-from-a-bash-script
I read the safest way to find out if you are in a tmux session, is
to look for TERM=screen and TMUX is set.
Is that the case for your environment?

Note that it is well possible that this environment values are
not preserved in a ssh session, but nobody is perfect.


> beeping and screen-corrupting stuff for everything that sets TERM=xterm
> is a bad idea imnsho.
> 
>> Since I have seen much garbage from the URLs in the xfce4-terminal 0.6.3
>> admittedly an old Ubuntu installation, but still in LTS status,
>> and no attempt from the xfc4-terminal to _ever_ implement URLs,
> 
> This is true for *most* terminal emulators.
> 

Sadly, I would not do this if there is a chance that someone looses a
working feature, so I was told that a more aggressive approach would
"be straight harmful to the current state of things
 (i.e. hyperlinks at least not causing any problem, plus working
 out of the box wherever supported, for the vast majority of people)"
https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#gistcomment-3160953

So I would try to detect known terminals which are so buggy that they
print garbage instead of silently ignoring the URL escapes AND
which may or may not have fixed the visual glitches but have not plan
to implement URLs at all.  Once they implement URLs it would be nice if they
do something like set a TERM_URLS=yes so we know for sure, that the feature is
positively available, for instance.

> I have nothing against this feature, I just wish it wouldn't annoy me
> on pretty much every system I use.  None of which use "TERM=dumb", but
> none of which use "TERM=fancy-pants-term-v42" either.  (Did anyone ever
> use "dumb", anyway?)
> 

The dumb thing was old code, I only took the freedom to document it ;-)


Bernd.
> 
> Segher
> 


[COMMITTED 2/2] c++: Reduce memory consumption for arrays of non-aggregate type.

2020-01-31 Thread Jason Merrill
The remaining low-hanging fruit for improvement on memory consumption in the
14179 testcase was the duplication of the CONSTRUCTOR for the array by
reshape_init.  This patch changes reshape_init to reuse a single constructor
for an array of non-aggregate type such as the one in the testcase.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/14179
* decl.c (reshape_init_array_1): Reuse a single CONSTRUCTOR with
non-aggregate elements.
(reshape_init_array): Add first_initializer_p parm.
(reshape_init_r): Change first_initializer_p from bool to tree.
(reshape_init): Pass init to it.
---
 gcc/cp/decl.c | 50 ++
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 6ad558eef9e..ef34f436fdf 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -5931,7 +5931,7 @@ struct reshape_iter
   constructor_elt *end;
 };
 
-static tree reshape_init_r (tree, reshape_iter *, bool, tsubst_flags_t);
+static tree reshape_init_r (tree, reshape_iter *, tree, tsubst_flags_t);
 
 /* FIELD is a FIELD_DECL or NULL.  In the former case, the value
returned is the next FIELD_DECL (possibly FIELD itself) that can be
@@ -5979,15 +5979,21 @@ is_direct_enum_init (tree type, tree init)
 
 static tree
 reshape_init_array_1 (tree elt_type, tree max_index, reshape_iter *d,
- tsubst_flags_t complain)
+ tree first_initializer_p, tsubst_flags_t complain)
 {
   tree new_init;
   bool sized_array_p = (max_index && TREE_CONSTANT (max_index));
   unsigned HOST_WIDE_INT max_index_cst = 0;
   unsigned HOST_WIDE_INT index;
 
-  /* The initializer for an array is always a CONSTRUCTOR.  */
-  new_init = build_constructor (init_list_type_node, NULL);
+  /* The initializer for an array is always a CONSTRUCTOR.  If this is the
+ outermost CONSTRUCTOR and the element type is non-aggregate, we don't need
+ to build a new one.  */
+  bool reuse = first_initializer_p && !CP_AGGREGATE_TYPE_P (elt_type);
+  if (reuse)
+new_init = first_initializer_p;
+  else
+new_init = build_constructor (init_list_type_node, NULL);
 
   if (sized_array_p)
 {
@@ -6014,12 +6020,20 @@ reshape_init_array_1 (tree elt_type, tree max_index, 
reshape_iter *d,
   constructor_elt *old_cur = d->cur;
 
   check_array_designated_initializer (d->cur, index);
-  elt_init = reshape_init_r (elt_type, d, /*first_initializer_p=*/false,
+  elt_init = reshape_init_r (elt_type, d,
+/*first_initializer_p=*/NULL_TREE,
 complain);
   if (elt_init == error_mark_node)
return error_mark_node;
-  CONSTRUCTOR_APPEND_ELT (CONSTRUCTOR_ELTS (new_init),
- size_int (index), elt_init);
+  tree idx = size_int (index);
+  if (reuse)
+   {
+ old_cur->index = idx;
+ old_cur->value = elt_init;
+   }
+  else
+   CONSTRUCTOR_APPEND_ELT (CONSTRUCTOR_ELTS (new_init),
+   idx, elt_init);
   if (!TREE_CONSTANT (elt_init))
TREE_CONSTANT (new_init) = false;
 
@@ -6056,7 +6070,8 @@ reshape_init_array_1 (tree elt_type, tree max_index, 
reshape_iter *d,
Parameters are the same of reshape_init_r.  */
 
 static tree
-reshape_init_array (tree type, reshape_iter *d, tsubst_flags_t complain)
+reshape_init_array (tree type, reshape_iter *d, tree first_initializer_p,
+   tsubst_flags_t complain)
 {
   tree max_index = NULL_TREE;
 
@@ -6065,7 +6080,8 @@ reshape_init_array (tree type, reshape_iter *d, 
tsubst_flags_t complain)
   if (TYPE_DOMAIN (type))
 max_index = array_type_nelts (type);
 
-  return reshape_init_array_1 (TREE_TYPE (type), max_index, d, complain);
+  return reshape_init_array_1 (TREE_TYPE (type), max_index, d,
+  first_initializer_p, complain);
 }
 
 /* Subroutine of reshape_init_r, processes the initializers for vectors.
@@ -6096,7 +6112,8 @@ reshape_init_vector (tree type, reshape_iter *d, 
tsubst_flags_t complain)
   if (VECTOR_TYPE_P (type))
 max_index = size_int (TYPE_VECTOR_SUBPARTS (type) - 1);
 
-  return reshape_init_array_1 (TREE_TYPE (type), max_index, d, complain);
+  return reshape_init_array_1 (TREE_TYPE (type), max_index, d,
+  NULL_TREE, complain);
 }
 
 /* Subroutine of reshape_init_r, processes the initializers for classes
@@ -6179,7 +6196,8 @@ reshape_init_class (tree type, reshape_iter *d, bool 
first_initializer_p,
break;
 
   field_init = reshape_init_r (TREE_TYPE (field), d,
-  /*first_initializer_p=*/false, complain);
+  /*first_initializer_p=*/NULL_TREE,
+  complain);
   if (field_init == error_mark_node)
return error_mark_node;
 
@@ -6230,11 +6248,11 @@ has_designator_problem (reshape_iter *d

[COMMITTED 1/2] c++: Reduce memory consumption for large static arrays.

2020-01-31 Thread Jason Merrill
PR14179 and the C counterpart PR12245 are about memory consumption of very
large file-scope arrays.  Recently, location wrappers increased memory
consumption significantly: in an array of integer constants, each one will
have a location wrapper, which added up to over 500MB in the 14179
testcase.  For this kind of testcase tracking these locations isn't worth
the cost, so this patch turns the wrappers off after 256 elements; any array
that size or larger isn't likely to be interested in the location of
individual integer constants.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/14179
* parser.c (cp_parser_initializer_list): Suppress location wrappers
after 256 elements.
---
 gcc/cp/parser.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index bd8e524f078..e0f72302e5e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -23311,6 +23311,9 @@ cp_parser_initializer_list (cp_parser* parser, bool* 
non_constant_p,
   /* Assume all of the expressions are constant.  */
   *non_constant_p = false;
 
+  unsigned nelts = 0;
+  int suppress = suppress_location_wrappers;
+
   /* Parse the rest of the list.  */
   while (true)
 {
@@ -23450,6 +23453,19 @@ cp_parser_initializer_list (cp_parser* parser, bool* 
non_constant_p,
   if (token->type == CPP_CLOSE_BRACE)
break;
 
+  /* Suppress location wrappers in a long initializer to save memory
+(14179).  The cutoff is chosen arbitrarily.  */
+  const unsigned loc_max = 256;
+  unsigned incr = 1;
+  if (TREE_CODE (initializer) == CONSTRUCTOR)
+   /* Look one level down because it's easy.  Looking deeper would require
+  passing down a nelts pointer, and I don't think multi-level massive
+  initializers are common enough to justify this.  */
+   incr = CONSTRUCTOR_NELTS (initializer);
+  nelts += incr;
+  if (nelts >= loc_max && (nelts - incr) < loc_max)
+   ++suppress_location_wrappers;
+
   /* Consume the `,' token.  */
   cp_lexer_consume_token (parser->lexer);
 }
@@ -23479,6 +23495,8 @@ cp_parser_initializer_list (cp_parser* parser, bool* 
non_constant_p,
  IDENTIFIER_MARKED (designator) = 0;
 }
 
+  suppress_location_wrappers = suppress;
+
   *designated = first_designator != NULL_TREE;
   return v;
 }

base-commit: 42f36563ef655db48d5fda60cd7f3eac9650dade
-- 
2.18.1



Re: [PATCH] diagnostics: Add options to control the column units [PR49973] [PR86904]

2020-01-31 Thread Lewis Hyatt
Thanks for taking a look, sorry about that, it's my first new option
:). I will add in the next iteration.

-Lewis

On Fri, Jan 31, 2020 at 5:45 PM Joseph Myers  wrote:
>
> This seems to be missing invoke.texi documentation for the new options.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [PATCH] Add --with-diagnostics-urls configuration option and GCC_URLS/TERM_URLS env var

2020-01-31 Thread Segher Boessenkool
Hi!

Thanks for working on this.

On Fri, Jan 31, 2020 at 04:59:02PM +, Bernd Edlinger wrote:
> I will try to improve the patch a bit, and hope you are gonna like
> it. I agree that this feature is fine, and should be enabled by
> default, and just if it is positively clear that it won't work,
> disabled in the auto mode.

Why does it not ask the terminfo, instead of only disabling this for
"dumb"?  It should disable it for *most* terminals!  The one I care
about most, which caused me to open PR93168, is TERM=screen (which is
what tmux uses), so at least exclude that one?  And doing all this
beeping and screen-corrupting stuff for everything that sets TERM=xterm
is a bad idea imnsho.

> Since I have seen much garbage from the URLs in the xfce4-terminal 0.6.3
> admittedly an old Ubuntu installation, but still in LTS status,
> and no attempt from the xfc4-terminal to _ever_ implement URLs,

This is true for *most* terminal emulators.

I have nothing against this feature, I just wish it wouldn't annoy me
on pretty much every system I use.  None of which use "TERM=dumb", but
none of which use "TERM=fancy-pants-term-v42" either.  (Did anyone ever
use "dumb", anyway?)


Segher


Re: [PATCH] diagnostics: Add options to control the column units [PR49973] [PR86904]

2020-01-31 Thread Joseph Myers
This seems to be missing invoke.texi documentation for the new options.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] libgfortran: Fix and simplify IO locking [PR92836]

2020-01-31 Thread Janne Blomqvist
On Fri, Jan 31, 2020 at 3:38 PM Janne Blomqvist
 wrote:
>
> Simplify IO locking in libgfortran.  The new IO implementation avoids
> accessing units without locks, as seen in PR 92836.  It also avoids
> lock inversion (except for a corner case wrt namelist query when
> reading from stdin and outputting to stdout), making it easier to
> verify correctness with tools like valgrind or threadsanitizer.  It is
> also simplified as the waiting and closed variables are not needed
> anymore, making it easier to understand and analyze.
>
> Regtested on x86_64-pc-linux-gnu, Ok for master?

And, I forgot the ChangeLog entry. Here it is:

libgfortran/ChangeLog:

2020-01-31  Janne Blomqvist  

PR libfortran/92836
* io/close.c (st_close): Close unit with unit_lock held.
* io/io.h (gfc_unit): Remove waiting and closed members.
(find_unit_locked): New prototype.
(inc_waiting_locked): Remove.
(predec_waiting_locked): Remove.
(dec_waiting_unlocked): Remove.
* io/list_read.c (nml_query): Avoid deadlock due to lock
inversion.
* io/unit.c (newunit_lock): New variable.
(get_gfc_unit): Add new argument, use it. Remove logic waiting for
closing logic..
(find_unit): Use new argument for get_gfc_unit.
(find_unit_locked): New function.
(find_or_create_unit): Use new argumet for gfc_get_unit.
(get_unit): Likewise.
(init_units): Initialize newunit_lock.
(close_unit_1): Change meaning of locked argument, remove waiting
for closing logic.
(close_unit): Adapt to new close_unit_1 arguments.
(close_units): Likewwise.
(newunit_alloc): Use newunit_lock to protect.
(newunit_free): Likewise.
* io/unix.c (find_file0): Lock unit before accessing.
(find_file): Remove waiting for closing logic.
(flush_all_units_1): Likewise.
(flush_all_units): Likewise.




-- 
Janne Blomqvist


[committed] analyzer: fix ICE with 'const void *' (PR 93457)

2020-01-31 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r10-6386-g6775172431a8e6f0d20ac0c4946d6b5db2f46450.

gcc/analyzer/ChangeLog:
PR analyzer/93457
* region-model.cc (make_region_for_type): Use VOID_TYPE_P rather
than checking against void_type_node.

gcc/testsuite/ChangeLog:
PR analyzer/93457
* gcc.dg/analyzer/pr93457.c: New test.
---
 gcc/analyzer/region-model.cc|  2 +-
 gcc/testsuite/gcc.dg/analyzer/pr93457.c | 10 ++
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr93457.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index eb6d8f3cf3a..679479c8b5c 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -6005,7 +6005,7 @@ make_region_for_type (region_id parent_rid, tree type)
 return new function_region (parent_rid, type);
 
   /* If we have a void *, make a new symbolic region.  */
-  if (type == void_type_node)
+  if (VOID_TYPE_P (type))
 return new symbolic_region (parent_rid, false);
 
   gcc_unreachable ();
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr93457.c 
b/gcc/testsuite/gcc.dg/analyzer/pr93457.c
new file mode 100644
index 000..b77911ba789
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr93457.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+void
+p5 (const void *);
+
+void
+s5 (const void *cl)
+{
+  p5 (&cl[1]); /* { dg-warning "dereferencing 'void \\*' pointer" } */
+}
-- 
2.21.0



[committed] analyzer: fix ICE handling void-type (PR 93373)

2020-01-31 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r10-6385-g09bea5845a50189b093e7fa8d5b840da702197d4.

gcc/analyzer/ChangeLog:
PR analyzer/93373
* region-model.cc (ASSERT_COMPAT_TYPES): Convert to...
(assert_compat_types): ...this, and bail when either type is NULL,
or when VOID_TYPE_P (dst_type).
(region_model::get_lvalue): Update for above conversion.
(region_model::get_rvalue): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/93373
* gcc.dg/analyzer/torture/pr93373.c: New test.
---
 gcc/analyzer/region-model.cc| 12 
 gcc/testsuite/gcc.dg/analyzer/torture/pr93373.c |  3 +++
 2 files changed, 11 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/pr93373.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index d43aef3a4d7..eb6d8f3cf3a 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4701,8 +4701,12 @@ region_model::get_lvalue_1 (path_var pv, 
region_model_context *ctxt)
 
 /* Assert that SRC_TYPE can be converted to DST_TYPE as a no-op.  */
 
-#define ASSERT_COMPAT_TYPES(SRC_TYPE, DST_TYPE) \
-  gcc_checking_assert (useless_type_conversion_p ((SRC_TYPE), (DST_TYPE)))
+static void
+assert_compat_types (tree src_type, tree dst_type)
+{
+  if (src_type && dst_type && !VOID_TYPE_P (dst_type))
+gcc_checking_assert (useless_type_conversion_p (src_type, dst_type));
+}
 
 /* Get the id of the region for PV within this region_model,
emitting any diagnostics to CTXT.  */
@@ -4714,7 +4718,7 @@ region_model::get_lvalue (path_var pv, 
region_model_context *ctxt)
 return region_id::null ();
 
   region_id result_rid = get_lvalue_1 (pv, ctxt);
-  ASSERT_COMPAT_TYPES (get_region (result_rid)->get_type (),
+  assert_compat_types (get_region (result_rid)->get_type (),
   TREE_TYPE (pv.m_tree));
   return result_rid;
 }
@@ -4795,7 +4799,7 @@ region_model::get_rvalue (path_var pv, 
region_model_context *ctxt)
 return svalue_id::null ();
   svalue_id result_sid = get_rvalue_1 (pv, ctxt);
 
-  ASSERT_COMPAT_TYPES (get_svalue (result_sid)->get_type (),
+  assert_compat_types (get_svalue (result_sid)->get_type (),
   TREE_TYPE (pv.m_tree));
 
   return result_sid;
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/pr93373.c 
b/gcc/testsuite/gcc.dg/analyzer/torture/pr93373.c
new file mode 100644
index 000..c205ee13465
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/pr93373.c
@@ -0,0 +1,3 @@
+/* { dg-do compile }
+   { dg-require-effective-target alloca } */
+#include "../../Warray-bounds-41.c"
-- 
2.21.0



Re: [PATCH] Add --with-diagnostics-urls configuration option and GCC_URLS/TERM_URLS env var

2020-01-31 Thread David Malcolm
On Fri, 2020-01-31 at 16:59 +, Bernd Edlinger wrote:
> Hi,
> 
> this is patch is heavily based on David's original patch here:
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01409.html
> 
> and addresses Jakub's review comments here:
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01412.html
> 
> So I hope you don't mind, when I pick up this patch since there
> was not much activity recently on this issue, so I assumed you
> would appreciate some help.

Thanks Bernd; sorry, I got distracted by analyzer bug-fixing.  It's
appreciated.

> I will try to improve the patch a bit, and hope you are gonna like
> it. I agree that this feature is fine, and should be enabled by
> default, and just if it is positively clear that it won't work,
> disabled in the auto mode.
> 
> Also as requested by Jakub this tries to be more compatible to
> GCC_COLORS define, and adds the capability to switch between ST
> and BEL string termination and also have a way to disable urls
> even with -fdiagnostics-urls=always (like GCC_COLORS= disables
> colors).
> 
> In addition to that I propose to use GCC_URLS and if that
> is not defined use TERM_URLS to control that feature for
> all applications willing to support that.

I think I like the overall idea.

> Since I have seen much garbage from the URLs in the xfce4-terminal
> 0.6.3
> admittedly an old Ubuntu installation, but still in LTS status,
> and no attempt from the xfc4-terminal to _ever_ implement URLs, I
> would
> like to detect the xfce4-terminal, and use that in auto mode
> to switch off the URLs feature, since in best case the URLs will
> just be ignored, but can and will often create garbage on the screen.
> 
> Likewise on MinGW, since the windows 10 cmd prompt does at best
> ignore the URLs, but windows terminal and windows 7 cmd prompt
> print garbage when URLs are output.

That sounds reasonable, but we should document those exclusions, I
think.

I think the ChangeLog should also refer to "PR other/93168":
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93168

> diff --git a/gcc/diagnostic-color.c b/gcc/diagnostic-color.c
> index d554795..ad84d1f 100644
> --- a/gcc/diagnostic-color.c
> +++ b/gcc/diagnostic-color.c
> @@ -239,6 +239,56 @@ colorize_init (diagnostic_color_rule_t rule)
>  }
>  }
>  
> +bool diagnostic_urls_use_st = false;

I don't like global variables (a pet peeve of mine); can you make this
be a field of the pretty_printer instead?

Even better: convert pretty_printer::show_urls from a bool to a new
enum; something like:

enum diagnostic_url_format
{
  URL_FORMAT_NONE,
  URL_FORMAT_ST,
  URL_FORMAT_BEL  
};

with suitable leading comments (probably a good place to summarize some
of the compatibility info within the source).

Then we could verify the differences between ST and BEL in the
selftests, and also not have the selftests be affected by env vars.


> +/* Return true if we should use urls when in always mode, false otherwise.
> +   Additionally initialize diagnostic_urls_use_st to true if ST escapes
> +   should be used and false for BEL escapes.  */

Maybe have this return that 3-way enum instead?

> +
> +static bool
> +parse_gcc_urls()
> +{
> +  const char *p;
> +
> +  p = getenv ("GCC_URLS"); /* Plural! */
> +  if (p == NULL)
> +p = getenv ("TERM_URLS");
> +
> +  if (p == NULL)
> +return true;
> +  if (*p == '\0')
> +return false;
> +
> +  if (!strcmp (p, "no"))
> +return false;
> +  if (!strcmp (p, "st"))
> +diagnostic_urls_use_st = true;
> +  else if (!strcmp (p, "bel"))
> +diagnostic_urls_use_st = false;
> +  return true;
> +}
> +
> +/* Return true if we should use urls when in auto mode, false otherwise.  */
> +
> +static bool
> +auto_enable_urls ()
> +{
> +#ifdef __MINGW32__
> +  return false;
> +#else
> +  const char *p;
> +
> +  if (!should_colorize ())
> +return false;
> +  p = getenv ("COLORTERM");
> +  if (p == NULL)
> +return true;
> +  if (!strcmp (p, "xfce4-terminal"))
> +return false;
> +  return true;
> +#endif
> +}

I think this logic warrants a comment for each of the exclusions: why
are we excluding them?  I'd prefer to capture that in the source,
rather than just in the mailing list.

>  /* Determine if URLs should be enabled, based on RULE.
> This reuses the logic for colorization.  */
>  
> @@ -250,9 +300,12 @@ diagnostic_urls_enabled_p (diagnostic_url_rule_t rule)
>  case DIAGNOSTICS_URL_NO:
>return false;
>  case DIAGNOSTICS_URL_YES:
> -  return true;
> +  return parse_gcc_urls ();
>  case DIAGNOSTICS_URL_AUTO:
> -  return should_colorize ();
> +  if (auto_enable_urls ())
> + return parse_gcc_urls ();
> +  else
> + return false;
>  default:
>gcc_unreachable ();
>  }
> diff --git a/gcc/diagnostic-url.h b/gcc/diagnostic-url.h
> index 6be0569..06b7784 100644
> --- a/gcc/diagnostic-url.h
> +++ b/gcc/diagnostic-url.h
> @@ -32,5 +32,6 @@ typedef enum
>  } diagnostic_url_rule_t;
>  
>  extern bool diagnostic_urls_enabled_p

Re: [PING^4][PATCH 0/4] Fix library testsuite compilation for build sysroot

2020-01-31 Thread Maciej W. Rozycki
On Tue, 21 Jan 2020, Maciej W. Rozycki wrote:

>  I'll give your proposal a shot and I'm lucky enough to have a build 
> configuration where I can have no compiler preinstalled, so at least I can 
> check if testing with your change applied correctly picks the newly built 
> uninstalled compiler in that case.

 So it does seem to pick the right uninstalled compiler, however without 
the sysroot option and therefore all tests fail either like:

.../bin/riscv64-linux-gnu-ld: cannot find crt1.o: No such file or directory
.../bin/riscv64-linux-gnu-ld: cannot find -lm
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: libgomp.c/../libgomp.c-c++-common/depend-iterator-2.c (test for excess 
errors)

or like:

.../libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/cancel-parallel-1.c:4:10:
 fatal error: stdlib.h: No such file or directory
compilation terminated.
compiler exited with status 1
FAIL: libgomp.c/../libgomp.c-c++-common/cancel-parallel-1.c (test for excess 
errors)

I am somewhat wary about picking the compiler options to pass selectively 
anyway.  However the change below does not do that and works for me and I 
think it should be fine for your use case too.  Please confirm.

 I'm yet verifying the other libraries with corresponding changes and will 
formally submit v2 of this series once that has completed.

 Apologies for the slightly long RTT with this update.

  Maciej

---
 libgomp/testsuite/Makefile.am |1 +
 libgomp/testsuite/Makefile.in |1 +
 libgomp/testsuite/libgomp-test-support.exp.in |2 --
 3 files changed, 2 insertions(+), 2 deletions(-)

gcc-test-libgomp-runtestflags-tool-exec.diff
Index: gcc/libgomp/testsuite/Makefile.am
===
--- gcc.orig/libgomp/testsuite/Makefile.am
+++ gcc/libgomp/testsuite/Makefile.am
@@ -11,6 +11,7 @@ EXPECT = $(shell if test -f $(top_buildd
 _RUNTEST = $(shell if test -f $(top_srcdir)/../dejagnu/runtest; then \
 echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi)
 RUNTESTDEFAULTFLAGS = --tool $$tool --srcdir $$srcdir
+AM_RUNTESTFLAGS = --tool_exec "$(CC)"
 
 # Instead of directly in ../testsuite/libgomp-test-support.exp.in, the
 # following variables have to be "routed through" this Makefile, for expansion
Index: gcc/libgomp/testsuite/Makefile.in
===
--- gcc.orig/libgomp/testsuite/Makefile.in
+++ gcc/libgomp/testsuite/Makefile.in
@@ -308,6 +308,7 @@ _RUNTEST = $(shell if test -f $(top_srcd
 echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi)
 
 RUNTESTDEFAULTFLAGS = --tool $$tool --srcdir $$srcdir
+AM_RUNTESTFLAGS = --tool_exec "$(CC)"
 all: all-am
 
 .SUFFIXES:
Index: gcc/libgomp/testsuite/libgomp-test-support.exp.in
===
--- gcc.orig/libgomp/testsuite/libgomp-test-support.exp.in
+++ gcc/libgomp/testsuite/libgomp-test-support.exp.in
@@ -1,5 +1,3 @@
-set GCC_UNDER_TEST {@CC@}
-
 set cuda_driver_include "@CUDA_DRIVER_INCLUDE@"
 set cuda_driver_lib "@CUDA_DRIVER_LIB@"
 set hsa_runtime_lib "@HSA_RUNTIME_LIB@"


Re: [rfc PATCH] rs6000: Updated constraint documentation

2020-01-31 Thread Segher Boessenkool
On Fri, Jan 31, 2020 at 10:56:10AM -0600, Bill Schmidt wrote:
> On 1/31/20 9:42 AM, Segher Boessenkool wrote:
> >On Fri, Jan 31, 2020 at 08:49:21AM -0600, Bill Schmidt wrote:
> >>>+(define_register_constraint "wa"
> >>>"rs6000_constraints[RS6000_CONSTRAINT_wa]"
> >>>+  "A VSX register (VSR), @code{vs0}@dots{}@code{vs63}.  Either a 
> >>>@code{d}
> >>>+   or a @code{v} register.")
> >>Not quite true, as the "d" register is only half of a VSX register.  It
> >>may or may not be worth including a picture of register overlaps...
> >No, the "d" registers are the actual full registers, all 128 bits of it.
> >You often use them in a mode that uses only 64 bits, sure.
> 
> Perhaps that would be worth a few words when describing the "d" 
> constraint, then.  This is not at all obvious to the casual user. Thanks!

"They should read the ISA, it's all explained right at the start of
Chapter 7".

"All instructions that operate on an FPR are redefined to operate on
doubleword element 0 of the corresponding VSR.  The contents of
doubleword element 1 of the VSR corresponding to a source FPR or FPR
pair for these instructions are ignored and the contents of doubleword
element 1 of the VSR corresponding to the target FPR or FPR pair for
these instructions are undefined."

The twist in GCC is that all register numbers always denote the whole
register.  Compares this to (reg:DI 3) vs. (reg:SI 3) (and even HI and
QI); here it is (reg:DF 32) vs. (reg:V2DF 32) (etc.)

None of the VSRs are separate registers, they just *are* the FPRs and
the VRs.  On the machines the FPRs are in some implementations really
only half the width of the VSRs, or the other half does not even exist
on older machines; but inside GCC, such details do not matter.

But I'll add some more words, sure, it is probably useful to explain
some of the basic setup :-)

The tricky part is to have it make sense when you read it in order, but
to also make sense when used as a reference.


Segher


Re: [PATCH] diagnostics: Add options to control the column units [PR49973] [PR86904]

2020-01-31 Thread Lewis Hyatt
On Fri, Jan 31, 2020 at 3:32 PM David Malcolm  wrote:
>
> On Fri, 2020-01-31 at 14:31 -0500, Lewis Hyatt wrote:
> > Hello-
> >
> > Here is the second patch that I mentioned when I submitted the other
> > related
> > patch (which is awaiting review):
> > https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01626.html.
>
> Sorry about that; I'm v. busy with analyzer bugs right now.

Thanks very much for the feedback. Totally understood that you are
busy with more pressing things for GCC 10. Looking forward to trying
out -fanalyzer myself too. I'll apply these suggestions then and send
an updated version after GCC 10 release.

-Lewis


Re: [PATCH] Add --with-diagnostics-urls configuration option and GCC_URLS/TERM_URLS env var

2020-01-31 Thread Bernd Edlinger
Hi Andrew,

I just saw your patch here:
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01474.html
Re: [PATCH] gcc: Add new configure options to allow static libraries to be 
selected


Note: the artefacts in my patch below seem to be a missing re-generated 
gcc/configure
from your patch?
Is that right?  I did not notice any problem from this, does this work for you 
this way?
And, is it the right thing to push those changes along this patch, or do you
want to take care of this by yourself?


Thanks
Bernd.


On 1/31/20 5:58 PM, Bernd Edlinger wrote:
> Hi,
> 
> this is patch is heavily based on David's original patch here:
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01409.html
> 
> and addresses Jakub's review comments here:
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01412.html
> 
> So I hope you don't mind, when I pick up this patch since there
> was not much activity recently on this issue, so I assumed you
> would appreciate some help.
> 
> I will try to improve the patch a bit, and hope you are gonna like
> it. I agree that this feature is fine, and should be enabled by
> default, and just if it is positively clear that it won't work,
> disabled in the auto mode.
> 
> Also as requested by Jakub this tries to be more compatible to
> GCC_COLORS define, and adds the capability to switch between ST
> and BEL string termination and also have a way to disable urls
> even with -fdiagnostics-urls=always (like GCC_COLORS= disables colors).
> 
> In addition to that I propose to use GCC_URLS and if that
> is not defined use TERM_URLS to control that feature for
> all applications willing to support that.
> 
> Since I have seen much garbage from the URLs in the xfce4-terminal 0.6.3
> admittedly an old Ubuntu installation, but still in LTS status,
> and no attempt from the xfc4-terminal to _ever_ implement URLs, I would
> like to detect the xfce4-terminal, and use that in auto mode
> to switch off the URLs feature, since in best case the URLs will
> just be ignored, but can and will often create garbage on the screen.
> 
> Likewise on MinGW, since the windows 10 cmd prompt does at best
> ignore the URLs, but windows terminal and windows 7 cmd prompt
> print garbage when URLs are output.
> 
> I have one question though, regarding the autoconf version that
> we use, it is autoconf-2.69, right, at least that is what I read in
> gcc/doc/install.texi?
> 
> I don't understand why the generated configure has additional
> changes than what is changed in configure.ac.  Should I commit the
> generated version as is, or use a different autoconf version?
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
> 
> 
> Thanks
> Benrd.
> 


Re: [PATCH] diagnostics: Add options to control the column units [PR49973] [PR86904]

2020-01-31 Thread David Malcolm
On Fri, 2020-01-31 at 14:31 -0500, Lewis Hyatt wrote:
> Hello-
> 
> Here is the second patch that I mentioned when I submitted the other
> related
> patch (which is awaiting review):
> https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01626.html. 

Sorry about that; I'm v. busy with analyzer bugs right now.

> This second patch
> is based on top of the first one and it closes out PR49973 and
> PR86904 by
> adding the new option -fdiagnostics-column-unit=[display|byte]. This
> allows
> to specify whether columns are output as simple byte counts (the
> current
> behavior), or as display columns including handling multibyte
> characters and
> tabs. The patch makes display columns the new default. Additionally,
> a
> second new option -fdiagnostics-column-origin is added, which allows
> to make
> the column 0-based (or N-based for any N) instead of 1-based. The
> default
> remains at 1-based as it is now.
> 
> A number of testcases were explicitly testing for the old behavior,
> so I
> have updated them to test for the new behavior instead, since the
> column
> number adjusted for tabs is more natural to test for, and matches
> what
> editors typically show (give or take 1 for the origin convention).
> 
> One other testcase (go.dg/arrayclear.go) was a bit of an oddity. It
> failed
> after this patch, although it doesn't test for any column numbers.
> The
> answer turned out to be, this test checks for identical error text on
> two
> different lines. When the column units are changed to display
> columns, then
> the column of the second error happens to match the line of the first
> one. dejagnu then misinterprets the second error as if it matched the
> location of the first one (it doesn't distinguish whether it checks
> for the
> line number or the column number in the output). I added a comment to
> the
> test explaining the situation; since adding the comment has the side
> effect
> of making the first line number no longer match the second column
> number, it
> also makes the test pass again.
> 
> It wasn't quite clear to me whether this change was appropriate for
> GCC 10
> or not at this point. We discussed it a couple months ago here:
> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02171.html. Either way,
> I hope
> it isn't a problem that I submitted the patch for review now, whether
> it
> will end up in 10 or 11. Please let me know what's normally expected?
> Thanks!

Thanks Lewis.

This patch looks very promising, but should wait until gcc 11; we're
trying to stabilize gcc 10 right now (I'm knee-deep in analyzer bug-
fixing, so I don't want to add any more diagnostics changes).


> gcc/ChangeLog:
> 
> 2020-01-31  Lewis Hyatt  
>

Please reference the PRs here

[...]

> gcc/testsuite/ChangeLog:
> 
> 2020-01-31  Lewis Hyatt  

Likewise here.

[...]

> diff --git a/gcc/common.opt b/gcc/common.opt
> index 630c380bd6a..657985450c2 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1309,6 +1309,14 @@ Enum(diagnostic_url_rule) String(always) 
> Value(DIAGNOSTICS_URL_YES)
>  EnumValue
>  Enum(diagnostic_url_rule) String(auto) Value(DIAGNOSTICS_URL_AUTO)
>  
> +fdiagnostics-column-unit=
> +Common Joined RejectNegative Enum(diagnostics_column_unit)
> +-fdiagnostics-column-unit=[display|byte] Select units for column numbers.
Should this line mention the default?

> +fdiagnostics-column-origin=
> +Common Joined RejectNegative UInteger
> +-fdiagnostics-column-origin= Set the number of the first column.  
> Default 1-based.

These new options should be documented in gcc/doc/invoke.texi.

[...]

> @@ -43,21 +44,23 @@ static json::array *cur_children_array;
>  /* Generate a JSON object for LOC.  */
>  
>  json::value *
> -json_from_expanded_location (location_t loc)
> +json_from_expanded_location (diagnostic_context *context, location_t loc)
>  {
>expanded_location exploc = expand_location (loc);
>json::object *result = new json::object ();
>if (exploc.file)
>  result->set ("file", new json::string (exploc.file));
>result->set ("line", new json::integer_number (exploc.line));
> -  result->set ("column", new json::integer_number (exploc.column));
> +  const int col = diagnostic_converted_column (context, exploc);
> +  result->set ("column", new json::integer_number (col));

I wonder if the JSON output format should show *both* values: perhaps
add fields "byte-column" and "display-column", and retain the field
"column", which would follow -fdiagnostics-column-unit?

[...]

> @@ -219,6 +220,8 @@ diagnostic_initialize (diagnostic_context *context, int 
> n_opts)
>context->min_margin_width = 0;
>context->show_ruler_p = false;
>context->parseable_fixits_p = false;
> +  context->column_unit = DIAGNOSTICS_COLUMN_UNIT_DISPLAY;
> +  context->column_adj = 0;

I'm not sure, but I think I prefer it if we store the column origin
instead, rather than an offset relative to an origin of 1.

[...]

> @@ -338,8 +341,37 @@ diagnostic_get_color_for_kind (diagnostic_t kind)
>return diagnostic_

Re: [PATCH] gcc: Add new configure options to allow static libraries to be selected

2020-01-31 Thread Iain Sandoe

Hello Andrew,

Andrew Burgess  wrote:


Here's a cleaned up version of the previous patch I posted.  If Iain
reports this fixes the regressions he saw then I will push this.


I applied this to r10-6364 and tested on a bunch of Darwin platforms.

AFAICT, the configuration now reports consistent values between
libcpp, gcc, intl, and libstdc++-v3.

Bootstrap succeeded with my “normal” defaults, and testing appears
nominal.

Notes:

* I have only tested on Darwin, I’m assuming you have tested on other
 platforms.

* there are now a number of interacting ICONV options, I made no
 attempt to test multiple permutations.

thanks for the fix,
cheers
Iain



patch to fix PR91333

2020-01-31 Thread Vladimir Makarov

The following patch fixes

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91333

The patch was successfully bootstrapped and tested on x86-64.

The patch changes order of putting colorable allocnos on the stack and 
consequently changes order of assigning hard registers to the allocnos 
(they all will get a hard register).  This can change the assignment and 
eliminate some moves involving hard registers.


I tried many different heuristics of ordering colorable allocnos based 
on different allocno (and corresponding thread) features like living 
only in BB, live range, the allocno preferences etc. But surprisingly 
this simplest patch worked the best.   The patch results in the same 
SPEC2000 rates.



commit 9ba4fc60a8d4c06130dedb7a6df1c72d972c4eb6
Author: Vladimir N. Makarov 
Date:   Fri Jan 31 14:26:26 2020 -0500

Fix for PR 91333 - suboptimal register allocation for inline asm

2020-01-31  Vladimir Makarov  

PR rtl-optimization/91333
* ira-color.c (bucket_allocno_compare_func): Move conflict hard
reg preferences comparison up.

2020-01-31  Vladimir Makarov  

PR rtl-optimization/91333
* gcc.target/i386/pr91333.c: New.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 022865d005d..f4141157e97 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2020-01-31  Vladimir Makarov  
+
+	PR rtl-optimization/91333
+	* ira-color.c (bucket_allocno_compare_func): Move conflict hard
+	reg preferences comparison up.
+
 2020-01-31  Andrew Stubbs  
 
 	* config/gcn/gcn-valu.md (addv64di3_exec): Allow one '0' in each
diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 4a726dc7c1b..51c4afd6391 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2251,6 +2251,11 @@ bucket_allocno_compare_func (const void *v1p, const void *v2p)
   ira_allocno_t t2 = ALLOCNO_COLOR_DATA (a2)->first_thread_allocno;
   int cl1 = ALLOCNO_CLASS (a1), cl2 = ALLOCNO_CLASS (a2);
 
+  /* Push allocnos with minimal conflict_allocno_hard_prefs first.  */
+  pref1 = ALLOCNO_COLOR_DATA (a1)->conflict_allocno_hard_prefs;
+  pref2 = ALLOCNO_COLOR_DATA (a2)->conflict_allocno_hard_prefs;
+  if ((diff = pref1 - pref2) != 0)
+return diff;
   freq1 = ALLOCNO_COLOR_DATA (t1)->thread_freq;
   freq2 = ALLOCNO_COLOR_DATA (t2)->thread_freq;
   if ((diff = freq1 - freq2) != 0)
@@ -2276,11 +2281,6 @@ bucket_allocno_compare_func (const void *v1p, const void *v2p)
   a2_num = ALLOCNO_COLOR_DATA (a2)->available_regs_num;
   if ((diff = a2_num - a1_num) != 0)
 return diff;
-  /* Push allocnos with minimal conflict_allocno_hard_prefs first.  */
-  pref1 = ALLOCNO_COLOR_DATA (a1)->conflict_allocno_hard_prefs;
-  pref2 = ALLOCNO_COLOR_DATA (a2)->conflict_allocno_hard_prefs;
-  if ((diff = pref1 - pref2) != 0)
-return diff;
   return ALLOCNO_NUM (a2) - ALLOCNO_NUM (a1);
 }
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index f95d2d4e069..edc250d8b40 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2020-01-31  Vladimir Makarov  
+
+	PR rtl-optimization/91333
+	* gcc.target/i386/pr91333.c: New.
+
 2020-01-31  Tobias Burnus  
 
 	PR fortran/93462
diff --git a/gcc/testsuite/gcc.target/i386/pr91333.c b/gcc/testsuite/gcc.target/i386/pr91333.c
new file mode 100644
index 000..41fc328698e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr91333.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-O2 -mavx" } */
+/* { dg-final { scan-assembler-times "vmovapd" 2 } } */
+
+static inline double g (double x){
+  asm volatile ("" : "+x" (x));
+  return x;
+}
+static inline double f (double a, double b){
+  return g (g (a) + g (b));
+}
+double h (double a, double b){
+  return f (f (a, a), f (b, b));
+}


[committed] analyzer: fix ICE getting void return value (PR 93379)

2020-01-31 Thread David Malcolm
PR analyzer/93379 reports an ICE within
region_model::update_for_return_superedge when writing the
returned svalue_id to the lhs of the call_stmt

The root cause is that this analyzer code assumed that for any call
with a non-NULL gimple_call_lhs, the called fndecl would have non-void
return type, and thus that a non-null svalue_id would be returned from
region_model::pop_frame.  This isn't the case e.g. for a call with
conflicting types where the callee returns void but the caller assumes
int.

This patch fixes the ICE by moving the check for null result so that
it also guards setting the lhs.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r10-6383-gf1c807e887d43551bca0acc16a438d880cfaf7c9.

gcc/analyzer/ChangeLog:
PR analyzer/93379
* region-model.cc (region_model::update_for_return_superedge):
Move check for null result so that it also guards setting the
lhs.

gcc/testsuite/ChangeLog:
PR analyzer/93379
* gcc.dg/analyzer/torture/pr93379-2.c: New test.
* gcc.dg/analyzer/torture/pr93379.c: New test.
---
 gcc/analyzer/region-model.cc  |  5 -
 gcc/testsuite/gcc.dg/analyzer/torture/pr93379-2.c | 11 +++
 gcc/testsuite/gcc.dg/analyzer/torture/pr93379.c   |  2 ++
 3 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/pr93379-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/pr93379.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index f116c0ae7a2..d43aef3a4d7 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5694,12 +5694,15 @@ region_model::update_for_return_superedge (const 
return_superedge &return_edge,
   svalue_id result_sid = pop_frame (true, &stats, ctxt);
   // TODO: do something with the stats?
 
+  if (result_sid.null_p ())
+return;
+
   /* Set the result of the call, within the caller frame.  */
   const gcall *call_stmt = return_edge.get_call_stmt ();
   tree lhs = gimple_call_lhs (call_stmt);
   if (lhs)
 set_value (get_lvalue (lhs, ctxt), result_sid, ctxt);
-  else if (!result_sid.null_p ())
+  else
 {
   /* This could be a leak; try purging again, but this time,
 don't special-case the result_sid.  */
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/pr93379-2.c 
b/gcc/testsuite/gcc.dg/analyzer/torture/pr93379-2.c
new file mode 100644
index 000..6e533dbca68
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/pr93379-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wno-implicit-function-declaration" } */
+
+void foo (void)
+{
+  int i = actually_returns_void ();
+}
+
+void actually_returns_void (void) /* { dg-warning "conflicting types" } */
+{
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/pr93379.c 
b/gcc/testsuite/gcc.dg/analyzer/torture/pr93379.c
new file mode 100644
index 000..01465cf60e1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/pr93379.c
@@ -0,0 +1,2 @@
+/* { dg-do compile } */
+#include "../../torture/pr57330.c"
-- 
2.21.0



[committed] analyzer: fix ICE with pointers between stack frames (PR 93438)

2020-01-31 Thread David Malcolm
PR analyzer/93438 reports an ICE when merging two region_models
in which an older stack frame has a local pointing to a local in
a more recent stack frame.

  stack
older frame
  int *: "ow" --+
|
newer frame |
  int: "pk" <---+

The root cause is that the state-merging code assumes that all frame
regions in the merged model have already been created.
stack_region::can_merge_p iterates through the frames, creating
and populating each merged frame in turn, so when it attempts to
populate the older frame, it attempts to reference the newer frame in
the merged model, which doesn't exist yet.

This patch reworks stack_region::can_merge_p to use a two-pass approach
in which all frames in the merged model are created first, and then
are all populated, fixing the bug.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r10-6382-g455f58ec50465aed9d92dc31d68708a05e499388.

gcc/analyzer/ChangeLog:
PR analyzer/93438
* region-model.cc (stack_region::can_merge_p): Split into a two
pass approach, creating all stack regions first, then populating
them.
(selftest::test_state_merging): Add test coverage for (a) the case
of self-merging a model in which a local in an older stack frame
points to a local in a more recent stack frame (which previously
would ICE), and (b) the case of self-merging a model in which a
local points to a global (which previously worked OK).

gcc/testsuite/ChangeLog:
PR analyzer/93438
* gcc.dg/analyzer/torture/pr93438.c: New test.
* gcc.dg/analyzer/torture/pr93438-2.c: New test.
---
 gcc/analyzer/region-model.cc  | 115 +++---
 .../gcc.dg/analyzer/torture/pr93438-2.c   |  26 
 .../gcc.dg/analyzer/torture/pr93438.c |  13 ++
 3 files changed, 135 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/pr93438-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/pr93438.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index b546114bfd5..f116c0ae7a2 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -2624,27 +2624,45 @@ stack_region::can_merge_p (const stack_region 
*stack_region_a,
   stack_region *merged_stack
 = merged_model->get_region  (rid_merged_stack);
 
-  for (unsigned i = 0; i < stack_region_a->get_num_frames (); i++)
-{
-  region_id rid_a = stack_region_a->get_frame_rid (i);
-  frame_region *frame_a = merger->get_region_a  (rid_a);
+  /* First, create all frames in the merged model, without populating them.
+ The merging code assumes that all frames in the merged model already 
exist,
+ so we have to do this first to handle the case in which a local in an
+ older frame points at a local in a more recent frame.  */
+for (unsigned i = 0; i < stack_region_a->get_num_frames (); i++)
+  {
+   region_id rid_a = stack_region_a->get_frame_rid (i);
+   frame_region *frame_a = merger->get_region_a  (rid_a);
 
-  region_id rid_b = stack_region_b->get_frame_rid (i);
-  frame_region *frame_b = merger->get_region_b  (rid_b);
+   region_id rid_b = stack_region_b->get_frame_rid (i);
+   frame_region *frame_b = merger->get_region_b  (rid_b);
 
-  if (frame_a->get_function () != frame_b->get_function ())
-   return false;
-  frame_region *merged_frame = new frame_region (rid_merged_stack,
-frame_a->get_function (),
-frame_a->get_depth ());
-  region_id rid_merged_frame = merged_model->add_region (merged_frame);
-  merged_stack->push_frame (rid_merged_frame);
-
-  if (!map_region::can_merge_p (frame_a, frame_b,
-   merged_frame, rid_merged_frame,
-   merger))
-   return false;
-}
+   if (frame_a->get_function () != frame_b->get_function ())
+ return false;
+
+   frame_region *merged_frame = new frame_region (rid_merged_stack,
+  frame_a->get_function (),
+  frame_a->get_depth ());
+   region_id rid_merged_frame = merged_model->add_region (merged_frame);
+   merged_stack->push_frame (rid_merged_frame);
+  }
+
+/* Now populate the frames we created.  */
+for (unsigned i = 0; i < stack_region_a->get_num_frames (); i++)
+  {
+   region_id rid_a = stack_region_a->get_frame_rid (i);
+   frame_region *frame_a = merger->get_region_a  (rid_a);
+
+   region_id rid_b = stack_region_b->get_frame_rid (i);
+   frame_region *frame_b = merger->get_region_b  (rid_b);
+
+   region_id rid_merged_frame = merged_stack->get_frame_rid (i);
+   frame_region *merged_frame
+ = merged_model->get_region  (ri

[PATCH] diagnostics: Add options to control the column units [PR49973] [PR86904]

2020-01-31 Thread Lewis Hyatt
Hello-

Here is the second patch that I mentioned when I submitted the other related
patch (which is awaiting review):
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01626.html. This second patch
is based on top of the first one and it closes out PR49973 and PR86904 by
adding the new option -fdiagnostics-column-unit=[display|byte]. This allows
to specify whether columns are output as simple byte counts (the current
behavior), or as display columns including handling multibyte characters and
tabs. The patch makes display columns the new default. Additionally, a
second new option -fdiagnostics-column-origin is added, which allows to make
the column 0-based (or N-based for any N) instead of 1-based. The default
remains at 1-based as it is now.

A number of testcases were explicitly testing for the old behavior, so I
have updated them to test for the new behavior instead, since the column
number adjusted for tabs is more natural to test for, and matches what
editors typically show (give or take 1 for the origin convention).

One other testcase (go.dg/arrayclear.go) was a bit of an oddity. It failed
after this patch, although it doesn't test for any column numbers. The
answer turned out to be, this test checks for identical error text on two
different lines. When the column units are changed to display columns, then
the column of the second error happens to match the line of the first
one. dejagnu then misinterprets the second error as if it matched the
location of the first one (it doesn't distinguish whether it checks for the
line number or the column number in the output). I added a comment to the
test explaining the situation; since adding the comment has the side effect
of making the first line number no longer match the second column number, it
also makes the test pass again.

It wasn't quite clear to me whether this change was appropriate for GCC 10
or not at this point. We discussed it a couple months ago here:
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02171.html. Either way, I hope
it isn't a problem that I submitted the patch for review now, whether it
will end up in 10 or 11. Please let me know what's normally expected?
Thanks!

Bootstrapped and regtested all languages on linux x86-64, results were the
same before and after:

FAIL 109 109
PASS 467378 467378
UNRESOLVED 1 1
UNSUPPORTED 11183 11183
UNTESTED 202 202
XFAIL 1756 1756
XPASS 36 36

Thanks for taking a look at it.

-Lewis
gcc/ChangeLog:

2020-01-31  Lewis Hyatt  

* common.opt: Added -fdiagnostics-column-unit= and
-fdiagnostics-column-origin= options.  Fix typo in the description
text for -fdiagnostics-output-format.
* diagnostic-format-json.cc (json_from_expanded_location): Added
diagnostic_context argument.  Use it to convert column numbers as per
the new options.
(json_from_location_range): Likewise.
(json_from_fixit_hint): Likewise.
(json_end_diagnostic): Pass the new context arguments to helper
functions above.
(test_unknown_location): Likewise.
(test_bad_endpoints): Likewise.
* diagnostic.c (diagnostic_initialize): Initialize new column_unit and
column_adj members.
(diagnostic_converted_column): New function.
(maybe_line_and_column): Elide negative columns, rather than a zero
column.
(diagnostic_get_location_text): Convert column number as per the new
options.
(diagnostic_report_current_module): Likewise.
(print_parseable_fixits): Change pretty_printer argument to a
diagnostic_context. Use the context to convert column numbers.
(diagnostic_report_diagnostic): Adapt to new arguments for
print_parseable_fixits().
(test_print_parseable_fixits_none): Likewise.
(test_print_parseable_fixits_insert): Likewise.
(test_print_parseable_fixits_remove): Likewise.
(test_print_parseable_fixits_replace): Likewise.
(assert_location_text): Added origin and column_unit arguments for
testing the new functionality.
(test_diagnostic_get_location_text): Added selftests for
-fdiagnostics-column-unit= and -fdiagnostics-column-origin=.
* diagnostic.h (enum diagnostics_column_unit): New enum.
(struct diagnostic_context): Added new column_unit and column_adj
members.
(diagnostic_converted_column): Declare.
(json_from_expanded_location): Added new context argument.
* opts.c (common_handle_option): Handle the new options.
* tree-diagnostic-path.cc (default_tree_make_json_for_path): Pass the
new context argument to json_from_expanded_location().


gcc/testsuite/ChangeLog:

2020-01-31  Lewis Hyatt  

* c-c++-common/Wmisleading-indentation-3.c: Adjust expected output
for new default display column units.
* c-c++-common/Wmisleading-indentation.c: Likewise.
* g++.dg/parse/error4.C: Likewise.
* g++.old-deja/g++.bren

Re: [PATCH] adjust object size computation for union accesses and PHIs (PR 92765)

2020-01-31 Thread Martin Sebor

Attached is a reworked patch since the first one didn't go far
enough to solve the major problems.  The new solution relies on
get_range_strlen_dynamic the same way as the sprintf optimization,
and does away with the determine_min_objsize function and calling
compute_builtin_object_size.

To minimize testsuite fallout I extended get_range_strlen to handle
a couple more kinds expressions(*), but I still had to xfail and
disable a few tests that were relying on being able to use the type
of the destination object as the upper bound on the string length.

Tested on x86_64-linux.

Martin

[*] With all the issues around MEM_REFs and types this change needs
extra scrutiny.  I'm still not sure I fully understand what can and
what cannot be safely relied on at this level.

On 1/15/20 6:18 AM, Martin Sebor wrote:

The strcmp optimization newly introduced in GCC 10 relies on
the size of the smallest referenced array object to determine
whether the function can return zero.  When the size of
the object is smaller than the length of the other string
argument the optimization folds the equality to false.

The bug report has identified a couple of problems here:
1) when the access to the array object is via a pointer to
a (possibly indirect) member of a union, in GIMPLE the pointer
may actually point to a different member than the one in
the original source code.  Thus the size of the array may
appear to be smaller than in the source code which can then
result in the optimization being invalid.
2) when the pointer in the access may point to two or more
arrays of different size (i.e., it's the result of a PHI),
assuming it points to the smallest of them can also lead
to an incorrect result when the optimization is applied.

The attached patch adjusts the optimization to 1) avoid making
any assumptions about the sizes of objects accessed via union
types, and b) use the size of the largest object in PHI nodes.

Tested on x86_64-linux.

Martin



PR tree-optimization/92765 - wrong code for strcmp of a union member

gcc/ChangeLog:

	PR tree-optimization/92765
	* gimple-fold.c (get_range_strlen_tree): Handle MEM_REF and PARM_DECL.
	* tree-ssa-strlen.c (compute_string_length): Remove.
	(determine_min_objsize): Remove.
	(get_len_or_size): Add an argument.  Call get_range_strlen_dynamic.
	Avoid using type size as the upper bound on string length.
	(handle_builtin_string_cmp): Add an argument.  Adjust.
	(strlen_check_and_optimize_call): Pass additional argument to
	handle_builtin_string_cmp.

gcc/testsuite/ChangeLog:

	PR tree-optimization/92765
	* g++.dg/tree-ssa/strlenopt-1.C: New test.
	* g++.dg/tree-ssa/strlenopt-2.C: New test.
	* gcc.dg/Warray-bounds-58.c: New test.
	* gcc.dg/Wrestrict-20.c: Avoid a valid -Wformat-overflow.
	* gcc.dg/Wstring-compare.c: Xfail a test.
	* gcc.dg/strcmpopt_2.c: Disable tests.
	* gcc.dg/strcmpopt_4.c: Adjust tests.
	* gcc.dg/strcmpopt_10.c: New test.
	* gcc.dg/strlenopt-69.c: Disable tests.
	* gcc.dg/strlenopt-92.c: New test.
	* gcc.dg/strlenopt-93.c: New test.
	* gcc.dg/strlenopt.h: Declare calloc.
	* gcc.dg/tree-ssa/pr92056.c: Xfail tests until pr93518 is resolved.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-23.c: Correct test (pr93517).

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index ed225922269..d70ac67e1ca 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1280,7 +1280,7 @@ get_range_strlen_tree (tree arg, bitmap *visited, strlen_range_kind rkind,
 		   c_strlen_data *pdata, unsigned eltsize)
 {
   gcc_assert (TREE_CODE (arg) != SSA_NAME);
- 
+
   /* The length computed by this invocation of the function.  */
   tree val = NULL_TREE;
 
@@ -1422,7 +1422,42 @@ get_range_strlen_tree (tree arg, bitmap *visited, strlen_range_kind rkind,
 	 type about the length here.  */
 	  tight_bound = true;
 	}
-  else if (VAR_P (arg))
+  else if (TREE_CODE (arg) == MEM_REF
+	   && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE
+	   && TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) == INTEGER_TYPE
+	   && TREE_CODE (TREE_OPERAND (arg, 0)) == ADDR_EXPR)
+	{
+	  /* Handle a MEM_REF into a DECL accessing an array of integers,
+	 being conservative about references to extern structures with
+	 flexible array members that can be initialized to arbitrary
+	 numbers of elements as an extension (static structs are okay).
+	 FIXME: Make this less conservative -- see
+	 component_ref_size in tree.c.  */
+	  tree ref = TREE_OPERAND (TREE_OPERAND (arg, 0), 0);
+	  tree off = TREE_OPERAND (arg, 1);
+	  if ((TREE_CODE (ref) == PARM_DECL || VAR_P (ref))
+	  && (!DECL_EXTERNAL (ref)
+		  || !array_at_struct_end_p (arg)))
+	{
+	  /* Fail if the offset is out of bounds.  Such accesses
+		 should be diagnosed at some point.  */
+	  val = DECL_SIZE_UNIT (ref);
+	  if (!val
+		  || integer_zerop (val)
+		  || tree_int_cst_le (val, off))
+		return false;
+
+	  pdata->minlen = ssize_int (0);
+
+	  /* Subtract the offset and one for the terminating nul.  

Re: [Patch] Inline optimization for tanh(x)/sinh(x) -> 1.0/cosh(x)

2020-01-31 Thread Vitor Guidi
I took the required steps. The patch is attached to this email, I hope
I got it right this time. I did not forward the patch to gcc-patches
the first time, sorry for the inconvenience.

Thank you for your attention,

Vitor.

in gcc/ChangeLog:
2020-01-28 Vitor Guidi 
* match.pd: New substitution rule for tanh(x)/sinh(x) ->
1.0/cosh(x).

in gcc/testsuite/ChangeLog:
2020-01-28 Vitor Guidi 
*gcc.dg/tanhbysinh.c (new): New testcase.


On Fri, Jan 31, 2020 at 8:18 AM Richard Biener
 wrote:
>
> On Fri, Jan 31, 2020 at 12:06 PM Marc Glisse  wrote:
> >
> > On Thu, 30 Jan 2020, Vitor Guidi wrote:
> >
> > >> + /* Simplify tanh (x) / sinh (x) -> 1.0 / cosh (x). */
> > >> + (simplify
> > >> +   (rdiv (TANH @0) (SINH @0))
> > >> +   (rdiv {build_one_cst (type);} (COSH @0)))
> >
> > The existing
> >
> >   (simplify
> >(rdiv (SINH:s @0) (COSH:s @0))
> > (TANH @0))
> >
> > has :s (which AFAIK are ignored because the output is a single insn) but
> > not this new one, where it would not be ignored. That's not very
> > consistent.
>
> True.
>
> Richard.
>
> >
> > --
> > Marc Glisse
diff --git a/gcc/match.pd b/gcc/match.pd
index 5fee394e7af..7de106d5ae7 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5069,6 +5069,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (rdiv (SINH:s @0) (COSH:s @0))
(TANH @0))
 
+ /* Simplify tanh (x) / sinh (x) -> 1.0 / cosh (x). */
+ (simplify
+   (rdiv (TANH:s @0) (SINH:s @0))
+   (rdiv {build_one_cst (type);} (COSH @0)))
+
  /* Simplify cos(x) / sin(x) -> 1 / tan(x). */
  (simplify
   (rdiv (COS:s @0) (SIN:s @0))
diff --git a/gcc/testsuite/gcc.dg/tanhbysinh.c 
b/gcc/testsuite/gcc.dg/tanhbysinh.c
new file mode 100644
index 000..fde72c2f93b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tanhbysinh.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-optimized" } */
+
+extern float sinhf (float);
+extern float tanhf (float);
+extern double sinh (double);
+extern double tanh (double);
+extern long double sinhl (long double);
+extern long double tanhl (long double);
+
+double __attribute__ ((noinline))
+tanhbysinh_ (double x)
+{
+return tanh (x) / sinh (x);
+}
+
+float __attribute__ ((noinline))
+tanhbysinhf_ (float x)
+{
+return tanhf (x) / sinhf (x);
+}
+
+long double __attribute__ ((noinline))
+tanhbysinhl_ (long double x)
+{
+return tanhl (x) / sinhl (x);
+}
+
+
+/* There must be no calls to sinh or atanh */
+/* There must be calls to cosh */
+/* {dg-final { scan-tree-dump-not "sinh " "optimized" } } */
+/* {dg-final { scan-tree-dump-not "tanh " "optimized" }} */
+/* {dg-final { scan-tree-dump-not "sinhf " "optimized" } } */
+/* {dg-final { scan-tree-dump-not "tanhf " "optimized" }} */
+/* {dg-final { scan-tree-dump-not "sinhl " "optimized" } } */
+/* {dg-final { scan-tree-dump-not "tanhl " "optimized" }} */
+/* { dg-final { scan-tree-dump "cosh " "optimized" } } */
+/* { dg-final { scan-tree-dump "coshf " "optimized" } } */
+/* { dg-final { scan-tree-dump "coshl " "optimized" } } */
\ No newline at end of file


Re: [PATCH][GCC][middle-end] Fix logical shift truncation (PR91838)

2020-01-31 Thread Jakub Jelinek
On Fri, Jan 31, 2020 at 10:12:08AM +, Tamar Christina wrote:
> 2020-01-31  Tamar Christina  
> 
>   PR 91838
>   * simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case
>   to truncate if allowed or reject combination.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-01-31  Tamar Christina  
> 
>   PR 91838
>   * g++.dg/pr91838.C: New test.

The test FAILs on i686-linux with:
FAIL: g++.dg/pr91838.C   (test for excess errors)
Excess errors:
/home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:8: warning: MMX vector 
return without MMX enabled changes the ABI [-Wpsabi]
/home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:3: warning: MMX vector 
argument without MMX enabled changes the ABI [-Wpsabi]
and on x86_64-linux with -m32 testing with failure to match the
expected pattern in there (or both with e.g. -m32/-mno-mmx/-mno-sse testing).
The test is also in a wrong directory, has non-standard specification that
it requires c++11 or later.

Fixed thusly, regtested on i686-linux and x86_64-linux (the latter with
--target_board=unix\{-m32/-mno-mmx/-mno-sse,-m32,-m64\}), committed to trunk
as obvious.

2020-01-31  Jakub Jelinek  

PR rtl-optimization/91838
* g++.dg/pr91838.C: Moved to ...
* g++.dg/opt/pr91838.C: ... here.  Require c++11 target instead of
dg-skip-if for c++98.  Pass -Wno-psabi -w to avoid psabi style
warnings on vector arg passing or return.  Add -masm=att on i?86/x86_64.
Only check for pxor %xmm0, %xmm0 on lp64 i?86/x86_64.

diff --git a/gcc/testsuite/g++.dg/opt/pr91838.C 
b/gcc/testsuite/g++.dg/opt/pr91838.C
new file mode 100644
index 000..fdf48094ade
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr91838.C
@@ -0,0 +1,11 @@
+/* { dg-do compile { target c++11 } } */
+/* { dg-additional-options "-O2 -Wno-psabi -w" } */
+/* { dg-additional-options "-masm=att" { target i?86-*-* x86_64-*-* } } */
+
+using T = unsigned char; // or ushort
+using V [[gnu::vector_size(8)]] = T;
+V f(V x) {
+  return x >> 8 * sizeof(T);
+}
+
+/* { dg-final { scan-assembler {pxor\s+%xmm0,\s+%xmm0} { target { { i?86-*-* 
x86_64-*-* } && lp64 } } } } */
diff --git a/gcc/testsuite/g++.dg/pr91838.C b/gcc/testsuite/g++.dg/pr91838.C
deleted file mode 100644
index 4dbaef05ce8..000
--- a/gcc/testsuite/g++.dg/pr91838.C
+++ /dev/null
@@ -1,11 +0,0 @@
-/* { dg-do compile } */
-/* { dg-additional-options "-O2" } */
-/* { dg-skip-if "" { *-*-* } {-std=c++98} } */
-
-using T = unsigned char; // or ushort, or uint
-using V [[gnu::vector_size(8)]] = T;
-V f(V x) {
-  return x >> 8 * sizeof(T);
-}
-
-/* { dg-final { scan-assembler {pxor\s+%xmm0,\s+%xmm0} { target x86_64-*-* } } 
} */


Jakub



[PATCH] c++: Partially implement P1042R1: __VA_OPT__ wording clarifications [PR92319]

2020-01-31 Thread Jakub Jelinek
Hi!

I've noticed we claim in cxx-status.html that we implement P1042R1,
but it seems we don't implement any of the changes from there.
The following patch implements just the change that __VA_OPT__ determines
whether to expand to nothing or the enclosed tokens no longer based on
whether there were any tokens passed to __VA_ARGS__, but whether __VA_ARGS__
expands to any tokens (from testing apparently it has to be non-CPP_PADDING
tokens).
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I'm afraid I'm completely lost about the padding preservation/removal
changes that are also in the paper, so haven't touched that part.

2020-01-31  Jakub Jelinek  

Partially implement P1042R1: __VA_OPT__ wording clarifications
PR preprocessor/92319
* macro.c (expand_arg): Move declarations before vaopt_state
definition.
(class vaopt_state): Move enum update_type definition earlier.  Remove
m_allowed member, add m_arg and m_update members.
(vaopt_state::vaopt_state): Change last argument from bool any_args
to macro_arg *arg, initialize m_arg and m_update instead of m_allowed.
(vaopt_state::update): When bumping m_state from 1 to 2 and m_update
is ERROR, determine if __VA_ARGS__ expansion has any non-CPP_PADDING
tokens and set m_update to INCLUDE if it has any, DROP otherwise.
Return m_update instead of m_allowed ? INCLUDE : DROP in m_state >= 2.
(replace_args, create_iso_definition): Adjust last argument to
vaopt_state ctor.

* c-c++-common/cpp/va-opt-4.c: New test.

--- libcpp/macro.c.jj   2020-01-29 09:35:06.022244483 +0100
+++ libcpp/macro.c  2020-01-31 12:28:02.444695860 +0100
@@ -93,6 +93,8 @@ struct macro_arg_saved_data {
 static const char *vaopt_paste_error =
   N_("'##' cannot appear at either end of __VA_OPT__");
 
+static void expand_arg (cpp_reader *, macro_arg *);
+
 /* A class for tracking __VA_OPT__ state while iterating over a
sequence of tokens.  This is used during both macro definition and
expansion.  */
@@ -100,28 +102,29 @@ class vaopt_state {
 
  public:
 
+  enum update_type
+  {
+ERROR,
+DROP,
+INCLUDE,
+BEGIN,
+END
+  };
+
   /* Initialize the state tracker.  ANY_ARGS is true if variable
  arguments were provided to the macro invocation.  */
-  vaopt_state (cpp_reader *pfile, bool is_variadic, bool any_args)
+  vaopt_state (cpp_reader *pfile, bool is_variadic, macro_arg *arg)
 : m_pfile (pfile),
-m_allowed (any_args),
+m_arg (arg),
 m_variadic (is_variadic),
 m_last_was_paste (false),
 m_state (0),
 m_paste_location (0),
-m_location (0)
+m_location (0),
+m_update (ERROR)
   {
   }
 
-  enum update_type
-  {
-ERROR,
-DROP,
-INCLUDE,
-BEGIN,
-END
-  };
-
   /* Given a token, update the state of this tracker and return a
  boolean indicating whether the token should be be included in the
  expansion.  */
@@ -154,6 +157,23 @@ class vaopt_state {
return ERROR;
  }
++m_state;
+   if (m_update == ERROR)
+ {
+   if (m_arg == NULL)
+ m_update = INCLUDE;
+   else
+ {
+   m_update = DROP;
+   if (!m_arg->expanded)
+ expand_arg (m_pfile, m_arg);
+   for (unsigned idx = 0; idx < m_arg->expanded_count; ++idx)
+ if (m_arg->expanded[idx]->type != CPP_PADDING)
+   {
+ m_update = INCLUDE;
+ break;
+   }
+ }
+ }
return DROP;
   }
 else if (m_state >= 2)
@@ -197,7 +217,7 @@ class vaopt_state {
return END;
  }
  }
-   return m_allowed ? INCLUDE : DROP;
+   return m_update;
   }
 
 /* Nothing to do with __VA_OPT__.  */
@@ -219,8 +239,9 @@ class vaopt_state {
   /* The cpp_reader.  */
   cpp_reader *m_pfile;
 
-  /* True if there were varargs.  */
-  bool m_allowed;
+  /* The __VA_ARGS__ argument.  */
+  macro_arg *m_arg;
+
   /* True if the macro is variadic.  */
   bool m_variadic;
   /* If true, the previous token was ##.  This is used to detect when
@@ -239,6 +260,10 @@ class vaopt_state {
 
   /* Location of the __VA_OPT__ token.  */
   location_t m_location;
+
+  /* If __VA_ARGS__ substitutes to no preprocessing tokens,
+ INCLUDE, otherwise DROP.  ERROR when unknown yet.  */
+  update_type m_update;
 };
 
 /* Macro expansion.  */
@@ -256,7 +281,6 @@ static _cpp_buff *collect_args (cpp_read
_cpp_buff **, unsigned *);
 static cpp_context *next_context (cpp_reader *);
 static const cpp_token *padding_token (cpp_reader *, const cpp_token *);
-static void expand_arg (cpp_reader *, macro_arg *);
 static const cpp_token *new_string_token (cpp_reader *, uchar *, unsigned int);
 static const cpp_token *stringify_arg (cpp_reader *, macro_arg *);
 

[COMMITTED] aarch64: Add Armv8.6 SVE bfloat16 support

2020-01-31 Thread Richard Sandiford
This patch adds support for the SVE intrinsics that map to Armv8.6
bfloat16 instructions.  This means that svcvtnt is now a base SVE
function for one type suffix combination; the others are still
SVE2-specific.

This relies on a binutils fix:

https://sourceware.org/ml/binutils/2020-01/msg00450.html

so anyone testing older binutils 2.34 or binutils master sources will
need to upgrade to get clean test results.  (At the time of writing,
no released version of binutils has this bug.)

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed.

Sorry again for applying this kind of thing during stage 4.  It should be
very low risk for parts unrelated to the SVE ACLE though, so shouldn't
affect anything that also existed in GCC 9.

Richard


2020-01-31  Richard Sandiford  

gcc/
* config/aarch64/aarch64.h (TARGET_SVE_BF16): New macro.
* config/aarch64/aarch64-sve-builtins-sve2.h (svcvtnt): Move to
aarch64-sve-builtins-base.h.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svcvtnt): Move to
aarch64-sve-builtins-base.cc.
* config/aarch64/aarch64-sve-builtins-base.h (svbfdot, svbfdot_lane)
(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
(svcvtnt): Declare.
* config/aarch64/aarch64-sve-builtins-base.cc (svbfdot, svbfdot_lane)
(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
(svcvtnt): New functions.
* config/aarch64/aarch64-sve-builtins-base.def (svbfdot, svbfdot_lane)
(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
(svcvtnt): New functions.
(svcvt): Add a form that converts f32 to bf16.
* config/aarch64/aarch64-sve-builtins-shapes.h (ternary_bfloat)
(ternary_bfloat_lane, ternary_bfloat_lanex2, ternary_bfloat_opt_n):
Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc (parse_element_type):
Treat B as bfloat16_t.
(ternary_bfloat_lane_base): New class.
(ternary_bfloat_def): Likewise.
(ternary_bfloat): New shape.
(ternary_bfloat_lane_def): New class.
(ternary_bfloat_lane): New shape.
(ternary_bfloat_lanex2_def): New class.
(ternary_bfloat_lanex2): New shape.
(ternary_bfloat_opt_n_def): New class.
(ternary_bfloat_opt_n): New shape.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_cvt_bfloat): New macro.
* config/aarch64/aarch64-sve.md (@aarch64_sve_vnx4sf)
(@aarch64_sve__lanevnx4sf): New patterns.
(@aarch64_sve__trunc)
(@cond__trunc): Likewise.
(*cond__trunc): Likewise.
(@aarch64_sve_cvtnt): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_cvtnt): Key
the pattern off the narrow mode instead of the wider one.
* config/aarch64/iterators.md (VNx8BF_ONLY): New mode iterator.
(UNSPEC_BFMLALB, UNSPEC_BFMLALT, UNSPEC_BFMMLA): New unspecs.
(sve_fp_op): Handle them.
(SVE_BFLOAT_TERNARY_LONG): New int itertor.
(SVE_BFLOAT_TERNARY_LONG_LANE): Likewise.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_asm_bf16_ok):
New proc.
* gcc.target/aarch64/sve/acle/asm/bfdot_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/bfdot_lane_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/bfmlalb_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/bfmlalb_lane_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/bfmlalt_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/bfmlalt_lane_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/cvt_bf16.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/cvtnt_bf16.c: Likweise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_1.c: Likweise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_lane_1.c:
Likweise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_lanex2_1.c:
Likweise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c:
Likweise.
---
 .../aarch64/aarch64-sve-builtins-base.cc  |  11 ++
 .../aarch64/aarch64-sve-builtins-base.def |  12 ++
 .../aarch64/aarch64-sve-builtins-base.h   |   8 ++
 .../aarch64/aarch64-sve-builtins-shapes.cc|  66 ++
 .../aarch64/aarch64-sve-builtins-shapes.h |   4 +
 .../aarch64/aarch64-sve-builtins-sve2.cc  |   1 -
 .../aarch64/aarch64-sve-builtins-sve2.h   |   1 -
 gcc/config/aarch64/aarch64-sve-builtins.cc|   5 +
 gcc/config/aarch64/aarch64-sve.md | 113 ++
 gcc/config/aarch64/aarch64-sve2.md|  14 +--
 gcc/config/aarch64/aarch64.h  |   1 +
 gcc/config/aarch64/iterators.md   |  19 ++-
 .../aarch64/sve/acle/asm/bfdot_f32.c  |  67 +++
 .../aarch64/sve/acle/asm/bfdot_lane_f32.c |  86 

[COMMITTED] aarch64: Fix SVE PCS failures for BE & ILP32

2020-01-31 Thread Richard Sandiford
This patch should (finally!) give clean test results for
aarch64-sve-pcs.exp for all {be,le}{lp64,ilp32} combinations.

The *_128.c tests require aarch64_little_endian because they test for
fixed-length 128-bit code, whereas -msve-vector-bits=128 still generates
VLA code for big-endian.

Some tests require lp64 because they match (64-bit) pointer loads and
stores.  Others require it because ilp32 adds extra zero extensions.

We still have a non-trivial amount of coverage for -mbig-endian -mabi=ilp32:

 # of expected passes663
 # of unsupported tests  59

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed.

Richard


2020-01-31  Richard Sandiford  

gcc/testsuite/
* gcc.target/aarch64/sve/pcs/args_1.c: Require lp64 for
check-function-bodies tests.
* gcc.target/aarch64/sve/pcs/args_2.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_3.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_4.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_1_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_1_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_1_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_1_2048.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_2.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_3.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_4.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_4_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_4_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_4_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_4_2048.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_5.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_5_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_5_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_5_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_5_2048.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_2_be_nowrap.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_2_be_wrap.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_2_le_nowrap.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_2_le_wrap.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_3.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_4_be.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_4_le.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Require lp64.
* gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6

Re: [PATCH] V12 patch #2 of 14, Refactor rs6000_adjust_vec_address & rs6000_split_vec_extract_var

2020-01-31 Thread Segher Boessenkool
Hi!

On Thu, Jan 09, 2020 at 07:04:10PM -0500, Michael Meissner wrote:
> 2020-01-09  Michael Meissner  
> 
>   * config/rs6000/rs6000.c (get_vector_offset): New helper function
>   to calculate the offset in memory from the start of a vector of a
>   particular element.  Add code to keep the element number in
>   bounces if the element number is variable.

"within bounds"?

> +/* Return the offset within a memory object (MEM) of a vector type to a given
> +   element within the vector (ELEMENT) with an element size (SCALAR_SIZE).  
> If
> +   the element is constant, we return a constant integer.  Otherwise, we use 
> a
> +   base register temporary to calculate the offset after making it to fit
> +   within the vector and scaling it.  */

"masking it"?

> +static rtx
> +get_vector_offset (rtx mem, rtx element, rtx base_tmp, unsigned scalar_size)
> +{
> +  if (CONST_INT_P (element))
> +return GEN_INT (INTVAL (element) * scalar_size);
> +
> +  /* All insns should use the 'Q' constraint (address is a single register) 
> if
> + the element number is not a constant.  */
> +  rtx addr = XEXP (mem, 0);
> +  gcc_assert (REG_P (addr) || SUBREG_P (addr));

satisfies_constraint_Q (addr)

> +  /* Mask the element to make sure the element number is between 0 and the
> + maximum number of elements - 1 so that we don't generate an address
> + outside the vector.  */

But why is that the correct thing to do?  Garbage in, garbage out is
perfectly fine?  Or do we have (e.g.) builtins that specify this masking?
If so, please say that here.

> +  emit_insn (gen_rtx_SET (base_tmp, and_op));
> +
> +  /* Shift the element to get the byte offset from the element number.  */
> +  int shift = exact_log2 (scalar_size);
> +  gcc_assert (shift >= 0);
> +
> +  if (shift > 0)
> +{
> +  rtx shift_op = gen_rtx_ASHIFT (Pmode, base_tmp, GEN_INT (shift));
> +  emit_insn (gen_rtx_SET (base_tmp, shift_op));
> +}

This sets the same pseudo twice (or is it never a pseudo?), not a good
idea in general.

Okay for trunk, with comments improved please.  Thanks!


Segher


Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]

2020-01-31 Thread Jeff Law
On Thu, 2020-01-30 at 17:54 +, Richard Sandiford wrote:
> Jeff Law  writes:
> > On Wed, 2020-01-29 at 19:18 +, Richard Sandiford wrote:
> > > Andreas Schwab  writes:
> > > > On Jan 27 2020, Richard Sandiford wrote:
> > > > 
> > > > >   * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
> > > > >   simplification to handle subregs as well as bare regs.
> > > > 
> > > > That breaks gcc.target/m68k/pr39726.c
> > > 
> > > Gah.  Jeff pointed out off-list that it also broke
> > > gcc.target/sh/pr64345-1.c on sh3-linux-gnu.  It didn't look like either
> > > of them would be fixable in an acceptably non-invasive and unhacky way,
> > > so I've reverted the patch "for now".
> > I would have considered letting those two targets regress those tests
> > to move forward on 87763.  aarch64 is (IMHO) more important than the sh
> > and m68k combined ;-)  It also seems to me that your patch generates
> > better RTL and that we could claim that a port that regresses on code
> > quality needs its port maintainer to step in and fix the port.
> > 
> > WRT the m68k issue I'd think it could be fixed by twiddling
> > cbranchsi4_btst_reg_insn_1 to accept a mode iterator on the
> > zero_extract and making some minor adjustments in its output code. 
> > Something like the attached.  I haven't tested it in any real way and
> > haven't really thought about whether or not it does the right thing for
> > a MEM operand.
> 
> It just feels like this is breaking the contract with extv and extzv,
> where all *_extracts are supposed to be in the mode that those patterns
> accept.  My i386 patch was doing that too TBH, I was just being
> optimistic given that QImode was already handled by that pattern. :-)
> 
> So IMO my patch has too many knock-on effects for it to be suitable for
> stage 4.  While we have make_extraction, we probably have to leave this
> kind of expression untouched.
> 
> For AArch64 I was planning on adding a new pattern to match the
> (subreg:SI (zero_extract:DI ...)) -- as one of the comments in the
> PR suggested -- but with the subreg matched via subreg_lowpart_operator,
> to make it endian-safe.  This is similar in spirit to the new i686
> popcount patterns.
Fair enough on the simplify-rtx change :-)  One might argue that we
should be loosening the requirements, but memory operands are
particularly troublesome.  I think HP and I had a light discussion on
that a year or so ago.

WRT adding patterns to aarch64.  Yea, I went that route last year, but
never was able to go from "hey, this seems to work pretty well" to
"it's OK for the trunk".

Jeff



[PATCH] Fix -ffast-math flags handling inconsistencies

2020-01-31 Thread Ulrich Weigand
Hello,

we've noticed some inconsistencies in how the component flags of -ffast-math
are handled, see the discussion on the GCC list starting here:
https://gcc.gnu.org/ml/gcc/2020-01/msg00365.html

This patch fixes those inconsistencies.  Specifically, there are the
following changes:

1. Some component flags for -ffast-math are *set* with -ffast-math
   (changing the default), but are not reset with -fno-fast-math,
   causing the latter to have surprising results.  (Note that since
   "-ffast-math -fno-fast-math" is short-cut by the driver, you'll
   only see the surprising results with "-Ofast -fno-fast-math".)
   This is fixed here by both setting and resetting the flags.

   This affects the following flags
  -fcx-limited-range
  -fexcess-precision=

2. Some component flags for -ffast-math are changed from their default,
   but are *not* included in the fast_math_flags_set_p test, causing
   __FAST_MATH__ to remain predefined even when the full set of fast
   math options is not actually in effect.  This is fixed here by
   adding those flags into the fast_math_flags_set_p test.

   This affects the following flags:
 -fcx-limited-range
 -fassociative-math
 -freciprocal-math

3. For some math flags, set_fast_math_flags has code that sets their
   values only to what is already their default.  The overall effect
   of this code is a complete no-op.  This patch removes that dead code.

   This affects the following flags:
 -frounding-math
 -fsignaling-nans


The overall effect of this patch is that now all component flags of
-ffast-math are treated exactly equivalently:
  * they are set (changed from their default) with -ffast-math
  * they are reset to their default with -fno-fast-math
  * __FAST_MATH__ is only defined if the value of the flag matches
what -ffast-math would have set it to

Tested on s390x-ibm-linux.

OK for mainline?

Bye,
Ulrich

ChangeLog:

* opts.c (set_fast_math_flags): In the !set case, also reset
x_flag_cx_limited_range and x_flag_excess_precision.  Remove dead
code resetting x_flag_signaling_nans and x_flag_rounding_math.
(fast_math_flags_set_p): Also test x_flag_cx_limited_range,
x_flag_associative_math, and x_flag_reciprocal_math.

diff --git a/gcc/opts.c b/gcc/opts.c
index 7affeb4..4452793 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2850,18 +2850,14 @@ set_fast_math_flags (struct gcc_options *opts, int set)
 opts->x_flag_finite_math_only = set;
   if (!opts->frontend_set_flag_errno_math)
 opts->x_flag_errno_math = !set;
-  if (set)
-{
-  if (opts->frontend_set_flag_excess_precision == EXCESS_PRECISION_DEFAULT)
-   opts->x_flag_excess_precision
- = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
-  if (!opts->frontend_set_flag_signaling_nans)
-   opts->x_flag_signaling_nans = 0;
-  if (!opts->frontend_set_flag_rounding_math)
-   opts->x_flag_rounding_math = 0;
-  if (!opts->frontend_set_flag_cx_limited_range)
-   opts->x_flag_cx_limited_range = 1;
-}
+  if (!opts->frontend_set_flag_cx_limited_range)
+opts->x_flag_cx_limited_range = set;
+  if (!opts->frontend_set_flag_excess_precision)
+opts->x_flag_excess_precision
+  = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
+
+  // -ffast-math should also reset -fsignaling-nans and -frounding-math,
+  // but since those are off by default, there's nothing to do for now.
 }
 
 /* When -funsafe-math-optimizations is set the following
@@ -2884,10 +2880,13 @@ bool
 fast_math_flags_set_p (const struct gcc_options *opts)
 {
   return (!opts->x_flag_trapping_math
+ && !opts->x_flag_signed_zeros
+ && opts->x_flag_associative_math
+ && opts->x_flag_reciprocal_math
  && opts->x_flag_unsafe_math_optimizations
  && opts->x_flag_finite_math_only
- && !opts->x_flag_signed_zeros
  && !opts->x_flag_errno_math
+ && opts->x_flag_cx_limited_range
  && opts->x_flag_excess_precision == EXCESS_PRECISION_FAST);
 }
 
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH] Add --with-diagnostics-urls configuration option and GCC_URLS/TERM_URLS env var

2020-01-31 Thread Bernd Edlinger
Hi,

this is patch is heavily based on David's original patch here:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01409.html

and addresses Jakub's review comments here:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01412.html

So I hope you don't mind, when I pick up this patch since there
was not much activity recently on this issue, so I assumed you
would appreciate some help.

I will try to improve the patch a bit, and hope you are gonna like
it. I agree that this feature is fine, and should be enabled by
default, and just if it is positively clear that it won't work,
disabled in the auto mode.

Also as requested by Jakub this tries to be more compatible to
GCC_COLORS define, and adds the capability to switch between ST
and BEL string termination and also have a way to disable urls
even with -fdiagnostics-urls=always (like GCC_COLORS= disables colors).

In addition to that I propose to use GCC_URLS and if that
is not defined use TERM_URLS to control that feature for
all applications willing to support that.

Since I have seen much garbage from the URLs in the xfce4-terminal 0.6.3
admittedly an old Ubuntu installation, but still in LTS status,
and no attempt from the xfc4-terminal to _ever_ implement URLs, I would
like to detect the xfce4-terminal, and use that in auto mode
to switch off the URLs feature, since in best case the URLs will
just be ignored, but can and will often create garbage on the screen.

Likewise on MinGW, since the windows 10 cmd prompt does at best
ignore the URLs, but windows terminal and windows 7 cmd prompt
print garbage when URLs are output.

I have one question though, regarding the autoconf version that
we use, it is autoconf-2.69, right, at least that is what I read in
gcc/doc/install.texi?

I don't understand why the generated configure has additional
changes than what is changed in configure.ac.  Should I commit the
generated version as is, or use a different autoconf version?


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Benrd.
From fae99e05a356363ff719b5504e6250abe1c16aea Mon Sep 17 00:00:00 2001
From: Bernd Edlinger 
Date: Wed, 29 Jan 2020 15:31:10 +0100
Subject: [PATCH] PR 87488: Add --with-diagnostics-urls configuration option

2020-01-30  David Malcolm  
	Bernd Edlinger  

	PR 87488
	* config.in (DIAGNOSTICS_URLS_DEFAULT): New define.
	* configure.ac (--with-diagnostics-urls): New configuration
	option, based on --with-diagnostics-color.
	(DIAGNOSTICS_URLS_DEFAULT): New define.
	* config.h: Regenerate.
	* configure: Regenerate.
	* diagnostic.c (diagnostic_urls_init): Handle -1 for
	DIAGNOSTICS_URLS_DEFAULT from configure-time
	--with-diagnostics-urls=auto-if-env by querying for a GCC_URLS
	and TERM_URLS environment variable.
	* diagnostic-url.h (diagnostic_urls_use_st): New variable.
	* diagnostic-color.c (diagnostic_urls_use_st): Declare.
	(parse_gcc_urls, auto_enable_urls): New helper functions.
	(diagnostic_urls_enabled_p): Adjust.
	* pretty-print.c (pp_begin_url, pp_end_url, test_urls): Use
	diagnostic_urls_use_st.
	* doc/install.texi (--with-diagnostics-urls): Document new
	configuration option.
	* doc/invoke.texi (-fdiagnostics-urls): Add GCC_URLS vindex
	reference.  Update description of defaults based on the above.
---
 gcc/config.in  |  6 
 gcc/configure  | 87 +-
 gcc/configure.ac   | 28 
 gcc/diagnostic-color.c | 57 +++--
 gcc/diagnostic-url.h   |  1 +
 gcc/diagnostic.c   | 17 +-
 gcc/doc/install.texi   |  9 ++
 gcc/doc/invoke.texi| 15 +++--
 gcc/pretty-print.c | 11 +--
 9 files changed, 215 insertions(+), 16 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index ec5c46a..b30fb20 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -76,6 +76,12 @@
 #endif
 
 
+/* The default for -fdiagnostics-urls option */
+#ifndef USED_FOR_TARGET
+#undef DIAGNOSTICS_URLS_DEFAULT
+#endif
+
+
 /* Define 0/1 if static analyzer feature is enabled. */
 #ifndef USED_FOR_TARGET
 #undef ENABLE_ANALYZER
diff --git a/gcc/configure b/gcc/configure
index 490fe6a..68340df 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -974,6 +974,7 @@ with_zstd_include
 with_zstd_lib
 enable_rpath
 with_libiconv_prefix
+with_libiconv_type
 enable_sjlj_exceptions
 with_gcc_major_version_only
 enable_secureplt
@@ -1014,6 +1015,7 @@ enable_host_shared
 enable_libquadmath_support
 with_linker_hash_style
 with_diagnostics_color
+with_diagnostics_urls
 enable_default_pie
 '
   ac_precious_vars='build_alias
@@ -1811,6 +1813,7 @@ Optional Packages:
   --with-gnu-ld   assume the C compiler uses GNU ld default=no
   --with-libiconv-prefix[=DIR]  search for libiconv in DIR/include and DIR/lib
   --without-libiconv-prefix don't search for libiconv in includedir and libdir
+  --with-libiconv-type=TYPE type of library to search for (auto/static/shared)
   --with-gcc-major-version-only
 

Re: [rfc PATCH] rs6000: Updated constraint documentation

2020-01-31 Thread Bill Schmidt

On 1/31/20 9:42 AM, Segher Boessenkool wrote:

Hi Bill,

Thanks a lot for looking at this!  :-)

On Fri, Jan 31, 2020 at 08:49:21AM -0600, Bill Schmidt wrote:

+(define_register_constraint "wa"
"rs6000_constraints[RS6000_CONSTRAINT_wa]"
+  "A VSX register (VSR), @code{vs0}@dots{}@code{vs63}.  Either a @code{d}
+   or a @code{v} register.")

Not quite true, as the "d" register is only half of a VSX register.  It
may or may not be worth including a picture of register overlaps...

No, the "d" registers are the actual full registers, all 128 bits of it.
You often use them in a mode that uses only 64 bits, sure.



Perhaps that would be worth a few words when describing the "d" 
constraint, then.  This is not at all obvious to the casual user. Thanks!




I was planning to update this to

(define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
   "A VSX register (VSR), @code{vs0}@dots{}@code{vs63}.  This is either an
FPR (@code{d}) or a VR (@code{v}).")

Does that improve it?



Yes, sure.



The numbering thing is also mentioned in the %x output modifier stuff.
There must be a better way to present this, but I don't see it yet.  Hrm.



I honestly thought that was pretty good as is.

Thanks again!
Bill




  (define_register_constraint "we"
  "rs6000_constraints[RS6000_CONSTRAINT_we]"
-  "VSX register if the -mpower9-vector -m64 options were used or
NO_REGS.")
+  "@internal VSX register if the -mpower9-vector -m64 options were used
+   or NO_REGS.")

Suggest changing "used or" to "used, else".

Or just "used."; this is internals documentation only, and all similar
constraints will ideally go away at some point (it just didn't fit in
easily with the "enabled" attribute yet; it probably should be just "p9"
for "isa" and test the TARGET_64BIT in the insn condition, something
like that.  Or maybe there shouldn't be separate handling for 64-bit
at all here).


  (define_register_constraint "wr"
  "rs6000_constraints[RS6000_CONSTRAINT_wr]"
-  "General purpose register if 64-bit instructions are enabled or
NO_REGS.")
+  "@internal General purpose register if 64-bit instructions are enabled
+   or NO_REGS.")

Similar here.

Yup.  I didn't change this, fwiw, just synched up md.texi and
constraints.md where they diverged.


  (define_memory_constraint "es"
-  "A ``stable'' memory operand; that is, one which does not include any
-automodification of the base register.  Unlike @samp{m}, this constraint
-can be used in @code{asm} statements that might access the operand
-several times, or that might not access it at all."
+  "@internal
+   A ``stable'' memory operand; that is, one which does not include any
+   automodification of the base register.  This used to be useful when
+   @code{m} allowed automodification of the base register, but as those

Trailing whitespace here.

Yeah, I don't know how I missed that, git tends to shout about it.
Fixed.


  @item wa
-Any VSX register if the @option{-mvsx} option was used or NO_REGS.
+A VSX register (VSR), @code{vs0}@dots{}@code{vs63}.  Either a @code{d} or
a @code{v}
+register.

Same concern as above.

It is literally the same text now (unless I messed up the c'n'p).


+@ifset INTERNALS
+@item h
+@code{vrsave}, @code{ctr}, or @code{lr}.
+@end ifset


I don't see vrsave elsewhere in either document (should have noted this
in constraints.md also).

There is no other constraint for vrsave.  constraints.md says

(define_register_constraint "h" "SPECIAL_REGS"
   "@internal @code{vrsave}, @code{ctr}, or @code{lr}.")

(Same text, as should be).  It ends up only in gccint.*, not in gcc.* .


  @item we
-VSX register if the @option{-mcpu=power9} and @option{-m64} options
-were used or NO_REGS.
+VSX register if the -mpower9-vector -m64 options were used or NO_REGS.

As above.  I won't call out the rest of these.

Since this is not new text, and it now only ends up in the internals
documentation, and a lot of it should go away in the short term anyway,
and importantly I don't know a good simple way to write what it does
anyway (because it *isn't* simple), I hoped I could just keep this for
now.

Hrm, I lost markup there, will fix.


+@item wZ
+Indexed or indirect memory operand, ignoring the bottom 4 bits.
+@end ifset

For consistency, "An indexed..." ?

Yes, thanks!


+@item Z
+A memory operand that is an indexed or indirect from a register.

"indexed or indirect access"?

And s/from a register// yeah.


Great improvements!

Thanks :-)

Somewhere it should say (in the gcc.* doc) that there are other
constraints and output modifiers as well, and some are even supported
for backwards compatibility, but here only the ones you should use are
mentioned.  Not sure where to do that.


Segher


Re: [GCC][PATCH][AArch64] ACLE intrinsics for BFCVTN, BFCVTN2 (AArch64 AdvSIMD) and BFCVT (AArch64 FP)

2020-01-31 Thread Richard Sandiford
Delia Burduv  writes:
> [...]
> diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
> index 
> 3759c0d1cb449a7f0125cc2a1433127564d66622..fb2150e1d60a590046e2c034422021aafc721e23
>  100644
> --- a/gcc/config/aarch64/arm_bf16.h
> +++ b/gcc/config/aarch64/arm_bf16.h
> @@ -28,5 +28,13 @@
>  #define _AARCH64_BF16_H_
>  
>  typedef __bf16 bfloat16_t;
> +typedef float float32_t;
> +
> +__extension__ extern __inline bfloat16_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vcvth_bf16_f32 (float32_t __a)
> +{
> +  return __builtin_aarch64_bfcvtbf (__a);
> +}

Sorry for not noticing last time, but this should be wrapped in:

#pragma GCC push_options
#pragma GCC target ("+nothing+bf16")

...

#pragma GCC pop_options

otherwise I think calling this function without +bf16 would trigger
an internal compiler error.  It would be good to have a test that
that doesn't happen (something along the lines of bfcvt-nobf16.c,
but for the scalar case).

> [...]
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
> new file mode 100644
> index 
> ..ffb5305e2e5ea1aadae07e82fd8ed6f9f247c1a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
> @@ -0,0 +1,48 @@
> +/* { dg-do assemble { target { aarch64*-*-* } } } */

The { target ... } isn't necessary here.  (Missed that in the other
review, sorry.)

> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> +/* { dg-add-options arm_v8_2a_bf16_neon } */
> +/* { dg-additional-options "-save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
> +
> +#include 
> +
> +/*
> +**test_bfcvtn:
> +** bfcvtn\tv0.4h, v0.4s

Like with the other review, I think the literal tab you had in the
original patch looks better than \t.

> [...]
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-nosimd.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-nosimd.c
> new file mode 100644
> index 
> ..8d7dffe16275de60e884c449afa0fea0b1af6081
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-nosimd.c
> @@ -0,0 +1,15 @@
> +/* { dg-do assemble { target { aarch64*-*-* } } } */

This needs:

/* { dg-require-effective-target aarch64_asm_bf16_ok } */

(Doesn't exist yet, but I hope to post a patch soon.)

Looks good otherwise, thanks.

Richard


Re: [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal for AArch64 AdvSIMD

2020-01-31 Thread Delia Burduv
Sure, here it is. I'll do that for the other patch too.

Thanks,
Delia

On 1/31/20 3:37 PM, Richard Sandiford wrote:
> Delia Burduv  writes:
>> Thank you, Richard!
>>
>> Here is the updated patch. The test that checks for errors when bf16 is
>> disabled is in the bfcvt patch.
> 
> Looks good.  Just a couple of very minor things...
> 
>>
>> Cheers,
>> Delia
>>
>> gcc/ChangeLog:
>>
>> 2019-11-06  Delia Burduv  
>>
>>   * config/aarch64/aarch64-simd-builtins.def
>>   (bfcvtn): New built-in function.
>>   (bfcvtn_q): New built-in function.
>>   (bfcvtn2): New built-in function.
>>   (bfcvt): New built-in function.
>>   * config/aarch64/aarch64-simd.md
>>   (aarch64_bfcvtn): New pattern.
>>   (aarch64_bfcvtn2v8bf): New pattern.
>>   (aarch64_bfcvtbf): New pattern.
>>   * config/aarch64/arm_bf16.h (float32_t): New typedef.
>>   (vcvth_bf16_f32): New intrinsic.
>>   * config/aarch64/arm_bf16.h (vcvt_bf16_f32): New intrinsic.
>>   (vcvtq_low_bf16_f32): New intrinsic.
>>   (vcvtq_high_bf16_f32): New intrinsic.
>>   * config/aarch64/iterators.md (V4SF_TO_BF): New mode iterator.
>>   (UNSPEC_BFCVTN): New UNSPEC.
>>   (UNSPEC_BFCVTN2): New UNSPEC.
>>   (UNSPEC_BFCVT): New UNSPEC.
>>   * config/arm/types.md (bf_cvt): New type.
> 
> The patch no longer changes types.md. :-)
> 
>> diff --git 
>> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c 
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>> new file mode 100644
>> index 
>> ..9feb7ee7905cb14037427a36797fc67a6fa3fbc8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>> @@ -0,0 +1,67 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>> +/* { dg-additional-options "-save-temps" } */
>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>> +
>> +#include 
>> +
>> +/*
>> +**test_bfmlalb:
>> +**  bfmlalb\tv0.4s, v1.8h, v2.8h
> 
> This version uses \t while the previous one used literal tabs.
> TBH I think the literal tab is nicer (and what we use for SVE FWIW).
> 
> OK with those changes, thanks.  Seems silly to ask when the changes
> are so trivial, but: please could you post an updated patch so that
> I can apply verbatim?
> 
> Richard
> 
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index a118f4f121de067c0a80f691b852247b0ab27f7a..02b2154cf64dad02cf57b110af51b19dd7f91c51 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -692,3 +692,14 @@
   VAR2 (TERNOP, bfdot, 0, v2sf, v4sf)
   VAR2 (QUADOP_LANE_PAIR, bfdot_lane, 0, v2sf, v4sf)
   VAR2 (QUADOP_LANE_PAIR, bfdot_laneq, 0, v2sf, v4sf)
+
+  /* Implemented by aarch64_bfmmlaqv4sf  */
+  VAR1 (TERNOP, bfmmlaq, 0, v4sf)
+
+  /* Implemented by aarch64_bfmlal{_lane{q}}v4sf  */
+  VAR1 (TERNOP, bfmlalb, 0, v4sf)
+  VAR1 (TERNOP, bfmlalt, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalb_lane, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalt_lane, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, v4sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 97f46f96968a6bc2f93bbc812931537b819b3b19..6ba72d7dc82ed02b5b5001a13ca896ab245a9d41 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7091,3 +7091,42 @@
 }
   [(set_attr "type" "neon_dot")]
 )
+
+;; bfmmla
+(define_insn "aarch64_bfmmlaqv4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+(plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+ (match_operand:V8BF 3 "register_operand" "w")]
+UNSPEC_BFMMLA)))]
+  "TARGET_BF16_SIMD"
+  "bfmmla\\t%0.4s, %2.8h, %3.8h"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+;; bfmlal
+(define_insn "aarch64_bfmlalv4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+(plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+(unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+  (match_operand:V8BF 3 "register_operand" "w")]
+ BF_MLA)))]
+  "TARGET_BF16_SIMD"
+  "bfmlal\\t%0.4s, %2.8h, %3.8h"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+(define_insn "aarch64_bfmlal_lanev4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+(plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+(unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+  (match_operand:VBF 3 "register_operand" "w"

[committed, amdgcn] Fix conditional add LRA failure

2020-01-31 Thread Andrew Stubbs
This patch fixes an ICE in testcase 
gfortran.dg/assumed_rank_bounds_3.f90. The ICE is "unable to generate 
reloads" and is caused by LRA wanting to use the same match constraint 
for two distinct operands.


It would be nice if LRA could choose not to do that (given that the 
alternative allows multiple possibilities for one of the operands), but 
the solution is to ensure that no alternative has more than one '0' 
constraint.


There are probably other patterns that can encounter the same issue, but 
I don't plan to fix them right now.


Andrew
Fix conditional add LRA failure

Fix ICE in testcase gfortran.dg/assumed_rank_bounds_3.f90.

2020-01-31  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md (addv64di3_exec): Allow one '0' in each
	alternative only.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 4aad835b2ef..ecdd60b8190 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -1200,13 +1200,14 @@
(set_attr "length" "8")])
 
 (define_insn_and_split "addv64di3_exec"
-  [(set (match_operand:V64DI 0 "register_operand"		 "= &v,  &v")
+  [(set (match_operand:V64DI 0 "register_operand"	 "= &v,  &v, &v")
 	(vec_merge:V64DI
 	  (plus:V64DI
-	(match_operand:V64DI 1 "register_operand"		 "%vDb,vDb0")
-	(match_operand:V64DI 2 "gcn_alu_operand"		 "vDb0, vDb"))
-	  (match_operand:V64DI 3 "gcn_register_or_unspec_operand" " U0,  U0")
-	  (match_operand:DI 4 "gcn_exec_reg_operand"		 "   e,   e")))
+	(match_operand:V64DI 1 "register_operand"	 "%vDb,vDb0,vDb")
+	(match_operand:V64DI 2 "gcn_alu_operand"	 "vDb0, vDb,vDb"))
+	  (match_operand:V64DI 3 "gcn_register_or_unspec_operand"
+			 "   U,   U,  0")
+	  (match_operand:DI 4 "gcn_exec_reg_operand"	 "   e,   e,  e")))
(clobber (reg:DI VCC_REG))]
   ""
   "#"


Re: [rfc PATCH] rs6000: Updated constraint documentation

2020-01-31 Thread Segher Boessenkool
Hi Bill,

Thanks a lot for looking at this!  :-)

On Fri, Jan 31, 2020 at 08:49:21AM -0600, Bill Schmidt wrote:
> >+(define_register_constraint "wa" 
> >"rs6000_constraints[RS6000_CONSTRAINT_wa]"
> >+  "A VSX register (VSR), @code{vs0}@dots{}@code{vs63}.  Either a @code{d}
> >+   or a @code{v} register.")
> 
> Not quite true, as the "d" register is only half of a VSX register.  It 
> may or may not be worth including a picture of register overlaps...

No, the "d" registers are the actual full registers, all 128 bits of it.
You often use them in a mode that uses only 64 bits, sure.

I was planning to update this to

(define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
  "A VSX register (VSR), @code{vs0}@dots{}@code{vs63}.  This is either an
   FPR (@code{d}) or a VR (@code{v}).")

Does that improve it?

The numbering thing is also mentioned in the %x output modifier stuff.
There must be a better way to present this, but I don't see it yet.  Hrm.

> >  (define_register_constraint "we" 
> >  "rs6000_constraints[RS6000_CONSTRAINT_we]"
> >-  "VSX register if the -mpower9-vector -m64 options were used or 
> >NO_REGS.")
> >+  "@internal VSX register if the -mpower9-vector -m64 options were used
> >+   or NO_REGS.")
> 
> Suggest changing "used or" to "used, else".

Or just "used."; this is internals documentation only, and all similar
constraints will ideally go away at some point (it just didn't fit in
easily with the "enabled" attribute yet; it probably should be just "p9"
for "isa" and test the TARGET_64BIT in the insn condition, something
like that.  Or maybe there shouldn't be separate handling for 64-bit
at all here).

> >  (define_register_constraint "wr" 
> >  "rs6000_constraints[RS6000_CONSTRAINT_wr]"
> >-  "General purpose register if 64-bit instructions are enabled or 
> >NO_REGS.")
> >+  "@internal General purpose register if 64-bit instructions are enabled
> >+   or NO_REGS.")
> 
> Similar here.

Yup.  I didn't change this, fwiw, just synched up md.texi and
constraints.md where they diverged.

> >  (define_memory_constraint "es"
> >-  "A ``stable'' memory operand; that is, one which does not include any
> >-automodification of the base register.  Unlike @samp{m}, this constraint
> >-can be used in @code{asm} statements that might access the operand
> >-several times, or that might not access it at all."
> >+  "@internal
> >+   A ``stable'' memory operand; that is, one which does not include any
> >+   automodification of the base register.  This used to be useful when
> >+   @code{m} allowed automodification of the base register, but as those
> 
> Trailing whitespace here.

Yeah, I don't know how I missed that, git tends to shout about it.
Fixed.

> >  @item wa
> >-Any VSX register if the @option{-mvsx} option was used or NO_REGS.
> >+A VSX register (VSR), @code{vs0}@dots{}@code{vs63}.  Either a @code{d} or 
> >a @code{v}
> >+register.
> 
> Same concern as above.

It is literally the same text now (unless I messed up the c'n'p).

> >+@ifset INTERNALS
> >+@item h
> >+@code{vrsave}, @code{ctr}, or @code{lr}.
> >+@end ifset
> 
> 
> I don't see vrsave elsewhere in either document (should have noted this 
> in constraints.md also).

There is no other constraint for vrsave.  constraints.md says

(define_register_constraint "h" "SPECIAL_REGS"
  "@internal @code{vrsave}, @code{ctr}, or @code{lr}.")

(Same text, as should be).  It ends up only in gccint.*, not in gcc.* .

> >  @item we
> >-VSX register if the @option{-mcpu=power9} and @option{-m64} options
> >-were used or NO_REGS.
> >+VSX register if the -mpower9-vector -m64 options were used or NO_REGS.
> 
> As above.  I won't call out the rest of these.

Since this is not new text, and it now only ends up in the internals
documentation, and a lot of it should go away in the short term anyway,
and importantly I don't know a good simple way to write what it does
anyway (because it *isn't* simple), I hoped I could just keep this for
now.

Hrm, I lost markup there, will fix.

> >+@item wZ
> >+Indexed or indirect memory operand, ignoring the bottom 4 bits.
> >+@end ifset
> 
> For consistency, "An indexed..." ?

Yes, thanks!

> >+@item Z
> >+A memory operand that is an indexed or indirect from a register.
> 
> "indexed or indirect access"?

And s/from a register// yeah.

> Great improvements!

Thanks :-)

Somewhere it should say (in the gcc.* doc) that there are other
constraints and output modifiers as well, and some are even supported
for backwards compatibility, but here only the ones you should use are
mentioned.  Not sure where to do that.


Segher


Re: [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal for AArch64 AdvSIMD

2020-01-31 Thread Richard Sandiford
Delia Burduv  writes:
> Thank you, Richard!
>
> Here is the updated patch. The test that checks for errors when bf16 is 
> disabled is in the bfcvt patch.

Looks good.  Just a couple of very minor things...

>
> Cheers,
> Delia
>
> gcc/ChangeLog:
>
> 2019-11-06  Delia Burduv  
>
>  * config/aarch64/aarch64-simd-builtins.def
>  (bfcvtn): New built-in function.
>  (bfcvtn_q): New built-in function.
>  (bfcvtn2): New built-in function.
>  (bfcvt): New built-in function.
>  * config/aarch64/aarch64-simd.md
>  (aarch64_bfcvtn): New pattern.
>  (aarch64_bfcvtn2v8bf): New pattern.
>  (aarch64_bfcvtbf): New pattern.
>  * config/aarch64/arm_bf16.h (float32_t): New typedef.
>  (vcvth_bf16_f32): New intrinsic.
>  * config/aarch64/arm_bf16.h (vcvt_bf16_f32): New intrinsic.
>  (vcvtq_low_bf16_f32): New intrinsic.
>  (vcvtq_high_bf16_f32): New intrinsic.
>  * config/aarch64/iterators.md (V4SF_TO_BF): New mode iterator.
>  (UNSPEC_BFCVTN): New UNSPEC.
>  (UNSPEC_BFCVTN2): New UNSPEC.
>  (UNSPEC_BFCVT): New UNSPEC.
>  * config/arm/types.md (bf_cvt): New type.

The patch no longer changes types.md. :-)

> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
> new file mode 100644
> index 
> ..9feb7ee7905cb14037427a36797fc67a6fa3fbc8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
> @@ -0,0 +1,67 @@
> +/* { dg-do assemble { target { aarch64*-*-* } } } */
> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> +/* { dg-add-options arm_v8_2a_bf16_neon } */
> +/* { dg-additional-options "-save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
> +
> +#include 
> +
> +/*
> +**test_bfmlalb:
> +**  bfmlalb\tv0.4s, v1.8h, v2.8h

This version uses \t while the previous one used literal tabs.
TBH I think the literal tab is nicer (and what we use for SVE FWIW).

OK with those changes, thanks.  Seems silly to ask when the changes
are so trivial, but: please could you post an updated patch so that
I can apply verbatim?

Richard


Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD

2020-01-31 Thread Delia Burduv
Here is the updated patch. The changes are minor, so let me know if 
there is anything else to fix or if it can be committed.

Thank you,
Delia

On 1/30/20 2:55 PM, Kyrill Tkachov wrote:
> Hi Delia,
> 
> 
> On 1/28/20 4:44 PM, Delia Burduv wrote:
>> Ping.
>> 
>> *From:* Delia Burduv 
>> *Sent:* 22 January 2020 17:26
>> *To:* gcc-patches@gcc.gnu.org 
>> *Cc:* ni...@redhat.com ; Richard Earnshaw 
>> ; Ramana Radhakrishnan 
>> ; Kyrylo Tkachov 
>> *Subject:* Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla 
>> and vfma for AArch32 AdvSIMD
>> Ping.
>>
>> I have read Richard Sandiford's comments on the AArch64 patches and I
>> will apply what is relevant to this patch as well. Particularly, I will
>> change the tests to use the exact input and output registers and I will
>> change the types of the rtl patterns.
> 
> 
> Please send the updated patches so that someone can commit them for you 
> once they're reviewed.
> 
> Thanks,
> 
> Kyrill
> 
> 
>>
>> On 12/20/19 6:44 PM, Delia Burduv wrote:
>> > This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab and vfmat
>> > as part of the BFloat16 extension.
>> > (https://developer.arm.com/docs/101028/latest.)
>> > The intrinsics are declared in arm_neon.h and the RTL patterns are
>> > defined in neon.md.
>> > Two new tests are added to check assembler output and lane indices.
>> >
>> > This patch depends on the Arm back-end patche.
>> > (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>> >
>> > Tested for regression on arm-none-eabi and armeb-none-eabi. I don't 
>> have
>> > commit rights, so if this is ok can someone please commit it for me?
>> >
>> > gcc/ChangeLog:
>> >
>> > 2019-11-12  Delia Burduv 
>> >
>> >  * config/arm/arm_neon.h (vbfmmlaq_f32): New.
>> >    (vbfmlalbq_f32): New.
>> >    (vbfmlaltq_f32): New.
>> >    (vbfmlalbq_lane_f32): New.
>> >    (vbfmlaltq_lane_f32): New.
>> >      (vbfmlalbq_laneq_f32): New.
>> >    (vbfmlaltq_laneq_f32): New.
>> >  * config/arm/arm_neon_builtins.def (vbfmmla): New.
>> >    (vbfmab): New.
>> >    (vbfmat): New.
>> >    (vbfmab_lane): New.
>> >    (vbfmat_lane): New.
>> >    (vbfmab_laneq): New.
>> >    (vbfmat_laneq): New.
>> >   * config/arm/iterators.md (BF_MA): New int iterator.
>> >    (bt): New int attribute.
>> >    (VQXBF): Copy of VQX with V8BF.
>> >    (V_HALF): Added V8BF.
>> >    * config/arm/neon.md (neon_vbfmmlav8hi): New insn.
>> >    (neon_vbfmav8hi): New insn.
>> >    (neon_vbfma_lanev8hi): New insn.
>> >    (neon_vbfma_laneqv8hi): New expand.
>> >    (neon_vget_high): Changed iterator to VQXBF.
>> >  * config/arm/unspecs.md (UNSPEC_BFMMLA): New UNSPEC.
>> >    (UNSPEC_BFMAB): New UNSPEC.
>> >    (UNSPEC_BFMAT): New UNSPEC.
>> >
>> > 2019-11-12  Delia Burduv 
>> >
>> >      * gcc.target/arm/simd/bf16_ma_1.c: New test.
>> >      * gcc.target/arm/simd/bf16_ma_2.c: New test.
>> >      * gcc.target/arm/simd/bf16_mmla_1.c: New test.
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3c78f435009ab027f92693d00ab5b40960d5419d..81f8008ea6a5fb11eb09f6685ba24bb0c54fb248 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,64 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
   return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
 }
 
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfmmlav8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfmabv8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfmatv8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		   const int __index)
+{
+  return __builtin_neon_vbfmab_lanev8bf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		   const int __index)
+{
+  return __builtin_neon_vbfmat_lanev8bf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline

[Patch,committed][Fortran] Disable front-end optimization for OpenACC atomic (PR93462)

2020-01-31 Thread Tobias Burnus

The OpenACC code
!$acc atomic write
   a = f(n) - f(n)
got -ffrontend-optimize'd such that it ICEed in gfc_trans_omp_atomic.

The same issue occurred for OpenMP in PR 92977 and was solved by
disabling the optimization for EXEC_OMP_ATOMIC.

This patch does now the same for OpenACC (EXEC_OACC_ATOMIC).

Committed as obvious to the trunk.

Tobias

commit 6a97d9eae4543a995f895e6739530f55f5d039a7
Author: Tobias Burnus 
Date:   Fri Jan 31 15:54:21 2020 +0100

[Fortran] Disable front-end optimization for OpenACC atomic (PR93462)

PR fortran/93462
* frontend-passes.c (gfc_code_walker): For EXEC_OACC_ATOMIC, set
in_omp_atomic to true prevent front-end optimization.

PR fortran/93462
* gfortran.dg/goacc/atomic-1.f90: New.
---
 gcc/fortran/ChangeLog| 16 +++-
 gcc/fortran/frontend-passes.c|  1 +
 gcc/testsuite/ChangeLog  |  5 +
 gcc/testsuite/gfortran.dg/goacc/atomic-1.f90 | 17 +
 4 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index b8f70e6140f..9b17daf15f9 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,9 @@
+2020-01-31  Tobias Burnus  
+
+	PR fortran/93462
+	* frontend-passes.c (gfc_code_walker): For EXEC_OACC_ATOMIC, set
+	in_omp_atomic to true prevent front-end optimization.
+
 2020-01-30  Bernhard Reutner-Fischer  
 
 	PR fortran/87103
@@ -25,11 +31,11 @@
 
 2020-01-28  Andrew Benson  
 
-PR fortran/93461
-* trans.h: Increase GFC_MAX_MANGLED_SYMBOL_LEN to
-GFC_MAX_SYMBOL_LEN*3+5 to allow for inclusion of submodule name,
-plus the "." between module and submodule names.
-* gfortran.dg/pr93461.f90: New test.
+	PR fortran/93461
+	* trans.h: Increase GFC_MAX_MANGLED_SYMBOL_LEN to
+	GFC_MAX_SYMBOL_LEN*3+5 to allow for inclusion of submodule name,
+	plus the "." between module and submodule names.
+	* gfortran.dg/pr93461.f90: New test.
 
 2020-01-28  Andrew Benson  
 
diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index de11524ba14..bbe34d61c99 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -5258,6 +5258,7 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t codefn, walk_expr_fn_t exprfn,
 	  WALK_SUBEXPR (co->ext.dt->extra_comma);
 	  break;
 
+	case EXEC_OACC_ATOMIC:
 	case EXEC_OMP_ATOMIC:
 	  in_omp_atomic = true;
 	  break;
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 8b1dcf23855..f95d2d4e069 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2020-01-31  Tobias Burnus  
+
+	PR fortran/93462
+	* gfortran.dg/goacc/atomic-1.f90: New.
+
 2020-01-31  Tamar Christina  
 
 	PR rtl-optimization/91838
diff --git a/gcc/testsuite/gfortran.dg/goacc/atomic-1.f90 b/gcc/testsuite/gfortran.dg/goacc/atomic-1.f90
new file mode 100644
index 000..579f0494b78
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/atomic-1.f90
@@ -0,0 +1,17 @@
+! { dg-do compile }
+!
+! PR fortran/93462
+!
+! Contributed by G. Steinmetz
+!
+program p
+   integer :: n = 1
+   integer :: a
+!$acc atomic write
+   a = f(n) - f(n)
+contains
+   integer function f(x)
+  integer, intent(in) :: x
+  f = x
+   end
+end


Re: [GCC][PATCH][AArch64] ACLE intrinsics for BFCVTN, BFCVTN2 (AArch64 AdvSIMD) and BFCVT (AArch64 FP)

2020-01-31 Thread Delia Burduv
Sorry for the confusion, what I meant to say was:

This patch adds the Armv8.6-a ACLE intrinsics for bfcvtn, bfcvtn2 and 
bfcvt as part of the BFloat16 extension.
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
The intrinsics are declared in arm_bf16.h and arm_neon.h and the RTL 
patterns are defined in aarch64-simd.md.

Tested for regression on aarch64-none-elf and aarch64_be-none-elf. I 
don't have commit rights, so if this is ok can someone please commit it 
for me?

Here is the updated patch.

Thank you,
Delia


gcc/ChangeLog:

2019-11-06  Delia Burduv  

 * config/aarch64/aarch64-simd-builtins.def
 (bfcvtn): New built-in function.
 (bfcvtn_q): New built-in function.
 (bfcvtn2): New built-in function.
 (bfcvt): New built-in function.
 * config/aarch64/aarch64-simd.md
 (aarch64_bfcvtn): New pattern.
 (aarch64_bfcvtn2v8bf): New pattern.
 (aarch64_bfcvtbf): New pattern.
 * config/aarch64/arm_bf16.h (float32_t): New typedef.
 (vcvth_bf16_f32): New intrinsic.
 * config/aarch64/arm_bf16.h (vcvt_bf16_f32): New intrinsic.
 (vcvtq_low_bf16_f32): New intrinsic.
 (vcvtq_high_bf16_f32): New intrinsic.
 * config/aarch64/iterators.md (V4SF_TO_BF): New mode iterator.
 (UNSPEC_BFCVTN): New UNSPEC.
 (UNSPEC_BFCVTN2): New UNSPEC.
 (UNSPEC_BFCVT): New UNSPEC.
 * config/arm/types.md (bf_cvt): New type.


gcc/testsuite/ChangeLog:

2020-01-31  Delia Burduv  

 * gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c: New
test.
* gcc.target/aarch64/advsimd-intrinsics/bfcvt-nobf16.c: New
test.
* gcc.target/aarch64/advsimd-intrinsics/bfcvt-nosimd.c: New
test.
* gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: New
test.

On 12/23/19 6:30 PM, Richard Sandiford wrote:
> Some of the comments on the BFMMLA/BFMLA[LT] patch apply here too.
> 
> Delia Burduv  writes:
>> This patch adds the Armv8.6-a ACLE intrinsics for bfmmla, bfmlalb and
>> bfmlalt as part of the BFloat16 extension.
> 
> That's the other patch :-)
> 
>> [...]
>> diff --git a/gcc/config/aarch64/aarch64-simd.md 
>> b/gcc/config/aarch64/aarch64-simd.md
>> index 
>> 55660ae248f4fa75d35ba2949cd4b9d5139ff5f5..ff7a1f5f34a19b05eba48dba96c736dfdfdf7bac
>>  100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -7027,3 +7027,32 @@
>> "xtn\t%0., %1."
>> [(set_attr "type" "neon_shift_imm_narrow_q")]
>>   )
>> +
>> +;; bfcvtn
>> +(define_insn "aarch64_bfcvtn"
>> +  [(set (match_operand:V4SF_TO_BF 0 "register_operand" "=w")
>> +(unspec:V4SF_TO_BF [(match_operand:V4SF 1 "register_operand" "w")]
>> +UNSPEC_BFCVTN))]
>> +  "TARGET_BF16_SIMD"
>> +  "bfcvtn\\t%0.4h, %1.4s"
>> +  [(set_attr "type" "f_cvt")]
>> +)
>> +
> 
> If I've understood the naming convention correctly, the closest type
> seems to be "neon_fp_cvt_narrow_s_q".
> 
>> +(define_insn "aarch64_bfcvtn2v8bf"
>> +  [(set (match_operand:V8BF 0 "register_operand" "=w")
>> +(unspec:V8BF [(match_operand:V8BF 1 "register_operand" "w")
>> +  (match_operand:V4SF 2 "register_operand" "w")]
>> +  UNSPEC_BFCVTN2))]
>> +  "TARGET_BF16_SIMD"
>> +  "bfcvtn2\\t%0.8h, %2.4s"
>> +  [(set_attr "type" "f_cvt")]
>> +)
> 
> Same here.
> 
> The constraint on operand 1 needs to be "0", otherwise operands 1 and 0
> could end up in different registers.  You could test for this using
> something like:
> 
> bfloat16x8_t test_bfcvtnq2_untied (bfloat16x8_t unused, bfloat16x8_t inactive,
>  float32x4_t a)
> {
>return vcvtq_high_bf16_f32 (inactive, a);
> }
> 
> which when compiled at -O should produce something like:
> 
> /*
> **test_bfcvtnq2_untied:
> **mov v0\.8h, v1\.8h
> **bfcvtn2 v0\.8h, v2\.4s
> **ret
> */
> 
> (Completely untested, the code above is probably wrong.)
> 
>> +
>> +(define_insn "aarch64_bfcvtbf"
>> +  [(set (match_operand:BF 0 "register_operand" "=w")
>> +(unspec:BF [(match_operand:SF 1 "register_operand" "w")]
>> +UNSPEC_BFCVT))]
>> +  "TARGET_BF16_SIMD"
> 
> I think this just needs the scalar macro rather than *_SIMD.
> 
>> +  "bfcvt\\t%h0, %s1"
>> +  [(set_attr "type" "f_cvt")]
>> +)
>> diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
>> index 
>> aedb0972735ce549fac1870bacd1ef3101e8fd26..1b9ab3690d35e153cd4f24b9e3bbb5b4cc4b4f4d
>>  100644
>> --- a/gcc/config/aarch64/arm_bf16.h
>> +++ b/gcc/config/aarch64/arm_bf16.h
>> @@ -34,7 +34,15 @@
>>   #ifdef __ARM_FEATURE_BF16_SCALAR_ARITHMETIC
>>   
>>   typedef __bf16 bfloat16_t;
>> -
>> +typedef float float32_t;
>> +
>> +__extension__ extern __inline bfloat16_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vcvth_bf16_f32 \
>> +  (float32_t __a)
> 
> No need fo

Re: [rfc PATCH] rs6000: Updated constraint documentation

2020-01-31 Thread Bill Schmidt



On 1/30/20 6:17 PM, Segher Boessenkool wrote:

This is my current work-in-progress version.  There still are rough
edges, and not much is done for the output modifiers yet, but it should
be in much better shape wrt the user manual now.  The internals manual
also is a bit better I think.

md.texi is not automatically kept in synch with constraints.md (let
alone generated from it), so the two diverged.  I tried to correct
that, too.

Please let me know if you have any ideas how to improve it further, or
if I did something terribly wrong, or anything else.  Thanks,


Segher

---
  gcc/config/rs6000/constraints.md | 159 +++--
  gcc/doc/md.texi  | 188 +++
  2 files changed, 182 insertions(+), 165 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 398c894..bafc22a 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -21,192 +21,214 @@

  ;; Register constraints

-(define_register_constraint "f" "rs6000_constraints[RS6000_CONSTRAINT_f]"
-  "@internal")
-
-(define_register_constraint "d" "rs6000_constraints[RS6000_CONSTRAINT_d]"
-  "@internal")
+; Actually defined in common.md:
+; (define_register_constraint "r" "GENERAL_REGS"
+;   "A general purpose register (GPR), @code{r0}@dots{}@code{r31}.")

  (define_register_constraint "b" "BASE_REGS"
-  "@internal")
+  "A base register.  Like @code{r}, but @code{r0} is not allowed, so
+   @code{r1}@dots{}@code{r31}.")

-(define_register_constraint "h" "SPECIAL_REGS"
-  "@internal")
+(define_register_constraint "f" "rs6000_constraints[RS6000_CONSTRAINT_f]"
+  "A floating point register (FPR), @code{f0}@dots{}@code{f31}.")

-(define_register_constraint "c" "CTR_REGS"
-  "@internal")
-
-(define_register_constraint "l" "LINK_REGS"
-  "@internal")
+(define_register_constraint "d" "rs6000_constraints[RS6000_CONSTRAINT_d]"
+  "A floating point register.  This is the same as @code{f} nowadays;
+   historically @code{f} was for single-precision and @code{d} was for
+   double-precision floating point.")

  (define_register_constraint "v" "ALTIVEC_REGS"
-  "@internal")
+  "An Altivec vector register (VR), @code{v0}@dots{}@code{v31}.")
+
+(define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
+  "A VSX register (VSR), @code{vs0}@dots{}@code{vs63}.  Either a @code{d}
+   or a @code{v} register.")



Not quite true, as the "d" register is only half of a VSX register.  It 
may or may not be worth including a picture of register overlaps...



+
+(define_register_constraint "h" "SPECIAL_REGS"
+  "@internal @code{vrsave}, @code{ctr}, or @code{lr}.")
+
+(define_register_constraint "c" "CTR_REGS"
+  "The count register, @code{ctr}.")
+
+(define_register_constraint "l" "LINK_REGS"
+  "The link register, @code{lr}.")

  (define_register_constraint "x" "CR0_REGS"
-  "@internal")
+  "Condition register field 0, @code{cr0}.")

  (define_register_constraint "y" "CR_REGS"
-  "@internal")
+  "Any condition register field, @code{cr0}@dots{}@code{cr7}.")

  (define_register_constraint "z" "CA_REGS"
-  "@internal")
-
-;; Use w as a prefix to add VSX modes
-;; any VSX register
-(define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
-  "Any VSX register if the -mvsx option was used or NO_REGS.")
+  "@internal The carry bit, @code{XER[CA]}.")

  ;; NOTE: For compatibility, "wc" is reserved to represent individual CR bits.
  ;; It is currently used for that purpose in LLVM.

  (define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]"
-  "VSX register if the -mpower9-vector -m64 options were used or NO_REGS.")
+  "@internal VSX register if the -mpower9-vector -m64 options were used
+   or NO_REGS.")



Suggest changing "used or" to "used, else".



  ;; NO_REGs register constraint, used to merge mov{sd,sf}, since movsd can use
  ;; direct move directly, and movsf can't to move between the register sets.
  ;; There is a mode_attr that resolves to wa for SDmode and wn for SFmode
-(define_register_constraint "wn" "NO_REGS" "No register (NO_REGS).")
+(define_register_constraint "wn" "NO_REGS"
+  "@internal No register (NO_REGS).")

  (define_register_constraint "wr" "rs6000_constraints[RS6000_CONSTRAINT_wr]"
-  "General purpose register if 64-bit instructions are enabled or NO_REGS.")
+  "@internal General purpose register if 64-bit instructions are enabled
+   or NO_REGS.")



Similar here.



  (define_register_constraint "wx" "rs6000_constraints[RS6000_CONSTRAINT_wx]"
-  "Floating point register if the STFIWX instruction is enabled or NO_REGS.")
+  "@internal Floating point register if the STFIWX instruction is enabled
+   or NO_REGS.")



And here.



  (define_register_constraint "wA" "rs6000_constraints[RS6000_CONSTRAINT_wA]"
-  "BASE_REGS if 64-bit instructions are enabled or NO_REGS.")
+  "@internal BASE_REGS if 64-bit instructions are enabled or NO_REGS.")



Etc.



  ;; wB ne

Re: [PATCH] V12 patch #1 of 14, add gcc_asserts for rs6000_adjust_vec_address

2020-01-31 Thread Segher Boessenkool
Hi!

On Thu, Jan 09, 2020 at 06:52:05PM -0500, Michael Meissner wrote:
> 2020-01-09  Michael Meissner  
> 
>   * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add some
>   gcc_asserts.

> +  /* All insns should use the 'Q' constraint (address is a single 
> register)
> +  if the element number is not a constant.  */
> +  gcc_assert (REG_P (addr) || SUBREG_P (addr));

So maybe you should just more directly say

  gcc_assert (satisfies_constraint_Q (addr));

?  The Q constraint does not allow subregs, btw, is that an oversight?

Okay for trunk either way.  Thanks!


Segher


Re: [PATCH, ivopts] Fix fast-math-pr55281.c ICE

2020-01-31 Thread Andrew Stubbs

On 31/01/2020 08:09, Richard Biener wrote:

On Thu, Jan 30, 2020 at 3:09 PM Andrew Stubbs  wrote:

How about this?

I've only tested it on the one testcase, so far, but it works for that.

OK to commit (following a full test)?


OK.


X86_64 bootstrap and test showed no issues. Nor amdgcn build and test.

Committed, thanks.

Andrew


Re: [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal for AArch64 AdvSIMD

2020-01-31 Thread Delia Burduv

Thank you, Richard!

Here is the updated patch. The test that checks for errors when bf16 is 
disabled is in the bfcvt patch.

Cheers,
Delia

gcc/ChangeLog:

2019-11-06  Delia Burduv  

 * config/aarch64/aarch64-simd-builtins.def
 (bfcvtn): New built-in function.
 (bfcvtn_q): New built-in function.
 (bfcvtn2): New built-in function.
 (bfcvt): New built-in function.
 * config/aarch64/aarch64-simd.md
 (aarch64_bfcvtn): New pattern.
 (aarch64_bfcvtn2v8bf): New pattern.
 (aarch64_bfcvtbf): New pattern.
 * config/aarch64/arm_bf16.h (float32_t): New typedef.
 (vcvth_bf16_f32): New intrinsic.
 * config/aarch64/arm_bf16.h (vcvt_bf16_f32): New intrinsic.
 (vcvtq_low_bf16_f32): New intrinsic.
 (vcvtq_high_bf16_f32): New intrinsic.
 * config/aarch64/iterators.md (V4SF_TO_BF): New mode iterator.
 (UNSPEC_BFCVTN): New UNSPEC.
 (UNSPEC_BFCVTN2): New UNSPEC.
 (UNSPEC_BFCVT): New UNSPEC.
 * config/arm/types.md (bf_cvt): New type.


gcc/testsuite/ChangeLog:

2019-11-06  Delia Burduv  

 * gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c: New
test.
 * gcc.target/aarch64/advsimd-intrinsics/bfcvt-nobf16.c: New
test.
 * gcc.target/aarch64/advsimd-intrinsics/bfcvt-nosimd.c: New
test.
 * gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: New
test.


On 12/23/19 6:11 PM, Richard Sandiford wrote:
> Thanks for the patch, looks good.
> 
> Delia Burduv  writes:
>> This patch adds the ARMv8.6 ACLE intrinsics for bfmmla, bfmlalb and bfmlalt 
>> as part of the BFloat16 extension.
>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>> The intrinsics are declared in arm_neon.h and the RTL patterns are defined 
>> in aarch64-simd.md.
>> Two new tests are added to check assembler output.
>>
>> This patch depends on the two Aarch64 back-end patches. 
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01323.html and 
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01324.html)
>>
>> Tested for regression on aarch64-none-elf and aarch64_be-none-elf. I don't 
>> have commit rights, so if this is ok can someone please commit it for me?
>>
>> gcc/ChangeLog:
>>
>> 2019-10-29  Delia Burduv  
>>
>>  * config/aarch64/aarch64-simd-builtins.def
>>(bfmmla): New built-in function.
>>(bfmlalb): New built-in function.
>>(bfmlalt): New built-in function.
>>(bfmlalb_lane): New built-in function.
>>(bfmlalt_lane): New built-in function.
>>(bfmlalb_laneq): New built-in function.
>>(bfmlalt_laneq): New built-in function
>>  * config/aarch64/aarch64-simd.md (bfmmla): New pattern.
>>(bfmlal): New patterns.
>>  * config/aarch64/arm_neon.h (vbfmmlaq_f32): New intrinsic.
>>(vbfmlalbq_f32): New intrinsic.
>>(vbfmlaltq_f32): New intrinsic.
>>(vbfmlalbq_lane_f32): New intrinsic.
>>(vbfmlaltq_lane_f32): New intrinsic.
>>(vbfmlalbq_laneq_f32): New intrinsic.
>>(vbfmlaltq_laneq_f32): New intrinsic.
>>  * config/aarch64/iterators.md (UNSPEC_BFMMLA): New UNSPEC.
>>(UNSPEC_BFMLALB): New UNSPEC.
>>(UNSPEC_BFMLALT): New UNSPEC.
>>(BF_MLA): New int iterator.
>>(bt): Added UNSPEC_BFMLALB, UNSPEC_BFMLALT.
>>  * config/arm/types.md (bf_mmla): New type.
>>(bf_mla): New type.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-10-29  Delia Burduv  
>>
>>  * gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c: New 
>> test.
>>  * gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c: New test.
>>  * 
>> gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c:
>>New test.
> 
> Formatting nit: continuation lines should only be indented by a tab,
> rather than a tab and two spaces.  (I agree the above looks nicer,
> but the policy is not to be flexible over this kind of thing...)
> 
>> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
>> b/gcc/config/aarch64/aarch64-simd-builtins.def
>> index 
>> f4ca35a59704c761fe2ac2b6d401fff7c8aba80d..5e9f50f090870d0c63916540a48f5ac132d2630d
>>  100644
>> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
>> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
>> @@ -682,3 +682,14 @@
>> BUILTIN_VSFDF (UNOP, frint32x, 0)
>> BUILTIN_VSFDF (UNOP, frint64z, 0)
>> BUILTIN_VSFDF (UNOP, frint64x, 0)
>> +
>> +  /* Implemented by aarch64_bfmmlaqv4sf  */
>> +  VAR1 (TERNOP, bfmmlaq, 0, v4sf)
>> +
>> +  /* Implemented by aarch64_bfmlal{_lane{q}}v4sf  */
>> +  VAR1 (TERNOP, bfmlalb, 0, v4sf)
>> +  VAR1 (TERNOP, bfmlalt, 0, v4sf)
>> +  VAR1 (QUADOP_LANE, bfmlalb_lane, 0, v4sf)
>> +  VAR1 (QUADOP_LANE, bfmlalt_lane, 0, v4sf)
>> +  VAR1 (QUADOP_LANE, bfmlalb_laneq, 0, v4sf)
>> +  V

Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-31 Thread Thomas Schwinge
Hi Frederik!

On 2020-01-31T13:17:52+0100, "Harwath, Frederik"  
wrote:
> On 30.01.20 17:08, Thomas Schwinge wrote:
>
>> I understand correctly that the only reason for:
>> 
>> On 2020-01-29T10:52:57+0100, "Harwath, Frederik"  
>> wrote:
>>> * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
>>> (expect_device_properties): Split function into ...
>>> (expect_device_string_properties): ... this new function ...
>>> (expect_device_memory): ... and this new function.
>> 
>> ... this split is that we can't test 'expect_device_memory' here:
>
>> [...]
>
>> ..., because that one doesn't (re-)implement the 'acc_property_memory'
>> interface?
>
> Correct.

OK, thanks for confirming.

> But why "re-"? It has not been implemented before.

Well, yes, it has already been implemented: in
'libgomp/plugin/plugin-gcn.c' ;-) -- that's why I meant it's
re-implemented in the test case(s): in a similar yet slightly different
way compared to the respective plugin(s).  And for avoidance of doubt: I
agree that's a good approach to verify that we're getting some consistent
and meaningful results.)

>>> --- a/libgomp/plugin/plugin-gcn.c
>>> +++ b/libgomp/plugin/plugin-gcn.c

>>> +case GOACC_PROPERTY_MEMORY:
>>> +  {
>>> +   size_t size;
>>> +   hsa_region_t region = agent->data_region;
>>> +   hsa_status_t status =
>>> + hsa_fns.hsa_region_get_info_fn (region, HSA_REGION_INFO_SIZE, &size);
>>> +   if (status == HSA_STATUS_SUCCESS)
>>> + propval.val = size;
>>> +   break;
>>> +  }
>>> [...]
>>>  }
>> 
>> Here we got 'acc_property_memory' implemented, but not here:
>> 
>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c
>
> Yes, there seems to be no straightforward way to determine the expected value 
> through
> the runtime API. We might of course try to replicate the logic that is
> used in plugin-gcn.c.

No need to do that now; I was just curious whether that's the reason.


>>> --- a/libgomp/plugin/plugin-gcn.c
>>> +++ b/libgomp/plugin/plugin-gcn.c
>> 
>>> @@ -4115,12 +4141,37 @@ GOMP_OFFLOAD_openacc_async_dev2host (int device, 
>>> void *dst, const void *src,
>>>  union goacc_property_value
>>>  GOMP_OFFLOAD_openacc_get_property (int device, enum goacc_property prop)
>>>  {
>>> [...]
>>> +  switch (prop)
>>> +{
>>> +case GOACC_PROPERTY_FREE_MEMORY:
>>> +  /* Not supported. */
>>> +  break;
>> 
>> (OK, can be added later when somebody feels like doing that.)
>
> Well, "not supported" means that there seems to be no (reasonable) way to 
> obtain
> the necessary information from the runtime - in contrast to the nvptx plugin
> where it can be obtained easily through the CUDA API.

OK, I see, and again that's fine.  (The API explicitly permits such
"zero" returns.)

And, I don't know what a user is actually supposed to tell from
'acc_property_free_memory' given that this can change at any point in
time, arbitrarily, not under control of the executing processe, given
that multiple processes may be using the GPU concurrently.


Grüße
 Thomas


signature.asc
Description: PGP signature


*ping* Re: [Patch][Fortran] Fix to strict associate check (PR93427)

2020-01-31 Thread Tobias Burnus

*ping* after 4 days.

On 1/27/20 2:49 PM, Tobias Burnus wrote:
Semantically, there is an issue when the function name is used both 
for recursively calling and as result variable. Hence, one should only 
use one own's function name – in context of function calls – if one 
has a separate result variable.


This somehow got messed up with  r10-5722-g4d12437 (3 Jan 2020, 
PR92994) – rejecting also the use of the function name as result 
variable.


Fixed by removing the check. At least the most straight-forward 
invalid use is still rejected as shown by the augmented test case.


OK for the trunk?

Tobias


On 1/25/20 6:37 AM, Andrew Benson wrote:

I opened PR 93427 for the issue below:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93427

The following code fails to compile (using git commit
472dc648ce3e7661762931d584d239611ddca964):

module a

type :: t
end type t

contains

recursive function b()
   class(t), pointer :: b
   type(t) :: c
   allocate(t :: b)
   select type (b)
   type is (t)
  b=c
   end select
end function b

end module a



$ gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/home/abenson/Galacticus/Tools/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/lto-wrapper 


Target: x86_64-pc-linux-gnu
Configured with: ../gcc-git/configure
--prefix=/home/abenson/Galacticus/Tools
--enable-languages=c,c++,fortran
  --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.0.1 20200124 (experimental) (GCC)


$ gfortran -c p.F90 -o p.o
p.F90:12:15:

    12 |   select type (b)
   |   1
Error: Associating entity 'b' at (1) is a procedure name
p.F90:14:5:

    14 |  b=c
   | 1
Error: 'b' at (1) associated to vector-indexed target cannot be used
in a variable definition context (assignment)


The code compiles successfully using ifort 18.0.1. Removing the
"recursive" attribute, or specifying a "result()" variable makes the
errors go away.


--

* Andrew Benson: 
http://users.obs.carnegiescience.edu/abenson/contact.html


* Galacticus: https://github.com/galacticusorg/galacticus


Re: [PATCH] [amdgcn] Scale number of threads/workers with VGPR usage

2020-01-31 Thread Andrew Stubbs

On 31/01/2020 13:56, Kwok Cheung Yeung wrote:
The GCN architecture has 4 SIMD units per compute unit, with 256 VGPRs 
per SIMD unit. OpenMP threads or OpenACC workers must be distributed 
across the SIMD units, with each thread/worker fitting entirely within a 
single SIMD unit. VGPRs are shared by the kernels running in a SIMD 
unit, so we can have 4 workers that use up to 256 VGPRs, 8 workers that 
use up to 128 VGPRs, 16 workers that use up to 64 VGPRs and so on.


If more threads/workers are requested than can be supported, then the 
runtime fails with the message:


libgomp: GCN fatal error: Asynchronous queue error
Runtime message: HSA_STATUS_ERROR_INVALID_ISA: The instruction set 
architecture is invalid.


This patch adds code to mkoffload such that the number of VGPRs (and 
SGPRs for good measure) requested by a kernel is reported to libgomp at 
runtime. When launching a kernel, if libgomp detects that the number of 
threads/workers exceeds what can be supported by the hardware, it 
automatically scales down the number to the maximum supported value.


This behaviour can be overridden using environment variables to set an 
explicit number of threads/workers (GCN_NUM_THREADS/GCN_NUM_WORKERS), 
but there is not much point IMO as the kernel will just fail to run.


Tested on a GCN3 accelerator with 6 new passes and no regressions noted 
in libgomp. Okay for trunk?


Kwok

     gcc/
     * config/gcn/mkoffload.c (process_asm): Add sgpr_count and 
vgpr_count to

     definition of hsa_kernel_description.  Parse assembly to find SGPR and
     VGPR count of kernel and store in hsa_kernel_description.

     libgomp/
     * plugin/plugin-gcn.c (struct hsa_kernel_description): Add sgpr_count
     and vgpr_count fields.
     (struct kernel_info): Add a field for a hsa_kernel_description.
     (run_kernel): Reduce the number of threads/workers if the requested
     number would require too many VGPRs.
     (init_basic_kernel_info): Initialize description field with
     the hsa_kernel_description entry for the kernel.



OK.

Andrew


[PATCH, i386]: Fix TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling.

2020-01-31 Thread Uros Bizjak
The reason for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL on AMD target is
only insn size, as advised in e.g. Software Optimization Guide for the
AMD Family 15h Processors [1], section 7.1.2, where it is said:

--quote--
7.1.2 Reduce Instruction SizeOptimization

Reduce the size of instructions when possible.

Rationale

Using smaller instruction sizes improves instruction fetch throughput.
Specific examples include the following:

*In SIMD code, use the single-precision (PS) form of instructions
instead of the double-precision (PD) form. For example, for register
to register moves, MOVAPS achieves the same result as MOVAPD, but uses
one less byte to encode the instruction and has no prefix byte. Other
examples in which single-precision forms can be substituted for
double-precision forms include MOVUPS, MOVNTPS, XORPS, ORPS, ANDPS,
and SHUFPS.
...
--/quote--

Please note that this optimization applies only to non-AVX forms, as
demonstrated by:

   0:   0f 28 c8movaps %xmm0,%xmm1
   3:   66 0f 28 c8 movapd %xmm0,%xmm1
   7:   c5 f8 28 d1 vmovaps %xmm1,%xmm2
   b:   c5 f9 28 d1 vmovapd %xmm1,%xmm2

Also note that MOVDQA is missing in the above optimization. It is
harmful to substitute MOVDQA with MOVAPS, as it can (and does)
introduce +1 cycle forwarding penalty between FLT (FPA/FPM) and INT
(VALU) FP clusters.

[1] https://www.amd.com/system/files/TechDocs/47414_15h_sw_opt_guide.pdf

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Will be committed to mainline soon.

Uros.

2020-01-31  Uroš Bizjak  

* config/i386/i386.md (*movoi_internal_avx): Do not check for
TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL.  Remove MODE_V8SF handling.
(*movti_internal): Do not check for
TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL.
(*movtf_internal): Move check for TARGET_SSE2 and size optimization
just after check for TARGET_AVX.
(*movdf_internal): Ditto.
* config/i386/mmx.md (*mov_internal): Do not check for
TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL.
* config/i386/sse.md (mov_internal): Only check
 TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL with V2DFmode.  Move check
for TARGET_SSE2 and size optimization just after check for TARGET_AVX.
(_andnot3): Move check for
TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL after check for TARGET_AVX.
(3): Ditto.
(*andnot3): Ditto.
(*andnottf3): Ditto.
(*3): Ditto.
(*tf3): Ditto.
(*andnot3): Remove
TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling.
(3): Ditto.
(*3): Ditto.
(sse4_1_blendv): Ditto.
* config/i386/x86-tune.def (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL):
Explain that tune applies to 128bit instructions only.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9f0077d59a97..e2675da24c17 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1949,18 +1949,14 @@
   if (misaligned_operand (operands[0], OImode)
  || misaligned_operand (operands[1], OImode))
{
- if (get_attr_mode (insn) == MODE_V8SF)
-   return "vmovups\t{%1, %0|%0, %1}";
- else if (get_attr_mode (insn) == MODE_XI)
+ if (get_attr_mode (insn) == MODE_XI)
return "vmovdqu32\t{%1, %0|%0, %1}";
  else
return "vmovdqu\t{%1, %0|%0, %1}";
}
   else
{
- if (get_attr_mode (insn) == MODE_V8SF)
-   return "vmovaps\t{%1, %0|%0, %1}";
- else if (get_attr_mode (insn) == MODE_XI)
+ if (get_attr_mode (insn) == MODE_XI)
return "vmovdqa32\t{%1, %0|%0, %1}";
  else
return "vmovdqa\t{%1, %0|%0, %1}";
@@ -1980,8 +1976,6 @@
   (and (eq_attr "alternative" "1")
(match_test "TARGET_AVX512VL"))
 (const_string "XI")
-  (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
-(const_string "V8SF")
  ]
  (const_string "OI")))])
 
@@ -2060,11 +2054,10 @@
   (match_test "TARGET_AVX")
 (const_string "TI")
   (ior (not (match_test "TARGET_SSE2"))
-   (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
-(and (eq_attr "alternative" "5")
- (match_test "TARGET_SSE_TYPELESS_STORES"
+   (match_test "optimize_function_for_size_p (cfun)"))
 (const_string "V4SF")
-  (match_test "optimize_function_for_size_p (cfun)")
+  (and (eq_attr "alternative" "5")
+   (match_test "TARGET_SSE_TYPELESS_STORES"))
 (const_string "V4SF")
   ]
   (const_string "TI")))
@@ -2243,12 +2236,10 @@
  (cond [(ior (match_operand 0 "ext_sse_reg_operand")
  (match_operand 1 "ext_sse_reg_operand"))
   (const_string "TI")
-(ior (not (match_test "TARGET_SSE2"))
-

[PATCH] [amdgcn] Scale number of threads/workers with VGPR usage

2020-01-31 Thread Kwok Cheung Yeung
The GCN architecture has 4 SIMD units per compute unit, with 256 VGPRs per SIMD 
unit. OpenMP threads or OpenACC workers must be distributed across the SIMD 
units, with each thread/worker fitting entirely within a single SIMD unit. VGPRs 
are shared by the kernels running in a SIMD unit, so we can have 4 workers that 
use up to 256 VGPRs, 8 workers that use up to 128 VGPRs, 16 workers that use up 
to 64 VGPRs and so on.


If more threads/workers are requested than can be supported, then the runtime 
fails with the message:


libgomp: GCN fatal error: Asynchronous queue error
Runtime message: HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture 
is invalid.


This patch adds code to mkoffload such that the number of VGPRs (and SGPRs for 
good measure) requested by a kernel is reported to libgomp at runtime. When 
launching a kernel, if libgomp detects that the number of threads/workers 
exceeds what can be supported by the hardware, it automatically scales down the 
number to the maximum supported value.


This behaviour can be overridden using environment variables to set an explicit 
number of threads/workers (GCN_NUM_THREADS/GCN_NUM_WORKERS), but there is not 
much point IMO as the kernel will just fail to run.


Tested on a GCN3 accelerator with 6 new passes and no regressions noted in 
libgomp. Okay for trunk?


Kwok

gcc/
* config/gcn/mkoffload.c (process_asm): Add sgpr_count and vgpr_count to
definition of hsa_kernel_description.  Parse assembly to find SGPR and
VGPR count of kernel and store in hsa_kernel_description.

libgomp/
* plugin/plugin-gcn.c (struct hsa_kernel_description): Add sgpr_count
and vgpr_count fields.
(struct kernel_info): Add a field for a hsa_kernel_description.
(run_kernel): Reduce the number of threads/workers if the requested
number would require too many VGPRs.
(init_basic_kernel_info): Initialize description field with
the hsa_kernel_description entry for the kernel.

diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index 0062f15..723da10 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -211,12 +211,13 @@ access_check (const char *name, int mode)
 static void
 process_asm (FILE *in, FILE *out, FILE *cfile)
 {
-  int fn_count = 0, var_count = 0, dims_count = 0;
-  struct obstack fns_os, vars_os, varsizes_os, dims_os;
+  int fn_count = 0, var_count = 0, dims_count = 0, regcount_count = 0;
+  struct obstack fns_os, vars_os, varsizes_os, dims_os, regcounts_os;
   obstack_init (&fns_os);
   obstack_init (&vars_os);
   obstack_init (&varsizes_os);
   obstack_init (&dims_os);
+  obstack_init (®counts_os);
 
   struct oaccdims
   {
@@ -224,13 +225,20 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 char *name;
   } dim;
 
+  struct regcount
+  {
+int sgpr_count;
+int vgpr_count;
+char *kernel_name;
+  } regcount;
+
   /* Always add _init_array and _fini_array as kernels.  */
   obstack_ptr_grow (&fns_os, xstrdup ("_init_array"));
   obstack_ptr_grow (&fns_os, xstrdup ("_fini_array"));
   fn_count += 2;
 
   char buf[1000];
-  enum { IN_CODE, IN_VARS, IN_FUNCS } state = IN_CODE;
+  enum { IN_CODE, IN_AMD_KERNEL_CODE_T, IN_VARS, IN_FUNCS } state = IN_CODE;
   while (fgets (buf, sizeof (buf), in))
 {
   switch (state)
@@ -243,6 +251,22 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
obstack_grow (&dims_os, &dim, sizeof (dim));
dims_count++;
  }
+   else if (sscanf (buf, " .amdgpu_hsa_kernel %ms\n",
+®count.kernel_name) == 1)
+ break;
+
+   break;
+ }
+   case IN_AMD_KERNEL_CODE_T:
+ {
+   gcc_assert (regcount.kernel_name);
+   if (sscanf (buf, " wavefront_sgpr_count = %d\n",
+   ®count.sgpr_count) == 1)
+ break;
+   else if (sscanf (buf, " workitem_vgpr_count = %d\n",
+®count.vgpr_count) == 1)
+ break;
+
break;
  }
case IN_VARS:
@@ -282,19 +306,36 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
state = IN_VARS;
   else if (sscanf (buf, " .section .gnu.offload_funcs%c", &dummy) > 0)
state = IN_FUNCS;
+  else if (sscanf (buf, " .amd_kernel_code_%c", &dummy) > 0)
+   {
+ state = IN_AMD_KERNEL_CODE_T;
+ regcount.sgpr_count = regcount.vgpr_count = -1;
+   }
   else if (sscanf (buf, " .section %c", &dummy) > 0
   || sscanf (buf, " .text%c", &dummy) > 0
   || sscanf (buf, " .bss%c", &dummy) > 0
   || sscanf (buf, " .data%c", &dummy) > 0
   || sscanf (buf, " .ident %c", &dummy) > 0)
state = IN_CODE;
+  else if (sscanf (buf, " .end_amd_kernel_code_%c", &dummy) > 0)
+   {
+ state = IN_CODE;
+ gcc_assert (regcount.kernel_name != NULL
+ && regcount.sgpr_count >= 0
+  

Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.

2020-01-31 Thread Richard Sandiford
Stam Markianos-Wright  writes:
> On 1/30/20 10:01 AM, Richard Sandiford wrote:
>> Stam Markianos-Wright  writes:
>>> On 1/29/20 12:42 PM, Richard Sandiford wrote:
 Stam Markianos-Wright  writes:
> Hi all,
>
> This fixes:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300
>
> Genmodes.c was generating the "wider_mode" chain as follows:
>
> HF -> BF -> SF - > DF -> TF -> VOID
>
> This caused issues in some rare cases where conversion between modes
> was needed, such as the above PR93300 where BFmode was being picked up
> as a valid mode for:
>
> optabs.c:prepare_float_lib_cmp
>
> which then led to the ICE at expr.c:convert_mode_scalar.
>>>
>>> Hi Richard,
>>>

 Can you go into more details about why this chain was a problem?
 Naively, it's the one I'd have expected: HF should certainly have
 priority over BF,
>>>
>>> Is that because functionally it looks like genmodes puts things in reverse
>>> alphabetical order if all else is equal? (If I'm reading the comment about
>>> MODE_RANDOM, MODE_CC correctly)
>>>
 but BF coming before SF doesn't seem unusual in itself.

 I'm not saying the patch is wrong.  It just wasn't clear why it was
 right either.

>>> Yes, I see what you mean. I'll go through my thought process here:
>>>
>>> In investigating the ICE PR93300 I found that the diversion from pre-bf16
>>> behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a
>>> `FOR_EACH_MODE_FROM (mode, orig_mode)` is used to then go off and generate
>>> library calls for conversions.
>>>
>>> This was then being caught further down by the gcc_assert at expr.c:325 
>>> where
>>> GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) 
>>> because
>>> it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` 
>>> (which
>>> is what happened if i removed the gcc_assert at expr.c:325)
>>>
>>> With BFmode being a target-defined mode, I didn't want to add something 
>>> like `if
>>> (mode != BFmode)` to specifically exclude BFmode from being selected for 
>>> this.
>>> (and there's nothing different between HFmode and BFmode here to allow me to
>>> make this distinction?)
>>>
>>> Also I couldn't find anywhere where the target back-end is not consulted 
>>> for a
>>> "is this supported: yes/no" between the `FOR_EACH_MODE_FROM` loop and the
>>> libcall being created later on as __extendhfbf2.
>> 
>> Yeah, prepare_float_lib_cmp just checks for libfuncs rather than
>> calling target hooks directly.  The libfuncs themselves are under
>> the control of the target though.
>> 
>> By default we assume all float modes have associated libfuncs.
>> It's then up to the target to remove functions that don't exist
>> (or redirect them to other functions).  So I think we need to remove
>> BFmode libfuncs in arm_init_libfuncs in the same way as we currently
>> do for HFmode.
>> 
>> I guess we should also nullify the conversion libfuncs for BFmode,
>> not just the arithmetic and comparison ones.
>
> Ahhh now this works, thank you for the suggestion!
>
> I was aware of arm_init_libfuncs, but I had not realised that returning NULL 
> would have the desired effect for us, in this case. So I have essentially 
> rolled 
> back the whole previous version of the patch and done this in the new diff.
> It seems to have fixed the ICE and I am currently in the process of 
> regression 
> testing!

LGTM behaviourally, just a couple of requests about how it's written:

>
> Thank you!
> Stam
>
>> 
>> Thanks,
>> Richard
>> 
>>> Finally, because we really don't want __bf16 to be in the same "conversion 
>>> rank"
>>> as standard float modes for things like automatic promotion, this seemed 
>>> like a
>>> reasonable solution to that problem :)
>>>
>>> Let me know of your thoughts!
>>>
>>> Cheers,
>>> Stam
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index c47fc232f39..18055d4a75e 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -2643,6 +2643,30 @@ arm_init_libfuncs (void)
>  default:
>break;
>  }
> +
> +  /* For all possible libcalls in BFmode, return NULL.  */
> +  /* Conversions.  */
> +  set_conv_libfunc (trunc_optab, BFmode, HFmode, (NULL));
> +  set_conv_libfunc (sext_optab, HFmode, BFmode, (NULL));
> +  set_conv_libfunc (trunc_optab, BFmode, SFmode, (NULL));
> +  set_conv_libfunc (sext_optab, SFmode, BFmode, (NULL));
> +  set_conv_libfunc (trunc_optab, BFmode, DFmode, (NULL));
> +  set_conv_libfunc (sext_optab, DFmode, BFmode, (NULL));

It might be slightly safer to do:

  FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_FLOAT)

to iterate over all float modes on the non-BF side.

> +  /* Arithmetic.  */
> +  set_optab_libfunc (add_optab, BFmode, NULL);
> +  set_optab_libfunc (sdiv_optab, BFmode, NULL);
> +  set_optab_libfunc (smul_optab, BFmode, NULL);
> +  set_optab_libfunc (neg_optab, BFmode, NULL);
> +  set_optab_libfunc (sub_optab, BFmode, NULL);
> +
> 

Re: [PATCH][GCC][middle-end] Fix logical shift truncation (PR91838)

2020-01-31 Thread Tamar Christina
Hi Segher,

The 01/31/2020 12:13, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Jan 31, 2020 at 10:12:08AM +, Tamar Christina wrote:
> > This fixes a fall-out from a patch I had submitted two years ago which 
> > started
> > allowing simplify-rtx to fold logical right shifts by offsets a followed by 
> > b
> > into >> (a + b).
> > 
> > However this can generate inefficient code when the resulting shift count 
> > ends
> > up being the same as the size of the shift mode.  This will create some
> > undefined behavior on most platforms.
> > 
> > This patch changes to code to truncate to 0 if the shift amount goes out of
> 
> > Note that this doesn't take care of the Arithmetic shift where you could 
> > replace
> > the constant with MODE_BITS (mode) - 1, but that's not a regression so 
> > punting it.
> 
> Aww :-)
> 
> > 2020-01-31  Tamar Christina  
> > 
> > PR 91838
> > * simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case
> > to truncate if allowed or reject combination.
> 
> PR rtl-optimization/91838 please?  But the bug is currently set as a
> target bug; looks like it should be fixed in bugzilla as well.
> 
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index 
> > eff1d07a2533c7bda5f0529cd318f08e6d5209d6..543cd5885105fb0e4568568a3c876c74cc1068bf
> >  100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -3647,9 +3647,21 @@ simplify_binary_operation_1 (enum rtx_code code, 
> > machine_mode mode,
> > {
> >   rtx tmp = gen_int_shift_amount
> > (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
> > - tmp = simplify_gen_binary (code, inner_mode,
> > -XEXP (SUBREG_REG (op0), 0),
> > -tmp);
> > +
> > +/* Combine would usually zero out the value when combining two
> > +   local shifts and the range becomes larger or equal to the mode.
> > +   However since we fold away one of the shifts here combine won't
> > +   see it so we should immediately truncate the shift if it's out of
> > +   range.  */
> 
> "Truncate the shift" is confusing.  "Zero the result"?
> 
> > +if (code == LSHIFTRT
> > +&& INTVAL (tmp) >= GET_MODE_BITSIZE (inner_mode))
> > + tmp = const0_rtx;
> > +else
> > +  tmp = simplify_gen_binary (code,
> > + inner_mode,
> > + XEXP (SUBREG_REG (op0), 0),
> > + tmp);
> > +
> >   return lowpart_subreg (int_mode, tmp, inner_mode);
> > }
> 
> Should this be done for at least left shifts, too?
> 

The simplification does indeed only happen for right shifts, I'll open a PR for 
supporting the
arithmetic shifts and also left shift and will add those once we're add stage 1 
again.

Thanks,
Tamar

> Looks good with those things fixed.
> 
> 
> Segher

-- 


[PATCH] libgfortran: Fix and simplify IO locking [PR92836]

2020-01-31 Thread Janne Blomqvist
Simplify IO locking in libgfortran.  The new IO implementation avoids
accessing units without locks, as seen in PR 92836.  It also avoids
lock inversion (except for a corner case wrt namelist query when
reading from stdin and outputting to stdout), making it easier to
verify correctness with tools like valgrind or threadsanitizer.  It is
also simplified as the waiting and closed variables are not needed
anymore, making it easier to understand and analyze.

Regtested on x86_64-pc-linux-gnu, Ok for master?
---
 libgfortran/io/close.c |  17 ++---
 libgfortran/io/io.h|  57 +-
 libgfortran/io/list_read.c |  26 ++-
 libgfortran/io/unit.c  | 150 +++--
 libgfortran/io/unix.c  |  89 --
 5 files changed, 109 insertions(+), 230 deletions(-)

diff --git a/libgfortran/io/close.c b/libgfortran/io/close.c
index 8aaa00393e7..3b176285149 100644
--- a/libgfortran/io/close.c
+++ b/libgfortran/io/close.c
@@ -61,20 +61,15 @@ st_close (st_parameter_close *clp)
 find_option (&clp->common, clp->status, clp->status_len,
 status_opt, "Bad STATUS parameter in CLOSE statement");
 
-  u = find_unit (clp->common.unit);
+  LOCK (&unit_lock);
+  u = find_unit_locked (clp->common.unit);
 
   if (ASYNC_IO && u && u->au)
 if (async_wait (&(clp->common), u->au))
-  {
-   library_end ();
-   return;
-  }
+  goto done;
 
   if ((clp->common.flags & IOPARM_LIBRETURN_MASK) != IOPARM_LIBRETURN_OK)
-  {
-library_end ();
-return;
-  }
+goto done;
 
   if (u != NULL)
 {
@@ -123,6 +118,8 @@ st_close (st_parameter_close *clp)
 #endif
 }
 
-  /* CLOSE on unconnected unit is legal and a no-op: F95 std., 9.3.5. */ 
+ done:
+  /* CLOSE on unconnected unit is legal and a no-op: F95 std., 9.3.5. */
+  UNLOCK (&unit_lock);
   library_end ();
 }
diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index ab4a103787c..ae0ed1b10a9 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -681,16 +681,6 @@ typedef struct gfc_unit
   struct async_unit *au;
 
   __gthread_mutex_t lock;
-  /* Number of threads waiting to acquire this unit's lock.
- When non-zero, close_unit doesn't only removes the unit
- from the UNIT_ROOT tree, but doesn't free it and the
- last of the waiting threads will do that.
- This must be either atomically increased/decreased, or
- always guarded by UNIT_LOCK.  */
-  int waiting;
-  /* Flag set by close_unit if the unit as been closed.
- Must be manipulated under unit's lock.  */
-  int closed;
 
   /* For traversing arrays */
   array_loop_spec *ls;
@@ -780,6 +770,9 @@ internal_proto(stash_internal_unit);
 extern gfc_unit *find_unit (int);
 internal_proto(find_unit);
 
+extern gfc_unit *find_unit_locked (int);
+internal_proto(find_unit_locked);
+
 extern gfc_unit *find_or_create_unit (int);
 internal_proto(find_or_create_unit);
 
@@ -973,50 +966,6 @@ internal_proto(size_from_complex_kind);
 extern void free_ionml (st_parameter_dt *);
 internal_proto(free_ionml);
 
-static inline void
-inc_waiting_locked (gfc_unit *u)
-{
-#ifdef HAVE_ATOMIC_FETCH_ADD
-  (void) __atomic_fetch_add (&u->waiting, 1, __ATOMIC_RELAXED);
-#else
-  u->waiting++;
-#endif
-}
-
-static inline int
-predec_waiting_locked (gfc_unit *u)
-{
-#ifdef HAVE_ATOMIC_FETCH_ADD
-  /* Note that the pattern
-
- if (predec_waiting_locked (u) == 0)
- // destroy u
-
- could be further optimized by making this be an __ATOMIC_RELEASE,
- and then inserting a
-
- __atomic_thread_fence (__ATOMIC_ACQUIRE);
-
- inside the branch before destroying.  But for now, lets keep it
- simple.  */
-  return __atomic_add_fetch (&u->waiting, -1, __ATOMIC_ACQ_REL);
-#else
-  return --u->waiting;
-#endif
-}
-
-static inline void
-dec_waiting_unlocked (gfc_unit *u)
-{
-#ifdef HAVE_ATOMIC_FETCH_ADD
-  (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
-#else
-  __gthread_mutex_lock (&unit_lock);
-  u->waiting--;
-  __gthread_mutex_unlock (&unit_lock);
-#endif
-}
-
 
 static inline void
 memset4 (gfc_char4_t *p, gfc_char4_t c, int k)
diff --git a/libgfortran/io/list_read.c b/libgfortran/io/list_read.c
index 77d61421a0f..c337b3c8600 100644
--- a/libgfortran/io/list_read.c
+++ b/libgfortran/io/list_read.c
@@ -2793,7 +2793,31 @@ nml_query (st_parameter_dt *dtp, char c)
   /* Store the current unit and transfer to stdout.  */
 
   temp_unit = dtp->u.p.current_unit;
-  dtp->u.p.current_unit = find_unit (options.stdout_unit);
+
+  /* Since we already hold the lock for stdin (temp_unit), trying to
+ acquire the unit_lock in order to find the stdout unit would be a
+ lock order inversion, which is not allowed as it could cause a
+ deadlock.  Hence first try to lock unit_lock without blocking.
+ If that fails, fall back to unlocking temp_unit and then block
+ waiting for the lock in order to avoid the lock order inversion.
+ This is strictly speaking not correct, 

Re: [PATCH, v3] coroutines: Fix ICE on invalid (PR93458).

2020-01-31 Thread Nathan Sidwell

On 1/30/20 9:43 AM, Iain Sandoe wrote:

Hi Nathan,



however. ….

also, what if you find something, but it's not a type template?


… I’ve switched the complain off on lookup_qualified_name and now check for
a type template.


I'm not sure that's helpful.  I think you should still complain on the 
lookup.  If that returns error_mark_node, you're done.  If it returns 
NULL, does it emit an error?  If not, you should emit a not found error. 
 Finally, if it does return a thing, check if it's a class template 
decl (or alias template I guess --  DECL_TEMPLATE_RESULT being a 
TYPE_DECL is close enough) and if not, error on that (X is not a 
template class).  Then the user's fully clued in.



I took the liberty of repeating this treatment in the coroutine handle lookup 
code
(new testcases attached).

so the errors now look like:

"cannot find a valid coroutine traits template using 'std::coroutine_traits’”


hm, 'valid'.  If you find a template_decl, but cannot instantiate it, 
that sounds not valid.  But I suspect you do not diagnose that here, 
because in general you cannot :)


sorry to be a pain.

nathan
--
Nathan Sidwell


[PATCH] tree-optimization/91123 - restore redundant store removal

2020-01-31 Thread Richard Biener


Redundant store removal in FRE was restricted for correctness reasons.
The following extends correctness fixes required to memcpy/aggregate
copy translation.  The main change is that we no longer insert
references rewritten to cover such aggregate copies into the hashtable
but the original one.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2020-01-31  Richard Biener  

PR tree-optimization/91123
* tree-ssa-sccvn.c (vn_walk_cb_data::finish): New method.
(vn_walk_cb_data::last_vuse): New member.
(vn_walk_cb_data::saved_operands): Likewsie.
(vn_walk_cb_data::~vn_walk_cb_data): Release saved_operands.
(vn_walk_cb_data::push_partial_def): Use finish.
(vn_reference_lookup_2): Update last_vuse and use finish if
we've saved operands.
(vn_reference_lookup_3): Use finish and update calls to
push_partial_defs everywhere.  When translating through
memcpy or aggregate copies save off operands and alias-set.
(eliminate_dom_walker::eliminate_stmt): Restore VN_WALKREWRITE
operation for redundant store removal.

* gcc.dg/tree-ssa/ssa-fre-85.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-85.c | 14 +
 gcc/tree-ssa-sccvn.c   | 95 +++---
 2 files changed, 74 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-85.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-85.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-85.c
new file mode 100644
index 000..c50770caa2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-85.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fstrict-aliasing -fdump-tree-fre1-details" } */
+
+struct X { int i; int j; };
+
+struct X x, y;
+void foo ()
+{
+  x.i = 1;
+  y = x;
+  y.i = 1; // redundant
+}
+
+/* { dg-final { scan-tree-dump "Deleted redundant store y.i" "fre1" } } */
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 6e0b2202157..2ffbc643669 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -1687,26 +1687,30 @@ struct vn_walk_cb_data
 {
   vn_walk_cb_data (vn_reference_t vr_, tree orig_ref_, tree *last_vuse_ptr_,
   vn_lookup_kind vn_walk_kind_, bool tbaa_p_)
-: vr (vr_), last_vuse_ptr (last_vuse_ptr_),
-  vn_walk_kind (vn_walk_kind_), tbaa_p (tbaa_p_), known_ranges (NULL)
+: vr (vr_), last_vuse_ptr (last_vuse_ptr_), last_vuse (NULL_TREE),
+  vn_walk_kind (vn_walk_kind_), tbaa_p (tbaa_p_),
+  saved_operands (vNULL), first_set (-2), known_ranges (NULL)
{
+ if (!last_vuse_ptr)
+   last_vuse_ptr = &last_vuse;
  ao_ref_init (&orig_ref, orig_ref_);
}
   ~vn_walk_cb_data ();
-  void *push_partial_def (const pd_data& pd, tree,
- alias_set_type, HOST_WIDE_INT);
+  void *finish (alias_set_type, tree);
+  void *push_partial_def (const pd_data& pd, alias_set_type, HOST_WIDE_INT);
 
   vn_reference_t vr;
   ao_ref orig_ref;
   tree *last_vuse_ptr;
+  tree last_vuse;
   vn_lookup_kind vn_walk_kind;
   bool tbaa_p;
+  vec saved_operands;
 
   /* The VDEFs of partial defs we come along.  */
   auto_vec partial_defs;
   /* The first defs range to avoid splay tree setup in most cases.  */
   pd_range first_range;
-  tree first_vuse;
   alias_set_type first_set;
   splay_tree known_ranges;
   obstack ranges_obstack;
@@ -1719,6 +1723,17 @@ vn_walk_cb_data::~vn_walk_cb_data ()
   splay_tree_delete (known_ranges);
   obstack_free (&ranges_obstack, NULL);
 }
+  saved_operands.release ();
+}
+
+void *
+vn_walk_cb_data::finish (alias_set_type set, tree val)
+{
+  if (first_set != -2)
+set = first_set;
+  return vn_reference_lookup_or_insert_for_pieces
+  (last_vuse, set, vr->type,
+   saved_operands.exists () ? saved_operands : vr->operands, val);
 }
 
 /* pd_range splay-tree helpers.  */
@@ -1753,7 +1768,7 @@ pd_tree_dealloc (void *, void *)
on failure.  */
 
 void *
-vn_walk_cb_data::push_partial_def (const pd_data &pd, tree vuse,
+vn_walk_cb_data::push_partial_def (const pd_data &pd,
   alias_set_type set, HOST_WIDE_INT maxsizei)
 {
   const HOST_WIDE_INT bufsize = 64;
@@ -1774,7 +1789,6 @@ vn_walk_cb_data::push_partial_def (const pd_data &pd, 
tree vuse,
   partial_defs.safe_push (pd);
   first_range.offset = pd.offset;
   first_range.size = pd.size;
-  first_vuse = vuse;
   first_set = set;
   last_vuse_ptr = NULL;
   /* Continue looking for partial defs.  */
@@ -1908,8 +1922,7 @@ vn_walk_cb_data::push_partial_def (const pd_data &pd, 
tree vuse,
 "Successfully combined %u partial definitions\n", ndefs);
   /* We are using the alias-set of the first store we encounter which
 should be appropriate here.  */
-  return vn_reference_lookup_or_insert_for_pieces
-   (first_vuse, first_set, vr->type, vr->operands, val);
+  

[PATCH] tree-optimization/92819 restrict new vector CTOR canonicalization

2020-01-31 Thread Richard Biener
The PR shows that code generation ends up pessimized by the new
canonicalization rules that end up nailing do-not-care elements
to specific values making it hard to generate good code later.

The temporary solution is to avoid this for the cases we also
obviously know the canonicalization will create more GIMPLE stmts than
before.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2020-01-31  Richard Biener  

PR tree-optimization/92819
* tree-ssa-forwprop.c (simplify_vector_constructor): Avoid
generating more stmts than before.

* gcc.target/i386/pr92819.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr92819.c | 45 +
 gcc/tree-ssa-forwprop.c | 15 +--
 2 files changed, 58 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92819.c

diff --git a/gcc/testsuite/gcc.target/i386/pr92819.c 
b/gcc/testsuite/gcc.target/i386/pr92819.c
new file mode 100644
index 000..773e3490ab3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr92819.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2 -fdump-tree-forwprop1" } */
+
+typedef double v4df __attribute__((vector_size (32)));
+typedef double v2df __attribute__((vector_size (16)));
+typedef short v16hi __attribute__((vector_size (32)));
+typedef short v8hi __attribute__((vector_size (16)));
+
+v2df
+foo (v4df x, double *p)
+{
+  return (v2df) { x[1], *p };
+}
+
+v2df
+bar (v4df x, double *p)
+{
+  return (v2df) { x[0], *p }; /* BIT_INSERT_EXPR */
+}
+
+v2df
+baz (v2df x, double *p)
+{
+  return (v2df) { x[1], *p }; /* VEC_PERM_EXPR */
+}
+
+v2df
+qux (v2df x, double *p)
+{
+  return (v2df) { x[0], *p }; /* BIT_INSERT_EXPR */
+}
+
+v2df
+corge (v4df x, double *p)
+{
+  return (v2df) { x[3], *p };
+}
+
+/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 2 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "forwprop1" } } */
+/* We can't check for 1:1 assembler here so check for what we do not
+   want to see.  */
+/* { dg-final { scan-assembler-not { "perm" } } } */
+/* { dg-final { scan-assembler-not { "insert" } } } */
+/* { dg-final { scan-assembler-not { "broadcast" } } } */
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 5203891950a..f65216d23e9 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -2230,7 +2230,6 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   unsigned HOST_WIDE_INT refnelts;
   enum tree_code conv_code;
   constructor_elt *elt;
-  bool maybe_ident;
 
   op = gimple_assign_rhs1 (stmt);
   type = TREE_TYPE (op);
@@ -2245,7 +2244,8 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   orig[0] = NULL;
   orig[1] = NULL;
   conv_code = ERROR_MARK;
-  maybe_ident = true;
+  bool maybe_ident = true;
+  bool maybe_blend[2] = { true, true };
   tree one_constant = NULL_TREE;
   tree one_nonconstant = NULL_TREE;
   auto_vec constants;
@@ -2290,6 +2290,8 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
  orig[j] = ref;
  if (elem != i || j != 0)
maybe_ident = false;
+ if (elem != i)
+   maybe_blend[j] = false;
  elts.safe_push (std::make_pair (j, elem));
  continue;
}
@@ -2439,6 +2441,15 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 }
   else
 {
+  /* If we combine a vector with a non-vector avoid cases where
+we'll obviously end up with more GIMPLE stmts which is when
+we'll later not fold this to a single insert into the vector
+and we had a single extract originally.  See PR92819.  */
+  if (nelts == 2
+ && refnelts > 2
+ && orig[1] == error_mark_node
+ && !maybe_blend[0])
+   return false;
   tree mask_type, perm_type, conv_src_type;
   perm_type = TREE_TYPE (orig[0]);
   conv_src_type = (nelts == refnelts
-- 
2.16.4


Re: [PATCH] doc: clarify the situation with pointer arithmetic

2020-01-31 Thread Richard Biener
On Fri, Jan 31, 2020 at 1:05 PM Uecker, Martin
 wrote:
>
> Am Freitag, den 31.01.2020, 09:02 +0100 schrieb Richard Biener:
> > On Thu, Jan 30, 2020 at 6:09 PM Uecker, Martin
> >  wrote:
> > >
> > > Am Donnerstag, den 30.01.2020, 16:50 + schrieb Michael Matz:
> > > > Hi,
> > > >
> > > > On Thu, 30 Jan 2020, Uecker, Martin wrote:
> > > >
> > > > > > guarantees face serious implementation difficulties I think
> > > > > > so the only alternative to PVNI (which I think is implementable
> > > > > > but at a optimization opportunity cost) is one that makes
> > > > > > two pointers with the same value always have the same
> > > > > > provenance (and otherwise make the behavior undefined).
> > > > >
> > > > > This would need to come with precise rules about
> > > > > when the occurance of two such pointers is UB,
> > > > > e.g. comparisons of such pointers, or that
> > > > > two such pointers are cast to int in the same
> > > > > execution.
> > > > >
> > > > > The mere existance of such pointers should be
> > > > > quite common and should not already be UB.
> > > > >
> > > > > But I am uncomfortable with the idea that
> > > > > comparison of pointers is always allowed except
> > > > > for some special case which then is UB. This
> > > > > might cause are and very difficult to find bugs.
> > > >
> > > > As Richi said, the comparison itself wouldn't be UB, all comparisons 
> > > > would
> > > > be allowed.  But _if_ the pointers compare equal, they must have same 
> > > > (or
> > > > overlapping) provenance (i.e. when they have not, then _that_ is UB).
> > >
> > > Sorry, I still don't get it.  In the following example,
> > >
> > > int a[1];
> > > int b[1];
> > >
> > > it is often the case that &a[1] and &b[0] compare equal
> > > because they have the same address but the pointer
> > > have different provenance.
> > >
> > > Or does there need to be an actual evaluation of a comparison
> > > operations? In this case, I do not see the difference to what
> > > I said.
> >
> > I guess I wanted to say that if you do
> >
> >   if (&a[1] == &b[0])
> > if (&a[1] != &b[0])
> >   abort ();
> >
> > then the abort might happen.  I'm using the term "undefined behavior"
> > here.  So whenever you create a value based on two pointers with
> > the same value and different provenance you invoke undefined behavior.
>
> Yes, but it is tricky because one needs to define
> "create a value based on two pointers with..."
>
> Assuming one does not track provenance through integers,
> the only way to create expressions using two pointers
> are comparisons, pointer subtraction, and the tertiary
> operator.
>
> The tertiary operator seems unproblematic. For pointer
> subtraction, the standard already requires same provenance.
>
> For comparisons, one could consider making this case UB.
> But I fear this could be the source of subtle bugs.
>
> Then there is the question about what happens if a
> programm inspects the representation bytes  of a
> pointer directly...

At least the pointer is then no longer a register ;)

> > That allows the compiler to optimize
> >
> > int *q, *r;
> > if (q == r)
> >   *r = 1;
> >
> > into
> >
> > if (q == r)
> >   *q = 1;
> >
> > which it is currently not allowed to do because of that dread one-after-the
> > object equality compare, not because of PNVI, but similar cases
>
> Yes, but as provenance is tracked at compile-time, you could still
> do the optimization if you assign the right provenance to the
> replaced variable, i.e. you replace 'r' with 'q' but keep the
> provenance of 'r'. So while this puts a burden on the compiler
> writers, it seems feasible. Or am  I missing something?

With SSA it's not easy since q before the comparison is the same so it's

  *q_1 = 2;
  if (q_1 == r_2)
*r_2 = 1;  ->  *q_1 = 1;

and we cannot change the provenance of q_1 since that affects the
earlier store.  We'd have to somehow attach provenance to all
_operations_ where it matters (the dereference in this case).  That's
a much larger change.

> > obviously can be constructed with integers (and make our live difficult
> > as we're tracking provenance through integers).
>
> As in PVNI integers do not have provenance, such an optimization would
> always be valid for integers as would all other natural algebraic
> optimizations for integers. I consider this a major strength of
> the proposal and I kind of hoped that compiler writers would agree.

Yes, sure - it avoids this class of problems.  PVNI is probably the
very simplest approach to fix whatever problem it tries to fix ;)

> > Compilers fundamentally work with value-equivalences, the above example
> > shows we may not.  That's IMHO a defect in the standard.
>
> I consider provenance to be part of the value. Think about
> architectures with descriptors that actually trap if you use
> the wrong pointer. This nicely corresponds to a concept
> of abstract pointers which not simple the address of a
> memory location.
>
> The problems we have that we can not (ch

Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-31 Thread Harwath, Frederik
Hi Thomas,

On 30.01.20 17:08, Thomas Schwinge wrote:

> I understand correctly that the only reason for:
> 
> On 2020-01-29T10:52:57+0100, "Harwath, Frederik"  
> wrote:
>>  * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
>>  (expect_device_properties): Split function into ...
>>  (expect_device_string_properties): ... this new function ...
>>  (expect_device_memory): ... and this new function.
> 
> ... this split is that we can't test 'expect_device_memory' here:

> [...]

> ..., because that one doesn't (re-)implement the 'acc_property_memory'
> interface?

Correct. But why "re-"? It has not been implemented before.

>> --- a/libgomp/plugin/plugin-gcn.c
>> +++ b/libgomp/plugin/plugin-gcn.c
> 
>> @@ -4115,12 +4141,37 @@ GOMP_OFFLOAD_openacc_async_dev2host (int device, 
>> void *dst, const void *src,
>>  union goacc_property_value
>>  GOMP_OFFLOAD_openacc_get_property (int device, enum goacc_property prop)
>>  {
>> [...]
>> +  switch (prop)
>> +{
>> +case GOACC_PROPERTY_FREE_MEMORY:
>> +  /* Not supported. */
>> +  break;
> 
> (OK, can be added later when somebody feels like doing that.)

Well, "not supported" means that there seems to be no (reasonable) way to obtain
the necessary information from the runtime - in contrast to the nvptx plugin
where it can be obtained easily through the CUDA API.

> 
>> +case GOACC_PROPERTY_MEMORY:
>> +  {
>> +size_t size;
>> +hsa_region_t region = agent->data_region;
>> +hsa_status_t status =
>> +  hsa_fns.hsa_region_get_info_fn (region, HSA_REGION_INFO_SIZE, &size);
>> +if (status == HSA_STATUS_SUCCESS)
>> +  propval.val = size;
>> +break;
>> +  }
>> [...]
>>  }
> 
> Here we got 'acc_property_memory' implemented, but not here:
> 
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c

Yes, there seems to be no straightforward way to determine the expected value 
through
the runtime API. We might of course try to replicate the logic that is
used in plugin-gcn.c.

Best regards,
Frederik



Re: [PATCH][GCC][middle-end] Fix logical shift truncation (PR91838)

2020-01-31 Thread Segher Boessenkool
Hi!

On Fri, Jan 31, 2020 at 10:12:08AM +, Tamar Christina wrote:
> This fixes a fall-out from a patch I had submitted two years ago which started
> allowing simplify-rtx to fold logical right shifts by offsets a followed by b
> into >> (a + b).
> 
> However this can generate inefficient code when the resulting shift count ends
> up being the same as the size of the shift mode.  This will create some
> undefined behavior on most platforms.
> 
> This patch changes to code to truncate to 0 if the shift amount goes out of

> Note that this doesn't take care of the Arithmetic shift where you could 
> replace
> the constant with MODE_BITS (mode) - 1, but that's not a regression so 
> punting it.

Aww :-)

> 2020-01-31  Tamar Christina  
> 
>   PR 91838
>   * simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case
>   to truncate if allowed or reject combination.

PR rtl-optimization/91838 please?  But the bug is currently set as a
target bug; looks like it should be fixed in bugzilla as well.

> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 
> eff1d07a2533c7bda5f0529cd318f08e6d5209d6..543cd5885105fb0e4568568a3c876c74cc1068bf
>  100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -3647,9 +3647,21 @@ simplify_binary_operation_1 (enum rtx_code code, 
> machine_mode mode,
>   {
> rtx tmp = gen_int_shift_amount
>   (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
> -   tmp = simplify_gen_binary (code, inner_mode,
> -  XEXP (SUBREG_REG (op0), 0),
> -  tmp);
> +
> +  /* Combine would usually zero out the value when combining two
> + local shifts and the range becomes larger or equal to the mode.
> + However since we fold away one of the shifts here combine won't
> + see it so we should immediately truncate the shift if it's out of
> + range.  */

"Truncate the shift" is confusing.  "Zero the result"?

> +  if (code == LSHIFTRT
> +  && INTVAL (tmp) >= GET_MODE_BITSIZE (inner_mode))
> +   tmp = const0_rtx;
> +  else
> +tmp = simplify_gen_binary (code,
> +   inner_mode,
> +   XEXP (SUBREG_REG (op0), 0),
> +   tmp);
> +
> return lowpart_subreg (int_mode, tmp, inner_mode);
>   }

Should this be done for at least left shifts, too?

Looks good with those things fixed.


Segher


Re: [PATCH] doc: clarify the situation with pointer arithmetic

2020-01-31 Thread Uecker, Martin
Am Freitag, den 31.01.2020, 09:02 +0100 schrieb Richard Biener:
> On Thu, Jan 30, 2020 at 6:09 PM Uecker, Martin
>  wrote:
> > 
> > Am Donnerstag, den 30.01.2020, 16:50 + schrieb Michael Matz:
> > > Hi,
> > > 
> > > On Thu, 30 Jan 2020, Uecker, Martin wrote:
> > > 
> > > > > guarantees face serious implementation difficulties I think
> > > > > so the only alternative to PVNI (which I think is implementable
> > > > > but at a optimization opportunity cost) is one that makes
> > > > > two pointers with the same value always have the same
> > > > > provenance (and otherwise make the behavior undefined).
> > > > 
> > > > This would need to come with precise rules about
> > > > when the occurance of two such pointers is UB,
> > > > e.g. comparisons of such pointers, or that
> > > > two such pointers are cast to int in the same
> > > > execution.
> > > > 
> > > > The mere existance of such pointers should be
> > > > quite common and should not already be UB.
> > > > 
> > > > But I am uncomfortable with the idea that
> > > > comparison of pointers is always allowed except
> > > > for some special case which then is UB. This
> > > > might cause are and very difficult to find bugs.
> > > 
> > > As Richi said, the comparison itself wouldn't be UB, all comparisons would
> > > be allowed.  But _if_ the pointers compare equal, they must have same (or
> > > overlapping) provenance (i.e. when they have not, then _that_ is UB).
> > 
> > Sorry, I still don't get it.  In the following example,
> > 
> > int a[1];
> > int b[1];
> > 
> > it is often the case that &a[1] and &b[0] compare equal
> > because they have the same address but the pointer
> > have different provenance.
> > 
> > Or does there need to be an actual evaluation of a comparison
> > operations? In this case, I do not see the difference to what
> > I said.
> 
> I guess I wanted to say that if you do
> 
>   if (&a[1] == &b[0])
> if (&a[1] != &b[0])
>   abort ();
> 
> then the abort might happen.  I'm using the term "undefined behavior"
> here.  So whenever you create a value based on two pointers with
> the same value and different provenance you invoke undefined behavior.

Yes, but it is tricky because one needs to define
"create a value based on two pointers with..."

Assuming one does not track provenance through integers,
the only way to create expressions using two pointers
are comparisons, pointer subtraction, and the tertiary
operator. 

The tertiary operator seems unproblematic. For pointer
subtraction, the standard already requires same provenance.

For comparisons, one could consider making this case UB.
But I fear this could be the source of subtle bugs.

Then there is the question about what happens if a
programm inspects the representation bytes  of a 
pointer directly...

> That allows the compiler to optimize
> 
> int *q, *r;
> if (q == r)
>   *r = 1;
> 
> into
> 
> if (q == r)
>   *q = 1;
> 
> which it is currently not allowed to do because of that dread one-after-the
> object equality compare, not because of PNVI, but similar cases

Yes, but as provenance is tracked at compile-time, you could still
do the optimization if you assign the right provenance to the
replaced variable, i.e. you replace 'r' with 'q' but keep the
provenance of 'r'. So while this puts a burden on the compiler
writers, it seems feasible. Or am  I missing something?

> obviously can be constructed with integers (and make our live difficult
> as we're tracking provenance through integers).

As in PVNI integers do not have provenance, such an optimization would
always be valid for integers as would all other natural algebraic
optimizations for integers. I consider this a major strength of
the proposal and I kind of hoped that compiler writers would agree.

> Compilers fundamentally work with value-equivalences, the above example
> shows we may not.  That's IMHO a defect in the standard.

I consider provenance to be part of the value. Think about
architectures with descriptors that actually trap if you use
the wrong pointer. This nicely corresponds to a concept
of abstract pointers which not simple the address of a
memory location.

The problems we have that we can not (cheaply) track provenance
at runtime on modern CPUs and only the address part of the pointer
is available ar runtime. For the standard, this implies
that the rules must work both abstract pointers with provenance
and address-only pointers where information about provenance
is not available. Whenever there is a discrepancy between
these two models, we can either make it UB or use the semantics
of the address-only case.

The only real problematic case we have with PVNI is comparisons
for one-after-the object pointers with a pointer of different
provenance. The only choices we have is to make this UB or 
to make the result well-defined and based on the address. 
Both choices have disadvantages.

If we track provenance through integers, there are many
other difficult problems. The reason is that you the

Re: PATCH bugzila request Bug 92146 gm2: the brig, fortran, go and D frontends are missing lang_register_spec_functions

2020-01-31 Thread Gaius Mulley
Gaius Mulley  writes:

> Hi Martin and Matthias,
>
> as requested here are the lang_register_spec_functions for brig,
> fortran, go and D:

ah correction here is the fortran lang_register_spec_functions patch:



12-patches
Description: fortran lang_register_spec_function



regards,
Gaius


PATCH bugzila request Bug 92146 gm2: the brig, fortran, go and D frontends are missing lang_register_spec_functions

2020-01-31 Thread Gaius Mulley

Hi Martin and Matthias,

as requested here are the lang_register_spec_functions for brig,
fortran, go and D:



15-patches
Description: lang register for go


14-patches
Description: lang register for brig


13-patches
Description: lang register for D


13-patches
Description: lang register for fortran


regards,
Gaius


[committed, amdgcn] Zero-initialise masked load destinations

2020-01-31 Thread Andrew Stubbs
This is one of those things I don't know why we didn't notice sooner. 
The patch ensures that unused lanes in masked vector loads are 
zero-initialized, as per the internals manual.


This fixes an execution failure in testcase gfortran.dg/assumed_rank_1.f90.

When investigating the bug I got confused about the meaning of the 
"gather_exec" define_expand, which doesn't quite fit the pattern 
of the other "_exec" instructions. It's only used in one place so I've 
inlined it to avoid future confusion. It also reduces the likelihood of 
accidentally bypassing the zero-initialization in future.


I also needed a convenient way to create 0.0 vector constants without 
uglifying the machine description code, so extending gcn_vec_constant 
seemed like a useful place to do it.


Andrew
Zero-initialise masked load destinations

Fixes an execution failure in testcase gfortran.dg/assumed_rank_1.f90.

2020-01-30  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md (gather_exec): Move contents ...
	(mask_gather_load): ... here, and zero-initialize the
	destination.
	(maskloaddi): Zero-initialize the destination.
	* config/gcn/gcn.c:

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 331c768cb88..4aad835b2ef 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -701,34 +701,6 @@
 DONE;
   })
 
-(define_expand "gather_exec"
-  [(match_operand:VEC_ALLREG_MODE 0 "register_operand")
-   (match_operand:DI 1 "register_operand")
-   (match_operand:V64SI 2 "register_operand")
-   (match_operand 3 "immediate_operand")
-   (match_operand:SI 4 "gcn_alu_operand")
-   (match_operand:DI 5 "gcn_exec_reg_operand")]
-  ""
-  {
-rtx undefmode = gcn_gen_undef (mode);
-
-rtx addr = gcn_expand_scaled_offsets (DEFAULT_ADDR_SPACE, operands[1],
-	  operands[2], operands[4],
-	  INTVAL (operands[3]), operands[5]);
-
-if (GET_MODE (addr) == V64DImode)
-  emit_insn (gen_gather_insn_1offset_exec (operands[0], addr,
-		 const0_rtx, const0_rtx,
-		 const0_rtx, undefmode,
-		 operands[5]));
-else
-  emit_insn (gen_gather_insn_2offsets_exec (operands[0], operands[1],
-		  addr, const0_rtx,
-		  const0_rtx, const0_rtx,
-		  undefmode, operands[5]));
-DONE;
-  })
-
 ; Allow any address expression
 (define_expand "gather_expr"
   [(set (match_operand:VEC_ALLREG_MODE 0 "register_operand")
@@ -2801,9 +2773,12 @@
 		(mode, exec, operands[1], gen_rtx_SCRATCH (V64DImode));
 rtx as = gen_rtx_CONST_INT (VOIDmode, MEM_ADDR_SPACE (operands[1]));
 rtx v = gen_rtx_CONST_INT (VOIDmode, MEM_VOLATILE_P (operands[1]));
-rtx undef = gcn_gen_undef (mode);
-emit_insn (gen_gather_expr_exec (operands[0], addr, as, v, undef,
-	   exec));
+
+/* Masked lanes are required to hold zero.  */
+emit_move_insn (operands[0], gcn_vec_constant (mode, 0));
+
+emit_insn (gen_gather_expr_exec (operands[0], addr, as, v,
+	   operands[0], exec));
 DONE;
   })
 
@@ -2843,8 +2818,23 @@
 	operands[2] = tmp;
   }
 
-emit_insn (gen_gather_exec (operands[0], operands[1], operands[2],
-  operands[3], operands[4], exec));
+rtx addr = gcn_expand_scaled_offsets (DEFAULT_ADDR_SPACE, operands[1],
+	  operands[2], operands[4],
+	  INTVAL (operands[3]), exec);
+
+/* Masked lanes are required to hold zero.  */
+emit_move_insn (operands[0], gcn_vec_constant (mode, 0));
+
+if (GET_MODE (addr) == V64DImode)
+  emit_insn (gen_gather_insn_1offset_exec (operands[0], addr,
+		 const0_rtx, const0_rtx,
+		 const0_rtx, operands[0],
+		 exec));
+else
+  emit_insn (gen_gather_insn_2offsets_exec (operands[0], operands[1],
+		  addr, const0_rtx,
+		  const0_rtx, const0_rtx,
+		  operands[0], exec));
 DONE;
   })
 
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index a39e9f3fbd6..16c3aa2567e 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -992,9 +992,19 @@ gcn_vec_constant (machine_mode mode, int a)
 return CONST2_RTX (mode);*/
 
   int units = GET_MODE_NUNITS (mode);
-  rtx tem = gen_int_mode (a, GET_MODE_INNER (mode));
-  rtvec v = rtvec_alloc (units);
+  machine_mode innermode = GET_MODE_INNER (mode);
+
+  rtx tem;
+  if (FLOAT_MODE_P (innermode))
+{
+  REAL_VALUE_TYPE rv;
+  real_from_integer (&rv, NULL, a, SIGNED);
+  tem = const_double_from_real_value (rv, innermode);
+}
+  else
+tem = gen_int_mode (a, innermode);
 
+  rtvec v = rtvec_alloc (units);
   for (int i = 0; i < units; ++i)
 RTVEC_ELT (v, i) = tem;
 


Re: [Patch] Inline optimization for tanh(x)/sinh(x) -> 1.0/cosh(x)

2020-01-31 Thread Richard Biener
On Fri, Jan 31, 2020 at 12:06 PM Marc Glisse  wrote:
>
> On Thu, 30 Jan 2020, Vitor Guidi wrote:
>
> >> + /* Simplify tanh (x) / sinh (x) -> 1.0 / cosh (x). */
> >> + (simplify
> >> +   (rdiv (TANH @0) (SINH @0))
> >> +   (rdiv {build_one_cst (type);} (COSH @0)))
>
> The existing
>
>   (simplify
>(rdiv (SINH:s @0) (COSH:s @0))
> (TANH @0))
>
> has :s (which AFAIK are ignored because the output is a single insn) but
> not this new one, where it would not be ignored. That's not very
> consistent.

True.

Richard.

>
> --
> Marc Glisse


Re: [Patch] Inline optimization for tanh(x)/sinh(x) -> 1.0/cosh(x)

2020-01-31 Thread Marc Glisse

On Thu, 30 Jan 2020, Vitor Guidi wrote:


+ /* Simplify tanh (x) / sinh (x) -> 1.0 / cosh (x). */
+ (simplify
+   (rdiv (TANH @0) (SINH @0))
+   (rdiv {build_one_cst (type);} (COSH @0)))


The existing

 (simplify
  (rdiv (SINH:s @0) (COSH:s @0))
   (TANH @0))

has :s (which AFAIK are ignored because the output is a single insn) but 
not this new one, where it would not be ignored. That's not very 
consistent.


--
Marc Glisse


Re: [PATCH] libstdc++: Always return a sentinel from __gnu_test::test_range::end()

2020-01-31 Thread Jonathan Wakely

On 29/01/20 11:24 -0500, Patrick Palka wrote:

On Wed, 29 Jan 2020, Patrick Palka wrote:


On Wed, 29 Jan 2020, Jonathan Wakely wrote:


On 21/01/20 17:26 -0500, Patrick Palka wrote:
It seems that in practice std::sentinel_for is always 
true, and so the


Doh, good catch.

test_range container doesn't help us detect bugs in ranges code 
in which we
wrongly assume that a sentinel can be manipulated like an 
iterator.  Make the

test_range container more strict by having end() unconditionally return a
sentinel.


Yes, this seems useful.

Is this OK to commit after bootstrap+regtesting succeeds on 
x86_64-pc-linux-gnu?


As you mentioned on IRC, this makes some tests fail.

Some of them are bad tests and this change reveals that, e.g.
std/ranges/access/end.cc should not assume that r.end()==r.end() is
valid (because sentinels can't be compared with sentinels).

In other cases, the test is intentionally relying on the fact the
r.end() returns the same type as r.begin(), e.g. in 
std/ranges/access/rbegin.cc we want to test this case:


— Otherwise, make_reverse_iterator(ranges::end(t)) if both
ranges::begin(t) and ranges::end(t) are valid expressions of the same
type I which models bidirectional_iterator (23.3.4.12).

If the sentinel returned by ranges::end(r) is a different type, we
can't use test_range to verify this part of the ranges::rbegin
requirements.

I think we do want your change, so this patch fixes the tests that
inappropriately assume the sentinel is the same type as the iterator.
When we are actually relying on that property for the test, I'm adding
a new type derived from test_range which provides that guarantee and
using that instead.

There's one final change needed to make std/ranges/access/size.cc
compile after your patch, which is to make the test_range::sentinel
type satisfy std::sized_sentinel_for when the iterator satisfies
std::random_access_iterator, i.e. support subtracting iterators and
sentinels from each other.

Tested powerpc64le-linux, committed to trunk.


Thanks for the review and the testsuite fixes.



Please go ahead and commit your patch to test_range::end() after
re-testing. Thanks, and congratulations on your first libstdc++
patch.


Thanks! :)  I am re-testing now, and will commit the patch after that's
done.


It looks like there are other testsuite failures that need to get
addressed, in particular in the tests
range_operations/{prev,distance,next}.cc.

In prev.cc and next.cc we are passing a sentinel as the first argument
to ranges::next and ranges::prev, which expect an iterator as their
first argument.  In distance.cc we're passing a sentinel as the first
argument to ranges::distance, which also expects an iterator as its
first argument.

Here's an updated patch that adjusts these tests in the straightforward
way.  In test04() in next.cc I had to reset the bounds on the container
after doing ranges::next(begin, end) since we're manipulating an
input_iterator_wrapper.


Please add a comment there saying something like

  // reset single-pass input range

so it's obvious why it's needed.


Is this OK to commit?


OK for master with that tweak, thanks!




[PATCH][GCC][middle-end] Fix logical shift truncation (PR91838)

2020-01-31 Thread Tamar Christina
Hi All,

This fixes a fall-out from a patch I had submitted two years ago which started
allowing simplify-rtx to fold logical right shifts by offsets a followed by b
into >> (a + b).

However this can generate inefficient code when the resulting shift count ends
up being the same as the size of the shift mode.  This will create some
undefined behavior on most platforms.

This patch changes to code to truncate to 0 if the shift amount goes out of
range.  Before my older patch this used to happen in combine when it saw the
two shifts.  However since we combine them here combine never gets a chance to
truncate them.

The issue mostly affects GCC 8 and 9 since on 10 the back-end knows how to deal
with this shift constant but it's better to do the right thing in simplify-rtx.

Note that this doesn't take care of the Arithmetic shift where you could replace
the constant with MODE_BITS (mode) - 1, but that's not a regression so punting 
it.

Bootstrapped Regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Ok for trunk? and backport to GCC 8 and 9 with some stew?

Thanks,
Tamar

gcc/ChangeLog:

2020-01-31  Tamar Christina  

PR 91838
* simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case
to truncate if allowed or reject combination.

gcc/testsuite/ChangeLog:

2020-01-31  Tamar Christina  

PR 91838
* g++.dg/pr91838.C: New test.

-- 
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index eff1d07a2533c7bda5f0529cd318f08e6d5209d6..543cd5885105fb0e4568568a3c876c74cc1068bf 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -3647,9 +3647,21 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 	{
 	  rtx tmp = gen_int_shift_amount
 	(inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
-	  tmp = simplify_gen_binary (code, inner_mode,
- XEXP (SUBREG_REG (op0), 0),
- tmp);
+
+	 /* Combine would usually zero out the value when combining two
+	local shifts and the range becomes larger or equal to the mode.
+	However since we fold away one of the shifts here combine won't
+	see it so we should immediately truncate the shift if it's out of
+	range.  */
+	 if (code == LSHIFTRT
+	 && INTVAL (tmp) >= GET_MODE_BITSIZE (inner_mode))
+	  tmp = const0_rtx;
+	 else
+	   tmp = simplify_gen_binary (code,
+  inner_mode,
+  XEXP (SUBREG_REG (op0), 0),
+  tmp);
+
 	  return lowpart_subreg (int_mode, tmp, inner_mode);
 	}
 
diff --git a/gcc/testsuite/g++.dg/pr91838.C b/gcc/testsuite/g++.dg/pr91838.C
new file mode 100644
index ..4dbaef05ce84770e1c8726dd501b40309a352aaf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr91838.C
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+/* { dg-skip-if "" { *-*-* } {-std=c++98} } */
+
+using T = unsigned char; // or ushort, or uint
+using V [[gnu::vector_size(8)]] = T;
+V f(V x) {
+  return x >> 8 * sizeof(T);
+}
+
+/* { dg-final { scan-assembler {pxor\s+%xmm0,\s+%xmm0} { target x86_64-*-* } } } */



Re: [PATCH 2/2] analyzer: avoid use of fold_build2

2020-01-31 Thread Andrew Pinski
On Thu, Jan 30, 2020 at 5:19 PM David Malcolm  wrote:
>
> Various places in the analyzer use fold_build2, test the result, then
> discard it.  It's more efficient to use fold_binary, which avoids
> building and GC-ing a redundant tree for the cases where folding fails.

If these are all true integer constants, then you might want to use
tree_int_cst_compare instead of even using fold_binary/fold_build2.
Also if you are doing equal but always constant (but not always
integer ones), you could use simple_cst_equal instead.

Thanks,
Andrew Pinski

>
> gcc/analyzer/ChangeLog:
> * constraint-manager.cc (range::constrained_to_single_element):
> Replace fold_build2 with fold_binary.  Remove unnecessary newline.
> (constraint_manager::get_or_add_equiv_class): Replace fold_build2
> with fold_binary in two places, and remove out-of-date comment.
> (constraint_manager::eval_condition): Replace fold_build2 with
> fold_binary.
> * region-model.cc (constant_svalue::eval_condition): Likewise.
> (region_model::on_assignment): Likewise.
> ---
>  gcc/analyzer/constraint-manager.cc | 15 ++-
>  gcc/analyzer/region-model.cc   |  6 +++---
>  2 files changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/analyzer/constraint-manager.cc 
> b/gcc/analyzer/constraint-manager.cc
> index f3e31ee0830..4d138188856 100644
> --- a/gcc/analyzer/constraint-manager.cc
> +++ b/gcc/analyzer/constraint-manager.cc
> @@ -145,10 +145,9 @@ range::constrained_to_single_element (tree *out)
>m_upper_bound.ensure_closed (true);
>
>// Are they equal?
> -  tree comparison
> -= fold_build2 (EQ_EXPR, boolean_type_node,
> -  m_lower_bound.m_constant,
> -  m_upper_bound.m_constant);
> +  tree comparison = fold_binary (EQ_EXPR, boolean_type_node,
> +m_lower_bound.m_constant,
> +m_upper_bound.m_constant);
>if (comparison == boolean_true_node)
>  {
>*out = m_lower_bound.m_constant;
> @@ -930,7 +929,7 @@ constraint_manager::get_or_add_equiv_class (svalue_id sid)
>FOR_EACH_VEC_ELT (m_equiv_classes, i, ec)
> if (ec->m_constant)
>   {
> -   tree eq = fold_build2 (EQ_EXPR, boolean_type_node,
> +   tree eq = fold_binary (EQ_EXPR, boolean_type_node,
>cst, ec->m_constant);
> if (eq == boolean_true_node)
>   {
> @@ -967,10 +966,8 @@ constraint_manager::get_or_add_equiv_class (svalue_id 
> sid)
>  Determine the direction of the inequality, and record that
>  fact.  */
>   tree lt
> -   = fold_build2 (LT_EXPR, boolean_type_node,
> +   = fold_binary (LT_EXPR, boolean_type_node,
>new_ec->m_constant, other_ec.m_constant);
> - //gcc_assert (lt == boolean_true_node || lt == 
> boolean_false_node);
> - // not true for int vs float comparisons
>   if (lt == boolean_true_node)
> add_constraint_internal (new_id, CONSTRAINT_LT, other_id);
>   else if (lt == boolean_false_node)
> @@ -1016,7 +1013,7 @@ constraint_manager::eval_condition (equiv_class_id 
> lhs_ec,
>if (lhs_const && rhs_const)
>  {
>tree comparison
> -   = fold_build2 (op, boolean_type_node, lhs_const, rhs_const);
> +   = fold_binary (op, boolean_type_node, lhs_const, rhs_const);
>if (comparison == boolean_true_node)
> return tristate (tristate::TS_TRUE);
>if (comparison == boolean_false_node)
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index b546114bfd5..95d002f9c28 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -670,7 +670,7 @@ constant_svalue::eval_condition (constant_svalue *lhs,
>if (types_compatible_p (TREE_TYPE (lhs_const), TREE_TYPE (rhs_const)))
>  {
>tree comparison
> -   = fold_build2 (op, boolean_type_node, lhs_const, rhs_const);
> +   = fold_binary (op, boolean_type_node, lhs_const, rhs_const);
>if (comparison == boolean_true_node)
> return tristate (tristate::TS_TRUE);
>if (comparison == boolean_false_node)
> @@ -4070,9 +4070,9 @@ region_model::on_assignment (const gassign *assign, 
> region_model_context *ctxt)
> if (tree rhs1_cst = maybe_get_constant (rhs1_sid))
>   if (tree rhs2_cst = maybe_get_constant (rhs2_sid))
> {
> - tree result = fold_build2 (op, TREE_TYPE (lhs),
> + tree result = fold_binary (op, TREE_TYPE (lhs),
>  rhs1_cst, rhs2_cst);
> - if (CONSTANT_CLASS_P (result))
> + if (result && CONSTANT_CLASS_P (result))
> {
>   svalue_id result_sid
> = get_or_create_constant_svalue (result);
> --
> 2.21.

Re: [PATCH 2/2] analyzer: avoid use of fold_build2

2020-01-31 Thread Jakub Jelinek
On Thu, Jan 30, 2020 at 08:19:18PM -0500, David Malcolm wrote:
> gcc/analyzer/ChangeLog:
>   * constraint-manager.cc (range::constrained_to_single_element):
>   Replace fold_build2 with fold_binary.  Remove unnecessary newline.
>   (constraint_manager::get_or_add_equiv_class): Replace fold_build2
>   with fold_binary in two places, and remove out-of-date comment.
>   (constraint_manager::eval_condition): Replace fold_build2 with
>   fold_binary.
>   * region-model.cc (constant_svalue::eval_condition): Likewise.
>   (region_model::on_assignment): Likewise.

LGTM.

Jakub



Re: [PATCH] calls.c: refactor special_function_p for use by analyzer (v2)

2020-01-31 Thread Jakub Jelinek
On Thu, Jan 30, 2020 at 08:32:25PM -0500, David Malcolm wrote:
> gcc/analyzer/ChangeLog:
>   * analyzer.cc (is_named_call_p): Replace tests for fndecl being
>   extern at file scope and having a non-NULL DECL_NAME with a call
>   to maybe_special_function_p.
>   * function-set.cc (function_set::contains_decl_p): Add call to
>   maybe_special_function_p.
> 
> gcc/ChangeLog:
>   * calls.c (special_function_p): Split out the check for DECL_NAME
>   being non-NULL and fndecl being extern at file scope into a
>   new maybe_special_function_p and call it.  Drop check for fndecl
>   being non-NULL that was after a usage of DECL_NAME (fndecl).
>   * tree.h (maybe_special_function_p): New inline function.

LGTM.

Jakub



Re: [PATCH 1/2] analyzer: further fixes for comparisons between uncomparable types (PR 93450)

2020-01-31 Thread Jakub Jelinek
On Thu, Jan 30, 2020 at 08:19:17PM -0500, David Malcolm wrote:
> gcc/analyzer/ChangeLog:
>   PR analyzer/93450
>   * constraint-manager.cc
>   (constraint_manager::get_or_add_equiv_class): Only compare constants
>   if their types are compatible.
>   * region-model.cc (constant_svalue::eval_condition): Replace check
>   for identical types with call to types_compatible_p.

LGTM.

Jakub



Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-31 Thread Thomas Schwinge
Hi!

On 2020-01-30T16:45:39+, Andrew Stubbs  wrote:
> On 30/01/2020 16:08, Thomas Schwinge wrote:
>> Andrew and Frederik, thanks for your emails reminding/educating me about
>> 'snprintf' as well as this HSA fixed-size buffer API.  There doesn't
>> happen to be something available in the HSA API available so that we
>> could use 'sizeof [something]' instead of hard-coding '64' etc.?
>
> No, not at present; hsa_agent_get_info_fn is a somewhat generic 
> interface that takes an enum and returns a void*. The return type is 
> written in the documentation: 
> https://rocm-documentation.readthedocs.io/en/latest/ROCm_API_References/ROCr-API.html#rocr-api
>
> However, we don't use the official ROCm header files, because 
> dependencies and licenses, so we could invent our own typedefs in hsa.h, 
> if we chose. I don't see that doing so would be worth the effort now, or 
> maintenance burden later.

ACK.  I was just curious if there's an easy way to properly resolve this
"magic numbers" issue now, once and for all.  Apparently, there isn't --
which shall be fine, then.


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [PATCH, ivopts] Fix fast-math-pr55281.c ICE

2020-01-31 Thread Richard Biener
On Thu, Jan 30, 2020 at 3:09 PM Andrew Stubbs  wrote:
>
> On 30/01/2020 13:49, Richard Biener wrote:
> > On Thu, Jan 30, 2020 at 2:04 PM Bin.Cheng  wrote:
> >>
> >> On Thu, Jan 30, 2020 at 8:53 PM Andrew Stubbs  
> >> wrote:
> >>>
> >>> On 29/01/2020 08:24, Richard Biener wrote:
>  On Tue, Jan 28, 2020 at 5:53 PM Andrew Stubbs  
>  wrote:
> >
> > This patch fixes an ICE compiling fast-math-pr55281.c for amdgcn.
> >
> > The problem is that an "iv" is created in which both base and step are
> > pointer types,
> 
>  How did you get a POINTER_TYPE step?  That's where the issue lies
>  I think.
> >>>
> >>> It can come from "find_inv_vars_cb":
> >>>
> >>> set_iv (idata, op, op, build_int_cst (TREE_TYPE (op), 0), true);
> >>
> >> This is recording invariant with zero step.  It seems we are using
> >> wrong type building the zero-step.  How about detecting that op has
> >> pointer type and using integer type here?
> >
> > that sounds like a good idea.
>
> How about this?
>
> I've only tested it on the one testcase, so far, but it works for that.
>
> OK to commit (following a full test)?

OK.

Richard.

> Andrew


Re: [Patch] Inline optimization for tanh(x)/sinh(x) -> 1.0/cosh(x)

2020-01-31 Thread Richard Biener
On Fri, Jan 31, 2020 at 12:53 AM Vitor Guidi  wrote:
>
> Hi.
>
> This patch adds a new optimization to avoid the redundant calculation
> tanh(x)/sinh(x) by replacing it for 1.0/cosh(x), for all cases where x is a
> double, a long double or a float.
>
> There should be no need for numerical stability testing, since the division
> of the two functions only adds numerical noise. The correctness of the
> operation is justified by the definition of tanh(x) = sinh(x)/cosh(x). If
> you think it is wise to write a test, please let me know.
>
> As far as testing goes, I ran a check-gcc test under Ubuntu 19.04 by adding
> the test in tanhbysinh.c and found no issues.

OK for GCC 11, please either attach patches or avoid quoting them.

> Best regards,
>
> Vitor.
>
> in gcc/ChangeLog:
> 2020-08-28  Vitor Guidi 
>
> * match.pd: New substitution rule for tanh(x)/sinh(x) ->
> 1.0/cosh(x).
>
> in gcc/testsuite/ChangeLog:
> 2020-08-28  Vitor Guidi 
>
> * gcc.dg/tanhbysinh.c (new): Verify if simplification is applied.

Just use :New testcase, without any function designation.

>
> > diff --git gcc/match.pd gcc/match.pd
> > index 5fee394e7af..3933fcaf9fa 100644
> > --- gcc/match.pd
> > +++ gcc/match.pd
> > @@ -5069,6 +5069,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(rdiv (SINH:s @0) (COSH:s @0))
> > (TANH @0))
> >
> > + /* Simplify tanh (x) / sinh (x) -> 1.0 / cosh (x). */
> > + (simplify
> > +   (rdiv (TANH @0) (SINH @0))
> > +   (rdiv {build_one_cst (type);} (COSH @0)))
> > +
> >   /* Simplify cos(x) / sin(x) -> 1 / tan(x). */
> >   (simplify
> >(rdiv (COS:s @0) (SIN:s @0))
> > diff --git gcc/testsuite/gcc.dg/tanhbysinh.c
> gcc/testsuite/gcc.dg/tanhbysinh.c
> > new file mode 100644
> > index 000..fde72c2f93b
> > --- /dev/null
> > +++ gcc/testsuite/gcc.dg/tanhbysinh.c
> > @@ -0,0 +1,40 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Ofast -fdump-tree-optimized" } */
> > +
> > +extern float sinhf (float);
> > +extern float tanhf (float);
> > +extern double sinh (double);
> > +extern double tanh (double);
> > +extern long double sinhl (long double);
> > +extern long double tanhl (long double);
> > +
> > +double __attribute__ ((noinline))
> > +tanhbysinh_ (double x)
> > +{
> > +return tanh (x) / sinh (x);
> > +}
> > +
> > +float __attribute__ ((noinline))
> > +tanhbysinhf_ (float x)
> > +{
> > +return tanhf (x) / sinhf (x);
> > +}
> > +
> > +long double __attribute__ ((noinline))
> > +tanhbysinhl_ (long double x)
> > +{
> > +return tanhl (x) / sinhl (x);
> > +}
> > +
> > +
> > +/* There must be no calls to sinh or atanh */
> > +/* There must be calls to cosh */
> > +/* {dg-final { scan-tree-dump-not "sinh " "optimized" } } */
> > +/* {dg-final { scan-tree-dump-not "tanh " "optimized" }} */
> > +/* {dg-final { scan-tree-dump-not "sinhf " "optimized" } } */
> > +/* {dg-final { scan-tree-dump-not "tanhf " "optimized" }} */
> > +/* {dg-final { scan-tree-dump-not "sinhl " "optimized" } } */
> > +/* {dg-final { scan-tree-dump-not "tanhl " "optimized" }} */
> > +/* { dg-final { scan-tree-dump "cosh " "optimized" } } */
> > +/* { dg-final { scan-tree-dump "coshf " "optimized" } } */
> > +/* { dg-final { scan-tree-dump "coshl " "optimized" } } */
> > \ No newline at end of file


Re: [PATCH] doc: clarify the situation with pointer arithmetic

2020-01-31 Thread Richard Biener
On Thu, Jan 30, 2020 at 6:09 PM Uecker, Martin
 wrote:
>
> Am Donnerstag, den 30.01.2020, 16:50 + schrieb Michael Matz:
> > Hi,
> >
> > On Thu, 30 Jan 2020, Uecker, Martin wrote:
> >
> > > > guarantees face serious implementation difficulties I think
> > > > so the only alternative to PVNI (which I think is implementable
> > > > but at a optimization opportunity cost) is one that makes
> > > > two pointers with the same value always have the same
> > > > provenance (and otherwise make the behavior undefined).
> > >
> > > This would need to come with precise rules about
> > > when the occurance of two such pointers is UB,
> > > e.g. comparisons of such pointers, or that
> > > two such pointers are cast to int in the same
> > > execution.
> > >
> > > The mere existance of such pointers should be
> > > quite common and should not already be UB.
> > >
> > > But I am uncomfortable with the idea that
> > > comparison of pointers is always allowed except
> > > for some special case which then is UB. This
> > > might cause are and very difficult to find bugs.
> >
> > As Richi said, the comparison itself wouldn't be UB, all comparisons would
> > be allowed.  But _if_ the pointers compare equal, they must have same (or
> > overlapping) provenance (i.e. when they have not, then _that_ is UB).
>
> Sorry, I still don't get it.  In the following example,
>
> int a[1];
> int b[1];
>
> it is often the case that &a[1] and &b[0] compare equal
> because they have the same address but the pointer
> have different provenance.
>
> Or does there need to be an actual evaluation of a comparison
> operations? In this case, I do not see the difference to what
> I said.

I guess I wanted to say that if you do

  if (&a[1] == &b[0])
if (&a[1] != &b[0])
  abort ();

then the abort might happen.  I'm using the term "undefined behavior"
here.  So whenever you create a value based on two pointers with
the same value and different provenance you invoke undefined behavior.
That allows the compiler to optimize

int *q, *r;
if (q == r)
  *r = 1;

into

if (q == r)
  *q = 1;

which it is currently not allowed to do because of that dread one-after-the
object equality compare, not because of PNVI, but similar cases
obviously can be constructed with integers (and make our live difficult
as we're tracking provenance through integers).

Compilers fundamentally work with value-equivalences, the above example
shows we may not.  That's IMHO a defect in the standard.

Richard.

>
> Best,
> Martin
>
> > > > > Others proposed to make the result of the comparison unspecified,
> > > > > but I think this does not help.
> > > >
> > > > Indeed.  It's not unspecified, it's known to evaluate to false. I
> > > > think there's existing wording in the standard that allows it to
> > > > evaluate to true for pointers one-after-the-object, that would need to
> > > > be changed of course.
> > >
> > > The problem is that if the comparison if not optimized
> > > and the pointers have the same address, then it would
> > > evaluate to true at run-time. If I understand correctly,
> > > you somehow want to make this case be UB, but I haven't
> > > quite understood how (if it is not the comparison of such
> > > pointers that invokes UB).
> >
> > By saying something like "if two pointers compare equal they must have the
> > same provenance, otherwise the behaviour is undefined".
>
> > (I don't know if this definition would or would not help with the problems
> > PVNI poses to compilers).
> >
> >
> > Ciao,
> > Michael.